How to Block or 404 out your Index

Preventing access to index.php to avoid duplicate content.

By. Jacob

Edited: 2019-09-11 16:27

This tutorial shows how to avoid duplicate content by returning a 404 error for direct requests to your index file. For instance, you could own a website on the domain http://beamtic.com/. The Directory Index is usually named index.html by default (or index.php on a PHP site), so the same page can be reached both at / and at /index.html. Most people who are comfortable navigating the internet know this, and sometimes they will type index.html in their browser's address bar, for whatever reason.

The risk is that someone links to your index.html, which can lead to search engines indexing it as a separate URL. It might also get picked up by search engines some other way.

There are a number of ways to prevent this. The first I'm going to cover uses robots.txt to disallow access to the page.

Using Robots.txt

Blocking search engines from crawling index.html from within robots.txt is fairly easy. Most major search engines recognise the robots.txt file and will respect the rules you declare inside it. For example:

User-agent: *
Disallow: /index.html

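Of course, you can also disallow access for specific search engines only. Each crawler is addressed by its own user-agent token, such as Googlebot for Google or Slurp for Yahoo. The example below blocks the whole site for the listed crawlers and leaves all others unaffected:

User-agent: Googlebot
User-agent: Slurp
Disallow: /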

Using PHP

If you are using PHP on your site, then I think this is one of the best solutions.

Using PHP, we can simply check to see if the requested path equals /index.php. This can be done with a standard PHP if statement.

The $_SERVER['REQUEST_URI'] variable contains the requested path as a root-relative text string (including any query string), and we can easily check whether it matches /index.php. In practice, simply include something like the below (at the top of your script, before sending any output):

// Respond with a 404 when index.php is requested directly.
if ($_SERVER['REQUEST_URI'] == '/index.php') {
  header('HTTP/1.1 404 Not Found');
  include_once '404.php'; // Optional custom error page
  exit();
}

The 404 error page is optional, but I do recommend that you make one. Remember to send the appropriate HTTP header for a 404 error if you do.
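
One thing to be aware of is that a request such as /index.php?page=2 would not be caught by an exact comparison against REQUEST_URI, because the query string is part of that value. The following is a minimal sketch of one way to handle this, not a definitive implementation. It assumes PHP 7 or later, and the helper name is_direct_index_request is hypothetical (it is not part of PHP or of the example above). It uses parse_url() to compare only the path portion of the request.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns true when
// the path portion of the request URI is exactly /index.php.
function is_direct_index_request(string $requestUri): bool
{
    // With PHP_URL_PATH, parse_url() returns only the path and drops any
    // query string. It may return null or false for unusual input, so a
    // strict comparison is used.
    $path = parse_url($requestUri, PHP_URL_PATH);
    return $path === '/index.php';
}

// REQUEST_URI is only set when running under a web server.
if (isset($_SERVER['REQUEST_URI']) && is_direct_index_request($_SERVER['REQUEST_URI'])) {
    http_response_code(404); // Sends the 404 status for this response
    include_once '404.php';  // Optional custom error page, as above
    exit();
}
```

With this check, both /index.php and /index.php?page=2 result in a 404, while / and other pages are served as usual.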
