
How to Block or 404 out your Index

Preventing access to index.php to avoid duplicate content.


Edited: 2019-09-11 16:27

This tutorial shows how to avoid duplicate content by returning a 404 error for your index.php file. For instance, if you own a website, its directory index is usually named index.html by default (index.php on many PHP sites). Most people who are comfortable navigating the internet know this, and for whatever reason they will often type index.html into their browser's address bar.

The risk you run is that someone links to your index.html, which can lead search engines to index it. Search engines might also pick it up some other way. Because / and /index.html serve the same page, you end up with two URLs carrying duplicate content.

There are a number of ways to prevent this. The first one I'm going to cover uses robots.txt to disallow access to the page.

Using Robots.txt

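Blocking search engines from crawling index.html can be done fairly easily from within robots.txt. Most major search engines recognise the robots.txt file and will respect the rules you declare in it. (Strictly speaking, robots.txt controls crawling rather than indexing, so a blocked URL can still show up in search results if other pages link to it.) To block index.html for all crawlers: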

User-agent: *
Disallow: /index.html
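Disallow values are matched as path prefixes, so the rule above also covers URLs such as /index.html?ref=home. The PHP sketch below illustrates that simplified matching. isDisallowed() is a hypothetical helper, not part of any library. It ignores Allow lines, wildcards and user-agent groups, all of which real crawlers take into account.

```php
<?php
// Hypothetical helper: simplified robots.txt matching.
// A path is blocked if it starts with any non-empty Disallow value.
// Ignores Allow lines, wildcards (*, $) and user-agent groups.
function isDisallowed(string $path, array $disallowRules): bool
{
    foreach ($disallowRules as $rule) {
        // An empty "Disallow:" line blocks nothing
        if ($rule === '') {
            continue;
        }
        // Plain Disallow paths are matched as prefixes of the requested path
        if (strncmp($path, $rule, strlen($rule)) === 0) {
            return true;
        }
    }
    return false;
}

$rules = ['/index.html'];
var_dump(isDisallowed('/index.html', $rules));          // bool(true)
var_dump(isDisallowed('/index.html?ref=home', $rules)); // bool(true)
var_dump(isDisallowed('/', $rules));                    // bool(false)
```

Note that the site root (/) stays crawlable, which is what you want when / and /index.html serve the same page.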

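Of course, you can also target specific search engines by naming their crawlers. Leave out the "User-agent: *" line here, because every User-agent line in a group receives the same rules, so including it would block every crawler. The following blocks Google (Googlebot) and Yahoo (Slurp) from the entire site:

User-agent: Googlebot
User-agent: Slurp
Disallow: /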
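To see how a crawler decides which group of lines applies to it, here is a rough PHP sketch built around a hypothetical rulesFor() helper. It collects the Disallow rules for each group of User-agent lines and returns the group that names the crawler, falling back to the * group. Real parsers (see RFC 9309) also handle comments, Allow lines and looser token matching.

```php
<?php
// Hypothetical helper: the Disallow values that apply to one crawler.
// Simplified: ignores comments, Allow lines and partial token matches.
function rulesFor(string $robotsTxt, string $crawler): array
{
    $groups = [];           // user-agent token => list of Disallow values
    $current = [];          // tokens named by the group being read
    $readingAgents = false;

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "field: value" line
        }
        $field = strtolower(trim($parts[0]));
        $value = trim($parts[1]);

        if ($field === 'user-agent') {
            // A User-agent line that follows rules starts a new group
            if (!$readingAgents) {
                $current = [];
                $readingAgents = true;
            }
            $token = strtolower($value);
            $current[] = $token;
            $groups[$token] = $groups[$token] ?? [];
        } elseif ($field === 'disallow') {
            $readingAgents = false;
            // Every User-agent line in the group gets the same rule
            foreach ($current as $token) {
                $groups[$token][] = $value;
            }
        }
    }

    // Prefer the crawler's own group, otherwise fall back to "*"
    return $groups[strtolower($crawler)] ?? $groups['*'] ?? [];
}

$specific = "User-agent: Googlebot\nUser-agent: Slurp\nDisallow: /\n";
$withStar = "User-agent: *\nUser-agent: Googlebot\nUser-agent: Slurp\nDisallow: /\n";

echo count(rulesFor($specific, 'Googlebot')), "\n"; // 1
echo count(rulesFor($specific, 'Bingbot')), "\n";   // 0
echo count(rulesFor($withStar, 'Bingbot')), "\n";   // 1, via the "*" group
```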

Using PHP

If you are using PHP on your site, then I think this is one of the best solutions.

Using PHP, we can simply check to see if the requested path equals /index.php. This can be done with a standard PHP if statement.

The $_SERVER['REQUEST_URI'] variable contains the requested URI as a root-relative text string (including any query string), and we can easily check whether it matches /index.php. In practice, simply include something like the following at the top of your script, before sending any output:

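// Compare only the path, so /index.php?page=2 is caught too
if (parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) === '/index.php') {
  http_response_code(404); // send the "404 Not Found" status header
  include_once '404.php';  // optional custom error page
  exit;                    // stop the normal page from being output
}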
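If you prefer, the check can live in a small function so it can be tested without a web server. This is a sketch built around a hypothetical helper named isDirectIndexRequest(). It uses parse_url() with PHP_URL_PATH to drop the query string before comparing.

```php
<?php
// Hypothetical helper: true when the visitor asked for /index.php directly.
// parse_url() drops the query string, so /index.php?page=2 also matches.
function isDirectIndexRequest(string $requestUri): bool
{
    return parse_url($requestUri, PHP_URL_PATH) === '/index.php';
}

// Only act on real web requests; REQUEST_URI is not set on the command line.
if (isset($_SERVER['REQUEST_URI']) && isDirectIndexRequest($_SERVER['REQUEST_URI'])) {
    http_response_code(404); // send the 404 status header
    include_once '404.php';  // optional custom error page
    exit;
}
```

A request for / that the server maps to index.php is unaffected, because REQUEST_URI holds the URL the visitor asked for (/), not the name of the script that handles it.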

The custom 404 error page is optional, but I do recommend that you make one. Either way, make sure the 404 status header is actually sent. Otherwise search engines will see a normal 200 OK response and may keep indexing the page.