PHP: Block URL Parameters

How to block unused (non-existent) URL parameters in PHP web applications.


By. Jacob

Edited: 2020-04-04 11:38

404 on non-existent query strings.

URL parameters are the part of the URL after the question mark (?). PHP applications usually use them to deliver content based on the value entered for each parameter. Inside PHP scripts, they can be accessed through the $_GET superglobal.
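As a small illustration of how a query string maps to parameter names and values, the sketch below uses parse_str(), which parses a string the same way PHP parses the query string of a request. The query string itself is a made-up example; in a real request you would simply read $_GET.

```php
<?php
// Hypothetical query string, as it would appear after the "?" in a URL.
$query_string = 'some_page_id=42&utm_source=newsletter';

// parse_str() fills $parameters the same way PHP fills $_GET for a request.
parse_str($query_string, $parameters);

// Values always arrive as strings (or arrays), never as integers.
echo $parameters['some_page_id']; // prints "42"

// The parameter names are the array keys.
$names = array_keys($parameters);
```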

Usually we do not want non-existent parameters to work, since every extra parameter creates another URL that serves the same content, which can cause duplicate content issues with search engines. Besides that, it is just wrong.

When users enter parameters that are not in use by your application, ideally we should show them an error, such as a 404 Not Found or 400 Bad Request. However, this is not going to work in all cases, since some third-party modules or plugins might add parameters to our URLs for tracking purposes, and that is where a white-list comes in handy.

An easy way to make sure that only pre-approved parameters will work is to keep an array of the parameters that are actually in use by our application. When certain parameters are used, we will probably also need to add a rel=canonical link element in order to tell search engines about the canonical URL for the resource.

// Create a white-list of allowed URL parameters
$defined_url_parameters = array();
$defined_url_parameters['some_page_id'] = '';
$defined_url_parameters['another_parameter'] = '';
$defined_url_parameters['third_url_parameter'] = '';

// Check if the requested parameters are allowed by the application
$requested_parameters = array_keys($_GET);

foreach ($requested_parameters as $parameter) {
    // isset() is false for any name that is missing from the white-list
    if (!isset($defined_url_parameters[$parameter])) {
        http_response_code(404);
        echo 'My king. Nothing by the name of <b>'.htmlspecialchars($parameter).'</b> is known in this world.';
        exit();
    }
}
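The same check can also be written without an explicit loop. The following is only a sketch under a few assumptions: find_unknown_parameters() is a hypothetical helper that is not part of the original script, and a fixed array stands in for $_GET so the example runs on its own.

```php
<?php
/**
 * Return the names of requested parameters that are not in the white-list.
 * (Hypothetical helper, introduced for illustration only.)
 */
function find_unknown_parameters(array $requested, array $allowed_names): array
{
    // array_flip() turns the list of names into keys,
    // so array_diff_key() can compare the two arrays by key.
    return array_keys(array_diff_key($requested, array_flip($allowed_names)));
}

$allowed_names = ['some_page_id', 'another_parameter', 'third_url_parameter'];

// In a real request this would be $_GET.
$requested = ['some_page_id' => '42', 'bogus' => '1'];

$unknown = find_unknown_parameters($requested, $allowed_names);

if ($unknown !== []) {
    // In a web request you would call http_response_code(404) and exit() here.
    echo 'Unknown parameter(s): ' . htmlspecialchars(implode(', ', $unknown));
}
```

Collecting all unknown names first makes it possible to report every offending parameter in one error message, instead of stopping at the first one.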

Checking if a URL parameter exists

For the script to work, you will have to pre-define the URL parameters that are being used by your application. I suggest you keep the $defined_url_parameters array in a settings file somewhere, so that you will not clutter your code with settings.

In the script above, we are defining the URL parameters in the $defined_url_parameters array. To check if a requested parameter exists, we simply perform an if (!isset...){} check. The exclamation mark means "not", so if the array index is undefined, we will send a 404 response.

if (!isset($defined_url_parameters[$parameter])) {
  // Send the 404 Response Code
  http_response_code(404);
  // Send the HTML body
  echo 'My king. Nothing by that name is known in this world.';
  // Stop the script so nothing else is sent
  exit();
}

The exit(); is required here. Without it, the loop would move on to the remaining parameters, and the script would go on to output the normal page after the error message. The loop iterates over the parameter names from the $_GET superglobal, and stops the script as soon as an unknown parameter is requested.

Canonical URLs

Those of you who do not have access to the PHP code of your CMS can instead include a rel=canonical URL, which was made with this problem in mind. It is also the easier option if you are not comfortable altering PHP code yourself.

In some cases we might also need to include it when specific parameters are requested. For example, social media sites have been known to add tracking parameters to URLs. We can deal with this problem in two ways: either we accept the small risk of having a tracking parameter indexed by search engines, or we create specific code to insert a canonical link element for known tracking parameters. Another option is to simply redirect the user, but that might break some features offered by social media sites, so you may not want to do that.

The rel=canonical URL should be placed in the head section of the HTML. The HTML code looks like this:

<link rel="canonical" href="http://example.com/this-is-my-article">

You should replace the example URL with the URL of your article.
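For the case where the canonical element is generated for known tracking parameters, a minimal sketch could look like the one below. The canonical_link() helper is hypothetical, the tracking parameter names (utm_source, fbclid) are only examples, and the code assumes it is given a well-formed absolute URL with a scheme and a host.

```php
<?php
/**
 * Build a canonical link element for a URL, dropping known tracking parameters.
 * (Hypothetical helper; assumes an absolute URL with scheme and host.)
 */
function canonical_link(string $url, array $tracking_parameters): string
{
    $parts = parse_url($url);

    // Split the query string (if any) into an array of parameters.
    $query = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }

    // Remove the tracking parameters and keep everything else.
    $query = array_diff_key($query, array_flip($tracking_parameters));

    $canonical = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '/');
    if ($query !== []) {
        $canonical .= '?' . http_build_query($query);
    }

    // Escape the URL before placing it inside the HTML attribute.
    return '<link rel="canonical" href="' . htmlspecialchars($canonical) . '">';
}

// Example tracking parameters only; adjust the list to what you actually see.
echo canonical_link(
    'http://example.com/this-is-my-article?utm_source=social&fbclid=abc',
    ['utm_source', 'fbclid']
);
// prints: <link rel="canonical" href="http://example.com/this-is-my-article">
```

Parameters that are not in the tracking list are kept, so a URL such as http://example.com/a?some_page_id=5&utm_source=x would keep some_page_id in its canonical URL.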
