Trailing Question Mark in URLs

The best way to deal with a trailing question mark is probably to respond with a 400 Bad Request, because a question mark without any parameters is a very odd thing to find in a request URL.


By. Jacob

Edited: 2023-10-28 13:49

If the REQUEST_URI contains a question mark and no associated URL parameters, we risk "duplicate content" in search engines, and should do one of the following:

  1. Return a 400 Bad Request (Preferred due to the nature of URL parameters)
  2. Return a 404 Not Found
  3. Include a canonical that points to the URL without the question mark

But it is not enough to test for the presence of a lonesome trailing question mark, because URLs could, for various reasons, contain multiple literal question marks. So, we should probably rebuild the canonical URL and re-add any parameters found.

In PHP we can detect the presence of a question mark anywhere in the URL like this:

// str_contains() requires PHP 8.0 or later
if (str_contains($_SERVER['REQUEST_URI'], '?') && count($_GET) === 0) {
  http_response_code(400); // Bad Request
  echo 'Bad Request';
  exit();
}
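The same check can also be written as a function that takes the request URI as an argument, which makes it easier to test outside of a web request. This is only a minimal sketch: the function name has_lonesome_question_mark is hypothetical, and it assumes that parse_str() handles the query string roughly the way PHP does when it fills $_GET.

```php
<?php
// Hypothetical helper: true when the URI has a "?" but no parsable parameters.
function has_lonesome_question_mark(string $request_uri): bool
{
    $pos = strpos($request_uri, '?');
    if ($pos === false) {
        return false; // No question mark at all
    }
    // Parse everything after the first "?" into an array of parameters
    parse_str(substr($request_uri, $pos + 1), $params);
    return count($params) === 0;
}
```

Inside a request you would pass $_SERVER['REQUEST_URI'] and send the 400 response when the function returns true.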

Because a raw question mark is normally only used to introduce URL parameters, a question mark without any parameters serves no purpose, and therefore we can opt to return a 400 Bad Request message. You should do the same for non-existent or "unused" URL parameters, but you may opt to whitelist certain parameters used for tracking, e.g. fbclid for Facebook, since you could otherwise break features offered by the third parties that use them.
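A rough sketch of such a parameter check could look like this. The function name find_unexpected_params and the two lists passed to it are assumptions made for illustration; which parameters your application actually uses, and which tracking parameters you whitelist, is up to you.

```php
<?php
// Hypothetical helper: lists query parameters that are neither used by the
// application nor whitelisted for third-party tracking.
function find_unexpected_params(array $query_params, array $used, array $tracking): array
{
    $accepted = array_merge($used, $tracking);
    // Numeric parameter names come back as integers, so cast them to strings
    $names = array_map('strval', array_keys($query_params));
    return array_values(array_diff($names, $accepted));
}
```

If find_unexpected_params($_GET, ['id'], ['fbclid']) returns a non-empty array, the request contains parameters you do not recognise, and you can respond with a 400 Bad Request.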

Note. If you do whitelist such parameters, it is important to still include a canonical link tag that points to the URL without them, to avoid duplicate content and maintain technical consistency.
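One way to produce that canonical for a page that accepts whitelisted tracking parameters is sketched below. The function name canonical_link_tag and the example.com URL are hypothetical, and the sketch assumes you already know the base URL of the page.

```php
<?php
// Hypothetical helper: builds a canonical link tag with tracking parameters removed.
function canonical_link_tag(string $base_url, array $query_params, array $tracking): string
{
    // Drop whitelisted tracking parameters such as fbclid
    $kept = array_diff_key($query_params, array_flip($tracking));
    $url = $base_url;
    if (count($kept) > 0) {
        $url .= '?' . http_build_query($kept);
    }
    // Escape the URL for safe use inside an HTML attribute
    return '<link rel="canonical" href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">';
}
```

Calling canonical_link_tag('https://example.com/article', $_GET, ['fbclid']) keeps the real parameters and leaves out fbclid, so every tracked variation of the URL points to the same canonical.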

Build a canonical URL based on Request URI

If you are performing a redirect or adding a canonical header, you will need to reconstruct a canonical URL after removing the lonesome question mark. E.g:

// Construct the canonical URL and re-add any parameters found in $_GET
// E.g: https://beamtic.com/get-full-request-url-in-php

$full_request_url = '';  // Initialize the URL string

// Check if the request is made using HTTPS
if ((!empty($_SERVER['HTTPS'])) && ($_SERVER['HTTPS'] !== 'off')) {
  $full_request_url .= 'https://';
} else {
  $full_request_url .= 'http://';
}

// Append the HTTP_HOST and the path part of REQUEST_URI (the raw query string is dropped)
$full_request_url .= $_SERVER['HTTP_HOST'] . parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

// Check if there are any URL parameters (GET parameters)
if (count($_GET) > 0) {
  $full_request_url .= '?';  // Add the '?' separator for URL parameters

  // Rebuild the query string; http_build_query() URL-encodes every key
  // and value, including the first pair and nested array parameters
  $full_request_url .= http_build_query($_GET);
}

// Set the content type and echo the full canonical URL
header('content-type: text/plain; charset=utf-8');
echo $full_request_url;
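If you want to send the canonical as an HTTP header instead of printing it, the logic above can be wrapped in a function. This is a sketch only: build_canonical_url is a hypothetical name, and the function takes copies of $_SERVER and $_GET as arguments so it can be tested without a web server.

```php
<?php
// Hypothetical helper: builds the canonical URL from copies of $_SERVER and $_GET.
function build_canonical_url(array $server, array $get): string
{
    $is_https = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $url = $is_https ? 'https://' : 'http://';

    // Host plus the path part only; the raw query string is discarded here
    $url .= $server['HTTP_HOST'] . (string) parse_url($server['REQUEST_URI'], PHP_URL_PATH);

    // Re-add parsed parameters, if any; a lonesome "?" leaves $get empty
    if (count($get) > 0) {
        $url .= '?' . http_build_query($get);
    }
    return $url;
}

// Usage inside a request (assumes no output has been sent yet):
// header('Link: <' . build_canonical_url($_SERVER, $_GET) . '>; rel="canonical"');
```

Keep in mind that HTTP_HOST is supplied by the client, so you may prefer to use a fixed host name from your configuration instead.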

Why return a 400 Bad Request rather than a 404 or a redirect?

I have opted to show a 400 Bad Request message because it feels more true to the HTTP standard. Performing a redirect could be more user-friendly if the request was sent in error, e.g. the user mistyped the URL or clicked a malformed link, but a redirect is not what you would intuitively expect to happen when requesting non-existent resources.

Consider the following:

  1. If a resource does not exist, we should return a 404. See also: What to do About Deleted and Discontinued Products
  2. If a resource has existed previously, but it was deleted, we should return a 404.
  3. If a resource was moved to another permanent location, we should return a 301 redirect to the new location.
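The three cases above can be expressed as a small lookup. The state names 'missing', 'deleted' and 'moved' are assumptions made for this sketch; how your application tracks the state of a resource will differ.

```php
<?php
// Hypothetical helper: maps an assumed resource state to an HTTP status code.
function status_for_resource_state(string $state): int
{
    switch ($state) {
        case 'moved':
            return 301; // Moved to another permanent location
        case 'missing':
        case 'deleted':
            return 404; // Never existed, or existed but was deleted
        default:
            return 200; // Resource exists at the requested URL
    }
}
```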

Ideally a redirect should remain in place indefinitely, because old hyperlinks might still point to the old location, and someone might still follow those links. But, contrary to what you may have heard, SEO is an irrelevant consideration in these rare edge cases, so I would do what makes sense technically and in terms of maintainability. Placing redirects just for the sake of SEO will potentially send a technically inaccurate signal to bots and search engines.

Because URL parameters are normally used to control behaviour of the requested resource, a 400 Bad Request message will be more appropriate than a 404.

Although Content Management Systems these days use friendly URLs, and make limited use of plain URL parameters, there may still be cases where the URL parameters themselves are used to request specific resources. E.g. if using a routing pattern like ?page=[page name] to access resources, a 404 response will be the more appropriate choice, but probably only when a requested resource is not found through such a specific parameter.

For a lonesome trailing question mark, I would personally stick with a 400 response regardless of how parameters are normally used in your application.
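To illustrate how the two status codes can coexist in an application that routes on ?page=[page name], here is a minimal sketch. The function name route_status and the list of known pages are hypothetical, and treating every parameter other than page as a bad request is an assumption of this example.

```php
<?php
// Hypothetical router sketch: returns the status code for a ?page=[page name] request.
function route_status(string $request_uri, array $known_pages): int
{
    $pos = strpos($request_uri, '?');
    if ($pos === false) {
        return 200; // No query string, so nothing to validate here
    }
    parse_str(substr($request_uri, $pos + 1), $params);

    if (count($params) === 0) {
        return 400; // Lonesome question mark
    }
    foreach ($params as $name => $value) {
        if ($name !== 'page' || !is_string($value)) {
            return 400; // Unknown parameter or unexpected value type
        }
    }
    // The page parameter is valid in itself; 404 only if the page does not exist
    return in_array($params['page'], $known_pages, true) ? 200 : 404;
}
```

With known pages ['about', 'contact'], a request for /?page=about is served normally, /?page=nope gives a 404, and both /? and /?foo=1 give a 400.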
