PHP: Parsing HTTP Response Headers

When using file_get_contents to perform HTTP requests, the server response headers are stored in a reserved variable after each request that receives a response; we can iterate over this variable when we need to access individual response headers.

Edited: 2020-08-23 12:09

When an HTTP request is sent using the file functions, such as file_get_contents or fopen (optionally with a context created by stream_context_create), the server response headers will automatically be stored in a reserved variable, $http_response_header, as an indexed array. We can refer to this variable when we want to access the response headers.

It is a bit strange that the headers are stored as an indexed array instead of the more usable associative array; but this is just a small inconvenience, since we can easily convert it to an associative array on our own.

To parse the response headers and create an associative array, we can use this solution, as discussed later in the article:

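$headers = array();
// The first element is the status line, e.g. "HTTP/1.1 200 OK"
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
  // Split at the first colon only; skip lines that contain no colon
  $matches = explode(':', $value, 2);
  if (count($matches) === 2) {
    $headers[$matches[0]] = trim($matches[1]);
  }
}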

Note. The same parsing also applies to headers obtained with cURL. To learn how to obtain the response headers as an indexed array when using cURL in PHP, read this article: Response Headers (cURL)

Why we want to access response headers

Accessing the response headers is useful for many things. One example is when we want to check the server response code or the content type header, perhaps in order to check for broken links on our website.

Note. Server response headers will not always be correctly used or configured, and we might encounter pages with incorrect content types and similar problems, so we should not rely too much on these values.

In order to access the headers, we will first need to send an HTTP request:

$options = array('http' => array(
  'method'  => 'GET',
  // Request headers are optional for a simple GET request;
  // a GET request has no body, so no Content-Length is needed
  'header'  => "Accept: text/html\r\n"
));
$context  = stream_context_create($options);
$contents = file_get_contents('https://beamtic.com/', false, $context);

Now, since the $http_response_header variable holds an array, we will need to loop through its elements to show the headers. We can also easily convert the array to a string with PHP's implode function.

So, either we do this:

echo implode("\n", $http_response_header);

Or, in case we need to iterate over the array:

foreach ($http_response_header as $value) {
   echo $value . "\n"; 
}
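
While looping, we can also pick out a single header. The following is a minimal sketch rather than a definitive implementation: the $response_headers array is made-up sample data standing in for $http_response_header, and the comparison ignores case on the assumption that servers may send header names in any letter case.

```php
<?php
// Sample lines standing in for $http_response_header (made up for illustration)
$response_headers = array(
  'HTTP/1.1 200 OK',
  'Date: Sun, 23 Aug 2020 12:09:00 GMT',
  'content-type: text/html; charset=utf-8',
);

$content_type = null;
$prefix = 'Content-Type:';
foreach ($response_headers as $value) {
  // Header names are case-insensitive, so compare without regard to case
  if (strncasecmp($value, $prefix, strlen($prefix)) === 0) {
    $content_type = trim(substr($value, strlen($prefix)));
    break;
  }
}
echo $content_type . "\n";
```

This is handy when only one value is needed; for anything more, converting all headers to an associative array is usually more convenient.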

Parsing server response headers

To more easily juggle the server response headers, we should convert them to an associative array. The first element in the $http_response_header array is always the HTTP status line (for example HTTP/1.1 200 OK); even when reading the raw headers, the status line always comes first. The status line is not itself a header, so we should remove it before attempting to parse the headers further. If we need the status line, we can store it in a separate variable:

$status_message = array_shift($http_response_header);

The array_shift function serves two purposes here: it returns the first element of the array, and it removes that element from the array. All of the numerical array keys are also updated accordingly. We may now convert the remaining array to an associative array.
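
If only the numeric response code is needed, it can be extracted from the stored status message. This is a sketch under stated assumptions: the helper name status_code_from_line is hypothetical, the sample value stands in for the result of array_shift, and treating every 4xx and 5xx response as a broken link is a simplification.

```php
<?php
// Hypothetical helper: extracts the numeric code from a status line
// such as "HTTP/1.1 404 Not Found". Returns null if the line does not
// look like an HTTP status line.
function status_code_from_line(string $status_line): ?int {
  if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $status_line, $m)) {
    return (int) $m[1];
  }
  return null;
}

// Sample data standing in for array_shift($http_response_header)
$status_message = 'HTTP/1.1 404 Not Found';
$status_code = status_code_from_line($status_message);

// Assumption: treat 4xx and 5xx responses as broken links
$is_broken = ($status_code === null || $status_code >= 400);
echo $status_code . ' ' . ($is_broken ? 'broken' : 'ok') . "\n";
```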

HTTP headers are made up of key: value pairs, but we cannot simply split them on every colon (:), since header values might also contain colons. So, what we can do, without resorting to regular expressions, is to split the string at the first colon only, since that always comes right after the key name.
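
To illustrate why the split has to stop at the first colon, here is a small sketch using a made-up Date header whose value itself contains colons:

```php
<?php
// Sample header line (made up for illustration); the value contains colons
$line = 'Date: Sun, 23 Aug 2020 12:09:00 GMT';

// Without a limit, the time in the value is split apart as well
$naive = explode(':', $line);

// With a limit of 2, only the first colon is used as the separator
$parts = explode(':', $line, 2);

echo count($naive) . "\n";    // number of pieces without a limit
echo trim($parts[1]) . "\n";  // the complete header value
```

The third argument to explode caps the number of returned pieces, so everything after the first colon stays together in the second piece.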

Note. While it may be easier to use a regular expression, it was about 10% to 68% faster in my tests to use either a combination of stripos and substr, or explode. Not that it matters much in practice, but I nevertheless think we should stick to what is fastest.

All the following approaches are fairly straightforward to use and switch between, so the choice should probably come down to which one is the most efficient.

Solution 1. Using the explode function is about 68% faster than using preg_match:

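$headers = array();
// The first element is the status line, e.g. "HTTP/1.1 200 OK"
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
  // Split at the first colon only; skip lines that contain no colon
  $matches = explode(':', $value, 2);
  if (count($matches) === 2) {
    $headers[$matches[0]] = trim($matches[1]);
  }
}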

Solution 2. This is how to use stripos and substr, which is about 10% faster than using a regular expression:

$headers = array();
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
  $pos = stripos($value, ':');
  // Skip lines that contain no colon
  if ($pos === false) {
    continue;
  }
  $key = substr($value, 0, $pos);
  $headers[$key] = trim(substr($value, $pos + 1));
}
print_r($headers);

Solution 3. If you for some reason prefer to use a regular expression, feel free to do so; the speed difference will be insignificant for most websites. Keep in mind, however, that the more concurrent users you have, the more you will benefit from even minor optimizations. Here is how to use preg_match to do the same:

$headers = array();
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
  if (preg_match('/^([^:]+):([^\n]+)/', $value, $matches)) {
    $headers["{$matches[1]}"] = trim($matches[2]);
  }  
}
print_r($headers);

Of course we could also get rid of the trim function when using regular expressions; I only left it there to get a fair benchmark result.
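
Whichever approach is used, it may be convenient to wrap it in a function. The following is only a sketch, and it makes a few assumptions that go beyond the solutions above: the function name parse_response_headers is made up, header names are lowercased because HTTP header names are case-insensitive, and repeated headers (such as Set-Cookie) are collected into an array instead of overwriting each other.

```php
<?php
// Hypothetical helper, not part of PHP: converts raw header lines
// into an associative array keyed by lowercased header name.
function parse_response_headers(array $lines): array {
  $headers = array();
  foreach ($lines as $line) {
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2) {
      continue; // lines without a colon (normally the status line) are skipped
    }
    $key = strtolower(trim($parts[0]));
    $value = trim($parts[1]);
    if (!isset($headers[$key])) {
      $headers[$key] = $value;
    } elseif (is_array($headers[$key])) {
      $headers[$key][] = $value;
    } else {
      // Second occurrence: turn the single value into a list
      $headers[$key] = array($headers[$key], $value);
    }
  }
  return $headers;
}

// Sample lines standing in for $http_response_header
$sample = array(
  'HTTP/1.1 200 OK',
  'Content-Type: text/html; charset=utf-8',
  'Set-Cookie: a=1',
  'Set-Cookie: b=2',
);
print_r(parse_response_headers($sample));
```

Because the function takes the lines as an argument and does not call array_shift, it leaves $http_response_header itself unchanged.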

How the benchmarks were performed

Each test was performed by parsing the headers 1 million times like this:

$start_time = microtime(true);
$repeat = 0;
// Remove the status line once, before the loop; calling array_shift inside
// the loop would empty the array after a few iterations and skew the timing
$status_message = array_shift($http_response_header);
while ($repeat < 1000000) {
  $headers = array();
  foreach ($http_response_header as $value) {
    $matches = explode(':', $value, 2);
    if (count($matches) === 2) {
      $headers[$matches[0]] = trim($matches[1]);
    }
  }
  ++$repeat;
}
$end_time = microtime(true);
echo ($end_time - $start_time) . "\n\n";
var_dump($headers);
exit();
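
To compare the three approaches in a single run, a harness along the following lines could be used. It is a sketch, not the setup used for the numbers in this article: the sample header lines are made up so that the test does not depend on the network, the iteration count is lowered to keep the run short, and wrapping each approach in a closure adds a small call overhead to all three.

```php
<?php
// Sample header lines (made up for illustration)
$lines = array(
  'Content-Type: text/html; charset=utf-8',
  'Date: Sun, 23 Aug 2020 12:09:00 GMT',
  'Cache-Control: max-age=3600',
);

// The three approaches, each wrapped in a closure
$approaches = array(
  'explode' => function (array $lines): array {
    $headers = array();
    foreach ($lines as $value) {
      $parts = explode(':', $value, 2);
      if (count($parts) === 2) {
        $headers[$parts[0]] = trim($parts[1]);
      }
    }
    return $headers;
  },
  'stripos' => function (array $lines): array {
    $headers = array();
    foreach ($lines as $value) {
      $pos = stripos($value, ':');
      if ($pos !== false) {
        $headers[substr($value, 0, $pos)] = trim(substr($value, $pos + 1));
      }
    }
    return $headers;
  },
  'preg_match' => function (array $lines): array {
    $headers = array();
    foreach ($lines as $value) {
      if (preg_match('/^([^:]+):([^\n]+)/', $value, $m)) {
        $headers[$m[1]] = trim($m[2]);
      }
    }
    return $headers;
  },
);

$iterations = 100000; // assumption: fewer than 1 million to keep the run short
$results = array();
$timings = array();
foreach ($approaches as $name => $parse) {
  $start = microtime(true);
  for ($i = 0; $i < $iterations; ++$i) {
    $results[$name] = $parse($lines);
  }
  $timings[$name] = microtime(true) - $start;
  printf("%-10s %.4f s\n", $name, $timings[$name]);
}
```

The absolute timings will vary with the PHP version and the machine, so it is the relative differences between the approaches that are of interest.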

Conclusion

It does not matter much in practice whether we use string functions or regular expressions, at least not for simple stuff such as this; but I still recommend using what we know to be the fastest. While some things may depend on the PHP version we are running, other things are probably more fixed and less likely to change. Perhaps, in the future, our IDE will automatically suggest the faster algorithm for various things?

In this case, without the trim function, stripos and substr were only about 10% faster than preg_match, whereas explode was about 68% faster.

While this may look like a lot, and under some circumstances it is, we should keep in mind that benchmark tests will often execute a piece of code a million times or more in order to get a clearer picture. This means that we would need hundreds or thousands of concurrent users before we notice any meaningful difference.

However, I will personally always pick the solution that I know to be faster, especially when the solutions are so easy to work with. preg_match is only going to be easier to read if the developer understands regular expressions; in practice this should not matter, and a good developer should be willing to learn both.
