Parsing Response Headers in PHP
When using file_get_contents to perform HTTP requests, the server's response headers are stored in a reserved variable after each request that receives a response; we can iterate over it when we need to access individual response headers.
By. Jacob
Edited: 2022-03-18 11:51
When you use the built-in PHP file functions to perform HTTP requests, the response headers are automatically made available in a special variable, $http_response_header. This is very convenient, but it does not work with the cURL library. In this tutorial, you will learn how to parse the response headers regardless of whether you are using cURL or file functions such as file_get_contents:
It is a bit strange that the headers are stored as an indexed array, instead of a more user-friendly associative array; but this is just a small inconvenience, since we can easily convert to an associative array on our own.
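To parse the response headers and create an associative array, we can use this solution for the built-in file functions:
$response_headers = [];
// The first element is the status line, e.g. "HTTP/1.1 200 OK"
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
    // Split at the first colon only; header values may contain colons
    $matches = explode(':', $value, 2);
    // Skip lines without a colon (explode never returns false here)
    if (count($matches) === 2) {
        $response_headers[$matches[0]] = trim($matches[1]);
    }
}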
And this one when using the cURL library:
// Define the $response_headers array for later use
$response_headers = [];
// Get the first line (the status line, e.g. "HTTP/1.1 200 OK")
$line = strtok($headers, "\r\n");
$status_line = trim($line);
// Parse the remaining lines, saving them into the array
while (($line = strtok("\r\n")) !== false) {
    $matches = explode(':', $line, 2);
    // Skip lines without a colon (explode never returns false here)
    if (count($matches) === 2) {
        $response_headers[$matches[0]] = trim($matches[1]);
    }
}
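Doing this makes it easy to check whether a given header exists, simply by using isset on the array key. Keep in mind that HTTP header names are case-insensitive while PHP array keys are not, so the key you look up must use the same casing as the key stored in the array: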
if (isset($response_headers["content-type"])) {
    echo '<p>The "content-type" header was found, and the content is:</p>';
    echo $response_headers["content-type"];
    exit();
}
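Since a server may send Content-Type, content-type, or any other casing, the isset check can miss a header that is actually present. The following is a minimal sketch of one way around that, assuming it is acceptable to lowercase all keys with array_change_key_case; the sample array and the helper name get_header are made up for illustration.

```php
<?php
// Hypothetical sample data; in practice this is the array built by the parser.
$response_headers = [
    'Content-Type' => 'text/html; charset=UTF-8',
    'Cache-Control' => 'no-cache',
];

// Lowercase every key once, so lookups no longer depend on the server's casing.
$normalized_headers = array_change_key_case($response_headers, CASE_LOWER);

// Hypothetical helper: return the header value, or null if the header is absent.
function get_header(array $normalized_headers, string $name): ?string
{
    return $normalized_headers[strtolower($name)] ?? null;
}

$content_type = get_header($normalized_headers, 'Content-Type');
$missing = get_header($normalized_headers, 'X-Not-Sent');
```

Normalizing once after parsing keeps the parsing loop itself unchanged, which may matter if you care about the benchmark results discussed later.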
File- functions
As mentioned earlier, to obtain the response headers using PHP's built-in file functions, you can iterate over the $http_response_header variable; PHP automatically makes the response headers available to you as an indexed array in this variable after you have performed an HTTP request.
The functions commonly used when making HTTP requests this way include file_get_contents, stream_context_create, and stream_get_contents; how to use them is covered in other tutorials.
The first element in the $http_response_header array is always the HTTP status line (for example "HTTP/1.1 200 OK"), which contains the status code; even when reading the raw headers, the status line always comes first. It may be useful to store it in a separate variable:
$status_message = array_shift($http_response_header);
The array_shift function serves two purposes here:
- It returns the first element in the array.
- It removes the first element from the array.
All of the numerical array keys will also be updated accordingly, so there will not be any missing keys.
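The value removed by array_shift is the whole status line, not just the number. As a rough sketch, the numeric code can be separated from it with explode, since the line consists of the protocol version, the code, and an optional reason phrase separated by spaces. The sample string and the variable names $status_code and $reason_phrase are assumptions for illustration.

```php
<?php
// Hypothetical status line, as returned by array_shift($http_response_header).
$status_message = 'HTTP/1.1 404 Not Found';

// Split into at most three parts: version, code, and the rest (reason phrase).
$parts = explode(' ', $status_message, 3);

// The code is the second part; null if the line was not in the expected form.
$status_code = isset($parts[1]) ? (int) $parts[1] : null;

// The reason phrase is optional (HTTP/2 responses do not carry one).
$reason_phrase = $parts[2] ?? '';
```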
HTTP headers are made up of key: value pairs, but we cannot simply split them on every colon (:), since header values might also contain colons. So, what we can do, without resorting to regular expressions, is to split the string at the first colon, since that always comes right after the key name.
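To make the difference concrete, here is a small sketch that splits the same header line with and without the limit argument of explode. The sample line is made up for illustration.

```php
<?php
$line = 'Date: Thu, 23 Sep 2021 06:25:14 GMT';

// Without a limit, every colon splits the string, breaking the time apart.
$all_parts = explode(':', $line);

// With a limit of 2, only the first colon is used as the separator.
$two_parts = explode(':', $line, 2);

$name = $two_parts[0];
$value = trim($two_parts[1]);
```

With the limit in place, the colons inside the time stay part of the value.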
Note. While a regular expression may seem easier, in my tests it was about 10% faster to use a combination of stripos and substr, and about 68% faster to use explode. Not that it matters much in practice, but I still think we should stick to what is fastest.
All of the following approaches are fairly straightforward to use, so the choice should probably come down to whichever is the most efficient.
Solution 1:
Using the explode function is about 68% faster than using preg_match:
$headers = array();
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
    $matches = explode(':', $value, 2);
    // Skip lines without a colon (explode never returns false here)
    if (count($matches) === 2) {
        $headers[$matches[0]] = trim($matches[1]);
    }
}
Solution 2:
This is how to use stripos and substr, which is about 10% faster than using a regular expression:
$headers = array();
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
    $pos = stripos($value, ':');
    // Skip lines that do not contain a colon
    if ($pos === false) {
        continue;
    }
    $key = substr($value, 0, $pos);
    $headers[$key] = trim(substr($value, $pos + 1));
}
print_r($headers);
Solution 3:
If you for some reason prefer to use a regular expression, feel free to do so; the speed difference will be insignificant for most websites. Keep in mind, though, that the more concurrent users you have, the more you will benefit from even minor optimizations. Here is how to use preg_match to do the same:
$headers = array();
$status_message = array_shift($http_response_header);
foreach ($http_response_header as $value) {
    if (preg_match('/^([^:]+):([^\n]+)/', $value, $matches)) {
        $headers[$matches[1]] = trim($matches[2]);
    }
}
print_r($headers);
Of course we could also get rid of the trim function when using regular expressions; I only left it there to get a fair benchmark result.
How the benchmarks were performed
Each test was performed by parsing the headers 1 million times like this:
$start_time = microtime(true);
$repeat = 0;
// Remove the status line once, before the loop; shifting inside the loop
// would drop one header per iteration until the array is empty
$status_message = array_shift($http_response_header);
while ($repeat < 1000000) {
    $headers = array();
    foreach ($http_response_header as $value) {
        $matches = explode(':', $value, 2);
        if (count($matches) === 2) {
            $headers[$matches[0]] = trim($matches[1]);
        }
    }
    ++$repeat;
}
$end_time = microtime(true);
echo ($end_time - $start_time) . "\n\n";
var_dump($headers);
exit();
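Before comparing timings, it can be worth confirming that the variants being timed actually produce the same array. The following is a sketch of such a check, followed by a simple timing loop. The sample headers, the function names, and the iteration count are assumptions for illustration, and hrtime requires PHP 7.3 or later. It is not the exact script used for the figures in this article.

```php
<?php
// Hypothetical sample headers, standing in for $http_response_header
// after the status line has been removed.
$sample_headers = [
    'Content-Type: text/html; charset=UTF-8',
    'Date: Thu, 23 Sep 2021 06:25:14 GMT',
    'Cache-Control: max-age=3600',
];

function parse_with_explode(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[$parts[0]] = trim($parts[1]);
        }
    }
    return $headers;
}

function parse_with_substr(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        $pos = stripos($line, ':');
        if ($pos !== false) {
            $headers[substr($line, 0, $pos)] = trim(substr($line, $pos + 1));
        }
    }
    return $headers;
}

function parse_with_regex(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        if (preg_match('/^([^:]+):([^\n]+)/', $line, $matches)) {
            $headers[$matches[1]] = trim($matches[2]);
        }
    }
    return $headers;
}

// Hypothetical helper: time $iterations calls of $parser, in seconds.
function time_parser(callable $parser, array $lines, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $parser($lines);
    }
    return (hrtime(true) - $start) / 1e9;
}

// All three parsers should agree on the result before timings are compared.
$expected = parse_with_explode($sample_headers);
$all_agree = $expected === parse_with_substr($sample_headers)
    && $expected === parse_with_regex($sample_headers);

foreach (['parse_with_explode', 'parse_with_substr', 'parse_with_regex'] as $parser) {
    printf("%s: %.4f s\n", $parser, time_parser($parser, $sample_headers, 100000));
}
```

Absolute timings will vary with the PHP version and the machine, so only the relative differences between the parsers are meaningful.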
cURL
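Obtaining the response headers with cURL is more difficult, since they are not immediately available, unlike when you use the built-in file functions. Instead, we have to manually extract the headers from the response. Provided the request was made with both CURLOPT_RETURNTRANSFER and CURLOPT_HEADER enabled, so that curl_exec returns the headers and body as one string in $response, this can be done with the curl_getinfo function after the request has been performed: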
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$headers = substr($response, 0, $header_size);
$body = substr($response, $header_size);
The headers are then stored as a string in the $headers variable.
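For context, here is a sketch of how the pieces might fit together in a complete request. The helper names split_response and fetch_with_headers are hypothetical, and the function requires the cURL extension. The last lines demonstrate the splitting step offline with a hand-written response, so no network access is needed to follow it.

```php
<?php
// Hypothetical helper: split a raw response into its header and body parts.
function split_response(string $response, int $header_size): array
{
    return [
        'headers' => substr($response, 0, $header_size),
        'body' => substr($response, $header_size),
    ];
}

// Hypothetical helper: perform a GET request and return headers and body,
// or null if the request failed. Requires the cURL extension.
function fetch_with_headers(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string
    curl_setopt($ch, CURLOPT_HEADER, true);         // include the headers in that string
    $response = curl_exec($ch);
    if ($response === false) {
        // curl_error($ch) could be logged here to see why the request failed
        return null;
    }
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    return split_response($response, $header_size);
}

// Offline demonstration with a hand-written raw response.
$raw_headers = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n";
$raw_response = $raw_headers . "Hello";
$parts = split_response($raw_response, strlen($raw_headers));
```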
In order to create an associative array from the string, you can iterate over each line in the string, saving its contents into the array as you go:
// Define the $response_headers array for later use
$response_headers = [];
// Get the first line (the status line, e.g. "HTTP/1.1 200 OK")
$line = strtok($headers, "\r\n");
$status_line = trim($line);
// Parse the remaining lines, saving them into the array
while (($line = strtok("\r\n")) !== false) {
    $matches = explode(':', $line, 2);
    // Skip lines without a colon (explode never returns false here)
    if (count($matches) === 2) {
        $response_headers[$matches[0]] = trim($matches[1]);
    }
}
Update based on your comments
I have been testing various situations when parsing raw cURL headers, and I put together the following test. This test is still a work in progress, but may be useful for some to follow, even in its current early state. I am busy with work, so I have not had time to finish it yet.
Apparently HTTP headers used to be able to fold onto the next line, although this is now discouraged. I do not know if this is a problem when relying on PHP's built-in $http_response_header, but it could be when using cURL. I have not yet tested that. Header lines longer than 1024 characters will be ignored according to php.net.
The following test code is for cURL, and should account for folded lines as well as headers with no name (headers that just start with ":"). I will probably return to fine-tune this later if needed.
// C1. Status line
$headers_str = "HTTP/1.1 200 OK\r\n"
    // C2. Normal headers
    . "test: haaa\r\n"
    . "test2: haaa2\r\n"
    // C3. Folded header line; a continuation line starts with a space or tab
    . "date: Thu, 23 Sep \n 2021 06:25:14 GMT\r\n"
    // C4. Malformed header that results in an empty name
    . ": test\r\n";
// Define the $response_headers array for later use
$response_headers = [];
// C1. Get the first line (the status line)
$line = strtok($headers_str, "\r\n");
$status_line = trim($line);
$last_header = null;
// C2. Parse the remaining lines, saving them into the array
while (($line = strtok("\r\n")) !== false) {
    if ($line[0] !== ' ' && $line[0] !== "\t") {
        $matches = explode(':', $line, 2);
        // C4. Ignore lines without a colon, and headers with an empty name
        if (count($matches) === 2 && $matches[0] !== '') {
            $response_headers[$matches[0]] = trim($matches[1]);
            $last_header = $matches[0];
        } else {
            // Do not attach later continuation lines to an unrelated header
            $last_header = null;
        }
    } elseif ($last_header !== null) {
        // C3. The header is folded over multiple lines; append to the previous value
        $response_headers[$last_header] .= ' ' . trim($line);
    }
}
echo "\n$status_line\n\n";
print_r($response_headers);
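The same idea can be wrapped in a reusable function. The sketch below is one possible shape for it, not a finished implementation. The function name parse_raw_headers and the sample input are hypothetical. It also assumes that a line starting with "HTTP/" begins a new header block, which is what a raw cURL header string looks like when redirects were followed, and it keeps only the last block.

```php
<?php
// Hypothetical helper: parse a raw header string into the final status line
// and an associative array of the headers that belong to it.
function parse_raw_headers(string $raw): array
{
    $status_line = null;
    $headers = [];
    $last_name = null;

    // Split on CRLF, bare LF, or bare CR; empty lines between blocks are dropped.
    $lines = preg_split('/\r\n|\n|\r/', $raw, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($lines as $line) {
        // A new status line starts a new header block (e.g. after a redirect).
        if (strncmp($line, 'HTTP/', 5) === 0) {
            $status_line = trim($line);
            $headers = [];
            $last_name = null;
            continue;
        }
        // A line starting with a space or tab continues the previous header.
        if ($line[0] === ' ' || $line[0] === "\t") {
            if ($last_name !== null) {
                $headers[$last_name] .= ' ' . trim($line);
            }
            continue;
        }
        $parts = explode(':', $line, 2);
        // Skip lines without a colon and headers with an empty name.
        if (count($parts) !== 2 || trim($parts[0]) === '') {
            $last_name = null;
            continue;
        }
        $last_name = trim($parts[0]);
        $headers[$last_name] = trim($parts[1]);
    }

    return ['status' => $status_line, 'headers' => $headers];
}

// Hand-written input: a redirect block followed by the final response,
// with one folded header and one header with an empty name.
$raw = "HTTP/1.1 301 Moved Permanently\r\n"
    . "Location: https://example.com/\r\n"
    . "\r\n"
    . "HTTP/1.1 200 OK\r\n"
    . "Date: Thu, 23 Sep\r\n 2021 06:25:14 GMT\r\n"
    . ": test\r\n"
    . "\r\n";

$parsed = parse_raw_headers($raw);
```

Note that repeated headers such as Set-Cookie would still overwrite each other in an associative array; handling those is left out of this sketch.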
Conclusion
It does not seem to matter much in practice whether we use string functions or regular expressions, at least not for simple stuff such as this. But I still recommend using what is known to be the fastest option.
In this case, without the trim function, using preg_match is only about 10% slower than stripos and substr, but a massive 68% slower than using explode.
While this may look like a lot, and it is under some circumstances, we should keep in mind that benchmark tests will often execute a script more than a million times in order to get a clearer picture. This means that we would need hundreds or thousands of concurrent users before we notice any meaningful difference.
However, I will personally always pick the solution that I know to be faster, especially when the solutions are so easy to work with. preg_match is only going to be easier to read if the developer understands regular expressions; in practice this should not matter, and a good developer should be willing to learn both.
Tell us what you think:
This solution for parsing headers is incorrect because it can truncate some header values.
For example, the header:
Date: Date: Thu, 23 Sep 2021 06:25:14 GMT
After
if(false !== ($matches = explode(':', $line, 2))) {
$response_headers["{$matches[0]}"] = trim($matches[1]);
}
we have "Thu, 23 Sep 2021 06" as the header value, which is incorrect.
The right way is
if(false !== ($matches = explode(':', $line))) {
$name = $matches[0];
array_shift($matches);
$value = trim(implode(":", $matches));
$response_headers[$name] = $value;
}
@Igor D
- Date: Date: Thu, 23 Sep 2021 06:25:14 GMT
The header you mention is malformed from the server-side, so of course it does not work. The duplicated "Date:" should not be there.
There is nothing to do about headers that are malformed except to ignore them.
@Jacob
You're wrong, but right, but wrong, but right...?
The extra "Date:" looks like a typo in the post, but shouldn't be a problem since it would just mean the header value starts with "Date:" (semantically wrong, but syntactically valid).
Besides, the complaint is about truncation after "06". But that shouldn't happen because you're only exploding into (up to) 2 pieces, so later colons shouldn't matter. $response_headers[$matches[0]]=trim($matches[1]) should work even if "Date:" is duplicated. It'll only trigger a notice if there was no colon on the line at all (and that WOULD be an invalid header).
Where your code would break is in the presence of headers that wrap around to the next line, but those are obsolete so you shouldn't be seeing them anyway.
Having the duplicated "Date:" does not make the header malformed - it just means the value of the header starts with "Date:". Besides, I think that's a typo, because the problem complained about is the truncation after "06".
@Watson
I'd still say it's malformed – typo or not. You can correct small typos in headers if you want, but then there are a lot of other headers you need to correct as well. I'd just ignore them.
I tried to reproduce the truncation several times, and it does not happen for me, so I assume OP has a mistake in their own code.
E.g. Quick test code:
$test_string = "Date: Thu, 23 Sep 2021 06:25:14 GMT";
print_r(explode(":", $test_string, 2));exit();
As for your example with the wrapped header, I would like to support that for backwards compatibility, but it looks like PHP itself does not even support it.
I am not even sure how you would test it from PHP; sending a header() that begins with a space seems to cause a complete error in the browser. I tried both of these:
header("Date: Thu, 23 \n Jan 2022 07:10:10 GMT");
And:
header("Date: Thu, 23 ");
header(" Jan 2022 07:10:10 GMT");
The latter will result in "ERR_HTTP2_PROTOCOL_ERROR"
Maybe I need to test that on an older HTTP version. It does not look like sending a wrapped header is supported from PHP anyway.
Besides, if you rely on the $http_response_header array, then PHP would probably already have combined any wrapped headers automatically, if they support it, so the error is more likely in core PHP code. That sort of thing can be hard to work around.