Beware of preg_replace and Disappearing Backslashes

Backslashes in the replacement string seem to disappear when the replacement is done with preg_replace.

Edited: 2020-11-26 08:15

I recently noticed an issue with preg_replace that causes backslashes to disappear from the replacement string. This really surprised me, since you would not expect the replacement to be altered at all; usually you want it inserted exactly as it is.

The behavior is so counterintuitive that you would think it was a bug. Users do not expect the replacement string to change, but as it turns out, that is exactly what happens.

The problem happens because the backslash is used for backreferences in the replacement string, and therefore a literal backslash needs to be escaped. It is probably more common to use $1, $2, $3 for backreferences, but \1, \2, \3 (written as '\\1', '\\2', '\\3' in PHP source) also work.
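As a small illustration of the two backreference styles, and of why a backslash in the replacement is special, consider the sketch below. The pattern, subject strings, and variable names are made up for the example.

```php
<?php
// Both $n and \n refer to capture group n in the replacement string.
$a = preg_replace('/(\w+) (\w+)/', '$2 $1', 'hello world');
$b = preg_replace('/(\w+) (\w+)/', '\\2 \\1', 'hello world');
echo $a, "\n"; // world hello
echo $b, "\n"; // world hello

// The literal '\\\\' holds two backslashes at runtime; preg_replace reads
// that pair as one escaped backslash, so only one ends up in the result.
$c = preg_replace('/x/', '\\\\', 'axb');
echo $c, "\n"; // a\b
```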

Let's say a replacement string contains four backslashes somewhere, for example <pre>\\\\</pre>, and you need to place it at a specific location in your HTML. If you use preg_replace to do this, you will lose backslashes during the replacement operation, and the final output will be corrupted!

The following test code demonstrates the problem:

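$html = '<div>REPLACEMENT_ID</div>';

$replacement_id = 'REPLACEMENT_ID';
// Four backslashes are typed here, but PHP's single-quoted string syntax
// already reduces them to two before preg_replace sees the string.
$replacement = '<pre>\\\\</pre>';

// preg_replace treats a backslash in the replacement as an escape character,
// so the two remaining backslashes collapse into one.
$html = preg_replace("|{$replacement_id}|", $replacement, $html);
echo $html;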

Corrupted Output:

<div><pre>\</pre></div>

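Expected output, if all four typed backslashes survived:

<div><pre>\\\\</pre></div>

Strictly speaking, two of the four are already consumed by PHP's own string-literal parsing, and preg_replace then halves the remaining two.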

So what can we do to solve the problem? We cannot use preg_quote on the replacement string, since it is meant for patterns and would add unwanted backslashes in front of every regex special character. One option is therefore to not use preg_replace at all; we can use str_replace instead, which is also faster.

Another option is to escape the backslashes. How to do this depends on where the data comes from. If the data is written as a string literal inside a PHP script, each literal backslash must be escaped twice: once for PHP's string syntax and once for preg_replace, so a single backslash is written as four. If you load the data from a database or a file, you only need to escape each literal backslash once, for preg_replace.

Escaping backslashes can be done with str_replace. addslashes should not be used for this, since it also adds backslashes in front of other characters, such as the single quote (') and the double quote (").
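The escaping step could be sketched roughly as follows. The helper name escape_replacement is hypothetical and not part of PHP. It also escapes the dollar sign, which goes beyond what is described above, on the assumption that $1-style references should be neutralized as well.

```php
<?php
// Hypothetical helper: makes a string safe to use as a literal
// preg_replace replacement by escaping backslashes and dollar signs.
function escape_replacement(string $text): string
{
    // strtr with an array replaces each character once, in a single pass.
    return strtr($text, ['\\' => '\\\\', '$' => '\\$']);
}

$html = '<div>REPLACEMENT_ID</div>';
$replacement = '<pre>\\\\</pre>'; // holds two backslashes at runtime

$result = preg_replace('|REPLACEMENT_ID|', escape_replacement($replacement), $html);
echo $result, "\n"; // <div><pre>\\</pre></div>
```

Both backslashes that the string actually contains survive the replacement. Without the helper, only one would remain.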

Using str_replace instead of preg_replace

It is true that str_replace is faster, but it is also harder to work with, and in practice the speed difference will not matter for most users. However, as I have recently discovered, the issues with preg_replace mean that str_replace is probably the safer choice.

To make matters worse, a backslash in a string literal inside a PHP script is itself subject to escaping: in a quoted string, \\ stands for a single backslash, so what you type is not what the string contains. This makes the problem even harder to debug. To work around this you can use addslashes while testing; I am not sure how safe it is to use in a live environment, however!
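A quick way to see what a literal really contains is to check its length. The sketch below compares a single-quoted literal with a nowdoc, which performs no escape processing; the variable names are only for illustration.

```php
<?php
// Single-quoted: each \\ pair in the source becomes one backslash.
$single = '\\\\';
echo strlen($single), "\n"; // 2

// Nowdoc: the text is taken exactly as typed, so all four remain.
$nowdoc = <<<'TXT'
\\\\
TXT;
echo strlen($nowdoc), "\n"; // 4
```

A nowdoc can therefore be a handy alternative to addslashes when test data with many backslashes has to live inside a script.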

The same replacement can be performed like this using str_replace:

$html = '<div>REPLACEMENT_ID</div>';

$replacement_id = 'REPLACEMENT_ID';
// addslashes doubles the two backslashes that the literal contains, giving four.
$replacement = addslashes('<pre>\\\\</pre>');
$html = str_replace($replacement_id, $replacement, $html);
echo $html;

Output:

<div><pre>\\\\</pre></div>

Alternatively, if you load the data with backslashes from a file or from a database, PHP's string-literal parsing is not involved, so the only escaping you have to think about is the one required by preg_replace.
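To illustrate the file case, the following sketch writes four literal backslashes to a temporary file, reads them back, and compares str_replace with preg_replace after escaping once. The temporary file and the variable names are assumptions made for the example; real data would come from your own file or database.

```php
<?php
// Simulate external data: a temporary file holding four literal backslashes.
$raw = <<<'TXT'
<pre>\\\\</pre>
TXT;
$path = tempnam(sys_get_temp_dir(), 'bs_');
file_put_contents($path, $raw);
$loaded = file_get_contents($path);
unlink($path);

$html = '<div>REPLACEMENT_ID</div>';

// str_replace inserts the loaded data untouched.
$viaStr = str_replace('REPLACEMENT_ID', $loaded, $html);

// For preg_replace, each backslash is escaped once beforehand.
$escaped = str_replace('\\', '\\\\', $loaded);
$viaPreg = preg_replace('|REPLACEMENT_ID|', $escaped, $html);

var_dump($viaStr === $viaPreg); // bool(true)
echo $viaStr, "\n";             // <div><pre>\\\\</pre></div>
```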
