How Safe are PHP's File Functions?
Are the bare file- functions in PHP safe to use?
By: Jacob
Edited: 2020-08-25 09:52
You might find yourself wondering how safe PHP's built-in file handling functions, such as file_get_contents() and file_put_contents(), are to use in concurrent situations.
I am maybe a bit obsessive when it comes to preventing errors, so, in the past, I would actually think about this issue every time I used the file handling functions. It gave me a nagging feeling that something was not properly accounted for in my code. I even made a couple of half-hearted attempts to handle reading and writing to files in one place, but my first attempts were full of flaws.
So, I decided to make a class to handle all of this in one place, which I could then include in my other classes via dependency injection.
But, to get to the point: the built-in, bare file handling functions in PHP are actually fairly safe to use. However, as with many other things, it depends on what you are using them for and on the specific circumstances. Ultimately it comes down to weighing the risks.
Educated assumptions
When you use the functions without properly handling errors and concurrency, you are basically making assumptions about the system the functions are running on. This is fine, as long as you understand which assumptions you are making, and under which circumstances those assumptions become unsafe.
The most obvious assumption is that a file or path is writable. I think most of us have dealt with problems caused by this on Linux, and have also found solutions for it. A less obvious assumption is that a script will always write its data to a file correctly; in concurrent situations, the latter can be costly.
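For example, the writability assumption can be verified explicitly instead of being relied on silently. A minimal sketch, with a made-up log file path for illustration:

$file = '/var/www/example/logs/app.log'; // hypothetical path

// The file itself, or its directory when the file does not exist yet,
// must be writable before we can expect file_put_contents() to succeed.
if (!is_writable($file) && !is_writable(dirname($file))) {
    trigger_error("Cannot write to: $file", E_USER_WARNING);
}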
You often find code on the internet that uses the naked file functions without properly handling the situations that might arise. I do not think this is a good idea, since you can "easily" make reasonable efforts towards dealing with the most obvious problems.
However, the chance that something very bad happens because a file is not writable is usually insignificant. You will (A) most likely have configured the permissions already, since you know this can be a problem, and (B) have concluded that the chance of someone or something changing the permissions later is insignificant.
In reality, however, you would want to account for every conceivable case in a critical system, since a file that is suddenly corrupted or no longer writable could cause other, more serious problems.
Usually I would want to do this from the start, regardless of how critical the system or application is. The reason is that it is hard to tell when, or if, an application will become critical in the future, and it takes little effort to do!
Let's say I am working on the live site of Beamtic, changing permissions on a sub-directory, and I accidentally change the permissions on a directory that needs to be writable by my CMS. This has happened before, and it wasted time while I was trying to debug the issue; but now that I have a decent file handler, I can usually tell very easily when there is a permission problem somewhere.
Concurrency should probably be handled
As a minimum, you would probably want to handle concurrency.
The problem with the bare file functions really first starts to show when you have concurrent users; that is, when two or more users try to write to a file at almost the same time. This can cause file corruption and/or loss of data, so it is important that we try to deal with it.
As far as I know, reading from a file is not usually a problem in itself; but it becomes one when someone writes to the file before another user has finished reading it, since the reader may then end up with inaccurate (corrupted) data.
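One way to guard the reading side is to take a shared lock, so that a read waits for an in-progress write to finish. A rough sketch, assuming the writers use LOCK_EX as shown later in this article (the function name is my own):

function locked_read(string $file): ?string
{
    $handle = fopen($file, 'r');
    if ($handle === false) {
        return null; // the file could not be opened
    }
    $data = null;
    if (flock($handle, LOCK_SH)) { // blocks while a writer holds LOCK_EX
        $data = stream_get_contents($handle);
        flock($handle, LOCK_UN); // release the lock again
    }
    fclose($handle);
    return ($data === false) ? null : $data;
}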
In my own projects, I made this class to deal with the problem.
Concurrency and file handling
When two or more users request a script (usually via HTTP) that relies on the file functions to read or write to a file, what is known in programming as a race condition can occur.
It is not safe to simply append the data to the end of the file, since there is still a chance you might corrupt it. The LOCK_EX flag is an attempt to deal with concurrency, and it should work fairly well on both Windows and Linux.
In many cases, you could assume that the file or location is writable, and then simply do something like this:
file_put_contents($file, $input, FILE_APPEND | LOCK_EX);
In other cases, you might also want to account for situations where the file permissions have changed, or where the system permissions simply do not allow writing to the file.
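A hedged sketch of what that could look like: check the return value of file_put_contents() rather than assuming the write succeeded (the function name and error message are only examples):

function append_or_fail(string $file, string $input): void
{
    // file_put_contents() returns false on failure (e.g. permission denied),
    // and otherwise the number of bytes that were written.
    $bytes = @file_put_contents($file, $input, FILE_APPEND | LOCK_EX);
    if ($bytes === false || $bytes < strlen($input)) {
        throw new RuntimeException("Failed writing to: $file. Check the permissions.");
    }
}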
To deal with concurrency, you will need to use the same type of locking everywhere you use the file functions, in all of your scripts. Therefore, it might be much better to use a class that handles it for you, since this also avoids the problem of developers forgetting to include the locking mechanism. If you do not use the same lock everywhere, the locking breaks down, and you can still end up with corrupted data.
For example, if someone forgets to include the lock like this:
file_put_contents($file, $input, FILE_APPEND);
A previously set lock will simply be ignored, and the data will be written even though the file is "locked". On Linux, such locks are advisory, meaning they only protect against writers that also request the lock.
To deal with all of this easily, my file handler class should be useful.
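The class itself is not included in this article, but a minimal sketch of the idea, centralizing the locking in one place and injecting that object wherever files are touched, could look something like this (class and method names are my own, not the actual Beamtic handler):

// Every read and write goes through this class, so individual scripts
// can no longer "forget" the locking mechanism.
class FileHandler
{
    public function write(string $file, string $data, bool $append = true): void
    {
        $flags = LOCK_EX | ($append ? FILE_APPEND : 0);
        if (@file_put_contents($file, $data, $flags) === false) {
            throw new RuntimeException("Could not write to: $file");
        }
    }

    public function read(string $file): string
    {
        $handle = fopen($file, 'r');
        if ($handle === false) {
            throw new RuntimeException("Could not open: $file");
        }
        flock($handle, LOCK_SH); // wait for any writer holding LOCK_EX
        $data = stream_get_contents($handle);
        flock($handle, LOCK_UN);
        fclose($handle);
        return (string) $data;
    }
}

// Injected as a dependency rather than created everywhere it is needed:
$files = new FileHandler();
$files->write('/tmp/example.log', "Hello\n");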