Working with Shopware's convoluted media folders
Shopware has a convoluted media storage implementation, and media URLs can break when you move to a new server or create a new test environment for an existing Shopware installation, but luckily this is usually easily fixed.
By. Jacob
Created: 2023-10-24 22:10
Shopware's way of storing and working with media files is extremely convoluted, and unnecessarily so for most use cases. If someone accidentally deletes a file from the media library, you cannot simply restore it from a .tar.gz backup file, because the media folder structure you see in the administration backend is just a virtual abstraction that does not physically exist in the file system. This is extremely unfriendly and inhumane, and unfortunately it also means that the media and thumbnail paths can break in various ways if you do not understand the system properly.
Of course, there may be performance considerations behind this convoluted mess, but even so, on modern file systems you should still be able to keep tens of thousands, if not more than a hundred thousand, files in a single folder before you start noticing performance issues. Most users are extremely unlikely to reach this huge number of files in any single folder if the files are organized in a more human-readable, category-based directory structure. More on that later.
Note: You cannot simply switch SHOPWARE_CDN_STRATEGY_DEFAULT and expect things to be okay, as your URLs may break! Even if you do manage to switch, and subsequently update the URLs for the linked thumbnails throughout your content, keep in mind that your old thumbnails could still be deleted during a cleanup, and therefore links to these old thumbnails could result in lots of 404s down the road.
What if suddenly the media URLs are broken
Besides it being an extremely distracting and annoying waste of time, it is actually easily fixed.
If suddenly you find that the media paths in one of your Shopware environments are inaccurate and images are not showing in the storefront, then fear not! It can usually be corrected, and thumbnails can be re-generated.
For some reason, the media and thumbnail paths are sometimes based on the server's PHP timezone (depending on SHOPWARE_CDN_STRATEGY_DEFAULT), so if you move your site to another server (environment) with a different PHP timezone, the media and thumbnail URLs can suddenly break, e.g. if you use the physical_filename setting. However, simply adjusting the timezone and regenerating your thumbnails may fix the issue.
The following values can be used for SHOPWARE_CDN_STRATEGY_DEFAULT:
1. physical_filename – Thumbnail URLs are based on an MD5 hash, generated from timestamp and filename. E.g:
/thumbnail/ff/c0/22/1698842240/man-with-funny-hat_1920x1920.png
2. filename – Thumbnail URLs are based on an MD5 hash, generated from the filename.
3. id – Thumbnail URLs are based on an MD5 hash generated from the media ID.
4. plain – Thumbnail URLs are based on timestamp alone. E.g:
/thumbnail/1698842240/man-with-funny-hat_1920x1920.png
See also: https://github.com/shopware/shopware/tree/trunk/src/Core/Content/Media/Pathname/PathnameStrategy
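To make the path examples above more concrete, here is a rough shell sketch of how a hash-prefixed path could be derived from an upload timestamp and a file name. It is not Shopware's actual implementation (the linked PathnameStrategy sources are the authority on that). The function name hashed_path and the exact input to the hash are assumptions made purely for illustration, and the script assumes the GNU md5sum tool is available.

```shell
#!/bin/bash
# Hypothetical illustration only: this does NOT reproduce Shopware's exact algorithm.
# It hashes "<timestamp>/<filename>" with MD5 and uses the first three
# two-character pairs of the hash as directory names.
hashed_path() {
    local timestamp="$1" filename="$2" hash
    hash=$(printf '%s' "$timestamp/$filename" | md5sum | cut -c1-32)
    printf '%s/%s/%s/%s/%s\n' \
        "$(printf '%s' "$hash" | cut -c1-2)" \
        "$(printf '%s' "$hash" | cut -c3-4)" \
        "$(printf '%s' "$hash" | cut -c5-6)" \
        "$timestamp" "$filename"
}

# Same file name, but the second timestamp is shifted by one hour, as could
# happen when two servers interpret the upload time in different timezones.
hashed_path 1698842240 man-with-funny-hat.png
hashed_path 1698845840 man-with-funny-hat.png
```

Because the timestamp is part of the hashed input, even a one-hour shift will normally produce a different directory prefix, which is the kind of breakage described above.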
For most of us, which strategy to use matters very little, as it is a "virtual" abstraction of sorts. It would have been nice with the option to arrange the media in a "human readable" way instead of this confusing system.
How to fix broken media paths
Here are a few things you should try out:
1. The timezone needs to be adjusted both in your CLI and FPM environments, as they use separate configurations. E.g:
- /etc/php/8.2/fpm/php.ini
- /etc/php/8.2/cli/php.ini
- /etc/php/8.2/fpm/conf.d/10-custom.ini (Recommended)
- /etc/php/8.2/cli/conf.d/10-custom.ini (Recommended)
At least on Debian-based distributions, you should add your own custom settings in the conf.d directory (usually located at /etc/php/8.2/fpm/conf.d/), as this directory is intended for user modifications. If you create a 10-custom.ini file, note that the 10- part of the name matters: the files in conf.d are read in alphabetical order, and when the same setting appears in more than one file, the value from the file read last wins. A file with a low number such as 10- is therefore loaded early, and its settings can be overridden by files with a higher number.
You can fill the file with typical content, such as memory_limit and upload_max_filesize. I use the following in a docker container:
[global]
error_log = /dev/stderr
log_level = notice
date.timezone = Europe/Berlin
[PHP]
max_execution_time = 60
memory_limit = 1024M
error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT
display_errors = On
upload_max_filesize = 200M
zend.detect_unicode = 0
opcache.interned_strings_buffer = 20
2. Make sure the SHOPWARE_CDN_STRATEGY_DEFAULT environment variable matches that of your original server. If it is missing there, remove it from your new environment as well, which should make Shopware fall back to the default setting, physical_filename. This setting depends on the timezone, because it uses a timestamp as a salt for the MD5-based file path. Then regenerate the thumbnails.
3. Media thumbnails can be regenerated by running:
bin/console media:generate-thumbnails
You may need to delete the old thumbnails first (note that the directory is called thumbnail, matching the URLs shown earlier):
rm -rf /var/www/shopware/public/thumbnail/*
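Before regenerating anything, it can help to check what each environment actually reports. The following is a minimal sketch, assuming a Debian-style setup. The binary name php-fpm8.2, the SHOPWARE_ROOT variable and the .env location are assumptions, and the helper functions php_timezone and cdn_strategy are hypothetical names, so adjust all of them to your own installation.

```shell
#!/bin/bash
# Assumption: the binary names and the .env location below are examples only.

# Print the date.timezone line reported by a PHP binary, or fail if it is missing.
php_timezone() {
    local bin="$1"
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "$bin: not found"
        return 1
    fi
    "$bin" -i | grep '^date.timezone'
}

# Show the strategy line from a .env file, or a note if none is set.
cdn_strategy() {
    local env_file="$1"
    grep -E '^SHOPWARE_CDN_STRATEGY_DEFAULT=' "$env_file" 2>/dev/null \
        || echo "SHOPWARE_CDN_STRATEGY_DEFAULT is not set in $env_file"
}

# CLI and FPM read separate configurations, so check both.
php_timezone php || true
php_timezone php-fpm8.2 || true
cdn_strategy "${SHOPWARE_ROOT:-/var/www/shopware}/.env"
```

Running the same script on the old and the new server and comparing the output should show quickly whether the timezone or the strategy differs. Running php --ini additionally lists which ini files were loaded, and in which order.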
This was one of the first things I noticed when I started working on Shopware, and although I have gotten used to it, it has recently wasted a few hours of my time again as I was setting up an environment in Docker.
Media and thumbnail URLs in Shopware
If you are confused about the unfriendly folder names for media and thumbnails, then you are not alone. It is one of the things that truly baffles me, because I find it super unnecessary. Seemingly, there is little documentation as to why it is constructed in this unfriendly and convoluted way, so what follows is my theory as to why it works like this instead of using a more human-readable and logical file structure.
File systems tend to perform very well, so even with tens of thousands of files in a single folder, you should not notice a significant impact on performance. I am not sure of the exact limits, but they are higher than what most users can realistically reach in the lifetime of their shop. I suspect it matters even less on modern file systems, and less still for web servers that just serve individual files rather than doing directory listings. Of course, as hardware becomes more powerful the issues are also mitigated; there is going to be a significant difference between an SSD and an old mechanical hard disk.
Test with 50.000 files
I performed a test by creating 50.000 files of 1 KB to 20 KB in size. All files were created in a single folder on an ext4-based file system, on SSD-based storage. Here is the bash script used to create 50.000 files of varying sizes:
#!/bin/bash
# Min- and max size (in bytes) of the files to generate
min_size=1024
max_size=20480
# Directory where you want to create the files
testdirname="50000filestestdir"
target_directory="$testdirname"
# Create the target directory if it does not exist yet
mkdir -p "$target_directory"
# Check if the target directory exists
if [ ! -d "$target_directory" ]; then
    echo "Target directory does not exist."
    exit 1
fi
# Number of files to create
num_files=50000
# Create the files
for ((i = 1; i <= num_files; i++)); do
    # Generate a random size between min_size and max_size
    size=$((min_size + RANDOM % (max_size - min_size + 1)))
    # Use dd to write $size bytes of random data to the output file
    # (one block of $size bytes is much faster than $size blocks of 1 byte)
    dd if=/dev/urandom of="$target_directory/file_$i.txt" bs="$size" count=1 2>/dev/null
done
echo "Created $num_files files in $target_directory."
In theory, certain commands such as ls should get slower as the number of files increases. That makes ls ideal for testing, because the effect is more pronounced for this command.
Performing a single ls on this directory still finished in less than a second:
ubuntu@testserver:~$ time ls dirtest/
real 0m0.373s
user 0m0.248s
sys 0m0.124s
This huge number of files does not seem to have a significant impact on performance. Of course, the picture will be different with concurrent users and other processes running. But if the number of files has to be this big before it matters, why make things more convoluted for the average user?!
In fact, this is not just annoying to users. Even developers will find these automatically generated directory trees disheartening to work with, because for the most part they are an unnecessary abstraction. There may be performance gains, but you will also need some way to index the files, and that is probably going to be in your database. You can no longer simply move the files, because then your database index will be inaccurate!
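A web server rarely lists a directory; it mostly opens one file by name. The sketch below compares the two on a smaller, throwaway directory. The helper make_files and the file count of 5000 are arbitrary choices for illustration, and the timings will of course vary between machines and file systems.

```shell
#!/bin/bash
# Create a given number of empty files in a directory (hypothetical helper).
make_files() {
    local dir="$1" count="$2" i=1
    mkdir -p "$dir"
    while [ "$i" -le "$count" ]; do
        : > "$dir/file_$i.txt"
        i=$((i + 1))
    done
}

test_dir=$(mktemp -d)
make_files "$test_dir" 5000

# Listing has to read every entry in the directory ...
time ls "$test_dir" > /dev/null
# ... while opening one known file only needs a lookup of that single name.
time cat "$test_dir/file_2500.txt" > /dev/null

rm -rf "$test_dir"
```

On file systems with indexed directories, such as ext4 with its default settings, the second command should stay fast regardless of how many siblings the file has.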
A co-worker once accidentally deleted a whole media folder and asked me to restore it; this would normally be a very trivial matter of simply copying the deleted folder from an older .tar.gz backup. I was new to Shopware at the time, and I was truly baffled at how complex such an otherwise simple task was. Do not make something "smarter", without providing replacement functionality for the features you remove in the process of making things "smarter".
Be that as it may, certain actions do become slower as the number of files in a folder increases. This can be tested by performing a simple ls command to list the contents of a folder.
A user-friendly file structure
To avoid this, developers can employ various strategies to improve performance, such as distributing the files in a directory structure where directories are named by upload date, and the files themselves are indexed in a database. I dislike that approach, so for most things I would prefer a more logical structure, e.g. one based on category names instead:
- Root - top category
  - Cars
    - parked-tesla.jpg
    - ford-driving-on-highway.jpg
    - ...
  - Airplanes
    - airbus-in-flight.jpg
    - boeing747-in-flight.jpg
    - ...
Ideally, we should allow the user to organize things from the CMS, and the CMS should only be a reflection of a physical file system structure. If users want to "abuse" the top category for keeping all their files, then we have no right to say they should not do so. Even if they did, any performance impact would not be noticeable until hitting tens of thousands of files.
This is much, much more user-friendly to humans working with the file system, and provided users actually learn to categorize their files properly, it will also delay and mitigate potential performance issues from having "too many" files in a single folder. In fact, this is not much different from how a human would organize their private photo albums, and of course another advantage to this is that image URLs will look cleaner and more friendly in the storefront, with the potential to add more keywords (category names) as part of the URL.
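As a small sketch of what such category-based paths could look like, the following turns a category name and an uploaded file name into a clean, URL-friendly path. Both slugify and category_path are hypothetical helpers written for this article; nothing like them is claimed to exist in Shopware.

```shell
#!/bin/bash
# Hypothetical helpers: neither function exists in Shopware.

# Lowercase a string and turn every run of non-alphanumeric characters into one hyphen.
slugify() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Build "<root>/<category>/<file>" from a category name and an uploaded file name.
category_path() {
    local root="$1" category="$2" file="$3" name ext
    name="${file%.*}"
    ext="${file##*.}"
    if [ "$name" = "$file" ]; then
        # The file name has no extension
        printf '%s/%s/%s\n' "$root" "$(slugify "$category")" "$(slugify "$file")"
    else
        printf '%s/%s/%s.%s\n' "$root" "$(slugify "$category")" "$(slugify "$name")" "$(slugify "$ext")"
    fi
}

category_path media "Cars" "Parked Tesla.jpg"                # media/cars/parked-tesla.jpg
category_path media "Airplanes" "Boeing747 in flight.JPG"    # media/airplanes/boeing747-in-flight.jpg
```

The resulting path doubles as the storefront URL, so the category name becomes a keyword in the URL without any database lookup.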
But, perhaps most importantly, you will not be creating unnecessary abstractions to do something that the file system itself ideally should be handling on its own.
Relying on the file system directly can enable us to easily back up and restore existing files, as well as upload files via SFTP directly to the category folders, entirely outside of the web interface of a given CMS. This is super intuitive, and IMO the ideal way to handle physical files.
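With such a structure, restoring an accidentally deleted category folder could be as simple as extracting that one folder from a backup. Here is a minimal sketch using tar. The function name restore_folder and all paths in the example are hypothetical, and it assumes the backup was created relative to the directory that contains the media folder.

```shell
#!/bin/bash
# Restore a single folder from a .tar.gz backup into a target directory.
# Arguments: <backup.tar.gz> <folder path inside the archive> <target directory>
restore_folder() {
    local archive="$1" folder="$2" target="$3"
    mkdir -p "$target"
    # Only the named folder (and everything below it) is extracted.
    tar -xzf "$archive" -C "$target" "$folder"
}

# Example with hypothetical paths:
# restore_folder /backups/media-2023-10-24.tar.gz media/Cars /var/www/shop/public
```

Nothing else in the archive is touched, so the other categories stay exactly as they are on disk.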
Final words about this article
This article is a total distraction from what I should have been doing tonight, and it probably took me a couple of hours to write it.
I realize it became very technical about file systems and my personal preferences, which is not super relevant to most readers, but it is a reflection of a recurrent frustration.
There is currently not much here about how to configure your media storage. I know it is possible to change these configurations, but I am not sure to what extent and what exactly can be changed. I know you can configure an external media storage, but I am currently only working with local storage. For these things, I hope to come back and improve the article with more information at a later date, but for now I really need to get on with what I was doing.