Zipdownload: various improvements to this plugin #10175
* Apache doesn't trust Content-Length headers by default; it will strip them and, only if length is known (on disk/all buffered), add it back
* Brotli already avoided compressing files already compressed, do the same with the deflate method
* Don't recompress zips created by the zipdownload plugin
It's already the default if unspecified, and the zip format has supported UTF-8 since 2006, so there's little reason to do otherwise. https://www.loc.gov/preservation/digital/formats/digformatspecs/APPNOTE(20060929)_Version_6.3.0.txt
Mbox format not yet implemented
Using a custom StreamInterface to transfer emails between $imap->get_* and ZipStream::addFile* in 512 KiB chunks reduces memory usage when larger emails or attachments (34MiB) are present to about a fifth of the version without the custom StreamInterface. This also makes implementing the Mbox format easy.
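The chunked transfer described above can be illustrated outside PHP. The following is a minimal sketch using Python's standard `zipfile` module, not the plugin's actual `StreamInterface` code; the function name `add_stream_to_zip`, the entry name `1.eml`, and the sample payload are assumptions made for illustration. It shows that copying in 512 KiB chunks keeps only one chunk in memory at a time, regardless of message size.

```python
import io
import zipfile

CHUNK_SIZE = 512 * 1024  # 512 KiB, the chunk size mentioned above


def add_stream_to_zip(zf, name, source, chunk_size=CHUNK_SIZE):
    """Copy a file-like `source` into zip entry `name` chunk by chunk.

    Returns the size of the largest chunk held in memory at any one time.
    """
    largest = 0
    # ZipFile.open(..., mode="w") returns a writable handle for one entry,
    # so the payload never has to exist as a single in-memory string.
    with zf.open(name, mode="w", force_zip64=True) as entry:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            largest = max(largest, len(chunk))
            entry.write(chunk)
    return largest


def demo():
    # Stand-in for a large message fetched from the mail store (made-up data).
    message = b"X-Demo: 1\r\n\r\n" + b"A" * (3 * CHUNK_SIZE + 123)
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED, compresslevel=1) as zf:
        largest = add_stream_to_zip(zf, "1.eml", io.BytesIO(message))
    with zipfile.ZipFile(archive) as zf:
        restored = zf.read("1.eml")
    return largest, restored == message
```

Calling `demo()` returns the largest chunk size seen and whether the entry round-trips intact. The same loop shape would serve an Mbox writer, since each message is only ever seen one chunk at a time.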
This is the result of entirely unscientific testing on a small VM, running Dovecot/Apache/php-fpm and little else, with one vCPU and about 600MB free RAM. I ran each test a few times in a row, which may have warmed up the disk cache, and took the best results from each. I recorded three metrics for each test (the column groups): RAM used in MB, time taken in seconds, and zip file size in MB. I took the time and RAM figures from the php-fpm status page. I ran three test scenarios (the columns):
I tested four implementations (the rows), each of which also included PR #10151.
The email corpus (for test scenarios 2 & 3) includes:
I would expect implementations 2 & 3 to scale upwards in RAM usage in proportion to the size of the largest email (or attachment for that test) included in the download, and for implementations 1 & 4 to remain constant. Finally I should note that it's "unfair" to compare ZipArchive to ZipStream given that the former uses its default zip compression level (of 9), and the latter specifies a level of 1. Here are the results:
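On the compression-level caveat above: the effect of the level setting can be checked in isolation. This is a small sketch using Python's `zlib` module rather than ZipArchive or ZipStream; the helper name `deflate_sizes` and the sample payload are made up for illustration, and real mailboxes will compress differently.

```python
import zlib


def deflate_sizes(payload, levels=(1, 9)):
    """Return {level: compressed size in bytes} for each requested zlib level."""
    sizes = {}
    for level in levels:
        compressed = zlib.compress(payload, level)
        # Round-trip check: every level must restore the original bytes.
        assert zlib.decompress(compressed) == payload
        sizes[level] = len(compressed)
    return sizes


# Made-up, highly repetitive sample resembling mail headers.
sample = b"Received: from mail.example.org by mx.example.org\r\n" * 20000
sizes = deflate_sizes(sample)
```

Timing the same calls (for example with `time.perf_counter`) would expose the speed side of the trade-off, which is the part that makes a level 1 versus level 9 comparison uneven.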
This includes several improvements to zipdownload. Feel free to cherry-pick what looks useful, or ask me to split this into multiple PRs, or whatever makes most sense.
* .htaccess, see 1972a9e for details.
* .eml file to the IMAP internal time to allow sorting extracted emails.

The biggest change is adding optional support for the ZipStream library. This allows streaming emails from the storage backend to the web server while zipping them without any temporary files and without much additional RAM overhead.
It's feasible to download entire INBOXes of multiple GB with this update. Without it, temporary disk space of around 130% of the total email size is required.
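The .eml timestamp item mentioned above can be sketched as follows. This is an illustration with Python's `zipfile`, not the plugin's PHP code; the helper names and the sample date are assumptions, and so is the choice to store the time in UTC, since classic zip timestamps carry no time zone.

```python
import io
import zipfile
from datetime import datetime, timezone


def internaldate_to_zip_time(internaldate):
    """Convert an IMAP INTERNALDATE string to a (Y, M, D, h, m, s) tuple in UTC."""
    parsed = datetime.strptime(internaldate.strip(), "%d-%b-%Y %H:%M:%S %z")
    return parsed.astimezone(timezone.utc).timetuple()[:6]


def add_message(zf, name, raw_message, internaldate):
    """Store `raw_message` as `name`, stamped with the message's internal time."""
    info = zipfile.ZipInfo(name, date_time=internaldate_to_zip_time(internaldate))
    info.compress_type = zipfile.ZIP_DEFLATED
    zf.writestr(info, raw_message)


archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    add_message(zf, "1.eml", b"Subject: hi\r\n\r\nbody\r\n",
                "17-Jul-1996 02:44:26 -0700")

# Read the entry back to see which timestamp was actually stored.
with zipfile.ZipFile(archive) as zf:
    stored_time = zf.getinfo("1.eml").date_time
```

Extraction tools that honor the stored entry time will then give each extracted file a modification time matching its message, so sorting by date in a file manager follows the mailbox order. Zip timestamps only have two-second resolution, so odd seconds are rounded down.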
Potential issues so far:
* README. And CHANGELOG.md?
* maennchen/zipstream-php as a "suggest" to the main composer.json, I'm not sure if I should have only added it to the zipdownload composer.json?
* maennchen/zipstream-php to the Makefile for the complete .tar?

I'm happy to make any requested changes given some feedback. In the meantime, I'll loosely quantify the "without much additional RAM" claim in a follow-up.