Zipdownload: various improvements to this plugin #10175

Draft: gurnec wants to merge 8 commits into roundcube:master from gurnec:zipstream

gurnec commented May 15, 2026

Contributor

This includes several improvements to zipdownload. Feel free to cherry-pick what looks useful, or ask me to split this into multiple PRs, or whatever makes most sense.

  • Currently the entire zip is written to the output buffer before it is sent, which can easily cause an OOM. Instead, flush every 512 KiB.
  • Update the .htaccess, see 1972a9e for details.
  • For maildir exports, change the modified-time of each .eml file to the IMAP internal time to allow sorting extracted emails.
  • Change the default charset in the config to UTF-8, see 286c893.
  • The 1b55aba refactor commit shouldn't change any behavior (if it does it's a bug), it exists solely to make the later commits easier to follow.
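
To illustrate the modified-time item above, here is a minimal sketch in Python (not the plugin's PHP code) of stamping a zip entry with a message's delivery time. The helper name `add_eml` and the RFC 2822 style date string are assumptions made for the example; a real IMAP INTERNALDATE value would need its own parser.

```python
import io
import zipfile
from email.utils import parsedate_to_datetime


def add_eml(zf, name, raw_message, internal_date):
    """Add one message to the zip, stamped with its delivery time.

    Hypothetical helper: internal_date is assumed to be an RFC 2822
    style string, which is not the exact IMAP INTERNALDATE syntax.
    """
    dt = parsedate_to_datetime(internal_date)
    # Zip timestamps carry no timezone; this keeps the wall-clock
    # fields of the parsed date as they are.
    info = zipfile.ZipInfo(name, date_time=dt.timetuple()[:6])
    info.compress_type = zipfile.ZIP_DEFLATED
    zf.writestr(info, raw_message)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_eml(zf, "0001.eml", b"Subject: hi\r\n\r\nbody\r\n",
            "Fri, 15 May 2026 11:13:00 +0000")

# Read the entry back to see the timestamp an extractor would apply.
with zipfile.ZipFile(buf) as zf:
    stamped = zf.getinfo("0001.eml").date_time
```

Most extraction tools apply the entry timestamp to the extracted file, which is what makes sorting the extracted emails by date possible.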

The biggest change is adding optional support for the ZipStream library. This allows streaming emails from the storage backend to the web server while zipping them without any temporary files and without much additional RAM overhead.

It's feasible to download entire INBOXes of multiple GB with this update. Without it, temporary disk space of around 130% of the total email size is required.
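
The general idea of writing a zip to a non-seekable output and flushing it in bounded pieces can be sketched as follows. This is a Python illustration of the technique, not the PR's PHP code; `FlushingSink` and the `send` callback are hypothetical names, and only the 512 KiB threshold is taken from the description above.

```python
import io
import zipfile

CHUNK = 512 * 1024  # flush threshold from the PR description


class FlushingSink:
    """Write-only file object that passes data on once CHUNK bytes pile up."""

    def __init__(self, send):
        self._send = send            # hypothetical callback, e.g. "write to client"
        self._pending = bytearray()

    def write(self, data):
        self._pending += data
        if len(self._pending) >= CHUNK:
            self.flush()
        return len(data)

    def flush(self):
        if self._pending:
            self._send(bytes(self._pending))
            self._pending.clear()


sent = []
sink = FlushingSink(sent.append)
# The sink has no tell() or seek(), so zipfile treats it as unseekable
# and never needs the whole archive in one place.
with zipfile.ZipFile(sink, "w", zipfile.ZIP_STORED) as zf:
    with zf.open("big.eml", "w") as entry:
        for _ in range(32):          # 32 x 64 KiB = 2 MiB of payload
            entry.write(b"x" * 65536)
sink.flush()

# Reassemble what was "sent" to check that it is still a valid archive.
restored = zipfile.ZipFile(io.BytesIO(b"".join(sent))).read("big.eml")
```

Because the payload arrives in 64 KiB writes, the sink never holds much more than one threshold's worth of data, regardless of the archive's total size.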

Potential issues so far:

  • There is no config option to choose between ZipArchive and ZipStream. Instead, ZipStream is always chosen if available, with a fallback to ZipArchive, which may be good enough?
  • ZipArchive is required to show the menu items even when ZipStream is available. This is done to avoid autoloading ZipStream just for displaying the menu, though it's not ideal.
  • No new unit tests, though E2E does test basic ZipStream downloads which may be sufficient?
  • No documentation yet, should probably add something to the zipdownload config or README. And CHANGELOG.md?
  • I added maennchen/zipstream-php as a suggest to the main composer.json, I'm not sure if I should have only added it to the zipdownload composer.json?
  • Is there any interest in adding maennchen/zipstream-php to the Makefile for the complete .tar?

I'm happy to make any requested changes given some feedback. In the meantime, I'll loosely quantify the "without much additional RAM" claim in a follow-up.

gurnec added 8 commits May 15, 2026 11:13
 * Apache doesn't trust Content-Length headers by default; it will strip
   them and, only if length is known (on disk/all buffered), add it back
 * Brotli already avoided compressing files already compressed,
   do the same with the deflate method
 * Don't recompress zips created by the zipdownload plugin

It's already the default if unspecified, and the zip format has
supported UTF-8 since 2006, so there's little reason to do otherwise.

https://www.loc.gov/preservation/digital/formats/digformatspecs/APPNOTE(20060929)_Version_6.3.0.txt

Mbox format not yet implemented

Using a custom StreamInterface to transfer emails between $imap->get_*
and ZipStream::addFile* in 512 KiB chunks reduces memory usage when
larger emails or attachments (34 MiB) are present to about a fifth
of the version without the custom StreamInterface.

This also makes implementing the Mbox format easy.
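
As a rough illustration of the chunked transfer described in this commit message, the sketch below uses a small pull-style reader over an iterator of byte pieces. It is written in Python and does not reproduce the PR's PHP StreamInterface or its fiber-based bridging; `ChunkReader` and `fake_fetch` are hypothetical names, with `fake_fetch` standing in for a storage backend that delivers a message piecewise.

```python
import io
import shutil
import zipfile

CHUNK = 512 * 1024  # chunk size named in the commit message


class ChunkReader:
    """Minimal pull-style reader over an iterator of byte pieces."""

    def __init__(self, pieces):
        self._pieces = iter(pieces)
        self._buf = b""
        self.max_buffered = 0        # largest amount held at any one time

    def read(self, size=-1):
        # Pull pieces only until the request can be satisfied.
        while size < 0 or len(self._buf) < size:
            piece = next(self._pieces, None)
            if piece is None:
                break
            self._buf += piece
            self.max_buffered = max(self.max_buffered, len(self._buf))
        if size < 0:
            out, self._buf = self._buf, b""
        else:
            out, self._buf = self._buf[:size], self._buf[size:]
        return out


def fake_fetch(total, piece=64 * 1024):
    """Hypothetical stand-in for a backend delivering a message piecewise."""
    done = 0
    while done < total:
        n = min(piece, total - done)
        yield b"m" * n
        done += n


out = io.BytesIO()
reader = ChunkReader(fake_fetch(3 * 1024 * 1024 + 5))
with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED, compresslevel=1) as zf:
    with zf.open("large.eml", "w") as entry:
        # Copy in CHUNK-sized reads instead of loading the whole message.
        shutil.copyfileobj(reader, entry, CHUNK)
```

The point of the design is that the reader's buffer is bounded by the chunk size rather than by the size of the largest message.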

gurnec commented May 16, 2026

Contributor Author

This is the result of entirely unscientific testing on a small VM, running Dovecot/Apache/php-fpm and little else, with one vcpu and about 600MB free RAM. I ran each test a few times in a row, which may have warmed up the disk cache, and took the best results from each.

I recorded three metrics for each test (the column groups): RAM used in MB, time taken in seconds, and zip file size in MB. Time and RAM were taken from the php-fpm status page.

I ran three test scenarios (the columns):

  1. Attachment download - an email with two attachments, one 24.5 MB and one 0.5 MB (the worst case scenario given a max message size of about 34 MB)
  2. Maildir export, all emails
  3. Mbox export, all emails

I tested four implementations (the rows), each of which also included PR #10151.

  1. ZipArchive - nothing else
  2. Zipstream w/strings - the "initial ZipStream support" commit ab7d10a
  3. Zipstream w/memory - a commit (not online) which uses php://memory files instead of strings
  4. Zipstream w/fibers - this entire PR

The email corpus (for test scenarios 2 & 3) includes:

  • 10996 emails
  • 830 MB
  • 389 "From " lines which needed escaping
  • 4 emails > 34 MB in size (these are the worst-case emails which push up memory usage for implementations 2 & 3)
  • 67 emails > 512 KB in size
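
For readers unfamiliar with why "From " lines need escaping in an mbox export: a line starting with "From " would otherwise be read as the start of the next message. The sketch below shows one common quoting convention (mboxrd style) in Python. Which variant the plugin actually uses is not stated here, so both the convention and the function name `escape_from_lines` are assumptions.

```python
import re

# mboxrd-style quoting (an assumption, see above): a line consisting of
# zero or more ">" followed by "From " gets one extra ">" prepended.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)


def escape_from_lines(body):
    """Return (escaped_body, number_of_lines_escaped) for a bytes body."""
    return _FROM_LINE.subn(rb">\1", body)


escaped, count = escape_from_lines(
    b"Hi\nFrom here on it gets odd\n>From a quoted reply\nfrom lower case\n"
)
```

Only lines that begin with the exact, case-sensitive prefix are touched, so the lower-case "from" line in the sample is left alone.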

I would expect implementations 2 & 3 to scale upwards in RAM usage in proportion to the size of the largest email (or attachment for that test) included in the download, and for implementations 1 & 4 to remain constant.

Finally I should note that it's "unfair" to compare ZipArchive to ZipStream given that the former uses its default zip compression level (of 9), and the latter specifies a level of 1.
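
The effect of the compression level can be tried in isolation with deflate, the usual zip compression method. This is a small Python sketch on made-up sample data, not a reproduction of the benchmark; the sizes it prints depend entirely on the input.

```python
import zlib

# Hypothetical, highly repetitive sample; real mail compresses differently.
sample = b"Received: from mail.example.org by mx.example.org\r\n" * 2000

fast = zlib.compress(sample, 1)    # level 1: favors speed
small = zlib.compress(sample, 9)   # level 9: favors size

print(len(sample), len(fast), len(small))
```

Both outputs decompress to the same bytes; the level only trades CPU time against output size.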

Here are the results:

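| Implementation | Attach. RAM (MB) | Maildir RAM (MB) | Mbox RAM (MB) | Attach. time (s) | Maildir time (s) | Mbox time (s) | Attach. zip (MB) | Maildir zip (MB) | Mbox zip (MB) |
|---|---|---|---|---|---|---|---|---|---|
| ZipArchive | 34 | 10 | 16 | 1.5 | 89.6 | 84.5 | 25 | 331 | 324 |
| Zipstream w/strings | 119 | 131 | n/a | 2.2 | 53.2 | n/a | 25 | 354 | n/a |
| Zipstream w/memory | 95 | 97 | 95 | 2.6 | 54.0 | 56.6 | 25 | 354 | 348 |
| Zipstream w/fibers | 26 | 28 | 34 | 1.8 | 51.2 | 48.6 | 25 | 354 | 348 |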
