Run our large tests in parallel in GitHub actions#1098
Conversation
008ad58 to
8b85a90
Compare
| branches: | ||
| - "test-large/**" | ||
| schedule: | ||
| - cron: "3 0 * * *" # Run every day at 03:00 UTC. |
There was a problem hiding this comment.
I'm actually not sure we should run this every day because it's a lot of processing to do. I would expect we do it manually before a release instead, as we do right now.
There was a problem hiding this comment.
How about weekly then? My thinking is that we need to catch regressions as early as possible, and not at release time. Especially if we plan to release container images more frequently than that.
| run: |- | ||
| sudo apt update -y | ||
| sudo apt install -y \ | ||
| podman git-lfs libxml2-utils \ |
There was a problem hiding this comment.
We might not actually need podman here. My understanding is that we only merge the results together.
| run: |- | ||
| poetry run python tests/test_docs_large/report.py \ | ||
| ${{ runner.temp }}/final_results/results.xml \ | ||
| | tee ${{ runner.temp }}/final_results/report.txt |
There was a problem hiding this comment.
I'm not sure to have all the context, but should this report.py be on this repo rather than on the dangerzone-test-set one? 🤔
There was a problem hiding this comment.
It's there actually: https://github.com/freedomofpress/dangerzone-test-set/blob/main/report.py. I can move the merge_large_tests_results.py file there as well, if it makes sense to you.
There was a problem hiding this comment.
Yeah, having everything at the same place makes sense :-)
There was a problem hiding this comment.
I've tried to consolidate all the scripts related with large tests under ./dev_scripts/large_tests (see ea3c259). Hopefully this will make things simpler.
| @@ -0,0 +1,76 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
Should we try to rely on already existing tools for this, or what we do is somehow too specific? A quick lookup on the PyPI index got me this: https://pypi.org/project/junitparser/
There was a problem hiding this comment.
D'oh, I didn't even check for that 🤦 . Great find, I'll incorporate it in this dev script.
There was a problem hiding this comment.
Sigh, if you don't mind, that's something I will not pursue for now. I'm fine for this script to reinvent the wheel for the time being, but I wish I knew about junitparser sooner.
df3a4b3 to
ec43eff
Compare
|
I have rebased this PR on top of (note, I couldn't upload the original Besides the change in the container image, this new run contains ~400 more documents, because Dangerzone no longer hangs when processing group 13. Here is the difference between the new run and the previous one: --- /home/apyrgio/Downloads/large-tests-report-0.9.0.txt 2025-10-22 18:03:37.968353083 +0300
+++ /home/apyrgio/Downloads/large-tests-report-0.10.0.txt 2025-10-22 18:04:23.215343422 +0300
@@ -1,10 +1,10 @@
==== RESULTS SUMMARY ===
errors: 0
- failures: 84
+ failures: 92
skipped: 0
- tests: 7819
- failure rate: 0.010743061772605193
+ tests: 8230
+ failure rate: 0.01117861482381531
=== TEST OVERVIEW ===
@@ -32,51 +32,127 @@
=== MOST COMMON CONTAINER OUTPUT ===
Top 30:
- 7737
- 2893 convert /tmp/input_file -> /tmp/input_file.pdf using filter : writer_pdf_Export
- 1236 convert /tmp/input_file -> /tmp/input_file.pdf using filter : calc_pdf_Export
- 693 convert /tmp/input_file -> /tmp/input_file.pdf using filter : impress_pdf_Export
- 139 Installing LibreOffice extension 'hXorestart.oxt'
- 129 Archive: /opt/libreoffice_ext/hXorestart.oxt
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/TypeDetection.xcu
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_types.xcu
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_filters.xcu
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/logger.properties
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/icon/HXOrestart.png
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_ko.txt
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_en.txt
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description.xml
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/META-INF/manifest.xml
- 129 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/HXOrestart.jar
- 46 convert /tmp/input_file -> /tmp/input_file.pdf using filter : draw_pdf_Export
- 35 Conversion to PDF with LibreOffice failed
- 24 document closed or encrypted
- 10 libpng warning: iCCP: known incorrect sRGB profile
- 10 bad start page number
- 9 Corrupt JPEG data: premature end of data segment
- 6 The document format is not supported
- 2 libpng warning: sBIT: invalid
- 2 kill container: No such process
- 2 cannot find page X in page tree
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container ffeXeebdXcccXeXadXdfbfeXeXfdXdaXeXcXcXcaXfX: container has already been removed"
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container fdeXeXddaXaXdcffXfXeXeXafdXdXcdXcXfX: container has already been removed"
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container fdaXbXbaXdaXeXcXdaXfXeXfXdeeXfXffXcXdbX: container has already been removed"
- 1 time="X-X-XTX:X:XZ" level=error msg="forwarding signal X to container fdXfXaXeXaXcXacbfXfXdXeXfXaXcdXbXbfXebXfaXdbXdaX: no container with ID fdXfXaXeXaXcXacbfXfXdXeXfXaXcdXbXbfXebXfaXdbXdaX found in database: no such container"
+ 10099
+ 3055 convert /tmp/input_file as a Writer document -> /tmp/input_file.pdf using filter : writer_pdf_Export
+ 1303 convert /tmp/input_file as a Calc document -> /tmp/input_file.pdf using filter : calc_pdf_Export
+ 735 convert /tmp/input_file as a Impress document -> /tmp/input_file.pdf using filter : impress_pdf_Export
+ 553 MuPDF error: syntax error: syntax error in content stream
+ 240 MuPDF error: unsupported error: cannot create appearance stream for BCLC_PDmorFZone annotations
+ 143 Installing LibreOffice extension 'hXorestart.oxt'
+ 133 Archive: /opt/libreoffice_ext/hXorestart.oxt
+ 133 extracting: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/mimetype
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/types.rdb
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/TypeDetection.xcu
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_types.xcu
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/registry/HXOrestart_filters.xcu
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/logger.properties
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/icon/HXOrestart.png
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_ko.txt
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description/desc_en.txt
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/description.xml
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/META-INF/manifest.xml
+ 133 inflating: /usr/lib/libreoffice/share/extensions/hXorestart.oxt/HXOrestart.jar
+ 103 MuPDF error: syntax error: syntax error in array
+ 93 MuPDF error: syntax error: invalid key in dict
+ 68 MuPDF error: syntax error: unknown keyword: 'X.X.X'
+ 48 convert /tmp/input_file as a Draw document -> /tmp/input_file.pdf using filter : draw_pdf_Export
+ 41 Conversion to PDF with LibreOffice failed
+ 34 MuPDF error: unsupported error: cannot create appearance stream for Screen annotations
+ 32 MuPDF error: syntax error: cannot find ExtGState resource ''
+ 29 MuPDF error: syntax error: cannot find ExtGState resource 'RX'
+ 26 MuPDF error: syntax error: unknown keyword: '-XeX'
+ 25 document closed or encrypted
=== FAILURE REASONS ===
All failures:
- 195 Converting page X/X to pixels
- 35 Conversion to PDF with LibreOffice failed
- 24 document closed or encrypted
- 10 bad start page number
+ 196 Converting page X/X to pixels
+ 185
+ 64 MuPDF error: syntax error: syntax error in content stream
+ 41 Conversion to PDF with LibreOffice failed
+ 25 document closed or encrypted
+ 21 MuPDF error: format error: corrupt object stream (X X R)
+ 16 Traceback (most recent call last):
+ 16 Exception was from C++/Python callback:
+ 16 ========
+ 16 virtual void SwigDirector_DiagnosticCallback::_print(const char*)
+ 16 <frozen runpy>:X:_run_module_as_main():
+ 16 <frozen runpy>:X:_run_code():
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/utils.py:X:get_pixmap(): dl = page.get_displaylist(annots=annots)
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/__init__.py:X:message(): print(text, file=_g_out_message, flush=X)
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/__init__.py:X:get_displaylist(): dl = mupdf.fz_new_display_list_from_page(self.this)
+ 16 /usr/lib/pythonX/dist-packages/pymupdf/__init__.py:X:JM_mupdf_error(): message(f'MuPDF error: {text}\n')
+ 16 /usr/lib/pythonX/dist-packages/mupdf/__init__.py:X:fz_new_display_list_from_page(): return _mupdf.fz_new_display_list_from_page(page)
+ 16 /usr/lib/pythonX/dist-packages/mupdf/__init__.py:X:_print(): self.printfn( message)
+ 16 /usr/lib/pythonX/dist-packages/mupdf/__init__.py:X:_print(): raise
+ 16 /usr/lib/pythonX.X/asyncio/runners.py:X:run(): return self._loop.run_until_complete(task)
+ 16 /usr/lib/pythonX.X/asyncio/runners.py:X:run(): return runner.run(main)
+ 16 /usr/lib/pythonX.X/asyncio/events.py:X:_run(): self._context.run(self._callback, *self._args)
+ 16 /usr/lib/pythonX.X/asyncio/base_events.py:X:run_until_complete(): self.run_forever()
+ 16 /usr/lib/pythonX.X/asyncio/base_events.py:X:run_forever(): self._run_once()
+ 16 /usr/lib/pythonX.X/asyncio/base_events.py:X:_run_once(): handle._run()
+ 16 /opt/dangerzone/dangerzone/conversion/doc_to_pixels.py:X:main(): await converter.convert()
+ 16 /opt/dangerzone/dangerzone/conversion/doc_to_pixels.py:X:convert(): pix = page.get_pixmap(dpi=DEFAULT_DPI)
+ 16 /opt/dangerzone/dangerzone/conversion/doc_to_pixels.py:X:<module>(): sys.exit(asyncio.run(main()))
+ 10 MuPDF error: syntax error: syntax error in array
10 Installing LibreOffice extension 'hXorestart.oxt'
+ 8 Director exception handler, message is:
6 The document format is not supported
- 3
+ 3 MuPDF error: syntax error: unknown keyword: 'X.X.X'
+ 3 MuPDF error: syntax error: unknown keyword: '.X.X'
+ 3 MuPDF error: syntax error: invalid key in dict
+ 3 MuPDF error: syntax error: cannot find ExtGState resource 'RX'
+ 3 MuPDF error: format error: cannot find object in xref (X X R)
+ 3 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udccd' in position X: surrogates not allowed
+ 3 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udccd' in position X: surrogates not allowed
+ 3 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: '-X.X\udccd'\n", X, X, 'surrogates not allowed')
2 kill container: No such process
- 2 cannot find page X in page tree
- 1 image width is zero (or less)
- 1 Corrupt JPEG data: premature end of data segment
+ 2 code=X: cannot find page X in page tree
+ 2 MuPDF error: syntax error: unknown keyword: 'jX'
+ 2 MuPDF error: syntax error: unknown keyword: 'Xd'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.Xa'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.X.X.X'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.X.'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.X'
+ 2 MuPDF error: syntax error: unknown keyword: 'X.'
+ 2 MuPDF error: syntax error: unknown keyword: '-XsTd'
+ 2 MuPDF error: syntax error: unknown keyword: '-XP.'
+ 2 MuPDF error: library error: zlib error: invalid distance too far back
+ 2 MuPDF error: format error: non-page object in page tree
+ 2 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcfa' in position X: surrogates not allowed
+ 2 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udceX' in position X: surrogates not allowed
+ 2 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcfa' in position X: surrogates not allowed
+ 2 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udceX' in position X: surrogates not allowed
+ 1 MuPDF error: syntax error: unknown keyword: 'Xu.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'Xr'
+ 1 MuPDF error: syntax error: unknown keyword: 'XmX.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'Xm'
+ 1 MuPDF error: syntax error: unknown keyword: 'Xf'
+ 1 MuPDF error: syntax error: unknown keyword: 'XTX.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'XP.'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.XiX.'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.Xi'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.XhX'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.XTJ'
+ 1 MuPDF error: syntax error: unknown keyword: 'X.X.Xo'
+ 1 MuPDF error: syntax error: unknown keyword: 'X..X.Xo'
+ 1 MuPDF error: syntax error: unknown keyword: 'X..X.X.X.X.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'T.X'
+ 1 MuPDF error: syntax error: unknown keyword: 'SX'
+ 1 MuPDF error: syntax error: unknown keyword: 'ETX'
+ 1 MuPDF error: syntax error: unknown keyword: '.X.'
+ 1 MuPDF error: syntax error: unknown keyword: '-XnX'
+ 1 MuPDF error: syntax error: unknown keyword: '-XA'
+ 1 MuPDF error: syntax error: unknown keyword: '-X.XsTd'
+ 1 MuPDF error: syntax error: unknown keyword: '-X.Xf*'
+ 1 MuPDF error: syntax error: unknown keyword: '-X..X'
+ 1 Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcX' in position X: surrogates not allowed
+ 1 DiagnosticCallback[error]::s_print() ignoring exception from _print(): Director error: <class 'UnicodeEncodeError'>: 'utf-X' codec can't encode character '\udcX' in position X: surrogates not allowed
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: 'Xa\udcfaX.X'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: 'X.X\udceX'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: 'X.X\udcX'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: '-X.Xa\udcfaX.X'\n", X, X, 'surrogates not allowed')
+ 1 DiagnosticCallbackPython[error]._print(): Warning: exception from self.printfn=<function JM_mupdf_error at XxXedX>: e=UnicodeEncodeError('utf-X', "MuPDF error: syntax error: unknown keyword: '-X.X.X\udceX.X'\n", X, X, 'surrogates not allowed')
=== TIMEOUTS === |
|
With the goal of not letting this PR open for too long, I'm proposing we merge it as-is, as it's still better than having to maintain the branch on the long run. We can open issues about it if we want. The standing comments are small adjustments, shouldn't stand in the way of merging this, IMO. |
This reverts commit f44df6f.
There is no need to OCR the documents during large tests, since this operation now happens on the host, and is very costly.
ec43eff to
c3d2b71
Compare
|
@almet: I've added a few more fixes to this branch, based on your comments. The only thing I didn't pursue is the rewrite of the script that merges test results, which hopefully won't be much of an issue. At the same time, I've added a fix (see c3d2b71) that ignores our large tests when running |
43ec1d7 to
18b25e8
Compare
|
We had a discussion on this front, and we decided that this PR should wait before it gets merged, because we are in the midst of separating the Dangerzone repo in two, where the Dangerzone converter/image will be in a separate repo. This repo should be the target of these tests, and the testing logic may have to change in order to get merged there. Again, the good thing about this PR is that we can run the tests any time we want, so the only thing that we currently miss is the biweekly cadence. |
Our large tests where always hard to run on our own infra, because they require too much time, and too much space. Thankfully, GitHub actions allows us to run 20 concurrent jobs, which we can use it to our advantage. In this PR we:
pytest-test-groupsdependency, which can split a large pytest run into groups, and sample them with a pre-determined seed.Makefile, and define some envvars that control the number of groups, the specific group to run, and the random seed.large-tests.yml, which installs the necessary requirements, builds the container image (if not built already), and runs each group in a separate Ubuntu runnerCloses freedomofpress/dangerzone-image#13