Open
34 commits
2188149
ci: Use YAML anchors to factor out distros
apyrgio Oct 22, 2025
a76d070
Update commit for freedomofpress/dangerzone-test-set repo
apyrgio Mar 12, 2025
7717492
Add pytest-test-groups test dependency
apyrgio Mar 12, 2025
db7cad2
Enable splitting our large tests into groups
apyrgio Mar 12, 2025
a335dd5
ci: Add CI job for our large tests
apyrgio Mar 12, 2025
e318786
WIP: Debug with ssh
apyrgio Mar 12, 2025
d329836
Revert "WIP: Debug with ssh"
apyrgio Mar 12, 2025
1b5d7b1
Update the dangerzone-test-set repo
apyrgio Mar 12, 2025
bcef584
Do not OCR the documents during large tests
apyrgio Mar 12, 2025
b7d2808
Fix small bugs in the Makefile for large tests
apyrgio Mar 12, 2025
654a347
fixup! ci: Add CI job for our large tests
apyrgio Mar 12, 2025
0564d18
WIP: Sample a few of the tests to make things go quicker
apyrgio Mar 12, 2025
53f3650
WIP: Add script that combines XML files into one
apyrgio Mar 12, 2025
717d46f
Update dangerzone-test-set repo
apyrgio Mar 12, 2025
c551851
ci: Produce final report
apyrgio Mar 12, 2025
0aec093
REMOVE_ME: Debug job with SSH
apyrgio Mar 12, 2025
1e50703
fixup! ci: Produce final report
apyrgio Mar 12, 2025
1b78d58
Remove debugging
apyrgio Mar 12, 2025
7de3b82
ci: Run the whole test suite
apyrgio Mar 12, 2025
e5514a4
REMOVE_ME: Bypass group 13 because Dangerzone hangs
apyrgio Mar 12, 2025
d7a1c20
Revert "REMOVE_ME: Bypass group 13 because Dangerzone hangs"
apyrgio Oct 22, 2025
2af1811
FIXUP: ICU changes
apyrgio Oct 22, 2025
2b4ac09
FIXUP: Run every Monday at 03:00 UTC
apyrgio Feb 24, 2026
fbf3b21
WIP: Remove Podman
apyrgio Feb 24, 2026
e55783f
Update submodule
apyrgio Feb 24, 2026
c3d2b71
WIP: Skip large tests by default
apyrgio Feb 24, 2026
ea3c259
WIP: Consolidate all large test logic into a single place
apyrgio Feb 24, 2026
adc75d7
Add changelog entry
apyrgio Feb 24, 2026
86dea77
WIP: Reinstate previous version of ci.yml
apyrgio Feb 24, 2026
b8cd984
fixup! WIP: Consolidate all large test logic into a single place
apyrgio Feb 24, 2026
7fd73e6
REMOVEME: Make tests run faster
apyrgio Feb 24, 2026
18b25e8
REMOVEME: Add debugging
apyrgio Feb 24, 2026
da70ad7
fixup! REMOVEME: Add debugging
apyrgio Feb 25, 2026
6cd1cf9
fixup! WIP: Consolidate all large test logic into a single place
apyrgio Feb 25, 2026
156 changes: 156 additions & 0 deletions .github/workflows/large-tests.yml
@@ -0,0 +1,156 @@
name: Large tests
on:
  push:
    branches:
      - "test-large/**"
  schedule:
    - cron: "0 3 * * 1" # Run at 03:00 on Monday
  workflow_dispatch:

# Disable multiple concurrent runs on the same branch
# When a new CI build is triggered, it will cancel the
# other in-progress ones (for the same branch)
concurrency:
  group: ${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  # This is already built daily by the "build.yml" file
  # But we also want to include this in the checks that run on each push.
  build-container-image:
    name: Build, push and sign container image
    uses: ./.github/workflows/build-push-image.yml
    with:
      registry: "ghcr.io/${{ github.repository_owner }}"
      registry_user: ${{ github.repository_owner }}
      image_name: "dangerzone-testing/v1"
      reproduce: false
      sign: true
      fill_cache: true
      key_name: "tests/assets/dangerzone-testing"
    secrets:
      registry_token: ${{ secrets.GITHUB_TOKEN }}

  run-large-tests:
    runs-on: ubuntu-latest
    needs:
      - build-container-image
    strategy:
      matrix:
        group: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    steps:
      - name: Checkout
        uses: actions/checkout@v5

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install requirements
        run: |-
          sudo apt update -y
          sudo apt install -y \
            git-lfs libxml2-utils \
            libqt6gui6 libxcb-cursor0 qt6-qpa-plugins \
            python3 python3-poetry make
          poetry install

      - name: Cache mazette assets
        id: cache-mazette
        uses: actions/cache@v4
        with:
          path: |
            share/tessdata/
            share/vendor/
          key: v1-mazette-linux-${{ hashFiles('./mazette.lock') }}

      - name: Install mazette assets
        if: steps.cache-mazette.outputs.cache-hit != 'true'
        run: poetry run mazette install

      - name: Restore container image
        uses: actions/cache/restore@v4
        with:
          path: |-
            share/container.tar
            share/freedomofpress-dangerzone.pub
            share/image-name.txt
          enableCrossOsArchive: true
          fail-on-cache-miss: true
          key: v6-container-${{ needs.build-container-image.outputs.image_uri }}

      - name: Smoke test before the party begins
        run: |-
          poetry run ./dev_scripts/dangerzone-cli tests/test_docs/sample-pdf.pdf

      - name: Run large tests
        continue-on-error: true # We expect test failures
        run: |-
          export TEST_GROUP_COUNT=20000
          export TEST_GROUP=${{ matrix.group }}
          poetry run make test-large

      - name: Show results
        run: |-
          cat tests/test_docs_large/results/junit/*

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: results-${{ matrix.group }}
          path: tests/test_docs_large/results/junit
          if-no-files-found: error
          retention-days: 1

  report:
    runs-on: ubuntu-latest
    needs:
      - run-large-tests
    steps:
      - name: Download results
        uses: actions/download-artifact@v4
        with:
          path: ${{ runner.temp }}/results
          pattern: results-*
          merge-multiple: true

      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install requirements
        run: |-
          sudo apt update -y
          sudo apt install -y \
            git-lfs libxml2-utils \
            python3 python3-poetry make
          poetry install

      - name: Download the large test repo
        run: |-
          make test-large-init

      - name: Combine the results
        run: |-
          mkdir -p ${{ runner.temp }}/final_results
          ./dev_scripts/large_tests/merge_results.py \
            ${{ runner.temp }}/results \
            ${{ runner.temp }}/final_results/results.xml

      - name: Generate a report
        run: |-
          poetry run python dev_scripts/large_tests/report.py \
            ${{ runner.temp }}/final_results/results.xml \
            | tee ${{ runner.temp }}/final_results/report.txt

Member:
I'm not sure I have all the context, but should this report.py be in this repo rather than in the dangerzone-test-set one? 🤔

Contributor, Author:
It's there actually: https://github.com/freedomofpress/dangerzone-test-set/blob/main/report.py. I can move the merge_large_tests_results.py file there as well, if it makes sense to you.

Member:
Yeah, having everything in the same place makes sense :-)

Contributor, Author:
I've tried to consolidate all the scripts related to large tests under ./dev_scripts/large_tests (see ea3c259). Hopefully this will make things simpler.


      - name: Upload final results and report
        uses: actions/upload-artifact@v4
        with:
          name: final-results
          path: ${{ runner.temp }}/final_results
          if-no-files-found: error
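
The `schedule` trigger in this workflow uses the cron expression `0 3 * * 1`. GitHub Actions evaluates cron schedules in UTC, so this means 03:00 UTC every Monday. As a rough illustration only, the next trigger time can be computed as below; the helper name `next_monday_0300_utc` is hypothetical and not part of this PR.

```python
from datetime import datetime, timedelta, timezone


def next_monday_0300_utc(now: datetime) -> datetime:
    """Return the first Monday 03:00 UTC strictly after `now`.

    Hypothetical helper mirroring the cron expression "0 3 * * 1".
    `now` is expected to be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=3, minute=0, second=0, microsecond=0)
    # datetime.weekday() returns 0 for Monday; move forward to the next Monday.
    candidate += timedelta(days=(0 - candidate.weekday()) % 7)
    if candidate <= now:
        # Already past this week's slot, so use next week's.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_monday_0300_utc(datetime.now(timezone.utc)).isoformat())
```

Scheduled runs on GitHub Actions can start later than the nominal time, so this should be read as the earliest expected start rather than an exact one.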
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -15,7 +15,7 @@ since 0.4.1, and this project adheres to [Semantic Versioning](https://semver.or

 ### Added

-- Dangerzone is able to function without a bundled `container.tar` file
+- Dangerzone is able to function without a bundled `container.tar` file
   ([#1400](https://github.com/freedomofpress/dangerzone/pull/1400))
 - Look for desktop entries in `XDG_DATA_DIRS` paths on Linux ([#1413](https://github.com/freedomofpress/dangerzone/issues/1413))
 - It is now possible to convert another set of documents after a first batch ([#549](https://github.com/freedomofpress/dangerzone/issues/549))
@@ -25,6 +25,8 @@ since 0.4.1, and this project adheres to [Semantic Versioning](https://semver.or

 - Run macOS Intel CI tests only on scheduled/manual runs to reduce PR CI time
   ([#1338](https://github.com/freedomofpress/dangerzone/issues/1338))
+- Run our suite of large tests on a weekly cadence via GitHub actions
+  ([#1142](https://github.com/freedomofpress/dangerzone/issues/1142))


 ## [0.10.0](https://github.com/freedomofpress/dangerzone/compare/v0.10.0...0.9.1)
21 changes: 16 additions & 5 deletions Makefile
@@ -1,6 +1,9 @@
 LARGE_TEST_REPO_DIR:=tests/test_docs_large
-GIT_DESC=$$(git describe)
+GIT_DESC=$$(git describe --always)
 JUNIT_FLAGS := --capture=sys -o junit_logging=all
+TEST_GROUP_COUNT ?= 1
+TEST_GROUP ?= 1
+TEST_GROUP_RANDOM_SEED ?= 999999999

 .PHONY: lint
 lint: ## Check the code for linting, formatting, and typing issues with ruff and mypy
@@ -19,7 +22,7 @@ test: ## Run the tests
 	# shared state.
 	# See more in https://github.com/freedomofpress/dangerzone/issues/493
 	pytest --co -q tests/gui | grep -e '^tests/' | xargs -n 1 pytest -v
-	pytest -v --cov --ignore dev_scripts --ignore tests/gui --ignore tests/test_large_set.py
+	pytest -v --cov --ignore dev_scripts --ignore tests/gui


 .PHONY: test-large-requirements
@@ -33,11 +36,19 @@ test-large-init: test-large-requirements
 	git submodule update $(LARGE_TEST_REPO_DIR)
 	cd $(LARGE_TEST_REPO_DIR) && $(MAKE) clone-docs

-TEST_LARGE_RESULTS:=$(LARGE_TEST_REPO_DIR)/results/junit/commit_$(GIT_DESC).junit.xml
+TEST_LARGE_RESULTS:=$(LARGE_TEST_REPO_DIR)/results/junit/commit_$(GIT_DESC)_$(TEST_GROUP).junit.xml
 .PHONY: test-large
 test-large: test-large-init ## Run large test set
-	python -m pytest --tb=no tests/test_large_set.py::TestLargeSet -v $(JUNIT_FLAGS) --junitxml=$(TEST_LARGE_RESULTS)
-	python $(TEST_LARGE_RESULTS)/report.py $(TEST_LARGE_RESULTS)
+	DZ_RUN_LARGE_TESTS=1 python -m pytest \
+		--tb=no \
+		--test-group-count=$(TEST_GROUP_COUNT) \
+		--test-group=$(TEST_GROUP) \
+		--test-group-random-seed=$(TEST_GROUP_RANDOM_SEED) \
+		--junitxml=$(TEST_LARGE_RESULTS) \
+		$(JUNIT_FLAGS) \
+		-v \
+		tests/test_large_set.py::TestLargeSet
+	python ./dev_scripts/large_tests/report.py $(TEST_LARGE_RESULTS)

 Dockerfile: Dockerfile.env Dockerfile.in ## Regenerate the Dockerfile from its template
 	poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
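
The `TEST_GROUP_COUNT`, `TEST_GROUP` and `TEST_GROUP_RANDOM_SEED` variables in this Makefile are forwarded to the `--test-group-count`, `--test-group` and `--test-group-random-seed` options of pytest-test-groups. The sketch below illustrates the general idea of seeded, deterministic splitting. It is an illustration under assumptions, not the plugin's actual implementation: the function `select_group`, the strided slicing and the document names are all hypothetical.

```python
import random


def select_group(tests, group_count, group, seed):
    """Return the tests assigned to the 1-based `group` out of `group_count`.

    Hypothetical illustration: shuffle deterministically with `seed`, then
    hand every `group_count`-th test to the same group.
    """
    if not 1 <= group <= group_count:
        raise ValueError("group must be between 1 and group_count")
    shuffled = list(tests)
    # A dedicated Random instance keeps the shuffle reproducible across jobs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[group - 1 :: group_count]


if __name__ == "__main__":
    docs = [f"doc_{i:03d}.pdf" for i in range(100)]  # made-up test IDs
    groups = [select_group(docs, 20, g, seed=999999999) for g in range(1, 21)]
    covered = sorted(t for g in groups for t in g)
    print(len(groups[0]), covered == sorted(docs))
```

Because every job shuffles the same list with the same seed, the groups are disjoint and together cover the whole set, provided a job exists for every group from 1 to the group count. Under this scheme, a group count larger than the number of jobs leaves tests unrun: the workflow in this PR exports `TEST_GROUP_COUNT=20000` while its matrix only covers groups 1 to 20, so those 20 jobs would together run roughly 20/20000 of the set.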
76 changes: 76 additions & 0 deletions dev_scripts/large_tests/merge_results.py
@@ -0,0 +1,76 @@
#!/usr/bin/env python3

import glob
import sys
import xml.etree.ElementTree as ET


def combine_xmls(xml_files, output_file):
    # Initialize accumulators for summary numbers
    total_errors = 0
    total_failures = 0
    total_skipped = 0
    total_tests = 0
    total_time = 0.0

    # Create the root element for the output XML
    testsuites_elem = ET.Element("testsuites")

    # Prepare a base testsuite element that will contain all testcases
    combined_testsuite = ET.Element("testsuite", name="combined")

    for xml_file in xml_files:
        print(f"Parsing '{xml_file}'")

        try:
            tree = ET.parse(xml_file)
        except ET.ParseError as e:
            print(f"Error parsing {xml_file}: {e}")
            continue

        root = tree.getroot()
        # Assuming structure: <testsuites><testsuite ...>
        testsuite_elem = root.find("testsuite")
        if testsuite_elem is None:
            print(f"No <testsuite> element found in {xml_file}")
            continue

        # Sum up the attributes from each testsuite.
        total_errors += int(testsuite_elem.attrib.get("errors", "0"))
        total_failures += int(testsuite_elem.attrib.get("failures", "0"))
        total_skipped += int(testsuite_elem.attrib.get("skipped", "0"))
        total_tests += int(testsuite_elem.attrib.get("tests", "0"))
        total_time += float(testsuite_elem.attrib.get("time", "0.0"))

        # Move all <testcase> subelements to our combined testsuite.
        for testcase in testsuite_elem.findall("testcase"):
            combined_testsuite.append(testcase)

    # Update the attributes of the combined testsuite
    combined_testsuite.attrib["errors"] = str(total_errors)
    combined_testsuite.attrib["failures"] = str(total_failures)
    combined_testsuite.attrib["skipped"] = str(total_skipped)
    combined_testsuite.attrib["tests"] = str(total_tests)
    combined_testsuite.attrib["time"] = str(total_time)
    # Optionally add a timestamp or hostname you want to combine from the originals.

    # Append the combined testsuite to the testsuites root element.
    testsuites_elem.append(combined_testsuite)

    # Write out the combined XML file.
    tree_out = ET.ElementTree(testsuites_elem)
    tree_out.write(output_file, encoding="utf-8", xml_declaration=True)
    print(f"Combined XML written to {output_file}")


if __name__ == "__main__":
    # Usage: merge_results.py <results_dir> <output_xml>
    folder = sys.argv[1]
    output = sys.argv[2]
    print(
        f"Will search for XML files in '{folder}' and create a combined XML in"
        f" '{output}'"
    )
    xml_files = glob.glob(f"{folder}/*.xml")
    print(f"Found {len(xml_files)} XML file(s)")
    combine_xmls(xml_files, output)
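
As a quick sanity check of the merging approach in merge_results.py, the sketch below writes two small JUnit files to a temporary directory, merges them, and reads the totals back. It re-implements the summing logic in a compact helper instead of importing the script; the names `write_sample`, `merge` and `demo`, and the sample XML contents, are made up for illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

COUNTERS = ("errors", "failures", "skipped", "tests")


def write_sample(path, tests, failures, names):
    """Write a made-up JUnit file shaped like <testsuites><testsuite>..."""
    root = ET.Element("testsuites")
    suite = ET.SubElement(
        root, "testsuite", name="pytest", errors="0",
        failures=str(failures), skipped="0", tests=str(tests), time="1.5",
    )
    for name in names:
        ET.SubElement(suite, "testcase", name=name)
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)


def merge(paths, output):
    """Sum the per-suite counters and collect every <testcase> element."""
    combined = ET.Element("testsuite", name="combined")
    totals = dict.fromkeys(COUNTERS, 0)
    for path in sorted(paths):
        suite = ET.parse(str(path)).getroot().find("testsuite")
        if suite is None:
            continue  # Skip files without the expected structure
        for key in COUNTERS:
            totals[key] += int(suite.get(key, "0"))
        combined.extend(suite.findall("testcase"))
    for key, value in totals.items():
        combined.set(key, str(value))
    root = ET.Element("testsuites")
    root.append(combined)
    ET.ElementTree(root).write(str(output), encoding="utf-8", xml_declaration=True)
    return totals


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        write_sample(tmp / "group1.xml", tests=2, failures=1, names=["a", "b"])
        write_sample(tmp / "group2.xml", tests=1, failures=0, names=["c"])
        totals = merge(tmp.glob("group*.xml"), tmp / "merged.xml")
        merged = ET.parse(str(tmp / "merged.xml")).getroot().find("testsuite")
        return totals, len(merged.findall("testcase"))


if __name__ == "__main__":
    print(demo())
```

Writing the merged file under a name that the input glob does not match keeps a rerun from picking up its own output as an input, which is worth bearing in mind when the output path lives in the same directory as the per-group results.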