This project aims to make information about committees (kommittéer) and directives (direktiv) for Swedish Government Official Reports (Statens offentliga utredningar, SOU) available as open data. This information is routinely used by many government agencies, as well as by journalists and researchers.
Unfortunately, it is published in a very old-fashioned way by the Swedish government:
- as a list with unstructured metadata on sou.gov.se
- through annual summaries that the departments send to the parliament on 1 March
Open Committees downloads the information from these sources and attempts to convert it to structured data to facilitate reuse. Some formatting errors are to be expected, but the result should still save many people many hours.
If you spot an error, please report it so that it can be fixed.
- data/pdf: downloaded PDFs
- data/md: generated Markdown
- data/csv: extracted CSV files
You need Python. I recommend uv to set up the project's dependencies automatically.
Run the full pipeline:

    uv run run.py

This calls the service modules in three steps:
- services/download/regeringen.py: download PDFs
- services/parser/pdf_parser.py: convert PDFs to Markdown
- services/parser/markdown_parser.py: extract CSV data
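
The three steps above can be pictured as a small sequential runner. This is only a sketch of the idea, not the project's actual `run.py`: the `Stage` type, `run_pipeline`, and the three placeholder stage functions are hypothetical names introduced for illustration, and the real service modules may expose a different interface.

```python
from pathlib import Path
from typing import Callable

# Hypothetical stage type: a stage receives the data directory and works inside it.
Stage = Callable[[Path], None]


def run_pipeline(data_dir: Path, stages: list[tuple[str, Stage]]) -> list[str]:
    """Run the stages in order and return the names of those that completed."""
    # Make sure the three folders exist before any stage runs.
    for sub in ("pdf", "md", "csv"):
        (data_dir / sub).mkdir(parents=True, exist_ok=True)

    completed: list[str] = []
    for name, stage in stages:
        stage(data_dir)  # an exception here stops the later stages
        completed.append(name)
    return completed


# Placeholder stages (assumptions): the real work lives in the service modules.
def download_pdfs(data_dir: Path) -> None:
    pass  # would fill data_dir / "pdf"


def pdfs_to_markdown(data_dir: Path) -> None:
    pass  # would read data_dir / "pdf" and fill data_dir / "md"


def markdown_to_csv(data_dir: Path) -> None:
    pass  # would read data_dir / "md" and fill data_dir / "csv"


STAGES: list[tuple[str, Stage]] = [
    ("download", download_pdfs),
    ("pdf_to_markdown", pdfs_to_markdown),
    ("markdown_to_csv", markdown_to_csv),
]
```

Because each stage only reads the previous stage's folder, a single step could in principle be rerun on its own after fixing a parsing problem, without downloading the PDFs again.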
The pipeline writes:
- data/csv/committees.csv
- data/csv/committee_members.csv
- data/csv/publications.csv
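
To reuse the extracted files, something like the following could serve as a starting point. It is a minimal sketch that assumes the CSV files are UTF-8 encoded, comma-separated, and start with a header row; none of this is confirmed here, and no column names are assumed. `load_rows` and `summarize` are hypothetical helpers, not part of the project.

```python
import csv
from pathlib import Path


def load_rows(csv_path: Path) -> list[dict[str, str]]:
    """Read one CSV file into a list of dicts keyed by its header row."""
    # Assumption: UTF-8 text with a header line and comma delimiters.
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize(csv_dir: Path) -> dict[str, int]:
    """Map each CSV file name in csv_dir to its number of data rows."""
    return {
        path.name: len(load_rows(path))
        for path in sorted(csv_dir.glob("*.csv"))
    }
```

After a pipeline run, calling `summarize(Path("data/csv"))` gives a quick row count per file, which is a cheap way to notice an extraction that silently produced an empty or truncated table.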
The code is licensed under AGPLv3. The data is a public document (offentlig handling) and is therefore released as CC0.