diff --git a/.gitignore b/.gitignore
index b188e6f0..b619b74b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -102,3 +102,4 @@ tags
 # Tests files
 *.parquet
 test_*.json
+.claude/settings.json
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8bf85a4b..4cbe1dd6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,49 @@
-# Changelog
+# Changelog
+
+# 8.0.0
+
+* **Store backends** (choice of storage format for survey tables)
+  - **io/backends**: HDF5, Parquet and Zarr backends (`StoreBackend` abstraction); `get_backend(name)`, `get_available_backend_names()`, `register_backend()` to extend.
+  - **Zarr**: optional backend (`pip install openfisca-survey-manager[zarr]`); one table = one zarr group in a per-survey `.zarr` directory.
+  - **Survey**: `zarr_file_path` attribute; `fill_store(store_format="zarr")`, and reads through `get_values` for zarr.
+  - **Table**: reads, writes and `_is_stored` are delegated to the backends; `_get_store_path_and_format()` unifies path resolution.
+  - **build-collection**: `--zarr` option in addition to `--parquet`; HDF5 remains the default, with a warning.
+  - **Docs**: `docs/ZARR-BACKEND.md` (Zarr usage, compression, parallelism).
+
+* **Manifest (RFC-002): store_format**
+  - **manifest.yaml**: optional dataset-level `store_format` key (hdf5, parquet, zarr); defaults to `parquet` at load time.
+  - **SurveyCollection.load**: when loading from a manifest, applies `store_format` and derives the store paths (`hdf5_file_path`, `parquet_file_path`, `zarr_file_path`) from `default_output_dir`.
+  - **Migration script**: infers `store_format` from the legacy JSON (`parquet_file_path` / `zarr_file_path` / `hdf5_file_path`) and writes it into the generated manifest.
+  - **RFC-002**: manifest example with `store_format`; sections 3.5 and 4.2 updated.
+
+# 7.0.0
+
+* **Breaking**: version 7.0 removes the re-exports and the DeprecationWarnings
+  - **Compatibility modules removed**: `config`, `paths`, `tables`, `surveys`, `survey_collections`, `read_sas`, `read_spss`, `read_dbf`, `calibration`, `calmar` and `utils` are deleted. Use the canonical imports (see `docs/MIGRATION_IMPORTS.md`).
+  - **`load_table`**: moved from `utils` to `openfisca_survey_manager.core.dataset` (and exported from `core`).
+  - All internal imports have been migrated to `configuration.paths`, `configuration.models`, `core.dataset`, `core.survey`, `core.table`, `io.readers`, `processing.weights`, `common.misc`.
+
+* **Typing** (no breaking API changes)
+  - **policy**: Add `py.typed` marker; type hints on `legislation_asof`, `variables`, `coicop`, `matching`, `statshelpers`, `calmar`, `calibration`, `simulation_builder`, `aggregates`, `simulations`, and the scenarios (`abstract_scenario`, `reform_scenario`).
+  - **configuration**: Type hints on `Config.__init__` and `save` in `configuration.models`.
+  - **processing**: Type the return of `__getattr__` in `processing/__init__.py`.
+
+# 6.5.0
+
+* Typing (no breaking API changes)
+  - **core**: Type hints on `core.table` (Table), `core.survey` (Survey, NoMoreDataError), `core.dataset` (SurveyCollection); `TYPE_CHECKING` for circular refs; class attributes with defaults where needed
+  - **io**: Type hints on `io.readers` (read_sas, read_spss, read_dbf with `Optional[list[str]]` for cols); `io.writers` and `io.hdf` already typed
+  - **processing**: Type hints on `processing.cleaning`, `processing.harmonization`, `processing.weights.calmar` (linear, logit, calmar, check_calmar, etc.), `processing.weights.calibration` (Calibration class and methods)
+  - **Docs**: Update `REFACTORING_PLAN.md` §3.3 (typing of core, io and processing done)
+
+# 6.4.0
+
+* Refactor (no breaking API changes)
+  - **io/hdf**: Extract HDF5 write logic into `io.hdf` (`write_table_to_hdf5`); `io.writers` re-exports for compatibility
+  - **processing/harmonization**: Add `harmonize_data_frame_columns` (lowercase, rename ident); used in `Survey.get_values`; exported from `processing`
+  - **core**: Add `core.table`, `core.survey`, `core.dataset` (Table, Survey, NoMoreDataError, SurveyCollection); root `tables.py`, `surveys.py`, `survey_collections.py` re-export for compatibility
+  - **Logging**: Extend to all modules: add a logger to `configuration.models`, `google_colab`, `statshelpers`; fix typo "folloging" → "following" in `core.table`
+  - **Docs**: Add `docs/MIGRATION_IMPORTS.md` (import mapping and steps for when the re-exports are removed, with a breaking-change warning); update `REFACTORING_PLAN.md` (§3.4 Logging done)

 # 6.3.1
diff --git a/docs/MIGRATION_IMPORTS.md b/docs/MIGRATION_IMPORTS.md
new file mode 100644
index 00000000..f6a9da10
--- /dev/null
+++ b/docs/MIGRATION_IMPORTS.md
@@ -0,0 +1,154 @@
# Migrating imports after the re-exports are removed

This document describes the changes to make **when the re-exports** (compatibility files) at the package root are removed: update every import to the new paths, then delete the old modules.

**Reference**: `docs/REFACTORING_PLAN.md`.

---

## Warning

Removing the re-exports is a **breaking change**: any code (internal or external) that imports from the old paths (`config`, `paths`, `tables`, `surveys`, `survey_collections`, `read_sas`, `read_spss`, `read_dbf`, `calibration`, `calmar`, `utils`) will see its imports **fail** (`ModuleNotFoundError`). All imports must be migrated **before** deleting the files listed in section 3, and the change must be documented in the CHANGELOG for dependent projects (e.g. openfisca-france-data).
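For a dependent project that must run against survey-manager versions on both sides of the removal, an import guard can bridge the transition. A minimal sketch (hypothetical downstream code, not part of this package):

```python
# Try the canonical path first; fall back to the legacy re-export on
# survey-manager versions that predate io.readers. The same pattern works
# for any symbol in the mapping of section 1.
try:
    from openfisca_survey_manager.io.readers import read_sas
except ModuleNotFoundError:
    from openfisca_survey_manager.read_sas import read_sas
```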
---

## 1. Old → new import mapping

| Old import (to remove) | New import (to use) |
|------------------------|---------------------|
| `from openfisca_survey_manager.config import Config` | `from openfisca_survey_manager.configuration.models import Config` |
| `from openfisca_survey_manager.paths import ...` | `from openfisca_survey_manager.configuration.paths import ...` |
| `from openfisca_survey_manager.tables import Table` | `from openfisca_survey_manager.core.table import Table` |
| `from openfisca_survey_manager.surveys import Survey` | `from openfisca_survey_manager.core.survey import Survey` |
| `from openfisca_survey_manager.surveys import NoMoreDataError` | `from openfisca_survey_manager.core.survey import NoMoreDataError` |
| `from openfisca_survey_manager.survey_collections import SurveyCollection` | `from openfisca_survey_manager.core.dataset import SurveyCollection` |
| `from openfisca_survey_manager.read_sas import read_sas` | `from openfisca_survey_manager.io.readers import read_sas` |
| `from openfisca_survey_manager.read_spss import read_spss` | `from openfisca_survey_manager.io.readers import read_spss` |
| `from openfisca_survey_manager.read_dbf import read_dbf` | `from openfisca_survey_manager.io.readers import read_dbf` |
| `from openfisca_survey_manager.calibration import Calibration` | `from openfisca_survey_manager.processing.weights import Calibration` |
| `from openfisca_survey_manager.calmar import calmar` | `from openfisca_survey_manager.processing.weights import calmar` |
| `from openfisca_survey_manager.calmar import check_calmar` | `from openfisca_survey_manager.processing.weights import check_calmar` |
| `from openfisca_survey_manager.utils import do_nothing, load_table, ...` | See section 2 (utils) |

**Symbols exported by `paths`** (same names in `configuration.paths`):
`config_ini`, `default_config_files_directory`, `is_in_ci`, `openfisca_survey_manager_location`, `private_run_with_data`, `test_config_files_directory`.

**Symbols exported by `utils`**:
- From `common.misc`: `asof`, `do_nothing`, `inflate_parameter_leaf`, `inflate_parameters`, `parameters_asof`, `variables_asof`.
- Defined in `utils.py`: `load_table` (to be moved to a suitable module, e.g. `core` or `io`, before `utils.py` is deleted).

---

## 2. Files to update when the re-exports are removed

Before (or at the same time as) deleting the files listed in section 3, update the imports in the following files.
### 2.1 Imports from `config`, `paths`

| File | Replace |
|------|---------|
| `tests/input_dataframe_generator.py` | `paths` → `configuration.paths` (module moved into `tests/`) |
| `scripts/build_collection.py` | `paths` → `configuration.paths` |
| `temporary.py` | `paths` → `configuration.paths` |
| `google_colab.py` | `paths` → `configuration.paths` |
| `coicop.py` | `paths` → `configuration.paths` |
| `matching.py` | `paths` → `configuration.paths` |
| `tests/test_read_sas.py` | `paths` → `configuration.paths`; `read_sas` → `io.readers` |
| `tests/test_quantile.py` | `paths` → `configuration.paths` |
| `tests/test_scenario.py` | `paths` → `configuration.paths` |

### 2.2 Imports from `survey_collections`, `surveys`, `tables`

| File | Replace |
|------|---------|
| `tests/input_dataframe_generator.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
| `simulations.py` | `survey_collections` → `core.dataset`; `utils` → `common.misc` plus the module holding `load_table` |
| `utils.py` | `survey_collections` → `core.dataset` (for `load_table`) |
| `scripts/build_collection.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
| `scenarios/abstract_scenario.py` | `calibration`, `surveys` → `processing.weights`, `core.survey` |
| `tests/test_surveys.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
| `tests/test_coverage_boost.py` | `survey_collections`, `surveys`, `utils` → same as above |
| `tests/test_add_survey_to_collection.py` | `survey_collections` → `core.dataset` |
| `tests/test_parquet.py` | `survey_collections` → `core.dataset`; `surveys` (NoMoreDataError) → `core.survey` |

### 2.3 Imports from `read_sas`, `read_spss`, `read_dbf`

| File | Replace |
|------|---------|
| `core/table.py` | `from openfisca_survey_manager import read_sas` → `from openfisca_survey_manager.io.readers import read_sas`; `read_sas.read_sas` → `read_sas` in `reader_by_source_format`. Then `from openfisca_survey_manager.read_spss import read_spss` → `from openfisca_survey_manager.io.readers import read_spss` (inside the try/except). |
| `tests/test_read_sas.py` | `from ...paths import ...` → `configuration.paths`; `from ...read_sas import read_sas` → `from ...io.readers import read_sas` |

### 2.4 Imports from `calibration`, `calmar`

| File | Replace |
|------|---------|
| `scenarios/abstract_scenario.py` | `calibration` → `processing.weights` |
| `tests/test_calibration.py` | `calibration` → `processing.weights` |
| `tests/test_calmar.py` | `calmar` → `processing.weights` |

### 2.5 Imports from `utils`

| File | Replace |
|------|---------|
| `simulations.py` | `utils.do_nothing`, `utils.load_table` → `common.misc.do_nothing` plus the module holding `load_table` |
| `tests/test_coverage_boost.py` | `utils.do_nothing` → `common.misc.do_nothing` |
| `tests/test_legislation_inflator.py` | `utils.inflate_parameters`, `parameters_asof` → `common.misc` |
| `tests/test_tax_benefit_system_asof.py` | `utils.parameters_asof`, `variables_asof` → `common.misc` |

**Note**: `load_table` depends on `SurveyCollection`; it must either live in a module that imports `core.dataset`, or be moved (e.g. to `core.dataset` or an `io.loaders` module) before `utils.py` is deleted.
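On this branch `load_table` indeed landed next to `SurveyCollection` (see the 7.0.0 changelog entry), so the migrated import becomes:

```python
# Before (legacy re-export, removed in 7.0.0):
#     from openfisca_survey_manager.utils import load_table
# After: load_table lives next to SurveyCollection, which removes the
# utils -> survey_collections dependency described in the note above.
from openfisca_survey_manager.core.dataset import SurveyCollection, load_table
```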
---

## 3. Files to delete (re-exports)

Once all imports are updated according to sections 1 and 2, the following files can be deleted (they contain nothing but re-exports):

- `config.py`
- `paths.py`
- `tables.py`
- `surveys.py`
- `survey_collections.py`
- `read_sas.py`
- `read_spss.py`
- `read_dbf.py`
- `calibration.py`
- `calmar.py`
- `utils.py` (after moving `load_table` and updating the imports listed in 2.5)

---

## 4. Modules without a re-export (canonical imports)

These modules have no re-export file at the package root; internal code already uses them. For external code or documentation, the canonical imports are:

| Symbol | Canonical import |
|--------|------------------|
| `harmonize_data_frame_columns` | `from openfisca_survey_manager.processing.harmonization import harmonize_data_frame_columns` (or `from openfisca_survey_manager.processing import harmonize_data_frame_columns`) |
| `write_table_to_hdf5` | `from openfisca_survey_manager.io.hdf import write_table_to_hdf5` (or `from openfisca_survey_manager.io.writers import write_table_to_hdf5`) |
| `write_table_to_parquet` | `from openfisca_survey_manager.io.writers import write_table_to_parquet` |

---

## 5. Root package `openfisca_survey_manager`

Today the package's `__init__.py` exposes only the exceptions. If external code does, say, `from openfisca_survey_manager import read_sas`, it relies on the `read_sas.py` submodule. **After the re-exports are removed**, these import paths are no longer valid (they fail at import time); migrate them to `from openfisca_survey_manager.io.readers import read_sas` (see section 1).

Before or alongside the migration: check this repository and the dependent projects (openfisca-france-data, etc.) for imports from the package root or from the old modules listed in section 3.

---

## 6. Recommended migration order

1. **Move `load_table`** to its final module (e.g. `core.dataset` or `io.loaders`) and update the call sites (section 2.5).
2. **Update all internal imports** (section 2) to the new paths, file by file.
3. **Run the test suite**: `pytest`; fix anything missed until there are 0 failures. A cold-import pass also helps (a sketch follows at the end of this document).
4. **Delete the re-export files** listed in section 3.
5. **Check external usages** (section 5) and document the changes in the CHANGELOG (breaking changes).

---

## 7. Optional follow-ups

- Rename the `common/` directory to `utils/` once `utils.py` is deleted (as planned in the refactoring plan).
- Rename `configuration/` to `config/` for a shorter name (consistent with the plan).
- These renames will require another round of import updates (configuration → config, common → utils).
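For step 3 of section 6, each migrated submodule can be imported in a fresh interpreter, so that no cached module masks a forgotten legacy import or a cycle. A minimal sketch (the module list is illustrative, extend it as needed):

```python
# Cold-import check: one fresh interpreter per module.
import subprocess
import sys

modules = [
    "openfisca_survey_manager.core.table",
    "openfisca_survey_manager.core.survey",
    "openfisca_survey_manager.core.dataset",
    "openfisca_survey_manager.io.readers",
    "openfisca_survey_manager.processing.weights",
]
for module in modules:
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
```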
diff --git a/docs/REFACTORING_PLAN.md b/docs/REFACTORING_PLAN.md
index bfd2494d..bd56bf6d 100644
--- a/docs/REFACTORING_PLAN.md
+++ b/docs/REFACTORING_PLAN.md
@@ -35,9 +35,10 @@ openfisca_survey_manager/
 │   └── misc.py            # shared helpers (avoid circular imports)
 │
 ├── scenarios/             # unchanged for now
+├── policy/                # simulations, simulation_builder, aggregates (eventually a separate package)
 ├── scripts/
 ├── tests/
-└── ... (simulations, aggregates, etc. to be placed by responsibility)
+└── ...
 ```

 **Current state**: the following directories exist with preparatory `__init__.py` files (no code moved yet):

@@ -49,10 +50,12 @@ Le déplacement effectif des modules se fera par étapes pour garder la compatib
 **Done**:

 - `io/readers.py`: `read_sas`, `read_spss`, `read_dbf` (old modules now re-export).
-- `common/misc.py`: survey-independent helpers (`do_nothing`, `inflate_parameters`, `asof`, `parameters_asof`, `variables_asof`, `stata_files_to_data_frames`); `utils.py` imports from `common.misc` and keeps `load_table`.
+- `common/misc.py`: survey-independent helpers (`do_nothing`, `inflate_parameters`, `asof`, `parameters_asof`, `variables_asof`); `utils.py` imports from `common.misc` and keeps `load_table`.
 - **Cleanup**: `print()` calls replaced with `logging` (matching, calmar, scenarios, scripts/build_collection, simulations). Generic exceptions replaced with `SurveyManagerError` / `SurveyConfigError` / `SurveyIOError` (survey_collections, tables, simulations, simulation_builder, surveys, scenarios, calmar).
 - **processing/weights**: `calmar` and `Calibration` moved to `processing/weights/calmar.py` and `processing/weights/calibration.py`; root `calibration.py` and `calmar.py` are compatibility re-exports.
 - **processing/cleaning**: `clean_data_frame` moved to `processing/cleaning.py`; `tables.py` imports from `processing.cleaning` (compatibility preserved).
+- **policy/**: directory created for `simulations`, `simulation_builder`, `aggregates` (to be moved to a dedicated package eventually). The root modules (`simulations.py`, `simulation_builder.py`, `aggregates.py`) are placeholders with a `DeprecationWarning` that re-export from `policy`.
+- **policy/tests/**: tests for the policy package (test_aggregates, test_compute_aggregate, test_compute_pivot_table, test_compute_winners_losers, test_create_data_frame_by_entity, test_marginal_tax_rate, test_summarize_variables). They import from `openfisca_survey_manager.policy` and use `create_randomly_initialized_survey_scenario` from `openfisca_survey_manager.tests.test_scenario`.

 ---

@@ -83,24 +86,26 @@ Aujourd’hui ces couches sont entremêlées (ex. lecture + nettoyage dans `tabl
 ### 3.1 Long functions (> 100 lines)

-- Split the big functions into named steps, for example:
-  - `load_survey()` → `_parse_config()`, `_load_raw_data()`, `_transform()`, `_store()`.
-- Goal: readability and testability, without changing behavior.
+- **Started**: splitting into named steps without changing behavior.
+  - `core.table.Table.read_source` → `_read_csv_with_inferred_encoding()`, `_apply_stata_categorical_strategy()`; `read_source()` orchestrates.
+  - `core.survey.Survey.get_values` → `_get_values_from_hdf5()`, `_get_values_from_parquet()`; `get_values()` orchestrates and applies the harmonization.
+- Still to do: other modules (simulations, scenarios, scripts, processing/weights/calmar, etc.).

 ### 3.2 Circular dependencies

+- **Verified** (cold imports): no circular dependency. The chain is consistent: `exceptions` → `configuration` → `io`/`processing` → `core.table` → `core.survey` → `core.dataset`; `utils` → `common.misc`, `survey_collections`; `core.table` only imports `Survey` lazily, in `Table.__init__`. If cycles appear: extract the shared logic into `common/` or `configuration/`.
 - If modules import each other, extract the shared logic into `utils/` (or `config/`) and make both sides depend on that shared module.
 - Check with cold imports (start the app and import the submodules).

 ### 3.3 Python typing

-- Progressively add type hints on the public signatures (arguments and return types).
-- Priority: `core/`, `io/`, then `processing/`.
+- **Started**: type hints on the public signatures of `core/`, `io/` and `processing/` (cleaning, harmonization, weights/calmar, weights/calibration).
+- Still to do: the rest of the package (scenarios, simulations, etc.).

 ### 3.4 Logging

-- Replace `print()` calls with structured `logging` (already started in matching, calmar).
-- Extend to all modules (readers, writers, calibration, etc.).
+- **Done**: `print()` calls replaced with structured `logging` (matching, calmar, scenarios, scripts/build_collection, simulations, readers, writers, calibration, core, processing, etc.).
+- **Done**: logging extended to all business modules (configuration/models, google_colab, statshelpers, and every other module concerned).

 ### 3.5 Centralized error handling

diff --git a/docs/RFC-001-OPENFISCA-DATA-STACK.md b/docs/RFC-001-OPENFISCA-DATA-STACK.md
new file mode 100644
index 00000000..168b603f
--- /dev/null
+++ b/docs/RFC-001-OPENFISCA-DATA-STACK.md
@@ -0,0 +1,134 @@
# RFC-001: OpenFisca Data Stack

**Status**: Draft
**Issue**: [#381](https://github.com/openfisca/openfisca-survey-manager/issues/381)
**Author(s)**: OpenFisca team
**Date**: 2025-01

---

## Summary

This RFC formalizes an **OpenFisca data stack** with explicit roles and boundaries. It defines the target evolution of the current `openfisca-survey-manager` into a reusable data core (`openfisca-data-manager`), the place of the country repositories (`openfisca-<country>-data`) and of the analysis layer (`openfisca-policy-analysis`). It is the reference for refactor PRs and for upcoming changes.

---

## 1. Context and motivation

### 1.1 Problem

Today, survey data management and policy analysis are tightly coupled inside `openfisca-survey-manager`. As a result:

- the boundary between "microdata access" and "analysis (scenarios, reforms, aggregates)" is blurry;
- the data core is rarely reused outside policy use cases;
- the stack is hard to evolve (backend, schemas, reproducibility) without impacting everything at once.

### 1.2 Goal

Define a clear **OpenFisca Data Stack**: building blocks, responsibilities, target APIs and dependency rules, to guide the refactor and future work.

---

## 2. Goals and non-goals

### 2.1 Goals

- Separate conceptually (and eventually in code): **data** (access, storage, schemas) vs **analysis** (scenarios, reforms, aggregates).
- Propose a minimal, stable data API (v1.0) for microdata access.
- Clarify the role of each building block (data-manager, country-data, policy-analysis) and their dependencies.
- Align the refactor PRs (survey-manager) and the long-term decisions on this vision.

### 2.2 Non-goals

- This RFC sets no implementation timeline and no precise migration order.
- It does not detail the technical implementation (library choices, internal formats) beyond principles and target APIs.

---
## 3. Specification: the target OpenFisca Data Stack

### 3.1 Overview

```
OpenFisca Data Stack
├── openfisca-data-manager (data core, country-agnostic)
├── openfisca-<country>-data (per-country microdata → OpenFisca adaptation)
├── openfisca-policy-analysis (scenarios, reforms, aggregates, indicators)
└── OpenFisca Core (computation engine)
```

### 3.2 Building block 1: openfisca-data-manager

**Target evolution of the current openfisca-survey-manager (data core).**

- **Role**: a **universal**, **country-independent** building block.
- **Responsibilities**:
  - backend abstraction (parquet by default, HDF during the transition);
  - versioned dataset management;
  - data pipeline (read, clean, write);
  - schema validation;
  - reproducible metadata;
  - a **stable microdata access API**.
- **What it must not do**:
  - depend on a tax and benefit system;
  - know about OpenFisca variables;
  - contain policy analysis.

**Minimal stable target API (v1.0)**:

```python
dataset = DataManager.load("lfs", year=2019)
df = dataset.to_pandas(columns=["income", "weight"])
```

Also exposed:

- `dataset.metadata`
- `dataset.schema`
- `dataset.hash`

### 3.3 Building block 2: openfisca-<country>-data

Examples: `openfisca-france-data`, `openfisca-tunisia-data`.

- **Role**: prepare the **microdata for OpenFisca ingestion**.
- **Dependencies**: `openfisca-data-manager`, `openfisca-<country>` (Core).
- **Responsibilities**:
  - mapping survey variables → OpenFisca variables;
  - building the entities and periods;
  - consistency validation against the tax and benefit system.

**Possible API**:

```python
adapter = CountryDataAdapter(dataset)
of_input = adapter.to_openfisca_entities()
```

### 3.4 Building block 3: openfisca-policy-analysis

Current survey-manager content to **migrate or extract** into this block (or a dedicated module):

- survey scenarios (baseline vs reform);
- weighted aggregations;
- inequality indicators;
- diagnostics.

This block relies on the microdata (through data-manager or country-data) and on OpenFisca Core for the computations.

---

## 4. Compatibility and link with the refactor

- The **openfisca-survey-manager refactor** PRs (reorganization, cleanup, processing/weights, core/io, typing, etc.) remain compatible with this RFC: they prepare the separation of layers without forcing a big bang.
- Later changes (decoupling data-manager / policy-analysis, exposing the v1.0 API) can reference this RFC (and issue #381) as the long-term target.
- No public API break is required in the short term; the RFC describes a target and a direction.

---

## 5. References

- [Issue #381](https://github.com/openfisca/openfisca-survey-manager/issues/381) (Data Stack vision).
- `docs/REFACTORING_PLAN.md` (internal reorganization of the survey-manager).
- `docs/MIGRATION_IMPORTS.md` (import migration after the re-exports are removed).
- `docs/TICKET_OPENFISCA_DATA_STACK.md` (original ticket version, superseded by this RFC).
diff --git a/docs/RFC-002-METADATA-AND-CONFIG.md b/docs/RFC-002-METADATA-AND-CONFIG.md
new file mode 100644
index 00000000..e803f9de
--- /dev/null
+++ b/docs/RFC-002-METADATA-AND-CONFIG.md
@@ -0,0 +1,213 @@
# RFC-002: Metadata and configuration architecture

**Status**: Implemented (config.yaml + manifest loading, legacy compatibility)
**Branch**: feature/backend
**Date**: 2025-01

---
## 1. Summary

This RFC proposes a **simpler and more standard** architecture for metadata and path management in openfisca-survey-manager, based on the XDG conventions, a single configuration format per directory, and a predictable directory layout. It plans a **progressive migration** from the current state.

---

## 2. Current state (to migrate)

### 2.1 Where is the config?

The configuration directory ("config_files_directory") is resolved in `configuration/paths.py` through a **cascade of hacks**:

| Priority | Condition | Directory |
|----------|-----------|-----------|
| 1 | `taxipp` package importable and the directory exists | `taxipp_install/.config/openfisca-survey-manager` |
| 2 | `openfisca_france_data` package importable and the directory exists | `BaseDirectory.save_config_path("openfisca-survey-manager")` → **~/.config/openfisca-survey-manager** |
| 3 | CI or pytest | `openfisca_survey_manager/tests/data_files` |
| 4 | Fallback | `~/.config/openfisca-survey-manager` (XDG) |

Problems: the order depends on which packages are importable, config.ini may be written at import time by the tests, and an assertion fails if the directory does not exist.

### 2.2 Files in the config directory

Today, **two INI files** plus external **JSON** files:

- **config.ini** (required in the directory)
  - `[collections]`: `collections_directory`, plus `collection_name = path` pairs pointing to a JSON file.
  - `[data]`: `output_directory`, `tmp_directory` (and, in tests, `input_directory`).
- **raw_data.ini** (used only by the `build-collection` script)
  - One section per collection: `[collection_name]`.
  - Keys = survey names, values = paths to the raw data directory or file.
- **JSON files** (one per collection; path given in config.ini or under `collections_directory`)
  - Content: `name`, `label`, `surveys`: { `survey_name` → survey metadata (tables, hdf5_file_path, parquet_file_path, **informations** including `csv_files`, `sas_files`, etc.) }.

Metadata is thus spread across config.ini (where to find the JSON files), raw_data.ini (where the raw data lives, only for build) and the JSON files (survey descriptions, storage paths, source file lists). Redundant, and with two different INI formats.

### 2.3 Usage in the code

- **SurveyCollection**: reads `Config(config_files_directory)` → `config.ini`; gets/sets `collections` (name → json_path); `config.get("data", "output_directory")` for `fill_store`.
- **Survey**: `informations` (dict) contains e.g. `csv_files`, `sas_files`; used in `fill_store` to know which files to read.
- **build_collection**: reads `raw_data.ini` to know which raw directories map to which surveys, then creates/updates the JSON collection and the data.

---

## 3. Proposal: target architecture

### 3.1 Principles

1. **A single configuration directory**: XDG only by default, or an explicit path (environment variable or argument). No more resolution based on `taxipp` / `openfisca_france_data`.
2. **A single config file per directory**: everything that is "global config" (base paths, options) goes into one file (see 3.2).
3. **Dataset metadata next to the data**: a "dataset" (formerly a collection) = a dedicated directory with a manifest (metadata) inside, rather than a scattered JSON referenced from an INI file.
4. **Standard and readable**: YAML or plain INI for the config; YAML or JSON for the manifests (possible alignment with RFC-001 Data Stack).

### 3.2 Configuration directory (XDG)

**Default location**: `$XDG_CONFIG_HOME/openfisca-survey-manager/` (otherwise `~/.config/openfisca-survey-manager/`).

Proposed content:

```
~/.config/openfisca-survey-manager/
├── config.yaml   # single config file (replaces config.ini + raw_data.ini for the "where things are" part)
```

**config.yaml** (example):

```yaml
# Directory holding the collections/datasets (manifests + derived data)
collections_dir: ~/.local/share/openfisca-survey-manager/collections

# Default output directory for build / fill_store (optional, can be overridden per dataset)
default_output_dir: ~/.local/share/openfisca-survey-manager/output

# Temporary directory (optional)
tmp_dir: /tmp/openfisca-survey-manager
```

Alternative, if INI is kept: a single **config.ini** with clear sections, e.g.:

```ini
[paths]
collections_dir = ~/.local/share/openfisca-survey-manager/collections
default_output_dir = ~/.local/share/openfisca-survey-manager/output
tmp_dir = /tmp/openfisca-survey-manager
```

Dropped: the `[collections]` section with one entry per collection (the manifests live inside each dataset, see 3.3). Also dropped: **raw_data.ini** (the raw sources are described in the dataset manifest).

### 3.3 Layout of a dataset (formerly a collection)

A dataset = a directory under `collections_dir` (or a configured absolute path), with a **manifest** inside:

```
collections_dir/
└── erfs/
    ├── manifest.yaml   # dataset metadata + list of surveys
    ├── erfs_2019/      # (optional) derived data per survey
    │   ├── data.parquet
    │   └── ...
    └── erfs_2020/
        └── ...
```

**manifest.yaml** (example):

```yaml
name: erfs
label: "Enquête Revenus Fiscaux et Sociaux"

# Storage backend for the tables (hdf5, parquet, zarr); defaults to parquet
store_format: parquet

# Per survey: raw sources (replaces raw_data.ini + informations)
surveys:
  erfs_2019:
    label: "ERFS 2019"
    source:
      format: sas            # or csv, stata, parquet
      path: /data/erfs/2019  # directory or file
    # optional: output paths relative to the dataset
    output_subdir: erfs_2019

  erfs_2020:
    label: "ERFS 2020"
    source:
      format: parquet
      path: /data/erfs/2020
    output_subdir: erfs_2020
```

This replaces the `[erfs]` section of raw_data.ini and the "informations" part (csv_files, sas_files, ...) of the collection JSON. One single place for "where the raw data lives" and "where to write the outputs".

For backward compatibility, an **adapter** can read the legacy JSON + raw_data.ini and produce (or expose) an equivalent manifest.

### 3.4 Config directory resolution (simplified)

- **Explicit value**: a `config_dir` (or `config_files_directory`) argument can always be passed to the APIs and the CLI.
- **Default**: `os.environ.get("OPENFISCA_SURVEY_CONFIG_DIR")`, otherwise `xdg_config_home() / "openfisca-survey-manager"`.
- **Tests**: a dedicated directory (e.g. `tests/data_files`) passed explicitly by the tests; no more import-time side effect (no config.ini written when `paths` is loaded).

The directory is **no longer** resolved based on the presence of `taxipp` or `openfisca_france_data`. Projects (france-data, taxipp) can:

- either set `OPENFISCA_SURVEY_CONFIG_DIR` to their directory,
- or pass the config path on every call.
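This resolution order is what `configuration.config_loader.get_config_dir` (added on this branch) implements; for instance:

```python
import os

from openfisca_survey_manager.configuration import get_config_dir

# An explicit argument wins over everything else.
print(get_config_dir("/tmp/my-config"))  # /tmp/my-config

# Otherwise the OPENFISCA_SURVEY_CONFIG_DIR environment variable is used,
# with the XDG config directory as the final fallback.
os.environ["OPENFISCA_SURVEY_CONFIG_DIR"] = "/tmp/env-config"
print(get_config_dir())  # /tmp/env-config
```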
### 3.5 Storage backends (store)

Survey tables can be stored through different **backends** (chosen at build / `fill_store` time):

| Backend | Format | Usage |
|---------|--------|-------|
| **hdf5** | One .h5 file | Historical (to be deprecated eventually) |
| **parquet** | Directory, one .parquet per table | Recommended (interoperability, columnar) |
| **zarr** | .zarr directory, one group per table | Optional (`[zarr]` extra) |

- **API**: `io.backends.get_backend(name)`, `get_available_backend_names()`, `register_backend(name, backend)` to extend.
- **CLI**: `build-collection --parquet` or `build-collection --zarr`; HDF5 by default (with a warning).
- **Survey**: `store_format`, and `hdf5_file_path` / `parquet_file_path` / `zarr_file_path` depending on the backend.
- **Zarr (compression, parallelism)**: see [docs/ZARR-BACKEND.md](ZARR-BACKEND.md).

### 3.6 Target API (aligned with RFC-001)

- Load a dataset by name: `DataManager.load("erfs", config_dir=...)` → reads `collections_dir/erfs/manifest.yaml` and the associated data.
- Metadata access: `dataset.metadata` (from the manifest), `dataset.schema` (if exposed), derived paths computed deterministically from `collections_dir` + `name` + `output_subdir`.

The current "SurveyCollection.load(collection=...)" API is kept during the transition, backed internally by the new config + manifests (possibly through a bridge from the legacy JSON).

---

## 4. Migrating the current state

### 4.1 Keep the legacy behavior in parallel

- Keep reading **config.ini** and **raw_data.ini** as long as the new config is not present.
- If `config.yaml` (or the new config.ini [paths]) exists in the config directory: use the new layout (manifests under `collections_dir`).
- Otherwise: current behavior (config.ini [collections] + [data], raw_data.ini, external JSON files).

### 4.2 Migration script

A script migrates the current state to the new layout:

- **Location**: `openfisca_survey_manager.scripts.migrate_config_to_rfc002`
- **Usage**:
  ```bash
  python -m openfisca_survey_manager.scripts.migrate_config_to_rfc002 [--config-dir PATH] [--dry-run] [-v]
  ```
- **Behavior**: reads `config.ini` ([collections] + [data]) and, if present, `raw_data.ini`; for each collection, loads the JSON, derives `source.format` and `source.path` from `informations` (csv_files, sas_files, etc.) or from the matching raw_data.ini section; **infers `store_format`** (parquet, hdf5 or zarr) from the `parquet_file_path` / `zarr_file_path` / `hdf5_file_path` fields of the legacy JSON surveys, and writes it into the manifest; creates `config.yaml` and `collections_dir/<collection>/manifest.yaml` for each collection. With `--dry-run`, nothing is written.
- **Default config directory**: the one returned by `get_config_dir()` (env `OPENFISCA_SURVEY_CONFIG_DIR` or XDG). A directory can be forced with `--config-dir`.

### 4.3 Deprecation

- Eventually: deprecate config.ini's `[collections]` section (name → JSON mapping), `raw_data.ini`, and the legacy collection JSON files. Document the migration in MIGRATION_IMPORTS.md or a new MIGRATION_CONFIG.md.

---
## 5. Summary of the proposed changes

| Current | Target |
|---------|--------|
| Config resolution via taxipp / france_data / CI / XDG | XDG, or env `OPENFISCA_SURVEY_CONFIG_DIR`, or an explicit argument |
| config.ini + raw_data.ini | A single file (config.yaml or config.ini [paths]) |
| Collection JSON outside the directory, referenced from the config | A manifest (YAML/JSON) per dataset in `collections_dir/<name>/manifest.yaml` |
| Raw sources in raw_data.ini + informations (JSON) | Sources in the dataset manifest (`surveys.*.source`) |
| config.ini written when the paths load (tests) | No more import-time writes; tests pass an explicit config_dir |

The result is a **simpler** architecture (one format, one place per dataset), a **more standard** one (XDG, explicit paths), and a **migratable** one, since the legacy behavior is kept until the new config is in place.
diff --git a/docs/TICKET_OPENFISCA_DATA_STACK.md b/docs/TICKET_OPENFISCA_DATA_STACK.md
index fac3c70b..9939c187 100644
--- a/docs/TICKET_OPENFISCA_DATA_STACK.md
+++ b/docs/TICKET_OPENFISCA_DATA_STACK.md
@@ -1,4 +1,6 @@
-# 🏗️ Vision: an official OpenFisca Data Stack
+# 🏗️ Vision: an official OpenFisca Data Stack ([issue #381](https://github.com/openfisca/openfisca-survey-manager/issues/381))
+
+> **See the RFC version**: [RFC-001: OpenFisca Data Stack](RFC-001-OPENFISCA-DATA-STACK.md). This document keeps the original ticket wording; the RFC is the normative form.

 **Goal**: Formalize a clear OpenFisca data stack, with well-defined roles and boundaries. This ticket is the reference for the refactor PRs (survey-manager → data-manager, decoupling, etc.) and for upcoming changes.
diff --git a/docs/ZARR-BACKEND.md b/docs/ZARR-BACKEND.md
new file mode 100644
index 00000000..f375f609
--- /dev/null
+++ b/docs/ZARR-BACKEND.md
@@ -0,0 +1,126 @@
# Using Zarr with OpenFisca Survey Manager

This document explains **whether and how** to use the Zarr backend to store surveys, and where things stand on **compression** and on **read/write parallelism**.

---

## 1. Using Zarr with OpenFisca

### Yes, it is possible

The **zarr** backend is available in `openfisca-survey-manager` provided the optional dependency is installed:

```bash
pip install openfisca-survey-manager[zarr]
# or
pip install openfisca-survey-manager zarr numcodecs
```

(pandas 2.x is used through `to_zarr` / `read_zarr`; the **zarr** package is required.)

### From the command line (build-collection)

To build a collection with the tables stored in Zarr format:

```bash
build-collection -c ma_collection --zarr
```

Without `--zarr`, the default format remains HDF5 (with a warning), or you can use `--parquet`.

### From Python (fill_store)

```python
from openfisca_survey_manager.core.dataset import SurveyCollection

col = SurveyCollection.load(collection="ma_collection", config_files_directory="...")
col.fill_store(
    source_format="sas",  # or csv, parquet, etc.
    store_format="zarr",
)
```

After that, each survey has a `{output}/{survey.name}.zarr` directory, and each table is a **zarr group** (a subdirectory) in that store. Reading works as usual with `survey.get_values(table=..., variables=...)`; the code automatically picks the zarr backend when `store_format == "zarr"`.
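Reading one table back from the zarr store then looks like this (survey, table and variable names are illustrative):

```python
# The surveys list is populated by SurveyCollection.load above.
survey = next(s for s in col.surveys if s.name == "ma_collection_2019")
df = survey.get_values(table="menage", variables=["ident", "weight"])
```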
### Checking that Zarr is available

```python
from openfisca_survey_manager.io.backends import get_available_backend_names, get_backend

print(get_available_backend_names())  # contains "zarr" if the package is installed
backend = get_backend("zarr")  # raises ValueError if zarr is missing
```

---

## 2. Compression

### Current behavior

In the current implementation, Zarr writes go through `pandas.DataFrame.to_zarr(path, mode="w")` **without explicit compression options**. Zarr/pandas may therefore apply a default behavior (e.g. light compression, or none, depending on versions).

### What Zarr supports in general

Zarr compresses **per block (chunk)** through **numcodecs**. Typical choices:

- **Blosc** (LZ4, Zstd, Zlib): good speed/ratio trade-off, very widely used
- **Zstd**: good ratio, fast decompression
- **LZ4**: very fast, lower ratio
- **Gzip**: standard, slower

These options are configured when the zarr array is **created** (compressor, chunks). With **pandas**:

- `df.to_zarr(path, ...)` may accept extra arguments forwarded to the underlying zarr store (depending on the pandas version).
- For fine-grained control (compression, chunking), you can create the zarr store yourself with the right `compressor` and write the columns into it, or extend the backend (see below).

### Possible evolution in the survey-manager

The Zarr backend could be extended to accept options (compression, chunks), either:

- through **kwargs** in `fill_store(..., store_format="zarr", **zarr_options)` forwarded to `to_zarr`,
- or through the **config** (manifest or config.yaml) to define a default compressor for the zarr format.

Today, if you need a specific compression setup, you can:

1. **Register a custom backend** (`register_backend`) that calls `to_zarr` with the `compressor` (and possibly the chunks) of your choice.
2. Or **post-process** the generated `.zarr` directories (rewriting them with other zarr options) outside the survey-manager.

---

## 3. Read/write parallelism

### Zarr in general

- **Per-chunk parallelism**: Zarr is designed so that **different chunks** can be read or written in parallel without a global lock (each chunk is independent).
- **In Python**: the **GIL** limits what threads gain on the compression/decompression side; effective parallelism usually means **multiple processes**, or runtimes that release the GIL (Cython, the C extensions behind numcodecs/blosc).
- **Bottleneck**: in practice, **compression/decompression** can saturate the CPU (~1 GB/s) while the disk or network can go faster; improvements (batch encode/decode, GPU) are in progress in the zarr ecosystem.

### In the survey-manager today

- **Writing**: `fill_store(store_format="zarr")` calls `to_zarr` for each table, **sequentially** (one table after another, no internal parallelism exposed).
- **Reading**: `get_values()` uses `read_zarr` for a given table, also **sequentially** per call.

So, **by default**: no multi-table or multi-chunk parallelism is exposed by the current API.

### How to parallelize anyway

1. **Several tables / several surveys**
   You can parallelize at the application level yourself: launch several processes or threads that call `fill_store` (or `get_values`) on different collections/surveys/tables; each process writes/reads its own files or zarr groups without conflict (a sketch follows this list).

2. **Dask**
   For zarr arrays, **Dask** (dask.array, or loading the zarr stores into Dask) handles parallel, per-chunk loading. This does not go through the current Survey/SurveyCollection API: either export the `.zarr` paths and open them with Dask, or add an integration layer (e.g. a function returning a Dask DataFrame from a zarr survey).

3. **Backend evolution**
   A "parallel write per table" mode (threads/processes), or a read option returning a Dask object to exploit zarr's per-chunk parallelism, could be added later.
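A sketch of option 1 for reads, with one process per table (collection, survey and table names are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

from openfisca_survey_manager.core.dataset import SurveyCollection

TABLES = ["menage", "individu", "foyer"]

def read_table(table: str):
    # Each worker opens its own collection: zarr chunks are independent,
    # so concurrent reads of different tables do not conflict.
    col = SurveyCollection.load(collection="ma_collection", config_files_directory="...")
    survey = next(s for s in col.surveys if s.name == "ma_collection_2019")
    return survey.get_values(table=table)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        frames = dict(zip(TABLES, pool.map(read_table, TABLES)))
```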
---

## 4. Practical summary

| Question | Answer |
|----------|--------|
| **Use Zarr with OpenFisca?** | Yes: `pip install openfisca-survey-manager[zarr]`, then `build-collection --zarr` or `fill_store(store_format="zarr")`. |
| **Compression?** | By default: the zarr/pandas behavior (often light). For more control: a custom backend calling `to_zarr(..., compressor=...)`, or post-processing of the zarr stores. |
| **Read/write parallelism?** | Not exposed by the current API (one table at a time). Possible anyway: roll your own across tables/surveys, or use Dask on the generated zarr paths. |

A follow-up proposal could detail an API for passing compression (and possibly chunking) options to the Zarr backend through `fill_store` or the config.
diff --git a/openfisca_survey_manager/calibration.py b/openfisca_survey_manager/calibration.py
deleted file mode 100644
index 1670538c..00000000
--- a/openfisca_survey_manager/calibration.py
+++ /dev/null
@@ -1,8 +0,0 @@
"""Re-export for backward compatibility.

Prefer: from openfisca_survey_manager.processing.weights import Calibration.
"""

from openfisca_survey_manager.processing.weights import Calibration

__all__ = ["Calibration"]
diff --git a/openfisca_survey_manager/calmar.py b/openfisca_survey_manager/calmar.py
deleted file mode 100644
index 77c2ba53..00000000
--- a/openfisca_survey_manager/calmar.py
+++ /dev/null
@@ -1,5 +0,0 @@
"""Re-export for backward compatibility. Prefer: from openfisca_survey_manager.processing.weights import calmar."""

from openfisca_survey_manager.processing.weights import calmar, check_calmar

__all__ = ["calmar", "check_calmar"]
diff --git a/openfisca_survey_manager/common/__init__.py b/openfisca_survey_manager/common/__init__.py
index 5ca887c8..cc8c0061 100644
--- a/openfisca_survey_manager/common/__init__.py
+++ b/openfisca_survey_manager/common/__init__.py
@@ -1,13 +1,9 @@
-# Target: shared helpers to avoid circular imports (from utils.py, paths, etc.).
-# Final name will be utils/ once utils.py is migrated. See docs/REFACTORING_PLAN.md.
-
-from openfisca_survey_manager.common.misc import (
+from openfisca_survey_manager.policy.legislation_asof import (  # Backward-compat imports
     asof,
     do_nothing,
     inflate_parameter_leaf,
     inflate_parameters,
     parameters_asof,
-    stata_files_to_data_frames,
     variables_asof,
 )

@@ -17,6 +13,5 @@
     "inflate_parameter_leaf",
     "inflate_parameters",
     "parameters_asof",
-    "stata_files_to_data_frames",
     "variables_asof",
 ]
diff --git a/openfisca_survey_manager/common/misc.py b/openfisca_survey_manager/common/misc.py
index 69ce1400..fe95bf70 100644
--- a/openfisca_survey_manager/common/misc.py
+++ b/openfisca_survey_manager/common/misc.py
@@ -1,251 +1,24 @@
-"""Shared helpers (no survey collection dependency) to avoid circular imports."""
-
-import logging
-from pathlib import Path
-
-import pandas as pd
-from openfisca_core import periods
-from openfisca_core.parameters import ParameterNode, Scale
-
-log = logging.getLogger(__name__)
-
-
-def do_nothing(*args, **kwargs):
-    return None
-
-
-def inflate_parameters(
-    parameters,
-    inflator,
-    base_year,
-    last_year=None,
-    ignore_missing_units=False,
-    start_instant=None,
-    round_ndigits=2,
-):
-    """
-    Inflate a Parameter node or a Parameter leaf for the years between base_year and last_year.
-
-    ::parameters:: a Parameter node or a Parameter leaf
-    ::inflator:: rate used to inflate the parameter. The rate is unique for all the years
-    ::base_year:: base year of the parameter
-    ::last_year:: last year of inflation
-    ::ignore_missing_units:: if True, a parameter leaf without unit in metadata will not be inflated
-    ::start_instant:: Instant of the year when the update should start, if None will be January 1st
-    ::round_ndigits:: Number of digits to keep in the rounded result
-    """
-    if (last_year is not None) and (last_year > base_year + 1):
-        for year in range(base_year + 1, last_year + 1):
-            inflate_parameters(
-                parameters,
-                inflator,
-                year - 1,
-                last_year=year,
-                ignore_missing_units=ignore_missing_units,
-                start_instant=start_instant,
-                round_ndigits=round_ndigits,
-            )
-    else:
-        if last_year is None:
-            last_year = base_year + 1
-
-        assert last_year == base_year + 1
-
-        if isinstance(parameters, ParameterNode):
-            for sub_parameter in parameters.children.values():
-                inflate_parameters(
-                    sub_parameter,
-                    inflator,
-                    base_year,
-                    last_year,
-                    ignore_missing_units=ignore_missing_units,
-                    start_instant=start_instant,
-                    round_ndigits=round_ndigits,
-                )
-        else:
-            acceptable_units = [
-                "rate_unit",
-                "threshold_unit",
-                "unit",
-            ]
-            if ignore_missing_units:
-                if not hasattr(parameters, "metadata"):
-                    return
-                if not bool(set(parameters.metadata.keys()) & set(acceptable_units)):
-                    return
-            assert hasattr(parameters, "metadata"), f"{parameters.name} doesn't have metadata"
-            unit_types = set(parameters.metadata.keys()).intersection(set(acceptable_units))
-            assert unit_types, (
-                f"No admissible unit in metadata for parameter {parameters.name}. You may consider using "
-                "the option 'ignore_missing_units' from the inflate_parameters() function."
-            )
-            if len(unit_types) > 1:
-                assert unit_types == {"threshold_unit", "rate_unit"}, (
-                    f"Too much admissible units in metadata for parameter {parameters.name}"
-                )
-            unit_by_type = {unit_type: parameters.metadata[unit_type] for unit_type in unit_types}
-            for unit_type in unit_by_type:
-                if parameters.metadata[unit_type].startswith("currency"):
-                    inflate_parameter_leaf(
-                        parameters,
-                        base_year,
-                        inflator,
-                        unit_type=unit_type,
-                        start_instant=start_instant,
-                        round_ndigits=round_ndigits,
-                    )
-
-
-def inflate_parameter_leaf(sub_parameter, base_year, inflator, unit_type="unit", start_instant=None, round_ndigits=2):
-    """
-    Inflate a Parameter leaf according to unit type for the year after base_year.
-
-    ::sub_parameter:: a Parameter leaf
-    ::base_year:: base year of the parameter
-    ::inflator:: rate used to inflate the parameter
-    ::unit_type:: unit supposed by default. Other admissible unit types are threshold_unit and rate_unit
-    ::start_instant:: Instant of the year when the update should start, if None will be January 1st
-    ::round_ndigits:: Number of digits to keep in the rounded result
-    """
-    if isinstance(sub_parameter, Scale):
-        if unit_type == "threshold_unit":
-            for bracket in sub_parameter.brackets:
-                threshold = bracket.children["threshold"]
-                inflate_parameter_leaf(
-                    threshold, base_year, inflator, start_instant=start_instant, round_ndigits=round_ndigits
-                )
-            return
-    else:
-        kept_instants_str = [
-            parameter_at_instant.instant_str
-            for parameter_at_instant in sub_parameter.values_list
-            if periods.instant(parameter_at_instant.instant_str).year <= base_year
-        ]
-        if not kept_instants_str:
-            return
-
-        last_admissible_instant_str = max(kept_instants_str)
-        sub_parameter.update(start=last_admissible_instant_str, value=sub_parameter(last_admissible_instant_str))
-        if start_instant is not None:
-            assert periods.instant(start_instant).year == (base_year + 1), (
-                "Year of start_instant should be base_year + 1"
-            )
-            value = (
-                round(sub_parameter(f"{base_year}-12-31") * (1 + inflator), round_ndigits)
-                if sub_parameter(f"{base_year}-12-31") is not None
-                else None
-            )
-            sub_parameter.update(
-                start=start_instant,
-                value=value,
-            )
-        else:
-            restricted_to_base_year_value_list = [
-                parameter_at_instant
-                for parameter_at_instant in sub_parameter.values_list
-                if periods.instant(parameter_at_instant.instant_str).year == base_year
-            ]
-            if restricted_to_base_year_value_list:
-                for parameter_at_instant in reversed(restricted_to_base_year_value_list):
-                    if parameter_at_instant.instant_str.startswith(str(base_year)):
-                        value = (
-                            round(parameter_at_instant.value * (1 + inflator), round_ndigits)
-                            if parameter_at_instant.value is not None
-                            else None
-                        )
-                        sub_parameter.update(
-                            start=parameter_at_instant.instant_str.replace(str(base_year), str(base_year + 1)),
-                            value=value,
-                        )
-            else:
-                value = (
-                    round(sub_parameter(f"{base_year}-12-31") * (1 + inflator), round_ndigits)
-                    if sub_parameter(f"{base_year}-12-31") is not None
-                    else None
-                )
-                sub_parameter.update(
-                    start=f"{base_year + 1}-01-01",
-                    value=value,
-                )
-
-
-def asof(tax_benefit_system, instant):
-    parameters = tax_benefit_system.parameters
-    parameters_asof(parameters, instant)
-    variables_asof(tax_benefit_system, instant)
-
-
-def leaf_asof(sub_parameter, instant):
-    kept_instants_str = [
-        parameter_at_instant.instant_str
-        for parameter_at_instant in sub_parameter.values_list
-        if periods.instant(parameter_at_instant.instant_str) <= instant
-    ]
-    if not kept_instants_str:
-        sub_parameter.values_list = []
-        return
-
-    last_admissible_instant_str = max(kept_instants_str)
-    sub_parameter.update(start=last_admissible_instant_str, value=sub_parameter(last_admissible_instant_str))
-
-
-def parameters_asof(parameters, instant):
-    if isinstance(instant, str):
-        instant = periods.instant(instant)
-    assert isinstance(instant, periods.Instant)
-
-    for sub_parameter in parameters.children.values():
-        if isinstance(sub_parameter, ParameterNode):
-            parameters_asof(sub_parameter, instant)
-        else:
-            if isinstance(sub_parameter, Scale):
-                for bracket in sub_parameter.brackets:
-                    threshold = bracket.children["threshold"]
-                    rate = bracket.children.get("rate")
-                    amount = bracket.children.get("amount")
-                    leaf_asof(threshold, instant)
-                    if rate:
-                        leaf_asof(rate, instant)
-                    if amount:
-                        leaf_asof(amount, instant)
-            else:
-                leaf_asof(sub_parameter, instant)
-
-
-def variables_asof(tax_benefit_system, instant, variables_list=None):
-    if isinstance(instant, str):
-        instant = periods.instant(instant)
-    assert isinstance(instant, periods.Instant)
-
-    if variables_list is None:
-        variables_list = tax_benefit_system.variables.keys()
-
-    for variable_name, variable in tax_benefit_system.variables.items():
-        if variable_name in variables_list:
-            formulas = variable.formulas
-            for instant_str in list(formulas.keys()):
-                if periods.instant(instant_str) > instant:
-                    del formulas[instant_str]
-
-            if variable.end is not None and periods.instant(variable.end) >= instant:
-                variable.end = None
-
-
-def stata_files_to_data_frames(data, period=None):
-    assert period is not None
-    period = periods.period(period)
-
-    stata_file_by_entity = data.get("stata_file_by_entity")
-    if stata_file_by_entity is None:
-        return
-
-    variables_from_stata_files = []
-    input_data_frame_by_entity_by_period = {}
-    input_data_frame_by_entity_by_period[periods.period(period)] = input_data_frame_by_entity = {}
-    for entity, file_path in stata_file_by_entity.items():
-        assert Path(file_path).exists(), f"Invalid file path: {file_path}"
-        entity_data_frame = input_data_frame_by_entity[entity] = pd.read_stata(file_path)
-        variables_from_stata_files += list(entity_data_frame.columns)
-    data["input_data_frame_by_entity_by_period"] = input_data_frame_by_entity_by_period
-
-    return variables_from_stata_files
+"""Backward-compatibility wrapper for legislation helpers.
+
+Use ``openfisca_survey_manager.policy.legislation_asof`` as the canonical import path.
+"""
+
+from openfisca_survey_manager.policy.legislation_asof import (
+    asof,
+    do_nothing,
+    inflate_parameter_leaf,
+    inflate_parameters,
+    leaf_asof,
+    parameters_asof,
+    variables_asof,
+)
+
+__all__ = [
+    "asof",
+    "do_nothing",
+    "inflate_parameter_leaf",
+    "inflate_parameters",
+    "leaf_asof",
+    "parameters_asof",
+    "variables_asof",
+]
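During the transition both import paths expose the same objects, which a quick check confirms:

```python
from openfisca_survey_manager.common.misc import parameters_asof as legacy
from openfisca_survey_manager.policy.legislation_asof import parameters_asof as canonical

assert legacy is canonical  # the wrapper only re-exports
```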
diff --git a/openfisca_survey_manager/config.py b/openfisca_survey_manager/config.py
deleted file mode 100644
index 2e09a43c..00000000
--- a/openfisca_survey_manager/config.py
+++ /dev/null
@@ -1,8 +0,0 @@
"""Re-export for backward compatibility.

Prefer: from openfisca_survey_manager.configuration.models import Config.
"""

from openfisca_survey_manager.configuration.models import Config

__all__ = ["Config"]
diff --git a/openfisca_survey_manager/configuration/__init__.py b/openfisca_survey_manager/configuration/__init__.py
index b8cab758..137908e1 100644
--- a/openfisca_survey_manager/configuration/__init__.py
+++ b/openfisca_survey_manager/configuration/__init__.py
@@ -1,6 +1,11 @@
 # Config and paths; config.py and paths.py re-export for compatibility.
-# See docs/REFACTORING_PLAN.md.
+# See docs/REFACTORING_PLAN.md. RFC-002: config_loader for config.yaml + manifest.

+from openfisca_survey_manager.configuration.config_loader import (
+    get_config_dir,
+    load_config,
+    load_manifest,
+)
 from openfisca_survey_manager.configuration.models import Config
 from openfisca_survey_manager.configuration.paths import (
     config_ini,
@@ -15,7 +20,10 @@
     "Config",
     "config_ini",
     "default_config_files_directory",
+    "get_config_dir",
     "is_in_ci",
+    "load_config",
+    "load_manifest",
     "openfisca_survey_manager_location",
     "private_run_with_data",
     "test_config_files_directory",
diff --git a/openfisca_survey_manager/configuration/config_loader.py b/openfisca_survey_manager/configuration/config_loader.py
new file mode 100644
index 00000000..44c982cf
--- /dev/null
+++ b/openfisca_survey_manager/configuration/config_loader.py
@@ -0,0 +1,97 @@
"""RFC-002: New config and manifest loading (YAML-based)."""

from __future__ import annotations

import logging
import os
from pathlib import Path
from typing import Any, Optional

import yaml
from xdg import BaseDirectory

log = logging.getLogger(__name__)

CONFIG_FILENAME = "config.yaml"
MANIFEST_FILENAME = "manifest.yaml"
ENV_CONFIG_DIR = "OPENFISCA_SURVEY_CONFIG_DIR"


def get_config_dir(explicit: Optional[Path | str] = None) -> Path:
    """Return the config directory: explicit path, or env OPENFISCA_SURVEY_CONFIG_DIR, or XDG."""
    if explicit is not None:
        return Path(explicit).expanduser().resolve()
    env_path = os.environ.get(ENV_CONFIG_DIR)
    if env_path:
        return Path(env_path).expanduser().resolve()
    return Path(BaseDirectory.save_config_path("openfisca-survey-manager"))


def load_config(config_dir: Path) -> Optional[dict[str, Any]]:
    """
    Load the new-style config from config_dir/config.yaml.

    Returns a dict with collections_dir, default_output_dir, tmp_dir (paths expanded),
    or None if config.yaml is missing or invalid.
    """
    config_path = config_dir / CONFIG_FILENAME
    if not config_path.is_file():
        return None
    try:
        with config_path.open() as f:
            data = yaml.safe_load(f)
    except Exception as e:
        log.warning("Failed to load %s: %s", config_path, e)
        return None
    if not data or not isinstance(data, dict):
        return None
    collections_dir = data.get("collections_dir")
    if not collections_dir:
        return None
    return {
        "collections_dir": Path(collections_dir).expanduser().resolve(),
        "default_output_dir": Path(data.get("default_output_dir", ".")).expanduser().resolve(),
        "tmp_dir": Path(data.get("tmp_dir", "/tmp")).expanduser().resolve(),
    }


def load_manifest(collections_dir: Path, name: str) -> Optional[dict[str, Any]]:
    """
    Load a dataset manifest from collections_dir/name/manifest.yaml.

    Returns the manifest dict (name, label, surveys) or None if missing.
    """
    manifest_path = collections_dir / name / MANIFEST_FILENAME
    if not manifest_path.is_file():
        return None
    try:
        with manifest_path.open() as f:
            data = yaml.safe_load(f)
    except Exception as e:
        log.warning("Failed to load manifest %s: %s", manifest_path, e)
        return None
    if not data or not isinstance(data, dict) or "surveys" not in data:
        return None
    return data
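# Example (hypothetical paths): resolve the config directory, then load the
# "erfs" dataset manifest described in RFC-002:
#
#     cfg = load_config(get_config_dir())
#     if cfg is not None:
#         manifest = load_manifest(cfg["collections_dir"], "erfs")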
} + """ + source = entry.get("source") or {} + fmt = source.get("format", "csv") + path = source.get("path", "") + # Survey expects e.g. csv_files, sas_files list in informations + files_key = f"{fmt}_files" + informations = {files_key: [path] if path else []} + return { + "name": survey_name, + "label": entry.get("label", survey_name), + "hdf5_file_path": None, + "parquet_file_path": None, + "zarr_file_path": None, + "tables": entry.get("tables"), + "informations": informations, + } diff --git a/openfisca_survey_manager/configuration/models.py b/openfisca_survey_manager/configuration/models.py index 571a6ac8..7a52d399 100644 --- a/openfisca_survey_manager/configuration/models.py +++ b/openfisca_survey_manager/configuration/models.py @@ -1,25 +1,36 @@ """Configuration model (Config class from config.ini).""" +from __future__ import annotations + import configparser +import logging from pathlib import Path +from typing import Optional, Union + +log = logging.getLogger(__name__) class Config(configparser.ConfigParser): """Parser for config.ini; used by SurveyCollection and build scripts.""" - config_ini = None + config_ini: Optional[Path] = None - def __init__(self, config_files_directory=None): + def __init__( + self, + config_files_directory: Optional[Union[Path, str]] = None, + ) -> None: configparser.ConfigParser.__init__(self) if config_files_directory is not None: config_ini = Path(config_files_directory) / "config.ini" assert config_ini.exists(), f"{config_ini} is not a valid path" self.config_ini = config_ini self.read([config_ini]) + log.debug("Loaded config from %s", config_ini) - def save(self): + def save(self) -> None: assert self.config_ini, "configuration file path is not defined" assert self.config_ini.exists() config_file = self.config_ini.open("w") self.write(config_file) config_file.close() + log.debug("Saved config to %s", self.config_ini) diff --git a/openfisca_survey_manager/core/__init__.py b/openfisca_survey_manager/core/__init__.py index c7be67cc..4ea97a4d 100644 --- a/openfisca_survey_manager/core/__init__.py +++ b/openfisca_survey_manager/core/__init__.py @@ -1,2 +1,7 @@ -# Target: Survey (surveys.py), SurveyCollection, dataset orchestration. -# See docs/REFACTORING_PLAN.md for migration steps. +# Survey, Table, SurveyCollection, load_table. 
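+# Example (illustrative): from openfisca_survey_manager.core import Survey, SurveyCollection, load_table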
+ +from openfisca_survey_manager.core.dataset import SurveyCollection, load_table +from openfisca_survey_manager.core.survey import NoMoreDataError, Survey +from openfisca_survey_manager.core.table import Table + +__all__ = ["NoMoreDataError", "Survey", "SurveyCollection", "Table", "load_table"] diff --git a/openfisca_survey_manager/core/dataset.py b/openfisca_survey_manager/core/dataset.py new file mode 100644 index 00000000..4eb008fe --- /dev/null +++ b/openfisca_survey_manager/core/dataset.py @@ -0,0 +1,248 @@ +"""SurveyCollection: collection of surveys (dataset orchestration).""" + +from __future__ import annotations + +import codecs +import collections +import configparser +import json +import logging +import warnings +from pathlib import Path +from typing import List, Optional, Union + +import pandas as pd + +from openfisca_survey_manager.configuration.config_loader import ( + load_config, + load_manifest, + manifest_survey_to_json, +) +from openfisca_survey_manager.configuration.models import Config +from openfisca_survey_manager.configuration.paths import default_config_files_directory +from openfisca_survey_manager.core.survey import Survey +from openfisca_survey_manager.exceptions import SurveyConfigError + +log = logging.getLogger(__name__) + + +class SurveyCollection: + """A collection of Surveys.""" + + name: Optional[str] = None + label: Optional[str] = None + json_file_path: Optional[str] = None + surveys: List[Survey] # set in __init__ + config: Optional[Config] = None + output_directory: Optional[str] = None # RFC-002: used when config is None (manifest-based) + + def __init__( + self, + config_files_directory: Optional[Union[Path, str]] = default_config_files_directory, + label: Optional[str] = None, + name: Optional[str] = None, + json_file_path: Optional[str] = None, + ) -> None: + self.name = name + self.label = label + self.json_file_path = json_file_path + self.surveys = [] + log.debug(f"Initializing SurveyCollection from config file found in {config_files_directory} ..") + config = Config(config_files_directory=config_files_directory) + if label is not None: + self.label = label + if name is not None: + self.name = name + if json_file_path is not None: + self.json_file_path = json_file_path + if "collections" not in config.sections(): + config["collections"] = {} + config.set("collections", self.name, str(self.json_file_path)) + config.save() + elif config is not None: + if config.has_option("collections", self.name): + self.json_file_path = config.get("collections", self.name) + elif config.get("collections", "collections_directory") is not None: + self.json_file_path = str(Path(config.get("collections", "collections_directory")) / (name + ".json")) + + self.config = config + + def __repr__(self) -> str: + header = f"""{self.name} +Survey collection of {self.label} +Contains the following surveys : +""" + surveys = [f" {survey.name} : {survey.label} \n" for survey in self.surveys] + return header + "".join(surveys) + + def dump( + self, + config_files_directory: Optional[Union[Path, str]] = None, + json_file_path: Optional[str] = None, + ) -> None: + if json_file_path is not None: + self.json_file_path = json_file_path + + if self.config is None: + # RFC-002: manifest-based collection; no config.ini to update + return + + config = self.config + if self.json_file_path is None: + assert self.json_file_path is not None, "A json_file_path should be provided" + + config.set("collections", self.name, str(self.json_file_path)) + config.save() + with 
codecs.open(str(self.json_file_path), "w", encoding="utf-8") as _file: + json.dump(self.to_json(), _file, ensure_ascii=False, indent=2) + + def fill_store( + self, + source_format: Optional[str] = None, + surveys: Optional[List[Survey]] = None, + tables: Optional[List[str]] = None, + overwrite: bool = False, + keep_original_parquet_file: bool = False, + encoding: Optional[str] = None, + store_format: str = "hdf5", + categorical_strategy: str = "unique_labels", + ) -> None: + if surveys is None: + surveys = self.surveys + for survey in surveys: + survey.fill_store( + source_format=source_format, + tables=tables, + overwrite=overwrite, + keep_original_parquet_file=keep_original_parquet_file, + encoding=encoding, + store_format=store_format, + categorical_strategy=categorical_strategy, + ) + self.dump() + + def get_survey(self, survey_name: str) -> Survey: + available_surveys_names = [survey.name for survey in self.surveys] + assert survey_name in available_surveys_names, ( + f"Survey {survey_name} cannot be found for survey collection {self.name}.\n" + f"Available surveys are :{available_surveys_names}" + ) + return [survey for survey in self.surveys if survey.name == survey_name].pop() + + @classmethod + def load( + cls, + json_file_path: Optional[str] = None, + collection: Optional[str] = None, + config_files_directory: Optional[Union[Path, str]] = default_config_files_directory, + ) -> SurveyCollection: + config_dir = Path(config_files_directory).expanduser().resolve() + assert config_dir.exists(), f"Config directory does not exist: {config_dir}" + + # RFC-002: try new config.yaml + manifest first + new_cfg = load_config(config_dir) + if json_file_path is None and collection is not None and new_cfg is not None: + manifest = load_manifest(new_cfg["collections_dir"], collection) + if manifest is not None: + self = cls.__new__(cls) + self.name = manifest.get("name", collection) + self.label = manifest.get("label", self.name) + self.json_file_path = str(new_cfg["collections_dir"] / collection / "manifest.yaml") + self.config = None + self.output_directory = str(new_cfg["default_output_dir"]) + self.surveys = [] + store_format = manifest.get("store_format", "parquet") + output_dir = Path(self.output_directory) + for survey_name, entry in manifest.get("surveys", {}).items(): + survey_json = manifest_survey_to_json(survey_name, entry) + survey = Survey(name=survey_name) + survey = survey.create_from_json(survey_json) + survey.survey_collection = self + survey.store_format = store_format + if store_format == "hdf5": + survey.hdf5_file_path = str(output_dir / (survey.name + ".h5")) + elif store_format == "parquet": + survey.parquet_file_path = str(output_dir / survey.name) + elif store_format == "zarr": + survey.zarr_file_path = str(output_dir / (survey.name + ".zarr")) + self.surveys.append(survey) + return self + + # Legacy: config.ini + JSON + warnings.warn( + "Loading collections from config.ini and JSON files is deprecated. 
" + "Migrate to config.yaml and manifest.yaml using: " + "python -m openfisca_survey_manager.scripts.migrate_config_to_rfc002 --config-dir " + "See docs/RFC-002-METADATA-AND-CONFIG.md.", + DeprecationWarning, + stacklevel=2, + ) + config = Config(config_files_directory=config_files_directory) + if json_file_path is None: + assert collection is not None, "A collection is needed" + try: + json_file_path = config.get("collections", collection) + except (configparser.NoOptionError, configparser.NoSectionError) as error: + msg = f"Looking for config file in {config_files_directory}" + log.debug(msg) + log.error(error) + raise error + except Exception as error: + msg = f"Looking for config file in {config_files_directory}" + log.debug(msg) + log.error(error) + raise SurveyConfigError(msg) from error + + with Path(json_file_path).open("r") as _file: + self_json = json.load(_file) + name = self_json["name"] + + self = cls(config_files_directory=config_files_directory, name=name) + self.config = config + with Path(json_file_path).open("r") as _file: + self_json = json.load(_file) + self.json_file_path = json_file_path + self.label = self_json.get("label") + self.name = self_json.get("name") + + surveys = self_json["surveys"] + for survey_name, survey_json in surveys.items(): + survey = Survey(name=survey_name) + self.surveys.append(survey.create_from_json(survey_json)) + return self + + def to_json(self) -> dict: + self_json = collections.OrderedDict(()) + self_json["name"] = self.name + self_json["surveys"] = collections.OrderedDict(()) + for survey in self.surveys: + self_json["surveys"][survey.name] = survey.to_json() + return self_json + + +def load_table( + config_files_directory, + variables: Optional[list] = None, + collection: Optional[str] = None, + survey: Optional[str] = None, + input_data_survey_prefix: Optional[str] = None, + data_year=None, + table: Optional[str] = None, + batch_size=None, + batch_index=0, + filter_by=None, +) -> pd.DataFrame: + """Load table from a survey in a collection.""" + survey_collection = SurveyCollection.load(collection=collection, config_files_directory=config_files_directory) + survey_name = survey if survey is not None else f"{input_data_survey_prefix}_{data_year}" + survey_ = survey_collection.get_survey(survey_name) + log.debug("Loading table %s in survey %s from collection %s", table, survey_name, collection) + if batch_size: + return survey_.get_values( + table=table, + variables=variables, + batch_size=batch_size, + batch_index=batch_index, + filter_by=filter_by, + ) + return survey_.get_values(table=table, variables=variables, filter_by=filter_by) diff --git a/openfisca_survey_manager/core/survey.py b/openfisca_survey_manager/core/survey.py new file mode 100644 index 00000000..05375ad9 --- /dev/null +++ b/openfisca_survey_manager/core/survey.py @@ -0,0 +1,376 @@ +"""Survey: describes survey data and tables.""" + +from __future__ import annotations + +import collections +import logging +import re +from pathlib import Path +from typing import TYPE_CHECKING, Any, List, Optional, Union + +import pandas +import pyarrow as pa +import pyarrow.parquet as pq +import yaml + +from openfisca_survey_manager.core.table import Table +from openfisca_survey_manager.exceptions import SurveyIOError, SurveyManagerError +from openfisca_survey_manager.io.backends import get_backend +from openfisca_survey_manager.io.hdf import hdf5_safe_key +from openfisca_survey_manager.processing.harmonization import harmonize_data_frame_columns + +if TYPE_CHECKING: + from 
openfisca_survey_manager.core.dataset import SurveyCollection
+
+log = logging.getLogger(__name__)
+
+source_format_by_extension = {
+    "csv": "csv",
+    "sas7bdat": "sas",
+    "dta": "stata",
+    "Rdata": "Rdata",
+    "spss": "sav",
+    "parquet": "parquet",
+}
+
+admissible_source_formats = list(source_format_by_extension.values())
+
+
+class NoMoreDataError(Exception):
+    """Raised when the user asks for more data than is available in the file."""
+
+    pass
+
+
+class Survey:
+    """An object to describe survey data."""
+
+    hdf5_file_path: Optional[str] = None
+    parquet_file_path: Optional[str] = None
+    zarr_file_path: Optional[str] = None
+    label: Optional[str] = None
+    name: Optional[str] = None
+    survey_collection: Optional[SurveyCollection] = None
+    store_format: Optional[str] = None
+
+    def __init__(
+        self,
+        name: Optional[str] = None,
+        label: Optional[str] = None,
+        hdf5_file_path: Optional[str] = None,
+        parquet_file_path: Optional[str] = None,
+        zarr_file_path: Optional[str] = None,
+        survey_collection: Optional[SurveyCollection] = None,
+        **kwargs: Any,
+    ) -> None:
+        assert name is not None, "A survey should have a name"
+        self.name = name
+        self.tables = collections.OrderedDict()
+        self.informations = {}
+        self.tables_index = {}
+
+        if label is not None:
+            self.label = label
+
+        if hdf5_file_path is not None:
+            self.hdf5_file_path = hdf5_file_path
+
+        if parquet_file_path is not None:
+            self.parquet_file_path = parquet_file_path
+
+        # Accept zarr_file_path explicitly so that create_from_json does not leak it into informations via kwargs
+        if zarr_file_path is not None:
+            self.zarr_file_path = zarr_file_path
+
+        if survey_collection is not None:
+            self.survey_collection = survey_collection
+
+        self.informations = kwargs
+
+    def __repr__(self) -> str:
+        header = f"""{self.name} : survey data {self.label}
+Contains the following tables : \n"""
+        tables = yaml.safe_dump(list(self.tables.keys()), default_flow_style=False)
+        informations = yaml.safe_dump(self.informations, default_flow_style=False)
+        return header + tables + informations
+
+    @classmethod
+    def create_from_json(cls, survey_json: dict) -> Survey:
+        # Top-level store paths; exclude from informations to avoid duplicate kwargs
+        store_path_keys = {"hdf5_file_path", "parquet_file_path", "zarr_file_path"}
+        infos = {k: v for k, v in survey_json.get("informations", {}).items() if k not in store_path_keys}
+        self = cls(
+            name=survey_json.get("name"),
+            label=survey_json.get("label"),
+            hdf5_file_path=survey_json.get("hdf5_file_path"),
+            parquet_file_path=survey_json.get("parquet_file_path"),
+            zarr_file_path=survey_json.get("zarr_file_path"),
+            **infos,
+        )
+        self.tables = survey_json.get("tables")
+        return self
+
+    def dump(self) -> None:
+        assert self.survey_collection is not None
+        self.survey_collection.dump()
+
+    def fill_store(
+        self,
+        source_format: Optional[str] = None,
+        tables: Optional[List[str]] = None,
+        overwrite: Union[bool, List[str]] = True,
+        keep_original_parquet_file: bool = False,
+        encoding: Optional[str] = None,
+        store_format: str = "hdf5",
+        categorical_strategy: str = "unique_labels",
+    ) -> None:
+        assert self.survey_collection is not None
+        assert isinstance(overwrite, (bool, list))
+        survey = self
+        sc = survey.survey_collection
+        if sc.config is not None:
+            directory_path = sc.config.get("data", "output_directory")
+        else:
+            directory_path = getattr(sc, "output_directory", None)
+            assert directory_path is not None, "SurveyCollection has no config and no output_directory; cannot fill_store"
+        if not Path(directory_path).is_dir():
+            log.warning(
+                f"{directory_path}, which should be the store data directory, does not exist: creating it"
+            )
+            Path(directory_path).mkdir(parents=True)
+
+        if source_format == "parquet":
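+            # Parquet sources are registered in place (columns read via read_parquet_columns below), so the requested store format is overridden.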
+            store_format = "parquet"
+
+        if store_format == "hdf5" and survey.hdf5_file_path is None:
+            survey.hdf5_file_path = str(Path(directory_path) / (survey.name + ".h5"))
+
+        if store_format == "parquet" and survey.parquet_file_path is None:
+            survey.parquet_file_path = str(Path(directory_path) / survey.name)
+
+        if store_format == "zarr" and survey.zarr_file_path is None:
+            survey.zarr_file_path = str(Path(directory_path) / (survey.name + ".zarr"))
+
+        self.store_format = store_format
+
+        if source_format is not None:
+            assert source_format in admissible_source_formats, f"Data source format {source_format} is unknown"
+            source_formats = [source_format]
+        else:
+            source_formats = admissible_source_formats
+
+        for source_format in source_formats:
+            files = f"{source_format}_files"
+            for data_file in survey.informations.get(files, []):
+                name = Path(data_file).stem
+                extension = Path(data_file).suffix
+                if tables is None or name in tables:
+                    if keep_original_parquet_file:
+                        if re.match(r".*-\d$", name):
+                            name = name.split("-")[0]
+                            parquet_file = str(Path(data_file).parent)
+                            survey.parquet_file_path = str(Path(data_file).parent.parent)
+                        else:
+                            parquet_file = data_file
+                            survey.parquet_file_path = str(Path(data_file).parent)
+                        table = Table(
+                            label=name,
+                            name=name,
+                            source_format=source_format_by_extension[extension[1:]],
+                            survey=survey,
+                            parquet_file=parquet_file,
+                        )
+                        table.read_parquet_columns(data_file)
+
+                    else:
+                        table = Table(
+                            label=name,
+                            name=name,
+                            source_format=source_format_by_extension[extension[1:]],
+                            survey=survey,
+                        )
+                        table.fill_store(
+                            data_file,
+                            clean=True,
+                            overwrite=overwrite if isinstance(overwrite, bool) else table.name in overwrite,
+                            encoding=encoding,
+                            categorical_strategy=categorical_strategy,
+                        )
+        self.dump()
+
+    def get_value(
+        self,
+        variable: str,
+        table: Optional[str] = None,
+        lowercase: bool = False,
+        ignorecase: bool = False,
+    ) -> pandas.DataFrame:
+        return self.get_values([variable], table, lowercase=lowercase, ignorecase=ignorecase)
+
+    def _get_values_from_hdf5(self, table: str, ignorecase: bool = False) -> tuple[pandas.DataFrame, str]:
+        """Read table from HDF5 store. Returns (df, resolved_table_name)."""
+        assert Path(self.hdf5_file_path).exists(), (
+            f"{self.hdf5_file_path} is not a valid path. This could happen because "
+            "your data have not been built yet. Please consider using a rebuild option in your code."
+        )
+        store = pandas.HDFStore(self.hdf5_file_path, "r")
+        try:
+            # Use same key normalization as at write time (PyTables NaturalNameWarning)
+            hdf5_key = hdf5_safe_key(table)
+            if ignorecase:
+                keys = store.keys()
+                eligible_tables = [k for k in keys if hdf5_safe_key(k.lstrip("/")).lower() == hdf5_key.lower()]
+                if len(eligible_tables) > 1:
+                    raise SurveyManagerError(
+                        f"{table} is ambiguous since the following tables are available: {eligible_tables}"
+                    )
+                if len(eligible_tables) == 0:
+                    raise SurveyIOError(f"No eligible table found in {keys}")
+                hdf5_key = eligible_tables[0].lstrip("/")
+            try:
+                df = store.select(hdf5_key)
+            except KeyError:
+                # Backward compat: try raw table name (old files may have keys with hyphens)
+                df = store.select(table)
+            return df, table
+        except KeyError:
+            log.error("No table %s in the file %s", table, self.hdf5_file_path)
+            log.error(
+                "This could happen because your data have not been built yet. 
Available tables are: %s", + store.keys(), + ) + raise + finally: + store.close() + + def _get_values_from_parquet( + self, + table: str, + variables: Optional[List[str]], + filter_by: Optional[List[tuple]], + batch_size: Optional[int], + batch_index: int, + ) -> pandas.DataFrame: + """Read table from parquet. Resolves variables from table content if None.""" + if table is None: + raise SurveyIOError("A table name is needed to retrieve data from a parquet file") + for table_name, table_content in self.tables.items(): + if table != table_name: + continue + parquet_file = table_content.get("parquet_file") + if Path(parquet_file).is_dir(): + for file in Path(parquet_file).iterdir(): + if file.suffix == ".parquet": + one_parquet_file = str(Path(parquet_file) / file) + break + else: + raise SurveyIOError(f"No parquet file found in {parquet_file}") + else: + one_parquet_file = parquet_file + parquet_schema = pq.read_schema(one_parquet_file) + assert len(parquet_schema.names) >= 1, f"The parquet file {table_content.get('parquet_file')} is empty" + if variables is None: + variables = table_content.get("variables") + if filter_by: + return pq.ParquetDataset(parquet_file, filters=filter_by).read(columns=variables).to_pandas() + if batch_size is not None: + paths = ( + [str(p) for p in Path(parquet_file).glob("*.parquet")] + if Path(parquet_file).is_dir() + else [parquet_file] + ) + tables_list = [pq.read_table(fp, columns=variables) for fp in paths] + final_table = pa.concat_tables(tables_list) if len(tables_list) > 1 else tables_list[0] + record_batches = final_table.to_batches(max_chunksize=batch_size) + if len(record_batches) <= batch_index: + raise NoMoreDataError( + f"Batch {batch_index} not found in {table_name}. Max index is {len(record_batches)}" + ) + return record_batches[batch_index].to_pandas() + return pq.ParquetDataset(parquet_file).read(columns=variables).to_pandas() + raise SurveyIOError(f"No table {table} found in {self.parquet_file_path}") + + def _get_values_from_zarr( + self, + table: str, + variables: Optional[List[str]] = None, + **kwargs: Any, + ) -> pandas.DataFrame: + """Read table from zarr store.""" + if self.zarr_file_path is None: + raise SurveyIOError("No zarr store path for survey") + backend = get_backend("zarr") + return backend.read_table( + self.zarr_file_path, + table, + variables=variables, + **kwargs, + ) + + def get_values( + self, + variables: Optional[List[str]] = None, + table: Optional[str] = None, + lowercase: bool = False, + ignorecase: bool = False, + rename_ident: bool = True, + batch_size: Optional[int] = None, + batch_index: int = 0, + filter_by: Optional[List[tuple]] = None, + ) -> pandas.DataFrame: + if self.parquet_file_path is None and self.hdf5_file_path is None and self.zarr_file_path is None: + raise SurveyIOError(f"No data file found for survey {self.name}") + if self.store_format == "zarr" and self.zarr_file_path is not None: + df = self._get_values_from_zarr(table or "", variables=variables) + elif self.hdf5_file_path is not None: + df, _ = self._get_values_from_hdf5(table or "", ignorecase=ignorecase) + else: + df = self._get_values_from_parquet(table, variables, filter_by, batch_size, batch_index) + harmonize_data_frame_columns(df, lowercase=lowercase, rename_ident=rename_ident) + if variables is None: + return df + diff = set(variables) - set(df.columns) + if diff: + raise SurveyIOError(f"The following variable(s) {diff} are missing") + variables = list(set(variables).intersection(df.columns)) + return df[variables] + + def insert_table( 
+ self, + label: Optional[str] = None, + name: Optional[str] = None, + **kwargs: Any, + ) -> None: + parquet_file = kwargs.pop("parquet_file", None) + data_frame = kwargs.pop("data_frame", None) + if data_frame is None: + data_frame = kwargs.pop("dataframe", None) + + if data_frame is not None: + assert isinstance(data_frame, pandas.DataFrame) + variables = kwargs.pop("variables", None) + if variables is not None: + assert set(variables) < set(data_frame.columns) + else: + variables = list(data_frame.columns) + if label is None: + label = name + table = Table(label=label, name=name, survey=self, variables=variables, parquet_file=parquet_file) + assert (table.survey.hdf5_file_path is not None) or (table.survey.parquet_file_path is not None) + if parquet_file is not None: + log.debug(f"Saving table {name} in {table.survey.parquet_file_path}") + data_frame.to_parquet(parquet_file) + else: + log.debug(f"Saving table {name} in {table.survey.hdf5_file_path}") + to_hdf_kwargs = kwargs.pop("to_hdf_kwargs", {}) + table.save_data_frame_to_hdf5(data_frame, **to_hdf_kwargs) + + if name not in self.tables: + self.tables[name] = {} + for key, val in kwargs.items(): + self.tables[name][key] = val + + def to_json(self) -> dict: + self_json = collections.OrderedDict(()) + self_json["hdf5_file_path"] = str(self.hdf5_file_path) if self.hdf5_file_path else None + self_json["parquet_file_path"] = str(self.parquet_file_path) if self.parquet_file_path else None + self_json["label"] = self.label + self_json["name"] = self.name + self_json["tables"] = self.tables + self_json["informations"] = collections.OrderedDict(sorted(self.informations.items())) + return self_json diff --git a/openfisca_survey_manager/core/table.py b/openfisca_survey_manager/core/table.py new file mode 100644 index 00000000..4fe3fa76 --- /dev/null +++ b/openfisca_survey_manager/core/table.py @@ -0,0 +1,341 @@ +"""Table: a table of a survey (core I/O and storage).""" + +from __future__ import annotations + +import collections +import csv +import datetime +import errno +import gc +import logging +import os +from pathlib import Path +from typing import TYPE_CHECKING, Any, Optional + +import pandas +from chardet.universaldetector import UniversalDetector +from pyarrow import parquet as pq + +from openfisca_survey_manager.exceptions import SurveyIOError +from openfisca_survey_manager.io.backends import get_backend +from openfisca_survey_manager.io.readers import read_sas +from openfisca_survey_manager.io.writers import write_table_to_hdf5, write_table_to_parquet +from openfisca_survey_manager.processing.cleaning import clean_data_frame + +try: + from openfisca_survey_manager.io.readers import read_spss +except ImportError: + read_spss = None # optional dependency (savReaderWriter) + +if TYPE_CHECKING: + from openfisca_survey_manager.core.survey import Survey + +log = logging.getLogger(__name__) + +reader_by_source_format = { + "csv": pandas.read_csv, + "sas": read_sas, + "spss": read_spss, + "stata": pandas.read_stata, + "parquet": pandas.read_parquet, +} + + +class Table: + """A table of a survey.""" + + label: Optional[str] = None + name: Optional[str] = None + source_format: Optional[str] = None + survey: Optional[Survey] = None + variables: Optional[list[str]] = None + parquet_file: Optional[str] = None + + def __init__( + self, + survey: Optional[Survey] = None, + name: Optional[str] = None, + label: Optional[str] = None, + source_format: Optional[str] = None, + variables: Optional[list[str]] = None, + parquet_file: Optional[str] = None, + 
**kwargs: Any, + ) -> None: + assert name is not None, "A table should have a name" + self.name = name + self.label = label + self.source_format = source_format + self.variables = variables + self.parquet_file = parquet_file + self.informations = kwargs + + from openfisca_survey_manager.core.survey import Survey + + assert isinstance(survey, Survey), f"survey is of type {type(survey)} and not {Survey}" + self.survey = survey + if not survey.tables: + survey.tables = collections.OrderedDict() + + survey.tables[name] = collections.OrderedDict( + source_format=source_format, + variables=variables, + parquet_file=parquet_file, + ) + + def _check_and_log(self, data_file_path: str, store_file_path: Optional[str]) -> None: + assert store_file_path is not None, "Store file path cannot be None" + if not Path(data_file_path).is_file(): + raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), data_file_path) + + log.info( + f"Inserting table {self.name} from file {data_file_path} in store file {store_file_path} " + f"at point {self.name}" + ) + + def _get_store_path_and_format(self) -> Optional[tuple[str, str]]: + """Return (store_path, store_format) for the survey's current backend, or None.""" + fmt = getattr(self.survey, "store_format", None) or "hdf5" + if fmt == "hdf5" and self.survey.hdf5_file_path is not None: + return (self.survey.hdf5_file_path, "hdf5") + if fmt == "parquet" and self.survey.parquet_file_path is not None: + return (self.survey.parquet_file_path, "parquet") + if fmt == "zarr" and getattr(self.survey, "zarr_file_path", None) is not None: + return (self.survey.zarr_file_path, "zarr") + if self.survey.hdf5_file_path is not None: + return (self.survey.hdf5_file_path, "hdf5") + if self.survey.parquet_file_path is not None: + return (self.survey.parquet_file_path, "parquet") + return None + + def _is_stored(self) -> bool: + path_fmt = self._get_store_path_and_format() + if path_fmt is None: + return False + store_path, store_format = path_fmt + backend = get_backend(store_format) + if backend.table_exists(store_path, self.name): + log.info( + "Exiting without overwriting %s in %s (%s)", + self.name, + store_path, + store_format, + ) + return True + return False + + def _save( + self, + data_frame: Optional[pandas.DataFrame] = None, + store_format: str = "hdf5", + ) -> None: + assert data_frame is not None + variables = self.variables + + if variables: + stored_variables = list(set(variables).intersection(set(data_frame.columns))) + log.info("The following variables are stored: %s", stored_variables) + if set(stored_variables) != set(variables): + log.info( + "variables wanted by the user that were not available: " + f"{list(set(variables) - set(stored_variables))}" + ) + data_frame = data_frame[stored_variables].copy() + + path_fmt = self._get_store_path_and_format() + if path_fmt is None: + raise SurveyIOError( + f"No store path set for survey (store_format={store_format}). " + "Set hdf5_file_path, parquet_file_path, or zarr_file_path." + ) + store_path, resolved_format = path_fmt + if store_format != resolved_format: + store_format = resolved_format + backend = get_backend(store_format) + if store_format == "hdf5": + log.warning( + "HDF5 will no longer be the default format in a future version. " + "Please use parquet or zarr format instead." 
+            )
+        log.info("Inserting table %s in %s store %s", self.name, store_format, store_path)
+        backend.write_table(store_path, self.name, data_frame)
+        self.variables = list(data_frame.columns)
+        self.survey.tables[self.name]["variables"] = self.variables
+        if store_format == "parquet":
+            self.parquet_file = str(Path(store_path) / f"{self.name}.parquet")
+            self.survey.tables[self.name]["parquet_file"] = self.parquet_file
+        gc.collect()
+
+    def fill_store(
+        self,
+        data_file: str,
+        overwrite: bool = False,
+        clean: bool = False,
+        **kwargs: Any,
+    ) -> None:
+        if not overwrite and self._is_stored():
+            return
+
+        start_table_time = datetime.datetime.now()
+        if self.source_format in ["sas", "parquet"] and "encoding" in kwargs:
+            del kwargs["encoding"]
+        data_frame = self.read_source(data_file, **kwargs)
+        try:
+            if clean:
+                clean_data_frame(data_frame)
+            self._save(data_frame=data_frame, store_format=self.survey.store_format)
+            log.info(f"File {data_file} has been processed in {datetime.datetime.now() - start_table_time}")
+        except Exception as e:
+            log.error(f"Could not process file {data_file} because of the following error:\n {e}")
+            raise
+
+    def read_parquet_columns(self, parquet_file: Optional[str] = None) -> list[str]:
+        if parquet_file is None:
+            parquet_file = self.parquet_file
+        log.info(f"Initializing table {self.name} from parquet file {parquet_file}")
+        self.source_format = "parquet"
+        parquet_schema = pq.read_schema(parquet_file)
+        self.variables = parquet_schema.names
+        self.survey.tables[self.name]["variables"] = self.variables
+        return self.variables
+
+    def _read_csv_with_inferred_encoding(
+        self, data_file: str, reader: Any, kwargs: dict[str, Any]
+    ) -> pandas.DataFrame:
+        """Read CSV, inferring encoding and dialect if the default read fails."""
+        log.debug("Failed to read %s with default settings; inferring encoding and dialect/separator", data_file)
+        detector = UniversalDetector()
+        with Path(data_file).open("rb") as csvfile:
+            for line in csvfile:
+                detector.feed(line)
+                if detector.done:
+                    break
+        detector.close()
+        encoding = detector.result["encoding"]
+        confidence = detector.result["confidence"]
+        delimiter = ";"  # fallback if dialect sniffing fails
+        try:
+            with Path(data_file).open("r", newline="", encoding=encoding) as csvfile:
+                dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,")
+        except Exception:
+            dialect = None
+        log.debug(
+            "dialect.delimiter = %s, encoding = %s, confidence = %s",
+            dialect.delimiter if dialect is not None else delimiter,
+            encoding,
+            confidence,
+        )
+        kwargs = {**kwargs, "engine": "python", "encoding": encoding}
+        if dialect:
+            kwargs["dialect"] = dialect
+        else:
+            kwargs["delimiter"] = delimiter
+        return reader(data_file, **kwargs)
+
+    def _apply_stata_categorical_strategy(
+        self,
+        data_frame: pandas.DataFrame,
+        data_file: str,
+        categorical_strategy: str,
+    ) -> None:
+        """Apply categorical_strategy (unique_labels, codes, skip) to Stata value labels in place."""
+        from pandas.io.stata import StataReader
+
+        stata_reader = StataReader(data_file)
+        value_labels = stata_reader.value_labels()
+        for col_name, labels in value_labels.items():
+            if col_name not in data_frame.columns:
+                continue
+            if categorical_strategy == "unique_labels":
+                unique_labels = {}
+                seen_labels = {}
+                for code, label in labels.items():
+                    if pandas.isna(code):
+                        unique_labels[code] = label
+                    elif label in seen_labels:
+                        unique_labels[code] = f"{label} ({code})"
+                    else:
+                        unique_labels[code] = label
+                        seen_labels[label] = code
+                code_to_label = {code: unique_labels[code] for code in sorted(labels.keys())}
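+                    # Swap codes for their de-duplicated labels; category order follows the sorted codes.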
data_frame[col_name] = data_frame[col_name].map(code_to_label) + data_frame[col_name] = pandas.Categorical( + data_frame[col_name], + categories=list(code_to_label.values()), + ordered=False, + ) + elif categorical_strategy == "codes": + codes = sorted([c for c in labels if pandas.notna(c)]) + if codes: + data_frame[col_name] = pandas.Categorical(data_frame[col_name], categories=codes, ordered=False) + elif categorical_strategy != "skip": + log.warning("Unknown categorical_strategy %r, using 'skip'", categorical_strategy) + + def read_source(self, data_file: str, **kwargs: Any) -> pandas.DataFrame: + source_format = self.source_format + path_fmt = self._get_store_path_and_format() + store_file_path = path_fmt[0] if path_fmt else None + self._check_and_log(data_file, store_file_path=store_file_path) + reader = reader_by_source_format[source_format] + categorical_strategy = ( + kwargs.pop("categorical_strategy", "unique_labels") + if source_format == "stata" + else kwargs.pop("categorical_strategy", None) + ) + try: + if source_format == "csv": + try: + data_frame = reader(data_file, **kwargs) + if len(data_frame.columns) == 1 and ";" in data_frame.columns[0]: + raise SurveyIOError( + "A ';' is present in the unique column name. Looks like we got the wrong separator." + ) + except Exception: + data_frame = self._read_csv_with_inferred_encoding(data_file, reader, kwargs) + elif source_format == "stata": + if "encoding" in kwargs: + kwargs.pop("encoding") + try: + data_frame = reader(data_file, **kwargs) + except ValueError as e: + if "not unique" not in str(e) and "Categorical categories must be unique" not in str(e): + raise + log.info( + "Non-unique value labels detected in %s, using strategy %r", + data_file, + categorical_strategy, + ) + kwargs_no_cat = {**kwargs, "convert_categoricals": False} + data_frame = reader(data_file, **kwargs_no_cat) + self._apply_stata_categorical_strategy(data_frame, data_file, categorical_strategy) + else: + data_frame = reader(data_file, **kwargs) + except Exception as e: + log.info("Error while reading %s", data_file) + raise e + gc.collect() + return data_frame + + def save_data_frame_to_hdf5(self, data_frame: pandas.DataFrame, **kwargs: Any) -> None: + hdf5_file_path = self.survey.hdf5_file_path + log.info(f"Inserting table {self.name} in HDF file {hdf5_file_path}") + store_path = self.name + write_table_to_hdf5( + data_frame, + hdf5_file_path=hdf5_file_path, + store_path=store_path, + **kwargs, + ) + + self.variables = list(data_frame.columns) + + def save_data_frame_to_parquet(self, data_frame: pandas.DataFrame) -> None: + parquet_file_path = self.survey.parquet_file_path + self.parquet_file = write_table_to_parquet( + data_frame, + parquet_dir_path=parquet_file_path, + table_name=self.name, + ) + self.variables = list(data_frame.columns) + + self.survey.tables[self.name]["parquet_file"] = self.parquet_file + self.survey.tables[self.name]["variables"] = self.variables diff --git a/openfisca_survey_manager/google_colab.py b/openfisca_survey_manager/google_colab.py deleted file mode 100644 index b5cf98a7..00000000 --- a/openfisca_survey_manager/google_colab.py +++ /dev/null @@ -1,23 +0,0 @@ -import configparser -from pathlib import Path - -from openfisca_survey_manager.paths import default_config_files_directory as config_files_directory - - -def create_raw_data_ini(value_by_option_by_section=None): - """Creates raw_data.ini configureation file - - Args: - value_by_option_by_section(dict(dict)): Options value by section (Default value = None) - - """ - 
config_parser = configparser.ConfigParser() - - if value_by_option_by_section is not None: - for section, value_by_option in value_by_option_by_section.items(): - config_parser.add_section(section) - for option, value in value_by_option.items(): - config_parser.set(section, option, value) - - with (Path(config_files_directory) / "raw_data.ini").open("w") as raw_data_config_file: - config_parser.write(raw_data_config_file) diff --git a/openfisca_survey_manager/io/backends.py b/openfisca_survey_manager/io/backends.py new file mode 100644 index 00000000..a6f8273e --- /dev/null +++ b/openfisca_survey_manager/io/backends.py @@ -0,0 +1,227 @@ +"""Store backends for survey tables: HDF5, Parquet, Zarr. + +Allows choosing the storage format (backend) when building or filling the store. +""" + +from __future__ import annotations + +import logging +from pathlib import Path +from typing import Any, Optional, Protocol + +import pandas as pd + +from openfisca_survey_manager.io.hdf import hdf5_safe_key, write_table_to_hdf5 +from openfisca_survey_manager.io.writers import write_table_to_parquet + +log = logging.getLogger(__name__) + + +# Supported store format names (zarr only if zarr package is installed) +def get_available_backend_names() -> tuple[str, ...]: + return tuple(_backends.keys()) + + +class StoreBackend(Protocol): + """Protocol for a store backend (write/read tables).""" + + def write_table( + self, + store_path: str, + table_name: str, + data_frame: pd.DataFrame, + **kwargs: Any, + ) -> Optional[str]: + """Write a table. Returns path used for the table (e.g. file path) or None.""" + ... + + def read_table( + self, + store_path: str, + table_name: str, + variables: Optional[list[str]] = None, + **kwargs: Any, + ) -> pd.DataFrame: + """Read a table as DataFrame.""" + ... + + def table_exists(self, store_path: str, table_name: str) -> bool: + """Return True if the table exists in the store.""" + ... 
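Not part of the patch itself: since `StoreBackend` is a `typing.Protocol`, any object with these three methods can be plugged in through `register_backend` (defined at the bottom of this module), with no inheritance required. As an illustrative sketch, a throwaway in-memory backend, e.g. for tests (the `InMemoryBackend` class and the `"memory"` key are hypothetical, not part of this diff):

```python
from typing import Any, Optional

import pandas as pd

from openfisca_survey_manager.io.backends import register_backend


class InMemoryBackend:
    """Keep tables in a plain dict, keyed by (store_path, table_name)."""

    def __init__(self) -> None:
        self._tables: dict[tuple[str, str], pd.DataFrame] = {}

    def write_table(self, store_path: str, table_name: str, data_frame: pd.DataFrame, **kwargs: Any) -> Optional[str]:
        # Copy so later mutations of the caller's frame do not leak into the store.
        self._tables[(store_path, table_name)] = data_frame.copy()
        return None

    def read_table(self, store_path: str, table_name: str, variables: Optional[list[str]] = None, **kwargs: Any) -> pd.DataFrame:
        df = self._tables[(store_path, table_name)]
        return df if variables is None else df[[c for c in variables if c in df.columns]]

    def table_exists(self, store_path: str, table_name: str) -> bool:
        return (store_path, table_name) in self._tables


register_backend("memory", InMemoryBackend())
```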
+
+
+class HDF5Backend:
+    """Store tables in a single HDF5 file."""
+
+    def write_table(
+        self,
+        store_path: str,
+        table_name: str,
+        data_frame: pd.DataFrame,
+        **kwargs: Any,
+    ) -> Optional[str]:
+        write_table_to_hdf5(
+            data_frame,
+            hdf5_file_path=store_path,
+            store_path=table_name,
+            **kwargs,
+        )
+        return None
+
+    def read_table(
+        self,
+        store_path: str,
+        table_name: str,
+        variables: Optional[list[str]] = None,
+        **kwargs: Any,
+    ) -> pd.DataFrame:
+        key = hdf5_safe_key(table_name)
+        store = pd.HDFStore(store_path, "r")
+        try:
+            df = store.select(key)
+        finally:
+            store.close()
+        if variables is not None:
+            df = df[[c for c in variables if c in df.columns]]
+        return df
+
+    def table_exists(self, store_path: str, table_name: str) -> bool:
+        if not Path(store_path).is_file():
+            return False
+        key = hdf5_safe_key(table_name)
+        store = pd.HDFStore(store_path, "r")
+        try:
+            keys = store.keys()
+            return key in keys or any(k.lstrip("/") == key for k in keys)
+        finally:
+            store.close()
+
+
+class ParquetBackend:
+    """Store each table as a parquet file in a directory (store_path/table_name.parquet)."""
+
+    def write_table(
+        self,
+        store_path: str,
+        table_name: str,
+        data_frame: pd.DataFrame,
+        **kwargs: Any,
+    ) -> Optional[str]:
+        return write_table_to_parquet(
+            data_frame,
+            parquet_dir_path=store_path,
+            table_name=table_name,
+        )
+
+    def read_table(
+        self,
+        store_path: str,
+        table_name: str,
+        variables: Optional[list[str]] = None,
+        **kwargs: Any,
+    ) -> pd.DataFrame:
+        path = Path(store_path) / f"{table_name}.parquet"
+        if not path.is_file():
+            raise FileNotFoundError(f"No table {table_name} at {path}")
+        return pd.read_parquet(path, columns=variables)
+
+    def table_exists(self, store_path: str, table_name: str) -> bool:
+        return (Path(store_path) / f"{table_name}.parquet").is_file()
+
+
+def _write_table_to_zarr(
+    data_frame: pd.DataFrame,
+    zarr_dir_path: str,
+    table_name: str,
+) -> str:
+    """Write a DataFrame to a zarr group (store_path/table_name), one zarr array per column."""
+    import zarr
+
+    zarr_path = str(Path(zarr_dir_path) / table_name)
+    Path(zarr_path).parent.mkdir(parents=True, exist_ok=True)
+    # pandas has no DataFrame.to_zarr: store one zarr array per column in a group
+    root = zarr.open_group(zarr_path, mode="w")
+    for col in data_frame.columns:
+        values = data_frame[col].to_numpy()
+        if values.dtype == object:
+            # Object columns can cause issues; coerce to string like the parquet backend
+            values = values.astype(str)
+        root.create_dataset(str(col), data=values)
+    # Keep the original column order, which zarr group listings do not preserve
+    root.attrs["columns"] = [str(col) for col in data_frame.columns]
+    return zarr_path
+
+
+def _read_table_from_zarr(
+    zarr_dir_path: str,
+    table_name: str,
+    variables: Optional[list[str]] = None,
+) -> pd.DataFrame:
+    """Read a table from a zarr group."""
+    import zarr
+
+    zarr_path = str(Path(zarr_dir_path) / table_name)
+    root = zarr.open_group(zarr_path, mode="r")
+    columns = root.attrs.get("columns", sorted(root.array_keys()))
+    df = pd.DataFrame({name: root[name][...] for name in columns})
+    if variables is not None:
+        df = df[[c for c in variables if c in df.columns]]
+    return df
+
+
+class ZarrBackend:
+    """Store each table as a zarr group in a directory (store_path/table_name)."""
+
+    def write_table(
+        self,
+        store_path: str,
+        table_name: str,
+        data_frame: pd.DataFrame,
+        **kwargs: Any,
+    ) -> Optional[str]:
+        return _write_table_to_zarr(data_frame, store_path, table_name)
+
+    def read_table(
+        self,
+        store_path: str,
+        table_name: str,
+        variables: Optional[list[str]] = None,
+        **kwargs: Any,
+    ) -> pd.DataFrame:
+        return _read_table_from_zarr(store_path, table_name, variables)
+
+    def table_exists(self, store_path: str, table_name: str) -> bool:
+        return (Path(store_path) / table_name).is_dir()
+
+
+def _build_backends() -> dict[str, 
StoreBackend]: + backends: dict[str, StoreBackend] = { + "hdf5": HDF5Backend(), + "parquet": ParquetBackend(), + } + try: + import zarr # noqa: F401 + + backends["zarr"] = ZarrBackend() + except ImportError: + log.debug("zarr not installed; zarr store backend unavailable") + return backends + + +_backends = _build_backends() + +STORE_BACKEND_NAMES = get_available_backend_names() + +__all__ = [ + "get_backend", + "get_available_backend_names", + "register_backend", + "StoreBackend", + "STORE_BACKEND_NAMES", +] + + +def get_backend(name: str) -> StoreBackend: + """Return the store backend for the given format name.""" + if name not in _backends: + raise ValueError(f"Unknown store backend: {name}. Choose from {list(_backends.keys())}") + return _backends[name] + + +def register_backend(name: str, backend: StoreBackend) -> None: + """Register a custom store backend (e.g. for testing or extensions).""" + _backends[name] = backend diff --git a/openfisca_survey_manager/io/hdf.py b/openfisca_survey_manager/io/hdf.py new file mode 100644 index 00000000..33d74ee7 --- /dev/null +++ b/openfisca_survey_manager/io/hdf.py @@ -0,0 +1,56 @@ +"""HDF5 write support for survey tables.""" + +from __future__ import annotations + +import logging +import re +from typing import Any + +import pandas as pd + +log = logging.getLogger(__name__) + +# PyTables / pandas-HDF5 require node names to match ^[a-zA-Z_][a-zA-Z0-9_]*$ +# to avoid NaturalNameWarning. We normalize table names (e.g. person_2017-01 -> person_2017_01). +_HDF5_SAFE_PATTERN = re.compile(r"[^a-zA-Z0-9_]") + + +def hdf5_safe_key(name: str) -> str: + """Return an HDF5 node name safe for PyTables (valid Python identifier).""" + return _HDF5_SAFE_PATTERN.sub("_", name) + + +def write_table_to_hdf5( + data_frame: pd.DataFrame, + *, + hdf5_file_path: str, + store_path: str, + **kwargs: Any, +) -> None: + """Write a DataFrame to HDF5. + + Mirrors historical behavior from `tables.Table.save_data_frame_to_hdf5`. + May mutate `data_frame` (type conversions) to ensure it can be written. 
+ """ + key = hdf5_safe_key(store_path) + try: + data_frame.to_hdf(hdf5_file_path, key=key, append=False, **kwargs) + except (TypeError, NotImplementedError): + log.info("Type problem(s) when creating %s in %s", store_path, hdf5_file_path) + dtypes = data_frame.dtypes + # Checking for strings + converted_dtypes = dtypes.isin(["mixed", "unicode"]) + if converted_dtypes.any(): + log.info("The following types are converted to strings %s", dtypes[converted_dtypes]) + for column in dtypes[converted_dtypes].index: + data_frame[column] = data_frame[column].copy().astype(str) + + # Checking for remaining categories + dtypes = data_frame.dtypes + converted_dtypes = dtypes.isin(["category"]) + if not converted_dtypes.empty: # With category table format is needed + log.info( + "The following types are added as category using the table format %s", + dtypes[converted_dtypes], + ) + data_frame.to_hdf(hdf5_file_path, key=key, append=False, format="table", **kwargs) diff --git a/openfisca_survey_manager/io/readers.py b/openfisca_survey_manager/io/readers.py index b372c957..5feccab4 100644 --- a/openfisca_survey_manager/io/readers.py +++ b/openfisca_survey_manager/io/readers.py @@ -1,5 +1,7 @@ """Readers for survey data (SAS, SPSS, DBF, etc.).""" +from __future__ import annotations + import logging from typing import Optional @@ -45,7 +47,7 @@ def read_spss(spss_file_path: str) -> DataFrame: def read_dbf( dbf_path: str, index: Optional[str] = None, - cols: Optional[list] = None, + cols: Optional[list[str]] = None, incl_index: bool = False, ) -> DataFrame: """ diff --git a/openfisca_survey_manager/io/writers.py b/openfisca_survey_manager/io/writers.py index 8481fe47..25f7063f 100644 --- a/openfisca_survey_manager/io/writers.py +++ b/openfisca_survey_manager/io/writers.py @@ -4,46 +4,14 @@ import logging from pathlib import Path -from typing import Any import pandas as pd -log = logging.getLogger(__name__) - +from openfisca_survey_manager.io.hdf import write_table_to_hdf5 -def write_table_to_hdf5( - data_frame: pd.DataFrame, - *, - hdf5_file_path: str, - store_path: str, - **kwargs: Any, -) -> None: - """Write a DataFrame to HDF5. - - Mirrors historical behavior from `tables.Table.save_data_frame_to_hdf5`. - May mutate `data_frame` (type conversions) to ensure it can be written. 
- """ - try: - data_frame.to_hdf(hdf5_file_path, store_path, append=False, **kwargs) - except (TypeError, NotImplementedError): - log.info("Type problem(s) when creating %s in %s", store_path, hdf5_file_path) - dtypes = data_frame.dtypes - # Checking for strings - converted_dtypes = dtypes.isin(["mixed", "unicode"]) - if converted_dtypes.any(): - log.info("The following types are converted to strings %s", dtypes[converted_dtypes]) - for column in dtypes[converted_dtypes].index: - data_frame[column] = data_frame[column].copy().astype(str) +log = logging.getLogger(__name__) - # Checking for remaining categories - dtypes = data_frame.dtypes - converted_dtypes = dtypes.isin(["category"]) - if not converted_dtypes.empty: # With category table format is needed - log.info( - "The following types are added as category using the table format %s", - dtypes[converted_dtypes], - ) - data_frame.to_hdf(hdf5_file_path, store_path, append=False, format="table", **kwargs) +__all__ = ["write_table_to_hdf5", "write_table_to_parquet"] def write_table_to_parquet( diff --git a/openfisca_survey_manager/matching.py b/openfisca_survey_manager/matching.py deleted file mode 100644 index 8fa93756..00000000 --- a/openfisca_survey_manager/matching.py +++ /dev/null @@ -1,145 +0,0 @@ -import logging -from pathlib import Path - -import pandas as pd - -from openfisca_survey_manager.paths import openfisca_survey_manager_location - -log = logging.getLogger(__name__) - - -config_files_directory = Path(openfisca_survey_manager_location) - - -def nnd_hotdeck_using_feather(receiver=None, donor=None, matching_variables=None, z_variables=None): - """ - Not working - """ - import feather - - assert receiver is not None and donor is not None - assert matching_variables is not None - - temporary_directory_path = config_files_directory / "tmp" - assert temporary_directory_path.exists() - receiver_path = temporary_directory_path / "receiver.feather" - donor_path = temporary_directory_path / "donor.feather" - feather.write_dataframe(receiver, receiver_path) - feather.write_dataframe(donor, donor_path) - if isinstance(matching_variables, str): - match_vars = f'"{matching_variables}"' - elif len(matching_variables) == 1: - match_vars = f'"{matching_variables[0]}"' - else: - match_vars = '"{}"'.format("todo") - - r_script = f""" -rm(list=ls()) -gc() -devtools::install_github("wesm/feather/R") -library(feather) -library(StatMatch) - -receiver <- read_feather({receiver_path}) -donor <- read_feather({donor_path}) -summary(receiver) -summary(donor) - -# variables -receiver = as.data.frame(receiver) -donor = as.data.frame(donor) -gc() -match_vars = {match_vars} -# don_class = c("sexe") -out.nnd <- NND.hotdeck( - data.rec = receiver, data.don = donor, match.vars = match_vars - ) - -# out.nndsummary(out.nnd$mtc.ids) -# head(out.nnd$mtc.ids, 10) -# head(receiver, 10) - -fused.nnd.m <- create.fused( - data.rec = receiver, data.don = donor, - mtc.ids = out.nnd$mtc.ids, - z.vars = "{z_variables}" - ) -summary(fused.nnd.m) -""" - log.debug("%s", r_script) - - -def nnd_hotdeck_using_rpy2(receiver=None, donor=None, matching_variables=None, z_variables=None, donor_classes=None): - from rpy2.robjects import pandas2ri - from rpy2.robjects.packages import importr - - assert receiver is not None and donor is not None - assert matching_variables is not None - - pandas2ri.activate() - stat_match = importr("StatMatch") - - if isinstance(donor_classes, str): - assert donor_classes in receiver, "Donor class not present in receiver" - assert donor_classes in 
donor, "Donor class not present in donor" - - try: - if donor_classes: - out_nnd = stat_match.NND_hotdeck( - data_rec=receiver, - data_don=donor, - match_vars=pd.Series(matching_variables), - don_class=pd.Series(donor_classes), - ) - else: - out_nnd = stat_match.NND_hotdeck( - data_rec=receiver, - data_don=donor, - match_vars=pd.Series(matching_variables), - # don_class = pd.Series(donor_classes) - ) - except Exception as e: - log.debug("receiver: %s", receiver) - log.debug("donor: %s", donor) - log.debug("matching_variables: %s", pd.Series(matching_variables)) - log.exception("NND hotdeck failed: %s", e) - raise - - # create synthetic data.set, without the - # duplication of the matching variables - - fused_0 = pandas2ri.ri2py( - stat_match.create_fused(data_rec=receiver, data_don=donor, mtc_ids=out_nnd[0], z_vars=pd.Series(z_variables)) - ) - - # create synthetic data.set, with the "duplication" - # of the matching variables - - fused_1 = pandas2ri.ri2py( - stat_match.create_fused( - data_rec=receiver, - data_don=donor, - mtc_ids=out_nnd[0], - z_vars=pd.Series(z_variables), - dup_x=True, - match_vars=pd.Series(matching_variables), - ) - ) - - return fused_0, fused_1 - - -if __name__ == "__main__": - log.setLevel(logging.INFO) - - receiver = pd.DataFrame() - donor = pd.DataFrame() - matching_variables = "sexe" - z_variables = "ident" - - nnd_hotdeck_using_feather( - receiver=receiver, - donor=donor, - matching_variables=matching_variables, - z_variables=z_variables, - ) diff --git a/openfisca_survey_manager/paths.py b/openfisca_survey_manager/paths.py deleted file mode 100644 index 288c0a4a..00000000 --- a/openfisca_survey_manager/paths.py +++ /dev/null @@ -1,22 +0,0 @@ -"""Re-export for backward compatibility. - -Prefer: from openfisca_survey_manager.configuration.paths import ... -""" - -from openfisca_survey_manager.configuration.paths import ( - config_ini, - default_config_files_directory, - is_in_ci, - openfisca_survey_manager_location, - private_run_with_data, - test_config_files_directory, -) - -__all__ = [ - "config_ini", - "default_config_files_directory", - "is_in_ci", - "openfisca_survey_manager_location", - "private_run_with_data", - "test_config_files_directory", -] diff --git a/openfisca_survey_manager/policy/__init__.py b/openfisca_survey_manager/policy/__init__.py new file mode 100644 index 00000000..60888238 --- /dev/null +++ b/openfisca_survey_manager/policy/__init__.py @@ -0,0 +1,53 @@ +# Policy-related modules (simulations, simulation_builder, aggregates). +# À terme ces briques pourront être déplacées dans un paquet dédié. +# Voir docs/REFACTORING_PLAN.md. 
+ +from openfisca_survey_manager.policy.aggregates import AbstractAggregates +from openfisca_survey_manager.policy.simulation_builder import ( + SimulationBuilder, + diagnose_variable_mismatch, +) +from openfisca_survey_manager.policy.simulations import ( + SecretViolationError, + Simulation, + adaptative_calculate_variable, + assert_variables_in_same_entity, + compute_aggregate, + compute_pivot_table, + compute_quantiles, + compute_winners_losers, + create_data_frame_by_entity, + get_words, + inflate, + init_entity_data, + init_simulation, + init_variable_in_entity, + new_from_tax_benefit_system, + print_memory_usage, + set_weight_variable_by_entity, + summarize_variable, +) + +__all__ = [ + "AbstractAggregates", + "Simulation", + "SimulationBuilder", + "SecretViolationError", + "adaptative_calculate_variable", + "assert_variables_in_same_entity", + "compute_aggregate", + "compute_pivot_table", + "compute_quantiles", + "compute_winners_losers", + "create_data_frame_by_entity", + "diagnose_variable_mismatch", + "get_words", + "inflate", + "init_entity_data", + "init_simulation", + "init_variable_in_entity", + "new_from_tax_benefit_system", + "print_memory_usage", + "set_weight_variable_by_entity", + "summarize_variable", +] diff --git a/openfisca_survey_manager/aggregates.py b/openfisca_survey_manager/policy/aggregates.py similarity index 91% rename from openfisca_survey_manager/aggregates.py rename to openfisca_survey_manager/policy/aggregates.py index 7d8059c4..7ed440fc 100644 --- a/openfisca_survey_manager/aggregates.py +++ b/openfisca_survey_manager/policy/aggregates.py @@ -1,8 +1,12 @@ +"""Aggregates computation for survey scenarios.""" + +from __future__ import annotations + import collections import logging from datetime import datetime from pathlib import Path -from typing import Optional +from typing import Any, Optional, Union import numpy as np import pandas as pd @@ -29,11 +33,11 @@ class AbstractAggregates: def __init__( self, - survey_scenario=None, - absolute_minimal_detected_variation=0, - relative_minimal_detected_variation=0, - observations_threshold=0, - ): + survey_scenario: Any = None, + absolute_minimal_detected_variation: float = 0, + relative_minimal_detected_variation: float = 0, + observations_threshold: float = 0, + ) -> None: assert survey_scenario is not None self.period = survey_scenario.period @@ -270,7 +274,7 @@ def compute_variable_aggregates( return variable_data_frame - def create_description(self): + def create_description(self) -> pd.DataFrame: """Create a description dataframe.""" now = datetime.now() return pd.DataFrame( @@ -284,14 +288,14 @@ def create_description(self): def to_csv( self, - path=None, - absolute=True, - amount=True, - beneficiaries=True, - default="actual", - relative=True, - target="reform", - ): + path: Optional[Union[Path, str]] = None, + absolute: bool = True, + amount: bool = True, + beneficiaries: bool = True, + default: str = "actual", + relative: bool = True, + target: str = "reform", + ) -> None: """Saves the table to csv.""" assert path is not None @@ -313,14 +317,14 @@ def to_csv( def to_excel( self, - path=None, - absolute=True, - amount=True, - beneficiaries=True, - default="actual", - relative=True, - target="reform", - ): + path: Optional[Union[Path, str]] = None, + absolute: bool = True, + amount: bool = True, + beneficiaries: bool = True, + default: str = "actual", + relative: bool = True, + target: str = "reform", + ) -> None: """Save the table to excel.""" assert path is not None @@ -346,14 +350,14 @@ def to_excel( 
def to_html( self, - path=None, - absolute=True, - amount=True, - beneficiaries=True, - default="actual", - relative=True, - target="reform", - ): + path: Optional[Union[Path, str]] = None, + absolute: bool = True, + amount: bool = True, + beneficiaries: bool = True, + default: str = "actual", + relative: bool = True, + target: str = "reform", + ) -> str: """Get or saves the table to html format.""" df = self.get_data_frame( absolute=absolute, @@ -377,14 +381,14 @@ def to_html( def to_markdown( self, - path=None, - absolute=True, - amount=True, - beneficiaries=True, - default="actual", - relative=True, - target="reform", - ): + path: Optional[Union[Path, str]] = None, + absolute: bool = True, + amount: bool = True, + beneficiaries: bool = True, + default: str = "actual", + relative: bool = True, + target: str = "reform", + ) -> str: """Get or saves the table to markdown format.""" df = self.get_data_frame( absolute=absolute, @@ -425,7 +429,7 @@ def get_data_frame( relative: bool = True, target: str = "reform", ignore_labels: bool = False, - ): + ) -> pd.DataFrame: assert target is None or target in ["reform", "baseline"] columns = self.labels.keys() @@ -509,10 +513,14 @@ def get_data_frame( return df - def load_actual_data(self, period=None): + def load_actual_data(self, period: Any = None) -> None: pass - def compute_winners_losers(self, variable: str, filter_by: Optional[str] = None): + def compute_winners_losers( + self, + variable: str, + filter_by: Optional[str] = None, + ) -> pd.DataFrame: if "reform" not in self.simulations or "baseline" not in self.simulations: log.warning("Cannot compute winners and losers without a reform and a baseline simulation.") return pd.DataFrame() @@ -546,7 +554,10 @@ def compute_winners_losers(self, variable: str, filter_by: Optional[str] = None) ) return winners_losers_df - def compute_all_winners_losers(self, filter_by: Optional[str] = None): + def compute_all_winners_losers( + self, + filter_by: Optional[str] = None, + ) -> pd.DataFrame: all_winners_losers = pd.DataFrame() for variable in self.aggregate_variables: winners_losers = self.compute_winners_losers(variable, filter_by=filter_by) diff --git a/openfisca_survey_manager/processing/weights/calibration.py b/openfisca_survey_manager/policy/calibration.py similarity index 89% rename from openfisca_survey_manager/processing/weights/calibration.py rename to openfisca_survey_manager/policy/calibration.py index acf6b212..a4ee84aa 100644 --- a/openfisca_survey_manager/processing/weights/calibration.py +++ b/openfisca_survey_manager/policy/calibration.py @@ -1,14 +1,18 @@ """Calibration of survey weights (SurveyScenario).""" +from __future__ import annotations + import logging import re +from typing import Any, Optional import numpy +import numpy as np import pandas as pd from numpy import logical_not from openfisca_core.model_api import Enum -from openfisca_survey_manager.processing.weights.calmar import calmar +from openfisca_survey_manager.policy.calmar import calmar log = logging.getLogger(__name__) @@ -16,32 +20,32 @@ class Calibration: """An object to calibrate survey data of a SurveyScenario.""" - filter_by = None - initial_entity_count = None - _initial_weight_name = None - initial_weight_by_entity = None - target_margins = None - margins_by_variable = None - parameters = None - period = None - simulation = None - target_entity_count = None - other_entity_count = None - target_entity = None - weight_name = None - entities = None + filter_by: Any = None + initial_entity_count: Optional[float] = 
None + _initial_weight_name: Optional[str] = None + initial_weight_by_entity: dict[str, Any] # set in __init__ + target_margins: Optional[dict[str, Any]] = None + margins_by_variable: Optional[dict[str, Any]] = None + parameters: Optional[dict[str, Any]] = None + period: Any = None + simulation: Any = None + target_entity_count: Optional[float] = None + other_entity_count: Optional[float] = None + target_entity: Optional[str] = None + weight_name: Optional[str] = None + entities: Optional[list[str]] = None def __init__( self, - simulation, - target_margins, - period, - target_entity_count=None, - other_entity_count=None, - parameters=None, - filter_by=None, - entity=None, - ): + simulation: Any, + target_margins: dict[str, Any], + period: Any, + target_entity_count: Optional[float] = None, + other_entity_count: Optional[float] = None, + parameters: Optional[dict[str, Any]] = None, + filter_by: Any = None, + entity: Optional[str] = None, + ) -> None: target_entity = entity self.parameters = parameters or { "use_proportions": True, @@ -197,14 +201,14 @@ def _build_calmar_data(self) -> dict: return data - def calibrate(self, inplace=False): + def calibrate(self, inplace: bool = False) -> Optional[np.ndarray]: """Apply the calibrations by updating weights and margins. Args: - inplace (bool, optional): Whether to return the calibrated or to setthem inplace. Defaults to False. + inplace: Whether to apply in place (no return) or return calibrated weights. Returns: - numpy.array: calibrated weights + Calibrated weights array, or None if inplace=True. """ assert self.margins_by_variable is not None, "Margins by variable should be set" margins_by_variable = self.margins_by_variable @@ -230,7 +234,7 @@ def calibrate(self, inplace=False): return self.weight - def get_parameters(self) -> dict: + def get_parameters(self) -> dict[str, Any]: """Get the parameters. Returns: @@ -252,12 +256,12 @@ def get_parameters(self) -> dict: p["initial_weight"] = self.weight_name + "" return p - def set_target_margin(self, variable, target): + def set_target_margin(self, variable: str, target: Any) -> None: """Set variable target margin. Args: - variable: Target variable - target: Target value + variable: Target variable name or expression. + target: Target value (scalar or dict of category -> value). 
""" simulation = self.simulation period = self.period @@ -293,13 +297,13 @@ def set_target_margin(self, variable, target): self.margins_by_variable[variable]["target"] = target_by_category or target self._update_margins() - def reset(self): + def reset(self) -> None: """Reset the calibration to its initial state.""" simulation = self.simulation simulation.delete_arrays(self.weight_name, self.period) simulation.set_input(self.weight_name, self.period, numpy.array(self.initial_weight)) - def set_calibrated_weights(self): + def set_calibrated_weights(self) -> None: """Modify the weights to use the calibrated weights.""" period = self.period simulation = self.simulation @@ -314,7 +318,7 @@ def set_calibrated_weights(self): elif weight_variable.formulas: simulation.delete_arrays(weight_variable.name, period) - def summary(self): + def summary(self) -> pd.DataFrame: """Summarize margins.""" margins_df = pd.DataFrame.from_dict(self.margins_by_variable).T margins_df.loc["entity_count", "actual"] = (self.weight * self.filter_by).sum() @@ -322,7 +326,7 @@ def summary(self): margins_df.loc["entity_count", "target"] = self.target_entity_count return margins_df - def _update_margins(self): + def _update_margins(self) -> None: """Update margins.""" for variable in self.margins_by_variable: simulation = self.simulation @@ -381,17 +385,12 @@ def _update_margins(self): } self.margins_by_variable[variable].update(margin_by_type) - def _update_weights(self, margins, parameters=None): - """Run calmar, stores new weights and returns adjusted margins. - - Args: - margins: margins - parameters: Parameters (Default value = {}) - - Returns: - dict: Updated margins - - """ + def _update_weights( + self, + margins: dict[str, Any], + parameters: Optional[dict[str, Any]] = None, + ) -> dict[str, Any]: + """Run calmar, store new weights and return adjusted margins.""" if parameters is None: parameters = {} diff --git a/openfisca_survey_manager/processing/weights/calmar.py b/openfisca_survey_manager/policy/calmar.py similarity index 88% rename from openfisca_survey_manager/processing/weights/calmar.py rename to openfisca_survey_manager/policy/calmar.py index dd4986d4..c54d0497 100644 --- a/openfisca_survey_manager/processing/weights/calmar.py +++ b/openfisca_survey_manager/policy/calmar.py @@ -1,8 +1,12 @@ """CALMAR: Calibrates weights to satisfy margins constraints.""" +from __future__ import annotations + import logging import operator +from typing import Any, Optional +import numpy as np import pandas as pd from numpy import array, dot, exp, float64, ones, sqrt, unique, zeros from numpy import log as ln @@ -12,7 +16,7 @@ log = logging.getLogger(__name__) -def linear(u): +def linear(u: np.ndarray) -> np.ndarray: """ Args: @@ -24,68 +28,27 @@ def linear(u): return 1 + u -def linear_prime(u): - """ - - Args: - u: - - Returns: - - """ +def linear_prime(u: np.ndarray) -> np.ndarray: + """Derivative of linear (constant 1).""" return ones(u.shape, dtype=float) -def raking_ratio(u): - """ - - Args: - u: - - Returns: - - """ +def raking_ratio(u: np.ndarray) -> np.ndarray: + """Raking ratio (exponential) calibration function.""" return exp(u) -def raking_ratio_prime(u): - """ - - Args: - u: - - Returns: - - """ +def raking_ratio_prime(u: np.ndarray) -> np.ndarray: + """Derivative of raking_ratio.""" return exp(u) -def logit(u, low, up): - """ - - Args: - u: - low: - up: - - Returns: - - """ +def logit(u: np.ndarray, low: float, up: float) -> np.ndarray: a = (up - low) / ((1 - low) * (up - 1)) return (low * (up - 1) + up * (1 - 
low) * exp(a * u)) / (up - 1 + (1 - low) * exp(a * u)) -def logit_prime(u, low, up): - """ - - Args: - u: - low: - up: - - Returns: - - """ +def logit_prime(u: np.ndarray, low: float, up: float) -> np.ndarray: a = (up - low) / ((1 - low) * (up - 1)) return ( (a * up * (1 - low) * exp(a * u)) * (up - 1 + (1 - low) * exp(a * u)) @@ -93,12 +56,12 @@ def logit_prime(u, low, up): ) / (up - 1 + (1 - low) * exp(a * u)) ** 2 -def hyperbolic_sinus(u, alpha): +def hyperbolic_sinus(u: np.ndarray, alpha: float) -> np.ndarray: logarithm = ln(2 * alpha * u + sqrt(4 * (alpha**2) * (u**2) + 1)) return 0.5 * (logarithm / alpha + sqrt((logarithm / alpha) ** 2 + 4)) -def hyperbolic_sinus_prime(u, alpha): +def hyperbolic_sinus_prime(u: np.ndarray, alpha: float) -> np.ndarray: square = sqrt(4 * (alpha**2) * (u**2) + 1) return 0.5 * ( ((4 * (alpha**2) * u) / square + 2 * alpha) / (alpha * (square + 2 * alpha * u)) @@ -107,16 +70,8 @@ def hyperbolic_sinus_prime(u, alpha): ) -def build_dummies_dict(data): - """ - - Args: - data: - - Returns: - - - """ +def build_dummies_dict(data: np.ndarray | pd.Series) -> dict[Any, np.ndarray | pd.Series]: + """Build a dict mapping each unique value to a boolean mask (data == value).""" unique_val_list = unique(data) output = {} for val in unique_val_list: @@ -125,17 +80,17 @@ def build_dummies_dict(data): def calmar( - data_in, - margins: dict, + data_in: dict[str, Any], + margins: dict[str, Any], initial_weight: str, - method="linear", - lo=None, - up=None, - alpha=None, + method: str = "linear", + lo: Optional[float] = None, + up: Optional[float] = None, + alpha: Optional[float] = None, use_proportions: bool = False, xtol: float = 1.49012e-08, maxfev: int = 256, -): +) -> tuple[np.ndarray, np.ndarray, dict[str, Any]]: """Calibrates weights to satisfy margins constraints. 
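    A minimal sketch of a call (illustrative data; assumes data_in maps variable
    names to 1-D numpy arrays, including the initial-weight column):

        data_in = {"initial_weight": ones(3), "x": array([0.0, 1.0, 1.0])}
        pondfin, lambdasol, margins_new_dict = calmar(
            data_in, margins={"x": 3.0}, initial_weight="initial_weight"
        )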
Args: @@ -235,20 +190,20 @@ def calmar( assert lo is not None, "When method == 'logit', a value < 1 for lo is mandatory" assert lo < 1, "lo should be < 1" - def f(x): + def f(x: np.ndarray) -> np.ndarray: return logit(x, lo, up) - def f_prime(x): + def f_prime(x: np.ndarray) -> np.ndarray: return logit_prime(x, lo, up) elif method == "hyperbolic sinus": assert alpha is not None, "When method == 'hyperbolic sinus', a value > 0 for alpha is mandatory" assert alpha > 0, "alpha should be > 0" - def f(x): + def f(x: np.ndarray) -> np.ndarray: return hyperbolic_sinus(x, alpha) - def f_prime(x): + def f_prime(x: np.ndarray) -> np.ndarray: return hyperbolic_sinus_prime(x, alpha) margins = margins.copy() @@ -345,10 +300,10 @@ def f_prime(x): margins_dict[var] = val # Solve the first-order equations - def constraint(lambda_): + def constraint(lambda_: np.ndarray) -> np.ndarray: return dot(d * f(dot(x, lambda_)), x) - xmargins - def constraint_prime(lambda_): + def constraint_prime(lambda_: np.ndarray) -> np.ndarray: return dot(d * (x.T * f_prime(dot(x, lambda_))), x) # the Jacobian above is constraintprime = @(lambda) x*(d.*Fprime(x'*lambda)*x'); @@ -388,16 +343,13 @@ def constraint_prime(lambda_): return pondfin_out, lambdasol, margins_new_dict -def check_calmar(margins, margins_new_dict=None): - """ - - Args: - margins: - margins_new_dict: (Default value = None) - - Returns: - - """ +def check_calmar( + margins: dict[str, Any], + margins_new_dict: Optional[dict[str, Any]] = None, +) -> None: + """Log relative difference between initial margins and calibrated margins.""" + if margins_new_dict is None: + return for variable, margin in margins.items(): if variable != "total_population": rel_diff = abs(margin - margins_new_dict[variable]) / abs(margin) diff --git a/openfisca_survey_manager/coicop.py b/openfisca_survey_manager/policy/coicop.py similarity index 85% rename from openfisca_survey_manager/coicop.py rename to openfisca_survey_manager/policy/coicop.py index b30223d5..f6785ed5 100644 --- a/openfisca_survey_manager/coicop.py +++ b/openfisca_survey_manager/policy/coicop.py @@ -1,9 +1,14 @@ +"""COICOP nomenclature helpers.""" + +from __future__ import annotations + import logging from pathlib import Path +from typing import Literal import pandas as pd -from openfisca_survey_manager.paths import openfisca_survey_manager_location +from openfisca_survey_manager.configuration.paths import openfisca_survey_manager_location log = logging.getLogger(__name__) @@ -11,11 +16,18 @@ legislation_directory = Path(openfisca_survey_manager_location) / "openfisca_survey_manager" / "assets" -sub_levels = ["divisions", "groupes", "classes", "sous_classes", "postes"] +sub_levels: tuple[str, ...] 
= ("divisions", "groupes", "classes", "sous_classes", "postes") divisions = [f"0{i}" for i in range(1, 10)] + ["11", "12"] +CoicopLevel = Literal["divisions", "groupes", "classes", "sous_classes", "postes"] + -def build_coicop_level_nomenclature(level, year=2016, keep_code=False, to_csv=False): +def build_coicop_level_nomenclature( + level: CoicopLevel, + year: int = 2016, + keep_code: bool = False, + to_csv: bool = False, +) -> pd.DataFrame: assert level in sub_levels log.debug(f"Reading nomenclature coicop {year} source data for level {level}") try: @@ -69,7 +81,7 @@ def build_coicop_level_nomenclature(level, year=2016, keep_code=False, to_csv=Fa return data_frame -def build_raw_coicop_nomenclature(year=2016): +def build_raw_coicop_nomenclature(year: int = 2016) -> pd.DataFrame: """Builds raw COICOP nomenclature from ecoicop levels""" coicop_nomenclature = None diff --git a/openfisca_survey_manager/policy/legislation_asof.py b/openfisca_survey_manager/policy/legislation_asof.py new file mode 100644 index 00000000..362fa6e4 --- /dev/null +++ b/openfisca_survey_manager/policy/legislation_asof.py @@ -0,0 +1,243 @@ +"""Shared helpers (no survey collection dependency) to avoid circular imports.""" + +from __future__ import annotations + +import logging +from typing import Any, Optional + +from openfisca_core import periods +from openfisca_core.parameters import ParameterNode, Scale + +log = logging.getLogger(__name__) + + +def do_nothing(*args: Any, **kwargs: Any) -> None: + return None + + +def inflate_parameters( + parameters: ParameterNode | Scale | Any, + inflator: float, + base_year: int, + last_year: Optional[int] = None, + ignore_missing_units: bool = False, + start_instant: Optional[str] = None, + round_ndigits: int = 2, +) -> None: + """ + Inflate a Parameter node or a Parameter leaf for the years between base_year and last_year. + + ::parameters:: a Parameter node or a Parameter leaf + ::inflator:: rate used to inflate the parameter. The rate is unique for all the years + ::base_year:: base year of the parameter + ::last_year:: last year of inflation + ::ignore_missing_units:: if True, a parameter leaf without unit in metadata will not be inflated + ::start_instant:: Instant of the year when the update should start, if None will be January 1st + ::round_ndigits:: Number of digits to keep in the rounded result + """ + if (last_year is not None) and (last_year > base_year + 1): + for year in range(base_year + 1, last_year + 1): + inflate_parameters( + parameters, + inflator, + year - 1, + last_year=year, + ignore_missing_units=ignore_missing_units, + start_instant=start_instant, + round_ndigits=round_ndigits, + ) + else: + if last_year is None: + last_year = base_year + 1 + + assert last_year == base_year + 1 + + if isinstance(parameters, ParameterNode): + for sub_parameter in parameters.children.values(): + inflate_parameters( + sub_parameter, + inflator, + base_year, + last_year, + ignore_missing_units=ignore_missing_units, + start_instant=start_instant, + round_ndigits=round_ndigits, + ) + else: + acceptable_units = [ + "rate_unit", + "threshold_unit", + "unit", + ] + if ignore_missing_units: + if not hasattr(parameters, "metadata"): + return + if not bool(set(parameters.metadata.keys()) & set(acceptable_units)): + return + assert hasattr(parameters, "metadata"), f"{parameters.name} doesn't have metadata" + unit_types = set(parameters.metadata.keys()).intersection(set(acceptable_units)) + assert unit_types, ( + f"No admissible unit in metadata for parameter {parameters.name}. 
You may consider using " + "the option 'ignore_missing_units' from the inflate_parameters() function." ) + if len(unit_types) > 1: + assert unit_types == {"threshold_unit", "rate_unit"}, ( + f"Too many admissible units in metadata for parameter {parameters.name}" ) + unit_by_type = {unit_type: parameters.metadata[unit_type] for unit_type in unit_types} + for unit_type in unit_by_type: + if parameters.metadata[unit_type].startswith("currency"): + inflate_parameter_leaf( + parameters, + base_year, + inflator, + unit_type=unit_type, + start_instant=start_instant, + round_ndigits=round_ndigits, + ) + + +def inflate_parameter_leaf( + sub_parameter: Any, + base_year: int, + inflator: float, + unit_type: str = "unit", + start_instant: Optional[str] = None, + round_ndigits: int = 2, +) -> None: + """ + Inflate a Parameter leaf according to unit type for the year after base_year. + + ::sub_parameter:: a Parameter leaf + ::base_year:: base year of the parameter + ::inflator:: rate used to inflate the parameter + ::unit_type:: unit type assumed by default. Other admissible unit types are threshold_unit and rate_unit + ::start_instant:: Instant of the year when the update should start, if None will be January 1st + ::round_ndigits:: Number of digits to keep in the rounded result + """ + if isinstance(sub_parameter, Scale): + if unit_type == "threshold_unit": + for bracket in sub_parameter.brackets: + threshold = bracket.children["threshold"] + inflate_parameter_leaf( + threshold, base_year, inflator, start_instant=start_instant, round_ndigits=round_ndigits + ) + return + else: + kept_instants_str = [ + parameter_at_instant.instant_str + for parameter_at_instant in sub_parameter.values_list + if periods.instant(parameter_at_instant.instant_str).year <= base_year + ] + if not kept_instants_str: + return + + last_admissible_instant_str = max(kept_instants_str) + sub_parameter.update(start=last_admissible_instant_str, value=sub_parameter(last_admissible_instant_str)) + if start_instant is not None: + assert periods.instant(start_instant).year == (base_year + 1), ( + "Year of start_instant should be base_year + 1" + ) + value = ( + round(sub_parameter(f"{base_year}-12-31") * (1 + inflator), round_ndigits) + if sub_parameter(f"{base_year}-12-31") is not None + else None + ) + sub_parameter.update( + start=start_instant, + value=value, + ) + else: + restricted_to_base_year_value_list = [ + parameter_at_instant + for parameter_at_instant in sub_parameter.values_list + if periods.instant(parameter_at_instant.instant_str).year == base_year + ] + if restricted_to_base_year_value_list: + for parameter_at_instant in reversed(restricted_to_base_year_value_list): + if parameter_at_instant.instant_str.startswith(str(base_year)): + value = ( + round(parameter_at_instant.value * (1 + inflator), round_ndigits) + if parameter_at_instant.value is not None + else None + ) + sub_parameter.update( + start=parameter_at_instant.instant_str.replace(str(base_year), str(base_year + 1)), + value=value, + ) + else: + value = ( + round(sub_parameter(f"{base_year}-12-31") * (1 + inflator), round_ndigits) + if sub_parameter(f"{base_year}-12-31") is not None + else None + ) + sub_parameter.update( + start=f"{base_year + 1}-01-01", + value=value, + ) + + +def asof(tax_benefit_system: Any, instant: str | periods.Instant) -> None: + parameters = tax_benefit_system.parameters + parameters_asof(parameters, instant) + variables_asof(tax_benefit_system, instant) + + +def leaf_asof(sub_parameter: Any, instant: periods.Instant) -> None: 
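+    # Keep only the values dated at or before `instant`, then pin the parameter to the latest admissible value (emptying values_list when none remains).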
kept_instants_str = [ + parameter_at_instant.instant_str + for parameter_at_instant in sub_parameter.values_list + if periods.instant(parameter_at_instant.instant_str) <= instant + ] + if not kept_instants_str: + sub_parameter.values_list = [] + return + + last_admissible_instant_str = max(kept_instants_str) + sub_parameter.update(start=last_admissible_instant_str, value=sub_parameter(last_admissible_instant_str)) + + +def parameters_asof(parameters: ParameterNode | Any, instant: str | periods.Instant) -> None: + if isinstance(instant, str): + instant = periods.instant(instant) + assert isinstance(instant, periods.Instant) + + for sub_parameter in parameters.children.values(): + if isinstance(sub_parameter, ParameterNode): + parameters_asof(sub_parameter, instant) + else: + if isinstance(sub_parameter, Scale): + for bracket in sub_parameter.brackets: + threshold = bracket.children["threshold"] + rate = bracket.children.get("rate") + amount = bracket.children.get("amount") + leaf_asof(threshold, instant) + if rate: + leaf_asof(rate, instant) + if amount: + leaf_asof(amount, instant) + else: + leaf_asof(sub_parameter, instant) + + +def variables_asof( + tax_benefit_system: Any, + instant: str | periods.Instant, + variables_list: Optional[list[str]] = None, +) -> None: + if isinstance(instant, str): + instant = periods.instant(instant) + assert isinstance(instant, periods.Instant) + + if variables_list is None: + variables_list = tax_benefit_system.variables.keys() + + for variable_name, variable in tax_benefit_system.variables.items(): + if variable_name in variables_list: + formulas = variable.formulas + for instant_str in list(formulas.keys()): + if periods.instant(instant_str) > instant: + del formulas[instant_str] + + if variable.end is not None and periods.instant(variable.end) >= instant: + variable.end = None diff --git a/openfisca_survey_manager/policy/matching.py b/openfisca_survey_manager/policy/matching.py new file mode 100644 index 00000000..06f8e996 --- /dev/null +++ b/openfisca_survey_manager/policy/matching.py @@ -0,0 +1,380 @@ +"""Nearest-neighbor donor (NND) hot deck matching — pure Python or R (StatMatch).""" + +from __future__ import annotations + +import logging +from pathlib import Path +from typing import List, Optional, Union + +import numpy as np +import pandas as pd + +from openfisca_survey_manager.configuration.paths import openfisca_survey_manager_location + +log = logging.getLogger(__name__) + +config_files_directory = Path(openfisca_survey_manager_location) + + +def _normalize_list( + x: Optional[Union[str, List[str]]], + name: str = "variables", +) -> Optional[list[str]]: + """Return a list of variable names from str or list.""" + if x is None: + return None + if isinstance(x, str): + return [x] + return list(x) + + +def _nnd_hotdeck_python( + receiver: pd.DataFrame, + donor: pd.DataFrame, + matching_variables: list[str], + donor_classes: list[str] | str | None = None, + dist_fun: str = "Manhattan", + random_state: int | None = None, +) -> np.ndarray: + """ + Nearest-neighbor donor matching in pure Python (pandas + numpy). + + For each receiver row, finds the donor row that minimizes distance on + `matching_variables`. Optionally restricts to donors in the same + `donor_classes`. Ties are broken at random. + + Parameters + ---------- + receiver, donor : DataFrame + Recipient and donor datasets; must contain `matching_variables` + (and `donor_classes` if provided). Matching variables must be numeric + for Manhattan/Euclidean. 
+ matching_variables : list of str + Column names used to compute distance. + donor_classes : str or list of str, optional + Columns defining donation classes; matching is done only within + the same class. Must not contain missing values. + dist_fun : str + "Manhattan" (default) or "Euclidean". + random_state : int, optional + Seed for breaking ties. + + Returns + ------- + mtc_ids : ndarray of int + Shape (len(receiver), 2): (receiver_index, donor_index) for each row. + Receiver index is 0..n_rec-1, donor index is 0..n_don-1. + """ + rng = np.random.default_rng(random_state) + match_vars = _normalize_list(matching_variables) + don_class = _normalize_list(donor_classes) if donor_classes is not None else None + + for col in match_vars: + if col not in receiver.columns or col not in donor.columns: + raise ValueError(f"Matching variable {col!r} missing in receiver or donor") + if don_class: + for col in don_class: + if col not in receiver.columns or col not in donor.columns: + raise ValueError(f"Donor class variable {col!r} missing in receiver or donor") + + x_rec = receiver[match_vars].astype(float).values + x_don = donor[match_vars].astype(float).values + n_rec, n_don = len(receiver), len(donor) + if n_don == 0: + raise ValueError("Donor dataframe is empty") + + if dist_fun == "Manhattan": + + def dist_fn(donors: np.ndarray, rec_row: np.ndarray) -> np.ndarray: + return np.sum(np.abs(donors - rec_row), axis=1) + + elif dist_fun == "Euclidean": + + def dist_fn(donors: np.ndarray, rec_row: np.ndarray) -> np.ndarray: + return np.sqrt(np.sum((donors - rec_row) ** 2, axis=1)) + + else: + raise ValueError(f"dist_fun must be 'Manhattan' or 'Euclidean', got {dist_fun!r}") + + if don_class is None: + # Global matching: for each receiver row, min distance over all donors + donor_ix = np.zeros(n_rec, dtype=np.intp) + for i in range(n_rec): + d = dist_fn(x_don, x_rec[i]) + min_d = np.min(d) + candidates = np.where(d == min_d)[0] + donor_ix[i] = rng.choice(candidates) + mtc_ids = np.column_stack([np.arange(n_rec), donor_ix]) + return mtc_ids + + # Within-class matching: for each group, match receiver rows to donors in same group + rec_groups = receiver.groupby(don_class, sort=False) + don_groups = donor.groupby(don_class, sort=False) + donor_iloc = np.zeros(n_rec, dtype=np.intp) + + for key, rec_grp in rec_groups: + try: + don_grp = don_groups.get_group(key) + except KeyError: + log.warning("No donors for class %s; receiver rows get donor 0", key) + continue + x_r = rec_grp[match_vars].astype(float).values + x_d = don_grp[match_vars].astype(float).values + n_r, n_d = len(rec_grp), len(don_grp) + if n_d == 0: + continue + # Receiver global ilocs for this group + rec_global_ilocs = receiver.index.get_indexer(rec_grp.index) + for j in range(n_r): + d = dist_fn(x_d, x_r[j]) + min_d = np.min(d) + candidates = np.where(d == min_d)[0] + don_local = rng.choice(candidates) + don_global_iloc = donor.index.get_loc(don_grp.index[don_local]) + donor_iloc[rec_global_ilocs[j]] = don_global_iloc + mtc_ids = np.column_stack([np.arange(n_rec), donor_iloc]) + return mtc_ids + + +def _create_fused_python( + receiver: pd.DataFrame, + donor: pd.DataFrame, + mtc_ids: np.ndarray, + z_variables: list[str], + dup_x: bool = False, + matching_variables: list[str] | None = None, +) -> pd.DataFrame: + """ + Build fused dataset: receiver plus z_variables from matched donors. + + mtc_ids : shape (n_receiver, 2), second column is donor position (integer). 
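+    When `dup_x` is true and `matching_variables` is given, the donor's
+    matching variables are additionally copied under a `_donor` suffix.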
+ """ + z_vars = _normalize_list(z_variables) + for col in z_vars: + if col not in donor.columns: + raise ValueError(f"z_variable {col!r} not in donor") + fused = receiver.copy() + don_pos = mtc_ids[:, 1] + for col in z_vars: + fused[col] = donor[col].iloc[don_pos].values + if dup_x and matching_variables: + match_vars = _normalize_list(matching_variables) + for col in match_vars: + if col in donor.columns: + fused[col + "_donor"] = donor[col].iloc[don_pos].values + return fused + + +def nnd_hotdeck( + receiver: pd.DataFrame | None = None, + donor: pd.DataFrame | None = None, + matching_variables: str | list[str] | None = None, + z_variables: str | list[str] | None = None, + donor_classes: str | list[str] | None = None, + dist_fun: str = "Manhattan", + use_r: bool = False, + random_state: int | None = None, +) -> tuple[pd.DataFrame, pd.DataFrame]: + """ + Nearest-neighbor donor (NND) hot deck: match each receiver row to a donor, + then fuse z_variables from donor into receiver. + + By default uses a **pure Python** implementation (pandas + numpy). + Set `use_r=True` to use R's StatMatch via rpy2 (same API as before). + + Parameters + ---------- + receiver, donor : DataFrame + Recipient and donor datasets. + matching_variables : str or list of str + Columns used to compute distance (must be numeric for Manhattan/Euclidean). + z_variables : str or list of str + Donor columns to copy into the fused dataset. + donor_classes : str or list of str, optional + Match only within the same class (e.g. same sex). + dist_fun : str + "Manhattan" (default) or "Euclidean" (pure Python); R supports more. + use_r : bool + If True, use R StatMatch via rpy2; otherwise use pure Python. + random_state : int, optional + Seed for tie-breaking (pure Python only). + + Returns + ------- + fused_0, fused_1 : DataFrame + fused_0: receiver + z_variables from donor (no duplicate match vars). + fused_1: same with matching variables duplicated as _donor (if applicable). 
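+
+    Example
+    -------
+    A minimal sketch (hypothetical column names)::
+
+        receiver = pd.DataFrame({"age": [25.0, 40.0]})
+        donor = pd.DataFrame({"age": [24.0, 41.0], "income": [1200.0, 2300.0]})
+        fused_0, _ = nnd_hotdeck(
+            receiver=receiver,
+            donor=donor,
+            matching_variables="age",
+            z_variables="income",
+            random_state=0,
+        )
+        # fused_0 now carries an "income" column taken from each nearest donor.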
+ """ + assert receiver is not None and donor is not None + assert matching_variables is not None and z_variables is not None + match_vars = _normalize_list(matching_variables) + z_vars = _normalize_list(z_variables) + + if use_r: + return _nnd_hotdeck_rpy2( + receiver=receiver, + donor=donor, + matching_variables=match_vars, + z_variables=z_vars, + donor_classes=donor_classes, + ) + + mtc_ids = _nnd_hotdeck_python( + receiver, + donor, + match_vars, + donor_classes=donor_classes, + dist_fun=dist_fun, + random_state=random_state, + ) + fused_0 = _create_fused_python(receiver, donor, mtc_ids, z_vars, dup_x=False) + fused_1 = _create_fused_python(receiver, donor, mtc_ids, z_vars, dup_x=True, matching_variables=match_vars) + return fused_0, fused_1 + + +def _nnd_hotdeck_rpy2( + receiver: pd.DataFrame, + donor: pd.DataFrame, + matching_variables: list[str], + z_variables: list[str], + donor_classes: str | list[str] | None = None, +) -> tuple[pd.DataFrame, pd.DataFrame]: + """R (StatMatch) implementation via rpy2; same return as nnd_hotdeck.""" + from rpy2.robjects import pandas2ri + from rpy2.robjects.packages import importr + + pandas2ri.activate() + stat_match = importr("StatMatch") + + if donor_classes is not None: + don_class = _normalize_list(donor_classes) + for col in don_class: + if col not in receiver.columns or col not in donor.columns: + raise ValueError(f"Donor class variable {col!r} missing") + out_nnd = stat_match.NND_hotdeck( + data_rec=receiver, + data_don=donor, + match_vars=pd.Series(matching_variables), + don_class=pd.Series(don_class), + ) + else: + out_nnd = stat_match.NND_hotdeck( + data_rec=receiver, + data_don=donor, + match_vars=pd.Series(matching_variables), + ) + + fused_0 = pandas2ri.ri2py( + stat_match.create_fused(data_rec=receiver, data_don=donor, mtc_ids=out_nnd[0], z_vars=pd.Series(z_variables)) + ) + fused_1 = pandas2ri.ri2py( + stat_match.create_fused( + data_rec=receiver, + data_don=donor, + mtc_ids=out_nnd[0], + z_vars=pd.Series(z_variables), + dup_x=True, + match_vars=pd.Series(matching_variables), + ) + ) + return fused_0, fused_1 + + +def nnd_hotdeck_using_rpy2( + receiver: pd.DataFrame | None = None, + donor: pd.DataFrame | None = None, + matching_variables: str | list[str] | None = None, + z_variables: str | list[str] | None = None, + donor_classes: str | list[str] | None = None, +) -> tuple[pd.DataFrame, pd.DataFrame]: + """ + NND hot deck via R (StatMatch). Prefer `nnd_hotdeck(..., use_r=True)`. 
+ """ + return nnd_hotdeck( + receiver=receiver, + donor=donor, + matching_variables=matching_variables, + z_variables=z_variables, + donor_classes=donor_classes, + use_r=True, + ) + + +def nnd_hotdeck_using_feather( + receiver: pd.DataFrame | None = None, + donor: pd.DataFrame | None = None, + matching_variables: str | list[str] | None = None, + z_variables: str | list[str] | None = None, +) -> None: + """ + Not working + """ + import feather + + assert receiver is not None and donor is not None + assert matching_variables is not None + + temporary_directory_path = config_files_directory / "tmp" + assert temporary_directory_path.exists() + receiver_path = temporary_directory_path / "receiver.feather" + donor_path = temporary_directory_path / "donor.feather" + feather.write_dataframe(receiver, receiver_path) + feather.write_dataframe(donor, donor_path) + if isinstance(matching_variables, str): + match_vars = f'"{matching_variables}"' + elif len(matching_variables) == 1: + match_vars = f'"{matching_variables[0]}"' + else: + match_vars = '"{}"'.format("todo") + + r_script = f""" +rm(list=ls()) +gc() +devtools::install_github("wesm/feather/R") +library(feather) +library(StatMatch) + +receiver <- read_feather({receiver_path}) +donor <- read_feather({donor_path}) +summary(receiver) +summary(donor) + +# variables +receiver = as.data.frame(receiver) +donor = as.data.frame(donor) +gc() +match_vars = {match_vars} +# don_class = c("sexe") +out.nnd <- NND.hotdeck( + data.rec = receiver, data.don = donor, match.vars = match_vars + ) + +# out.nndsummary(out.nnd$mtc.ids) +# head(out.nnd$mtc.ids, 10) +# head(receiver, 10) + +fused.nnd.m <- create.fused( + data.rec = receiver, data.don = donor, + mtc.ids = out.nnd$mtc.ids, + z.vars = "{z_variables}" + ) +summary(fused.nnd.m) +""" + log.debug("%s", r_script) + + +if __name__ == "__main__": + log.setLevel(logging.INFO) + # Minimal example: pure Python NND hot deck (no R required) + np.random.seed(42) + receiver = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}) + donor = pd.DataFrame({"x": [1.1, 2.2, 2.9], "y": [10.5, 19.0, 31.0], "ident": [100, 200, 300]}) + fused_0, fused_1 = nnd_hotdeck( + receiver=receiver, + donor=donor, + matching_variables=["x", "y"], + z_variables="ident", + random_state=42, + ) + log.info("fused_0 (receiver + z from donor):\n%s", fused_0) + log.info("fused_1 (with _donor dup):\n%s", fused_1) diff --git a/openfisca_survey_manager/scenarios/__init__.py b/openfisca_survey_manager/policy/py.typed similarity index 100% rename from openfisca_survey_manager/scenarios/__init__.py rename to openfisca_survey_manager/policy/py.typed diff --git a/openfisca_survey_manager/policy/scenarios/__init__.py b/openfisca_survey_manager/policy/scenarios/__init__.py new file mode 100644 index 00000000..aa248f89 --- /dev/null +++ b/openfisca_survey_manager/policy/scenarios/__init__.py @@ -0,0 +1,4 @@ +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.policy.scenarios.reform_scenario import ReformScenario + +__all__ = ["AbstractSurveyScenario", "ReformScenario"] diff --git a/openfisca_survey_manager/scenarios/abstract_scenario.py b/openfisca_survey_manager/policy/scenarios/abstract_scenario.py similarity index 87% rename from openfisca_survey_manager/scenarios/abstract_scenario.py rename to openfisca_survey_manager/policy/scenarios/abstract_scenario.py index 57fe9493..6c5ef32e 100644 --- a/openfisca_survey_manager/scenarios/abstract_scenario.py +++ 
b/openfisca_survey_manager/policy/scenarios/abstract_scenario.py @@ -2,7 +2,7 @@ import logging from pathlib import Path -from typing import Optional, Union +from typing import Any, Optional, Union import numpy as np import pandas as pd @@ -11,10 +11,10 @@ from openfisca_core.tools.simulation_dumper import dump_simulation, restore_simulation from openfisca_core.types import Array, Period, TaxBenefitSystem -from openfisca_survey_manager.calibration import Calibration +from openfisca_survey_manager.core.survey import Survey from openfisca_survey_manager.exceptions import SurveyManagerError -from openfisca_survey_manager.simulations import Simulation -from openfisca_survey_manager.surveys import Survey +from openfisca_survey_manager.policy import Simulation +from openfisca_survey_manager.policy.calibration import Calibration log = logging.getLogger(__name__) @@ -44,11 +44,16 @@ class AbstractSurveyScenario: varying_variable = None weight_variable_by_entity = None - def build_input_data(self, **kwargs): + def build_input_data(self, **kwargs: Any) -> None: """Build input data.""" raise NotImplementedError - def calculate_series(self, variable, period=None, simulation=None): + def calculate_series( + self, + variable: str, + period: Optional[Union[int, str, Period]] = None, + simulation: Optional[str] = None, + ) -> pd.Series: """Compute variable values for period for a given simulation. Args: @@ -65,7 +70,12 @@ def calculate_series(self, variable, period=None, simulation=None): name=variable, ) - def calculate_variable(self, variable, period=None, simulation=None): + def calculate_variable( + self, + variable: str, + period: Optional[Union[int, str, Period]] = None, + simulation: Optional[str] = None, + ) -> Array: """Compute variable values for period for a given simulation. Args: @@ -303,7 +313,7 @@ def compute_marginal_tax_rate( else: target_variable_entity_key = variables[target_variable].entity.key - def cast_to_target_entity(simulation: Simulation): + def cast_to_target_entity(simulation: Simulation) -> np.ndarray: population = simulation.populations[target_variable_entity_key] df = ( pd.DataFrame( @@ -337,21 +347,21 @@ def cast_to_target_entity(simulation: Simulation): def compute_pivot_table( self, - aggfunc="mean", - columns=None, - baseline_simulation=None, - filter_by=None, - index=None, - period=None, - simulation=None, - difference=False, - use_baseline_for_columns=None, - values=None, - missing_variable_default_value=np.nan, - concat_axis=None, - weighted=True, - alternative_weights=None, - ): + aggfunc: str = "mean", + columns: Optional[list[str]] = None, + baseline_simulation: Optional[str] = None, + filter_by: Optional[str] = None, + index: Optional[list[str]] = None, + period: Optional[Union[int, str, Period]] = None, + simulation: Optional[str] = None, + difference: bool = False, + use_baseline_for_columns: Optional[bool] = None, + values: Optional[list[str]] = None, + missing_variable_default_value: Any = np.nan, + concat_axis: Optional[int] = None, + weighted: bool = True, + alternative_weights: Optional[Union[str, int, float, Array]] = None, + ) -> pd.DataFrame: """Compute a pivot table of aggregated values cast along the specified index and columns. 
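        For example, index=["age_group"], columns=["decile"] and values=["income_tax"]
        (illustrative variable names) would yield the mean income tax cross-tabulated
        by age group and decile, since aggfunc defaults to "mean".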
Args: @@ -405,17 +415,17 @@ def compute_pivot_table( def compute_winners_losers( self, - variable, - simulation, - baseline_simulation=None, - filter_by=None, - period=None, - absolute_minimal_detected_variation=0, - relative_minimal_detected_variation=0.01, - observations_threshold=None, - weighted=True, - alternative_weights=None, - ): + variable: str, + simulation: str, + baseline_simulation: Optional[str] = None, + filter_by: Optional[str] = None, + period: Optional[Union[int, str, Period]] = None, + absolute_minimal_detected_variation: float = 0, + relative_minimal_detected_variation: float = 0.01, + observations_threshold: Optional[int] = None, + weighted: bool = True, + alternative_weights: Optional[Union[str, int, float, Array]] = None, + ) -> dict[str, Union[int, float]]: simulation = self.simulations[simulation] if baseline_simulation: baseline_simulation = self.simulations[baseline_simulation] @@ -434,8 +444,15 @@ def compute_winners_losers( ) def create_data_frame_by_entity( - self, variables=None, expressions=None, filter_by=None, index=False, period=None, simulation=None, merge=False - ): + self, + variables: Optional[list[str]] = None, + expressions: Optional[list[str]] = None, + filter_by: Optional[str] = None, + index: bool = False, + period: Optional[Union[int, str, Period]] = None, + simulation: Optional[str] = None, + merge: bool = False, + ) -> Union[pd.DataFrame, dict[str, pd.DataFrame]]: """Create dataframe(s) of computed variables for every entity (optionally merged into a single dataframe). Args: @@ -466,7 +483,11 @@ def create_data_frame_by_entity( merge=merge, ) - def custom_input_data_frame(self, input_data_frame, **kwargs): + def custom_input_data_frame( + self, + input_data_frame: pd.DataFrame, + **kwargs: Any, + ) -> None: """Customize input data frame. Args: @@ -475,7 +496,12 @@ def custom_input_data_frame(self, input_data_frame, **kwargs): """ pass - def dump_data_frame_by_entity(self, variables=None, survey_collection=None, survey_name=None): + def dump_data_frame_by_entity( + self, + variables: Optional[list[str]] = None, + survey_collection: Optional[Any] = None, + survey_name: Optional[str] = None, + ) -> None: assert survey_collection is not None assert survey_name is not None assert variables is not None @@ -486,7 +512,7 @@ def dump_data_frame_by_entity(self, variables=None, survey_collection=None, surv survey_collection.surveys.append(survey) survey_collection.dump(collection="openfisca") - def dump_simulations(self, directory: str): + def dump_simulations(self, directory: str) -> None: """ Dump simulations. 
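        A minimal sketch (directory path is illustrative):

            scenario.dump_simulations("/tmp/simulation_dumps")
            # ...later, restore into a scenario built on the same tax and benefit systems:
            scenario.restore_simulations("/tmp/simulation_dumps")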
@@ -504,7 +530,7 @@ def dump_simulations(self, directory: str): simulation = next(iter(self.simulations.values())) dump_simulation(simulation, directory) - def generate_performance_data(self, output_dir: str): + def generate_performance_data(self, output_dir: str) -> None: if not self.trace: raise SurveyManagerError("Method generate_performance_data cannot be used if trace hasn't been activated.") @@ -517,7 +543,12 @@ def generate_performance_data(self, output_dir: str): simulation.tracer.generate_performance_graph(simulation_dir) simulation.tracer.generate_performance_tables(simulation_dir) - def inflate(self, inflator_by_variable=None, period=None, target_by_variable=None): + def inflate( + self, + inflator_by_variable: Optional[dict[str, float]] = None, + period: Optional[Union[int, str, Period]] = None, + target_by_variable: Optional[dict[str, float]] = None, + ) -> None: assert inflator_by_variable or target_by_variable assert period is not None inflator_by_variable = {} if inflator_by_variable is None else inflator_by_variable @@ -530,14 +561,14 @@ def inflate(self, inflator_by_variable=None, period=None, target_by_variable=Non def init_from_data( self, - calibration_kwargs=None, - inflation_kwargs=None, - rebuild_input_data=False, - rebuild_kwargs=None, - data=None, - memory_config=None, - use_marginal_tax_rate=False, - ): + calibration_kwargs: Optional[dict[str, Any]] = None, + inflation_kwargs: Optional[dict[str, Any]] = None, + rebuild_input_data: bool = False, + rebuild_kwargs: Optional[dict[str, Any]] = None, + data: Optional[dict[str, Any]] = None, + memory_config: Optional[Any] = None, + use_marginal_tax_rate: bool = False, + ) -> None: """Initialise a survey scenario from data. Args: @@ -601,8 +632,14 @@ def init_from_data( self.inflate(**inflation_kwargs) def new_simulation( - self, simulation_name, debug=False, trace=False, data=None, memory_config=None, marginal_tax_rate_only=False - ): + self, + simulation_name: str, + debug: bool = False, + trace: bool = False, + data: Optional[dict[str, Any]] = None, + memory_config: Optional[Any] = None, + marginal_tax_rate_only: bool = False, + ) -> Simulation: tax_benefit_system = self.tax_benefit_systems[simulation_name] assert tax_benefit_system is not None @@ -643,13 +680,13 @@ def new_simulation( return simulation - def memory_usage(self): + def memory_usage(self) -> None: """Log memory usage.""" for simulation_name, simulation in self.simulations.items(): log.info("simulation: %s", simulation_name) simulation.print_memory_usage() - def neutralize_variables(self, tax_benefit_system): + def neutralize_variables(self, tax_benefit_system: TaxBenefitSystem) -> None: """Neutralizes input variables not in the input dataframe and keeps some crucial variables. Args: @@ -668,7 +705,7 @@ def neutralize_variables(self, tax_benefit_system): tax_benefit_system.neutralize_variable(variable_name) - def restore_simulations(self, directory, **kwargs): + def restore_simulations(self, directory: Union[str, Path], **kwargs: Any) -> None: """Restores SurveyScenario's simulations. Args: @@ -690,7 +727,7 @@ def restore_simulations(self, directory, **kwargs): simulation.id_variable_by_entity_key = self.id_variable_by_entity_key self.simulations["unique_simulation"] = simulation - def set_input_data_frame(self, input_data_frame): + def set_input_data_frame(self, input_data_frame: pd.DataFrame) -> None: """Set the input dataframe. 
Args: @@ -713,7 +750,10 @@ def set_tax_benefit_systems(self, tax_benefit_systems: dict[str, TaxBenefitSyste # self.tax_benefit_systems = tax_benefit_systems - def set_weight_variable_by_entity(self, weight_variable_by_entity=None): + def set_weight_variable_by_entity( + self, + weight_variable_by_entity: Optional[dict[str, str]] = None, + ) -> None: if weight_variable_by_entity is not None: self.weight_variable_by_entity = weight_variable_by_entity @@ -721,7 +761,12 @@ def set_weight_variable_by_entity(self, weight_variable_by_entity=None): for simulation in self.simulations.values(): simulation.set_weight_variable_by_entity(self.weight_variable_by_entity) - def summarize_variable(self, variable=None, weighted=False, force_compute=False): + def summarize_variable( + self, + variable: Optional[str] = None, + weighted: bool = False, + force_compute: bool = False, + ) -> None: """Log a summary of a variable including its memory usage for all the simulations. Args: @@ -740,12 +785,20 @@ def summarize_variable(self, variable=None, weighted=False, force_compute=False) for _simulation_name, simulation in self.simulations.items(): simulation.summarize_variable(variable, weighted, force_compute) - def _apply_modification(self, simulation, period): + def _apply_modification( + self, + simulation: Simulation, + period: Union[int, str, Period], + ) -> None: period = periods.period(period) varying_variable = self.varying_variable definition_period = simulation.tax_benefit_system.variables[varying_variable].definition_period - def set_variable(varying_variable, varying_variable_value, period_): + def set_variable( + varying_variable: str, + varying_variable_value: np.ndarray, + period_: Period, + ) -> None: delta = self.variation_factor * varying_variable_value new_variable_value = varying_variable_value + delta simulation.delete_arrays(varying_variable, period_) diff --git a/openfisca_survey_manager/scenarios/reform_scenario.py b/openfisca_survey_manager/policy/scenarios/reform_scenario.py similarity index 82% rename from openfisca_survey_manager/scenarios/reform_scenario.py rename to openfisca_survey_manager/policy/scenarios/reform_scenario.py index 5c4673b2..cd056a56 100644 --- a/openfisca_survey_manager/scenarios/reform_scenario.py +++ b/openfisca_survey_manager/policy/scenarios/reform_scenario.py @@ -1,14 +1,14 @@ """Reform survey scenario definition.""" import logging -from typing import Optional, Union +from typing import Any, Optional, Union import numpy as np import pandas as pd from openfisca_core.types import Array, Period -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario -from openfisca_survey_manager.simulations import Simulation +from openfisca_survey_manager.policy import Simulation +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario log = logging.getLogger(__name__) @@ -16,7 +16,7 @@ class ReformScenario(AbstractSurveyScenario): """Reform survey scenario.""" - def _get_simulation(self, use_baseline: bool = False): + def _get_simulation(self, use_baseline: bool = False) -> Simulation: """ Get the relevant simulation @@ -32,11 +32,16 @@ def _get_simulation(self, use_baseline: bool = False): assert simulation is not None, f"{simulation_name} does not exist" return simulation - def build_input_data(self, **kwargs): + def build_input_data(self, **kwargs: Any) -> None: """Build input data.""" raise NotImplementedError - def calculate_series(self, variable, period=None, use_baseline=False): + def calculate_series( 
+        self, + variable: str, + period: Optional[Union[int, str, Period]] = None, + use_baseline: bool = False, + ) -> pd.Series: """Compute variable values for period, using the baseline or the reform tax and benefit system. Args: @@ -53,7 +58,12 @@ def calculate_series(self, variable, period=None, use_baseline=False): name=variable, ) - def calculate_variable(self, variable, period=None, use_baseline=False): + def calculate_variable( + self, + variable: str, + period: Optional[Union[int, str, Period]] = None, + use_baseline: bool = False, + ) -> Array: """Compute variable values for period, using the baseline or the reform tax and benefit system. Args: @@ -195,20 +205,20 @@ def compute_marginal_tax_rate( def compute_pivot_table( self, - aggfunc="mean", - columns=None, - difference=False, - filter_by=None, - index=None, - period=None, - use_baseline=False, - use_baseline_for_columns=None, - values=None, - missing_variable_default_value=np.nan, - concat_axis=None, - weighted=True, - alternative_weights=None, - ): + aggfunc: str = "mean", + columns: Optional[list[str]] = None, + difference: bool = False, + filter_by: Optional[str] = None, + index: Optional[list[str]] = None, + period: Optional[Union[int, str, Period]] = None, + use_baseline: bool = False, + use_baseline_for_columns: Optional[bool] = None, + values: Optional[list[str]] = None, + missing_variable_default_value: Any = np.nan, + concat_axis: Optional[int] = None, + weighted: bool = True, + alternative_weights: Optional[Union[str, int, float, Array]] = None, + ) -> pd.DataFrame: filtering_variable_by_entity = self.filtering_variable_by_entity return Simulation.compute_pivot_table( @@ -231,15 +241,15 @@ def compute_pivot_table( def compute_winners_losers( self, - variable=None, - filter_by=None, - period=None, - absolute_minimal_detected_variation=0, - relative_minimal_detected_variation=0.01, - observations_threshold=None, - weighted=True, - alternative_weights=None, - ): + variable: Optional[str] = None, + filter_by: Optional[str] = None, + period: Optional[Union[int, str, Period]] = None, + absolute_minimal_detected_variation: float = 0, + relative_minimal_detected_variation: float = 0.01, + observations_threshold: Optional[int] = None, + weighted: bool = True, + alternative_weights: Optional[Union[str, int, float, Array]] = None, + ) -> dict[str, Union[int, float]]: return super().compute_winners_losers( simulation="reform", baseline_simulation="baseline", @@ -259,14 +269,14 @@ def compute_winners_losers( def create_data_frame_by_entity( self, - variables=None, - expressions=None, - filter_by=None, - index=False, - period=None, - use_baseline=False, - merge=False, - ): + variables: Optional[list[str]] = None, + expressions: Optional[list[str]] = None, + filter_by: Optional[str] = None, + index: bool = False, + period: Optional[Union[int, str, Period]] = None, + use_baseline: bool = False, + merge: bool = False, + ) -> Union[pd.DataFrame, dict[str, pd.DataFrame]]: """Create dataframe(s) of computed variables for every entity (optionally merged into a single dataframe). 
Args: diff --git a/openfisca_survey_manager/simulation_builder.py b/openfisca_survey_manager/policy/simulation_builder.py similarity index 90% rename from openfisca_survey_manager/simulation_builder.py rename to openfisca_survey_manager/policy/simulation_builder.py index 1feaad1f..9b9594fb 100644 --- a/openfisca_survey_manager/simulation_builder.py +++ b/openfisca_survey_manager/policy/simulation_builder.py @@ -1,7 +1,14 @@ +"""Simulation builder extensions for survey manager.""" + +from __future__ import annotations + import logging +from typing import Any, Optional +import pandas as pd from openfisca_core.model_api import MONTH, YEAR from openfisca_core.simulations.simulation_builder import SimulationBuilder +from openfisca_core.types import TaxBenefitSystem from openfisca_survey_manager.exceptions import SurveyManagerError @@ -17,7 +24,10 @@ # Helpers -def diagnose_variable_mismatch(used_as_input_variables, input_data_frame): +def diagnose_variable_mismatch( + used_as_input_variables: Optional[list[str]], + input_data_frame: pd.DataFrame, +) -> None: """Diagnose variable mismatch. Args: @@ -41,7 +51,7 @@ def diagnose_variable_mismatch(used_as_input_variables, input_data_frame): # SimulationBuilder monkey-patched methods -def _set_id_variable_by_entity_key(builder) -> dict[str, str]: +def _set_id_variable_by_entity_key(builder: SimulationBuilder) -> dict[str, str]: """Identify and set the correct ids for the different entities.""" if builder.id_variable_by_entity_key is None: log.debug("Use default id_variable names") @@ -52,7 +62,7 @@ def _set_id_variable_by_entity_key(builder) -> dict[str, str]: return builder.id_variable_by_entity_key -def _set_role_variable_by_entity_key(builder) -> dict[str, str]: +def _set_role_variable_by_entity_key(builder: SimulationBuilder) -> dict[str, str]: """Identify and set the correct roles for the different entities.""" if builder.role_variable_by_entity_key is None: builder.role_variable_by_entity_key = { @@ -62,7 +72,7 @@ def _set_role_variable_by_entity_key(builder) -> dict[str, str]: return builder.role_variable_by_entity_key -def _set_used_as_input_variables_by_entity(builder) -> dict[str, list[str]]: +def _set_used_as_input_variables_by_entity(builder: SimulationBuilder) -> Optional[dict[str, list[str]]]: """Identify and set the correct input variables for the different entities.""" if builder.used_as_input_variables_by_entity is not None: return @@ -88,7 +98,7 @@ def _set_used_as_input_variables_by_entity(builder) -> dict[str, list[str]]: return builder.used_as_input_variables_by_entity -def filter_input_variables(builder, input_data_frame, tax_benefit_system): +def filter_input_variables( + builder: SimulationBuilder, + input_data_frame: pd.DataFrame, + tax_benefit_system: TaxBenefitSystem, +) -> pd.DataFrame: """Filter out of the input data frame the variables that won't be used or that are set to be computed. 
Args: @@ -152,7 +166,11 @@ def filter_input_variables(builder, input_data_frame, tax_benefit_system): return input_data_frame -def init_all_entities(builder, input_data_frame, period=None): +def init_all_entities( + builder: SimulationBuilder, + input_data_frame: pd.DataFrame, + period: Any = None, +) -> Any: assert period is not None log.debug(f"Initialising simulation using input_data_frame for period {period}") builder._set_id_variable_by_entity_key() @@ -182,7 +200,11 @@ def init_all_entities(builder, input_data_frame, period=None): return simulation -def init_entity_structure(builder, entity, input_data_frame): +def init_entity_structure( + builder: SimulationBuilder, + entity: Any, + input_data_frame: pd.DataFrame, +) -> None: """Initializes the simulation with tax_benefit_system entities and input_data_frame. Args: @@ -225,7 +247,11 @@ def init_entity_structure(builder, entity, input_data_frame): ) -def init_simulation_with_data_frame(builder, input_data_frame, period): +def init_simulation_with_data_frame( + builder: SimulationBuilder, + input_data_frame: pd.DataFrame, + period: Any, +) -> Any: """Initialize the simulation period with current input_data_frame for an entity if specified.""" used_as_input_variables = builder.used_as_input_variables id_variable_by_entity_key = builder.id_variable_by_entity_key diff --git a/openfisca_survey_manager/simulations.py b/openfisca_survey_manager/policy/simulations.py similarity index 97% rename from openfisca_survey_manager/simulations.py rename to openfisca_survey_manager/policy/simulations.py index 090ce1b8..fb9f3ffc 100644 --- a/openfisca_survey_manager/simulations.py +++ b/openfisca_survey_manager/policy/simulations.py @@ -17,11 +17,14 @@ from openfisca_core.types import Array, Period, TaxBenefitSystem from openfisca_core.types import CoreEntity as Entity +from openfisca_survey_manager.core.dataset import SurveyCollection, load_table from openfisca_survey_manager.exceptions import SurveyManagerError -from openfisca_survey_manager.simulation_builder import SimulationBuilder, diagnose_variable_mismatch -from openfisca_survey_manager.statshelpers import mark_weighted_percentiles -from openfisca_survey_manager.survey_collections import SurveyCollection -from openfisca_survey_manager.utils import do_nothing, load_table +from openfisca_survey_manager.policy.legislation_asof import do_nothing +from openfisca_survey_manager.policy.simulation_builder import ( + SimulationBuilder, + diagnose_variable_mismatch, +) +from openfisca_survey_manager.policy.statshelpers import mark_weighted_percentiles log = logging.getLogger(__name__) @@ -29,7 +32,10 @@ # Helpers -def assert_variables_in_same_entity(tax_benefit_system: TaxBenefitSystem, variables: list): +def assert_variables_in_same_entity( + tax_benefit_system: TaxBenefitSystem, + variables: list[str], +) -> str: """ Assert that variables are in the same entity. @@ -52,7 +58,7 @@ def assert_variables_in_same_entity(tax_benefit_system: TaxBenefitSystem, variab return entity.key -def get_words(text: str): +def get_words(text: str) -> list[str]: return re.compile("[A-Za-z_]+[A-Za-z0-9_]*").findall(text) @@ -113,7 +119,7 @@ def compute_aggregate( missing_variable_default_value: Any = np.nan, weighted: bool = True, alternative_weights: Optional[Union[str, int, float, Array]] = None, - filtering_variable_by_entity: Optional[dict] = None, + filtering_variable_by_entity: Optional[dict[str, str]] = None, ) -> Optional[Union[float, int]]: """ Compute aggregate of a variable. 
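    Weighting uses the entity's weight variable unless weighted=False or
    alternative_weights is given; when neither weights nor alternatives are
    available the computation falls back to unweighted values.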
@@ -244,7 +250,7 @@ def compute_quantiles( filter_by: Optional[str] = None, weighted: bool = True, alternative_weights: Optional[Union[str, int, float, Array]] = None, - filtering_variable_by_entity: Optional[dict] = None, + filtering_variable_by_entity: Optional[dict[str, str]] = None, ) -> list[float]: """ Compute quantiles of a variable. @@ -296,8 +302,8 @@ def compute_quantiles( def compute_pivot_table( - simulation: Simulation = None, - baseline_simulation: Simulation = None, + simulation: Optional[Simulation] = None, + baseline_simulation: Optional[Simulation] = None, aggfunc: str = "mean", columns: Optional[list[str]] = None, difference: bool = False, @@ -310,8 +316,8 @@ def compute_pivot_table( concat_axis: Optional[int] = None, weighted: bool = True, alternative_weights: Optional[Union[str, int, float, Array]] = None, - filtering_variable_by_entity: Optional[dict] = None, -): + filtering_variable_by_entity: Optional[dict[str, str]] = None, +) -> pd.DataFrame: """ Compute pivot table. @@ -399,7 +405,7 @@ def compute_pivot_table( variables.add(weight_variable) else: - log.warn( + log.warning( f"There is no weight variable for entity {entity_key} nor alternative weights. " "Switch to unweighted" ) @@ -687,7 +693,7 @@ def compute_winners_losers( observations_threshold: Optional[int] = None, weighted: bool = True, alternative_weights: Optional[Union[str, int, float, Array]] = None, - filtering_variable_by_entity: Optional[dict] = None, + filtering_variable_by_entity: Optional[dict[str, str]] = None, ) -> dict[str, Union[int, float]]: """ Compute the number of winners and losers for a given variable. @@ -744,7 +750,7 @@ def compute_winners_losers( weight_variable = weight_variable_by_entity[entity_key] weight = baseline_simulation.calculate(weight_variable, period=period) else: - log.warn( + log.warning( f"There is no weight variable for entity {entity_key} nor alternative weights. Switch to unweighted" ) @@ -803,8 +809,8 @@ def init_entity_data( entity: Entity, filtered_input_data_frame: pd.DataFrame, period: Period, - used_as_input_variables_by_entity: dict, -): + used_as_input_variables_by_entity: dict[str, list[str]], +) -> None: """ Initialize entity in simulation at some period with input provided by a dataframe. 
@@ -837,9 +843,9 @@ def init_entity_data( def inflate( simulation: Simulation, - inflator_by_variable: Optional[dict] = None, + inflator_by_variable: Optional[dict[str, float]] = None, period: Optional[Union[int, str, Period]] = None, - target_by_variable: Optional[dict] = None, + target_by_variable: Optional[dict[str, float]] = None, ) -> None: tax_benefit_system = simulation.tax_benefit_system for variable_name in set(inflator_by_variable.keys()).union(set(target_by_variable.keys())): @@ -874,12 +880,12 @@ def inflate( def _load_table_for_survey( config_files_directory: str, collection: str, - survey: str, - table: str, + survey: Optional[str] = None, + table: Optional[str] = None, batch_size: Optional[int] = None, batch_index: Optional[int] = None, filter_by: Optional[str] = None, -): +) -> pd.DataFrame: if survey is not None: input_data_frame = load_table( config_files_directory=config_files_directory, @@ -905,15 +911,15 @@ def _load_table_for_survey( def _input_data_table_by_entity_by_period_monolithic( tax_benefit_system: TaxBenefitSystem, - simulation: Simulation, + simulation: Optional[Simulation], period: Period, - input_data_table_by_entity: dict, + input_data_table_by_entity: dict[str, Any], builder: SimulationBuilder, - custom_input_data_frame: Callable, + custom_input_data_frame: Callable[..., Any], config_files_directory: str, collection: str, survey: Optional[str] = None, -): +) -> Simulation: """ Initialize simulation with input data from a table for each entity and period. """ @@ -961,15 +967,15 @@ def _input_data_table_by_entity_by_period_monolithic( def _input_data_table_by_entity_by_period_batch( tax_benefit_system: TaxBenefitSystem, - simulation: Simulation, + simulation: Optional[Simulation], period: Period, - input_data_table_by_entity: dict, + input_data_table_by_entity: dict[str, Any], builder: SimulationBuilder, - custom_input_data_frame: Callable, + custom_input_data_frame: Callable[..., Any], config_files_directory: str, collection: str, survey: Optional[str] = None, -): +) -> Simulation: """ Initialize simulation with input data from a table for each entity and period. """ @@ -1047,8 +1053,8 @@ def _input_data_table_by_entity_by_period_batch( def init_simulation( tax_benefit_system: TaxBenefitSystem, period: Union[str, int, Period], - data: dict, -): + data: dict[str, Any], +) -> Simulation: builder = SimulationBuilder() builder.create_entities(tax_benefit_system) @@ -1186,7 +1192,7 @@ def init_variable_in_entity( variable_name: str, series: pd.Series, period: Period, -): +) -> None: variable = simulation.tax_benefit_system.variables[variable_name] # np.issubdtype cannot handle categorical variables @@ -1243,7 +1249,7 @@ def init_variable_in_entity( if variable.definition_period == YEAR and period.unit == MONTH: # Some variables defined for a year are present in month/quarter dataframes # Cleaning the dataframe would probably be better in the long run - log.warn( + log.warning( f"Trying to set a monthly value for variable {variable_name}, which is defined on a year. " "The monthly values you provided will be summed."
) @@ -1263,7 +1269,7 @@ def new_from_tax_benefit_system( debug: bool = False, trace: bool = False, data: Optional[dict] = None, - memory_config: MemoryConfig = None, + memory_config: Optional[MemoryConfig] = None, period: Optional[Union[int, str, Period]] = None, custom_initialize: Optional[Callable] = None, ) -> Simulation: @@ -1295,7 +1301,7 @@ def new_from_tax_benefit_system( return simulation -def print_memory_usage(simulation: Simulation): +def print_memory_usage(simulation: Simulation) -> None: """ Print memory usage. @@ -1337,7 +1343,7 @@ def print_memory_usage(simulation: Simulation): def set_weight_variable_by_entity( simulation: Simulation, - weight_variable_by_entity: dict, + weight_variable_by_entity: Optional[dict[str, str]], ) -> None: """ Set weight variable for each entity. @@ -1354,7 +1360,7 @@ def summarize_variable( variable: Optional[str] = None, weighted: bool = False, force_compute: bool = False, -): +) -> None: """Print a summary of a variable including its memory usage. Args: @@ -1428,7 +1434,7 @@ def summarize_variable( ) df = pd.DataFrame({variable: array}).replace(categories_by_index).astype(categories_type) df["weights"] = weights if weighted else 1 - groupby = df.groupby(variable)["weights"].sum() + groupby = df.groupby(variable, observed=False)["weights"].sum() total = groupby.sum() expr = [f" {index} = {row:.2e} ({row / total:.1%})" for index, row in groupby.items()] log.info("%s: %s.", period, ",".join(expr)) diff --git a/openfisca_survey_manager/statshelpers.py b/openfisca_survey_manager/policy/statshelpers.py similarity index 85% rename from openfisca_survey_manager/statshelpers.py rename to openfisca_survey_manager/policy/statshelpers.py index ade97abb..a55e2464 100644 --- a/openfisca_survey_manager/statshelpers.py +++ b/openfisca_survey_manager/policy/statshelpers.py @@ -1,3 +1,10 @@ +"""Statistical helpers (Gini, Lorenz, weighted percentiles, etc.).""" + +from __future__ import annotations + +import logging +from typing import Optional + import numpy as np import pandas as pd import weightedcalcs as wc @@ -5,8 +12,13 @@ from numpy import argsort, asarray, cumsum, linspace, ones, repeat, zeros from numpy import logical_and as and_ +log = logging.getLogger(__name__) + -def gini(values, weights=None): +def gini( + values: np.ndarray | pd.Series, + weights: Optional[np.ndarray | pd.Series] = None, +) -> float: """Computes Gini coefficient (normalized to 1). 
# Using fastgini formula: # i=N j=i @@ -43,7 +55,11 @@ def gini(values, weights=None): return gini -def kakwani(values, ineq_axis, weights=None): +def kakwani( + values: np.ndarray | pd.Series, + ineq_axis: np.ndarray | pd.Series, + weights: Optional[np.ndarray | pd.Series] = None, +) -> float: """Computes the Kakwani index Args: @@ -67,7 +83,10 @@ def kakwani(values, ineq_axis, weights=None): return simps((lcy - plcy), lcx) -def lorenz(values, weights=None): +def lorenz( + values: np.ndarray | pd.Series, + weights: Optional[np.ndarray | pd.Series] = None, +) -> tuple[np.ndarray, np.ndarray]: """Computes Lorenz curve coordinates (x, y) Args: @@ -90,7 +109,13 @@ def lorenz(values, weights=None): return x, y -def mark_weighted_percentiles(a, labels, weights, method, return_quantiles=False): +def mark_weighted_percentiles( + a: np.ndarray | pd.Series, + labels: np.ndarray | list, + weights: np.ndarray | pd.Series, + method: int, + return_quantiles: bool = False, +) -> np.ndarray | tuple[np.ndarray, list[float]]: """ Args: @@ -253,7 +278,11 @@ def mark_weighted_percentiles(a, labels, weights, method, return_quantiles=False return ret -def pseudo_lorenz(values, ineq_axis, weights=None): +def pseudo_lorenz( + values: np.ndarray | pd.Series, + ineq_axis: np.ndarray | pd.Series, + weights: Optional[np.ndarray | pd.Series] = None, +) -> tuple[np.ndarray, np.ndarray]: """Computes the pseudo Lorenz curve coordinates Args: @@ -276,7 +305,11 @@ def pseudo_lorenz(values, ineq_axis, weights=None): return x, y -def bottom_share(values, rank_from_bottom, weights=None): +def bottom_share( + values: np.ndarray | pd.Series, + rank_from_bottom: float, + weights: Optional[np.ndarray | pd.Series] = None, +) -> float: """ Args: @@ -304,7 +337,11 @@ def bottom_share(values, rank_from_bottom, weights=None): ).sum() -def top_share(values, rank_from_top, weights=None): +def top_share( + values: np.ndarray | pd.Series, + rank_from_top: float, + weights: Optional[np.ndarray | pd.Series] = None, +) -> float: """ Args: @@ -331,7 +368,12 @@ def top_share(values, rank_from_top, weights=None): ).sum() -def weighted_quantiles(data, labels, weights, return_quantiles=False): +def weighted_quantiles( + data: np.ndarray | pd.Series, + labels: np.ndarray | list, + weights: np.ndarray | pd.Series, + return_quantiles: bool = False, +) -> np.ndarray | tuple[np.ndarray, list[float]]: num_categories = len(labels) breaks = linspace(0, 1, num_categories + 1) quantiles = [wquantiles.quantile_1D(data, weights, mybreak) for mybreak in breaks[1:]] @@ -347,7 +389,12 @@ def weighted_quantiles(data, labels, weights, return_quantiles=False): return ret + 1 -def weightedcalcs_quantiles(data, labels, weights, return_quantiles=False): +def weightedcalcs_quantiles( + data: np.ndarray | pd.Series, + labels: np.ndarray | list, + weights: np.ndarray | pd.Series, + return_quantiles: bool = False, +) -> np.ndarray | tuple[np.ndarray, list[float]]: calc = wc.Calculator("weights") num_categories = len(labels) breaks = linspace(0, 1, num_categories + 1) diff --git a/openfisca_survey_manager/policy/tests/__init__.py b/openfisca_survey_manager/policy/tests/__init__.py new file mode 100644 index 00000000..f9d9e925 --- /dev/null +++ b/openfisca_survey_manager/policy/tests/__init__.py @@ -0,0 +1 @@ +# Tests for policy package (simulations, simulation_builder, aggregates).
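To make the statshelpers move concrete, a small smoke sketch of the new import path and the `np.ndarray | pd.Series` signatures typed above; the data is a toy array, assuming the package is installed:

```python
import numpy as np

from openfisca_survey_manager.policy.statshelpers import gini, lorenz

values = np.array([10.0, 20.0, 30.0, 40.0])
weights = np.array([1.0, 1.0, 2.0, 1.0])

gini_coefficient = gini(values, weights)  # -> float per the new annotation
x, y = lorenz(values, weights)            # -> tuple[np.ndarray, np.ndarray]
```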
diff --git a/openfisca_survey_manager/tests/test_aggregates.py b/openfisca_survey_manager/policy/tests/test_aggregates.py similarity index 96% rename from openfisca_survey_manager/tests/test_aggregates.py rename to openfisca_survey_manager/policy/tests/test_aggregates.py index 38c461d4..7ec83109 100644 --- a/openfisca_survey_manager/tests/test_aggregates.py +++ b/openfisca_survey_manager/policy/tests/test_aggregates.py @@ -1,7 +1,7 @@ import pytest from openfisca_country_template.reforms.modify_social_security_taxation import modify_social_security_taxation -from openfisca_survey_manager.aggregates import AbstractAggregates +from openfisca_survey_manager.policy import AbstractAggregates from openfisca_survey_manager.tests.test_scenario import create_randomly_initialized_survey_scenario diff --git a/openfisca_survey_manager/tests/test_compute_aggregate.py b/openfisca_survey_manager/policy/tests/test_compute_aggregate.py similarity index 100% rename from openfisca_survey_manager/tests/test_compute_aggregate.py rename to openfisca_survey_manager/policy/tests/test_compute_aggregate.py diff --git a/openfisca_survey_manager/tests/test_compute_pivot_table.py b/openfisca_survey_manager/policy/tests/test_compute_pivot_table.py similarity index 89% rename from openfisca_survey_manager/tests/test_compute_pivot_table.py rename to openfisca_survey_manager/policy/tests/test_compute_pivot_table.py index 75706e21..4c20ff94 100644 --- a/openfisca_survey_manager/tests/test_compute_pivot_table.py +++ b/openfisca_survey_manager/policy/tests/test_compute_pivot_table.py @@ -7,7 +7,7 @@ def test_compute_pivot_table(): survey_scenario = create_randomly_initialized_survey_scenario(reform=modify_social_security_taxation) period = "2017-01" - return survey_scenario.compute_pivot_table( + pivot_table = survey_scenario.compute_pivot_table( aggfunc="mean", columns=["age"], difference=False, @@ -22,3 +22,4 @@ def test_compute_pivot_table(): weighted=True, alternative_weights=None, ) + assert pivot_table is not None diff --git a/openfisca_survey_manager/tests/test_compute_winners_losers.py b/openfisca_survey_manager/policy/tests/test_compute_winners_losers.py similarity index 98% rename from openfisca_survey_manager/tests/test_compute_winners_losers.py rename to openfisca_survey_manager/policy/tests/test_compute_winners_losers.py index 3e50d93a..63fd7867 100644 --- a/openfisca_survey_manager/tests/test_compute_winners_losers.py +++ b/openfisca_survey_manager/policy/tests/test_compute_winners_losers.py @@ -1,7 +1,7 @@ import pytest from openfisca_country_template.reforms.modify_social_security_taxation import modify_social_security_taxation -from openfisca_survey_manager.simulations import SecretViolationError +from openfisca_survey_manager.policy import SecretViolationError from openfisca_survey_manager.tests.test_scenario import create_randomly_initialized_survey_scenario diff --git a/openfisca_survey_manager/tests/test_create_data_frame_by_entity.py b/openfisca_survey_manager/policy/tests/test_create_data_frame_by_entity.py similarity index 100% rename from openfisca_survey_manager/tests/test_create_data_frame_by_entity.py rename to openfisca_survey_manager/policy/tests/test_create_data_frame_by_entity.py diff --git a/openfisca_survey_manager/tests/test_marginal_tax_rate.py b/openfisca_survey_manager/policy/tests/test_marginal_tax_rate.py similarity index 100% rename from openfisca_survey_manager/tests/test_marginal_tax_rate.py rename to openfisca_survey_manager/policy/tests/test_marginal_tax_rate.py diff --git 
a/openfisca_survey_manager/tests/test_summarize_variables.py b/openfisca_survey_manager/policy/tests/test_summarize_variables.py similarity index 95% rename from openfisca_survey_manager/tests/test_summarize_variables.py rename to openfisca_survey_manager/policy/tests/test_summarize_variables.py index 5e7c2ebb..4b62f548 100644 --- a/openfisca_survey_manager/tests/test_summarize_variables.py +++ b/openfisca_survey_manager/policy/tests/test_summarize_variables.py @@ -19,7 +19,7 @@ def test_summarize_variable_log_output(caplog): The doctest used to check stdout; we now send that output to the logging system. This test captures logs and verifies the expected content is present. """ - with caplog.at_level(logging.INFO, logger="openfisca_survey_manager.simulations"): + with caplog.at_level(logging.INFO, logger="openfisca_survey_manager.policy.simulations"): survey_scenario = create_randomly_initialized_survey_scenario(collection=None) survey_scenario.summarize_variable(variable="housing_occupancy_status", force_compute=True) @@ -32,7 +32,7 @@ def test_summarize_variable_log_output(caplog): assert "owner" in text or "tenant" in text or "free_lodger" in text or "homeless" in text caplog.clear() - with caplog.at_level(logging.INFO, logger="openfisca_survey_manager.simulations"): + with caplog.at_level(logging.INFO, logger="openfisca_survey_manager.policy.simulations"): survey_scenario.summarize_variable(variable="rent", force_compute=True) messages = [r.message for r in caplog.records] @@ -43,7 +43,7 @@ def test_summarize_variable_log_output(caplog): survey_scenario.tax_benefit_systems["baseline"].neutralize_variable("age") caplog.clear() - with caplog.at_level(logging.INFO, logger="openfisca_survey_manager.simulations"): + with caplog.at_level(logging.INFO, logger="openfisca_survey_manager.policy.simulations"): survey_scenario.summarize_variable(variable="age") messages = [r.message for r in caplog.records] diff --git a/openfisca_survey_manager/variables.py b/openfisca_survey_manager/policy/variables.py similarity index 78% rename from openfisca_survey_manager/variables.py rename to openfisca_survey_manager/policy/variables.py index f91cbcc3..fc6f8fc1 100644 --- a/openfisca_survey_manager/variables.py +++ b/openfisca_survey_manager/policy/variables.py @@ -1,14 +1,24 @@ +"""Policy variables helpers (quantile formulas).""" + +from __future__ import annotations + import logging +from typing import Any, Callable, Optional from numpy import arange from openfisca_core.model_api import ADD, YEAR, Variable, where -from openfisca_survey_manager.statshelpers import mark_weighted_percentiles, weightedcalcs_quantiles +from openfisca_survey_manager.policy.statshelpers import mark_weighted_percentiles, weightedcalcs_quantiles log = logging.getLogger(__name__) -def create_quantile(x, nquantiles, weight_variable, entity_name): +def create_quantile( + x: str, + nquantiles: int, + weight_variable: str, + entity_name: Any, +) -> type[Variable]: class quantile(Variable): value_type = int entity = entity_name @@ -35,7 +45,12 @@ def formula(entity, period): return quantile -def quantile(q, variable, weight_variable=None, filter_variable=None): +def quantile( + q: int, + variable: str, + weight_variable: Optional[str] = None, + filter_variable: Optional[str] = None, +) -> Callable[..., Any]: """ Return the quantile of a variable, weighted by a specific weight variable and potentially filtered. """ @@ -63,7 +78,12 @@ def formula(entity, period): return formula -def old_quantile(q, variable, weight_variable=None,
filter_variable=None): +def old_quantile( + q: int, + variable: str, + weight_variable: Optional[str] = None, + filter_variable: Optional[str] = None, +) -> Callable[..., Any]: def formula(entity, period): value = entity(variable, period) if weight_variable is not None: diff --git a/openfisca_survey_manager/processing/__init__.py b/openfisca_survey_manager/processing/__init__.py index ce2da410..b6e95546 100644 --- a/openfisca_survey_manager/processing/__init__.py +++ b/openfisca_survey_manager/processing/__init__.py @@ -2,6 +2,23 @@ # See docs/REFACTORING_PLAN.md for migration steps. from openfisca_survey_manager.processing.cleaning import clean_data_frame -from openfisca_survey_manager.processing.weights import Calibration, calmar, check_calmar +from openfisca_survey_manager.processing.harmonization import harmonize_data_frame_columns -__all__ = ["Calibration", "calmar", "check_calmar", "clean_data_frame"] + +# Lazy import to avoid circular dependency (processing -> policy -> survey_collections -> core) +def __getattr__(name: str) -> object: + if name in ("Calibration", "calmar", "check_calmar"): + from openfisca_survey_manager.policy.calibration import Calibration + from openfisca_survey_manager.policy.calmar import calmar, check_calmar + + return {"Calibration": Calibration, "calmar": calmar, "check_calmar": check_calmar}[name] + raise AttributeError(f"module {__name__!r} has no attribute {name!r}") + + +__all__ = [ + "Calibration", + "calmar", + "check_calmar", + "clean_data_frame", + "harmonize_data_frame_columns", +] diff --git a/openfisca_survey_manager/processing/cleaning.py b/openfisca_survey_manager/processing/cleaning.py index 6a0b81fa..5aa88b7e 100644 --- a/openfisca_survey_manager/processing/cleaning.py +++ b/openfisca_survey_manager/processing/cleaning.py @@ -1,5 +1,7 @@ """Data frame cleaning (column normalization, empty handling).""" +from __future__ import annotations + import logging import pandas as pd diff --git a/openfisca_survey_manager/processing/harmonization.py b/openfisca_survey_manager/processing/harmonization.py new file mode 100644 index 00000000..4d30c6b4 --- /dev/null +++ b/openfisca_survey_manager/processing/harmonization.py @@ -0,0 +1,36 @@ +"""Column harmonization for survey data (lowercase, ident renaming).""" + +from __future__ import annotations + +import logging +import re + +import pandas as pd + +log = logging.getLogger(__name__) + +# Column names matching this pattern (e.g. ident01, ident2019) are renamed to "ident" +IDENT_COLUMN_PATTERN = re.compile(r"(?i)ident\d{2,4}$") + + +def harmonize_data_frame_columns( + data_frame: pd.DataFrame, + *, + lowercase: bool = False, + rename_ident: bool = True, +) -> None: + """Harmonize column names in place. + + - If lowercase: rename all columns to lowercase. + - If rename_ident: rename the first column matching ident pattern (e.g. ident01, ident2019) to "ident". 
+ """ + if lowercase: + columns = {col: col.lower() for col in data_frame.columns} + data_frame.rename(columns=columns, inplace=True) + + if rename_ident: + for column_name in data_frame.columns: + if IDENT_COLUMN_PATTERN.match(str(column_name)) is not None: + data_frame.rename(columns={column_name: "ident"}, inplace=True) + log.info("%s column have been replaced by ident", column_name) + break diff --git a/openfisca_survey_manager/processing/weights/__init__.py b/openfisca_survey_manager/processing/weights/__init__.py deleted file mode 100644 index 66b4484b..00000000 --- a/openfisca_survey_manager/processing/weights/__init__.py +++ /dev/null @@ -1,6 +0,0 @@ -# Calibration and CALMAR weight calibration. See docs/REFACTORING_PLAN.md. - -from openfisca_survey_manager.processing.weights.calibration import Calibration -from openfisca_survey_manager.processing.weights.calmar import calmar, check_calmar - -__all__ = ["Calibration", "calmar", "check_calmar"] diff --git a/openfisca_survey_manager/read_dbf.py b/openfisca_survey_manager/read_dbf.py deleted file mode 100644 index 3ff17ae2..00000000 --- a/openfisca_survey_manager/read_dbf.py +++ /dev/null @@ -1,5 +0,0 @@ -"""Re-export for backward compatibility. Prefer: from openfisca_survey_manager.io.readers import read_dbf.""" - -from openfisca_survey_manager.io.readers import read_dbf - -__all__ = ["read_dbf"] diff --git a/openfisca_survey_manager/read_sas.py b/openfisca_survey_manager/read_sas.py deleted file mode 100644 index 58168894..00000000 --- a/openfisca_survey_manager/read_sas.py +++ /dev/null @@ -1,5 +0,0 @@ -"""Re-export for backward compatibility. Prefer: from openfisca_survey_manager.io.readers import read_sas.""" - -from openfisca_survey_manager.io.readers import read_sas - -__all__ = ["read_sas"] diff --git a/openfisca_survey_manager/read_spss.py b/openfisca_survey_manager/read_spss.py deleted file mode 100644 index ee4da5cb..00000000 --- a/openfisca_survey_manager/read_spss.py +++ /dev/null @@ -1,5 +0,0 @@ -"""Re-export for backward compatibility. 
Prefer: from openfisca_survey_manager.io.readers import read_spss.""" - -from openfisca_survey_manager.io.readers import read_spss - -__all__ = ["read_spss"] diff --git a/openfisca_survey_manager/scripts/build_collection.py b/openfisca_survey_manager/scripts/build_collection.py index 4aa6472b..51b368e0 100755 --- a/openfisca_survey_manager/scripts/build_collection.py +++ b/openfisca_survey_manager/scripts/build_collection.py @@ -13,9 +13,12 @@ import sys from pathlib import Path -from openfisca_survey_manager.paths import default_config_files_directory, openfisca_survey_manager_location -from openfisca_survey_manager.survey_collections import SurveyCollection -from openfisca_survey_manager.surveys import Survey +from openfisca_survey_manager.configuration.paths import ( + default_config_files_directory, + openfisca_survey_manager_location, +) +from openfisca_survey_manager.core.dataset import SurveyCollection +from openfisca_survey_manager.core.survey import Survey app_name = Path(__file__).stem log = logging.getLogger(app_name) @@ -240,9 +243,14 @@ def main(): "--parquet", action="store_true", default=False, + help="save data in parquet format (directory with one .parquet file per table)", + ) + parser.add_argument( + "--zarr", + action="store_true", + default=False, help=( - "save data in parquet format instead of HDF5 " - "(HDF5 will no longer be the default format in a future version)" + "save data in zarr format (one zarr group per table); requires: pip install openfisca-survey-manager[zarr]" ), ) parser.add_argument( @@ -280,22 +288,16 @@ def main(): start_time = datetime.datetime.now() - # Determine store format based on argument - store_format = "parquet" if args.parquet else "hdf5" - - # Deprecation warning for HDF5 format - if not args.parquet: - import warnings - - warnings.warn( - "HDF5 will no longer be the default format in a future version. " - "Please use --parquet option to save data in parquet format.", - DeprecationWarning, - stacklevel=2, - ) + # Determine store format based on argument (--zarr > --parquet > default hdf5) + if args.zarr: + store_format = "zarr" + elif args.parquet: + store_format = "parquet" + else: + store_format = "hdf5" log.warning( "HDF5 will no longer be the default format in a future version. " - "Please use --parquet option to save data in parquet format." + "Please use --parquet or --zarr to save data in parquet or zarr format." ) try: diff --git a/openfisca_survey_manager/scripts/migrate_config_to_rfc002.py b/openfisca_survey_manager/scripts/migrate_config_to_rfc002.py new file mode 100644 index 00000000..acd42d1f --- /dev/null +++ b/openfisca_survey_manager/scripts/migrate_config_to_rfc002.py @@ -0,0 +1,252 @@ +#!/usr/bin/env python +""" +Migrate existing config (config.ini + raw_data.ini + JSON collections) to RFC-002 layout. 
+ +Produces: + - config.yaml (collections_dir, default_output_dir, tmp_dir) + - collections_dir//manifest.yaml per collection + +Usage: + python -m openfisca_survey_manager.scripts.migrate_config_to_rfc002 [--config-dir PATH] [--dry-run] +""" + +from __future__ import annotations + +import argparse +import configparser +import json +import logging +import sys +from pathlib import Path + +import yaml + +# Allow running as __main__ or as script +try: + from openfisca_survey_manager.configuration.config_loader import ( + CONFIG_FILENAME, + MANIFEST_FILENAME, + ) +except ImportError: + CONFIG_FILENAME = "config.yaml" + MANIFEST_FILENAME = "manifest.yaml" + +log = logging.getLogger(__name__) + +SOURCE_FORMAT_KEYS = ("csv_files", "sas_files", "stata_files", "parquet_files") + + +def _informations_to_source(informations: dict) -> tuple[str, str]: + """From Survey.informations (e.g. csv_files, sas_files), return (format, path).""" + if not informations: + return "csv", "" + for key in SOURCE_FORMAT_KEYS: + paths = informations.get(key) + if paths and isinstance(paths, list) and len(paths) > 0: + fmt = key.replace("_files", "") + path = paths[0] if isinstance(paths[0], str) else str(paths[0]) + return fmt, path + return "csv", "" + + +def build_manifest_from_json( + json_path: Path, + raw_data_section: dict[str, str] | None = None, +) -> dict: + """ + Build RFC-002 manifest dict from a legacy collection JSON file. + raw_data_section: optional dict survey_name -> path from raw_data.ini [collection_name]. + """ + with json_path.open(encoding="utf-8") as f: + data = json.load(f) + name = data.get("name", json_path.stem) + label = data.get("label", name) + surveys_data = data.get("surveys", {}) + if not isinstance(surveys_data, dict): + surveys_data = {} + surveys = {} + for survey_name, survey_obj in surveys_data.items(): + if not isinstance(survey_obj, dict): + continue + infos = survey_obj.get("informations", {}) or {} + if raw_data_section and survey_name in raw_data_section: + path = raw_data_section[survey_name] + fmt = "csv" + for k in SOURCE_FORMAT_KEYS: + if infos.get(k): + fmt = k.replace("_files", "") + break + else: + fmt, path = _informations_to_source(infos) + surveys[survey_name] = { + "label": survey_obj.get("label", survey_name), + "source": {"format": fmt, "path": path}, + } + if survey_obj.get("output_subdir"): + surveys[survey_name]["output_subdir"] = survey_obj["output_subdir"] + + store_format = _infer_store_format_from_legacy(surveys_data) + return {"name": name, "label": label, "store_format": store_format, "surveys": surveys} + + +def _infer_store_format_from_legacy(surveys_data: dict) -> str: + """Infer store_format from legacy JSON surveys (parquet_file_path, zarr_file_path, hdf5_file_path).""" + if not isinstance(surveys_data, dict): + return "parquet" + for survey_obj in surveys_data.values(): + if not isinstance(survey_obj, dict): + continue + if survey_obj.get("zarr_file_path"): + return "zarr" + if survey_obj.get("parquet_file_path"): + return "parquet" + if survey_obj.get("hdf5_file_path"): + return "hdf5" + return "parquet" + + +def load_raw_data_ini(config_dir: Path) -> configparser.ConfigParser | None: + """Load raw_data.ini if present.""" + path = config_dir / "raw_data.ini" + if not path.is_file(): + return None + parser = configparser.ConfigParser() + parser.read(path, encoding="utf-8") + return parser + + +def migrate( + config_dir: Path, + *, + dry_run: bool = False, +) -> bool: + """ + Migrate config_dir from config.ini (+ raw_data.ini + JSON) to config.yaml + 
manifests. + Returns True if migration was done (or dry_run succeeded). + """ + config_ini = config_dir / "config.ini" + if not config_ini.is_file(): + log.error("No config.ini found in %s", config_dir) + return False + + parser = configparser.ConfigParser() + parser.read(config_ini, encoding="utf-8") + if "collections" not in parser.sections(): + log.error("config.ini has no [collections] section") + return False + + collections_dir_str = parser.get("collections", "collections_directory", fallback=None) + if not collections_dir_str: + collections_dir_str = str(config_dir / "collections") + collections_dir = Path(collections_dir_str).expanduser().resolve() + + output_dir = parser.get("data", "output_directory", fallback=str(config_dir / "output")) + tmp_dir = parser.get("data", "tmp_directory", fallback="/tmp") + if "data" not in parser.sections(): + output_dir = str(config_dir / "output") + tmp_dir = "/tmp" + + raw_data = load_raw_data_ini(config_dir) + collection_names: list[str] = [] + for key in parser.options("collections"): + if key == "collections_directory": + continue + collection_names.append(key) + + if not collection_names: + log.warning("No collection entries in config.ini (only collections_directory)") + # Still write config.yaml so the dir is ready for new-style use + else: + if not dry_run: + collections_dir.mkdir(parents=True, exist_ok=True) + for name in collection_names: + try: + json_path_str = parser.get("collections", name) + except configparser.NoOptionError: + continue + json_path = Path(json_path_str).expanduser().resolve() + if not json_path.is_file(): + log.warning("Collection %s: JSON file not found %s", name, json_path) + continue + raw_section = None + if raw_data and raw_data.has_section(name): + raw_section = dict(raw_data.items(name)) + manifest = build_manifest_from_json(json_path, raw_section) + manifest_path = collections_dir / name / MANIFEST_FILENAME + if dry_run: + log.info("[dry-run] Would write %s", manifest_path) + continue + manifest_path.parent.mkdir(parents=True, exist_ok=True) + with manifest_path.open("w", encoding="utf-8") as f: + yaml.safe_dump( + manifest, + f, + default_flow_style=False, + allow_unicode=True, + sort_keys=False, + ) + log.info("Wrote %s", manifest_path) + + config_yaml_path = config_dir / CONFIG_FILENAME + new_config = { + "collections_dir": str(collections_dir), + "default_output_dir": str(Path(output_dir).expanduser().resolve()), + "tmp_dir": str(Path(tmp_dir).expanduser().resolve()), + } + if dry_run: + log.info("[dry-run] Would write %s with %s", config_yaml_path, new_config) + return True + with config_yaml_path.open("w", encoding="utf-8") as f: + yaml.safe_dump(new_config, f, default_flow_style=False, sort_keys=False) + log.info("Wrote %s", config_yaml_path) + return True + + +def main() -> int: + ap = argparse.ArgumentParser( + description="Migrate config.ini + raw_data.ini + JSON to RFC-002 (config.yaml + manifests)", + ) + ap.add_argument( + "--config-dir", + type=Path, + default=None, + help="Directory containing config.ini (default: XDG or OPENFISCA_SURVEY_CONFIG_DIR)", + ) + ap.add_argument( + "--dry-run", + action="store_true", + help="Only log what would be done", + ) + ap.add_argument( + "-v", + "--verbose", + action="store_true", + help="Verbose logging", + ) + args = ap.parse_args() + logging.basicConfig( + level=logging.DEBUG if args.verbose else logging.INFO, + format="%(message)s", + stream=sys.stdout, + ) + if args.config_dir is None: + try: + from openfisca_survey_manager.configuration.config_loader 
import ( + get_config_dir, + ) + + config_dir = get_config_dir() + except Exception: + log.error("Provide --config-dir or set OPENFISCA_SURVEY_CONFIG_DIR") + return 1 + else: + config_dir = args.config_dir.expanduser().resolve() + if not config_dir.is_dir(): + log.error("Config directory does not exist: %s", config_dir) + return 1 + ok = migrate(config_dir, dry_run=args.dry_run) + return 0 if ok else 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/openfisca_survey_manager/survey_collections.py b/openfisca_survey_manager/survey_collections.py deleted file mode 100644 index ba25168b..00000000 --- a/openfisca_survey_manager/survey_collections.py +++ /dev/null @@ -1,154 +0,0 @@ -import codecs -import collections -import configparser -import json -import logging -from pathlib import Path - -from openfisca_survey_manager.config import Config -from openfisca_survey_manager.exceptions import SurveyConfigError -from openfisca_survey_manager.paths import default_config_files_directory -from openfisca_survey_manager.surveys import Survey - -log = logging.getLogger(__name__) - - -class SurveyCollection: - """A collection of Surveys""" - - def __init__( - self, config_files_directory=default_config_files_directory, label=None, name=None, json_file_path=None - ): - self.name = name - self.label = label - self.json_file_path = json_file_path - self.surveys = [] - log.debug(f"Initializing SurveyCollection from config file found in {config_files_directory} ..") - config = Config(config_files_directory=config_files_directory) - if label is not None: - self.label = label - if name is not None: - self.name = name - if json_file_path is not None: - self.json_file_path = json_file_path - if "collections" not in config.sections(): - config["collections"] = {} - config.set("collections", self.name, str(self.json_file_path)) - config.save() - elif config is not None: - if config.has_option("collections", self.name): - self.json_file_path = config.get("collections", self.name) - elif config.get("collections", "collections_directory") is not None: - self.json_file_path = str(Path(config.get("collections", "collections_directory")) / (name + ".json")) - - self.config = config - - def __repr__(self): - header = f"""{self.name} -Survey collection of {self.label} -Contains the following surveys : -""" - surveys = [f" {survey.name} : {survey.label} \n" for survey in self.surveys] - return header + "".join(surveys) - - def dump(self, config_files_directory=None, json_file_path=None): - """ - Dump the survey collection to a json file - And set the json file path in the config file - """ - if self.config is not None: - config = self.config - else: - if config_files_directory is not None: - pass - else: - config_files_directory = default_config_files_directory - self.config = Config(config_files_directory=config_files_directory) - - if json_file_path is None: - assert self.json_file_path is not None, "A json_file_path should be provided" - else: - self.json_file_path = json_file_path - - config.set("collections", self.name, str(self.json_file_path)) - config.save() - with codecs.open(str(self.json_file_path), "w", encoding="utf-8") as _file: - json.dump(self.to_json(), _file, ensure_ascii=False, indent=2) - - def fill_store( - self, - source_format=None, - surveys=None, - tables=None, - overwrite=False, - keep_original_parquet_file=False, - encoding=None, - store_format="hdf5", - categorical_strategy="unique_labels", - ): - if surveys is None: - surveys = self.surveys - for survey in surveys: - 
survey.fill_store( - source_format=source_format, - tables=tables, - overwrite=overwrite, - keep_original_parquet_file=keep_original_parquet_file, - encoding=encoding, - store_format=store_format, - categorical_strategy=categorical_strategy, - ) - self.dump() - - def get_survey(self, survey_name): - available_surveys_names = [survey.name for survey in self.surveys] - assert survey_name in available_surveys_names, ( - f"Survey {survey_name} cannot be found for survey collection {self.name}.\n" - f"Available surveys are :{available_surveys_names}" - ) - return [survey for survey in self.surveys if survey.name == survey_name].pop() - - @classmethod - def load(cls, json_file_path=None, collection=None, config_files_directory=default_config_files_directory): - assert Path(config_files_directory).exists() - config = Config(config_files_directory=config_files_directory) - if json_file_path is None: - assert collection is not None, "A collection is needed" - try: - json_file_path = config.get("collections", collection) - except (configparser.NoOptionError, configparser.NoSectionError) as error: - msg = f"Looking for config file in {config_files_directory}" - log.debug(msg) - log.error(error) - raise error - except Exception as error: - msg = f"Looking for config file in {config_files_directory}" - log.debug(msg) - log.error(error) - raise SurveyConfigError(msg) from error - - with Path(json_file_path).open("r") as _file: - self_json = json.load(_file) - name = self_json["name"] - - self = cls(config_files_directory=config_files_directory, name=name) - self.config = config - with Path(json_file_path).open("r") as _file: - self_json = json.load(_file) - self.json_file_path = json_file_path - self.label = self_json.get("label") - self.name = self_json.get("name") - - surveys = self_json["surveys"] - for survey_name, survey_json in surveys.items(): - survey = Survey(name=survey_name) - self.surveys.append(survey.create_from_json(survey_json)) - return self - - def to_json(self): - self_json = collections.OrderedDict(()) - self_json["name"] = self.name - self_json["surveys"] = collections.OrderedDict(()) - for survey in self.surveys: - self_json["surveys"][survey.name] = survey.to_json() - return self_json diff --git a/openfisca_survey_manager/surveys.py b/openfisca_survey_manager/surveys.py deleted file mode 100644 index 77dd566e..00000000 --- a/openfisca_survey_manager/surveys.py +++ /dev/null @@ -1,388 +0,0 @@ -#! 
/usr/bin/env python - - -import collections -import logging -import re -from pathlib import Path - -import pandas -import pyarrow as pa -import pyarrow.parquet as pq -import yaml - -from openfisca_survey_manager.exceptions import SurveyIOError, SurveyManagerError - -from .tables import Table - -ident_re = re.compile(r"(?i)ident\d{2,4}$") - -log = logging.getLogger(__name__) - - -source_format_by_extension = { - "csv": "csv", - "sas7bdat": "sas", - "dta": "stata", - "Rdata": "Rdata", - "spss": "sav", - "parquet": "parquet", -} - -admissible_source_formats = list(source_format_by_extension.values()) - - -class NoMoreDataError(Exception): - # Exception when the user ask for more data than available in file - pass - - -class Survey: - """An object to describe survey data""" - - hdf5_file_path = None - parquet_file_path = None - label = None - name = None - survey_collection = None - - def __init__( - self, name=None, label=None, hdf5_file_path=None, parquet_file_path=None, survey_collection=None, **kwargs - ): - assert name is not None, "A survey should have a name" - self.name = name - self.tables = collections.OrderedDict() - self.informations = {} - self.tables_index = {} - - if label is not None: - self.label = label - - if hdf5_file_path is not None: - self.hdf5_file_path = hdf5_file_path - - if parquet_file_path is not None: - self.parquet_file_path = parquet_file_path - - if survey_collection is not None: - self.survey_collection = survey_collection - - self.informations = kwargs - - def __repr__(self): - header = f"""{self.name} : survey data {self.label} -Contains the following tables : \n""" - tables = yaml.safe_dump(list(self.tables.keys()), default_flow_style=False) - informations = yaml.safe_dump(self.informations, default_flow_style=False) - return header + tables + informations - - @classmethod - def create_from_json(cls, survey_json): - self = cls( - name=survey_json.get("name"), - label=survey_json.get("label"), - hdf5_file_path=survey_json.get("hdf5_file_path"), - parquet_file_path=survey_json.get("parquet_file_path"), - **survey_json.get("informations", {}), - ) - self.tables = survey_json.get("tables") - return self - - def dump(self): - assert self.survey_collection is not None - self.survey_collection.dump() - - def fill_store( - self, - source_format=None, - tables=None, - overwrite=True, - keep_original_parquet_file=False, - encoding=None, - store_format="hdf5", - categorical_strategy="unique_labels", - ): - """ - Convert data from the source files to store format either hdf5 or parquet. - If the source is in parquet, the data is not converted. 
- """ - assert self.survey_collection is not None - assert isinstance(overwrite, (bool, list)) - survey = self - # Create folder if it does not exist - config = survey.survey_collection.config - directory_path = config.get("data", "output_directory") - if not Path(directory_path).is_dir(): - log.warn( - f"{directory_path} who should be the store data directory does not exist: we create the directory" - ) - Path(directory_path).mkdir(parents=True) - - if source_format == "parquet": - store_format = "parquet" - - if store_format == "hdf5" and survey.hdf5_file_path is None: - survey.hdf5_file_path = str(Path(directory_path) / (survey.name + ".h5")) - - if store_format == "parquet" and survey.parquet_file_path is None: - survey.parquet_file_path = str(Path(directory_path) / survey.name) - - self.store_format = store_format - - if source_format is not None: - assert source_format in admissible_source_formats, f"Data source format {source_format} is unknown" - source_formats = [source_format] - else: - source_formats = admissible_source_formats - - for source_format in source_formats: - files = f"{source_format}_files" - for data_file in survey.informations.get(files, []): - name = Path(data_file).stem - extension = Path(data_file).suffix - if tables is None or name in tables: - if keep_original_parquet_file: - # Use folder instead of files if numeric at end of file - if re.match(r".*-\d$", name): - name = name.split("-")[0] - parquet_file = str(Path(data_file).parent) - # Get the parent folder - survey.parquet_file_path = str(Path(data_file).parent.parent) - else: - parquet_file = data_file - survey.parquet_file_path = str(Path(data_file).parent) - table = Table( - label=name, - name=name, - source_format=source_format_by_extension[extension[1:]], - survey=survey, - parquet_file=parquet_file, - ) - table.read_parquet_columns(data_file) - - else: - table = Table( - label=name, - name=name, - source_format=source_format_by_extension[extension[1:]], - survey=survey, - ) - table.fill_store( - data_file, - clean=True, - overwrite=overwrite if isinstance(overwrite, bool) else table.name in overwrite, - encoding=encoding, - categorical_strategy=categorical_strategy, - ) - self.dump() - - def get_value(self, variable, table, lowercase=False, ignorecase=False): - """Get variable value from a survey table. - - Args: - variable: variable to retrieve - table(str): name of the table - lowercase(bool, optional, optional): lowercase variable names, defaults to False - ignorecase: ignore case of table name, defaults to False - - Returns: - pd.DataFrame: dataframe containing the variable - - """ - return self.get_values([variable], table) - - def get_values( - self, - variables=None, - table=None, - lowercase=False, - ignorecase=False, - rename_ident=True, - batch_size=None, - batch_index=0, - filter_by=None, - ) -> pandas.DataFrame: - """Get variables values from a survey table. - - Args: - variables(list, optional, optional): variables to retrieve, defaults to None (retrieve all variables) - table(str, optional, optional): name of the table, defaults to None - ignorecase: ignore case of table name, defaults to False - lowercase(bool, optional, optional): lowercase variable names, defaults to False - rename_ident(bool, optional, optional): rename ident+yr (e.g. 
ident08) into ident, defaults to True - batch_size(int, optional, optional): batch size for parquet file, defaults to None - batch_index(int, optional, optional): batch index for parquet file, defaults to 0 - - Returns: - pd.DataFrame: dataframe containing the variables - - Raises: - Exception: - - """ - if self.parquet_file_path is None and self.hdf5_file_path is None: - raise SurveyIOError(f"No data file found for survey {self.name}") - if self.hdf5_file_path is not None: - assert Path(self.hdf5_file_path).exists(), ( - f"{self.hdf5_file_path} is not a valid path. This could happen because " - "your data were not builded yet. Please consider using a rebuild option in your code." - ) - store = pandas.HDFStore(self.hdf5_file_path, "r") - if ignorecase: - keys = store.keys() - eligible_tables = [] - for string in keys: - match = re.findall(table, string, re.IGNORECASE) - if match: - eligible_tables.append(match[0]) - if len(eligible_tables) > 1: - raise SurveyManagerError( - f"{table} is ambiguous since the following tables are available: {eligible_tables}" - ) - elif len(eligible_tables) == 0: - raise SurveyIOError(f"No eligible available table in {keys}") - else: - table = eligible_tables[0] - try: - df = store.select(table) - except KeyError: - log.error(f"No table {table} in the file {self.hdf5_file_path}") - log.error( - f"This could happen because your data were not builded yet. Available tables are: {store.keys()}" - ) - store.close() - raise - - store.close() - - elif self.parquet_file_path is not None: - if table is None: - raise SurveyIOError("A table name is needed to retrieve data from a parquet file") - for table_name, table_content in self.tables.items(): - if table == table_name: - parquet_file = table_content.get("parquet_file") - # Is parquet_file a folder or a file? - if Path(parquet_file).is_dir(): - # find first parquet file in folder - for file in Path(parquet_file).iterdir(): - if file.suffix == ".parquet": - one_parquet_file = str(Path(parquet_file) / file) - break - else: - raise SurveyIOError(f"No parquet file found in {parquet_file}") - else: - one_parquet_file = parquet_file - parquet_schema = pq.read_schema(one_parquet_file) - assert len(parquet_schema.names) >= 1, ( - f"The parquet file {table_content.get('parquet_file')} is empty" - ) - if variables is None: - variables = table_content.get("variables") - if filter_by: - df = pq.ParquetDataset(parquet_file, filters=filter_by).read(columns=variables).to_pandas() - elif batch_size: - if Path(parquet_file).is_dir(): - parquet_file = [str(p) for p in Path(parquet_file).glob("*.parquet")] - else: - parquet_file = [parquet_file] - # Initialize an empty list to store the Parquet tables - tables = [] - # Loop through the file paths and read each Parquet file - for file_path in parquet_file: - table = pq.read_table(file_path, columns=variables) - tables.append(table) - - # Concatenate the tables if needed - final_table = pa.concat_tables(tables) if len(tables) > 1 else tables[0] - record_batches = final_table.to_batches(max_chunksize=batch_size) - if len(record_batches) <= batch_index: - raise NoMoreDataError( - f"Batch {batch_index} not found in {table_name}. Max index is {len(record_batches)}" - ) - df = record_batches[batch_index].to_pandas() - # iter_parquet = parquet_file.iter_batches(batch_size=batch_size, columns=variables) - # index = 0 - # while True: - # try: - # batch = next(iter_parquet) - # except StopIteration: - # raise NoMoreDataError( - # f"Batch {batch_index} not found in {table_name}. 
Max index is {index}" - # ) - # break - # if batch_index == index: - # df = batch.to_pandas() - # break - # index += 1 - else: - df = pq.ParquetDataset(parquet_file).read(columns=variables).to_pandas() - break - else: - raise SurveyIOError(f"No table {table} found in {self.parquet_file_path}") - - if lowercase: - columns = {column_name: column_name.lower() for column_name in df} - df.rename(columns=columns, inplace=True) - - if rename_ident is True: - for column_name in df: - if ident_re.match(str(column_name)) is not None: - df.rename(columns={column_name: "ident"}, inplace=True) - log.info(f"{column_name} column have been replaced by ident") - break - - if variables is None: - return df - else: - diff = set(variables) - set(df.columns) - if diff: - raise SurveyIOError(f"The following variable(s) {diff} are missing") - variables = list(set(variables).intersection(df.columns)) - df = df[variables] - return df - - def insert_table(self, label=None, name=None, **kwargs): - """Insert a table in the Survey object. - - If a pandas dataframe is provided, it is saved in the store file - """ - parquet_file = kwargs.pop("parquet_file", None) - data_frame = kwargs.pop("data_frame", None) - if data_frame is None: - # Try without underscore - data_frame = kwargs.pop("dataframe", None) - - if data_frame is not None: - assert isinstance(data_frame, pandas.DataFrame) - variables = kwargs.pop("variables", None) - if variables is not None: - assert set(variables) < set(data_frame.columns) - else: - variables = list(data_frame.columns) - if label is None: - label = name - table = Table(label=label, name=name, survey=self, variables=variables, parquet_file=parquet_file) - assert (table.survey.hdf5_file_path is not None) or (table.survey.parquet_file_path is not None) - if parquet_file is not None: - log.debug(f"Saving table {name} in {table.survey.parquet_file_path}") - data_frame.to_parquet(parquet_file) - else: - log.debug(f"Saving table {name} in {table.survey.hdf5_file_path}") - to_hdf_kwargs = kwargs.pop("to_hdf_kwargs", {}) - table.save_data_frame_to_hdf5(data_frame, **to_hdf_kwargs) - - if name not in self.tables: - self.tables[name] = {} - for key, val in kwargs.items(): - self.tables[name][key] = val - - def to_json(self): - """Convert the survey to a JSON object.""" - self_json = collections.OrderedDict(()) - self_json["hdf5_file_path"] = str(self.hdf5_file_path) if self.hdf5_file_path else None - self_json["parquet_file_path"] = str(self.parquet_file_path) if self.parquet_file_path else None - self_json["label"] = self.label - self_json["name"] = self.name - self_json["tables"] = self.tables - self_json["informations"] = collections.OrderedDict(sorted(self.informations.items())) - return self_json diff --git a/openfisca_survey_manager/tables.py b/openfisca_survey_manager/tables.py deleted file mode 100644 index 89951176..00000000 --- a/openfisca_survey_manager/tables.py +++ /dev/null @@ -1,365 +0,0 @@ -"""Tables.""" - -import collections -import csv -import datetime -import errno -import gc -import logging -import os -from pathlib import Path - -import pandas -from chardet.universaldetector import UniversalDetector -from pyarrow import parquet as pq - -from openfisca_survey_manager import read_sas -from openfisca_survey_manager.exceptions import SurveyIOError -from openfisca_survey_manager.io.writers import write_table_to_hdf5, write_table_to_parquet -from openfisca_survey_manager.processing.cleaning import clean_data_frame - -try: - from openfisca_survey_manager.read_spss import read_spss -except 
ImportError: - read_spss = None - - -log = logging.getLogger(__name__) - - -reader_by_source_format = { - # Rdata = pandas.rpy.common.load_data, - "csv": pandas.read_csv, - "sas": read_sas.read_sas, - "spss": read_spss, - "stata": pandas.read_stata, - "parquet": pandas.read_parquet, -} - - -class Table: - """A table of a survey.""" - - label = None - name = None - source_format = None - survey = None - variables = None - parquet_file = None - - def __init__( - self, survey=None, name=None, label=None, source_format=None, variables=None, parquet_file=None, **kwargs - ): - assert name is not None, "A table should have a name" - self.name = name - self.label = label - self.source_format = source_format - self.variables = variables - self.parquet_file = parquet_file - self.informations = kwargs - - from .surveys import Survey # Keep it here to avoid infinite recursion - - assert isinstance(survey, Survey), f"survey is of type {type(survey)} and not {Survey}" - self.survey = survey - if not survey.tables: - survey.tables = collections.OrderedDict() - - survey.tables[name] = collections.OrderedDict( - source_format=source_format, - variables=variables, - parquet_file=parquet_file, - ) - - def _check_and_log(self, data_file_path, store_file_path): - """ - Check if the file exists and log the insertion. - - Args: - data_file_path: Data file path - store_file_path: Store file or dir path - - Raises: - Exception: File not found - """ - assert store_file_path is not None, "Store file path cannot be None" - if not Path(data_file_path).is_file(): - raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), data_file_path) - - log.info( - f"Inserting table {self.name} from file {data_file_path} in store file {store_file_path} " - f"at point {self.name}" - ) - - def _is_stored(self): - if self.survey.hdf5_file_path is not None: - store = pandas.HDFStore(self.survey.hdf5_file_path) - if self.name in store: - log.info(f"Exiting without overwriting {self.name} in {self.survey.hdf5_file_path}") - store.close() - return True - - store.close() - return False - else: - return False - - def _save(self, data_frame: pandas.DataFrame = None, store_format="hdf5"): - """ - Save a data frame in the store according to is format (HDF5 or Parque). - """ - assert data_frame is not None - variables = self.variables - - if variables: - stored_variables = list(set(variables).intersection(set(data_frame.columns))) - log.info(f"The folloging variables are stored: {stored_variables}") - if set(stored_variables) != set(variables): - log.info( - "variables wanted by the user that were not available: " - f"{list(set(variables) - set(stored_variables))}" - ) - data_frame = data_frame[stored_variables].copy() - - assert store_format in ["hdf5", "parquet"], f"invalid store_format: {store_format}" - if store_format == "hdf5": - import warnings - - warnings.warn( - "HDF5 will no longer be the default format in a future version. Please use parquet format instead.", - DeprecationWarning, - stacklevel=3, - ) - log.warning( - "HDF5 will no longer be the default format in a future version. Please use parquet format instead." - ) - self.save_data_frame_to_hdf5(data_frame) - else: - parquet_file_path = self.survey.parquet_file_path - log.info(f"Inserting table {self.name} in Parquet file {parquet_file_path}") - self.save_data_frame_to_parquet(data_frame) - gc.collect() - - def fill_store(self, data_file, overwrite: bool = False, clean: bool = False, **kwargs): - """ - Fill the store (HDF5 or parquet file) with the table. 
- Read the `data_file` in parameter and save it to the store. - - Args: - data_file (_type_, optional): The data file path. Defaults to None. - overwrite (bool, optional): Overwrite the data. Defaults to False. - clean (bool, optional): Clean the raw data befoe saving. Defaults to False. - store_format (str, optional): _description_. Defaults to "hdf5". - - Raises: - e: Skip file if error - """ - if not overwrite and self._is_stored(): - log.info(f"Exiting without overwriting {self.name} in {self.survey.hdf5_file_path}") - return - - start_table_time = datetime.datetime.now() - if self.source_format in ["sas", "parquet"] and "encoding" in kwargs: - del kwargs["encoding"] - data_frame = self.read_source(data_file, **kwargs) - try: - if clean: - clean_data_frame(data_frame) - self._save(data_frame=data_frame, store_format=self.survey.store_format) - log.info(f"File {data_file} has been processed in {datetime.datetime.now() - start_table_time}") - except Exception as e: - log.info(f"Skipping file {data_file} because of following error \n {e}") - raise e - - def read_parquet_columns(self, parquet_file=None) -> list: - """ - Initialize the table from a parquet file. - """ - if parquet_file is None: - parquet_file = self.parquet_file - log.info(f"Initializing table {self.name} from parquet file {parquet_file}") - self.source_format = "parquet" - parquet_schema = pq.read_schema(parquet_file) - self.variables = parquet_schema.names - self.survey.tables[self.name]["variables"] = self.variables - return self.variables - - def read_source(self, data_file, **kwargs): - source_format = self.source_format - store_file_path = ( - self.survey.hdf5_file_path if self.survey.store_format == "hdf5" else self.survey.parquet_file_path - ) - - self._check_and_log(data_file, store_file_path=store_file_path) - reader = reader_by_source_format[source_format] - # Extract categorical_strategy early - only stata format uses it - # Other formats (parquet, csv, etc.) don't support it and will error if passed - categorical_strategy = ( - kwargs.pop("categorical_strategy", "unique_labels") - if source_format == "stata" - else kwargs.pop("categorical_strategy", None) - ) - try: - if source_format == "csv": - try: - data_frame = reader(data_file, **kwargs) - - if len(data_frame.columns) == 1 and ";" in data_frame.columns[0]: - raise SurveyIOError( - "A ';' is present in the unique column name. Looks like we got the wrong separator." - ) - - except Exception: - log.debug(f"Failing to read {data_file}, Trying to infer encoding and dialect/separator") - - # Detect encoding - detector = UniversalDetector() - with Path(data_file).open("rb") as csvfile: - for line in csvfile: - detector.feed(line) - if detector.done: - break - detector.close() - - encoding = detector.result["encoding"] - confidence = detector.result["confidence"] - - # Sniff dialect - try: - with Path(data_file).open("r", newline="", encoding=encoding) as csvfile: - dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,") - except Exception: - # Sometimes the sniffer fails, we switch back to the default ... 
of French statistical data - dialect = None - delimiter = ";" - - log.debug( - f"dialect.delimiter = {dialect.delimiter if dialect is not None else delimiter}, " - f"encoding = {encoding}, confidence = {confidence}" - ) - kwargs["engine"] = "python" - if dialect: - kwargs["dialect"] = dialect - else: - kwargs["delimiter"] = delimiter - kwargs["encoding"] = encoding - data_frame = reader(data_file, **kwargs) - - else: - # Remove encoding parameter for pandas 2.0+ compatibility (not supported in read_stata) - if "encoding" in kwargs and source_format == "stata": - kwargs.pop("encoding") - # Try to read with categoricals, handle non-unique labels with configurable strategy - if source_format == "stata": - # categorical_strategy already extracted above - - try: - # Read with pandas' default convert_categoricals (True); both former branches were identical, so read once - data_frame = reader(data_file, **kwargs) - except ValueError as e: - if "not unique" in str(e) or "Categorical categories must be unique" in str(e): - log.info( - f"Non-unique value labels detected in {data_file}, " - f"using strategy '{categorical_strategy}'" - ) - - # Read without categoricals first - kwargs_no_cat = kwargs.copy() - kwargs_no_cat["convert_categoricals"] = False - data_frame = reader(data_file, **kwargs_no_cat) - - # Apply categorical strategy - if categorical_strategy == "unique_labels": - # Solution 2: Make labels unique by adding code suffix - from pandas.io.stata import StataReader - - stata_reader = StataReader(data_file) - value_labels = stata_reader.value_labels() - - for col_name, labels in value_labels.items(): - if col_name in data_frame.columns: - unique_labels = {} - seen_labels = {} - - for code, label in labels.items(): - if pandas.isna(code): - unique_labels[code] = label - elif label in seen_labels: - # Duplicate label: add code as suffix - unique_labels[code] = f"{label} ({code})" - else: - unique_labels[code] = label - seen_labels[label] = code - - # Create mapping code -> unique label - code_to_label = {code: unique_labels[code] for code in sorted(labels.keys())} - - # Map codes to unique labels and create categories - data_frame[col_name] = data_frame[col_name].map(code_to_label) - data_frame[col_name] = pandas.Categorical( - data_frame[col_name], - categories=list(code_to_label.values()), - ordered=False, - ) - - elif categorical_strategy == "codes": - # Solution 1: Use codes as categories - from pandas.io.stata import StataReader - - stata_reader = StataReader(data_file) - value_labels = stata_reader.value_labels() - - for col_name, labels in value_labels.items(): - if col_name in data_frame.columns: - codes = sorted([c for c in labels if pandas.notna(c)]) - if codes: - data_frame[col_name] = pandas.Categorical( - data_frame[col_name], categories=codes, ordered=False - ) - - elif categorical_strategy == "skip": - # Keep as-is (no categories) - pass - else: - log.warning(f"Unknown categorical_strategy '{categorical_strategy}', using 'skip'") - else: - raise - else: - data_frame = reader(data_file, **kwargs) - - except Exception as e: - log.info(f"Error while reading {data_file}") - raise e - - gc.collect() - return data_frame - - def save_data_frame_to_hdf5(self, data_frame, **kwargs): - """Save a data frame in the HDF5 file format.""" - hdf5_file_path = self.survey.hdf5_file_path - log.info(f"Inserting table {self.name} in HDF file {hdf5_file_path}") - store_path = self.name - write_table_to_hdf5( - data_frame, - hdf5_file_path=hdf5_file_path, - store_path=store_path, - **kwargs, - ) - - self.variables = list(data_frame.columns) - - def save_data_frame_to_parquet(self, data_frame): - """Save a data frame in the Parquet file format.""" - parquet_file_path = self.survey.parquet_file_path - self.parquet_file = write_table_to_parquet( - data_frame, - parquet_dir_path=parquet_file_path, - table_name=self.name, - ) - self.variables = list(data_frame.columns) - - self.survey.tables[self.name]["parquet_file"] = self.parquet_file - self.survey.tables[self.name]["variables"] = self.variables
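For orientation, a hedged sketch of the minimal legacy layout that the conftest fixture above guarantees, runnable outside pytest; every path and name here is illustrative:

import json
from pathlib import Path

from openfisca_survey_manager.core.dataset import SurveyCollection

base = Path("/tmp/osm_legacy_demo")  # illustrative scratch directory
collections_dir = base / "collections"
collections_dir.mkdir(parents=True, exist_ok=True)

# Minimal collection JSON, the same shape the fixture writes
json_path = collections_dir / "demo.json"
json_path.write_text(json.dumps({"name": "demo", "surveys": {}}, indent=2), encoding="utf-8")

# Legacy config.ini pointing at the collection (mirrors the legacy_config_dir fixture further down)
(base / "config.ini").write_text(
    f"[collections]\ncollections_directory = {collections_dir}\ndemo = {json_path}\n\n"
    f"[data]\noutput_directory = {base / 'output'}\ntmp_directory = /tmp\n",
    encoding="utf-8",
)

# Since 8.0 this legacy path emits a DeprecationWarning (see test_survey_collection_load_legacy_unchanged below)
collection = SurveyCollection.load(collection="demo", config_files_directory=base)
assert collection.name == "demo"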
@@ -27,7 +32,7 @@ def make_input_dataframe_by_entity(tax_benefit_system, nb_persons, nb_groups): Example: - >>> from openfisca_survey_manager.input_dataframe_generator import make_input_dataframe_by_entity + >>> from openfisca_survey_manager.tests.input_dataframe_generator import make_input_dataframe_by_entity >>> from openfisca_country_template import CountryTaxBenefitSystem >>> tbs = CountryTaxBenefitSystem() >>> input_dataframe_by_entity = make_input_dataframe_by_entity(tbs, 400, 100) @@ -150,7 +155,7 @@ def randomly_init_variable( seed: Random seed used when drawing the values (Default value = None) Examples - >>> from openfisca_survey_manager.input_dataframe_generator import make_input_dataframe_by_entity + >>> from openfisca_survey_manager.tests.input_dataframe_generator import make_input_dataframe_by_entity >>> from openfisca_country_template import CountryTaxBenefitSystem >>> tbs = CountryTaxBenefitSystem() >>> input_dataframe_by_entity = make_input_dataframe_by_entity(tbs, 400, 100) diff --git a/openfisca_survey_manager/tests/test_add_survey_to_collection.py b/openfisca_survey_manager/tests/test_add_survey_to_collection.py index 3d2c968d..a417f29d 100644 --- a/openfisca_survey_manager/tests/test_add_survey_to_collection.py +++ b/openfisca_survey_manager/tests/test_add_survey_to_collection.py @@ -1,11 +1,11 @@ import pandas as pd import pytest -from openfisca_survey_manager.input_dataframe_generator import set_table_in_survey +from openfisca_survey_manager.core.dataset import SurveyCollection from openfisca_survey_manager.scripts.build_collection import ( add_survey_to_collection, ) -from openfisca_survey_manager.survey_collections import SurveyCollection +from openfisca_survey_manager.tests.input_dataframe_generator import set_table_in_survey @pytest.mark.order(after="test_write_parquet.py::test_write_parquet_one_file_per_entity") diff --git a/openfisca_survey_manager/tests/test_calibration.py b/openfisca_survey_manager/tests/test_calibration.py index 9e2bc37b..05f23fcb 100644 --- a/openfisca_survey_manager/tests/test_calibration.py +++ b/openfisca_survey_manager/tests/test_calibration.py @@ -1,8 +1,8 @@ from openfisca_core import periods from openfisca_core.tools import assert_near -from openfisca_survey_manager.calibration import Calibration -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.policy.calibration import Calibration +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario from openfisca_survey_manager.tests import tax_benefit_system from openfisca_survey_manager.tests.test_scenario import ( create_randomly_initialized_survey_scenario, @@ -164,4 +164,4 @@ def test_simulation_calibration_input_from_data(tmp_path): f"{simulation_name} weight_variable_by_entity does not match {weight_variable_by_entity}" ) assert (survey_scenario.calculate_series("household_weight", period, simulation=simulation_name) != 0).all() - return survey_scenario + assert survey_scenario is not None diff --git a/openfisca_survey_manager/tests/test_calmar.py b/openfisca_survey_manager/tests/test_calmar.py index 93a914d4..884f3997 100644 --- a/openfisca_survey_manager/tests/test_calmar.py +++ b/openfisca_survey_manager/tests/test_calmar.py @@ -6,7 +6,7 @@ import numpy as np import pandas as pd -from openfisca_survey_manager.calmar import calmar +from openfisca_survey_manager.policy.calmar import calmar def create_input_dataframe(entities=1): diff --git
a/openfisca_survey_manager/tests/test_config_manifest_rfc002.py b/openfisca_survey_manager/tests/test_config_manifest_rfc002.py new file mode 100644 index 00000000..9f5da00c --- /dev/null +++ b/openfisca_survey_manager/tests/test_config_manifest_rfc002.py @@ -0,0 +1,302 @@ +"""Tests for RFC-002: config.yaml and manifest.yaml (new metadata architecture).""" + +from pathlib import Path + +import pytest + +from openfisca_survey_manager.configuration.config_loader import ( + get_config_dir, + load_config, + load_manifest, + manifest_survey_to_json, +) +from openfisca_survey_manager.configuration.paths import openfisca_survey_manager_location +from openfisca_survey_manager.core.dataset import SurveyCollection + + +@pytest.fixture +def rfc002_config_dir(tmp_path): + """Create a config dir with config.yaml and a dataset with manifest.yaml.""" + config_dir = tmp_path / "config" + config_dir.mkdir() + (config_dir / "config.yaml").write_text( + """ +collections_dir: {collections} +default_output_dir: {output} +tmp_dir: {tmp} +""".format( + collections=tmp_path / "collections", + output=tmp_path / "output", + tmp=tmp_path / "tmp", + ) + ) + collections_dir = tmp_path / "collections" + collections_dir.mkdir() + dataset_dir = collections_dir / "test_dataset" + dataset_dir.mkdir() + (dataset_dir / "manifest.yaml").write_text( + """ +name: test_dataset +label: "Test dataset (RFC-002)" + +surveys: + survey_a: + label: "Survey A" + source: + format: csv + path: /data/survey_a + survey_b: + label: "Survey B" + source: + format: sas + path: /data/survey_b +""" + ) + return config_dir + + +def test_get_config_dir_explicit(tmp_path): + assert get_config_dir(tmp_path) == tmp_path.resolve() + + +def test_get_config_dir_env(monkeypatch, tmp_path): + monkeypatch.setenv("OPENFISCA_SURVEY_CONFIG_DIR", str(tmp_path)) + assert get_config_dir() == tmp_path.resolve() + + +def test_load_config_missing(tmp_path): + assert load_config(tmp_path) is None + + +def test_load_config_present(rfc002_config_dir): + cfg = load_config(rfc002_config_dir) + assert cfg is not None + assert "collections_dir" in cfg + assert "default_output_dir" in cfg + assert cfg["collections_dir"].is_dir() + assert (cfg["collections_dir"] / "test_dataset").is_dir() + + +def test_load_manifest_missing(tmp_path): + assert load_manifest(tmp_path, "nonexistent") is None + + +def test_load_manifest_present(rfc002_config_dir): + cfg = load_config(rfc002_config_dir) + assert cfg is not None + manifest = load_manifest(cfg["collections_dir"], "test_dataset") + assert manifest is not None + assert manifest["name"] == "test_dataset" + assert manifest["label"] == "Test dataset (RFC-002)" + assert "survey_a" in manifest["surveys"] + assert manifest["surveys"]["survey_a"]["source"]["format"] == "csv" + assert manifest["surveys"]["survey_a"]["source"]["path"] == "/data/survey_a" + + +def test_manifest_survey_to_json(): + entry = { + "label": "My survey", + "source": {"format": "sas", "path": "/path/to/data"}, + } + out = manifest_survey_to_json("my_survey", entry) + assert out["name"] == "my_survey" + assert out["label"] == "My survey" + assert out["informations"]["sas_files"] == ["/path/to/data"] + + +def test_survey_collection_load_from_manifest(rfc002_config_dir): + """SurveyCollection.load(collection=..., config_files_directory=...) 
uses manifest when config.yaml exists.""" + col = SurveyCollection.load( + collection="test_dataset", + config_files_directory=rfc002_config_dir, + ) + assert col.name == "test_dataset" + assert col.label == "Test dataset (RFC-002)" + assert col.config is None + assert col.output_directory is not None + assert len(col.surveys) == 2 + names = {s.name for s in col.surveys} + assert names == {"survey_a", "survey_b"} + survey_a = col.get_survey("survey_a") + assert survey_a.label == "Survey A" + assert survey_a.informations.get("csv_files") == ["/data/survey_a"] + # Default store_format when missing in manifest is parquet + assert survey_a.store_format == "parquet" + assert survey_a.parquet_file_path is not None + assert "survey_a" in survey_a.parquet_file_path + + +def test_survey_collection_load_from_manifest_store_format_zarr(tmp_path): + """When manifest has store_format: zarr, surveys get zarr_file_path set.""" + config_dir = tmp_path / "config" + config_dir.mkdir() + (config_dir / "config.yaml").write_text( + f""" +collections_dir: {tmp_path / "collections"} +default_output_dir: {tmp_path / "output"} +tmp_dir: {tmp_path / "tmp"} +""" + ) + collections_dir = tmp_path / "collections" + collections_dir.mkdir() + dataset_dir = collections_dir / "zarr_dataset" + dataset_dir.mkdir() + (dataset_dir / "manifest.yaml").write_text( + """ +name: zarr_dataset +label: "Zarr dataset" +store_format: zarr +surveys: + s1: + label: "Survey 1" + source: + format: csv + path: /data/s1 +""" + ) + col = SurveyCollection.load( + collection="zarr_dataset", + config_files_directory=config_dir, + ) + assert col.output_directory is not None + survey_s1 = col.get_survey("s1") + assert survey_s1.store_format == "zarr" + assert survey_s1.zarr_file_path is not None + assert survey_s1.zarr_file_path.endswith(".zarr") + assert survey_s1.hdf5_file_path is None + assert survey_s1.parquet_file_path is None + + +def test_survey_collection_load_legacy_unchanged(tmp_path): + """Legacy config.ini + JSON still works when config.yaml is absent (emits DeprecationWarning).""" + # Use the package test data dir which has config.ini and fake.json + tests_data = Path(openfisca_survey_manager_location) / "openfisca_survey_manager" / "tests" / "data_files" + if not (tests_data / "config.ini").exists(): + pytest.skip("config.ini not present in tests/data_files") + if not (tests_data / "fake.json").exists(): + pytest.skip("fake.json not present in tests/data_files") + with pytest.warns(DeprecationWarning, match="config.ini and JSON files is deprecated"): + col = SurveyCollection.load( + collection="fake", + config_files_directory=tests_data, + ) + assert col.config is not None + assert col.name == "fake" + assert len(col.surveys) >= 0
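The RFC-002 flow exercised by the tests above, condensed into a hedged end-to-end sketch; directory, dataset, and survey names are illustrative:

from pathlib import Path

from openfisca_survey_manager.core.dataset import SurveyCollection

base = Path("/tmp/rfc002_demo")  # illustrative config dir
dataset_dir = base / "collections" / "demo_dataset"
dataset_dir.mkdir(parents=True, exist_ok=True)

# config.yaml tells the loader where collections and outputs live
(base / "config.yaml").write_text(
    f"collections_dir: {base / 'collections'}\n"
    f"default_output_dir: {base / 'output'}\n"
    f"tmp_dir: {base / 'tmp'}\n"
)

# One dataset directory = one manifest.yaml; store_format is optional (parquet is the default)
(dataset_dir / "manifest.yaml").write_text(
    "name: demo_dataset\n"
    'label: "Demo dataset"\n'
    "store_format: parquet\n"
    "surveys:\n"
    "  s1:\n"
    '    label: "Survey 1"\n'
    "    source:\n"
    "      format: csv\n"
    "      path: /data/s1\n"
)

col = SurveyCollection.load(collection="demo_dataset", config_files_directory=base)
assert col.get_survey("s1").store_format == "parquet"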
+ + +# --- Migration script tests --- + + +@pytest.fixture +def legacy_config_dir(tmp_path): + """Create a minimal legacy config dir: config.ini + one collection JSON.""" + config_dir = tmp_path / "legacy_config" + config_dir.mkdir() + collections_dir = tmp_path / "legacy_collections" + collections_dir.mkdir() + json_path = collections_dir / "my_collection.json" + json_path.write_text( + """ +{ + "name": "my_collection", + "label": "My collection", + "surveys": { + "survey_1": { + "label": "Survey 1", + "informations": { + "csv_files": ["/data/s1/file1.csv"] + } + }, + "survey_2": { + "label": "Survey 2", + "informations": { + "sas_files": ["/data/s2/file.sas7bdat"] + } + } + } +} +""", + encoding="utf-8", + ) + config_ini = config_dir / "config.ini" + config_ini.write_text( + f"""[collections] +collections_directory = {collections_dir} +my_collection = {json_path} + +[data] +output_directory = {tmp_path / "output"} +tmp_directory = /tmp +""", + encoding="utf-8", + ) + return config_dir + + +def test_migrate_produces_config_yaml_and_manifests(legacy_config_dir): + """Migration script creates config.yaml and collection manifest.""" + from openfisca_survey_manager.scripts.migrate_config_to_rfc002 import ( + CONFIG_FILENAME, + MANIFEST_FILENAME, + migrate, + ) + + ok = migrate(legacy_config_dir, dry_run=False) + assert ok is True + config_yaml = legacy_config_dir / CONFIG_FILENAME + assert config_yaml.is_file() + cfg = load_config(legacy_config_dir) + assert cfg is not None + assert "collections_dir" in cfg + assert (Path(cfg["collections_dir"]) / "my_collection" / MANIFEST_FILENAME).is_file() + manifest = load_manifest(cfg["collections_dir"], "my_collection") + assert manifest is not None + assert manifest["name"] == "my_collection" + assert manifest.get("store_format") == "parquet" + assert manifest["surveys"]["survey_1"]["source"]["format"] == "csv" + assert manifest["surveys"]["survey_1"]["source"]["path"] == "/data/s1/file1.csv" + assert manifest["surveys"]["survey_2"]["source"]["format"] == "sas" + assert manifest["surveys"]["survey_2"]["source"]["path"] == "/data/s2/file.sas7bdat" + + +def test_migrate_infers_store_format_from_legacy(tmp_path): + """Migration infers store_format from legacy JSON (hdf5_file_path -> hdf5, etc.).""" + from openfisca_survey_manager.scripts.migrate_config_to_rfc002 import ( + _infer_store_format_from_legacy, + build_manifest_from_json, + ) + + # Legacy with parquet_file_path + json_parquet = tmp_path / "p.json" + json_parquet.write_text( + '{"name":"p","label":"P","surveys":{"s1":{"label":"S1","parquet_file_path":"/out/s1"}}}', + encoding="utf-8", + ) + manifest_parquet = build_manifest_from_json(json_parquet, None) + assert manifest_parquet["store_format"] == "parquet" + + # Legacy with hdf5_file_path only + json_hdf5 = tmp_path / "h.json" + json_hdf5.write_text( + '{"name":"h","label":"H","surveys":{"s1":{"label":"S1","hdf5_file_path":"/out/s1.h5"}}}', + encoding="utf-8", + ) + manifest_hdf5 = build_manifest_from_json(json_hdf5, None) + assert manifest_hdf5["store_format"] == "hdf5" + + # Infer function directly + assert _infer_store_format_from_legacy({}) == "parquet" + assert _infer_store_format_from_legacy({"s": {"zarr_file_path": "/z"}}) == "zarr" + + +def test_migrate_dry_run_does_not_write(legacy_config_dir): + """Migration with --dry-run does not create files.""" + from openfisca_survey_manager.scripts.migrate_config_to_rfc002 import ( + CONFIG_FILENAME, + migrate, + ) + + ok = migrate(legacy_config_dir, dry_run=True) + assert ok is True + assert not (legacy_config_dir / CONFIG_FILENAME).is_file() diff --git a/openfisca_survey_manager/tests/test_coverage_boost.py b/openfisca_survey_manager/tests/test_coverage_boost.py index 3e1e6930..845843e1 100644 --- a/openfisca_survey_manager/tests/test_coverage_boost.py +++ b/openfisca_survey_manager/tests/test_coverage_boost.py @@ -5,22 +5,21 @@ import pytest from openfisca_core import periods -from openfisca_survey_manager.aggregates import AbstractAggregates -from openfisca_survey_manager.input_dataframe_generator import ( - make_input_dataframe_by_entity, - randomly_init_variable, - set_table_in_survey, -) -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.core.dataset import SurveyCollection +from
openfisca_survey_manager.core.survey import Survey +from openfisca_survey_manager.policy import AbstractAggregates +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.policy.variables import quantile from openfisca_survey_manager.scripts.build_collection import ( check_template_config_files, create_data_file_by_format, ) -from openfisca_survey_manager.survey_collections import SurveyCollection -from openfisca_survey_manager.surveys import Survey from openfisca_survey_manager.tests import tax_benefit_system -from openfisca_survey_manager.utils import do_nothing -from openfisca_survey_manager.variables import quantile +from openfisca_survey_manager.tests.input_dataframe_generator import ( + make_input_dataframe_by_entity, + randomly_init_variable, + set_table_in_survey, +) def setup_test_config(config_files_directory: Path): @@ -135,17 +134,6 @@ def test_build_collection_helpers(tmp_path): create_data_file_by_format(str(tmp_path)) -def test_google_colab_boost(): - from openfisca_survey_manager.google_colab import create_raw_data_ini - - with suppress(Exception): - create_raw_data_ini({"test": {"opt": "val"}}) - - -def test_utils_do_nothing(): - assert do_nothing(1, a=2) is None - - def test_matching_mock_extended(monkeypatch): import sys @@ -168,7 +156,7 @@ def test_matching_mock_extended(monkeypatch): ) monkeypatch.setitem(sys.modules, "rpy2", fake_rpy2) monkeypatch.setitem(sys.modules, "rpy2.robjects", fake_rpy2.robjects) - from openfisca_survey_manager.matching import nnd_hotdeck_using_feather, nnd_hotdeck_using_rpy2 + from openfisca_survey_manager.policy.matching import nnd_hotdeck_using_feather, nnd_hotdeck_using_rpy2 receiver = pd.DataFrame({"a": [1], "c": [1]}) donor = pd.DataFrame({"a": [1], "b": [2], "c": [1]}) diff --git a/openfisca_survey_manager/tests/test_enum.py b/openfisca_survey_manager/tests/test_enum.py index 8189acb8..8d6277c4 100644 --- a/openfisca_survey_manager/tests/test_enum.py +++ b/openfisca_survey_manager/tests/test_enum.py @@ -1,7 +1,7 @@ import pandas as pd from openfisca_country_template.variables.housing import HousingOccupancyStatus -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario from openfisca_survey_manager.tests import tax_benefit_system diff --git a/openfisca_survey_manager/tests/test_legislation_inflator.py b/openfisca_survey_manager/tests/test_legislation_inflator.py index 7344a7b7..986174fb 100644 --- a/openfisca_survey_manager/tests/test_legislation_inflator.py +++ b/openfisca_survey_manager/tests/test_legislation_inflator.py @@ -1,7 +1,7 @@ from openfisca_core import periods from openfisca_country_template import CountryTaxBenefitSystem -from openfisca_survey_manager.utils import inflate_parameters, parameters_asof +from openfisca_survey_manager.policy.legislation_asof import inflate_parameters, parameters_asof def test_asof_simple_annual_parameter(): diff --git a/openfisca_survey_manager/tests/test_matching.py b/openfisca_survey_manager/tests/test_matching.py index 10d91255..ecc1b39e 100644 --- a/openfisca_survey_manager/tests/test_matching.py +++ b/openfisca_survey_manager/tests/test_matching.py @@ -2,7 +2,7 @@ import pandas as pd -from openfisca_survey_manager.matching import nnd_hotdeck_using_rpy2 +from openfisca_survey_manager.policy.matching import nnd_hotdeck_using_rpy2 try: import rpy2 diff --git 
a/openfisca_survey_manager/tests/test_parquet.py b/openfisca_survey_manager/tests/test_parquet.py index 6612874f..d1f4b51b 100644 --- a/openfisca_survey_manager/tests/test_parquet.py +++ b/openfisca_survey_manager/tests/test_parquet.py @@ -4,13 +4,13 @@ import pytest from openfisca_core import periods -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.core.dataset import SurveyCollection +from openfisca_survey_manager.core.survey import NoMoreDataError +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario from openfisca_survey_manager.scripts.build_collection import ( add_survey_to_collection, build_survey_collection, ) -from openfisca_survey_manager.survey_collections import SurveyCollection -from openfisca_survey_manager.surveys import NoMoreDataError from openfisca_survey_manager.tests import tax_benefit_system logger = logging.getLogger(__name__) diff --git a/openfisca_survey_manager/tests/test_quantile.py b/openfisca_survey_manager/tests/test_quantile.py index 6c33c69b..87b40a1c 100644 --- a/openfisca_survey_manager/tests/test_quantile.py +++ b/openfisca_survey_manager/tests/test_quantile.py @@ -4,11 +4,11 @@ from openfisca_core.model_api import YEAR, Variable from openfisca_core.taxbenefitsystems import TaxBenefitSystem -from openfisca_survey_manager.paths import default_config_files_directory -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario -from openfisca_survey_manager.statshelpers import mark_weighted_percentiles +from openfisca_survey_manager.configuration.paths import default_config_files_directory +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.policy.statshelpers import mark_weighted_percentiles +from openfisca_survey_manager.policy.variables import quantile from openfisca_survey_manager.tests.test_scenario import setup_test_config -from openfisca_survey_manager.variables import quantile Individu = build_entity( key="individu", diff --git a/openfisca_survey_manager/tests/test_read_sas.py b/openfisca_survey_manager/tests/test_read_sas.py index 455c87b6..27a77ef8 100644 --- a/openfisca_survey_manager/tests/test_read_sas.py +++ b/openfisca_survey_manager/tests/test_read_sas.py @@ -6,8 +6,8 @@ from pandas.testing import assert_frame_equal -from openfisca_survey_manager.paths import openfisca_survey_manager_location -from openfisca_survey_manager.read_sas import read_sas +from openfisca_survey_manager.configuration.paths import openfisca_survey_manager_location +from openfisca_survey_manager.io.readers import read_sas def test(): diff --git a/openfisca_survey_manager/tests/test_scenario.py b/openfisca_survey_manager/tests/test_scenario.py index 13fa5c0a..941e5626 100644 --- a/openfisca_survey_manager/tests/test_scenario.py +++ b/openfisca_survey_manager/tests/test_scenario.py @@ -7,17 +7,17 @@ from openfisca_core import periods from openfisca_core.tools import assert_near -from openfisca_survey_manager.input_dataframe_generator import ( +from openfisca_survey_manager.configuration.paths import ( + default_config_files_directory, +) +from openfisca_survey_manager.policy.scenarios.abstract_scenario import AbstractSurveyScenario +from openfisca_survey_manager.policy.scenarios.reform_scenario import ReformScenario +from openfisca_survey_manager.tests import tax_benefit_system +from openfisca_survey_manager.tests.input_dataframe_generator import ( 
make_input_dataframe_by_entity, random_data_generator, randomly_init_variable, ) -from openfisca_survey_manager.paths import ( - default_config_files_directory, -) -from openfisca_survey_manager.scenarios.abstract_scenario import AbstractSurveyScenario -from openfisca_survey_manager.scenarios.reform_scenario import ReformScenario -from openfisca_survey_manager.tests import tax_benefit_system log = logging.getLogger(__name__) diff --git a/openfisca_survey_manager/tests/test_store_backends.py b/openfisca_survey_manager/tests/test_store_backends.py new file mode 100644 index 00000000..6e08e56b --- /dev/null +++ b/openfisca_survey_manager/tests/test_store_backends.py @@ -0,0 +1,50 @@ +"""Tests for store backends (HDF5, Parquet, Zarr).""" + +from pathlib import Path + +import pandas as pd +import pytest + +from openfisca_survey_manager.io.backends import ( + get_available_backend_names, + get_backend, +) + + +def test_get_backend_hdf5(): + backend = get_backend("hdf5") + assert backend is not None + assert hasattr(backend, "write_table") and hasattr(backend, "read_table") + assert hasattr(backend, "table_exists") + + +def test_get_backend_parquet(): + backend = get_backend("parquet") + assert backend is not None + + +def test_get_backend_invalid_raises(): + with pytest.raises(ValueError, match="Unknown store backend"): + get_backend("invalid_format") + + +def test_parquet_backend_roundtrip(tmp_path): + backend = get_backend("parquet") + store_path = str(tmp_path / "survey") + store_path_path = Path(store_path) + store_path_path.mkdir(parents=True) + df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}) + backend.write_table(store_path, "mytable", df) + assert backend.table_exists(store_path, "mytable") + df2 = backend.read_table(store_path, "mytable") + pd.testing.assert_frame_equal(df, df2) + df3 = backend.read_table(store_path, "mytable", variables=["a"]) + assert list(df3.columns) == ["a"] + + +def test_available_backends_include_hdf5_parquet(): + names = get_available_backend_names() + assert "hdf5" in names + assert "parquet" in names + # zarr only if zarr package installed + assert len(names) >= 2 diff --git a/openfisca_survey_manager/tests/test_surveys.py b/openfisca_survey_manager/tests/test_surveys.py index de2556be..d3891ca7 100644 --- a/openfisca_survey_manager/tests/test_surveys.py +++ b/openfisca_survey_manager/tests/test_surveys.py @@ -1,9 +1,9 @@ import pandas as pd import pytest -from openfisca_survey_manager.input_dataframe_generator import set_table_in_survey -from openfisca_survey_manager.survey_collections import SurveyCollection -from openfisca_survey_manager.surveys import Survey +from openfisca_survey_manager.core.dataset import SurveyCollection +from openfisca_survey_manager.core.survey import Survey +from openfisca_survey_manager.tests.input_dataframe_generator import set_table_in_survey @pytest.fixture diff --git a/openfisca_survey_manager/tests/test_tax_benefit_system_asof.py b/openfisca_survey_manager/tests/test_tax_benefit_system_asof.py index 994f7f44..3318df92 100644 --- a/openfisca_survey_manager/tests/test_tax_benefit_system_asof.py +++ b/openfisca_survey_manager/tests/test_tax_benefit_system_asof.py @@ -2,7 +2,7 @@ from openfisca_core.parameters import ParameterNode, Scale from openfisca_country_template import CountryTaxBenefitSystem -from openfisca_survey_manager.utils import parameters_asof, variables_asof +from openfisca_survey_manager.policy.legislation_asof import parameters_asof, variables_asof def check_max_instant_leaf(sub_parameter, instant):
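The store-backend API added in this release and exercised by test_store_backends.py above is small; a minimal usage sketch, with an illustrative temporary directory (the zarr backend only appears when the optional zarr extra is installed):

import tempfile
from pathlib import Path

import pandas as pd

from openfisca_survey_manager.io.backends import get_available_backend_names, get_backend

backend = get_backend("parquet")  # or "hdf5", or "zarr" when installed
store = Path(tempfile.mkdtemp()) / "survey"  # illustrative store location
store.mkdir(parents=True)

# Round-trip one table, as the tests above do
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
backend.write_table(str(store), "mytable", df)
assert backend.table_exists(str(store), "mytable")
subset = backend.read_table(str(store), "mytable", variables=["a"])  # column projection
assert list(subset.columns) == ["a"]

print(get_available_backend_names())  # e.g. ['hdf5', 'parquet'] plus 'zarr' when available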
diff --git a/openfisca_survey_manager/tests/test_top_bottom_share.py b/openfisca_survey_manager/tests/test_top_bottom_share.py index 05497b56..acf6b362 100644 --- a/openfisca_survey_manager/tests/test_top_bottom_share.py +++ b/openfisca_survey_manager/tests/test_top_bottom_share.py @@ -1,6 +1,6 @@ import numpy as np -from openfisca_survey_manager.statshelpers import bottom_share, top_share +from openfisca_survey_manager.policy.statshelpers import bottom_share, top_share size = 1000 x = np.ones(size) + np.random.uniform(0, 0.00000001, size) diff --git a/openfisca_survey_manager/utils.py b/openfisca_survey_manager/utils.py deleted file mode 100644 index 24ba9c9e..00000000 --- a/openfisca_survey_manager/utils.py +++ /dev/null @@ -1,70 +0,0 @@ -"""Utilities: re-exports from common.misc + load_table (survey-dependent).""" - -import logging -from typing import Optional - -import pandas as pd - -from openfisca_survey_manager.common.misc import ( - asof, - do_nothing, - inflate_parameter_leaf, - inflate_parameters, - parameters_asof, - stata_files_to_data_frames, - variables_asof, -) -from openfisca_survey_manager.survey_collections import SurveyCollection - -log = logging.getLogger(__name__) - -__all__ = [ - "asof", - "do_nothing", - "inflate_parameter_leaf", - "inflate_parameters", - "load_table", - "parameters_asof", - "stata_files_to_data_frames", - "variables_asof", -] - - -def load_table( - config_files_directory, - variables: Optional[list] = None, - collection: Optional[str] = None, - survey: Optional[str] = None, - input_data_survey_prefix: Optional[str] = None, - data_year=None, - table: Optional[str] = None, - batch_size=None, - batch_index=0, - filter_by=None, -) -> pd.DataFrame: - """ - Load values from a table of a survey in a collection. - - Args: - config_files_directory: Directory containing the configuration files. - variables (List, optional): List of the variables to retrieve in the table. - Defaults to None to get all the variables. - collection (str, optional): Collection. Defaults to None. - survey (str, optional): Survey. Defaults to None. - input_data_survey_prefix (str, optional): Prefix of the survey to be combined with data year. Defaults to None. - data_year (optional): Year of the survey data. Defaults to None. - table (str, optional): Table. Defaults to None. - batch_size (int, optional): Number of rows per batch when reading by batches. Defaults to None. - batch_index (int, optional): Index of the batch to read. Defaults to 0. - filter_by (optional): Filter applied when reading the table. Defaults to None.
- - Returns: - pandas.DataFrame: A table with the retrieved variables - """ - survey_collection = SurveyCollection.load(collection=collection, config_files_directory=config_files_directory) - survey = survey if survey is not None else f"{input_data_survey_prefix}_{data_year}" - survey_ = survey_collection.get_survey(survey) - log.debug(f"Loading table {table} in survey {survey} from collection {collection}") - if batch_size: - return survey_.get_values( - table=table, variables=variables, batch_size=batch_size, batch_index=batch_index, filter_by=filter_by - ) - else: - return survey_.get_values(table=table, variables=variables, filter_by=filter_by) diff --git a/pyproject.toml b/pyproject.toml index a25a5be4..0e6c7805 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "OpenFisca-Survey-Manager" -version = "6.3.1" +version = "8.0.0" description = "A tool for managing survey/administrative data and importing them in OpenFisca" readme = "README.md" keywords = ["microsimulation", "tax", "benefit", "rac", "rules-as-code", "survey", "data"] @@ -52,6 +52,9 @@ build-backend = "setuptools.build_meta" build-collection = "openfisca_survey_manager.scripts.build_collection:main" [project.optional-dependencies] +zarr = [ + 'zarr >=2.18.0, < 3.0', +] matching = [ # 'feather', 'rpy2 >=3.5.10, < 4.0' diff --git a/uv.lock b/uv.lock index 966c17f3..a5c605b4 100644 --- a/uv.lock +++ b/uv.lock @@ -9,6 +9,12 @@ resolution-markers = [ "python_full_version < '3.10'", ] +[[package]] +name = "asciitree" +version = "0.3.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2d/6a/885bc91484e1aa8f618f6f0228d76d0e67000b0fdd6090673b777e311913/asciitree-0.3.3.tar.gz", hash = "sha256:4aa4b9b649f85e3fcb343363d97564aa1fb62e249677f2e18a96765145cc0f6e", size = 3951 } + [[package]] name = "asttokens" version = "3.0.0" @@ -439,6 +445,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/4e/8c/f3147f5c4b73e7550fe5f9352eaa956ae838d5c51eb58e7a25b9f3e2643b/decorator-5.2.1-py3-none-any.whl", hash = "sha256:d316bb415a2d9e2d2b3abcc4084c6502fc09240e292cd76a76afc106a1c8e04a", size = 9190 }, ] +[[package]] +name = "deprecated" +version = "1.3.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "wrapt", marker = "python_full_version >= '3.11'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/49/85/12f0a49a7c4ffb70572b6c2ef13c90c88fd190debda93b23f026b25f9634/deprecated-1.3.1.tar.gz", hash = "sha256:b1b50e0ff0c1fddaa5708a2c6b0a6588bb09b892825ab2b214ac9ea9d92a5223", size = 2932523 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/84/d0/205d54408c08b13550c733c4b85429e7ead111c7f0014309637425520a9a/deprecated-1.3.1-py2.py3-none-any.whl", hash = "sha256:597bfef186b6f60181535a29fbe44865ce137a5079f295b479886c82729d5f3f", size = 11298 }, +] + [[package]] name = "distlib" version = "0.4.0" @@ -497,6 +515,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/30/c3/6f0e3896f193528bbd2b4d2122d4be8108a37efab0b8475855556a8c4afa/fancycompleter-0.11.1-py3-none-any.whl", hash = "sha256:44243d7fab37087208ca5acacf8f74c0aa4d733d04d593857873af7513cdf8a6", size = 11207 }, ] +[[package]] +name = "fasteners" +version = "0.20" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/2d/18/7881a99ba5244bfc82f06017316ffe93217dbbbcfa52b887caa1d4f2a6d3/fasteners-0.20.tar.gz", hash =
"sha256:55dce8792a41b56f727ba6e123fcaee77fd87e638a6863cec00007bfea84c8d8", size = 25087 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/51/ac/e5d886f892666d2d1e5cb8c1a41146e1d79ae8896477b1153a21711d3b44/fasteners-0.20-py3-none-any.whl", hash = "sha256:9422c40d1e350e4259f509fb2e608d6bc43c0136f79a00db1b49046029d0b3b7", size = 18702 }, +] + [[package]] name = "filelock" version = "3.19.1" @@ -999,6 +1026,96 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/d2/1d/1b658dbd2b9fa9c4c9f32accbfc0205d532c8c6194dc0f2a4c0428e7128a/nodeenv-1.9.1-py2.py3-none-any.whl", hash = "sha256:ba11c9782d29c27c70ffbdda2d7415098754709be8a7056d79a737cd901155c9", size = 22314 }, ] +[[package]] +name = "numcodecs" +version = "0.12.1" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version < '3.10'", +] +dependencies = [ + { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b7/1b/1f1d880e29e719c7c6205065d1afbc91114c0d91935ac419faa43e5e08b0/numcodecs-0.12.1.tar.gz", hash = "sha256:05d91a433733e7eef268d7e80ec226a0232da244289614a8f3826901aec1098e", size = 4091415 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/9f/66/08744c9007f1d02476dd97f3c23032f3555dbb8e9a32b0f0ea4724e6b2a2/numcodecs-0.12.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:d37f628fe92b3699e65831d5733feca74d2e33b50ef29118ffd41c13c677210e", size = 1696843 }, + { url = "https://files.pythonhosted.org/packages/b8/6f/a04a33c5edb8fa9ba63783d34ff5768ba6b562ebe11078c07848e283f4ad/numcodecs-0.12.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:941b7446b68cf79f089bcfe92edaa3b154533dcbcd82474f994b28f2eedb1c60", size = 1422578 }, + { url = "https://files.pythonhosted.org/packages/1e/b8/1040f299803eacc9c522fdc69a4dafc42ad0e8722bb48aa43d2310cf195b/numcodecs-0.12.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0e79bf9d1d37199ac00a60ff3adb64757523291d19d03116832e600cac391c51", size = 7709402 }, + { url = "https://files.pythonhosted.org/packages/8c/fa/da0637e1a6db74361a2875425021957859749166c0174ddedbb629518970/numcodecs-0.12.1-cp310-cp310-win_amd64.whl", hash = "sha256:82d7107f80f9307235cb7e74719292d101c7ea1e393fe628817f0d635b7384f5", size = 790204 }, + { url = "https://files.pythonhosted.org/packages/10/63/a50f4113a2bb1decfaedeffc448c5f8b26ded1c583247c893120fcd25e3e/numcodecs-0.12.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:eeaf42768910f1c6eebf6c1bb00160728e62c9343df9e2e315dc9fe12e3f6071", size = 1696786 }, + { url = "https://files.pythonhosted.org/packages/92/77/0fde34bf3a8402d696218a565230097d904c9eebb62cd952923b1155b7f7/numcodecs-0.12.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:135b2d47563f7b9dc5ee6ce3d1b81b0f1397f69309e909f1a35bb0f7c553d45e", size = 1422330 }, + { url = "https://files.pythonhosted.org/packages/14/e6/8f9d4a498a06f11a06297f0b02af9968844d2e40ee79d372ccee33595285/numcodecs-0.12.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a191a8e347ecd016e5c357f2bf41fbcb026f6ffe78fff50c77ab12e96701d155", size = 7949787 }, + { url = "https://files.pythonhosted.org/packages/08/f3/44597198c2cfb0d808d68583445b60b0d0ae057f20f0caf2a1200405655e/numcodecs-0.12.1-cp311-cp311-win_amd64.whl", hash = "sha256:21d8267bd4313f4d16f5b6287731d4c8ebdab236038f29ad1b0e93c9b2ca64ee", size = 790313 }, + { url = 
"https://files.pythonhosted.org/packages/d7/b2/7842675a798e79686d14a20baa554b165aab86feac28f32695266ab42b7e/numcodecs-0.12.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:2f84df6b8693206365a5b37c005bfa9d1be486122bde683a7b6446af4b75d862", size = 1697725 }, + { url = "https://files.pythonhosted.org/packages/fc/1f/e3b033181a28ce153fd0c9acd3ed978ee9c424de7cc3d8e97fc60647eddf/numcodecs-0.12.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:760627780a8b6afdb7f942f2a0ddaf4e31d3d7eea1d8498cf0fd3204a33c4618", size = 1423927 }, + { url = "https://files.pythonhosted.org/packages/3b/88/fb3186f944b9586e9c4c54bd1d1899947b88465ad3ab1ff1111066871644/numcodecs-0.12.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c258bd1d3dfa75a9b708540d23b2da43d63607f9df76dfa0309a7597d1de3b73", size = 7944856 }, + { url = "https://files.pythonhosted.org/packages/f4/03/54e22e273d584e83100ffa60c47c29cae905015ecb1f693918072c3595b9/numcodecs-0.12.1-cp312-cp312-win_amd64.whl", hash = "sha256:e04649ea504aff858dbe294631f098fbfd671baf58bfc04fc48d746554c05d67", size = 787000 }, + { url = "https://files.pythonhosted.org/packages/dd/3c/950f816b837fc7714102b45491e2612b10757106f9a8e3785d7b3806acd4/numcodecs-0.12.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:2fbb12a6a1abe95926f25c65e283762d63a9bf9e43c0de2c6a1a798347dfcb40", size = 1700073 }, + { url = "https://files.pythonhosted.org/packages/76/2f/19f4f012f253ff33948a024e0a814c758ea137e3ba86118daac83a8d9123/numcodecs-0.12.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:f2207871868b2464dc11c513965fd99b958a9d7cde2629be7b2dc84fdaab013b", size = 1425835 }, + { url = "https://files.pythonhosted.org/packages/6d/0f/0442e80d707b5dd2e177a9490c25b89aa6a6c44579de8ec223e78a8884da/numcodecs-0.12.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:abff3554a6892a89aacf7b642a044e4535499edf07aeae2f2e6e8fc08c9ba07f", size = 7722207 }, + { url = "https://files.pythonhosted.org/packages/77/b6/345f8648874a81232bc1a87e55a771430488a832c68f873aa6ed23a1dedf/numcodecs-0.12.1-cp39-cp39-win_amd64.whl", hash = "sha256:ef964d4860d3e6b38df0633caf3e51dc850a6293fd8e93240473642681d95136", size = 792870 }, +] + +[[package]] +name = "numcodecs" +version = "0.13.1" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version == '3.10.*'", +] +dependencies = [ + { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.10.*'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/85/56/8895a76abe4ec94ebd01eeb6d74f587bc4cddd46569670e1402852a5da13/numcodecs-0.13.1.tar.gz", hash = "sha256:a3cf37881df0898f3a9c0d4477df88133fe85185bffe57ba31bcc2fa207709bc", size = 5955215 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/14/c0/6d72cde772bcec196b7188731d41282993b2958440f77fdf0db216f722da/numcodecs-0.13.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:96add4f783c5ce57cc7e650b6cac79dd101daf887c479a00a29bc1487ced180b", size = 1580012 }, + { url = "https://files.pythonhosted.org/packages/94/1d/f81fc1fa9210bbea97258242393a1f9feab4f6d8fb201f81f76003005e4b/numcodecs-0.13.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:237b7171609e868a20fd313748494444458ccd696062f67e198f7f8f52000c15", size = 1176919 }, + { url = "https://files.pythonhosted.org/packages/16/e4/b9ec2f4dfc34ecf724bc1beb96a9f6fa9b91801645688ffadacd485089da/numcodecs-0.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = 
"sha256:96e42f73c31b8c24259c5fac6adba0c3ebf95536e37749dc6c62ade2989dca28", size = 8625842 }, + { url = "https://files.pythonhosted.org/packages/fe/90/299952e1477954ec4f92813fa03e743945e3ff711bb4f6c9aace431cb3da/numcodecs-0.13.1-cp310-cp310-win_amd64.whl", hash = "sha256:eda7d7823c9282e65234731fd6bd3986b1f9e035755f7fed248d7d366bb291ab", size = 828638 }, + { url = "https://files.pythonhosted.org/packages/f0/78/34b8e869ef143e88d62e8231f4dbfcad85e5c41302a11fc5bd2228a13df5/numcodecs-0.13.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:2eda97dd2f90add98df6d295f2c6ae846043396e3d51a739ca5db6c03b5eb666", size = 1580199 }, + { url = "https://files.pythonhosted.org/packages/3b/cf/f70797d86bb585d258d1e6993dced30396f2044725b96ce8bcf87a02be9c/numcodecs-0.13.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2a86f5367af9168e30f99727ff03b27d849c31ad4522060dde0bce2923b3a8bc", size = 1177203 }, + { url = "https://files.pythonhosted.org/packages/a8/b5/d14ad69b63fde041153dfd05d7181a49c0d4864de31a7a1093c8370da957/numcodecs-0.13.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:233bc7f26abce24d57e44ea8ebeb5cd17084690b4e7409dd470fdb75528d615f", size = 8868743 }, + { url = "https://files.pythonhosted.org/packages/13/d4/27a7b5af0b33f6d61e198faf177fbbf3cb83ff10d9d1a6857b7efc525ad5/numcodecs-0.13.1-cp311-cp311-win_amd64.whl", hash = "sha256:796b3e6740107e4fa624cc636248a1580138b3f1c579160f260f76ff13a4261b", size = 829603 }, + { url = "https://files.pythonhosted.org/packages/37/3a/bc09808425e7d3df41e5fc73fc7a802c429ba8c6b05e55f133654ade019d/numcodecs-0.13.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5195bea384a6428f8afcece793860b1ab0ae28143c853f0b2b20d55a8947c917", size = 1575806 }, + { url = "https://files.pythonhosted.org/packages/3a/cc/dc74d0bfdf9ec192332a089d199f1e543e747c556b5659118db7a437dcca/numcodecs-0.13.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3501a848adaddce98a71a262fee15cd3618312692aa419da77acd18af4a6a3f6", size = 1178233 }, + { url = "https://files.pythonhosted.org/packages/d4/ce/434e8e3970b8e92ae9ab6d9db16cb9bc7aa1cd02e17c11de6848224100a1/numcodecs-0.13.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:da2230484e6102e5fa3cc1a5dd37ca1f92dfbd183d91662074d6f7574e3e8f53", size = 8857827 }, + { url = "https://files.pythonhosted.org/packages/83/e7/1d8b1b266a92f9013c755b1c146c5ad71a2bff147ecbc67f86546a2e4d6a/numcodecs-0.13.1-cp312-cp312-win_amd64.whl", hash = "sha256:e5db4824ebd5389ea30e54bc8aeccb82d514d28b6b68da6c536b8fa4596f4bca", size = 826539 }, + { url = "https://files.pythonhosted.org/packages/83/8b/06771dead2cc4a8ae1ea9907737cf1c8d37a323392fa28f938a586373468/numcodecs-0.13.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:7a60d75179fd6692e301ddfb3b266d51eb598606dcae7b9fc57f986e8d65cb43", size = 1571660 }, + { url = "https://files.pythonhosted.org/packages/f9/ea/d925bf85f92dfe4635356018da9fe4bfecb07b1c72f62b01c1bc47f936b1/numcodecs-0.13.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:3f593c7506b0ab248961a3b13cb148cc6e8355662ff124ac591822310bc55ecf", size = 1169925 }, + { url = "https://files.pythonhosted.org/packages/0f/d6/643a3839d571d8e439a2c77dc4b0b8cab18d96ac808e4a81dbe88e959ab6/numcodecs-0.13.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80d3071465f03522e776a31045ddf2cfee7f52df468b977ed3afdd7fe5869701", size = 8814257 }, + { url = 
"https://files.pythonhosted.org/packages/a6/c5/f3e56bc9b4e438a287fff738993d6d11abef368c0328a612ac2842ba9fca/numcodecs-0.13.1-cp313-cp313-win_amd64.whl", hash = "sha256:90d3065ae74c9342048ae0046006f99dcb1388b7288da5a19b3bddf9c30c3176", size = 821887 }, +] + +[[package]] +name = "numcodecs" +version = "0.15.1" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version >= '3.13'", + "python_full_version == '3.12.*'", + "python_full_version == '3.11.*'", +] +dependencies = [ + { name = "deprecated", marker = "python_full_version >= '3.11'" }, + { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11' and python_full_version < '3.13'" }, + { name = "numpy", version = "2.4.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.13'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/63/fc/bb532969eb8236984ba65e4f0079a7da885b8ac0ce1f0835decbb3938a62/numcodecs-0.15.1.tar.gz", hash = "sha256:eeed77e4d6636641a2cc605fbc6078c7a8f2cc40f3dfa2b3f61e52e6091b04ff", size = 6267275 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e4/fc/410f1cacaef0931f5daf06813b1b8a2442f7418ee284ec73fe5e830dca48/numcodecs-0.15.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:698f1d59511488b8fe215fadc1e679a4c70d894de2cca6d8bf2ab770eed34dfd", size = 1649501 }, + { url = "https://files.pythonhosted.org/packages/85/29/dff62fae04323035912c419a82dc9624fad7d08541dbfcd9ab78a3a40074/numcodecs-0.15.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:bef8c8e64fab76677324a07672b10c31861775d03fc63ed5012ca384144e4bb9", size = 1187306 }, + { url = "https://files.pythonhosted.org/packages/a6/a8/908a226632ffabf19caf8c99f1b2898f2f22aac02795a6fe9d018fd6d9dd/numcodecs-0.15.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cdfaef9f5f2ed8f65858db801f1953f1007c9613ee490a1c56233cd78b505ed5", size = 8891971 }, + { url = "https://files.pythonhosted.org/packages/2b/e8/058aac43e1300d588e99b2d0d5b771c8a43fa92ce9c9517da596869fc146/numcodecs-0.15.1-cp311-cp311-win_amd64.whl", hash = "sha256:e2547fa3a7ffc9399cfd2936aecb620a3db285f2630c86c8a678e477741a4b3c", size = 840035 }, + { url = "https://files.pythonhosted.org/packages/e7/7e/f12fc32d3beedc6a8f1ec69ea0ba72e93cb99c0350feed2cff5d04679bc3/numcodecs-0.15.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b0a9d9cd29a0088220682dda4a9898321f7813ff7802be2bbb545f6e3d2f10ff", size = 1691889 }, + { url = "https://files.pythonhosted.org/packages/81/38/88e40d40288b73c3b3a390ed5614a34b0661d00255bdd4cfb91c32101364/numcodecs-0.15.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:a34f0fe5e5f3b837bbedbeb98794a6d4a12eeeef8d4697b523905837900b5e1c", size = 1189149 }, + { url = "https://files.pythonhosted.org/packages/28/7d/7527d9180bc76011d6163c848c9cf02cd28a623c2c66cf543e1e86de7c5e/numcodecs-0.15.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c3a09e22140f2c691f7df26303ff8fa2dadcf26d7d0828398c0bc09b69e5efa3", size = 8879163 }, + { url = "https://files.pythonhosted.org/packages/ab/bc/b6c3cde91c754860a3467a8c058dcf0b1a5ca14d82b1c5397c700cf8b1eb/numcodecs-0.15.1-cp312-cp312-win_amd64.whl", hash = "sha256:daed6066ffcf40082da847d318b5ab6123d69ceb433ba603cb87c323a541a8bc", size = 836785 }, + { url = "https://files.pythonhosted.org/packages/78/57/acbc54b3419e5be65015e47177c76c0a73e037fd3ae2cde5808169194d4d/numcodecs-0.15.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = 
"sha256:e3d82b70500cf61e8d115faa0d0a76be6ecdc24a16477ee3279d711699ad85f3", size = 1688220 }, + { url = "https://files.pythonhosted.org/packages/b6/56/9863fa6dc679f40a31bea5e9713ee5507a31dcd3ee82ea4b1a9268ce52e8/numcodecs-0.15.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:1d471a1829ce52d3f365053a2bd1379e32e369517557c4027ddf5ac0d99c591e", size = 1180294 }, + { url = "https://files.pythonhosted.org/packages/fa/91/d96999b41e3146b6c0ce6bddc5ad85803cb4d743c95394562c2a4bb8cded/numcodecs-0.15.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1dfdea4a67108205edfce99c1cb6cd621343bc7abb7e16a041c966776920e7de", size = 8834323 }, + { url = "https://files.pythonhosted.org/packages/c3/32/233e5ede6568bdb044e6f99aaa9fa39827ff3109c6487fc137315f733586/numcodecs-0.15.1-cp313-cp313-win_amd64.whl", hash = "sha256:a4f7bdb26f1b34423cb56d48e75821223be38040907c9b5954eeb7463e7eb03c", size = 831955 }, +] + [[package]] name = "numexpr" version = "2.10.2" @@ -1272,7 +1389,7 @@ wheels = [ [[package]] name = "openfisca-survey-manager" -version = "6.3.0" +version = "1.0.0" source = { editable = "." } dependencies = [ { name = "chardet" }, @@ -1327,6 +1444,11 @@ sas = [ { name = "pyreadstat" }, { name = "sas7bdat" }, ] +zarr = [ + { name = "zarr", version = "2.18.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" }, + { name = "zarr", version = "2.18.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.10.*'" }, + { name = "zarr", version = "2.18.7", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, +] [package.metadata] requires-dist = [ @@ -1364,8 +1486,9 @@ requires-dist = [ { name = "tabulate", specifier = ">=0.9.0,<0.10.0" }, { name = "weightedcalcs", specifier = ">=0.1.2,<0.2.0" }, { name = "wquantiles", specifier = ">=0.6,<0.7" }, + { name = "zarr", marker = "extra == 'zarr'", specifier = ">=2.18.0,<3.0" }, ] -provides-extras = ["matching", "dev", "casd", "sas"] +provides-extras = ["zarr", "matching", "dev", "casd", "sas"] [[package]] name = "packaging" @@ -2430,6 +2553,144 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f7/75/3cce30508bf46121b7cabce57b9cacbf8d935fa555cb3c5fca43f8dd0414/wquantiles-0.6-py3-none-any.whl", hash = "sha256:1b90d68fa05251bb96f8806a346e8d7dec9a9bb99f381ad5094707b46ab85218", size = 3291 }, ] +[[package]] +name = "wrapt" +version = "2.1.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/f7/37/ae31f40bec90de2f88d9597d0b5281e23ffe85b893a47ca5d9c05c63a4f6/wrapt-2.1.1.tar.gz", hash = "sha256:5fdcb09bf6db023d88f312bd0767594b414655d58090fc1c46b3414415f67fac", size = 81329 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ca/21/293b657a27accfbbbb6007ebd78af0efa2083dac83e8f523272ea09b4638/wrapt-2.1.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:7e927375e43fd5a985b27a8992327c22541b6dede1362fc79df337d26e23604f", size = 60554 }, + { url = "https://files.pythonhosted.org/packages/25/e9/96dd77728b54a899d4ce2798d7b1296989ce687ed3c0cb917d6b3154bf5d/wrapt-2.1.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e1c99544b6a7d40ca22195563b6d8bc3986ee8bb82f272f31f0670fe9440c869", size = 61496 }, + { url = "https://files.pythonhosted.org/packages/44/79/4c755b45df6ef30c0dd628ecfaa0c808854be147ca438429da70a162833c/wrapt-2.1.1-cp310-cp310-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = 
"sha256:b2be3fa5f4efaf16ee7c77d0556abca35f5a18ad4ac06f0ef3904c3399010ce9", size = 113528 }, + { url = "https://files.pythonhosted.org/packages/9f/63/23ce28f7b841217d9a6337a340fbb8d4a7fbd67a89d47f377c8550fa34aa/wrapt-2.1.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:67c90c1ae6489a6cb1a82058902caa8006706f7b4e8ff766f943e9d2c8e608d0", size = 115536 }, + { url = "https://files.pythonhosted.org/packages/23/7b/5ca8d3b12768670d16c8329e29960eedd56212770365a02a8de8bf73dc01/wrapt-2.1.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:05c0db35ccffd7480143e62df1e829d101c7b86944ae3be7e4869a7efa621f53", size = 114716 }, + { url = "https://files.pythonhosted.org/packages/c7/3a/9789ccb14a096d30bb847bf3ee137bf682cc9750c2ce155f4c5ae1962abf/wrapt-2.1.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:0c2ec9f616755b2e1e0bf4d0961f59bb5c2e7a77407e7e2c38ef4f7d2fdde12c", size = 113200 }, + { url = "https://files.pythonhosted.org/packages/cf/e5/4ec3526ce6ce920b267c8d35d2c2f0874d3fad2744c8b7259353f1132baa/wrapt-2.1.1-cp310-cp310-win32.whl", hash = "sha256:203ba6b3f89e410e27dbd30ff7dccaf54dcf30fda0b22aa1b82d560c7f9fe9a1", size = 57876 }, + { url = "https://files.pythonhosted.org/packages/d1/4e/661c7c76ecd85375b2bc03488941a3a1078642af481db24949e2b9de01f4/wrapt-2.1.1-cp310-cp310-win_amd64.whl", hash = "sha256:6f9426d9cfc2f8732922fc96198052e55c09bb9db3ddaa4323a18e055807410e", size = 60224 }, + { url = "https://files.pythonhosted.org/packages/5f/b7/53c7252d371efada4cb119e72e774fa2c6b3011fc33e3e552cdf48fb9488/wrapt-2.1.1-cp310-cp310-win_arm64.whl", hash = "sha256:69c26f51b67076b40714cff81bdd5826c0b10c077fb6b0678393a6a2f952a5fc", size = 58645 }, + { url = "https://files.pythonhosted.org/packages/b8/a8/9254e4da74b30a105935197015b18b31b7a298bf046e67d8952ef74967bd/wrapt-2.1.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6c366434a7fb914c7a5de508ed735ef9c133367114e1a7cb91dfb5cd806a1549", size = 60554 }, + { url = "https://files.pythonhosted.org/packages/9e/a1/378579880cc7af226354054a2c255f69615b379d8adad482bfe2f22a0dc2/wrapt-2.1.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:5d6a2068bd2e1e19e5a317c8c0b288267eec4e7347c36bc68a6e378a39f19ee7", size = 61491 }, + { url = "https://files.pythonhosted.org/packages/dc/72/957b51c56acca35701665878ad31626182199fc4afecfe67dea072210f95/wrapt-2.1.1-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:891ab4713419217b2aed7dd106c9200f64e6a82226775a0d2ebd6bef2ebd1747", size = 113949 }, + { url = "https://files.pythonhosted.org/packages/cd/74/36bbebb4a3d2ae9c3e6929639721f8606cd0710a82a777c371aa69e36504/wrapt-2.1.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c8ef36a0df38d2dc9d907f6617f89e113c5892e0a35f58f45f75901af0ce7d81", size = 115989 }, + { url = "https://files.pythonhosted.org/packages/ae/0d/f1177245a083c7be284bc90bddfe5aece32cdd5b858049cb69ce001a0e8d/wrapt-2.1.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:76e9af3ebd86f19973143d4d592cbf3e970cf3f66ddee30b16278c26ae34b8ab", size = 115242 }, + { url = "https://files.pythonhosted.org/packages/62/3e/3b7cf5da27e59df61b1eae2d07dd03ff5d6f75b5408d694873cca7a8e33c/wrapt-2.1.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:ff562067485ebdeaef2fa3fe9b1876bc4e7b73762e0a01406ad81e2076edcebf", size = 113676 }, + { url = "https://files.pythonhosted.org/packages/f7/65/8248d3912c705f2c66f81cb97c77436f37abcbedb16d633b5ab0d795d8cd/wrapt-2.1.1-cp311-cp311-win32.whl", 
hash = "sha256:9e60a30aa0909435ec4ea2a3c53e8e1b50ac9f640c0e9fe3f21fd248a22f06c5", size = 57863 }, + { url = "https://files.pythonhosted.org/packages/6b/31/d29310ab335f71f00c50466153b3dc985aaf4a9fc03263e543e136859541/wrapt-2.1.1-cp311-cp311-win_amd64.whl", hash = "sha256:7d79954f51fcf84e5ec4878ab4aea32610d70145c5bbc84b3370eabfb1e096c2", size = 60224 }, + { url = "https://files.pythonhosted.org/packages/0c/90/a6ec319affa6e2894962a0cb9d73c67f88af1a726d15314bfb5c88b8a08d/wrapt-2.1.1-cp311-cp311-win_arm64.whl", hash = "sha256:d3ffc6b0efe79e08fd947605fd598515aebefe45e50432dc3b5cd437df8b1ada", size = 58643 }, + { url = "https://files.pythonhosted.org/packages/df/cb/4d5255d19bbd12be7f8ee2c1fb4269dddec9cef777ef17174d357468efaa/wrapt-2.1.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ab8e3793b239db021a18782a5823fcdea63b9fe75d0e340957f5828ef55fcc02", size = 61143 }, + { url = "https://files.pythonhosted.org/packages/6f/07/7ed02daa35542023464e3c8b7cb937fa61f6c61c0361ecf8f5fecf8ad8da/wrapt-2.1.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:7c0300007836373d1c2df105b40777986accb738053a92fe09b615a7a4547e9f", size = 61740 }, + { url = "https://files.pythonhosted.org/packages/c4/60/a237a4e4a36f6d966061ccc9b017627d448161b19e0a3ab80a7c7c97f859/wrapt-2.1.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2b27c070fd1132ab23957bcd4ee3ba707a91e653a9268dc1afbd39b77b2799f7", size = 121327 }, + { url = "https://files.pythonhosted.org/packages/ae/fe/9139058a3daa8818fc67e6460a2340e8bbcf3aef8b15d0301338bbe181ca/wrapt-2.1.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8b0e36d845e8b6f50949b6b65fc6cd279f47a1944582ed4ec8258cd136d89a64", size = 122903 }, + { url = "https://files.pythonhosted.org/packages/91/10/b8479202b4164649675846a531763531f0a6608339558b5a0a718fc49a8d/wrapt-2.1.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:4aeea04a9889370fcfb1ef828c4cc583f36a875061505cd6cd9ba24d8b43cc36", size = 121333 }, + { url = "https://files.pythonhosted.org/packages/5f/75/75fc793b791d79444aca2c03ccde64e8b99eda321b003f267d570b7b0985/wrapt-2.1.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:d88b46bb0dce9f74b6817bc1758ff2125e1ca9e1377d62ea35b6896142ab6825", size = 120458 }, + { url = "https://files.pythonhosted.org/packages/d7/8f/c3f30d511082ca6d947c405f9d8f6c8eaf83cfde527c439ec2c9a30eb5ea/wrapt-2.1.1-cp312-cp312-win32.whl", hash = "sha256:63decff76ca685b5c557082dfbea865f3f5f6d45766a89bff8dc61d336348833", size = 58086 }, + { url = "https://files.pythonhosted.org/packages/0a/c8/37625b643eea2849f10c3b90f69c7462faa4134448d4443234adaf122ae5/wrapt-2.1.1-cp312-cp312-win_amd64.whl", hash = "sha256:b828235d26c1e35aca4107039802ae4b1411be0fe0367dd5b7e4d90e562fcbcd", size = 60328 }, + { url = "https://files.pythonhosted.org/packages/ce/79/56242f07572d5682ba8065a9d4d9c2218313f576e3c3471873c2a5355ffd/wrapt-2.1.1-cp312-cp312-win_arm64.whl", hash = "sha256:75128507413a9f1bcbe2db88fd18fbdbf80f264b82fa33a6996cdeaf01c52352", size = 58722 }, + { url = "https://files.pythonhosted.org/packages/f7/ca/3cf290212855b19af9fcc41b725b5620b32f470d6aad970c2593500817eb/wrapt-2.1.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:ce9646e17fa7c3e2e7a87e696c7de66512c2b4f789a8db95c613588985a2e139", size = 61150 }, + { url = "https://files.pythonhosted.org/packages/9d/33/5b8f89a82a9859ce82da4870c799ad11ce15648b6e1c820fec3e23f4a19f/wrapt-2.1.1-cp313-cp313-macosx_11_0_arm64.whl", hash = 
"sha256:428cfc801925454395aa468ba7ddb3ed63dc0d881df7b81626cdd433b4e2b11b", size = 61743 }, + { url = "https://files.pythonhosted.org/packages/1e/2f/60c51304fbdf47ce992d9eefa61fbd2c0e64feee60aaa439baf42ea6f40b/wrapt-2.1.1-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:5797f65e4d58065a49088c3b32af5410751cd485e83ba89e5a45e2aa8905af98", size = 121341 }, + { url = "https://files.pythonhosted.org/packages/ad/03/ce5256e66dd94e521ad5e753c78185c01b6eddbed3147be541f4d38c0cb7/wrapt-2.1.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5a2db44a71202c5ae4bb5f27c6d3afbc5b23053f2e7e78aa29704541b5dad789", size = 122947 }, + { url = "https://files.pythonhosted.org/packages/eb/ae/50ca8854b81b946a11a36fcd6ead32336e6db2c14b6e4a8b092b80741178/wrapt-2.1.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:8d5350c3590af09c1703dd60ec78a7370c0186e11eaafb9dda025a30eee6492d", size = 121370 }, + { url = "https://files.pythonhosted.org/packages/fb/d9/d6a7c654e0043319b4cc137a4caaf7aa16b46b51ee8df98d1060254705b7/wrapt-2.1.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:2d9b076411bed964e752c01b49fd224cc385f3a96f520c797d38412d70d08359", size = 120465 }, + { url = "https://files.pythonhosted.org/packages/55/90/65be41e40845d951f714b5a77e84f377a3787b1e8eee6555a680da6d0db5/wrapt-2.1.1-cp313-cp313-win32.whl", hash = "sha256:0bb7207130ce6486727baa85373503bf3334cc28016f6928a0fa7e19d7ecdc06", size = 58090 }, + { url = "https://files.pythonhosted.org/packages/5f/66/6a09e0294c4fc8c26028a03a15191721c9271672467cc33e6617ee0d91d2/wrapt-2.1.1-cp313-cp313-win_amd64.whl", hash = "sha256:cbfee35c711046b15147b0ae7db9b976f01c9520e6636d992cd9e69e5e2b03b1", size = 60341 }, + { url = "https://files.pythonhosted.org/packages/7a/f0/20ceb8b701e9a71555c87a5ddecbed76ec16742cf1e4b87bbaf26735f998/wrapt-2.1.1-cp313-cp313-win_arm64.whl", hash = "sha256:7d2756061022aebbf57ba14af9c16e8044e055c22d38de7bf40d92b565ecd2b0", size = 58731 }, + { url = "https://files.pythonhosted.org/packages/80/b4/fe95beb8946700b3db371f6ce25115217e7075ca063663b8cca2888ba55c/wrapt-2.1.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:4814a3e58bc6971e46baa910ecee69699110a2bf06c201e24277c65115a20c20", size = 62969 }, + { url = "https://files.pythonhosted.org/packages/b8/89/477b0bdc784e3299edf69c279697372b8bd4c31d9c6966eae405442899df/wrapt-2.1.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:106c5123232ab9b9f4903692e1fa0bdc231510098f04c13c3081f8ad71c3d612", size = 63606 }, + { url = "https://files.pythonhosted.org/packages/ed/55/9d0c1269ab76de87715b3b905df54dd25d55bbffd0b98696893eb613469f/wrapt-2.1.1-cp313-cp313t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:1a40b83ff2535e6e56f190aff123821eea89a24c589f7af33413b9c19eb2c738", size = 152536 }, + { url = "https://files.pythonhosted.org/packages/44/18/2004766030462f79ad86efaa62000b5e39b1ff001dcce86650e1625f40ae/wrapt-2.1.1-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:789cea26e740d71cf1882e3a42bb29052bc4ada15770c90072cb47bf73fb3dbf", size = 158697 }, + { url = "https://files.pythonhosted.org/packages/e1/bb/0a880fa0f35e94ee843df4ee4dd52a699c9263f36881311cfb412c09c3e5/wrapt-2.1.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:ba49c14222d5e5c0ee394495a8655e991dc06cbca5398153aefa5ac08cd6ccd7", size = 155563 }, + { url = 
"https://files.pythonhosted.org/packages/42/ff/cd1b7c4846c8678fac359a6eb975dc7ab5bd606030adb22acc8b4a9f53f1/wrapt-2.1.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:ac8cda531fe55be838a17c62c806824472bb962b3afa47ecbd59b27b78496f4e", size = 150161 }, + { url = "https://files.pythonhosted.org/packages/38/ec/67c90a7082f452964b4621e4890e9a490f1add23cdeb7483cc1706743291/wrapt-2.1.1-cp313-cp313t-win32.whl", hash = "sha256:b8af75fe20d381dd5bcc9db2e86a86d7fcfbf615383a7147b85da97c1182225b", size = 59783 }, + { url = "https://files.pythonhosted.org/packages/ec/08/466afe4855847d8febdfa2c57c87e991fc5820afbdef01a273683dfd15a0/wrapt-2.1.1-cp313-cp313t-win_amd64.whl", hash = "sha256:45c5631c9b6c792b78be2d7352129f776dd72c605be2c3a4e9be346be8376d83", size = 63082 }, + { url = "https://files.pythonhosted.org/packages/9a/62/60b629463c28b15b1eeadb3a0691e17568622b12aa5bfa7ebe9b514bfbeb/wrapt-2.1.1-cp313-cp313t-win_arm64.whl", hash = "sha256:da815b9263947ac98d088b6414ac83507809a1d385e4632d9489867228d6d81c", size = 60251 }, + { url = "https://files.pythonhosted.org/packages/95/a0/1c2396e272f91efe6b16a6a8bce7ad53856c8f9ae4f34ceaa711d63ec9e1/wrapt-2.1.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:9aa1765054245bb01a37f615503290d4e207e3fd59226e78341afb587e9c1236", size = 61311 }, + { url = "https://files.pythonhosted.org/packages/b0/9a/d2faba7e61072a7507b5722db63562fdb22f5a24e237d460d18755627f15/wrapt-2.1.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:feff14b63a6d86c1eee33a57f77573649f2550935981625be7ff3cb7342efe05", size = 61805 }, + { url = "https://files.pythonhosted.org/packages/db/56/073989deb4b5d7d6e7ea424476a4ae4bda02140f2dbeaafb14ba4864dd60/wrapt-2.1.1-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:81fc5f22d5fcfdbabde96bb3f5379b9f4476d05c6d524d7259dc5dfb501d3281", size = 120308 }, + { url = "https://files.pythonhosted.org/packages/d1/b6/84f37261295e38167a29eb82affaf1dc15948dc416925fe2091beee8e4ac/wrapt-2.1.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:951b228ecf66def855d22e006ab9a1fc12535111ae7db2ec576c728f8ddb39e8", size = 122688 }, + { url = "https://files.pythonhosted.org/packages/ea/80/32db2eec6671f80c65b7ff175be61bc73d7f5223f6910b0c921bbc4bd11c/wrapt-2.1.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:0ddf582a95641b9a8c8bd643e83f34ecbbfe1b68bc3850093605e469ab680ae3", size = 121115 }, + { url = "https://files.pythonhosted.org/packages/49/ef/dcd00383df0cd696614127902153bf067971a5aabcd3c9dcb2d8ef354b2a/wrapt-2.1.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:fc5c500966bf48913f795f1984704e6d452ba2414207b15e1f8c339a059d5b16", size = 119484 }, + { url = "https://files.pythonhosted.org/packages/76/29/0630280cdd2bd8f86f35cb6854abee1c9d6d1a28a0c6b6417cd15d378325/wrapt-2.1.1-cp314-cp314-win32.whl", hash = "sha256:4aa4baadb1f94b71151b8e44a0c044f6af37396c3b8bcd474b78b49e2130a23b", size = 58514 }, + { url = "https://files.pythonhosted.org/packages/db/19/5bed84f9089ed2065f6aeda5dfc4f043743f642bc871454b261c3d7d322b/wrapt-2.1.1-cp314-cp314-win_amd64.whl", hash = "sha256:860e9d3fd81816a9f4e40812f28be4439ab01f260603c749d14be3c0a1170d19", size = 60763 }, + { url = "https://files.pythonhosted.org/packages/e4/cb/b967f2f9669e4249b4fe82e630d2a01bc6b9e362b9b12ed91bbe23ae8df4/wrapt-2.1.1-cp314-cp314-win_arm64.whl", hash = "sha256:3c59e103017a2c1ea0ddf589cbefd63f91081d7ce9d491d69ff2512bb1157e23", size = 59051 }, + { url = 
"https://files.pythonhosted.org/packages/eb/19/6fed62be29f97eb8a56aff236c3f960a4b4a86e8379dc7046a8005901a97/wrapt-2.1.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:9fa7c7e1bee9278fc4f5dd8275bc8d25493281a8ec6c61959e37cc46acf02007", size = 63059 }, + { url = "https://files.pythonhosted.org/packages/0a/1c/b757fd0adb53d91547ed8fad76ba14a5932d83dde4c994846a2804596378/wrapt-2.1.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:39c35e12e8215628984248bd9c8897ce0a474be2a773db207eb93414219d8469", size = 63618 }, + { url = "https://files.pythonhosted.org/packages/10/fe/e5ae17b1480957c7988d991b93df9f2425fc51f128cf88144d6a18d0eb12/wrapt-2.1.1-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:94ded4540cac9125eaa8ddf5f651a7ec0da6f5b9f248fe0347b597098f8ec14c", size = 152544 }, + { url = "https://files.pythonhosted.org/packages/3e/cc/99aed210c6b547b8a6e4cb9d1425e4466727158a6aeb833aa7997e9e08dd/wrapt-2.1.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:da0af328373f97ed9bdfea24549ac1b944096a5a71b30e41c9b8b53ab3eec04a", size = 158700 }, + { url = "https://files.pythonhosted.org/packages/81/0e/d442f745f4957944d5f8ad38bc3a96620bfff3562533b87e486e979f3d99/wrapt-2.1.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:4ad839b55f0bf235f8e337ce060572d7a06592592f600f3a3029168e838469d3", size = 155561 }, + { url = "https://files.pythonhosted.org/packages/51/ac/9891816280e0018c48f8dfd61b136af7b0dcb4a088895db2531acde5631b/wrapt-2.1.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:0d89c49356e5e2a50fa86b40e0510082abcd0530f926cbd71cf25bee6b9d82d7", size = 150188 }, + { url = "https://files.pythonhosted.org/packages/24/98/e2f273b6d70d41f98d0739aa9a269d0b633684a5fb17b9229709375748d4/wrapt-2.1.1-cp314-cp314t-win32.whl", hash = "sha256:f4c7dd22cf7f36aafe772f3d88656559205c3af1b7900adfccb70edeb0d2abc4", size = 60425 }, + { url = "https://files.pythonhosted.org/packages/1e/06/b500bfc38a4f82d89f34a13069e748c82c5430d365d9e6b75afb3ab74457/wrapt-2.1.1-cp314-cp314t-win_amd64.whl", hash = "sha256:f76bc12c583ab01e73ba0ea585465a41e48d968f6d1311b4daec4f8654e356e3", size = 63855 }, + { url = "https://files.pythonhosted.org/packages/d9/cc/5f6193c32166faee1d2a613f278608e6f3b95b96589d020f0088459c46c9/wrapt-2.1.1-cp314-cp314t-win_arm64.whl", hash = "sha256:7ea74fc0bec172f1ae5f3505b6655c541786a5cabe4bbc0d9723a56ac32eb9b9", size = 60443 }, + { url = "https://files.pythonhosted.org/packages/08/3e/144e085a4a237b60a1b41f56e8a173e5e4f21f42a201e43f8d38272b4772/wrapt-2.1.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:9e03b3d486eb39f5d3f562839f59094dcee30c4039359ea15768dc2214d9e07c", size = 60552 }, + { url = "https://files.pythonhosted.org/packages/69/25/576fa5d1e8c0b2657ed411b947bb50c7cc56a0a882fbd1b04574803e668a/wrapt-2.1.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:0fdf3073f488ce4d929929b7799e3b8c52b220c9eb3f4a5a51e2dc0e8ff07881", size = 61498 }, + { url = "https://files.pythonhosted.org/packages/48/01/37def21f806dee9db8c12f99b872b3cdf15215bafe3919c982968134b804/wrapt-2.1.1-cp39-cp39-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0cb4f59238c6625fae2eeb72278da31c9cfba0ff4d9cbe37446b73caa0e9bcf7", size = 113232 }, + { url = "https://files.pythonhosted.org/packages/bf/ee/31dfda37ae75db11cc46634aa950c3497f7a8f987d811388bf1b11fe2f80/wrapt-2.1.1-cp39-cp39-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = 
"sha256:7f794a1c148871b714cb566f5466ec8288e0148a1c417550983864b3981737cd", size = 115198 }, + { url = "https://files.pythonhosted.org/packages/93/d5/43cb27a2d7142bdbe9700099e7261fdc264f63c6b60a8025dd5f8af157da/wrapt-2.1.1-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:95ef3866631c6da9ce1fc0f1e17b90c4c0aa6d041fc70a11bc90733aee122e1a", size = 114400 }, + { url = "https://files.pythonhosted.org/packages/61/91/8429803605df5540b918fe6fc9ffc4f167270f4b7ca1f82eaf7d7b1204b6/wrapt-2.1.1-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:66bc1b2446f01cbbd3c56b79a3a8435bcd4178ac4e06b091913f7751a7f528b8", size = 112998 }, + { url = "https://files.pythonhosted.org/packages/7e/6a/25cb316f3e8262a1626da71b2c299ae2be02fb0547028eac9aa21daeedda/wrapt-2.1.1-cp39-cp39-win32.whl", hash = "sha256:1b9e08e57cabc32972f7c956d10e85093c5da9019faa24faf411e7dd258e528c", size = 57871 }, + { url = "https://files.pythonhosted.org/packages/09/69/ffd41e6149ac4bd9700552659842383f44eb96f00e03c2db433bc856bf2f/wrapt-2.1.1-cp39-cp39-win_amd64.whl", hash = "sha256:e75ad48c3cca739f580b5e14c052993eb644c7fa5b4c90aa51193280b30875ae", size = 60222 }, + { url = "https://files.pythonhosted.org/packages/59/f0/1889e68a0d389d2552b9e014ed6471addcfab98f09611bac61a8d8fab223/wrapt-2.1.1-cp39-cp39-win_arm64.whl", hash = "sha256:9ccd657873b7f964711447d004563a2bc08d1476d7a1afcad310f3713e6f50f4", size = 58647 }, + { url = "https://files.pythonhosted.org/packages/c4/da/5a086bf4c22a41995312db104ec2ffeee2cf6accca9faaee5315c790377d/wrapt-2.1.1-py3-none-any.whl", hash = "sha256:3b0f4629eb954394a3d7c7a1c8cca25f0b07cefe6aa8545e862e9778152de5b7", size = 43886 }, +] + +[[package]] +name = "zarr" +version = "2.18.2" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version < '3.10'", +] +dependencies = [ + { name = "asciitree", marker = "python_full_version < '3.10'" }, + { name = "fasteners", marker = "python_full_version < '3.10' and sys_platform != 'emscripten'" }, + { name = "numcodecs", version = "0.12.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" }, + { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b3/00/ac5c518ff1c1b1cc87a62f86ad9d19c647c19d969a91faa40d3b6342ccaa/zarr-2.18.2.tar.gz", hash = "sha256:9bb393b8a0a38fb121dbb913b047d75db28de9890f6d644a217a73cf4ae74f47", size = 3603055 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5d/bd/8d881d8ca6d80fcb8da2b2f94f8855384daf649499ddfba78ffd1ee2caa3/zarr-2.18.2-py3-none-any.whl", hash = "sha256:a638754902f97efa99b406083fdc807a0e2ccf12a949117389d2a4ba9b05df38", size = 210228 }, +] + +[[package]] +name = "zarr" +version = "2.18.3" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version == '3.10.*'", +] +dependencies = [ + { name = "asciitree", marker = "python_full_version == '3.10.*'" }, + { name = "fasteners", marker = "python_full_version == '3.10.*' and sys_platform != 'emscripten'" }, + { name = "numcodecs", version = "0.13.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.10.*'" }, + { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version == '3.10.*'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/23/c4/187a21ce7cf7c8f00c060dd0e04c2a81139bb7b1ab178bba83f2e1134ce2/zarr-2.18.3.tar.gz", hash = 
"sha256:2580d8cb6dd84621771a10d31c4d777dca8a27706a1a89b29f42d2d37e2df5ce", size = 3603224 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ed/c9/142095e654c2b97133ff71df60979422717b29738b08bc8a1709a5d5e0d0/zarr-2.18.3-py3-none-any.whl", hash = "sha256:b1f7dfd2496f436745cdd4c7bcf8d3b4bc1dceef5fdd0d589c87130d842496dd", size = 210723 }, +] + +[[package]] +name = "zarr" +version = "2.18.7" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version >= '3.13'", + "python_full_version == '3.12.*'", + "python_full_version == '3.11.*'", +] +dependencies = [ + { name = "asciitree", marker = "python_full_version >= '3.11'" }, + { name = "fasteners", marker = "python_full_version >= '3.11' and sys_platform != 'emscripten'" }, + { name = "numcodecs", version = "0.15.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11' and python_full_version < '3.13'" }, + { name = "numpy", version = "2.4.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.13'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/da/1d/01cf9e3ab2d85190278efc3fca9f68563de35ae30ee59e7640e3af98abe3/zarr-2.18.7.tar.gz", hash = "sha256:b2b8f66f14dac4af66b180d2338819981b981f70e196c9a66e6bfaa9e59572f5", size = 3604558 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5e/d8/9ffd8c237b3559945bb52103cf0eed64ea098f7b7f573f8d2962ef27b4b2/zarr-2.18.7-py3-none-any.whl", hash = "sha256:ac3dc4033e9ae4e9d7b5e27c97ea3eaf1003cc0a07f010bd83d5134bf8c4b223", size = 211273 }, +] + [[package]] name = "zipp" version = "3.23.0"