openfisca
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/MIGRATION_IMPORTS.md b/docs/MIGRATION_IMPORTS.md
Lines changed: 154 additions & 0 deletions
diff --git a/docs/REFACTORING_PLAN.md b/docs/REFACTORING_PLAN.md
Lines changed: 2 additions & 2 deletions
diff --git a/openfisca_survey_manager/configuration/models.py b/openfisca_survey_manager/configuration/models.py
Lines changed: 5 additions & 0 deletions
diff --git a/openfisca_survey_manager/core/__init__.py b/openfisca_survey_manager/core/__init__.py
Lines changed: 8 additions & 2 deletions
diff --git a/openfisca_survey_manager/core/dataset.py b/openfisca_survey_manager/core/dataset.py
Lines changed: 152 additions & 0 deletions
@@ -1,5 +1,14 @@
 # Changelog
 
+# 6.4.0
+
+* Refactor (no breaking API changes)
+  - **io/hdf**: Extract HDF5 write logic into `io.hdf` (`write_table_to_hdf5`); `io.writers` re-exports for compatibility
+  - **processing/harmonization**: Add `harmonize_data_frame_columns` (lowercase, rename ident); used in `Survey.get_values`; export from `processing`
+  - **core**: Add `core.table`, `core.survey`, `core.dataset` (Table, Survey, NoMoreDataError, SurveyCollection); root `tables.py`, `surveys.py`, `survey_collections.py` re-export for compatibility
+  - **Logging**: Extend to all modules — add logger to `configuration.models`, `google_colab`, `statshelpers`; fix typo "folloging" → "following" in `core.table`
+  - **Docs**: Add `docs/MIGRATION_IMPORTS.md` (import mapping and steps to follow when the re-exports are removed, with a breaking-change warning); update `REFACTORING_PLAN.md` (§3.4 Logging done)
+
 # 6.3.1
 
 * Technical changes
 
@@ -0,0 +1,154 @@
+# Migrating imports after the re-exports are removed
+
+This document describes the changes to make **when the re-exports are removed** (the compatibility files at the package root): update every import to the new paths, then delete the old modules.
+
+**Reference**: `docs/REFACTORING_PLAN.md`.
+
+---
+
+## Warning
+
+Removing the re-exports is a **breaking change**: any code (internal or external) that imports from the old paths (`config`, `paths`, `tables`, `surveys`, `survey_collections`, `read_sas`, `read_spss`, `read_dbf`, `calibration`, `calmar`, `utils`) will see its imports **fail** (`ModuleNotFoundError`, or plain `ImportError` for the `from openfisca_survey_manager import <module>` form). All imports must be migrated **before** deleting the files listed in section 3, and the change must be documented in the CHANGELOG for dependent projects (e.g. openfisca-france-data).
+
+---
+
+## 1. Old → new mapping
+
+| Old import (to remove) | New import (to use) |
+|------------------------|---------------------|
+| `from openfisca_survey_manager.config import Config` | `from openfisca_survey_manager.configuration.models import Config` |
+| `from openfisca_survey_manager.paths import ...` | `from openfisca_survey_manager.configuration.paths import ...` |
+| `from openfisca_survey_manager.tables import Table` | `from openfisca_survey_manager.core.table import Table` |
+| `from openfisca_survey_manager.surveys import Survey` | `from openfisca_survey_manager.core.survey import Survey` |
+| `from openfisca_survey_manager.surveys import NoMoreDataError` | `from openfisca_survey_manager.core.survey import NoMoreDataError` |
+| `from openfisca_survey_manager.survey_collections import SurveyCollection` | `from openfisca_survey_manager.core.dataset import SurveyCollection` |
+| `from openfisca_survey_manager.read_sas import read_sas` | `from openfisca_survey_manager.io.readers import read_sas` |
+| `from openfisca_survey_manager.read_spss import read_spss` | `from openfisca_survey_manager.io.readers import read_spss` |
+| `from openfisca_survey_manager.read_dbf import read_dbf` | `from openfisca_survey_manager.io.readers import read_dbf` |
+| `from openfisca_survey_manager.calibration import Calibration` | `from openfisca_survey_manager.processing.weights import Calibration` |
+| `from openfisca_survey_manager.calmar import calmar` | `from openfisca_survey_manager.processing.weights import calmar` |
+| `from openfisca_survey_manager.calmar import check_calmar` | `from openfisca_survey_manager.processing.weights import check_calmar` |
+| `from openfisca_survey_manager.utils import do_nothing, load_table, ...` | See section 2.5 (utils) |
+
+**Symbols exported by `paths`** (same names in `configuration.paths`):
+`config_ini`, `default_config_files_directory`, `is_in_ci`, `openfisca_survey_manager_location`, `private_run_with_data`, `test_config_files_directory`.
+
+**Symbols exported by `utils`**:
+- From `common.misc`: `asof`, `do_nothing`, `inflate_parameter_leaf`, `inflate_parameters`, `parameters_asof`, `stata_files_to_data_frames`, `variables_asof`.
+- Defined in `utils.py`: `load_table` (to be moved to a suitable module, e.g. `core` or `io`, before `utils.py` is deleted).
+
+---
+
+## 2. Files to modify when the re-exports are removed
+
+Before (or at the same time as) deleting the files listed in section 3, update the imports in the following files.
+
+### 2.1 Imports from `config`, `paths`
+
+| File | Replace |
+|------|---------|
+| `input_dataframe_generator.py` | `paths` → `configuration.paths` |
+| `scripts/build_collection.py` | `paths` → `configuration.paths` |
+| `temporary.py` | `paths` → `configuration.paths` |
+| `google_colab.py` | `paths` → `configuration.paths` |
+| `coicop.py` | `paths` → `configuration.paths` |
+| `matching.py` | `paths` → `configuration.paths` |
+| `tests/test_read_sas.py` | `paths` → `configuration.paths` ; `read_sas` → `io.readers` |
+| `tests/test_quantile.py` | `paths` → `configuration.paths` |
+| `tests/test_scenario.py` | `paths` → `configuration.paths` |
+
+### 2.2 Imports from `survey_collections`, `surveys`, `tables`
+
+| File | Replace |
+|------|---------|
+| `input_dataframe_generator.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
+| `simulations.py` | `survey_collections` → `core.dataset`; `utils` → `common.misc` + the module containing `load_table` |
+| `utils.py` | `survey_collections` → `core.dataset` (for `load_table`) |
+| `scripts/build_collection.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
+| `scenarios/abstract_scenario.py` | `calibration`, `surveys` → `processing.weights`, `core.survey` |
+| `tests/test_surveys.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
+| `tests/test_coverage_boost.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey`; `utils` → `common.misc` |
+| `tests/test_add_survey_to_collection.py` | `survey_collections` → `core.dataset` |
+| `tests/test_parquet.py` | `survey_collections` → `core.dataset`; `surveys` (NoMoreDataError) → `core.survey` |
+
+### 2.3 Imports from `read_sas`, `read_spss`, `read_dbf`
+
+| File | Replace |
+|------|---------|
+| `core/table.py` | `from openfisca_survey_manager import read_sas` → `from openfisca_survey_manager.io.readers import read_sas`; `read_sas.read_sas` → `read_sas` in `reader_by_source_format`. Then `from openfisca_survey_manager.read_spss import read_spss` → `from openfisca_survey_manager.io.readers import read_spss` (inside the try/except). |
+| `tests/test_read_sas.py` | `from ...paths import ...` → `configuration.paths`; `from ...read_sas import read_sas` → `from ...io.readers import read_sas` |
+
+### 2.4 Imports from `calibration`, `calmar`
+
+| File | Replace |
+|------|---------|
+| `scenarios/abstract_scenario.py` | `calibration` → `processing.weights` |
+| `tests/test_calibration.py` | `calibration` → `processing.weights` |
+| `tests/test_calmar.py` | `calmar` → `processing.weights` |
+
+### 2.5 Imports from `utils`
+
+| File | Replace |
+|------|---------|
+| `simulations.py` | `utils.do_nothing`, `utils.load_table` → `common.misc.do_nothing` + the module containing `load_table` |
+| `tests/test_coverage_boost.py` | `utils.do_nothing` → `common.misc.do_nothing` |
+| `tests/test_legislation_inflator.py` | `utils.inflate_parameters`, `parameters_asof` → `common.misc` |
+| `tests/test_tax_benefit_system_asof.py` | `utils.parameters_asof`, `variables_asof` → `common.misc` |
+
+**Note**: `load_table` depends on `SurveyCollection`; it must either live in a module that imports `core.dataset` or be moved (e.g. to `core.dataset` or an `io.loaders` module) before `utils.py` is deleted.
+
+---
+
+## 3. Files to delete (re-exports)
+
+Once all imports have been updated according to sections 1 and 2, the following files can be deleted (they contain only re-exports):
+
+- `config.py`
+- `paths.py`
+- `tables.py`
+- `surveys.py`
+- `survey_collections.py`
+- `read_sas.py`
+- `read_spss.py`
+- `read_dbf.py`
+- `calibration.py`
+- `calmar.py`
+- `utils.py` (after moving `load_table` and updating the imports listed in 2.5)
+
+---
+
+## 4. Modules without a re-export (canonical imports)
+
+These modules have no re-export file at the root; internal code already uses them. For external code or documentation, the canonical imports are:
+
+| Symbol | Canonical import |
+|--------|------------------|
+| `harmonize_data_frame_columns` | `from openfisca_survey_manager.processing.harmonization import harmonize_data_frame_columns` (or `from openfisca_survey_manager.processing import harmonize_data_frame_columns`) |
+| `write_table_to_hdf5` | `from openfisca_survey_manager.io.hdf import write_table_to_hdf5` (or `from openfisca_survey_manager.io.writers import write_table_to_hdf5`) |
+| `write_table_to_parquet` | `from openfisca_survey_manager.io.writers import write_table_to_parquet` |
+
+---
+
+## 5. Root package `openfisca_survey_manager`
+
+Today the package `__init__.py` exposes only the exceptions. If external code does, for example, `from openfisca_survey_manager import read_sas`, it relies on the `read_sas.py` submodule. **After the re-exports are removed**, these import paths will no longer be valid (failure at import time); migrate them to `from openfisca_survey_manager.io.readers import read_sas` (see section 1).
+
+To do before or after the migration: check, in this repository and in dependent projects (openfisca-france-data, etc.), for imports from the package root or from the old modules listed in section 3.
+
+---
+
+## 6. Recommended migration order
+
+1. **Move `load_table`** to its final module (e.g. `core.dataset` or `io.loaders`) and update the call sites (section 2.5).
+2. **Update all internal imports** (section 2) to the new paths, file by file.
+3. **Run the test suite**: `pytest`; fix anything that was missed until there are 0 failures.
+4. **Delete the re-export files** listed in section 3.
+5. **Check external usages** (section 5) and document the changes in the CHANGELOG (breaking changes).
+
+---
+
+## 7. Optional later changes
+
+- Rename the `common/` directory to `utils/` once `utils.py` has been deleted (as planned in the refactoring plan).
+- Rename `configuration/` to `config/` if a shorter name is wanted (consistent with the plan).
+- These renames will require another round of import updates (configuration → config, common → utils).
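Steps 2 and 5 of the recommended order depend on finding every remaining import of a legacy module. The following is a minimal sketch of such a check, assuming a static scan with Python's `ast` module is good enough. `find_legacy_imports` and `LEGACY_MODULES` are hypothetical names that do not exist in the package; the module list is copied from section 3, and relative imports are deliberately ignored.

```python
import ast

PACKAGE = "openfisca_survey_manager"
# Legacy root modules, copied from section 3 of the migration document.
LEGACY_MODULES = {
    "config", "paths", "tables", "surveys", "survey_collections",
    "read_sas", "read_spss", "read_dbf", "calibration", "calmar", "utils",
}


def _is_legacy(dotted_name):
    """True for 'openfisca_survey_manager.<legacy module>' exactly."""
    parts = dotted_name.split(".")
    return len(parts) == 2 and parts[0] == PACKAGE and parts[1] in LEGACY_MODULES


def find_legacy_imports(source, filename="<string>"):
    """Return sorted (line number, module) pairs for imports of legacy modules."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == PACKAGE:
                # Form: from openfisca_survey_manager import read_sas
                hits.extend(
                    (node.lineno, f"{PACKAGE}.{alias.name}")
                    for alias in node.names
                    if alias.name in LEGACY_MODULES
                )
            elif _is_legacy(node.module):
                # Form: from openfisca_survey_manager.tables import Table
                hits.append((node.lineno, node.module))
        elif isinstance(node, ast.Import):
            # Form: import openfisca_survey_manager.utils
            hits.extend(
                (node.lineno, alias.name)
                for alias in node.names
                if _is_legacy(alias.name)
            )
    return sorted(hits)


sample = (
    "from openfisca_survey_manager.tables import Table\n"
    "from openfisca_survey_manager.core.survey import Survey\n"
    "from openfisca_survey_manager import read_sas\n"
    "import openfisca_survey_manager.utils\n"
)
legacy_hits = find_legacy_imports(sample)
```

Running the function over each `*.py` file of the repository and of dependent projects (for example with `Path.rglob`) before step 4 would list the lines that still need migrating.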
@@ -99,8 +99,8 @@ Today these layers are intertwined (e.g. reading + cleaning in `tabl
 
 ### 3.4 Logging
 
-- Replace `print()` calls with structured `logging` (already started in matching, calmar).
-- Extend to all modules (readers, writers, calibration, etc.).
+- **Done**: `print()` calls replaced with structured `logging` (matching, calmar, scenarios, scripts/build_collection, simulations, readers, writers, calibration, core, processing, etc.).
+- **Done**: logging extended to all business modules (configuration/models, google_colab, statshelpers, and all other modules concerned).
 
 ### 3.5 Centralized error handling
 
 
@@ -1,8 +1,11 @@
 """Configuration model (Config class from config.ini)."""
 
 import configparser
+import logging
 from pathlib import Path
 
+log = logging.getLogger(__name__)
+
 
 class Config(configparser.ConfigParser):
     """Parser for config.ini; used by SurveyCollection and build scripts."""
@@ -16,10 +19,12 @@ def __init__(self, config_files_directory=None):
             assert config_ini.exists(), f"{config_ini} is not a valid path"
             self.config_ini = config_ini
             self.read([config_ini])
+            log.debug("Loaded config from %s", config_ini)
 
     def save(self):
         assert self.config_ini, "configuration file path is not defined"
         assert self.config_ini.exists()
         config_file = self.config_ini.open("w")
         self.write(config_file)
         config_file.close()
+        log.debug("Saved config to %s", self.config_ini)
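`Config` is a plain `configparser.ConfigParser`, and `SurveyCollection.__init__` in `core/dataset.py` uses its `[collections]` section to locate a collection's JSON file: an explicit entry wins, otherwise the path is built from `collections_directory`. The sketch below mirrors that lookup with the standard library only. `resolve_json_path`, the INI content and the paths are hypothetical, and unlike the code in the diff it uses `fallback=None` so a missing option returns `None` instead of raising.

```python
import configparser
from pathlib import Path

# Hypothetical config.ini content; real files live in the config files directory.
INI_TEXT = """
[collections]
collections_directory = /tmp/collections
known = /data/known.json
"""


def resolve_json_path(config, name):
    """Return the JSON path for a collection name, or None if it cannot be derived."""
    # 1. An explicit "<name> = <path>" entry takes precedence.
    if config.has_option("collections", name):
        return config.get("collections", name)
    # 2. Otherwise fall back to <collections_directory>/<name>.json.
    directory = config.get("collections", "collections_directory", fallback=None)
    if directory is not None:
        return str(Path(directory) / (name + ".json"))
    return None


config = configparser.ConfigParser()
config.read_string(INI_TEXT)

explicit_path = resolve_json_path(config, "known")
derived_path = resolve_json_path(config, "other")
```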
@@ -1,2 +1,8 @@
-# Target: Survey (surveys.py), SurveyCollection, dataset orchestration.
-# See docs/REFACTORING_PLAN.md for migration steps.
+# Survey, Table, SurveyCollection. Legacy modules re-export for compatibility.
+# See docs/REFACTORING_PLAN.md.
+
+from openfisca_survey_manager.core.dataset import SurveyCollection
+from openfisca_survey_manager.core.survey import NoMoreDataError, Survey
+from openfisca_survey_manager.core.table import Table
+
+__all__ = ["NoMoreDataError", "Survey", "SurveyCollection", "Table"]
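As far as this diff shows, the legacy root modules re-export the new symbols silently. One possible way to prepare the later removal is for a legacy module to emit a `DeprecationWarning` when a moved symbol is accessed, using a module-level `__getattr__` (PEP 562). The sketch below is an assumption, not something this PR implements: it builds two throwaway modules named `demo_pkg_core_table` and `demo_pkg_tables` (both hypothetical) in `sys.modules` so that it runs without the real package.

```python
import sys
import types
import warnings

# Stand-in for the new canonical module (hypothetical name).
new_module = types.ModuleType("demo_pkg_core_table")


class Table:
    """Placeholder for the real Table class."""


new_module.Table = Table
sys.modules[new_module.__name__] = new_module

# Stand-in for the legacy module that only re-exports.
legacy_module = types.ModuleType("demo_pkg_tables")
_MOVED = {"Table": "demo_pkg_core_table"}


def _legacy_getattr(name):
    """Resolve a moved symbol lazily and warn about the old import path."""
    if name in _MOVED:
        target = _MOVED[name]
        warnings.warn(
            f"demo_pkg_tables.{name} is deprecated; import it from {target}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(sys.modules[target], name)
    raise AttributeError(f"module 'demo_pkg_tables' has no attribute {name!r}")


# PEP 562: a function named __getattr__ in the module namespace handles missing attributes.
legacy_module.__getattr__ = _legacy_getattr
sys.modules[legacy_module.__name__] = legacy_module

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_pkg_tables import Table as LegacyTable  # old path still resolves
```

In a real module file, the same function would simply be defined as `def __getattr__(name)` at module level instead of being attached to a `ModuleType` instance.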
@@ -0,0 +1,152 @@
+"""SurveyCollection: collection of surveys (dataset orchestration)."""
+
+import codecs
+import collections
+import configparser
+import json
+import logging
+from pathlib import Path
+
+from openfisca_survey_manager.configuration.models import Config
+from openfisca_survey_manager.configuration.paths import default_config_files_directory
+from openfisca_survey_manager.core.survey import Survey
+from openfisca_survey_manager.exceptions import SurveyConfigError
+
+log = logging.getLogger(__name__)
+
+
+class SurveyCollection:
+    """A collection of Surveys."""
+
+    def __init__(
+        self, config_files_directory=default_config_files_directory, label=None, name=None, json_file_path=None
+    ):
+        self.name = name
+        self.label = label
+        self.json_file_path = json_file_path
+        self.surveys = []
+        log.debug("Initializing SurveyCollection from config file found in %s ..", config_files_directory)
+        config = Config(config_files_directory=config_files_directory)
+        if label is not None:
+            self.label = label
+        if name is not None:
+            self.name = name
+        if json_file_path is not None:
+            self.json_file_path = json_file_path
+            if "collections" not in config.sections():
+                config["collections"] = {}
+            config.set("collections", self.name, str(self.json_file_path))
+            config.save()
+        elif config is not None:
+            if config.has_option("collections", self.name):
+                self.json_file_path = config.get("collections", self.name)
+            elif config.get("collections", "collections_directory") is not None:
+                self.json_file_path = str(Path(config.get("collections", "collections_directory")) / (name + ".json"))
+
+        self.config = config
+
+    def __repr__(self):
+        header = f"""{self.name}
+Survey collection of {self.label}
+Contains the following surveys :
+"""
+        surveys = [f"       {survey.name} : {survey.label} \n" for survey in self.surveys]
+        return header + "".join(surveys)
+
+    def dump(self, config_files_directory=None, json_file_path=None):
+        if self.config is None:
+            if config_files_directory is None:
+                config_files_directory = default_config_files_directory
+            self.config = Config(config_files_directory=config_files_directory)
+        # Bind the local name in both cases; it was left unset when self.config was None.
+        config = self.config
+
+        if json_file_path is None:
+            assert self.json_file_path is not None, "A json_file_path should be provided"
+        else:
+            self.json_file_path = json_file_path
+
+        config.set("collections", self.name, str(self.json_file_path))
+        config.save()
+        with codecs.open(str(self.json_file_path), "w", encoding="utf-8") as _file:
+            json.dump(self.to_json(), _file, ensure_ascii=False, indent=2)
+
+    def fill_store(
+        self,
+        source_format=None,
+        surveys=None,
+        tables=None,
+        overwrite=False,
+        keep_original_parquet_file=False,
+        encoding=None,
+        store_format="hdf5",
+        categorical_strategy="unique_labels",
+    ):
+        if surveys is None:
+            surveys = self.surveys
+        for survey in surveys:
+            survey.fill_store(
+                source_format=source_format,
+                tables=tables,
+                overwrite=overwrite,
+                keep_original_parquet_file=keep_original_parquet_file,
+                encoding=encoding,
+                store_format=store_format,
+                categorical_strategy=categorical_strategy,
+            )
+        self.dump()
+
+    def get_survey(self, survey_name):
+        available_surveys_names = [survey.name for survey in self.surveys]
+        assert survey_name in available_surveys_names, (
+            f"Survey {survey_name} cannot be found for survey collection {self.name}.\n"
+            f"Available surveys are :{available_surveys_names}"
+        )
+        return [survey for survey in self.surveys if survey.name == survey_name].pop()
+
+    @classmethod
+    def load(cls, json_file_path=None, collection=None, config_files_directory=default_config_files_directory):
+        assert Path(config_files_directory).exists()
+        config = Config(config_files_directory=config_files_directory)
+        if json_file_path is None:
+            assert collection is not None, "A collection is needed"
+            try:
+                json_file_path = config.get("collections", collection)
+            except (configparser.NoOptionError, configparser.NoSectionError) as error:
+                msg = f"Looking for config file in {config_files_directory}"
+                log.debug(msg)
+                log.error(error)
+                raise error
+            except Exception as error:
+                msg = f"Looking for config file in {config_files_directory}"
+                log.debug(msg)
+                log.error(error)
+                raise SurveyConfigError(msg) from error
+
+        with Path(json_file_path).open("r") as _file:
+            self_json = json.load(_file)
+            name = self_json["name"]
+
+        self = cls(config_files_directory=config_files_directory, name=name)
+        self.config = config
+        with Path(json_file_path).open("r") as _file:
+            self_json = json.load(_file)
+            self.json_file_path = json_file_path
+            self.label = self_json.get("label")
+            self.name = self_json.get("name")
+
+        surveys = self_json["surveys"]
+        for survey_name, survey_json in surveys.items():
+            survey = Survey(name=survey_name)
+            self.surveys.append(survey.create_from_json(survey_json))
+        return self
+
+    def to_json(self):
+        self_json = collections.OrderedDict(())
+        self_json["name"] = self.name
+        self_json["surveys"] = collections.OrderedDict(())
+        for survey in self.surveys:
+            self_json["surveys"][survey.name] = survey.to_json()
+        return self_json
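The JSON layout exchanged by `dump`/`to_json` and `load` can be illustrated without the package. The sketch below mirrors the shape built by `to_json` (a `name` key and a `surveys` mapping) and reads it back the way `load` does. `collection_to_json` and the survey payload are hypothetical stand-ins; the real per-survey content comes from `Survey.to_json()`. It also makes one detail of the diff visible: `load` reads a `label` key that `to_json` never writes, so the label comes back as `None` after a round trip.

```python
import collections
import json


def collection_to_json(name, surveys):
    """Build the same top-level shape as SurveyCollection.to_json (stand-in function)."""
    payload = collections.OrderedDict()
    payload["name"] = name
    payload["surveys"] = collections.OrderedDict(surveys)
    return payload


# Hypothetical survey payload, used only to give the mapping one entry.
serialized = json.dumps(
    collection_to_json("demo", {"survey_2020": {"tables": {}}}),
    ensure_ascii=False,
    indent=2,
)

reloaded = json.loads(serialized)
name = reloaded["name"]            # load() requires this key
label = reloaded.get("label")      # load() reads it, to_json() does not write it
survey_names = list(reloaded["surveys"])
```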