Commit 348c665

refactor: io/hdf, processing/harmonization, core modules, logging, migration doc (v6.4.0)
- io/hdf: extract write_table_to_hdf5 to io.hdf, writers re-exports
- processing/harmonization: harmonize_data_frame_columns, used in Survey.get_values
- core: add core.table, core.survey, core.dataset; tables/surveys/survey_collections re-export
- logging: add to configuration.models, google_colab, statshelpers; fix typo in core.table
- docs: MIGRATION_IMPORTS.md (import mapping when re-exports removed), REFACTORING_PLAN §3.4
- CHANGELOG + bump to 6.4.0

Made-with: Cursor
1 parent c65b4bb commit 348c665

18 files changed

Lines changed: 1072 additions & 939 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -1,5 +1,14 @@
 # Changelog
 
+# 6.4.0
+
+* Refactor (no breaking API changes)
+  - **io/hdf**: Extract HDF5 write logic into `io.hdf` (`write_table_to_hdf5`); `io.writers` re-exports for compatibility
+  - **processing/harmonization**: Add `harmonize_data_frame_columns` (lowercase, rename ident); used in `Survey.get_values`; export from `processing`
+  - **core**: Add `core.table`, `core.survey`, `core.dataset` (Table, Survey, NoMoreDataError, SurveyCollection); root `tables.py`, `surveys.py`, `survey_collections.py` re-export for compatibility
+  - **Logging**: Extend to all modules: add logger to `configuration.models`, `google_colab`, `statshelpers`; fix typo "folloging" → "following" in `core.table`
+  - **Docs**: Add `docs/MIGRATION_IMPORTS.md` (import mapping and steps when re-exports will be removed, with breaking-change warning); update `REFACTORING_PLAN.md` (§3.4 Logging done)
+
 # 6.3.1
 
 * Technical changes

docs/MIGRATION_IMPORTS.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
+# Import migration after removal of the re-exports
+
+This document describes the changes to make **when the re-exports are removed** (the compatibility files at the package root): update all imports to the new paths, then delete the old modules.
+
+**Reference**: `docs/REFACTORING_PLAN.md`.
+
+---
+
+## Warning
+
+Removing the re-exports is a **breaking change**: any code (internal or external) that imports from the old paths (`config`, `paths`, `tables`, `surveys`, `survey_collections`, `read_sas`, `read_spss`, `read_dbf`, `calibration`, `calmar`, `utils`) will see its imports **fail** (`ModuleNotFoundError`). All imports must be migrated **before** deleting the files listed in section 3, and the change must be documented in the CHANGELOG for dependent projects (e.g. openfisca-france-data).
+
+---
+
+## 1. Old → new mapping
+
+| Old import (to remove) | New import (to use) |
+|------------------------|---------------------|
+| `from openfisca_survey_manager.config import Config` | `from openfisca_survey_manager.configuration.models import Config` |
+| `from openfisca_survey_manager.paths import ...` | `from openfisca_survey_manager.configuration.paths import ...` |
+| `from openfisca_survey_manager.tables import Table` | `from openfisca_survey_manager.core.table import Table` |
+| `from openfisca_survey_manager.surveys import Survey` | `from openfisca_survey_manager.core.survey import Survey` |
+| `from openfisca_survey_manager.surveys import NoMoreDataError` | `from openfisca_survey_manager.core.survey import NoMoreDataError` |
+| `from openfisca_survey_manager.survey_collections import SurveyCollection` | `from openfisca_survey_manager.core.dataset import SurveyCollection` |
+| `from openfisca_survey_manager.read_sas import read_sas` | `from openfisca_survey_manager.io.readers import read_sas` |
+| `from openfisca_survey_manager.read_spss import read_spss` | `from openfisca_survey_manager.io.readers import read_spss` |
+| `from openfisca_survey_manager.read_dbf import read_dbf` | `from openfisca_survey_manager.io.readers import read_dbf` |
+| `from openfisca_survey_manager.calibration import Calibration` | `from openfisca_survey_manager.processing.weights import Calibration` |
+| `from openfisca_survey_manager.calmar import calmar` | `from openfisca_survey_manager.processing.weights import calmar` |
+| `from openfisca_survey_manager.calmar import check_calmar` | `from openfisca_survey_manager.processing.weights import check_calmar` |
+| `from openfisca_survey_manager.utils import do_nothing, load_table, ...` | See section 2.5 (`utils`) |
+
+**Symbols exported by `paths`** (same names in `configuration.paths`):
+`config_ini`, `default_config_files_directory`, `is_in_ci`, `openfisca_survey_manager_location`, `private_run_with_data`, `test_config_files_directory`.
+
+**Symbols exported by `utils`**:
+- From `common.misc`: `asof`, `do_nothing`, `inflate_parameter_leaf`, `inflate_parameters`, `parameters_asof`, `stata_files_to_data_frames`, `variables_asof`.
+- Defined in `utils.py`: `load_table` (to be moved to a suitable module, e.g. `core` or `io`, before `utils.py` is deleted).
+
+---
+
+## 2. Files to modify when the re-exports are removed
+
+Before (or at the same time as) deleting the files listed in section 3, update the imports in the following files.
+
+### 2.1 Imports from `config`, `paths`
+
+| File | Replace |
+|------|---------|
+| `input_dataframe_generator.py` | `paths` → `configuration.paths` |
+| `scripts/build_collection.py` | `paths` → `configuration.paths` |
+| `temporary.py` | `paths` → `configuration.paths` |
+| `google_colab.py` | `paths` → `configuration.paths` |
+| `coicop.py` | `paths` → `configuration.paths` |
+| `matching.py` | `paths` → `configuration.paths` |
+| `tests/test_read_sas.py` | `paths` → `configuration.paths`; `read_sas` → `io.readers` |
+| `tests/test_quantile.py` | `paths` → `configuration.paths` |
+| `tests/test_scenario.py` | `paths` → `configuration.paths` |
+
+### 2.2 Imports from `survey_collections`, `surveys`, `tables`
+
+| File | Replace |
+|------|---------|
+| `input_dataframe_generator.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
+| `simulations.py` | `survey_collections` → `core.dataset`; `utils` → `common.misc` + the module holding `load_table` |
+| `utils.py` | `survey_collections` → `core.dataset` (for `load_table`) |
+| `scripts/build_collection.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
+| `scenarios/abstract_scenario.py` | `calibration`, `surveys` → `processing.weights`, `core.survey` |
+| `tests/test_surveys.py` | `survey_collections`, `surveys` → `core.dataset`, `core.survey` |
+| `tests/test_coverage_boost.py` | `survey_collections`, `surveys`, `utils` → same as above |
+| `tests/test_add_survey_to_collection.py` | `survey_collections` → `core.dataset` |
+| `tests/test_parquet.py` | `survey_collections` → `core.dataset`; `surveys` (NoMoreDataError) → `core.survey` |
+
+### 2.3 Imports from `read_sas`, `read_spss`, `read_dbf`
+
+| File | Replace |
+|------|---------|
+| `core/table.py` | `from openfisca_survey_manager import read_sas` → `from openfisca_survey_manager.io.readers import read_sas`; `read_sas.read_sas` → `read_sas` in `reader_by_source_format`. Then `from openfisca_survey_manager.read_spss import read_spss` → `from openfisca_survey_manager.io.readers import read_spss` (inside the try/except). |
+| `tests/test_read_sas.py` | `from ...paths import ...` → `configuration.paths`; `from ...read_sas import read_sas` → `from ...io.readers import read_sas` |
+
+### 2.4 Imports from `calibration`, `calmar`
+
+| File | Replace |
+|------|---------|
+| `scenarios/abstract_scenario.py` | `calibration` → `processing.weights` |
+| `tests/test_calibration.py` | `calibration` → `processing.weights` |
+| `tests/test_calmar.py` | `calmar` → `processing.weights` |
+
+### 2.5 Imports from `utils`
+
+| File | Replace |
+|------|---------|
+| `simulations.py` | `utils.do_nothing`, `utils.load_table` → `common.misc.do_nothing` + the module containing `load_table` |
+| `tests/test_coverage_boost.py` | `utils.do_nothing` → `common.misc.do_nothing` |
+| `tests/test_legislation_inflator.py` | `utils.inflate_parameters`, `parameters_asof` → `common.misc` |
+| `tests/test_tax_benefit_system_asof.py` | `utils.parameters_asof`, `variables_asof` → `common.misc` |
+
+**Note**: `load_table` depends on `SurveyCollection`; it must either live in a module that imports `core.dataset` or be moved (e.g. to `core.dataset` or an `io.loaders` module) before `utils.py` is deleted.
+
+---
+
+## 3. Files to delete (re-exports)
+
+Once all imports have been updated according to sections 1 and 2, the following files can be deleted (they contain only re-exports):
+
+- `config.py`
+- `paths.py`
+- `tables.py`
+- `surveys.py`
+- `survey_collections.py`
+- `read_sas.py`
+- `read_spss.py`
+- `read_dbf.py`
+- `calibration.py`
+- `calmar.py`
+- `utils.py` (after moving `load_table` and updating the imports listed in 2.5)
+
+---
+
+## 4. Modules without a re-export (canonical imports)
+
+These modules have no re-export file at the package root; internal code already uses them. For external code or documentation, the canonical imports are:
+
+| Symbol | Canonical import |
+|--------|------------------|
+| `harmonize_data_frame_columns` | `from openfisca_survey_manager.processing.harmonization import harmonize_data_frame_columns` (or `from openfisca_survey_manager.processing import harmonize_data_frame_columns`) |
+| `write_table_to_hdf5` | `from openfisca_survey_manager.io.hdf import write_table_to_hdf5` (or `from openfisca_survey_manager.io.writers import write_table_to_hdf5`) |
+| `write_table_to_parquet` | `from openfisca_survey_manager.io.writers import write_table_to_parquet` |
+
+---
+
+## 5. Root package `openfisca_survey_manager`
+
+Today the package's `__init__.py` exposes only the exceptions. If external code does, for example, `from openfisca_survey_manager import read_sas`, it relies on the `read_sas.py` submodule. **After the re-exports are removed**, these import paths will no longer be valid (failure at import time); migrate them to `from openfisca_survey_manager.io.readers import read_sas` (see section 1).
+
+To do before or after the migration: check, in this repository and in dependent projects (openfisca-france-data, etc.), for imports from the package root or from the old modules listed in section 3.
+
+---
+
+## 6. Recommended migration order
+
+1. **Move `load_table`** to its final module (e.g. `core.dataset` or `io.loaders`) and update the call sites (section 2.5).
+2. **Update all internal imports** (section 2) to the new paths, file by file.
+3. **Run the test suite**: `pytest`; fix anything missed until there are 0 failures.
+4. **Delete the re-export files** listed in section 3.
+5. **Check external usages** (section 5) and document the changes in the CHANGELOG (breaking changes).
+
+---
+
+## 7. Optional later changes
+
+- Rename the `common/` directory to `utils/` once `utils.py` has been deleted (as planned in the refactoring plan).
+- Rename `configuration/` to `config/` if a shorter name is wanted (consistent with the plan).
+- These renames will require another round of import updates (configuration → config, common → utils).
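
Steps 2 and 5 of the recommended order both come down to finding imports that still use the old paths. The sketch below is one possible way to do that with the standard `ast` module; it is not part of this commit. The helper name `find_legacy_imports` is hypothetical, the mapping is copied from section 1 of the migration document, and `utils` is deliberately left out because the target module for `load_table` has not been decided. Plain `import openfisca_survey_manager.tables` statements are not handled.

```python
import ast

PKG = "openfisca_survey_manager"

# Legacy submodule -> new submodule, copied from the mapping table in section 1.
# `utils` is omitted on purpose: its symbols are split between `common.misc`
# and a module for `load_table` that has not been chosen yet.
OLD_TO_NEW = {
    "config": "configuration.models",
    "paths": "configuration.paths",
    "tables": "core.table",
    "surveys": "core.survey",
    "survey_collections": "core.dataset",
    "read_sas": "io.readers",
    "read_spss": "io.readers",
    "read_dbf": "io.readers",
    "calibration": "processing.weights",
    "calmar": "processing.weights",
}


def find_legacy_imports(source):
    """Return sorted (line, old_module, new_module) tuples for legacy imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only absolute `from X import Y` statements are inspected.
        if not isinstance(node, ast.ImportFrom) or node.level != 0:
            continue
        module = node.module or ""
        if module == PKG:
            # `from openfisca_survey_manager import read_sas` style
            legacy_names = [alias.name for alias in node.names]
        elif module.startswith(PKG + "."):
            legacy_names = [module[len(PKG) + 1:]]
        else:
            continue
        for name in legacy_names:
            if name in OLD_TO_NEW:
                hits.append((node.lineno, f"{PKG}.{name}", f"{PKG}.{OLD_TO_NEW[name]}"))
    return sorted(hits)


sample = (
    "from openfisca_survey_manager.tables import Table\n"
    "from openfisca_survey_manager import read_sas\n"
    "from openfisca_survey_manager.core.survey import Survey\n"
)
for line, old, new in find_legacy_imports(sample):
    print(f"line {line}: {old} -> {new}")
```

Running the same function over each `.py` file of a dependent project would give a checklist of imports to migrate before the re-export files are deleted.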

docs/REFACTORING_PLAN.md

Lines changed: 2 additions & 2 deletions
@@ -99,8 +99,8 @@ Today these layers are intertwined (e.g. reading + cleaning in `tabl
 
 ### 3.4 Logging
 
-- Replace `print()` calls with structured `logging` (already started in matching, calmar).
-- Extend to all modules (readers, writers, calibration, etc.).
+- **Done**: `print()` calls replaced with structured `logging` (matching, calmar, scenarios, scripts/build_collection, simulations, readers, writers, calibration, core, processing, etc.).
+- **Done**: logging extended to all business modules (configuration/models, google_colab, statshelpers, and all other modules concerned).
 
 ### 3.5 Centralized error handling

openfisca_survey_manager/configuration/models.py

Lines changed: 5 additions & 0 deletions
@@ -1,8 +1,11 @@
 """Configuration model (Config class from config.ini)."""
 
 import configparser
+import logging
 from pathlib import Path
 
+log = logging.getLogger(__name__)
+
 
 class Config(configparser.ConfigParser):
     """Parser for config.ini; used by SurveyCollection and build scripts."""
@@ -16,10 +19,12 @@ def __init__(self, config_files_directory=None):
         assert config_ini.exists(), f"{config_ini} is not a valid path"
         self.config_ini = config_ini
         self.read([config_ini])
+        log.debug("Loaded config from %s", config_ini)
 
     def save(self):
         assert self.config_ini, "configuration file path is not defined"
         assert self.config_ini.exists()
         config_file = self.config_ini.open("w")
         self.write(config_file)
         config_file.close()
+        log.debug("Saved config to %s", self.config_ini)
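
The two `log.debug` calls added above stay silent unless the module logger is set to `DEBUG`. As a minimal sketch, assuming the logger name is the module path (which is what `logging.getLogger(__name__)` gives for this file), a caller could turn them on as shown below. `ListHandler`, `enable_debug` and the `/tmp/config.ini` path are illustrative only, and the final `debug` call stands in for the one made inside `Config.__init__`.

```python
import logging

# Name produced by logging.getLogger(__name__) in configuration/models.py.
MODULE_LOGGER = "openfisca_survey_manager.configuration.models"


class ListHandler(logging.Handler):
    """Illustrative handler that keeps formatted messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def enable_debug(logger_name):
    """Set one named logger to DEBUG and attach an in-memory handler to it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    handler = ListHandler()
    logger.addHandler(handler)
    return handler


handler = enable_debug(MODULE_LOGGER)

# Stand-in for the call added in Config.__init__; the path is made up.
logging.getLogger(MODULE_LOGGER).debug("Loaded config from %s", "/tmp/config.ini")
print(handler.messages)
```

Setting the level on the named logger rather than on the root logger keeps the debug output limited to this one module.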
Lines changed: 8 additions & 2 deletions
@@ -1,2 +1,8 @@
-# Target: Survey (surveys.py), SurveyCollection, dataset orchestration.
-# See docs/REFACTORING_PLAN.md for migration steps.
+# Survey, Table, SurveyCollection. Legacy modules re-export for compatibility.
+# See docs/REFACTORING_PLAN.md.
+
+from openfisca_survey_manager.core.dataset import SurveyCollection
+from openfisca_survey_manager.core.survey import NoMoreDataError, Survey
+from openfisca_survey_manager.core.table import Table
+
+__all__ = ["NoMoreDataError", "Survey", "SurveyCollection", "Table"]
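
The legacy re-export files in this commit are plain imports, so old paths keep working silently until the files are deleted. One option before that removal, not implemented here, would be to have a legacy module warn on use through a module-level `__getattr__` (PEP 562). The sketch below illustrates the idea with throwaway modules built in memory; `demo_core_table`, `demo_tables` and the placeholder `Table` class are hypothetical stand-ins, not the real package. In a real file the function would simply be written as `def __getattr__(name):` at module top level.

```python
import importlib
import sys
import types
import warnings

# Stand-in for the new canonical module (hypothetical name, not the real package).
new_module = types.ModuleType("demo_core_table")


class Table:
    """Placeholder for the real Table class."""


new_module.Table = Table
sys.modules["demo_core_table"] = new_module

# Legacy name -> module that now defines it.
_MOVED = {"Table": "demo_core_table"}


def _legacy_getattr(name):
    """Module-level __getattr__ (PEP 562): resolve moved names lazily and warn."""
    if name in _MOVED:
        warnings.warn(
            f"importing {name} from demo_tables is deprecated; use {_MOVED[name]}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(_MOVED[name]), name)
    raise AttributeError(f"module 'demo_tables' has no attribute {name!r}")


# Stand-in for a legacy re-export file such as tables.py.
legacy_module = types.ModuleType("demo_tables")
legacy_module.__getattr__ = _legacy_getattr
sys.modules["demo_tables"] = legacy_module

# The old import path still works, but now emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_tables import Table as LegacyTable
```

A warning period of this kind would give dependent projects a visible signal before the breaking release, at the cost of slightly more code in each legacy file.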
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
+"""SurveyCollection: collection of surveys (dataset orchestration)."""
+
+import codecs
+import collections
+import configparser
+import json
+import logging
+from pathlib import Path
+
+from openfisca_survey_manager.configuration.models import Config
+from openfisca_survey_manager.configuration.paths import default_config_files_directory
+from openfisca_survey_manager.core.survey import Survey
+from openfisca_survey_manager.exceptions import SurveyConfigError
+
+log = logging.getLogger(__name__)
+
+
+class SurveyCollection:
+    """A collection of Surveys."""
+
+    def __init__(
+        self, config_files_directory=default_config_files_directory, label=None, name=None, json_file_path=None
+    ):
+        self.name = name
+        self.label = label
+        self.json_file_path = json_file_path
+        self.surveys = []
+        log.debug(f"Initializing SurveyCollection from config file found in {config_files_directory} ..")
+        config = Config(config_files_directory=config_files_directory)
+        if label is not None:
+            self.label = label
+        if name is not None:
+            self.name = name
+        if json_file_path is not None:
+            self.json_file_path = json_file_path
+            if "collections" not in config.sections():
+                config["collections"] = {}
+            config.set("collections", self.name, str(self.json_file_path))
+            config.save()
+        elif config is not None:
+            if config.has_option("collections", self.name):
+                self.json_file_path = config.get("collections", self.name)
+            elif config.get("collections", "collections_directory") is not None:
+                self.json_file_path = str(Path(config.get("collections", "collections_directory")) / (name + ".json"))
+
+        self.config = config
+
+    def __repr__(self):
+        header = f"""{self.name}
+Survey collection of {self.label}
+Contains the following surveys :
+"""
+        surveys = [f" {survey.name} : {survey.label} \n" for survey in self.surveys]
+        return header + "".join(surveys)
+
+    def dump(self, config_files_directory=None, json_file_path=None):
+        if self.config is not None:
+            config = self.config
+        else:
+            if config_files_directory is not None:
+                pass
+            else:
+                config_files_directory = default_config_files_directory
+            self.config = Config(config_files_directory=config_files_directory)
+
+        if json_file_path is None:
+            assert self.json_file_path is not None, "A json_file_path should be provided"
+        else:
+            self.json_file_path = json_file_path
+
+        config.set("collections", self.name, str(self.json_file_path))
+        config.save()
+        with codecs.open(str(self.json_file_path), "w", encoding="utf-8") as _file:
+            json.dump(self.to_json(), _file, ensure_ascii=False, indent=2)
+
+    def fill_store(
+        self,
+        source_format=None,
+        surveys=None,
+        tables=None,
+        overwrite=False,
+        keep_original_parquet_file=False,
+        encoding=None,
+        store_format="hdf5",
+        categorical_strategy="unique_labels",
+    ):
+        if surveys is None:
+            surveys = self.surveys
+        for survey in surveys:
+            survey.fill_store(
+                source_format=source_format,
+                tables=tables,
+                overwrite=overwrite,
+                keep_original_parquet_file=keep_original_parquet_file,
+                encoding=encoding,
+                store_format=store_format,
+                categorical_strategy=categorical_strategy,
+            )
+        self.dump()
+
+    def get_survey(self, survey_name):
+        available_surveys_names = [survey.name for survey in self.surveys]
+        assert survey_name in available_surveys_names, (
+            f"Survey {survey_name} cannot be found for survey collection {self.name}.\n"
+            f"Available surveys are :{available_surveys_names}"
+        )
+        return [survey for survey in self.surveys if survey.name == survey_name].pop()
+
+    @classmethod
+    def load(cls, json_file_path=None, collection=None, config_files_directory=default_config_files_directory):
+        assert Path(config_files_directory).exists()
+        config = Config(config_files_directory=config_files_directory)
+        if json_file_path is None:
+            assert collection is not None, "A collection is needed"
+            try:
+                json_file_path = config.get("collections", collection)
+            except (configparser.NoOptionError, configparser.NoSectionError) as error:
+                msg = f"Looking for config file in {config_files_directory}"
+                log.debug(msg)
+                log.error(error)
+                raise error
+            except Exception as error:
+                msg = f"Looking for config file in {config_files_directory}"
+                log.debug(msg)
+                log.error(error)
+                raise SurveyConfigError(msg) from error
+
+        with Path(json_file_path).open("r") as _file:
+            self_json = json.load(_file)
+            name = self_json["name"]
+
+        self = cls(config_files_directory=config_files_directory, name=name)
+        self.config = config
+        with Path(json_file_path).open("r") as _file:
+            self_json = json.load(_file)
+            self.json_file_path = json_file_path
+            self.label = self_json.get("label")
+            self.name = self_json.get("name")
+
+        surveys = self_json["surveys"]
+        for survey_name, survey_json in surveys.items():
+            survey = Survey(name=survey_name)
+            self.surveys.append(survey.create_from_json(survey_json))
+        return self
+
+    def to_json(self):
+        self_json = collections.OrderedDict(())
+        self_json["name"] = self.name
+        self_json["surveys"] = collections.OrderedDict(())
+        for survey in self.surveys:
+            self_json["surveys"][survey.name] = survey.to_json()
+        return self_json
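
`SurveyCollection.__init__` resolves the JSON path in two steps: an explicit entry for the collection name in the `[collections]` section, then a file named after the collection inside `collections_directory`. The sketch below isolates that lookup order with the standard `configparser` module. It is an illustration under assumptions, not the class's code: `resolve_collection_json` is a hypothetical helper, the paths are made up, and it uses `fallback=None` because a plain `config.get` raises `NoOptionError` for a missing option instead of returning `None`.

```python
import configparser
from pathlib import Path


def resolve_collection_json(config, name):
    """Return the JSON path for collection `name`, or None if it cannot be resolved."""
    # 1. An explicit entry in [collections] wins.
    explicit = config.get("collections", name, fallback=None)
    if explicit is not None:
        return explicit
    # 2. Otherwise fall back to <collections_directory>/<name>.json.
    directory = config.get("collections", "collections_directory", fallback=None)
    if directory is not None:
        return str(Path(directory) / f"{name}.json")
    return None


# Illustrative config.ini content; the paths are made up.
config = configparser.ConfigParser()
config.read_string(
    """
[collections]
collections_directory = /data/collections
erfs = /data/custom/erfs.json
"""
)

print(resolve_collection_json(config, "erfs"))
print(resolve_collection_json(config, "budget"))
```

Returning `None` when neither source is available leaves it to the caller to decide whether a missing path is an error, which is roughly what the assertion in `dump` does later.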
