feat: store backends (HDF5/Parquet/Zarr), store_format in manifest (v1.1.0)#388
Open
…gration doc (v6.4.0)
- io/hdf: extract write_table_to_hdf5 to io.hdf, writers re-exports
- processing/harmonization: harmonize_data_frame_columns, used in Survey.get_values
- core: add core.table, core.survey, core.dataset; tables/surveys/survey_collections re-export
- logging: add to configuration.models, google_colab, statshelpers; fix typo in core.table
- docs: MIGRATION_IMPORTS.md (import mapping when re-exports removed), REFACTORING_PLAN §3.4
- CHANGELOG + bump to 6.4.0
Made-with: Cursor

- core: type hints Table, Survey, SurveyCollection (TYPE_CHECKING for circular refs)
- io: type hints readers (read_dbf cols), writers/hdf already typed
- processing: type hints cleaning, harmonization, calmar, Calibration
- REFACTORING_PLAN §3.3 updated; version 6.5.0
Made-with: Cursor

- Implement _nnd_hotdeck_python and _create_fused_python (Manhattan/Euclidean, donor_classes, random tie-breaking)
- Unified nnd_hotdeck() API, Python by default, use_r=True for StatMatch
- nnd_hotdeck_using_rpy2 kept as an alias for compatibility
- Lint fixes (variable names, lambda->def, duplicate removed, print->log)
Made-with: Cursor

- Remove the function from common.misc
- Remove the exports from common and utils
- Update MIGRATION_IMPORTS and REFACTORING_PLAN
Made-with: Cursor

- Add openfisca_survey_manager.policy (simulations, simulation_builder, aggregates)
- Keep root modules as DeprecationWarning placeholders re-exporting from policy
- Move policy-related tests to policy/tests and update imports
- Add policy.legislation_asof and deprecate common.misc / utils helpers
Made-with: Cursor

…atshelpers/variables to policy
- input_dataframe_generator: moved to tests/ (used only by tests)
- coicop, matching, statshelpers, variables: moved to policy/
- update all imports accordingly
- coicop: paths -> configuration.paths
Made-with: Cursor

…precationWarnings
- calmar, calibration: processing/weights -> policy/
- processing/weights: re-export from policy + DeprecationWarning
- processing/__init__: lazy import for weights (avoid circular import)
- placeholders at root: coicop, input_dataframe_generator, matching, statshelpers, variables (with DeprecationWarnings)
- calmar, calibration placeholders: add DeprecationWarnings
Made-with: Cursor

…Warning
- scenarios/ -> policy/scenarios/ (abstract_scenario, reform_scenario)
- Placeholders at openfisca_survey_manager/scenarios/ with DeprecationWarning
- Remove common, processing/weights, root placeholders (coicop, matching, etc.)
- Update all imports to policy.scenarios
- Add missing Survey import in abstract_scenario
Made-with: Cursor

- policy: add py.typed marker; type legislation_asof, variables, coicop, matching, statshelpers, calmar, calibration, simulation_builder, aggregates, simulations, scenarios (abstract + reform)
- configuration/models: type Config.__init__ and save
- processing/__init__: type __getattr__ return
Made-with: Cursor

- Remove the compatibility modules (config, paths, tables, surveys, survey_collections, read_sas, read_spss, read_dbf, calibration, calmar, utils)
- load_table moved to core.dataset, imports migrated per MIGRATION_IMPORTS.md
- Version 1.0.0, CHANGELOG and docs (REFACTORING_PLAN, RFC-001)
Made-with: Cursor

- Delete root simulations/simulation_builder/aggregates/utils modules in favor of policy.*
- Keep common.* as thin aliases to policy.legislation_asof (no DeprecationWarning)
- Update scenarios and tests to import from policy and core.dataset/core.survey
- Wire policy.simulations to SurveyCollection/load_table from core.dataset; tests still all pass
Made-with: Cursor

- Delete scenarios/abstract_scenario.py, scenarios/reform_scenario.py
- Remove DeprecationWarning in core/table.py (HDF5), keep log.warning
- build_collection: keep log.warning only (no DeprecationWarning)
Made-with: Cursor

- policy/simulations: log.warn -> log.warning (3), groupby(..., observed=False)
- io/hdf: hdf5_safe_key() for PyTables NaturalNameWarning; to_hdf key= keyword
- core/survey: use hdf5_safe_key when reading HDF5, backward compat fallback
- tests: fix PytestReturnNotNoneWarning (assert instead of return)
Made-with: Cursor

Made-with: Cursor

- Add config_loader (get_config_dir, load_config, load_manifest, manifest_survey_to_json)
- SurveyCollection.load() tries config.yaml + manifest first, else legacy config.ini + JSON
- Add migrate_config_to_rfc002 script (config.ini/raw_data.ini/JSON -> config.yaml + manifests)
- Emit DeprecationWarning when loading via legacy config.ini + JSON
- Add tests for RFC-002 and migration; legacy load test expects deprecation warning
- Add docs/RFC-002-METADATA-AND-CONFIG.md
Made-with: Cursor

…igration (v1.1.0)
- io/backends: backend registry (hdf5, parquet, zarr), get_backend, register_backend
- Survey: zarr_file_path, fill_store/get_values for zarr; build-collection --zarr
- Table: delegate write/read to backends via _get_store_path_and_format
- Manifest: store_format (hdf5|parquet|zarr) at dataset level; load applies it and sets store paths
- Migration script: infer store_format from legacy JSON and write in manifest
- Docs: ZARR-BACKEND.md, RFC-002 store_format example and migration note
- Changelog 1.1.0, pyproject 1.1.0
Made-with: Cursor
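
For readers unfamiliar with the pattern, the backend registry named in the last commit could look roughly like the sketch below. Only the names StoreBackend, register_backend, get_backend and get_available_backend_names come from this PR; the method signatures, the module-level dictionary and the InMemoryBackend class are assumptions made for illustration and are not the code in io/backends.

```python
from typing import Dict, List, Protocol, Tuple


class StoreBackend(Protocol):
    """Interface assumed for a table store backend (signatures are hypothetical)."""

    def write(self, store_path: str, table_name: str, rows: List[dict]) -> None: ...

    def read(self, store_path: str, table_name: str) -> List[dict]: ...


# Registry mapping a format name (e.g. "hdf5", "parquet", "zarr") to a backend.
_BACKENDS: Dict[str, StoreBackend] = {}


def register_backend(name: str, backend: StoreBackend) -> None:
    """Register a backend under a case-insensitive name."""
    _BACKENDS[name.lower()] = backend


def get_available_backend_names() -> List[str]:
    """Return the registered names in sorted order."""
    return sorted(_BACKENDS)


def get_backend(name: str) -> StoreBackend:
    """Look up a backend by name, failing loudly if it was never registered."""
    try:
        return _BACKENDS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown store backend {name!r}; available: {get_available_backend_names()}"
        ) from None


class InMemoryBackend:
    """Toy backend used only to exercise the registry; not part of the PR."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], List[dict]] = {}

    def write(self, store_path: str, table_name: str, rows: List[dict]) -> None:
        self._tables[(store_path, table_name)] = list(rows)

    def read(self, store_path: str, table_name: str) -> List[dict]:
        return list(self._tables[(store_path, table_name)])


register_backend("memory", InMemoryBackend())
backend = get_backend("memory")
backend.write("survey.store", "household", [{"id": 1, "weight": 2.5}])
print(backend.read("survey.store", "household"))
```

Keeping the lookup behind get_backend means a caller such as Table only needs a format name, and an optional dependency like zarr can be registered only when it is installed.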
06fd784 to 9d494cd
Summary (v1.1.0)

Store backends (choice of storage format for tables)
- … StoreBackend); get_backend(name), get_available_backend_names(), register_backend() for extension.
- … pip install openfisca-survey-manager[zarr]); one table = one zarr group in a .zarr directory per survey.
- zarr_file_path; fill_store(store_format="zarr") and reading via get_values for zarr.
- … _is_stored delegated to the backends; _get_store_path_and_format() unifies the paths.
- --zarr in addition to --parquet; defaults to HDF5 with a warning.
- docs/ZARR-BACKEND.md (Zarr usage, compression, parallelization).

Manifest (RFC-002): store_format
- store_format (hdf5, parquet, zarr) at the dataset level; defaults to parquet on load.
- … store_format and derives the store paths from default_output_dir.
- … store_format from the legacy JSON and writes it to the generated manifest.
- … store_format; sections 3.5 and 4.2 updated.

Already in place (previous commits)

Version: 1.1.0 (CHANGELOG + pyproject.toml)
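
To make the manifest behaviour above concrete, here is a minimal sketch of how a loader might apply store_format and derive a store path from default_output_dir. The function name resolve_store, the dictionary shape of the dataset entry and the file suffixes are all assumptions for illustration; only the store_format values, the parquet default on load and the use of default_output_dir come from the description.

```python
from pathlib import Path
from typing import Tuple

SUPPORTED_FORMATS = ("hdf5", "parquet", "zarr")
DEFAULT_STORE_FORMAT = "parquet"  # default applied on load when the manifest omits store_format

# Hypothetical path layout: one store per survey under default_output_dir.
_SUFFIXES = {"hdf5": ".h5", "parquet": "", "zarr": ".zarr"}


def resolve_store(dataset: dict, default_output_dir: str, survey_name: str) -> Tuple[str, Path]:
    """Return (store_format, store_path) for one survey of a manifest dataset.

    `dataset` stands for the dataset-level section of a manifest, already
    parsed into a dict; its exact schema here is an assumption.
    """
    store_format = dataset.get("store_format", DEFAULT_STORE_FORMAT)
    if store_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"store_format must be one of {SUPPORTED_FORMATS}, got {store_format!r}"
        )
    store_path = Path(default_output_dir) / f"{survey_name}{_SUFFIXES[store_format]}"
    return store_format, store_path


print(resolve_store({"store_format": "zarr"}, "/data/out", "survey_2019"))
print(resolve_store({}, "/data/out", "survey_2019"))
```

Validating the value at load time surfaces a typo in the manifest immediately, rather than later when a table is first read.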