
Commit 3031bd2

Release 0.9.4: Python 3.12/3.13 support, standard logging, initial population seeding, and expanded model coverage (#41)
* Next version with revision will be 0.9.4
* Modified: improved example
* Modified: default search spaces
* Added initial and default individual options
* Fixed mlflow test
* Improved plots to reduce size
* Updated Sphinx gallery examples
* Improved plots with responsiveness
* Added default parameters, pytest tests, and Sphinx gallery examples for CatBoost, LightGBM, and other scikit-learn ML models
* Added attributes `n_trials_` and `optimization_time_`, and a parameter `disable_file_output` to reduce overhead when fast execution is needed
* Added `disable_file_output` to reduce overhead in experiments
* Added `optimization_time_` and `n_trials_` to the Sphinx gallery examples
* Fixed parallelization by using joblib instead of multiprocessing
* Added: tests of ML models
* Updated and fixed: mlflow integration improved and documented, Phase 1 of params work
* Updated documentation conf.py to remove mlflow problems and fix scipy
* Modified: version read from function
* Added: default configuration files for scikit-learn ML models
* Updated README with the new features and a new example
* Fixed test
* Added tests for initial params in the genetic population and for disabling file output
* Removed to-do
* Improved .gitignore
* The library now follows the standard Python library logging pattern
* Added verbose option
* Added Python 3.12 support
* Added Python 3.13 support
1 parent 4530666 commit 3031bd2

79 files changed

Lines changed: 5645 additions & 460 deletions
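The "initial and default individual options" change (population seeding with known good configurations) can be illustrated with a small stand-alone sketch. This is plain Python, not the library's actual API; the helper names `seed_population` and `sample_config` are hypothetical:

```python
import random

def seed_population(known_good, pop_size, sample_fn, rng=random):
    """Build an initial population that mixes known good configurations
    with randomly sampled ones (hypothetical helper, for illustration)."""
    population = [dict(cfg) for cfg in known_good[:pop_size]]
    # Fill the remainder with random samples from the search space
    while len(population) < pop_size:
        population.append(sample_fn(rng))
    return population

def sample_config(rng):
    # Randomly sample a decision-tree-like search space
    return {"max_depth": rng.randint(2, 20),
            "min_samples_split": rng.randint(2, 10)}

pop = seed_population([{"max_depth": 5, "min_samples_split": 2}],
                      pop_size=10, sample_fn=sample_config)
print(len(pop), pop[0])
```

Seeding this way lets the genetic algorithm start from configurations already known to perform well, while random individuals preserve diversity.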


.github/workflows/CI.yml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: [3.9, '3.10', 3.11]
+        python-version: [3.9, '3.10', 3.11, 3.12, 3.13]
         os: [ubuntu-latest]

.gitignore

Lines changed: 66 additions & 1 deletion
@@ -348,4 +348,69 @@ dmypy.json
 *tmpfiles
 *lol

-# End of https://www.toptal.com/developers/gitignore/api/pycharm,python,test,macos
+# End of https://www.toptal.com/developers/gitignore/api/pycharm,python,test,macos
+
+### mloptimizer specific ###
+# Experiment output directories (timestamped)
+[0-9]*_*Classifier/
+[0-9]*_*Regressor/
+
+# MLflow tracking databases
+mlflow.db
+mlflow_test.db
+mlruns/
+mloptimizer/test/mlruns/
+docs/mlruns/
+examples/mlruns/
+
+# Pickle files (saved datasets)
+*.pkl
+saved_datasets.pkl
+
+# Virtual environments
+.cvenv/
+
+# Notebooks and experiments
+notebooks/
+experiment/
+
+# Analysis and temporary markdown files
+*_SUMMARY.md
+*_GUIDE.md
+*_ANALYSIS.md
+*_FIX.md
+*_PLAN.md
+PRELIMINARY_*.md
+PAPER_REVISION_ADVICE.md
+
+# Generated documentation
+docs/auto_examples/
+docs/html/
+docs/sg_execution_times.rst
+
+# Example outputs
+examples/Evolution_example/
+examples/Optimizer/
+examples/Search_space_example/
+examples/catboost_info/
+
+# Test outputs
+mloptimizer/test/application/reporting/*.html
+mloptimizer/test/application/reporting/*.png
+mloptimizer/test/catboost_info/
+mloptimizer/test/test_genoptimizer/
+
+# Scripts and temporary files
+scripts/
+notexamples/
+review.txt
+softwx-draft.tex
+test_*.py
+verify_*.py
+demo_*.py
+dockerfile
+
+# Benchmark results
+openml_*.png
+openml_*.csv
+openml_*.tex

README.md

Lines changed: 30 additions & 10 deletions
@@ -21,12 +21,19 @@ The genetic algorithm used in mloptimizer provides an efficient and flexible app
 - Default hyperparameter ranges
 - Default score functions for evaluating the performance of the model
 - Reproducibility of results
+- Early stopping to prevent overfitting
+- Population seeding with known good configurations
+- Performance tracking (trials count, optimization time)
+- Zero file output mode for cleaner workflows (enabled by default)

 ## Advanced Features
 - Extensible with more machine learning algorithms that comply with the Scikit-Learn API
 - Customizable hyperparameter ranges
 - Customizable score functions
 - Optional mlflow compatibility for tracking the optimization process
+- Generation-level MLflow tracking with nested runs
+- Responsive Plotly visualizations with WebGL acceleration
+- Joblib-based parallelization for better compatibility

 ## Installation

@@ -67,17 +74,24 @@ X, y = load_iris(return_X_y=True)
 hyperparameter_space = HyperparameterSpaceBuilder.get_default_space(DecisionTreeClassifier)

 # 3) Create the optimizer and optimize the classifier
-# - 10 generations starting with a population of 10 individuals, other parameters are set to default
-
-opt = GeneticSearch(estimator_class=DecisionTreeClassifier,
-                    hyperparam_space=hyperparameter_space,
-                    **{"generations": 5, "population_size": 5}
-                    )
-
-# 4) Optimize the classifier, the optimization returns the best estimator found in the optimization process
+opt = GeneticSearch(
+    estimator_class=DecisionTreeClassifier,
+    hyperparam_space=hyperparameter_space,
+    generations=10,
+    population_size=20,
+    early_stopping=True,  # Stop early if no improvement
+    patience=3,           # Wait 3 generations
+    cv=5,                 # 5-fold cross-validation
+    seed=42               # Reproducibility
+)
+
+# 4) Optimize (no files created by default)
 opt.fit(X, y)

-print(opt.best_estimator_)
+# Access results
+print(f"Best score: {opt.best_estimator_.score(X, y)}")
+print(f"Trials evaluated: {opt.n_trials_}")
+print(f"Time taken: {opt.optimization_time_:.2f}s")
 ```
 Other algorithms can be used, such as `RandomForestClassifier` or `XGBClassifier` which have a
 default hyperparameter space defined in the library.

@@ -96,8 +110,14 @@ Examples can be found in [examples](https://mloptimizer.readthedocs.io/en/master
 The following dependencies are used in `mloptimizer`:

 * [Deap](https://github.com/DEAP/deap) - Genetic Algorithms
-* [XGBoost](https://github.com/dmlc/xgboost) - Gradient boosting classifier
+* [XGBoost](https://github.com/dmlc/xgboost) - Gradient boosting framework
+* [CatBoost](https://catboost.ai/) - Gradient boosting framework
+* [LightGBM](https://lightgbm.readthedocs.io/) - Gradient boosting framework
 * [Scikit-Learn](https://github.com/scikit-learn/scikit-learn) - Machine learning algorithms and utilities
+* [Plotly](https://plotly.com/python/) - Interactive visualizations
+* [Seaborn](https://seaborn.pydata.org/) - Statistical visualizations
+* [joblib](https://joblib.readthedocs.io/) - Parallel processing
+* [tqdm](https://github.com/tqdm/tqdm) - Progress bars

 Optional:
 * [Keras](https://keras.io) - Deep learning library
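The `early_stopping`/`patience` parameters shown in the README example above can be sketched generically. This is a minimal, stand-alone illustration of patience-based early stopping, not mloptimizer's internal implementation:

```python
def run_with_early_stopping(fitness_per_generation, patience=3):
    """Stop once `patience` consecutive generations bring no improvement.
    Returns (best_fitness, generations_actually_run)."""
    best = float("-inf")
    stale = 0
    for gen, fitness in enumerate(fitness_per_generation, start=1):
        if fitness > best:
            best = fitness
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            break  # patience exhausted: stop early
    return best, gen

# Fitness plateaus after generation 3, so with patience=3
# the run stops at generation 6 instead of running all 7
best, gens = run_with_early_stopping(
    [0.80, 0.85, 0.90, 0.90, 0.89, 0.90, 0.91], patience=3)
print(best, gens)  # 0.9 6
```

The trade-off: a larger `patience` tolerates longer plateaus before stopping, at the cost of extra fitness evaluations.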

docs/_static/custom.js

Lines changed: 11 additions & 0 deletions
@@ -10,4 +10,15 @@ document.addEventListener("DOMContentLoaded", function() {
         sidebar.style.display = 'none';
     }
 }
+
+// Force plotly figures to recalculate size on page load
+// This fixes the initial rendering issue with responsive plots
+setTimeout(function() {
+    if (typeof Plotly !== 'undefined') {
+        const plotlyDivs = document.querySelectorAll('.plotly-graph-div');
+        plotlyDivs.forEach(function(div) {
+            Plotly.Plots.resize(div);
+        });
+    }
+}, 100); // Small delay to ensure DOM is fully rendered
 });

docs/conf.py

Lines changed: 3 additions & 2 deletions
@@ -52,13 +52,14 @@
     'python': ('https://docs.python.org/3', None),
     'numpy': ('https://numpy.org/doc/stable/', None),
     'scikit-learn': ('https://scikit-learn.org/stable/', None),
-    'mlflow': ('https://www.mlflow.org/docs/latest/', None),
+    # MLflow inventory temporarily disabled due to 403 errors
+    # 'mlflow': ('https://www.mlflow.org/docs/latest/', None),
     'xgboost': ('https://xgboost.readthedocs.io/en/latest/', None),
     'lightgbm': ('https://lightgbm.readthedocs.io/en/latest/', None),
     'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
     'matplotlib': ('https://matplotlib.org/stable/', None),
     'seaborn': ('https://seaborn.pydata.org/', None),
-    'scipy': ('https://docs.scipy.org/doc/scipy/reference/', None),
+    'scipy': ('https://docs.scipy.org/doc/scipy/', None),  # Updated: removed '/reference/' from URL
     'deap': ('https://deap.readthedocs.io/en/master/', None),
 }

docs/sections/Advanced/index.rst

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,7 @@ The advanced customization options in `mloptimizer` enable fine-tuning of the op
    score_functions
    reproducibility
    parallel
+   logging


 Overview of Customization Options
@@ -20,4 +21,6 @@ Overview of Customization Options

 - **Parallel Processing**: Accelerate optimization by distributing computations across multiple cores. Parallel processing can significantly reduce runtime, especially for complex models or extensive hyperparameter spaces.

+- **Logging Configuration**: Configure logging output to monitor optimization progress, save logs to files, or integrate with your existing logging setup. mloptimizer follows the standard Python library logging pattern for maximum flexibility.
+
 Each section provides detailed guidance on implementing these advanced options.
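The parallel processing described above is joblib-based in mloptimizer itself (per this release's changelog). As a library-agnostic sketch of the same idea, the stdlib `concurrent.futures` can evaluate every individual of a generation concurrently; `evaluate` here is a hypothetical stand-in fitness function, not the library's API:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(config):
    # Stand-in fitness function: deeper trees score higher up to a cap
    return min(config["max_depth"], 10) / 10

population = [{"max_depth": d} for d in (2, 5, 8, 12)]

# Evaluate every individual of a generation in parallel;
# map() preserves the population order in the returned scores
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate, population))

print(scores)  # [0.2, 0.5, 0.8, 1.0]
```

Because each individual's fitness is independent of the others, a generation's evaluations parallelize cleanly; with real model training (a CPU-bound workload) a process-based backend such as joblib's `loky` is the better fit.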

docs/sections/Introduction/overview.rst

Lines changed: 18 additions & 0 deletions
@@ -131,6 +131,24 @@ Setting the same `seed` value across multiple runs will produce identical result

 On macOS with newer processor architectures (e.g., M1 or M2 chips), users may experience occasional reproducibility issues due to hardware-related differences in random number generation and floating-point calculations. To ensure consistency across runs, we recommend running `mloptimizer` within a Docker container configured for reproducible behavior. This approach helps isolate the environment and improves reproducibility on macOS hardware.

+Logging and Verbosity
+---------------------
+
+By default, `mloptimizer` runs silently without logging output. To enable logging, use the ``verbose`` parameter:
+
+.. code-block:: python
+
+    # Silent (default)
+    opt = GeneticSearch(..., verbose=0)
+
+    # Info level - shows optimization lifecycle
+    opt = GeneticSearch(..., verbose=1)
+
+    # Debug level - shows detailed info
+    opt = GeneticSearch(..., verbose=2)
+
+For more advanced logging configuration, see the :doc:`../Advanced/logging` section.
+
 MLflow Integration Example
 ------------------------------
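The "standard Python library logging pattern" this release adopts means the package writes to a named logger and leaves handler configuration to the application. A minimal sketch of that pattern (the logger name ``"mloptimizer"`` is the conventional choice but is an assumption here, as is the mapping of ``verbose`` levels to logging levels):

```python
import logging

# Library side: log to a named logger, attach only a NullHandler
# so the library stays silent unless the application opts in
logger = logging.getLogger("mloptimizer")
logger.addHandler(logging.NullHandler())

# Application side: configure output and pick a level
# (roughly verbose=1 -> INFO, verbose=2 -> DEBUG)
logging.basicConfig(level=logging.INFO)
logging.getLogger("mloptimizer").setLevel(logging.DEBUG)

logger.debug("generation 1: best fitness 0.92")  # now visible
print(logger.isEnabledFor(logging.DEBUG))  # True
```

This separation is why the pattern is "standard": the library never decides where log records go, so its output composes cleanly with whatever handlers the host application has configured.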

docs/sections/MLflow/index.rst

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+MLflow Integration
+==================
+
+MLflow is an open-source platform for managing the machine learning lifecycle, including experiment tracking, model versioning, and deployment. The `mloptimizer` library integrates seamlessly with MLflow to provide comprehensive tracking of genetic algorithm optimization runs, enabling you to monitor evolution progress, compare hyperparameter configurations, and analyze results.
+
+.. toctree::
+   :hidden:
+
+   mlflow_basics
+   mlflow_viewing
+   mlflow_remote
+
+Overview of MLflow Features
+---------------------------
+
+- **Experiment Tracking**: Automatically log all optimization runs with their configurations, metrics, and results. Track generation-level metrics to visualize how fitness evolves across generations.
+
+- **Result Visualization**: Use the MLflow UI to interactively explore runs, compare different optimization strategies, and analyze hyperparameter impact on model performance.
+
+- **Remote Tracking**: Configure MLflow to use remote tracking servers for team collaboration and centralized experiment management. Share optimization results across your organization.
+
+Each section provides detailed guidance on using MLflow with mloptimizer.
+
+Key Benefits
+------------
+
+**Generation-Level Tracking**
+    Every generation's best, average, and worst fitness scores are logged, allowing you to visualize the evolution of your population over time.
+
+**Comprehensive Metadata**
+    Dataset characteristics, optimization configuration, early stopping information, and timing metrics are automatically recorded.
+
+**Flexible Storage**
+    Use local file-based storage for quick experiments or configure remote MLflow servers with database backends for production deployments.
+
+**Seamless Integration**
+    Simply add ``use_mlflow=True`` to your ``GeneticSearch`` configuration - no additional code required.
