Smartly Apply Constraints During Cartesian Product #773
Pull request overview
This PR optimizes discrete search space construction by applying discrete constraints incrementally during Cartesian product generation (including improved Polars/Pandas interop), aiming to reduce intermediate memory use and runtime for highly constrained spaces.
Changes:
- Added `baybe.searchspace.utils` with shared Cartesian product helpers and a new incremental constrained-product builder.
- Extended discrete constraint interfaces to support (or explicitly refuse) early filtering via `UnsupportedEarlyFilteringError`, plus a `has_polars_implementation` capability flag.
- Updated discrete search space constructors and tests to use the new incremental filtering path (and added parity tests vs the naive approach).
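The "support or explicitly refuse" contract can be illustrated with a small sketch. This is not BayBE's code: the exception below is only a stand-in sharing the name used in this PR, the two constraint classes and `filter_if_possible` are hypothetical, and plain lists of dicts stand in for dataframes.

```python
class UnsupportedEarlyFilteringError(Exception):
    """Stand-in for the exception named in this PR (not BayBE's actual class)."""


class NoDuplicatesConstraint:
    """Hypothetical constraint: the listed parameters must take distinct values."""

    def __init__(self, parameters):
        self.parameters = list(parameters)

    def get_invalid(self, rows):
        # Early filtering is safe here: a duplicate among the parameters that are
        # already present cannot be repaired by parameters joined later.
        invalid = []
        for i, row in enumerate(rows):
            seen = [row[p] for p in self.parameters if p in row]
            if len(set(seen)) < len(seen):
                invalid.append(i)
        return invalid


class BlackBoxConstraint:
    """Hypothetical constraint wrapping an arbitrary validator over full rows."""

    def __init__(self, parameters, validator):
        self.parameters = list(parameters)
        self.validator = validator

    def get_invalid(self, rows):
        # An opaque validator cannot be evaluated on partial rows, so refuse.
        if any(p not in row for row in rows for p in self.parameters):
            raise UnsupportedEarlyFilteringError(self.parameters)
        return [i for i, row in enumerate(rows) if not self.validator(row)]


def filter_if_possible(constraint, rows):
    """Drop invalid rows, or return the rows untouched if filtering must wait."""
    try:
        invalid = set(constraint.get_invalid(rows))
    except UnsupportedEarlyFilteringError:
        return rows, False  # retry once more parameters have been joined
    return [row for i, row in enumerate(rows) if i not in invalid], True
```

In this sketch, a partially applied constraint such as `NoDuplicatesConstraint` still has to be applied again once its remaining parameters arrive, whereas a refusing constraint is simply deferred until all of its parameters are present.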
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| baybe/searchspace/utils.py | New utilities: parameter ordering, pandas/polars cartesian product, and incremental constrained cartesian product builder. |
| baybe/searchspace/discrete.py | Switches discrete space construction to incremental filtering; Polars path builds partial product and merges remainder via pandas. Adds new from_simplex validation. |
| baybe/constraints/base.py | Adds _required_filtering_parameters and has_polars_implementation; updates docs for partial-dataframe filtering semantics. |
| baybe/constraints/discrete.py | Updates discrete constraints to support early/partial filtering and to raise UnsupportedEarlyFilteringError when unsupported. |
| baybe/exceptions.py | Adds UnsupportedEarlyFilteringError. |
| tests/constraints/test_constrained_cartesian_product.py | New test ensuring naive vs incremental constrained product results match across several scenarios. |
| tests/constraints/test_constraints_polars.py | Updates imports for moved cartesian product helpers. |
| tests/test_searchspace.py | Updates imports for moved cartesian product helpers. |
| tests/hypothesis_strategies/alternative_creation/test_searchspace.py | Adjusts simplex-related tests to reflect new from_simplex constraints. |
| CHANGELOG.md | Documents incremental filtering and new constraint capability/exception additions. |
AdrianSosic left a comment
Hi @Scienfitz, I'll need some more time for the review but wanted to already share some comments so that you can start to think about it / we can discuss. More will follow 🙃
AVHopp left a comment
Incomplete review. Only had a look at the changes made to the constraints so far. Tried to comprehend the constraints, and the logic for them seems to check out. Will give a more in-depth review after some of the general issues pointed out by the others have been addressed.
AVHopp left a comment
Close to approve, only need a bit more time to check the last remaining parts
AVHopp left a comment
Very cool feature, I do not see any major issue, anything open is already addressed.
Co-authored-by: AdrianSosic <adrian.sosic@merckgroup.com>
Separates concerns using a dedicated subclass-specific Boolean check
Posting for later reference. Mini Benchmark:
This PR optimizes Cartesian product creation in the presence of constraints, which can result in memory and time gains of many orders of magnitude (see mini benchmark below).
Rationale
- `from_simplex` constructor was used
- As soon as possible filter: A constraint can be applied as soon as all of its affected parameters are in the current cross-join dataframe. After this application, the constraint is fully ensured and does not have to be applied again. If the order in which the cross join goes over the parameters is optimized, this alone already leads to an improvement because subsequent operations "see" much smaller left dataframes.
- Partial/early filter:
- Look ahead: Some constraints can look ahead based on the possible parameter values that might be incoming and recognize that they cannot be fulfilled even in future cross-join iterations.
- `from_simplex` implements for the very special case of 1 global sum constraint and 1 cardinality constraint. If we ever implement look-ahead filters for all constraints, the `from_simplex` constructor might become obsolete.
- `IMPROVE` notes to remember about tier 3.
- To achieve this, `Constraint.get_invalid` was extended to handle situations where not all parameters are in the dataframe to be filtered. The constraint can then decide whether it can apply early filtering or raises the new `UnsupportedEarlyFilteringError` if it needs all parameters present.
- `parameter_cartesian_prod_pandas_constrained`, which itself performs the process described above after deciding on a smart parameter order for the cross join
Good To Know
- `has_polars_implementation`, discussion here
- `_filtering_parameters`, discussion here
- `DiscreteNoLabelDuplicatesConstraint` in `DiscretePermutationInvarianceConstraint.get_invalid` explained here
Mini Benchmark:
- `from_product`, 7×8 cat, NoLabelDuplicates (2M→40K rows)
- `from_simplex`, 4-slot mixture + 3 extras (~4.5M→2.4K rows)
- `from_simplex`, 6-slot mixture + 3 extras (~12B→22K rows)
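The "as soon as possible" filter from the Rationale can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation behind `parameter_cartesian_prod_pandas_constrained`: the function `constrained_product`, its argument format, and the toy constraints are hypothetical, lists of dicts stand in for dataframes, and parameters are joined in dictionary order rather than in an optimized order.

```python
def constrained_product(values, constraints):
    """Cross join parameters one at a time, filtering as early as possible.

    values: dict mapping parameter name -> list of allowed values
    constraints: list of (required_names, is_valid) pairs, where is_valid
        takes a row (dict) and returns True if the row satisfies the constraint
    Returns the surviving rows and the largest intermediate row count.
    """
    rows = [{}]  # a single empty row is the neutral element of the cross join
    present = set()
    pending = list(constraints)
    peak = 0
    for name, candidates in values.items():
        # Cross join the already filtered rows with the next parameter
        rows = [{**row, name: value} for row in rows for value in candidates]
        present.add(name)
        peak = max(peak, len(rows))
        # Apply each constraint exactly once, when all its parameters are present
        ready = [c for c in pending if set(c[0]) <= present]
        pending = [c for c in pending if not set(c[0]) <= present]
        for _, is_valid in ready:
            rows = [row for row in rows if is_valid(row)]
    return rows, peak


values = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [0, 1]}
constraints = [
    (("a", "b"), lambda row: row["a"] != row["b"]),
    (("a", "c"), lambda row: row["a"] + row["c"] <= 3),
]
rows, peak = constrained_product(values, constraints)
print(len(rows), peak)
```

In this toy case the unconstrained product has 18 rows, while the incremental build never holds more than `peak` rows at once. The gap grows with the number of parameters joined after a constraint has already fired.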