Conversation
Copilot
left a comment
Pull request overview
Adds a new batch-level discrete constraint (`DiscreteBatchConstraint`) that enforces a fixed discrete parameter value across all points in a recommended batch by optimizing over discrete (and hybrid) subspaces, aligning with the existing subspace-based infrastructure used for continuous cardinality constraints.
Changes:
- Introduces `DiscreteBatchConstraint` (non-filtering, batch-level) plus validation preventing duplicate batch constraints per parameter.
- Extends discrete/hybrid subspace infrastructure (`n_theoretical_subspaces`, mask/config iterators, sampling helpers) and wires support into `BotorchRecommender` and `RandomRecommender`, with compatibility gating for other recommenders.
- Adds documentation and test coverage for discrete + hybrid subspace-generating constraints, replacing prior hybrid-only cardinality tests.
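For orientation, a minimal usage sketch; the parameter name "Temperature" is assumed, and only the construction pattern (one parameter per constraint, public import per the `__init__.py` change below) is taken from the changes listed above:

```python
# Hypothetical sketch: pin one discrete parameter to a single value across
# all points of a recommended batch. "Temperature" is an assumed name.
from baybe.constraints import DiscreteBatchConstraint

# The constraint currently accepts exactly one parameter; use several
# constraints to pin several parameters (see the review thread below).
constraint = DiscreteBatchConstraint(parameters=["Temperature"])
```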
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `baybe/constraints/discrete.py` | Adds `DiscreteBatchConstraint` and its subspace mask generation. |
| `baybe/constraints/validation.py` | Adds validation to prevent multiple batch constraints on the same parameter. |
| `baybe/constraints/__init__.py` | Exposes `DiscreteBatchConstraint` publicly. |
| `baybe/searchspace/discrete.py` | Adds discrete subspace-generating constraint plumbing and mask enumeration/sampling helpers. |
| `baybe/searchspace/continuous.py` | Renames/aligns to `n_theoretical_subspaces` and adds shuffled / with-replacement subspace configuration iteration. |
| `baybe/searchspace/core.py` | Adds combined discrete+continuous theoretical subspace counting and combined sampling utilities for hybrid spaces. |
| `baybe/recommenders/pure/base.py` | Adds `supports_discrete_subspace_constraints` gate and raises `IncompatibilityError` when unsupported. |
| `baybe/recommenders/pure/bayesian/botorch.py` | Implements discrete + hybrid optimization over subspaces when batch constraints are present. |
| `baybe/recommenders/pure/nonpredictive/sampling.py` | Updates `RandomRecommender` to respect discrete subspace-generating constraints. |
| `docs/userguide/constraints.md` | Documents `DiscreteBatchConstraint` and recommender compatibility. |
| `tests/constraints/test_batch_constraint.py` | Adds focused tests for the new constraint (behavior, validation, incompatibility, subspace counting/masks). |
| `tests/constraints/test_subspace_constraints_hybrid.py` | Adds parametrized hybrid tests covering discrete batch + (discrete/continuous) cardinality constraints. |
| `tests/constraints/test_cardinality_constraint_hybrid.py` | Removes older hybrid cardinality-only test file (superseded). |
| `CHANGELOG.md` | Records new user-facing feature entry. |
AVHopp
left a comment
I do not have any major comments. I personally find it a bit weird that the concept of a "partition" is somehow identical in both discrete and continuous subspaces, but caused by completely different things. However, I do not see any more elegant solution to this, and this might also be personal preference.
Scienfitz:
@AdrianSosic appreciate your review
AdrianSosic
left a comment
Hey @Scienfitz, thanks for the new feature, very good to have indeed. I'm already submitting the review so that you have something to work with, but I still need to read some parts of the partitioning logic.
```python
def __attrs_post_init__(self):
    """Validate that exactly one parameter is specified."""
    if len(self.parameters) != 1:
        raise ValueError(
            f"'{self.__class__.__name__}' requires exactly one parameter, "
            f"but {len(self.parameters)} were provided: {self.parameters}."
        )
```
AdrianSosic:
Can we get rid of this restriction? My first naive guess would be that it should not be too hard?
Scienfitz:
This is unnecessary. It is also kind of already implemented, in the sense that you can just use several such constraints; imo that is much clearer than allowing several parameters for this constraint. Allowing >1 parameters here would open the possibility of confusion.
AdrianSosic:
Really? To me, it's rather the current design that is super unintuitive. We have an object that takes parameters (plural!) and then we are only allowed to pass a single one, which needs to be validated using additional logic. And then we allow the option to have several such constraints, which effectively does what would happen if we simply allowed more parameters in the first place.
Where do you see potential confusion? I can only see one way to interpret a call like `parameters=["A", "B"]`, namely exactly the way you've implemented it with several such constraints. The mean can only go columnwise – there is no way that A and B could possibly interfere, since they have their own separate value ranges.
Scienfitz:
So there is technical debt here causing this confusion: it is just `parameters` because all constraints currently share this attribute. This is to be altered/fixed in #517 and not in this PR. But it has now driven me into this weird design where the constraint has `parameters` but only accepts one of them.

I would much prefer A:
- you have exactly one batch constraint per parameter you want to constrain
- as many batch constraints (for different parameters) as you want

With the alternative being B:
- only one batch constraint allowed in the searchspace
- it takes multiple unique parameters

I prefer A because the filtering logic is entirely orthogonal, so expressing it as independent constraints makes sense to me. I already made the plan to change this for the dependencies constraint as well in #670; so far I have not received any differing opinion.

I could live with both variants, but it should be made consistent with the dependencies constraint, so let me know. But please do not make your call based on the `parameters` argument name.
AdrianSosic:
Or alternative C:
- allow an arbitrary number of parameters per constraint
- allow an arbitrary number of such constraints

I don't know why the above should not be possible, and it would be the most natural option for me since:
- Pretty much all other constraints behave that way (i.e. there is no restriction on the number of parameters for the sum constraint, and we can have several of those).
- It feels natural that one might want to use one such constraint per physical limitation in the system. For example, you have one piece of equipment where temperature/pressure must be synced across the batch, but for the other equipment you need to sync the pH value. --> Two constraints, one with temp/pressure, the other with pH.

While you could of course achieve the same with three or just one constraint, why take away the possibility to structure it for no reason?

But I think a third opinion is a good idea here: @AVHopp, @kalama-ai?
Scienfitz:
I think allowing that (option C) would be the worst of both worlds: degeneracy in configurations should be avoided, especially if avoiding it is easy (like in this case). And it would be inconsistent with the cardinality constraints, where only one set of parameters is specified instead of potentially multiple sets (the set of parameters over which a cardinality constraint holds corresponds to the single parameter over which a batch constraint holds).
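To make the discussed options concrete, a hypothetical sketch (parameter names assumed; option C is the proposed variant, not what the PR implements):

```python
from baybe.constraints import DiscreteBatchConstraint

# Option A (implemented): exactly one parameter per constraint,
# one constraint per parameter that must be synced across the batch.
constraints_a = [
    DiscreteBatchConstraint(parameters=["Temperature"]),
    DiscreteBatchConstraint(parameters=["Pressure"]),
]

# Option C (proposed above, NOT implemented): arbitrary parameters per
# constraint, grouped by physical limitation of the system.
constraints_c = [
    DiscreteBatchConstraint(parameters=["Temperature", "Pressure"]),
    DiscreteBatchConstraint(parameters=["pH"]),
]
```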
```diff
 @property
-def n_inactive_parameter_combinations(self) -> int:
-    """The number of possible inactive parameter combinations."""
+def n_theoretical_partitions(self) -> int:
```
AdrianSosic:
Let's keep the theoretical debate open until we've agreed on the resampling behavior for the discrete space.
```python
def recommend_discrete_with_partitions(
    recommender: BotorchRecommender,
    subspace_discrete: SubspaceDiscrete,
    candidates_exp: pd.DataFrame,
```
AdrianSosic:
I have a stupid question that might not be related to your PR but is still relevant: why the heck do we actually pass `candidates_exp` as a separate argument here? If this is really just the experimental representation of the discrete candidates, then we could simply get it from `subspace_discrete.exp_rep`!?
In the old (large) file, there sometimes was a docstring like "The experimental representation of all discrete candidate points to be considered.", which sort of indicates to me that this might be a reduced set for whatever reason. Since you just refactored this and perhaps have a better picture in mind: can you explain?
The answer to this also carries over to your new methods, like `partition_masks`, which also take an explicit `candidates_exp` that would be available directly from the space!
Scienfitz:
The candidates are a subset of the searchspace that is prefiltered, so it's not the same, or what am I missing? Creating the masks on the candidate set is not necessarily the same as creating them on the entire discrete searchspace.
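A toy illustration of that point (made-up data, not the actual BayBE filtering logic): a mask built on the prefiltered candidate set can differ from the same mask built on the full `exp_rep`.

```python
import pandas as pd

# Full experimental representation of the discrete subspace (toy data).
exp_rep = pd.DataFrame({"Temperature": [10, 20, 30, 40]})

# Candidates are a prefiltered subset, e.g. already-measured rows removed.
candidates_exp = exp_rep[~exp_rep["Temperature"].isin([20])]

# A partition mask for "Temperature == 20" selects one row on the full
# space but no rows on the candidate set, so that partition is infeasible.
print((exp_rep["Temperature"] == 20).sum())         # 1
print((candidates_exp["Temperature"] == 20).sum())  # 0
```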
```python
def inactive_parameter_combinations(  # noqa: DOC404
    self,
    *,
    shuffle: bool = False,
```
AdrianSosic:
I have a suggestion for the `shuffle` argument here and in the corresponding discrete case. What I dislike about the method signature is that we have two seemingly independent arguments which are in reality coupled (see your docstring for `shuffle`). This always opens the door for silent "errors", i.e. when someone provides both flags as True. We've had this countless times in other places, and I urgently want to avoid these designs in the future (à la "avoid invalid/meaningless configs by design").
For this situation here, we might have a very simple answer, which even simplifies the entire thing: as far as I can tell, there is no use case where we'd actually require a deterministic order, right? Also, I don't think there is a canonical default order – after all, this depends on the (arbitrary) order of constraints etc. So why not simply drop the argument altogether and just make `shuffle` True by default?
Scienfitz:
As discussed: awaiting commit with suggestion.
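A minimal sketch of the suggested simplification, assuming the configurations are available as a list (all names hypothetical): with shuffling unconditional, random subsampling reduces to taking the first n items of the iterator.

```python
import random
from collections.abc import Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")

def iter_partition_configs(configs: list[T]) -> Iterator[T]:
    """Yield partition configurations in random order (no shuffle flag)."""
    order = list(range(len(configs)))
    random.shuffle(order)  # shuffling is always on, so no coupled flags
    for i in order:
        yield configs[i]

# Random subsampling of n=5 configurations is then simply:
subsample = list(islice(iter_partition_configs(list(range(100))), 5))
```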
```python
total = math.prod(len(v) for v in per_constraint)

def _resolve_flat_idx(flat_idx: int) -> frozenset[str]:
```
AdrianSosic:
Can you please add a (simple) docstring just summarizing what the function does? Otherwise, people (like me) have to read the code to understand that we're mapping from an enumeration index to its corresponding inactivation sets.
Scienfitz:
I think this function exists twice, and unfortunately you looked at exactly the version that has no docstring or comment. The comment for the same function in the discrete module says:

```python
# Decompose flat index into per-constraint indices.
# Example with 3 constraints of partition lengths [3, 2, 4]:
#   flat_idx=11 -> divmod(11, 3) = (3, 2) -> A[2]
#                  divmod(3, 2)  = (1, 1) -> B[1]
#                  divmod(1, 4)  = (0, 1) -> C[1]
# Result: masks A[2] AND B[1] AND C[1]
```

- Is that enough?
- I guess it could be extracted into a utility by adding another parameter to it; should I do that?
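For reference, a self-contained sketch of the decomposition that comment describes (collection names hypothetical); it reproduces the `flat_idx=11` example exactly:

```python
import math

# Per-constraint partition masks; three constraints with 3, 2, and 4
# partitions respectively, matching the example in the comment above.
per_constraint = [
    ["A[0]", "A[1]", "A[2]"],
    ["B[0]", "B[1]"],
    ["C[0]", "C[1]", "C[2]", "C[3]"],
]
total = math.prod(len(v) for v in per_constraint)  # 3 * 2 * 4 = 24

def resolve_flat_idx(flat_idx: int) -> tuple[str, ...]:
    """Map a flat enumeration index to one partition mask per constraint."""
    choices = []
    for options in per_constraint:
        flat_idx, idx = divmod(flat_idx, len(options))
        choices.append(options[idx])
    return tuple(choices)

# flat_idx=11 -> divmod(11, 3) = (3, 2) -> A[2]
#                divmod(3, 2)  = (1, 1) -> B[1]
#                divmod(1, 4)  = (0, 1) -> C[1]
assert resolve_flat_idx(11) == ("A[2]", "B[1]", "C[1]")
```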
- Rename `DiscreteBatchConstraint.get_invalid` to `_get_invalid` to align with the abstract method rename introduced on main
- Add `DiscreteBatchConstraint` to `DISCRETE_CONSTRAINTS_FILTERING_ORDER` so `build_constrained_product` (introduced on main) can sort it
- Ignore `BadInitialCandidatesWarning` in pytest.ini; the warning fires non-deterministically in heavily constrained spaces regardless of data volume
Closes #583
Fixes #567
Functionality:
Adds a `DiscreteBatchConstraint` which works via subspace generation just like the continuous cardinality constraint, just in the discrete searchspace part. While it will quickly lead to many partitions, it is also possible to use multiple `DiscreteBatchConstraint`s and to combine them with the cardinality constraints, because there is now a more unified interface for constraints that work via subspace generation.

Notes:
- The term `subspace` is not optimal, because there is also the concept of `DiscreteSubspace`/`ContinuousSubspace`. I changed the naming that references creating subspaces for taking care of the constraint to `partition`. So we would then have things like `partition_masks`, `n_theoretical_partitions`, etc.
- There is `n_max_partitions`, which controls the max number of subspaces considered in any case.
- The partition iterators can draw configurations with replacement (`replace=True`) or shuffle the order (`shuffle=True`). The latter is used when subsampling the spaces, because with `shuffle=True` you can just instantiate the first `n` objects in the iterator to achieve random subsampling. This has some complications, especially for the overall searchspace class, which needs a parameter to stop the process when replicated samples are drawn too often (likely meaning no other feasible subspace combinations exist).
- `botorch.py` in recommenders has become very large as a result of this development, so I split it into submodules.
- `_FixedNumericalContinuousParameter` had a bugged property which was named `is_numeric` but should be `is_numerical`.
- `ContinuousCardinalityConstraint` can now also be applied in hybrid spaces, which fixes "`ContinuousCardinalityConstraint` not considered in hybrid spaces" (#567).

Discuss:
- `n_theoretical_subspaces` just computes the upper limit of subspaces in the discrete and hybrid cases. Because subspaces that do not have enough candidates are internally skipped, the actual number of feasible subspaces, `n_feasible_subspaces`, might be smaller. However, that cannot be easily computed and would require instantiating all masks corresponding to the subspaces. The dispatcher logic is currently based on `n_theoretical_subspaces`, but to be 100% correct it should use `n_feasible_subspaces`; this is not implemented due to the mentioned difficulty. I think using `n_theoretical_subspaces` works as a proxy and will not be very different from `n_feasible_subspaces` for nearly all practical problems.
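To illustrate the distinction (a sketch with assumed names and signatures, not the actual implementation): the theoretical count is a cheap product over per-constraint partition counts, whereas the feasible count requires materializing every mask.

```python
import math
import pandas as pd

def n_theoretical_partitions(partition_counts: list[int]) -> int:
    """Upper bound: product of the per-constraint partition counts."""
    return math.prod(partition_counts)

def n_feasible_partitions(masks: list[pd.Series], batch_size: int) -> int:
    """Exact count: only partitions with enough candidates are usable.

    Requires instantiating every mask, which is what makes this expensive.
    """
    return sum(int(mask.sum()) >= batch_size for mask in masks)
```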