Add BO termination conditions #724
Conversation
Pull request overview
Adds support for Bayesian Optimization termination conditions to allow BO runs to stop early based on convergence criteria (initially: the UCB–LCB simple-regret upper bound inspired by Makarova et al., 2022). This integrates termination evaluation into the existing run() benchmark loop and adds a small termination submodule with data models, evaluators, and mapping utilities.
Changes:
- Introduces termination-condition data models (MaxIterationsTermination, AlwaysContinue, UCBLCBRegretTermination, CombiTerminationCondition).
- Adds a termination evaluator (UCBLCBRegretEvaluator) and mapper utilities to determine which evaluators are needed.
- Extends the benchmark runner to optionally stop early and to return a richer RunResult including termination metadata (plus tests).
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| bofire/data_models/termination/termination.py | New termination-condition data models + should_terminate logic. |
| bofire/data_models/termination/api.py | Exports termination-condition data models. |
| bofire/data_models/termination/__init__.py | Package init (empty). |
| bofire/termination/evaluator.py | Implements evaluator computing UCB–LCB regret bound + noise estimate. |
| bofire/termination/mapper.py | Maps conditions to evaluators; recursively gathers needed evaluators. |
| bofire/termination/api.py | Exports evaluator + mapping helpers. |
| bofire/termination/__init__.py | Package init (empty). |
| bofire/runners/run.py | Adds termination_condition support, early-stop logic, and new RunResult return type. |
| bofire/runners/api.py | Re-exports RunResult from runners API. |
| tests/bofire/data_models/test_termination.py | Unit tests for termination condition data models + serialization. |
| tests/bofire/termination/test_evaluator.py | Unit + integration tests for evaluator and run() early termination behavior. |
| tests/bofire/termination/test_mapper.py | Tests for termination mapper and evaluator discovery. |
| tests/bofire/termination/__init__.py | Test package init. |
```python
        iteration: int,
        **kwargs,
    ) -> bool:
        return iteration >= self.max_iterations
```
iteration in run() is 0-based (see RunResult.final_iteration == n_iterations-1 in tests), but this condition terminates when iteration >= max_iterations. That makes max_iterations behave like a max iteration index rather than a max number of iterations (off-by-one for typical expectations). Consider using iteration + 1 >= max_iterations (or renaming the field / documenting the 0-based convention).
Suggested change:
```diff
-        return iteration >= self.max_iterations
+        return iteration + 1 >= self.max_iterations
```
```python
@dataclass
class RunResult:
    """Result of a single optimization run.

    Supports tuple-style access for backward compatibility with ``run()``,
    which historically returned ``List[Tuple[pd.DataFrame, pd.Series]]``.
    You can still write ``experiments, metric_values = result`` or
    ``result[0]`` / ``result[1]``.
    """

    experiments: pd.DataFrame
    metric_values: pd.Series
    terminated_early: bool = False
    final_iteration: int = 0
    termination_metrics: Dict[str, List[float]] = field(default_factory=dict)

    # Backward-compat: behave like (experiments, metric_values) tuple.
    def __iter__(self):
        return iter((self.experiments, self.metric_values))

    def __getitem__(self, idx):
        return (self.experiments, self.metric_values)[idx]
```
run() now returns List[RunResult] instead of the historical List[Tuple[pd.DataFrame, pd.Series]]. Even with tuple-style __iter__/__getitem__, this is still a public API/type-contract breaking change for callers that check types, serialize results, or rely on tuple methods. Consider either (a) keeping the tuple return and exposing termination info separately, or (b) using a NamedTuple / tuple subclass so isinstance(result, tuple) remains true, plus documenting the change in release notes.
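As a hedged illustration of option (b), a NamedTuple-based RunResult would keep isinstance(result, tuple) true while still carrying the new termination fields; the field names below mirror the dataclass in the diff, but this is a sketch for discussion, not the PR's actual implementation:

```python
from typing import Dict, List, NamedTuple, Optional

import pandas as pd


class RunResult(NamedTuple):
    experiments: pd.DataFrame
    metric_values: pd.Series
    terminated_early: bool = False
    final_iteration: int = 0
    termination_metrics: Optional[Dict[str, List[float]]] = None


result = RunResult(pd.DataFrame(), pd.Series(dtype=float))
assert isinstance(result, tuple)          # type checks against tuple keep passing
experiments, metric_values = result[:2]   # two-element unpacking now needs slicing
```

One trade-off: because a NamedTuple iterates over all five fields, the historical `experiments, metric_values = result` unpacking would require `result[:2]`, whereas the dataclass with a custom `__iter__` preserves it directly.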
```python
    Where:
    - UCB(x) = mu(x) + beta * sigma(x)   (Upper Confidence Bound)
    - LCB(x) = mu(x) - beta * sigma(x)   (Lower Confidence Bound)
```
The UCB/LCB formulas in this docstring use beta * sigma(x), but the evaluator implementation uses sqrt(beta) * sigma(x) (GP-UCB style). Please align the documentation with the implementation (or vice versa) to avoid users configuring beta incorrectly.
Suggested change:
```diff
-    Where:
-    - UCB(x) = mu(x) + beta * sigma(x)   (Upper Confidence Bound)
-    - LCB(x) = mu(x) - beta * sigma(x)   (Lower Confidence Bound)
+    Where (GP-UCB style):
+    - UCB(x) = mu(x) + sqrt(beta) * sigma(x)   (Upper Confidence Bound)
+    - LCB(x) = mu(x) - sqrt(beta) * sigma(x)   (Lower Confidence Bound)
```
```python
        iteration: int,
        **kwargs,
    ) -> bool:
        if iteration < self.min_iterations:
```
Similar to MaxIterationsTermination, iteration is currently treated as 0-based by the runner, but the min_iterations gate uses iteration < min_iterations. This means min_iterations=5 won’t start checking until the 6th BO step. Consider switching to iteration + 1 < min_iterations (or clearly documenting that iteration is a 0-based index).
Suggested change:
```diff
-        if iteration < self.min_iterations:
+        # `iteration` is 0-based, but `min_iterations` represents a count of completed
+        # optimization steps. Convert to 1-based before applying the threshold so that,
+        # e.g., `min_iterations=5` starts checking on the 5th BO step, not the 6th.
+        if iteration + 1 < self.min_iterations:
```
```python
        if evaluator is not None:
            evaluators.append(evaluator)

    return evaluators
```
get_required_evaluators can return duplicate evaluator instances when the same evaluator type appears multiple times in a (nested) CombiTerminationCondition. In the runner this will redundantly recompute metrics and also append multiple values per iteration under the same key in termination_metrics. Consider deduplicating by evaluator type (or by condition identity) before returning the list.
Suggested change:
```diff
-    return evaluators
+    # Deduplicate evaluators by their type to avoid redundant computations
+    unique_evaluators = []
+    seen_types = set()
+    for evaluator in evaluators:
+        evaluator_type = type(evaluator)
+        if evaluator_type in seen_types:
+            continue
+        seen_types.add(evaluator_type)
+        unique_evaluators.append(evaluator)
+    return unique_evaluators
```
```python
        # --- min UCB over evaluated points ---
        evaluated_inputs = experiments[strategy.domain.inputs.get_keys()]
        transformed_evaluated = strategy.domain.inputs.transform(
            evaluated_inputs,
            strategy.input_preprocessing_specs,
        )
        X_evaluated = torch.from_numpy(transformed_evaluated.values).to(**tkwargs)

        with torch.no_grad():
            posterior_evaluated = strategy.model.posterior(X_evaluated)
            mean_evaluated = posterior_evaluated.mean.squeeze(-1)
            std_evaluated = posterior_evaluated.variance.squeeze(-1).sqrt()

        ucb_evaluated = mean_evaluated + sqrt_beta * std_evaluated
        min_ucb_evaluated = float(ucb_evaluated.min().item())
```
There is no test coverage for using UCBLCBRegretTermination / UCBLCBRegretEvaluator on a maximization objective (where the bound logic should differ). Adding at least one test benchmark/domain with MaximizeObjective (or a simple synthetic domain) would prevent regressions and would have caught the current minimization-only implementation.
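A minimal, framework-independent sketch of what such a test could assert, using hand-picked posterior values and the maximization convention of the regret bound (max UCB over candidates minus max LCB over evaluated points); the numbers and names are illustrative and deliberately avoid BoFire's API:

```python
import numpy as np


def test_regret_bound_uses_maximization_convention():
    # Hand-picked posterior means/stds at candidate and evaluated points.
    mean_cand = np.array([0.0, 1.0, 2.0])
    std_cand = np.array([0.1, 0.1, 0.1])
    mean_eval = np.array([0.0, 1.0])
    std_eval = np.array([0.1, 0.1])
    sqrt_beta = 2.0

    # Maximization: f(x*) <= max UCB over candidates, f(x_hat) >= max LCB over evaluated.
    max_bound = np.max(mean_cand + sqrt_beta * std_cand) - np.max(mean_eval - sqrt_beta * std_eval)
    # Minimization convention (what a minimization-only implementation computes).
    min_bound = np.min(mean_eval + sqrt_beta * std_eval) - np.min(mean_cand - sqrt_beta * std_cand)

    assert np.isclose(max_bound, 1.4)   # 2.2 - 0.8
    assert np.isclose(min_bound, 0.4)   # 0.2 - (-0.2)
    assert not np.isclose(max_bound, min_bound)
```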
```diff
@@ -1,2 +1,2 @@
 from bofire.runners.hyperoptimize import hyperoptimize
-from bofire.runners.run import run
+from bofire.runners.run import RunResult, run
```
Import of 'RunResult' is not used.
Import of 'run' is not used.
```python
from bofire.runners.run import RunResult, run
```
```python
            if hasattr(acqf, "beta"):
                return acqf.beta
        except AttributeError:
            pass
```
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:
```diff
-            pass
+            # If the strategy has no acquisition_function (or it raises AttributeError),
+            # fall back to the default beta value.
+            return self.fallback_beta
```
Hi @wgst, thanks for the PR and sorry for the late answer. @bertiqwerty will provide a thorough and more detailed review. One general comment from my side: I think it would be best to incorporate the termination criteria not into the runner but into the strategy itself; we have the StepwiseStrategy for this.

Best,
Johannes
```python
        return iteration >= self.max_iterations


class AlwaysContinue(TerminationCondition, EvaluatableTermination):
```
This one also already exists in the StepwiseStrategy as a condition.
Thanks @wgst and @jduerholt. I agree with Johannes. Please use the StepwiseStrategy.
bertiqwerty left a comment:
Please refactor into the StepwiseStrategy.
…LCB regret bound
- Move termination logic from runner into StepwiseStrategy conditions
- Add UCBLCBRegretBoundCondition with 4 threshold modes (gp_noise, range, cv, manual)
- Add UCBLCBRegretEvaluator with configurable beta (GP-UCB, Srinivas et al. 2010)
- Add top-q filtering support for regret bound computation
- Extract _min_ucb_evaluated() method for clarity
- Add OptimizationComplete exception for clean termination signaling
- Remove old termination data models, mapper, and tests (replaced by conditions)
- Add comprehensive test suite (test_condition.py, test_evaluator.py)
- All 49 tests passing
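The commit above names four threshold modes (gp_noise, range, cv, manual) without spelling them out. The following is a hypothetical sketch of how such modes could compute the stopping threshold; the formulas, parameter names, and factor handling are assumptions for illustration and may differ from the PR's actual implementation:

```python
from typing import Optional

import numpy as np


def compute_threshold(mode: str, y: np.ndarray, noise_var: Optional[float] = None,
                      manual_value: Optional[float] = None, factor: float = 1.0) -> float:
    """Illustrative epsilon_BO computation for the four named modes (hypothetical)."""
    if mode == "gp_noise":   # scale with the GP's estimated observation noise
        return factor * float(np.sqrt(noise_var))
    if mode == "range":      # fraction of the observed objective range
        return factor * float(np.max(y) - np.min(y))
    if mode == "cv":         # coefficient-of-variation style, relative to the mean magnitude
        return factor * float(np.std(y) / max(abs(float(np.mean(y))), 1e-12))
    if mode == "manual":     # user-provided absolute threshold
        return float(manual_value)
    raise ValueError(f"unknown threshold mode: {mode}")
```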
…evaluators
Option A refactoring: both UCBLCBRegretEvaluator and ExpMinRegretGapEvaluator now support an lcb_method parameter to choose between "sample" (default) and "optimize" LCB computation over the domain.

Changes:
- Convert _ucb_lcb_regret_bound from static to instance method
- Add lcb_method parameter to ExpMinRegretGapEvaluator (was Makarova-only)
- Move _min_lcb_optimize to base TerminationEvaluator class for code reuse
- Unify LCB computation logic in both evaluators via _ucb_lcb_regret_bound
- Add threshold computation helpers module (thresholds.py)
- Update tests to verify both methods work correctly

Both evaluators maintain backward compatibility (default "sample" method). The optimize method uses the strategy's acqf_optimizer for a more efficient LCB search.
…_min_lcb_optimize to base class
Convert _ucb_lcb_regret_bound from static to instance method to access self.lcb_method. Move _min_lcb_optimize to the TerminationEvaluator base class for code reuse by both the Makarova and Ishibashi evaluators. Both evaluators now flexibly support "sample" (default) and "optimize" methods for LCB computation.
Refactor UCBLCBRegretBoundCondition and ExpMinRegretGapCondition to use new threshold computation helpers from thresholds module. Update data model exports.
Update benchmark functions and exports to support termination condition testing.
Add tests for new lcb_method parameter in both Makarova and Ishibashi evaluators. Verify both "sample" and "optimize" methods work correctly. Update condition specs and evaluator tests to reflect refactored code structure.
Motivation
BO loops in BoFire currently run for a fixed number of iterations. In practice, especially for chemical experiments where each iteration is very costly, we need a way to stop the optimization early to reduce costs while still obtaining a converged result.
This PR is a first attempt at implementing different termination criteria. The first method added here is from Makarova, A. et al. (2022). The idea is to compute an upper bound on the simple regret of the current best recommendation using the GP posterior; once this bound falls below a threshold ε_BO (derived from the observation noise variance), we can be confident that any further improvement will be smaller than the noise, and the optimization can stop.
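A minimal numeric sketch of this stopping check (minimization convention), assuming arrays of GP posterior means and standard deviations at candidate points and at already-evaluated points, an exploration parameter beta, and a noise-derived threshold eps_bo; all names are illustrative, not the PR's actual API:

```python
import numpy as np


def regret_bound_says_stop(mu_cand, sigma_cand, mu_eval, sigma_eval, beta, eps_bo):
    sqrt_beta = np.sqrt(beta)
    # Best guaranteed value among evaluated points: min UCB over evaluated x.
    min_ucb_eval = np.min(mu_eval + sqrt_beta * sigma_eval)
    # Optimistic bound on the global optimum: min LCB over the candidate set.
    min_lcb_cand = np.min(mu_cand - sqrt_beta * sigma_cand)
    # Upper bound on the simple regret of the current best recommendation.
    regret_bound = min_ucb_eval - min_lcb_cand
    return regret_bound <= eps_bo
```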
Two other termination methods and the issues with implementing them
EMSR Gap: Ishibashi, H. et al. (2023). This method requires computing an Expected Minimum Simple Regret gap between consecutive iterations, which needs per-iteration model snapshots or knowledge of which experiments belong to which iteration. I have not found any such tracking functionality within BoFire, which makes this method hard to implement at the moment.
Probabilistic Regret Bound (PRB): Wilson, J. et al. (2024). This is a more probabilistic method in which a statistically valid stopping rule is constructed from Matheron-rule pathwise trajectory samples (Thompson-style posterior samples via pathwise conditioning). Their implementation is based on TensorFlow and BO packages built around it, and some crucial functionality is not available in BoTorch.
Both methods could be good candidates for future work. Adding the method by Ishibashi et al. should be fairly easy once the tracking architecture is settled; the method by Wilson et al. could be a bit more problematic, but let's see.
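To make the missing tracking concrete, here is a purely hypothetical sketch of the per-iteration bookkeeping the Ishibashi et al. criterion would need; nothing like this exists in BoFire today, and every name below is invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class IterationSnapshot:
    iteration: int                 # 0-based BO step index
    experiment_indices: List[int]  # rows of the experiments frame proposed at this step
    model_state: Dict[str, Any]    # serialized surrogate hyperparameters / state


@dataclass
class RunHistory:
    snapshots: List[IterationSnapshot] = field(default_factory=list)

    def record(self, iteration: int, experiment_indices: List[int],
               model_state: Dict[str, Any]) -> None:
        self.snapshots.append(IterationSnapshot(iteration, experiment_indices, model_state))
```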
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
Added tests:
- Termination condition data models and their serialization (test_termination.py)
- Evaluator and run() early-termination behavior, unit + integration (test_evaluator.py)
- Termination mapper and evaluator discovery (test_mapper.py)

Best,
Wojciech