Add BO termination conditions #724

Draft

wgst wants to merge 7 commits into experimental-design:main from wgst:termination-conditions
Conversation


@wgst wgst commented Feb 13, 2026

Motivation

BO loops in BoFire currently run for a fixed number of iterations. In practice, especially for chemical experiments where each iteration is very costly, we need a way to stop optimization early enough to reduce cost while still obtaining a converged result.

This PR is an attempt to implement several termination condition methods. The first method added here is from Makarova, A. et al. (2022). The idea behind this method is to compute an upper bound on the simple regret of the current best recommendation using the GP posterior; when this bound falls below a threshold ε_BO (derived from the observation noise variance), we can be confident that any further improvement would be smaller than the noise, and optimization can stop.
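As a rough sketch of the stopping rule for a minimization problem (illustration only, not the PR's implementation; the function name and arguments are hypothetical, and the posterior mean/std arrays are assumed precomputed):

```python
import numpy as np

def should_stop(mu_eval, sigma_eval, mu_domain, sigma_domain, noise_var, beta=4.0):
    """Makarova-style stopping check for a minimization problem (sketch)."""
    sqrt_b = np.sqrt(beta)
    # pessimistic value of the best evaluated point: min UCB over evaluated inputs
    min_ucb_eval = np.min(mu_eval + sqrt_b * sigma_eval)
    # optimistic value achievable anywhere: min LCB over the candidate domain
    min_lcb_domain = np.min(mu_domain - sqrt_b * sigma_domain)
    # upper bound on the simple regret of the current best recommendation
    regret_bound = min_ucb_eval - min_lcb_domain
    # stop once the bound drops below a noise-derived threshold eps_BO
    eps_bo = np.sqrt(noise_var)
    return bool(regret_bound <= eps_bo)
```

With a nearly converged posterior (tiny predictive uncertainty everywhere) the bound collapses and the rule fires; with large uncertainty it keeps the loop running.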

Two other termination methods, and issues with implementing them

  1. EMSR Gap: Ishibashi, H. et al. (2023). This method requires computing an Expected Minimum Simple Regret gap between consecutive iterations, which needs per-iteration model snapshots or knowledge of which experiments belong to which iteration. I have not found any tracking functionality of this kind in BoFire, which makes this method hard to implement at the moment.

  2. Probabilistic Regret Bound (PRB): Wilson, J. et al. (2024). This is a more probabilistic method, in which a statistically valid stopping rule is constructed from Matheron-rule pathwise trajectory samples (Thompson-style posterior samples via pathwise conditioning). Their implementation is based on TensorFlow and BO packages built around it, and some crucial functionality is not available in BoTorch.

Both methods could be good candidates for future work. Adding the method by Ishibashi et al. should be fairly easy once the tracking architecture is settled. The method by Wilson et al. could be more problematic, but let's see.
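The iteration-tracking gap mentioned for the EMSR method could be closed with very little bookkeeping. A minimal sketch (the `bo_iteration` column is hypothetical and not part of BoFire) tags each proposed batch with its iteration so per-iteration training sets can be reconstructed later:

```python
import pandas as pd

# collect experiments across a toy BO loop, tagging each batch with its iteration
experiments = pd.DataFrame()
for iteration in range(3):
    batch = pd.DataFrame({"x": [0.1 * iteration], "y": [1.0 - 0.1 * iteration]})
    batch["bo_iteration"] = iteration  # hypothetical bookkeeping column
    experiments = pd.concat([experiments, batch], ignore_index=True)

# the training data the model saw at iteration i is then just a filter,
# which is what a per-iteration EMSR gap computation would need
data_up_to_iter_1 = experiments[experiments["bo_iteration"] <= 1]
```

Refitting the model on each such slice would recover the per-iteration snapshots the EMSR gap needs, at the cost of recomputation.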

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Added tests:

  • data model (test_termination.py)
  • evaluator (test_evaluator.py)
  • mapper (test_mapper.py)

Best,
Wojciech

Copilot AI review requested due to automatic review settings February 13, 2026 02:50

Copilot AI left a comment


Pull request overview

Adds support for Bayesian Optimization termination conditions to allow BO runs to stop early based on convergence criteria (initially: the UCB–LCB simple-regret upper bound inspired by Makarova et al., 2022). This integrates termination evaluation into the existing run() benchmark loop and adds a small termination submodule with data models, evaluators, and mapping utilities.

Changes:

  • Introduces termination-condition data models (MaxIterationsTermination, AlwaysContinue, UCBLCBRegretTermination, CombiTerminationCondition).
  • Adds a termination evaluator (UCBLCBRegretEvaluator) and mapper utilities to determine which evaluators are needed.
  • Extends the benchmark runner to optionally stop early and to return a richer RunResult including termination metadata (plus tests).

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 11 comments.

File summary:

  • bofire/data_models/termination/termination.py: new termination-condition data models + should_terminate logic.
  • bofire/data_models/termination/api.py: exports termination-condition data models.
  • bofire/data_models/termination/__init__.py: package init (empty).
  • bofire/termination/evaluator.py: implements evaluator computing UCB–LCB regret bound + noise estimate.
  • bofire/termination/mapper.py: maps conditions to evaluators; recursively gathers needed evaluators.
  • bofire/termination/api.py: exports evaluator + mapping helpers.
  • bofire/termination/__init__.py: package init (empty).
  • bofire/runners/run.py: adds termination_condition support, early-stop logic, and new RunResult return type.
  • bofire/runners/api.py: re-exports RunResult from runners API.
  • tests/bofire/data_models/test_termination.py: unit tests for termination condition data models + serialization.
  • tests/bofire/termination/test_evaluator.py: unit + integration tests for evaluator and run() early termination behavior.
  • tests/bofire/termination/test_mapper.py: tests for termination mapper and evaluator discovery.
  • tests/bofire/termination/__init__.py: test package init.


Comment thread bofire/termination/evaluator.py Outdated
    iteration: int,
    **kwargs,
) -> bool:
    return iteration >= self.max_iterations

Copilot AI Feb 13, 2026


iteration in run() is 0-based (see RunResult.final_iteration == n_iterations-1 in tests), but this condition terminates when iteration >= max_iterations. That makes max_iterations behave like a max iteration index rather than a max number of iterations (off-by-one for typical expectations). Consider using iteration + 1 >= max_iterations (or renaming the field / documenting the 0-based convention).

Suggested change:
-    return iteration >= self.max_iterations
+    return iteration + 1 >= self.max_iterations

Comment thread bofire/runners/run.py Outdated
Comment on lines +20 to +43
@dataclass
class RunResult:
    """Result of a single optimization run.

    Supports tuple-style access for backward compatibility with ``run()``,
    which historically returned ``List[Tuple[pd.DataFrame, pd.Series]]``.
    You can still write ``experiments, metric_values = result`` or
    ``result[0]`` / ``result[1]``.
    """

    experiments: pd.DataFrame
    metric_values: pd.Series
    terminated_early: bool = False
    final_iteration: int = 0
    termination_metrics: Dict[str, List[float]] = field(default_factory=dict)

    # Backward-compat: behave like (experiments, metric_values) tuple.
    def __iter__(self):
        return iter((self.experiments, self.metric_values))

    def __getitem__(self, idx):
        return (self.experiments, self.metric_values)[idx]

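For readers following along, the tuple-compatibility shim behaves like this (minimal self-contained reproduction of the dataclass with toy data, not the PR's actual module):

```python
from dataclasses import dataclass, field
from typing import Dict, List

import pandas as pd

@dataclass
class RunResult:
    experiments: pd.DataFrame
    metric_values: pd.Series
    terminated_early: bool = False
    final_iteration: int = 0
    termination_metrics: Dict[str, List[float]] = field(default_factory=dict)

    # Backward-compat: behave like (experiments, metric_values) tuple.
    def __iter__(self):
        return iter((self.experiments, self.metric_values))

    def __getitem__(self, idx):
        return (self.experiments, self.metric_values)[idx]

result = RunResult(pd.DataFrame({"x": [1.0]}), pd.Series([0.5]))
experiments, metrics = result              # tuple-style unpacking still works
ok_indexing = result[0] is experiments and result[1] is metrics
still_a_tuple = isinstance(result, tuple)  # False: type-based checks now break
```

This illustrates the reviewer's point: unpacking and indexing survive, but `isinstance(result, tuple)` does not, which is why a NamedTuple or tuple subclass was suggested as an alternative.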

Copilot AI Feb 13, 2026


run() now returns List[RunResult] instead of the historical List[Tuple[pd.DataFrame, pd.Series]]. Even with tuple-style __iter__/__getitem__, this is still a public API/type-contract breaking change for callers that check types, serialize results, or rely on tuple methods. Consider either (a) keeping the tuple return and exposing termination info separately, or (b) using a NamedTuple / tuple subclass so isinstance(result, tuple) remains true, plus documenting the change in release notes.

Comment on lines +108 to +110
Where:
- UCB(x) = mu(x) + beta * sigma(x) (Upper Confidence Bound)
- LCB(x) = mu(x) - beta * sigma(x) (Lower Confidence Bound)

Copilot AI Feb 13, 2026


The UCB/LCB formulas in this docstring use beta * sigma(x), but the evaluator implementation uses sqrt(beta) * sigma(x) (GP-UCB style). Please align the documentation with the implementation (or vice versa) to avoid users configuring beta incorrectly.

Suggested change:
-Where:
-- UCB(x) = mu(x) + beta * sigma(x) (Upper Confidence Bound)
-- LCB(x) = mu(x) - beta * sigma(x) (Lower Confidence Bound)
+Where (GP-UCB style):
+- UCB(x) = mu(x) + sqrt(beta) * sigma(x) (Upper Confidence Bound)
+- LCB(x) = mu(x) - sqrt(beta) * sigma(x) (Lower Confidence Bound)

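The difference between the two beta conventions flagged here can be made concrete (illustration only; the helper below is not from the PR):

```python
import math

def ucb(mu: float, sigma: float, beta: float, gp_ucb_style: bool) -> float:
    # GP-UCB papers scale sigma by sqrt(beta); the docstring wrote beta * sigma
    scale = math.sqrt(beta) if gp_ucb_style else beta
    return mu + scale * sigma

# with beta = 4.0 the exploration term differs by a factor of 2, so a user
# configuring beta from the docstring would get a different bound than intended
wide = ucb(0.0, 1.0, 4.0, gp_ucb_style=False)   # beta * sigma
tight = ucb(0.0, 1.0, 4.0, gp_ucb_style=True)   # sqrt(beta) * sigma
```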
    iteration: int,
    **kwargs,
) -> bool:
    if iteration < self.min_iterations:

Copilot AI Feb 13, 2026


Similar to MaxIterationsTermination, iteration is currently treated as 0-based by the runner, but the min_iterations gate uses iteration < min_iterations. This means min_iterations=5 won’t start checking until the 6th BO step. Consider switching to iteration + 1 < min_iterations (or clearly documenting that iteration is a 0-based index).

Suggested change:
-    if iteration < self.min_iterations:
+    # `iteration` is 0-based, but `min_iterations` represents a count of completed
+    # optimization steps. Convert to 1-based before applying the threshold so that,
+    # e.g., `min_iterations=5` starts checking on the 5th BO step, not the 6th.
+    if iteration + 1 < self.min_iterations:

Comment thread bofire/termination/mapper.py Outdated
    if evaluator is not None:
        evaluators.append(evaluator)

    return evaluators

Copilot AI Feb 13, 2026


get_required_evaluators can return duplicate evaluator instances when the same evaluator type appears multiple times in a (nested) CombiTerminationCondition. In the runner this will redundantly recompute metrics and also append multiple values per iteration under the same key in termination_metrics. Consider deduplicating by evaluator type (or by condition identity) before returning the list.

Suggested change:
-    return evaluators
+    # Deduplicate evaluators by their type to avoid redundant computations
+    unique_evaluators = []
+    seen_types = set()
+    for evaluator in evaluators:
+        evaluator_type = type(evaluator)
+        if evaluator_type in seen_types:
+            continue
+        seen_types.add(evaluator_type)
+        unique_evaluators.append(evaluator)
+    return unique_evaluators

Comment thread bofire/termination/evaluator.py Outdated
Comment on lines +143 to +158
# --- min UCB over evaluated points ---
evaluated_inputs = experiments[strategy.domain.inputs.get_keys()]
transformed_evaluated = strategy.domain.inputs.transform(
    evaluated_inputs,
    strategy.input_preprocessing_specs,
)
X_evaluated = torch.from_numpy(transformed_evaluated.values).to(**tkwargs)

with torch.no_grad():
    posterior_evaluated = strategy.model.posterior(X_evaluated)
    mean_evaluated = posterior_evaluated.mean.squeeze(-1)
    std_evaluated = posterior_evaluated.variance.squeeze(-1).sqrt()

ucb_evaluated = mean_evaluated + sqrt_beta * std_evaluated
min_ucb_evaluated = float(ucb_evaluated.min().item())


Copilot AI Feb 13, 2026


There is no test coverage for using UCBLCBRegretTermination / UCBLCBRegretEvaluator on a maximization objective (where the bound logic should differ). Adding at least one test benchmark/domain with MaximizeObjective (or a simple synthetic domain) would prevent regressions and would have caught the current minimization-only implementation.

Comment thread bofire/runners/api.py Outdated
@@ -1,2 +1,2 @@
 from bofire.runners.hyperoptimize import hyperoptimize
-from bofire.runners.run import run
+from bofire.runners.run import RunResult, run

Copilot AI Feb 13, 2026


Import of 'RunResult' is not used.
Import of 'run' is not used.

Suggested change:
-from bofire.runners.run import RunResult, run

Comment thread bofire/termination/evaluator.py Outdated
        if hasattr(acqf, "beta"):
            return acqf.beta
    except AttributeError:
        pass

Copilot AI Feb 13, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change:
-        pass
+        # If the strategy has no acquisition_function (or it raises AttributeError),
+        # fall back to the default beta value.
+        return self.fallback_beta

@jduerholt
Contributor

Hi @wgst,

Thanks for the PR and sorry for the late answer. @bertiqwerty will provide a thorough and more detailed review. One general comment from my side:

I think it would be best to incorporate the termination criteria not into the runner but into the strategy itself. We have the StepwiseStrategy, which can be used to compose different atomic strategies into one, including conditions for when to execute which one (https://github.com/experimental-design/bofire/blob/main/bofire/data_models/strategies/stepwise/stepwise.py, https://github.com/experimental-design/bofire/blob/main/bofire/strategies/stepwise/stepwise.py). I think it would be nice to implement the stopping criteria as a condition for a certain step. What do you think?

Best,

Johannes

        return iteration >= self.max_iterations


class AlwaysContinue(TerminationCondition, EvaluatableTermination):
Contributor


This one also already exists in the StepwiseStrategy as a condition.

@bertiqwerty
Contributor

bertiqwerty commented Feb 19, 2026

Thanks @wgst and @jduerholt. I agree with Johannes. Please use the StepwiseStrategy. If you need more guidance on how to do that, @wgst, let us have a call. I will have a closer look as soon as this is refactored.

Contributor

@bertiqwerty bertiqwerty left a comment


Please refactor into the StepwiseStrategy.

wgst added 6 commits March 18, 2026 16:36
…LCB regret bound

- Move termination logic from runner into StepwiseStrategy conditions
- Add UCBLCBRegretBoundCondition with 4 threshold modes (gp_noise, range, cv, manual)
- Add UCBLCBRegretEvaluator with configurable beta (GP-UCB, Srinivas et al. 2010)
- Add top-q filtering support for regret bound computation
- Extract _min_ucb_evaluated() method for clarity
- Add OptimizationComplete exception for clean termination signaling
- Remove old termination data models, mapper, and tests (replaced by conditions)
- Add comprehensive test suite (test_condition.py, test_evaluator.py)
- All 49 tests passing
…evaluators

Option A refactoring: both UCBLCBRegretEvaluator and ExpMinRegretGapEvaluator
now support lcb_method parameter to choose between "sample" (default) and
"optimize" LCB computation over the domain.

Changes:
- Convert _ucb_lcb_regret_bound from static to instance method
- Add lcb_method parameter to ExpMinRegretGapEvaluator (was Makarova-only)
- Move _min_lcb_optimize to base TerminationEvaluator class for code reuse
- Unify LCB computation logic in both evaluators via _ucb_lcb_regret_bound
- Add threshold computation helpers module (thresholds.py)
- Update tests to verify both methods work correctly

Both evaluators maintain backward compatibility (default "sample" method).
Optimize method uses strategy's acqf_optimizer for more efficient LCB search.
…_min_lcb_optimize to base class

Convert _ucb_lcb_regret_bound from static to instance method to access self.lcb_method.
Move _min_lcb_optimize to TerminationEvaluator base class for code reuse by both
Makarova and Ishibashi evaluators. Both evaluators now flexibly support "sample"
(default) and "optimize" methods for LCB computation.
Refactor UCBLCBRegretBoundCondition and ExpMinRegretGapCondition to use new
threshold computation helpers from thresholds module. Update data model exports.
Update benchmark functions and exports to support termination condition testing.
Add tests for new lcb_method parameter in both Makarova and Ishibashi evaluators.
Verify both "sample" and "optimize" methods work correctly. Update condition specs
and evaluator tests to reflect refactored code structure.
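The two `lcb_method` modes described in these commits can be sketched on a toy 1-D posterior. Everything below is illustrative: the mean/std functions are made up, and the inline ternary search stands in for the strategy's acqf_optimizer; BoFire's actual evaluators operate on the GP posterior.

```python
import numpy as np

BETA = 4.0

def lcb(x):
    # toy posterior mean and standard deviation over [0, 1] (illustration only)
    mu = (x - 0.3) ** 2
    sigma = 0.1 + 0.05 * np.abs(x)
    return mu - np.sqrt(BETA) * sigma

def minimize_1d(f, lo, hi, iters=100):
    # simple ternary search for a unimodal function (stand-in for a real optimizer)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# "sample" mode: minimum LCB over a finite candidate set
xs = np.linspace(0.0, 1.0, 101)
min_lcb_sample = float(np.min(lcb(xs)))

# "optimize" mode: minimum LCB via a continuous search over the domain
min_lcb_optimize = float(lcb(minimize_1d(lcb, 0.0, 1.0)))
```

The "optimize" mode can only match or beat the finite sample, at the price of extra optimizer calls per termination check, which is presumably why "sample" remains the default.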
