LLM Strategy #749
…mization Enable users to attach free-text context to features, constraints, and the domain itself, so that LLM agents can leverage this information to better understand the optimization problem. Also document the data model testing infrastructure and registration patterns in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add methods for LLM prompt building directly on the data model classes:
- Input.to_pydantic_field(): returns a (type, FieldInfo) pair for dynamic Pydantic models, with ge/le bounds, Literal enums, and context in descriptions
- Inputs.to_pydantic_model(): assembles all input features into a dynamic Pydantic model via create_model()
- Constraint.to_description(): human-readable math on all constraint subclasses
- Objective.to_description(): human-readable descriptions on all objective subclasses
- ContinuousOutput/CategoricalOutput.to_description(): combines objective + context
- Domain.to_description(): assembles objectives, constraints, and problem context
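For illustration, a minimal sketch of how this pattern fits together. The helper functions below are hypothetical stand-ins for the real BoFire feature methods, not the PR's exact code:

```python
from typing import Literal, Optional

from pydantic import Field, create_model


def continuous_field(lower: float, upper: float, context: Optional[str] = None):
    """Stand-in for ContinuousInput.to_pydantic_field(): a (type, FieldInfo) pair."""
    description = "Continuous input." + (f" {context}" if context else "")
    return (float, Field(ge=lower, le=upper, description=description))


def categorical_field(categories: list, context: Optional[str] = None):
    """Stand-in for CategoricalInput.to_pydantic_field(): Literal over categories."""
    description = "Categorical input." + (f" {context}" if context else "")
    return (Literal[tuple(categories)], Field(description=description))


# Stand-in for Inputs.to_pydantic_model(): one dynamic model over all inputs.
Candidate = create_model(
    "Candidate",
    temperature=continuous_field(20.0, 90.0, "Reaction temperature in degrees C."),
    solvent=categorical_field(["water", "ethanol"], "Solvent for the reaction."),
)

print(Candidate.model_json_schema())  # the schema the LLM is asked to fill
```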
Add standalone LLM provider data models and a mapper to pydantic-ai:
- Data models: AnthropicLLMProvider, AnthropicFoundryLLMProvider, OpenAILLMProvider, OpenAICompatibleLLMProvider
- API keys referenced via env var names (safe for serialization)
- Mapper in bofire/llm/ converts data models to pydantic-ai Model instances
- Optional [llm] dependency group in pyproject.toml
- Test specs + serialization/deserialization tests
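A hedged sketch of that provider pattern; the field names and mapper body are assumptions based on the commit message, and the imports reflect pydantic-ai's Anthropic adapter:

```python
import os
from typing import Literal

from pydantic import BaseModel


class AnthropicLLMProvider(BaseModel):
    """Serializable provider spec: stores the *name* of the env var, never the key."""
    type: Literal["AnthropicLLMProvider"] = "AnthropicLLMProvider"
    model: str = "claude-3-5-sonnet-latest"
    api_key_env_var: str = "ANTHROPIC_API_KEY"


def resolve_env_var(name: str) -> str:
    """Fail loudly if the referenced environment variable is not set."""
    value = os.environ.get(name)
    if value is None:
        raise ValueError(f"Environment variable {name} is not set.")
    return value


def map_provider(data_model: AnthropicLLMProvider):
    """Convert the serializable data model into a pydantic-ai Model instance."""
    from pydantic_ai.models.anthropic import AnthropicModel
    from pydantic_ai.providers.anthropic import AnthropicProvider

    provider = AnthropicProvider(api_key=resolve_env_var(data_model.api_key_env_var))
    return AnthropicModel(data_model.model, provider=provider)
```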
jduerholt
left a comment
Looks good for a start, but there are still issues.
- Make Input.to_pydantic_field() and Output.to_description() abstract
- Make Objective.to_description() abstract
- Add docstrings with examples to all to_description/to_pydantic_field methods
- Add to_pydantic_field overrides for molecular features (SMILES context) and descriptor features (descriptor mappings)
- Handle allow_zero in ContinuousInput.to_pydantic_field (widen ge to 0)
- Remove stepsize from ContinuousInput field description
- Simplify Maximize/Minimize to_description to just "Maximize"/"Minimize"
- Replace all non-essential objective to_description with NotImplementedError
- Remove Field(description=...) from context fields (docstrings suffice)
- Simplify [llm] dependency to just pydantic-ai, add to [all] group
jduerholt
left a comment
Still some issues
- InterpointEqualityConstraint, NonlinearEquality/Inequality: NotImplementedError
- CategoricalOutput.to_description(): NotImplementedError
- Inputs.to_pydantic_model(): use top-level create_model import
- Use get_allowed_categories() instead of a manual list comprehension in CategoricalInput, CategoricalMolecularInput, CategoricalDescriptorInput
New strategy that uses pydantic-ai to have an LLM propose optimization candidates via structured output:

Data model (bofire/data_models/strategies/llm.py):
- Single-objective only (Maximize/Minimize), linear + NChooseK constraints
- Configurable: temperature, max_tokens, thinking (reasoning effort)
- Experiment presentation: n_recent_experiments, n_top_experiments
- Custom system prompt support

Functional implementation (bofire/strategies/llm.py):
- Builds dynamic Pydantic output model from Domain via Inputs.to_pydantic_model()
- Injects Domain.to_description() as system prompt context
- Output validator runs domain.validate_candidates(); pydantic-ai retries on failure
- Experiment selection: union of recent + top-performing, deduplicated
- Lazy mapper registration to avoid circular imports
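The core loop might look roughly like the sketch below. It assumes a recent pydantic-ai, the dynamic `Candidate` model from `Inputs.to_pydantic_model()`, and a BoFire `domain`; the names are illustrative, not the PR's exact code:

```python
import pandas as pd
from pydantic_ai import Agent, ModelRetry

# Assumed to exist: `Candidate` (dynamic model built from the Domain's inputs)
# and `domain` (a bofire Domain instance).
agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    output_type=list[Candidate],            # structured output schema
    system_prompt=domain.to_description(),  # objectives, constraints, context
)


@agent.output_validator
def validate(candidates: list) -> list:
    """Reject proposals that violate the domain; ModelRetry asks the LLM again."""
    df = pd.DataFrame([c.model_dump() for c in candidates])
    try:
        domain.validate_candidates(df, only_inputs=True)
    except ValueError as err:
        raise ModelRetry(str(err)) from err
    return candidates


result = agent.run_sync("Propose 3 new candidates.")
```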
- Remove top_metric_key (obsolete for single-objective)
- Remove is_objective_implemented (not called from base Strategy)
- Use Outputs.get_by_objective() in the single-objective validator
- Move pydantic-ai model + output schema building to __init__
- Make validation errors verbose: show failed candidates + instructions
- Label experiment sections for the LLM (recent vs top-k vs both)
- Add LLMStrategy.make() classmethod following the existing strategy pattern
- Add 31 tests for to_pydantic_field(), to_description(), to_pydantic_model(), context fields, and their roundtrip serialization
Distribute to_pydantic_field(), to_description(), to_pydantic_model(), and context tests into the specific test modules for each class:
- test_continuous.py: ContinuousInput, ContinuousOutput
- test_categorical.py: CategoricalInput
- test_discrete.py: DiscreteInput
- test_descriptor.py: CategoricalDescriptorInput, ContinuousDescriptorInput
- test_molecular.py: CategoricalMolecularInput, ContinuousMolecularInput
- test_constraints.py: Linear, NChooseK, Product constraints
- test_domain.py: Domain.to_description, context roundtrip
- test_inputs.py: Inputs.to_pydantic_model

Remove the standalone test_llm_methods.py.
- Linear constraint tests → tests/bofire/data_models/constraints/test_linear.py
- NChooseK + Product tests → tests/bofire/data_models/constraints/test_to_description.py
- Remove from tests/bofire/data_models/domain/test_constraints.py
Switch from pandas to_string() to JSON list-of-dicts format for both experiment history and validation error messages. This matches the structured output format (JSON in, JSON out) and is more token-efficient.
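Concretely, the change amounts to swapping pandas' text rendering for records-style JSON:

```python
import pandas as pd

experiments = pd.DataFrame({"temperature": [20.0, 65.0], "yield": [0.12, 0.87]})

# Before: fixed-width text table via to_string()
print(experiments.to_string(index=False))

# After: JSON list of dicts, matching the structured output the LLM returns
print(experiments.to_dict(orient="records"))
# [{'temperature': 20.0, 'yield': 0.12}, {'temperature': 65.0, 'yield': 0.87}]
```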
- LLM mapper: ValueError instead of KeyError, list supported types, use LLMProvider base type, remove duplicate union, add module docstring
- Lazy registration: warnings.warn on ImportError instead of a silent pass, ValueError with an install hint if the strategy type is still not found
- Fix missing type hint on the thinking parameter in LLMStrategy.make()
- Make the test domain objective explicit (MaximizeObjective)
- Add module docstring to bofire/llm/api.py
- Add 13 tests for _select_experiments, _build_proposal_model, _resolve_env_var, mapper errors, LLMStrategy validation
- Set output_retries=3 on the pydantic-ai Agent for constraint validation retries
- Set name="LLMStrategy" on the Agent for observability
- Reorder CategoricalDescriptorInput: fields before methods (style)
Add output_retries: PositiveInt = 3 to the data model, forwarded to pydantic-ai Agent's output_retries parameter.
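In sketch form, with a simplified stand-in for the data model (the `output_retries` keyword on the Agent is the one named in the commit):

```python
from pydantic import BaseModel, PositiveInt
from pydantic_ai import Agent


class LLMStrategyDataModel(BaseModel):  # simplified stand-in for the PR's data model
    output_retries: PositiveInt = 3


data_model = LLMStrategyDataModel()
agent = Agent("openai:gpt-4o", output_retries=data_model.output_retries)
```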
… tests Replace all 'assert X in desc' with 'assert desc == expected' for precise matching of description strings across features, constraints, and domain.
Mirrors the kernels/ and priors/ conventions where mapper tests live next to the module they exercise, not in the strategies suite.
Wow, great that you implemented the strategy! We were also looking into this already :)
Thanks, it was super neat to implement it in bofire, as we are using so much pydantic already. I have a whole stack of agentic ideas that we could implement in the future. It would be really nice if you could review this and also give feedback based on your experience. Would be nice to be able to land it soon ;)
The testing_optimization_only CI job installs without the [llm] extra and then runs tests/bofire/strategies, where test_llm.py imports pydantic_ai inside one of its tests. Add a module-level pytestmark mirroring the importlib.util.find_spec pattern already used for rdkit/sympy/torch.
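The guard would look like the existing optional-dependency skips, e.g.:

```python
import importlib.util

import pytest

# Skip the whole module when the [llm] extra (pydantic-ai) is not installed,
# mirroring the existing rdkit/sympy/torch guards.
pytestmark = pytest.mark.skipif(
    importlib.util.find_spec("pydantic_ai") is None,
    reason="requires pydantic-ai (install bofire[llm])",
)
```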
@LukasHebing: Last test is now also succeeding.
LukasHebing
left a comment
I really like the PR. This is a great entry point for LLM-based optimizations, with a well-chosen, simple but powerful approach.
Implementation-wise, I only have one major point: the LLM-provider landscape and the model parameter settings seem too fine-grained for bofire to own. This is a vast and fast-evolving field, and we would need to keep track of every change to models and providers.
I would recommend checking whether we can rely on another open-source, high-level LLM interface. I personally have used langchain, and I think it is something of a strong open-source standard. The functions we need (structured output with pydantic models, and serializability) are supported. However, I would also support good alternatives.
A hint that we support LLM-based optimization should also be added to the README.md, ideally with a short example notebook.
The three hand-picked fields were an arbitrary subset of pydantic-ai's ModelSettings — missing top_p, seed, provider-specific keys, etc. — and the per-field validation (temperature in [0, 2]) was provider-specific masquerading as generic. Pass the dict through unchanged: pydantic-ai and the underlying SDK own the settings surface and produce accurate errors for invalid keys.
test_make_strategy enforces strict annotation equality between the data model field and the make() classmethod kwarg. Use Optional[Dict[str, Any]] on both sides.
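A sketch of the resulting pass-through, with a simplified stand-in for the data model; pydantic-ai accepts a plain dict for its ModelSettings TypedDict:

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel
from pydantic_ai import Agent


class LLMStrategyDataModel(BaseModel):  # simplified stand-in
    # Same annotation on the field and on the make() classmethod kwarg.
    model_settings: Optional[Dict[str, Any]] = None


data_model = LLMStrategyDataModel(model_settings={"temperature": 0.2, "seed": 42})
# Forwarded unchanged: pydantic-ai and the underlying SDK validate the keys.
agent = Agent("openai:gpt-4o", model_settings=data_model.model_settings)
```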
Hi @LukasHebing, thanks for the review and the feedback. The strategy already uses a high-level open-source LLM framework, namely pydantic-ai. This is a package from the developers of pydantic, with the scope of providing a pythonic interface for creating agents that produce structured outputs. As the PR relies heavily on pydantic-ai, I am very hesitant to switch to a different framework, especially as pydantic-ai is in my opinion better suited for the task we want to solve here.

Pydantic-ai offers adapters to a large range of different LLMs through different providers; the only problem is that, for library-philosophical reasons, these classes are not themselves pydantic classes. This leads to the map functionality here. Note that most models are also supported just based on a string, but the problem comes with third-party providers such as Azure Foundry, Bedrock, etc. So we need our own serialization for the providers, and I do not really see a way around this. I agree with you that we should not maintain all the model-specific settings, that is unmaintainable (and I changed it), but I hope that the providers are more stable... OK for you if we try it this way? I will also add a note on the agentic stuff to the README and the docs ;) Best, Johannes
Adds a lazy `agent` property that builds the Agent, output schema, and decorators once on first access. Per-call inputs (current experiments, domain) flow in through `_LLMDeps` on each `agent.run()`, matching pydantic-ai's intended design and BoFire's "build once, execute many" philosophy. Drops display of valid_* flag columns and the unused `n_candidates` field on `_LLMDeps`.
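A rough sketch of that shape, assuming pydantic-ai's deps mechanism; the class internals are illustrative, not the PR's implementation:

```python
from dataclasses import dataclass
from functools import cached_property

import pandas as pd
from pydantic_ai import Agent, RunContext


@dataclass
class _LLMDeps:
    """Per-call inputs, passed fresh to every agent.run()."""
    experiments: pd.DataFrame
    domain: object  # a bofire Domain in the real code


class LLMStrategy:  # illustrative skeleton
    @cached_property
    def agent(self) -> Agent:
        # Built once on first access: "build once, execute many".
        agent = Agent("openai:gpt-4o", deps_type=_LLMDeps, name="LLMStrategy")

        @agent.system_prompt
        def describe(ctx: RunContext[_LLMDeps]) -> str:
            # Per-call domain flows in through the deps, not the constructor.
            return ctx.deps.domain.to_description()

        return agent
```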
Adds the new LLM-driven molecular optimization tutorial (with `execute: eval: false` so CI does not call the provider), a Strategies user-guide section, README bullet, and API reference entries for the data-model and functional `LLMStrategy`.
Hi @LukasHebing, I also added examples and docs now, so it is ready for re-review ;)
…into feature/context
Motivation
This PR implements a strategy that uses an LLM-powered agent to generate candidates. There is growing evidence that this can be helpful, e.g. for warmstarting, when an optimization problem has specific context that can be exploited using the world knowledge of an LLM.
Relevant papers:
But there is also clear evidence that agents are not replacing BO; they seem to be especially useful for warmstarting, which is where I also see the greatest value:
Have you read the Contributing Guidelines on pull requests?
Yes.
Have you updated CHANGELOG.md?
Not yet.
Test Plan
Unit tests.