
LLM Strategy #749

Merged
jduerholt merged 35 commits into main from feature/context
Apr 30, 2026

Conversation

@jduerholt
Contributor

Motivation

This PR implements a strategy that uses an LLM-powered agent to generate candidates. There is growing evidence that this can be helpful, e.g. for warmstarting, when an optimization problem has specific context that can be exploited via the world knowledge of an LLM.

Relevant papers:

But there is also clear evidence that agents are not replacing BO; they seem to be especially useful for warmstarting, which is also where I see the greatest value:

Have you read the Contributing Guidelines on pull requests?

Yes.

Have you updated CHANGELOG.md?

Not yet.

Test Plan

Unit tests.

jduerholt and others added 3 commits April 7, 2026 13:04
…mization

Enable users to attach free-text context to features, constraints, and the
domain itself, so that LLM agents can leverage this information to better
understand the optimization problem. Also documents the data model testing
infrastructure and registration patterns in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add methods for LLM prompt building directly on the data model classes:
- Input.to_pydantic_field(): returns (type, FieldInfo) for dynamic Pydantic
  models with ge/le bounds, Literal enums, and context in descriptions
- Inputs.to_pydantic_model(): assembles all input features into a dynamic
  Pydantic model via create_model()
- Constraint.to_description(): human-readable math on all constraint subclasses
- Objective.to_description(): human-readable on all objective subclasses
- ContinuousOutput/CategoricalOutput.to_description(): combines objective + context
- Domain.to_description(): assembles objectives, constraints, and problem context
Add standalone LLM provider data models and a mapper to pydantic-ai:
- Data models: AnthropicLLMProvider, AnthropicFoundryLLMProvider,
  OpenAILLMProvider, OpenAICompatibleLLMProvider
- API keys referenced via env var names (safe for serialization)
- Mapper in bofire/llm/ converts data models to pydantic-ai Model instances
- Optional [llm] dependency group in pyproject.toml
- Test specs + serialization/deserialization tests
Contributor Author

@jduerholt jduerholt left a comment


Looks good for a start, but there are still issues.

Comment thread bofire/data_models/constraints/categorical.py
Comment thread bofire/data_models/features/categorical.py
Comment thread bofire/data_models/domain/domain.py Outdated
Comment thread bofire/data_models/domain/domain.py Outdated
Comment thread bofire/data_models/features/continuous.py Outdated
Comment thread bofire/data_models/objectives/identity.py
Comment thread bofire/data_models/objectives/objective.py
Comment thread bofire/data_models/objectives/sigmoid.py
Comment thread bofire/data_models/objectives/target.py
Comment thread pyproject.toml Outdated
- Make Input.to_pydantic_field() and Output.to_description() abstract
- Make Objective.to_description() abstract
- Add docstrings with examples to all to_description/to_pydantic_field methods
- Add to_pydantic_field overrides for molecular features (SMILES context)
  and descriptor features (descriptor mappings)
- Handle allow_zero in ContinuousInput.to_pydantic_field (widen ge to 0)
- Remove stepsize from ContinuousInput field description
- Simplify Maximize/Minimize to_description to just "Maximize"/"Minimize"
- Replace all non-essential objective to_description with NotImplementedError
- Remove Field(description=...) from context fields (docstrings suffice)
- Simplify [llm] dependency to just pydantic-ai, add to [all] group
Contributor Author

@jduerholt jduerholt left a comment


Still some issues

Comment thread bofire/data_models/constraints/interpoint.py
Comment thread bofire/data_models/constraints/nonlinear.py
Comment thread bofire/data_models/constraints/nonlinear.py
Comment thread bofire/data_models/domain/features.py Outdated
Comment thread bofire/data_models/features/categorical.py
Comment thread bofire/data_models/features/descriptor.py Outdated
Comment thread bofire/data_models/features/molecular.py Outdated
- InterpointEqualityConstraint, NonlinearEquality/Inequality: NotImplementedError
- CategoricalOutput.to_description(): NotImplementedError
- Inputs.to_pydantic_model(): use top-level create_model import
- Use get_allowed_categories() instead of manual list comprehension
  in CategoricalInput, CategoricalMolecularInput, CategoricalDescriptorInput
New strategy that uses pydantic-ai to have an LLM propose optimization
candidates via structured output:

Data model (bofire/data_models/strategies/llm.py):
- Single-objective only (Maximize/Minimize), linear + NChooseK constraints
- Configurable: temperature, max_tokens, thinking (reasoning effort)
- Experiment presentation: n_recent_experiments, n_top_experiments
- Custom system prompt support

Functional implementation (bofire/strategies/llm.py):
- Builds dynamic Pydantic output model from Domain via Inputs.to_pydantic_model()
- Injects Domain.to_description() as system prompt context
- Output validator runs domain.validate_candidates(), pydantic-ai retries on failure
- Experiment selection: union of recent + top-performing, deduplicated
- Lazy mapper registration to avoid circular imports
Contributor Author

@jduerholt jduerholt left a comment


I left you comments. I also saw that tests for the context, field-generation, etc. methods in the domain, features, etc. are missing. Please add them. The make method for the LLM strategy is also missing.

Comment thread bofire/data_models/strategies/llm.py Outdated
Comment thread bofire/data_models/strategies/llm.py Outdated
Comment thread bofire/data_models/strategies/llm.py Outdated
Comment thread bofire/data_models/strategies/llm.py Outdated
Comment thread bofire/strategies/llm.py Outdated
Comment thread bofire/strategies/llm.py Outdated
Comment thread bofire/strategies/llm.py Outdated
Comment thread bofire/strategies/llm.py Outdated
Comment thread bofire/strategies/mapper.py Outdated
Comment thread bofire/strategies/mapper_actual.py Outdated
jduerholt added 11 commits April 8, 2026 16:47
- Remove top_metric_key (obsolete for single-objective)
- Remove is_objective_implemented (not called from base Strategy)
- Use Outputs.get_by_objective() in single-objective validator
- Move pydantic-ai model + output schema building to __init__
- Make validation errors verbose: show failed candidates + instructions
- Label experiment sections for the LLM (recent vs top-k vs both)
- Add LLMStrategy.make() classmethod following existing strategy pattern
- Add 31 tests for to_pydantic_field(), to_description(), to_pydantic_model(),
  context fields, and their roundtrip serialization
Distribute to_pydantic_field(), to_description(), to_pydantic_model(), and
context tests into the specific test modules for each class:
- test_continuous.py: ContinuousInput, ContinuousOutput
- test_categorical.py: CategoricalInput
- test_discrete.py: DiscreteInput
- test_descriptor.py: CategoricalDescriptorInput, ContinuousDescriptorInput
- test_molecular.py: CategoricalMolecularInput, ContinuousMolecularInput
- test_constraints.py: Linear, NChooseK, Product constraints
- test_domain.py: Domain.to_description, context roundtrip
- test_inputs.py: Inputs.to_pydantic_model

Remove standalone test_llm_methods.py.
- Linear constraint tests → tests/bofire/data_models/constraints/test_linear.py
- NChooseK + Product tests → tests/bofire/data_models/constraints/test_to_description.py
- Remove from tests/bofire/data_models/domain/test_constraints.py
Switch from pandas to_string() to JSON list-of-dicts format for both
experiment history and validation error messages. This matches the
structured output format (JSON in, JSON out) and is more token-efficient.
- LLM mapper: ValueError instead of KeyError, lists supported types,
  use LLMProvider base type, remove duplicate union, add module docstring
- Lazy registration: warnings.warn on ImportError instead of silent pass,
  ValueError with install hint if strategy type still not found
- Fix missing type hint on thinking parameter in LLMStrategy.make()
- Make test domain objective explicit (MaximizeObjective)
- Add module docstring to bofire/llm/api.py
- Add 13 tests for _select_experiments, _build_proposal_model,
  _resolve_env_var, mapper errors, LLMStrategy validation
- Set output_retries=3 on pydantic-ai Agent for constraint validation retries
- Set name="LLMStrategy" on Agent for observability
- Reorder CategoricalDescriptorInput: fields before methods (style)
Add output_retries: PositiveInt = 3 to the data model, forwarded to
pydantic-ai Agent's output_retries parameter.
… tests

Replace all 'assert X in desc' with 'assert desc == expected' for precise
matching of description strings across features, constraints, and domain.
Comment thread tests/bofire/data_models/domain/test_domain.py Outdated
Mirrors the kernels/ and priors/ conventions where mapper tests live next
to the module they exercise, not in the strategies suite.
@jduerholt jduerholt marked this pull request as ready for review April 21, 2026 12:44
@jduerholt jduerholt requested a review from LukasHebing April 21, 2026 12:44
@LukasHebing
Contributor

Wow, great that you implemented the strategy! We were also looking into this already :)
I can have a look at this in the next days!

@jduerholt
Contributor Author

Wow, great that you implemented the strategy! We were also looking into this already :) I can have a look at this in the next days!

Thanks, it was super neat to implement it in bofire, as we are using so much pydantic already. I have a whole stack of agentic ideas that we could implement in the future. It would be really nice if you could review this, and also give feedback based on your experience. Would be nice to be able to land it soon ;)

The testing_optimization_only CI job installs without the [llm] extra and
then runs tests/bofire/strategies, where test_llm.py imports pydantic_ai
inside one of its tests. Add a module-level pytestmark mirroring the
importlib.util.find_spec pattern already used for rdkit/sympy/torch.
@jduerholt
Contributor Author

@LukasHebing : Last test is now also succeeding.

Contributor

@LukasHebing LukasHebing left a comment


I really like the PR. This is a great entry point for LLM-based optimizations, with a well-chosen, simple, but powerful approach.
Implementation-wise, I only have one major point: the LLM-provider landscape and the model parameter settings seem too fine-grained for bofire. This is a vast and fast-evolving field, and we would need to keep track of every change to models and providers.
I would recommend checking whether we can rely on other open-source, high-level LLM interfaces. I personally used langchain, and I think it is something of a strong open-source standard. The functions we need (structured output with pydantic models, and serializability) are supported. However, I would also support good alternatives.

A hint that we support LLM-based optimization should also be present in the README.md, ideally with a short example notebook.

Comment thread bofire/data_models/llm/openai_compatible.py Outdated
Comment thread bofire/data_models/strategies/llm.py Outdated
The three hand-picked fields were an arbitrary subset of pydantic-ai's
ModelSettings — missing top_p, seed, provider-specific keys, etc. — and
the per-field validation (temperature in [0, 2]) was provider-specific
masquerading as generic. Pass the dict through unchanged: pydantic-ai
and the underlying SDK own the settings surface and produce accurate
errors for invalid keys.
test_make_strategy enforces strict annotation equality between the data
model field and the make() classmethod kwarg. Use Optional[Dict[str, Any]]
on both sides.
@jduerholt
Contributor Author

Hi @LukasHebing,

thanks for the review and the feedback.

The strategy already uses a high-level open-source LLM framework, namely pydantic-ai. This is a package from the developers of pydantic, with the scope of providing a pythonic interface for creating agents that produce structured outputs. As the PR relies heavily on pydantic-ai, I am very hesitant to switch to a different framework, especially as pydantic-ai is in my opinion better suited for the task that we want to solve here.

Pydantic-ai offers adapters to a large range of different LLMs through different providers; the only problem is that these classes are, for library-philosophical reasons, not themselves pydantic classes. This leads to the map functionality here. Note that most models are also supported just based on a string, but the problem comes with third-party providers such as Azure Foundry, Bedrock, etc. So we need our own serialization for the providers, and I do not really see a way around this.

I agree with you that we should not maintain all the model-specific settings, this is unmaintainable (and I changed it), but I hope that the providers are more stable. Is it okay for you if we try it this way?

I will also add a note in the README and the docs for the agentic stuff ;)

Best,

Johannes

Adds a lazy `agent` property that builds the Agent, output schema, and
decorators once on first access. Per-call inputs (current experiments,
domain) flow in through `_LLMDeps` on each `agent.run()`, matching
pydantic-ai's intended design and BoFire's "build once, execute many"
philosophy. Drops display of valid_* flag columns and the unused
`n_candidates` field on `_LLMDeps`.
Adds the new LLM-driven molecular optimization tutorial (with
`execute: eval: false` so CI does not call the provider), a Strategies
user-guide section, README bullet, and API reference entries for the
data-model and functional `LLMStrategy`.
@jduerholt
Contributor Author

Hi @LukasHebing, I also added examples and doc now. So it is ready for re-review ;)

@jduerholt jduerholt requested a review from LukasHebing April 27, 2026 13:18
@jduerholt jduerholt merged commit ac8f7bf into main Apr 30, 2026
12 checks passed