LLM Strategy #749
…mization Enable users to attach free-text context to features, constraints, and the domain itself, so that LLM agents can leverage this information to better understand the optimization problem. Also document the data model testing infrastructure and registration patterns in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add methods for LLM prompt building directly on the data model classes:
- Input.to_pydantic_field(): returns a (type, FieldInfo) pair for dynamic Pydantic models, with ge/le bounds, Literal enums, and context in descriptions
- Inputs.to_pydantic_model(): assembles all input features into a dynamic Pydantic model via create_model()
- Constraint.to_description(): human-readable math on all constraint subclasses
- Objective.to_description(): human-readable descriptions on all objective subclasses
- ContinuousOutput/CategoricalOutput.to_description(): combines objective + context
- Domain.to_description(): assembles objectives, constraints, and problem context
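For illustration, a minimal sketch of how this pattern fits together. The helper functions below are hypothetical stand-ins for the real BoFire feature methods, not the PR's exact code:

```python
from typing import Literal, Optional

from pydantic import Field, create_model


def continuous_field(lower: float, upper: float, context: Optional[str] = None):
    """Stand-in for ContinuousInput.to_pydantic_field(): a (type, FieldInfo) pair."""
    description = "Continuous input." + (f" {context}" if context else "")
    return (float, Field(ge=lower, le=upper, description=description))


def categorical_field(categories: list, context: Optional[str] = None):
    """Stand-in for CategoricalInput.to_pydantic_field(): Literal over categories."""
    description = "Categorical input." + (f" {context}" if context else "")
    return (Literal[tuple(categories)], Field(description=description))


# Stand-in for Inputs.to_pydantic_model(): one dynamic model over all inputs.
Candidate = create_model(
    "Candidate",
    temperature=continuous_field(20.0, 90.0, "Reaction temperature in degrees C."),
    solvent=categorical_field(["water", "ethanol"], "Solvent for the reaction."),
)

print(Candidate.model_json_schema())  # the schema the LLM is asked to fill
```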
Add standalone LLM provider data models and a mapper to pydantic-ai:
- Data models: AnthropicLLMProvider, AnthropicFoundryLLMProvider, OpenAILLMProvider, OpenAICompatibleLLMProvider
- API keys referenced via env var names (safe for serialization)
- Mapper in bofire/llm/ converts data models to pydantic-ai Model instances
- Optional [llm] dependency group in pyproject.toml
- Test specs + serialization/deserialization tests
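A hedged sketch of that provider pattern; the field names and mapper body are assumptions based on the commit message, and the imports reflect pydantic-ai's Anthropic adapter:

```python
import os
from typing import Literal

from pydantic import BaseModel


class AnthropicLLMProvider(BaseModel):
    """Serializable provider spec: stores the *name* of the env var, never the key."""
    type: Literal["AnthropicLLMProvider"] = "AnthropicLLMProvider"
    model: str = "claude-3-5-sonnet-latest"
    api_key_env_var: str = "ANTHROPIC_API_KEY"


def resolve_env_var(name: str) -> str:
    """Fail loudly if the referenced environment variable is not set."""
    value = os.environ.get(name)
    if value is None:
        raise ValueError(f"Environment variable {name} is not set.")
    return value


def map_provider(data_model: AnthropicLLMProvider):
    """Convert the serializable data model into a pydantic-ai Model instance."""
    from pydantic_ai.models.anthropic import AnthropicModel
    from pydantic_ai.providers.anthropic import AnthropicProvider

    provider = AnthropicProvider(api_key=resolve_env_var(data_model.api_key_env_var))
    return AnthropicModel(data_model.model, provider=provider)
```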
jduerholt
left a comment
Looks good for a start, but there are still issues.
- Make Input.to_pydantic_field() and Output.to_description() abstract
- Make Objective.to_description() abstract
- Add docstrings with examples to all to_description/to_pydantic_field methods
- Add to_pydantic_field overrides for molecular features (SMILES context) and descriptor features (descriptor mappings)
- Handle allow_zero in ContinuousInput.to_pydantic_field (widen ge to 0)
- Remove stepsize from ContinuousInput field description
- Simplify Maximize/Minimize to_description to just "Maximize"/"Minimize"
- Replace all non-essential objective to_description with NotImplementedError
- Remove Field(description=...) from context fields (docstrings suffice)
- Simplify [llm] dependency to just pydantic-ai, add to [all] group
jduerholt
left a comment
Still some issues
- InterpointEqualityConstraint, NonlinearEquality/Inequality: NotImplementedError
- CategoricalOutput.to_description(): NotImplementedError
- Inputs.to_pydantic_model(): use top-level create_model import
- Use get_allowed_categories() instead of a manual list comprehension in CategoricalInput, CategoricalMolecularInput, CategoricalDescriptorInput
New strategy that uses pydantic-ai to have an LLM propose optimization candidates via structured output:

Data model (bofire/data_models/strategies/llm.py):
- Single-objective only (Maximize/Minimize), linear + NChooseK constraints
- Configurable: temperature, max_tokens, thinking (reasoning effort)
- Experiment presentation: n_recent_experiments, n_top_experiments
- Custom system prompt support

Functional implementation (bofire/strategies/llm.py):
- Builds dynamic Pydantic output model from Domain via Inputs.to_pydantic_model()
- Injects Domain.to_description() as system prompt context
- Output validator runs domain.validate_candidates(); pydantic-ai retries on failure
- Experiment selection: union of recent + top-performing, deduplicated
- Lazy mapper registration to avoid circular imports
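The core loop might look roughly like the sketch below. It assumes a recent pydantic-ai, the dynamic `Candidate` model from `Inputs.to_pydantic_model()`, and a BoFire `domain`; the names are illustrative, not the PR's exact code:

```python
import pandas as pd
from pydantic_ai import Agent, ModelRetry

# Assumed to exist: `Candidate` (dynamic model built from the Domain's inputs)
# and `domain` (a bofire Domain instance).
agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    output_type=list[Candidate],            # structured output schema
    system_prompt=domain.to_description(),  # objectives, constraints, context
)


@agent.output_validator
def validate(candidates: list) -> list:
    """Reject proposals that violate the domain; ModelRetry asks the LLM again."""
    df = pd.DataFrame([c.model_dump() for c in candidates])
    try:
        domain.validate_candidates(df, only_inputs=True)
    except ValueError as err:
        raise ModelRetry(str(err)) from err
    return candidates


result = agent.run_sync("Propose 3 new candidates.")
```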
- Remove top_metric_key (obsolete for single-objective)
- Remove is_objective_implemented (not called from base Strategy)
- Use Outputs.get_by_objective() in the single-objective validator
- Move pydantic-ai model + output schema building to __init__
- Make validation errors verbose: show failed candidates + instructions
- Label experiment sections for the LLM (recent vs top-k vs both)
- Add LLMStrategy.make() classmethod following the existing strategy pattern
- Add 31 tests for to_pydantic_field(), to_description(), to_pydantic_model(), context fields, and their roundtrip serialization
Distribute to_pydantic_field(), to_description(), to_pydantic_model(), and context tests into the specific test modules for each class:
- test_continuous.py: ContinuousInput, ContinuousOutput
- test_categorical.py: CategoricalInput
- test_discrete.py: DiscreteInput
- test_descriptor.py: CategoricalDescriptorInput, ContinuousDescriptorInput
- test_molecular.py: CategoricalMolecularInput, ContinuousMolecularInput
- test_constraints.py: Linear, NChooseK, Product constraints
- test_domain.py: Domain.to_description, context roundtrip
- test_inputs.py: Inputs.to_pydantic_model

Remove the standalone test_llm_methods.py.
- Linear constraint tests → tests/bofire/data_models/constraints/test_linear.py
- NChooseK + Product tests → tests/bofire/data_models/constraints/test_to_description.py
- Remove from tests/bofire/data_models/domain/test_constraints.py
Switch from pandas to_string() to JSON list-of-dicts format for both experiment history and validation error messages. This matches the structured output format (JSON in, JSON out) and is more token-efficient.
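Concretely, the change amounts to swapping pandas' text rendering for records-style JSON:

```python
import pandas as pd

experiments = pd.DataFrame({"temperature": [20.0, 65.0], "yield": [0.12, 0.87]})

# Before: fixed-width text table via to_string()
print(experiments.to_string(index=False))

# After: JSON list of dicts, matching the structured output the LLM returns
print(experiments.to_dict(orient="records"))
# [{'temperature': 20.0, 'yield': 0.12}, {'temperature': 65.0, 'yield': 0.87}]
```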
- LLM mapper: ValueError instead of KeyError, list supported types, use LLMProvider base type, remove duplicate union, add module docstring
- Lazy registration: warnings.warn on ImportError instead of a silent pass, ValueError with an install hint if the strategy type is still not found
- Fix missing type hint on the thinking parameter in LLMStrategy.make()
- Make the test domain objective explicit (MaximizeObjective)
- Add module docstring to bofire/llm/api.py
- Add 13 tests for _select_experiments, _build_proposal_model, _resolve_env_var, mapper errors, LLMStrategy validation
- Set output_retries=3 on the pydantic-ai Agent for constraint validation retries
- Set name="LLMStrategy" on the Agent for observability
- Reorder CategoricalDescriptorInput: fields before methods (style)
Add output_retries: PositiveInt = 3 to the data model, forwarded to pydantic-ai Agent's output_retries parameter.
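In sketch form, with a simplified stand-in for the data model (the `output_retries` keyword on the Agent is the one named in the commit):

```python
from pydantic import BaseModel, PositiveInt
from pydantic_ai import Agent


class LLMStrategyDataModel(BaseModel):  # simplified stand-in for the PR's data model
    output_retries: PositiveInt = 3


data_model = LLMStrategyDataModel()
agent = Agent("openai:gpt-4o", output_retries=data_model.output_retries)
```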
… tests Replace all 'assert X in desc' with 'assert desc == expected' for precise matching of description strings across features, constraints, and domain.
Mirrors the kernels/ and priors/ conventions where mapper tests live next to the module they exercise, not in the strategies suite.
Wow, great that you implemented the strategy! We were also looking into this already :)
Thanks, it was super neat to implement it in bofire, as we are using so much pydantic already. I have a whole stack of agentic ideas that we could implement in the future. It would be really nice if you could review this and also give feedback based on your experience. Would be nice to be able to land it soon ;)
The testing_optimization_only CI job installs without the [llm] extra and then runs tests/bofire/strategies, where test_llm.py imports pydantic_ai inside one of its tests. Add a module-level pytestmark mirroring the importlib.util.find_spec pattern already used for rdkit/sympy/torch.
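The guard would look like the existing optional-dependency skips, e.g.:

```python
import importlib.util

import pytest

# Skip the whole module when the [llm] extra (pydantic-ai) is not installed,
# mirroring the existing rdkit/sympy/torch guards.
pytestmark = pytest.mark.skipif(
    importlib.util.find_spec("pydantic_ai") is None,
    reason="requires pydantic-ai (install bofire[llm])",
)
```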
@LukasHebing: Last test is now also succeeding.
LukasHebing
left a comment
I really like the PR. This is a great entry point for LLM-based optimizations, with a well-chosen, simple but powerful approach.
Implementation-wise, I only have one major point: the LLM-provider landscape and the model parameter settings seem too fine-grained for bofire to own. This is a vast and fast-evolving field, and we would need to keep track of every change to models and providers.
I would recommend checking whether we can rely on another open-source, high-level LLM interface. I personally have used langchain, and I think it is something of a strong open-source standard. The functions we need (structured output with pydantic models, and serializability) are supported. However, I would also support good alternatives.
A hint that we support LLM-based optimization should also be added to the README.md, ideally with a short example notebook.
The three hand-picked fields were an arbitrary subset of pydantic-ai's ModelSettings — missing top_p, seed, provider-specific keys, etc. — and the per-field validation (temperature in [0, 2]) was provider-specific masquerading as generic. Pass the dict through unchanged: pydantic-ai and the underlying SDK own the settings surface and produce accurate errors for invalid keys.
test_make_strategy enforces strict annotation equality between the data model field and the make() classmethod kwarg. Use Optional[Dict[str, Any]] on both sides.
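A sketch of the resulting pass-through, with a simplified stand-in for the data model; pydantic-ai accepts a plain dict for its ModelSettings TypedDict:

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel
from pydantic_ai import Agent


class LLMStrategyDataModel(BaseModel):  # simplified stand-in
    # Same annotation on the field and on the make() classmethod kwarg.
    model_settings: Optional[Dict[str, Any]] = None


data_model = LLMStrategyDataModel(model_settings={"temperature": 0.2, "seed": 42})
# Forwarded unchanged: pydantic-ai and the underlying SDK validate the keys.
agent = Agent("openai:gpt-4o", model_settings=data_model.model_settings)
```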
Hi @LukasHebing, thanks for the review and the feedback. The strategy already uses a high-level open-source LLM framework, namely pydantic-ai. This is a package from the developers of pydantic, with the scope of providing a pythonic interface for creating agents that produce structured outputs. As the PR relies heavily on pydantic-ai, I am very hesitant to switch to a different framework, especially as pydantic-ai is in my opinion better suited for the task we want to solve here.

Pydantic-ai offers adapters to a large range of different LLMs through different providers; the only problem is that, for library-philosophical reasons, these classes are not themselves pydantic classes. This leads to the map functionality here. Note that most models are also supported just based on a string, but the problem comes with third-party providers such as Azure Foundry, Bedrock, etc. So we need our own serialization for the providers, and I do not really see a way around this. I agree with you that we should not maintain all the model-specific settings, that is unmaintainable (and I changed it), but I hope that the providers are more stable... OK for you if we try it this way? I will also add a note on the agentic stuff to the README and the docs ;) Best, Johannes
Adds a lazy `agent` property that builds the Agent, output schema, and decorators once on first access. Per-call inputs (current experiments, domain) flow in through `_LLMDeps` on each `agent.run()`, matching pydantic-ai's intended design and BoFire's "build once, execute many" philosophy. Drops display of valid_* flag columns and the unused `n_candidates` field on `_LLMDeps`.
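A rough sketch of that shape, assuming pydantic-ai's deps mechanism; the class internals are illustrative, not the PR's implementation:

```python
from dataclasses import dataclass
from functools import cached_property

import pandas as pd
from pydantic_ai import Agent, RunContext


@dataclass
class _LLMDeps:
    """Per-call inputs, passed fresh to every agent.run()."""
    experiments: pd.DataFrame
    domain: object  # a bofire Domain in the real code


class LLMStrategy:  # illustrative skeleton
    @cached_property
    def agent(self) -> Agent:
        # Built once on first access: "build once, execute many".
        agent = Agent("openai:gpt-4o", deps_type=_LLMDeps, name="LLMStrategy")

        @agent.system_prompt
        def describe(ctx: RunContext[_LLMDeps]) -> str:
            # Per-call domain flows in through the deps, not the constructor.
            return ctx.deps.domain.to_description()

        return agent
```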
Adds the new LLM-driven molecular optimization tutorial (with `execute: eval: false` so CI does not call the provider), a Strategies user-guide section, README bullet, and API reference entries for the data-model and functional `LLMStrategy`.
Hi @LukasHebing, I also added examples and docs now, so it is ready for re-review ;)
…into feature/context
Motivation
This PR implements a strategy that uses an LLM-powered agent to generate candidates. There is growing evidence that this can be helpful, e.g. for warmstarting, when an optimization problem has specific context that can be exploited using the world knowledge of an LLM.
Relevant papers:
But there is also clear evidence that agents are not replacing BO; they seem to be especially useful for warmstarting, which is where I also see the greatest value:
Have you read the Contributing Guidelines on pull requests?
Yes.
Have you updated CHANGELOG.md?
Not yet.
Test Plan
Unit tests.