diff --git a/.gitignore b/.gitignore
index f49473c966..d1e57c77eb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,6 +36,10 @@ htmlcov
# Pictures created by backtesting
*.png
+# Picture components for documentation
+docs/scripts/graphics/*.svg
+docs/scripts/graphics/*.png
+
# Folders that are temporarily created when building the documentation
docs/_autosummary
docs/_build
diff --git a/README.md b/README.md
index 2688af7f32..3305740eaa 100644
--- a/README.md
+++ b/README.md
@@ -27,35 +27,112 @@
# BayBE — A Bayesian Back End for Design of Experiments
-The **Bay**esian **B**ack **E**nd (**BayBE**) is a general-purpose toolbox for Bayesian Design
-of Experiments, focusing on additions that enable real-world experimental campaigns.
+The **Bay**esian **B**ack **E**nd (**BayBE**) helps to find **good parameter configurations**
+within complex parameter search spaces.
-## 🔋 Batteries Included
-Besides its core functionality to perform a typical recommend-measure loop, BayBE
-offers a range of ✨**built-in features**✨ crucial for real-world use cases.
-The following provides a non-comprehensive overview:
-
-- 🛠️ Custom parameter encodings: Improve your campaign with domain knowledge
-- 🧪 Built-in chemical encodings: Improve your campaign with chemical knowledge
-- 🎯 Numerical and binary targets with min, max and match objectives
-- ⚖️ Multi-target support via Pareto optimization and desirability scalarization
-- 🔍 Insights: Easily analyze feature importance and model behavior
-- 🎭 Hybrid (mixed continuous and discrete) spaces
-- 🔀 Transfer learning: Mix data from multiple campaigns and accelerate optimization
-- 🎰 Bandit models: Efficiently find the best among many options in noisy environments (e.g. A/B Testing)
-- 🔢 Cardinality constraints: Control the number of active factors in your design
-- 🌎 Distributed workflows: Run campaigns asynchronously with pending experiments and partial measurements
-- 📈 Active learning: Perform smart data acquisition campaigns
-- ⚙️ Custom surrogate models: Enhance your predictions through mechanistic understanding
-- 📊 Comprehensive backtest, simulation and imputation utilities: Benchmark and find your best settings
-- 🔧 Fully typed and hypothesis-tested: Robust code base
-- 📝 All objects are fully (de-)serializable: Useful for storing results in databases or use in wrappers like APIs
+
+
+
+
+
+BayBE can help to solve many real-world optimization problems, such as:
+
+- 🧪 Find chemical reaction conditions or process parameters
+- 🥣 Create materials, chemical mixtures or formulations with desired properties
+- ✂️ Optimize the 3D shape of a physical object
+- 🖥️ Optimize a virtual simulation
+- ⚙️ Select model hyperparameters
+- 🫖 Find tasty espresso machine settings
+
+This is achieved via **Bayesian Design of Experiments**,
+which helps to efficiently navigate parameter search spaces.
+It balances the exploitation of parameter space regions known to lead to good outcomes
+with the exploration of unknown regions.
+
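As a toy numerical illustration of this balance (a hypothetical sketch, not BayBE's internal acquisition logic), an upper-confidence-bound style score adds a bonus for uncertainty, so a barely explored region can outrank a well-known good one:

```python
# Toy sketch of the exploration/exploitation trade-off
# (hypothetical example, not BayBE's actual acquisition function)


def ucb_score(predicted_mean: float, predicted_std: float, beta: float = 2.0) -> float:
    """Upper confidence bound: the mean rewards exploitation, the std rewards exploration."""
    return predicted_mean + beta * predicted_std


# A well-explored region with good known outcomes ...
exploit = ucb_score(predicted_mean=0.9, predicted_std=0.05)
# ... versus a barely explored region with uncertain outcomes
explore = ucb_score(predicted_mean=0.4, predicted_std=0.4)

print(explore > exploit)  # prints True: the uncertain region wins this round
```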
+BayBE provides a **general-purpose toolbox** for Bayesian Design of Experiments,
+focusing on making this procedure easily accessible for real-world experiments.
+Its utility has already been demonstrated in a variety of [real-world experimental campaigns](#citation) in both industry and academia.
+
+## 🔋 Batteries Included
+BayBE offers a range of ✨**built-in features**✨, including:
+
+
+
+ 🛠️ Flexible modeling options
+
+
+
+ - Use both continuous and discrete parameters within a single hybrid search space.
+ - Exclude undesired or impossible parameter configurations (e.g., to define a maximum number of mixture components) using constraints.
+ - Choose between different optimization strategies to balance exploration and exploitation of the search space.
+
+
+ - Specify the desired target value via target transformations.
+ - Optimize multiple targets at the same time via Pareto optimization or desirability scalarization.
+
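For intuition, a desirability scalarization can be sketched as a weighted geometric mean of targets normalized to [0, 1] (a toy example with made-up numbers; BayBE's `DesirabilityObjective` handles the details):

```python
import math


def desirability(values: dict[str, float], weights: dict[str, float]) -> float:
    """Toy weighted geometric mean of target values normalized to [0, 1]."""
    total = sum(weights.values())
    log_sum = sum(w * math.log(values[name]) for name, w in weights.items())
    return math.exp(log_sum / total)


# Hypothetical normalized measurements, weighting yield twice as much as selectivity
score = desirability(
    values={"Yield": 0.8, "Selectivity": 0.5},
    weights={"Yield": 2.0, "Selectivity": 1.0},
)
print(round(score, 3))  # prints 0.684
```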
+
+
+
+
+ 📚 Mechanisms for leveraging additional information
+
+
+
+
+
+ 🌎 Advanced optimization workflows
+
+
+
+ - Run campaigns asynchronously with partial measurements and pending experiments.
+ - Store BayBE objects and use API wrappers with the serialization functionality.
+
+
+
+
+
+ 📊 Performance evaluation tools
+
+
+
+ - Gain insights into the optimization campaigns by analyzing model behavior and feature importance.
+ - Conduct benchmarks to select between different Bayesian optimization settings via backtesting.
+
+
+
## ⚡ Quick Start
-Let us consider a simple experiment where we control three parameters and want to
-maximize a single target called `Yield`.
+To perform Bayesian Design of Experiments with BayBE,
+you should first specify the **parameter search space** and **objective** to be optimized.
+Based on this information and any **available data** about outcomes of specific parameter configurations,
+BayBE will **recommend the next set of parameter configurations** to be **measured**.
+To inform the next recommendation cycle, the newly generated measurements can be added to BayBE.
+
+
+
+
+
+
+
+From the user's perspective, the most important part is the "setup" step (top of the figure).
+
+Below we show a simple optimization procedure, starting with the setup step and subsequently
+performing the recommendation loop.
+The provided example aims to maximize the yield of a chemical reaction by adjusting its parameter configurations
+(also known as reaction conditions).
First, install BayBE into your Python environment:
```bash
@@ -66,7 +143,7 @@ For more information on this step, see our
### Defining the Optimization Objective
-In BayBE's language, the `Yield` can be represented as a `NumericalTarget`,
+In BayBE's language, the reaction yield can be represented as a `NumericalTarget`,
which we wrap into a `SingleTargetObjective`:
```python
@@ -76,9 +153,9 @@ from baybe.objectives import SingleTargetObjective
target = NumericalTarget(name="Yield")
objective = SingleTargetObjective(target=target)
```
-In cases where we are confronted with multiple (potentially conflicting) targets,
-the `ParetoObjective` or `DesirabilityObjective` can be used instead.
-These allow to define additional settings, such as how the targets should be balanced.
+In cases where we are confronted with multiple (potentially conflicting) targets
+(e.g., yield vs. selectivity),
+the `ParetoObjective` or `DesirabilityObjective` can be used to define how the targets should be balanced.
For more details, see the
[objectives section](https://emdgroup.github.io/baybe/stable/userguide/objectives.html)
of the user guide.
@@ -86,11 +163,9 @@ of the user guide.
### Defining the Search Space
Next, we inform BayBE about the available "control knobs", that is, the underlying
-system parameters we can tune to optimize our targets. This also involves specifying
-their values/ranges and other parameter-specific details.
-
-For our example, we assume that we can control three parameters โ `Granularity`,
-`Pressure[bar]`, and `Solvent` โ as follows:
+reaction parameters we can tune to optimize the yield.
+In this case, we tune granularity, pressure and solvent, each encoded as a `Parameter`.
+We also need to specify which values each parameter can take.
```python
from baybe.parameters import (
@@ -147,20 +222,15 @@ and alternative ways of construction.
### Optional: Defining the Optimization Strategy
-As an optional step, we can specify details on how the optimization should be
-conducted. If omitted, BayBE will choose a default setting.
+As an optional step, we can specify details on how the optimization of the experimental configurations should be
+performed. If omitted, BayBE will choose a default Bayesian optimization setting.
For our example, we combine two recommenders via a so-called meta recommender named
`TwoPhaseMetaRecommender`:
1. In cases where no measurements have been made prior to the interaction with BayBE,
- a selection via `initial_recommender` is used.
-2. As soon as the first measurements are available, we switch to `recommender`.
-
-For more details on the different recommenders, their underlying algorithmic
-details, and their configuration settings, see the
-[recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html)
-of the user guide.
+ the parameters will be recommended with the `initial_recommender`.
+2. As soon as the first measurements are available, we switch to the `recommender`.
```python
from baybe.recommenders import (
@@ -175,9 +245,14 @@ recommender = TwoPhaseMetaRecommender(
)
```
+For more details on the different recommenders, their underlying algorithmic
+details and how their settings can be adjusted, see the
+[recommenders section](https://emdgroup.github.io/baybe/stable/userguide/recommenders.html)
+of the user guide.
+
### The Optimization Loop
-We can now construct a campaign object that brings all pieces of the puzzle together:
+We can now construct a `Campaign` that performs the Bayesian optimization of the experimental configurations:
```python
from baybe import Campaign
@@ -185,22 +260,27 @@ from baybe import Campaign
campaign = Campaign(searchspace, objective, recommender)
```
-With this object at hand, we can start our experimentation cycle.
+With this object at hand, we can start our optimization cycle.
In particular:
-* We can ask BayBE to `recommend` new experiments.
-* We can `add_measurements` for certain experimental settings to the campaign's
- database.
+* The campaign can `recommend` new experiments.
+* We can `add_measurements` of target values for the measured parameter configurations
+ to the campaign's database.
Note that these two steps can be performed in any order.
In particular, available measurements can be submitted at any time and also several
times before querying the next recommendations.
```python
-df = campaign.recommend(batch_size=3)
+df = campaign.recommend(
+ batch_size=3
+) # Recommend three experimental configurations to test
print(df)
```
+The table below shows the three parameter configurations for which BayBE recommends
+measuring the reaction yield.
+
```none
Granularity Pressure[bar] Solvent
15 medium 1.0 Solvent D
@@ -208,32 +288,51 @@ print(df)
29 fine 5.0 Solvent B
```
-Note that the specific recommendations will depend on both the data
-already fed to the campaign and the random number generator seed that is used.
+Next, we need to conduct the recommended experiments and record the corresponding `Target` values.
+
+```python
+df["Yield"] = [
+ 79.8,
+ 54.1,
+ 59.4,
+] # Measured yields for the three recommended parameter configurations
+print(df)
+```
+```none
+ Granularity Pressure[bar] Solvent Yield
+15 medium 1.0 Solvent D 79.8
+10 coarse 10.0 Solvent C 54.1
+29 fine 5.0 Solvent B 59.4
+```
-After having conducted the corresponding experiments, we can add our measured
-targets to the table and feed it back to the campaign:
+Now, we can add the newly measured `Target` values to the `Campaign`:
```python
-df["Yield"] = [79.8, 54.1, 59.4]
campaign.add_measurements(df)
```
-With the newly arrived data, BayBE can produce a refined design for the next iteration.
-This loop would typically continue until a desired target value has been achieved in
-the experiment.
+With the newly provided data, BayBE can produce a refined recommendation for the next iteration.
+This loop typically continues until a desired `Target` value is achieved in the experiment.
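
Putting the pieces together, the overall cycle can be sketched in plain Python (a schematic stand-in: `recommend` and `run_experiments` below are hypothetical placeholders for `campaign.recommend` and your actual lab measurements):

```python
import random

random.seed(42)  # Only for reproducibility of this toy example


def recommend(batch_size: int) -> list[dict]:
    """Hypothetical stand-in for `campaign.recommend`: random configurations."""
    return [
        {
            "Granularity": random.choice(["coarse", "medium", "fine"]),
            "Pressure[bar]": random.choice([1.0, 5.0, 10.0]),
        }
        for _ in range(batch_size)
    ]


def run_experiments(configurations: list[dict]) -> list[float]:
    """Hypothetical stand-in for the lab: one measured yield per configuration."""
    return [random.uniform(40.0, 95.0) for _ in configurations]


best_yield = float("-inf")
for _ in range(5):  # a few recommend-measure iterations
    configurations = recommend(batch_size=3)  # ask for configurations to test
    yields = run_experiments(configurations)  # measure them
    best_yield = max(best_yield, *yields)  # track the running optimum
    # In BayBE, the results would now be fed back via `campaign.add_measurements`
    if best_yield >= 90.0:  # stop once a satisfactory yield is reached
        break

print(40.0 <= best_yield <= 95.0)  # prints True
```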
-### Advanced Example: Chemical Substances
-BayBE has several modules to go beyond traditional approaches. One such example is the
-use of custom encodings for categorical parameters. Chemical encodings for substances
-are a special built-in case of this that comes with BayBE.
+### Inspecting the Optimization Progress
+
+The plot below shows the progression of a campaign that optimized a direct arylation reaction
+by tuning the solvent, base and ligand
+(from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)).
+Each line shows the best target value cumulatively achieved after a given number of experimental iterations.
+
+
+Different lines show outcomes of `Campaigns` with different settings.
-In the following picture you can see
-the outcome for treating the solvent, base and ligand in a direct arylation reaction
-optimization (from [Shields, B.J. et al.](https://doi.org/10.1038/s41586-021-03213-y)) with
-chemical encodings compared to one-hot and a random baseline:

+In particular, the five `Campaigns` differ in how molecules are encoded within
+each chemical `Parameter`. Instead of simply one-hot encoding each SMILES string,
+`SubstanceParameter` can be used to directly compute chemical fingerprints from
+the input SMILES.
+We can see that the optimization is more efficient when
+using chemical encodings (e.g., *MORDRED*) than with a *one-hot* encoding of the categories.
+The latter is, in fact, no better than *randomly* suggesting parameter configurations at each experimental iteration.
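
The intuition can be illustrated without any chemistry libraries (a toy sketch with made-up descriptor values, not real *MORDRED* output): one-hot vectors place all categories at the same distance, whereas descriptor vectors place similar substances close together, giving the surrogate model something to generalize from:

```python
# Toy comparison of one-hot vs. descriptor-style encodings
# (hypothetical descriptor values, not real chemical fingerprints)

solvents = ["Solvent A", "Solvent B", "Solvent C"]

# One-hot: every pair of distinct solvents is equally far apart
one_hot = {s: [1.0 if s == t else 0.0 for t in solvents] for s in solvents}

# Descriptor-style, e.g. (polarity, scaled boiling point) -- made-up numbers
descriptors = {
    "Solvent A": [0.90, 1.00],
    "Solvent B": [0.85, 1.05],  # chemically similar to Solvent A
    "Solvent C": [0.10, 0.40],  # very different from both
}


def distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two encoding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# One-hot distances carry no notion of similarity ...
print(distance(one_hot["Solvent A"], one_hot["Solvent B"]))  # sqrt(2) for any pair
# ... while descriptor distances do
print(
    distance(descriptors["Solvent A"], descriptors["Solvent B"])
    < distance(descriptors["Solvent A"], descriptors["Solvent C"])
)  # prints True
```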
+
(installation)=
## ๐ป Installation
@@ -264,7 +363,7 @@ pip install git+https://github.com/emdgroup/baybe.git@main
Alternatively, you can install the package from your own local copy.
First, clone the repository, navigate to the repository root folder, check out the
-desired commit, and run:
+desired commit and run:
```bash
pip install .
@@ -312,6 +411,8 @@ The available groups are:
## ๐ก Telemetry
Telemetry was fully and permanently removed in version 0.14.0.
+
+(citation)=
## ๐ Citation
If you find BayBE useful, please consider citing [our paper](https://doi.org/10.1039/D5DD00050E):
diff --git a/docs/scripts/graphics/landscape.py b/docs/scripts/graphics/landscape.py
new file mode 100644
index 0000000000..2ed8526c87
--- /dev/null
+++ b/docs/scripts/graphics/landscape.py
@@ -0,0 +1,90 @@
+"""A 3D plot showing an optimization landscape."""
+
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.colors import LinearSegmentedColormap
+
+N_POINTS = 50 # Reduce for SVG size
+COLORS = ["#1a0033", "#3498db", "#316E91", "#f39c12", "#9e2a1d"]
+"""
+Colors:
+1a0033 - dark indigo
+3498db - light blue
+316E91 - teal blue
+f39c12 - amber
+9e2a1d - brick red
+"""
+N_COLOR_BINS = 256
+
+
+def chebfun2(x, y):
+ """Chebfun test function for 2D optimization landscapes."""
+ # https://www.chebfun.org/docs/guide/guide12.html
+ return (
+ 3 * (1 - x) ** 2.0 * np.exp(-(x**2) - (y + 1) ** 2)
+ - 10 * (x / 5 - x**3 - y**5) * np.exp(-(x**2) - y**2)
+ - 1 / 3 * np.exp(-((x + 1) ** 2) - y**2)
+ )
+
+
+# Generate function data
+x = np.linspace(-3, 3, N_POINTS)
+y = np.linspace(-3, 3, N_POINTS)
+X, Y = np.meshgrid(x, y)
+Z = chebfun2(X, Y)
+
+# 3D plot
+fig = plt.figure(figsize=(2.5, 2))
+ax = fig.add_subplot(111, projection="3d")
+fancy_cmap = LinearSegmentedColormap.from_list("fancy_logo", COLORS, N=N_COLOR_BINS)
+surf = ax.plot_surface(
+ X,
+ Y,
+ Z,
+ cmap=fancy_cmap,
+ linewidth=0.0, # Remove edge lines for SVG
+)
+ax.view_init(elev=20, azim=-120)
+ax.set_box_aspect([1, 1, 0.5])
+
+# Option used for figure without axes
+plt.axis("off")
+
+plt.tight_layout(pad=0)
+
+# Crop the image by stretching the axes region beyond the figure bounds
+fig.subplots_adjust(left=-0.1, right=1.05, top=3, bottom=-2)
+
+plt.savefig(
+    "landscape.svg",
+    transparent=True,
+    dpi=150,
+    pad_inches=0,
+)
+
+
+def optimize_svg(svg_file):
+ """Optimize the SVG file."""
+ import shutil
+ import subprocess
+
+ # Check if svgo is available
+ svgo_path = shutil.which("svgo")
+
+ if svgo_path is None:
+        print(
+            "svgo for file size optimization is not installed. "
+            "Install it, e.g., with `npm install -g svgo` or `brew install svgo`."
+        )
+ return False
+
+ # Run svgo to optimize the SVG
+ try:
+ subprocess.run([svgo_path, svg_file, "-o", svg_file], check=True)
+ print(f"SVG optimized: {svg_file}")
+ return True
+ except subprocess.CalledProcessError as e:
+ print(f"Error optimizing SVG: {e}")
+ return False
+
+
+# After saving the SVG, optimize it
+optimize_svg("landscape.svg")
diff --git a/docs/userguide/userguide.md b/docs/userguide/userguide.md
index 059f6ca581..e83ffa4411 100644
--- a/docs/userguide/userguide.md
+++ b/docs/userguide/userguide.md
@@ -14,7 +14,7 @@ The most commonly used interface BayBE provides is the central
which suggests new measurements and administers the current state of
your experimental operation. The diagram below explains how the
[`Campaign`](baybe.campaign.Campaign) can be used to perform
-the bayesian optimization loop, how it can be configured and
+the Bayesian optimization loop, how it can be configured and
how the results can be post-analysed.
```{image} ../_static/api_overview_dark.svg