Clone the repository including submodules:
git clone --recurse-submodules https://github.com/Kripner/leantree.gitEnsure Lean is installed via elan.
lake --versionLeanTree currently pins Lean v4.27.0 (see lean-repl/lean-toolchain); other 4.27.x patch versions may work but are untested.
Make sure pip is installed.
Then, run:
make installAlternatively, use Poetry explicitly:
pip install poetry==1.8.4
poetry installThe make install step also runs lake build against the bundled lean-repl/ submodule, producing the REPL executable at lean-repl/.lake/build/bin/repl.
For running tests or experiments, refer to the Development section.
Note: You must create a Lean project to use Lean. See Lean project guide for more information.
Start by creating or loading a Lean project:
from leantree import LeanProject
# Create a fresh project (Mathlib is *not* added unless you ask for it):
project = LeanProject.create("path/to/project", lean_version="v4.27.0", libraries=["mathlib"])
# Or load an existing project:
project = LeanProject("path/to/project")
# Or decide automatically:
project = LeanProject("path/to/project", create=True)If no path is provided to LeanProject.create or the constructor, the project is created in or loaded from the leantree_project subdirectory of the current directory.
libraries accepts either short names (currently only "mathlib" is recognised) or LeanLibrary instances; pin a particular revision with LeanLibrary(name=..., scope=..., git=..., rev=...). When lean_version is set and Mathlib is requested without an explicit revision, LeanTree pins Mathlib to the matching tag.
Using the environment, you can initialize a proof search by supplying a theorem with one or more sorry keywords.
with project.environment() as env:
env.send_command("import Mathlib\nopen BigOperators Real Nat Topology Rat")
branch = env.proof_from_sorry("theorem succ_less_double_succ (n : Nat) : n > 0 → n < 2 * n := by sorry")
zero, succ = branch.apply_tactic("cases n")
print("Factorized proof states after `cases n`:")
print(zero.state)
print(succ.state)
assert not zero.is_solved
...apply_tactic returns one LeanProofBranch per independent sub-goal (computed from the metavariable graph), which is what makes the resulting proof states factorized rather than threaded through a single linear goal stack. Use branch.try_apply_tactic(...) to get a ValueOrError instead of an exception on failure.
By default, search-style tactics (apply?, rw?, exact?, grind?) are rejected because they can close the goal in the REPL while leaving the proof unsound. Pass ban_search_tactics=False to opt out:
branch.apply_tactic("apply?", ban_search_tactics=False)async with project.environment() as env:
await env.send_command_async("import Mathlib\nopen BigOperators Real Nat Topology Rat")
branch = await env.proof_from_sorry_async("theorem succ_less_double_succ (n : Nat) : n > 0 → n < 2 * n := by sorry")
zero, succ = await branch.apply_tactic_async("cases n")
...Most environment and branch methods have an _async counterpart that can be awaited inside an async context manager.
Using the project, you can parse a Lean file and build all proof trees. Then, you can use the environment to start a proof for each tactic block in the file.
from leantree import StoredError
file = project.load_file("Example.lean")
# Pretty-print all proof trees.
for thm in file.theorems:
if isinstance(thm, StoredError):
print(f"Skipping: {thm.error}")
continue
print(thm.load_source() + "\n")
for by_block in thm.by_blocks:
if isinstance(by_block.tree, StoredError):
print(f" tree extraction failed: {by_block.tree}")
continue
print(by_block.tree.pretty_print())
print("-" * 100)
# Start proofs for each tactic block in the file.
for thm, branches in env.file_proofs(file):
if isinstance(branches, Exception):
print(f"Could not start theorem '{thm}' due to exception: {branches}")
continue
for branch in branches:
...Theorem and tactic-block extraction can both fail independently (file parse error vs. tree-builder error on a single by-block); both failure modes surface as StoredError rather than exceptions, so you can iterate large corpora without try/except wrappers. A worked example lives in examples/parse_file.py.
You can easily generate a dataset containing all Lean files in a directory. For production-ready dataset generation, see the Datasets section
import glob
import json
with open("dataset.json", "w") as f:
for path in glob.glob('**/*.lean', recursive=True):
file = project.load_file(path)
f.write(json.dumps(file.serialize()) + "\n")The environment state can be saved to disk and restored later.
with project.environment() as env:
env.send_command("import Mathlib\nopen BigOperators Real Nat Topology Rat")
env.pickle("env.pkl")
with project.environment() as env:
env.unpickle("env.pkl")
branch = env.proof_from_sorry("theorem succ_less_double_succ (n : Nat) : n > 0 → n < 2 * n := by sorry")
zero, succ = branch.apply_tactic("cases n")
...For finer-grained control inside a single environment, use env.checkpoint() and env.rollback_to(checkpoint) instead of pickling.
Assuming you have already created a Lean project in leantree_project, you can recreate the whole Mathlib dataset by running:
python dataset/tree_dataset.py generate \
--project_path leantree_project \
--source_files mathlib/Mathlib \
--output_dir leantree_generated \
--num_workers 16--num_workers defaults to 1 (sequential); raising it spins up a work-stealing pool of that many worker processes and shows a live tqdm progress bar with running totals of extracted theorems, blocks, and failures. Per-file processing is bounded by --file_timeout (default 1800 seconds); files that exceed it are recorded in the .errors sidecar.
For reference, regenerating against current Mathlib (v4.27.0, May 2026) yields roughly:
| files processed | theorems extracted | tactic blocks with proof trees | total proof-tree nodes |
|---|---|---|---|
| 7,516 | 92,163 (95.2%) | 92,903 (90.4%) | 287,132 |
The remaining ~5% of theorems and ~10% of by-blocks land in the .errors sidecar, typically because of REPL/tree-builder edge cases on exotic tactics.
For multi-node setups (e.g. SLURM array jobs), each node processes a disjoint shard
via --total_workers / --worker_id. Each shard writes its own JSONL file
(leantree-sf=<source>-<worker_id>.jsonl) and .errors file. Combine --num_workers
within each shard to use multiple local processes.
python dataset/tree_dataset.py generate \
--project_path leantree_project \
--source_files mathlib/Mathlib \
--output_dir leantree_generated \
--total_workers $SLURM_ARRAY_TASK_COUNT \
--worker_id $SLURM_ARRAY_TASK_ID \
--num_workers 16The last worker (worker_id == total_workers - 1) picks up any remainder files
when len(files) isn't divisible by total_workers, so no files are dropped.
After all shards finish, combine them into a single JSONL file:
python dataset/tree_dataset.py merge_shards leantree_generated \
--output_dir leantree_merged \
--shards_count $SLURM_ARRAY_TASK_COUNTThis produces leantree_merged/leantree-sf=<source>.jsonl by concatenating all shard
files in worker_id order. It errors out if any shard file is missing, so you're
guaranteed a complete dataset on success. Use --force to overwrite an existing
output file.
Note: .errors files are not merged; concatenate them manually if needed
(cat leantree_generated/*.errors > leantree_merged/all.errors).
The CLI also exposes view_trees, view_stats, error_stats, show_errors, and deepseek_convert subcommands for inspecting and converting generated datasets; run with --help for flags.
For long-running services that need to amortize REPL startup across many proof attempts, LeanTree ships a standalone HTTP server that wraps a recycling process pool. It is installed as the leanserver console script:
leanserver \
--address localhost --port 8000 \
--max-processes 8 \
--imports Mathlib \
--warmup --warmup-batch-size 32 --warmup-timeout 600 \
--rss-hard-limit-gib 32 --pss-recycle-limit-gib 4Key options:
--repl-exe/LEAN_REPL_EXE— path to thereplbinary (defaults to the bundledlean-repl/.lake/build/bin/repl).--project-path/LEAN_PROJECT_PATH— Lean project to run inside (defaults to./leantree_project).--imports— packages toimportduring warmup (e.g.Mathlib); skipped without--warmup.--warmup/--warmup-batch-size/--warmup-timeout— pre-start every worker with the given imports applied, in batches, before accepting requests.--rss-hard-limit-gib—RLIMIT_ASceiling per subprocess (default32);<= 0disables.--pss-recycle-limit-gib— recycle a worker on return whose PSS exceeds this (default4);<= 0disables.
Once running, kill -USR1 <pid> dumps Python tracebacks for every thread, kill -USR2 <pid> (or pressing Enter in the foreground terminal) prints pool status.
Install all development and experiments dependencies:
make install-devOther Make targets: make build (poetry build), make clean (remove build artifacts).
To run tests, first create a Lean v4.27.0 project called leantree_project in the LeanTree directory.
You can use LeanTree for that - in the leantree directory, run the following Python code:
from leantree import LeanProject
project = LeanProject.create(lean_version="v4.27.0", libraries=["mathlib"])After the project is created, run:
make testThe test suite covers the REPL adapter, process pool invariants, async timeout handling, file-span utilities, and the Lean server, in addition to the data-extraction pipeline.
When working with a Lean environment, you can use env.take_control() to debug the underlying Lean REPL.
This method connects your stdin/stdout to the REPL's stdin/stdout.
For benchmarking, scripts/bench_warmup.py measures end-to-end pool warmup time and scripts/stress_test_server.py exercises the leanserver under load.
For a detailed comparison, refer to the LeanTree paper.
@inproceedings{
kripner2025leantree,
title={LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4},
author={Mat{\v{e}}j Kripner and Michal Sustr and Milan Straka},
booktitle={2nd AI for Math Workshop @ ICML 2025},
year={2025},
url={https://openreview.net/forum?id=pROVJwTb6w}
}