feat(persistent): in-place solver updates (Solver framework + HiGHS/Gurobi/Xpress/Mosek) by FabianHofmann · Pull Request #718 · PyPSA/linopy

FabianHofmann · 2026-05-21T15:05:02Z

Changes proposed in this Pull Request

Adds an opt-in persistent-update framework so a built solver can be re-solved against a mutated Model without a full rebuild.

Core

linopy.persistent: ModelSnapshot, ModelDiff, StructuralKey, VarKind, RebuildReason. ModelDiff stores changes in flat-native arrays (bounds / var-types / RHS / signs / COO coefs / linear objective / sense) plus per-container VarSlice / ConSlice views.
ModelDiff.from_snapshot(snap, model) and ModelDiff.from_models(m1, m2) — snapshot-based and snapshot-free diffs.
_coef_dirty flag on constraints with RHS-setter short-circuit so RHS-only edits skip the coefficient re-walk.

Solver orchestration

Solver gains track_updates, lazy-build (first solve(model, …) builds), apply_update, update, disallow_rebuild, and structured rebuild reasons. Backends without persistent-update support short-circuit to rebuild.
Per-backend apply_update:
- HiGHS — changeColsBounds / changeColsIntegrality / changeRowBounds / changeCoeff / changeColsCost / changeObjectiveSense. Sign change → rebuild.
- Gurobi — setAttr(LB/UB/VType/RHS/Sense/Obj), chgCoeff, ModelSense. In-place sign change.
- Xpress — chgbounds / chgcoltype / chgrhs / chgrowtype / chgmcoef / chgobj / chgobjsense. In-place sign change.
- Mosek — chgvarbound / chgconbound / putvartypelist / putaijlist / putclist / putobjsense. Sign change → rebuild.

Tests

New test_persistent_snapshot_diff.py covering all ModelDiff semantics.
New parametrized test_persistent_apply_update.py running 9 cases × 4 backends (skipped per backend when license/installation is unavailable).
Cross-instance, pickle, and threading coverage in test_persistent_solver_extras.py.

Checklist

Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
Unit tests for new features were added (if applicable).
A note for the release notes doc/release_notes.rst of the upcoming release is included.
I consent to the release of this PR's code under the MIT license.

Tracks per-Constraint coefficient mutation via a single boolean slot, flipped in coeffs/vars/lhs setters. Pure-constant rhs writes now short-circuit and leave coeffs/vars buffers untouched (by identity), so rhs-only updates don't trigger expensive coefficient recompare on the persistent-solver fast path.

Pure-Python snapshot primitives for the persistent-solver Phase 1. Deep-copies value-side fields (var_lb/ub, con_rhs/sign, obj_linear), holds vlabels/clabels by reference, stores canonical CSR (indptr, indices) per constraint container. No Solver import.

Pure-function diff for the persistent-solver Phase 1. Detects structural, coord, sparsity, quadratic-objective, value-only var/con, and objective-linear/sense changes. Supports same_model fast path via _coef_dirty and cross-model full re-scan. Includes a focused test suite covering capture, mutation paths, deep-copy invariant, and the same_model toggle.

- supports_persistent_update class flag (default False) - snapshot/_rebuilds/_in_place_updates/_last_rebuild_reason fields - snapshot capture at end of direct _build, _clear_coef_dirty helper - apply_update stub raising UnsupportedUpdate - solve(model, assign) dispatcher with diff-or-rebuild path - update(model, apply=True) primitive returning ModelDiff - threading.Lock around diff+apply+resnapshot - __getstate__/__setstate__ drop native handle and snapshot

…date support Skip diff computation entirely when supports_persistent_update is False on apply, per plan: 'dispatcher checks flag before calling — if False, skips diffing entirely and goes to rebuild.'

Replace xarray-based snapshot and CSR pattern compare with per-row canonicalised numpy buffers; new ContainerVarUpdate / ContainerRowUpdate payloads. Gurobi/HiGHS apply_update rewritten around batched setAttr / changeColsBounds / changeColsCost / changeColsIntegrality; coefficient writes touch only changed cells. Cross-model diff now ~matches same-model cost for bound/rhs/coef-value sweeps.

compute_diff/Solver.solve/Solver.update grow an ignore_dims kwarg. None (default) keeps the current no-coord-check behaviour; any iterable opts into per-container coord-equality on every dim not in the set, supporting rolling-horizon workflows where e.g. the snapshot dim is expected to drift.

…_rebuild - Solver.from_name now accepts model=None; the first solve(m, ...) builds. - compute_diff folded into ModelDiff.from_snapshot classmethod; new ModelDiff.from_models diffs two linopy models directly. - Solver.solve grows disallow_rebuild=True, which raises RebuildRequiredError instead of falling back to a rebuild.

…m_models - Add `track_updates` flag (default False) to Solver; skip ModelSnapshot capture when disabled. Raise UpdatesDisabledError on solve(model)/update() if a built solver was constructed without tracking. - Rewrite ModelDiff.from_models to build directly from two models without capturing snapshots; share helpers with from_snapshot. - Update persistent tests to opt into track_updates=True; add coverage for the disabled path.

Cross-instance resolves now diff via from_models against the previously built model, with no snapshot. Same-instance mutation still raises UpdatesDisabledError. Snapshot recapture is skipped in this mode. Add cross-instance solve/update tests for the no-snapshot path.

Collapse _diff_objective QUAD_OBJ branches; cache n_coef_updates; short-circuit _canonicalize_rows when rows already sorted; tighten buffer extraction. Introduce VarKind enum used across snapshot/diff and HiGHS/Gurobi apply_update; reuse linopy.constants sign tokens. Move _clear_coef_dirty into ModelSnapshot.capture.

Source con buffers from Constraint.to_matrix_with_rhs, replacing the dense (n_rows, max_n_term) arrays with CSR (indptr, indices, data). Sign dtype adopts 'U1' across the persistent layer and apply_update in HiGHS/Gurobi consumes CSR-slice payloads instead of -1 masks. Deletes _canonicalize_rows and the _INT64_MAX sentinel.

Replace per-container ContainerVarUpdate/ContainerRowUpdate dicts with flat arrays (var_bounds_*, var_type_*, con_coef_* COO, con_rhs_*, con_sign_*) plus VarSlice/ConSlice per-container offsets for diagnostics. Add con_rhs_as_bounds() for ranged-row solvers. Backend apply_update bodies collapse to flat-array calls; remove duplicated label->position resolution.

Implement in-place model updates for Xpress (chgbounds/chgrhs/chgmcoef/ chgrowtype/chgobj/chgobjsense/chgcoltype) and Mosek (chgvarbound/ chgconbound/putaijlist/putclist/putvartypelist/putobjsense). Mosek rejects constraint sign change to trigger rebuild. Consolidate gurobi/highs apply_update tests into a single parametrized file that also covers xpress and mosek.

for more information, see https://pre-commit.ci

FabianHofmann · 2026-05-21T15:22:08Z

@FBumann here we go, if you want to take a first look. docs to come

FBumann · 2026-05-22T06:28:10Z

@FBumann here we go, if you want to take a first look. docs to come

Ill have a look ltoday

* hold solver lock through _run_direct so two threads calling solve(model) on the same Solver no longer race on the native handle (HiGHS returned 0.0 from the second concurrent solve). * narrow Optional ndarrays in persistent.diff.push_var / push_con and in HiGHS/Gurobi/Xpress/Mosek apply_update objective paths. * widen Constraint.rhs setter to ExpressionLike | VariableLike | ConstantLike to match the as_expression call in the body. * widen Constraints.__getitem__(str) return type to Constraint (the dominant case) so tests can set .rhs/.coeffs/.sign without ignores. * add docs for in-place solver updates.

FBumann · 2026-05-22T08:58:18Z

@FabianHofmann Sorry, I wont be able to review this today.

FabianHofmann · 2026-05-22T09:24:55Z

@FabianHofmann Sorry, I wont be able to review this today.

take your time, there is no hurry. I'll do some integration tests anyway

FBumann · 2026-05-22T14:54:21Z

Are there any conflicts with #717? Are we sure we want to merge and publish this before we go v1?

Just making sure...

MaykThewessen · 2026-05-23T16:43:37Z

Benched PR #718 ModelDiff vs hand-rolled changeRowsBounds path on a synthetic 69K-var / 160K-row DC-OPF, 8 weekly-style chunks (RHS + objective mutations only, identical structure across chunks). Per-chunk averages, chunk 0 excluded:

route	t_build	t_diff	t_apply	t_solve	t_total	in_place
blueprint + `changeRowsBounds` (hand-rolled)	0.000	0.001	0.006	0.759	0.766	0
ModelDiff sweep (fresh `Model` per chunk)	0.222	0.012	0.039	12.365	12.652	7
ModelDiff in-place (mutate `model.constraints[name].rhs.values[...]`)	0.003	0.010	0.041	4.919	4.982	7
ModelDiff in-place + manual `solver.solver_model.clearSolver()`	0.003	0.010	0.041	0.572	0.637	7

Diff compute + apply_update sum to ~50 ms — those are not the bottleneck. The regression is entirely in t_solve: after in-place RHS mutation the persistent path keeps HiGHS's prior basis, and HiGHS then skips presolve ("Solving LP with useful basis so presolve not used"). For LPs with strong presolve reduction (89 % row reduction on our production model, similar on this synthetic) the skipped presolve costs more than the warm basis saves.

Manually calling solver.solver_model.clearSolver() between solver.update() and solver.solve() recovers presolve and brings the persistent path slightly under the hand-rolled baseline. But that defeats the encapsulation — the user reaches past the Solver API to touch the native HiGHS handle.

Suggestion: expose an opt-in on the persistent API so users with presolve-heavy LPs can drop the warm basis without bypassing the encapsulation. Either:

a keep_basis: bool = True kwarg on Solver.solve() / Solver.update(), or
a Solver.clear_basis() method on the base class with backend-specific implementations (Highs.clearSolver(), Gurobi reset(), etc.).

Zero structural rebuilds observed across all 7 re-solves in both ModelDiff routes (RebuildReason.NONE). The diff correctly classifies these as RHS + coefficient-only changes — the persistent API is doing its job, the missing piece is just the basis-clear hook.

Caveats:

Synthetic DC-OPF surface (no SoC handoff, no NTC, no PST). Production LP is ~9× larger; presolve reduction and basis-warmstart behavior may differ at scale. Plan to repeat at production scale once the basis-clear hook lands (or we wire clearSolver() through manually).
Linopy pinned to PR feat(persistent): in-place solver updates (Solver framework + HiGHS/Gurobi/Xpress/Mosek) #718 head 089cf2e09. Bench harness is internal; happy to share details on request.

MaykThewessen · 2026-05-24T11:19:28Z

Production-scale follow-up: PSA-MEE Validation_w2025 LP measures 1.14 M cols × 1.97 M rows per weekly chunk (DC-OPF + ramp-penalty aux vars). The persistent-update phase — changeRowsBounds + changeColsCost on the time-varying rows (~15 % of the matrix), the equivalent of solver.update(model, apply=True) in the PR #718 API — completes in 0.16 s at this scale. Linear scaling from the synthetic bench's ~50 ms, so ModelDiff's diff + apply overhead stays negligible at production size.

Solve-phase comparison (B vs SQ vs B+clear) at production scale couldn't be captured this round — the reduced-pipeline harness wasn't LP-feasible past chunk 1, so HiGHS timings came from an infeasibility path and would mislead. Will retry once the harness uses the full PSA-MEE chunk loop with storage + N-1 enabled. The presolve-skip mechanism that drove the synthetic regression (HiGHS logging "Solving LP with useful basis so presolve not used" after warm in-place RHS updates) is shape-independent, so I expect the gap to widen with the production LP (89 % presolve reduction on this model — more lost when skipped) — but not posting that as a result until it's measured.

Bottom line for the PR scope: the solver.update(...) / solver.solve(model) mechanism is the right primitive and the linear scaling holds. The remaining ask is still the opt-in basis-clear hook — without it, every backend that gates presolve on basis presence (HiGHS does, others may) leaves the persistent path slower than solve_problem_from_model on warm-state-hostile LPs.

FBumann · 2026-05-24T14:23:48Z

@MaykThewessen @FabianHofmann
I would argue that we should (for now) not provide methods like clear_basis()(or clear_mip_state(), clear_pdlp_iterate(), clear_scaling() etc.).
Those are properly exposed by the Native solver instances and support and naming differs between solvers.
Users should instead do Model.solver.solver_model.*. This also saves us from differing in default behaviour from the respective solver.

We might be able to add such later. But I'd definitely defer it from this first PR. We should put a note into the docs about users maybe profiting from calling such methods on the solver directly.

One downside: We propose suage of Model.solver.solver_model, which should therefore stay stable.

FBumann · 2026-05-24T14:44:15Z

I read through the code and overall api. I would propose the following api guide lines:

1. Mutation API: `Variable.update()` / `Constraint.update()` — and only that

m.variables["x"].update(lower=new_lb, upper=new_ub)
m.constraints["demand"].update(rhs=new_rhs, sign="<=")

One canonical mutation path. Routes through validation&alignment; flips _coef_dirty in one place.
Drop the existing .lower / .upper / .rhs / .sign / .lhs / .coeffs / .vars setters. The read-side properties stay; mutation via = becomes AttributeError with a pointer to .update(). Easy to add back if users push back — much harder to remove later once two APIs coexist.
The Constraint.rhs.setter magic (rhs=Variable/Expression silently rewriting lhs) becomes an explicit method.
.values[...] stays possible (xarray exposes it) but not supported: snapshot diff stays array-based as a safety net, (needed to compare different models anyway), but no contract that direct mutation propagates.

2. No `clear_basis()` — use `solver.solver_model` directly

solver.apply(model_new)
solver.solver_model.clearSolver()    # backend-specific
solver.solve(assign=True)

Each backend has one obvious method; no shared logic to abstract.
Adding clear_basis() opens a cascade (clear_mip_state, clear_pdlp_iterate, clear_scaling, …) where every method needs per-backend semantics that don't unify cleanly (Mosek IPM has no "basis" in HiGHS's sense).
Linopy's value is build + diff + apply. Native state management is the user's.
Document the escape hatch in the persistent-solver guide with HiGHS + Gurobi recipes.

3. Split `Solver.update(model, apply=...)` into `diff()` / `apply()`

call	reads model	mutates handle	runs solver
`solver.diff(model)`	✓
`solver.apply(model)`	✓	✓
`solver.solve(model=...)`	✓	✓	✓

update(model, apply=True|False) does two structurally different things behind one bool; default apply=True silently mutates.
Name collides with gurobipy.Model.update() (lazy-flush attribute changes) — different semantic.
Split makes the basis-clear flow legible (apply separately from solve).

Good news: No conflicts with the new v1 semantics convention

FBumann · 2026-05-24T15:05:51Z

Besides the v1 semantics, im also working on making add_variables() and the underlying as_dataarray more predictable and stictly follow the passed coords. THis should simplify some validation here i think.
See #722 #725 #726

MaykThewessen · 2026-05-25T10:14:03Z

@FBumann Agreed: drop clear_basis() (and siblings) from this PR.

Native handle via Model.solver.solver_model.* already covers the power-user case, and any wrapper would have to flatten genuinely divergent semantics:

solver	basis/state reset
HiGHS	`h.clearSolver()` (drops basis + dual ray)
Gurobi	`m.reset(0)` solution only, `m.reset(1)` incl. basis
Xpress	`p.postsolve()` / `p.clearmipfilter()` (MIP)
Mosek	`task.putintparam(iparam.bi_clean_optimizer, ...)`
CPLEX	`c.cleanup(epsilon)`
PDLP	iterate is not user-exposed at all

A uniform clear_basis() would either lie about what it cleared (HiGHS clears more than basis; Gurobi reset(1) clears less than HiGHS clearSolver) or force per-solver branching back into linopy. Both worse than solver_model.*.

My benchmark comment's "B+clear" route was exploratory, not a request for the wrapper. Happy to leave basis-management outside this PR and revisit only if a clear cross-solver use case emerges.

Suggested docs note for doc/release_notes.rst (or wherever the persistent-solver section lands):

Resetting solver state. Solver.update() preserves the warm-started basis / MIP state by default. Solvers expose basis-, MIP-, scaling-, and iterate-reset under different names; linopy does not unify them. If a re-solve benefits from discarding internal state, call the native solver directly via Model.solver.solver_model. Examples: HiGHS m.solver.solver_model.clearSolver(); Gurobi m.solver.solver_model.reset(1); Xpress m.solver.solver_model.postsolve(). See each solver's API docs for the full menu.

LGTM to defer the unified API: additive later, non-breaking.

FBumann · 2026-05-25T10:45:56Z

@FabianHofmann first round of my Review:

Architecture

A1. _coef_dirty is a correctness footgun glued to a perf shortcut.
linopy/persistent/diff.py:401-402 enables skip_coef_compare = same_model and not coef_dirty, and from_snapshot(same_model=True) is the default (diff.py:346). So the flag is trusted: if it says nothing changed, the CSR walk is skipped.

Through any supported mutation path — the existing setters at constraints.py:1122,1136,1180, and the new Constraint.update(...) in #727 (which sets the flag from one place; setters become shims) — this is precise: True iff coefs were touched. Through c.coeffs.values[...] = ... (or any in-place numpy mutation) the flag is stale, and same_model=True means the diff silently misses it. Backend keeps stale coefs. Silent wrong result.

Two coherent positions:

A. Keep same_model=True default. .values[...] is documented as unsupported; users who hold up their end get correctness and the optimization. Risk: silent wrong diff on any unsupported mutation.
B. Default same_model=False. The flag becomes an opt-in perf hint. Cost: extra CSR walk per same-model warm update. Benefit: footgun closed regardless of user discipline.

Recommendation: B is the safer default. Cross-model paths already use False; in the same-model path the perf cost is small. Solver-aware callers who promise to use .update() exclusively can opt back into the optimization with same_model=True.

A2. Read-only ModelSnapshot.capture mutates the model.
snapshot.py:159-161 walks every Constraint and resets _coef_dirty=False. capture reads as a pure observation, but it isn't — anyone calling it for inspection (debug, tests) silently changes the model. The dirty-clear belongs in the solver-side bookkeeping (Solver._update_locked after a successful apply), not in the snapshot constructor.

A3. from_snapshot vs from_models is two parallel walkers.
diff.py:341-525 — the two classmethods share ~80% of their body (structural checks, per-container var/con/objective walks, builder finalization). They differ only in whether the baseline buffers come from a stored ModelSnapshot or are freshly _extract_*'d from model_a. Factor out a single _diff(get_var_base, get_con_base, get_obj_base, same_model) core; the two public entry points become 10-line adapters.

A4. Per-backend apply_update reinvents the dispatch frame 4×.
solvers.py:1404, 1891, 2455, 3183 — each backend has the same seven blocks in the same order (var_bounds, var_type (+binary bound re-clamp), con_rhs, con_sign, con_coef, obj_c, obj_sense), each guarded by if diff.<...>.size:. The only thing that varies is the API call. A template method in Solver iterating sections plus per-backend hooks (_apply_var_bounds, _apply_coefs, …) collapses ~600 lines and removes drift risk.

A5. Lock is too coarse.
solvers.py:704 holds self._lock for the entire solve(...) body, including the actual native solve. This serializes solves on the same Solver instance for the full solver runtime. Protecting solver_model/snapshot mutation is correct; holding through _run_direct for minutes is not. Narrow to the build/diff/apply window, then drop the lock around the run.

A6. Snapshot is the wrong unit of caching.
_update_locked re-captures the entire snapshot after every successful in-place update (solvers.py:811). If you're tracking updates so each iteration is fast, paying an O(nnz_total) re-capture per iteration defeats the purpose. Either advance the snapshot incrementally from the applied diff (cheap — the diff already contains the new values + indices), or recompute lazily only when a diff is actually requested.

DRY

D1. from_snapshot ↔ from_models (see A3). The biggest DRY hit in the new code.

D2. _HIGHS_VTYPE_MAP lazy-init pattern (solvers.py:1402, 1469-1477). cls._HIGHS_VTYPE_MAP or self._init_highs_vtype_map() runs on every apply_update. A @functools.cache classmethod returning the map is one method, no class-state mutation, no per-call truthiness check.

D3. Xpress .astype(np.int64, copy=False).tolist() repeated 7× in apply_update (lines 2496-2517). A _to_int_list(arr) helper or pre-converting once would clean it up.

D4. "Set binary integrality and then re-clamp bounds to [0,1]" is hand-rolled in HiGHS (1433-1442), Xpress (2485-2492), MOSEK (3221-3225). Lift to a Solver utility — the rule is solver-independent (it's linopy semantics, not a backend quirk).

D5. _extract_var_buffers / _extract_con_buffers are imported privately from snapshot.py by diff.py (diff.py:12-21). When two modules share private helpers, the helpers belong in a shared internal module, not the one that happened to introduce them.

FBumann · 2026-05-25T11:07:03Z

Test coverage gaps — `RebuildReason` audit

Reason	Covered?
`NONE`	✅
`STRUCTURAL_CONTAINERS`	⚠️ only in `in {LABELS, CONTAINERS}` sets
`STRUCTURAL_LABELS`	⚠️ same; per-container branch (`diff.py:557,606`) unreachable from public API
`COORD_REINDEX`	✅
`SPARSITY`	✅
`QUAD_OBJ` (`diff.py:702`)	❌ no tests
`BACKEND_REJECTED`	✅

Gaps worth fixing:

QUAD_OBJ has zero tests — both directions (linear→quad, quad→linear) are reachable. Without coverage, QP users risk silent miscompare.
Loose in {…} assertions in test_sparsity_change_triggers_rebuild, test_cross_model_sparsity_change_rebuilds, test_cross_model_structural_mismatch_rebuilds — tighten to is so future refactors of check ordering get caught.
disallow_rebuild=True only tested for one reason (STRUCTURAL_CONTAINERS). The contract is "every rebuild trigger raises" — add tests for SPARSITY, COORD_REINDEX, QUAD_OBJ, BACKEND_REJECTED.
from_models negative paths uncovered — only the happy path is tested. Mirror the from_snapshot cases (structural/sparsity/quad/coord) since the buffer-extraction path is independent.
Dead code or missing test at diff.py:557 and :606 — the per-container STRUCTURAL_LABELS returns can't fire after the global label check at :372-377 short-circuits. Either delete (replace with assert) or write the test that justifies them.

MaykThewessen · 2026-05-25T11:20:47Z

@FBumann Strong review. Adding bench-grounded color on A1 / A5 / A6 since my numbers are the ones being cited.

A1: `_coef_dirty` default

Agree with Option B (default same_model=False, flag as opt-in perf hint). One framing nit: the "~50-200 ms at Mayk-scale, small relative to solve time" line under-sells the cost. At the 1.14 M × 1.97 M production size, the apply phase is 0.16 s; a 50-200 ms CSR walk would be 30-125% of apply, not 30-125% of solve. Still small absolute, and still the right call. But if the trade-off is sold as "free", reviewers will be surprised when the apply-phase profile shows the walk dominating once a real solver loop runs hundreds of iterations.

Concrete: I'd take B plus a same_model=True opt-in documented as "caller asserts only .update() / typed setters were used; .values[...] invalidates this guarantee." Solver-aware orchestrators (the use case driving #718) will set it; ad-hoc users get correctness.

A2: `ModelSnapshot.capture` mutating `_coef_dirty`

Agree. Moving the clear into Solver._update_locked after a successful apply is correct on two grounds: (1) snapshot becomes a pure observation; (2) the clear becomes coupled to "the solver actually saw these coefs", which is what the flag is supposed to mean.

A3 / A4 / D-series

No bench-side input. Agree the DRY hits are worth taking. A4 in particular: four parallel 7-block dispatch bodies will drift the moment one backend grows an 8th section.

A5: lock around full solve

Confirmed real at production scale. My weekly-chunk run holds the solve phase for minutes; with the current solvers.py:704 lock width, any concurrent solve() on the same Solver blocks for the full run. Persistent-update orchestrators (the use case here) are exactly the ones likely to want concurrent solves on sibling models or pipelined iterations. Narrowing the lock to build/diff/apply is the right scope.

A6: re-capture after every successful update

This one bites at production size. _update_locked re-capturing the full snapshot at solvers.py:811 is O(nnz_total) per iteration. At 1.14 M × 1.97 M with ~6× nnz/row, that's tens of millions of cells re-read per chunk just to re-establish a baseline the diff already knew how to advance. The diff carries (indices, new_values) for every mutated buffer; applying that delta to the held snapshot is O(|diff|), which is what the persistent path is for. Strong +1 to "advance the snapshot incrementally from the applied diff." Lazy recompute on next diff() is a fine fallback if incremental advance is awkward for some backend.

Test gaps

All five gaps look right. QUAD_OBJ is the one I'd treat as blocker: QP users are exactly the audience for #718 (warm-starting a barrier solve from a previous iterate), and silent miscompare on a quad to linear flip would be hard to diagnose downstream.

The disallow_rebuild=True coverage gap is structural: the contract is "every rebuild trigger raises", but the test only proves one of six does. A parametrized test over RebuildReason enum members closes it in five lines.

Happy to take A6 (incremental snapshot advance) as a follow-up PR after #718 lands if Fabian wants to keep this one focused; the API surface doesn't change, only the body of _update_locked.

MaykThewessen · 2026-05-25T11:22:42Z

Correction to my A1 framing: I overshot.

Microbench of the actual compare path (np.array_equal on labels/indptr/indices + new_buf.data != base_buf.data + the row_value_changed build at diff.py:624-629), numpy 2.4.6, M1, median of 7:

scale	n_rows	nnz	extra cost (`same_model=False` vs `True`)
synth-69k	69 k	0.2 M	0.5 ms
mid-300k	300 k	1.5 M	3.2 ms
prod-1.97M @ 6	1.97 M	11.8 M	34.9 ms
prod-1.97M @ 10	1.97 M	19.7 M	54.6 ms

So at the realistic production size (6 nnz/row), the extra walk is ~35 ms, not the 50-200 ms my last comment implied as the floor. That's ~22 % of a 160 ms apply phase, or ~0.3 % of a 12 s solve. Felix's original "small relative to solve time" framing was right; my "30-125 % of apply" was the worst-case interpretation, not the typical one.

Doesn't change the recommendation (still Option B: default same_model=False, flag as opt-in perf hint). Just wanted the numbers on the record so reviewers don't price the trade-off at 3-5× what it actually costs.

Bench script available on request, or I can drop it in benchmark/ if useful.

Closes the A1 residual from the #718 review. The flag-trust path (`skip_coef_compare = same_model and not coef_dirty`) is correct through Constraint.update() (set in one place, shims forward), but `c.coeffs.values[...] = ...` still bypasses _coef_dirty. With same_model=True as the default, that bypass silently produces wrong diffs. Flip the default to False. Cross-model paths (the only production caller, Solver._update_locked, passes explicitly) are unaffected. Same-model warm-update paths now value-diff the CSR data — small perf hit (50-200ms at Mayk-scale per Mayk's bench), correct by default. Solver-aware callers who own the mutation contract can opt back into the optimization with `same_model=True`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coroa · 2026-05-26T07:15:48Z

Have not reviewed the code yet, just read the comments during some spare time, but my position is always to try to have perf by default:

A. Keep same_model=True default. .values[...] is documented as unsupported; users who hold up their end get correctness and the optimization. Risk: silent wrong diff on any unsupported mutation.
B. Default same_model=False. The flag becomes an opt-in perf hint. Cost: extra CSR walk per same-model warm update. Benefit: footgun closed regardless of user discipline.

So, i'd argue for the footgun A, but put the security mechanism on by flagging the numpy arrays as read only:

a = np.array([1, 2, 3])
a.flags.writeable = False
a[0] = 99  # ValueError: assignment destination is read-only

FBumann · 2026-05-26T07:23:32Z

@coroa I wrote some comments about the user api here: #727

In short:

We could skip the user parameter entirely by storing a weakref to the source model on ModelSnapshot.

Together with your suggestion about the writable flag this would be very safe, convenient and uses the performant branch whenever possible.

FBumann · 2026-05-26T07:26:55Z

@coroa I asked Claude code** about potential gaps of the writeable flag:

Dask — .flags.writeable is numpy-only; dask-backed buffers ignore it. Inconsistent enforcement on mixed-backend models.
.view() escape — v = arr.view(); v.flags.writeable = True; v[...] = 99 re-arms the protection. Stops casual mutation, not deliberate.
Lifecycle — needs an explicit thaw on every rebuild / close / pickle round-trip path. Easy to miss one.
Internal paths — every linopy internal that touches .values[...] during a frozen window breaks (sanitize_zeros, SOS reformulation, …). Needs an audit.
Error UX — ValueError: assignment destination is read-only doesn't explain why or how to fix.
Doesn't cover cross-model — orthogonal to the from_snapshot(snap_from_m1, m2) footgun.

Feels 80% safe, but might produce rough edges in our internals and many places where we have to reassign this flag.

FabianHofmann · 2026-05-26T07:43:35Z

@FBumann great review! I am still wondering what the benefit of an explicit .update function would be over the already supported setters. the latter already exist and from an API perspective supporting both would be a functional duplication. what would .update do differently than funneling the coef dirty logic?

MaykThewessen · 2026-05-26T08:16:17Z

The three proposals (coroa's writable-flag, @FBumann's weakref auto-decide, and the .update() API in #727) compose into one design rather than competing. Sketching:

Unified shape:

ModelSnapshot.capture(model):
    snap.model_ref = weakref.ref(model)
    snap.copy buffers (the existing snapshot.py:67-78 path)
    freeze live model arrays: var.lower.values.flags.writeable = False, etc.

ModelDiff.from_snapshot(snap, model=None):
    # weakref derives same_model automatically; no user-facing param
    same_model = (model is None) or (snap.model_ref() is model)

Constraint.update(rhs=..., sign=..., coef=...):
    with snap.thawed():        # context manager, re-entrant
        ... apply changes ...
        flip _coef_dirty
    # __exit__ re-freezes

Solver._update_locked:
    with snap.thawed():
        apply_update(diff)
        re-capture or advance incrementally

Each proposal lands on a different layer:

layer	proposal	closes
pairing	weakref auto-`same_model`	cross-model footgun (`snap_from_m1, m2`)
mutation channel	writable-flag freeze	A1 (live `.values[...]` mutation)
mutation API	`.update()`	the only path that thaws → mutates → freezes
flag flip	`.update()` chokepoint	`_coef_dirty` correctness

That answers @FabianHofmann's "what does .update() do that setters don't?" — once writable-flag freezes the live arrays after capture, .values[...] is no longer a parallel path; it raises. .update() is the only thing that thaws-mutates-re-freezes, which is what makes it not a functional duplicate of setters. Setters either get deprecated or become thin aliases for .update().

On the gap-list

@FBumann the six gaps you listed are real, but the weights I'd put on them:

Point 4 (internal-path audit) is the dominant cost, not point 1 (dask) or point 2 (view-escape). A quick grep for .values[ + = in linopy/ core: ~30 candidate sites including sanitize_zeros, sos_reformulation, piecewise.py LP padding, every backend apply_update, Constraint._set_rhs. Each needs a with snap.thawed(): wrap or a direct buffer-assignment alternative. That's the actual scope of the change, not a footnote.
Point 2 (view-escape) is mostly cosmetic — requires deliberate arr.view().flags.writeable = True, same level as __slots__ not stopping object.__setattr__. Stops accidental, not malicious. Acceptable.
Point 1 (dask) is genuinely orthogonal — dask buffers don't expose mutable contents anyway, so writable-flag is a no-op there; the model just becomes implicitly safe in that backend without enforcement.
Point 6 (cross-model) is what the weakref closes, so it's already handled by the composed design.

Two gaps not in the list:

Re-entrancy. apply_update may trigger a path that itself wants to thaw (e.g. coef change → SOS reformulation → buffer rewrite). The thaw context has to be a refcount-stacked counter, not a boolean. Easy to get wrong on first cut.
Error UX with location. ValueError: assignment destination is read-only is the bare numpy error. Most lightweight fix: catch ValueError at the public-API boundary and re-raise as LinopyFrozenModelError("Mutate via .update(...) between snapshot capture and apply."). Subclassing ndarray is not recommended by numpy.

Recommended landing order

Weakref auto-same_model first — small, mechanical, removes the user-param confusion immediately. Doesn't depend on freeze.
.update() API as the single supported mutation path (FBumann's Variable.update() / Constraint.update() as canonical mutation API #727). Setters keep working.
Writable-flag freeze, behind an opt-in Solver(track_updates=True, strict_freeze=True) flag. Audit internal paths before flipping the default.
Flip default once the internal-path audit is clean.

Step 3 behind a flag lets you ship 1 + 2 immediately and gather real-world data on which internal paths break under freeze before forcing everyone through them. Mayk-scale benches stay green throughout (the freeze is free at runtime; it's the audit that costs).

I can do the internal-path audit + thaw-context wrap as a follow-up PR if useful — mechanical work that benefits from running real PSA-MEE models against each candidate site.

FBumann · 2026-05-26T09:15:12Z

@FBumann great review! I am still wondering what the benefit of an explicit .update function would be over the already supported setters. the latter already exist and from an API perspective supporting both would be a functional duplication. what would .update do differently than funneling the coef dirty logic?

@FabianHofmann my main reasons are:

update() is a much cleaner api contract:
- It's a single source of truth of what is possible to update (and what isnt).
- It's discoverable: Need to modify an existing model? use *.update() - for both Variables and Constraints. Everything else is documented inside the respective update(), not scattered over different setters, cross referencing.
- It exposes the linopy internals much less. Users don't need to know which fields Variable and especially Constraint have.
- Setting con.rhs = 2 * x needs to leak into lhs, which is very surprising for a setter, but not for a method.
update() is more flexible than scattered setters
- the api can focus on usability: For example, Constraint.update() can mirror the signature of m.add_constraints(), allowing you to pass an entirely new constraint instead of modifying lhs/rhs/sign individually . This usecase would feel really weird with only setters. Which to modify first? cross-mutation? ....
- We can also add additional parameters that describe how to update if needed, which will not be possible with setters.
update() has much better ergonomics:
- Constraint.update() can mirror the signature of m.add_constraints(), allowing to pass full new constraints. Its the same as creating a constraint - Users didnt construct a Constraint with sign, coeffs, etc, but with m.add_variables().
- passing rhs=4, sign="<=" is much better than 2 separate setters. It skips the weird intermediate state
- Validation and conflict resolution: For example, when modifying Variable.upper and Variable.lower, the new upper might conflict with the old lower, but not with the new one. We cannot validate this properly with setters.
update() is DRY-er:
- Setting _coeff_dirty and potentially *.flags.writeable can live in one method instead of scattered over setters.

FBumann · 2026-05-26T09:33:28Z

@FabianHofmann Updated my comment a bit

FabianHofmann · 2026-05-26T10:09:06Z

@FBumann sounds good. I'll have a look at the pr this afternoon (tomorrow I will have more time to work on this in general)

coroa · 2026-05-26T18:30:34Z

I don't understand the intricacies of model snapshots yet, so can't comment on the weakref proposal. mark's comment is close to what i had in mind with a context manager for making internal modifications.

i'd hope the user does not have to specify whether she is passing the same model.

i like the .update proposal, especially for cases where multiple things are updated. I'd keep the getters and allow the user to make in place updates but only when using the context manager which could set a dirty mark.

FabianHofmann and others added 19 commits May 19, 2026 15:19

test(persistent): smoke test Solver orchestrator with fake backend

61b3647

feat(solvers): short-circuit rebuild when backend lacks persistent-up…

ff1ae15

…date support Skip diff computation entirely when supports_persistent_update is False on apply, per plan: 'dispatcher checks flag before calling — if False, skips diffing entirely and goes to rebuild.'

feat(solvers): Gurobi apply_update for persistent solves

f67cec6

feat(solvers): HiGHS apply_update for persistent solves

8026293

test(persistent): cross-model, pickle, threading, failure-path coverage

b6601e3

[pre-commit.ci] auto fixes from pre-commit.com hooks

5c0a369

for more information, see https://pre-commit.ci

FabianHofmann requested review from FBumann and coroa May 21, 2026 15:21

Merge branch 'master' into feat/solver-update

b38f4f3

harden coords comparison

089cf2e

FBumann mentioned this pull request May 24, 2026

Variable.update() / Constraint.update() as canonical mutation API #727

Open

3 tasks

Conversation

FabianHofmann commented May 21, 2026

Changes proposed in this Pull Request

Checklist

Uh oh!

FabianHofmann commented May 21, 2026

Uh oh!

FBumann commented May 22, 2026

Uh oh!

FBumann commented May 22, 2026

Uh oh!

FabianHofmann commented May 22, 2026

Uh oh!

FBumann commented May 22, 2026

Uh oh!

MaykThewessen commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaykThewessen commented May 24, 2026

Uh oh!

FBumann commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FBumann commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Mutation API: Variable.update() / Constraint.update() — and only that

2. No clear_basis() — use solver.solver_model directly

3. Split Solver.update(model, apply=...) into diff() / apply()

Uh oh!

FBumann commented May 24, 2026

Uh oh!

MaykThewessen commented May 25, 2026

Uh oh!

FBumann commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Architecture

DRY

Uh oh!

FBumann commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test coverage gaps — RebuildReason audit

Uh oh!

MaykThewessen commented May 25, 2026

A1: _coef_dirty default

A2: ModelSnapshot.capture mutating _coef_dirty

A3 / A4 / D-series

A5: lock around full solve

A6: re-capture after every successful update

Test gaps

Uh oh!

MaykThewessen commented May 25, 2026

Uh oh!

coroa commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FBumann commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FBumann commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FabianHofmann commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaykThewessen commented May 26, 2026

On the gap-list

Recommended landing order

Uh oh!

FBumann commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FBumann commented May 26, 2026

Uh oh!

FabianHofmann commented May 26, 2026

Uh oh!

coroa commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

MaykThewessen commented May 23, 2026 •

edited

Loading

FBumann commented May 24, 2026 •

edited

Loading

FBumann commented May 24, 2026 •

edited

Loading

1. Mutation API: `Variable.update()` / `Constraint.update()` — and only that

2. No `clear_basis()` — use `solver.solver_model` directly

3. Split `Solver.update(model, apply=...)` into `diff()` / `apply()`

FBumann commented May 25, 2026 •

edited

Loading

FBumann commented May 25, 2026 •

edited

Loading

Test coverage gaps — `RebuildReason` audit

A1: `_coef_dirty` default

A2: `ModelSnapshot.capture` mutating `_coef_dirty`

coroa commented May 26, 2026 •

edited

Loading

FBumann commented May 26, 2026 •

edited

Loading

FBumann commented May 26, 2026 •

edited

Loading

FabianHofmann commented May 26, 2026 •

edited

Loading

FBumann commented May 26, 2026 •

edited

Loading