Fix master CI: pin DifferentiationInterface < 0.7.17 (NLLS Hessian regression) #933
Closed
ChrisRackauckas-Claude wants to merge 1 commit into
…ession)

DifferentiationInterface v0.7.17 introduced a regression in the ForwardDiff-over-ForwardDiff path used by NLLS Hessian computation (test/forward_ad_tests.jl:128, "NLLS Hessian SciML#445").

DI PR #974 ("fix: make wrong-mode pushforward/pullback return the correct array type") rewrote `arroftup_to_tupofarr` in DifferentiationInterface/src/utils/linalg.jl from:

    arroftup_to_tupofarr(x::AbstractArray{<:NTuple{B}}) where {B} =
        ntuple(b -> getindex.(x, b), Val(B))

to:

    function arroftup_to_tupofarr(tx::AbstractArray{<:NTuple{B, <:Number}}, x::AbstractArray{<:Number}) where {B}
        return ntuple(b -> similar(x) .= getindex.(tx, b), Val(B))
    end

When nesting ForwardDiff (e.g. ForwardDiff.hessian over a NonlinearLeastSquares solve), the eltype of `x` is `Dual{InnerTag, Float64, N}` while `tx` carries the outer-tagged Duals. `similar(x)` allocates with the inner-Dual eltype, and the `.= getindex.(tx, b)` assignment then tries to convert outer Duals into inner Duals, which triggers `Float64(::ForwardDiff.Dual)` and fails with MethodError.

This pins DI compat to "0.6.16, 0.7.3 - 0.7.16" across the NonlinearSolve package set as a short-term fix while the upstream regression is being reported. Once DI ships a fix (a new 0.7.x), the upper bound should be relaxed.

Upstream regression: JuliaDiff/DifferentiationInterface.jl PR #974, released in DifferentiationInterface v0.7.17 (2026-04-29). NonlinearSolve master CI went from green (0ea19b8, 2026-04-25, DI v0.7.16) to red (b15bf2b, 2026-04-27, DI v0.7.17) with no NonlinearSolve code changes between, only a docs commit.

Note: the failing tests are NOT Mooncake-related, despite the surface similarity to other current SciML CI failures. The branch name `fix-master-mooncake` reflects the original triage hypothesis, not the actual root cause.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator
@ChrisRackauckas can you try the DI branch from JuliaDiff/DifferentiationInterface.jl#1003 to see if the fix works?
Collaborator
That being said, I'm rather surprised that SciML runs into this function anywhere, since "wrong-mode" pushforwards and pullbacks are only for operators that are not implemented in a given backend. An MWE would help me figure out why this happens, and whether it is a bottleneck in NonlinearSolve or not.
Summary
Pins `DifferentiationInterface` compat to `< 0.7.17` across the NonlinearSolve package set to fix the red master CI caused by the NLLS Hessian test (`test/forward_ad_tests.jl:128`, "NLLS Hessian #445").

Note: the branch is named `fix-master-mooncake` because the failure was originally triaged as part of a suspected Mooncake regression affecting three SciML repos. The actual root cause turned out to be a `DifferentiationInterface` regression; Mooncake is not involved in this failure path. Branch name kept for traceability.

Root cause
`DifferentiationInterface` v0.7.17 (released 2026-04-29) included PR #974 ("fix: make wrong-mode pushforward/pullback return the correct array type"), which rewrote `arroftup_to_tupofarr` in `DifferentiationInterface/src/utils/linalg.jl` so that each output array is allocated with `similar(x)` and then filled with `.= getindex.(tx, b)`.

When nesting ForwardDiff (e.g. `ForwardDiff.hessian` over a `NonlinearLeastSquaresProblem` solve, which is what the NLLS Hessian test exercises), the eltype of `x` is `Dual{InnerTag, Float64, N}` while `tx` carries the outer-tagged Duals. `similar(x)` allocates with the inner-Dual eltype, and the `.= getindex.(tx, b)` assignment then tries to convert outer Duals into inner Duals, which falls through to `Float64(::ForwardDiff.Dual)` and fails with `MethodError`.

This is the failure shown in the stacktrace of the failing CI run (run 25311679337).
Bisect evidence
Between green and red, only a docs commit (`b15bf2b`) landed in NonlinearSolve. The DI bump from 0.7.16 → 0.7.17 happened in the resolver because compat allowed the new release.

Fix
Pin DI compat to `"0.6.16, 0.7.3 - 0.7.16"` (NonlinearSolveBase) and `"0.7.3 - 0.7.16"` (others) across:

- `Project.toml` (NonlinearSolve)
- `lib/NonlinearSolveBase/Project.toml`
- `lib/NonlinearSolveFirstOrder/Project.toml`
- `lib/NonlinearSolveHomotopyContinuation/Project.toml`
- `lib/SciMLJacobianOperators/Project.toml`
- `lib/SimpleNonlinearSolve/Project.toml`

Patch versions bumped accordingly.
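
To sanity-check which DI releases the two compat entries admit, the specifiers can be evaluated with Pkg's version-spec parser. This is a rough sketch: it assumes `Pkg.Types.semver_spec`, which is an internal Pkg helper rather than documented public API, and the variable names are illustrative only.

```julia
using Pkg

# Compat strings copied from the pin described above.
base_spec  = Pkg.Types.semver_spec("0.6.16, 0.7.3 - 0.7.16")  # NonlinearSolveBase
other_spec = Pkg.Types.semver_spec("0.7.3 - 0.7.16")          # the other packages

# Print whether each candidate DI version satisfies each specifier.
for v in (v"0.6.16", v"0.7.3", v"0.7.16", v"0.7.17")
    println(v, ": NonlinearSolveBase=", v in base_spec, ", others=", v in other_spec)
end
```

With these entries the resolver can still pick 0.7.16 but is expected to reject 0.7.17, which is the intent of a known-bad-version pin.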
Follow-up
This is a known-bad-version pin, not a real fix. The upstream regression should be reported to JuliaDiff/DifferentiationInterface.jl with a minimal reproducer; once a 0.7.x release fixes it, the upper bound here should be relaxed.
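
As a possible starting point for such a reproducer, the eltype mismatch described under Root cause can be sketched with hand-built nested ForwardDiff duals. This is only an illustration under stated assumptions, not the DI code path or the failing NonlinearSolve test: the `:Inner` and `:Outer` tags and the arrays `x` and `tx` are hypothetical stand-ins, and the sketch assumes ForwardDiff's `Dual{Tag}(value, partials...)` constructor.

```julia
using ForwardDiff: Dual

# Hypothetical stand-ins for the nested-dual situation; the :Inner and :Outer
# tags are illustrative, not the tags ForwardDiff generates in the real test.
inner = Dual{:Inner}(1.0, 1.0)      # Dual{:Inner, Float64, 1}
outer = Dual{:Outer}(inner, inner)  # outer-tagged dual wrapping inner duals

x  = [inner]     # array with the inner-Dual eltype
tx = [(outer,)]  # array of 1-tuples carrying outer-tagged duals

# Same expression shape as the rewritten helper: allocate from `x`, fill from `tx`.
# Storing an outer dual into an inner-dual array requires a conversion.
err = try
    similar(x) .= getindex.(tx, 1)
    nothing
catch e
    e
end
println("similar(x) path: ", typeof(err))

# Allocating with the element type of the tuples avoids that conversion.
fixed = similar(x, eltype(eltype(tx)))
fixed .= getindex.(tx, 1)
println("eltype-preserving path: ", eltype(fixed) === typeof(outer))
```

The first path is expected to end in a `MethodError`, while the second stores the outer duals unchanged. Whether the real failure goes through exactly this conversion should be confirmed against the CI stacktrace.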
I have not opened the upstream issue (per CLAUDE.md, JuliaDiff is not a SciML-org repo and requires explicit permission). Recommend asking the DI maintainers (gdalle / adrhill) to either revert the `arroftup_to_tupofarr` rewrite or make it preserve the actual eltype of `tx` (e.g. via `similar(x, eltype(eltype(tx)))` or an `ntuple(b -> getindex.(tx, b), Val(B))`-style fallback when `eltype(tx) !== eltype(x)`).

Test plan
Please ignore until reviewed by @ChrisRackauckas.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
🤖 Generated with Claude Code