Fix master CI: pin DifferentiationInterface < 0.7.17 (NLLS Hessian regression)#933

Closed
ChrisRackauckas-Claude wants to merge 1 commit into SciML:master from ChrisRackauckas-Claude:fix-master-mooncake

Conversation

@ChrisRackauckas-Claude
Contributor

Summary

Pins DifferentiationInterface compat to < 0.7.17 across the NonlinearSolve package set to fix the NLLS Hessian master CI red (test/forward_ad_tests.jl:128, "NLLS Hessian #445").

Note: the branch is named fix-master-mooncake because the failure was originally triaged as part of a suspected Mooncake regression affecting three SciML repos. The actual root cause turned out to be a DifferentiationInterface regression — Mooncake is not involved in this failure path. Branch name kept for traceability.

Root cause

DifferentiationInterface v0.7.17 (released 2026-04-29) included PR #974 ("fix: make wrong-mode pushforward/pullback return the correct array type"), which rewrote arroftup_to_tupofarr in DifferentiationInterface/src/utils/linalg.jl:

    # Before:
    arroftup_to_tupofarr(x::AbstractArray{<:NTuple{B}}) where {B} =
        ntuple(b -> getindex.(x, b), Val(B))

    # After:
    function arroftup_to_tupofarr(tx::AbstractArray{<:NTuple{B, <:Number}},
                                  x::AbstractArray{<:Number}) where {B}
        return ntuple(b -> similar(x) .= getindex.(tx, b), Val(B))
    end

When nesting ForwardDiff (e.g. ForwardDiff.hessian over a NonlinearLeastSquaresProblem solve, which is what the NLLS Hessian test exercises), the eltype of x is Dual{InnerTag, Float64, N} while tx carries the outer-tagged Duals. similar(x) allocates with the inner-Dual eltype, so the .= getindex.(tx, b) assignment has to convert outer Duals into inner Duals. That conversion falls through to Float64(::ForwardDiff.Dual), which has no method, and fails with a MethodError.

This is the exact stacktrace from the failing CI run (run 25311679337):

    MethodError: no method matching Float64(::ForwardDiff.Dual{...})
      ...
      @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/IS0Dg/src/utils/linalg.jl:47
      @ ./ntuple.jl:48 [inlined]
      @ ~/.julia/packages/DifferentiationInterface/IS0Dg/src/first_order/pullback.jl:498 [inlined]
      ...
      @ NonlinearSolveBaseForwardDiffExt ~/work/NonlinearSolve.jl/.../NonlinearSolveBaseForwardDiffExt.jl:178
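
The eltype mismatch can be sketched in isolation. This is a minimal illustration under stated assumptions, not the failing test: `InnerTag` and `OuterTag` are hypothetical stand-ins for the tag types ForwardDiff generates, and the two `ntuple` expressions are copied from the before/after definitions quoted in this PR rather than called through DifferentiationInterface.

```julia
using ForwardDiff: Dual

# Hypothetical tag types, standing in for the tags ForwardDiff creates
# at each differentiation level.
struct InnerTag end
struct OuterTag end

# Primal input whose entries are inner-level duals, as in a nested Hessian.
x = [Dual{InnerTag}(1.0, 1.0), Dual{InnerTag}(2.0, 0.0)]

# Array of 1-tuples whose entries are outer-tagged duals wrapping inner duals.
tx = [(Dual{OuterTag}(xi, zero(xi)),) for xi in x]

# Pre-0.7.17 formulation: the output eltype follows tx, so nothing is converted.
old = ntuple(b -> getindex.(tx, b), Val(1))

# 0.7.17 formulation: similar(x) has the inner-dual eltype, so each outer dual
# must be converted to an inner dual. That is expected to end in
# Float64(::Dual) and raise a MethodError, as in the stack trace above.
err = try
    ntuple(b -> similar(x) .= getindex.(tx, b), Val(1))
    nothing
catch e
    e
end
```

If the description in this PR is accurate, `old` holds a vector of outer-tagged duals and `err` holds the captured `MethodError`.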

Bisect evidence

| Date       | Commit  | DI version | Status |
|------------|---------|------------|--------|
| 2026-04-25 | 0ea19b8 | 0.7.16     | green  |
| 2026-04-27 | b15bf2b | 0.7.17     | red    |
| 2026-05-04 | d637062 | 0.7.17     | red    |

Between green and red, only a docs commit (b15bf2b) landed in NonlinearSolve. The move from DI 0.7.16 to 0.7.17 came from the package resolver: the existing compat bounds allowed the new release, so CI picked it up without any change to NonlinearSolve code.

Fix

Pin DI compat to "0.6.16, 0.7.3 - 0.7.16" (NonlinearSolveBase) and "0.7.3 - 0.7.16" (others) across:

  • Project.toml (NonlinearSolve)
  • lib/NonlinearSolveBase/Project.toml
  • lib/NonlinearSolveFirstOrder/Project.toml
  • lib/NonlinearSolveHomotopyContinuation/Project.toml
  • lib/SciMLJacobianOperators/Project.toml
  • lib/SimpleNonlinearSolve/Project.toml

Patch versions bumped accordingly.
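
For reference, the pinned entry in lib/NonlinearSolveBase/Project.toml would look roughly like the fragment below. Only the DifferentiationInterface line comes from this PR; the rest of the `[compat]` section is omitted, and the other packages would carry "0.7.3 - 0.7.16" instead.

```toml
[compat]
# Allow 0.6.16 up to (not including) 0.7.0, plus 0.7.3 through 0.7.16 inclusive.
# This excludes 0.7.17, the release carrying the regression.
DifferentiationInterface = "0.6.16, 0.7.3 - 0.7.16"
```

The hyphen range gives an inclusive upper bound, which is what keeps the resolver from selecting 0.7.17.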

Follow-up

This is a known-bad-version pin, not a real fix. The upstream regression should be reported to JuliaDiff/DifferentiationInterface.jl with a minimal reproducer; once a 0.7.x release fixes it, the upper bound here should be relaxed.

I have not opened the upstream issue (per CLAUDE.md, JuliaDiff is not a SciML-org repo and requires explicit permission). Recommend asking the DI maintainers (gdalle / adrhill) to either revert the arroftup_to_tupofarr rewrite or make it preserve the actual eltype of tx (e.g. via similar(x, eltype(eltype(tx))) or an ntuple(b -> getindex.(tx, b), Val(B))-style fallback when eltype(tx) !== eltype(x)).
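
The first suggestion above, allocating with the eltype of `tx`, can be sketched as follows. This is a hypothetical helper (`arroftup_to_tupofarr_sketch` is not a DifferentiationInterface function), and it uses plain `Int` and `Float64` arrays as a dependency-free analog of the inner/outer Dual mismatch; whether this shape fits the intent of DI PR #974 is for the DI maintainers to judge.

```julia
# Hypothetical sketch, NOT DifferentiationInterface's implementation.
# Keeps the array type of x but takes the element type from tx, so the
# broadcast assignment never has to narrow the tuple entries.
function arroftup_to_tupofarr_sketch(tx::AbstractArray{<:NTuple{B, <:Number}},
                                     x::AbstractArray{<:Number}) where {B}
    T = eltype(eltype(tx))  # eltype(tx) is the tuple type, its eltype is the entry type
    return ntuple(b -> similar(x, T) .= getindex.(tx, b), Val(B))
end

# Analog of the mismatch: x has a narrower eltype (Int) than the entries of tx.
x = [1, 2]
tx = [(0.5, 1.5), (2.5, 3.5)]

out = arroftup_to_tupofarr_sketch(tx, x)
```

With `similar(x)` in place of `similar(x, T)`, the same inputs would throw an `InexactError` when 0.5 is written into an `Int` array, which mirrors the failed Dual conversion in the nested ForwardDiff case.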

Test plan

  • CI passes on this branch (NLLS Hessian test in particular)
  • Other affected tests (forward_ad_tests.jl, core_tests.jl) remain green
  • Once upstream DI 0.7.18+ ships a fix, relax the upper bound in a follow-up PR

Please ignore until reviewed by @ChrisRackauckas.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

🤖 Generated with Claude Code

…ession)

DifferentiationInterface v0.7.17 introduced a regression in the
ForwardDiff-over-ForwardDiff path used by NLLS Hessian computation
(test/forward_ad_tests.jl:128, "NLLS Hessian SciML#445").

DI PR #974 ("fix: make wrong-mode pushforward/pullback return the correct
array type") rewrote `arroftup_to_tupofarr` in
DifferentiationInterface/src/utils/linalg.jl from:

    arroftup_to_tupofarr(x::AbstractArray{<:NTuple{B}}) where {B} =
        ntuple(b -> getindex.(x, b), Val(B))

to:

    function arroftup_to_tupofarr(tx::AbstractArray{<:NTuple{B, <:Number}},
                                  x::AbstractArray{<:Number}) where {B}
        return ntuple(b -> similar(x) .= getindex.(tx, b), Val(B))
    end

When nesting ForwardDiff (e.g. ForwardDiff.hessian over a NonlinearLeastSquares
solve), the eltype of `x` is `Dual{InnerTag, Float64, N}` while `tx` carries
the outer-tagged Duals. `similar(x)` allocates with the inner-Dual eltype, and
the `.= getindex.(tx, b)` assignment then tries to convert outer Duals into
inner Duals, which triggers `Float64(::ForwardDiff.Dual)` and fails with
MethodError.

This pins DI compat to "0.6.16, 0.7.3 - 0.7.16" across the NonlinearSolve
package set as a short-term fix while the upstream regression is being
reported. Once DI ships a fix (a new 0.7.x), the upper bound should be
relaxed.

Upstream regression: JuliaDiff/DifferentiationInterface.jl PR #974, released
in DifferentiationInterface v0.7.17 (2026-04-29). NonlinearSolve master CI
went from green (0ea19b8, 2026-04-25, DI v0.7.16) to red (b15bf2b, 2026-04-27,
DI v0.7.17) with no NonlinearSolve code changes between, only a docs commit.

Note: the failing tests are NOT Mooncake-related, despite the surface
similarity to other current SciML CI failures. The branch name
`fix-master-mooncake` reflects the original triage hypothesis, not the
actual root cause.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gdalle
Collaborator

gdalle commented May 11, 2026

@ChrisRackauckas can you try the DI branch from JuliaDiff/DifferentiationInterface.jl#1003 to see if the fix works?

@gdalle
Collaborator

gdalle commented May 11, 2026

That being said, I'm rather surprised that SciML runs into this function anywhere, since "wrong-mode" pushforwards and pullbacks are only for operators that are not implemented in a given backend. An MWE would help me figure out why this happens, and whether it is a bottleneck in NonlinearSolve or not.
