Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 89 additions & 7 deletions source/api_cc/src/DeepSpinPTExpt.cc
Original file line number Diff line number Diff line change
Expand Up @@ -371,12 +371,43 @@ void DeepSpinPTExpt::compute(ENERGYVTYPE& ener,
int nloc = nall_real - nghost_real;
int nframes = 1;

// Build spin tensor for real atoms using bkw_map
std::vector<VALUETYPE> dspin(static_cast<size_t>(nall_real) * 3);
for (int ii = 0; ii < nall_real; ++ii) {
// Phantom-atom padding for the empty-subdomain corner case
// (``nloc_real == 0``). Multi-rank spin MD can land a rank with zero
// real local atoms when atoms migrate to other subdomains. The
// with-comm AOTI artifact, traced with ``nloc_min=1`` and lowered by
// inductor with an even stricter ``nloc >= 2`` runtime-check
// (silently bypassed because ``AOTI_RUNTIME_CHECK_INPUTS`` is unset by
// default), then SIGFPEs at runtime with an "integer divide by zero"
// inside inductor-generated shape arithmetic that uses ``nloc`` as a
// divisor. The failure is intermittent because inductor re-codegens
// across runs and only some compiles emit the offending divide.
//
// Fix: prepend two phantom atoms with no neighbours so the AOTI graph
// runs with ``nloc == 2``. The phantoms have an empty nlist row and
// therefore contribute zero atomic energy / force / virial, preserving
// the physically-correct "this rank has no real atoms" semantics.
// ``nlocal`` in the comm tensors is set to ``2`` so border_op writes
// received ghost features past the phantom slots; outputs are stripped
// of the phantom prefix before being scattered back to LAMMPS atoms
// via ``select_map``.
const int phantom_n = (nloc_real == 0 && nall_real > 0) ? 2 : 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Subtract phantom atoms before returning energy

When an empty-subdomain rank takes this new path, the two inserted type-0 phantom atoms are included in the model's energy_redu, and that reduced energy is assigned directly before any phantom prefix is stripped. DeepMD energy fitting nets can add per-type biases / nonzero zero-neighbor outputs, so phantoms are not guaranteed to have zero atomic energy; in those models every rank with nloc_real == 0 will contribute extra phantom energy to the MPI-reduced LAMMPS total. Please either remove/subtract the phantom atomic energies from energy_redu or make the model see them through a mask that excludes them from reductions.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit c43468fb7 (force-pushed). I went one step beyond a phantom-only subtract: a rank with no real local atoms contributes zero to the total energy by definition, so I just zero ener directly when phantom_n > 0.

The simpler-subtract approach would have been incomplete here — the spin path doubles atoms internally (real + spin half), so both halves carry the per-type-bias and zero-neighbour MLP output into energy_redu. output_map["energy"] only exposes the real half after the SpinModel's [:, :nloc] slice, so subtracting that alone would leave the spin half leaking through. (Confirmed empirically: an earlier attempt to subtract the real half showed mpi-2=-2.45 vs mpi-1=-1.49 in CI run 26796476553. The zero-out has no such residual.)

Forces / force_mag / virial don't need an analogous correction because phantom atomic outputs are coord-independent (no neighbours) so their derivatives are zero.

if (phantom_n > 0) {
dcoord.insert(dcoord.begin(), static_cast<size_t>(phantom_n) * 3,
static_cast<VALUETYPE>(0));
datype.insert(datype.begin(), static_cast<size_t>(phantom_n), 0);
nall_real += phantom_n;
nloc_real = phantom_n;
nloc = nall_real - nghost_real;
}

// Build spin tensor for real atoms using bkw_map (skip phantom prefix
// which keeps zero spin).
std::vector<VALUETYPE> dspin(static_cast<size_t>(nall_real) * 3,
static_cast<VALUETYPE>(0));
for (int ii = phantom_n; ii < nall_real; ++ii) {
for (int dd = 0; dd < 3; ++dd) {
dspin[static_cast<size_t>(ii) * 3 + dd] =
spin[static_cast<size_t>(bkw_map[ii]) * 3 + dd];
spin[static_cast<size_t>(bkw_map[ii - phantom_n]) * 3 + dd];
}
}

Expand Down Expand Up @@ -445,11 +476,16 @@ void DeepSpinPTExpt::compute(ENERGYVTYPE& ener,
nlist_data.shuffle_exclude_empty(fwd_map);
nlist_data.padding();

// Rebuild mapping tensor
// Rebuild mapping tensor. Phantom slots (when phantom_n > 0) get
// identity entries — they index into their own row and never appear
// in any other atom's nlist (their nlist rows are all -1 below).
if (lmp_list.mapping) {
std::vector<std::int64_t> mapping(nall_real);
for (int ii = 0; ii < nall_real; ii++) {
mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]];
for (int ii = 0; ii < phantom_n; ii++) {
mapping[ii] = ii;
}
for (int ii = phantom_n; ii < nall_real; ii++) {
mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]];
}
Comment on lines +487 to 489
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Missing phantom_n offset in mapping values for real atoms.

When phantom_n > 0, fwd_map[...] returns indices in the pre-padding coordinate space [0, nall_real_orig). After phantom padding, real atoms are shifted to indices [phantom_n, nall_real), so the mapping values must be offset accordingly.

For example, with phantom_n=2: if fwd_map[...] returns 0 (first real ghost in pre-padding space), the mapping should point to index 2 (first real ghost in post-padding space), not 0 (which is now a phantom slot).

🐛 Proposed fix
     for (int ii = phantom_n; ii < nall_real; ii++) {
-      mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]];
+      mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]] + phantom_n;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (int ii = phantom_n; ii < nall_real; ii++) {
mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]];
}
for (int ii = phantom_n; ii < nall_real; ii++) {
mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]] + phantom_n;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/api_cc/src/DeepSpinPTExpt.cc` around lines 487 - 489, The mapping
assignment for real atoms fails to account for phantom padding: when phantom_n >
0 the indices returned by fwd_map(...) are in the pre-padding space and must be
shifted by phantom_n. Update the loop that sets mapping[ii] (which uses
fwd_map[lmp_list.mapping[bkw_map[ii - phantom_n]]]) to add phantom_n to the
fwd_map result so real-atom targets point into the post-padding index range;
keep all other indexing (ii, bkw_map, lmp_list.mapping) unchanged.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not reachable in current code, so I'm not applying the suggestion. The reasoning:

  1. The if (lmp_list.mapping) branch is single-rank-only. Both pair_deepmd.cpp:194 and pair_deepspin.cpp:204-257 gate lmp_list.set_mapping(...) on comm->nprocs == 1. So in multi-rank, lmp_list.mapping == nullptr and we take the else (identity-fallback) branch instead.

  2. phantom_n > 0 is multi-rank-only — it only fires when nloc_real == 0 && nall_real > 0, which can only happen in a multi-rank LAMMPS run with an empty subdomain.

  3. Combining: phantom_n > 0 implies multi-rank, which implies lmp_list.mapping == nullptr, which means the branch you're flagging is never entered. The identity-fallback branch handles phantoms correctly because mapping[ii] = ii is self-consistent regardless of phantom offsets, and the with-comm AOTI artifact ignores mapping for ghost feature exchange anyway (border_op does that).

If a future change ever populates lmp_list.mapping in multi-rank, the fix would become applicable. Until then, applying it would be defensive against an unreachable scenario.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wanghan-iapcm, that's a clear and well-documented analysis — thank you. The mutual exclusivity is airtight:

  • lmp_list.mapping != nullptr ↔ single-rank (nprocs == 1)
  • phantom_n > 0 ↔ multi-rank empty subdomain (nloc_real == 0 && nall_real > 0)

So the two conditions can never hold simultaneously in the current LAMMPS pair-style code, and the branch is indeed unreachable. I'll withdraw the suggestion and record this constraint to avoid flagging it again.

(ᵒ ᵕ ᵒ)ᵇ


✏️ Learnings added
Learnt from: wanghan-iapcm
Repo: deepmodeling/deepmd-kit PR: 5485
File: source/api_cc/src/DeepSpinPTExpt.cc:487-489
Timestamp: 2026-06-02T06:34:42.158Z
Learning: In `source/api_cc/src/DeepSpinPTExpt.cc`, the combination `(phantom_n > 0) && (lmp_list.mapping != nullptr)` is structurally unreachable:
- `lmp_list.mapping` is populated only when `comm->nprocs == 1` (single-rank), gated in `source/lmp/pair_deepmd.cpp:194` and `source/lmp/pair_deepspin.cpp:204-257`.
- `phantom_n > 0` only fires when `nloc_real == 0 && nall_real > 0`, which can only occur in a multi-rank LAMMPS run with an empty subdomain.
Therefore the `if (lmp_list.mapping)` block inside the `ago == 0` section is never entered when phantom padding is active; the identity-fallback `else` branch handles that case correctly. Do not flag the missing `+ phantom_n` offset in the `lmp_list.mapping` branch as a bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: wanghan-iapcm
Repo: deepmodeling/deepmd-kit PR: 5407
File: source/api_cc/src/DeepSpinPTExpt.cc:135-151
Timestamp: 2026-04-20T13:36:43.594Z
Learning: In deepmd-kit, `.pt2` model files do not include any version/schema field. When reviewing code in `*PTExpt.cc` that reads `.pt2` metadata, do not add or require a version/schema lookup/expectation. Ensure backward compatibility is implemented via metadata fallback defaults—e.g., if `do_atomic_virial` is missing, default it to `true`, and if `nnei` metadata is missing, default `nnei` to `sum(sel)`—so older exported `.pt2` models continue to work.

mapping_tensor =
torch::from_blob(mapping.data(), {1, nall_real}, int_option)
Expand All @@ -472,8 +508,16 @@ void DeepSpinPTExpt::compute(ENERGYVTYPE& ener,
}

// Flatten raw nlist — the .pt2 model sorts by distance on-device.
// Phantom rows (all -1) are prepended below so the AOTI graph sees
// nloc == phantom_n + nloc_real_orig instead of 0.
firstneigh_tensor =
createNlistTensor(nlist_data.jlist, nnei).to(torch::kInt64).to(device);
if (phantom_n > 0) {
auto phantom_rows = torch::full(
{1, phantom_n, nnei}, static_cast<std::int64_t>(-1),
torch::TensorOptions().dtype(torch::kInt64).device(device));
firstneigh_tensor = torch::cat({phantom_rows, firstneigh_tensor}, 1);
}
}

// Build fparam/aparam tensors
Expand Down Expand Up @@ -566,6 +610,23 @@ void DeepSpinPTExpt::compute(ENERGYVTYPE& ener,
ener.assign(flat_energy_.data_ptr<ENERGYTYPE>(),
flat_energy_.data_ptr<ENERGYTYPE>() + flat_energy_.numel());

// Zero the reduced energy on an empty rank. Phantoms have constant
// atomic outputs (per-type bias + zero-neighbour MLP) that flow into
// ``energy_redu`` -- and on the spin path the SpinModel doubles atoms
// so the bias contribution appears for both real and spin phantom
// halves; subtracting only the real-half exposed by
// ``output_map["energy"]`` after the ``[:, :nloc]`` slice leaves the
// spin-half leaking into the MPI-reduced LAMMPS total. The physical
// contribution of a rank with no real local atoms is zero by
// definition, so just clear ``ener`` directly.
//
// Forces, force_mag, and virial are unaffected because phantom atomic
// outputs are coord-independent (no neighbours) so their derivatives
// are zero -- no analogous correction is needed.
if (phantom_n > 0) {
std::fill(ener.begin(), ener.end(), static_cast<ENERGYTYPE>(0));
}

// Extract force: energy_derv_r (nf, nall, 1, 3) -> (nf, nall, 3)
torch::Tensor force_tensor =
output_map["energy_derv_r"].squeeze(-2).view({-1}).to(floatType);
Expand All @@ -588,6 +649,17 @@ void DeepSpinPTExpt::compute(ENERGYVTYPE& ener,
virial.assign(cpu_virial_.data_ptr<VALUETYPE>(),
cpu_virial_.data_ptr<VALUETYPE>() + cpu_virial_.numel());

// Strip the phantom prefix (see phantom-atom padding comment near
// ``select_real_atoms_coord``) so the ``bkw_map`` lookup below sees
// only the real / ghost atoms it was built for. The phantom slots
// carry zero forces because their nlist rows were all -1 — they
// produce no neighbour contributions, so dropping them is exact.
if (phantom_n > 0) {
dforce.erase(dforce.begin(), dforce.begin() + phantom_n * 3);
dforce_mag.erase(dforce_mag.begin(), dforce_mag.begin() + phantom_n * 3);
nall_real -= phantom_n;
}

// bkw map: map force from real atoms back to full atom list
force.resize(static_cast<size_t>(nframes) * fwd_map.size() * 3);
force_mag.resize(static_cast<size_t>(nframes) * fwd_map.size() * 3);
Expand All @@ -612,6 +684,16 @@ void DeepSpinPTExpt::compute(ENERGYVTYPE& ener,
cpu_atom_virial_.data_ptr<VALUETYPE>(),
cpu_atom_virial_.data_ptr<VALUETYPE>() + cpu_atom_virial_.numel());

// Strip the phantom prefix from atomic outputs as well (see force
// block above). Phantom slots carry zero atomic energy / virial
// because their nlist rows were all -1.
if (phantom_n > 0) {
datom_energy.erase(datom_energy.begin(),
datom_energy.begin() + phantom_n);
datom_virial.erase(datom_virial.begin(),
datom_virial.begin() + phantom_n * 9);
}

atom_energy.resize(static_cast<size_t>(nframes) * fwd_map.size());
atom_virial.resize(static_cast<size_t>(nframes) * fwd_map.size() * 9);
select_map<VALUETYPE>(atom_energy, datom_energy, bkw_map, 1, nframes,
Expand Down
Loading