
Fix CUDA error 700 when docking with flexible residues #176

Draft
caic99 wants to merge 2 commits into dptech-corp:main from caic99:fix/flex-residue-guard-159



caic99 commented Apr 10, 2026

Summary

Fixes #159: CUDA error 700 (cudaErrorIllegalAddress) when using --flex with --gpu_batch.

The CUDA kernel supports at most 1 flex torsion (MAX_NUM_OF_FLEX_TORSION = 1), but runs that exceeded this limit crashed with an opaque CUDA error instead of a clear message, and, as the Behavior table shows, even a single flex residue crashed. The crash had three root causes:

  1. Wrong Config grouping (main.cpp): Ligand classification ignored receptor flex atoms/torsions. A combined model with 165 atoms was placed into SmallConfig (max 40), causing out-of-bounds GPU memory access → CUDA error 700.
  2. Dead assert (monte_carlo.cu): assert(m.num_other_pairs() == 0) was compiled out by NDEBUG in Release builds, providing zero protection.
  3. Missing flex conf in results (vina.h): cuda_to_vina didn't populate c.flex, so CPU-side pose refinement segfaulted on model.set(c).

Changes

| File | Change |
|------|--------|
| main.cpp | Fix grouping to include receptor flex atoms/torsions; add early exit with clear error when flex torsions exceed GPU kernel limit |
| monte_carlo.cu | Replace 4 dead asserts with comments noting the limitation |
| vina.h | Initialize flex conf from model before pose refinement |

Behavior

| Scenario | Before | After |
|----------|--------|-------|
| 21 flex residues (51 torsions) | CUDA error code=700 cudaErrorIllegalAddress | Clean error: "51 torsions, supports at most 1" |
| 1 flex residue (1 torsion) | CUDA error code=700 | Works correctly, docking completes |
| No flex | Works | Works (unchanged) |

Test plan

  • Reproduced original CUDA error 700 with issue reporter's test files (2am9, 21 flex residues)
  • Verified fix produces clean error message for unsupported configs (EXIT=1, no crash)
  • Verified 1 flex residue (1 torsion) works end-to-end on GPU (SmallConfig, EXIT=0)
  • Verified normal non-flex docking is unchanged

🤖 Generated with Claude Code

The CUDA kernel supports at most MAX_NUM_OF_FLEX_TORSION (1) flex
torsion, but using --flex with --gpu_batch crashed with CUDA error 700
instead of reporting the limitation. Root causes:

1. Ligand Config grouping (main.cpp) ignored receptor flex atoms and
   torsions, placing combined models (e.g. 165 atoms) into undersized
   Config groups (e.g. SmallConfig with max 40), causing out-of-bounds
   GPU memory access.
2. assert(m.num_other_pairs() == 0) was compiled out in Release builds
   (NDEBUG), providing no protection at all.
3. cuda_to_vina result conversion did not populate c.flex, causing
   segfault during CPU-side pose refinement when model.set(c) was
   called.

This commit:
- Fixes grouping to account for receptor flex atoms and torsions so
  that the correct Config is selected
- Replaces the ineffective assert with a comment noting the limitation
- Initializes flex conf from the model before pose refinement
- Adds an early check that exits cleanly with a clear error message
  when flex torsions exceed the kernel's supported limit

Single flex residue docking (1 torsion) now works correctly on GPU.
Configurations exceeding the limit get a clear error instead of a
CUDA crash.

Refs dptech-corp#159

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

CUDA error 700 when docking with flexible residues
