Fix CUDA error when docking with flexible residues #175

Closed
caic99 wants to merge 1 commit into dptech-corp:main from caic99:fix/flex-residue-cuda-159

caic99 (Member) commented Apr 10, 2026

Summary

Fixes #159: CUDA error 700 (cudaErrorIllegalAddress) when using --flex for flexible receptor residues.

The CUDA kernel completely lacked support for flexible residues, causing crashes through five interrelated bugs:

  • Blocking assert: assert(m.num_other_pairs() == 0) in 4 locations in monte_carlo.cu immediately crashed when flex residues created other_pairs (flex-flex interaction pairs)
  • Missing other_pairs in CUDA: Flex-flex interaction pairs were never stored in GPU structures or evaluated in the scoring kernel
  • Buffer overflow: MAX_NUM_OF_FLEX_TORSION = 1 was too small for real flex residues (which can have 20+ torsions across multiple residues), causing memory corruption when copying torsion data
  • Wrong Config grouping: Ligand classification in main.cpp ignored receptor flex atoms/torsions, placing combined models (e.g., 165 atoms) into undersized Config groups (e.g., SmallConfig with max 40 atoms), causing out-of-bounds GPU memory access
  • Missing flex conf in results: cuda_to_vina didn't initialize the flex conformation data, causing a segfault during CPU-side pose refinement when model.set() was called
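To make the Config grouping bug concrete, here is a minimal sketch of size-based classification that counts receptor flex atoms and torsions together with the ligand. It is not the code in main.cpp: `pick_config_group`, `GroupLimits`, and every limit except the 40-atom SmallConfig cap quoted above are hypothetical, and how the real code combines ligand and flex torsion counts is an assumption.

```cpp
#include <cassert>

// Hypothetical size classes. Only the 40-atom SmallConfig limit comes from
// the PR description; all other numbers are placeholders for illustration.
enum class ConfigGroup { Small, Medium, Large, TooLarge };

struct GroupLimits {
    int max_atoms;
    int max_torsions;
};

constexpr GroupLimits kSmall{40, 8};    // max_atoms from the PR; torsion cap assumed
constexpr GroupLimits kMedium{80, 16};  // assumed
constexpr GroupLimits kLarge{300, 48};  // assumed

// Classify by the size of the combined model (ligand + flex residues),
// not by the ligand alone. Summing the torsion counts is an assumption.
inline ConfigGroup pick_config_group(int ligand_atoms, int ligand_torsions,
                                     int flex_atoms, int flex_torsions) {
    assert(ligand_atoms >= 0 && ligand_torsions >= 0);
    assert(flex_atoms >= 0 && flex_torsions >= 0);

    const int atoms = ligand_atoms + flex_atoms;
    const int torsions = ligand_torsions + flex_torsions;

    auto fits = [&](const GroupLimits& g) {
        return atoms <= g.max_atoms && torsions <= g.max_torsions;
    };

    if (fits(kSmall)) return ConfigGroup::Small;
    if (fits(kMedium)) return ConfigGroup::Medium;
    if (fits(kLarge)) return ConfigGroup::Large;
    return ConfigGroup::TooLarge;  // caller should reject or fall back to CPU
}
```

With these placeholder limits, a 30-atom ligand on its own lands in the small group, while the same ligand plus 135 flex atoms (165 atoms in total, as in the example above) is pushed into a larger group instead of overrunning the small one.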

Changes

| File | Changes |
| --- | --- |
| kernel.h | Increase MAX_NUM_OF_FLEX_TORSION 1→24 in all Config structs; add other_pairs structs to m_cuda_t and m_cuda_t_<Config> |
| monte_carlo.cu | Remove 4 blocking asserts; add other_pairs GPU data copying; fix flex torsion copy to flatten all residues |
| warp_ops.cuh | Evaluate other_pairs in m_eval_deriv_warp; copy other_pairs in m_cuda_init_with_m_cuda_warp |
| main.cpp | Account for receptor flex atoms/torsions in ligand Config group classification |
| vina.h | Initialize flex conf from model before pose refinement |
| CMakeLists.txt | Add sm_100 (Blackwell/B200) to CMAKE_CUDA_ARCHITECTURES |
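The "flatten all residues" change in monte_carlo.cu can be pictured with the host-side sketch below. It is an illustration under assumptions, not the actual copy code: `FlexTorsionsFlat` and `flatten_flex_torsions` are hypothetical names, and only the capacity of 24 is taken from this PR. The point is that the total torsion count across all flex residues is checked against the fixed capacity before anything is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Capacity named in this PR (previously 1).
constexpr std::size_t MAX_NUM_OF_FLEX_TORSION = 24;

// Hypothetical fixed-size layout, standing in for the device-side struct.
struct FlexTorsionsFlat {
    float values[MAX_NUM_OF_FLEX_TORSION];
    std::size_t count;
};

// Concatenate the torsions of every flex residue into one flat array.
// Returns false, leaving `out` untouched, if they do not fit.
inline bool flatten_flex_torsions(
        const std::vector<std::vector<float>>& per_residue,
        FlexTorsionsFlat& out) {
    std::size_t total = 0;
    for (const auto& residue : per_residue) total += residue.size();
    if (total > MAX_NUM_OF_FLEX_TORSION) return false;  // would overflow

    std::size_t k = 0;
    for (const auto& residue : per_residue) {
        for (float torsion : residue) out.values[k++] = torsion;
    }
    assert(k == total);
    out.count = k;
    return true;
}
```

Returning a failure flag rather than truncating is a design choice of this sketch; a caller could use it to reject the input or fall back to a CPU path.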

Limitations

This fix enables flex docking to run without crashing and correctly evaluates flex-flex interaction energy (other_pairs). However, flex torsions are not yet optimized during the GPU Monte Carlo search: they remain at their initial values and are refined only during the CPU-side post-processing step. Full flex torsion optimization in the CUDA kernel (flex tree coordinate updates, flex torsion mutation, flex derivatives) is left as a follow-up enhancement.
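As a rough picture of what evaluating other_pairs at fixed flex coordinates amounts to, the sketch below sums a pairwise term over a list of atom index pairs within a cutoff. Everything here is illustrative: `InteractingPair`, `eval_other_pairs`, and in particular `toy_pair_energy` are hypothetical stand-ins, not the scoring function or data layout used by the kernel, and no derivatives with respect to flex torsions are computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical pair record: indices of two interacting atoms.
struct InteractingPair { std::size_t a, b; };

// Placeholder potential used only to keep the example runnable.
// The real kernel uses its own scoring terms.
inline float toy_pair_energy(float r2) { return 1.0f / (1.0f + r2); }

// Sum the pair energy over all listed pairs closer than the cutoff.
// Coordinates are read-only: flex atoms stay where they are.
inline float eval_other_pairs(const std::vector<InteractingPair>& pairs,
                              const std::vector<Vec3>& coords,
                              float cutoff_sqr) {
    float energy = 0.0f;
    for (const auto& p : pairs) {
        assert(p.a < coords.size() && p.b < coords.size());
        const float dx = coords[p.a].x - coords[p.b].x;
        const float dy = coords[p.a].y - coords[p.b].y;
        const float dz = coords[p.a].z - coords[p.b].z;
        const float r2 = dx * dx + dy * dy + dz * dz;
        if (r2 < cutoff_sqr) energy += toy_pair_energy(r2);
    }
    return energy;
}
```

Because the flex coordinates never change during the GPU search in this PR, a term like this contributes to the score of each pose but does not by itself move the flex residues.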

Test plan

  • Reproduced original CUDA error 700 with the issue reporter's test files (2am9 receptor with 21 flex residues)
  • Verified fix produces docking output (affinity: -3.467 kcal/mol, EXIT_CODE=0)
  • Verified normal (non-flex) docking regression: 5798 ligands in 2.6s, identical results

🤖 Generated with Claude Code

The CUDA kernel lacked support for flexible receptor residues, causing
crashes when using the --flex flag. This fixes five interrelated bugs:

1. Remove assert(m.num_other_pairs() == 0) that blocked flex docking
2. Add other_pairs (flex-flex interactions) storage and evaluation in
   the CUDA kernel scoring function
3. Increase MAX_NUM_OF_FLEX_TORSION from 1 to 24 to prevent buffer
   overflow when copying flex torsion data
4. Fix ligand grouping to account for receptor flex atoms/torsions,
   preventing models from being placed in undersized Config groups
5. Initialize flex conf before pose refinement to prevent segfault
   when the CPU-side quasi_newton calls model.set()

Also adds sm_100 (Blackwell/B200) to CMAKE_CUDA_ARCHITECTURES.

Closes dptech-corp#159

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Development

Successfully merging this pull request may close these issues.

CUDA error 700 when docking with flexible residues
