docs: add Windows ROCm/HIP build guide #2041

Open
T0nd3 wants to merge 2 commits into OpenNMT:master from T0nd3:docs/windows-rocm-build-guide


T0nd3 commented May 11, 2026

Summary

  • Adds `docs/building_rocm_windows.md`: a step-by-step guide for building CTranslate2 with AMD GPU support (ROCm/HIP) on Windows, validated on RX 7900 XTX (gfx1100) with ROCm 7.2
  • Updates `docs/hardware_support.md` to include an AMD/ROCm GPU section (previously unmentioned)
  • Updates `docs/installation.md` with links to the new guide in the pip and build-from-source sections
  • Adds the new page to the `docs/index.rst` toctree under Get started

Motivation

ROCm support was merged in v4.7.0 (February 2026), but no documentation existed for building from source on Windows. The Windows build has several non-obvious requirements that differ from the Linux CI script:

  • ROCm must be installed via Python wheels (not the AMD HIP SDK installer)
  • CMake requires explicit paths for Ninja and rc.exe when Clang is the compiler
  • All CMake paths must use forward slashes to avoid escape sequence errors in the cache
  • tar.exe -C is unreliable on Windows; Python tarfile is the safe alternative
  • os.add_dll_directory must be called before importing the module at runtime
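
Two of the points above (forward-slash CMake paths and avoiding `tar.exe -C`) can be illustrated with a minimal Python sketch. The helper names `cmake_path`, `cmake_tool_args`, and `extract_archive` are hypothetical and not taken from the guide. `CMAKE_MAKE_PROGRAM` and `CMAKE_RC_COMPILER` are standard CMake variables, but whether the guide passes exactly these is an assumption.

```python
import tarfile
from pathlib import Path, PureWindowsPath


def cmake_path(path: str) -> str:
    """Convert a Windows path to forward slashes so CMake does not
    interpret backslashes as escape sequences in its cache."""
    return PureWindowsPath(path).as_posix()


def cmake_tool_args(ninja_exe: str, rc_exe: str) -> list:
    """Build -D arguments that point CMake at Ninja and rc.exe explicitly.
    Assumption: the standard CMake variables are what the build needs."""
    return [
        f"-DCMAKE_MAKE_PROGRAM={cmake_path(ninja_exe)}",
        f"-DCMAKE_RC_COMPILER={cmake_path(rc_exe)}",
    ]


def extract_archive(archive: str, dest: str) -> None:
    """Extract a tar archive into dest with Python's tarfile module,
    as an alternative to `tar.exe -C dest`."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
```

`PureWindowsPath` is used so the conversion behaves the same regardless of the platform the script runs on.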

Test plan

  • Full build completed successfully on Windows 11, RX 7900 XTX (gfx1100), ROCm 7.2
  • Smoke test passed: get_cuda_device_count() == 1, all compute types (float16, bfloat16, int8) available
  • Docs render correctly in Sphinx (MyST admonitions used consistently with existing docs)
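
The smoke test above could look roughly like the following sketch. It registers DLL directories with `os.add_dll_directory` before importing the module, then queries the device count and compute types. The helper names and the directory passed in are placeholders. The sketch also assumes that the ROCm build reports its GPU under the `"cuda"` device name, which the `get_cuda_device_count()` result above suggests but does not confirm.

```python
import importlib
import os
import sys


def register_dll_dirs(dll_dirs):
    """Register DLL search directories and return the handles.
    os.add_dll_directory exists only on Windows, so this is a no-op elsewhere."""
    handles = []
    if sys.platform == "win32":
        for directory in dll_dirs:
            if os.path.isdir(directory):
                handles.append(os.add_dll_directory(directory))
    return handles


def smoke_test(dll_dirs):
    """Import ctranslate2 after registering DLL directories and
    return (device_count, supported_compute_types)."""
    register_dll_dirs(dll_dirs)  # must run before the import below
    ctranslate2 = importlib.import_module("ctranslate2")
    count = ctranslate2.get_cuda_device_count()
    # Assumption: the HIP build exposes the GPU as the "cuda" device.
    types = ctranslate2.get_supported_compute_types("cuda")
    return count, types
```

Calling `smoke_test([...])` with the directory that holds the ROCm DLLs (its location depends on the local install) should return a count of 1 and a set containing `float16`, `bfloat16`, and `int8` on a setup like the one tested here.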

Adds a step-by-step guide for building CTranslate2 with AMD GPU support
(ROCm/HIP) on Windows, validated on RX 7900 XTX (gfx1100) with ROCm 7.2.

Changes:
- docs/building_rocm_windows.md: new detailed guide covering all
  prerequisites (VS Build Tools, ROCm wheels, Intel MKL, oneDNN),
  CMake configuration, C++ and Python builds, and known limitations
- docs/hardware_support.md: add AMD/ROCm GPU section
- docs/installation.md: link to ROCm guide from pip and build sections
- docs/index.rst: add building_rocm_windows to the Get started toctree
…ible

Issue OpenNMT#2016 reports that bundling CTranslate2 inside a PyInstaller `.exe`
breaks when the end user installs the standalone AMD HIP SDK Installer:
the wheel is linked against `hipblas.dll` (the name in ROCm 7.2, which is
what the rocm-sdk pip wheels ship), but the latest publicly available
AMD HIP SDK Installer is still on 7.1.1, where the same library is named
`libhipblas.dll`. The Windows dynamic loader can't find the DLL the
wheel asks for and `import ctranslate2` fails with a generic
"module not found" error.

Add a `Why not the official AMD HIP SDK Installer?` subsection right
under the existing "ROCm via Python wheels" instructions that:
  - documents the version skew and the DLL name change,
  - explicitly recommends the Python wheels for CTranslate2,
  - explains the two options for users who need the .exe installer
    (build against 7.1, or wait for AMD to ship a 7.2 installer).

The original wording already said "this method does not require the AMD
HIP SDK installer", but didn't make it obvious that mixing the two is
actively broken — which is exactly what the issue reporter ran into.
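
To check which naming a given machine has, a small diagnostic along these lines may help. It is a sketch only: `find_hipblas` is a hypothetical helper, the directories to scan are up to the caller, and the two file names are the ones quoted in this commit message.

```python
from pathlib import Path

# DLL names as described above: ROCm 7.2 wheels vs. the 7.1.1 HIP SDK Installer.
HIPBLAS_NAMES = {
    "hipblas.dll": "ROCm 7.2 naming (rocm-sdk pip wheels)",
    "libhipblas.dll": "HIP SDK Installer 7.1.1 naming",
}


def find_hipblas(directories):
    """Return {dll_name: [directories containing it]} for both known names.
    File names are compared case-insensitively, as on Windows."""
    found = {name: [] for name in HIPBLAS_NAMES}
    for directory in directories:
        path = Path(directory)
        if not path.is_dir():
            continue
        present = {entry.name.lower() for entry in path.iterdir()}
        for name in HIPBLAS_NAMES:
            if name in present:
                found[name].append(str(path))
    return found
```

If only `libhipblas.dll` turns up in the scanned directories, a wheel linked against `hipblas.dll` would be expected to fail at import time, which matches the failure reported in the issue.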