cuda: enable CUDA feature extraction on Windows (MSYS2/MinGW)#1472
birkdev wants to merge 2 commits into Netflix:master
On Windows, nvcc uses MSVC's cl.exe as its host compiler for preprocessing. Several headers included by .cu files contain MinGW/GCC-specific constructs that cl.exe cannot process:

- Remove unnecessary #include <pthread.h> from cuda/common.h. No declarations in this header use pthread types; consumers that need pthread (ring_buffer.c, libvmaf.c) include it directly.
- In cuda_helper.cuh, use <cuda.h> (from CUDA toolkit) instead of <ffnvcodec/dynlink_cuda.h> (from MinGW) for the DEVICE_CODE path. Device code only needs CUDA driver API types, not the dynamic loader machinery.
- In picture.h, use <cuda.h> and forward-declare VmafCudaState for the DEVICE_CODE path, avoiding ffnvcodec and libvmaf_cuda.h which pull in host-only dependencies.
- Guard C99 designated initializers in integer_adm.h with #ifndef __CUDACC__, as nvcc compiles .cu files as C++ where this syntax is not portable across host compilers.
- Guard #include "feature_collector.h" in ADM .cu files with #ifndef DEVICE_CODE. This header is host-only (contains pthread usage) and is never referenced by device kernel code.

These changes are no-ops on Linux where nvcc uses GCC (which handles all of the above natively). They enable CUDA compilation on Windows where nvcc must use MSVC's cl.exe.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
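The two guard patterns described in the first commit can be sketched in a single self-contained C file. This is only an illustration under stated assumptions: the names ExampleCudaState, ExamplePicture, example_default, and example_default_width are hypothetical stand-ins and not the actual libvmaf declarations. Only the DEVICE_CODE and __CUDACC__ macro names come from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for a header shared by host .c and device .cu files. */
#ifdef DEVICE_CODE
/* Device path: an opaque forward declaration is enough, so host-only
 * headers (pthread, dynamic loader headers) are never pulled in. */
typedef struct ExampleCudaState ExampleCudaState;
#else
/* Host path: the full definition and any host-only includes live here. */
typedef struct ExampleCudaState {
    int device_id;
} ExampleCudaState;
#endif

/* Both paths only handle the state through a pointer, so the
 * incomplete type on the device path is sufficient. */
typedef struct ExamplePicture {
    ExampleCudaState *state;
    unsigned w, h;
} ExamplePicture;

#ifndef __CUDACC__
/* C99 designated initializers, hidden whenever nvcc is the compiler. */
static const ExamplePicture example_default = {
    .state = NULL,
    .w = 1920,
    .h = 1080,
};
#endif

/* Returns the default width, or -1 when the table is hidden from nvcc. */
int example_default_width(void)
{
#ifndef __CUDACC__
    return (int)example_default.w;
#else
    return -1;
#endif
}
```

Compiled as plain C with neither macro defined, the host branch is taken and example_default_width() returns 1920. Under nvcc, __CUDACC__ is defined, so the initializer table is never seen by the host compiler.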
Enable CUDA feature extraction on Windows (MSYS2/MinGW) by handling the nvcc + MSVC toolchain requirements in the meson build:

- Auto-detect MSVC cl.exe via vswhere without adding it to PATH (which would cause meson to pick MSVC as the default C compiler instead of GCC). Pass it to nvcc via -ccbin.
- Discover MSVC and Windows SDK include directories and pass them as -I flags to nvcc, since cl.exe runs outside a vcvars environment and cannot find system headers otherwise.
- Add -D_USE_MATH_DEFINES so MSVC's math.h exposes M_PI.

On Linux, all new code paths are skipped (guarded by host_machine.system() == 'windows').

Tested with CUDA 13.2, MSVC Build Tools (VS 18), Windows 11 SDK, and MinGW GCC 15.2.0 on MSYS2. The full pipeline (nvcc -> fatbin -> bin2c -> static lib -> shared lib) completes successfully, and the resulting libvmaf integrates with FFmpeg's libvmaf_cuda filter.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
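To make the shape of the resulting nvcc invocation concrete, here is a rough sketch in C of how the flags named in the second commit (-ccbin, -D_USE_MATH_DEFINES, -I) fit together. The real logic lives in the meson build, not in C. The function build_nvcc_command, its helper append3, and every path used with it are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Append prefix + value + suffix to out; fail instead of truncating. */
static int append3(char *out, size_t cap, size_t *len,
                   const char *prefix, const char *value, const char *suffix)
{
    int n = snprintf(out + *len, cap - *len, "%s%s%s", prefix, value, suffix);
    if (n < 0 || (size_t)n >= cap - *len)
        return -1;
    *len += (size_t)n;
    return 0;
}

/* Hypothetical helper: assemble an nvcc command line that names cl.exe
 * explicitly (so it need not be on PATH) and lists the MSVC / Windows SDK
 * include directories that a vcvars environment would otherwise supply.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_nvcc_command(char *out, size_t cap, const char *cl_path,
                       const char *const *include_dirs, size_t n_dirs,
                       const char *source)
{
    size_t len = 0;
    size_t i;

    if (cap == 0)
        return -1;
    if (append3(out, cap, &len, "nvcc", "", "") != 0)
        return -1;
    /* Host compiler passed directly to nvcc. */
    if (append3(out, cap, &len, " -ccbin \"", cl_path, "\"") != 0)
        return -1;
    /* Lets MSVC's math.h expose M_PI. */
    if (append3(out, cap, &len, " -D_USE_MATH_DEFINES", "", "") != 0)
        return -1;
    /* One -I flag per discovered system include directory. */
    for (i = 0; i < n_dirs; i++) {
        if (append3(out, cap, &len, " -I\"", include_dirs[i], "\"") != 0)
            return -1;
    }
    return append3(out, cap, &len, " ", source, "");
}
```

Passing the compiler path through -ccbin rather than PATH is what keeps meson from selecting MSVC as the project's default C compiler, as the commit message notes.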
I believe nv-codec-headers should work on Windows.
You're right, nv-codec-headers works fine on Windows for the host code (compiled by GCC/MinGW). The Separately, the CUDA code in vmaf requires nv-codec-headers built from git master (specifically commit
Summary
Enables building libvmaf with -Denable_cuda=true on Windows using MSYS2/MinGW, closing the gap described in #1154.

On Windows, nvcc requires MSVC's cl.exe as its host compiler for preprocessing .cu files, even when the rest of the project is built with MinGW GCC. This creates two categories of issues that this PR addresses:

- Source portability (commit 1): Headers included by .cu files contain constructs that cl.exe cannot process: <pthread.h> (POSIX-only), <ffnvcodec/dynlink_*.h> (installed in MinGW paths), and C99 designated initializers (not supported by nvcc in C++ mode with MSVC). These are resolved with #ifdef DEVICE_CODE / #ifndef __CUDACC__ guards that are no-ops on Linux.
- Build system (commit 2): The meson build auto-detects cl.exe via vswhere (without polluting PATH, which would cause meson to pick MSVC as the default compiler), discovers MSVC and Windows SDK include directories, and passes them to nvcc as -I flags since cl.exe runs outside a vcvars environment.

Prerequisites on Windows
- nvcc, bin2c)
- 876af32 or later). The latest release (n13.0.19.0) is missing several CudaFunctions members that vmaf uses (cuMemFreeHost, cuStreamCreateWithPriority, cuLaunchHostFunc, etc.). This is a pre-existing issue, not specific to this PR.
- cl.exe, needed by nvcc for preprocessing)

Test results
Tested on Windows 11 with:
The full pipeline works end-to-end:
- meson setup -Denable_cuda=true configures successfully
- ninja compiles all 7 .cu files to fatbin, library links
- libvmaf_cuda filter scores at 1,135 fps on 1080p60 content (37-minute video scored in 2 minutes)

Transparency note
This patch was developed with assistance from Claude (Anthropic's AI), as reflected in the Co-Authored-By lines. All changes were manually tested on real hardware and real video content.
Closes #1154
🤖 Generated with Claude Code