Atomics on 16 bits: prevent reading 4 bytes for 2-byte locations. #3005
+37 −7
ROCm Repo Management API / Tests / Tests / Test PyTorch / Run pytorch_test_2
failed Feb 27, 2026 in 0s
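The PR title refers to the usual way sub-word atomics are emulated on hardware that only provides 32-bit atomic operations: read the containing aligned 4-byte word, splice the 2-byte value into the right lane, and compare-and-swap the word back. The sketch below (ours, not the PR's code) shows the lane-splicing arithmetic and why the 4-byte read is the hazard: the neighbouring 2 bytes may lie outside the allocation (e.g. at the end of a mapped page) and fault.

```python
# Sketch (not the PR's code): lane splicing used when a 16-bit atomic RMW
# is emulated via the containing aligned 32-bit word. Reading 4 bytes for
# a 2-byte location is what the PR title says must be avoided.

def splice_u16_into_u32(word: int, addr: int, value: int) -> int:
    """Return the 32-bit `word` with the 16-bit lane at byte offset
    `addr & 2` replaced by `value` (little-endian lane selection)."""
    shift = (addr & 2) * 8          # 0 for the low half, 16 for the high half
    mask = 0xFFFF << shift
    return (word & ~mask) | ((value & 0xFFFF) << shift)

# Example: storing 0x1234 into either half of an aligned 32-bit word
print(hex(splice_u16_into_u32(0xAABBCCDD, 0, 0x1234)))  # 0xaabb1234
print(hex(splice_u16_into_u32(0xAABBCCDD, 2, 0x1234)))  # 0x1234ccdd
```

The fix the title describes is to skip this word-level round trip for 2-byte locations and operate only on the 2 bytes actually owned.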
failed: 4, skipped: 54, passed: 1184
Details
GPUTests.test_softmax_cuda
torch._inductor.exc.InductorError: RuntimeError: CUDA driver error: 301
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_torchinductor.py GPUTests.test_softmax_cuda
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Stack trace
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 15650, in new_test
return value(self)
^^^^^^^^^^^
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 3824, in test_softmax
self.common(fn, (torch.randn(8, 8), torch.randn(8, 8)))
File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 715, in check_model_gpu
check_model(
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 518, in check_model
actual = run(*example_inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1029, in compile_wrapper
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1046, in _compile_fx_inner
raise InductorError(e, currentframe()).with_traceback(
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1030, in _compile_fx_inner
mb_compiled_graph = fx_codegen_and_compile(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1791, in fx_codegen_and_compile
return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1563, in codegen_and_compile
compiled_module = graph.compile_to_module()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2494, in compile_to_module
return self._compile_to_module()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2504, in _compile_to_module
mod = self._compile_to_module_lines(wrapper_code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2579, in _compile_to_module_lines
mod = PyCodeCache.load_by_key_path(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3747, in load_by_key_path
mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/tmp6yczqejw/tm/ctmbbejxyfriagath4hhkuznryxqtnhjp4gk44hdpwdepbiowka6.py", line 192, in <module>
async_compile.wait(globals())
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 699, in wait
self._wait_futures(scope)
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 719, in _wait_futures
kernel = result.result()
^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 4494, in result
return self.result_fn()
^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 453, in get_result
kernel.precompile(
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 503, in precompile
self._make_launchers()
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 659, in _make_launchers
launchers.append(result.make_launcher())
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 1795, in make_launcher
self.kernel.load_kernel(device)
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/static_triton_launcher.py", line 143, in load_kernel
(self.function, self.n_regs, self.n_spills) = self.C_impl._load_kernel(
^^^^^^^^^^^^^^^^^^^^^^^^^
torch._inductor.exc.InductorError: RuntimeError: CUDA driver error: 301
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_torchinductor.py GPUTests.test_softmax_cuda
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Standard out
stats [('calls_captured', 4)]
aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
inductor [('pattern_matcher_nodes', 12), ('async_compile_cache_miss', 4), ('pattern_matcher_count', 3), ('async_compile_cache_hit', 2), ('fxgraph_cache_miss', 1)]
graph_break []
TestCudaMallocAsync.test_device_memory_used
AssertionError: False is not true
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/test_cuda.py TestCudaMallocAsync.test_device_memory_used
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Stack trace
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 4960, in test_device_memory_used
self.assertTrue(num_bytes // 32 <= mem_bytes <= num_bytes * 32)
File "/opt/conda/envs/py_3.12/lib/python3.12/unittest/case.py", line 727, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/test_cuda.py TestCudaMallocAsync.test_device_memory_used
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
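The failing assertion at test_cuda.py:4960 is a loose sanity bound: the device-memory reading must fall within a factor of 32 of the bytes the test itself allocated, in either direction. A minimal sketch of that check (the helper name is ours, not the test's):

```python
# Sketch of the bound asserted by test_device_memory_used (helper name is
# ours): num_bytes // 32 <= mem_bytes <= num_bytes * 32.
def within_factor(num_bytes: int, mem_bytes: int, factor: int = 32) -> bool:
    return num_bytes // factor <= mem_bytes <= num_bytes * factor

print(within_factor(1 << 20, 1 << 20))   # True: reading matches allocation
print(within_factor(1 << 20, 64 << 20))  # False: 64x over the bound, the
                                         # shape of the failure above
```

A reading outside the bound typically means the API returned memory for the whole device (or another process) rather than this test's allocation.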