Atomics on 16 bits: prevent reading 4 bytes for 2-byte locations. #3005
+37 −7
ROCm Repo Management API / Tests / Tests / Test PyTorch / Run pytorch_test_2
failed Feb 27, 2026 in 0s
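The PR title refers to the usual way sub-word atomics are emulated on hardware that only provides 32-bit atomic operations: read the containing aligned 4-byte word, splice the 2-byte value into the right lane, and compare-and-swap the word back. The sketch below (ours, not the PR's code) shows the lane-splicing arithmetic and why the 4-byte read is the hazard: the neighbouring 2 bytes may lie outside the allocation (e.g. at the end of a mapped page) and fault.

```python
# Sketch (not the PR's code): lane splicing used when a 16-bit atomic RMW
# is emulated via the containing aligned 32-bit word. Reading 4 bytes for
# a 2-byte location is what the PR title says must be avoided.

def splice_u16_into_u32(word: int, addr: int, value: int) -> int:
    """Return the 32-bit `word` with the 16-bit lane at byte offset
    `addr & 2` replaced by `value` (little-endian lane selection)."""
    shift = (addr & 2) * 8          # 0 for the low half, 16 for the high half
    mask = 0xFFFF << shift
    return (word & ~mask) | ((value & 0xFFFF) << shift)

# Example: storing 0x1234 into either half of an aligned 32-bit word
print(hex(splice_u16_into_u32(0xAABBCCDD, 0, 0x1234)))  # 0xaabb1234
print(hex(splice_u16_into_u32(0xAABBCCDD, 2, 0x1234)))  # 0x1234ccdd
```

The fix the title describes is to skip this word-level round trip for 2-byte locations and operate only on the 2 bytes actually owned.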
failed: 4, skipped: 54, passed: 1184
Details
GPUTests.test_softmax_cuda
torch._inductor.exc.InductorError: RuntimeError: CUDA driver error: 301
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_torchinductor.py GPUTests.test_softmax_cuda
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Stack trace
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 15650, in new_test
return value(self)
^^^^^^^^^^^
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 3824, in test_softmax
self.common(fn, (torch.randn(8, 8), torch.randn(8, 8)))
File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 715, in check_model_gpu
check_model(
File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 518, in check_model
actual = run(*example_inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1029, in compile_wrapper
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1046, in _compile_fx_inner
raise InductorError(e, currentframe()).with_traceback(
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1030, in _compile_fx_inner
mb_compiled_graph = fx_codegen_and_compile(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1791, in fx_codegen_and_compile
return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1563, in codegen_and_compile
compiled_module = graph.compile_to_module()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2494, in compile_to_module
return self._compile_to_module()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2504, in _compile_to_module
mod = self._compile_to_module_lines(wrapper_code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2579, in _compile_to_module_lines
mod = PyCodeCache.load_by_key_path(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3747, in load_by_key_path
mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/tmp6yczqejw/tm/ctmbbejxyfriagath4hhkuznryxqtnhjp4gk44hdpwdepbiowka6.py", line 192, in <module>
async_compile.wait(globals())
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 699, in wait
self._wait_futures(scope)
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 719, in _wait_futures
kernel = result.result()
^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 4494, in result
return self.result_fn()
^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 453, in get_result
kernel.precompile(
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 503, in precompile
self._make_launchers()
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 659, in _make_launchers
launchers.append(result.make_launcher())
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 1795, in make_launcher
self.kernel.load_kernel(device)
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/static_triton_launcher.py", line 143, in load_kernel
(self.function, self.n_regs, self.n_spills) = self.C_impl._load_kernel(
^^^^^^^^^^^^^^^^^^^^^^^^^
torch._inductor.exc.InductorError: RuntimeError: CUDA driver error: 301
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_torchinductor.py GPUTests.test_softmax_cuda
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Standard out
stats [('calls_captured', 4)]
aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
inductor [('pattern_matcher_nodes', 12), ('async_compile_cache_miss', 4), ('pattern_matcher_count', 3), ('async_compile_cache_hit', 2), ('fxgraph_cache_miss', 1)]
graph_break []
TestCudaMallocAsync.test_device_memory_used
AssertionError: False is not true
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/test_cuda.py TestCudaMallocAsync.test_device_memory_used
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
Stack trace
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 4960, in test_device_memory_used
self.assertTrue(num_bytes // 32 <= mem_bytes <= num_bytes * 32)
File "/opt/conda/envs/py_3.12/lib/python3.12/unittest/case.py", line 727, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true
To execute this test, run the following from the base repo dir:
PYTORCH_TEST_WITH_ROCM=1 python test/test_cuda.py TestCudaMallocAsync.test_device_memory_used
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
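The failing assertion at test_cuda.py:4960 is a loose sanity bound: the device-memory reading must fall within a factor of 32 of the bytes the test itself allocated, in either direction. A minimal sketch of that check (the helper name is ours, not the test's):

```python
# Sketch of the bound asserted by test_device_memory_used (helper name is
# ours): num_bytes // 32 <= mem_bytes <= num_bytes * 32.
def within_factor(num_bytes: int, mem_bytes: int, factor: int = 32) -> bool:
    return num_bytes // factor <= mem_bytes <= num_bytes * factor

print(within_factor(1 << 20, 1 << 20))   # True: reading matches allocation
print(within_factor(1 << 20, 64 << 20))  # False: 64x over the bound, the
                                         # shape of the failure above
```

A reading outside the bound typically means the API returned memory for the whole device (or another process) rather than this test's allocation.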