
[WIP] Skip tests using managed memory if CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0#1576

Closed
rwgk wants to merge 10 commits into NVIDIA:main from rwgk:avoid_managed_memory_on_windows

Conversation

@rwgk
Collaborator

@rwgk rwgk commented Feb 4, 2026

Closes nvbug 5815123

Background: #1539

This PR has two stages: (1) identify the tests we need to skip, (2) add the skips.
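
For context, a minimal sketch of the skip condition being targeted (the helper name and error handling are assumptions, not code from this PR's diff): probe CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS through cuda.bindings.driver and report managed access as unsupported on any failure, so tests skip rather than crash.

```python
# Hypothetical probe; the real helper in this PR may differ. Any failure
# (bindings not installed, no driver, no device) is treated as "concurrent
# managed access unsupported".
def supports_concurrent_managed_access(device_ordinal: int = 0) -> bool:
    try:
        from cuda.bindings import driver

        (err,) = driver.cuInit(0)
        if err != driver.CUresult.CUDA_SUCCESS:
            return False
        err, value = driver.cuDeviceGetAttribute(
            driver.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS,
            device_ordinal,
        )
        return err == driver.CUresult.CUDA_SUCCESS and value == 1
    except Exception:
        return False
```

A test module could then gate managed-memory tests with something like `pytest.mark.skipif(not supports_concurrent_managed_access(), reason="CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0")`.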

@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Feb 4, 2026

The temporary commit bddca29 is a trick to identify the tests we need to skip.

Full build and test logs (internal access only):

/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-02-04+145720_build_log.txt
/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-02-04+150226_testslog.txt
smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python $ grep -a '^FAILED ' /home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-02-04+150226_testslog.txt
FAILED tests/memory_ipc/test_serialize.py::TestObjectPassing::test_main[DeviceMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_serialize.py::TestObjectPassing::test_main[PinnedMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationDirect::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationDirect::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationWithMR::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationWithMR::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferObjects::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferObjects::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferDescriptors::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferDescriptors::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCMempoolMultiple::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCMempoolMultiple::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIpcMempool::test_main[PinnedMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_memory_ipc.py::TestIpcMempool::test_main[DeviceMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingIPCDescriptors::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingIPCDescriptors::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPool::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPool::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingRegistry::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingRegistry::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcSendBuffers::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcSendBuffers::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcReexport::test_main[PinnedMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcReexport::test_main[DeviceMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int16--1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-2-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-float64-err] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-4] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint32-0xFFFFFFFF] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_copy_to - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint16-0x1234] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int8--1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_copy_from - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_external_managed[True] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-0] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int32-max] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int16-max] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_external_managed[False] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int32-min] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int16-min] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_dunder_dlpack_device_success[DummyUnifiedMemoryResource-expected2] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int8-127] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint8-255] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-2] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-float32-1.0] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-1000] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint32-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint8-0] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int8--128] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint64-err] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-256] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int32--1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_close - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-0x42] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint16-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint16-0xFFFF] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-4-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int64-err] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bad-type-str] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint32-0xDEADBEEF] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_initialization - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-neg] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_helpers.py::test_patterngen_seeds - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_helpers.py::test_patterngen_values - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_helpers.py::test_latchkernel - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-thread_local] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc_with_output[thread_local] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-no_graph] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-relaxed] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc_with_output[global] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-global] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-global] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-no_graph] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-relaxed] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc_with_output[relaxed] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-thread_local] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'

rwgk and others added 2 commits February 4, 2026 17:15
Treat missing cuMemAllocManaged as disabled access and gate managed-memory
test paths in cuda_core and cuda_bindings to avoid false failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
…_core/tests/test_launcher.py::test_launch_invalid_values
@rwgk
Collaborator Author

rwgk commented Feb 5, 2026

I backed out the band-aid change made with PR #1567 (commit 85f76f5) because it could later mask missed skips.

Cursor-generated skips (commit b9f8452) pass local testing, with the MANUALLYDISABLEDcuMemAllocManaged commit bddca29 intentionally still in place.

Running the CI to see if we still have tests that depend on cuMemAllocManaged but only run on platforms other than my dev workstation.

@rwgk
Collaborator Author

rwgk commented Feb 5, 2026

/ok to test

@github-actions

github-actions bot commented Feb 5, 2026

Move the managed-memory skip logic into cuda_python_test_helpers and point
bindings/core tests at the shared module, with path bootstrapping to prefer
in-repo helpers. This avoids relying on bindings test helpers that are absent
in 12.9.x wheels.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rwgk rwgk force-pushed the avoid_managed_memory_on_windows branch from 71f271a to a48565f on February 5, 2026 22:19
@rwgk
Collaborator Author

rwgk commented Feb 12, 2026

Closing in favor of #1607

@rwgk rwgk closed this Feb 12, 2026
github-actions bot pushed a commit that referenced this pull request Feb 13, 2026
Removed preview folders for the following PRs:
- PR #1576
@rwgk
Collaborator Author

rwgk commented Feb 17, 2026

It looks like #1618/#1607 turned out differently from what I was aiming for in this PR: resolving Windows flakiness. Reopening so I don't forget to come back to this.

(I got stuck a bit on this PR because of what's now solved under the pending #1218. Once that's merged, the test changes here should be easy.)

@rwgk rwgk reopened this Feb 17, 2026
@rwgk rwgk self-assigned this Mar 3, 2026
@rwgk rwgk added the cuda.bindings (Everything related to the cuda.bindings module), cuda.core (Everything related to the cuda.core module), and test (Improvements or additions to tests) labels Mar 3, 2026
rwgk added 4 commits March 15, 2026 20:22
Made-with: Cursor
Reuse the shared managed-memory skip helper and keep the conftest import lazy
so test bootstrap order stays intact without duplicate skip logic.

Made-with: Cursor
Restore the cuMemAllocManaged binding, validate concurrent managed access
per active device, and drop the test-helper skip for missing symbols.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

/ok to test

rwgk added 2 commits March 16, 2026 07:57
Force managed-memory skip checks to return None so tests run without CMA
filtering.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

/ok to test

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

The negative test (commit af0e03a) worked exactly as expected:

Test win-64 / py3.10, 13.0.2, local, rtxpro6000 (TCC)   fail    12m43s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110630
Test win-64 / py3.10, 13.2.0, local, rtxpro6000 (TCC)   fail    13m10s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110530
Test win-64 / py3.11, 13.0.2, wheels, rtx4090 (WDDM)    fail    6m29s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110657
Test win-64 / py3.11, 13.2.0, wheels, rtx4090 (WDDM)    fail    6m28s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110690
Test win-64 / py3.12, 13.0.2, local, a100 (TCC)         fail    7m41s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110520
Test win-64 / py3.12, 13.2.0, local, a100 (TCC)         fail    8m40s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110559
Test win-64 / py3.13, 13.0.2, wheels, rtxpro6000 (MCDM) fail    8m57s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110557
Test win-64 / py3.13, 13.2.0, wheels, rtxpro6000 (MCDM) fail    11m12s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110601
Test win-64 / py3.14, 13.0.2, local, l4 (MCDM)          fail    5m44s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110510
Test win-64 / py3.14, 13.2.0, local, l4 (MCDM)          fail    5m39s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110566
Test win-64 / py3.14t, 13.0.2, wheels, a100 (MCDM)      fail    4m24s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110583
Test win-64 / py3.14t, 13.2.0, wheels, a100 (MCDM)      fail    5m23s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110579
Test win-64 / py3.10, 12.9.1, wheels, rtx2080 (WDDM)    pass    9m9s    https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110587
Test win-64 / py3.11, 12.9.1, local, v100 (MCDM)        pass    11m36s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110522
Test win-64 / py3.12, 12.9.1, wheels, l4 (MCDM)         pass    7m24s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110560
Test win-64 / py3.13, 12.9.1, local, l4 (TCC)           pass    10m14s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110627
Test win-64 / py3.14, 12.9.1, wheels, v100 (TCC)        pass    7m42s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110680
Test win-64 / py3.14t, 12.9.1, local, l4 (TCC)          pass    9m29s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110618

However, in the meantime the idea of adding the guard code in cuda_bindings/cuda/bindings/driver.pyx.in was called into question, because there are narrowly defined valid use cases for cuMemAllocManaged() even when CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0.

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

I want to backtrack to my main goal: our tests should not be flaky.

Background:

  • flakiness seems very limited in our CI (I'm actually not aware of any flakes in our CI, although I have not looked systematically, because that is difficult)

  • I've seen around 30% flakiness in local testing on my main workstation, e.g. in 100 trials running the entire test suite:

    25  FAILED tests/test_helpers.py::test_latchkernel - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    25  FAILED tests/test_helpers.py::test_patterngen_values - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-global] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-no_graph] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-relaxed] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-thread_local] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-global] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-no_graph] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-relaxed] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-thread_local] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
  • our QA team also reported flakiness, i.e. it's not just my main workstation

Here is a Cursor-generated analysis and concrete suggestions:

What Becomes Undefined When concurrentManagedAccess == 0

The CUDA Programming Guide (Unified Memory on Windows/WSL/Tegra) states that
when concurrentManagedAccess is 0, simultaneous CPU and GPU access to managed
memory is not supported. In that mode, any host access to managed memory while
any GPU kernel is in flight is undefined, even if the kernel does not touch the
same allocation. The only safe pattern is to synchronize (stream or device, as
appropriate) before the host touches managed memory.
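
In code, the safe pattern reads roughly like this (a pure-Python sketch with assumed helper names; `device.sync()` stands in for a cuda.core `Device.sync()`-style full device sync):

```python
# Sketch: on devices where concurrentManagedAccess == 0, drain ALL in-flight
# GPU work before the host touches managed memory. Stream-local sync is not
# enough, because a kernel in flight on any other stream also makes host
# access undefined.
def host_read_managed(managed_view, device, concurrent_managed_access: bool) -> bytes:
    if not concurrent_managed_access:
        device.sync()  # full device sync; assumed Device.sync()-like API
    return bytes(managed_view)  # host access is safe only after the sync
```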

Why The Flaky Tests Look Like Victims

The Windows OSError: [WinError -1073741818] crashes align with tests that do
host-side reads or writes of managed memory while relying only on stream-local
synchronization. The shared test helpers (for example scratch buffers and
compare helpers) allocate managed memory and call memset/memcmp on the host.
If any kernel is still running on any stream, that host access can fault on
devices where concurrentManagedAccess == 0. Graph tests and helper tests
exercise these paths and are therefore susceptible to cross-test or cross-stream
in-flight work.

Practical Path To Well-Behaved Tests (No Flakes)

  1. Guard host managed-memory access on CMA=0
    Add a small helper (in helpers/buffers.py) that calls Device.sync() (or
    otherwise ensures no work is in flight) before any host memset/memcmp of
    managed memory when concurrentManagedAccess == 0. This is targeted and
    keeps behavior unchanged on CMA=1 systems.

  2. Use pinned or host memory for scratch on CMA=0
    Replace scratch buffers used only for host comparisons with pinned or host
    allocations when CMA=0, avoiding managed memory host access entirely in those
    helper paths.

  3. Add an autouse device-sync fixture on CMA=0
    As a coarse safety net, synchronize the device after each test when
    concurrentManagedAccess == 0. This reduces cross-test contamination but
    does not fix in-test undefined access, so it is best combined with (1) or (2).

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

Closing: see #1769 (comment)

@rwgk rwgk closed this Mar 16, 2026
github-actions bot pushed a commit that referenced this pull request Mar 17, 2026
Removed preview folders for the following PRs:
- PR #1576
- PR #1729
- PR #1766
@rwgk rwgk deleted the avoid_managed_memory_on_windows branch March 17, 2026 15:30