
[WIP] Skip tests using managed memory if CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0#1576

Closed
rwgk wants to merge 10 commits into NVIDIA:main from rwgk:avoid_managed_memory_on_windows

Conversation

@rwgk
Collaborator

@rwgk rwgk commented Feb 4, 2026

Closes nvbug 5815123

Background: #1539

This PR has two stages: (1) identify the tests we need to skip, (2) add the skips.
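
For context, a minimal sketch of the skip condition being targeted (the helper name and error handling are assumptions, not code from this PR's diff): probe CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS through cuda.bindings.driver and report managed access as unsupported on any failure, so tests skip rather than crash.

```python
# Hypothetical probe; the real helper in this PR may differ. Any failure
# (bindings not installed, no driver, no device) is treated as "concurrent
# managed access unsupported".
def supports_concurrent_managed_access(device_ordinal: int = 0) -> bool:
    try:
        from cuda.bindings import driver

        (err,) = driver.cuInit(0)
        if err != driver.CUresult.CUDA_SUCCESS:
            return False
        err, value = driver.cuDeviceGetAttribute(
            driver.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS,
            device_ordinal,
        )
        return err == driver.CUresult.CUDA_SUCCESS and value == 1
    except Exception:
        return False
```

A test module could then gate managed-memory tests with something like `pytest.mark.skipif(not supports_concurrent_managed_access(), reason="CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0")`.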

@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Feb 4, 2026

The temporary commit bddca29 is a trick to identify the tests we need to skip.

Full build and test logs (internal access only):

/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-02-04+145720_build_log.txt
/home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-02-04+150226_testslog.txt
smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python $ grep -a '^FAILED ' /home/scratch.rgrossekunst_sw/logs_mirror/smc120-0009.ipp2a2.colossus/logs/cuda-python_qa_bindings_linux_2026-02-04+150226_testslog.txt
FAILED tests/memory_ipc/test_serialize.py::TestObjectPassing::test_main[DeviceMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_serialize.py::TestObjectPassing::test_main[PinnedMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationDirect::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationDirect::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationWithMR::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_serialize.py::TestObjectSerializationWithMR::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferObjects::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferObjects::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferDescriptors::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCSharedAllocationHandleAndBufferDescriptors::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCMempoolMultiple::test_main[DeviceMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIPCMempoolMultiple::test_main[PinnedMR] - AssertionError: assert 1 == 0
FAILED tests/memory_ipc/test_memory_ipc.py::TestIpcMempool::test_main[PinnedMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_memory_ipc.py::TestIpcMempool::test_main[DeviceMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingIPCDescriptors::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingIPCDescriptors::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPool::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPool::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingRegistry::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_workerpool.py::TestIpcWorkerPoolUsingRegistry::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcSendBuffers::test_main[1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcSendBuffers::test_main[3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcReexport::test_main[PinnedMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/memory_ipc/test_send_buffers.py::TestIpcReexport::test_main[DeviceMR] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int16--1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-2-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-float64-err] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-4] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint32-0xFFFFFFFF] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_copy_to - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint16-0x1234] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int8--1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_copy_from - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_external_managed[True] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-0] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int32-max] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int16-max] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_external_managed[False] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int32-min] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int16-min] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_dunder_dlpack_device_success[DummyUnifiedMemoryResource-expected2] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-3] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int8-127] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint8-255] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-2] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-float32-1.0] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-1000] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint32-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint8-0] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int8--128] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint64-err] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-256] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int32--1] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_close - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-0x42] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint16-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint16-0xFFFF] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bytes-4-bad-size] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-int64-err] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-bad-type-str] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-np-uint32-0xDEADBEEF] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_initialization - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_memory.py::test_buffer_fill[unified-int-neg] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_helpers.py::test_patterngen_seeds - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_helpers.py::test_patterngen_values - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_helpers.py::test_latchkernel - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-thread_local] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc_with_output[thread_local] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-no_graph] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-relaxed] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc_with_output[global] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-global] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[fill-global] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-no_graph] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-relaxed] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc_with_output[relaxed] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'
FAILED tests/test_graph_mem.py::test_graph_alloc[incr-thread_local] - AttributeError: module 'cuda.bindings.driver' has no attribute 'cuMemAllocManaged'

rwgk and others added 2 commits February 4, 2026 17:15
Treat missing cuMemAllocManaged as disabled access and gate managed-memory
test paths in cuda_core and cuda_bindings to avoid false failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
…_core/tests/test_launcher.py::test_launch_invalid_values
@rwgk
Collaborator Author

rwgk commented Feb 5, 2026

I backed out the band-aid change made with PR #1567 (commit 85f76f5) because it could later mask missed skips.

Cursor-generated skips (commit b9f8452) pass local testing, with the MANUALLYDISABLEDcuMemAllocManaged commit bddca29 intentionally still in place.

Running the CI to see if we still have tests that depend on cuMemAllocManaged but only run on platforms other than my dev workstation.

@rwgk
Collaborator Author

rwgk commented Feb 5, 2026

/ok to test

@github-actions

github-actions bot commented Feb 5, 2026

Move the managed-memory skip logic into cuda_python_test_helpers and point
bindings/core tests at the shared module, with path bootstrapping to prefer
in-repo helpers. This avoids relying on bindings test helpers that are absent
in 12.9.x wheels.

Co-authored-by: Cursor <cursoragent@cursor.com>
@rwgk rwgk force-pushed the avoid_managed_memory_on_windows branch from 71f271a to a48565f on February 5, 2026 22:19
@rwgk
Collaborator Author

rwgk commented Feb 12, 2026

Closing in favor of #1607

@rwgk rwgk closed this Feb 12, 2026
github-actions bot pushed a commit that referenced this pull request Feb 13, 2026
Removed preview folders for the following PRs:
- PR #1576
@rwgk
Collaborator Author

rwgk commented Feb 17, 2026

It looks like #1618/#1607 turned out differently from what I was aiming for in this PR: resolving Windows flakiness. Reopening so I don't forget to come back to this.

(I got stuck a bit on this PR because of what's now solved under the pending #1218. Once that's merged, the test changes here should be easy.)

@rwgk rwgk reopened this Feb 17, 2026
@rwgk rwgk self-assigned this Mar 3, 2026
@rwgk rwgk added the cuda.bindings (Everything related to the cuda.bindings module), cuda.core (Everything related to the cuda.core module), and test (Improvements or additions to tests) labels Mar 3, 2026
rwgk added 4 commits March 15, 2026 20:22
Made-with: Cursor
Reuse the shared managed-memory skip helper and keep the conftest import lazy
so test bootstrap order stays intact without duplicate skip logic.

Made-with: Cursor
Restore the cuMemAllocManaged binding, validate concurrent managed access
per active device, and drop the test-helper skip for missing symbols.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

/ok to test

rwgk added 2 commits March 16, 2026 07:57
Force managed-memory skip checks to return None so tests run without CMA
filtering.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

/ok to test

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

The negative test (commit af0e03a) worked exactly as expected:

Test win-64 / py3.10, 13.0.2, local, rtxpro6000 (TCC)   fail    12m43s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110630
Test win-64 / py3.10, 13.2.0, local, rtxpro6000 (TCC)   fail    13m10s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110530
Test win-64 / py3.11, 13.0.2, wheels, rtx4090 (WDDM)    fail    6m29s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110657
Test win-64 / py3.11, 13.2.0, wheels, rtx4090 (WDDM)    fail    6m28s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110690
Test win-64 / py3.12, 13.0.2, local, a100 (TCC)         fail    7m41s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110520
Test win-64 / py3.12, 13.2.0, local, a100 (TCC)         fail    8m40s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110559
Test win-64 / py3.13, 13.0.2, wheels, rtxpro6000 (MCDM) fail    8m57s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110557
Test win-64 / py3.13, 13.2.0, wheels, rtxpro6000 (MCDM) fail    11m12s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110601
Test win-64 / py3.14, 13.0.2, local, l4 (MCDM)          fail    5m44s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110510
Test win-64 / py3.14, 13.2.0, local, l4 (MCDM)          fail    5m39s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110566
Test win-64 / py3.14t, 13.0.2, wheels, a100 (MCDM)      fail    4m24s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110583
Test win-64 / py3.14t, 13.2.0, wheels, a100 (MCDM)      fail    5m23s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110579
Test win-64 / py3.10, 12.9.1, wheels, rtx2080 (WDDM)    pass    9m9s    https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110587
Test win-64 / py3.11, 12.9.1, local, v100 (MCDM)        pass    11m36s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110522
Test win-64 / py3.12, 12.9.1, wheels, l4 (MCDM)         pass    7m24s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110560
Test win-64 / py3.13, 12.9.1, local, l4 (TCC)           pass    10m14s  https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110627
Test win-64 / py3.14, 12.9.1, wheels, v100 (TCC)        pass    7m42s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110680
Test win-64 / py3.14t, 12.9.1, local, l4 (TCC)          pass    9m29s   https://github.com/NVIDIA/cuda-python/actions/runs/23150259719/job/67252110618

However, in the meantime the idea of adding the guard code in cuda_bindings/cuda/bindings/driver.pyx.in was called into question, because there are narrowly defined valid use cases for cuMemAllocManaged() even when CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0.

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

I want to backtrack to my main goal: our tests should not be flaky.

Background:

  • flakiness seems very limited in our CI (I'm actually not aware of any flakes in our CI, although I have not looked systematically, because that is difficult)

  • I've seen around 30% flakiness in local testing on my main workstation, e.g. in 100 trials running the entire test suite:

    25  FAILED tests/test_helpers.py::test_latchkernel - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    25  FAILED tests/test_helpers.py::test_patterngen_values - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-global] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-no_graph] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-relaxed] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[fill-thread_local] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-global] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-no_graph] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-relaxed] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
    19  FAILED tests/test_graph_mem.py::test_graph_alloc[incr-thread_local] - OSError: [WinError -1073741818] Windows Error 0xcNNNNNNN
  • our QA team also reported flakiness, i.e. it's not just my main workstation

Here is a Cursor-generated analysis and concrete suggestions:

What Becomes Undefined When concurrentManagedAccess == 0

The CUDA Programming Guide (Unified Memory on Windows/WSL/Tegra) states that
when concurrentManagedAccess is 0, simultaneous CPU and GPU access to managed
memory is not supported. In that mode, any host access to managed memory while
any GPU kernel is in flight is undefined, even if the kernel does not touch the
same allocation. The only safe pattern is to synchronize (stream or device, as
appropriate) before the host touches managed memory.
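
In code, the safe pattern reads roughly like this (a pure-Python sketch with assumed helper names; `device.sync()` stands in for a cuda.core `Device.sync()`-style full device sync):

```python
# Sketch: on devices where concurrentManagedAccess == 0, drain ALL in-flight
# GPU work before the host touches managed memory. Stream-local sync is not
# enough, because a kernel in flight on any other stream also makes host
# access undefined.
def host_read_managed(managed_view, device, concurrent_managed_access: bool) -> bytes:
    if not concurrent_managed_access:
        device.sync()  # full device sync; assumed Device.sync()-like API
    return bytes(managed_view)  # host access is safe only after the sync
```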

Why The Flaky Tests Look Like Victims

The Windows OSError: [WinError -1073741818] crashes align with tests that do
host-side reads or writes of managed memory while relying only on stream-local
synchronization. The shared test helpers (for example scratch buffers and
compare helpers) allocate managed memory and call memset/memcmp on the host.
If any kernel is still running on any stream, that host access can fault on
devices where concurrentManagedAccess == 0. Graph tests and helper tests
exercise these paths and are therefore susceptible to cross-test or cross-stream
in-flight work.

Practical Path To Well-Behaved Tests (No Flakes)

  1. Guard host managed-memory access on CMA=0
    Add a small helper (in helpers/buffers.py) that calls Device.sync() (or
    otherwise ensures no work is in flight) before any host memset/memcmp of
    managed memory when concurrentManagedAccess == 0. This is targeted and
    keeps behavior unchanged on CMA=1 systems.

  2. Use pinned or host memory for scratch on CMA=0
    Replace scratch buffers used only for host comparisons with pinned or host
    allocations when CMA=0, avoiding managed memory host access entirely in those
    helper paths.

  3. Add an autouse device-sync fixture on CMA=0
    As a coarse safety net, synchronize the device after each test when
    concurrentManagedAccess == 0. This reduces cross-test contamination but
    does not fix in-test undefined access, so it is best combined with (1) or (2).

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

Closing: see #1769 (comment)

@rwgk rwgk closed this Mar 16, 2026
github-actions bot pushed a commit that referenced this pull request Mar 17, 2026
Removed preview folders for the following PRs:
- PR #1576
- PR #1729
- PR #1766
@rwgk rwgk deleted the avoid_managed_memory_on_windows branch March 17, 2026 15:30