Bug report
Found by @vfdev-5.
This is specific to the free threading build and 3.14.
XLA/Jax uses the following code to create a heap type:
// We need to use heap-allocated type objects because we want to add
// additional methods dynamically.
...
nb::str name = nb::str("PmapFunction");
nb::str qualname = nb::str("PmapFunction");
PyHeapTypeObject* heap_type = reinterpret_cast<PyHeapTypeObject*>(
PyType_Type.tp_alloc(&PyType_Type, 0));
// Caution: we must not call any functions that might invoke the GC until
// PyType_Ready() is called. Otherwise the GC might see a half-constructed
// type object.
CHECK(heap_type) << "Unable to create heap type object";
heap_type->ht_name = name.release().ptr();
heap_type->ht_qualname = qualname.release().ptr();
...
https://github.com/openxla/xla/blob/19a8e8e05fb34c5c4b8c38c9a8225e89f008c8c1/xla/python/pmap_lib.cc#L1027-L1058
In other words, the heap type is created by by calling PyType_Type.tp_alloc and filling in the fields, instead of the more common use of PyType_FromSpec. This leaves unique_id zero initialized. The problem is that unique_id=0 currently looks like a valid unique id for per-thread reference counting, which leads to reference counting errors and use-after-frees.
I think we should change the per-thread reference counting so that unique_id=0 is the sentinel value indicating that it's not assigned instead of the current unique_id=-1 convention.
Full repro
Linked PRs
Bug report
Found by @vfdev-5.
This is specific to the free threading build and 3.14.
XLA/Jax uses the following code to create a heap type:
https://github.com/openxla/xla/blob/19a8e8e05fb34c5c4b8c38c9a8225e89f008c8c1/xla/python/pmap_lib.cc#L1027-L1058
In other words, the heap type is created by by calling
PyType_Type.tp_allocand filling in the fields, instead of the more common use ofPyType_FromSpec. This leavesunique_idzero initialized. The problem is thatunique_id=0currently looks like a valid unique id for per-thread reference counting, which leads to reference counting errors and use-after-frees.I think we should change the per-thread reference counting so that
unique_id=0is the sentinel value indicating that it's not assigned instead of the currentunique_id=-1convention.Full repro
Linked PRs
__module__#128951