Merged
Closes #2409, and partially closes #2427
Rationale for this change
This PR fixes a critical thread-safety issue in the `ExpireSnapshots` class: concurrent snapshot expiration operations on different tables shared snapshot IDs, causing operations to fail with "snapshot does not exist" errors.
Root Cause:
The `ExpireSnapshots` class had class-level attributes (`_snapshot_ids_to_expire`, `_updates`, `_requirements`) that were shared across all instances. When multiple threads created different `ExpireSnapshots` instances, they all shared the same underlying `set()` object for tracking snapshot IDs.
Impact:
- `table1.expire_snapshots().by_id(1001)` adds `1001` to the shared set
- `table2.expire_snapshots().by_id(2001)` adds `2001` to the same shared set
- Both operations then see `{1001, 2001}` and try to expire snapshot `1001` from `table2`, causing the failure
Solution:
Moved the shared class-level attributes to instance-level attributes in the `__init__` method, ensuring each `ExpireSnapshots` instance has its own isolated state.
Are these changes tested?
Yes, comprehensive test coverage has been added:
- `test_thread_safety_fix()` - Verifies that different `ExpireSnapshots` instances have separate snapshot sets
- `test_concurrent_operations()` - Tests that concurrent operations don't contaminate each other
- `test_concurrent_different_tables_expiration()` - Reproduces the exact scenario from GitHub issue #2409 (commit on expire_snapshot tries to remove snapshot from wrong table)
- `test_concurrent_same_table_different_snapshots()` - Tests concurrent operations on the same table
- `test_cross_table_snapshot_id_isolation()` - Validates no cross-contamination of snapshot IDs between tables
- `test_batch_expire_snapshots()` - Tests batch expiration operations in threaded environments
All existing tests continue to pass, ensuring no regression in functionality.
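For readers who want to see the failure mode and the isolation check in miniature, here is a self-contained sketch. It does not use PyIceberg: `SharedStateExpirer`, `IsolatedExpirer`, and `expire_concurrently` are hypothetical stand-ins written for illustration, and only the attribute name `_snapshot_ids_to_expire` and the `by_id` method name are borrowed from the description above. It is not the actual test code from this PR.

```python
import threading


class SharedStateExpirer:
    """Hypothetical stand-in for the pre-fix behaviour (not PyIceberg code)."""

    # Class-level mutable attribute: one set object shared by every instance.
    _snapshot_ids_to_expire = set()

    def by_id(self, snapshot_id):
        self._snapshot_ids_to_expire.add(snapshot_id)
        return self


class IsolatedExpirer:
    """Hypothetical stand-in for the post-fix behaviour (not PyIceberg code)."""

    def __init__(self):
        # Instance-level attribute: each instance gets its own set.
        self._snapshot_ids_to_expire = set()

    def by_id(self, snapshot_id):
        self._snapshot_ids_to_expire.add(snapshot_id)
        return self


def expire_concurrently(expirer_cls, ids_per_table):
    """Call by_id() once per simulated table, each in its own thread.

    Returns a copy of the snapshot-ID set seen by each instance afterwards.
    """
    expirers = [expirer_cls() for _ in ids_per_table]
    threads = [
        threading.Thread(target=expirer.by_id, args=(snapshot_id,))
        for expirer, snapshot_id in zip(expirers, ids_per_table)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [set(expirer._snapshot_ids_to_expire) for expirer in expirers]


shared = expire_concurrently(SharedStateExpirer, [1001, 2001])
isolated = expire_concurrently(IsolatedExpirer, [1001, 2001])

# With class-level state, both instances end up holding both IDs.
print(shared[0] == {1001, 2001} and shared[1] == {1001, 2001})
# With instance-level state, each instance holds only its own ID.
print(isolated[0] == {1001} and isolated[1] == {2001})
```

Note that the shared-state variant is wrong even without threads: any two instances mutate the same set. Concurrency only makes the contamination more likely to surface in practice.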
Are there any user-facing changes?
No breaking changes. The public API remains identical:
- `ExpireSnapshots` methods work the same way
Behavioral improvement:
- Concurrent `expire_snapshots()` operations on different tables now work correctly
This is a pure bug fix with no user-facing API changes.