Skip to content

Expire snapshot mutability issue#2430

Merged
Fokko merged 6 commits intoapache:mainfrom
ForeverAngry:expire-snapshot-thread-safety-issue-2409
Sep 26, 2025
Merged

Expire snapshot mutability issue#2430
Fokko merged 6 commits intoapache:mainfrom
ForeverAngry:expire-snapshot-thread-safety-issue-2409

Conversation

@ForeverAngry
Copy link
Copy Markdown
Contributor

@ForeverAngry ForeverAngry commented Sep 4, 2025

Closes #2409, and partially closes #2427

Rationale for this change

This PR fixes a critical thread safety issue in the ExpireSnapshots class where concurrent snapshot expiration operations on different tables would share snapshot IDs, causing operations to fail with "snapshot does not exist" errors.

Root Cause:
The ExpireSnapshots class had class-level attributes (_snapshot_ids_to_expire, _updates, _requirements) that were shared across all instances. When multiple threads created different ExpireSnapshots instances, they all shared the same underlying set() object for tracking snapshot IDs.

Impact:

  • Thread 1: table1.expire_snapshots().by_id(1001) adds 1001 to shared set
  • Thread 2: table2.expire_snapshots().by_id(2001) adds 2001 to same shared set
  • Result: Both threads have {1001, 2001} and try to expire snapshot 1001 from table2, causing failure

Solution:
Moved the shared class-level attributes to instance-level attributes in the __init__ method, ensuring each ExpireSnapshots instance has its own isolated state.

Are these changes tested?

📢 🔥 Big shout-out to @QlikFrederic, as the testing methodology was largely derived from the testing and analysis done by the user! 🔥 📢

Yes, comprehensive test coverage has been added:

  • test_thread_safety_fix() - Verifies that different ExpireSnapshots instances have separate snapshot sets
  • test_concurrent_operations() - Tests concurrent operations don't contaminate each other
  • test_concurrent_different_tables_expiration() - Reproduces the exact scenario from GitHub issue commit on expire_snapshot tries to remove snapshot from wrong table. #2409
  • test_concurrent_same_table_different_snapshots() - Tests concurrent operations on the same table
  • test_cross_table_snapshot_id_isolation() - Validates no cross-contamination of snapshot IDs between tables
  • test_batch_expire_snapshots() - Tests batch expiration operations in threaded environments

All existing tests continue to pass, ensuring no regression in functionality.

Are there any user-facing changes?

No breaking changes. The public API remains identical:

  • All existing ExpireSnapshots methods work the same way
  • Method signatures are unchanged
  • Behavior is identical except for the thread safety fix

Behavioral improvement:

  • Concurrent expire_snapshots() operations on different tables now work correctly
  • No more "snapshot does not exist" errors when using ExpireSnapshots in multi-threaded environments

This is a pure bug fix with no user-facing API changes.

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

check for class with mutable state as class attributes commit on expire_snapshot tries to remove snapshot from wrong table.

6 participants