feat: add progressive_set_intersection to disjoint_set#14492

Open
Devanik21 wants to merge 4 commits into TheAlgorithms:master from Devanik21:master

@Devanik21 Devanik21 commented Apr 1, 2026

Checklist

  • I have read the CONTRIBUTING.md.
  • I have performed a self-review of my own code.
  • My code follows the style guidelines of this project.
  • I have added tests (doctests) for my changes.
  • All new and existing tests passed.
  • I have added the algorithm to the correct folder (disjoint_set).

Description

This PR adds progressive_set_intersection() to data_structures/disjoint_set/.

What this algorithm solves

Python's built-in set.intersection(*others) is implemented in C and is already highly optimized.
However, when intersecting many sets (50–100+) or sets of highly imbalanced sizes (one small set against several large sets with millions of elements), the amount of work can often be reduced by:

  • Sorting all input sets by size (smallest first)
  • Starting with a copy of the smallest set
  • Intersecting the running result with each remaining set in turn, and stopping as soon as the result becomes empty

This implements the "smallest-first + progressive pruning" heuristic discussed in #14368.
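
For readers skimming the description, the three steps above can be sketched roughly as follows. This is an illustrative sketch only, not the code submitted in this PR; the name `progressive_intersection_sketch` is hypothetical, and returning an empty set when no inputs are given is an assumption.

```python
def progressive_intersection_sketch(*sets: set) -> set:
    """Illustrative sketch of smallest-first intersection with early exit."""
    if not sets:
        # Assumption: no inputs yields an empty set.
        return set()

    # Step 1: order the inputs by size, smallest first.
    ordered = sorted(sets, key=len)

    # Step 2: start from a copy of the smallest set so no input is mutated.
    result = set(ordered[0])

    # Step 3: intersect progressively, stopping once nothing is left.
    for other in ordered[1:]:
        if not result:
            break
        result &= other
    return result


print(progressive_intersection_sketch({1, 2, 3, 4}, {2, 3, 5, 6}, {2, 3, 7}))
```

Copying the smallest set keeps the extra memory bounded by that set's size, and the in-place `&=` can only shrink the running result.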

Why add it?

  • Strong educational value — clearly demonstrates an important optimization pattern for multi-set operations.
  • Pure Python with zero external dependencies.
  • Works with any hashable elements.
  • Comprehensive doctests included.
  • Handles all edge cases gracefully.

Note: For most everyday use cases, Python’s built-in set.intersection() remains the best choice. This module is mainly for learning and teaching the pruning technique.

Complexity

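  • Time complexity: O(k log k) to sort the k input sets, plus roughly O(min_size × k) expected time for the intersections in the worst case (assuming average O(1) hash lookups). It is often lower in practice because the running result can only shrink and the loop stops once it is empty.
  • Space complexity: O(size of the smallest set) for the result, plus O(k) for the sorted list of references to the inputs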
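
To make the effect of early termination concrete, here is a small hedged sketch that counts how many pairwise intersections actually run. The helper `count_intersection_steps` is hypothetical and written only for this illustration; it is not part of the PR.

```python
def count_intersection_steps(*sets: set) -> tuple[set, int]:
    """Return the intersection and the number of pairwise steps performed."""
    ordered = sorted(sets, key=len)  # smallest first
    if not ordered:
        return set(), 0

    result = set(ordered[0])
    steps = 0
    for other in ordered[1:]:
        if not result:
            break  # early exit: the remaining (larger) sets are never touched
        result &= other
        steps += 1
    return result, steps


small = {1, 2}
disjoint = {10, 11, 12}
big = [set(range(100, 100 + n)) for n in (1000, 2000, 3000)]

result, steps = count_intersection_steps(small, disjoint, *big)
print(result, steps)  # empty set after 1 step, although 4 steps were possible
```

Here the two smallest sets are disjoint, so the three large sets are skipped entirely; when the inputs share many elements the loop still performs all k − 1 steps.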

Related Issue

Closes #14368

Example Usage

from data_structures.disjoint_set.progressive_set_intersection import progressive_set_intersection

s1 = {1, 2, 3, 4}
s2 = {2, 3, 5, 6}
s3 = {2, 3, 7}

result = progressive_set_intersection(s1, s2, s3)
print(result)  # Output: {2, 3}

Thanks to @Starglen and @dinakars777 for the discussion in #14368.
Happy to add more variants (sorted array intersection or bitmap approach) in a follow-up PR if needed.

Devanik21 and others added 3 commits April 1, 2026 13:39
This function computes the intersection of multiple sets efficiently by sorting them by size and using early termination.
@algorithms-keeper algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Apr 1, 2026
