Hash Sets
A hash set is the simplest answer to "have I seen this before?" It stores unique elements and answers membership questions in O(1) time. No duplicates, no values, just pure existence checking.
Why Sets?
Sometimes you don't need to store values - you just need to know if something exists. That's where sets shine. They're like a guest list at a party: either you're on it, or you're not.
Sets automatically handle duplicates for you. Add the same element twice? It only appears once. This makes them perfect for finding unique elements.
Use a set when you care about "is it there?" but not "what's its value?" If you need key-value pairs, reach for a hash map instead.
Time Complexity
| Operation | Average |
|---|---|
Check membership (in) |
O(1) |
| Add element | O(1) |
| Remove element | O(1) |
The Classic Pattern
# Check for duplicates
seen = set()
for item in items:
if item in seen:
return True # Duplicate found!
seen.add(item)
return False
Real Example: Contains Duplicate
Given an array, return true if any value appears at least twice. This is the most straightforward set problem:
def containsDuplicate(nums):
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
# Or even simpler:
def containsDuplicate(nums):
return len(nums) != len(set(nums))
When to Use a Hash Set
- Duplicate detection - Have I seen this element before?
- Finding unique elements - Remove duplicates from a list
- Set operations - Union, intersection, difference
- Fast lookup - When you only need yes/no answers
- Filtering - Check if element belongs to a valid set
Powerful Set Operations
Python sets support mathematical operations that can simplify your code:
a = {1, 2, 3}
b = {2, 3, 4}
a | b # Union: {1, 2, 3, 4}
a & b # Intersection: {2, 3}
a - b # Difference: {1}
a ^ b # Symmetric diff: {1, 4}
Converting a list to a set and back is the fastest way to remove duplicates: list(set(items)). Just note that order isn't preserved.
Set vs List for Lookup
Checking x in my_list is O(n) - it scans every element. But x in my_set is O(1). When you're doing many lookups, this difference is huge.