mirror of
https://git.adityakumar.xyz/dsa.git
synced 2024-11-21 09:42:52 +00:00
add disjoint set
This commit is contained in:
parent
91eb606dca
commit
29d013d705
1 changed files with 467 additions and 0 deletions
467
content/docs/dsa/set.md
Normal file
|
@ -0,0 +1,467 @@
|
||||||
|
---
|
||||||
|
title: "Set"
|
||||||
|
weight: 1
|
||||||
|
# bookFlatSection: false
|
||||||
|
# bookToc: true
|
||||||
|
# bookHidden: false
|
||||||
|
# bookCollapseSection: false
|
||||||
|
# bookComments: false
|
||||||
|
# bookSearchExclude: false
|
||||||
|
---
|
||||||
|
|
||||||
|
# Set
|
||||||
|
|
||||||
|
A Set is a fundamental data structure available in many programming languages, which stores unique
|
||||||
|
elements. The primary characteristic of a Set is that it contains no duplicates; each element can
|
||||||
|
appear only once. Sets are particularly useful when you want to keep track of a collection of
|
||||||
|
elements without worrying about their order or occurrence count. Here's an overview of the key
|
||||||
|
aspects and operations associated with Sets:
|
||||||
|
|
||||||
|
## Basic Definition and Properties
|
||||||
|
|
||||||
|
- **Uniqueness**: A Set contains each element at most once; inserting an element that is already
present leaves the set unchanged. Because sets are usually backed by a hash table or a balanced
search tree, checking for membership (whether a particular element exists in the set) is typically
faster than a linear scan over a list or array.
|
||||||
|
- **Ordering**: Whether a Set preserves any order depends on the implementation. Hash-based sets
(like Python's `set` and `frozenset`, or Java's `HashSet`) are unordered, while tree-based sets
(e.g., Java's `TreeSet` or C++'s `std::set`) keep their elements sorted. Note that Python's
`frozenset` is not an ordered set; it is an immutable `set` that lacks the methods that modify its
content.
|
||||||
|
- **Dynamic Size**: Sets can grow and shrink dynamically as elements are added and removed, though
|
||||||
|
their performance characteristics depend on the underlying implementation.
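
As a small illustration of the uniqueness and ordering properties above, here is a sketch using the
C++ standard containers `std::unordered_set` (hash-based, unordered) and `std::set` (tree-based,
sorted). The helper names `count_distinct` and `sorted_distinct` are hypothetical and exist only for
this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts distinct values by inserting them into a hash set.
// insert() ignores values that are already present.
inline std::size_t count_distinct(const std::vector<int>& values) {
  std::unordered_set<int> seen;
  for (int v : values)
    seen.insert(v);
  // A set can never hold more elements than were offered to it.
  assert(seen.size() <= values.size());
  return seen.size();
}

// Hypothetical helper: returns the distinct values in ascending order,
// relying on std::set keeping its elements sorted.
inline std::vector<int> sorted_distinct(const std::vector<int>& values) {
  std::set<int> ordered(values.begin(), values.end());
  return std::vector<int>(ordered.begin(), ordered.end());
}
```

For the input `{3, 1, 3, 2, 1}`, `count_distinct` returns 3 and `sorted_distinct` returns `{1, 2, 3}`.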
|
||||||
|
|
||||||
|
## Disjoint Set
|
||||||
|
|
||||||
|
The Disjoint Set data structure, also known as Union-Find or Merge-Find Set, is an abstract data
type that tracks a collection of elements partitioned into disjoint subsets, for example the
connected components of a graph. It supports two primary operations: **Union** (combining sets) and
**Find** (determining which set an element belongs to), both having efficient implementations.
|
||||||
|
|
||||||
|
### Key Properties and Operations:
|
||||||
|
|
||||||
|
1. **Disjoint Sets**: The elements are partitioned into non-overlapping subsets, so every element
belongs to exactly one subset. When the structure models graph connectivity, two elements are in the
same subset exactly when they are connected by a path.
2. **Union Operation**: This operation merges the two sets containing the given elements into one.
With union by size or rank the trees stay at most O(log n) deep, so a union costs O(log n) in the
worst case; combined with path compression, the amortized cost is nearly constant.
3. **Find Operation**: Determines the representative element of the set to which an item belongs.
Path compression flattens the tree representing the set during each lookup, which makes later
operations nearly constant-time in the amortized sense.
|
||||||
|
|
||||||
|
### Implementation Details:
|
||||||
|
|
||||||
|
The Disjoint Set data structure has two classic implementations: **Quick Find**, which makes the
find operation fast at the cost of slow unions, and **Quick Union**, a lazy approach that represents
each set as a tree and defers work to the find operation. **Weighted Quick Union (WQU)** improves
Quick Union by keeping the trees balanced, and **Path Compression** further optimizes the find
operation in the tree-based variants. Path compression does not apply to Quick Find, which has no
trees to flatten.
|
||||||
|
|
||||||
|
### Weighted Quick Union (WQU):
|
||||||
|
|
||||||
|
In WQU, each set is represented by a tree where elements point to their parents. During the union
operation, the root of the smaller tree is linked under the root of the larger tree, which keeps the
depth of the trees at most logarithmic in the number of elements. The find operation traverses up
the parent pointers until it finds the root of an element's set (the representative).
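
The following is a minimal sketch of weighted quick union by size, without path compression. The
struct name `WeightedQuickUnion`, the member `size`, and the method name `unite` are assumptions made
for this example, and it uses `#include` rather than header units for portability.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sketch of weighted quick union (union by size), without path compression.
struct WeightedQuickUnion {
  std::vector<std::size_t> parent; // parent[i] == i marks a root
  std::vector<std::size_t> size;   // size[r] is meaningful only for roots

  explicit WeightedQuickUnion(std::size_t n) : parent(n), size(n, 1) {
    std::iota(parent.begin(), parent.end(), std::size_t{0});
  }

  // Follow parent pointers until reaching a root.
  std::size_t find(std::size_t x) const {
    assert(x < parent.size());
    while (parent[x] != x)
      x = parent[x];
    return x;
  }

  // Attach the root of the smaller tree under the root of the larger one.
  void unite(std::size_t a, std::size_t b) {
    std::size_t ra = find(a);
    std::size_t rb = find(b);
    if (ra == rb)
      return;
    if (size[ra] < size[rb])
      std::swap(ra, rb);
    parent[rb] = ra;
    size[ra] += size[rb];
  }
};
```

Because the smaller tree always goes under the larger one, a node's depth only grows when its tree
at least doubles in size, which is where the logarithmic depth bound comes from.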
|
||||||
|
|
||||||
|
### Quick Find:
|
||||||
|
|
||||||
|
Quick Find is simpler but less efficient for larger datasets. Each element stores its set
representative directly, so the find operation runs in constant time. The union operation, however,
must relabel every element of one of the two sets, which takes O(n) time, where n is the number of
elements.
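
A minimal sketch of Quick Find, assuming elements are numbered from 0 to n-1. The struct name
`QuickFind`, the member `id`, and the method name `unite` are hypothetical names chosen for this
example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct QuickFind {
  std::vector<std::size_t> id; // id[i] is the representative of i's set

  explicit QuickFind(std::size_t n) : id(n) {
    std::iota(id.begin(), id.end(), std::size_t{0});
  }

  // O(1): the representative is stored directly.
  std::size_t find(std::size_t x) const {
    assert(x < id.size());
    return id[x];
  }

  // O(n): relabel every member of a's set with b's representative.
  void unite(std::size_t a, std::size_t b) {
    const std::size_t from = find(a);
    const std::size_t to = find(b);
    if (from == to)
      return;
    for (std::size_t& label : id)
      if (label == from)
        label = to;
  }
};
```

After `unite(0, 1)` and `unite(1, 2)` on five elements, `id` holds `{2, 2, 2, 3, 4}`: each union
scanned the whole array, while every `find` remains a single array lookup.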
|
||||||
|
|
||||||
|
### Path Compression:
|
||||||
|
|
||||||
|
Path compression optimizes the find operation by making every node visited during a find point
directly to the root once it is found. This significantly reduces the height of the trees over time.
Combined with union by size or rank, the amortized cost per operation becomes O(α(n)), where α is
the inverse Ackermann function, which is effectively constant for any practical n.
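
Path compression is often written recursively, but it can also be done iteratively in two passes.
The sketch below assumes a plain `parent` vector in which roots point to themselves; the function
name `find_compress` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass iterative find with full path compression.
inline std::size_t find_compress(std::vector<std::size_t>& parent, std::size_t x) {
  assert(x < parent.size());
  // Pass 1: locate the root.
  std::size_t root = x;
  while (parent[root] != root)
    root = parent[root];
  // Pass 2: repoint every node on the path at the root.
  while (parent[x] != root) {
    const std::size_t next = parent[x];
    parent[x] = root;
    x = next;
  }
  return root;
}
```

Starting from the chain `{0, 0, 1, 2, 4}` (3 → 2 → 1 → 0), a call to `find_compress(parent, 3)`
returns 0 and leaves the array as `{0, 0, 0, 0, 4}`. The iterative form also avoids deep recursion
on long chains.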
|
||||||
|
|
||||||
|
### Applications:
|
||||||
|
|
||||||
|
Disjoint Set Data Structures are widely used in computer science applications requiring efficient
|
||||||
|
management of disconnected components, including network connectivity problems, Kruskal's algorithm
|
||||||
|
for finding a minimum spanning tree (MST) of a graph, and cycle detection in graphs.
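
As one example of these applications, cycle detection in an undirected graph can be sketched with a
small union-find. The function name `has_cycle` and the edge-list representation are assumptions for
this example. It treats self-loops and repeated edges as cycles, and it uses path halving, a simpler
relative of full path compression.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: reports whether an undirected graph with vertices
// 0..n-1 contains a cycle, using a minimal union-find over `parent`.
inline bool has_cycle(std::size_t n,
                      const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  std::vector<std::size_t> parent(n);
  std::iota(parent.begin(), parent.end(), std::size_t{0});

  // Find with path halving: each visited node is pointed at its grandparent.
  auto find = [&parent](std::size_t x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  for (const auto& [u, v] : edges) {
    assert(u < n && v < n);
    const std::size_t ru = find(u);
    const std::size_t rv = find(v);
    if (ru == rv)
      return true; // u and v were already connected: this edge closes a cycle
    parent[ru] = rv;
  }
  return false;
}
```

Kruskal's algorithm uses the same test: it processes edges in order of increasing weight and keeps
an edge only when its endpoints have different roots.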
|
||||||
|
|
||||||
|
## Algorithm
|
||||||
|
|
||||||
|
Here's an algorithm for implementing disjoint set.
|
||||||
|
|
||||||
|
### Algorithm for Disjoint Set with Path Compression
|
||||||
|
|
||||||
|
1. **Initialization**: Start by representing each element as a node, where the parent of each node
|
||||||
|
is itself initially (indicating that they are their own sets). This can be implemented using an
|
||||||
|
array `parent[]` where `parent[i] = i` for all elements from 0 to N-1.
|
||||||
|
|
||||||
|
2. **Find**: To find which set a particular element belongs to, follow these steps:
|
||||||
|
|
||||||
|
   - Start at the node corresponding to the given element's index (element).
   - If this node is its own parent, it's the representative of its set, and you can return this
     value directly.
   - Otherwise, recursively or iteratively traverse up through the parents until you reach an
     element that points to itself. This path represents a sequence from the given element back to
     the root of its set (the set's representative).
   - To optimize future Find operations, apply Path Compression: after finding the representative,
     make every node on this path point directly to the representative by updating each node's
     parent pointer to the representative. This step significantly speeds up subsequent Find
     operations for these nodes and any others connected through them.
|
||||||
|
|
||||||
|
3. **Union**: To merge two disjoint sets into a single set, execute the following steps:
|
||||||
|
   - Perform Find on both elements (A and B) to find their respective representatives (roots). Let's
     say `rootA` is the root of A's set and `rootB` is the root of B's set.
   - If the roots are equal, A and B are already in the same set and no action is needed. Otherwise,
     make one representative point to the other by setting the parent of `rootA` or `rootB` to be
     the other root. This unites the two sets into a single set.
   - No separate compression pass is needed here: the two Find calls already compress the paths from
     A and B to their roots, and later Find calls will repoint any remaining nodes to the new
     representative.
|
||||||
|
|
||||||
|
### Pseudocode
|
||||||
|
|
||||||
|
```
SimpleUnion(i, j) {
  // i and j are roots of different trees; attach i under j
  p[i] = j;
}

SimpleFind(i) {
  // Roots store a negative value, so climb while p[i] is a valid index
  while (p[i] >= 0) do
    i = p[i];
  return i;
}

WeightedUnion(i, j) {
  // Union sets with roots i and j, i != j, using the weighting rule
  // p[i] = -count[i] and p[j] = -count[j]
  temp = p[i] + p[j];
  if (p[i] > p[j]) then
    // i has fewer nodes
    p[i] = j;
    p[j] = temp;
  else
    // j has fewer or equal nodes
    p[j] = i;
    p[i] = temp;
}
```
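
The pseudocode stores a negated node count in the parent slot of each root. A possible C++
rendering of that convention is sketched below; the function names `simple_find` and
`weighted_union` and the use of `std::vector<int>` are assumptions, not part of the pseudocode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// p[i] < 0  : i is a root and -p[i] is the number of nodes in its tree
// p[i] >= 0 : p[i] is the parent of i
inline int simple_find(const std::vector<int>& p, int i) {
  assert(i >= 0 && static_cast<std::size_t>(i) < p.size());
  while (p[i] >= 0)
    i = p[i];
  return i;
}

// Precondition: i and j are distinct roots.
inline void weighted_union(std::vector<int>& p, int i, int j) {
  assert(i != j && p[i] < 0 && p[j] < 0);
  const int total = p[i] + p[j]; // negated size of the merged tree
  if (p[i] > p[j]) {             // i has fewer nodes
    p[i] = j;
    p[j] = total;
  } else {                       // j has fewer or equal nodes
    p[j] = i;
    p[i] = total;
  }
}
```

Starting from `std::vector<int> p(5, -1)`, calling `weighted_union(p, 0, 1)` gives
`{-2, 0, -1, -1, -1}`: element 1 now points to root 0, whose slot records a tree of two nodes.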
|
||||||
|
|
||||||
|
## Code
|
||||||
|
|
||||||
|
```cpp
import <algorithm>;
import <numeric>;
import <print>;
import <ranges>;
import <utility>;
import <vector>;

// Note: ssize_t is a POSIX type rather than standard C++; it is assumed to be
// available on the target platform.
struct Set {
  std::vector<ssize_t> parent{};
  std::vector<ssize_t> rank{};

  constexpr Set(const ssize_t &size) {
    parent.resize(size);
    rank.resize(size);

    // Every element starts as the root of its own single-element set
    std::iota(parent.begin(), parent.end(), 0);
    std::ranges::fill(rank, 0);
  }

  constexpr auto find(const ssize_t &node) -> ssize_t {
    if (node == parent.at(node))
      return node;

    // Path compression: point node directly at the root
    return parent.at(node) = find(parent.at(node));
  }

  constexpr auto union_set(ssize_t u, ssize_t v) -> void {
    u = find(u);
    v = find(v);

    if (u != v) {
      // Union by rank: the root with the higher rank stays the root
      if (rank.at(u) < rank.at(v))
        std::swap(u, v);
      parent.at(v) = u;
      if (rank.at(u) == rank.at(v))
        ++rank[u];
    }
  }
};

int main() {
  const ssize_t size{5};
  Set disjoint_set(size);

  disjoint_set.union_set(0, 1);
  disjoint_set.union_set(1, 2);
  disjoint_set.union_set(3, 4);

  for (ssize_t i : std::ranges::iota_view{0, size})
    std::println("Find({}):{}", i, disjoint_set.find(i));

  std::print("Parent array: ");
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::print("{} ", disjoint_set.parent[i]);
  std::print("\n");
  return 0;
}
```
|
||||||
|
|
||||||
|
### Explanation
|
||||||
|
|
||||||
|
#### 1. **Struct Definition**
|
||||||
|
|
||||||
|
```cpp
struct Set {
  std::vector<ssize_t> parent{};
  std::vector<ssize_t> rank{};
```
|
||||||
|
|
||||||
|
- `struct Set` defines a new struct type named `Set`.
|
||||||
|
- Inside the struct, two member variables are declared:
  - `std::vector<ssize_t> parent{}`: This vector holds the parent of each element. An element that is its own parent is the representative (or root) of its subset.
  - `std::vector<ssize_t> rank{}`: This vector holds the rank of each root, an upper bound on the height of its tree. It is used to keep the trees shallow by attaching the root with the lower rank under the root with the higher rank.
|
||||||
|
|
||||||
|
#### 2. **Constructor**
|
||||||
|
|
||||||
|
```cpp
constexpr Set(const ssize_t &size) {
  parent.resize(size);
  rank.resize(size);

  std::iota(parent.begin(), parent.end(), 0);
  std::ranges::fill(rank, 0);
}
```
|
||||||
|
|
||||||
|
- `constexpr Set(const ssize_t &size)` is a constructor that initializes a `Set` instance with a given size.
|
||||||
|
- `parent.resize(size);` and `rank.resize(size);` resize the `parent` and `rank` vectors to the given size, initializing them to hold `size` elements.
|
||||||
|
- `std::iota(parent.begin(), parent.end(), 0);` initializes the parent vector such that each element is its own parent. `std::iota` is a standard algorithm that fills the range with sequentially increasing values starting from 0. After this, `parent[i] == i` for all `i` in the range `[0, size)`.
|
||||||
|
- `std::ranges::fill(rank, 0);` sets all elements in the rank vector to 0. `std::ranges::fill` is a standard algorithm that assigns the value 0 to each element in the rank vector.
|
||||||
|
|
||||||
|
#### 3. **`find()` Method**
|
||||||
|
|
||||||
|
```cpp
constexpr auto find(const ssize_t &node) -> ssize_t {
  // If the node is its own parent, it is the root of its set
  if (node == parent.at(node))
    return node;

  // Path compression: recursively find the root and update the parent
  return parent.at(node) = find(parent.at(node));
}
```
|
||||||
|
|
||||||
|
##### Method Definition
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
constexpr auto find(const ssize_t &node) -> ssize_t {
|
||||||
|
```
|
||||||
|
|
||||||
|
- `constexpr auto find(const ssize_t &node) -> ssize_t`: This defines a constexpr method named `find` that takes a single parameter `node` of type `ssize_t` and returns a value of type `ssize_t`.
|
||||||
|
|
||||||
|
##### Method Body
|
||||||
|
|
||||||
|
```cpp
if (node == parent.at(node))
  return node;
```
|
||||||
|
|
||||||
|
- The method checks if `node` is its own parent, i.e., if `node` is the root of its set. This is done using `parent.at(node)`, which accesses the element at index `node` in the `parent` vector with bounds checking (thanks to the `.at()` method).
|
||||||
|
- If `node` is its own parent, it means `node` is the representative of its set, and the method returns `node`.
|
||||||
|
|
||||||
|
##### Path Compression
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
return parent.at(node) = find(parent.at(node));
|
||||||
|
```
|
||||||
|
|
||||||
|
- If `node` is not its own parent, the method recursively calls `find` on `parent.at(node)`, which finds the root of `node`'s set.
- The result of the recursive `find` call is then assigned back to `parent.at(node)`. This step is the path compression optimization: because every recursive call performs the same assignment, each node on the path from `node` to the root ends up pointing directly to the root. This flattens the structure of the tree, reducing the time complexity of future `find` operations.
|
||||||
|
- Finally, the method returns the root of the set containing `node`.
|
||||||
|
|
||||||
|
#### 4. **`union_set()` Method**
|
||||||
|
|
||||||
|
```cpp
constexpr auto union_set(ssize_t u, ssize_t v) -> void {
  // Find the roots of the sets containing u and v
  u = find(u);
  v = find(v);

  // If u and v are in different sets, merge them
  if (u != v) {
    // Union by rank: ensure the higher rank tree remains the root
    if (rank.at(u) < rank.at(v))
      std::swap(u, v);

    // Make u the parent of v
    parent.at(v) = u;

    // If ranks were equal, increment the rank of the new root
    if (rank.at(u) == rank.at(v))
      ++rank[u];
  }
}
```
|
||||||
|
|
||||||
|
##### Method Definition
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
constexpr auto union_set(ssize_t u, ssize_t v) -> void {
|
||||||
|
```
|
||||||
|
|
||||||
|
- `constexpr auto union_set(ssize_t u, ssize_t v) -> void`: This defines a constexpr method named `union_set` that takes two parameters `u` and `v` of type `ssize_t` and returns `void`.
|
||||||
|
|
||||||
|
##### Finding the Roots
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
u = find(u);
|
||||||
|
v = find(v);
|
||||||
|
```
|
||||||
|
|
||||||
|
- The method starts by finding the roots of the sets containing `u` and `v`. This is done using the `find` method previously defined. After this step, `u` and `v` are the representatives (roots) of their respective sets.
|
||||||
|
|
||||||
|
##### Checking if Already Unified
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
if (u != v) {
|
||||||
|
```
|
||||||
|
|
||||||
|
- The condition checks if the roots `u` and `v` are different. If they are the same, `u` and `v` are already in the same set, and no union operation is needed.
|
||||||
|
|
||||||
|
##### Union by Rank
|
||||||
|
|
||||||
|
```cpp
if (rank.at(u) < rank.at(v))
  std::swap(u, v);
```
|
||||||
|
|
||||||
|
- If `u` and `v` are different, the method performs union by rank. It compares the ranks of the roots `u` and `v`.
|
||||||
|
- If `rank[u] < rank[v]`, it swaps `u` and `v` to ensure that `u` is the root with the higher (or equal) rank. This keeps the tree shallower by attaching the lower-rank tree under the root of the higher-rank tree.
|
||||||
|
|
||||||
|
##### Merging the Sets
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
parent.at(v) = u;
|
||||||
|
```
|
||||||
|
|
||||||
|
- The method sets `parent[v]` to `u`, effectively making `u` the parent of `v`. This merges the set containing `v` into the set containing `u`.
|
||||||
|
|
||||||
|
##### Updating the rank
|
||||||
|
|
||||||
|
```cpp
  if (rank.at(u) == rank.at(v))
    ++rank[u];
}
```
|
||||||
|
|
||||||
|
- If the ranks of `u` and `v` were equal, the rank of the new root `u` is incremented by 1. This is because the depth of the tree increases when two trees of the same rank are merged.
|
||||||
|
|
||||||
|
#### 5. **`main()` Function**
|
||||||
|
|
||||||
|
```cpp
int main() {
  // Define the size of the disjoint set
  const ssize_t size{5};
  // Create an instance of Set with the specified size
  Set disjoint_set(size);

  // Perform union operations
  disjoint_set.union_set(0, 1);
  disjoint_set.union_set(1, 2);
  disjoint_set.union_set(3, 4);

  // Print the results of find operations for each element
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::println("Find({}):{}", i, disjoint_set.find(i));

  // Print the parent array
  std::print("Parent array: ");
  for (ssize_t i : std::ranges::iota_view{0, size})
    std::print("{} ", disjoint_set.parent[i]);
  std::print("\n");
  return 0;
}
```
|
||||||
|
|
||||||
|
##### Performing Union Operations
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
disjoint_set.union_set(0, 1);
|
||||||
|
disjoint_set.union_set(1, 2);
|
||||||
|
disjoint_set.union_set(3, 4);
|
||||||
|
```
|
||||||
|
|
||||||
|
- `disjoint_set.union_set(0, 1);` merges the sets containing elements 0 and 1.
|
||||||
|
- `disjoint_set.union_set(1, 2);` merges the sets containing elements 1 and 2. Since 1 is already united with 0, this effectively unites elements 0, 1, and 2 into a single set.
|
||||||
|
- `disjoint_set.union_set(3, 4);` merges the sets containing elements 3 and 4.
|
||||||
|
|
||||||
|
##### Printing the Results of Find Operations
|
||||||
|
|
||||||
|
```cpp
for (ssize_t i : std::ranges::iota_view{0, size})
  std::println("Find({}):{}", i, disjoint_set.find(i));
```
|
||||||
|
|
||||||
|
- This loop iterates over the range `[0, size)` using `std::ranges::iota_view{0, size}`.
|
||||||
|
- For each element `i`, it calls `disjoint_set.find(i)` to find the representative (root) of the set containing `i`.
|
||||||
|
- `std::println("Find({}):{}", i, disjoint_set.find(i));` prints the result in the format `Find(i):root`, where root is the representative of the set containing `i`.
|
||||||
|
|
||||||
|
##### Printing the Parent Array
|
||||||
|
|
||||||
|
```cpp
std::print("Parent array: ");
for (ssize_t i : std::ranges::iota_view{0, size})
  std::print("{} ", disjoint_set.parent[i]);
std::print("\n");
```
|
||||||
|
|
||||||
|
- `std::print("Parent array: ");` prints a label for the parent array.
|
||||||
|
- The `for` loop iterates over the range `[0, size)` using `std::ranges::iota_view{0, size}`.
|
||||||
|
- For each element `i`, it prints the value of `disjoint_set.parent[i]` followed by a space.
|
||||||
|
- `std::print("\n");` prints a newline character to end the line.
|
||||||
|
|
||||||
|
### Output
|
||||||
|
|
||||||
|
```console
|
||||||
|
❯ ./main
|
||||||
|
Find(0):0
|
||||||
|
Find(1):0
|
||||||
|
Find(2):0
|
||||||
|
Find(3):3
|
||||||
|
Find(4):3
|
||||||
|
Parent array: 0 0 0 3 3
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Explanation
|
||||||
|
|
||||||
|
1. **Initialization**:

   - `parent` array: [0, 1, 2, 3, 4]
   - `rank` array: [0, 0, 0, 0, 0]

2. **Union Operations**:

   - `union_set(0, 1)`:

     - `find(0)` returns 0.
     - `find(1)` returns 1.
     - `rank[0] == rank[1]`, so `parent[1]` is set to 0 and `rank[0]` is incremented.
     - `parent` array: [0, 0, 2, 3, 4]
     - `rank` array: [1, 0, 0, 0, 0]

   - `union_set(1, 2)`:

     - `find(1)` returns 0 (since `parent[1]` is 0).
     - `find(2)` returns 2.
     - `rank[0] > rank[2]`, so `parent[2]` is set to 0.
     - `parent` array: [0, 0, 0, 3, 4]
     - `rank` array: [1, 0, 0, 0, 0]

   - `union_set(3, 4)`:

     - `find(3)` returns 3.
     - `find(4)` returns 4.
     - `rank[3] == rank[4]`, so `parent[4]` is set to 3 and `rank[3]` is incremented.
     - `parent` array: [0, 0, 0, 3, 3]
     - `rank` array: [1, 0, 0, 1, 0]

3. **Find Operations**:

   - `find(0)` returns 0.
   - `find(1)` returns 0 (since `parent[1]` is 0).
   - `find(2)` returns 0 (since `parent[2]` is 0).
   - `find(3)` returns 3.
   - `find(4)` returns 3 (since `parent[4]` is 3).
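
The final parent array also shows how many sets remain: every root is its own parent, so counting
such indices counts the sets. A small sketch, assuming a hypothetical helper named `count_sets` that
takes the parent array as a `std::vector<std::size_t>`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the disjoint sets described by a parent array
// by counting the roots, i.e. the indices that are their own parent.
inline std::size_t count_sets(const std::vector<std::size_t>& parent) {
  std::size_t roots = 0;
  for (std::size_t i = 0; i < parent.size(); ++i) {
    assert(parent[i] < parent.size());
    if (parent[i] == i)
      ++roots;
  }
  return roots;
}
```

For the parent array `[0, 0, 0, 3, 3]` from the trace above, `count_sets` returns 2, matching the
two sets {0, 1, 2} and {3, 4}.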
|