From 29d013d70554571cb19eebc6d3ac2b94c7663ec5 Mon Sep 17 00:00:00 2001 From: Aditya Date: Tue, 25 Jun 2024 16:10:41 +0530 Subject: [PATCH] add disjoint set --- content/docs/dsa/set.md | 467 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 467 insertions(+) create mode 100644 content/docs/dsa/set.md diff --git a/content/docs/dsa/set.md b/content/docs/dsa/set.md new file mode 100644 index 0000000..861a7b2 --- /dev/null +++ b/content/docs/dsa/set.md @@ -0,0 +1,467 @@ +--- +title: "Set" +weight: 1 +# bookFlatSection: false +# bookToc: true +# bookHidden: false +# bookCollapseSection: false +# bookComments: false +# bookSearchExclude: false +--- + +# Set + +A Set is a fundamental data structure available in many programming languages, which stores unique +elements. The primary characteristic of a Set is that it contains no duplicates; each element can +appear only once. Sets are particularly useful when you want to keep track of a collection of +elements without worrying about their order or occurrence count. Here's an overview of the key +aspects and operations associated with Sets: + +## Basic Definition and Properties + +- **Uniqueness**: A Set is defined by its unique members, ensuring no duplicates are present within + it. This makes checking for membership (whether a particular element exists in the set) efficient + compared to data structures that allow duplicates, like lists or arrays. +- **Ordering**: The order of elements in a Set can vary based on the implementation and whether + you're working with an unordered collection (like Python's `set` or Java's `HashSet`) versus an + ordered one (e.g., Python's `frozenset`, which is actually just a frozen set, but lacks methods that + modify its content). +- **Dynamic Size**: Sets can grow and shrink dynamically as elements are added and removed, though + their performance characteristics depend on the underlying implementation. + +## Disjoint Set + +The Disjoint Set data structure, also known as Union-Find or Merge-Find Set, is a powerful abstract +data type that allows you to efficiently manage and query the connected components of a graph. It +supports two primary operations: **Union** (combining sets) and **Find** (determining which set an +element belongs to), both having efficient implementations. + +### Key Properties and Operations: + +1. **Disjoint Sets**: Each disjoint set consists of elements partitioned into non-overlapping + subsets, ensuring that no two elements in the same subset are connected by a path. +2. **Union Operation**: This operation merges two distinct sets into one. It's typically implemented + with careful consideration to maintain optimal time complexity (usually O(log n) for both insertions + and unions). +3. **Find Operation**: Determines the representative element of a set in which an item belongs, + usually through path compression techniques that flatten the structure of the tree representing + sets, achieving nearly constant-time operations. + +### Implementation Details: + +The Disjoint Set Data Structure can be implemented using two main approaches: **Weighted Quick Union +(WQU)** and **Quick Find** for union operation, along with path compression optimization in both. +For the find operation, there are also variations like **Lazy Union** and **Path Compression** that +further optimize performance. + +### Weighted Quick Union (WQU): + +In WQU, each set is represented by a tree where elements point to their parents, with trees of +different sizes linked together in a specific order during the union operation to keep the depth of +the trees as balanced as possible. The find operation traverses up the parent pointers until it +finds the root of an element's set (the representative). + +### Quick Find: + +Quick Find is simpler but less efficient for larger datasets due to its O(n) time complexity for +both union and find operations, where n is the number of elements. Each element points directly to a +set representative. However, this method provides constant-time performance for the find operation +but not for unions or dynamic insertion. + +### Path Compression: + +Path compression optimizes the efficiency of both operations by making every visited node in the +find operation point directly to the root when found. This significantly reduces the height of trees +over time, leading to nearly constant-time performance even for subsequent operations. + +### Applications: + +Disjoint Set Data Structures are widely used in computer science applications requiring efficient +management of disconnected components, including network connectivity problems, Kruskal's algorithm +for finding a minimum spanning tree (MST) of a graph, and cycle detection in graphs. + +## Algorithm + +Here's an algorithm for implementing disjoint set. + +### Algorithm for Disjoint Set with Path Compression + +1. **Initialization**: Start by representing each element as a node, where the parent of each node + is itself initially (indicating that they are their own sets). This can be implemented using an + array `parent[]` where `parent[i] = i` for all elements from 0 to N-1. + +2. **Find**: To find which set a particular element belongs to, follow these steps: + + - Start at the node corresponding to the given element's index (element). + - If this node is its own parent, it's the representative of its set, and you can return this + value directly. + - Otherwise, recursively or iteratively traverse up through the parents until you reach an + element that points to itself. This path represents a sequence from the given element back to the + root of its set (the set's representative). - To optimize future Find operations, apply Path Compression: after finding the representative, + make every node on this path point directly to the representative by updating each node's parent + pointer to the representative. This step significantly speeds up subsequent Find operations for + these nodes and any others connected through them. + +3. **Union**: To merge two disjoint sets into a single set, execute the following steps: + - Perform Find on both elements (A and B) to find their respective representatives (roots). Let's + say `rootA` is the root of A's set and `rootB` is the root of B's set. + - If they are already in the same set, no action is needed. However, if they are different sets + (i.e., their roots are not equal), make one representative point to the other by setting the parent + of `rootA` or `rootB` to be the other root. This unites the two sets into a single set. + - Optionally, apply Path Compression again during this operation for all nodes found in either + path (including those from previous Union operations) as they may need to update their pointers + directly to the new representative. + +### Pseudocode + +``` +SimpleUnion(i, j) { + p[i] = j; +} + +SimpleFind(i) { + while (p[i] >= 0) do + i = p[i]; + return i; +} + +WeightedUnion(i, j) { + // Union sets with roots i and j. i != j, using the weighting rule + // p[i] = -count[i] and p[j] = -count[j] + temp = p[i] + p[j]; + if (p[i] > p[j]) then + // i has fewer nodes + p[i] j; + p[j] = temp; + else + // j has fewer or equal nodes + p[j] = i; + p[i] = temp; +} +``` + +## Code + +```cpp +import ; +import ; +import ; +import ; +import ; + +struct Set { + std::vector parent{}; + std::vector rank{}; + + constexpr Set(const ssize_t &size) { + parent.resize(size); + rank.resize(size); + + std::iota(parent.begin(), parent.end(), 0); + std::ranges::fill(rank, 0); + } + + constexpr auto find(const ssize_t &node) -> ssize_t { + if (node == parent.at(node)) + return node; + + return parent.at(node) = find(parent.at(node)); + } + + constexpr auto union_set(ssize_t u, ssize_t v) -> void { + u = find(u); + v = find(v); + + if (u != v) { + if (rank.at(u) < rank.at(v)) + std::swap(u, v); + parent.at(v) = u; + if (rank.at(u) == rank.at(v)) + ++rank[u]; + } + } +}; +int main() { + const ssize_t size{5}; + Set disjoint_set(size); + + disjoint_set.union_set(0, 1); + disjoint_set.union_set(1, 2); + disjoint_set.union_set(3, 4); + + for (ssize_t i : std::ranges::iota_view{0, size}) + std::println("Find({}):{}", i, disjoint_set.find(i)); + + std::print("Parent array: "); + for (ssize_t i : std::ranges::iota_view{0, size}) + std::print("{} ", disjoint_set.parent[i]); + std::print("\n"); + return 0; +} +``` + +### Explanation + +#### 1. **Struct Definition** + +```cpp +struct Set { + std::vector parent{}; + std::vector rank{}; +``` + +- `struct Set` defines a new struct type named `Set`. +- Inside the struct, two member variables are declared: + - `std::vector parent{}`: This is a vector that will hold the parent of each element. It is used to keep track of the representatives (or roots) of each subset. + - `std::vector rank{}`: This is a vector that will hold the rank (or depth) of each element. It is used to keep the tree flat by attaching smaller trees under the root of larger trees. + +#### 2. **Constructor** + +```cpp + constexpr Set(const ssize_t &size) { + parent.resize(size); + rank.resize(size); + + std::iota(parent.begin(), parent.end(), 0); + std::ranges::fill(rank, 0); + } +``` + +- `constexpr Set(const ssize_t &size)` is a constructor that initializes a `Set` instance with a given size. +- `parent.resize(size);` and `rank.resize(size);` resize the `parent` and `rank` vectors to the given size, initializing them to hold `size` elements. +- `std::iota(parent.begin(), parent.end(), 0);` initializes the parent vector such that each element is its own parent. `std::iota` is a standard algorithm that fills the range with sequentially increasing values starting from 0. After this, `parent[i] == i` for all `i` in the range `[0, size)`. +- `std::ranges::fill(rank, 0);` sets all elements in the rank vector to 0. `std::ranges::fill` is a standard algorithm that assigns the value 0 to each element in the rank vector. + +#### 3. **`find()` Method** + +```cpp +constexpr auto find(const ssize_t &node) -> ssize_t { + // If the node is its own parent, it is the root of its set + if (node == parent.at(node)) + return node; + + // Path compression: recursively find the root and update the parent + return parent.at(node) = find(parent.at(node)); +} +``` + +##### Method Definition + +```cpp +constexpr auto find(const ssize_t &node) -> ssize_t { +``` + +- `constexpr auto find(const ssize_t &node) -> ssize_t`: This defines a constexpr method named `find` that takes a single parameter `node` of type `ssize_t` and returns a value of type `ssize_t`. + +##### Method Body + +```cpp + if (node == parent.at(node)) + return node; +``` + +- The method checks if `node` is its own parent, i.e., if `node` is the root of its set. This is done using `parent.at(node)`, which accesses the element at index `node` in the `parent` vector with bounds checking (thanks to the `.at()` method). +- If `node` is its own parent, it means `node` is the representative of its set, and the method returns `node`. + +##### Path Compression + +```cpp + return parent.at(node) = find(parent.at(node)); +``` + +- If `node` is not its own parent, the method recursively calls `find on parent.at(node)`, which finds the root of `node`'s set. +- The result of the recursive `find` call is then assigned back to `parent.at(node)`. This step is the path compression optimization: it makes each `node` on the path from node to the root point directly to the root. This flattens the structure of the tree, reducing the time complexity of future `find` operations. +- Finally, the method returns the root of the set containing `node`. + +#### 4. **`union_set()` Method** + +```cpp +constexpr auto union_set(ssize_t u, ssize_t v) -> void { + // Find the roots of the sets containing u and v + u = find(u); + v = find(v); + + // If u and v are in different sets, merge them + if (u != v) { + // Union by rank: ensure the higher rank tree remains the root + if (rank.at(u) < rank.at(v)) + std::swap(u, v); + + // Make u the parent of v + parent.at(v) = u; + + // If ranks were equal, increment the rank of the new root + if (rank.at(u) == rank.at(v)) + ++rank[u]; + } +} +``` + +##### Method Definition + +```cpp +constexpr auto union_set(ssize_t u, ssize_t v) -> void { +``` + +- `constexpr auto union_set(ssize_t u, ssize_t v) -> void`: This defines a constexpr method named `union_set` that takes two parameters `u` and `v` of type `ssize_t` and returns `void`. + +##### Finding the Roots + +```cpp + u = find(u); + v = find(v); +``` + +- The method starts by finding the roots of the sets containing `u` and `v`. This is done using the `find` method previously defined. After this step, `u` and `v` are the representatives (roots) of their respective sets. + +##### Checking if Already Unified + +```cpp + if (u != v) { +``` + +- The condition checks if the roots `u` and `v` are different. If they are the same, `u` and `v` are already in the same set, and no union operation is needed. + +##### Union by Rank + +```cpp + if (rank.at(u) < rank.at(v)) + std::swap(u, v); +``` + +- If `u` and `v` are different, the method performs union by rank. It compares the ranks of the roots `u` and `v`. + - If `rank[u] < rank[v]`, it swaps `u` and `v` to ensure that `u` has the higher rank. This keeps the tree shallower by attaching the smaller tree under the root of the larger tree. + +##### Merging the Sets + +```cpp + parent.at(v) = u; +``` + +- The method sets `parent[v]` to `u`, effectively making `u` the parent of `v`. This merges the set containing `v` into the set containing `u`. + +##### Updating the rank + +```cpp + if (rank.at(u) == rank.at(v)) + ++rank[u]; + } +``` + +- If the ranks of `u` and `v` were equal, the rank of the new root `u` is incremented by 1. This is because the depth of the tree increases when two trees of the same rank are merged. + +#### 5. **`main()` Method** + +```cpp +int main() { + // Define the size of the disjoint set + const ssize_t size{5}; + // Create an instance of Set with the specified size + Set disjoint_set(size); + + // Perform union operations + disjoint_set.union_set(0, 1); + disjoint_set.union_set(1, 2); + disjoint_set.union_set(3, 4); + + // Print the results of find operations for each element + for (ssize_t i : std::ranges::iota_view{0, size}) + std::println("Find({}):{}", i, disjoint_set.find(i)); + + // Print the parent array + std::print("Parent array: "); + for (ssize_t i : std::ranges::iota_view{0, size}) + std::print("{} ", disjoint_set.parent[i]); + std::print("\n"); + return 0; +} +``` + +##### Performing Union Operations + +```cpp + disjoint_set.union_set(0, 1); + disjoint_set.union_set(1, 2); + disjoint_set.union_set(3, 4); +``` + +- `disjoint_set.union_set(0, 1);` merges the sets containing elements 0 and 1. +- `disjoint_set.union_set(1, 2);` merges the sets containing elements 1 and 2. Since 1 is already united with 0, this effectively unites elements 0, 1, and 2 into a single set. +- `disjoint_set.union_set(3, 4);` merges the sets containing elements 3 and 4. + +##### Printing the Results of Find Operations + +```cpp + for (ssize_t i : std::ranges::iota_view{0, size}) + std::println("Find({}):{}", i, disjoint_set.find(i)); +``` + +- This loop iterates over the `range [0, size)` using `std::ranges::iota_view{0, size}`. +- For each element `i`, it calls `disjoint_set.find(i)` to find the representative (root) of the set containing `i`. +- `std::println("Find({}):{}", i, disjoint_set.find(i));` prints the result in the format `Find(i):root`, where root is the representative of the set containing `i`. + +##### Printing the Parent Array + +```cpp + std::print("Parent array: "); + for (ssize_t i : std::ranges::iota_view{0, size}) + std::print("{} ", disjoint_set.parent[i]); + std::print("\n"); +``` + +- `std::print("Parent array: ");` prints a label for the parent array. + This loop iterates over the range `[0, size)` using `std::ranges::iota_view{0, size}`. +- For each element `i`, it prints the value of `disjoint_set.parent[i]` followed by a space. +- `std::print("\n");` prints a newline character to end the line. + +### Output + +```console +❯ ./main +Find(0):0 +Find(1):0 +Find(2):0 +Find(3):3 +Find(4):3 +Parent array: 0 0 0 3 3 +``` + +#### Explanation + +1. **Initialization**: + +- parent array: [0, 1, 2, 3, 4] +- rank array: [0, 0, 0, 0, 0] + +2. **Union Operations**: + +- `union_set(0, 1)`: + + - `find(0)` returns 0. + - `find(1)` returns 1. + - `rank[0] == rank[1]`, so `parent[1]` is set to 0 and `rank[0]` is incremented. + - `parent` array: [0, 0, 2, 3, 4] + - `rank` array: [1, 0, 0, 0, 0] + +- `union_set(1, 2)`: + + - `find(1)` returns 0 (since `parent[1]` is 0). + - `find(2)` returns 2. + - `rank[0] > rank[2]`, so `parent[2]` is set to 0. + - `parent` array: [0, 0, 0, 3, 4] + - `rank` array: [1, 0, 0, 0, 0] + +- `union_set(3, 4)`: + - `find(3)` returns 3. + - `find(4)` returns 4. + - `rank[3] == rank[4]`, so `parent[4]` is set to 3 and `rank[3]` is incremented. + - `parent` array: [0, 0, 0, 3, 3] + - `rank` array: [1, 0, 0, 1, 0] + +3. **Find Operations**: + +- `find(0)` returns 0. +- `find(1)` returns 0 (since `parent[1]` is 0). +- `find(2)` returns 0 (since `parent[2]` is 0). +- `find(3)` returns 3. +- `find(4)` returns 3 (since `parent[4]` is 3).