# Using disjoint sets with a vector

## Motivation

After much struggling with making `boost::disjoint_sets`

work in a slightly
non-trivial way, I present here a solution given the following restrictions on
the data and conditions on the final setting:

- The kind of object we are trying to separate into disjoint sets is not an
`int`

, but a`class`

with some internal logic for how the sets should be formed. - Moreover, we have edit access to this class’ code, so we can add some attributes and methods if we need to.
`std::map`

is considered too expensive, so this alternative is not an acceptable solution. In particular, it requires an empty constructor for the`operator[]`

, which is a no go in my context.- The total number of objects to be separated into different sets is known in
advance, and the objects are available
*before*starting to make the sets.

## An overview of the solution

Our solution consists in using a single `std::vector`

to store all the elements,
whose class is modified so that the structures needed by `disjoint_sets`

are all
part of the objects themselves. We add three integers to the class, which seems
like a reasonable price to pay compared to the overhead involved in a `map`

, for
example.

### The modified `Element`

class

Firstly, let’s say we have a class `Element`

, and it’s this kind of object that
we want to separate into disjoint sets. We modify the class so it has 3 new
attributes: an *ID* (which will uniquely identify the instance), a *rank* (which
helps with the union by rank), and a *parent* (which is another element of the
same set). In this case, the only other attribute is an integer, but the class
could very well be much more complex.

```
class Element
{
public:
explicit
Element(int n) : mSomeInt(n) { }
int someInt() const { return mSomeInt; }
// Specific for disjoint_sets
size_t dsID;
size_t dsRank;
size_t dsParent;
private:
int mSomeInt;
};
```

Note how I made these additional attributes `public`

for convenience. However,
you can always make them `private`

and provide read and write accessors for all
of them (getters and setters, if you will).

### Comparison operators

We will need two common operators for this class: `==`

and `!=`

. If these are
already provided by the class, there’s nothing else to do. For the purposes of
this example, I added them outside.

```
inline bool
operator==(Element const& lhs, Element const& rhs)
{
return lhs.someInt() == rhs.someInt();
}
inline bool
operator!=(Element const& lhs, Element const& rhs)
{
return ! operator==(lhs, rhs);
}
```

In no case the attributes added for disjoint sets should be used in any of these comparisons.

One more compare function is added here, used to sort the `vector`

by *parent*,
so each element of the partition is contiguous.

```
inline bool
compareByParent(Element const& lhs, Element const& rhs)
{
return lhs.dsParent < rhs.dsParent;
}
```

### Rank and Parent

Two classes are required by `boost::disjoint_sets`

: `Rank`

and `Parent`

. Note
that we simply define them to contain a reference to the vector that actually
contains our elements. Although they could be just one class definition, I’m
separating them for the sake of readability.

```
class Parent
{
public:
Parent(std::vector<Element>& e) : mElements(e) { }
std::vector<Element>& mElements;
};
class Rank
{
public:
Rank(std::vector<Element>& e) : mElements(e) { }
std::vector<Element>& mElements;
};
```

As before, you could have `mElements`

as a `private`

member if you wish.

The `Rank`

class requires this `typedef`

to find some value type, and this
requires to `#include <boost/pending/property.hpp>`

.

```
template <>
struct boost::property_traits<Rank*>
{
typedef size_t value_type;
};
```

`get`

and `put`

Both `Rank`

and `Parent`

require an interface to read and write the
corresponding value for a given key, which is an element of the vector.

```
inline Element const&
get(Parent* pa, Element const& k)
{
return pa->mElements.at(k.dsParent);
}
inline void
put(Parent* pa, Element k, Element const& val)
{
pa->mElements.at(k.dsID).dsParent = val.dsID;
}
inline size_t const&
get(Rank*, Element const& k)
{
return k.dsRank;
}
inline void
put(Rank* pa, Element k, size_t const& val)
{
pa->mElements.at(k.dsID).dsRank = val;
}
```

In both versions of `put`

, the second parameter is a copy of an `Element`

, which
might not be optimal. So far, it has worked for me after replacing `Element k`

by `Element const& k`

, but I don’t know whether that could cause trouble in some
weird scenario.

## Putting it all together

It’s time to test this setting. In our example program, we will reserve space
for a fixed number of `Element`

s, fill it with some randomly created objects,
and then create the sets and make some unions at random as well.

```
std::vector<Element> elements;
elements.reserve(30);
for (size_t i = 0; i < elements.capacity(); ++i)
elements.push_back(Element(rand() % 90));
for (size_t i = 0; i < elements.size(); ++i)
elements[i].dsID = i;
Rank rank(elements);
Parent parent(elements);
boost::disjoint_sets<Rank*, Parent*> sets(&rank, &parent);
for (size_t i = 0; i < elements.size(); ++i)
sets.make_set(elements.at(i));
for (size_t i = 0; i < elements.size(); ++i)
{
// Union with an element randomly chosen from the rest
size_t j = rand() % elements.size();
sets.union_set(elements[i], elements[j]);
}
std::cout << "Found " << sets.count_sets(elements.begin(), elements.end()) << " sets." << std::endl;
```

### Obtaining the sets

So far, we’ve been relying on the identifier `dsID`

to be exactly the position
of the element in the vector. Once we have finished joining sets, however, it is
time to sort the elements so each resulting set is a contiguous chunk of memory.

For this, we first flatten the tree of parent identifiers, so that the parent of each element is exactly the representative of the set. This is called path compression, and is again provided directly by boost.

`sets.compress_sets(elements.begin(), elements.end());`

Now we can sort by `dsParent`

, using the function we defined before.

`std::sort(elements.begin(), elements.end(), compareByParent);`

### Looping through each set

After sorting by parent, which in turn was done *after* path compression, we are
ready to loop through each set of the partition, and through each element of
each set.

#### Using indices

If we wish, we could use the inner loop to update `mID`

so it reflects the new
position of the element in the vector.

```
size_t first = 0;
size_t total = elements.size();
while (first < total)
{
size_t parent = elements[first].dsParent;
size_t last = first;
while (last < total && elements.[last].dsParent == parent)
++last;
// The range [first,last) is a set of the partition
for (size_t i = first; i < last; ++i)
{ /* elements[i] belongs to the current set */ }
first = last;
}
```

#### Using iterators

This method, in particular, requires `dsParent`

to be sorted in increasing
order, so `compareByParent`

will be the appropriate comparison for the
`upper_bound`

algorithm.

```
typedef std::vector<Element>::iterator Iterator;
Iterator first = elements.begin();
Iterator end = elements.end();
while (first != elements.end())
{
Iterator last = std::upper_bound(
first, end, *first, compareByParent);
// The range [first,last) is a set of the partition
for (Iterator it = first; it != last; ++it)
{ /* *it belongs to the current set */ }
first = last;
}
```

## Conclusion

This is one specific example of usage. I hope it helps somebody somewhere, as I know by experience how painful it can be to try and make sense of all of this with the unfortunate documentation that boost provides.

To see the full code, you can visit the repository where I uploaded this example.