Data Structures FTW!

Today’s short blog is about the importance of data structures in solving various algorithmic problems. Too often, when faced with a non-trivial problem, we are guilty of designing and coding cryptic, one-off algorithms. On account of their complexity, these can be hard to code, hard to debug, and even harder to maintain. Frequently it pays to take a step back, think about the crux of the problem, and design our algorithm around data structures that greatly simplify the solution.

This philosophy is best illustrated with some examples. Take a look at this interesting problem from leetcode: Number of Islands.

The problem doesn’t seem too bad, and at least three approaches quickly come to mind.

Approach 1: Greedy Approach using a Queue

The first approach doesn’t involve any uncommon data structures. I keep the problem matrix in a vector of vectors, and use a queue within the algorithm. The algorithm proceeds via a discovery process seeded from a single 1: I add neighboring 1s and grow the island in a greedy fashion. Here is the pseudo-code:


while (matrix contains at least one non-zero cell) {
    cell = scan from L->R and T->B and find the first non-zero cell;
    mark cell visited;
    queue.push(cell);
    island = emptyIsland;
    while (!queue.empty()) {
        curcell = queue.front(); queue.pop();
        island.add(curcell);
        tcell = cell above curcell;
        bcell = cell below curcell;
        rcell = cell to the right of curcell;
        lcell = cell to the left of curcell;
        for (x in [tcell, bcell, rcell, lcell]) {
            if (x has not been visited and is a '1') {
                mark x visited;
                queue.push(x);
            }
        }
    }
    add island to our output list of islands;
    mark all the cells on this island as zeros;
}
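
In concrete C++, a minimal sketch of the above looks like this (counting islands rather than collecting them, since that is what the leetcode problem asks for; the numIslands signature and the char grid of '0'/'1' cells follow the leetcode format):

#include <queue>
#include <utility>
#include <vector>

// BFS flood-fill: grow each island outward from a seed cell using a queue.
int numIslands(std::vector<std::vector<char>>& grid) {
    if (grid.empty()) return 0;
    int rows = grid.size(), cols = grid[0].size(), islands = 0;
    const int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (grid[r][c] != '1') continue;      // next unvisited seed
            ++islands;
            std::queue<std::pair<int, int>> q;
            grid[r][c] = '0';                     // zeroing doubles as "visited"
            q.push({r, c});
            while (!q.empty()) {
                auto [cr, cc] = q.front(); q.pop();
                for (int d = 0; d < 4; ++d) {     // the four neighbors
                    int nr = cr + dr[d], nc = cc + dc[d];
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
                        grid[nr][nc] == '1') {
                        grid[nr][nc] = '0';
                        q.push({nr, nc});
                    }
                }
            }
        }
    return islands;
}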

The above approach isn’t too bad, but it doesn’t build on any existing algorithm, which makes its correctness and efficiency non-obvious. Someone looking at it for the first time will also have to read through the entire code before the approach becomes clear.

Approach 2: Use a Graph Data Structure

This approach illustrates how using a common data structure can ease the issues mentioned in approach 1.

We can model each 1 in the matrix as a node in a graph. A node connects to its neighbors (top, left, right, bottom) whenever those neighbors are also 1s.

This approach has immediate benefits. Indeed, the problem of counting the number of islands now reduces to counting the number of connected components in a graph. This is a very well-known graph theory problem, and is easily solved with BFS/DFS.

The perceptive reader will note that this is essentially what approach 1 was doing, just without using the data structure formally. Indeed, in order to perform a BFS we still use a queue to visit the nodes in level order. However, once I formally recognize that the problem is graph-based, I can use any existing library to find the connected components for me. And software reuse is a beautiful thing! 🙂
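
For instance, here is a sketch using the Boost Graph Library (my pick purely for illustration; any library with a connected-components routine would do). Only the 1s become vertices, and edges connect horizontally or vertically adjacent 1s:

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/connected_components.hpp>
#include <vector>

int numIslands(const std::vector<std::vector<char>>& grid) {
    using Graph = boost::adjacency_list<boost::vecS, boost::vecS,
                                        boost::undirectedS>;
    int rows = grid.size(), cols = rows ? grid[0].size() : 0;
    // Assign a vertex id to every '1' cell.
    std::vector<std::vector<int>> id(rows, std::vector<int>(cols, -1));
    int n = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (grid[r][c] == '1') id[r][c] = n++;
    // Add edges to the right and bottom neighbors (top/left edges are
    // the same edges, seen from the other endpoint).
    Graph g(n);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (grid[r][c] != '1') continue;
            if (c + 1 < cols && grid[r][c + 1] == '1')
                boost::add_edge(id[r][c], id[r][c + 1], g);
            if (r + 1 < rows && grid[r + 1][c] == '1')
                boost::add_edge(id[r][c], id[r + 1][c], g);
        }
    // The library does the real work: one call counts the components.
    std::vector<int> component(n);
    return n == 0 ? 0 : boost::connected_components(g, &component[0]);
}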


Approach 3: Use a Union-Find Data Structure

This approach again illustrates how using another data structure can completely trivialize the problem.

We can instantiate a Union-Find object where each cell (r, c) initially gets its own id, i.e., we start off treating each 1 as its own separate little island. Now we scan from L->R and T->B in one pass, and “union” those islands that touch each other. At the end, we ask Union-Find how many roots remain in the data structure, and that is our answer. In this case, all we did was write a nested loop to feed the input into Union-Find. The data structure basically *solved* the problem and told us the result!

This is the approach I implemented and here is the code. Note that the Union-Find implementation precedes the main solution, but the actual solution itself is very small and simple.
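
Here is a minimal sketch of that shape, with the Union-Find first and the short driver after it (union-by-size and path compression are my implementation choices for the sketch):

#include <numeric>
#include <vector>

// Union-Find (disjoint set union) with path compression and union by size.
struct UnionFind {
    std::vector<int> parent, size;
    int roots;  // number of distinct sets still alive
    explicit UnionFind(int n) : parent(n), size(n, 1), roots(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
        return x;
    }
    void unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);  // attach smaller under larger
        parent[b] = a;
        size[a] += size[b];
        --roots;
    }
};

// The actual solution: one pass, unioning each '1' with its right and
// bottom '1' neighbors, then counting the roots that belong to land.
int numIslands(const std::vector<std::vector<char>>& grid) {
    int rows = grid.size(), cols = rows ? grid[0].size() : 0;
    UnionFind uf(rows * cols);
    int land = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (grid[r][c] != '1') continue;
            ++land;
            if (c + 1 < cols && grid[r][c + 1] == '1')
                uf.unite(r * cols + c, r * cols + c + 1);
            if (r + 1 < rows && grid[r + 1][c] == '1')
                uf.unite(r * cols + c, (r + 1) * cols + c);
        }
    // uf.roots still counts every water cell as its own root; subtract them.
    return uf.roots - (rows * cols - land);
}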


Improving Existing Algorithms

There are ways in which data structures can benefit existing algorithms as well. Indeed, Prof. Skiena notes in his book The Algorithm Design Manual that “Changing a data structure in a slow program can work the same way an organ transplant does in a sick patient”.

I experienced it first-hand the other day, while I was solving this problem from leetcode.

My initial solution was a standard quadratic O(n^2) dynamic programming algorithm. It was NOT fast enough to pass all the test cases without a “Time Limit Exceeded” error.

As I was pondering how to improve the code, I noticed that the inner for loop was basically performing a Range-Minimum-Query (RMQ). I implemented a simple data structure to answer each RMQ in O(sqrt(n)) instead of O(n) time, and BAM… just like that, my overall complexity dropped to O(n*sqrt(n)), and the solution passed all the test cases within the 2 second time limit.
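
The simple structure in question was plain sqrt decomposition. Here is a sketch of the idea, with a decrease-only point update, which is the flavor a typical DP inner loop needs (a simplifying assumption on my part; the exact interface depends on the problem):

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Sqrt decomposition for RMQ: split the array into blocks of ~sqrt(n)
// elements and cache each block's minimum.
// update: O(1) here (values only ever decrease). query: O(sqrt(n)).
struct SqrtRMQ {
    int block;
    std::vector<long long> a, blockMin;
    explicit SqrtRMQ(int n)
        : block(std::max(1, (int)std::sqrt((double)n))),
          a(n, std::numeric_limits<long long>::max()),
          blockMin((n + block - 1) / block,
                   std::numeric_limits<long long>::max()) {}
    void update(int i, long long v) {           // a[i] = min(a[i], v)
        a[i] = std::min(a[i], v);
        blockMin[i / block] = std::min(blockMin[i / block], v);
    }
    long long query(int l, int r) {             // min of a[l..r], inclusive
        long long best = std::numeric_limits<long long>::max();
        while (l <= r && l % block != 0)        // left partial block
            best = std::min(best, a[l++]);
        while (l + block - 1 <= r) {            // whole blocks at a time
            best = std::min(best, blockMin[l / block]);
            l += block;
        }
        while (l <= r)                          // right partial block
            best = std::min(best, a[l++]);
        return best;
    }
};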

Of course, in this case it turns out that I was later able to devise an O(n) algorithm for this problem. 🙂

But the lesson still stands: swapping out some code for the appropriate data structure can provide immediate runtime benefits.

Another example is in sorting. Selection sort implemented without any special data structure has O(n^2) complexity. The same algorithm drops to O(n log n) when we use a heap to hand us the next smallest element (heapsort).
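
A quick sketch of that idea, with std::priority_queue standing in as the heap:

#include <functional>
#include <queue>
#include <vector>

// Selection sort's "scan for the minimum of what's left" step becomes a
// heap pop: n pops at O(log n) each give O(n log n) overall.
std::vector<int> heapSort(const std::vector<int>& input) {
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        heap(input.begin(), input.end());   // min-heap built from the input
    std::vector<int> sorted;
    sorted.reserve(input.size());
    while (!heap.empty()) {                 // repeatedly take the next smallest
        sorted.push_back(heap.top());
        heap.pop();
    }
    return sorted;
}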

In parting, I will leave you with this quote from Linus Torvalds:

“I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”
