Contest: GoDaddy Hackathon

Last weekend I competed in the GoDaddy Hackathon competition on HackerRank. It was a fun contest, and I placed 48th out of 1097 participants. The contest had the following problems; my solutions on GitHub are linked below as well.

| # | Problem | My Solution | Algorithmic Technique | Time Complexity | Space Complexity | Level |
|---|---------|-------------|-----------------------|-----------------|------------------|-------|
| 1 | A Perfect Set | GitHub | Pigeonhole Principle | O(1) | O(1) | Easy |
| 2 | Hexagonal War | GitHub | Union Find, Matrix | O(m log n) | O(n^2) | Moderate |
| 3 | Diamond-Land | GitHub | Sorting, Dynamic Programming | O(n^2) | O(n) | Moderate |
| 4 | LCS Revisited | GitHub | Dynamic Programming | O(2^k * n^k) with k=3 | O(n^(k-1)) | Difficult |
| 5 | A Perfect Pair | GitHub | Brute Force 😉 | O(q*(n+m)) | O(n + m) | Difficult |

This time around I had two favorite problems: Diamond-Land and LCS Revisited. I will discuss both in this blog post. Before diving into those two, here are some general notes on all the problems:

  • Hexagonal War: This was a fun problem. The problem itself wasn’t hard, and it could be solved using either graph search or Union Find. I chose the latter for no particular reason, even though it is a tad slower. The fun part, though, was the coordinate representation: it took a bit of thinking to figure out how to represent the hexagonal adjacency structure using a simple 2D matrix (a generic sketch of one such encoding follows this list).
  • Diamond-Land: I won’t say too much about it here, since it is discussed in detail below. Note that my solution to this problem was *not* optimal. It was the best that I could come up with, and I got 17/21 tests passing, with 4/21 timing out.
  • LCS Revisited: I won’t say too much about this here either, since it is discussed in detail below. Note that my solution to this problem was *not* optimal either. It was the best that I could come up with, and I got 7/15 passing, with the rest timing out.
  • A Perfect Pair: I could only come up with a brute force solution. The tests were very aggressive, and only 2/11 passed with the brute force, with the rest timing out. I looked at the editorial, and the efficient solution required knowledge of Biconnected Components and Block-Cut Trees (which sound like Greek and Latin to me at the moment), so I’m happy I didn’t spend too much time on this. 🙂
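Since the hexagonal-to-matrix mapping in Hexagonal War was the interesting part for me, here is a generic sketch of one common encoding (“odd-row offset” coordinates), where every cell of a plain 2D matrix has 6 neighbours whose offsets depend on the parity of the row. This is only an illustration of the general trick; I’m not claiming it is the exact coordinate scheme the problem used.

```cpp
#include <cstdio>

// One common way to encode hexagonal adjacency in a plain 2D matrix ("odd-row offset"
// coordinates): every cell has 6 neighbours, and the column offsets of the diagonal
// neighbours depend on whether the row is even or odd. Illustration only -- not
// necessarily the scheme used by the actual problem.
const int HEX_NEIGHBORS[2][6][2] = {
    // even rows: {dr, dc}
    {{-1, -1}, {-1, 0}, {0, -1}, {0, 1}, {1, -1}, {1, 0}},
    // odd rows: {dr, dc}
    {{-1, 0}, {-1, 1}, {0, -1}, {0, 1}, {1, 0}, {1, 1}},
};

int main() {
    int r = 3, c = 4;  // an arbitrary cell, just to show usage
    for (const auto& d : HEX_NEIGHBORS[r & 1])
        std::printf("neighbour: (%d, %d)\n", r + d[0], c + d[1]);
    return 0;
}
```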

Problem: Diamond-Land

Problem Statement

Zloba has finally arrived in D-Land! D-Land is a big country, so we can represent D-Land as an infinite grid. Zloba knows a secret about this great country: D-Land has a lot of diamonds. He found a map with the positions of n sets of diamonds in this country. The map shows the position of each set, and the number of diamonds in it.

The i-th set is placed at point (xi, yi) and has wi diamonds. No two sets are placed at the same point, and there is no set placed at point (0,0). At the beginning of his expedition, Zloba is located at point (0,0). He can ONLY move in two directions:

  1. Up – from point (x,y) to point (x,y+1).
  2. Right – from point (x,y) to point (x+1,y).

When Zloba is located at a set of diamonds, he instantly collects all of them. What is the maximum amount of diamonds that can be collected by moving only in these two directions?

Input Format

The first line of input contains one number n, the number of sets of diamonds.

The next n lines contain three space separated numbers xi, yi, and wi.

These integers are defined as:

xi is the x-coordinate of the i-th set, yi is the y-coordinate of the i-th set, and wi is the number of diamonds in the i-th set. It is guaranteed that no two sets are located at the same point, and no set is located at point (0,0).

Constraints

For 15% score 1≤n≤18.

For 36% score 1≤n≤10^4.

For 100% score 1≤n≤10^5.

In all the testcases:

0≤xi,yi≤10^9

1≤wi≤10^9

Output Format

In a single line, print one integer: the maximum number of diamonds which can be collected by Zloba.

Sample Input 1:

3
1 1 5
3 5 7
2 3 3

Sample Output 1:

15

Sample Input 2:

2
1 6 10
2 4 8

Sample Output 2:

10

Solution

The first thing we notice is the constraints on the problem. The grid itself can be 10^9 × 10^9 in size. That is essentially *infinite* for any practical purpose, and any solution that treats the layout as a grid and tries to actually allocate it as a 2D array is doomed to fail (in both space and time).

In order to formulate a solution, we will have to exploit the travel constraint mentioned in the problem: from any given position on the grid, we can only move towards +ve x and +ve y.

Reduction to the rescue

Here’s a question we can ask ourselves: what would happen if we sorted the locations of all the diamonds by their x-coordinates? We are then left with the y-positions (and weights) of a bunch of diamonds in an array, and we need to choose positions whose y-coordinates never decrease (because the required ordering in x is given to us for free by the sort). Of course, many such combinations exist in the array, but we need to choose the one combination which maximizes the total weight of the diamonds.

Does this sound familiar? Yes! In fact, this is essentially a spin on the Longest Increasing Subsequence (LIS) Problem!

How so? Consider these parallels between the LIS problem and our Diamond-Land problem:

Longest Increasing Subsequence: Given an array of numbers, choose a subsequence such that:

  • Each element in the subsequence is greater than the previous.
  • The length of the subsequence is the *longest* amongst all Increasing Subsequences.

Diamond Land: Given an array containing pairs of (Diamond Weight, Diamond Y-Location), choose a subsequence of pairs such that:

  • Each Y-location of a subsequent pair is greater than or equal to the previous.
  • The *total weight* of the subsequence is the largest among all such Non-Decreasing Subsequences.

Thus we see that our Diamond-Land problem can be reduced to the LIS Problem after sorting!

In order to solve the problem, we will keep yet another sorted container (called scont in the code). This container will also have pairs of (weight, y-coord). However, the weight here will be the total weight thus far, and the y-coord will be the coordinate of the *last* diamond in the chosen subsequence thus far.

Now when we process a new diamond from the earlier x-sorted container, we look through the weight-sorted container (scont) and scan backwards for the first entry whose y-value is less than or equal to our current y-coord. This gives us the *largest-weight* subsequence *thus far* that ends at a y-coord no larger than our current one. By appending the current diamond to this subsequence, we construct an even larger-weight subsequence which still satisfies the problem constraints (+ve x and +ve y movement). Finally, we insert the new (total-weight, current-y) pair into the sorted scont. This insertion is similar to the insertion step in insertion sort.

Since we process all n diamonds in O(n) steps, and scan backwards through our container for each diamond [O(n) time per step], this is an O(n^2) algorithm. We also use O(n) space to store the scont.
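Here is a minimal sketch of that O(n^2) approach. For brevity it is written as a direct DP scan over the x-sorted array rather than my weight-sorted scont bookkeeping, but the work per diamond is the same O(n) scan. The variable names (d, best) are illustrative; this is not my exact contest code.

```cpp
#include <algorithm>
#include <array>
#include <iostream>
#include <vector>

// Sketch of the O(n^2) reduction: sort the diamonds by x (breaking ties by y so that
// diamonds in the same column stay reachable from one another), then run the weighted
// "longest non-decreasing subsequence" DP over the y-coordinates.
int main() {
    int n;
    std::cin >> n;
    std::vector<std::array<long long, 3>> d(n);   // {x, y, w}
    for (auto& p : d) std::cin >> p[0] >> p[1] >> p[2];
    std::sort(d.begin(), d.end());                // by x, then y

    std::vector<long long> best(n);               // best[i] = max diamonds on a valid path ending at diamond i
    long long answer = 0;
    for (int i = 0; i < n; ++i) {
        best[i] = d[i][2];
        for (int j = 0; j < i; ++j)               // O(n) scan per diamond -> O(n^2) overall
            if (d[j][1] <= d[i][1])               // y must not decrease (x is already handled by the sort)
                best[i] = std::max(best[i], best[j] + d[i][2]);
        answer = std::max(answer, best[i]);
    }
    std::cout << answer << "\n";
    return 0;
}
```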

A possible optimization

As hinted in the editorial, we can use another data structure to reduce the time per step from O(n) to O(log n). For example, use a range-maximum (RMQ) segment tree to maintain the weights described above, and issue a range-maximum query over [0, current-y] for each diamond to find the *largest-weight subsequence thus far, ending with Y <= current Y*. With this approach, the total runtime drops to O(n log n). Had I done this, the final 4 tests would have passed as well!
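For the curious, here is a hedged sketch of how that segment-tree variant might look. Since y can be as large as 10^9, the y values are coordinate-compressed and the tree is indexed by rank. The MaxTree structure and all names below are my own reading of the editorial’s hint, not its actual code.

```cpp
#include <algorithm>
#include <array>
#include <iostream>
#include <vector>

// Max segment tree over compressed y ranks: leaf r holds the best total weight of a
// valid path ending at the y value with rank r.
struct MaxTree {
    int n;
    std::vector<long long> t;
    explicit MaxTree(int n) : n(n), t(2 * n, 0) {}
    void update(int i, long long v) {             // raise leaf i to at least v
        i += n;
        t[i] = std::max(t[i], v);
        for (i >>= 1; i >= 1; i >>= 1)
            t[i] = std::max(t[2 * i], t[2 * i + 1]);
    }
    long long query(int l, int r) {               // max over the half-open range [l, r)
        long long res = 0;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) res = std::max(res, t[l++]);
            if (r & 1) res = std::max(res, t[--r]);
        }
        return res;
    }
};

int main() {
    int n;
    std::cin >> n;
    std::vector<std::array<long long, 3>> d(n);   // {x, y, w}
    for (auto& p : d) std::cin >> p[0] >> p[1] >> p[2];
    std::sort(d.begin(), d.end());                // by x, then y (ties by y keep same-column diamonds usable)

    std::vector<long long> ys(n);                 // distinct y values, for coordinate compression
    for (int i = 0; i < n; ++i) ys[i] = d[i][1];
    std::sort(ys.begin(), ys.end());
    ys.erase(std::unique(ys.begin(), ys.end()), ys.end());

    MaxTree tree((int)ys.size());
    long long answer = 0;
    for (const auto& p : d) {
        int r = (int)(std::lower_bound(ys.begin(), ys.end(), p[1]) - ys.begin());
        long long best = tree.query(0, r + 1) + p[2];  // best chain ending with y' <= y, extended by this diamond
        tree.update(r, best);
        answer = std::max(answer, best);
    }
    std::cout << answer << "\n";
    return 0;
}
```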

Ok. Onwards to the next problem in the contest…

Problem: LCS Revisited

Problem Statement

You have to find the Longest Common Subsequence of 3 given arrays. Each array is of size N. (Find the maximum length of an array which is a subsequence of each given array.)

All array numbers are generated randomly. In other words, each number is taken with equal probability from 1 to N.

For example, if N is 5, then for each number there is a 20% probability that it’s “1”, 20% that it’s “2”, 20% that it’s “3”, 20% that it’s “4”, and 20% that it’s “5”.

Input Format

The first line contains one number, T, which is the number of tests.

For each test: The next line contains N, and the following 3 lines will contain 3 arrays. Each array is the size of N.

Constraints

1≤T≤5

For 40% score 1≤N≤500

For 100% score 1≤N≤3000

For all of the testcases, the array elements will be an integer between 1 and N.

Output Format

For each test, print the length of the longest common subsequence.

Sample Input

2
4
2 3 2 3
3 1 2 4
4 4 2 1
4
1 3 2 1
4 4 3 2
2 2 3 2

Sample Output

1
2

Solution

The LCS of 2 arrays is a well-known Dynamic Programming problem with O(n^2) runtime. I simply extended this to 3 arrays, with the following DP recurrence:

Let A, B, C represent the three sequences. At any point in the algorithm, let i, j, k represent their respective indices being considered, and let cache[i][j][k] denote the length of the LCS of the prefixes A[1..i], B[1..j], and C[1..k].

If the corresponding elements are equal (if (A[i] == B[j] && B[j] == C[k])):

cache[i][j][k] = 1 + cache[i-1][j-1][k-1]

If the corresponding elements are not equal, we have 7 options:

cands[0] = cache[i-1][j][k];
cands[1] = cache[i][j-1][k];
cands[2] = cache[i][j][k-1];
cands[3] = cache[i-1][j-1][k];
cands[4] = cache[i-1][j][k-1];
cands[5] = cache[i][j-1][k-1];
cands[6] = cache[i-1][j-1][k-1];

cache[i][j][k] = max(cands[0..6]);

i.e. we consider all 7 possibilities (dropping the last element from exactly one sequence, from any two of the sequences, or from all 3 sequences).

Of course, the interesting thing is that as our number of sequences increases from 3 to, say, k, the number of options to consider increases to 2^k – 1.

i.e. work per step is O(2^k)

And the number of steps is O(n^k), i.e. k nested loops each running up to n.

Therefore the runtime of this algorithm (in the general case of k sequences) is O((2^k) * (n^k)). The naïve space requirement is O(n^k), but we can reduce this to O(n^(k-1)) by observing that we only need the layer for the previous value of i, i.e. (i-1). [This trick can only be played once, for the outermost loop.]
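To make the k = 3 case concrete, here is a small self-contained sketch of the DP above together with the two-layer space trick, where prev and cur stand in for the (i-1) and i layers of cache. The function name lcs3 and the layer variables are mine, not the contest code.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// 3-way LCS DP with the two-layer space optimization: cur[j][k] plays the role of
// cache[i][j][k], and prev[j][k] the role of cache[i-1][j][k], so we store O(n^2)
// entries instead of O(n^3).
int lcs3(const std::vector<int>& A, const std::vector<int>& B, const std::vector<int>& C) {
    int n = (int)A.size(), m = (int)B.size(), p = (int)C.size();
    std::vector<std::vector<int>> prev(m + 1, std::vector<int>(p + 1, 0));
    std::vector<std::vector<int>> cur(m + 1, std::vector<int>(p + 1, 0));

    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            for (int k = 1; k <= p; ++k) {
                if (A[i - 1] == B[j - 1] && B[j - 1] == C[k - 1]) {
                    cur[j][k] = 1 + prev[j - 1][k - 1];   // all three match: extend the LCS
                } else {
                    // The 7 candidates: drop the last element from any non-empty
                    // combination of the three sequences.
                    cur[j][k] = std::max({prev[j][k], cur[j - 1][k], cur[j][k - 1],
                                          prev[j - 1][k], prev[j][k - 1],
                                          cur[j - 1][k - 1], prev[j - 1][k - 1]});
                }
            }
        }
        std::swap(prev, cur);                              // layer i becomes the "i-1" layer
        for (auto& row : cur) std::fill(row.begin(), row.end(), 0);
    }
    return prev[m][p];                                     // after the final swap, prev holds layer i = n
}

int main() {
    int T;
    std::cin >> T;
    while (T--) {
        int N;
        std::cin >> N;
        std::vector<int> A(N), B(N), C(N);
        for (int& x : A) std::cin >> x;
        for (int& x : B) std::cin >> x;
        for (int& x : C) std::cin >> x;
        std::cout << lcs3(A, B, C) << "\n";
    }
    return 0;
}
```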

Notice that my solution is NOT exploiting the information on the probabilities of the various values in the sequences. I didn’t know how to exploit this. But this information can be exploited to get a much better algorithm. The editorial had a more efficient O(n^2) algorithm, but I wish they had put some more effort into explaining it because I don’t understand the very short explanation at all. 😦 If you have a better exposition on it, please let me know!