AFK for a few days/weeks

I just thought I’d leave a comment on here indicating that I am going to be away from competitions for a few weeks.

I just moved jobs, and there is a ton of exciting ramp-up that I am going through. I also need to check with work to make sure that I can keep competing and writing code for various contests (some contest rules are funny, and can conflict with work rules). We are also relocating geographically, and there is some time and pain associated with physically moving and selling our house in Folsom.

Anyway, once we find a new place and get settled in, and once I get the green light to go ahead at work, I am planning to get back on HackerRank. Stay tuned! 🙂

Contest: HackerRank (1) Ad Infinitum 13 and (2) CodeStorm 2015

I haven’t blogged in a few weeks, and I wanted to report out on my recent programming adventures. I competed in a couple of contests recently on HackerRank.

The first was a Math Programming contest, where I placed 111 out of 1220 participants. This was my first time participating in a Math Programming event, and the problems were definitely far more mathematical (than algorithmic). The first 5 problems were a lot of fun, requiring knowledge of Pick’s theorem, invariants, and some clever math to efficiently compute a large geometric progression (GP) sum in modular arithmetic. The last 2-3 problems were very cryptic, and relied on knowledge of the Möbius function, representation theory, and Stirling numbers of the second kind (I don’t even know what Stirling numbers of the first kind are, LOL!). I coded some suboptimal solutions to these last ones, which cleared some but not all test cases. Overall I had fun, but not as much as I do on the algorithmic contests. This is definitely due to my lack of knowledge of the hardcore math involved. It was an interesting experience though.
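For the curious, here is a quick sketch of the kind of trick involved: summing 1 + r + … + r^(n-1) modulo M in O(log n) steps, without modular division, so M need not be prime. This is illustrative only, not the exact contest solution:

```cpp
// A minimal sketch (illustrative, not the official solution): sum the GP
// 1 + r + ... + r^(n-1) modulo m in O(log n), with no modular inverse
// needed, so m does not have to be prime.
#include <bits/stdc++.h>
using namespace std;

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
    return (unsigned __int128)a * b % m;  // GCC/Clang 128-bit extension
}

// Returns {r^n mod m, (1 + r + ... + r^(n-1)) mod m}.
static pair<uint64_t, uint64_t> gpSum(uint64_t r, uint64_t n, uint64_t m) {
    if (n == 0) return {1 % m, 0};
    auto [p, s] = gpSum(r, n / 2, m);          // p = r^(n/2), s = S(n/2)
    uint64_t p2 = mulmod(p, p, m);             // r^(2*(n/2))
    uint64_t s2 = (s + mulmod(s, p, m)) % m;   // S(2k) = S(k) * (1 + r^k)
    if (n % 2 == 0) return {p2, s2};
    return {mulmod(p2, r, m), (s2 + p2) % m};  // odd n: append term r^(n-1)
}
```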

The second contest, CodeStorm 2015, was definitely my type of contest. It was more of a CS-centric contest, and efficient solutions required algorithmic (rather than purely mathematical) understanding. I placed 151 out of 4850 participants, which was a pretty good placement.

Here is a breakdown of the problems:

| # | Problem | My Solution | Algorithmic Technique | Time Complexity | Space Complexity | Level |
|---|---------|-------------|-----------------------|-----------------|------------------|-------|
| 1 | Emma’s Notebook | GitHub | Basics | O(t) | O(1) | Easy |
| 2 | Save Quantumland | GitHub | Greedy | O(n) | O(n) | Easy |
| 3 | Counting Triangles | GitHub | Binary Search | O(n^2 log n) | O(n) | Moderate |
| 4 | Boomerang | GitHub | Reverse Engineering 😉 | O(T log n) | O(1) | Moderate |
| 5 | Game of Reduction | GitHub | Precomputation, DP for nCk, Game Theory | O(nmax) and O(nmax^2) precompute; O(m + n) per test case | O(nmax^2) | Difficult |
| 6 | Independent Vertex Cover | Not Attempted | N/A | N/A | N/A | Difficult |
| 7 | Alexey’s Tree | GitHub | Memoization, Graph Theory | O(n+m) | O(mn)? Not sure. Lots!? | Advanced |


I solved the first 4 problems perfectly. Problem #5 cleared 4/14 test cases; the rest ran out of space. Problem #7 cleared 48/59 test cases, with the rest again running out of memory. Of course, neither of these was the optimal solution. I didn’t have time to attempt #6 at all, though it looks like a fascinating problem.


Contest: HackerRank Women’s Cup

This weekend I competed in the HackerRank Women’s Cup contest. And yes… I checked with HackerRank to make sure that I was *eligible* to compete even though I’m not a woman. 🙂 There was a catch, however. Though I could compete, I was *not eligible* to win any prizes. Those were reserved only for women.

Anyway, I placed 45 out of 1137 participants, which was a pretty good placement. I scored full points on the first 6 problems, but only partial points on the last one (15/33 tests passed; the rest timed out). I tried many different things to speed up the last problem (e.g. using interval trees), but the optimal solution involved using segment trees again.

Here is a summary of the problems, and my solutions:

| # | Problem | My Solution | Algorithmic Technique | Time Complexity | Space Complexity | Level |
|---|---------|-------------|-----------------------|-----------------|------------------|-------|
| 1 | Possible Maximum | GitHub | Basics | O(n^2) | O(1) | Easy |
| 2 | Project Earth | GitHub | Math, AdHoc | O(1) | O(1) | Easy |
| 3 | Magical Pearls | GitHub | Heaps | O(n log n) | O(n) | Moderate |
| 4 | Cabalistic Hills | GitHub | Eulerian Cycle, Graphs | O(|E| + |V|) | O(|E| + |V|) | Moderate |
| 5 | Magical String | GitHub | Pre-compute, AdHoc | O(n) precomp, O(1) query | O(n) | Moderate |
| 6 | Coding Camp | GitHub | Sorting, AdHoc | O(m^2 log m) | O(m+n) | Difficult |
| 7 | Counting Pairs | GitHub | Brute Force | Query 1: O(1); Query 3: O(1); Query 2: O(min(n, s2−s1)) | O(n) | Difficult |

Here’s a fun problem from the contest:

Problem: Cabalistic Hills

Problem Statement

Antony and Cleopatra are true soul mates and, like true soul mates, they fight. Instead of giving her an elixir like last time, Antony wants to bestow her with the magical lamp available at the Cabalistic Hills.

The Cabalistic Hills were created by Lucifer millions of years ago. They have N hills numbered 1 to N. The magical lamp is situated at the peak of hill N. Some of these hills are connected by one-way (directed) rope bridges, and these rope bridges are the only way to reach hill N and return back to hill 1. Each rope bridge disappears after being used once.

Lucifer has created traps for the climbers. Some rope bridges do not lead to hill N, and some lead to hill N but without a way to return back to hill 1. Antony MUST use all the rope bridges in his journey from hill 1 to N and back to 1, or he is imprisoned by Lucifer. If multiple ropes connect two hills, each rope counts separately, and all of them must be used once. You must advise Antony whether the hills are safe to climb or a devious trap! Given N hills and M pairs of connected hills, determine whether it’s a trap.

Input Format

The first line contains T: the number of test cases. Each test case contains N and M separated by a space. The next M lines contain v1, v2 separated by a space, where v1 and v2 represent the hills connected by a rope bridge.

Constraints

1≤T≤15

1≤N≤10^5

0≤M≤2×10^5

Output Format

Output T lines. For each test case, either print “Go on get the Magical Lamp” or, if it’s a trap, “Danger!! Lucifer will trap you” without quotes.

Sample Input

2
4 4
1 2
2 3
3 4
4 1
4 3
1 2
2 3
3 1

Sample Output

Go on get the Magical Lamp
Danger!! Lucifer will trap you

Solution

As kids, we have all at some point been posed the Boyd Puzzle. The riddle asks us to draw a certain figure, starting from one point, and ending at the same point with these rules:

  • You cannot lift your pencil off the paper.
  • You cannot cover the same line twice.
  • You must draw all the lines in the figure.

This is an unsolvable riddle for the particular figure posed, and there is actually a mathematical theorem that proves it. In fact, the famous mathematician Euler settled the question rigorously with his famous “Seven Bridges of Königsberg” problem. This was a major historical development, and laid the foundations of graph theory, which is so ubiquitous in computer science!

So what does this have to do with our current problem? Our problem is a generalized version of the Boyd Puzzle, where we need to give a yes/no answer to “Is it possible to construct an Eulerian cycle for the provided input?”

i.e. Let’s construct a graph with the input. Now if you read the problem statement carefully you will notice the following.

  • We must start at 1, visit N, and come back to 1 (i.e. trace a cycle).
  • We must also visit *every edge*, only in the direction stated.
  • We can visit every edge exactly once (the bridges disappear after travel).
  • We *must* use all bridges.

We see that this problem is a sneaky way of asking us to detect an Eulerian Cycle in the graph! Once you realize this, the problem becomes extremely simple. Indeed, there exists a polynomial time algorithm to do this. The algorithm is as follows.

A directed graph has an Eulerian Cycle if and only if:

  1. All vertices with in/out edges belong to the same single strongly connected component.
  2. For each vertex, in-degree(v) == out-degree(v).

We can do #1 above by running Kosaraju’s SCC algorithm, as in one of our previous contests. We can keep some bookkeeping information along the way to figure out #2 quite easily as well.
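Here is a minimal sketch of what that check can look like (illustrative; my actual contest code is on GitHub and may differ). It uses recursive DFS for brevity; very deep graphs may need an iterative version:

```cpp
// A minimal sketch of the Eulerian-cycle test: balanced in/out degrees,
// plus a single SCC over the vertices that actually carry edges.
#include <bits/stdc++.h>
using namespace std;

static void dfs(int u, const vector<vector<int>>& g, vector<bool>& seen,
                vector<int>& out) {
    seen[u] = true;
    for (int v : g[u]) if (!seen[v]) dfs(v, g, seen, out);
    out.push_back(u);  // record finish order
}

bool hasEulerianCycle(int n, const vector<pair<int,int>>& edges) {
    vector<vector<int>> g(n + 1), gr(n + 1);
    vector<int> indeg(n + 1, 0), outdeg(n + 1, 0);
    for (auto [u, v] : edges) {
        g[u].push_back(v); gr[v].push_back(u);
        ++outdeg[u]; ++indeg[v];
    }
    for (int v = 1; v <= n; ++v)                  // condition #2
        if (indeg[v] != outdeg[v]) return false;
    vector<bool> seen(n + 1, false);
    vector<int> order;
    for (int v = 1; v <= n; ++v)                  // Kosaraju pass 1
        if (!seen[v] && outdeg[v] > 0) dfs(v, g, seen, order);
    if (order.empty()) return true;               // no edges at all
    // Kosaraju pass 2: a reverse DFS from the last-finished vertex must
    // reach every vertex that has edges, i.e. one single SCC (condition #1).
    fill(seen.begin(), seen.end(), false);
    vector<int> comp;
    dfs(order.back(), gr, seen, comp);
    for (int v = 1; v <= n; ++v)
        if (indeg[v] + outdeg[v] > 0 && !seen[v]) return false;
    return true;
}
```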

Note: The diagram below is credited to mathworld since I was too lazy to draw my own.

eulerian-cycle

It is interesting that detecting/constructing an Eulerian Cycle (using all edges exactly once) can be done in polynomial time, but constructing a Hamiltonian Cycle (visiting all vertices exactly once) is an NP-Complete problem!


Contest: GoDaddy Hackathon

Last weekend I competed in the GoDaddy Hackathon competition on HackerRank. It was a fun contest, and I placed 48 out of 1097 participants. The contest had the following problems, and my solutions on GitHub are linked below as well.

| # | Problem | My Solution | Algorithmic Technique | Time Complexity | Space Complexity | Level |
|---|---------|-------------|-----------------------|-----------------|------------------|-------|
| 1 | A Perfect Set | GitHub | Pigeonhole Principle | O(1) | O(1) | Easy |
| 2 | Hexagonal War | GitHub | Union Find, Matrix | O(m log n) | O(n^2) | Moderate |
| 3 | Diamond-Land | GitHub | Sorting, Dynamic Programming | O(n^2) | O(n) | Moderate |
| 4 | LCS Revisited | GitHub | Dynamic Programming | O(2^k * n^k) with k=3 | O(n^(k-1)) | Difficult |
| 5 | A Perfect Pair | GitHub | Brute Force 😉 | O(q*(n+m)) | O(n + m) | Difficult |

This time around I had two favorite problems: Diamond-Land and LCS Revisited. I will discuss both in this blog post. Prior to discussing the two problems, here are some general notes on all the problems:

  • Hexagonal War: This was a fun problem. The problem itself wasn’t hard, and it could be solved using either graphs (w/ search) or using Union Find. I chose the latter for no particular reason, even though it is a tad slower. The fun thing though was the coordinate representation. It took a bit of thinking to figure out how to represent the hexagonal adjacency structure using a simple 2D matrix.
  • Diamond-Land: I won’t say too much about it here, since it is discussed in detail below. Note that my solution to this problem was *not* optimal. It was the best that I could come up with, and I got 17/21 tests passing, with 4/21 timing out.
  • LCS Revisited: I won’t say too much about this here either, since it is discussed in detail below. Note that my solution to this problem was *not* optimal either. It was the best that I could come up with, and I got 7/15 passing, with the rest timing out.
  • A Perfect Pair: I could only come up with a brute force solution. The tests were very aggressive, and only 2/11 passed the brute force, with the rest timing out. I looked at the editorial, and the efficient solution required knowledge of Biconnected components, and Block-Cut Trees (which sounds Greek and Latin at the moment), so I’m happy I didn’t spend too much time on this. 🙂

Problem: Diamond-Land

Problem Statement

Zloba has finally arrived in D-Land! D-Land is a big country, so we can represent D-Land as an infinite grid. Zloba knows a secret about this great country: D-Land has a lot of diamonds. He found a map with the positions of n sets of diamonds in this country. The map shows the position of each set, and the number of diamonds in it.

The i-th set is placed at point (xi,yi) and has wi diamonds. No two sets are placed at the same point, and there is no set placed at point (0,0). At the beginning of his expedition, Zloba is located at point (0,0). He can ONLY move in two directions:

  1. Up – from point (x,y) to point (x,y+1).
  2. Right – from point (x,y) to point (x+1,y).

When Zloba is located at a set of diamonds, he instantly collects all of them. What is the maximum amount of diamonds that can be collected by moving only in these two directions?

Input Format

The first line of input contains one number n, the number of sets of diamonds.

The next n lines contain three space separated numbers xi, yi, and wi.

These integers are defined as:

x coordinate of i-th set, y coordinate of i-th set, and w as the number of diamonds in i-th set. It is guaranteed that no two sets are located at the same point, and no set is located at point (0,0).

Constraints

For 15% score 1≤n≤18.

For 36% score 1≤n≤10^4.

For 100% score 1≤n≤10^5

In all the testcases:

0≤xi,yi≤10^9

1≤wi≤10^9

Output Format

In a single line, print one integer: the maximum number of diamonds which can be collected by Zloba.

Sample Input 1:

3
1 1 5
3 5 7
2 3 3

Sample Output 1:

15

Sample Input 2:

2
1 6 10
2 4 8

Sample Output 2:

10

Solution

The first thing we notice is the constraints on the problem. The grid itself can be 10^9 × 10^9 cells. This is essentially *infinite* for any practical purposes, and any solution that treats the layout as a grid and tries to actually allocate it as a 2D array is doomed to fail (both in space and time).

In order to formulate a solution, we will have to exploit the travel constraints mentioned in the problem. i.e. if we are at any given position on the grid, we can only move towards +ve x and +ve y.

Reduction to the rescue

Here’s a question we can ask ourselves. What would happen if we sorted the locations of all the diamonds along their x-coordinates? Now we are left with the y-positions (and weights) of a bunch of diamonds in the array, and we need to choose certain positions that only increase in y-coordinate (because the increase in x-coordinate is given to us for free via the sort). Of course, many such combinations exist in the array. But we need to choose that one combination, which maximizes the total weight of the diamonds.

Does this sound familiar? Yes! In fact, this is essentially a spin on the Longest Increasing Subsequence (LIS) Problem!

How so? Consider these parallels between the LIS problem and our Diamond-Land problem:

Longest Increasing Subsequence: Given an array of numbers, choose a subsequence such that:

  • Each element in the subsequence is greater than the previous.
  • The length of the subsequence is the *longest* amongst all Increasing Subsequences.

Diamond Land: Given an array containing pairs of (Diamond Weight, Diamond Y-Location), choose a subsequence of pairs such that:

  • Each Y-location of a subsequent pair is greater than or equal to the previous.
  • The *total weight* of the subsequence is the largest among all such Non-Decreasing Subsequences.

Thus we see that our Diamond-Land problem can be reduced to the LIS Problem after sorting!

In order to solve the problem, we will keep yet another sorted container (called scont in the code). This container will also have pairs of (weight, y-coord). However, the weight here will be the total weight thus far, and the y-coord will be the coordinate of the *last* diamond in the chosen subsequence thus far.

Now when we process a new diamond from the earlier x-sorted container, we will look through the weight-sorted (scont) container and scan backwards for the first entry with y-value less than or equal to our current y-coord. This will give us the *largest weight* subsequence *thus far* that ends in a y-coord no larger than our current coord. By adding the current diamond to this subsequence, we can construct an even larger-weight subsequence, which still maintains the problem constraints (+ve x and +ve y movement). Finally, we insert this new (total-weight, current-y) pair into the sorted scont. This insertion is similar to the insertion step in insertion sort.

Since we process all n diamonds in O(n) steps, and scan backwards through our container for each diamond [O(n) time per step], this is an O(n^2) algorithm. We also use O(n) space to store the scont.
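Here is a minimal sketch of this O(n^2) idea (a plain array scan rather than the sorted-container bookkeeping described above; my actual GitHub solution is structured differently):

```cpp
// A minimal sketch of the O(n^2) weighted-LIS reduction: best[i] is the
// maximum total weight of a valid up/right path ending at diamond i.
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<array<long long,3>> d(n);  // {x, y, w}
    for (auto& t : d) cin >> t[0] >> t[1] >> t[2];
    sort(d.begin(), d.end());         // by x, ties broken by y
    vector<long long> best(n);
    long long ans = 0;
    for (int i = 0; i < n; ++i) {
        best[i] = d[i][2];
        for (int j = 0; j < i; ++j)   // O(n) backward scan per diamond
            if (d[j][1] <= d[i][1])   // y must be non-decreasing
                best[i] = max(best[i], best[j] + d[i][2]);
        ans = max(ans, best[i]);
    }
    cout << ans << "\n";
    return 0;
}
```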

A possible optimization

As hinted in the editorial, we can use another data structure to reduce the time per step from O(n) to O(log n). For example, use a max (RMQ) segment tree to maintain the weights described above, and process a range-maximum query over [0, current-y] for the current diamond to find the *largest weight subsequence thus far, ending with Y <= current Y*. Using this approach, the total runtime drops to O(n log n). Had I done this, I would have had the final 4 tests passing as well!
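For reference, here is a minimal sketch of such a structure (a max segment tree over coordinate-compressed y values; illustrative only, since I did not actually write this during the contest):

```cpp
// A minimal sketch of the hinted optimization: a max segment tree over
// compressed y-coordinates. update(y, w) records the best chain weight
// ending at y; queryPrefix(y) returns the best weight over all y' <= y.
#include <bits/stdc++.h>
using namespace std;

struct MaxTree {
    int n;
    vector<long long> t;
    MaxTree(int n) : n(n), t(2 * n, 0) {}
    void update(int i, long long v) {
        i += n;
        t[i] = max(t[i], v);
        for (i >>= 1; i >= 1; i >>= 1)
            t[i] = max(t[2 * i], t[2 * i + 1]);
    }
    long long queryPrefix(int r) {   // max over compressed indices [0, r]
        long long res = 0;
        for (int l = n, rr = r + 1 + n; l < rr; l >>= 1, rr >>= 1) {
            if (l & 1) res = max(res, t[l++]);
            if (rr & 1) res = max(res, t[--rr]);
        }
        return res;
    }
};
// Per diamond, in x-then-y sorted order:
//   best = w + tree.queryPrefix(cy); answer = max(answer, best);
//   tree.update(cy, best);
```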

Ok. Onwards to the next problem in the contest…

Problem: LCS Revisited

Problem Statement

You have to find the longest common subsequence of 3 given arrays. Each array has size N. (Find the maximum length of an array that is a subsequence of each given array.)

All array numbers are generated randomly. In other words, each number is taken with equal probability from 1 to N.

For example, if N is 5,then, for each number, there is a 20% probability that it’s “1”, also 20% that it’s “2”, 20% that it’s “3”, 20% that it’s “4” and 20% that it’s “5”.

Input Format

The first line contains one number, T, which is the number of tests.

For each test: the next line contains N, and the following 3 lines contain the 3 arrays. Each array has size N.

Constraints

1≤T≤5

For 40% score 1≤N≤500

For 100% score 1≤N≤3000

For all of the testcases, the array elements will be an integer between 1 and N.

Output Format

For each test, print the length of the longest common subsequence.

Sample Input

2
4
2 3 2 3
3 1 2 4
4 4 2 1
4
1 3 2 1
4 4 3 2
2 2 3 2

Sample Output

1
2

Solution

The LCS of 2-arrays is a well-known Dynamic Programming problem with O(n^2) runtime. I simply extended this to 3-arrays, with the following DP recurrence equation:

Let A, B, C represent the three sequences. At any point in the algorithm, let i, j, k represent their respective indices that are being considered.

If the corresponding elements are equal (if (A[i] == B[j] && B[j] == C[k])):

LCS[i][j][k] = 1 + LCS[i-1][j-1][k-1]

If the corresponding elements are not equal, we have 7 options:

cands[0] = cache[i-1][j][k];
cands[1] = cache[i][j-1][k];
cands[2] = cache[i][j][k-1];
cands[3] = cache[i-1][j-1][k];
cands[4] = cache[i-1][j][k-1];
cands[5] = cache[i][j-1][k-1];
cands[6] = cache[i-1][j-1][k-1];
LCS[i][j][k] = max(cands[0..6]);

i.e. we consider all 7 possibilities (from dropping 1 element from each individual sequence, to dropping 1 element from each combination of 2 sequences, to dropping 1 element from all 3 sequences).

Of course, the interesting thing is that as our number of sequences increases from 3 to say k, our number of options to consider increase to 2^k – 1.

i.e. work per step is O(2^k)

And the number of steps is O(n^k), i.e. k nested loops, each running up to n.

Therefore the runtime of this algorithm (in the general case of k sequences) is O((2^k) * (n^k)). The naïve space requirement is O(n^k), but we can reduce this to O(n^(k-1)) by observing that we only need the previous layer along i, i.e. (i-1). [This trick can only be played once, for the outermost loop.]
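Here is a minimal sketch of this DP with the rolling-layer trick (one simplification worth noting: since the LCS table is monotone in each index, the three single-index drops already dominate the other four candidates, so the sketch takes the max of just those three):

```cpp
// A minimal sketch of the 3-array LCS DP with a rolling i-dimension,
// reducing space from O(n^3) to O(n^2).
#include <bits/stdc++.h>
using namespace std;

int lcs3(const vector<int>& A, const vector<int>& B, const vector<int>& C) {
    int n = A.size(), m = B.size(), p = C.size();
    // Two (m+1) x (p+1) layers: previous i-layer and current i-layer.
    vector<vector<int>> prv(m + 1, vector<int>(p + 1, 0)), cur = prv;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j)
            for (int k = 1; k <= p; ++k) {
                if (A[i-1] == B[j-1] && B[j-1] == C[k-1])
                    cur[j][k] = 1 + prv[j-1][k-1];
                else
                    // Dominant candidates among the 7 drop options.
                    cur[j][k] = max({prv[j][k], cur[j-1][k], cur[j][k-1]});
            }
        swap(prv, cur);
    }
    return prv[m][p];
}
```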

Notice that my solution is NOT exploiting the information on the probabilities of the various values in the sequences. I didn’t know how to exploit this. But this information can be exploited to get a much better algorithm. The editorial had a more efficient O(n^2) algorithm, but I wish they had put some more effort into explaining it because I don’t understand the very short explanation at all. 😦 If you have a better exposition on it, please let me know!


Contest: HackerRank World Cup Semi-Finals

I’ve been overdue for reporting out on my performance in the HackerRank World Cup Semifinals. I’ve been on vacation for the past week, enjoying the beach at Bodega Bay with family. Therefore this blog is a bit late.

As expected, the contest problems were much harder this time around. And the time was reduced to 24 hours as well. I definitely felt the downside to the lack of team members this time around. I feel like I could have placed better with 1 or 2 other people to offload some problems onto. Anyway, I placed 228 out of 1411 *teams*, which, all said and done, wasn’t a terrible performance. In fact it was supposedly good enough to net me a HackerRank T-shirt (which I’m still waiting for), so “Yay!”

I will talk about one of the problems that I really liked in the contest. It was called “Delivery System” (Difficult level problem).

Problem Statement

Bill is the distributor for the “ByteLand Delivery System”. He has some orders to deliver. You have a map of ByteLand, on which there are N cities and M directed ways. You also have a set of K different cities where Bill has to make a delivery. You have to find a good order of cities for Bill to visit. The order C is good if and only if it’s possible to reach city number C[x+1] from city number C[x] for each acceptable x.

As you qualified to the semifinal, Bill thinks you are the one who can solve the problem he is facing!

If there are multiple solutions, print the lexicographically smallest.

Constraints

1≤K≤N≤10^5

1≤M≤2∗10^5

1≤T≤3

Input Format

First line contains one integer T, number of testcases.

For each test:

First line contains three integers N, M and K.

Second line contains K integers, cities Bill has to visit.

Next M lines contain two integers X and Y, denoting that there is a direct way from X to Y.

Output Format

For each test print a line with the lexicographically smallest answer, but if there is no answer print -1.

Sample Input

1
4 5 3
4 3 2
3 4
3 1
1 4
2 1
3 2

Sample Output

3 2 4

Explanation

Starting from city 3 and going to city 4 while passing by 2 is good and lexicographically smallest.

Solution

This is one of those problems where a solid knowledge of CS graph theory really helps. In fact, the problem is pretty much intractable unless we know two fundamental graph theoretic algorithms, which I will discuss below. This is a good reason why we should make sure we are improving our CS fundamentals constantly. Even if you are extremely smart, solving these problems is a nightmare without applying the appropriate CS algorithms. On the other hand, knowing the correct algorithms makes the problem pretty easy!

For the sake of argument, let’s consider a graph as shown below representing the various cities and the roads.

SCC-DAG-TSort

The problem mentions that we need a “lexicographically smallest” output. i.e. among many possible valid outputs, we need to present the one that is lexicographically smallest.

This suggests the following approach.

First, we find the strongly connected components of the directed graph.

A strongly connected component is a subgraph (with a subset of vertices and subset of edges) such that any vertex in that sub-graph can be reached from any other vertex in that sub-graph. Specifically, we can run Kosaraju’s SCC algorithm, which does 2 DFS traversals in a very specific order.

Why do we want to do this? Consider the example graph shown above. We have 4 SCCs in this graph, which are the four clusters shown in the boxed regions. Now we can figure out which clusters our candidate cities fall into. *Within* a single cluster, all the candidate cities that lie in that cluster can reach each other. Therefore, we can sort the candidate cities within each cluster. This gives us a lexicographically smallest *local* ordering.

However, just having these local orderings is obviously not enough. We need another fundamental graph algorithm.

Next, we perform a topological sort on the DAG induced by the SCCs.

A DAG is a directed acyclic graph. Once we find the SCCs in our directed graph, we can replace each SCC by a meta-node. This process induces a meta-graph on top of our original graph, which is acyclic by nature (if it were cyclic, it would mean we didn’t quite find the correct SCCs in the first place). In the example figure shown above, if we consider each *box* to be a meta-node, then the boxes (vertices) and the edges between boxes form a DAG.

DAGs have the property that we can perform topological sort on them. A topological sort is a *partial* ordering, such that each source node appears before *any* destination nodes that it points to. It is called a partial ordering, because (unlike for example sorting numbers) there are multiple correct orders possible.

We perform a topological sort, because we want to know that we are able to traverse the various clusters in order.
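As a quick illustration, here is a minimal sketch of one standard way to compute such an order: Kahn’s algorithm, run on the condensed meta-graph (illustrative; my actual solution on GitHub may differ in details):

```cpp
// A minimal sketch of Kahn's topological sort, assuming the SCC
// condensation is already built as an adjacency list 'adj'.
#include <bits/stdc++.h>
using namespace std;

vector<int> topoSort(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<int> indeg(n, 0), order;
    for (int u = 0; u < n; ++u)
        for (int v : adj[u]) ++indeg[v];
    queue<int> q;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) q.push(u);   // meta-nodes with no prerequisites
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (--indeg[v] == 0) q.push(v);
    }
    return order;  // on a DAG this always contains all n meta-nodes
}
```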

For example, let’s call our boxes (meta-nodes) as N, S, E, W (for the four directions), each corresponding to the respective box in the meta-graph. Now consider two cases.

Case 1: Our original list of candidate cities are contained within clusters W, N, and E. In this case, we can visit all the cities in the W cluster first (in lexicographic order), then all the cities in the N cluster (in lexicographic order), and finally all the cities in the E cluster (in lexicographic order again). This will give us an overall correct order, that is also lexicographically smallest. i.e. the problem has a solution in this scenario.

Case 2: Our original list of candidate cities are contained in clusters N, S, E, W. We can of course visit the W cluster first. But now, if we visit the N cluster, we can no longer visit the S cluster. On the other hand, if we visit the S cluster after the W cluster, we can no longer visit the N cluster. i.e. We find that the problem in this case has no solution.

My solution using Kosaraju’s SCC algorithm and topological sorting is on GitHub.

The algorithmic complexity for both algorithms is basically O(|V| + |E|), since we are simply performing a few DFS traversals, and hence very fast. Of course, we also do a vanilla sort *within* each SCC, which can be O(|V| log |V|). So the overall complexity is O(|E| + |V| log |V|).


Contest: HackerRank World Cup Qualifiers

I’ve been sick with a cold for the past couple of days, so this is probably going to be a shorter blog. But I thought it would be fun to write about my latest HackerRank contest, the World Cup Qualifiers. It took place last weekend. At first I was a bit apprehensive, because the contest allowed teams of up to 3 people to participate. I was worried that the problems might take too long for one person to solve, but I took a leap of faith and participated anyway (as a solo team). I’m glad I did, because it was actually a really cool contest! I placed 151 out of 4408 *teams*, so overall not too bad at all. Unfortunately this was an unrated contest, so no medals were handed out. I also had to create a new handle for my lone wolf team, which I called golambda, so the contest rankings show up on this handle instead of my usual cbrghostrider.

The top 50% of the teams qualified for the *semi-finals*, which are going to be this weekend. Unlike the qualifiers, the semi-finals are going to be extremely brutal, because only 50 teams will make it to the finals. There are definitely more than 50 competitive programming pros that are much much better at this than I am, so I fear it is an automatic knockout for me. But I’m still going to compete, because it is a ton of fun! 🙂

The qualifiers had 6 problems of varying difficulty levels. Here’s a quick summary of the contest.

| # | Problem | My Solution | Algorithmic Technique | Time Complexity per test | Level |
|---|---------|-------------|-----------------------|--------------------------|-------|
| 1 | WC Team Formation | GitHub | Basics | O(1) | Easy |
| 2 | Swapping Bridges | GitHub | Union-Find, Basics | O(n log n) | Moderate |
| 3 | World Cup Game | GitHub | Tree Minimax, Memoization | O(n) | Moderate |
| 4 | Bishop War | GitHub | Backtracking w/ pruning | O(2^(mn)) | Difficult |
| 5 | Alien Age | GitHub | Dynamic Programming, Greedy | O(m*n*(log m + log n)) | Advanced |
| 6 | Ikbals Arrays | N/A | Segment Trees | N/A | Advanced |


Here are some notes on the problems:

Swapping Bridges:

Initially this problem sounds fairly complicated. But when you read it closely, you realize that the constraints on the problem limit the amount of complexity drastically. Specifically each city has *exactly* one incoming and outgoing edge. Given the other constraints, the only topologies possible are circular lists. i.e. the entire graph is a disjoint collection of circular lists. Furthermore, every time we perform the permitted operation of bridge-swap, we merge two such circles. The problem can be solved in O(n), but why bother when I already have a Union-Find implementation ready? So I saved some (implementation) time and solved it in O(n log n) using Union-Find.
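In case it helps, here is a minimal sketch of a standard Union-Find (disjoint set union) structure of the kind I mean, with path compression and union by size (illustrative; my actual contest code is on GitHub):

```cpp
// A minimal sketch of Union-Find with path compression + union by size;
// both operations run in near-constant amortized time.
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> parent, sz;
    DSU(int n) : parent(n), sz(n, 1) { iota(parent.begin(), parent.end(), 0); }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];  // compress
        return x;
    }
    bool unite(int a, int b) {  // returns false if already in one set
        a = find(a); b = find(b);
        if (a == b) return false;
        if (sz[a] < sz[b]) swap(a, b);
        parent[b] = a;
        sz[a] += sz[b];
        return true;
    }
};
```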

World Cup Game:

This was actually a really cool and elegant game theory problem, and had me stumped for a while. I finally solved it, and loved it so much that I immediately decided to blog about it. I won’t say much about it here, because I have an entire section below that talks about it.

Bishop War:

This is a variation on the N-Queens problem, with bishops instead of queens. Some random obstacles are also added to the chess board. I had a functional version ready quickly, but struggled a bit to get it running within the time limits. I experimented with various pruning strategies to solve this problem. Initially I only had a few test cases passing, but I gradually improved the pruning until I could get 40/41 tests passing within 2 seconds. Interestingly, the editorial has a better algorithm, with O(m*n*2^(2m)) complexity. My O(2^(n*m)) is definitely much worse in theory, but in practice all that pruning I did reduces the search space drastically.

Alien Age:

This was another cool problem, and involved dynamic programming, which I love so much. After solving the first portion using DP, we need another algorithmic technique to finally solve the problem. I had thought of the same approach as the editorial (binary search after DP), but ended up going with another approach (greedy choice after DP). Both approaches are functionally correct, though mine is just a tad less efficient: O(n*m*(log n + log m)) instead of the editorial’s O(n*m*(log min(m, n))).

Ikbals Arrays:

I did not actually attempt this problem during the contest. I’ve seen this type of problem before, and know that it involves using a segment tree. The issue however is that I don’t have a ready segment tree implementation in my personal library yet. Also I was completely thrown off by the constraints of the problem (the array size could be up to 10^9, which is insane!). I (wrongly) concluded that even if I wrote a segment tree implementation, the tree itself would be too freaking huge, and would result in Memory-Limit-Exceeded (HackerRank only allows 640MB memory during the contest). I figured there was some other technique that needed to be used. Apparently I concluded wrong, and the problem is solvable using segment trees (though I’m still stumped by how to avoid the MLE issue).

World Cup Game

Ok, so now for my favorite problem. This problem was a game theory-ish problem, and very elegant. It definitely had an “aha-moment” to it, and really made me smile when I figured it out. I have the problem statement, constraints, and an example below:

Problem Statement

Bob invented a game on a tree for the Hackerrank World Cup! The game is played by two players. Players will take turns and in each move they will choose a node, but they have to follow some rules:

Rule 1: A player can’t choose a node which is already chosen by himself or by his opponent in previous rounds.

Rule 2: If the player is making his first move, he can choose any node as long as he doesn’t violate the first rule.

Rule 3: If it is not the first move of the player, he has to choose a node which is adjacent to one of the nodes he already selected in previous moves.

If a player is not able to make a move, he misses that turn and the game continues. The game ends when neither player can make a move.

In each node there are some tokens. The score of a player is the sum of numbers of tokens in the nodes he has chosen. Each player will try to maximize his own score.

For a given tree, calculate the maximum score that can be obtained by the first player. Assume that both players are playing optimally (none of them will make any mistakes).

Input Format

First line contains N, number of nodes in a tree. The nodes are numbered from 1 to N.

Second line contains N numbers, ith number denotes number of tokens in node i.

Each of the next N−1 lines contains two integers U and V, which indicate there is an edge in the tree between node U and node V.

Constraints

1≤N≤500000

1≤A[i]≤10^9

Output Format

Print maximum score of first player, if both players play optimally.

Sample Input

3
4 2 4
1 2
2 3

Sample Output

6

Explanation

One scenario: the first player chooses node 2; then the second player will choose node 1 or node 3; and then the first player will choose the last node. In both scenarios, the first player will have 6 points.

Solution

I usually give a pretty detailed solution for my favorite problems, including efficiency and correctness claims. However, rarely do we just *solve* the problem when we first see it. Often, it involves quite a bit of struggle, with false starts and walks down dead ends and into pitfalls.

For this problem, I think I am going to first describe my initial *incorrect thought process*, and how overwhelming it can get. I think it is interesting to look back and reflect on how the solution unfolded. It is also enlightening to look back on the exact moment that we get the “aha” in problems that have this “aha-moment”. Following this, I will describe the *correct thought process*, followed by notes on the correctness and efficiency of the algorithm.

Incorrect Thought Process

When we first read the problem, the situation seems bleak. How do we even begin to solve this?

Consider the very first step. The first player can choose absolutely any node to begin with. Therefore, whatever strategy we choose, we need to iterate through all the nodes at least for the first choice. The first node for the second player presents an almost identical problem.

Ok, let’s forget about the first node for the moment. Let’s say the game is in progress, and player 1 has to make the choice of the next node. Do we at least have a simpler problem here? Very quickly our thoughts can pull us in multiple directions. Here are some ad-hoc things I considered:

  • If a node has a score greater than the sum of the scores of all other remaining nodes, I want to choose it. This is a pseudo-greedy criterion.
  • I obviously want to try and choose a node with many points, to maximize my final score.
  • I want to choose a node with as many neighbors as possible. This will open up many options for me in the future (considering rule 3).
  • I also want to choose nodes that have neighbors with high points. This is so that I can try to claim those neighbors in the future.
  • [Already the overall situation is starting to look chaotic and bleak by this point… Notice that I really have no idea what kind of algorithm I’m trying to get. I’m just brainstorming at this point, and it isn’t going very well.]
  • Ok, how about we try to choose nodes that will limit the choices we present to our opponent. Maybe we should choose nodes that will block off our opponent. [How do I do this?]

I could go on and on. But the above bullets summarize the primary problem. All I have so far are random and chaotic thoughts, and really no algorithm. I was tempted to just *code some ad-hoc algorithm* based on the above criteria, to at least get some partial credit. This is a *horrible* thought, and I am so glad I did not pursue it. The bottom line is that you want to solve the problem on paper first, and convince yourself that your approach is rock solid. Only then should you start typing any code. Otherwise you will spend a lot of time painfully debugging and hacking away at a piss-poor “algorithm”.

Another thing to notice is the constraint on the input. The tree can have as many as half a million nodes. This rules out computationally intensive algorithms; essentially, the algorithm has to run in (near-)linear time to process such a huge amount of data.

The situation looks bleak.

Correct thought process

Before I go ahead, let’s clarify one thing. Although the problem says “tree”, what we really have (given the input sample) is an “un-rooted” tree, i.e. a tree with no specific node marked as the root. Essentially, we have a spanning tree of an undirected graph; i.e. the tree in this problem is really an undirected graph with no cycles that forms one big connected component.

Many times, it is instructive to draw out a slightly non-trivial example on paper, and try to reason about the problem. Visualizing the problem helps the brain focus on the issues at hand, and can lead to interesting solutions.

This is the approach that I now employed. Notice that the sample example is extremely trivial, and as such doesn’t hint at any possible solution. Here is the actual example I drew out on paper. I purposely didn’t assign any points to the nodes, because I wanted to reason about the topology before I started complicating the problem by thinking about actual point values.


tree-game

Here’s a critical step. Let’s just pick a node (say the red one), and simulate what happens. Say player 1 picks the central node. Once you do this you realize something. The entire graph has now split into multiple trees, which are disconnected from each other. This is due to rule #1, which says that no player can re-pick an already picked node. Also since the starting point is a tree, there were no cycles. Therefore all future plays will have to choose a node from one of the disconnected trees (the 4 branches of the middle node).

Ok, this is good! Why? Because this immediately makes us realize the following. The second player is *stuck* with only 1 tree! Why is this? Well, consider rule #3. Once the second player picks a random node for his first play, he can only pick nodes in that sub-tree.

At this point, I had the epiphany. “Oh my God, this is a minimax problem!”

How so? We now have 2 key pieces of information. After the first player makes his first choice, for the remaining trees, the second player will:

  • Choose the tree with the highest total weight.
  • Choose the node that is directly adjacent to the first player’s node in this tree.

Consider that after the first player makes his first choice, the second player is limited to one sub-tree. Now ideally he wants to claim the largest such tree to maximize his final score, and therefore bullet one holds!

Also, ideally he wants to claim ALL nodes in his tree. This will give him his maximum score. If he does not choose the node adjacent to player 1’s node, then player 1 will pick that node in the second player’s tree in his next play. Thus player 2 won’t have *his* maximum score. Therefore bullet two holds!

And that’s it! At this point, believe it or not, the whole problem is solved. The game will play out as follows.

Player 1 will pick a node with the following characteristic. Out of all the sub-trees for that node, consider the maximum value subtree. Player 1 will pick a node that “minimizes this maximum value”. This is why it is a minimax problem! He knows that player 2 will pick the maximum value sub-tree from the trees that remain. And he wants to lose the fewest points, and therefore picks a node that minimizes this value for player 2.

Player 2 will predictably pick out the maximum value sub-tree from the resulting sub-trees, and specifically pick out the node next to player 1’s. After that the game just plays out by picking adjacent nodes (breaking ties arbitrarily). Knowing how the game will play out, we can compute the desired scores.

Notice how elegant this solution is:

  • Everything hinges on the first play by the first player.
  • This solution very elegantly satisfies all the ad-hoc sub-points we listed earlier.

Try to construct a counter-example for this algorithm, and you will find that you are unable to do so! Now the actual implementation is trivial, save for one detail. You need to do top-down DP (memoization) so that you don’t recompute the sub-tree sizes again and again. Here is my final solution, which runs in linear time, and consists of very little code.
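To make the idea concrete, here is a minimal sketch of the whole computation (a bottom-up variant with an explicit DFS stack, rather than the memoized recursion in my actual solution; the two are equivalent):

```cpp
// A minimal sketch of the minimax play-out: for each candidate first node
// v, player 2 grabs the heaviest component of the tree with v removed,
// and player 1 keeps v plus everything else.
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    scanf("%d", &n);
    vector<long long> w(n + 1);
    long long total = 0;
    for (int i = 1; i <= n; ++i) { scanf("%lld", &w[i]); total += w[i]; }
    vector<vector<int>> adj(n + 1);
    for (int i = 0; i < n - 1; ++i) {
        int u, v; scanf("%d %d", &u, &v);
        adj[u].push_back(v); adj[v].push_back(u);
    }
    // Iterative DFS from node 1; 'order' is a preorder, so scanning it in
    // reverse accumulates subtree sums children-first.
    vector<long long> sub(n + 1, 0);
    vector<int> par(n + 1, 0), order, st = {1};
    par[1] = -1;
    while (!st.empty()) {
        int u = st.back(); st.pop_back();
        order.push_back(u);
        for (int v : adj[u])
            if (v != par[u]) { par[v] = u; st.push_back(v); }
    }
    for (int i = n - 1; i >= 0; --i) {
        int u = order[i];
        sub[u] += w[u];
        if (par[u] > 0) sub[par[u]] += sub[u];
    }
    long long opponent = LLONG_MAX;  // player 2's haul, to be minimized
    for (int v = 1; v <= n; ++v) {
        long long mx = total - sub[v];              // the "above" component
        for (int c : adj[v])
            if (c != par[v]) mx = max(mx, sub[c]);  // each child subtree
        opponent = min(opponent, mx);
    }
    printf("%lld\n", total - opponent);
    return 0;
}
```

On the sample above (weights 4, 2, 4 on the path 1-2-3), picking node 2 leaves two components of weight 4 each, so the opponent gets 4 and the first player gets 10 − 4 = 6, matching the expected output.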

What a beautiful problem! 🙂

Contest: Week of Code 17

It’s another lazy Sunday evening, and I thought it would be a good time to report out on my HackerRank contest from the first week of September 2015. This was one of the “Week of Code” contests, WoC #17 to be exact. I talked about WoC #16 previously here.

This time around I placed 57 out of 3569 participants, placing me once again in the “gold” category and in the top 2%. I always seem to do much better in the “marathon matches” that aren’t timed as aggressively as the 101-Hacks. Of course, I need to work on my speed at the faster-paced contests. I also need to work on the consistency of my results now. Getting the occasional gold is good, but I want to be able to consistently get gold (top 4%) or silver (next 8%), with the occasional bronze (next 13%).

To recap: This contest has multiple problems that are released one per day over the course of a week. The problem difficulty scales each day, and the last few are especially tricky/difficult ones. This week, we had the pleasure of solving the following problems:

| # | Problem | My Solution | Algorithmic Technique | Time Complexity per test case | Level |
|---|---------|-------------|-----------------------|-------------------------------|-------|
| 1 | Find the Robot | GitHub | Basics | O(1) | Easy |
| 2 | Count Friday the 13th | GitHub | Pre-computation, and basics | O(#months) | Easy |
| 3 | Road Reform | GitHub | Dijkstra (graph) | O(n log m), O(n^2) | Moderate |
| 4 | Cross the River | GitHub | Dynamic Programming | O(n^2) | Difficult |
| 5 | Garden Planning | GitHub | Backtracking w/ Pruning | O(2^n) | Advanced |
| 6 | Counting Permutations | GitHub | Tree Isomorphism, Memoization | O(n^2) | Expert |

Note that some of my solutions are actually different from those in the editorials. For example:

  • I solved #4 using DP w/ O(n^2) complexity. The editorial mentions segment tree and heaps, with O(n log n) complexity. This is why my solution timed out on 2 of the 21 tests, but computed all the others correctly. I think my solution is easy to understand though 🙂 and probably has small hidden constants (which is why it ran all but 2 of the tests within 2 seconds, even with a quadratic solution). We could probably drop the algorithmic complexity a bit by using a good data structure for “nearest neighbors” geometric algorithms.
  • I couldn’t think of anything except brute force for #5, which was exponential time. 😦 I used a backtracking approach, w/ as much pruning of the search space as I could. However it was still abysmally slow. I only cleared 11/41 tests, the rest just timed out! The editorial mentioned a polynomial time algorithm, but I haven’t read it in detail yet.
  • The last problem #6 was really cool, and I urge the reader to take a quick glance at the problem statement. Here’s the strange part. My solution matches the editorial almost exactly in concept. I came up with the same idea and the same 3 cases (no child, 1 child, 2 children), and the same formulas for computing the results. However there is obviously some bug in my code, because I passed all 6 pre-tests, but failed on the full suite of tests. I’m going to try to spend some time tonight to debug my code for this problem.

As is customary, I will discuss my favorite problem of the contest in more detail below. This time around, I had a tough time picking out a favorite problem, because I really liked all of the last 4 problems! But after some deliberation, I decided to pick out the dynamic programming problem, since DP is one of my favorite algorithmic techniques. And if you are a regular reader of this blog you probably know that, since I have talked about DP many times. I know, I know, I really need to stop talking about DP. But I can’t help it, it is just so elegant. 🙂

Here is the problem statement w/ the constraints.

Problem: Cross the River

You’re standing on a shore of a river. You’d like to reach the opposite shore.

The river can be described with two straight lines on the Cartesian plane, describing the shores. The shore you’re standing on is Y=0 and another one is Y=H.

There are some rocks in the river. Each rock is described with its coordinates and the number of points you’ll gain in case you step on this rock.

You can choose the starting position arbitrarily on the first shore. Then, you will make jumps. More precisely, you can jump to the position (X2,Y2) from the position (X1,Y1) in case |Y2−Y1|≤dH, |X2−X1|≤dW and Y2>Y1. You can jump only on the rocks and the shores.

What is the maximal sum of scores of all the used rocks you can obtain so that you cross the river, i.e. get to the opposite shore?

Input Format

The first line contains four single space-separated integer numbers, N,H,dH,dW, denoting the number of rocks, the distance to the opposite shore, and the jump parameters dH and dW, respectively.

Each of the following N lines describes a rock in the format Yi Xi Zi, where Yi and Xi are the coordinates of the rock and Zi is the number of points you’ll gain in case you jump on this rock.

Constraints

1≤N≤10^5

1≤H≤10^7

1≤dH<H

1≤dW≤10^5

1≤Yi<H

0≤Xi≤10^5

−10^4≤Zi≤10^4

No two rocks share the same position.

There is always at least one way to cross the river.

Output Format

Output the maximum possible score on a single line.

Sample Input

5 10 3 3
2 2 7
2 3 5
5 5 -1
8 3 10
8 8 12

Sample Output

18

I decided to sketch out a diagram to better explain how we arrive at the answer for the sample above.

CrossTheRiver

I have positioned the rocks (with their associated points) in approximately the correct positions. Given the constraints of the problem, the optimal path to take is to jump from the bottom shore, onto the rocks in the order shown, and finally to the other end. This nets us the desired maximum of 18 points.

Modeling the problem:

Note that the problem constraints only allow for upward motion in the solution. i.e. once you move up a certain distance, you can only jump to rocks that are even higher up, not back down. Of course, you can only jump to rocks within dH and dW of the current position.

How do we model this problem, and why does DP work? We can answer this question simultaneously by exhibiting the two salient features of DP in this problem. But prior to doing so, we need to do some pre-work. In particular, we will sort the input (positions of the rocks) along the values of their y-coordinates.

Optimal Substructure: Consider the rocks nearest to the destination shore.

Consider one such rock, say the one marked with 10 points. Since there is no higher rock, *if we were to end up at this rock* while following the optimal path, we would just have to jump to the shore. Therefore, in the DP cache, the points allocated for this position is 10. Now *if this rock were indeed in the optimal path*, then we just have to recursively solve the subproblem of getting from the bottom shore to this rock. Notice how we broke up the problem, into two steps… getting from this rock to the shore upwards, and recursively getting from the bottom to this rock.

In fact, there exists another rock marked 12, off to the right, for which we can perform an exactly similar analysis. We can divide the problem into getting from 12 to the shore at the top, with the subproblem being getting from the shore at the bottom to the rock marked 12.

Of course, we don’t really know whether the rocks marked 10 or 12 is on the optimal path. In classic DP style, when we don’t know which possibility is the correct one, we evaluate them ALL, and choose the best! This is what my function findBestAlloc() does in the code.

Overlapping subproblems: Let’s exhibit the other salient property of DP.

In particular, you may have noticed that in the above analysis, we solve 2 sub-problems:

  1. The sub-problem of getting from the bottom shore to the rock marked 10.
  2. The sub-problem of getting from the bottom shore to the rock marked 12.

As you have guessed correctly, these sub-problems involve yet additional subproblems, which recursively involve their own sub-problems. And many of these will be in common, and will be evaluated again-and-again. For example, in finding the answers for subproblems #1 and #2, we may evaluate the path from *bottom-shore to rock marked -1* again and again.

Now envision a test case with tens of thousands of rocks. Evaluating each such sub-problem recursively will result in an exponential number of calls. Of course, since we decide to memoize the results in a software cache, we end up converting our O(2^n) calls to O(n^2) lookups in the cache!

Here is the recurrence equation. Let us use MP(i) to denote the Max Points obtained by getting from rock #i to the shore on the top. Here i refers to the index of the rock in the sorted order previously mentioned.

MP(i) = max over all candidate indices j of ( rocks[i].points + MP(j) )

We choose candidate indices j with these constraints:

j > i, j < rocks.size()
yj > yi
abs(yj − yi) <= dH
abs(xj − xi) <= dW

(The base case: any rock within dH of the destination shore can end the path, so for such a rock MP(i) starts at rocks[i].points.)

And there you have it. That is the whole solution to the problem. At the end, we look at all the rocks that we can *immediately* jump to from the shore at the bottom, keep a max of the points we computed for each of these rocks, and that is our answer. 🙂
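Here is a minimal sketch of that recurrence in code (an iterative bottom-up version; my actual solution with findBestAlloc() is on GitHub and is organized differently):

```cpp
// A minimal sketch of the O(n^2) DP: rocks sorted by y; MP[i] = best
// score from rock i to the far shore, or "unreachable".
#include <bits/stdc++.h>
using namespace std;

int main() {
    long long n, H, dH, dW;
    cin >> n >> H >> dH >> dW;
    vector<array<long long,3>> r(n);  // {y, x, z}
    for (auto& t : r) cin >> t[0] >> t[1] >> t[2];
    sort(r.begin(), r.end());
    const long long NEG = LLONG_MIN / 4;  // marks "can't reach far shore"
    vector<long long> MP(n, NEG);
    long long ans = NEG;
    for (int i = (int)n - 1; i >= 0; --i) {
        if (H - r[i][0] <= dH) MP[i] = r[i][2];  // can jump to far shore
        for (int j = i + 1; j < n; ++j) {
            if (r[j][0] - r[i][0] > dH) break;   // sorted by y: safe to stop
            if (r[j][0] > r[i][0] && llabs(r[j][1] - r[i][1]) <= dW
                && MP[j] > NEG)
                MP[i] = max(MP[i], r[i][2] + MP[j]);
        }
        if (r[i][0] <= dH && MP[i] > NEG)        // reachable from the start
            ans = max(ans, MP[i]);
    }
    if (H <= dH) ans = max(ans, 0LL);            // jump straight across
    cout << ans << "\n";
    return 0;
}
```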

Contest: Grid Maze (Graph Theory)

I competed in yet another contest this week on HackerRank. This one was called 101-Hack. It is a rather intense and fast-paced contest. I have come to realize that the 101-Hacks constantly kick my butt, far more than other contests (I placed 414 out of 1359 contestants). The reason is that we have 5 problems, at least 3 of which are difficult, and only 2 hours to solve them all!

Now to be perfectly honest, I think I am a decent programmer. I have kept improving my Algorithms and DS knowledge, and I’m pretty good and quick with both C++ and Haskell coding. The problem though, is that “pretty good” doesn’t cut it for 101-Hack. I look at the guys that top the leaderboards, and these guys are *insanely* quick! Often I notice that I haven’t even finished reading the problem statement for the first problem, and some guys have turned in working and correct code for it already. Jeez… that’s just crazy. 🙂

Sure, I haven’t competed anywhere as much as these guys, but someday I hope to be as quick as the pros. Until then, I will keep competing and improving my abilities. Even the contests that are tough, are so much fun!

Anyway, like every contest, this contest had one problem that immediately stood out from the others, and was my favorite problem on the contest. Now unfortunately, I did not get enough time to attempt this problem during the contest. But I worked on it after the contest, and have a partial solution. I say partial, because as of now this solution only clears 16/23 test cases. It is algorithmically very efficient, but there is a functional issue with it. This is something I have already anticipated, and I have a very good handle on what exactly is wrong with it. I will talk about the problem in detail towards the end, but I just wanted the reader to be aware of this issue. Hopefully I can work on the full solution once I find some more time.

Problem: Grid Maze ; Level: Difficult

Without further ado, here is the problem. I will summarize the problem below:

We are given a maze, in the form of an NxM grid. The maze has certain cells that are empty, and others that are “walled off”. The empty cells are denoted by ‘.’, and the walled-off cells are denoted by ‘#’. In addition to this, we have two special cells, marked ‘S’ and ‘P’. Both the cells marked ‘S’ and ‘P’ have no walls in them.

We find ourselves in the starting cell marked ‘S’. Our goal is to make our way to the cell marked ‘P’, loot the treasure located there, and then make a beeline for the exit. You can only move N, S, E or W. The exit is any cell along the perimeter of the NxM grid. As you know, certain cells are walled off, and there may not exist a direct path to the loot or to the exit. Our goal is to complete our mission (loot and run), by knocking down the “smallest number” of walls!

What a gorgeous problem. 🙂

Constraints: (1 <= n, m <= 1000)

Here is a pictorial representation and the first example from the sample:

5 6
##..##
######
S#####
##P##.
###..#

We can break down 2 walls, loot the treasure, and then exit without breaking anything else. i.e. one example optimal path is to go down from S, then go to the right and loot P. Then we can just go left towards the exit (along the walls we just knocked off). Thus the optimal answer is 2.

Here is another sample:

4 4
...#
#P.#
##.#
.#S.

Here we can just head north from S, up two cells, make a left, take the loot, and head back the way we came to exit. Therefore we don’t need to break any walls, and the optimal answer is 0.

Solving the problem:

The first thing that comes to mind on reading this problem is “graph shortest path”. Sure, there are additional constraints on the problem, so we need to set up the graph very precisely. But this seems like a very good direction to proceed in. Note that the “shortest path” isn’t necessarily the “shortest hop” path.

We can follow various steps to set up, model, and solve the problem.

Setting up the problem:

Here are some things we can conclude from the problem statement:

  • It is preferable to move through many empty rooms to get to our goal, instead of breaking down even a single wall. i.e. we will prefer paths that don’t involve wall-breaking.
  • For a first try, we can model this as two separate shortest path problems. Get the shortest path from S to P. Then get the shortest path from P to exit.

If your intuition says that the second conclusion is too simplistic, you are correct. This is in fact the source of my 7 incorrect test cases. The reason is as follows:

Consider the shortest paths from S->P and P->Exit.

Also consider a roundabout path that goes from S->N->P and P->N->exit. Each of these individually may be longer than the corresponding shortest paths. (e.g. S->N->P may be longer than S->P and P->N->exit may be longer than P->exit). However, by virtue of sharing a common segment, we may actually end up breaking fewer walls if we take the roundabout paths.

To be sure, this is not a frequent occurrence. But it is frequent enough to fail 7/23 tests. More on this later with an example.

For now let’s proceed with our plan.

Modeling the graph:

So how do we construct the graph?

  • Consider each room as a node in the graph. We will *name* these nodes uniquely based on their row# and col#. This way, each Coord in the grid can be changed to an integral node number, which can be used to index into the adjacency list of the graph (as shown in the code later).
  • Each room is connected to its N, S, E, W neighbors if they exist.
  • Going from an empty room to another empty room costs EMPTYCOST.
  • Going from an empty room to a walled room involves breaking down a wall. So it would cost WALLCOST.
  • Similarly, going from a walled room to another walled room involves WALLCOST.
  • Going from a walled room to an empty room is free, and costs EMPTYCOST.
  • Note that I purposely set this up as a *directed* graph, and not as an *undirected* graph. This is because going from a walled room to an empty room costs EMPTYCOST, while going back from that empty room to the walled room costs WALLCOST. i.e. it is NOT symmetric in nature.
  • Finally, what values should I pick for the EMPTYCOST and WALLCOST? I could pick say 1 or 0 for EMPTYCOST. I picked 1, but even 0 should do, I think. For WALLCOST, remember that it is preferable to walk the entire maze of empty rooms instead of breaking down a single wall. Therefore I set the wall cost to N*M, which is the cost to walk the entire maze assuming it is empty.
  • Good, we are almost done. Just one more thing to do. Let’s construct a phantom node, call it EXIT, and connect all the perimeter cells to this node. Now we just need to run the shortest path algorithm from P->EXIT, instead of P->(every cell in perimeter).
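Here is a minimal sketch of this cost model in code (illustrative; it follows the makeNode()/getCoord() naming of my actual solution described below, and omits the phantom EXIT node for brevity):

```cpp
// A minimal sketch of the grid-as-graph Dijkstra: the cost of a move
// depends only on the destination cell (EMPTYCOST for '.', WALLCOST for
// '#'), which is exactly why the graph is directed.
#include <bits/stdc++.h>
using namespace std;

int n, m;  // grid dimensions
int makeNode(int r, int c) { return r * m + c; }
pair<int,int> getCoord(int node) { return {node / m, node % m}; }

vector<long long> dijkstra(const vector<string>& grid, int src,
                           long long EMPTYCOST, long long WALLCOST) {
    vector<long long> dist(n * m, LLONG_MAX);
    priority_queue<pair<long long,int>,
                   vector<pair<long long,int>>, greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    const int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};  // N, S, W, E
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;          // stale queue entry
        auto [r, c] = getCoord(u);
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= n || nc < 0 || nc >= m) continue;
            long long cost = (grid[nr][nc] == '#') ? WALLCOST : EMPTYCOST;
            int v = makeNode(nr, nc);
            if (dist[u] + cost < dist[v]) {
                dist[v] = dist[u] + cost;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```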

Coding:

Here is my C++ code for this problem. A few things to note:

  • I have utility functions makeNode() and getCoord() to convert between an integer representation for the cell, and the actual coord for that cell.
  • computeGraph() is used to construct the graph from the input, as specified in the section above.
  • shortestPath() runs Dijkstra’s shortest path algorithm from Src to Dst. It returns a vector of nodes that we need to visit in order.
  • breakWalls() takes this list of Nodes, and breaks the walls along this path. I also modify the graph itself, so that the edges are properly manipulated.
  • Dijkstra is called twice, once for S->P and once for P->EXIT.
  • breakWalls() is similarly called twice, and keeps track of how many walls are broken.
  • We return this number (of walls broken) at the end, through minWallBreakdowns().

Algorithmic Efficiency:

This is the standard O(|E| + |V| log |V|) required for Dijkstra. Pretty quick and efficient!

Functional Debugging:

As mentioned previously, 16/23 tests pass without any issues. The rest fail due to the problem mentioned before.

Here is one such failing test, so that we have a concrete example:

4 5

.###.

.##S#

.#P##

.##..

Note that there are two shortest paths from S->P:

  1. This path takes a left from S, then moves down to P.
  2. This path moves down from S, then takes a left to P.

Both these paths involve breaking down a single (different) wall. My algorithm breaks ties between equally short paths arbitrarily, and here it goes with #1.

However, had I chosen #2, I would then have had a free run from P, to the right and down to freedom (since the wall to the right of P would already have been broken). But since I chose #1, I had to pay for another break during the P->EXIT run. This is why my algorithm fails certain tests.

A possible fix:

A possible way to fix this is to compute ALL shortest paths from S->P, and then, for each one, simulate its wall-breaks and the corresponding P->EXIT shortest path. Choosing the best among these would give the correct result.

However, it still won’t solve the issue I mentioned previously, i.e. that two sub-optimal paths joined together can beat two individually optimal but disjoint paths. To be honest, I need to come up with a concrete example to convince myself of this issue. But I feel intuitively that this is a real problem. 🙂

Anyway, so that’s how this contest went. Hopefully I can work on this problem some more, and get the final 7 tests passing soon. I have a few more contests coming up this month, so I’m pretty stoked about that as well.

Thanks for reading!

Contest: When Constant Factors Matter…

When Constant Factors Matter

This is Part II of the competitive programming blog I started last time. You can read Part I here.

This blog is a bit of a departure from the previous ones. Previously I discussed some algorithms/data-structures that I employed to solve various problems. This blog continues in that tradition, but it also talks about the “constant factors” hidden behind the Big-Oh notation. In particular, we will see that sometimes optimizing the hidden constant can be enough to just squeeze under the desired time limit, even with a sub-optimal algorithmic solution. To be sure, it is not something I recommend reaching for routinely. But it is an option when one can’t think of any further algorithmic optimizations.

Previously, I discussed all but the last problem from the contest, so without further ado, here is the last problem:

Problem 5: White Falcon and XOR

Here is the problem statement and example from HackerRank:

There are two sequences, a and b, each of length N. We will create new sequences out of a and b and write them onto a paper one by one with the following rules:

Choose two integers L and R. (1≤L≤R≤N)

Obtain a new sequence of size R−L+1 by choosing one of a[i] or b[i] for each i (L≤i≤R).

More formally, let the new sequence be w. Then:

w[1] ∈ {a[L], b[L]}

w[2] ∈ {a[L+1], b[L+1]}

w[3] ∈ {a[L+2], b[L+2]}

…

w[R−L+1] ∈ {a[R], b[R]}

Note that there are 2^(R−L+1) ways to choose w in total, given L and R.

We would like to know the number of sequences on the paper with xor sum equal to 0, for each length. Specifically, we want N numbers, where the ith number denotes the number of sequences with length i and xor sum equal to 0. We only want each number modulo M.

Constraints

1≤N≤8000

0≤ai<64

0≤bi<64

1≤M≤10^5

Example:

N = 4  M = 1000

A = 2 2 3 1

B = 2 3 2 2

Output:

0

5

2

4

Explanation

There are 5 possible ws with length 2:

For (L,R)=(1,2): (2,2) and (2,2)

For (L,R)=(2,3): (2,2) and (3,3)

For (L,R)=(3,4): (2,2)

There are 2 possible ws with length 3:

For (L,R)=(2,4): (2,3,1) and (3,2,1)

There are 4 possible ws with length 4:

For (L,R)=(1,4): (2,2,2,2), (2,2,2,2), (2,3,3,2) and (2,3,3,2)

A sub-optimal solution:

At first the problem appears pretty intractable. But then we notice that the values of a[i] and b[i] all lie in [0, 63]. This can be exploited in order to design a dynamic programming solution. Here was my first take on this solution.

In particular, consider maintaining a DP cache of capacity [2][N][64]. The [2] portion holds the cache for the “previous” and the “next” iterations of the bottom-up DP algorithm. The [N] portion is indexed by the starting position L of the sequence. The [64] portion tracks the number of sequences that XOR to each particular value (possible precisely because all elements are below 64).

This cache is initialized with the actual a[i] and b[i] counts (i.e. all XOR sequences of size 1). This is our base case. We then proceed through increasing values of size from 2 to N, updating the counts in the DP cache. Simultaneously, we update the xorCount[] array after each iteration of size, so that we can print out the answers at the end.
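Here is a minimal sketch of this first DP, with the prev/next vectors playing the role of the cache’s [2] dimension (names are illustrative; my actual submission is linked below). It reproduces the example output above:

#include <array>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>
using namespace std;

// prev[L][v] = number of ways to build a sequence starting at L, of the
// current length, whose xor sum is v. Elements are < 64, so v is in [0, 63].
vector<uint64_t> xorCounts(const vector<int>& a, const vector<int>& b, uint64_t M) {
    int n = (int)a.size();
    vector< array<uint64_t, 64> > prev(n), next(n);  // zero-initialized
    vector<uint64_t> xorCount(n + 1, 0);
    for (int L = 0; L < n; L++) {        // base case: length 1
        prev[L][a[L]]++;
        prev[L][b[L]]++;
        xorCount[1] = (xorCount[1] + prev[L][0]) % M;
    }
    for (int len = 2; len <= n; len++) {
        for (int L = 0; L + len <= n; L++) {
            int R = L + len - 1;          // element appended on the right
            for (int v = 0; v < 64; v++)
                next[L][v] = (prev[L][v ^ a[R]] + prev[L][v ^ b[R]]) % M;
            xorCount[len] = (xorCount[len] + next[L][0]) % M;
        }
        swap(prev, next);                 // O(1) buffer swap, no copying
    }
    return xorCount;  // xorCount[i] = #sequences of length i with xor sum 0
}

int main() {
    vector<int> A = {2, 2, 3, 1}, B = {2, 3, 2, 2};
    vector<uint64_t> ans = xorCounts(A, B, 1000);
    for (int len = 1; len <= 4; len++)
        printf("%llu\n", (unsigned long long)ans[len]);  // expect 0 5 2 4
    return 0;
}

Note that this simple version performs the % M on every update; that detail becomes important in the optimization section below.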

This solution is functionally correct. However, it is really slow. The algorithmic complexity is O(N^2 * K), where N is the length of the sequences and K = 64 is the number of possible xor values (since every element is below 64). In terms of actual runtime, this algorithm takes around 19 seconds on most of the inputs. The time limit was 2 seconds, so this is about 10 times too slow!

An Algorithmic Optimization Attempt

Of course, my first thought in the contest was to try and optimize this algorithmically. It seems that the problem should be amenable to a divide-and-conquer approach. Divide the sequences into two portions, compute the corresponding values for size/2, and then in the merge step get the corresponding values for everything from size/2+1 to size.

Seems simple in theory. There was only one problem… For the life of me, I could not figure out how to do the merge step in better than O(N^2) time. No matter what I tried, the merge kept taking quadratic time. And if the merge takes quadratic time, then by the master theorem the overall complexity is dominated by this factor. Clearly such a solution is no better than my DP solution.

EDIT: I peeked at the editorial after the contest. Turns out D&C was indeed the way to go, but the merge step required knowledge of Fast Fourier Transform to do it faster than O(N^2). Sounds fascinating! I need to put that in my mental queue, so that I can go back and learn this technique.

So what do I do now? The clock is ticking on the problem, and I can’t seem to figure out a solution better than O(N^2). Well, since it was a weekend, and I had plenty of time, I literally started fooling around with the code. What optimization tricks could I pull off on my DP solution, in order to make it run faster?

When Constant Factors Matter…

I’m lucky to have had a really good compilers professor in grad school. I have also worked at Intel on the x86 simulator team for the last 5 years, so I’m pretty familiar with the low level workings of a CPU. Could I perhaps exploit this knowledge to improve the runtime of my code somehow? Of course, there’s no way I could improve the runtime from 19 seconds to 2 seconds needed for a successful submission… could I? I had no idea, but I knew I wanted to tweak the code as much as possible…

Let’s examine the bottlenecks in my code:

Optimizing away the modulus:

The very first thing that jumps out is the number of times I perform a modulus all over the code. This wouldn’t be a problem if M were a power of two, since then it reduces to a simple bit operation. However, the general integer division instruction is far slower. (Integer division produces both the quotient and the remainder.) So performing the modulus everywhere definitely slows down the code by introducing long latencies into the CPU pipeline. These have got to go.

But how do we do this? I mean, the problem statement asks for the answers “modulo M”. The key insight here is to understand that we don’t need to perform the modulo on *every* iteration. Luckily for us, the maximum value of M is 10^5, while a U32 can hold about 4×10^9 and a U64 about 1.8×10^19. This gives us quite a bit of room to play with!

If we use a 32-bit number to hold the results, we may be able to get away with performing the modulo every X iterations. On the other hand, if we use a 64-bit number, we may be able to perform the modulo every Y iterations (Y > X). But keep in mind this increases the size of our DP cache (in bytes) because a U64 takes twice the space of a U32. This will affect CPU data cache performance somewhat.

So what do we do? Take the smaller or larger data type? And how do we determine the values of X and Y? Luckily, both of these questions are answered by performing repeated experiments. In fact, let’s take a page out of the algorithms techniques, and binary search on these values in our experiments! Therefore I performed a number of experiments using U32 and U64, and determined the following:

//#define MODCNT (45)  //experimentally determined for U64, don't touch

//#define MODCNT (15)  //experimentally determined for U32, don't touch

i.e. X = 15 and Y = 45. If I use a U32, I need to perform the mod M operation every 15 iterations; with a U64, every 45 iterations. This ensures that the modulus is applied *before* any overflow, and hence keeps the results correct. In practice the latter (U64, with 45 iterations) performed better, and that is what I settled on.

Here’s another interesting problem, how do I know (within the inner for loop) whether I have passed 45 iterations? The naïve way is to do a modulo 45 on the iteration counter to figure this out. But the whole point is that we are trying to optimize away the modulo in the for loop. 🙂 Therefore, I pre-computed the Boolean values for the various iteration counts *outside* the loops:

for (int mult = 0; (mult * MODCNT) < LIMIT; mult++) {
    int val = mult * MODCNT;
    multCheck[val] = true;
}
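To make the deferred reduction concrete, here is a toy, self-contained demo (in the real code, multCheck[] replaces the % MODCNT test, and the accumulation is the DP update):

#include <cstdint>
#include <cstdio>

int main() {
    // After each reduction the value is < M <= 10^5 < 2^17, and it at most
    // doubles per iteration, so 45 deferred iterations stay below 2^62 < 2^64.
    const uint64_t M = 99991;   // illustrative modulus
    const int MODCNT = 45;
    uint64_t acc = 1;
    for (int iter = 1; iter <= 1000; iter++) {
        acc += acc;                         // values grow between reductions
        if (iter % MODCNT == 0) acc %= M;   // reduce once every MODCNT passes
    }
    printf("%llu\n", (unsigned long long)(acc % M));
    return 0;
}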

Optimizing away unnecessary memory writes:

At the end of the inner loop, the prev and next caches need to trade roles so that the next iteration can overwrite the old data. Instead of copying the values, I re-factored the code to perform a simple pointer swap. This saved writing 64 U64s, or 512 bytes of memory, at the end of every inner iteration. While this may not seem like much, over the course of the entire program it saved 8000 * 8000 * 512 bytes = ~31 GB of unnecessary memory writes! This was HUGE!
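Sketched, the swap looks like this (the update shown is a stand-in, not the real DP):

#include <cstdint>
#include <utility>

void iterate(uint64_t cache[2][64], int rounds) {
    int prev = 0, next = 1;
    for (int r = 0; r < rounds; r++) {
        for (int v = 0; v < 64; v++)
            cache[next][v] = cache[prev][v ^ 1] + 1;  // stand-in update
        std::swap(prev, next);  // two integer writes, not a 512-byte copy
    }
}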

Manual loop unrolling, and re-factoring writes to the cache:

Here’s another place where I could optimize:

U8 xorval = i ^ as[R];
NumSeq incBy = cache[prev][L].results[i];
cache[next][L].results[xorval] += incBy;
cache[next][L].results[xorval] %= M;
// ... (further down in the same loop) ...
U8 xorval = i ^ bs[R];
NumSeq incBy = cache[prev][L].results[i];
cache[next][L].results[xorval] += incBy;
cache[next][L].results[xorval] %= M;

This is a snippet of code in the inner for loop. The compiler really has no idea whether these various memory writes alias, and therefore it is very conservative in generating code for these writes.

Instead, with a bit of refactoring, I could re-write the code in the following fashion:

cacheNext[i] = cachePrev[asr ^ i] + cachePrev[bsr ^ i];

This loop runs from i=0 to 63. Not only is there no dependency across loop iterations, it also has very good L1-cache behavior! This is a definite win. I wasn’t sure whether the compiler would perform loop unrolling, so I manually unrolled all 64 iterations of i, as sketched below.
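Sketched (using the names from the snippet above), the refactored and unrolled updates look like this:

#include <cstdint>
typedef uint64_t NumSeq;

// Distinct source and destination rows mean the writes cannot alias the
// reads, there is no loop-carried dependency, and all accesses stay within
// two 512-byte rows, so L1 behavior is excellent.
void update(NumSeq* cacheNext, const NumSeq* cachePrev, int asr, int bsr) {
    for (int i = 0; i < 64; i++)
        cacheNext[i] = cachePrev[asr ^ i] + cachePrev[bsr ^ i];
}

// The hand-unrolled form (first three of the 64 statements shown):
void updateUnrolled(NumSeq* cacheNext, const NumSeq* cachePrev, int asr, int bsr) {
    cacheNext[0] = cachePrev[asr ^ 0] + cachePrev[bsr ^ 0];
    cacheNext[1] = cachePrev[asr ^ 1] + cachePrev[bsr ^ 1];
    cacheNext[2] = cachePrev[asr ^ 2] + cachePrev[bsr ^ 2];
    // ... and so on through cacheNext[63]
}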

Here is my resulting final submission.

Here’s the real shocker:

On gcc 4.8.1, with O3 optimization, my first DP submission ran for 19 seconds.

On gcc 4.8.1, with O3 optimization, my final DP submission ran for 1.92 seconds!

This is an astonishing 10x improvement *over O3 optimized gcc code*. Without any algorithmic complexity changes! Of course, the solution is still functionally correct.

I can honestly say that I have never performed such an optimization before. It was good enough to pass the 2 second time limit on all the test cases, and I succeeded with this problem in the contest!

It was also an eye-opener for me. Yes, algorithmic optimization is the preferred way to go. But when you can’t optimize it any further with Big-Oh, perhaps optimizing away the hidden constants can help you hit your target runtimes! 🙂

Competitive Programming

Competitive Programming

Last week I competed in yet another online programming competition hosted by HackerRank, where I placed 32 out of 2324 competitors. This placed me in the “gold” category for this contest (top 4% gold, next 8% silver, next 13% bronze). Of course, it’s all fun and games, but can get quite addicting. You get to compete against some really talented programmers, many of whom I admire greatly, and hope to someday emulate. I thought it would be nice to blog about the contest, the problems and my solutions, and to talk a bit about why any serious programmer should at least try competitive programming. Finally I will go more into depth on 2 of the 5 problems, because I thought they were interesting and my solution to one was a bit… unconventional!

Why competitive programming?

Quite simply, no one ever improves by doing the same thing day in and day out. In order to improve yourself at *anything*, you really need to step outside your comfort zone. If you do the same routine in the gym every day, you will maintain your level of fitness. But in order to progress, you *must* push yourself to do more reps/weights or the same amount in less time.

Programming is no different. Ideally you want to pick problems that are difficult for you, but not insanely so. Choose something that is just outside your abilities, and try hard to come up with a correct/efficient solution. As you get better and better, keep scaling the difficulty level. Iterating this way will greatly enhance your skills as a developer. And competitions are a great way to develop in such a manner.

Let’s also be honest. Competing is addictive. 🙂 You really have no idea how good you are until you test yourself against your peers. Programming competitions give me the same feeling that motorcycle racing did on the racetrack… *butterflies in the stomach*. Prior to any contest, you get that feeling. What will be on the contest? Will it be computational geometry? I know I should have practiced more at that; I suck at it right now. Or maybe it will be dynamic programming. I rather enjoy those; I hope it is dynamic programming… Just remember, *butterflies in the stomach* is good, because it pushes you to get outside your comfort zone.

Finally, competing is good for another reason: it teaches you how to fall gracefully. If you don’t place well, it reminds you to pick yourself back up and try harder next time. As Rocky says, “It’s not about how hard you can hit… it is about how hard you can get hit and keep moving forward.” Dealing with failure, learning from it, and going back into the arena to fight another day is a rare quality worth cultivating!

Contest Format

Ok, enough philosophy for now. Onwards with the contest…

This particular contest was Week of Code – #16. This type of contest contains 5 problems, and its format is intriguing. It starts on a Monday, and every day the organizers release 1 problem for contestants to solve. The problem difficulty scales up as you progress through the week, which makes for an exciting time. Hidden test cases are run against the solutions at the end of the day, so unless you have guaranteed the correctness and algorithmic efficiency of your code, it will most likely fail the full test suite. Also, for any problem, the clock doesn’t start ticking until you actually open the problem. I cannot stress how useful this is for us working professionals. This means that we can work all day, go workout, come home and spend some time with the family, and then do some competitive programming for a couple of hours prior to bed. What a perfect way to end the day! 🙂

I will discuss the specific problems out of order, so that I can save my favorites for last.

Problem 1: Sum of Absolutes

The first problem in the contest. This wasn’t hard at all, but then again, it was a warmup. The interesting thing is that if you implemented the naïve solution exactly as spelled out by the function in the problem statement (an O(N*Q) algorithm), you would almost certainly have run out of time. I implemented a variant of the prefix sum that takes the absolute values into account; my algorithm answered all queries in O(Q) after an O(N) pre-computation, for O(Q+N) overall complexity. This was very efficient and easily passed the test cases (for correctness as well as speed).
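I won’t reproduce the exact statement here, but as an illustration of the prefix-sum pattern (assuming, purely for the sketch, that each query asks for the sum of |a[i]| over a range [l, r]):

#include <cstdlib>
#include <vector>
using namespace std;

// O(N) pre-computation: pre[i] = |a[0]| + ... + |a[i-1]|.
vector<long long> prefixAbs(const vector<long long>& a) {
    vector<long long> pre(a.size() + 1, 0);
    for (size_t i = 0; i < a.size(); i++)
        pre[i + 1] = pre[i] + llabs(a[i]);
    return pre;
}

// Each query [l, r] (1-indexed, inclusive) is then O(1): pre[r] - pre[l-1].

With constant-time queries, Q queries cost O(Q) in total.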

Problem 3: Sprinklers

This was the third problem in the contest. The problem had 2 variables contributing to the cost function, and we needed to minimize the cost by choosing appropriate values for both. Note that if you *just knew* the optimum value of Q, you could code up a greedy strategy to place the N sprinklers properly, so as to water all the roses. My algorithm determined the minimum feasible value of Q, then ran the sprinkler-placement subroutine for gradually increasing values of Q. Having explored all the values of Q at the end, I could determine the minimum of the cost function.

Of course, this O(N^2) solution isn’t the fastest algorithmically. An obvious question is whether I could speed up the search for Q by using binary search (instead of a linear scan). I ran a small experiment where I printed out the values of (N, Q, Cost) for increasing values of Q, and found that the cost function was *NOT* monotonic in Q (i.e. it had multiple local minima), which meant that I could not optimize by using binary search. This was a bummer, because as it stands, my solution was correct but not the most efficient. And sure enough, it passed all the preliminary test cases, but only cleared 12/18 hidden tests, timing out on 6/18. The time limit for this problem had been reduced to 1 second, which didn’t help either. 🙂

I haven’t looked at the editorial yet, so I don’t know what the author’s solution was. I’ll probably try once more for a quicker solution before looking at the editorial.

EDIT: Oh bummer! I looked at the editorial, and it turns out we *could* optimize using binary search. Obviously I was on the right track, but my experiment above was doing something incorrect. I should have double-checked the experimental setup before ruling out binary search! Oh well… live and learn.

Problem 4: Vim War

Another interesting problem! This one just screams NP-complete/NP-hard. I think the low value of M is also a good indicator that the problem is some form of NP-hard, and the solution is some function of O(2^M). This was just my gut feeling, and I haven’t checked the editorial yet. I may very well be wrong about the NP-complete nature of the problem. 🙂

Anyway, I was able to devise a correct solution pretty quickly. The solution has a complexity of O(N*2^M) though, so it isn’t exactly the fastest kid on the block. Now either the problem really is some variant of NP-hard and this is close to the best we can do, or it just looks hard but is actually in P. Given that I passed 22/34 tests but timed out on the other 12/34, I suspect it is the latter.

EDIT: I peeked at the editorial. Turns out that the problem was a variant of NP after all! But there was a slightly more efficient O(N*M + M*2^M) solution. Oh well…

Problem 2: Lucky Numbers

What a beautiful problem! This was my favorite problem on the contest!!

Here is the problem statement: You are given three integers x, y and z. You have to find the sum of all numbers whose digits are made up of only 4, 5 and 6, that have at most x fours in their decimal representation, at most y fives, and at most z sixes. As the summation can be very large, print it modulo 10^9+7.

And here is an example for x=1, y=1, z=1:

4+5+6+45+54+56+65+46+64+456+465+546+564+645+654=3675

Let’s first talk about the intractable naïve solution. Let’s also examine the worst case input of x=100, y=100, and z=100.

We could construct all 1 digit numbers, all 2 digit numbers….. and so on… until we get all (100+100+100) digit numbers.

Let’s see how many 300-digit numbers we can construct. Exactly 100 of these digits are 4s, exactly 100 are 5s, and exactly 100 are 6s, though they can appear in any order within a single number. This gives us (300 C 100) * (200 C 100) numbers. This turns out to be: 376523493564631064367712071965768747782444205128669798396168767743500485766630075466163294008566118208045715304490994009624725072511252178400

That is a 141 digit number. There are about 10^141 candidates just for all the 300 digit numbers!

Let’s put this into some perspective.

Imagine a future CPU that runs at 10 GHz and does 1 addition per *cycle*, so it can perform 10^10 additions per second. Imagine that we have a cluster of 10 billion such CPUs performing additions in parallel. That is 10^20 additions per second. In other words, this humongous computing cluster would take ~10^120 seconds to process the above candidates. The age of the universe is estimated at 14 billion years, which amounts to less than 10^18 seconds. So our mythical 10-billion-CPU cluster of futuristic CPUs would have to run for more than 10^100 *times* the age of the universe to finish the above computations!

But wait, it gets even better. The estimated number of atoms in the observable universe is 10^80. So even if each atom in the universe were used as a memory device to store one candidate number, we would still fall woefully short (10^60) on memory.

And we haven’t even considered all 299, 298, 297…1 digit number combinations. This analysis was only for numbers that were exactly 300 digits long.

At this point, you should at least be chuckling, if not outright laughing out loud! 🙂 Clearly we need a better method.

Dynamic Programming to the Rescue

The key insight comes when you decompose a single number. Let us represent, for example, an 8-digit number as a string of digits “abcdefgh”. Each letter is a placeholder for a digit (either 4, 5 or 6). So, for example, one number corresponding to abcdefgh could be 64545545.

We can write abcdefgh as:

a * 10^7 + b * 10^6 + c * 10^5 + d * 10^4 + e * 10^3 + f * 10^2 + g * 10^1 + h * 10^0

Let us call this the decomposition property.

Now what if you knew the sum of all the 7-digit combinations? How could this help in constructing 8-digit combination sums?

More generally, let us represent S(i, j, k) as the sum of all (i+j+k) digit numbers consisting of EXACTLY i 4s, j 5s, and k 6s. Now consider the most significant digit in this (i+j+k) digit number. It is either a 4 or a 5 or a 6.

More specifically, if it is a 4, we have concatenated this 4 onto an (i+j+k−1)-digit number consisting of exactly (i−1) 4s, j 5s, and k 6s. How many such (i+j+k−1)-digit numbers are there? Exactly ((i+j+k−1) C (i−1)) * ((j+k) C j). Call this value P.

Or, if this MSB digit is a 5, we have concatenated this 5 onto an (i+j+k−1)-digit number consisting of exactly i 4s, (j−1) 5s, and k 6s. How many such numbers are there? Exactly ((i+j+k−1) C (j−1)) * ((i+k) C i). Call this value Q.

Or, if this MSB is a 6, we have concatenated this 6 onto an (i+j+k−1)-digit number consisting of exactly i 4s, j 5s, and (k−1) 6s. How many such numbers are there? Exactly ((i+j+k−1) C (k−1)) * ((i+j) C j). Call this value R.

Now for each of these cases we can use the decomposition property. For the sum of all (i+j+k)-digit numbers with 4 as the most significant digit, the MSBs contribute 4 * 10^(i+j+k−1) * P in total. Adding the contribution of the lower (i+j+k−1) digits, which is exactly S(i−1, j, k) and was computed before, gives the sum of all (i+j+k)-digit numbers starting with 4.

We can do the same process for numbers starting with 5 and 6.

This leads us to a dynamic programming recurrence.

S(i, j, k) = [4 * ((i+j+k−1) C (i−1)) * ((j+k) C j) * 10^(i+j+k−1) + S(i−1, j, k)]

+ [5 * ((i+j+k−1) C (j−1)) * ((i+k) C i) * 10^(i+j+k−1) + S(i, j−1, k)]

+ [6 * ((i+j+k−1) C (k−1)) * ((i+j) C j) * 10^(i+j+k−1) + S(i, j, k−1)]

And that’s it folks! Just program a bottom-up dynamic programming algorithm for this recurrence equation.

In the end, we need to add every S(i, j, k) for i ∈ [0..x], j ∈ [0..y], and k ∈ [0..z] to get our answer. Of course, be sure to perform modulo arithmetic at every step to prevent overflows.
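Here is a compact sketch of this bottom-up DP (illustrative; my actual submission is linked below). It reproduces the x=y=z=1 example from earlier:

#include <cstdint>
#include <cstdio>
#include <vector>
using namespace std;

const int64_t MOD = 1000000007LL;
const int MAXN = 301;             // i+j+k can be at most 300

int64_t C[MAXN][MAXN];            // binomial coefficients mod MOD
int64_t p10[MAXN];                // powers of 10 mod MOD

int64_t luckySum(int x, int y, int z) {
    for (int n = 0; n < MAXN; n++) {          // Pascal's triangle for nCk
        C[n][0] = 1;
        for (int k = 1; k <= n; k++)
            C[n][k] = (C[n-1][k-1] + C[n-1][k]) % MOD;
    }
    p10[0] = 1;
    for (int n = 1; n < MAXN; n++) p10[n] = p10[n-1] * 10 % MOD;

    // S[i][j][k] = sum of all (i+j+k)-digit numbers with exactly i 4s, j 5s, k 6s.
    vector<vector<vector<int64_t> > > S(x+1,
        vector<vector<int64_t> >(y+1, vector<int64_t>(z+1, 0)));
    int64_t total = 0;
    for (int i = 0; i <= x; i++)
        for (int j = 0; j <= y; j++)
            for (int k = 0; k <= z; k++) {
                int n = i + j + k;
                if (n == 0) continue;
                int64_t s = 0;
                if (i > 0)   // MSB is a 4: P = C(n-1, i-1) * C(j+k, j)
                    s += (4 * C[n-1][i-1] % MOD * C[j+k][j] % MOD * p10[n-1]
                          + S[i-1][j][k]) % MOD;
                if (j > 0)   // MSB is a 5: Q = C(n-1, j-1) * C(i+k, i)
                    s += (5 * C[n-1][j-1] % MOD * C[i+k][i] % MOD * p10[n-1]
                          + S[i][j-1][k]) % MOD;
                if (k > 0)   // MSB is a 6: R = C(n-1, k-1) * C(i+j, j)
                    s += (6 * C[n-1][k-1] % MOD * C[i+j][j] % MOD * p10[n-1]
                          + S[i][j][k-1]) % MOD;
                S[i][j][k] = s % MOD;
                total = (total + S[i][j][k]) % MOD;
            }
    return total;
}

int main() {
    printf("%lld\n", (long long)luckySum(1, 1, 1));   // prints 3675
    return 0;
}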

Here is my final solution, with complexity O(XYZ). The best part? It runs in a fraction of a second on my slow laptop, even for the worst case (100 x 100 x 100) input!

Don’t you just love algorithms? 🙂

This blog has already gone on for long enough, and my fingers are tired. I will continue with Part II later, where I discuss the final and most difficult problem of the contest. Thanks for reading!