Competitive Programming
Last week I competed in yet another online programming competition hosted by HackerRank, where I placed 32nd out of 2324 competitors. That put me in the “gold” category for this contest (top 4% gold, next 8% silver, next 13% bronze). Of course, it’s all fun and games, but it can get quite addictive. You get to compete against some really talented programmers, many of whom I admire greatly and hope to someday emulate. I thought it would be nice to blog about the contest, the problems and my solutions, and to talk a bit about why any serious programmer should at least try competitive programming. Finally, I will go into more depth on 2 of the 5 problems, because I thought they were interesting and my solution to one was a bit… unconventional!
Why competitive programming?
Quite simply, no one ever improves by doing the same thing day in and day out. In order to improve yourself at *anything*, you really need to step outside your comfort zone. If you do the same routine in the gym every day, you will maintain your level of fitness. But in order to progress, you *must* push yourself to do more reps/weights or the same amount in less time.
Programming is no different. Ideally you want to pick problems that are difficult for you, but not insanely so. Choose something that is just outside your abilities, and try hard to come up with a correct/efficient solution. As you get better and better, keep scaling the difficulty level. Iterating this way will greatly enhance your skills as a developer. And competitions are a great way to develop in such a manner.
Let’s also be honest: competing is addictive. 🙂 You really have no idea how good you are until you test yourself against your peers. Programming competitions give me the same feeling that motorcycle racing did on the racetrack… *butterflies in the stomach*. Prior to any contest, the questions start swirling. What will be on it? Will it be computational geometry? I know I should have practiced that more; I’m weak at it right now. Or maybe it will be dynamic programming. I rather enjoy those; I hope it’s dynamic programming… Just remember, *butterflies in the stomach* are good, because they push you outside your comfort zone.
Finally, competing is good for another reason: it teaches you how to fall gracefully. If you don’t place well, it reminds you that you need to pick yourself back up and try harder next time. As Rocky says, “It’s not about how hard you can hit… it is about how hard you can get hit and keep moving forward.” Dealing with failure, learning from it, and going back into the arena to fight another day is a rare quality worth cultivating!
Contest Format
Ok, enough philosophy for now. Onwards with the contest…
This particular contest was Week of Code – #16. This type of contest contains 5 problems, and its format is intriguing. It starts on a Monday, and every day the organizers release 1 problem for contestants to solve. The problem difficulty scales up as you progress through the week, which makes for an exciting time. Hidden test cases are run against the solutions at the end of the day, so unless you have guaranteed the correctness and algorithmic efficiency of your code, it will most likely fail the full test suite. Also, for any problem, the clock doesn’t start ticking until you actually open the problem. I cannot stress how useful this is for us working professionals. This means that we can work all day, go workout, come home and spend some time with the family, and then do some competitive programming for a couple of hours prior to bed. What a perfect way to end the day! 🙂
I will discuss the specific problems out of order, so that I can save my favorites for last.
The first problem in the contest. This wasn’t hard at all, but then again, it was a warmup. The interesting thing is that if you implemented the naïve O(N*Q) algorithm suggested by the function you were asked to fill in, you would almost certainly have run out of time. I implemented a variant of the prefix sum, taking the absolute values into account, and my algorithm answered all the queries in O(Q) after an O(N) pre-computation, for O(Q+N) overall complexity. This was very efficient and easily passed the test cases (for correctness as well as speed).
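The exact problem statement isn’t reproduced here, but the core prefix-sum trick is always the same: pre-compute running sums once in O(N), then answer each range-sum query in O(1). A minimal sketch (function names are my own, not from the contest):

```python
def build_prefix(a):
    # prefix[i] = sum of a[0..i-1]; O(N) precomputation.
    prefix = [0]
    for v in a:
        prefix.append(prefix[-1] + v)
    return prefix

def range_sum(prefix, l, r):
    # Sum of a[l..r] inclusive, answered in O(1) per query.
    return prefix[r + 1] - prefix[l]
```

With Q queries this gives the O(Q+N) total mentioned above; the “absolute values” twist of the actual problem would just change what gets accumulated into the prefix array.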
This was the third problem in the contest. This problem had 2 variables which contributed to the cost function, and we needed to minimize the cost function by choosing appropriate values for both. Note that if you *just knew* the optimum value of Q, you could code up a greedy strategy to place the N sprinklers properly so as to water all the roses. My algorithm ran the sprinkler-placement subroutine for gradually increasing values of Q, starting from the minimum value that could possibly work. Since this explored all the candidate values of Q, I could determine the minimum of the cost function at the end.
Of course, this O(N^2) solution isn’t the fastest algorithmically. An obvious question is whether I could speed up the search for Q by using binary search instead of a linear scan. I ran a small experiment where I printed out the values of (N, Q, Cost) for increasing values of Q. I found that the cost function was *NOT* monotonic in Q — it had multiple local minima — which meant that I could not optimize using binary search. This was a bummer, because as it stands, my solution was correct but not the most efficient. And sure enough, it passed all the preliminary test cases, but only cleared 12/18 hidden tests, timing out on 6/18. The time limit for this problem had been reduced to 1 second, so that didn’t help either. 🙂
I haven’t looked at the editorial yet, so I don’t know what the author’s solution was. I’ll probably try once more for a quicker solution before looking at the editorial.
EDIT: Oh bummer! I looked at the editorial, and it turns out we *could* optimize using binary search. Obviously I was on the right track, but my experiment above was doing something incorrect. I should have checked the experimental setup, before ruling out binary search! Oh well… live and learn.
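For reference, binary search over the answer requires the feasibility check to be monotone in Q: infeasible up to some threshold, feasible from there on. A generic template (the `feasible` predicate is a hypothetical stand-in for the sprinkler-placement check):

```python
def min_feasible(lo, hi, feasible):
    # Smallest v in [lo, hi] with feasible(v) == True, assuming
    # feasible is monotone: False, ..., False, True, ..., True.
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid       # mid works; the answer is mid or smaller
        else:
            lo = mid + 1   # mid fails; the answer is strictly larger
    return lo
```

This replaces an O(range) linear scan over Q with O(log range) calls to the subroutine, which is presumably the speedup the editorial’s solution achieved.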
Another interesting problem! This one just screams NP-complete/NP-hard/NP. I think the low value of M is also a good indicator that it is some form of NP, and the solution is some function of O(2^M). This was just my gut feeling, and I haven’t checked the editorial yet. I may very well be wrong about the NP-complete nature of the problem. 🙂
Anyway, I was able to devise a correct solution pretty quickly. The solution has a complexity of O(N*2^M) though, so it isn’t exactly the fastest kid on the block. Now either the problem is some variant of NP, and this is the best we can do, or it just looks NP, but is actually P. Given that I passed 22/34, but timed out on 12/34 tests, I suspect it is the latter.
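The problem details aren’t reproduced here, but the generic shape of an O(N*2^M) algorithm is to enumerate all 2^M subsets of the M options as bitmasks and do O(N) work per subset. A sketch of that pattern (the `score` function is a hypothetical stand-in for whatever per-subset evaluation the actual problem required):

```python
def best_over_subsets(items, M, score):
    # Generic O(N * 2^M) pattern: try every subset (bitmask) of the
    # M options, paying O(N) per subset to evaluate it.
    best = None
    for mask in range(1 << M):
        total = sum(score(item, mask) for item in items)  # O(N) work
        if best is None or total > best:
            best = total
    return best
```

With M small (say, up to 20) this is perfectly tractable; it is the factor of N per subset that makes it slow, which is exactly what the editorial’s O(N*M + M*2^M) refinement avoids.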
EDIT: I peeked at the editorial. Turns out that the problem was a variant of NP after all! But there was a slightly more efficient O(N*M + M*2^M) solution. Oh well…
What a beautiful problem! This was my favorite problem on the contest!!
Here is the problem statement: You are given three integers x, y and z. You have to find the sum of all numbers whose digits are made of only 4, 5 and 6, that have at most x fours in decimal representation, at most y fives in decimal representation and at most z sixes in decimal representation. As summation can be very large, print it modulo 10^9+7.
And here is an example for x=1, y=1, z=1:
4+5+6+45+54+56+65+46+64+456+465+546+564+645+654=3675
Let’s first talk about the intractable naïve solution. Let’s also examine the worst case input of x=100, y=100, and z=100.
We could construct all 1-digit numbers, all 2-digit numbers, and so on, until we get all (100+100+100)-digit numbers.
Let’s see how many 300-digit numbers we can construct. Exactly 100 of these digits are 4s, exactly 100 are 5s, and exactly 100 are 6s, though they can appear in any order within a single number. This gives us (300 C 100) * (200 C 100) numbers, which turns out to be: 376523493564631064367712071965768747782444205128669798396168767743500485766630075466163294008566118208045715304490994009624725072511252178400
That is a 141 digit number. There are about 10^141 candidates just for all the 300 digit numbers!
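Python’s arbitrary-precision integers make this count easy to verify:

```python
import math

# Number of 300-digit candidates: choose positions for the hundred 4s,
# then positions for the hundred 5s among the remaining 200 slots.
candidates = math.comb(300, 100) * math.comb(200, 100)
print(len(str(candidates)))  # 141 digits
```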
Let’s put this into some perspective.
Imagine a future CPU that runs at 10 GHz and does 1 addition per *cycle*, so it can perform 10^10 additions per second. Now imagine a cluster of 10 billion such CPUs performing additions in parallel: 10^20 additions per second. Even this humongous computing cluster would take ~10^120 seconds to process the above candidates. The age of the universe is estimated at 14 billion years, which amounts to less than 10^18 seconds. So our mythical 10-billion-CPU cluster of futuristic CPUs would have to run for more than 10^100 *times* the age of the universe to finish the above computations!
But wait, it gets even better. The estimated number of atoms in the observable universe is 10^80. So even if each atom in the universe were used as a memory device storing one candidate number, we would still fall woefully short on memory, by a factor of roughly 10^60.
And we haven’t even considered all 299, 298, 297…1 digit number combinations. This analysis was only for numbers that were exactly 300 digits long.
At this point, you should at least be chuckling, if not outright laughing out loud! 🙂 Clearly we need a better method.
Dynamic Programming to the Rescue
The key insight comes when you decompose a single number. Let us represent, for example, an 8-digit number as a string of digits “abcdefgh”. Each letter is a placeholder for a digit (either 4 or 5 or 6). So, for example, one number corresponding to abcdefgh could be 64545545.
We can write abcdefgh as:
a * 10^7 + b * 10^6 + c * 10^5 + d * 10^4 + e * 10^3 + f * 10^2 + g * 10^1 + h * 10^0
Let us call this the decomposition property.
Now what if you knew the sum of all the 7-digit combinations? How could this help in constructing 8-digit combination sums?
More generally, let us represent S(i, j, k) as the sum of all (i+j+k) digit numbers consisting of EXACTLY i 4s, j 5s, and k 6s. Now consider the most significant digit in this (i+j+k) digit number. It is either a 4 or a 5 or a 6.
More specifically, if it is a 4, we have concatenated to this 4 a (i+j+k – 1) digit number, consisting of exactly (i-1) 4s, j 5s, and k 6s. How many such (i+j+k-1) numbers are there? Exactly ((i+j+k-1) C (i-1) ) * ((j+k) C j). Call this value P.
Or, if this MSB digit is a 5, we have concatenated to this 5 a (i+j+k-1) digit number, consisting of exactly i 4s, (j-1) 5s, and k 6s. How many such (i+j+k-1) numbers are there? Exactly ((i+j+k-1) C (j-1) ) * ((i+k) C i). Call this value Q.
Or if this MSB is a 6, we have concatenated to this 6 a (i+j+k-1) digit number, consisting of exactly i 4s, j 5s and (k-1) 6s. How many such (i+j+k-1) numbers are there? Exactly ((i+j+k-1) C (k-1) ) * ((i+j) C j). Call this value R.
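These counts are easy to sanity-check in a few lines of Python: P + Q + R must equal the total number of (i+j+k)-digit numbers with exactly i 4s, j 5s, and k 6s, namely ((i+j+k) C i) * ((j+k) C j). A quick sketch (the function name is mine):

```python
import math

def count_with_msb(i, j, k):
    # P, Q, R: counts of (i+j+k)-digit numbers with exactly i 4s,
    # j 5s, k 6s whose most significant digit is 4, 5, 6 respectively.
    n = i + j + k
    P = math.comb(n - 1, i - 1) * math.comb(j + k, j) if i > 0 else 0
    Q = math.comb(n - 1, j - 1) * math.comb(i + k, i) if j > 0 else 0
    R = math.comb(n - 1, k - 1) * math.comb(i + j, j) if k > 0 else 0
    return P, Q, R
```

For i = j = k = 1 this gives P = Q = R = 2, and 2 + 2 + 2 = 6 matches the six 3-digit numbers (456, 465, 546, 564, 645, 654) in the worked example.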
Now for each of these cases we can use the decomposition property. i.e. for the sum of all (i+j+k) digit numbers containing 4 as its most significant digit, the MSB contributes (4 * (10 ^ (i+j+k-1))) * P. Once we add the values from the lower (i+j+k-1) digits to this (which was computed before), we can get the sum of all (i+j+k) digit numbers starting with 4.
We can do the same process for numbers starting with 5 and 6.
This leads us to a dynamic programming recurrence.
S(i, j, k) = [4 * ((i+j+k-1) C (i-1)) * ((j+k) C j) * (10 ^ (i+j+k-1)) + S(i-1, j, k)]
           + [5 * ((i+j+k-1) C (j-1)) * ((i+k) C i) * (10 ^ (i+j+k-1)) + S(i, j-1, k)]
           + [6 * ((i+j+k-1) C (k-1)) * ((i+j) C j) * (10 ^ (i+j+k-1)) + S(i, j, k-1)]
And that’s it, folks! Just program a bottom-up dynamic programming algorithm for this recurrence, with the convention that any bracketed term whose index would go negative simply drops out, and with base case S(0, 0, 0) = 0.
In the end, we need to add every S(i, j, k) for i ∈ [0..x], j ∈ [0..y], and k ∈ [0..z] to get our answer. Of course, be sure to perform modulo arithmetic at every step to prevent overflows.
Here is my final solution, with complexity O(x*y*z). The best part? It runs in a fraction of a second on my slow laptop, even for the worst-case (100 x 100 x 100) input!
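A bottom-up implementation of the recurrence can be sketched as follows (this is a sketch of the approach described above, with names and structure of my own choosing, not necessarily the exact contest submission):

```python
import math

MOD = 10**9 + 7

def digit_sum(x, y, z):
    # S[i][j][k] = sum (mod MOD) of all (i+j+k)-digit numbers made of
    # exactly i fours, j fives, and k sixes; base case S[0][0][0] = 0.
    S = [[[0] * (z + 1) for _ in range(y + 1)] for _ in range(x + 1)]
    total = 0
    for i in range(x + 1):
        for j in range(y + 1):
            for k in range(z + 1):
                n = i + j + k
                if n == 0:
                    continue
                p10 = pow(10, n - 1, MOD)  # place value of the MSB
                s = 0
                if i > 0:  # most significant digit is a 4
                    cnt = math.comb(n - 1, i - 1) * math.comb(j + k, j) % MOD
                    s += 4 * p10 * cnt + S[i - 1][j][k]
                if j > 0:  # most significant digit is a 5
                    cnt = math.comb(n - 1, j - 1) * math.comb(i + k, i) % MOD
                    s += 5 * p10 * cnt + S[i][j - 1][k]
                if k > 0:  # most significant digit is a 6
                    cnt = math.comb(n - 1, k - 1) * math.comb(i + j, j) % MOD
                    s += 6 * p10 * cnt + S[i][j][k - 1]
                S[i][j][k] = s % MOD
                # The answer is the sum of S over ALL (i, j, k), since
                # "at most" i/j/k of each digit is allowed.
                total = (total + S[i][j][k]) % MOD
    return total
```

For x = y = z = 1 this returns 3675, matching the worked example above, and the triple loop makes the O(x*y*z) complexity plain.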
Don’t you just love algorithms? 🙂
This blog has already gone on for long enough, and my fingers are tired. I will continue with part 2 later, where I discuss the final and most difficult problem of the contest next time. Thanks for reading!