On hackerrank and Julia

By joe

June 14, 2017 - 6 minutes read - 1225 words

My new day job has me developing considerably less code than my previous endeavor, so I like to work on problems to keep these particular muscles in steady use. Happily, I get to do more analytics than ever before, so this at least is some compensation for the lower amount of coding. When I work on coding for myself, I’ll play with problems from my research days, or small throw-away ones, like on Hackerrank. The latter group of problems tends to be somewhat poorly specified, with the solutions being preferred as minimum viable, passing all tests in the allotted time. Elegance of solution is not directly scored. Quality of implementation is determined entirely in terms of attaining the same results that the authors did for their test cases. I’ve found the last part of that to be dubious in the past, after finding some errors in a number of their tests. Specifically in one in particular, they made an argument about a simple closed form solution for a particular problem in their discussion section, reducing it to evaluating the closed form solution. While I followed their logic, I did not agree with it, and coded up a simple brute force test (the problem was small enough to admit this mechanism). Sure enough, I found that their answers (to this one particular problem) were in fact, wrong. So, I take their “answers” with some grains of salt. This doesn’t take away from the joy of working through solving problems with code. So I do that, and generally care less about their ranking and scoring. Though I do run my code through their checker to see if it would “pass” their tests. Ok … that background set. I’ve been enjoying watching and working with small bits of Julia language over the years. I include it in my Nlytiq development base as one of the core components. I’ve built automation around its build and module installation to create a useful environment for me, others … scientists, data scientists and engineers, etc. Not just Julia, but many other tools. With this basis, I play with problems. One of the problems on Hackerrank was what they called a “megaprime” number. This is not the more common use of megaprime (million digit prime). Their megaprime number is a prime number whose individual digits are also prime (e.g. no “1” or even numbers in the digits). Their problem was to take a set of two numbers as input, and find all the megaprimes between them. There is a simple algorithm to do this … just find the primes between the lower and upper bound, and then test each prime for the property megaprime-ness. The megaprime checker is also remarkably simple to articulate … and if you have a very powerful and expressive language, fairly simple to code.

function megaprime(x::Int64)
  D = digits(x);
  SD = size(D)[1];
  P = find(isprime.(D));
  if SD != size(P)[1]
    return false
  end
  true
end

Take a 64 bit Int as input, named x. 1st, break the x up into its digits, and store in D. Second, get the leading dimension of that D array. Third, apply the isprime function to each digit in D, and filter out any that come back as false, storing this into an array P. If the array of digits did not all come back as prime (e.g. the array P had some false tests within it) then return a false, otherwise, fall through and return true. Ok, I got lazy with the isprime function there. I could have simply tested the digits to be 2, 3, 5, or 7. If they were none of these, then the megaprime test fails. This one is a bit more readable though, and likely nearly as fast (testing single digits for primality shouldn’t take long … no need to optimize this further). Then the driver code:

# read first/last
x = [parse(Int64,ss) for ss in split(readline())];
first = x[1];
last = x[2];
count = 0::Int64;
if last < first
  first,last = swap(first,last);
end
R = primes(first,last);

Here I read a line from STDIN, and parse it from a string into an array of Int64 types. I store the two in the first and last variables, and then check to make sure they are correctly ordered, swapping if not. The swap function …

function swap(x,y)
  y,x
end

… yes … this is they way it should be written.

julia > swap(1,2)
(2,1)

In the driver section, I generated a list of primes between first and last storing that in R. It is important to note that these are fairly fast functions, even on my 6 year old laptop.

julia> @time primes(1000000000)
  4.099736 seconds (9 allocations: 766.314 MB, 2.72% gc time)
50847534-element Array{Int64,1}:
         2
         3
         5
         7
        11
        13
        17
        19
        23
         ?
 999999739
 999999751
 999999757
 999999761
 999999797
 999999883
 999999893
 999999929
 999999937

Primes up to 1 billion in about 4 seconds on a fairly old machine. The machines they run this on in the cloud are somewhat faster, with more ram. The primes to test are in the R array. This is the meat of it. How to iterate over the R array in a meaningful way. Some folks will want to use list comprehensions and other CS constructs. These sometimes occlude intent and remove clarity. What I want to do is to loop over the elements of R. So why not … I dunno … do that? # now only have R items to inspect for num ∈ R test = megaprime(num); if test count = count + 1; end end Notice the notation. This is exactly how I want code to read. I could have used a list comprehension with count = count + megaprime(t) for t ??? R if I changed megaprime to return 0 or 1. That is possible, but I thought this explicit loop would be easier to comprehend for the programmer (me). Yes, some will scream syntactic sugar. Some will complain bitterly over the removal of the test from the if then construct. That was due to a crash in the language I observed, simple memoization worked around that issue. Ok. So I did this. It passed the run test on my machine.

landman@lightning:~/play/hr/mp$ /usr/bin/time ./mp.jl < inp
8
0.79user 0.70system 0:00.74elapsed 202%CPU (0avgtext+0avgdata 174512maxresident)k
0inputs+0outputs (0major+10105minor)pagefaults 0swaps

Remember, Julia is a compiled language, so this time includes startup/compilation time as well as execution time. So I pressed the test button. And it failed. Well, more precisely, they indicated it failed. “Timed out” was the error. So I tried the “Run” button, so I could see if I could get some of the other tests run, download the inputs and outputs, and do a run to check.

landman@lightning:~/play/hr/mp$ cat inp2 out2
230711883 401853350
7362
landman@lightning:~/play/hr/mp$ /usr/bin/time ./mp.jl < inp2
7362
9.67user 0.66system 0:09.66elapsed 106%CPU (0avgtext+0avgdata 301256maxresident)k
0inputs+0outputs (0major+12135minor)pagefaults 0swaps

Works on a much larger problem, again, correctly. But I get the timeout and … now … “too much output” message. I am guessing that they have an old version of julia installed. 0.5.2 is current stable, with 0.6-RC series up. Hopefully they will update soon. This said, my point about all of this is that Julia is a joy to use. No silly indentation, useful and expressive functions, rich and growing module ecosystem, multiple mechanisms to do things, compiled … did I mention, compiled?