Wednesday, February 17, 2021

HackerRank - Ransom Note

First on our list of challenges is what HackerRank says is easy. I disagree, but here we are. My key interpretation of this challenge is checking if string array B is a subset of string array A. On the surface, pretty easy, but I disagree. As I was running through the code I ran into a snag which I’ll present as we go along.

First things first, the boilerplate that we have to put this into:

static void checkMagazine(string[] magazine, string[] note){

}

And a test case:

magazine give me one grand today night
note give one grand today

So my first attempt:

static void checkMagazine(string[] magazine, string[] note)
{
	var isSubset = !note.Except(magazine).Any();
	Console.WriteLine(isSubset ? "Yes" : "No");
}

Jon Skeet proposed that simplistic answer back in 2009 (and I’m sure others before him). So we plug that in and give it a shot. Of the 3 initial test cases it gives us, 2 of the 3 pass. Diving into the failure we see

magazine two times three is not four
note two times two is four

Ok, so now our issue is not just is B a subset of A, but is ALL of B included in A. I’m sure there is a more eloquent way of stating that, but I’m going to call it a subset with unique values.

So lets try attempt two:

static void checkMagazine(string[] magazine, string[] note)
{
	var success = true;
	foreach (var word in note)
	{
		var indexInMagazine = Array.IndexOf(magazine, word);
		if (indexInMagazine > -1)
		{
			magazine[indexInMagazine] = string.Empty;
		}
		else
		{
			success = false;
			break;
		}
	}
	Console.WriteLine(success ? "Yes" : "No");
}

Plugging that in we get success on the 3 test cases they initially give us. Huzzah! So now we submit. Doing that runs the code against 21 test cases, of which 8 fail. Huh? Why fail? Checking into that we see Time limit exceeded. A little further digging into HackerRank and we find that C# code only has 3 seconds and 512 MB for a test case.

Picking one of the failures, I see that it has 30,000 entries for magazine and 20,000 for note. I guess our process is taking too long, so now we’ve got to optimize this even further. Now I’m officially stumped. Taking their test input I feed it into VisualStudio and am frustrated to see that my code runs in 1.4(ish) seconds. Guess we can’t rely on consistency between my dev environment and theirs.

One of my coworkers, and a few StackOverflow posts, suggested a HashSet, but that eliminates duplicates in either magazine or note, which is not something we want. So back to the drawing board.

Now I feel I’m on to something, it’s similar to my removing entries from the array, but should be much quicker:

static void checkMagazine(string[] magazine, string[] note)
{
	var matchedFromMagazine = note.Intersect(magazine);
	var success = matchedFromMagazine.Count() == note.Length;

	Console.WriteLine(success ? "Yes" : "No");
}

Nice and compact, pulls in some LINQ which is almost always nice. Let’s see how it goes. Running it against the full test suite, I no longer get failures from timeout, but now I get 5 failures from wrong answer. :(

Extracting one of the failure cases, the great news is my test runs in about 0.01 seconds, the bad news is it says “No” instead of “Yes”. The test case is a bit too long to put here (30,000 in magazine and 20,000 in note), but lets see if we can figure out where things are going wrong.

Jon Skeet saves us again (well, not really) with .ToLookup. Something, I’ll be honest, I’d never heard of before. Effectively breaks the array into a Dictionary<string, IEnumerable<string>> where the key is the word and the value is a collection of the word. I don’t really need the collection, just the count, but that’s still pretty neat.

So let’s try that out:

static void checkMagazine(string[] magazine, string[] note)
{
    var lookupOfNote = note.ToLookup(word => word);
    var success = true;
    foreach (var grouping in lookupOfNote)
    {
        if (magazine.Count(word => word.Equals(grouping.Key)) < grouping.Count())
        {
            success = false;
            break;
        }
    }
    Console.WriteLine(success ? "Yes" : "No");
}

Plugging that in, we get better results, but 2 still fail with Time limit exceeded. Gah!

Not that I thought this would work, but just to be thorough I gave this a whirl:

static void checkMagazine(string[] magazine, string[] note)
{
    var noteDictionary = note.GroupBy(word => word)
        .ToDictionary(k => k.Key, v => v.Count());
    var success = true;
    foreach (var kvp in noteDictionary)
    {
        if (magazine.Count(word => word.Equals(kvp.Key)) < kvp.Value)
        {
            success = false;
            break;
        }
    }
    Console.WriteLine(success ? "Yes" : "No");
}

It worked about as you’d expect (assuming you expected it to fail like the ToLookup did).

Reaching out to a coworker, he tried his hand at it and (to my chagrin) appears to have solved it. Let’s take a look at his solution and see how/why it worked compared to some of my attempts:

static void checkMagazine(string[] magazine, string[] note)
{
    var success = false;
    // build dictionary of note's words and how many times we need to find each word
    // This way we go through note only once, storing in a structure
    // for more performant lookup.
    var noteDict = new Dictionary<string, int>();
    foreach (var n in note)
    {
        if (!noteDict.ContainsKey(n))
        {
            noteDict.Add(n, 0);
        }
        else
        {
            noteDict[n]++;
        }
    }
    // Check if note can be built from magazine 
    // This way we only go through magazine at most once
    foreach (var word in magazine)
    {
        if (noteDict.ContainsKey(word))
        {
            if (noteDict[word] > 0)
            {
                noteDict[word]--;
            }
            else // we found the word enough times. Remove it from dict.
            {
                noteDict.Remove(word);
            }
        }
        if (noteDict.Count == 0) // if we've emptied our "words to find" dict, we're done
        {
            success = true;
            break;
        }
    }
    Console.WriteLine(success ? "Yes" : "No");
}

First blush, it seems very similar to my last attempt that builds a dictionary of words and their count from the note. Looks like the big difference (and the apparently reason for the speed improvement) is he iterates over the magazine and compares that to the new note dictionary (updating counts as he finds words).

Because I’m a butt head, I tried to see if I could improve on this. First I played with the creation of the noteDict to see if I could optimize that. Using note.Distinct().ToDictionary(k => k, v => note.Count(w => w.Equals(v))) but that still caused timeout errors.

So I’ve got to admit defeat on this one. I couldn’t solve it and the solution that my coworker found isn’t one I would have come up with, nor can I find any tweaks to it that make it better or more “mine”. So HackerRank, you win this round, on to the next.

You can find this test at HackerRank - Hash Tables: Ransom Note

No comments:

Post a Comment

HackerRank - SherlockAndAnagrams

Sherlock and Anagrams, oh what a title. Let’s see what HackerRank has in store for us next. This one is rated as medium and the blurb for it...