Sunday, February 21, 2021

HackerRank - SherlockAndAnagrams

Sherlock and Anagrams, oh what a title. Let’s see what HackerRank has in store for us next. This one is rated as medium and the blurb for it is “Find the number of unordered anagramic pairs of substrings of a string”. That…is a mouthful.

As usual, our initial boilerplate:

static int sherlockAndAnagrams(string s) {

}

A few examples:

Input ifailuhkqq Output 3
Input kkkk Output 10
Input abba Output 4
Input abcd Output 0

Ok, this is going to be interesting. So it looks like we have to find the total number of anagrams within a string. So for abba we have a,a, ab,ba, b,b, and finally abb,bba. The same holds true for the other examples. Let’s get to work.

Working iteratively, I think the first thing to do is extract all the substrings of a string. My proposed code for that:

var allSubstrings = new List<string>();
for (var i = 0; i < s.Length; i++)
{
    for (var j = i; j < s.Length; j++)
    {
        allSubstrings.Add(s.Substring(i,j-i+1));
    }
}

This gives us (using abba as input):

  • a
  • ab
  • abb
  • abba
  • b
  • bb
  • bba
  • b
  • ba
  • a

Now for each of those, we’ll need to see if it’s an anagram of any others. To do that I’m going to try:

var totalAnagrams = 0;
for (var i = 0; i < allSubstrings.Count - 1; i++)
{
    var initialWord = allSubstrings[i];
    var sortedInitialWord = new string(initialWord.OrderBy(l => l).ToArray());
    for (var j = i+1; j < allSubstrings.Count; j++)
    {
        var testWord = allSubstrings[j];
        var sortedTestWord = new string(testWord.OrderBy(l => l).ToArray());
        if (sortedInitialWord == sortedTestWord)
        {
            totalAnagrams++;
        }
    }
}

Let’s break that down. For each “word” in my list of substrings, sort the word, then for each “word” remaining after that sort the word and compare them. If they match, increment my total. So abb sorted becomes abb and compared to bba sorted into aab match.

Putting that all together we get:

static int sherlockAndAnagrams(string s)
{
    var allSubstrings = new List<string>();
    for (var i = 0; i < s.Length; i++)
    {
        for (var j = i; j < s.Length; j++)
        {
            allSubstrings.Add(s.Substring(i,j-i+1));
        }
    }

    var totalAnagrams = 0;
    for (var i = 0; i < allSubstrings.Count - 1; i++)
    {
        var initialWord = allSubstrings[i];
        var sortedInitialWord = new string(initialWord.OrderBy(l => l).ToArray());
        for (var j = i+1; j < allSubstrings.Count; j++)
        {
            var testWord = allSubstrings[j];
            var sortedTestWord = new string(testWord.OrderBy(l => l).ToArray());
            if (sortedInitialWord == sortedTestWord)
            {
                totalAnagrams++;
            }
        }
    }

    return totalAnagrams;
}

Testing that against the initial tests all pass and expanding it to the full suite gives us 3/7 passing. The good(?) news is that they fail because time limit exceeded, not wrong answers (yet).

So let’s tweak that to be a bit more performant, it definitely didn’t look speedy when I put it together. Since this is in the dictionary and hashset section of HackerRank, lets play with a HashSet:

static int sherlockAndAnagrams(string s)
{
    var totalAnagrams = 0;
    var substrings = new HashSet<string>();
    for (var i = 0; i < s.Length; i++)
    {
        for (var j = i; j < s.Length; j++)
        {
            var substring = s.Substring(i, j - i + 1);
            var sortedSubstring = new string(substring.OrderBy(l => l).ToArray());
            if (!substrings.Add(sortedSubstring))
            {
                totalAnagrams++;
            }
        }
    }

    return totalAnagrams;
}

That seems pretty and effective. Let’s see if they agree.

They don’t. Of the 3 initial test cases, 2/3 fail with wrong answer. It didn’t find the 10 combinations of kkkk, only 6. So back to it to find the right code.

Lots of testing later, I felt confident that I was on to the right start with extracting out the substrings and then doing something with them. I figured also that I didn’t need to actually find the anagrams, just find all possible combinations of my substrings by the count of how many times they showed up. For example using kkkk I know it breaks down into

4 k’s
3 kk’s
2 kkk’s
1 kkkk

Doing some googling, I found the formula for calculating the total combinations of something is
n!r!(nr)!) \frac{n!}{r! * (n - r)!)}

That’s Greek to me, but I was able to figure out that n = number of times a substring appears and r = total length of combination (in this case 2). Using this new found info I was able to construct a process for counting the total combinations given a specific substring and how many times it appeared in the main string.

I did have to make my own method to get the Factorial of a number (note the double, that’s really important otherwise it tends to overflow int and suddenly you’re dealing with negative numbers). Putting it all together I have:

static int sherlockAndAnagrams(string s)
{
    var substrings = new Dictionary<string, int>();
    for (var i = 0; i < s.Length; i++)
    {
        for (var j = i; j < s.Length; j++)
        {
            var substring = s.Substring(i, j - i + 1);
            var orderedSubstring = new string(substring.OrderBy(letter => letter).ToArray());
            if (substrings.ContainsKey(orderedSubstring))
            {
                substrings[orderedSubstring]++;
            }
            else
            {
                substrings.Add(orderedSubstring, 1);
            }
        }
    }

    var totalAnagrams = 0;

    foreach (var substring in substrings)
    {
        var n = substring.Value;
        var r = 2;
        var top = Factorial(n);
        var bottomLeft = Factorial(r);
        var bottomRight = Factorial(n - r);
        var bottom = bottomLeft * bottomRight;
        var combinations = top / bottom;
        totalAnagrams += Convert.ToInt32(combinations);
    }

    return totalAnagrams;
}

public static double Factorial(int number)
{
    double result = 1;
    while (number > 1)
    {
        result *= number;
        number--;
    }

    return result;
}

My foreach loop is a bit verbose on breaking things down, but I found it much easier to wrap my head around than var combinations = Factorial(n)/(Factorial(r) * Factorial(n - r)) (honestly, not even sure I got that one liner version right for this and I’m not gonna go back and check it. Code is for people to read, I find the long version much easier to deduce what is going on).

Testing this against all of their cases passes and has no timeouts. So success! You can find this one here.

Thursday, February 18, 2021

HackerRank - Two Strings

Our next challenge from HackerRank is to see if two strings contain a common substring, it goes on to say that substring could even just be one letter. Again, this one is rated as easy and I hope it lives up to that.

The initial boilerplate to plug things into:

static string twoStrings(string s1, string s2) {

}

And a test case:

hello
world

Result YES

And one more for good measure:

hi
world

Result NO

My first attempt at it involves breaking one of the strings into a char array so that I can use IndexOfAny which only takes in a char array. Converting string to char array is very easy since there is a ToCharArray method on string. Lets see how this looks and run it against their tests:

static string twoStrings(string s1, string s2)
{
    var charsOfFirstWord = s1.ToCharArray();
    var index = s2.IndexOfAny(charsOfFirstWord);
    var success = index > -1;
    return success ? "YES" : "NO";
}

Initial tests pass and then running it against the full suite, they also pass! Huzzah! This one really was easy. Whew, nice change from that last one.

We could tidy this up a bit more to reduce the number of lines. In fact we could one line this (though I question readability) into:

static string twoStrings(string s1, string s2)
{
    return s2.IndexOfAny(s1.ToCharArray()) > -1 ? "YES" : "NO";
}

You can find this test at HackerRank - Two Strings

Wednesday, February 17, 2021

HackerRank - Ransom Note

First on our list of challenges is what HackerRank says is easy. I disagree, but here we are. My key interpretation of this challenge is checking if string array B is a subset of string array A. On the surface, pretty easy, but I disagree. As I was running through the code I ran into a snag which I’ll present as we go along.

First things first, the boilerplate that we have to put this into:

static void checkMagazine(string[] magazine, string[] note){

}

And a test case:

magazine give me one grand today night
note give one grand today

So my first attempt:

static void checkMagazine(string[] magazine, string[] note)
{
	var isSubset = !note.Except(magazine).Any();
	Console.WriteLine(isSubset ? "Yes" : "No");
}

Jon Skeet proposed that simplistic answer back in 2009 (and I’m sure others before him). So we plug that in and give it a shot. Of the 3 initial test cases it gives us, 2 of the 3 pass. Diving into the failure we see

magazine two times three is not four
note two times two is four

Ok, so now our issue is not just is B a subset of A, but is ALL of B included in A. I’m sure there is a more eloquent way of stating that, but I’m going to call it a subset with unique values.

So lets try attempt two:

static void checkMagazine(string[] magazine, string[] note)
{
	var success = true;
	foreach (var word in note)
	{
		var indexInMagazine = Array.IndexOf(magazine, word);
		if (indexInMagazine > -1)
		{
			magazine[indexInMagazine] = string.Empty;
		}
		else
		{
			success = false;
			break;
		}
	}
	Console.WriteLine(success ? "Yes" : "No");
}

Plugging that in we get success on the 3 test cases they initially give us. Huzzah! So now we submit. Doing that runs the code against 21 test cases, of which 8 fail. Huh? Why fail? Checking into that we see Time limit exceeded. A little further digging into HackerRank and we find that C# code only has 3 seconds and 512 MB for a test case.

Picking one of the failures, I see that it has 30,000 entries for magazine and 20,000 for note. I guess our process is taking too long, so now we’ve got to optimize this even further. Now I’m officially stumped. Taking their test input I feed it into VisualStudio and am frustrated to see that my code runs in 1.4(ish) seconds. Guess we can’t rely on consistency between my dev environment and theirs.

One of my coworkers, and a few StackOverflow posts, suggested a HashSet, but that eliminates duplicates in either magazine or note, which is not something we want. So back to the drawing board.

Now I feel I’m on to something, it’s similar to my removing entries from the array, but should be much quicker:

static void checkMagazine(string[] magazine, string[] note)
{
	var matchedFromMagazine = note.Intersect(magazine);
	var success = matchedFromMagazine.Count() == note.Length;

	Console.WriteLine(success ? "Yes" : "No");
}

Nice and compact, pulls in some LINQ which is almost always nice. Let’s see how it goes. Running it against the full test suite, I no longer get failures from timeout, but now I get 5 failures from wrong answer. :(

Extracting one of the failure cases, the great news is my test runs in about 0.01 seconds, the bad news is it says “No” instead of “Yes”. The test case is a bit too long to put here (30,000 in magazine and 20,000 in note), but lets see if we can figure out where things are going wrong.

Jon Skeet saves us again (well, not really) with .ToLookup. Something, I’ll be honest, I’d never heard of before. Effectively breaks the array into a Dictionary<string, IEnumerable<string>> where the key is the word and the value is a collection of the word. I don’t really need the collection, just the count, but that’s still pretty neat.

So let’s try that out:

static void checkMagazine(string[] magazine, string[] note)
{
    var lookupOfNote = note.ToLookup(word => word);
    var success = true;
    foreach (var grouping in lookupOfNote)
    {
        if (magazine.Count(word => word.Equals(grouping.Key)) < grouping.Count())
        {
            success = false;
            break;
        }
    }
    Console.WriteLine(success ? "Yes" : "No");
}

Plugging that in, we get better results, but 2 still fail with Time limit exceeded. Gah!

Not that I thought this would work, but just to be thorough I gave this a whirl:

static void checkMagazine(string[] magazine, string[] note)
{
    var noteDictionary = note.GroupBy(word => word)
        .ToDictionary(k => k.Key, v => v.Count());
    var success = true;
    foreach (var kvp in noteDictionary)
    {
        if (magazine.Count(word => word.Equals(kvp.Key)) < kvp.Value)
        {
            success = false;
            break;
        }
    }
    Console.WriteLine(success ? "Yes" : "No");
}

It worked about as you’d expect (assuming you expected it to fail like the ToLookup did).

Reaching out to a coworker, he tried his hand at it and (to my chagrin) appears to have solved it. Let’s take a look at his solution and see how/why it worked compared to some of my attempts:

static void checkMagazine(string[] magazine, string[] note)
{
    var success = false;
    // build dictionary of note's words and how many times we need to find each word
    // This way we go through note only once, storing in a structure
    // for more performant lookup.
    var noteDict = new Dictionary<string, int>();
    foreach (var n in note)
    {
        if (!noteDict.ContainsKey(n))
        {
            noteDict.Add(n, 0);
        }
        else
        {
            noteDict[n]++;
        }
    }
    // Check if note can be built from magazine 
    // This way we only go through magazine at most once
    foreach (var word in magazine)
    {
        if (noteDict.ContainsKey(word))
        {
            if (noteDict[word] > 0)
            {
                noteDict[word]--;
            }
            else // we found the word enough times. Remove it from dict.
            {
                noteDict.Remove(word);
            }
        }
        if (noteDict.Count == 0) // if we've emptied our "words to find" dict, we're done
        {
            success = true;
            break;
        }
    }
    Console.WriteLine(success ? "Yes" : "No");
}

First blush, it seems very similar to my last attempt that builds a dictionary of words and their count from the note. Looks like the big difference (and the apparently reason for the speed improvement) is he iterates over the magazine and compares that to the new note dictionary (updating counts as he finds words).

Because I’m a butt head, I tried to see if I could improve on this. First I played with the creation of the noteDict to see if I could optimize that. Using note.Distinct().ToDictionary(k => k, v => note.Count(w => w.Equals(v))) but that still caused timeout errors.

So I’ve got to admit defeat on this one. I couldn’t solve it and the solution that my coworker found isn’t one I would have come up with, nor can I find any tweaks to it that make it better or more “mine”. So HackerRank, you win this round, on to the next.

You can find this test at HackerRank - Hash Tables: Ransom Note

HackerRank - SherlockAndAnagrams

Sherlock and Anagrams, oh what a title. Let’s see what HackerRank has in store for us next. This one is rated as medium and the blurb for it...