Sunday, February 21, 2021

HackerRank - SherlockAndAnagrams

Sherlock and Anagrams, oh what a title. Let’s see what HackerRank has in store for us next. This one is rated as medium and the blurb for it is “Find the number of unordered anagramic pairs of substrings of a string”. That…is a mouthful.

As usual, our initial boilerplate:

static int sherlockAndAnagrams(string s) {

}

A few examples:

Input ifailuhkqq Output 3
Input kkkk Output 10
Input abba Output 4
Input abcd Output 0

Ok, this is going to be interesting. So it looks like we have to find the total number of anagrams within a string. So for abba we have a,a, ab,ba, b,b, and finally abb,bba. The same holds true for the other examples. Let’s get to work.

Working iteratively, I think the first thing to do is extract all the substrings of a string. My proposed code for that:

var allSubstrings = new List<string>();
for (var i = 0; i < s.Length; i++)
{
    for (var j = i; j < s.Length; j++)
    {
        allSubstrings.Add(s.Substring(i,j-i+1));
    }
}

This gives us (using abba as input):

  • a
  • ab
  • abb
  • abba
  • b
  • bb
  • bba
  • b
  • ba
  • a

Now for each of those, we’ll need to see if it’s an anagram of any others. To do that I’m going to try:

var totalAnagrams = 0;
for (var i = 0; i < allSubstrings.Count - 1; i++)
{
    var initialWord = allSubstrings[i];
    var sortedInitialWord = new string(initialWord.OrderBy(l => l).ToArray());
    for (var j = i+1; j < allSubstrings.Count; j++)
    {
        var testWord = allSubstrings[j];
        var sortedTestWord = new string(testWord.OrderBy(l => l).ToArray());
        if (sortedInitialWord == sortedTestWord)
        {
            totalAnagrams++;
        }
    }
}

Let’s break that down. For each “word” in my list of substrings, sort the word, then for each “word” remaining after that sort the word and compare them. If they match, increment my total. So abb sorted becomes abb and compared to bba sorted into aab match.

Putting that all together we get:

static int sherlockAndAnagrams(string s)
{
    var allSubstrings = new List<string>();
    for (var i = 0; i < s.Length; i++)
    {
        for (var j = i; j < s.Length; j++)
        {
            allSubstrings.Add(s.Substring(i,j-i+1));
        }
    }

    var totalAnagrams = 0;
    for (var i = 0; i < allSubstrings.Count - 1; i++)
    {
        var initialWord = allSubstrings[i];
        var sortedInitialWord = new string(initialWord.OrderBy(l => l).ToArray());
        for (var j = i+1; j < allSubstrings.Count; j++)
        {
            var testWord = allSubstrings[j];
            var sortedTestWord = new string(testWord.OrderBy(l => l).ToArray());
            if (sortedInitialWord == sortedTestWord)
            {
                totalAnagrams++;
            }
        }
    }

    return totalAnagrams;
}

Testing that against the initial tests all pass and expanding it to the full suite gives us 3/7 passing. The good(?) news is that they fail because time limit exceeded, not wrong answers (yet).

So let’s tweak that to be a bit more performant, it definitely didn’t look speedy when I put it together. Since this is in the dictionary and hashset section of HackerRank, lets play with a HashSet:

static int sherlockAndAnagrams(string s)
{
    var totalAnagrams = 0;
    var substrings = new HashSet<string>();
    for (var i = 0; i < s.Length; i++)
    {
        for (var j = i; j < s.Length; j++)
        {
            var substring = s.Substring(i, j - i + 1);
            var sortedSubstring = new string(substring.OrderBy(l => l).ToArray());
            if (!substrings.Add(sortedSubstring))
            {
                totalAnagrams++;
            }
        }
    }

    return totalAnagrams;
}

That seems pretty and effective. Let’s see if they agree.

They don’t. Of the 3 initial test cases, 2/3 fail with wrong answer. It didn’t find the 10 combinations of kkkk, only 6. So back to it to find the right code.

Lots of testing later, I felt confident that I was on to the right start with extracting out the substrings and then doing something with them. I figured also that I didn’t need to actually find the anagrams, just find all possible combinations of my substrings by the count of how many times they showed up. For example using kkkk I know it breaks down into

4 k’s
3 kk’s
2 kkk’s
1 kkkk

Doing some googling, I found the formula for calculating the total combinations of something is
n!r!(nr)!) \frac{n!}{r! * (n - r)!)}

That’s Greek to me, but I was able to figure out that n = number of times a substring appears and r = total length of combination (in this case 2). Using this new found info I was able to construct a process for counting the total combinations given a specific substring and how many times it appeared in the main string.

I did have to make my own method to get the Factorial of a number (note the double, that’s really important otherwise it tends to overflow int and suddenly you’re dealing with negative numbers). Putting it all together I have:

static int sherlockAndAnagrams(string s)
{
    var substrings = new Dictionary<string, int>();
    for (var i = 0; i < s.Length; i++)
    {
        for (var j = i; j < s.Length; j++)
        {
            var substring = s.Substring(i, j - i + 1);
            var orderedSubstring = new string(substring.OrderBy(letter => letter).ToArray());
            if (substrings.ContainsKey(orderedSubstring))
            {
                substrings[orderedSubstring]++;
            }
            else
            {
                substrings.Add(orderedSubstring, 1);
            }
        }
    }

    var totalAnagrams = 0;

    foreach (var substring in substrings)
    {
        var n = substring.Value;
        var r = 2;
        var top = Factorial(n);
        var bottomLeft = Factorial(r);
        var bottomRight = Factorial(n - r);
        var bottom = bottomLeft * bottomRight;
        var combinations = top / bottom;
        totalAnagrams += Convert.ToInt32(combinations);
    }

    return totalAnagrams;
}

public static double Factorial(int number)
{
    double result = 1;
    while (number > 1)
    {
        result *= number;
        number--;
    }

    return result;
}

My foreach loop is a bit verbose on breaking things down, but I found it much easier to wrap my head around than var combinations = Factorial(n)/(Factorial(r) * Factorial(n - r)) (honestly, not even sure I got that one liner version right for this and I’m not gonna go back and check it. Code is for people to read, I find the long version much easier to deduce what is going on).

Testing this against all of their cases passes and has no timeouts. So success! You can find this one here.

No comments:

Post a Comment

HackerRank - SherlockAndAnagrams

Sherlock and Anagrams, oh what a title. Let’s see what HackerRank has in store for us next. This one is rated as medium and the blurb for it...