Group Anagrams Leetcode

0 min read

The prompt

Given an array of strings strs, group the anagrams together. You can return the answer in any order. An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Examples

Example	Input	Target	Output
1	["eat","tea","tan","ate","nat","bat"]	9	[["bat"],["nat","tan"],["ate","eat","tea"]]
2	[""]	6	[""]
3	["a"]	6	["a"]

Constraints

1 <= strs.length <= 104
0 <= strs[i].length <= 100
strs[i] consists of lowercase English letters. (26 letters)

Solution #1: (Sorted)

1def sorted_group_anagram(array):
2    hashmap = {}
3    for word in array:
4        sorted_word = tuple(sorted(word))
5
6        hashmap[sorted_word] = hashmap.get(sorted_word, []) + [word] 
7    
8    return hashmap.values()

How does it work?

Let's break it down line by line.

We will be skipping line one as that is simply the definition of the function.

In Line 2 we created our empty dict/hashmap.

Next line, Line 3, we simply start our for loop. Now, in Line 4, something interesting is going on. We created a variable to sort the word using the sorted python function. Then we need to convert it to a tuple.

Why did we sort it? Well, because we are looking for anagrams. Sorting two words and comparing them is one, if not the easiest(not most efficient) way to check if two words are anagrams of one another. Alright, I get that, but why the do we convert it to tuple?

Well my friend, excellent question! The reason for this is because we will be using this word as the hashmap key, and in Python dicts can only use inmutable*

Doesn't allow any change in the object once it has been created

objects as the keys. We also need to remember that when we use sorted the string becomes a list, and Python list are mutable*

An object whose internal state can be changed is mutable

. Because a tuple is inmutable it works perfectly for our purpose.

Alright, we are in Line 6. This is where the magic happens. First we use hashmap[sorted_word], to add a new key-value pair to our dict/hashmap if it's not present, or update it if it's present. Now the second part, hashmap.get(sorted_word, []) + [word]. We use the get method to check if the sorted_word is in our dict/hashmap, if it is, the we concatenate the [word]. Using this methods avoids KeyError*

Raised when a mapping (dictionary) key is not found in the set of existing keys.

Let's work on a sample function: Lets say we have an array such as ["eat","ate","bat"]

The first iteration will sort the word "eat". Thus, "eat" will become ("a","e","t"). Now we add the new key-pair to our hashmap like so: hashmap[("a","e","t")] = ["eat"]

This is our hashmap after the first iteration:

1{
2    ("a","e","t"):["eat"]
3}

The second iteration will sort the word "ate". Thus "ate" will become ("a","e","t"). Now we add the new key-pair to our hashmap like so: hashmap[("a","e","t")] = ["ate"]

This is our hashmap after the second iteration:

1{
2    ("a","e","t"):["eat", "ate"]
3}

The third iteration will sort the word "bat". Thus "bat" will become ("a","b","t"). Now we add the new key-pair to our hashmap like so: hashmap[("a","b","t")] = ["bat"]

This is our hashmap after the third iteration:

1{
2    ("a","e","t"):["eat", "ate"],
3    ("a","b","t"):["bat"],
4}

Time Complexity	Space Complexity
O(m * n log n)	O(n)

Space Complexity could probably be linear (O(1)) if we pass tuple(sorted()) directly, instead of creating a variable. But it's up to you if you want to sacrifice readability or not.

Hopefully you get the idea. And with that we got our not so efficient sorted solution.

Solution #2: (Python ord() Solution)

Well!, time for the more efficient solution.

1def hash_group_anagram(array):
2  hashmap = defaultdict(list)
3    for word in array:
4        temp = [0] * 26
5        for letter in word:
6            temp[ord(letter) - ord("a")] = temp[ord(letter) - ord("a")] + 1
7        
8        hashmap[tuple(temp)].append(word)

This solution involves using the ord() function, if you are not familar with the ord() function just hover me*

ord() -> Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().

So how do we leverage the ord() function.

Becuase we know that for this problem we can ony have 26 letters, we can use the ord function to assist in the count of our anagrams.

But first, let's go line by line, as usual we skip the very first line. Now Line 2 is where we assign defaultdict(list) as our hashmap. Now, maybe you have not used defaultdict before. Basically defaultdict provides the same functionality as a dict with the extra benefits of not giving a KeyError if the key is not found, instead it returns None, or any argument you may have passed. In this case the default we are passing is the list object because each of our keys will contain the grouped lists of anagrams.

Importing defaultdict.

To import default dict, you need to use the collections module. from collections import defaultdict

Line 3, a simple start of the for loop. Line 4, this is where we create an array that contains 26 zeros. We will use this array later as the key in our hashmap.

Line 5, we started our nested loop. Line 6, is where ord comes into play. Because we are using ord we can map the count of each letter occurrence to the index in the array we previously created. For example:

Let's say our first word is dog, after we are done looping through the word dog our array should look like this.

Because dog has 3 letter we should see the count at index 3 for d increase by one. The same applies for the letter o and g. Thus we should see 3 ones in the array above. This works because if we assume that a return 90 after ord() if we subtract ord("a") from ord("d") would be the same as ` 93 - 90 which would return 3. So we add one to index 3 of our array.

After the loops complete going over the word we simple append the word to the hashmap by using the array as the key and the word as the value. Do not forget to covert the array to a tuple because the keys must be inmmutable.

Now to verify our complexity. We can see that we use more variables so our space complexity is O(n), however, our time complexity would be O(m * n). Why?

Well because m represents the length of the array passed as an argument, and n is the average lenght of each word.

Time Complexity	Space Complexity
O(m * n)	O(n)

See you on the next problem!

For more problems like this one be sure to checkout NeetCode.

Knowledge is power! And everyone deserves access to it! Share away!