21 December 2007

PowerShell Associative Arrays and Anagrams

Jon Bentley's Programming Pearls describes the following pipeline for finding anagrams from a list of words: generate a signature for the word, then group together all words with the same signature. The signature is just a sorted list of all letters in a word. For instance, dame, edam, made and mead all have the same signature adem.

Associative Arrays

To implement an anagram-finder in PowerShell, let's use an associative array and we store the signatures in the keys and each word that has the same signature is stored in an array related to that key. Below is a concrete example of what we plan to do:

> $a = @{adem:("dame","edam","made","mead")}
> $a
Name                           Value
----                           -----
adem                           {dame, edam, made, mead}

Notes

  • The key for an associative array does not have to be quoted if it is a string without a whitespace.
  • Oddly, arrays in the value column are printed delimited by braces instead of parentheses.

Generating Word Signatures

A word signature is just a string with the letters in the original word sorted. We split a string into a char[] type, sort it, then make it into a string again:

> $sig = [string]("edam".ToCharArray() | sort-object)
> $sig
a d e m
> $sig.length
7

Note that the signature sig is a string with a space between each character. We can prettify the signature but it doesn't hurt because all the signatures will have the same format.

Find Anagrams in a Word List

Now that we can create a signature, we can find all anagrams in a word list by assigning each word's signature as a key in the associative array's value and adding the word to that key's array:

> get-content <test.txt> | foreach-object { $h = @{} } { $t = $_.clone(); $sig = [string]($t.ToCharArray() | sort-object); if (!$h.containsKey($sig)) { $h[$sig] = @() } $h[$sig] += $t } { $h }

Name                           Value
----                           -----
adem                           {dame, edam, made, mead}

Let's decompose this longish statement to understand what it is doing:

get-content <test.txt> |
Send every line in the input file into a pipeline.
foreach-object
Apply some operation on each object in the pipeline.
{ $h = @{} }
Initialize the associative array h in the begin script block.
{ process block }
Here is where words with the same signature are grouped together in the associative array.
$t = $_.clone();
Copy the input word from the current pipeline object before it is overwritten in the next statement.
$sig = [string]($t.ToCharArray() | sort-object);
Get the signature of the input word.
if (!$h.containsKey($sig)) { $h[$sig] = @() }
Create an array of anagrams if the signature does not already exist.
$h[$sig] += $t
Add the current word to the array of anagrams.
{ $h }
Output the associative array h in the end script block.

Conclusion

This article presented an imperative approach to grouping data with the same property using a loop and an associative array. You could apply the same style to any programming language and get the same result. In a future article, I will explore how to use a more streamlined approach to solve similar problems.