17 December 2007

PowerShell String and Char[] Sort and Conversion

Introduction

I wanted to sort the characters in a string to use as a signature for that string. A string is basically an array of characters, but the System.String class does not have a Sort() method. Hmm, looks like we have to break the process into the following steps:

  1. Convert a string to a character array.
  2. Sort the character array.
  3. Convert the character array back into a string.

First cut

A hitch: System.String converts only to a char[], which has no Sort() instance method of its own, so the obvious choice is an external sorter such as the Sort-Object cmdlet. The first cut looks like this:

> $s = "mad"
> $sig = sort-object -inputObject $s.ToCharArray()
> $sig
m
a
d

Eh? Why isn't $sig sorted? Either Sort-Object or PowerShell doesn't do what I expected when given an array through the -inputObject parameter. Let's test this:

> "m","a","d" | sort-object
a
d
m
> sort-object -inputObject "m","a","d"
m
a
d
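
The difference is consistent with how the -InputObject parameter is documented: a collection passed that way reaches Sort-Object as one object, so there is nothing to compare it against, whereas the pipeline sends each element separately. A minimal sketch of the two calls side by side (the variable names are illustrative only, and the expected output assumes $OFS has not been changed):

```powershell
$chars = "mad".ToCharArray()

# Pipeline: each character is sent to Sort-Object individually, so they are sorted.
$viaPipeline = $chars | Sort-Object

# -InputObject: the whole array arrives as a single object and comes back unchanged.
$viaParameter = Sort-Object -InputObject $chars

"$viaPipeline"     # a d m
"$viaParameter"    # m a d
```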

Second Cut

Because of the strange behaviour found in the previous section, we call Sort-Object in a pipeline. In addition, we reconstruct the output into a string:

> $s = "mad"
> $sig = [string]($s.ToCharArray() | sort-object)
> $sig
a d m
> $sig.length
5

What's wrong now? Why does $sig have a space between each character? This was getting a bit deep into PowerShell for me at the moment, so let's use an appropriate constructor in System.String instead, such as String(char[]).

21-Dec-2007: Solution is to change OFS (Output Field Separator) to an empty string, like this: $OFS = "".

Third Cut

We try the System.String(char[]) constructor and make the sorted array output into a string:

> $s = "mad"
> $sig = new-object String(($s.ToCharArray() | sort-object))
New-Object : Exception calling ".ctor" with "3" argument(s): "Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: startIndex"
At line:1 char:16
+ $t = new-object  <<<< String($s.ToCharArray())

What gives? The error shows that a three-argument constructor, String(char[], startIndex, length), was called instead of String(char[]). It's not clear why the first constructor is not available.
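
A likely explanation, consistent with the "3" argument(s) in the error, is that New-Object treats the array as its argument list, so each of the three characters becomes a separate constructor argument. A common workaround is the unary comma operator, which wraps the array in a one-element array so that it arrives as a single char[] argument. A minimal sketch, with $sorted as an illustrative variable name:

```powershell
$s = "mad"

# Cast so the pipeline output is a real char[], even for one-character strings.
$sorted = [char[]]($s.ToCharArray() | Sort-Object)

# The unary comma makes the argument list a one-element array holding the char[].
$sig = New-Object String (,$sorted)
$sig    # adm
```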

Fourth Cut

Finally, taking into account all that we learnt above, we end up with the following statement which gives us the result we wanted:

> $s = "mad"
> $sig = new-object String(($s.ToCharArray() | sort-object), 0, $s.length)
> $sig
adm
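
For comparison, a char[] is itself a .NET array, so the static [Array]::Sort() method can sort it in place without Sort-Object. This is an alternative sketch, not what the fourth cut does. It compares characters by their numeric value, so it may order mixed-case input differently from Sort-Object.

```powershell
$s = "mad"
$chars = $s.ToCharArray()

# Sorts the array in place; nothing is returned.
[Array]::Sort($chars)

# Same three-argument constructor as in the fourth cut.
$sig = New-Object String ($chars, 0, $chars.Length)
$sig    # adm
```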

Conclusion

The resulting statement to sort characters in a string is mostly noise because the purpose of the statement is obscured by the need to convert an object from one type to another. An earlier article about palindromes used another way to make a string from a character array: the static method [string]::Join(). That is …

> $sig = [string]::Join("", ($s.ToCharArray() | sort-object))

The second method is shorter but still rather obscure because it relies on the side-effect of the empty string argument when calling the Join() method. It's a rather disappointing end to this exercise because I spent most of the time fighting instead of using PowerShell.
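
One way to cut the noise is to hide the conversion inside a small function. The sketch below uses a hypothetical name, Get-SortedSignature, and the Join() form. It also illustrates the signature idea from the introduction: strings that are anagrams of each other share the same signature.

```powershell
# Get-SortedSignature is a hypothetical helper name, not a built-in cmdlet.
function Get-SortedSignature([string]$text) {
    [string]::Join("", ($text.ToCharArray() | Sort-Object))
}

(Get-SortedSignature "mad") -eq (Get-SortedSignature "dam")    # True
(Get-SortedSignature "mad") -eq (Get-SortedSignature "map")    # False
```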

19-Dec-2007. PowerShell 2.0 will have a new Join operator that should make this exercise moot.
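
In PowerShell 2.0 and later that operator is written -join; in its unary form it concatenates the elements with no separator. A sketch, assuming a version of PowerShell that has the operator:

```powershell
$s = "mad"

# Unary -join concatenates the sorted characters directly into a string.
$sig = -join ($s.ToCharArray() | Sort-Object)
$sig    # adm
```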

21-Dec-2007. Fourth method is to change OFS first, leading to:

> $OFS = ""
> $s = "mad"
> $sig = [string]($s.ToCharArray() | sort-object)
> $sig
adm
> $sig.length
3
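
Because $OFS affects every later array-to-string conversion in the session, one option is to set it inside a script block so that the change stays in a child scope. A minimal sketch of that idea:

```powershell
$s = "mad"

# $OFS assigned inside the script block is local to it; the caller's value is untouched.
$sig = & { $OFS = ""; [string]($s.ToCharArray() | Sort-Object) }
$sig           # adm
$sig.Length    # 3
```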