09 December 2007

Find Duplicate Objects in PowerShell

You can find unique lines in a file using Get-Unique but what if you want to know which lines are duplicates? Below are two variations of a one-liner that finds duplicate lines. In both cases, an object is added into an array $a if it does not already exist in that array ($a -contains …). If the object exists in the array (i.e. not unique), it is printed out. A side-effect is that the unique lines are available in $a after the statement is finished.

> $a = @(); foreach ($l in Get-Content <file>) { if ($a -contains $l) { $l } else { $a += $l } }
> Get-Content <file> | Foreach-Object { $a = @() } { if ($a -contains $_) { $_ } else { $a += $_ } }

The first version is implemented using the foreach statement. The $a array is initialized in the first statement, then the file is read into $l one line at a time in the second statement. It's actually a short two-liner.

The second version is implemented using the Foreach-Object cmdlet in a command pipeline. The Foreach-Object can have three command blocks: beginning, middle and ending. The $a array is initialized in the beginning command block ($a = @()) and the test and output executed in the middle command block (if () …). The ending command block is not used in this example. This version is more tidy because only one variable, $a has to be created; the other version requires another variable $l.