30 April 2008

Using Clipboard in the Command Line

The GnuWin32 Cygutils package has two tools for working with the Windows clipboard: getclip and putclip. getclip copies text from the clipboard to standard output, and putclip copies text from standard input to the clipboard. They are useful when you want to process text from one Windows application before pasting it into another, using this recipe: getclip | <filters> | putclip.

For example, to paste the names of all DLL files in a folder into a document:

  1. Navigate to the required folder using the 2xExplorer browser.
  2. Type Alt+a to select all files.
  3. Type Alt+c to copy all file names. 2xExplorer copies the absolute path for each file.
  4. Start a cmd.exe console.
  5. In the cmd.exe console, enter: getclip | cut -d\ -fn | grep dll$ | putclip. cut is a GnuWin32 tool that selects a column of data given a column delimiter (-d\ sets the delimiter to a backslash) and a field number (-fn selects field n, the position of the file name in the path). grep dll$ keeps only the lines that end in "dll" (add -i to also match names ending in "DLL").
  6. Start your editor.
  7. Paste the clipboard text into the destination document.

Of course, you can do the same using Excel:

  1. Navigate to the required folder using the 2xExplorer browser.
  2. Type Alt+a to select all files.
  3. Type Alt+c to copy all file names. 2xExplorer copies the absolute path for each file.
  4. Start Excel.
  5. Paste the data into a worksheet column.
  6. Select the pasted column by typing Control+Space (Shift+Space selects the row instead).
  7. Open the Convert Text to Columns Wizard by typing Alt+d+e.
  8. Select the Delimited data type by typing Alt+d.
  9. Type Alt+n to go to page 2.
  10. Select the Other delimiter by typing Alt+o, then enter "\" (the path separator).
  11. Type Alt+f to finish the wizard.
  12. Turn on AutoFilter by typing Alt+d+f+f.
  13. Move to the filter column using the mouse (no keyboard shortcut?), then select (Custom …) from the drop-down list.
  14. Select the "ends with" criterion, enter .dll, then press Enter.
  15. Move the cursor to the required column and select it using Control+Space.
  16. Copy the column by typing Control+C.
  17. Start your editor.
  18. Paste the clipboard text into the destination document.

The Excel solution has many more steps than the getclip-putclip solution, but Excel leads you to the solution step by step. If you're familiar with GNU tools, the getclip-putclip recipe is faster to use and much more extensible.

2008-05-07. I should have remembered that the basename command outputs the name of a file without its leading path. See the later article More Uses of Getclip-PutClip for how to use basename in a pipeline.
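For comparison, the filter stage of the DLL pipeline can also be sketched in Python 3. This is not part of GnuWin32: dll_names is a hypothetical helper, and ntpath.basename (the standard library's Windows path module) plays the role of the basename command, so the field number n no longer has to be known.

```python
import ntpath


def dll_names(lines):
    """Hypothetical helper: yield the file name of each Windows path
    whose name ends in 'dll'.

    ntpath.basename() drops the leading path, like the basename command,
    so unlike cut -fn there is no need to know the field number n.
    """
    for line in lines:
        name = ntpath.basename(line.rstrip("\r\n"))  # drop the line ending first
        if name.endswith("dll"):                     # same test as grep dll$
            yield name
```

Saved as a script, it could sit in the same place in the pipeline by writing `sys.stdout.writelines(n + "\n" for n in dll_names(sys.stdin))`, e.g. getclip | python dll_names.py | putclip (the script name is only an example).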

25 April 2008

Strange GnuWin32 Invalid Argument Error Messages

When chaining GnuWin32 commands in Windows cmd.exe, you may encounter strange error messages like this:

> ls | grep
…
ls: write error: Invalid argument

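The first command reports a write error, but the real problem is in the second command after the pipe symbol: grep was given no search pattern, so it exits with a usage error, and ls then fails when it tries to write to the pipe that nobody is reading.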

You may also encounter a similar write error if the wrong command is found first in your PATH. For instance, Windows and GnuWin32 both provide find and sort commands that accept different command-line options, so which version runs depends on the order of the directories listed in your PATH variable. If you pass options that the version found does not understand, it exits immediately and the command earlier in the chain reports some sort of I/O error.
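The effect can be reproduced with a small Python 3 sketch (an illustration, not GnuWin32 code): a reader process exits without reading anything, much like grep without a pattern, and the error surfaces in the writer. On POSIX systems the error is EPIPE (broken pipe); on Windows it is often reported as EINVAL, the same "Invalid argument" that ls prints. The function name is hypothetical.

```python
import errno
import subprocess
import sys


def write_to_exited_reader(size=65536):
    """Write `size` bytes into a pipe whose reader has already exited.

    Returns the symbolic errno name of the OSError raised, or None.
    """
    # The reader is a Python process that exits without reading stdin,
    # standing in for a command such as grep run without a pattern.
    proc = subprocess.Popen([sys.executable, "-c", "pass"],
                            stdin=subprocess.PIPE)
    proc.wait()                       # make sure the reader is gone
    try:
        proc.stdin.write(b"x" * size)
        proc.stdin.flush()
    except OSError as e:              # BrokenPipeError is a subclass of OSError
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        try:
            proc.stdin.close()        # closing may re-raise the failed flush
        except OSError:
            pass
    return None
```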

24 April 2008

Python Command Line (-c option)

Perl has a -n option that wraps your code in an implicit loop over every input line (while (<>) { }). This mode is handy in a command shell when Perl receives the output of another command and you don't want to write a script. Can we do the same for Python?

Python has a -c option that runs the Python code given in the string following it; several simple statements can share the line if you separate them with semicolons. I found that you can write some useful one-liners by combining list functions and generator expressions in this template:

python -c "import sys, <package>; print '\n'.join(<list function>(lambda x: <expression>, (s.strip() for s in sys.stdin)))"

To use this template, replace <package> with a module name (e.g. os), <list function> with a list function (e.g. filter()) and <expression> with, well, an expression. The rest of the template strips the trailing "\n" from each input line, joins the results with newlines and prints them.

For simple string processing, the list function and expression are not required, resulting in a simplified version of this template:

python -c "import sys, <package>; print '\n'.join(<fn>(s.strip()) for s in sys.stdin)"
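The templates above are Python 2 (print is a statement). As a hedged Python 3 sketch, the same logic can be wrapped in a small testable function; run_template is a hypothetical name, with fn standing in for <fn>/<expression> and keep for the filter() predicate.

```python
def run_template(stream, fn=lambda s: s, keep=None):
    """Hypothetical Python 3 helper mirroring the two -c templates.

    stream: any iterable of lines, e.g. sys.stdin or an open file
    fn:     applied to every kept line (the <fn>/<expression> part)
    keep:   optional predicate, playing the role of filter()
    """
    lines = (s.strip() for s in stream)   # generator expression, not a list
    if keep is not None:
        lines = filter(keep, lines)       # filter() is lazy in Python 3
    return "\n".join(fn(s) for s in lines)
```

In Python 3, print is a function, so the simplified template itself becomes python -c "import sys; print('\n'.join(s.strip()[:20] for s in sys.stdin))".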

While researching this topic, I found an ASPN Python Recipe called Pyline that helps write such commands. Here are the examples from that recipe rewritten using my template:

Print the first 20 characters of each line:

tail test.txt | python -c "import sys; print '\n'.join(s.strip()[:20] for s in sys.stdin)"

Print the 7th word in each line, assuming the separator is ' ':

tail test.txt | python -c "import sys; print '\n'.join(''.join(s.strip().split(' ')[6:7]) for s in sys.stdin)"

Note that you can also extract columns of text from a file using the cut command. The slice [6:7] is used instead of the index [6] so that a line with fewer than seven words does not raise an IndexError exception; the slice is a list of zero or one words, which ''.join() turns back into a string.
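The slice-versus-index point can be checked in isolation. This Python 3 sketch uses a hypothetical function name and assumes, as the example does, that words are separated by single spaces.

```python
def seventh_word(line):
    """Return the 7th space-separated word of `line`, or '' if there is none."""
    words = line.strip().split(" ")
    # words[6] would raise IndexError on a short line; the slice [6:7] is a
    # list of zero or one words, and "".join() turns it into a string.
    return "".join(words[6:7])
```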

List all files that are greater than 1024 bytes in size:

ls | python -c "import os, sys; print '\n'.join(filter(lambda x: os.path.isfile(x) and os.stat(x).st_size > 1024, (s.strip() for s in sys.stdin)))"

Generate MD5 digest values for a list of files, like md5sum:

ls *.txt | python -c "import md5, sys; print ''.join('%s  %s' % (md5.new(file(s.strip(), 'rb').read()).hexdigest(), s) for s in sys.stdin)"
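The md5 module was deprecated in Python 2.5 in favour of hashlib and no longer exists in Python 3. A hedged Python 3 sketch of the same idea, with hypothetical function names, reads each file in binary chunks instead of loading it whole:

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)          # hashing piece by piece gives the same digest
    return digest.hexdigest()


def md5sum_line(path, chunk_size=65536):
    """Return an md5sum-style line: '<hex digest>  <path>'."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read() until it returns b""
        chunks = iter(lambda: f.read(chunk_size), b"")
        return "%s  %s" % (md5_of_chunks(chunks), path)
```

In a pipeline you would call md5sum_line(name.strip()) for each name read from sys.stdin.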

26-Apr-2008: Replaced the list comprehension (for-in with square brackets) in the template with a generator expression (for-in with parentheses) to avoid building very large lists in memory.

Added the MD5 digest example, and realised that you only need a list function (e.g. filter()) if you want to change which members appear in the resulting list. Otherwise, the simpler template suffices.

11 April 2008

Firefox Greasemonkey Kills Google Groups Spam

If you read Usenet newsgroups, no doubt you'd be familiar with spam messages spruiking credit, fake jewellery, external organ enlargements and free graduate degrees. On a PC, you can use killfiles in newsreading software to ignore spam messages. If you're reading newsgroups using the Google Groups web-based reader with Firefox, you can ignore annoying spam messages using a Greasemonkey script called Google Groups Killfile (GGK).

You can add entries to your killfile list using GGK's context menu, but the list becomes hard to view and manage once it has a lot of entries. It is easier to edit GGK's kill list variable directly:

  • Enter "about:config" in Firefox's location bar.
  • Enter "kill" in the Filter field.
  • Click on greasemonkey.scriptvals.www.penney.org/Google Groups Killfile.GoogleKillFile and edit the configuration string.

2008-04-14: If you use regular expressions (REs), you can reduce the number of entries in the killfile list by using wildcards (e.g. ".*") and the alternation operator (the vertical bar, "|"). You can further reduce the number of patterns by making GGK compare case-insensitively: search the GGK script for the REs' compile() call and add "i" as its second argument.
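GGK itself is a JavaScript user script, so the following is only an illustration of the same idea using Python's re module; the killfile entries are made up. Several case-sensitive entries collapse into one alternation, and re.IGNORECASE plays the role of the "i" flag.

```python
import re

# Made-up killfile entries, written as separate case-sensitive patterns
separate_entries = ["rolex", "Rolex", "ROLEX", "credit card", "Credit Card"]

# One pattern covers them all: "|" separates the alternatives, \s+ matches
# any run of whitespace between the words, and re.IGNORECASE ignores case
combined = re.compile(r"rolex|credit\s+card", re.IGNORECASE)


def is_killed(subject):
    """Return True if the subject line matches the combined pattern."""
    return combined.search(subject) is not None
```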

08 April 2008

Functional Python Palindromes

To find all palindromes from a list of words in a file, one word per line, you could write a procedural Python program like this:

for row in file('test.txt'):
  s = row.strip()
  if s == s[::-1]:
    print s

Here's a functional Python version, with notes below:

from itertools import imap
print filter(lambda s: s == s[::-1],             # 5 filter, 3 lambda, 4 [::-1]
             imap(str.strip, file('test.txt')))  # 2 imap, 1 file iterator

  1. Create a file iterator.
  2. When we read a line from a file into a string, the string has a trailing newline character (e.g. 'add\n'). We want to remove that trailing newline character, so we use the itertools.imap() function to create a new iterator that applies str.strip() to each line as it is read. The result is an iterator that provides strings without the newline character.
  3. Define an anonymous function using the lambda keyword that returns True if the input string is a palindrome.
  4. Python idiom for returning a reversed sequence (a string is a sequence of characters).
  5. Use the filter function to return a list of palindromes.

Using this input file …

add
dad
dam
mad
made
madam
set

… the result of running the functional script is:

['dad', 'madam']
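In Python 3, itertools.imap() no longer exists because the built-in map() is already lazy, and filter() returns an iterator rather than a list. A sketch of the functional version under those rules (the function name is hypothetical):

```python
def palindromes(lines):
    """Return the palindromes among `lines`, one word per line."""
    # map() strips each line lazily; list() turns filter()'s iterator into a list
    return list(filter(lambda s: s == s[::-1], map(str.strip, lines)))
```

Called as palindromes(open('test.txt')) on the input file above, it should give the same two words. Note that a blank line would also be reported, because the empty string equals its own reverse.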

07 April 2008

Reading CSV Files in Python

Python has a csv module for reading and writing CSV files (typically exported from Excel or from database tables). The basic use of this module is documented in the on-line help. My CSV files usually have a header row, so the idiomatic way to skip this line is to open the CSV file and call its next() method immediately:

from csv import reader
f = open("blah.csv", "rb")
f.next()
for row in reader(f):
  print row
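A Python 3 variant, sketched with a hypothetical function name, calls the built-in next() on the reader instead of the file. Skipping one CSV record rather than one physical line keeps working even if a quoted header field contains a line break.

```python
import csv


def data_rows(f):
    """Yield the rows of a CSV source after skipping its header row.

    f can be an open file or any iterable of lines.
    """
    rows = csv.reader(f)
    next(rows, None)      # skip the header; the default avoids StopIteration on empty input
    for row in rows:
        yield row
```

In Python 3 the csv documentation recommends opening the file with newline="", e.g. with open("blah.csv", newline="") as f: followed by a loop over data_rows(f).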

If your CSV files are pretty simple (e.g. only single-line data, no quotes, etc.), you can use a list comprehension and list slicing:

# 1 = strip(), 2 = split(','), 3 = [1:]
for row in [line.strip().split(',') for line in file("blah.csv")][1:]:
  print row

Notes:

  1. You have to remove the trailing "\n" from each line.
  2. Split the input line using the delimiter, typically a comma.
  3. The list comprehension returns all lines, so to ignore the header line, you take a slice of the list starting from the second element.
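The "no quotes" restriction matters because a plain split(',') also splits inside quoted fields. This small Python 3 comparison (function names hypothetical) shows the difference on one line:

```python
import csv


def naive_split(line):
    """The simple approach used in the list comprehension above."""
    return line.strip().split(",")


def csv_split(line):
    """Parse one line with the csv module, which understands quoted fields."""
    return next(csv.reader([line]))
```

For the line a,"b,c",d the naive version returns four fields with stray quote characters, while the csv module returns three.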