24 April 2008

Python Command Line (-c option)

Perl has a -n option which implicitly runs a while-loop over all lines in STDIN (while (<>) { }). This mode is handy in a command shell when Perl is the recipient of the output of another command and you don't want to write a script. Can we do the same for Python?

Python has a -c option which runs a command in the string following it. While it's not entirely clear to me what is a Python command, I found that you can write some useful functions using list functions and statements using this template:

python -c "import <package>; print '\n'.join(<list function>(lambda x: <expression>, (s.strip() for s in sys.stdin)))

To use this template, replace <package> with a package name (e.g. os), <list function> with a list function (e.g. filter()) and <expression> with, well, an expression. The rest of the template just constructs a list of strings (without a trailing "\n") from the input and prints the results.

For simple string processing, the list function and expression are not required, resulting in a simplified version of this template:

python -c "import <package>; print '\n'.join(<fn>(s.strip()) for s in sys.stdin)"

While researching this topic, I found an ASPN Python Recipe called Pyline to help write commands. Here's the examples in that recipe rewritten using my template:

Print the first 20 characters of each line:

tail test.txt | python -c "import sys; print '\n'.join(s.strip()[:20] for s in sys.stdin)"

Print the 7th word in each line, assuming the separator is ' ':

tail test.txt | python -c "import sys; print '\n'.join(s.strip().split(' ')[6:7] for s in sys.stdin)"

Note that you can also get columns of text from a file using the cut command. Also note that the reason for using the array slice is to avoid getting an IndexError exception if the string is not long enough.

List all files that are greater than 1024 bytes in size:

ls | python -c "import os, sys; print '\n'.join(filter(lambda x: os.path.isfile(x) and os.stat(x).st_size > 1024, (s.strip() for s in sys.stdin)))

Generate MD5 digest values for a list of files, like md5sum.

ls *.txt | python -c "import md5, sys; print ''.join('%s %s' % md5.new(file(s.strip()).read()).hexdigest(), s) for s in sys.stdin)"

26-Apr-2008: Replaced list comprehension statement (for-in with square brackets) with generator expression (for-in with parentheses) in the template to avoid very large lists stored in memory.

Added MD5 digest example, and realised that we only need to use list functions (e.g. filter()) if you want to change the members of the resulting list. Otherwise, the simpler template suffices.