Software Salariman: Python

2008-08-27

What's the time?

14 quick ways to find the current time on your computer.

Cmd.exe has two built-in commands for the date and time. You have to add the /t option when calling these commands otherwise you are prompted to set the system time:

> date /t
Wed 27/08/2008
> time /t
07:42 PM

GnuWin's date command prints the date, time and time zone:

> date
Wed Aug 27 19:43:23 AUS Eastern Standard Time 2008

You can use the POSIX module in Perl to get the current date and time:

> perl -e "use POSIX; print asctime(localtime());"
Wed Aug 27 19:44:21 2008

Python has a time module similar to Perl's:

> python -c "import time; print time.asctime()"
Wed Aug 27 19:48:07 2008
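
Inside a script rather than on the command line, the same call might look like the sketch below. It assumes Python 3 (where print is a function); the datetime lines are an extra not used in the one-liner above.

```python
import time
from datetime import datetime

# time.asctime() with no argument formats the current local time,
# for example 'Wed Aug 27 19:48:07 2008' (always 24 characters).
now_text = time.asctime()
print(now_text)

# datetime.now() gives the same instant as an object; isoformat() produces
# a sortable string such as '2008-08-27T19:48:07'.
iso_text = datetime.now().isoformat(timespec="seconds")
print(iso_text)
```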

PHP's getdate() function returns an associative array, which you can dump using the print_r() function:

> php -r "print_r(getdate());"
Array
(
    [seconds] => 49
    [minutes] => 34
    [hours] => 14
    [mday] => 30
    [wday] => 6
    [mon] => 8
    [year] => 2008
    [yday] => 242
    [weekday] => Saturday
    [month] => August
    [0] => 1220070889
)

Ruby has a Time class:

> ruby -e "print Time.now"
Wed Aug 27 19:45:32 +1000 2008

PowerShell has a get-date cmdlet:

> get-date
Wednesday, 27 August 2008 7:50:13 PM

Or use the .Net System.DateTime.Now property in PowerShell:

> [System.DateTime]::Now
Thursday, 28 August 2008 9:53:21 AM

Firefox can tell you the time using the JavaScript Date() function. Enter the following statement in your browser's address bar:

javascript:Date()
Wed Aug 27 2008 20:11:27 GMT+1000 (AUS Eastern Standard Time)

MSIE6 has a similar object but the output is different from Firefox's:

javascript:Date()
Thu Aug 28 10:06:59 2008

Groovy (and Java) has a java.util.Date object which defaults to the current time:

new java.util.Date()
Result: Thu Aug 28 09:58:45 EST 2008

2008-08-01

List Empty Access Tables using Python

We wanted to find empty tables in a Microsoft Access database. Below is a Python script that does this using the PythonWin odbc module. Set the MDB_PATH variable to the path of your database file.

Follow the note at the start of the script to configure your MS-Access database security if you get the following message: dbi.program-error: [Microsoft][ODBC Microsoft Access Driver] Record(s) cannot be read; no read permission on 'MSYSOBJECTS'. in EXEC.

# List empty tables in an Access database by Kam-Hung Soh 2008.
# Before using this script, you have to allow User and Group Permissions in Access.
# 1. Open database.
# 2. Select menu item Tools / Security / User and Workgroup Permissions.
# 3. In 'User and Group Permissions' dialog:
# 3.1. Select User/Group Name = Admin.
# 3.2. Select Object Name = MSysObjects.
# 3.3. Check 'Read Data' check box.
# 3.4. Press OK button to close dialog box.

import odbc

MDB_PATH = r'<path>'

conn = odbc.odbc(r"DRIVER={Microsoft Access Driver (*.mdb)}; Dbq=%s;" % MDB_PATH)
cur = conn.cursor()
cur.execute(r"SELECT name from MSYSOBJECTS WHERE name NOT LIKE 'MSYS%' AND type = 1")
for x in cur.fetchall():
    table_name = x[0]
    cur.execute(r'SELECT COUNT(*) FROM [%s]' % table_name)
    row = cur.fetchone()
    if row[0] == 0:
        print table_name + ' is empty'
cur.close()
conn.close()

Script Notes

MS-Access stores object metadata in a system table called MSysObjects. In this table, a user table object has a type value of '1' and its name doesn't start with 'MSys'. This script first gets a list of all user tables from MSysObjects, then counts the number of rows in those tables. If a table has no rows, the script prints the table name and a message.

The fetchall() function always returns a list of tuples even if only one column is selected, so you have to extract the required column data using an array operator (e.g. x[0]).

The table name in the second cur.execute() SQL statement is delimited by square brackets in case the table name contains whitespace. Without these delimiters, you may see the following message: dbi.program-error: [Microsoft][ODBC Microsoft Access Driver] Syntax error in WITH OWNERACCESS OPTION declaration. in EXEC.
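
The two-step approach (list the user tables from the catalog, then count the rows in each) can be tried without an Access database. The sketch below swaps in SQLite, using the standard sqlite3 module under Python 3, where the sqlite_master table plays the role of MSysObjects. The function name find_empty_tables and the demo tables are made up for illustration.

```python
import sqlite3

def find_empty_tables(conn):
    """Return the names of user tables in an SQLite database that have no rows."""
    cur = conn.cursor()
    # sqlite_master is SQLite's catalog; names starting with 'sqlite_'
    # belong to SQLite's internal tables, so they are skipped.
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    empty = []
    for (table_name,) in cur.fetchall():
        # Quote the name so tables with spaces in their names still work.
        quoted = '"' + table_name.replace('"', '""') + '"'
        cur.execute("SELECT COUNT(*) FROM %s" % quoted)
        if cur.fetchone()[0] == 0:
            empty.append(table_name)
    return empty

# Hypothetical demo database: two empty tables and one populated table.
demo = sqlite3.connect(":memory:")
demo.execute('CREATE TABLE customers (id INTEGER)')
demo.execute('CREATE TABLE "order items" (id INTEGER)')
demo.execute('CREATE TABLE products (id INTEGER)')
demo.execute('INSERT INTO products VALUES (1)')
print(find_empty_tables(demo))
```

As in the Access script, each row comes back as a tuple even though only one column is selected, which is why the loop unpacks (table_name,).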

2008-07-28

Extract Lines with Line Numbers using Gawk, Groovy, Perl, Python and Ruby

More ways to extract a block of text from a stream and prepend the line number to each line.

Below is the Gawk version. The built-in variable NR is the number of the current line and $0 is the content of the current line.

gawk "(NR >= r1 && NR <= r2) {printf("""%4d %s\n""", NR, $0)}"

The Perl and Ruby scripts are exactly the same. The built-in variable $. holds the number of the current line and $_ holds the text of the current line.

perl|ruby -ne "printf '%4d %s', $., $_ if $. >= r1 && $. <= r2"

The Groovy command line options are similar to the Perl and Ruby version, except that you have to separate -n and -e. The built-in variable count holds the number of the current line and line holds the text of the current line.

groovy -n -e "if (count >= r1 && count <= r2) out.format '%4d %s\n', count, line"

The Python version is verbose due to boilerplate code to iterate through all rows in a file:

python -c "import sys; print ''.join('%4d %s' % (r + 1, l) for r, l in enumerate(sys.stdin) if r1 <= r + 1 <= r2)"
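
For use in a script, the same idea can be written as a small generator. This is a sketch assuming Python 3; the name extract_lines and the sample data are made up for illustration. Line numbers are 1-based, like NR in Gawk and $. in Perl and Ruby.

```python
def extract_lines(lines, first, last):
    """Yield '%4d text' for lines numbered first..last (1-based, inclusive)."""
    for number, text in enumerate(lines, 1):
        if number > last:
            break  # past the requested block, so stop reading early
        if number >= first:
            yield '%4d %s' % (number, text)

sample = ['alpha\n', 'beta\n', 'gamma\n', 'delta\n']
result = list(extract_lines(sample, 2, 3))
print(''.join(result), end='')
```

Passing sys.stdin as the first argument would make it behave like the one-liners above.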

PS

2008-07-29: Added Groovy version.

2008-07-13

Basic Python Tk HTTP Server Monitor

We had some servers which would occasionally go offline, so I wrote a basic HTTP server monitor using Python and Tkinter (the interface to the Tk GUI library):

# HTTP Server Monitor by Kam-Hung Soh 2008
from csv     import reader
from httplib import HTTPConnection
from logging import basicConfig, error, info, INFO
from os.path import exists
from time    import strftime
from tkFont  import Font
from Tkinter import Button, Frame, Label

CONFIGURATION_PATH = 'HttpServerMonitor.csv'
LOG_PATH           = 'HttpServerMonitor.log'
REFRESH_INTERVAL   = 60000 # Milliseconds
TIME_FORMAT        = '%H:%M:%S %d-%m-%y'
GRID_DEFAULT       = {'padx':2, 'pady':2}

class Application(Frame):
  def __init__(self, master=None):
    Frame.__init__(self, master)
    self.status_label = {}
    self.time_label = {}
    self.grid(**GRID_DEFAULT)
    self.create_widgets()

  def create_widgets(self):
    for i, s in enumerate(['Name', 'Host', 'Port', 'Status', 'Last Check']):
      Label(self, font=Font(size=10, weight='bold'), text=s).grid(column=i, row=0)

    if not exists(CONFIGURATION_PATH):
      error("Cannot open,%s" % CONFIGURATION_PATH)
      exit(1)

    f = open(CONFIGURATION_PATH, "rb")
    f.next() # Skip header row
    for r, p in enumerate(reader(f)):
      row_num = r + 1
      for col_num, s in enumerate(p):
        Label(self, justify='left', text="%s" % s).grid(column=col_num, row=row_num, sticky='w', **GRID_DEFAULT)
      host_name, host, port = p
      key = host + ":" + port
      self.status_label[key] = Label(self, background='yellow', text='unknown')
      self.status_label[key].grid(column=col_num + 1, row=row_num, sticky='w', **GRID_DEFAULT)
      self.time_label[key] = Label(self, text='%s' % strftime(TIME_FORMAT))
      self.time_label[key].grid(column=col_num + 2, row=row_num, sticky='w', **GRID_DEFAULT)

    Button(self, text='Refresh', command=self.refresh).grid(column=4, sticky='e', **GRID_DEFAULT)

  def refresh(self):
    for key in self.status_label.keys():
      self.time_label[key].config(text=strftime(TIME_FORMAT))
      label = self.status_label[key]
      h = HTTPConnection(key)
      try:
        h.connect()
        label.config(background='green', text='up')
      except:
        label.config(background='red', text='down')
      finally:
        h.close()
    self.after(REFRESH_INTERVAL, self.refresh)

if __name__ == "__main__":
  basicConfig(
    datefmt='%Y%m%d_T%H%M%S',
    filemode='a',
    filename=LOG_PATH,
    format='%(asctime)s,%(levelname)s,%(message)s',
    level=INFO
  )
  info('Started')
  app = Application()
  app.master.title('HTTP Server Monitor')
  app.refresh()
  app.mainloop()
  info('Ended')

This program reads a CSV file specified in CONFIGURATION_PATH constant for a list of servers to monitor. The CSV file has three columns: the display name, the server address and the server's port. The first line of the CSV file is for information only; it is not used by the program. Below is a sample CSV file:

Name,Host,Port
My server,myserver.com,80

You can define the time interval between checks by modifying the REFRESH_INTERVAL constant. This constant is in milliseconds, not seconds, so don't set too small a value!

If you are using Windows, run it using pythonw HttpServerMonitor.py.
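
The two non-GUI parts of the monitor, reading the CSV configuration and testing whether a server accepts connections, can be tried on their own. The sketch below assumes Python 3, where httplib is named http.client. The helper names read_servers and is_up and the five-second timeout are choices made for this example, not part of the program above.

```python
import csv
import io
from http.client import HTTPConnection

def read_servers(csv_file):
    """Return (name, host, port) tuples from a CSV file object, skipping the header row."""
    rows = csv.reader(csv_file)
    next(rows, None)  # the first line is for information only
    return [(name, host, int(port)) for name, host, port in rows]

def is_up(host, port, timeout=5):
    """Return True if a TCP connection to host:port can be opened."""
    conn = HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()
        return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False
    finally:
        conn.close()

# The sample CSV file from the article, held in memory.
sample = io.StringIO("Name,Host,Port\nMy server,myserver.com,80\n")
servers = read_servers(sample)
print(servers)
```

Like the original refresh() method, is_up() only checks that a connection can be opened; it does not send an HTTP request.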

2008-07-12

Extract Columns From Tabular Text - Powershell and Python

Finishing off different ways to extract columns, here are the PowerShell and Python versions:

foreach-object { $_.Split('<delimiter>')[-1] }

$_ is the current object (or record) in the loop. When processing tabular text, $_ is a .Net String object, so we use its Split() method to divide the input on the <delimiter>. Split() returns a String array, and index -1 refers to the last String (or column) in that array.

python -c "import sys; print ''.join(s.split('<delimiter>')[-1] for s in sys.stdin)"

Unlike Perl or Ruby, Python doesn't have any special command-line support to iterate through all lines of input or split the input, so we have to use this generator hack. Like the PowerShell version, each record (s) is a string, so we use a string's split() function to divide the input into an array and use index -1 to refer to the last column in that array.
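
As a script, the split-and-index step might look like the sketch below (Python 3; the name last_column and the sample rows are made up for illustration). Unlike the one-liner, it strips the trailing newline before splitting, so the caller decides how to join the results.

```python
def last_column(lines, delimiter):
    """Yield the last delimiter-separated field of each line, without the newline."""
    for line in lines:
        yield line.rstrip('\n').split(delimiter)[-1]

sample = ['a,b,c\n', 'one,two\n', 'solo\n']
columns = list(last_column(sample, ','))
print(columns)
```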

2008-05-24

Fix Incorrectly Encoded Unicode Files with Python

The Problem

We had a lot of text files committed into our CVS repository in Unicode format. When these files were checked out later, we found that they were neither plain text files nor valid Unicode files, because CVS had kept the two bytes at the start of these files, FF FE, but left only one byte for encoding each character. Some text editors such as Vim could open these files but other applications such as Notepad and Excel showed only gibberish.

Unicode Encoded Text in Files

Unicode is an encoding standard … for processing, storage and interchange of text data in any language. For the purpose of fixing this problem, we just have to know how to identify and write valid Unicode files.

We use two tools to experiment and visualize the effect of different encoding methods:

Microsoft Notepad editor, because it can save text files using different encoding methods.
GnuWin32 od utility to output the data in a file as byte values.

Open Notepad and enter this text: Hello World. Select the File / Save As menu item. In the Save As dialog, there are four encoding methods in the Encoding drop down list: ANSI, Unicode, Unicode big endian and UTF-8. Save the same text using each of the encoding methods into four files, say HelloANSI.txt, HelloUnicode.txt, HelloUnicodeBigEndian.txt and HelloUTF8.txt, respectively.

Examine the contents of each file using od:

>od -A x -t x1 HelloANSI.txt
000000 48 65 6c 6c 6f 20 57 6f 72 6c 64
00000b

>od -A x -t x1 HelloUnicode.txt
000000 ff fe 48 00 65 00 6c 00 6c 00 6f 00 20 00 57 00
000010 6f 00 72 00 6c 00 64 00
000018

>od -A x -t x1 HelloUnicodeBigEndian.txt
000000 fe ff 00 48 00 65 00 6c 00 6c 00 6f 00 20 00 57
000010 00 6f 00 72 00 6c 00 64
000018

>od -A x -t x1 HelloUTF8.txt
000000 ef bb bf 48 65 6c 6c 6f 20 57 6f 72 6c 64
00000e

The ANSI encoded file contains 11 bytes, one for each character you typed. The Unicode encoded files contain 24 bytes: a two-byte byte-order mark (BOM) followed by two bytes for each character. If the BOM is FF FE, then each character's two bytes are stored in low-byte, high-byte order. Conversely, if the BOM is FE FF, then they are stored in high-byte, low-byte order. Finally, when a file starts with the bytes EF BB BF, one byte is used to encode each ASCII character and two or more bytes are used to encode every other character (not demonstrated).
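
The same byte layouts can be reproduced without Notepad. The sketch below (Python 3) assumes that Notepad's "ANSI" corresponds to the cp1252 codec, as on a Western European Windows installation, and that its "Unicode" options are UTF-16 with a BOM; the variable names are made up for illustration.

```python
import codecs

text = 'Hello World'

# Assumed mapping of Notepad's encoding names to Python codecs.
ansi = text.encode('cp1252')                                # "ANSI"
utf16_le = codecs.BOM_UTF16_LE + text.encode('utf-16-le')   # "Unicode"
utf16_be = codecs.BOM_UTF16_BE + text.encode('utf-16-be')   # "Unicode big endian"
utf8 = text.encode('utf-8-sig')                             # "UTF-8" (with BOM)

for name, data in [('ANSI', ansi), ('Unicode', utf16_le),
                   ('Unicode big endian', utf16_be), ('UTF-8', utf8)]:
    # Print the size and a hex dump, similar to the od output above.
    print('%-20s %2d bytes  %s' % (name, len(data), data.hex()))
```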

Fixing Incorrectly Encoded Files in Python

Now we know the format of a Unicode encoded file: it starts with FF FE and stores each character in low-byte, high-byte order. Our text files in CVS just have ANSI characters, so we just have to insert a 0 byte after each character, starting from the third byte. Julian W. wrote a short Python script to do this. I don't have his code right now, so here's my version for correcting the Unicode encoding for a file:

import codecs
raw = map(ord, file(r'HelloBadUnicode.txt', 'rb').read())  # binary mode, so Windows doesn't translate line endings
if raw[0] == 255 and raw[1] == 254 and raw[3] != 0:
  output = codecs.open(r'HelloFixedUnicode.txt', 'w', 'UTF-16')
  for i in raw[2:]:
    output.write(chr(i))
  output.close()
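
For readers on Python 3, where file() no longer exists and bytes and text are separate types, the same repair can be sketched as a function on bytes. The name fix_bad_utf16 is made up for illustration, and the sketch assumes, as above, that every damaged character is a single byte.

```python
import codecs

def fix_bad_utf16(data):
    """Re-encode bytes that have a UTF-16 LE BOM but only one byte per character.

    The input is returned unchanged if it does not look like a damaged file.
    """
    looks_damaged = (
        len(data) >= 4
        and data.startswith(codecs.BOM_UTF16_LE)
        and data[3] != 0  # a valid file of ANSI text has a 0 byte here
    )
    if not looks_damaged:
        return data
    # latin-1 maps each byte 0-255 to the code point with the same number.
    text = data[2:].decode('latin-1')
    return codecs.BOM_UTF16_LE + text.encode('utf-16-le')

bad = b'\xff\xfeHello World'
fixed = fix_bad_utf16(bad)
print(fixed.hex())
```

To repair a file, read it with open(path, 'rb'), pass the bytes through the function, and write the result with open(path, 'wb').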

References

Unicode Consortium's FAQ on UTF-8, UTF-16, UTF-32 & BOM.
Wikipedia's Byte-order mark.

Postscript

I started with a more complicated piece of Python code using lists and generators:

from itertools import repeat
from operator import concat

raw = map(ord, file(r'HelloBadUnicode.txt', 'rb').read())  # binary mode: no newline translation
if raw[0] == 255 and raw[1] == 254 and raw[3] != 0:
  output = file(r'HelloFixedUnicode.txt','wb')
  output.write(chr(255))
  output.write(chr(254))
  for i in reduce(concat, zip(raw[2:], repeat(0, len(raw)-2))):
    output.write(chr(i))
  output.close()

But then I realised I just had to write a 0 byte after each ANSI character, so here's a simpler version:

raw = map(ord, file(r'HelloBadUnicode.txt', 'rb').read())  # binary mode: no newline translation
if raw[0] == 255 and raw[1] == 254 and raw[3] != 0:
  output = file(r'HelloFixedUnicode.txt','wb')
  output.write(chr(255))
  output.write(chr(254))
  for i in raw[2:]:
    output.write(chr(i))
    output.write(chr(0))
  output.close()

2008-05-25. I remembered that Python had no problems with writing Unicode files, resulting in the even simpler code in the body of this article.

2008-05-20

Python Command Line (-c option) Test 2

Julian W. suggested that I write a one line Python loop for my command scripts instead of map(), as in my earlier article. Instead of map(lambda l: expression(l), sys.stdin), I could write for l in sys.stdin: expression(l). An example trivial command to echo all input lines would be:

python -c "import sys; for line in sys.stdin: print line,"

Problem is that the Python interpreter complains:

  File "<string>", line 1
    import sys; for line in sys.stdin: print line,
                  ^
SyntaxError: invalid syntax
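
The error arises because a compound statement such as for cannot follow a semicolon on the same line; only simple statements can be chained that way. The sketch below (Python 3 syntax) reproduces the error with compile() and shows two forms that do compile. The variable names are made up for illustration.

```python
source = "import sys; for line in sys.stdin: print(line, end='')"

# A compound statement such as 'for' cannot follow ';' on the same line,
# so compiling the one-liner fails before anything runs.
try:
    compile(source, "<string>", "exec")
    one_liner_ok = True
except SyntaxError:
    one_liner_ok = False
print(one_liner_ok)

# Two forms that do compile: a real newline before 'for', or an
# expression that consumes the input without a 'for' statement.
with_newline = "import sys\nfor line in sys.stdin: print(line, end='')"
without_loop = "import sys; sys.stdout.writelines(sys.stdin)"
compile(with_newline, "<string>", "exec")
compile(without_loop, "<string>", "exec")
```

The second form is in the same spirit as the generator expressions used in the earlier article: the loop lives inside an expression rather than a statement.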

2008-04-24

Python Command Line (-c option)

Perl has a -n option which implicitly runs a while-loop over all lines in STDIN (while (<>) { }). This mode is handy in a command shell when Perl is the recipient of the output of another command and you don't want to write a script. Can we do the same for Python?

Python has a -c option which runs a command in the string following it. While it's not entirely clear to me what is a Python command, I found that you can write some useful functions using list functions and statements using this template:

python -c "import sys, <package>; print '\n'.join(<list function>(lambda x: <expression>, (s.strip() for s in sys.stdin)))"

To use this template, replace <package> with a package name (e.g. os), <list function> with a list function (e.g. filter()) and <expression> with, well, an expression. The rest of the template just constructs a list of strings (without a trailing "\n") from the input and prints the results.

For simple string processing, the list function and expression are not required, resulting in a simplified version of this template, where <fn> is any function applied to each line:

python -c "import sys, <package>; print '\n'.join(<fn>(s.strip()) for s in sys.stdin)"

While researching this topic, I found an ASPN Python Recipe called Pyline to help write commands. Here's the examples in that recipe rewritten using my template:

Print the first 20 characters of each line:

tail test.txt | python -c "import sys; print '\n'.join(s.strip()[:20] for s in sys.stdin)"

Print the 7th word in each line, assuming the separator is ' ':

tail test.txt | python -c "import sys; print '\n'.join(' '.join(s.strip().split(' ')[6:7]) for s in sys.stdin)"

Note that you can also get columns of text from a file using the cut command. Also note that the reason for using the array slice is to avoid getting an IndexError exception if the string is not long enough.
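
The difference between slicing and indexing a short list can be checked directly. This is a small sketch in Python 3 with made-up sample text:

```python
words = 'the quick brown fox'.split(' ')

# Slicing past the end yields an empty list instead of raising IndexError.
seventh_as_slice = words[6:7]
print(seventh_as_slice)

# Indexing the same position raises IndexError.
try:
    words[6]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)

# Joining the slice gives '' for short lines and the word itself otherwise.
long_line = 'a b c d e f g h'.split(' ')
seventh_word = ' '.join(long_line[6:7])
print(repr(' '.join(seventh_as_slice)), repr(seventh_word))
```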

List all files that are greater than 1024 bytes in size:

ls | python -c "import os, sys; print '\n'.join(filter(lambda x: os.path.isfile(x) and os.stat(x).st_size > 1024, (s.strip() for s in sys.stdin)))"

Generate MD5 digest values for a list of files, like md5sum:

ls *.txt | python -c "import md5, sys; print ''.join('%s %s' % (md5.new(file(s.strip(), 'rb').read()).hexdigest(), s) for s in sys.stdin)"

26-Apr-2008: Replaced list comprehension statement (for-in with square brackets) with generator expression (for-in with parentheses) in the template to avoid very large lists stored in memory.

Added MD5 digest example, and realised that we only need to use list functions (e.g. filter()) if we want to change which members appear in the resulting list. Otherwise, the simpler template suffices.

2008-04-08

Functional Python Palindromes

To find all palindromes from a list of words in a file, one word per line, you could write a procedural Python program like this:

for row in file('test.txt'):
  s = row.strip()
  if s == s[::-1]:
    print s

Here's a functional Python version, with notes below:

from itertools import imap
filter(lambda s: s == s[::-1], imap(str.strip, file('test.txt')))
  5       3              4       2               1

1. Create a file iterator.
2. When we read a line from a file into a string, the string has a trailing newline character (e.g. 'add\n'). We want to remove that trailing newline character, so we use the itertools.imap() function to create a new iterator that applies str.strip() to each line read. The result is that we have an iterator that provides strings without the newline character.
3. Define an anonymous function using the lambda keyword that returns true if the input string is a palindrome.
4. Python idiom for returning a reversed sequence (a string is a sequence of characters).
5. Use the filter function to return a list of palindromes.

Using this input file …

add
dad
dam
mad
made
madam
set

… the result of running the functional script is:

['dad', 'madam']
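
In Python 3 the same pipeline needs two small changes: itertools.imap() is gone because the built-in map() is already lazy, and filter() returns an iterator rather than a list. A sketch, using an in-memory list in place of test.txt:

```python
# Stand-in for the lines read from test.txt, trailing newlines included.
words = ['add\n', 'dad\n', 'dam\n', 'mad\n', 'made\n', 'madam\n', 'set\n']

# map() strips each line lazily; filter() keeps the strings that equal
# their own reverse; list() materialises the result.
palindromes = list(filter(lambda s: s == s[::-1], map(str.strip, words)))
print(palindromes)
```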

2008-04-07

Reading CSV Files in Python

Python has a csv module for reading and writing CSV files (usually exported by Excel or database tables). The basic use of this module is documented in the on-line help. My CSV files usually have a header row, so the idiomatic way to skip this line is to open the CSV file and use the next() function immediately:

from csv import reader
f = open("blah.csv", "rb")
f.next()
for row in reader(f):
  print row

If your CSV files are pretty simple (e.g. only single line data, no quotes, etc.), you can use list comprehension and array slicing:

                   1       2                                      3
for row in [line.strip().split(',') for line in file("blah.csv")][1:]:
  print row

Notes:

1. You have to remove the trailing "\n" from each line.
2. Split the input line using the delimiter, typically a comma.
3. The list comprehension statement returns all lines, so to ignore the first line, you take a slice of the array starting from the second line.
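
For comparison, here is a sketch of the header-skipping idiom in Python 3, where file objects no longer have a next() method and the built-in next() function is used instead. The sample data is made up, and io.StringIO stands in for open("blah.csv", newline=""). The quoted field shows a case the simple split(',') approach would get wrong.

```python
import csv
import io

# Stand-in for open("blah.csv", newline=""); the contents are made up.
f = io.StringIO('Name,Comment\nwidget,"small, blue"\ngadget,plain\n')

rows = csv.reader(f)
header = next(rows)  # consume the header row from the reader itself
data = list(rows)
print(header)
print(data)
```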

2008-02-02

Prune Directories with Python

I converted my earlier PowerShell script to prune directories to Python:

     1  from os import listdir, rmdir
     2  from os.path import isdir, exists, join
     3
     4  def prune_directory(path):
     5    if len(path) < 1:
     6      print "Empty path"
     7      return
     8    if not exists(path):
     9      print "Invalid path:", path
    10      return
    11    if not isdir(path):
    12      return
    13    if len(listdir(path)) <= 0:
    14      rmdir(path)
    15      return
    16    for elem in listdir(path):
    17      prune_directory(join(path, elem))
    18    if len(listdir(path)) <= 0:
    19      rmdir(path)

It's almost a one-to-one translation from the initial PowerShell version. The difference is that in Python, you have to ensure that path is a directory before you call listdir(path).

Python 2.3 introduced a generator function, os.walk(), for traversing a directory tree. Below, the recursive function call in lines 13-19 is replaced by a for loop in lines 13-14.

     1  from os import listdir, rmdir, walk
     2  from os.path import isdir, exists
     3
     4  def prune_directory_walk(path):
     5    if len(path) < 1:
     6      print "Empty path"
     7      return
     8    if not exists(path):
     9      print "Invalid path:", path
    10      return
    11    if not isdir(path):
    12      return
    13    for curr, dirs, files in walk(path, topdown=False):
    14      if len(listdir(curr)) < 1: rmdir(curr)

In this sample, os.walk() returns a tuple (curr, dirs, files) for each directory it visits. curr is the current directory being traversed, and dirs and files are the directories and files in that directory. Using the parameter topdown=False, os.walk() starts producing these tuples from the lowest descendant directory and working up to the start directory, path.

Note that the loop's conditional statement uses len(listdir(curr)) instead of just len(dirs). os.walk() generates the dirs list before it visits each of the directories in dirs; if all child directories in dirs have been deleted, dirs would still contain an unchanged list and the parent directory, curr, would not be deleted. At least, that's what I think happens; the Python help doesn't say so explicitly.
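
The stale dirs list can be observed with a small experiment. The sketch below (Python 3) builds a throwaway tree in a temporary directory, prunes it bottom-up, and records what os.walk() reported next to what os.listdir() sees at that moment. The directory names and the stale dictionary are made up for illustration.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b'))  # a/ holds only the empty b/
os.makedirs(os.path.join(root, 'keep'))
with open(os.path.join(root, 'keep', 'file.txt'), 'w') as f:
    f.write('data')

stale = {}
for curr, dirs, files in os.walk(root, topdown=False):
    # Record what walk() reports versus what is on disk right now.
    stale[os.path.relpath(curr, root)] = (list(dirs), sorted(os.listdir(curr)))
    if not os.listdir(curr):
        os.rmdir(curr)

print(stale['a'])            # dirs still lists 'b' although it was just removed
remaining = sorted(os.listdir(root))
print(remaining)
shutil.rmtree(root)          # clean up the temporary tree
```

When the loop reaches a/, dirs still contains 'b' while listdir() returns an empty list, which is why the pruning test has to use listdir().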

In earlier versions of Python, there is a similar function called os.path.walk() but os.walk() is much easier to use.

2007-12-14

IronPython and Jython Hello Windows

The Python language has been ported to two major virtual machine platforms: IronPython for Microsoft .Net and Jython for Java. To get a flavour of these implementations, here are two Hello World scripts that open a window containing a label and a button.

IronPython Hello World Window

import clr
clr.AddReference("System.Drawing")
from System.Drawing import ContentAlignment, Size
clr.AddReference("System.Windows.Forms")
from System.Windows.Forms import Button, FlowLayoutPanel, Form, Label

p = FlowLayoutPanel()
p.Controls.Add(Label(Text="A label:", Size=Size(50,20), TextAlign=ContentAlignment.BottomLeft))
p.Controls.Add(Button(Text="Press Me"))
f = Form(Text="Hello World", Size=Size(160,70))
f.Controls.Add(p)
f.ShowDialog()

Jython Hello World Window

# Jython Hello Window
from java.awt import FlowLayout
from javax.swing import JButton, JFrame, JLabel

f = JFrame("Hello", defaultCloseOperation=JFrame.DISPOSE_ON_CLOSE, size=(170,70), layout=FlowLayout())
f.add(JLabel("A label:"))
f.add(JButton("Press Me"))
f.show()

Mini Observations

IronPython makes loading the .Net libraries explicit by using the clr module.
Both implementations let you set properties using keyword arguments in an object's constructor, instead of calling the corresponding setX functions afterwards.
