02 February 2008

Prune Directories with Python

I converted my earlier PowerShell script, which prunes empty directories, to Python:

     1  from os import listdir, rmdir
     2  from os.path import isdir, exists, join
     3
     4  def prune_directory(path):
     5    if not path:
     6      print("Empty path")
     7      return
     8    if not exists(path):
     9      print("Invalid path: " + path)
    10      return
    11    if not isdir(path):  # listdir() fails on files, so stop here
    12      return
    13    if not listdir(path):  # already empty: remove it and stop
    14      rmdir(path)
    15      return
    16    for elem in listdir(path):  # prune each child first
    17      prune_directory(join(path, elem))
    18    if not listdir(path):  # empty now that the children are pruned
    19      rmdir(path)

It's almost a one-to-one translation from the initial PowerShell version. The one difference is that in Python you have to check that path is a directory before you call listdir(path), because listdir() raises OSError when it is given a file.
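To make the reason for the isdir() check concrete, here is a minimal sketch, assuming Python 3 and a throwaway temporary directory. The helper try_listdir and the file name note.txt are made up for illustration and are not part of the script above.

```python
import os
import tempfile

def try_listdir(path):
    """Return the entries of path, or None if path cannot be listed."""
    try:
        return os.listdir(path)
    except OSError:  # raised when path is a file, is missing, or is unreadable
        return None

# Build a scratch directory containing a single file (hypothetical name).
scratch = tempfile.mkdtemp()
file_path = os.path.join(scratch, "note.txt")
with open(file_path, "w") as handle:
    handle.write("hello")

listing_of_dir = try_listdir(scratch)     # scratch is a directory, so this works
listing_of_file = try_listdir(file_path)  # file_path is a file, so listdir() raises
print(listing_of_dir, listing_of_file)
```

Checking isdir(path) first, as prune_directory does, avoids the exception instead of catching it.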

Since version 2.3, Python has had a generator function, os.walk(), for traversing a directory tree. Below, the recursive function call in lines 13-19 is replaced by a for loop in lines 13-14.

     1  from os import listdir, rmdir, walk
     2  from os.path import isdir, exists
     3
     4  def prune_directory_walk(path):
     5    if not path:
     6      print("Empty path")
     7      return
     8    if not exists(path):
     9      print("Invalid path: " + path)
    10      return
    11    if not isdir(path):
    12      return
    13    for curr, dirs, files in walk(path, topdown=False):  # children before parents
    14      if not listdir(curr): rmdir(curr)  # re-read curr; dirs may be stale

In this sample, os.walk() yields a tuple (curr, dirs, files) for each directory it visits. curr is the path of the directory being visited, and dirs and files are the names of the subdirectories and files in it. With topdown=False, os.walk() yields the tuple for a directory only after the tuples for all of its subdirectories, so it works up from the deepest directories to the start directory, path.
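The bottom-up order is easy to see on a small tree. This is a sketch, assuming Python 3; the directory names a, b and c are hypothetical and exist only inside a temporary directory.

```python
import os
import tempfile

# Build root/a/b and root/c inside a scratch directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
os.makedirs(os.path.join(root, "c"))

# Record each visited directory relative to root, in the order os.walk() yields it.
visited = []
for curr, dirs, files in os.walk(root, topdown=False):
    visited.append(os.path.relpath(curr, root))
print(visited)
```

The order of siblings such as a and c depends on the file system, but a/b always comes before a, and the start directory (shown as ".") always comes last.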

Note that the loop's condition uses len(listdir(curr)) instead of relying on dirs and files. os.walk() generates the dirs list before it visits each of the directories in dirs; if all child directories in dirs have since been deleted, dirs still contains the unchanged list, so a test based on dirs would never delete the parent directory, curr. At least, that's what I think happens; the Python help doesn't say so explicitly.
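A quick experiment supports this reading. The sketch below assumes Python 3 and uses a hypothetical empty subdirectory named child; it records what os.walk() reports next to what listdir() finds at the same moment.

```python
import os
import tempfile

# A scratch directory whose only entry is an empty subdirectory.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "child"))

seen = {}
for curr, dirs, files in os.walk(root, topdown=False):
    # Keep the list os.walk() handed us alongside a fresh look at the disk.
    seen[curr] = (list(dirs), os.listdir(curr))
    if not os.listdir(curr):
        os.rmdir(curr)

stale_dirs, fresh_listing = seen[root]
print(stale_dirs, fresh_listing)
```

By the time the tuple for root arrives, child has already been removed, yet dirs still names it, while listdir(curr) returns an empty list. That is why the loop asks listdir() rather than trusting dirs.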

Earlier versions of Python offer a similar function, os.path.walk(), which takes a callback, but os.walk() is much easier to use.