30 January 2007

SpamBayes Thresholds

My e-mail spam detector, SpamBayes, seems to be having trouble classifying some of this year's spam: it now places two or three messages in the Suspect folder each day, when there used to be none. The main culprits are stock market e-mails, probably because they contain a lot of business-oriented words (hard to avoid at work) that outnumber the relatively small proportion of definite spam words. Lowering the spam threshold works well enough; what really matters is that ham e-mail still gets ranked at 0%.
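
To make the threshold idea concrete, here is a minimal sketch of a two-cutoff classifier. It is not SpamBayes code: the function name `classify`, the cutoff values, and the example scores are all illustrative assumptions, and the real filter may treat scores that sit exactly on a boundary differently.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam score in [0, 1] to a folder name.

    Hypothetical helper: the cutoffs are assumed values for illustration.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    # Anything between the two cutoffs is neither clearly ham nor clearly spam.
    return "suspect"


# A made-up score for a stock market e-mail: lots of business words,
# only a few definite spam words, so it falls just short of the spam cutoff.
stock_spam = 0.85

print(classify(stock_spam))                    # suspect
print(classify(stock_spam, spam_cutoff=0.80))  # spam
print(classify(0.0, spam_cutoff=0.80))         # ham
```

Lowering only the spam cutoff moves borderline messages out of the Suspect range without touching the ham side, which is why mail that scores 0% is unaffected.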