How to spam-protect your Python-based blog with a Bayesian filter.

Like several other people, I ran into spammers abusing my comment system to post spam and backlinks (sometimes with rather funny tricks).

I'm already using a good email spam filter, SpamBayes, so I decided to try Bayesian filtering against the spam on this blog too.
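If Bayesian filtering is new to you, here is roughly what such a filter computes. This is a minimal multinomial naive Bayes sketch, not Reverend's actual scoring code (Reverend may combine word probabilities differently). `TinyBayes`, its naive tokenizer and the add-one smoothing are assumptions made for illustration; it only mimics the `train(label, text)` / `guess(text)` shape, with `guess()` returning `(label, probability)` pairs.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes(object):
    """Hypothetical stand-in with a train(label, text) / guess(text) shape.

    Multinomial naive Bayes with add-one (Laplace) smoothing; an
    illustration only, not Reverend's algorithm.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of trained texts

    @staticmethod
    def tokenize(text):
        # Naive tokenizer (an assumption): lowercase words and numbers
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def guess(self, text):
        """Return (label, probability) pairs, most likely label first."""
        if not self.doc_counts:
            return []  # nothing trained yet
        tokens = self.tokenize(text)
        # Vocabulary = every token seen in training or in this text
        vocab = set(tokens)
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(label) + sum of log P(token | label); the +1 smoothing
            # keeps an unseen token from driving the score to zero
            score = math.log(float(self.doc_counts[label]) / total_docs)
            for tok in tokens:
                score += math.log(float(counts[tok] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1, shifting by the
        # max first so exp() cannot underflow to 0 for every label
        best = max(log_scores.values())
        weights = {label: math.exp(s - best) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return sorted(((label, w / norm) for label, w in weights.items()),
                      key=lambda pair: pair[1], reverse=True)
```

For example, after training `'cheap pills buy now cheap viagra'` as spam and `'nice article about python and bayesian filtering'` as ham, `guess('buy cheap pills')` ranks `'spam'` first with a probability of about 0.93.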

I gave Reverend, a Bayesian classifier library for Python, a try:

```python
from reverend.thomas import Bayes

SPAM_DB = 'spam.bayes'
guesser = Bayes()

# Load the spam DB, or create an empty one on the first run
try:
    guesser.load(SPAM_DB)
except IOError:
    print("Creating a new spam filter database")
    guesser.save(SPAM_DB)

def train_spam(text):
    guesser.train('spam', text)
    guesser.save(SPAM_DB)

def train_ham(text):
    guesser.train('ham', text)
    guesser.save(SPAM_DB)

# Try to guess the ham / spam scores of a text.
# guesser.guess() returns a list of (label, probability) pairs;
# a label that is missing from the result counts as 0.
def guess(text):
    scores = dict(guesser.guess(text))
    return (scores.get('ham', 0), scores.get('spam', 0))
```

A small and really simple module, isn't it? The next step is simply to add 'spam' and 'ham' attributes to your comment objects, plus two methods to train a comment as spam or as ham. And of course, only display comments with a good ham/spam ratio (> 1). This took me about an hour to implement.
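The comment-side wiring described above might look something like this sketch. `Comment`, `rescore`, `train_as_spam`, `train_as_ham`, `is_displayable` and `displayable` are hypothetical names; `spam_filter` is assumed to be anything exposing `train_spam(text)`, `train_ham(text)` and `guess(text)` returning `(ham, spam)`, such as the small module above. The ham/spam > 1 rule is written as `ham > spam`, which gives the same answer whenever spam is positive and cannot divide by zero.

```python
class Comment(object):
    """Hypothetical blog comment carrying its classifier scores.

    spam_filter is assumed to expose train_spam(text), train_ham(text)
    and guess(text) -> (ham, spam).
    """

    def __init__(self, author, text, spam_filter):
        self.author = author
        self.text = text
        self.spam_filter = spam_filter
        self.rescore()

    def rescore(self):
        # Refresh the 'ham' and 'spam' attributes from the classifier
        self.ham, self.spam = self.spam_filter.guess(self.text)

    def train_as_spam(self):
        self.spam_filter.train_spam(self.text)
        self.rescore()

    def train_as_ham(self):
        self.spam_filter.train_ham(self.text)
        self.rescore()

    def is_displayable(self):
        # ham/spam > 1, written as ham > spam: same result whenever spam > 0,
        # and no ZeroDivisionError when spam == 0
        return self.ham > self.spam


def displayable(comments):
    """Keep only the comments that pass the ham/spam test."""
    return [c for c in comments if c.is_displayable()]
```

Scores are stored when a comment is created and refreshed only for the comment you train, so after a training session you may want to call `rescore()` on the other comments too. With both scores at 0 (the classifier has no opinion yet) a comment stays hidden; use `>=` instead if you would rather show it by default.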

After a week of training, this works very well: not a single false positive, and it has filtered every spam since the first few training rounds. As I get around 20 spam posts per day, this is good news ;)

Enjoy Bayesian filtering!

admin November 17th, 2006


7 Responses to “How to spam-protect your Python-based blog with a Bayesian filter.”

  1. phil on 17 Nov 2006 at 10:56 pm

    Thanks, Reverend Jkx :)

  2. phil on 19 Nov 2006 at 7:31 pm

    So, now your comments RSS feed is usable, right? Because it still seems to contain some spam (article 239).

  3. Jkx on 20 Nov 2006 at 4:08 pm

    Yes, my comments RSS feed is still full of spam. I need to apply the filter there too. Right now, I'm using it to check that everything is OK. I will switch soon.

    Bye ..

  4. Jon 09 Jun 2007 at 6:49 am

    What happens to the spam database file when two people submit a comment at the same time?  Is there a way to prevent it from getting corrupted?

  5. Jkx on 09 Jun 2007 at 1:13 pm

    This depends on how you plug it into your web app, but you can easily protect the write with a lock.

  6. Liliane on 03 Oct 2007 at 8:55 am

    Spam filtering may reduce the amount of spam for a short while, but you can't say that it is the ultimate solution to spamming. The reason is that spammers are aware of these filtering techniques, whether it is filtering with BogoFire or something else. There are many websites providing information on anti-spam solutions, but most of this information is either irrelevant or not useful. I have recently visited a website that I would like to suggest:

    Anti-Spam Solutions Website

  7. Mike on 12 Oct 2007 at 7:42 pm

    How is the filter working out these days? Is this approach still worth implementing? Thanks, Mike
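Jon's question in comment 4 (two comments submitted at the same time) matters as soon as several requests can train the filter at once. As Jkx says, the fix depends on how the filter is plugged into the web app; here is one possible sketch, assuming a POSIX host (`fcntl.flock`) and Python 3.3+ (`os.replace`). `train_and_save()` is a hypothetical helper that would replace `train_spam()`/`train_ham()`; it only uses the `train()`, `load()` and `save()` calls the module above already relies on, and reloading before training assumes `load()` replaces the guesser's in-memory state.

```python
import fcntl  # POSIX only
import os
import threading

_thread_lock = threading.Lock()  # serializes threads sharing one guesser object


def train_and_save(guesser, label, text, db_path):
    """Hypothetical replacement for train_spam()/train_ham(): train and persist
    the spam DB without two requests clobbering each other's updates."""
    with _thread_lock:
        # An exclusive flock on a side file also serializes separate processes
        # (e.g. several CGI or WSGI workers); it is released when the file closes.
        with open(db_path + '.lock', 'a') as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            # Reload first so training saved by another process is not lost
            if os.path.exists(db_path):
                guesser.load(db_path)
            guesser.train(label, text)
            # Save to a temporary file, then swap it in atomically, so readers
            # never see a half-written database
            tmp_path = db_path + '.tmp'
            guesser.save(tmp_path)
            os.replace(tmp_path, db_path)
```

Processes that only call `guess()` keep using their in-memory copy until they reload the file, so a long-running process may want to reload it periodically.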
