Spam dictionary (work in progress)


In 2003, a dictionary of Spam was created by Guy Di Mattina whilst studying at the School of Information Technology and Electrical Engineering, University of Queensland. “The first step,” explains the author, “was to create a list of features that appeared in Spam or normal mail but not in both.” The results were published in a thesis entitled ‘Spam and Open Relay Blocking System’.
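The selection rule in that quote (keep a feature only if it turns up in Spam or in normal mail, but not in both) is the symmetric difference of two sets. The sketch below illustrates it under loudly stated assumptions: each feature is simply a lowercased word, and the helpers `tokenize`, `exclusive_features` and `number_features` are hypothetical names invented here, not taken from the thesis. Apostrophes are dropped before splitting, which is one plausible way an entry such as “wont” could arise. The real dictionary’s numbering is not alphabetical, so the sorted numbering used here is an arbitrary choice.

```python
import re


def tokenize(text):
    """Hypothetical feature extractor: lowercase, drop apostrophes, keep runs of letters."""
    cleaned = text.lower().replace("'", "").replace("\u2019", "")
    return set(re.findall(r"[a-z]+", cleaned))


def exclusive_features(spam_messages, normal_messages):
    """Return words that occur in spam or in normal mail, but not in both."""
    spam_words = set()
    for message in spam_messages:
        spam_words |= tokenize(message)
    normal_words = set()
    for message in normal_messages:
        normal_words |= tokenize(message)
    # Symmetric difference: present in exactly one of the two sets.
    return spam_words ^ normal_words


def number_features(features):
    """Assign 1-based indices, echoing entries like '#18. accomplishments' (order is an assumption)."""
    return {index: word for index, word in enumerate(sorted(features), start=1)}
```

For example, with the spam message “Cheap pills, decreased prices” and the normal messages “Grandpa visited the museum” and “prices are up”, the word “prices” is discarded because it occurs on both sides, while “decreased”, “grandpa” and “museum” survive.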

Here are some examples from the dictionary:

#18. accomplishments
#496. decreased
#1023. ifdlawdodcbpbiazifdlzwtzisehisancjxwigfsawdupsjjz
#1397. museum
#3049. grandpa
#4462. wont

An article reproduced on the Radio Australia website explains more:

“[…] we all got together and we all discussed what was going on and it came out that we were using the Support Vector Machine in an unthought of way, mainly because Guy was not trained in Support Vector Machines so we didn’t know how everybody is trained to use them. We came up with something completely different just purely and simply because he didn’t know what he was doing when he started out and that’s what’s made it so effective.”