Bayesian Spam Filters For Outlook Express
Bayesian spam filters for Outlook Express and anti-spam software for Outlook use Bayesian statistical methods to classify documents, in this case email messages, into categories.
Bayesian filtering was proposed by Sahami et al. (1998) and gained wider attention in 2002, when Paul Graham described it in his essay "A Plan for Spam". Since then it has become a popular mechanism for distinguishing illegitimate spam email from legitimate email. Many modern mail programs, as well as add-on spam filters for clients such as Outlook Express and Outlook, implement Bayesian spam filtering. Server-side email filters such as SpamAssassin and ASSP use Bayesian spam filtering techniques, and the functionality is sometimes embedded within the mail server software itself.
Mathematical foundation
Bayesian spam filters for Outlook Express and many Microsoft Outlook spam filters take advantage of Bayes' theorem. Bayes' theorem, in the context of spam, says that the probability that an email is spam, given that it has certain words in it, is equal to the probability of finding those certain words in spam email, times the probability that any email is spam, divided by the probability of finding those words in any email.
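The statement above can be checked numerically. The short Python sketch below applies Bayes' theorem to a single word, expanding the denominator (the probability of finding the word in any email) with the law of total probability. The function name `spam_probability` and all input figures are hypothetical, chosen only for illustration.

```python
def spam_probability(p_word_given_spam: float,
                     p_word_given_ham: float,
                     p_spam: float = 0.5) -> float:
    """Return P(spam | word) using Bayes' theorem.

    P(word) is expanded with the law of total probability:
    P(word) = P(word | spam) * P(spam) + P(word | ham) * P(ham)
    """
    p_ham = 1.0 - p_spam
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    if p_word == 0.0:
        raise ValueError("word never observed in either class")
    return p_word_given_spam * p_spam / p_word


# Hypothetical figures: the word appears in 20% of spam and 0.1% of
# legitimate email, and 80% of all incoming email is spam.
print(round(spam_probability(0.20, 0.001, p_spam=0.8), 4))  # 0.9988
```

The prior `p_spam` matters: the same word evidence yields a different posterior when spam makes up a different share of a user's mail.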
Process
Particular words have particular probabilities of occurring in spam email and in legitimate email. For instance, most email users will frequently encounter the word "Viagra" in spam email but will seldom see it in other email. Free Outlook Express spam blockers do not know these probabilities in advance and must first be trained so that they can build them up. To train the filter, the user must manually indicate whether each new email is spam or not. For all words in each training email, the filter adjusts the probabilities, stored in its database, that each word will appear in spam or in legitimate email. After training, a Bayesian spam filter will typically have learned a high spam probability for words such as "Viagra" and "refinance", but a low spam probability for words seen only in legitimate email, such as the names of friends and family members.
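A minimal sketch of this training step follows. It assumes a deliberately simple design that is not taken from any particular product: each training email is split into lower-case words, the filter counts how many spam and legitimate messages contain each word, and a word's spam probability is estimated from those counts with add-one smoothing and equal spam/legitimate priors. The class `WordStats`, its methods, and the sample messages are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Split text into lower-case words; each word counts once per message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class WordStats:
    """Hypothetical per-user word statistics learned from labelled email."""

    def __init__(self) -> None:
        self.spam_counts = Counter()  # word -> number of spam emails containing it
        self.ham_counts = Counter()   # word -> number of legitimate emails containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Update the counts with one email the user has labelled."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def word_spamicity(self, word: str) -> float:
        """Estimate P(spam | word) with add-one smoothing and equal priors."""
        p_w_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
        p_w_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
        return p_w_spam / (p_w_spam + p_w_ham)


stats = WordStats()
stats.train("Cheap Viagra, refinance now", is_spam=True)
stats.train("Refinance your mortgage with Viagra deals", is_spam=True)
stats.train("Lunch with Alice on Friday?", is_spam=False)
stats.train("Alice sent the photos", is_spam=False)

print(stats.word_spamicity("viagra"))  # 0.75
print(stats.word_spamicity("alice"))   # 0.25
```

Smoothing keeps a word that has never been seen at a neutral 0.5 instead of an extreme 0 or 1, so a single unseen word cannot dominate the result.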
After training, the word probabilities (also known as likelihood functions) are used to compute the probability that an email with a particular set of words in it belongs to either group. Each word in the email contributes to the email's spam probability; this per-word contribution is called the posterior probability and is computed using Bayes' theorem. The per-word probabilities are then combined into a single spam probability for the whole email, and if that probability exceeds a certain threshold (say 95%), the filter marks the email as spam. Email marked as spam can then be moved automatically to a "Junk" email folder, or even deleted outright.
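One common way to combine the per-word probabilities, and the one Paul Graham described in "A Plan for Spam", is the naive Bayes product formula used below. It assumes that words occur independently and that spam and legitimate email are equally likely. The sketch applies the 95% threshold from the text; the function names and example probabilities are illustrative assumptions, and real filters differ in how they select and combine words.

```python
import math


def combine(word_probs: list[float]) -> float:
    """Combine per-word spam probabilities into one message probability.

    p = prod(p_i) / (prod(p_i) + prod(1 - p_i))
    Assumes independent words, equal priors, and each p_i strictly in (0, 1).
    """
    spam = math.prod(word_probs)
    ham = math.prod(1.0 - p for p in word_probs)
    return spam / (spam + ham)


def classify(word_probs: list[float], threshold: float = 0.95) -> str:
    """Route the email to "junk" if its combined probability exceeds the threshold."""
    return "junk" if combine(word_probs) > threshold else "inbox"


print(classify([0.99, 0.95, 0.40]))  # junk
print(classify([0.60, 0.30, 0.05]))  # inbox
```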
Advantages
One advantage of Bayesian spam filters for Outlook Express is that they can be trained on a per-user basis, although proponents of the Markovian discrimination used in CRM114 claim that it is considerably more accurate.
The spam and email fraud that a user receives is often related to that user's online activities. For example, a user may have subscribed to an online newsletter that the user now considers to be spam. This newsletter is likely to contain words that are common to every issue, such as the name of the newsletter and its originating email address. A Bayesian spam filter will eventually assign these words a higher spam probability, based on the user's specific patterns.
The legitimate emails a user receives will tend to be different. For example, in a corporate environment, the company name and the names of clients or customers will be mentioned often, so the filter will assign a lower spam probability to emails containing those names.
The word probabilities are unique to each user and can evolve over time through corrective training whenever the filter misclassifies an email. As a result, naive Bayesian filtering is often more accurate after training than pre-defined rules.
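Corrective training can be sketched as follows. The sketch assumes, hypothetically, that the filter has already trained on each incoming message under its own automatic label, so correcting a mistake means removing the message's words from the wrong class and adding them to the right one. Each user would get a separate instance, which is what makes the probabilities user-specific. `UserFilter`, its methods, and the newsletter words are illustrative names only.

```python
from collections import Counter


class UserFilter:
    """Hypothetical per-user word-count model that learns from corrections.

    Counts are per message, priors are assumed equal, and add-one
    smoothing keeps unseen words at a neutral 0.5.
    """

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}  # True = spam
        self.totals = {True: 0, False: 0}

    def learn(self, words: set[str], is_spam: bool) -> None:
        self.counts[is_spam].update(words)
        self.totals[is_spam] += 1

    def unlearn(self, words: set[str], is_spam: bool) -> None:
        self.counts[is_spam].subtract(words)
        self.totals[is_spam] -= 1

    def spamicity(self, word: str) -> float:
        p_spam = (self.counts[True][word] + 1) / (self.totals[True] + 2)
        p_ham = (self.counts[False][word] + 1) / (self.totals[False] + 2)
        return p_spam / (p_spam + p_ham)

    def correct(self, words: set[str], wrong_label: bool) -> None:
        """Move a misclassified message from the wrong class to the right one."""
        self.unlearn(words, wrong_label)
        self.learn(words, not wrong_label)


f = UserFilter()
newsletter = {"weekly", "digest", "acme"}
f.learn(newsletter, is_spam=False)         # filter auto-filed it as legitimate
print(round(f.spamicity("acme"), 2))       # 0.43
f.correct(newsletter, wrong_label=False)   # user marks it as junk
print(round(f.spamicity("acme"), 2))       # 0.57
```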
Bayesian filtering can perform particularly well at avoiding false positives, where legitimate email is incorrectly classified as spam. For example, if a legitimate email contains the word "Nigeria", which appeared frequently in a long-running spam campaign, a pre-defined rules filter might reject it outright. A Bayesian filter would mark the word "Nigeria" as a probable spam word but would also take into account other important words that usually indicate legitimate email. For example, the name of a spouse may strongly indicate that the email is not spam, which could outweigh the presence of "Nigeria".
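The "Nigeria" example can be worked through numerically. The sketch below combines per-word probabilities by summing their log-odds, which is algebraically equivalent to the naive Bayes product formula under equal priors but avoids numerical underflow on long messages. The probabilities assigned to "nigeria" and to the spouse's name are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def spam_probability(word_probs: dict[str, float]) -> float:
    """Combine per-word probabilities by summing log-odds (equal priors assumed)."""
    score = sum(logit(p) for p in word_probs.values())
    return 1.0 / (1.0 + math.exp(-score))  # logistic function maps back to (0, 1)


# Hypothetical learned values for one user; "margaret" stands in for the spouse's name.
email = {"nigeria": 0.90, "margaret": 0.01}
p = spam_probability(email)
print(f"{p:.3f}")  # 0.083
print(p > 0.95)    # False: the spouse's name outweighs "nigeria"
```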
Some Bayesian spam filters for Outlook Express combine the results of Bayesian spam filtering with pre-defined rules, leading to even higher filtering accuracy. Recent spammer tactics include inserting random innocuous words that are not normally associated with spam, thereby decreasing the email's spam score and making it more likely to slip past a Bayesian spam filter.
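The effect of inserting innocuous words can be illustrated with the naive Bayes combination rule (equal priors, independent words). In this hypothetical example, two strongly spammy words push the message well past a 95% threshold, and five padding words, each with an assumed spam probability of 0.2, pull it back below. All probabilities are invented for illustration.

```python
import math


def combine(word_probs: list[float]) -> float:
    """Naive Bayes combination of per-word spam probabilities (equal priors)."""
    spam = math.prod(word_probs)
    ham = math.prod(1.0 - p for p in word_probs)
    return spam / (spam + ham)


THRESHOLD = 0.95
spam_words = [0.99, 0.95]  # hypothetical: e.g. "viagra", "refinance"
innocuous = [0.20] * 5     # hypothetical padding words inserted by the spammer

before = combine(spam_words)
after = combine(spam_words + innocuous)
print(f"{before:.4f} flagged={before > THRESHOLD}")  # 0.9995 flagged=True
print(f"{after:.4f} flagged={after > THRESHOLD}")    # 0.6475 flagged=False
```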