Abstract:
Commercial advertisements and virus-infected files delivered by electronic mail (e-mail) have been a major source of lost time and resources over the past decade. E-mail filtering, treated as a binary text classification problem, has recently attracted the attention of many researchers. In this study, we seek a filtering solution that automatically classifies e-mails as either spam or legitimate. To make this classification automatic and efficient, we use Bayesian stochastic algorithms together with heuristic methods based on machine learning. Our approach also incorporates several novel ideas from Information Retrieval.