Auto Classify Emails


shafaki
Emails can easily be classified based on simple rules such as words in the subject, or according to sender and so on.

How about classifying emails or messages (in forums perhaps) based also on their content, not simply by checking for the presence of certain words, but by analyzing them and then sorting them into one of a set of distinct categories?

I would like to know of available programs that do this sort of automatic classification of messages/emails as they come in. Please give me links, and perhaps your own comments if you have tried any of them.

angoranimi
procmail! formail! and your favourite scripting language.

There will first be a pattern-matching phase, done by procmail, then a classification phase, in which you might optionally edit or add an email header tag, and then finally the sorting, done by piping the output to your script, which then does whatever you need: inserting the messages into a database, for example.
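A minimal sketch of the classification and header-tagging step, in Python. The keyword lists, the `X-Category` header name and the function names are assumptions made up for illustration, not part of any existing tool; a real setup would replace `classify` with something smarter.

```python
import sys
from email import message_from_string

# Hypothetical keyword lists, for illustration only.
CATEGORY_KEYWORDS = {
    "marketing": ("earn money", "referral", "pyramid"),
    "political": ("election", "parliament"),
}


def classify(text):
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "general"


def tag_message(raw):
    """Parse a raw message, add an X-Category header, return it as text."""
    msg = message_from_string(raw)
    payload = msg.get_payload()
    # Multipart messages return a list of parts; this sketch only looks at
    # the body of simple single-part messages and otherwise uses the subject.
    body = payload if isinstance(payload, str) else ""
    category = classify(msg.get("Subject", "") + "\n" + body)
    del msg["X-Category"]  # drop any tag a previous run left behind
    msg["X-Category"] = category
    return msg.as_string()


def main():
    """Read one message on stdin and write the tagged copy to stdout."""
    sys.stdout.write(tag_message(sys.stdin.read()))
```

To use it as a filter, the script would call `main()` at the bottom and be invoked from a procmail filter recipe (flags `f` and `w`); later recipes could then sort on the `X-Category` header.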

A vague answer for a relatively vague question. What kind of sorting are you talking about?

shafaki
PROBLEM 1
Admins of online forums have a hard time moderating their forums. To start with, they try to exclude the following kinds of messages:

1- chain marketing
2- political
3- religious

It might look like spam stopping, yet it's not exactly the same.

If a sorting mechanism were developed to sort messages into the above-mentioned and other categories, it would make the life of the forum admin much easier. He could just glance over the already categorised messages and easily approve the right ones.

PROBLEM 2
The second problem is less easy to solve. In practice, a single forum usually receives several types of messages (all acceptable and legal), yet members have different interests. If there were a way to automatically sort the messages coming to the forum, it would be much easier for members to quickly find the messages that interest them and leave alone the ones that are of little interest to them. This is especially beneficial in forums that receive a large number of messages per day. Without this solution, one resorts either to wasting a lot of time on the tedious task of checking each message to figure out whether it fits one's frame of interest, or to leaving the whole forum and not checking it at all.

PROBLEM 3
My dream is to automate or semi-automate the process of creating FAQs from forum messages. Millions of questions are asked again and again over the net. Why not create some sort of AnswerOnce (my dream) software that would help in reducing this redundancy?

I'll be happy to hear down-to-earth comments on either current software or prospects of creating such a piece of software.

alaa
You need to check some of these Bayesian-algorithm-based mail filters. You can train them by sorting your mail manually; after a few hundred emails the program should have a fair idea of the classification you want.
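A rough sketch of how such a trainable filter works, in Python: a small multinomial naive Bayes classifier with add-one smoothing. The class and method names are made up for illustration, and real Bayesian mail filters differ in tokenization and scoring details.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> messages seen
        self.vocab = set()                       # every word seen in training

    def train(self, text, category):
        """Record one manually sorted message under its category."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the most probable category, or None if untrained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category in self.doc_counts:
            # Log prior: share of training messages in this category.
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best
```

Each manually sorted message is passed to `train` with its category; `classify` then picks the category with the highest score for a new message. With only a handful of training messages the result is unreliable, which matches the "few hundred emails" estimate above.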

I doubt, however, that there is a solution that will completely fill your need; you might need to do lots of hacking (this is pure AI, so don't expect it to be easy).

cheers,
Alaa

shafaki
Specifically, it is NLP in AI.

As for training, one can use neural networks, but the trick will always be picking and obtaining the criteria on which the training is to be based.
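One common choice of criteria is a bag-of-words vector: each message becomes a fixed-length list of word counts that a network (or any other learner) can take as input. The sketch below is only an illustration in Python; the function names and the `max_words` cutoff are assumptions, not taken from any particular package.

```python
import re


def build_vocabulary(messages, max_words=1000):
    """Map the most frequent words in the training messages to vector indices."""
    counts = {}
    for text in messages:
        for word in re.findall(r"[a-z']+", text.lower()):
            counts[word] = counts.get(word, 0) + 1
    # Sort by descending frequency, then alphabetically for a stable order.
    most_common = sorted(counts, key=lambda w: (-counts[w], w))[:max_words]
    return {word: index for index, word in enumerate(most_common)}


def to_feature_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    vector = [0.0] * len(vocabulary)
    for word in re.findall(r"[a-z']+", text.lower()):
        index = vocabulary.get(word)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] += 1.0
    return vector
```

Word counts throw away word order and meaning, so they are a starting point rather than an answer to the criteria problem.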

alaa
I'm not sure neural networks would be the right choice here, since we are talking about natural language processing and complex text pattern recognition.
You need to be able to use higher-level tools like regular expressions. The way I understand it, neural networks work best when the task can be divided into very small, goal-oriented modules (but I'm probably wrong about that).

Anyway, a Bayesian algorithm is what is used, and AFAIK it is not a neural network (or am I wrong about that too?).

cheers,
Alaa

shafaki
I've only implemented the backpropagation neural net algorithm, and failed to implement some of the other neural net ones. Perhaps backpropagation would be fine here. I've seen research in NLP that used it.

Cheers,