Progress so far

April 15, 2010
by Kwaku Yeboah-Antwi (ky10)

Unfortunately, Kwaku's hard drive died, so we lost some work.

Still, here is what we’ve done so far.

  • Grab list of emails
  • Read emails onto the stack
    • Each email is concatenated into one (super) long string
    • Each email will be a spam email
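    • (define read                    ; note: this shadows Scheme's built-in read
        (lambda (type)
          (lambda (state)
            ;; Collect every character from `input` into one string, then push
            ;; that string onto the `type` stack of `state`.
            ;; Assumes `input` is an open input port defined elsewhere.
            (let loop ((temp ""))
              (if (eof-object? (peek-char input))
                  (push temp type state)
                  (loop (string-append temp (string (read-char input)))))))))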
  • GP will try to evolve regexes for the emails (in progress)
    • One problem we have to figure out is how to keep our system from evolving one gigantic catch-all regex (which just fails horribly)
    • Have some ideas (actually not really) but have to talk to Lee
  • Fitness function will apply each regex to another sample of emails (in progress)
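
The fitness step could look something like the sketch below. This is only one possible shape, not what we have actually built: it assumes Racket, and it assumes we also have a sample of non-spam ("ham") emails so that a catch-all regex gets penalized. The names match-rate, regex-error, held-out-spam and held-out-ham are made up for illustration.

```scheme
#lang racket
;; Hypothetical sketch: score a candidate regex against held-out emails.
;; `spam` and `ham` are lists of strings; all names here are illustrative.

;; Fraction of `emails` that `pattern` matches (0 for an empty list).
(define (match-rate pattern emails)
  (if (null? emails)
      0
      (/ (length (filter (lambda (email) (regexp-match? pattern email)) emails))
         (length emails))))

;; Error in [0, 1]: 0 means every spam email matched and no ham email did.
;; A malformed pattern gets the worst score instead of crashing the run.
(define (regex-error pattern-string spam ham)
  (with-handlers ([exn:fail? (lambda (e) 1)])
    (let ([pattern (regexp pattern-string)])
      (/ (+ (- 1 (match-rate pattern spam))   ; share of spam that was missed
            (match-rate pattern ham))         ; share of ham wrongly matched
         2))))

(define held-out-spam '("Buy cheap pills now" "cheap watches for you"))
(define held-out-ham  '("Meeting moved to 3pm" "Lunch tomorrow?"))

(regex-error "cheap" held-out-spam held-out-ham)  ; 0   (all spam, no ham)
(regex-error ".*" held-out-spam held-out-ham)     ; 1/2 (catch-all hits ham too)
(regex-error "(" held-out-spam held-out-ham)      ; 1   (malformed pattern)
```

With spam-only samples the catch-all ".*" would score perfectly, which is the failure described above; scoring against ham as well is one way (an assumption on our part) to make that regex unattractive.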

New Timeline

April 7, 2010
by Edward Alexander-Gill (eja08)

This Week:
– Finish implementing RegEx as a stack in Schush
– Acquire list of emails that are classified as spam (either from database or from actual email collection as marked)

Next Week:
– Figure out a way of extracting relevant lexical information from email bodies, and how to divide it into strings
– Decide on a set of operations and code modification types that would be useful for creating a filter
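
As a starting point for dividing an email body into strings, here is a minimal sketch, assuming Racket. The helpers email->words and email->vocabulary are hypothetical names, and treating everything except letters, digits and apostrophes as a separator is just one simple choice.

```scheme
#lang racket
;; Hypothetical helper (not part of Schush): split an email body into
;; lowercase word strings. Anything that is not a letter, digit, or
;; apostrophe is treated as a separator.
(define (email->words body)
  (regexp-match* #rx"[a-z0-9']+" (string-downcase body)))

;; Unique words in first-seen order; one possible pool of literal strings
;; for a GP run to draw from.
(define (email->vocabulary body)
  (remove-duplicates (email->words body)))

(email->words "FREE pills!!! Free shipping, it's cheap")
;; '("free" "pills" "free" "shipping" "it's" "cheap")
(email->vocabulary "FREE pills!!! Free shipping, it's cheap")
;; '("free" "pills" "shipping" "it's" "cheap")
```

Lowercasing throws away information (spam often SHOUTS), so a real version might want to keep the original case as well.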

Week after that:
– Try to get something running in some way
– Pray that it works

Week after after that:
– Find out a way to realistically implement it
– Analyze methods that were found using Genetic Methods

Last Week?
– Compile findings
– Package everything somehow (finding a way to implement it, put it online for other people to use, etc.)


Tentative Schedule

March 30, 2010
by Kwaku Yeboah-Antwi (ky10)

We are both learning Scheme
– We need to budget some time to get conversant with Scheme

(WEEK 1)
– We need to figure out where to implement it
(Thunderbird/Evolution/Outlook/SpamAssassin/spamd?)

– Need to figure out how email spam flagging is done
  – Check out the SpamAssassin documentation
  – X-Spam-Flag, X-Spam-Score, X-Spam-Status
– Keep learning Scheme
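
To get a feel for those headers, here is a rough sketch, assuming Racket, of pulling the flag and score out of a raw message. The function names and the sample message are made up, and the "score=..." layout of X-Spam-Status is an assumption about how the mail was tagged; check it against real messages. Folded (multi-line) headers are not handled.

```scheme
#lang racket
;; Hypothetical helpers for reading spam headers from a raw message string.

;; Value of the first header called `name` (case-insensitive), or #f.
(define (header-value name message)
  (for/or ([line (in-list (string-split message "\n"))])
    (let ([m (regexp-match #rx"^([^:]+):[ \t]*(.*)$" line)])
      (and m
           (string-ci=? (cadr m) name)
           (string-trim (caddr m))))))     ; also drops a trailing \r

;; #t when X-Spam-Flag is present and says YES.
(define (flagged-as-spam? message)
  (let ([flag (header-value "X-Spam-Flag" message)])
    (and flag (string-ci=? flag "YES"))))

;; Number after "score=" in X-Spam-Status, or #f if it is not there.
(define (spam-score message)
  (let* ([status (header-value "X-Spam-Status" message)]
         [m (and status (regexp-match #rx"score=(-?[0-9.]+)" status))])
    (and m (string->number (cadr m)))))

;; Made-up sample message for illustration only.
(define sample
  (string-append "From: someone@example.com\r\n"
                 "X-Spam-Flag: YES\r\n"
                 "X-Spam-Status: Yes, score=7.2 required=5.0\r\n"
                 "\r\n"
                 "Buy now\r\n"))

(flagged-as-spam? sample)   ; #t
(spam-score sample)         ; 7.2
```

Messages sorted this way could give us the labeled spam list we need without marking everything by hand.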

(WEEK 2)
We need to research and find prior work on this.
We should be conversant enough with Scheme to start serious work by now.
Come up with a flowchart of how exactly this should be implemented.

(WEEK 3)
Modify Schush to support regexps and strings
Find a way to generate a population of words
Fake it

(WEEK 4)

More programming

(WEEK 5)

More programming

(WEEK 6)
Even more programming
Present solution (hopefully)

– We need to get a list of emails (preferably from a personal inbox, spam and all)

– Woohoo, actually write it