Progress so far

April 15, 2010
by Kwaku Yeboah Antwi (ky10)

Unfortunately, Kwaku’s hard drive died, so we lost some work.

Here is what we’ve done so far:

  • Grab a list of emails
  • Read the emails onto a stack
    • Each email is concatenated into one (super) long string
    • Each email will be a spam email
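    • (define read-email                ; not `read`, to avoid shadowing the built-in
        ;; Returns a Schush instruction that reads the current input port up to
        ;; EOF into one string and pushes it onto the stack for `type`.
        ;; `push` is assumed to be the Schush helper (push value type state).
        (lambda (type)
          (lambda (state)
            (let loop ((chars '()))
              (if (eof-object? (peek-char (current-input-port)))
                  (push (list->string (reverse chars)) type state)
                  (loop (cons (read-char (current-input-port)) chars)))))))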
  • GP will try to evolve regexes for the emails (in progress)
    • One problem we still have to solve is how to make sure our system doesn’t evolve a single gigantic catch-all regex (which fails horribly)
    • We have some ideas (actually, not really), but we need to talk to Lee
  • The fitness function will apply each regex to a separate sample of emails (in progress)
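A minimal sketch of such a fitness function, under assumptions that are not part of our plan yet: the evolved regex is represented as a plain predicate `matches?` (standard Scheme has no regular-expression library, so a real run would wrap whatever regexp procedure the Scheme implementation provides), and the held-out sample contains both spam and non-spam ("ham") emails. `classification-fitness` and `count-if` are hypothetical names. Scoring the ham as well is one possible guard against the catch-all regex problem, because a pattern that matches everything gets every ham email wrong.

```scheme
;; Number of elements of lst that satisfy pred.
(define (count-if pred lst)
  (let loop ((lst lst) (n 0))
    (cond ((null? lst) n)
          ((pred (car lst)) (loop (cdr lst) (+ n 1)))
          (else (loop (cdr lst) n)))))

;; Fraction of held-out emails a candidate classifies correctly.
;; ASSUMPTION: `matches?` stands in for the evolved regex (a predicate on
;; email strings); `spam` and `ham` are lists of email bodies.
(define (classification-fitness matches? spam ham)
  (let ((total (+ (length spam) (length ham)))
        (correct (+ (count-if matches? spam)                           ; spam caught
                    (count-if (lambda (e) (not (matches? e))) ham))))  ; ham left alone
    (if (= total 0)
        0
        (/ correct total))))
```

With this scoring, a regex that matches everything earns only the share of spam in the sample (1/2 on a balanced sample), so any pattern that actually separates the two classes ranks above it.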

New Timeline

April 7, 2010
by Edward Alexander-Gill (eja08)

This Week:
– Finish implementing a RegEx stack in Schush
– Acquire a list of emails classified as spam (either from a database or from a real mailbox where they have been marked as spam)
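For reference, a minimal sketch of what a separate RegEx stack can look like. In Push-style systems each data type gets its own stack; here the interpreter state is assumed to be an association list from type symbols (such as `regex` or `string`) to lists used as stacks. Schush’s real state representation may differ, and `make-state`, `push` and `top` are hypothetical names, although `push` takes its arguments in the same order (value, type, state) as the `push` call in the April 15 post.

```scheme
;; ASSUMPTION: state is an alist mapping a type symbol to a list (top = car).
(define (make-state . types)
  (map (lambda (t) (cons t '())) types))

;; Returns a new state with `value` pushed onto the stack for `type`.
(define (push value type state)
  (map (lambda (entry)
         (if (eq? (car entry) type)
             (cons type (cons value (cdr entry)))
             entry))
       state))

;; Top of the stack for `type`, or #f if that stack is empty.
;; `type` must be one of the types passed to make-state.
(define (top type state)
  (let ((stack (cdr (assq type state))))
    (if (null? stack) #f (car stack))))
```

Keeping regexes on their own stack means an instruction that expects a regex only ever pops from `regex`, so it never receives a plain string by accident.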

Next Week:
– Figure out how to extract relevant lexical information from email bodies and how to divide it into strings
– Decide on a set of operations and code modification types that would be useful for creating a filter
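One simple way to divide an email body into strings is to split it on every character that is not a letter or digit and lowercase the pieces. This is a sketch of a plain tokenizer, not a decision we have made; `tokenize` is a hypothetical name.

```scheme
;; Split text into lowercase words, treating any character that is not a
;; letter or digit as a separator. Empty pieces are dropped.
(define (tokenize text)
  (let loop ((chars (string->list text)) (word '()) (words '()))
    ;; Add the word being built (stored reversed) to `words`, if non-empty.
    (define (flush)
      (if (null? word)
          words
          (cons (list->string (reverse word)) words)))
    (cond ((null? chars) (reverse (flush)))
          ((or (char-alphabetic? (car chars)) (char-numeric? (car chars)))
           (loop (cdr chars) (cons (char-downcase (car chars)) word) words))
          (else (loop (cdr chars) '() (flush))))))
```

For example, `(tokenize "Buy NOW, cheap V1agra!!")` gives `("buy" "now" "cheap" "v1agra")`. Keeping digits inside words preserves obfuscated spellings like this one, which a regex might want to target.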

Week after that:
– Try to get something running in some way
– Pray that it works

The following week:
– Figure out a realistic way to implement it
– Analyze the methods found using genetic programming

Last Week?
– Compile findings
– Package everything somehow (find a way to implement it, put it online for other people to use, etc.)

-Ted


Tentative Schedule

March 30, 2010
by Kwaku Yeboah Antwi (ky10)

WEEK 0 – SPRING BREAK
We are both learning Scheme
– We need to budget some time to become conversant with Scheme

(WEEK 1)
– We need to figure out where to implement it (Thunderbird/Evolution/Outlook/SpamAssassin/spamd?)

– Need to figure out how email spam flagging is done
  – Check out the SpamAssassin documentation
  – X-Spam-Flag, X-Spam-Score, X-Spam-Status headers
– Keep learning Scheme
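SpamAssassin marks a message it scores as spam by adding an `X-Spam-Flag: YES` header and reports its verdict and score in `X-Spam-Status`. A minimal sketch of reading that flag back, assuming the raw message is a list of lines with line endings already stripped and that the headers end at the first empty line; folded (multi-line) headers are ignored. `spam-flagged?`, `header-named?` and `trim-whitespace` are hypothetical names.

```scheme
;; Strip leading and trailing whitespace.
(define (trim-whitespace s)
  (let loop ((start 0) (end (string-length s)))
    (cond ((and (< start end) (char-whitespace? (string-ref s start)))
           (loop (+ start 1) end))
          ((and (< start end) (char-whitespace? (string-ref s (- end 1))))
           (loop start (- end 1)))
          (else (substring s start end)))))

;; Does `line` start with "name:"? Header names are case-insensitive.
(define (header-named? line name)
  (let ((n (string-length name)))
    (and (> (string-length line) n)
         (string-ci=? (substring line 0 n) name)
         (char=? (string-ref line n) #\:))))

;; #t if the header block (lines before the first empty line) contains
;; "X-Spam-Flag: YES". ASSUMPTION: no folded headers, no trailing CRs.
(define (spam-flagged? lines)
  (let ((name "X-Spam-Flag"))
    (let loop ((lines lines))
      (cond ((or (null? lines) (string=? (car lines) "")) #f)
            ((header-named? (car lines) name)
             (let ((line (car lines)))
               (string-ci=? "YES" (trim-whitespace
                                   (substring line (+ (string-length name) 1)
                                              (string-length line))))))
            (else (loop (cdr lines)))))))
```

Stopping at the first empty line matters: the same text appearing in the body (for example in a forwarded message) should not count as a flag.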

(WEEK 2)
We need to research and find prior work on this.
We should be conversant enough with Scheme to start serious work by now.
Come up with a flowchart of how exactly this should be implemented.

(WEEK 3)
Modify Schush to support regexps and strings
Find a way to generate a population of words
Fake it
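A sketch of faking an initial population, assuming each candidate is a regex string that joins a few words from a seed vocabulary with `|` (alternation). Portable Scheme has no standard random-number procedure, so this uses a small linear congruential generator with the Numerical Recipes constants to stay self-contained; a real run would use the implementation’s own generator. `random-population`, `lcg-next`, `join-strings` and the pattern shape are all hypothetical.

```scheme
;; One step of a linear congruential generator (Numerical Recipes constants).
(define (lcg-next seed)
  (modulo (+ (* 1664525 seed) 1013904223) 4294967296))

;; Concatenate strs, putting sep between neighbours.
(define (join-strings strs sep)
  (if (null? strs)
      ""
      (let loop ((acc (car strs)) (rest (cdr strs)))
        (if (null? rest)
            acc
            (loop (string-append acc sep (car rest)) (cdr rest))))))

;; Build `size` candidate patterns, each an alternation of `k` words drawn
;; (with replacement) from the vector `vocab`, e.g. "cash|free".
(define (random-population vocab size k seed)
  (let outer ((n size) (seed seed) (pop '()))
    (if (= n 0)
        (reverse pop)
        (let inner ((j k) (seed seed) (words '()))
          (if (= j 0)
              (outer (- n 1) seed (cons (join-strings words "|") pop))
              (let* ((next (lcg-next seed))
                     ;; use the high bits: an LCG's low bits cycle quickly
                     (i (modulo (quotient next 65536) (vector-length vocab))))
                (inner (- j 1) next (cons (vector-ref vocab i) words))))))))
```

Passing the seed explicitly also makes runs reproducible: the same vocabulary, sizes and seed always give the same starting population, which helps when comparing fitness functions.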

(WEEK 4)

More programming

(WEEK 5)

More programming

(WEEK 6)
Even more programming
Present the solution (hopefully)

– We need to get a list of emails (preferably from a personal inbox, spam and all)

– Woohoo, actually write it