Thursday, August 23, 2007

E-Discovery Searches are Inadequate

Many e-discovery efforts focus on two things: date range searches and searches for email addresses. I'd like to suggest that these are inadequate, and to describe what you can do to actually find the messages you're looking for.

The main problem with searching addresses is that they are not normalized. They come in myriad formats:




These are just examples; there are others. The point is that all of these refer to the same user. Conversely, you might end up with different users sharing the same address (which Joe Smith were we referring to during the three-year period covered by the e-discovery?).
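As a rough illustration of the normalization problem, here is a minimal Python sketch. The address variants are hypothetical stand-ins (they are not the examples from the original post), `normalize_address` is a name made up for this sketch, and lowercasing the whole address is an assumption that suits most mail systems but is not guaranteed by the standards.

```python
from email.utils import parseaddr


def normalize_address(raw: str) -> str:
    """Reduce a header-style address to a bare, lowercase address.

    Assumption: the archive treats addresses case-insensitively.
    """
    _display_name, addr = parseaddr(raw)
    return addr.strip().lower()


# Hypothetical variants of one user's address, as they might appear in headers.
variants = [
    "jsmith@example.com",
    "Joe Smith <jsmith@example.com>",
    '"Smith, Joe" <JSmith@Example.com>',
    "<jsmith@example.com>",
]

# All four variants collapse to a single searchable key.
normalized = {normalize_address(v) for v in variants}
```

Normalizing this way only solves the formatting half of the problem. It does nothing to tell two different Joe Smiths apart when they have used the same address at different times.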

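Dates in email are essentially arbitrary, because the Date header is whatever the sending system claims, unless you are referring to the dates in the Received lines, which are added by the mail servers that handled the message. Alternatively, you could keep meta-information about the email, such as the date that it was delivered to the journal.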
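To make the difference between the two kinds of dates concrete, here is a small Python sketch that reads the timestamp from the topmost Received line (the most recent hop, since each server adds its line at the top) and compares it with the Date header. The message, the host names, and the function name `delivery_time` are all invented for illustration, and the sketch assumes the Received line ends with a semicolon followed by a timestamp, which is the usual layout.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message: the Date header claims 1990, the servers say 2007.
RAW = (
    "Received: from mail.example.org (mail.example.org [192.0.2.10])\n"
    "\tby mx.example.com with ESMTP id abc123;\n"
    "\tThu, 23 Aug 2007 09:15:02 -0400\n"
    "Received: from laptop.example.org (unknown [198.51.100.7])\n"
    "\tby mail.example.org with ESMTP id def456;\n"
    "\tThu, 23 Aug 2007 09:14:58 -0400\n"
    "Date: Mon, 01 Jan 1990 00:00:00 +0000\n"
    "From: Joe Smith <jsmith@example.org>\n"
    "To: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def delivery_time(msg):
    """Return the timestamp of the topmost Received line, or None."""
    received = msg.get_all("Received") or []
    if not received:
        return None
    # The timestamp follows the last semicolon in the header value.
    _, _, stamp = received[0].rpartition(";")
    return parsedate_to_datetime(stamp.strip())


msg = message_from_string(RAW)
claimed = parsedate_to_datetime(msg["Date"])  # what the sender claims
delivered = delivery_time(msg)                # what the receiving server recorded
```

A date range search over `claimed` would miss this message entirely, while a search over `delivered` would find it.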

Emails need to have the contextual information that is current at the time of archive insertion applied to them. At a minimum, I would suggest inserting a unique identifier for the user (something like an employee ID), what department the user is in, whether the user is an executive, whether the email contains potentially proprietary information, and whether the email is potentially privileged.
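One way this could look in practice is sketched below in Python. Everything here is hypothetical: the directory entries, the employee IDs, and the names `DirectoryEntry`, `ArchiveContext`, `lookup`, and `build_context` are made up for illustration, and how an email gets judged proprietary or privileged is left outside the sketch and simply passed in as flags. The point is that the lookup uses the date of archive insertion, so two people who held the same address at different times resolve to different employee IDs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DirectoryEntry:
    employee_id: str
    address: str
    department: str
    is_executive: bool
    valid_from: date
    valid_to: Optional[date]  # None means the entry is still current


@dataclass(frozen=True)
class ArchiveContext:
    employee_id: str
    department: str
    is_executive: bool
    potentially_proprietary: bool
    potentially_privileged: bool


# Hypothetical directory: two different people held the same address.
DIRECTORY = [
    DirectoryEntry("E1001", "jsmith@example.com", "Sales", False,
                   date(2004, 1, 5), date(2005, 6, 30)),
    DirectoryEntry("E2417", "jsmith@example.com", "Legal", True,
                   date(2006, 2, 1), None),
]


def lookup(address: str, on: date) -> Optional[DirectoryEntry]:
    """Find who held the address on the given date, if anyone."""
    for entry in DIRECTORY:
        started = entry.valid_from <= on
        not_ended = entry.valid_to is None or on <= entry.valid_to
        if entry.address == address.lower() and started and not_ended:
            return entry
    return None


def build_context(address: str, archived_on: date,
                  proprietary: bool, privileged: bool) -> Optional[ArchiveContext]:
    """Build the record to store alongside the email at insertion time."""
    entry = lookup(address, archived_on)
    if entry is None:
        return None
    return ArchiveContext(entry.employee_id, entry.department,
                          entry.is_executive, proprietary, privileged)
```

With a record like this stored next to each message, a later search can ask for "everything from employee E2417" or "everything from executives" rather than guessing at address strings.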

It would also be a good time to set retention policies and flag non-business mail, but that's a discussion for another day.

