Yesterday I received a press release from Nuix (EDRM sent out a similar release) announcing that Nuix and EDRM had republished the EDRM Enron PST Data Set after cleansing it of private, health and personal financial information.
A portion of the Nuix release said:
"The EDRM Enron data set is an industry-standard collection of email data that the legal profession has used for many years for eDiscovery training and testing. However, it was well known to contain large amounts of personal information about the company’s former employees."
The only part of that paragraph I quibble with is "it was well known." It was certainly well known to those who used the data and to certain others in the EDD sector. But as this blog has indicated in previous posts, the extent of personal information in the data set was unknown to many.
Nonetheless, I applaud the Nuix folks and EDRM for cleaning more than 10,000 e-mails and attachments in the data set of such things as credit card numbers, Social Security numbers, dates of birth and other personal information.
To download the cleansed data set and the case study that explains the methodology used, visit here.
Nuix will host a Twitter chat to discuss the release of the cleansed EDRM Enron PST Data Set on Thursday, May 23, from 2 p.m. to 3 p.m. ET. Its experts will describe the process of identifying unsecured financial, health and personally identifiable information in corporate data. You can follow the hashtag #NuixChat and send your questions to @nuix beforehand.
E-mail: snelson@senseient.com Phone: 703-359-0700

