Proportionality in eDiscovery supported by metadata collection and analysis

Since the famous statement by Michael Hayden, former NSA and CIA director, 'We kill people based on metadata', the importance of metadata has been raised to another level. In eDiscovery, metadata can help to eliminate large amounts of data, apply proportionality and reduce the total cost of an investigation.

Proportionality, as set out in US Federal Rules of Civil Procedure 26(b)(2)(C), is a principle which ensures that eDiscovery costs are proportional to the size of a legal case. In other words, it might be of great value for a case to apply hundreds of keywords, incorporate many data sources, review all data, and so on; however, the burden this would place on a defendant would be out of proportion.

The need for proportionality is increasing because of the exploding capacity of storage systems, as well as the amount of data itself. The cost of one gigabyte of storage in 1985 was around $100,000. Ten years later, in 1995, the cost was a hundred times smaller, at around $1,000. This year, the average cost per gigabyte is around $0.02. This decrease in storage costs means that more data is being produced and stored each day, which increases the costs of eDiscovery. Proportionality can help to keep those costs under control.

During the pre-processing phase of each eDiscovery case, the collected ESI (electronically stored information) can be culled very efficiently based on the metadata and other data properties (like MD5 hashes). This requires that the right set of metadata is preserved before and during the collection. I have often seen that not enough attention was paid to the metadata during the collection. Sometimes the important information available in the metadata was not collected at all. The more information we have about the data, the easier it is to classify it and ultimately decide whether the data is of interest or not. Based on this experience, I would like to share some best practices on how to limit the total size of all the group share collections and the amount of data that is involved in the eDiscovery process:

Before starting the collection:

  • If there is a possibility (during an interview or via a questionnaire), ask the custodians where they save their data on the (corporate) network. This can be difficult when the custodian is not cooperative or has left the organization.
  • Ask the IT administrator to produce reports for the individual custodians, listing all the access permissions to the network shares. The reports typically contain only the current permissions set as historical snapshots are often not available. Such reports provide great value for eDiscovery practitioners.

During the collection:

  • Make sure to collect the files and folders with the original time stamps. These are often referred to as MAC times (modified, accessed and created timestamps).
  • Most live acquisition tools preserve the ownership and permissions metadata of file systems on network shares only partially, or not at all. For that reason it is necessary to request a list of permissions and ownership information for all data collected over the network (one can use the icacls / cacls tool in MS Windows environments or ls -l in Unix / Linux environments).
  • Make sure to always collect the whole path starting from the root of the share as this can also be valuable information for the targeted review of data.
  • Use a forensic tool for the data collection which supports long paths (i.e. paths exceeding the 260-character length limit). Most of the standard tools available for MS Windows, as well as the MS Windows API itself, usually skip files or directories with long names without even raising an error.
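
As an illustration of the collection points above, the following is a minimal Python sketch of a metadata manifest, not a replacement for a forensic tool. It records the path from the share root, the timestamps and basic ownership information for every file, and it collects errors explicitly so that skipped items do not go unnoticed. The function names, the field names and the CSV layout are assumptions made for this example. Note that st_ctime is a creation time only on MS Windows (on Unix / Linux it is the last metadata change), and that the numeric owner and the Unix-style mode string do not replace the ACL listing produced by icacls.

```python
import csv
import os
import stat
from datetime import datetime, timezone

# Column names are hypothetical; adapt them to the review platform in use.
FIELDS = ["path", "path_length", "size", "modified", "accessed",
          "ctime", "owner_id", "mode"]


def iso_utc(epoch_seconds):
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def describe_file(full_path, share_root):
    """Read file system metadata for one file without opening its content."""
    info = os.stat(full_path)
    return {
        # whole path, relative to the root of the share
        "path": os.path.relpath(full_path, share_root),
        # lets a reviewer spot paths near or beyond the 260-character limit
        "path_length": len(os.path.abspath(full_path)),
        "size": info.st_size,
        "modified": iso_utc(info.st_mtime),
        "accessed": iso_utc(info.st_atime),
        # creation time on MS Windows, last metadata change on Unix / Linux
        "ctime": iso_utc(info.st_ctime),
        # numeric owner id; always 0 on MS Windows
        "owner_id": info.st_uid,
        # Unix-style permission string such as -rw-r--r--
        "mode": stat.filemode(info.st_mode),
    }


def build_manifest(share_root):
    """Return (rows, errors); nothing is skipped silently."""
    rows, errors = [], []
    # onerror receives the OSError for every directory that cannot be listed
    for folder, _subfolders, file_names in os.walk(share_root,
                                                   onerror=errors.append):
        for name in sorted(file_names):
            try:
                rows.append(describe_file(os.path.join(folder, name),
                                          share_root))
            except OSError as error:
                errors.append(error)
    return rows, errors


def write_manifest(rows, csv_path):
    """Write the manifest rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

The manifest would be built before the files are copied, because reading file content can update the accessed timestamp on some systems. Any entry in the returned error list should be followed up rather than ignored.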

How can the (properly) collected metadata help with defining proportionality? How can it reduce the cost and time required for processing and review?

  1. It is a standard practice to reduce the data set based on the investigation time frame. This can be performed based on the MAC times of the files.
  2. Process data that:
    • the custodians indicated during the interview, or
    • the custodians had access to, based on the access report obtained from the IT administrator.
  3. It should be considered whether to process and review only files owned by a particular custodian or group of users (based on the file system information) or created by a custodian. The latter is usually based on the metadata from MS Office files and some other file formats which provide such metadata.
  4. Exclude files based on the path names as they can indicate a high probability of irrelevancy for the case.
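
The four steps above, together with the hash-based culling mentioned earlier, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each file is represented by a dictionary whose keys (path, modified, owner, author, md5) are hypothetical, the time frame is tested on the modified time only, and the sample records, user names and digests are invented placeholders.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def cull(records, start, end, custodians, accessible_roots, excluded_patterns):
    """Apply the four culling steps to a list of metadata records."""
    roots = tuple(root.lower() for root in accessible_roots)
    kept = []
    for record in records:
        path = record["path"].lower()
        # 1. investigation time frame (here: modified time only)
        if not (start <= record["modified"] <= end):
            continue
        # 2. locations the custodians named or had access to
        if not path.startswith(roots):
            continue
        # 3. owned by a custodian (file system) or authored by one (document)
        if (record["owner"] not in custodians
                and record["author"] not in custodians):
            continue
        # 4. path names that indicate likely irrelevance
        if any(fnmatchcase(path, pattern) for pattern in excluded_patterns):
            continue
        kept.append(record)
    return kept


def drop_exact_duplicates(records):
    """Keep the first record for each MD5 digest."""
    seen, unique = set(), []
    for record in records:
        if record["md5"] not in seen:
            seen.add(record["md5"])
            unique.append(record)
    return unique


def make(path, modified, owner, author, md5):
    """Build one sample record (placeholder data, not real case data)."""
    return {"path": path, "modified": modified, "owner": owner,
            "author": author, "md5": md5}


records = [
    make("Finance/Reports/q3.xlsx", datetime(2013, 5, 10), "jdoe", "jdoe", "aaa"),
    make("Shared/q3_copy.xlsx", datetime(2013, 5, 11), "jdoe", "jdoe", "aaa"),
    make("Finance/Reports/old.xlsx", datetime(2009, 1, 1), "jdoe", "jdoe", "bbb"),
    make("IT/Software/installer.exe", datetime(2013, 6, 1), "jdoe", "", "ccc"),
    make("HR/policy.docx", datetime(2013, 7, 1), "asmith", "asmith", "ddd"),
    make("Shared/memo.docx", datetime(2013, 8, 1), "admin", "jdoe", "eee"),
]

kept = cull(
    records,
    start=datetime(2013, 1, 1),
    end=datetime(2013, 12, 31),
    custodians={"jdoe"},
    accessible_roots=["Finance/", "Shared/", "IT/"],
    excluded_patterns=["it/software/*"],
)
unique = drop_exact_duplicates(kept)
```

In this sample, the 2009 file falls outside the time frame, the installer is excluded by its path, and the HR document belongs to neither an accessible location nor the custodian. The copy of the spreadsheet survives the four steps but is removed as an exact duplicate. Which timestamp to test, and whether ownership or authorship should decide, remain case-specific choices.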

Defining proportionality for a particular case can be a very challenging process. It involves both technical and legal aspects. The application of any of the steps mentioned above needs to be carefully considered and discussed between experienced legal and eDiscovery practitioners.

Should you be interested in more technical details, please contact us.