Use Predictive Coding to Find Responsive Information and Lower eDiscovery Costs

Phil Yaccino

Senior Strategist, Information Governance

The need to perform eDiscovery on electronically stored information (ESI) is becoming as regular as going to Starbucks® for coffee. Legal teams need to find ways to perform this function more quickly and easily than they do today. The process needs to be accurate, repeatable and fully audited. More and more legal teams are turning to Predictive Coding as an alternative to manual review. This article discusses what predictive coding is and how it can help turn a mountain of data into a molehill of information.

On a daily basis, legal teams need to review electronically stored information (“ESI”) for productions. This should be a streamlined process to identify responsive information. Yet participants in eDiscovery often find new and exciting ways to derail document review. Consider a typical employment discrimination case where the Defense took the following positions:

  • It would take 6-8 weeks for us to collect the requested ESI, which does not include reviewing the data for privilege, privacy, or confidentiality
  • Sanctions will be sought if the Plaintiff does not provide search terms by a date set by the Defense
  • The Defense is not under any obligation to provide electronic discovery unless and until there is full agreement on search terms
  • The Defense reserves the right to “shift all fees and costs incurred in the collection, review, and production of ESI to Plaintiff and Plaintiff’s counsel.”1

There were many problems with the Defense's view of its eDiscovery obligations, especially on cost-shifting. Most judges would take the Defense to task over the legally unsupportable position that “it is not under any obligation to provide electronic discovery unless and until there is full agreement on search terms.” Equally unsupportable is the threat to seek sanctions if the Plaintiff does not provide search terms by a date set by the Defense. Most courts would describe this conduct as neither professional nor a good faith meet and confer.

Courts will usually issue the following order for producing discovery:

  • The Defense will “immediately design and implement a discovery plan to diligently and in good faith produce documents consistent with this order and schedule its production on a rolling basis”
  • The Defense will produce responsive documents consistent with the discovery rules
  • The Defense will disclose the scope of its search, including “search terms, custodians or other limitations”
  • Electronically stored information shall be produced in a format that extracts the text and preserves the metadata from native files

This is an extreme case, but instead of turning discovery into trench warfare, it is usually cheaper to focus on setting up document review to produce responsive data.

Document review should further the goal of Federal Rule of Civil Procedure 1 to “secure the just, speedy, and inexpensive determination of every action and proceeding.” One way to meet that goal is to leverage predictive coding with first and second pass review in responding to discovery requests. It is up to the requestor to create a clear, specific and executable Request for Production (RFP).

Preparing for Predictive Coding Document Review

The goal is to make it easier to identify responsive ESI and, if necessary, to object with specificity. The Transparent Predictive Coding system learns the review criteria for classifying items as responsive, privileged, or any other issue code through input from expert reviewers. Essentially, you are training and teaching the system through reviewed control sets and training sets. Over time, the Transparent Predictive Coding system becomes:

  • Statistically stable, having learned enough to correctly predict and rank training sets of documents
  • Mature enough to be leveraged against your entire item population
  • Able to help you achieve high levels of review accuracy at significantly reduced time and cost

Preparing document review to be as efficient as possible takes multiple steps.

The system’s flexibility allows you to follow different workflows depending on your review and case requirements. For example, you may perform only some of the Transparent Predictive Coding steps, or perform the steps in a different sequence. Because the process is iterative, some steps or stages may be repeated a number of times to refine the approach as a better understanding of the data emerges. The following is a typical Predictive Coding workflow:

First Pass Document Review

Attorneys conducting “first pass document review” are doing the hard work of separating ESI that could be responsive from data that is irrelevant or non-responsive. The goal is to avoid having to wade through extensive documents to find what is relevant to the claims or defenses in the case.

One strategy for responding to discovery requests is for a case administrator or knowledgeable attorney to create saved searches for each discovery request.  These searches should be given names that are easily identifiable, such as “RFP001 – Contract Claims”.

The search terms for each request are taken directly from the request for production. If the requesting party has done its job right, it has crafted a narrowly tailored request for production that asks for ESI between specific individuals, within specific time periods, and about concepts that can be searched for in the database.
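
As a rough illustration, a saved search of this kind can be modeled as a named bundle of individuals, a date range, and search terms. The sketch below is an assumption for illustration only: the `Document` and `SavedSearch` classes, their fields, and the sample data are hypothetical and do not reflect any particular review platform's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sender: str
    recipients: list
    sent: date
    text: str


@dataclass
class SavedSearch:
    """Hypothetical saved search built from one request for production."""
    name: str
    custodians: set   # individuals named in the request
    start: date       # start of the requested time period
    end: date         # end of the requested time period
    terms: list       # phrases taken from the request

    def matches(self, doc: Document) -> bool:
        people = {doc.sender, *doc.recipients}
        involves_custodian = bool(people & self.custodians)
        in_period = self.start <= doc.sent <= self.end
        body = doc.text.lower()
        has_term = any(term.lower() in body for term in self.terms)
        return involves_custodian and in_period and has_term


rfp001 = SavedSearch(
    name="RFP001 – Contract Claims",
    custodians={"alice@example.com", "bob@example.com"},
    start=date(2019, 1, 1),
    end=date(2019, 12, 31),
    terms=["supply agreement", "breach"],
)

docs = [
    Document("D1", "alice@example.com", ["bob@example.com"], date(2019, 3, 4),
             "Re: possible breach of the supply agreement"),
    Document("D2", "carol@example.com", ["dave@example.com"], date(2019, 3, 5),
             "Lunch on Friday?"),
    Document("D3", "alice@example.com", ["bob@example.com"], date(2021, 1, 1),
             "Old breach discussion"),
]

hits = [d.doc_id for d in docs if rfp001.matches(d)]
print(rfp001.name, hits)
```

Only the first document involves a named individual, falls inside the period, and contains a search term, so it is the only hit.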

We don’t live in a perfect world, and sometimes a request for production (RFP) is a strange house of cards where a requesting party wants all emails containing highly common phrases that are horrible search terms.  In such situations, letting the other side know their requests will likely contain a substantial number of false positive hits is a wise course of action. If the other side is not cooperative, that is not a free pass to skip responding to a discovery request.  The producing party still has an obligation to object and make a good faith effort to produce responsive documents.

After a search has been created for each request for production (RFP), these searches can be used to create assignments for the review team. If a dispute arises, the producing party can then speak to each specific request for production with the size of its dataset or other information relevant to a motion.

The reviewing attorneys should rate highly relevant documents as hot, responsive documents as warm, and irrelevant documents as cold. There are other ways to code, but whatever model a team follows, be consistent. If a document is rated as hot or warm, it should also be coded as responsive to specific requests for production. Anything privileged should be coded properly.
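
One way to enforce that consistency is a small check run over each coding decision. The sketch below assumes the hot/warm/cold model described above; the `Rating` enum and the `coding_problems` helper are hypothetical names chosen for illustration.

```python
from enum import Enum


class Rating(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


def coding_problems(rating, responsive_to):
    """Return a list of consistency problems for one document's coding.

    rating: a Rating value
    responsive_to: set of request names, e.g. {"RFP001"}
    """
    problems = []
    if rating in (Rating.HOT, Rating.WARM) and not responsive_to:
        problems.append("hot/warm document has no request for production coded")
    if rating is Rating.COLD and responsive_to:
        problems.append("cold document is coded responsive to a request")
    return problems


print(coding_problems(Rating.HOT, {"RFP001"}))
print(coding_problems(Rating.WARM, set()))
```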

There is a high probability that ESI responsive to one request for production could be responsive to another. Reviewing attorneys should keep the big picture in mind and issue code whether the ESI is responsive to any other requests. To avoid assigning reviewers documents that have already been reviewed by another attorney, the review criteria for subsequent assignments can contain an exclusion for ESI that has already been coded responsive to any discovery request.
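
The exclusion can be pictured as a simple filter over the hits for the next request. This is a minimal sketch with hypothetical names (`build_assignment`, `coded_responsive`) and made-up document IDs, not a description of how any specific platform stores coding.

```python
def build_assignment(search_hits, coded_responsive):
    """Keep only hits not yet coded responsive to any request, in order.

    search_hits: list of document IDs returned by a saved search
    coded_responsive: dict mapping document ID -> set of requests it was coded to
    """
    return [doc_id for doc_id in search_hits if not coded_responsive.get(doc_id)]


# Coding recorded so far during first pass review (illustrative data).
coded_responsive = {"D1": {"RFP001"}, "D4": {"RFP001", "RFP002"}}

rfp002_hits = ["D1", "D2", "D4", "D5"]
assignment = build_assignment(rfp002_hits, coded_responsive)
print(assignment)
```

Documents D1 and D4 were already coded responsive, so only D2 and D5 land in the new assignment.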


 

The reviewing attorneys should provide feedback on the search terms used to find responsive ESI.  Additional inclusion or exclusion criteria can be used to help focus on the relevant data. Moreover, completely new searches could be warranted, if terms are discovered in review that could impact the scope of the search.  Ideally, if reviewing attorneys report they have found newsletters, marketing emails, eBay alerts, or similar irrelevant information, those communications can be searched for and bulk tagged as both cold and irrelevant.
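
Bulk tagging of this kind amounts to running a few exclusion patterns over the dataset and applying the same coding to every match. The patterns, the `bulk_tag_cold` helper, and the sample documents below are assumptions for illustration; real exclusion criteria would come from reviewer feedback on the actual data.

```python
import re

# Hypothetical patterns for mass mailings reported by reviewers.
BULK_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bunsubscribe\b", r"\bnewsletter\b", r"@ebay\.com\b")
]


def bulk_tag_cold(docs, patterns):
    """Return the coding to apply to every document matching any pattern.

    docs: dict mapping document ID -> (sender, text)
    """
    tagged = {}
    for doc_id, (sender, text) in docs.items():
        haystack = f"{sender}\n{text}"
        if any(pattern.search(haystack) for pattern in patterns):
            tagged[doc_id] = {"rating": "cold", "relevance": "irrelevant"}
    return tagged


docs = {
    "D1": ("alerts@ebay.com", "Your item has a new bid"),
    "D2": ("news@example.com", "Monthly newsletter: click to unsubscribe"),
    "D3": ("alice@example.com", "Draft supply agreement attached"),
}

tagged = bulk_tag_cold(docs, BULK_PATTERNS)
print(sorted(tagged))
```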

 

Applying Predictive Coding

One of the first steps to successfully using Predictive Coding for review is to create a control set. The purpose of this control set is to provide data that can be leveraged against system training sets for system learning. Once your system understands what information is responsive for the matter at hand, you will allow it to evaluate the quality of the control set.  Once you are satisfied with the results, it is time to move forward.
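
A minimal sketch of this step follows. It assumes the control set is drawn as a simple random sample and that quality is judged with precision and recall against the expert coding. Both are common choices but are assumptions here, as are the function names and the sample data.

```python
import random


def draw_control_set(doc_ids, size, seed=0):
    """Draw a reproducible simple random sample of document IDs."""
    rng = random.Random(seed)
    return rng.sample(sorted(doc_ids), size)


def precision_recall(expert, predicted):
    """Compare system predictions with expert coding on the control set.

    expert, predicted: dicts mapping document ID -> True if responsive
    """
    tp = sum(1 for d in expert if expert[d] and predicted[d])
    fp = sum(1 for d in expert if not expert[d] and predicted[d])
    fn = sum(1 for d in expert if expert[d] and not predicted[d])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


control = draw_control_set(range(100), 10)

# Illustrative expert coding and system predictions for four documents.
expert = {"a": True, "b": True, "c": False, "d": False}
predicted = {"a": True, "b": False, "c": True, "d": False}
precision, recall = precision_recall(expert, predicted)
print(precision, recall)
```

Fixing the random seed keeps the sample repeatable, which fits the requirement that the process be auditable.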

Reviewing attorneys have multiple options for leveraging predictive coding to focus on relevant or responsive information.

If search terms yield a large number of hits, applying a predictive coding model to the hits can help narrow the dataset in an assignment. This first requires training the predictive coding model and testing it against the data. This can be done by focusing on data that is not well reviewed within the database or data that is well reviewed. If the predictive coding model has been well trained, this can help narrow the searches to data with a high probability of being responsive.
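
To make the idea concrete, the sketch below trains a very small naive Bayes style term scorer on expert-coded examples and uses it to rank search hits. It is a stand-in chosen for brevity, not the algorithm used by any commercial predictive coding product, and all names and sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Learn per-term log-odds weights from (text, is_responsive) pairs."""
    responsive, non_responsive = Counter(), Counter()
    for text, is_responsive in labeled:
        (responsive if is_responsive else non_responsive).update(tokenize(text))
    vocab = set(responsive) | set(non_responsive)
    # Add-one smoothing so unseen terms do not produce log(0).
    resp_total = sum(responsive.values()) + len(vocab)
    non_total = sum(non_responsive.values()) + len(vocab)
    return {
        term: math.log((responsive[term] + 1) / resp_total)
        - math.log((non_responsive[term] + 1) / non_total)
        for term in vocab
    }


def score(weights, text):
    """Higher scores suggest the text looks more like responsive examples."""
    return sum(weights.get(token, 0.0) for token in tokenize(text))


def rank(weights, docs):
    """Return document IDs ordered from most to least likely responsive."""
    return sorted(docs, key=lambda d: score(weights, docs[d]), reverse=True)


training_set = [
    ("breach of the supply agreement", True),
    ("supply agreement termination notice", True),
    ("weekly newsletter and lunch menu", False),
    ("lunch menu for friday", False),
]
search_hits = {
    "D1": "notice of breach under the agreement",
    "D2": "friday lunch newsletter",
    "D3": "agreement on lunch",
}

weights = train(training_set)
ranked = rank(weights, search_hits)
print(ranked)
```

Reviewers could then work the ranked list from the top, or cut it off at a score threshold agreed for the matter.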


Another option, as attorneys near the end of their first pass review, is to crosscheck the database with the prediction model for ESI rated hot or warm and coded as responsive to the Requests for Production. The predictive coding model can identify data that was not found by the search terms but is nevertheless responsive.
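
Expressed as code, the crosscheck is a comparison between model scores and the set of documents the search terms already found. The function name, the threshold, and the scores below are illustrative assumptions.

```python
def crosscheck(scores, search_hits, threshold):
    """List documents the model scores highly that no search term found.

    scores: dict mapping document ID -> model score (higher = more likely responsive)
    search_hits: set of document IDs returned by the saved searches
    """
    missed = [
        doc_id for doc_id, value in scores.items()
        if value >= threshold and doc_id not in search_hits
    ]
    return sorted(missed, key=lambda doc_id: scores[doc_id], reverse=True)


scores = {"D1": 0.95, "D2": 0.10, "D6": 0.88, "D7": 0.91, "D8": 0.40}
search_hits = {"D1", "D2"}

to_review = crosscheck(scores, search_hits, threshold=0.8)
print(to_review)
```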

 

Second Pass Document Review

Quality assurance is the name of the game with second pass review. There are multiple approaches for second pass review. One tactic is to make sure what has been coded as responsive actually is responsive. Another method is to focus on ESI that has been rated hot, warm, and responsive to specific discovery requests before it is produced to the other side. More experienced attorneys could also apply confidentiality or attorneys’ eyes only designations to responsive ESI.

Another option is to focus on identifying any privileged communications that could be within the responsive dataset. This requires conducting searches over the dataset for attorney email addresses or other key phrases that would indicate privileged information.
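
A privilege screen along these lines might look like the sketch below. The attorney addresses, phrases, and document structure are hypothetical, and a match only flags a document for attorney review; it does not decide privilege.

```python
def privilege_candidates(docs, attorney_addresses, phrases):
    """Flag documents that involve an attorney address or a privilege phrase.

    docs: dict mapping document ID -> {"participants": [...], "text": "..."}
    """
    attorneys = {address.lower() for address in attorney_addresses}
    flagged = []
    for doc_id, doc in docs.items():
        participants = {person.lower() for person in doc["participants"]}
        text = doc["text"].lower()
        has_attorney = bool(participants & attorneys)
        has_phrase = any(phrase.lower() in text for phrase in phrases)
        if has_attorney or has_phrase:
            flagged.append(doc_id)
    return flagged


docs = {
    "D1": {"participants": ["alice@example.com", "Counsel@example.com"],
           "text": "Please advise on the termination letter"},
    "D2": {"participants": ["alice@example.com", "bob@example.com"],
           "text": "Attorney-Client Privileged: summary of legal advice"},
    "D3": {"participants": ["bob@example.com"],
           "text": "Shipping schedule for March"},
}

flagged = privilege_candidates(
    docs,
    attorney_addresses=["counsel@example.com"],
    phrases=["attorney-client privileged", "legal advice"],
)
print(flagged)
```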

It is a good quality assurance practice to sample the ESI from the first pass review that was rated cold and irrelevant.  While the reviewing attorneys most likely made the correct judgment call, sampling the data, or running a prediction model against the cold ESI, is a good way to “trust, but verify” the first pass review.
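
The sampling step can be sketched as follows. The share of sampled cold documents that turn out to be responsive is often called the elusion rate; using it here is an assumption, as are the function names and the sample results.

```python
import random


def sample_cold(cold_ids, sample_size, seed=0):
    """Draw a reproducible random sample from documents rated cold."""
    rng = random.Random(seed)
    ids = sorted(cold_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def elusion_rate(sample_review):
    """Fraction of sampled cold documents that a second reviewer found responsive.

    sample_review: dict mapping document ID -> True if actually responsive
    """
    if not sample_review:
        return 0.0
    return sum(sample_review.values()) / len(sample_review)


cold_ids = [f"D{n}" for n in range(1, 21)]
sample = sample_cold(cold_ids, 4)

# Illustrative second pass results for a four-document sample.
second_pass = {"D7": False, "D9": True, "D12": False, "D15": False}
print(elusion_rate(second_pass))
```

A rate well above what the team considers acceptable would suggest revisiting the first pass coding or retraining the model before production.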

Final Review Before Production

Responsive ESI identified from the first and second pass reviews can have one final validation before production. This can be done with searches within the assignment for specific issue coding, such as proper designations, or known privileged information. After the ESI has been checked, it can be prepared for final production and presentation to opposing counsel.
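
That final validation can be thought of as a gate that every document must pass before it enters the production set. The field names and rules in this sketch are assumptions for illustration; each team would substitute its own coding fields.

```python
def production_blockers(doc):
    """Return reasons a document should be held back from production.

    doc: dict with hypothetical coding fields "privileged", "designation",
    and "responsive_to"
    """
    blockers = []
    if doc.get("privileged"):
        blockers.append("coded privileged")
    if not doc.get("designation"):
        blockers.append("missing confidentiality designation")
    if not doc.get("responsive_to"):
        blockers.append("not coded to any request for production")
    return blockers


ready = {"privileged": False, "designation": "Confidential",
         "responsive_to": {"RFP001"}}
held = {"privileged": True, "designation": "", "responsive_to": {"RFP001"}}

print(production_blockers(ready))
print(production_blockers(held))
```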

The Bottom Line

eDiscovery can be complicated, but it is more complicated without a plan. Legal teams face this challenge daily. Leveraging search terms, assignments, and predictive coding gives attorneys a dynamic workflow for finding responsive ESI and responding to eDiscovery requests. The easier we can make this process for them, the more value we add for our customers.

The purpose of the Veritas eDiscovery Platform is to help legal teams find the needle in the haystack. The simplest way to find that needle is to set the haystack on fire. However, this is not possible in the eDiscovery world. Instead, the eDiscovery Platform takes a two-pronged approach: first shrink the haystack, then provide workflows, intelligent analytics and predictive coding that help our customers find the needle with the least amount of time and effort.

The eDiscovery Platform shifts the focus of the project from a static, mindless cull down of the data, to an interactive and intelligent workflow. Our interactive and intelligent process helps our customers cull down their data sets significantly. This reduces the amount of time spent on eDiscovery and ultimately drives down costs. These reduced costs can be significant and can save our customers 60% or more compared to their more traditional eDiscovery processes.

 
