W. Lawrence Wescott II: Predictive coding in electronic discovery

To many lawyers, the entire electronic discovery process is inimical to Federal Rule of Civil Procedure 1, which provides that the federal rules “should be construed and administered to secure the just, speedy, and inexpensive determination of every action and proceeding.” When a lawyer is confronted with thousands of emails and electronic documents, securing a “speedy and inexpensive determination” can seem futile.

Many e-discovery vendors advertise technological solutions to the problem of large-scale document review. One such tool is predictive coding.

Predictive coding involves the selection of a core set of documents by counsel intimately familiar with the case. Those lawyers then code the documents according to the relevant issues of the case, and a computer develops an algorithm based upon the coding. This process is repeated iteratively until counsel are reasonably satisfied that the computer can determine which documents are relevant.
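The loop described above can be illustrated with a deliberately simple sketch. It is not how any vendor's product works: the word-counting "model," the function names, and the `counsel_codes` stand-in for a human reviewer are all assumptions made for illustration.

```python
from collections import Counter

def train(coded_docs):
    """Learn a weight per word from (text, is_relevant) pairs coded by counsel."""
    weights = Counter()
    for text, is_relevant in coded_docs:
        for word in set(text.lower().split()):
            weights[word] += 1 if is_relevant else -1
    return weights

def score(weights, text):
    """Higher scores mean the document resembles the relevant examples."""
    return sum(weights[word] for word in set(text.lower().split()))

def iterate(seed, uncoded, counsel_codes, rounds, batch_size):
    """Retrain, have counsel code the top-scoring uncoded documents, repeat."""
    coded, remaining = list(seed), list(uncoded)
    for _ in range(rounds):
        weights = train(coded)
        remaining.sort(key=lambda text: score(weights, text), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        coded.extend((text, counsel_codes(text)) for text in batch)
    return train(coded), remaining

# Hypothetical seed set and collection, invented for this example.
seed = [("promotion denied after maternity leave", True),
        ("lunch menu for friday", False)]
uncoded = ["pay raise denied after leave",
           "friday parking notice",
           "maternity leave policy and promotion"]

def counsel_codes(text):
    # Stand-in for a lawyer's relevance call on one document.
    return "leave" in text

model, leftover = iterate(seed, uncoded, counsel_codes, rounds=1, batch_size=2)
```

In practice the stopping point is a judgment call by counsel rather than a fixed number of rounds; the fixed `rounds` argument here only keeps the sketch short.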

Predictive coding is a component of a strategy developed by noted e-discovery expert and blogger Ralph Losey. In a recent post on his e-Discovery Team® blog, Losey introduced the concept of Bottom Line Driven Proportional Review. The bottom line, the ultimate cost of production, is determined through a cost-estimating process, based upon “projected review costs, defensible culling, and best practices of review.” The number of documents the producing party will review is based upon that estimate of what they are “willing, or required, to pay for the production.” The estimate also takes into account the merits of the case and the production costs. The resulting production budget “should be proportional to the monies and issues in the case.” Thus, the budget is governed by the proportionality principle embodied in Rule 26(b)(2)(C).

Key to the success of Losey’s strategy is the use of “smart” search techniques (along with appropriate quality control techniques) to find the documents permitted by the budget. He suggests that predictive coding is well suited for bottom-line driven review. Losey states that “[p]redictive coding is inherently rank based and so it makes bottom line driven review especially easy to do.”
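Why ranking makes a budget-driven review easy can be shown with a short sketch. The dollar figures, the per-document cost, and the function names below are hypothetical and do not come from Losey's post; the point is only that a budget converts into a document count, and a ranked list tells the reviewer which documents fill that count.

```python
def documents_within_budget(budget_dollars, cost_per_document):
    """How many documents the review budget will cover."""
    return int(budget_dollars // cost_per_document)

def select_for_review(scored_documents, budget_dollars, cost_per_document):
    """Pick the highest-ranked documents that fit the budget.

    scored_documents is a list of (document_id, relevance_score) pairs,
    where the score is whatever ranking the software produced.
    """
    limit = documents_within_budget(budget_dollars, cost_per_document)
    ranked = sorted(scored_documents, key=lambda pair: pair[1], reverse=True)
    return [document_id for document_id, _ in ranked[:limit]]

# Hypothetical example: a $10 budget at $4 per document covers two documents.
scores = [("A", 0.2), ("B", 0.9), ("C", 0.6), ("D", 0.4)]
chosen = select_for_review(scores, budget_dollars=10, cost_per_document=4)
```

Here `chosen` holds the two highest-scoring documents, B and C. The quality control steps Losey calls for are outside the scope of this sketch.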

U.S. Magistrate Judge Andrew J. Peck recently pointed out a key stumbling block to the adoption of computer-assisted technology such as predictive coding. Writing in Law Technology News in October, Judge Peck observed that as of that writing, no reported case had ruled on the use of computer-assisted review. He surmised that many attorneys were waiting for an opinion approving its use. Somewhat tongue-in-cheek, Judge Peck stated:

Perhaps they are looking for an opinion concluding that: “It is the opinion of this court that the use of predictive coding is a proper and acceptable means of conducting searches under the Federal Rules of Civil Procedure, and furthermore that the software provided for this purpose by [insert name of your favorite vendor] is the software of choice in this court.” If so, it will be a long wait.

Last month, though, Peck decided to take matters into his own hands. In Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC)(AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012), the magistrate judge approved the use of predictive coding.

The parties had actually agreed that defendants could utilize the process, but disputed its scope and implementation. Peck resolved those disputes in favor of the ultimate use of the technology.

In Moore, five named plaintiffs are suing Publicis Groupe, “one of the world’s ‘big four’ advertising conglomerates,” and its United States public relations subsidiary, MSL Group, alleging “systemic, company-wide gender discrimination against female PR employees,” according to the opinion.

Both parties realized that the best way to deal with more than 3 million electronic documents was to proceed in phases. One of the issues the court addressed was which custodians, and which sources of ESI, to include in the initial phase.

The parties agreed to take a random sample of the entire email collection at the 95 percent confidence level to provide a “seed set” of 2,399 documents to train the software. The review would be conducted by senior attorneys. MSL Group agreed to provide the documents to plaintiffs, who could add two more sets of issue tags, which would be incorporated into the system coding. Four thousand additional documents would be generated through keyword searches from both defendants and plaintiffs, and all non-privileged documents comprising the seed set would be provided to plaintiffs, whether relevant or not.
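A figure like 2,399 is what the standard sample-size formula produces for a large collection. The sketch below assumes a margin of error of plus or minus 2 percent and a collection of exactly 3,000,000 documents; neither number is stated in this column, and the function name is hypothetical.

```python
def sample_size(population, margin, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    z=1.96 corresponds to a 95 percent confidence level; p=0.5 is the most
    conservative assumption about how common relevant documents are.
    """
    unlimited = (z ** 2) * p * (1 - p) / margin ** 2    # about 2,401
    return unlimited / (1 + (unlimited - 1) / population)

# Assumed inputs: 3,000,000 documents, margin of error of 2 percent.
needed = sample_size(population=3_000_000, margin=0.02)
print(round(needed))
```

Under those assumptions the result rounds to 2,399. Note that the sample size barely depends on the size of the collection: a collection ten times larger would call for only one or two more documents.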

The court agreed to seven rounds of iteration. In each round, 500 documents would be reviewed to determine whether the computer was returning relevant documents. After the seventh round, a random sample of 2,399 discarded documents would be reviewed to ensure that the discards were, in fact, not relevant. The court reserved the right to order additional iterative rounds if plaintiffs objected to the results.
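One way to read the results of such a discard sample is to project the rate of relevant documents found in the sample onto the whole discard pile. The sketch below is an illustration only: the counts are invented, the function name is hypothetical, and the opinion as summarized here does not prescribe this calculation.

```python
import math

def discard_sample_estimate(sample_size, relevant_found, discard_pile_size, z=1.96):
    """Estimate how many relevant documents remain among the discards.

    Uses the normal approximation for a 95 percent interval (z=1.96), which
    is rough when very few relevant documents turn up in the sample.
    """
    rate = relevant_found / sample_size
    half_width = z * math.sqrt(rate * (1 - rate) / sample_size)
    return {
        "rate": rate,
        "low": max(0.0, rate - half_width),
        "high": min(1.0, rate + half_width),
        "projected_missed": rate * discard_pile_size,
    }

# Hypothetical outcome: 24 relevant documents in a sample of 2,399 discards,
# drawn from a discard pile of 2,000,000 documents.
estimate = discard_sample_estimate(2399, 24, 2_000_000)
```

With these invented numbers, about 1 percent of the sample is relevant, which projects to roughly 20,000 relevant documents left among the discards. Whether that is acceptable is a proportionality question for the parties and the court, not a statistical one.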

In approving the use of predictive coding, the court cited the agreement of the parties to use it (although they differed on questions of implementation), the large number of documents to be reviewed, the superiority of computer-assisted review to the alternatives (manual review or keyword searches), the principles of proportionality found in Rule 26(b), and the transparency of the defendants in this case. For example, defendants proposed to allow plaintiffs to see all of their coding of the initial non-privileged seed set of documents.

Peck concluded:

What the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review.

Given that the parties had basically agreed upon its use, and that the defendants were extraordinarily transparent about the process (the court noted that “not all experienced ESI counsel believe it necessary to be as transparent as [defendants]”), this was probably close to the ideal case for predictive coding. A closer question will arise when one party objects to the technology.

As both the judge and Losey have recognized, mere use of predictive coding is not sufficient; an appropriate process must be designed with appropriate quality control procedures, which also take into account the principles of Rule 1 and Rule 26(b) proportionality. Nevertheless, the approval of predictive coding in Moore will undoubtedly lead more attorneys to consider its use in large cases.

W. Lawrence Wescott II, Esq., a former IT manager and database development manager, is an e-discovery consultant. He chairs the Technology Committee of the Maryland State Bar Association’s Litigation Section. He can be reached at