Predictive coding, as a method for reducing the number of electronically stored documents that attorneys must review in eDiscovery, has cleared another judicial hurdle. In the closely watched Global Aerospace case in Virginia, the defendants recently produced their discovery, which they had culled using OrcaTec’s predictive coding system. With no objections from the plaintiffs, the case for using predictive coding has become stronger.
The interest in this state-court case arose from the fact that it was the first in the U.S. in which predictive coding was ordered over the opposing party’s objection. In April, Judge James Chamblin, in Loudoun Circuit Court, overruled the plaintiffs’ objections that predictive coding is not as effective as purely human review, and signed a protective order approving the use of predictive coding technology to process electronically stored information for purposes of discovery.
The defense, under the leadership of Thomas C. Gricks III, chair of the e-Discovery Practice Group and a partner in Philadelphia-based Schnader Harrison Segal and Lewis LLP, was attempting to reduce the 8 terabytes of collected electronic information to a manageable number of documents. He asked the court to allow predictive coding to get the document set down to a size that he and his legal team could actually review in a reasonable time frame, so that they could understand the merits of their own case.
“We needed to find a way to cut down the cost of eDiscovery and use the money on litigation instead of useless document review,” Gricks said, in a conversation on Jan. 18. “We needed to get rid of the information we didn’t need as effectively as possible.”
With the assistance of predictive-coding advocate Karl Schieneman of Review Less, Gricks sent the document set to Cleveland eDiscovery company JurInnov. Using the generally accepted techniques of de-duplication and de-NISTing (removal of system files), among other steps, JurInnov reduced the 8TB of information to 270GB that could be loaded into the OrcaTec system for indexing and predictive coding.
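The two culling techniques named above are commonly implemented by comparing file hashes. The following is a minimal sketch of that general idea, not a description of JurInnov's actual tooling, which the case record does not detail. The function name `cull`, the sample file names and the stand-in set of known system-file hashes are all hypothetical.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of a file's bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def cull(files, known_system_hashes):
    """Drop known system files and exact duplicates.

    files: dict mapping a file name to its raw bytes (hypothetical input).
    known_system_hashes: set of hex digests standing in for a reference
    list of known system files (an assumption for this sketch).
    """
    kept = {}
    seen = set()
    for name, data in files.items():
        digest = sha1_hex(data)
        if digest in known_system_hashes:
            continue  # "de-NISTing": the file matches a known system file
        if digest in seen:
            continue  # de-duplication: identical content already kept
        seen.add(digest)
        kept[name] = data
    return kept


# Hypothetical example data, invented for illustration only.
example_files = {
    "memo.txt": b"Quarterly memo",
    "memo_copy.txt": b"Quarterly memo",
    "kernel32.dll": b"system bytes",
    "contract.txt": b"Lease contract",
}
example_known = {sha1_hex(b"system bytes")}
example_kept = cull(example_files, example_known)
print(list(example_kept))
```

Exact-hash matching only catches byte-for-byte duplicates; near-duplicates such as the same email with a different footer need other techniques.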
Gricks said he chose the OrcaTec system because it was unique in using random sampling for its predictive coding instead of key words or seed sets. The system requires only a single subject-matter expert, whom it provides with sets of completely random sample documents.
The expert codes each document as responsive or non-responsive. As the expert codes, the system learns which documents the expert is going to approve or disapprove.
“I prefer random sampling to seeding,” Gricks said. “It’s going to allow me to use one person to do the coding, which will give me consistency in knowing whether a document trail is relevant or not.”
Gricks had promised the court the defense would show a “recall” rate of 75 percent or better, meaning that predictive coding would find at least 75 percent of the responsive documents in the set. Studies of human review have repeatedly shown recall of 50 percent or lower.
After the reviewer spent just a few days coding documents, Gricks said, he, the coder and OrcaTec determined the set to be sufficient to reach the 75 percent cutoff. Gricks then met with two lawyers on the other side to show them the documents that were now coded as a training set.
He said those attorneys had comments, so he agreed to re-run the predictive coding to add some additional documents they requested to the training set.
Ultimately, 5,000 documents were coded and used as the training set for the predictive coding. When OrcaTec ran the predictive coding process, the 1.3 million document set was reduced by 83 percent.
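For a sense of scale, the arithmetic behind those figures can be checked in a few lines. This sketch uses only the rounded numbers quoted above, so the results are approximate; the variable names are illustrative.

```python
# Rounded figures as reported in the article.
total_documents = 1_300_000
training_documents = 5_000
reduction = 0.83

# Documents left for review after an 83 percent reduction.
remaining = round(total_documents * (1 - reduction))

# Share of the collection that was hand-coded as the training set.
training_share = training_documents / total_documents

print(remaining)                 # about 221,000 documents
print(f"{training_share:.2%}")   # well under 1 percent of the collection
```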
The team then needed to determine statistically that it had reached the 75 percent recall threshold for the court. For quality control, they pulled 385 documents from the putatively responsive set and 385 from the putatively non-responsive set.
“That’s the number of documents to test to get 95 percent confidence level on the recall number,” Gricks said. “On the basis of what we agreed to do, we’re 95 percent confident that it was 81 percent recall.”
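The figure of 385 matches the standard sample-size formula for estimating a proportion, n = z² · p(1 − p) / e², rounded up. The sketch below assumes a 5 percent margin of error and the worst-case proportion p = 0.5; neither value is stated in the article, so treat them as assumptions that happen to reproduce the number.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.95, margin=0.05, p=0.5):
    """Sample size needed to estimate a proportion in a large population.

    margin and p are assumptions for this sketch, not figures from the case.
    """
    # Two-sided critical value, about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


n = sample_size()
print(n)
```

Because the formula does not depend on the size of the collection, the same sample size applies whether the set holds thousands or millions of documents.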
Upon linear review of the remaining documents, Gricks said, he additionally found that 80 percent of the documents the OrcaTec system had predicted would be responsive were, in fact, responsive. In statistics, this number is called precision.
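Recall and precision answer different questions: recall asks how much of the responsive material was found, while precision asks how much of what was flagged is actually responsive. The sketch below defines both, along with one common way to estimate recall from equal-sized samples of the two sets. The function names and all counts are hypothetical toy values, not data from the case, and the article does not say this is how the team did its calculation.

```python
def precision(true_pos, false_pos):
    """Share of documents predicted responsive that really are responsive."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of all responsive documents that the system found."""
    return true_pos / (true_pos + false_neg)


def estimate_recall(n_pred_resp, n_pred_nonresp,
                    resp_sample_hits, nonresp_sample_hits, sample_size):
    """Estimate recall by scaling sample results up to each set's size."""
    est_found = n_pred_resp * resp_sample_hits / sample_size
    est_missed = n_pred_nonresp * nonresp_sample_hits / sample_size
    return est_found / (est_found + est_missed)


# Hypothetical toy collection: 1,000 documents predicted responsive and
# 9,000 predicted non-responsive, with 100 documents sampled from each.
toy_precision = precision(true_pos=80, false_pos=20)
toy_recall = estimate_recall(1_000, 9_000, 80, 2, 100)
print(round(toy_precision, 2), round(toy_recall, 2))
```

In the toy example, even a small rate of responsive documents in the much larger non-responsive set pulls recall down noticeably, which is why both sets are sampled.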
The final production set was 173,000 documents, accepted without objection by the plaintiffs. Had the documents been reviewed manually, Gricks estimated the culling process would have taken about 20,000 man-hours at a cost of $1.5 million. The use of predictive coding not only saved the client a huge sum of money, but also resulted in a more relevant production in a shorter time frame.
The benefits of using predictive coding are apparent: reducing data sets to only relevant documents for attorneys to review saves large amounts of time and money. Now that the Global Aerospace production has been accepted, perhaps more litigants can take advantage of those savings with less fear.
W. Lawrence Wescott II, Esq., a former IT manager and database development manager, is an e-discovery consultant. He is currently chair of the Technology Committee of the Maryland State Bar Association’s Litigation Section.