Peer to Peer: ILTA's Quarterly Magazine
Issue link: https://epubs.iltanet.org/i/1508143
Concepts like predictive coding, machine learning and AI are leveraged in technology-assisted review (TAR) models in the eDiscovery world today to allow for more efficient review of documents for the purposes of production. TAR can also be used to exclude documents from review altogether, if such a workflow is defensible and agreed to by opposing counsel and/or the relevant governing body. But beyond how these utilities are used today, there are other ways TAR can be leveraged in several nonstandard workflows: to provide quicker access to important information, to shorten lead times, to offer more direct routes to that information and to reduce costs.

Before diving into potential nonstandard utilities of TAR, it's important to have consensus on how TAR buzzwords such as predictive coding, supervised machine learning and AI are used, at least for the purposes of this article.

In eDiscovery, predictive coding is a way to automate the review process by leveraging supervised machine learning algorithms. While there are different types of predictive algorithms, most predictive coding tools bubble up documents based on previous review decisions made by humans. This typically works by taking the information gained from manual coding and automating that logic so it can be applied to a larger group of documents. Predictive coding is just one form of TAR, which is itself a broader category that encompasses many uses of technology in the document review process. Predictive coding does not, for example, replace the important tasks of culling or early case assessment in the review process.

The technology most commonly behind current TAR models is supervised machine learning, a subset of AI. Supervised machine learning employs labeled data sets to train algorithms to classify data or predict outcomes accurately.
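To make the idea of learning from labeled data concrete, here is a minimal sketch in Python. It is not how any particular review platform works: the classifier (a small naive Bayes model), the function names `train`, `score` and `rank`, and the sample documents are all assumptions made for illustration. It shows reviewer-coded documents being used to score unreviewed ones, so that likely responsive documents bubble up first.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Build word counts from (text, is_responsive) pairs coded by reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, is_responsive in labeled_docs:
        doc_counts[is_responsive] += 1
        word_counts[is_responsive].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def score(model, text):
    """Return log-odds that a document is responsive (higher = more likely)."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    # Start from the smoothed ratio of responsive to non-responsive documents.
    log_odds = math.log((doc_counts[True] + 1) / (doc_counts[False] + 1))
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one (Laplace) smoothing avoids zero probabilities.
        p_resp = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_not = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_resp / p_not)
    return log_odds


def rank(model, unreviewed):
    """Order unreviewed documents from most to least likely responsive."""
    return sorted(unreviewed, key=lambda text: score(model, text), reverse=True)


# Hypothetical coding decisions made by human reviewers.
coded = [
    ("merger pricing agreement draft", True),
    ("pricing terms for the merger", True),
    ("lunch menu for friday", False),
    ("office party friday schedule", False),
]
unreviewed = ["friday lunch schedule", "draft merger pricing terms"]

model = train(coded)
ranked = rank(model, unreviewed)
for doc in ranked:
    print(f"{score(model, doc):+.2f}  {doc}")
```

In practice the labeled set would be far larger and the model more sophisticated, but the shape is the same: human coding decisions go in, and a ranking of the remaining documents comes out.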
There are currently two main approaches to TAR using supervised machine learning:

• TAR 1.0 (Predictive coding): The OG of TAR, predictive coding involves loading a seed set of documents into the system to train the predictive algorithm. The quality of the results depends on the quality of this original seed set, which can only be altered by adding subsequent sample sets with updated coding to retrain the machine.

• TAR 2.0 (Continuous active learning or CAL): Developed more recently, CAL does not require a seed set. The algorithm automatically learns from the reviewers' decisions and starts feeding the review team what it determines are the most relevant documents based on the purpose of the model. The system gains intelligence as it receives further inputs (decisions) from human reviewers.

Because of its simplicity and increasingly accurate results, CAL is usually the choice for innovative legal teams. In particular, CAL has been shown to reach higher levels of recall and to identify a greater number of relevant documents more quickly and with less effort. CAL is also more flexible: because it continues to train throughout the life of the review process, it can adapt to changes in the scope of discovery and to the receipt of additional data.

For nearly two decades, TAR has been used in eDiscovery matters to reduce costs, increase efficiencies and speed up manual review of documents for the purposes of production. To date, TAR models have been implemented to classify documents based on binary decisions: responsive/not responsive or relevant/not relevant. But the innovation and improvements in