Digital White Papers

KMMT24

publication of the International Legal Technology Association

Issue link: https://epubs.iltanet.org/i/1529627


ILTA White Paper | Knowledge Management & Marketing Technologies

Robust RAG-Based Legal Question Answering Systems for Knowledge Management

within the RAG architecture can be adapted to build a more accurate legal question-answering system. The four challenges we discuss are Exhaustive Information, Aggregate Questions, Multi-hop Questions, and Overlapping Content. In each subsection, we examine the challenge in greater depth and propose a more robust approach.

Exhaustive Information

Standard RAG systems are not designed for exhaustive retrieval. They often return a limited number of passages, which can lead to critical omissions in legal contexts where thoroughness is essential. To address this, we propose a clarification mechanism that refines broad queries and narrows the search space, combined with retrieving information from each document independently. Together, these steps support exhaustive coverage and improve precision and recall when handling complex legal queries.

A substantial challenge in applying Retrieval Augmented Generation (RAG) to legal question answering is the need for exhaustive information retrieval across large document sets. Legal queries often require thorough searches to identify and retain all critical details. RAG systems, optimized for efficiency, return a limited number of passages (e.g., the top 5 or 10), potentially omitting essential information. Increasing the number of retrieved documents may improve recall but can add computational overhead and latency, especially when generating answers from multiple passages.

For instance, consider the query: "What are all the indemnification clauses in contracts between Company A and its vendors?" The system must retrieve every relevant contract and identify each indemnification clause. This can require retrieving hundreds of contracts.
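To make the effect of a passage limit concrete, the following Python sketch computes recall for a top-k cutoff. It is only an illustration under stated assumptions: the corpus of 200 contracts, the ranking order, the assumption that every fourth contract is relevant, and the helper names `top_k` and `recall` are all hypothetical and do not come from any particular RAG system.

```python
from typing import List, Set


def top_k(ranked_ids: List[str], k: int) -> List[str]:
    """Return the k highest-ranked document ids (ranked_ids is best-first)."""
    return ranked_ids[:k]


def recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were actually retrieved."""
    if not relevant:
        return 1.0
    return len(relevant.intersection(retrieved)) / len(relevant)


# Hypothetical corpus: 200 vendor contracts, ranked best-first by some retriever.
ranked = [f"contract_{i:03d}" for i in range(200)]

# Assumption for illustration: every fourth contract contains an
# indemnification clause, so 50 of the 200 contracts are relevant.
relevant = {f"contract_{i:03d}" for i in range(0, 200, 4)}

# With a limit of 10 passages, only 3 of the 50 relevant contracts are seen.
recall_at_10 = recall(top_k(ranked, 10), relevant)

# Without a limit, every relevant contract is seen, at the cost of
# processing the whole corpus.
recall_at_all = recall(top_k(ranked, len(ranked)), relevant)
```

In this toy setup a limit of 10 yields a recall of 3/50, which illustrates why a fixed cutoff is a poor fit for "find all" questions, and why raising k alone trades recall against cost.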
Standard RAG systems, constrained by passage limits, may miss critical documents if they rank lower in retrieval. A more effective approach includes a clarification mechanism that prompts users for specific details when the initial query is too broad. Legal questions often need narrowing to avoid retrieving an overwhelming number of documents or irrelevant ones. By asking for clarifications, such as the contract, party, or time frame in question, the system can reduce unnecessary retrieval and focus on the most relevant documents, improving precision and recall.

Additionally, retrieving information from each document independently, rather than generating an answer from a combined set of top-ranked passages, ensures exhaustive coverage. In standard RAG systems, answers are often generated from the highest-ranked passages across multiple documents, which risks omitting essential details from lower-ranked but relevant documents. By extracting information from each document individually, the system ensures no relevant content is missed, enabling more accurate responses to complex legal queries.
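The two ideas above, asking for clarification and extracting from each document independently, can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the `Query` fields, the wording of the follow-up questions, and the keyword-based `extract_from_document` function (a stand-in for what would likely be a language-model call in practice) are assumptions made for illustration, as are the sample contract texts.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Query:
    text: str
    party: Optional[str] = None       # e.g. "Company A" (hypothetical field)
    time_frame: Optional[str] = None  # e.g. "2020-2023" (hypothetical field)


def clarification_questions(query: Query) -> List[str]:
    """Return follow-up questions for any scoping detail the query lacks."""
    questions = []
    if query.party is None:
        questions.append("Which party or contract should the search cover?")
    if query.time_frame is None:
        questions.append("Which time frame should the search cover?")
    return questions


def extract_from_document(text: str, term: str) -> List[str]:
    """Stand-in for a per-document extraction step: return every sentence
    in a single document that mentions the term (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if term.lower() in s.lower()]


def extract_per_document(docs: Dict[str, str], term: str) -> Dict[str, List[str]]:
    """Run extraction on each document independently, so that a low
    retrieval rank cannot cause a relevant document to be skipped."""
    results = {}
    for doc_id, text in docs.items():
        hits = extract_from_document(text, term)
        if hits:
            results[doc_id] = hits
    return results


# Hypothetical contract texts, for illustration only.
docs = {
    "vendor_x.txt": "Vendor X shall deliver goods monthly. "
                    "Vendor X agrees to indemnify Company A against third-party claims.",
    "vendor_y.txt": "Payment is due in 30 days. "
                    "Company A shall indemnify Vendor Y for losses caused by "
                    "Company A's negligence. Notices must be in writing.",
    "vendor_z.txt": "This agreement renews annually.",
}

broad = Query("What are all the indemnification clauses?")
pending = clarification_questions(broad)  # two follow-up questions

scoped = Query(broad.text, party="Company A", time_frame="2020-2023")
clauses = extract_per_document(docs, "indemnify") if not clarification_questions(scoped) else {}
```

Here `clauses` maps each contract that contains a matching sentence to its matches, and `vendor_z.txt` is absent because it has none. The design choice being illustrated is that every document is examined on its own rather than competing for a place among the top-ranked passages.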
