publication of the International Legal Technology Association
Issue link: https://epubs.iltanet.org/i/1529627
ILTA WHITE PAPER | KNOWLEDGE MANAGEMENT & MARKETING TECHNOLOGIES
ROBUST RAG-BASED LEGAL QUESTION ANSWERING SYSTEMS FOR KNOWLEDGE MANAGEMENT

whereas answering one part of a question depends on retrieving specific information that informs the next part. Consider the example: "What is the limitation of liability in the MSAs with ABC Corporation governed by the laws of Delaware?" This multi-hop question requires several steps. First, the system locates the MSAs between the company and ABC Corporation. Next, it identifies which of those MSAs are governed by Delaware law. Finally, it extracts the limitation of liability clauses and synthesizes them into a complete answer.

This is a classic case of multi-hop reasoning, where each step builds on the previous one. The system must not only retrieve information but also logically combine and process it across multiple layers to reach the final answer. Standard RAG systems struggle because they focus on retrieving relevant context, not on performing complex reasoning. Multi-hop questions require the system to follow a logical chain: identifying, cross-referencing, and synthesizing information to reach a conclusion. They differ from simple queries that merely collect information from different contexts: in a multi-hop query, the answer to one part informs and narrows down the next.

One approach to solving multi-hop legal QA involves breaking the query into smaller components. These atomic questions tackle individual parts of the query in sequence. The key steps are:

• Step 1: Break Down the Question – The system decomposes the complex question into atomic parts. For example, one atomic question might be: "Which MSAs with ABC Corporation are governed by Delaware law?" Another might be: "What is the limitation of liability in those MSAs?"
• Step 2: Recognize the Order of Resolution – After identifying the atomic parts, the system must resolve them in their logical sequence: first determine which MSAs are governed by Delaware law, then extract the limitation of liability.
• Step 3: Solve Atomic Questions Serially – The system solves the atomic questions one at a time. Solving the first part provides context for the next, ultimately leading to the final answer.

The challenge lies in identifying the atomic questions, which requires understanding the query structure and where the relevant information resides, often scattered across documents. This structured reasoning approach overcomes the limitations of standard RAG systems, which struggle with multi-step inference.

Overlapping Content

Legal documents typically contain a lot of overlapping text, which makes conventional retrieval difficult in large repositories: standard RAG systems, which retrieve the most similar documents, struggle to find the relevant context. To improve accuracy, document subsetting narrows the search space while ensuring that relevant context from similar documents is not excluded.

Legal documents, such as contracts, frequently contain substantial portions of identical text, with only a few key variables, such as entity names or dates, differing between documents. This raises the probability of retrieving similar yet incorrect documents, which lowers retrieval accuracy. Moreover, legal repositories often contain tens of thousands of documents that differ in type, involve various counterparties, and span multiple jurisdictions. Standard RAG systems typically fail to account for this complexity because they do not use metadata to filter source documents, leading to an unnecessarily broad search space and further reducing accuracy.

Document subsetting can be applied before the RAG stage to address considerable text overlap.
Each document is tagged with standardized metadata, such as agreement type (e.g., MSA, NDA), party names, and governing jurisdiction. Upon receiving a query, the system uses this metadata to narrow the document set, ensuring only the most relevant documents are searched. For instance, in a query like "What is the limitation of liability in the MSAs with ABC Corporation governed by Delaware law?", the system identifies the key entities ("ABC Corporation," "MSA," and "Delaware") and filters the repository to include only the matching documents. This step reduces overlap among the candidate documents and improves retrieval accuracy.
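
To make the subsetting step concrete, the following is a minimal sketch of metadata filtering applied before retrieval. It is an illustration under stated assumptions, not the system described in this paper: the `Document` schema, the `subset_documents` helper, and the sample repository are all hypothetical, and the entities from the query ("MSA," "ABC Corporation," "Delaware") are assumed to have been extracted already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A repository document with standardized metadata (hypothetical schema)."""
    doc_id: str
    agreement_type: str   # e.g. "MSA", "NDA"
    parties: frozenset    # normalized counterparty names
    jurisdiction: str     # governing law, e.g. "Delaware"
    text: str


def subset_documents(repository, agreement_type=None, party=None, jurisdiction=None):
    """Keep only documents whose metadata matches every filter that is set."""
    selected = []
    for doc in repository:
        if agreement_type is not None and doc.agreement_type != agreement_type:
            continue
        if party is not None and party not in doc.parties:
            continue
        if jurisdiction is not None and doc.jurisdiction != jurisdiction:
            continue
        selected.append(doc)
    return selected


# Hypothetical sample data: near-identical contracts that differ in a few variables.
repository = [
    Document("msa-001", "MSA", frozenset({"ABC Corporation"}), "Delaware", "sample text A"),
    Document("msa-002", "MSA", frozenset({"ABC Corporation"}), "New York", "sample text B"),
    Document("nda-003", "NDA", frozenset({"ABC Corporation"}), "Delaware", "sample text C"),
    Document("msa-004", "MSA", frozenset({"XYZ Ltd"}), "Delaware", "sample text D"),
]

# Entities assumed to have been extracted from the example query.
candidates = subset_documents(
    repository,
    agreement_type="MSA",
    party="ABC Corporation",
    jurisdiction="Delaware",
)
print([doc.doc_id for doc in candidates])
```

Only the documents in `candidates` would then be passed to the retrieval stage, so near-duplicate contracts with other counterparties or jurisdictions never compete for similarity with the relevant ones.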
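
The three-step approach to multi-hop questions described earlier (break down the question, recognize the order of resolution, solve the atomic questions serially) can be sketched in a similar way. This is a simplified illustration, not a definitive implementation: the decomposition is written by hand rather than produced by a model, each atomic question is a plain lookup standing in for a retrieval or LLM call, and the names `CONTRACTS`, `msas_governed_by`, `limitation_of_liability`, and `solve_serially` are hypothetical.

```python
from functools import partial

# Hypothetical, pre-tagged contract records used only for illustration.
CONTRACTS = [
    {"id": "msa-001", "type": "MSA", "party": "ABC Corporation",
     "governing_law": "Delaware",
     "liability_clause": "Liability is capped at fees paid in the prior 12 months."},
    {"id": "msa-002", "type": "MSA", "party": "ABC Corporation",
     "governing_law": "New York",
     "liability_clause": "Liability is capped at USD 1,000,000."},
    {"id": "msa-003", "type": "MSA", "party": "XYZ Ltd",
     "governing_law": "Delaware",
     "liability_clause": "Liability is unlimited for breach of confidentiality."},
]


def msas_governed_by(contracts, party, law):
    """Atomic question 1: which MSAs with `party` are governed by `law`?"""
    return [
        c for c in contracts
        if c["type"] == "MSA" and c["party"] == party and c["governing_law"] == law
    ]


def limitation_of_liability(msas):
    """Atomic question 2: what is the limitation of liability in those MSAs?"""
    return {c["id"]: c["liability_clause"] for c in msas}


def solve_serially(atomic_steps, initial_context):
    """Resolve atomic questions in order; each answer is the next step's context."""
    context = initial_context
    for step in atomic_steps:
        context = step(context)
    return context


# Steps 1 and 2: the decomposition and its order, written out by hand here.
steps = [
    partial(msas_governed_by, party="ABC Corporation", law="Delaware"),
    limitation_of_liability,
]

# Step 3: solve the atomic questions one after another.
answer = solve_serially(steps, CONTRACTS)
print(answer)
```

The order of `steps` matters: the second atomic question only makes sense once the first has narrowed the contracts down, which is what distinguishes a multi-hop query from one that simply gathers independent facts.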