Digital White Papers

KMMT24

publication of the International Legal Technology Association

Issue link: https://epubs.iltanet.org/i/1529627


ILTA WHITE PAPER | KNOWLEDGE MANAGEMENT & MARKETING TECHNOLOGIES

ROBUST RAG-BASED LEGAL QUESTION ANSWERING SYSTEMS FOR KNOWLEDGE MANAGEMENT

Aggregate Questions

Legal queries often require exhaustive retrieval and tasks like counting and listing across documents, where LLMs struggle. Our proposed hybrid approach combines document retrieval with SQL-like structured data processing, transforming queries and results into structured formats for precise counting, listing, and filtering tailored to legal analysis.

In the legal domain, many queries are not limited to retrieving facts or passages from a single document or a small set of documents; they instead require aggregating information across many documents. Questions such as "Which of the contracts contain an indemnity clause?" or "How many contracts were amended in the last year?" introduce an additional layer of complexity for Retrieval Augmented Generation (RAG) systems. These aggregate questions demand exhaustive coverage of the relevant documents and involve aggregation tasks such as counting, which LLMs do not perform reliably.

To address aggregate questions in the legal domain, we propose a two-step solution that combines exhaustive retrieval with structured data processing through Structured Query Language (SQL)-like queries. SQL is a domain-specific language used to manage data, especially in a database management system. This approach leverages RAG systems and LLMs while addressing their limitations in tasks like counting or listing.

First, the system exhaustively retrieves relevant information from each document, ensuring no data is missed.
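As a rough illustration of the approach outlined above (per-document extraction, a SQL-compatible table, then a SQL query for listing and counting), the following is a minimal sketch and not the authors' implementation. The toy corpus, the `extract_row` helper, the table schema, and the hand-written queries are all assumptions made for illustration; in a real system the keyword test would be replaced by retrieval plus LLM extraction, and the SQL would be produced by the LLM from the user's question.

```python
import sqlite3

# Hypothetical toy corpus; real inputs would be full contract texts.
CONTRACTS = {
    "contract_a": "The Supplier shall indemnify the Buyer against all losses.",
    "contract_b": "Either party may terminate this agreement on 30 days notice.",
    "contract_c": "The Licensee agrees to indemnify and hold harmless the Licensor.",
}


def extract_row(doc_id: str, text: str) -> tuple[str, int]:
    """Placeholder for per-document retrieval plus LLM extraction.

    A simple keyword test stands in for the real step so the sketch runs.
    """
    has_indemnity = int("indemnify" in text.lower())
    return doc_id, has_indemnity


def build_table(contracts: dict[str, str]) -> sqlite3.Connection:
    """Visit every document and store one structured row per contract."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts (doc_id TEXT PRIMARY KEY, has_indemnity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO contracts VALUES (?, ?)",
        [extract_row(doc_id, text) for doc_id, text in contracts.items()],
    )
    return conn


# Hand-written stand-ins for the SQL an LLM would generate from
# "Which of the contracts contain an indemnity clause?" and its count variant.
LIST_SQL = "SELECT doc_id FROM contracts WHERE has_indemnity = 1 ORDER BY doc_id"
COUNT_SQL = "SELECT COUNT(*) FROM contracts WHERE has_indemnity = 1"

conn = build_table(CONTRACTS)
with_indemnity = [row[0] for row in conn.execute(LIST_SQL)]
indemnity_count = conn.execute(COUNT_SQL).fetchone()[0]
print(with_indemnity, indemnity_count)
```

The point of the design is that the database engine, not the LLM, performs the counting and filtering, so the result is exact as long as every document contributed a row.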
For example, in response to the query "Which contracts contain an arbitration clause?", the system retrieves relevant passages from all applicable contracts, ensuring comprehensive coverage. Next, the retrieved data is transformed into a structured, SQL-compatible format. The LLM translates the original query into an SQL-like command. Finally, the SQL query is executed on the structured data, allowing for precise aggregation, such as counting or filtering. This approach delivers accurate and reliable answers to complex legal queries by combining exhaustive retrieval and structured querying.

Multi-Hop Questions

Multi-hop legal questions require sequential reasoning across multiple documents, which standard RAG systems handle poorly. We synthesize accurate responses by breaking complex queries into atomic steps and solving them sequentially. This structured process ensures logical progression, overcoming the limitations of standard retrieval methods.

While multi-hop, or multi-part, questions are a general challenge in information retrieval systems, they become particularly pronounced in the legal domain due to the complexity and structure of legal documents. Legal queries often require sequential reasoning across multiple pieces of information within or across documents,
