publication of the International Legal Technology Association
Issue link: https://epubs.iltanet.org/i/1545606
© ILTA and Thomson Reuters 2026.

Appendix A: Definitions & Core Technical Concepts

This is not a definitive glossary, but it provides key terms that are essential for understanding the legal use of AI and for functioning effectively in AI-enabled legal environments. Understanding them at a working level, rather than a technical level, can help you ask the right questions, understand how AI models produce their outputs, and evaluate those outputs more critically. Terminology is evolving rapidly, and different vendors, platforms, and professional bodies use the same terms in different ways.

Bias: Systematic skewing of AI outputs that reflects patterns in the training data. In legal contexts, examples of biased outputs include skewed litigation outcome predictions, biased document review prioritization, and disparate output quality across jurisdictions, practice areas, or demographic groups.

Black Box: An AI system whose internal decision-making process is not transparent or explainable to users. Many predictive AI models operate as black boxes, which can complicate validation and create challenges for discoverability in litigation.

Context Window: The maximum amount of text an AI system can "see" and process at one time, including prompts, uploaded documents, and chat history. A short context window means the system may not be able to process a lengthy contract, deposition transcript, or case file in a single pass, potentially causing it to miss information or lose coherence across a long document.

Data Sovereignty: The legal and contractual principle governing where data is stored, processed, and controlled. Lawyers must understand the data sovereignty implications before inputting client information into any AI platform that operates across jurisdictions.

Hallucination: The tendency of Generative AI to produce confident but factually incorrect output, including fabricated case citations, invented statutes, and inaccurate legal holdings.
This is among the most significant accuracy risks of GenAI in legal workflows.

Human-in-the-Loop (HITL): A workflow design in which a human reviews and approves AI-generated outputs before they are acted upon. The ABA and most state bars effectively require HITL for consequential legal tasks.

Large Language Model (LLM): The underlying technology in most Generative AI tools, including those used for legal drafting and research. LLMs are trained on enormous text datasets. When given a prompt, an LLM uses learned patterns to generate a response. LLMs are predictive language tools, not thinking systems. Unlike search engines, they do not retrieve verified information, so their output must always be independently checked.

Machine Learning (ML): The branch of AI in which systems learn from data rather than from explicit programming. Predictive coding and technology-assisted review are examples of machine learning applications.

Natural Language Processing (NLP): The capability that allows AI systems to understand, interpret, and interact with human language. For example, it is what allows a user to query a research database in plain English rather than with Boolean search strings.

Predictive Coding: The use of machine learning models trained on human-reviewed examples to classify, rank, or prioritize large volumes of documents by relevance, privilege, or responsiveness, with applications including ediscovery, contract review, and legal research.

Prompt: The instruction or query a user provides to a Generative AI system. Prompt construction is critical, because the quality and specificity of the prompt significantly affect the quality of the output; the discipline of crafting effective prompts is sometimes called prompt engineering.

Reasoning Model: An advanced model that works through a problem step by step (considering multiple angles, working through statutory elements, and testing possibilities) before producing its output, similar in structure to how a lawyer might think through an issue before writing a conclusion.
Also called Extended Thinking.

Retrieval-Augmented Generation (RAG): A technique in which an AI system searches a specific set of documents (e.g., a client's contract library, a firm's prior work product, or a defined body of case law) before generating its response, grounding the output in authentic source materials rather than relying solely on what the model learned during training. This allows you to "ask questions" of your own documents. It may also reduce the risk of hallucinations.

Temperature: A parameter that controls how creative (or variable) a model's outputs are. Higher temperature produces more varied responses; lower temperature produces more consistent, predictable outputs.

Token: The unit of text (roughly a word or word fragment) that LLMs process. Token limits constrain how much text an LLM can consider at once, which affects its performance on long documents.

Training Data: The dataset on which an AI model was trained (e.g., case law, contracts, legal briefs). The quality and recency of the data directly affect the model's capabilities and biases. LLMs trained on data through a given date may lack knowledge of subsequent legal developments. Data use policies and default settings determine whether information the user inputs, including privileged or confidential information, may be retained by a third-party vendor or used to train future model versions, potentially exposing it to other users.

AI GUIDELINE SERIES | AI GUIDE FOR LEGAL PROFESSIONALS: A FOUNDATIONAL OVERVIEW
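For readers who want to see the Context Window and Token entries in concrete terms, the following Python sketch estimates whether a prompt and a document fit in a model's context window. If they do not, it splits the document into pieces. It assumes a rough rule of thumb of about four characters per token for English text. Actual tokenizers vary by model and vendor. The function names, the 32,000-token window, and the 1,000-token reserve for the model's answer are hypothetical choices made only for illustration.

```python
import math

# Assumption: roughly 4 characters of English text per token. Real tokenizers
# differ by model and vendor, so use the vendor's own tokenizer for exact counts.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(prompt: str, document: str, context_window: int,
                    reserved_for_output: int = 1_000) -> bool:
    """True if the prompt and document still leave room for the model's answer."""
    used = estimate_tokens(prompt) + estimate_tokens(document)
    return used + reserved_for_output <= context_window

def split_into_chunks(document: str, max_tokens: int) -> list[str]:
    """Naively cut a long document into pieces that each fit a token budget."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return [document[i:i + max_chars] for i in range(0, len(document), max_chars)]

contract = "The Supplier shall deliver the Goods. " * 5_000  # 190,000 characters
prompt = "Summarize the termination provisions."
print(estimate_tokens(contract))                                  # 47500
print(fits_in_context(prompt, contract, context_window=32_000))   # False
print(len(split_into_chunks(contract, max_tokens=8_000)))         # 6
```

The naive split can cut a clause in half. This is one way a system that reads a long document piece by piece can lose coherence. A more careful approach would split along sections or paragraphs.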
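The Predictive Coding entry describes models that are trained on human-reviewed examples and then rank the documents not yet reviewed. The Python sketch below shows that loop in miniature. It uses a simple word-weighting scheme, a naive Bayes-style log-likelihood ratio. This is a hypothetical teaching example and does not show how any particular review platform works. The sample documents and reviewer decisions are invented. Production tools use far larger training sets and typically validate their results statistically.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:?!()\"'"

def tokenize(text: str) -> list[str]:
    """Lowercase words with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def train(labeled: list[tuple[str, bool]]) -> dict[str, float]:
    """Learn a weight per word from documents a human reviewer has coded.

    Positive weights mean the word appears relatively more often in documents
    coded relevant; negative weights mean it appears more often in the others.
    """
    relevant, other = Counter(), Counter()
    for text, is_relevant in labeled:
        (relevant if is_relevant else other).update(tokenize(text))
    vocab = set(relevant) | set(other)
    # Add-one smoothing so a word seen in only one group does not get an infinite weight.
    rel_total = sum(relevant.values()) + len(vocab)
    oth_total = sum(other.values()) + len(vocab)
    return {
        w: math.log((relevant[w] + 1) / rel_total) - math.log((other[w] + 1) / oth_total)
        for w in vocab
    }

def score(text: str, weights: dict[str, float]) -> float:
    """Sum the learned weights; words never seen in training contribute nothing."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))

def rank(unreviewed: list[str], weights: dict[str, float]) -> list[str]:
    """Order unreviewed documents so the likeliest-relevant ones come first."""
    return sorted(unreviewed, key=lambda t: score(t, weights), reverse=True)

# Hypothetical reviewer decisions (True = coded relevant).
labeled = [
    ("Email re: termination of the supply agreement", True),
    ("Draft amendment to supply agreement pricing", True),
    ("Office holiday party schedule", False),
    ("Cafeteria menu for next week", False),
]
weights = train(labeled)
print(rank(["Lunch menu update", "Notice of supply agreement breach"], weights))
```

In this toy example, even the word "of" receives a positive weight, simply because it happened to appear in a document coded relevant. This is a small-scale illustration of how a model can pick up patterns from its training data that have nothing to do with the question being asked.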
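The RAG entry describes searching a defined set of documents before generating a response. The Python sketch below shows those two steps in miniature. First, it retrieves the passages from a small, hypothetical contract library that share the most words with the question. It then builds a prompt that tells the model to answer only from those passages and to cite them. Simple word overlap stands in for the semantic (embedding-based) search that many real systems use. The library contents, source IDs, and stop-word list are invented, and the call to the model itself is omitted.

```python
PUNCTUATION = ".,;:?!()\"'"
# A tiny, hypothetical stop-word list so common words do not drive the matching.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "or", "the", "this", "to", "what"}

def words(text: str) -> set[str]:
    """Distinct lowercase words, minus punctuation and stop words."""
    cleaned = {w.strip(PUNCTUATION).lower() for w in text.split()}
    return cleaned - STOPWORDS - {""}

def retrieve(question: str, library: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (source_id, passage) pairs sharing the most words with the question."""
    q = words(question)
    scored = sorted(library.items(), key=lambda item: len(q & words(item[1])), reverse=True)
    # Drop passages with no overlap at all, then keep the top k.
    return [item for item in scored if q & words(item[1])][:k]

def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Ground the model: supply the retrieved text and require citations to it."""
    sources = "\n".join(f"[{source_id}] {text}" for source_id, text in passages)
    return (
        "Answer using only the sources below and cite the source ID for each statement. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

library = {
    "MSA-7.2": "Either party may terminate this Agreement on 90 days' written notice.",
    "MSA-9.1": "The Supplier's total liability is capped at the fees paid in the prior 12 months.",
    "NDA-3": "Confidential Information excludes information that is publicly available.",
}
question = "What notice period applies to terminate the Agreement?"
passages = retrieve(question, library)
print(build_prompt(question, passages))
# In a real system, this prompt would now be sent to an LLM (not shown).
```

If retrieval misses the relevant clause, the model has nothing accurate to ground its answer on. This is one reason RAG output still needs independent checking.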
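The Temperature entry can be made concrete with the standard softmax formulation. In this formulation, the model's raw score for each candidate next token is divided by the temperature before being converted into probabilities. The Python sketch below uses three invented scores. Vendors may implement or expose temperature differently, so this illustrates the general mechanism rather than any particular product's behavior.

```python
import math

def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw next-token scores into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {token: s / temperature for token, s in scores.items()}
    # Subtracting the largest value avoids overflow and does not change the result.
    top = max(scaled.values())
    exps = {token: math.exp(v - top) for token, v in scaled.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

# Hypothetical model scores for the blank in "The tenant ___ pay rent monthly."
scores = {"shall": 2.0, "must": 1.0, "may": 0.5}
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, {token: round(p, 2) for token, p in probs.items()})
```

At a temperature of 0.5, "shall" receives about 84% of the probability. At 2.0, it receives about 48%, so sampled outputs vary more from run to run. Some tools accept a temperature of 0 to mean "always pick the top-scoring token." This formula cannot express that case directly because it would divide by zero, which is why the sketch rejects it.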
