Peer to Peer: ILTA's Quarterly Magazine
Issue link: https://epubs.iltanet.org/i/1530716
At its core, digital forensics deals with recovering and investigating data residing in digital devices and cloud-native storage, generally in the context of cybercrime. Similarly, within the EDRM model, ediscovery manages data as evidence from initial collection through presentation, for use in both civil and criminal legal cases. For both disciplines, the starting point is data collection: the crucial process of gathering digital evidence or information that is accurate and legally admissible.

Like digital forensics and ediscovery, AI's effectiveness hinges on forensically sound data collection. Why? Forensically sound data is information collected, preserved, and handled in a way that maintains its integrity and authenticity so that it can be reliably used as evidence or for analysis. For AI, gathering and validating data is not just a 'best practice'; it is essential for building trustworthy AI systems that can identify and form patterns and produce insights.

The need for forensically sound data as the foundation of AI becomes clear when LLMs memorize errors and biases or produce incomplete analyses: with forensically sound data, there is an audit trail showing where such problems originated. To seize the opportunity that AI technologies present, we must examine the critical role of forensically sound data collection and its parallels with established practices in digital forensics and ediscovery.

THE DATA COLLECTION CHALLENGE

According to AI Multiple Research, training data collection is one of the main barriers to AI adoption. Their analysis highlights six significant data collection challenges: availability issues, bias problems, quality concerns, protection and legal requirements, cost constraints, and data drift prevention.
Three of these challenges (quality concerns, protection and legal requirements, and bias problems) can be effectively addressed through forensic data collection methods, because forensically sound data is collected under legally prescribed standards that ensure its integrity. In the context of AI, clean data generally means data that is valid, consistent, and uncorrupted. The large volume, complexity, and rapid evolution of data within an organization make achieving this difficult. However, these challenges also present an opportunity to leverage established forensic methodologies to ensure data quality.

THE ROLE OF FORENSICALLY SOUND DATA COLLECTION

Artificial intelligence begins with data collection, as every technology does. Data collection is not just the first step in the decision-making process; it is the driver of machine learning. The integrity and reliability of AI systems hinge on acquiring meaningful information to build a consistent and complete