publication of the International Legal Technology Association
Issue link: https://epubs.iltanet.org/i/1310179
ILTA WHITE PAPER | LITIGATION AND PRACTICE SUPPORT

differently. Issues with the level of detail in the data, ambiguous field names, or just plain wrong information can severely impact your analysis.

Starting with validity, you are assessing how accurately the data represents what it should. Validity combines two checks: that the data is in the right format, such as the right data types and descriptive values, and that it represents its true value. For example, if you expect a date field to contain a date and instead find values like "43982," the data is clearly not valid, though you may be able to cleanse and standardize it later. Similarly, the date field should reflect the appropriate date value. If there are records where the "end date" precedes the "start date," the meaning and validity of those fields are clearly flawed.

Next, you can measure the reliability of the data. Reliability in statistics refers to the consistency of a measure, but with data, it is evaluated in a slightly different manner. Evaluating how much you can rely on your data requires the following considerations:

• Veracity: Is the data credible, highly accurate, and from a trustworthy source?
• Consistency: Do the data contents and their meanings remain consistent over time?

Measuring Relationships

The third step is measuring the relationships in the data. Data relationships are typically measured to show correlation. In statistics, correlation is measured to determine how variability in one variable is associated with variability in one or more other variables. This concept can be applied to litigation support. Analyzing multiple variables together is useful for data exploration and for identifying both trends and outliers within data. For example, does the day of the week have any relationship to how often and to whom a mobile device custodian sends SMS messages?
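As a rough illustration of the question above, the sketch below counts SMS messages by custodian, recipient, and day of the week. It is a minimal example under stated assumptions, not a prescribed method: the records, the names, and the `count_by_weekday` helper are all hypothetical, and real mobile device exports will need their own parsing and cleansing first.

```python
from collections import Counter
from datetime import datetime

# Weekday names indexed the way datetime.weekday() numbers them (Monday = 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical SMS records: (custodian, recipient, ISO 8601 timestamp).
MESSAGES = [
    ("Alice", "Bob",   "2020-06-01T09:15:00"),
    ("Alice", "Bob",   "2020-06-01T17:40:00"),
    ("Alice", "Carol", "2020-06-06T11:05:00"),
    ("Dave",  "Bob",   "2020-06-05T08:30:00"),
    ("Dave",  "Bob",   "2020-06-05T21:10:00"),
    ("Dave",  "Alice", "2020-06-07T14:00:00"),
]


def count_by_weekday(messages):
    """Count messages per (custodian, recipient, weekday name)."""
    counts = Counter()
    for custodian, recipient, sent_at in messages:
        weekday = WEEKDAYS[datetime.fromisoformat(sent_at).weekday()]
        counts[(custodian, recipient, weekday)] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent combinations first.
    for (custodian, recipient, weekday), n in count_by_weekday(MESSAGES).most_common():
        print(f"{custodian} -> {recipient} on {weekday}: {n}")
```

A tally like this only surfaces patterns worth a closer look, such as a custodian who messages one recipient mostly on a particular day. It does not by itself establish a statistical relationship.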
Designing a measurement of the day of the week, custodian, and SMS activity will help you better understand the relationship between these variables. In addition, it can provide insights into the data that may not even involve this set of variables.

Minimizing Risk: Would you bet the company on your analysis?

All analysis carries risk. Whenever you perform an analysis, you are likely to rely on the findings, whether the impact is small or large. Your analysis may be used to determine whether a data set is complete or to build a case strategy. Whether the analysis is based on low-quality data or faulty assumptions, your application of that analysis can be flawed. In other words, you will unknowingly be working with bad information if you don't understand your data. EDA helps minimize that risk. First, you will be profiling the data to understand what is in the data: