Digital White Papers

LPS20

publication of the International Legal Technology Association

Issue link: https://epubs.iltanet.org/i/1310179


ILTA WHITE PAPER | LITIGATION AND PRACTICE SUPPORT

what it represents, how it is structured, and whether it is complete. This crucial step allows you to assess what you have in the data and identify any gaps or other issues you must consider.

As an analogy, data profiling is similar to a construction survey. Just as a construction survey maps out the topography, structural weaknesses, and other physical measurements of a site, data profiling establishes the baseline data and informs the subsequent analysis design and execution steps.

Risk can be minimized by taking one or more of the following steps, which are based on completeness, validity, reliability, and relationship measurements:

• Describe: Fully document any known data issues as part of your analysis findings
• Supplement: Incorporate supplemental or replacement data to account for data deficiencies
• Exclude: Do not include or rely upon the deficient parts of the data

Generating Ideas and Strategies

While EDA is best known for minimizing risk, it is also a powerful approach for developing ideas and strategies for analysis. Answering open-ended data questions via analysis is partly an art, which introduces a seemingly less rigorous element to analysis. Applying EDA to the early stages of analysis compensates for this by allowing you to address the anomalous and interesting parts of your data. If you are stuck in your analysis, these are great places to start. At a minimum, EDA will help you better understand your data, the limitations of your data, and what qualifications you need to place on your analysis findings.

Suppose that, as part of exploring your data, you have profiled email data as a time series by month. Your analysis shows an unexpectedly low volume of records for the same three months every year. You may have assumed that the volume should be consistent over time, so you need to validate that assumption to understand the discrepancy.
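The monthly profile described above can be sketched in a few lines of Python. This is a minimal illustration, not a method prescribed by the paper: the function names, the 0.5 threshold, and the synthetic dates are all assumptions made for the example, and it presumes each email record has already been reduced to its sent date.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_counts(sent_dates):
    """Count records per (year, month) pair."""
    return Counter((d.year, d.month) for d in sent_dates)


def recurring_low_months(counts, threshold=0.5):
    """Return calendar months whose volume is below `threshold` times the
    year's median monthly volume in every year present in the data.

    The threshold is an illustrative assumption. Months with no records
    count as zero, so partial first or last years can also be flagged.
    """
    years = sorted({year for year, _ in counts})
    if not years:
        return []
    low_every_year = set(range(1, 13))
    for year in years:
        volumes = [counts.get((year, m), 0) for m in range(1, 13)]
        cutoff = threshold * median(volumes)
        low = {m for m in range(1, 13) if volumes[m - 1] < cutoff}
        low_every_year &= low  # keep only months that are low in all years so far
    return sorted(low_every_year)


# Synthetic data (hypothetical): 100 emails per month, but only 20 in
# June, July, and August of each year.
sample = [
    date(year, month, 1 + i % 28)
    for year in (2018, 2019)
    for month in range(1, 13)
    for i in range(20 if month in (6, 7, 8) else 100)
]
print(recurring_low_months(monthly_counts(sample)))  # [6, 7, 8]
```

A result like this only identifies where to look. Whether the dip reflects a collection gap or ordinary seasonality still has to be established through further analysis.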
From here, you can further profile the data (e.g., show the time series by day to identify potential data gaps) or you can start looking for patterns that help determine whether this lower volume is explainable, such as by seasonality or other factors. Better yet, you can explore both.

The key to exploring data is to continuously ask critical questions about the data and methodically pursue answers through analysis. This hypothesis-based approach increases the rigor of your analysis. However, be careful not to get too fixated on specific questions at the expense of answering more fundamental questions about your data.

The following chart illustrates a general approach for starting analysis based on EDA.
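The day-level profiling mentioned earlier in this section can also be sketched in code. The example below is a rough illustration under stated assumptions: the function name is hypothetical, each record is assumed to be reduced to its sent date, and skipping weekends is an assumption suited to business email that may not fit other data.

```python
from datetime import date, timedelta


def missing_days(sent_dates, weekdays_only=True):
    """List calendar days between the first and last record with no records.

    With weekdays_only=True (an assumption for business email), Saturdays
    and Sundays are not reported as gaps.
    """
    present = set(sent_dates)
    if not present:
        return []
    day, last = min(present), max(present)
    gaps = []
    while day <= last:
        is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend
        if day not in present and not (weekdays_only and is_weekend):
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Synthetic data (hypothetical): weekday email from 2 to 13 March 2020,
# with nothing on 10 and 11 March.
observed = [
    date(2020, 3, d) for d in (2, 3, 4, 5, 6, 9, 12, 13)
]
print(missing_days(observed))
```

Days reported here are candidates for the Describe, Supplement, or Exclude steps, depending on whether the gap can be explained or filled.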
