Peer to Peer: ILTA's Quarterly Magazine
Issue link: https://epubs.iltanet.org/i/1530716
Peer to Peer: ILTA's Quarterly Magazine | Winter 2024

However, data stewards should always have the final say regarding data governance rules.

• Data Privacy, Security, and Compliance: Internal datasets used to train Gen AI models often contain sensitive personal information (PII) or intellectual property (IP), raising significant privacy concerns around data collection, storage, and usage. Concerns about Gen AI-powered technology's impact on PII and IP span industries ranging from finance and healthcare to specific functions like marketing and legal – all of which benefit from AI's ability to process and interpret data at scale. Additional concerns include legal exposure from using copyrighted material for model training or grounding, or from potentially sensitive organizational data escaping the walled garden and being used to train foundation AI models.

Data governance helps by establishing clear rules and enforcing security measures that determine who (or what) can access specific data and when. It also identifies and tags potentially sensitive data and mandates that organizations have the appropriate policies for encryption, storage, retention, and deletion; data access and handling procedures; and other controls (e.g., data anonymization and pseudonymization) in place to mitigate risk and meet legal and global regulatory requirements (e.g., GDPR, HIPAA, and CCPA).

For example, in healthcare, the same data governance policies and procedures that protect patient privacy would also prevent Gen AI models from accessing or compromising patient data. In the case of copyrighted material, data governance would ensure the organization has explicit permission to use the content for Gen AI model training, grounding, or other outputs before the system ingests the data.
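To make the tagging and pseudonymization controls described above concrete, here is a minimal Python sketch. The field catalog, email pattern, and salted-hash scheme are illustrative assumptions for this article, not a production PII pipeline:

```python
import hashlib
import re

# Illustrative governance catalog and detector; real deployments would use
# dedicated classification and tokenization services (assumption).
PII_FIELDS = {"name", "email", "ssn"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tag_record(record: dict) -> dict:
    """Tag each field as 'pii' or 'public' per the governance catalog."""
    return {
        field: "pii" if field in PII_FIELDS or EMAIL_RE.search(str(value)) else "public"
        for field, value in record.items()
    }

def pseudonymize(record: dict, salt: str) -> dict:
    """Replace tagged PII values with salted hash tokens before training."""
    tags = tag_record(record)
    return {
        field: hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
        if tags[field] == "pii" else value
        for field, value in record.items()
    }

patient = {"name": "Jane Doe", "email": "jane@example.com", "diagnosis": "flu"}
safe = pseudonymize(patient, salt="per-project-secret")
# 'diagnosis' passes through unchanged; 'name' and 'email' become tokens.
```

In the healthcare example from the article, a step like this would let a Gen AI pipeline train on clinical fields while the identifying fields never leave the walled garden in readable form.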
These data access and handling policies, together with other legal and IT security practices, will help organizations remain nimble in a rapidly changing regulatory environment and prevent unintentional data leakage. They can also shield organizations from AI copyright infringement claims by requiring that all external data – whether licensed or publicly available – be reviewed for any conditions, restrictions, or possible violations that could stem from its use.
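The "who (or what) can access specific data and when" rules described above can be sketched as a small policy check. The roles, sensitivity labels, and policy table below are illustrative assumptions, standing in for whatever classification scheme an organization's governance program defines:

```python
from dataclasses import dataclass

# Hypothetical policy table: which roles may read data at each
# sensitivity label assigned during classification (assumption).
POLICY = {
    "public":       {"analyst", "genai_pipeline", "steward"},
    "confidential": {"analyst", "steward"},
    "restricted":   {"steward"},  # e.g., PII or licensed third-party content
}

@dataclass
class Dataset:
    name: str
    sensitivity: str  # label assigned during data classification

def can_access(principal_role: str, dataset: Dataset) -> bool:
    """Return True only if the role is permitted for the dataset's label."""
    return principal_role in POLICY.get(dataset.sensitivity, set())

licensed_corpus = Dataset("vendor_articles", "restricted")
assert not can_access("genai_pipeline", licensed_corpus)  # model cannot ingest it
assert can_access("steward", licensed_corpus)             # steward retains oversight
```

Gating a training pipeline behind a check like this is one way the policies above become enforceable controls rather than documents: a restricted or license-encumbered dataset is simply unreachable by the Gen AI pipeline until a steward changes its classification.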