AI Security for Executives Part 2 - Sensitive Data Disclosure
Or: Jeff loves classifying data!
Executive Summary
This is part 2 of my series on AI Security for Executives. This week's topic: Sensitive Data Disclosure.
Sensitive data disclosure occurs when AI systems reveal confidential information to unauthorized users. This can result from mistakes, malicious attacks, or poor design. Like all cybersecurity topics, this requires instilling due care in your teams. Teams that behave professionally get better results. Teams that chase unrealistic deadlines face problems when it's too late to fix them. The challenge: root causes typically emerge during model design and construction, making fixes difficult once systems are running.
Executive actions needed:
Data Classification Project: Classify, prune, and actively manage your data as a specific project prior to an AI initiative.
Training Data Audit: Review all data accessible during model development and restrict it to the minimum required.
Vendor Due Diligence: Include contractual requirements as to how, where, and by whom your data can be processed and retained.
User Training Program: Train your users to avoid entering sensitive data, especially into shadow IT AI systems.
The risk here is disclosure of confidential data to hackers, competitors, or even the public at large.
A Scenario
This scenario illustrates sensitive data disclosure, though the specific incident is fictional.
Sarah Chen, the head of operations at Meridian Financial Services, was reviewing quarterly performance metrics when her phone rang. The caller was Jenny Martinez from their business development team.
"Sarah, I was testing our customer service AI to see how it handles competitive questions, and something weird happened. I asked it about other financial services firms in our market, and it gave me a detailed breakdown of our clients, including notes about which ones are considering switching to competitors. It even mentioned specific conversations from our CRM system."
Three months earlier, Meridian had deployed their new AI-powered customer service system. The development team, eager to make the system as helpful as possible, had fed it every piece of customer data they could find. CRM exports, support tickets, account notes, relationship histories. Everything that would help the AI understand customer context and provide personalized service.
The system worked beautifully. Customers loved getting responses that referenced their account history and preferences. Internal teams praised the AI's ability to surface relevant information quickly. Sarah had even presented the project as a success story at the quarterly board meeting.
But no one had considered what would happen when someone asked the AI general questions about the market. Jenny had simply been curious about how the AI would respond to competitive inquiries, thinking it might help with prospect conversations. Instead, the system helpfully provided a detailed analysis of Meridian's entire client base, complete with relationship notes and competitive vulnerabilities.
This brings us to what every executive needs to understand about sensitive data disclosure. The problem isn't always malicious hackers breaking into your systems. The problem is your own AI systems performing exactly as you trained them, because you fed them too much data and never hid or de-identified the confidential parts.
About This Series
This series addresses C-suite executives making critical AI investment decisions without a clear understanding of the security implications.
I structured this series based on recommendations from the Open Web Application Security Project (OWASP) because their AI Security Top 10 represents the consensus view of leading security researchers on the most critical AI risks.
This series provides an educational overview rather than specific security advice, since AI security is a rapidly evolving field requiring expert consultation. The goal is to give you the knowledge to ask the right questions of your teams and vendors.
Executive Action Plan
The good news is that sensitive data disclosure is entirely preventable. The challenge is that prevention requires changing how your organization thinks about data before AI training begins, not after deployment.
1. Data Classification and Cleanup Initiative
Candidly, most organizations maintain terabytes of legacy data with unclear provenance and unknown contents. Can you definitively identify and tag every piece of confidential information within your organization?
Start by assigning a dedicated IT or operations professional to conduct systematic review. Classify data as Public, Internal, Confidential, or Restricted. Delete data where feasible.
Your old data isn't some valuable investment. It's a lump of plutonium sitting in the closet.
Leverage existing infrastructure such as Microsoft Purview's eDiscovery capabilities, Google Vault, or the Amazon S3 bucket APIs. One assessment I conducted identified 1.9 million files eligible for safe removal.
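For teams whose legacy data sits in AWS, a minimal sketch of that kind of inventory pass might look like the following. The bucket name and the five-year threshold are placeholder assumptions, and the script only reports candidates; it deletes nothing.

```python
# Minimal sketch: count S3 objects untouched for five or more years as
# candidates for review or deletion. Bucket name and age threshold are
# placeholders; adjust to your own retention policy before acting on this.
from datetime import datetime, timedelta, timezone

import boto3

BUCKET = "example-legacy-data"  # hypothetical bucket name
CUTOFF = datetime.now(timezone.utc) - timedelta(days=5 * 365)

s3 = boto3.client("s3")
stale_count = 0
stale_bytes = 0

# Page through every object and flag those not modified since the cutoff.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < CUTOFF:
            stale_count += 1
            stale_bytes += obj["Size"]

print(f"{stale_count} objects ({stale_bytes / 1e9:.1f} GB) untouched in 5+ years")
```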
Success indicator: When you can answer "where is our sensitive data?" without twelve people scrambling to check different systems, you can consider yourself successful.
Data classification and hygiene is foundational. Consider the alternative scenario: explaining to your board why your AI began disseminating due diligence target lists to competitors because someone inadvertently included that directory in training datasets.
You likely have employees who would find this investigative process intellectually stimulating. Identify team members with analytical curiosity, establish clear parameters, and empower them to start.
2. Training Data Scrutiny Process
A critical insight regarding sensitive information disclosure: root causes frequently originate during LLM training phases or system architecture decisions. For instance, when implementing Model Context Protocol (MCP) or similar integration technologies for data access, limit access to only necessary information.
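The exact mechanics depend on the integration technology, so rather than show a particular MCP SDK, here is a hedged illustration of the principle in plain Python: the integration layer exposes an explicit allowlist of fields, and anything not on the list never reaches the model. The table and field names are hypothetical.

```python
# Illustration of least-privilege data access for an AI integration layer.
# Table and field names are hypothetical; the point is that fields absent from
# the allowlist never reach the model, regardless of what the query returns.
ALLOWED_FIELDS = {
    "accounts": {"account_id", "plan_tier", "open_ticket_count"},
    # Note: no CRM notes, no relationship history, no revenue figures.
}

def filter_record(table: str, record: dict) -> dict:
    """Return only the fields explicitly approved for model consumption."""
    allowed = ALLOWED_FIELDS.get(table, set())
    return {k: v for k, v in record.items() if k in allowed}

raw = {"account_id": "A-1042", "plan_tier": "enterprise",
       "open_ticket_count": 3, "crm_notes": "considering a competitor"}
print(filter_record("accounts", raw))
# {'account_id': 'A-1042', 'plan_tier': 'enterprise', 'open_ticket_count': 3}
```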
Key components:
Establish the minimal dataset required for project objectives.
De-identify data on intake where necessary (a rough sketch follows below).
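As a rough sketch of de-identification on intake, assuming free-text records and simple regular expressions, the core idea is small. A production pipeline would use a dedicated PII detection tool such as Microsoft Presidio rather than hand-rolled patterns like these.

```python
# Rough sketch of de-identification on intake: scrub obvious identifiers from
# free text before it enters a training corpus. These regexes only catch
# common patterns; a real pipeline should use a dedicated PII detection tool.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def deidentify(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(deidentify("Call Dana at 555-867-5309 or dana@example.com, SSN 123-45-6789."))
# Call Dana at [PHONE] or [EMAIL], SSN [SSN].
```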
The US National Archives (NARA) maintains a useful list at https://www.archives.gov/cui/registry/category-list. This covers data that, while not classified, the federal government believes should be protected. The first item on NARA's list includes locations of ammonium nitrate producers. You can see why the government wouldn't want that disclosed.
The data scrutiny process forces uncomfortable conversations. Your sales team will argue that customer relationship data makes the AI more effective. Your legal team will worry about liability. Your compliance team will cite regulations you've never heard of. These conversations are valuable; they force meaningful design decisions before the system is built.
3. Vendor Diligence and Contracting
I expect vendor diligence to come up in all ten articles of this series. For data processing specifically, you must understand exactly how and where your vendors store and process your data.
Focus areas:
Data processing geography and personnel clearance requirements
Subcontractor disclosure (because vendors often use other vendors)
Acceptable use of your data (for example, whether it can be used to train the vendor's models)
Right to audit and data deletion when relationships end
Liability allocation for when things go wrong
The vendor conversation reveals how many AI companies treat data security as an afterthought. You'll encounter vendors who can't explain where your data is processed, who their subcontractors are, or what happens to your information when the contract ends. These vendors are telling you everything you need to know about their security posture.
4. User Training and Awareness
Training should cover the appropriate scope of use for sanctioned AI systems and the risks of shadow IT, meaning staff use of systems that haven't been authorized. Employees may inadvertently input confidential information into consumer AI platforms such as ChatGPT, Perplexity, or Claude. Educate users on your data classification schema and the appropriate usage boundaries for each sensitivity tier.
Expected benefits: Personnel develop disciplined data handling practices and cultivate organizational awareness regarding information governance protocols.
Program elements:
Executive briefings on AI risks (so leadership understands why this matters)
Hands-on workshops for frequent users
Clear guidelines that people actually read and follow
Incident reporting procedures that encourage honesty
Developer Guidelines
Your development teams need specific technical controls that complement the executive actions above. These guidelines come directly from the OWASP AI Security Project and represent industry consensus on best practices. I won't detail these since this is an executive presentation, but you should know they exist.
Priority 1 - Data Sanitization:
Implement automated PII detection before training
Use tokenization for sensitive fields (see the sketch after this list)
Apply differential privacy techniques
Document data lineage and retention policies
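For readers who want one concrete illustration of tokenization anyway, here is a minimal sketch that replaces raw identifiers with keyed, irreversible tokens so records can still be joined and analyzed without exposing the underlying values. The key handling is deliberately simplified; a real deployment would pull the key from a secrets manager.

```python
# Minimal sketch of tokenizing sensitive fields before training: replace raw
# identifiers with keyed, irreversible tokens so records can still be joined
# and analyzed without exposing the underlying values. Key handling is
# simplified here; keep the real key in a secrets manager, not in code.
import hashlib
import hmac

SECRET_KEY = b"replace-with-key-from-secrets-manager"  # placeholder only

def tokenize(value: str) -> str:
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]

record = {"customer_id": "C-88231", "email": "dana@example.com", "balance": 1520.75}
safe = {**record,
        "customer_id": tokenize(record["customer_id"]),
        "email": tokenize(record["email"])}
print(safe)
```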
Priority 2 - Access Controls:
Principle of least privilege for training data access
Role-based permissions for model outputs
Secure API design with input validation
Runtime monitoring for unusual queries (illustrated in the sketch below)
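As one hedged illustration of runtime monitoring: a simple pre-model check that flags bulk-extraction patterns and logs them for review. The trigger phrases and the decision to hold flagged queries are illustrative assumptions, not a complete detection strategy.

```python
# Sketch of runtime monitoring for unusual queries: flag prompts that look
# like bulk-extraction attempts before they reach the model, and log each hit
# for review. Trigger phrases and thresholds are illustrative assumptions.
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-query-monitor")

BULK_PATTERNS = [
    re.compile(r"\b(all|every|complete list of)\b.*\b(customers?|clients?|accounts?)\b", re.I),
    re.compile(r"\bexport\b.*\b(crm|database|records)\b", re.I),
]

def review_query(user: str, prompt: str) -> bool:
    """Return True if the prompt should be held for human review."""
    for pattern in BULK_PATTERNS:
        if pattern.search(prompt):
            log.warning("Flagged query from %s: %r", user, prompt)
            return True
    return False

print(review_query("jenny.m", "Give me a complete list of clients considering competitors"))
```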
Priority 3 - Advanced Protections:
Federated learning for distributed sensitive data (unlikely for smaller organizations)
Homomorphic encryption for privacy-preserving analysis (unlikely for smaller organizations)
Output filtering and redaction capabilities
Regular security testing and red team exercises
Some methods above are standard cybersecurity practices. Others, like homomorphic encryption, are advanced techniques unlikely to be used until built into major AI vendor platforms.
Competitive Advantage
The market is bifurcating into two categories of AI companies: those that can handle sensitive data safely and those that cannot. Companies in the second category will find their market opportunities increasingly constrained as security requirements become fundamental prerequisites for enterprise sales.
Enterprise customers now routinely conduct extensive security due diligence before contract execution. Organizations capable of providing comprehensive responses to AI security assessments will capture opportunities that competitors cannot pursue.
Executive Summary
Classify and establish control over your corporate data assets
Implement rigorous oversight during development and model training phases
Conduct vendor due diligence
Provide thorough training for employees and contractors
Sarah's story at Meridian Financial Services ended better than it could have. The inadvertent disclosure was discovered by an internal employee during routine testing, not by a customer or competitor. I've dealt with information disclosure incidents before, and trust me, it's much easier to address these issues at design time.
Glossary
Sensitive Information Disclosure: Unauthorized revelation of confidential data through AI system outputs, potentially exposing customer information, proprietary algorithms, or competitive intelligence
HIPAA (Health Insurance Portability and Accountability Act): US healthcare data protection regulation requiring specific safeguards for patient information
PII (Personally Identifiable Information): Data that can identify specific individuals, subject to privacy regulations and breach notification requirements
Differential Privacy: Mathematical technique that adds controlled noise to data, enabling AI training while protecting individual privacy
Federated Learning: Training approach that keeps sensitive data distributed across multiple locations rather than centralizing it
OWASP: Open Web Application Security Project, an open-source security community that publishes industry-standard guidelines for AI and application security
Least Privilege: Security principle limiting system access to minimum requirements, reducing potential exposure surface
Homomorphic Encryption: Cryptographic technique that allows computations on encrypted data without decrypting it first, enabling AI training while keeping underlying data completely protected
Shadow IT: Unauthorized technology systems used by employees without official approval, creating security risks when sensitive data is processed outside controlled environments
NARA: National Archives and Records Administration, the US agency that maintains guidelines for protecting sensitive but unclassified government information
CUI (Controlled Unclassified Information): Government designation for sensitive information that requires protection but doesn't meet classification standards, providing a useful example for corporate data sensitivity
eDiscovery: Electronic discovery tools that automatically scan and categorize digital information for legal proceedings, regulatory compliance, and data classification projects
MCP (Model Context Protocol): Integration technology that enables AI systems to access external data sources and tools, requiring careful configuration to prevent unauthorized data exposure
