AI Security for Executives Part 10: Unbounded Consumption
Don't let the cost sneak up on you.
Executive Summary
Unbounded Consumption occurs when users have too much unchecked ability to consume LLM resources. The big risk: cost. Users could be malicious, but it's just as likely that your real users will accidentally trigger this consumption. It's a magnification of existing problems. If you've ever wondered why your AWS bill is $100k per month when it was supposed to be so cost-effective, this risk should resonate with you.
Executive actions needed:
Model and Control Cost for AI tools: The major enterprise vendors all let you report on usage and set budgets.
Monitor your LLM input and output for anomaly detection: Consider platforms like LangSmith or Datadog.
Layer in preventative controls: Rate limits, strict input validation, nonsense input detection.
A Scenario
This is a fictional but realistic scenario to illustrate the point.
Michelle reviewed the monthly cost report.
$5,300. The AI chatbot they'd built was supposed to cost $4,000 per month. It had been working beautifully. Customers were happier, support tickets were down, everyone went home earlier. But costs didn't appear to be slowing down.
The AI was writing verbose statements in response to simple questions. "What are your business hours?" received answers with all kinds of extra information. Someone had asked 847 complex questions over a single weekend, each one more elaborate than the last. The machine had dutifully answered them all.
There was no malice here. Just a computer that had learned to be helpful in the most expensive way possible.
By Thursday, they'd found the problem in the logs and fixed it. The bills returned to normal. But for those few days, Michelle had wondered how to explain this to the executive team as costs continued to spiral. She felt more lucky than smart. Her problem was solved...for now.
About This Series
This series is written for C-suite executives making critical AI investment decisions amid emerging security implications.
I structured this series based on recommendations from the Open Worldwide Application Security Project (OWASP) because their AI Security Top 10 represents the consensus view of leading security researchers on the most critical AI risks.
This series provides an educational overview rather than specific security advice, since AI security is a rapidly evolving field that requires expert consultation. The goal is to give you the knowledge to ask the right questions of your teams and vendors.
Executive Action Plan
The name of the game here: Keep control of your costs.
1. Model and Control Cost for AI tools.
Establish monthly budget caps with automated alerting at 75% and 90% thresholds across all AI platforms. The big vendors like OpenAI, Google, and Anthropic allow you to set limits and report on usage.
I assume your organization already has tiered access controls for other cloud services. Apply the same principle to AI: different user groups should have distinct spending limits based on business criticality. Your customer service team needs different consumption limits than your R&D group, and your executive assistants need different access than your data science team.
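To make this concrete, below is a minimal sketch of threshold alerting over tiered team budgets. The team names, dollar figures, and the get_month_to_date_spend and send_alert helpers are illustrative placeholders, not any vendor's actual API; the major platforms' billing dashboards and APIs give you most of this out of the box.

```python
# Minimal sketch: budget-threshold alerting with per-team caps.
# get_month_to_date_spend() and send_alert() are hypothetical helpers;
# in practice you'd pull spend from your vendor's billing API.

MONTHLY_BUDGETS = {  # illustrative per-team caps in USD
    "customer_service": 4000,
    "research_and_development": 10000,
    "executive_assistants": 500,
}
ALERT_THRESHOLDS = (0.90, 0.75)  # check the higher threshold first

def check_budgets(get_month_to_date_spend, send_alert):
    for team, budget in MONTHLY_BUDGETS.items():
        spend = get_month_to_date_spend(team)
        for threshold in ALERT_THRESHOLDS:
            if spend >= budget * threshold:
                send_alert(
                    f"{team} AI spend is at {spend / budget:.0%} "
                    f"of its ${budget:,} monthly budget"
                )
                break  # report only the highest threshold crossed
```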
2. Monitor your LLM input and output for anomaly detection.
Log user inputs, LLM outputs, processing times, and usage patterns for all AI interactions.
Configure anomaly detection that identifies unusual consumption patterns: sudden volume spikes, abnormally long queries, and repetitive requests from single sources.
A suggestion: LangSmith, Datadog, or other platforms can speed this up for you. It probably isn't necessary to build from scratch yourself.
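That said, if you're curious what those platforms are doing under the hood, here is a toy sketch of the three checks named above, with thresholds chosen purely for illustration:

```python
from collections import Counter, defaultdict
from statistics import mean, stdev

# Toy anomaly checks over a request log. Thresholds are illustrative;
# platforms like LangSmith or Datadog implement this at scale.

MAX_QUERY_CHARS = 2_000  # flag abnormally long queries
SPIKE_STDDEVS = 3        # flag hourly volume above mean + 3 std devs
REPEAT_LIMIT = 20        # flag the same prompt sent many times

def find_anomalies(requests):
    """requests: iterable of dicts with 'user', 'hour', 'prompt' keys."""
    anomalies = []
    hourly = defaultdict(Counter)   # user -> hour -> request count
    prompts = defaultdict(Counter)  # user -> prompt -> repeat count

    for r in requests:
        hourly[r["user"]][r["hour"]] += 1
        prompts[r["user"]][r["prompt"]] += 1
        if len(r["prompt"]) > MAX_QUERY_CHARS:
            anomalies.append((r["user"], "abnormally long query"))

    for user, counts in hourly.items():
        volumes = list(counts.values())
        if len(volumes) > 1 and max(volumes) > mean(volumes) + SPIKE_STDDEVS * stdev(volumes):
            anomalies.append((user, "sudden volume spike"))

    for user, counts in prompts.items():
        if counts.most_common(1)[0][1] > REPEAT_LIMIT:
            anomalies.append((user, "repetitive identical requests"))

    return anomalies
```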
3. Layer in preventative controls
Set an overall rate limit that makes sense for your business. If you're running a customer service chatbot, maybe 10 queries per minute per user is reasonable. If it's an internal research tool, the limits can be higher.
Configure automated IP bans for excessive requests: hundreds of queries from a single source in a short period is usually either a bot or someone testing your system's limits. Neither should consume your resources unchecked.
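As a sketch of how both controls might fit together in application code: a per-user sliding-window rate limit plus an automated ban for IPs that blow past a burst threshold. The numbers are illustrative, and in production this logic usually belongs in your API gateway rather than being hand-rolled:

```python
import time
from collections import defaultdict, deque

RATE_LIMIT = 10        # requests per user per window (illustrative)
BAN_THRESHOLD = 300    # requests from one IP per window -> ban

class ConsumptionLimiter:
    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.user_hits = defaultdict(deque)  # user -> request timestamps
        self.ip_hits = defaultdict(deque)    # ip -> request timestamps
        self.banned_ips = set()

    def _prune(self, hits, now):
        # Drop timestamps that have aged out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()

    def allow(self, user_id, ip):
        now = time.monotonic()
        if ip in self.banned_ips:
            return False
        self._prune(self.ip_hits[ip], now)
        self.ip_hits[ip].append(now)
        if len(self.ip_hits[ip]) > BAN_THRESHOLD:
            self.banned_ips.add(ip)  # bot or limit-tester: shut it down
            return False
        self._prune(self.user_hits[user_id], now)
        if len(self.user_hits[user_id]) >= RATE_LIMIT:
            return False  # over the per-user limit; reject this request
        self.user_hits[user_id].append(now)
        return True
```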
Implement strict input validation that rejects queries over reasonable length limits. A 10,000-character customer service question is either an accident or an attack. Filter out obviously malicious prompts…easier said than done, but an interesting task for your engineers.
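Here's a minimal sketch of that validation layer. The length cap and deny-list patterns are placeholders; real prompt filtering takes far more than keyword matching:

```python
import re

MAX_INPUT_CHARS = 2_000  # a 10,000-character support question is suspect

# Illustrative deny-list only; real filtering is much harder than this.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"repeat .{0,30} forever", re.IGNORECASE),
]

def validate_input(text: str) -> tuple[bool, str]:
    if len(text) > MAX_INPUT_CHARS:
        return False, "input exceeds length limit"
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(text):
            return False, "input matched a suspicious pattern"
    return True, "ok"
```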
Summary
Unbounded consumption occurs when AI systems lack proper resource controls, allowing users to generate excessive computational costs through legitimate use, poor system design, or malicious exploitation.
Executive priorities:
Implement spending caps, monitoring, and automated alerting across all AI platforms
Use comprehensive monitoring systems to identify unusual consumption patterns and potential abuse
Establish rate limiting, input validation, and other preventative controls
Appendix 1: Developer Guidelines for Technical Implementation
This section doesn't have enough detail to be actionable, but I include it so that executives can be familiar with the standard guidance for these issues.
Input Validation: Implement strict input validation ensuring inputs don't exceed reasonable size limits, and filter out malicious content patterns
Rate Limiting: Apply rate limiting and user quotas that restrict requests per time period from single sources, with tiered access controls
Resource Allocation Management: Monitor and manage resource allocation dynamically, preventing single users from consuming excessive computational resources
Timeouts and Throttling: Set processing timeouts and throttle resource-intensive operations to prevent prolonged resource consumption (see the sketch after this list)
Comprehensive Logging and Monitoring: Continuously monitor resource usage, implementing anomaly detection for unusual consumption patterns
Graceful Degradation: Design systems to degrade gracefully under heavy load, maintaining partial functionality rather than failing completely
Access Controls: Implement role-based access control (RBAC) and the principle of least privilege to prevent unauthorized access to AI systems
Sandbox Techniques: Restrict AI application access to network resources, internal services, and APIs to prevent side-channel attacks
Automated Scaling: Implement dynamic scaling and load balancing with queue limits to handle varying demand while maintaining performance
Watermarking: Deploy watermarking frameworks to detect unauthorized use of AI outputs and potential model extraction attempts
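To illustrate just one of these in code, here is a minimal Python sketch of the timeouts guideline. call_llm is a hypothetical stand-in for whatever function invokes your model:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

PROCESSING_TIMEOUT_SECONDS = 30  # illustrative budget per request

def answer_with_timeout(call_llm, prompt):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call_llm, prompt)
    try:
        return future.result(timeout=PROCESSING_TIMEOUT_SECONDS)
    except TimeoutError:
        # Fail fast instead of letting a runaway generation keep
        # consuming resources indefinitely.
        return "Sorry, that request took too long. Please simplify it."
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
```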
Appendix 2: Glossary
Unbounded Consumption: Uncontrolled use of AI computational resources leading to excessive costs, service degradation, or security vulnerabilities through legitimate overuse, poor system design, or malicious exploitation
Rate Limiting: Technical controls restricting the number of API requests or resource consumption allowed per user or time period to prevent system overload
Anomaly Detection: Automated monitoring systems that identify unusual patterns in AI usage, costs, or performance indicating potential abuse or system issues
Token Consumption: Measurement unit for AI processing costs based on input and output text length, directly correlating to computational expense and resource usage
Model Extraction: Attack method using excessive API queries to reverse-engineer proprietary AI models, potentially stealing intellectual property and competitive advantages
Denial of Wallet (DoW): Attack strategy deliberately generating high-cost AI operations to inflict financial damage on target organizations through resource exhaustion
Circuit Breaker: Automated safety mechanism that temporarily disables AI services when consumption exceeds safe operational parameters, preventing cascading failures
Graceful Degradation: System design approach maintaining partial functionality during resource constraints rather than complete service failure, preserving business continuity
Input Validation: Security controls filtering and restricting user inputs to AI systems, preventing malicious content and resource-intensive queries from consuming excessive resources
Resource Allocation Management: Dynamic monitoring and control systems ensuring fair distribution of computational resources across users and applications while preventing individual abuse
