AI Cost Anomaly Detection: Catching Runaway Inference Before Your Bill Does

In September 2023, a startup received a $72,000 AWS bill after a bug caused their AI pipeline to run in an infinite loop for 18 hours. Their billing alert, set at $1,000 per day, fired once, at the end of the first day, when the loop had already been running for most of that day. By the time a human saw the alert, the damage was done.

This is not an edge case. AI API costs spike fast. A prompt injection attack that gets a model to generate maximum-length responses can exhaust a $500/month budget in under an hour. A runaway agent that retries on every error can make thousands of API calls before anyone notices. The underlying problem is that traditional cost monitoring operates on billing cycles, not inference cycles.
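To see how a budget can disappear that quickly, here is a rough back-of-the-envelope calculation. It reuses the $0.015 per 1,000 output tokens price from the tracking example in this article; the 4,096-token response cap and the function name are assumptions made purely for illustration, and input tokens are ignored.

```python
# Back-of-the-envelope burn rate under assumed values (not measurements).
OUTPUT_PRICE_PER_1K = 0.015   # USD per 1,000 output tokens, as in the tracking example
MAX_OUTPUT_TOKENS = 4096      # hypothetical per-response cap


def requests_to_exhaust(budget_usd: float) -> float:
    """Number of maximum-length responses that spend the whole budget."""
    cost_per_response = MAX_OUTPUT_TOKENS / 1000 * OUTPUT_PRICE_PER_1K
    return budget_usd / cost_per_response


needed = requests_to_exhaust(500.0)
per_second = needed / 3600  # sustained rate that burns the budget in one hour
print(f"{needed:.0f} responses, about {per_second:.1f} per second for one hour")
```

Under these assumptions, a little over two maximum-length responses per second is enough to spend $500 in an hour, a rate a scripted attack or a looping agent can easily sustain.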

Real-Time Cost Tracking

Track cumulative cost at inference time, not at billing time. Every API response reports its token usage, and the price per token is known in advance. Sum the resulting cost at call time, per session, per API key, and per organization. When the running sum for the current window exceeds a threshold, take action immediately.

python

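import time

import redis.asyncio as aioredis

# USD per 1,000 tokens
COST_PER_1K_TOKENS = {
    "claude-3-5-sonnet-20241022": {"input": 0.003, "output": 0.015},
}

redis = aioredis.Redis()  # assumes Redis on localhost:6379

class CostLimitExceeded(Exception):
    """Raised when a key's hourly spend passes its limit."""

async def track_and_gate(usage, model: str, key: str, limits: dict):
    cost = (
        usage.input_tokens / 1000 * COST_PER_1K_TOKENS[model]["input"]
        + usage.output_tokens / 1000 * COST_PER_1K_TOKENS[model]["output"]
    )

    # Bucket the counter by clock hour so it resets; let old buckets expire.
    bucket = f"cost:{key}:hour:{int(time.time() // 3600)}"
    running_total = await redis.incrbyfloat(bucket, cost)
    await redis.expire(bucket, 7200)

    if running_total > limits["hourly_usd"]:
        # Open the circuit for an hour; callers check this key before calling the model.
        await redis.set(f"circuit:{key}", "open", ex=3600)
        raise CostLimitExceeded(f"Hourly limit ${limits['hourly_usd']} reached")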
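The function above opens the circuit, but something still has to consult it before the next model call is made. A minimal sketch of that pre-call check follows. It uses an in-memory dict in place of Redis so that it runs on its own; `InMemoryBreaker`, `CircuitOpen`, and `guarded_call` are hypothetical names for illustration, not part of any library.

```python
import time


class InMemoryBreaker:
    """Stand-in for the `circuit:{key}` entries; hypothetical helper."""

    def __init__(self, clock=time.monotonic):
        self._open_until = {}   # key -> time at which the circuit closes again
        self._clock = clock

    def open_for(self, key: str, seconds: float) -> None:
        # Similar in spirit to setting circuit:{key} with an expiry.
        self._open_until[key] = self._clock() + seconds

    def is_open(self, key: str) -> bool:
        deadline = self._open_until.get(key)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            del self._open_until[key]   # expired, so the circuit closes
            return False
        return True


class CircuitOpen(Exception):
    """Raised instead of calling the model while the circuit is open."""


def guarded_call(breaker: InMemoryBreaker, key: str, call_model):
    # Check the breaker first so a tripped key spends nothing further.
    if breaker.is_open(key):
        raise CircuitOpen(f"Circuit open for {key}")
    return call_model()
```

Placing the check ahead of the model call matters: a gate that only runs after the response arrives has already paid for the request it rejects.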

Tiered Alerting

▸ 80% of hourly budget: alert to Slack ("Heads up, cost rate is elevated")
▸ 100% of hourly budget: trip circuit breaker, alert to PagerDuty
▸ 200% of daily budget: escalate to on-call, require manual acknowledgement to resume
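The tiers above can be sketched as a single decision function. This is only an illustration of the thresholds listed: the `Action` enum and `choose_action` are hypothetical names, and actually sending the Slack or PagerDuty notification is left to the caller.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    SLACK_WARNING = "slack_warning"          # 80% of hourly budget
    TRIP_AND_PAGE = "trip_and_page"          # 100% of hourly budget
    ESCALATE_AND_HOLD = "escalate_and_hold"  # 200% of daily budget


def choose_action(
    hourly_spend: float,
    hourly_budget: float,
    daily_spend: float,
    daily_budget: float,
) -> Action:
    """Return the most severe tier the current spend has reached."""
    # Check the most severe tier first so it wins when several apply.
    if daily_spend >= 2.0 * daily_budget:
        return Action.ESCALATE_AND_HOLD
    if hourly_spend >= hourly_budget:
        return Action.TRIP_AND_PAGE
    if hourly_spend >= 0.8 * hourly_budget:
        return Action.SLACK_WARNING
    return Action.NONE
```

Returning one action rather than a list keeps the caller simple: each tier's response is assumed to include everything the lower tiers would have done.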

G8KEPR's AI Gateway tracks token costs in real time and supports configurable budget limits at the API key, organization, and platform level. Circuit breakers can be configured to trip automatically at any threshold with customisable fallback behaviour.
