In September 2023, a startup received a $72,000 AWS bill after a bug caused their AI pipeline to run in an infinite loop for 18 hours. Their billing alert was set at $1,000 per day — it triggered once, at the end of day one, by which point the loop had been running for 24 hours. By the time a human saw the alert, the damage was done.
This is not an edge case. AI API costs spike fast. A prompt injection attack that succeeds in triggering a model to generate maximum-length responses can exhaust a $500/month budget in under an hour. A runaway agent that retries on every error can generate thousands of API calls before anyone notices. The problem is that traditional cost monitoring operates on billing cycles, not inference cycles.
Real-Time Cost Tracking
Track cumulative cost at inference time, not at billing time. Every API call has a known token count and a known price per token. Sum these at call time, per session, per API key, per organization. When the running sum exceeds a threshold, take action immediately.
COST_PER_1K_TOKENS = {
"claude-3-5-sonnet-20241022": {"input": 0.003, "output": 0.015},
}
async def track_and_gate(usage, model: str, key: str, limits: dict):
cost = (
usage.input_tokens / 1000 * COST_PER_1K_TOKENS[model]["input"]
+ usage.output_tokens / 1000 * COST_PER_1K_TOKENS[model]["output"]
)
running_total = await redis.incrbyfloat(f"cost:{key}:hour", cost)
if running_total > limits["hourly_usd"]:
await redis.set(f"circuit:{key}", "open", ex=3600)
raise CostLimitExceeded(f"Hourly limit ${limits['hourly_usd']} reached")Tiered Alerting
- ▸80% of hourly budget: alert to Slack — "Heads up, cost rate is elevated"
- ▸100% of hourly budget: trip circuit breaker, alert to PagerDuty
- ▸200% of daily budget: escalate to on-call, require manual acknowledgement to resume
G8KEPR's AI Gateway tracks token costs in real time and supports configurable budget limits at the API key, organization, and platform level. Circuit breakers can be configured to trip automatically at any threshold with customisable fallback behaviour.
