AI Incident Response: What to Do When Your LLM Gets Exploited

A successful prompt injection against your production AI system is a different kind of incident than a SQL injection or an authentication bypass. The attacker may not have accessed a database — they may have exfiltrated data through the model's output stream. The forensic evidence is in your prompt logs, not your database query logs. The remediation may require prompt changes as much as code changes.

The First 30 Minutes

Determine blast radius

The immediate question: what did the attacker do with access to the model? Pull the session logs for the affected time window. Look for outputs that contain unusual content: base64 strings, JSON payloads not in the expected schema, role acknowledgements, or responses that reference system prompt contents.

Contain via circuit breaker

If the incident is active and the attack vector is still open, trip the circuit breaker to halt AI API processing. This stops the bleeding immediately without requiring a code deploy. The circuit breaker should be accessible to your on-call engineer in under 2 minutes.

The First 2 Hours

Reconstruct the injection

Identify the exact input that triggered the incident. This requires your prompt logs — which is why full prompt logging is non-negotiable. Reproduce the attack in a sandbox environment (not production) to understand exactly what was possible.

Assess data exposure

What was in the model's context window during the affected session? If your RAG system retrieved customer records into context, those records were potentially accessible to the attacker via prompt injection. Cross-reference the retrieval logs with the attack timeline.

Remediation

▸Patch the injection vector: add detection patterns for the specific attack technique used
▸Harden the system prompt: add explicit constraints against the attack pattern
▸Scope reduction: remove retrieval capabilities or tool access not required for the use case
▸Add output validation: detect the category of output the attack produced and block it automatically

If the attack resulted in PII exposure, you have breach notification obligations. Start the clock from the moment you confirmed the breach — GDPR's 72-hour window does not pause while you investigate.

ShareX / Twitter LinkedIn

AI Incident Response: What to Do When Your LLM Gets Exploited

The First 30 Minutes

Determine blast radius

Contain via circuit breaker

The First 2 Hours

Reconstruct the injection

Assess data exposure

Remediation

Related Articles

G8KEPR Red Team Run 4: What We Found and What We Fixed

MCP Security in 2026: How to Sandbox AI Tool Calls

What Is Model Context Protocol (MCP) and Why Does It Need Security?

Ready to secure your AI stack?