Security · 8 min read · January 5, 2026

AI Incident Response: What to Do When Your LLM Gets Exploited

AI incidents are different from traditional security incidents. The blast radius is semantic, the forensics require prompt logs, and the remediation involves prompt engineering as much as code fixes. Here is a runbook for the first 4 hours of an AI security incident.

A successful prompt injection against your production AI system is a different kind of incident than a SQL injection or an authentication bypass. The attacker may not have accessed a database — they may have exfiltrated data through the model's output stream. The forensic evidence is in your prompt logs, not your database query logs. The remediation may require prompt changes as much as code changes.

The First 30 Minutes

Determine blast radius

The immediate question: what did the attacker do with access to the model? Pull the session logs for the affected time window. Look for outputs that contain unusual content: base64 strings, JSON payloads not in the expected schema, role acknowledgements, or responses that reference system prompt contents.
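As a concrete starting point, here is a minimal triage sketch in Python. It assumes JSONL prompt logs with `session_id`, `ts`, and `output` fields — adjust the field names and patterns to your own logging schema.

```python
import json
import re

# Indicators that model output may have been used as an exfiltration
# channel. These patterns are illustrative starting points, not a complete set.
SUSPICIOUS_PATTERNS = {
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"),  # long encoded runs
    "role_ack": re.compile(r"(?i)ignore (all )?previous instructions|developer mode"),
    "prompt_leak": re.compile(r"(?i)system prompt|my instructions (are|say)"),
}

def triage(log_path: str, start_ts: str, end_ts: str):
    """Yield (session_id, indicator, excerpt) for flagged outputs in the window."""
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            if not (start_ts <= rec["ts"] <= end_ts):  # ISO 8601 sorts lexically
                continue
            for name, pattern in SUSPICIOUS_PATTERNS.items():
                match = pattern.search(rec["output"])
                if match:
                    yield rec["session_id"], name, match.group()[:120]

for session, indicator, excerpt in triage(
    "prompt_logs.jsonl", "2026-01-05T09:00:00Z", "2026-01-05T13:00:00Z"
):
    print(f"{session}\t{indicator}\t{excerpt}")
```

The flagged sessions become your candidate set for the deeper forensics in the next two hours.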

Contain via circuit breaker

If the incident is active and the attack vector is still open, trip the circuit breaker to halt AI API processing. This stops the bleeding immediately without requiring a code deploy. The circuit breaker should be accessible to your on-call engineer in under 2 minutes.
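One minimal way to build this is a shared flag checked on every request path. The sketch below assumes a Redis instance reachable by both the application and the on-call engineer; the key name and fallback response are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
BREAKER_KEY = "ai:circuit_breaker"  # "open" means halt all AI processing

def call_model(prompt: str) -> str:
    ...  # placeholder for your existing inference call

def handle_request(prompt: str) -> str:
    if r.get(BREAKER_KEY) == "open":
        # Fail closed: serve a static response instead of calling the model.
        return "AI features are temporarily unavailable."
    return call_model(prompt)

def trip_breaker() -> None:
    """On-call action: one command, no deploy, effective on the next request."""
    r.set(BREAKER_KEY, "open")
```

Because the flag is read per request, tripping the breaker takes effect immediately, without restarting or redeploying anything.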

The First 2 Hours

Reconstruct the injection

Identify the exact input that triggered the incident. This requires your prompt logs — which is why full prompt logging is non-negotiable. Reproduce the attack in a sandbox environment (not production) to understand exactly what was possible.
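A replay harness makes the reproduction repeatable. This sketch assumes an OpenAI-compatible sandbox endpoint and the same JSONL log format as above, with a `system_prompt` and `input` field per turn — all names are placeholders for your own setup.

```python
import json
from openai import OpenAI

# Point the client at a sandboxed deployment, never production.
sandbox = OpenAI(base_url="https://llm-sandbox.internal/v1", api_key="sandbox-key")

def replay(log_path: str, session_id: str, model: str = "gpt-4o-mini") -> None:
    """Re-run every user turn from the affected session, in order."""
    turns = []
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["session_id"] == session_id:
                turns.append(rec)
    turns.sort(key=lambda t: t["ts"])

    messages = [{"role": "system", "content": turns[0]["system_prompt"]}]
    for turn in turns:
        messages.append({"role": "user", "content": turn["input"]})
        resp = sandbox.chat.completions.create(model=model, messages=messages)
        reply = resp.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        print(f"--- {turn['ts']} ---\n{reply}\n")

replay("prompt_logs.jsonl", "sess_8f2a")
```

Replay the full conversation, not just the final message: many injections only succeed given the conversational state built up across earlier turns.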

Assess data exposure

What was in the model's context window during the affected session? If your RAG system retrieved customer records into context, those records were potentially accessible to the attacker via prompt injection. Cross-reference the retrieval logs with the attack timeline.
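Here is a sketch of that cross-reference, assuming JSONL retrieval logs where each entry lists the chunks pulled into a session's context (field names are illustrative):

```python
import json

def exposed_documents(retrieval_log: str, attack_sessions: set[str]) -> set[str]:
    """Return every source document retrieved into an attacked session's context."""
    exposed: set[str] = set()
    with open(retrieval_log) as f:
        for line in f:
            rec = json.loads(line)
            if rec["session_id"] in attack_sessions:
                # Each retrieved chunk's source document is potentially exposed.
                exposed.update(chunk["doc_id"] for chunk in rec["retrieved"])
    return exposed

attack_sessions = {"sess_8f2a", "sess_91bc"}  # from the triage step
for doc_id in sorted(exposed_documents("retrieval_logs.jsonl", attack_sessions)):
    print(doc_id)
```

Treat the resulting list as your worst-case exposure set: assume everything on it was readable by the attacker unless the output logs prove otherwise.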

Remediation

  • Patch the injection vector: add detection patterns for the specific attack technique used
  • Harden the system prompt: add explicit constraints against the attack pattern
  • Scope reduction: remove retrieval capabilities or tool access not required for the use case
  • Add output validation: detect the category of output the attack produced and block it automatically (see the sketch below)
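As one example of that last item, here is a minimal output validator that fails closed on known-bad response shapes. The patterns are illustrative — derive them from the actual attack output.

```python
import re

# Categories of output the attack produced; extend per incident.
OUTPUT_BLOCKLIST = [
    re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"),                     # large base64 blobs
    re.compile(r"(?i)system prompt|my instructions (are|say)"),  # prompt leakage
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                        # US-SSN-shaped PII
]

def validate_output(text: str) -> str:
    """Fail closed: block any response matching a known-bad pattern."""
    for pattern in OUTPUT_BLOCKLIST:
        if pattern.search(text):
            raise ValueError(f"blocked output matching {pattern.pattern!r}")
    return text
```

Output validation is the backstop: even if a new injection variant slips past input filtering, the exfiltration channel itself stays closed.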

If the attack resulted in PII exposure, you have breach notification obligations. Under GDPR, the 72-hour notification window starts the moment you become aware of the breach — it does not pause while you investigate.

