A successful prompt injection against your production AI system is a different kind of incident than a SQL injection or an authentication bypass. The attacker may not have accessed a database — they may have exfiltrated data through the model's output stream. The forensic evidence is in your prompt logs, not your database query logs. The remediation may require prompt changes as much as code changes.
The First 30 Minutes
Determine blast radius
The immediate question: what did the attacker do with access to the model? Pull the session logs for the affected time window. Look for outputs that contain unusual content: base64 strings, JSON payloads not in the expected schema, role acknowledgements (the model confirming it has adopted a new persona or set of instructions), or responses that reference system prompt contents.
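A first pass over the logs can be automated. The sketch below is illustrative only: the record fields (`session_id`, `output`), the regex heuristics, and the 30-character leak window are assumptions, not a vetted detection ruleset, and will need tuning against your own traffic.

```python
import re

# Hypothetical heuristics for triage; expect false positives and negatives.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
ROLE_ACK = re.compile(
    r"\b(i am now|as instructed, i will|ignoring (all )?previous instructions|developer mode)\b",
    re.IGNORECASE,
)


def flag_output(output, system_prompt, min_leak_len=30):
    """Return the list of reasons a single model output looks suspicious."""
    reasons = []
    if BASE64_RUN.search(output):
        reasons.append("base64_like_string")
    if ROLE_ACK.search(output):
        reasons.append("role_acknowledgement")
    # Leak check: does any min_leak_len-character slice of the system
    # prompt appear verbatim in the output?
    for start in range(len(system_prompt) - min_leak_len + 1):
        if system_prompt[start:start + min_leak_len] in output:
            reasons.append("system_prompt_leak")
            break
    return reasons


def triage(log_records, system_prompt):
    """Return (session_id, reasons) for every flagged record, in log order."""
    flagged = []
    for record in log_records:
        reasons = flag_output(record["output"], system_prompt)
        if reasons:
            flagged.append((record["session_id"], reasons))
    return flagged
```

Treat the result as a shortlist for human review, not a verdict: a clean scan does not prove that nothing was exfiltrated.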
Contain via circuit breaker
If the incident is active and the attack vector is still open, trip the circuit breaker to halt AI API processing. This stops the bleeding immediately without requiring a code deploy. The circuit breaker should be accessible to your on-call engineer in under 2 minutes.
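As a rough illustration of the idea, a manual breaker can be a flag that every model call checks first. The class and function names here (`AICircuitBreaker`, `guarded_call`, `call_model`) are hypothetical. This version keeps the flag in process memory for simplicity; a real deployment would more likely store it somewhere every instance can read, such as a feature-flag service or shared cache, so that one action by the on-call engineer stops all traffic.

```python
import threading
import time


class CircuitOpenError(RuntimeError):
    """Raised when a model call is attempted while the breaker is tripped."""


class AICircuitBreaker:
    """Manual kill switch consulted before every model call (in-process sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tripped_at = None
        self._reason = None

    def trip(self, reason):
        with self._lock:
            self._tripped_at = time.time()
            self._reason = reason

    def reset(self):
        with self._lock:
            self._tripped_at = None
            self._reason = None

    def is_open(self):
        with self._lock:
            return self._tripped_at is not None

    def reason(self):
        with self._lock:
            return self._reason


def guarded_call(breaker, call_model, prompt):
    """Refuse to reach the model while the breaker is open."""
    if breaker.is_open():
        raise CircuitOpenError(f"AI processing halted: {breaker.reason()}")
    return call_model(prompt)
```

Because the check happens at call time, tripping the breaker takes effect on the next request without a deploy.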
The First 2 Hours
Reconstruct the injection
Identify the exact input that triggered the incident. This requires your prompt logs — which is why full prompt logging is non-negotiable. Reproduce the attack in a sandbox environment (not production) to understand exactly what was possible.
Assess data exposure
What was in the model's context window during the affected session? If your RAG system retrieved customer records into context, those records were potentially accessible to the attacker via prompt injection. Cross-reference the retrieval logs with the attack timeline.
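The cross-reference itself is a filter and a group-by. The sketch below assumes a retrieval log whose entries carry `session_id`, `timestamp`, and `record_ids` fields; those names are hypothetical and should be mapped to whatever your retrieval layer actually records.

```python
from datetime import datetime


def exposed_records(retrieval_log, attack_sessions, window_start, window_end):
    """Map each attacker session to the record IDs retrieved into its context.

    Only entries from the given sessions that fall inside the attack window
    (inclusive on both ends) are counted.
    """
    exposed = {}
    for entry in retrieval_log:
        if entry["session_id"] not in attack_sessions:
            continue
        if not (window_start <= entry["timestamp"] <= window_end):
            continue
        exposed.setdefault(entry["session_id"], set()).update(entry["record_ids"])
    return exposed
```

The result is an upper bound on exposure: it lists what the model could see during the attack, not what the attacker actually extracted. Compare it against the flagged outputs to narrow the scope.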
Remediation
- Patch the injection vector: add detection patterns for the specific attack technique used
- Harden the system prompt: add explicit constraints against the attack pattern
- Scope reduction: remove retrieval capabilities or tool access not required for the use case
- Add output validation: detect the category of output the attack produced and block it automatically
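Output validation can be as simple as a gate between the model and the caller. The sketch below makes several assumptions that are not part of any specific product: the model is expected to return JSON with exactly the keys `answer` and `sources`, and a canary token (`CANARY`, a made-up string planted in the system prompt) is used to detect prompt leakage. Anything that fails either check is replaced with a fixed fallback.

```python
import json

# Hypothetical response schema and canary token; adapt both to your system.
EXPECTED_KEYS = {"answer", "sources"}
CANARY = "canary-7f3a9c"
FALLBACK = {"answer": "Sorry, I can't help with that request.", "sources": []}


def validate_output(raw):
    """Return (ok, payload). On any failure, payload is a copy of FALLBACK."""
    # A canary in the output means system prompt text is leaking.
    if CANARY in raw:
        return False, dict(FALLBACK)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False, dict(FALLBACK)
    # Reject extra or missing keys: unexpected fields can smuggle data out.
    if not isinstance(parsed, dict) or set(parsed) != EXPECTED_KEYS:
        return False, dict(FALLBACK)
    if not isinstance(parsed["answer"], str) or not isinstance(parsed["sources"], list):
        return False, dict(FALLBACK)
    return True, parsed
```

Log every rejection with the session ID so that blocked attempts feed back into the detection patterns.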
If the attack resulted in PII exposure, you likely have breach notification obligations. Start the clock from the moment you became aware of the breach: under GDPR Article 33, the 72-hour window for notifying the supervisory authority does not pause while you investigate.
