G8KEPR Blog
Security · 8 min read · February 5, 2026

PII Redaction in AI Applications: Field-Level vs Request-Level Approaches

Most teams think about PII redaction as "strip names and emails before sending to the LLM." The real problem is that PII travels in context — in conversation history, in retrieved documents, in tool call responses. Here is how to do it right.

PII redaction in AI applications is harder than it looks. The naive approach — run a regex over the request and replace known PII patterns before forwarding to the LLM — catches obvious cases (emails, phone numbers, SSNs) and misses the important ones (context-dependent PII, indirect identifiers, PII that arrives via tool call responses).
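To make the failure mode concrete, here is a minimal sketch of that naive request-level approach. The patterns and function name are illustrative, not from any particular library; real deployments would use far more robust detection:

```python
import re

# Patterns for the obvious cases only. This is exactly the naive
# request-level approach: it sees one string, and only PII that
# happens to be pattern-matchable.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_request(text: str) -> str:
    """Replace known PII patterns with placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

# Catches the obvious:
print(redact_request("Contact jane@example.com or 555-867-5309"))
# Passes indirect identifiers straight through, untouched:
print(redact_request("The only left-handed surgeon at Mercy General"))
```

Note what this cannot do: it never sees retrieved documents or tool responses, and it has no way to flag an identifier like "the only left-handed surgeon at Mercy General", which contains no matchable pattern but identifies one person.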

The Problem With Request-Level Redaction

If you redact PII from the user's initial message but your RAG pipeline retrieves documents containing PII and injects them into the context, the redaction achieved nothing. If your MCP tools return customer records containing PII and you add them to the conversation history, the LLM now has access to PII regardless of what you did to the initial prompt.

Field-Level Redaction

The more robust approach is field-level redaction applied at the data boundary — wherever data enters the AI pipeline, regardless of where it came from. This means: redacting PII from RAG retrieval results before injecting them into context, redacting PII from tool call responses before returning them to the model, and redacting PII from user messages.
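One way to sketch this is a single context object that applies the same redaction at every ingress point. The class and method names below are illustrative assumptions, and `redact` is a stand-in for whatever detector you actually use (regex, an NER model, or a managed PII service):

```python
import re
from dataclasses import dataclass, field

def redact(text: str) -> str:
    """Stand-in detector: swap in your real PII detection here."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL_REDACTED]", text)

@dataclass
class Context:
    """Every path into the context window goes through the same boundary."""
    messages: list[dict] = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.messages.append({"role": "user", "content": redact(text)})

    def add_retrieved_document(self, doc: str) -> None:
        # RAG results are redacted before injection, not trusted as "internal".
        self.messages.append({"role": "system", "content": redact(doc)})

    def add_tool_response(self, name: str, payload: str) -> None:
        # Tool/MCP outputs are redacted before the model ever sees them.
        self.messages.append(
            {"role": "tool", "name": name, "content": redact(payload)}
        )
```

The design point is that redaction lives in one place, at the boundary, rather than being re-implemented (or forgotten) by each caller that appends to the conversation.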

Tokenisation vs Masking

Masking replaces PII with a placeholder (e.g., [EMAIL_REDACTED]). Tokenisation replaces PII with a consistent token (e.g., [EMAIL_7f3a]) that maps back to the original value in a lookup table. Tokenisation lets the model reason about the entity ("email addresses were provided by 47 different users") without exposing the actual values. It also allows re-identification after the fact for legitimate use cases.
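A tokeniser along these lines can be sketched in a few lines. The token format follows the `[EMAIL_7f3a]` example above; the class itself and its vault structure are illustrative assumptions, not a specific library's API:

```python
import hashlib
import re

class Tokeniser:
    """Replace each PII value with a consistent token (e.g. [EMAIL_7f3a])
    and keep a lookup table for authorised re-identification."""

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def __init__(self) -> None:
        self.vault: dict[str, str] = {}  # token -> original value

    def _token(self, label: str, value: str) -> str:
        # Hash-derived suffix: the same value always yields the same token.
        digest = hashlib.sha256(value.encode()).hexdigest()[:4]
        token = f"[{label}_{digest}]"
        self.vault[token] = value
        return token

    def tokenise(self, text: str) -> str:
        return self.EMAIL.sub(lambda m: self._token("EMAIL", m.group()), text)

    def detokenise(self, text: str) -> str:
        """Re-identification for authorised, audited use only."""
        for token, value in self.vault.items():
            text = text.replace(token, value)
        return text
```

Because the same address always maps to the same token, the model can still count distinct senders or link mentions of one entity; `detokenise` restores the originals when a downstream system is authorised to see them. (A production vault would also need access control and, with a 4-hex suffix, collision handling.)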

What the Model Needs to Know

Counter-intuitively, the model usually does not need the actual PII to do its job. A customer service model handling a complaint about an order does not need to know the customer's actual name — it needs to know "the customer" and have a reference it can use. A medical triage model does not need the patient's SSN — it needs the clinical data.

Audit your AI application's prompt templates through a data minimisation lens: for each piece of data you inject into context, ask whether the model needs it to complete the task. Most applications include significantly more PII in context than is necessary.
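That audit can even be partially automated: enumerate the fields a prompt template injects and diff them against an allow-list of fields the task genuinely needs. The template and field names below are hypothetical examples for illustration:

```python
import string

# Hypothetical customer-service template. Does the model need all of this?
TEMPLATE = (
    "Customer {customer_name} (SSN {ssn}, email {email}) "
    "reports: {complaint}. Order id: {order_id}."
)

# Fields the task actually requires: the complaint and the order reference.
REQUIRED = {"complaint", "order_id"}

def audit(template: str, required: set[str]) -> list[str]:
    """Return template fields not on the allow-list, i.e. minimisation candidates."""
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return sorted(fields - required)

print(audit(TEMPLATE, REQUIRED))
# Flags customer_name, ssn, and email as candidates for tokenisation or removal.
```

This will not tell you whether a field is PII, but it forces the question field by field instead of letting templates accrete data by default.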

