G8KEPR Blog
Security · 8 min read · February 5, 2026

PII Redaction in AI Applications: Field-Level vs Request-Level Approaches

Most teams think about PII redaction as "strip names and emails before sending to the LLM." The real problem is that PII travels in context — in conversation history, in retrieved documents, in tool call responses. Here is how to do it right.

PII redaction in AI applications is harder than it looks. The naive approach — run a regex over the request and replace known PII patterns before forwarding to the LLM — catches obvious cases (emails, phone numbers, SSNs) and misses the important ones (context-dependent PII, indirect identifiers, PII that arrives via tool call responses).
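To make the failure mode concrete, here is a minimal sketch of that naive request-level approach. The patterns and function name are illustrative, not from any particular library; real deployments would use far more robust detection:

```python
import re

# Patterns for the obvious cases only. This is exactly the naive
# request-level approach: it sees one string, and only PII that
# happens to be pattern-matchable.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_request(text: str) -> str:
    """Replace known PII patterns with placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

# Catches the obvious:
print(redact_request("Contact jane@example.com or 555-867-5309"))
# Passes indirect identifiers straight through, untouched:
print(redact_request("The only left-handed surgeon at Mercy General"))
```

Note what this cannot do: it never sees retrieved documents or tool responses, and it has no way to flag an identifier like "the only left-handed surgeon at Mercy General", which contains no matchable pattern but identifies one person.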

The Problem With Request-Level Redaction

If you redact PII from the user's initial message but your RAG pipeline retrieves documents containing PII and injects them into the context, the redaction achieved nothing. If your MCP tools return customer records containing PII and you add them to the conversation history, the LLM now has access to PII regardless of what you did to the initial prompt.

Field-Level Redaction

The more robust approach is field-level redaction applied at the data boundary — wherever data enters the AI pipeline, regardless of where it came from. This means: redacting PII from RAG retrieval results before injecting them into context, redacting PII from tool call responses before returning them to the model, and redacting PII from user messages.
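One way to sketch this is a single context object that applies the same redaction at every ingress point. The class and method names below are illustrative assumptions, and `redact` is a stand-in for whatever detector you actually use (regex, an NER model, or a managed PII service):

```python
import re
from dataclasses import dataclass, field

def redact(text: str) -> str:
    """Stand-in detector: swap in your real PII detection here."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL_REDACTED]", text)

@dataclass
class Context:
    """Every path into the context window goes through the same boundary."""
    messages: list[dict] = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.messages.append({"role": "user", "content": redact(text)})

    def add_retrieved_document(self, doc: str) -> None:
        # RAG results are redacted before injection, not trusted as "internal".
        self.messages.append({"role": "system", "content": redact(doc)})

    def add_tool_response(self, name: str, payload: str) -> None:
        # Tool/MCP outputs are redacted before the model ever sees them.
        self.messages.append(
            {"role": "tool", "name": name, "content": redact(payload)}
        )
```

The design point is that redaction lives in one place, at the boundary, rather than being re-implemented (or forgotten) by each caller that appends to the conversation.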

Tokenisation vs Masking

Masking replaces PII with a placeholder (e.g., [EMAIL_REDACTED]). Tokenisation replaces PII with a consistent token (e.g., [EMAIL_7f3a]) that maps back to the original value in a lookup table. Tokenisation lets the model reason about the entity ("email addresses were provided by 47 different users") without exposing the actual values. It also allows re-identification after the fact for legitimate use cases.
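A tokeniser along these lines can be sketched in a few lines. The token format follows the `[EMAIL_7f3a]` example above; the class itself and its vault structure are illustrative assumptions, not a specific library's API:

```python
import hashlib
import re

class Tokeniser:
    """Replace each PII value with a consistent token (e.g. [EMAIL_7f3a])
    and keep a lookup table for authorised re-identification."""

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def __init__(self) -> None:
        self.vault: dict[str, str] = {}  # token -> original value

    def _token(self, label: str, value: str) -> str:
        # Hash-derived suffix: the same value always yields the same token.
        digest = hashlib.sha256(value.encode()).hexdigest()[:4]
        token = f"[{label}_{digest}]"
        self.vault[token] = value
        return token

    def tokenise(self, text: str) -> str:
        return self.EMAIL.sub(lambda m: self._token("EMAIL", m.group()), text)

    def detokenise(self, text: str) -> str:
        """Re-identification for authorised, audited use only."""
        for token, value in self.vault.items():
            text = text.replace(token, value)
        return text
```

Because the same address always maps to the same token, the model can still count distinct senders or link mentions of one entity; `detokenise` restores the originals when a downstream system is authorised to see them. (A production vault would also need access control and, with a 4-hex suffix, collision handling.)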

What the Model Needs to Know

Counter-intuitively, the model usually does not need the actual PII to do its job. A customer service model handling a complaint about an order does not need to know the customer's actual name — it needs to know "the customer" and have a reference it can use. A medical triage model does not need the patient's SSN — it needs the clinical data.

Audit your AI application's prompt templates through a data minimisation lens: for each piece of data you inject into context, ask whether the model needs it to complete the task. Most applications include significantly more PII in context than is necessary.
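That audit can even be partially automated: enumerate the fields a prompt template injects and diff them against an allow-list of fields the task genuinely needs. The template and field names below are hypothetical examples for illustration:

```python
import string

# Hypothetical customer-service template. Does the model need all of this?
TEMPLATE = (
    "Customer {customer_name} (SSN {ssn}, email {email}) "
    "reports: {complaint}. Order id: {order_id}."
)

# Fields the task actually requires: the complaint and the order reference.
REQUIRED = {"complaint", "order_id"}

def audit(template: str, required: set[str]) -> list[str]:
    """Return template fields not on the allow-list, i.e. minimisation candidates."""
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    return sorted(fields - required)

print(audit(TEMPLATE, REQUIRED))
# Flags customer_name, ssn, and email as candidates for tokenisation or removal.
```

This will not tell you whether a field is PII, but it forces the question field by field instead of letting templates accrete data by default.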

