Zero-width characters are Unicode characters that have no visible glyph but occupy a position in a string. The zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), and similar characters are rendered as nothing in most interfaces but are present in the byte stream — and in the token stream that a language model processes.
An attacker can embed zero-width characters between the letters of a known injection phrase. "Ignore" becomes "Ignore" — invisible to the human reviewer, invisible in the UI, but the LLM reads the characters and processes the full string. Pattern-matching detection that looks for "Ignore" as a contiguous string misses the attack entirely.
Attack Variants
- ▸Character interleaving — zero-width characters inserted between every character of the injection phrase
- ▸Bulk prefix injection — sequences of 50+ zero-width characters prepended to a string, disrupting tokenisation in some models
- ▸Right-to-left override (U+202E) — changes the visual rendering direction, hiding injection text in plain sight in some UI contexts
- ▸HTML comment embedding — instructions hidden in <!-- --> comments in content the model processes
Detection
The most robust detection approach is pre-processing: strip or flag all zero-width characters from input before pattern matching. This normalises the input for detection and also for the model — many zero-width characters have no legitimate use in API request payloads.
For content where zero-width characters might be legitimate (rich text, multilingual content), maintain a separate detection pass that reconstructs the string without zero-width characters and applies pattern matching to the reconstructed version.
G8KEPR's anti-spotlighting detection module strips zero-width characters from all text inputs and applies pattern matching to both the original and stripped versions. Inputs containing 2+ consecutive zero-width characters trigger an automatic high-severity alert.
