OpenTelemetry is the de facto standard for distributed tracing in modern API architectures. It captures spans across service boundaries, measures latency, and lets you reconstruct the call chain for any request. But standard OpenTelemetry instrumentation treats an LLM call as an opaque HTTP request: it captures the latency and status code, and misses everything else that matters.
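To see why, consider what vanilla HTTP client auto-instrumentation records. A minimal sketch, assuming the `opentelemetry-instrumentation-httpx` package (the Anthropic Python SDK makes its requests through httpx):

```python
# Standard auto-instrumentation: every outbound httpx request, including calls
# to the LLM provider, becomes a generic HTTP client span.
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

HTTPXClientInstrumentor().instrument()

# From this point on, a POST to the provider's /v1/messages endpoint is recorded
# with HTTP-level attributes only (method, URL, status code) plus duration.
# Nothing about tokens, model version, prompt contents, or cost is captured.
```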
What Standard Traces Miss for AI Workloads
- Token counts (input and output): essential for cost attribution and budget monitoring
- Model version: the same endpoint may route to different model versions; which version handled this request?
- System prompt hash: did the system prompt change between this call and yesterday's call?
- Prompt classification: was this request flagged for any security patterns?
- Confidence scores and output schema validation results
- Cache hit/miss for semantic caching
Adding AI-Specific Attributes
```python
import hashlib

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def call_llm(prompt: str, system: str) -> dict:
    # anthropic_client (an async Anthropic client) and calculate_cost (a pricing
    # helper keyed on the model's per-token rates) are assumed to be defined elsewhere.
    with tracer.start_as_current_span("llm.inference") as span:
        span.set_attribute("llm.model", "claude-3-5-sonnet-20241022")
        span.set_attribute("llm.system_prompt_hash", hashlib.sha256(system.encode()).hexdigest()[:16])
        # Rough pre-call estimate; exact counts from the API response are recorded below
        span.set_attribute("llm.input_tokens_estimate", len(prompt.split()) * 1.3)
        response = await anthropic_client.messages.create(...)
        span.set_attribute("llm.input_tokens", response.usage.input_tokens)
        span.set_attribute("llm.output_tokens", response.usage.output_tokens)
        span.set_attribute("llm.cost_usd", calculate_cost(response.usage))
        span.set_attribute("llm.stop_reason", response.stop_reason)
        return response
```

The Security Observability Layer
Security events need their own trace attributes: whether the request was scanned for injection patterns, which patterns matched (if any), whether the output passed schema validation, and whether any circuit breakers or rate limits were triggered. Standard application traces do not include this information — it must be added at the gateway layer.
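A minimal sketch of what that gateway-layer annotation can look like; the `llm.security.*` attribute names and the toy regex patterns below are illustrative assumptions, not an established semantic convention:

```python
import re

from opentelemetry import trace

# Toy patterns purely for illustration; a real gateway would use a proper
# injection-detection engine rather than two regexes.
INJECTION_PATTERNS = {
    "ignore_previous_instructions": re.compile(r"ignore (all )?previous instructions", re.I),
    "role_override": re.compile(r"you are now", re.I),
}

def annotate_security(span: trace.Span, prompt: str, schema_valid: bool, rate_limited: bool) -> None:
    matched = [name for name, pattern in INJECTION_PATTERNS.items() if pattern.search(prompt)]
    span.set_attribute("llm.security.injection_scanned", True)
    span.set_attribute("llm.security.patterns_matched", matched)
    span.set_attribute("llm.security.output_schema_valid", schema_valid)
    span.set_attribute("llm.security.rate_limited", rate_limited)
```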
G8KEPR attaches AI-specific span attributes to every proxied LLM call: model version, token counts, prompt hash, security scan results, and cost. These are exported via OTLP to your existing observability stack — Grafana, Datadog, Honeycomb, or any OTLP-compatible backend.
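If you are adding these attributes in your own services rather than through a gateway, the export side is the usual OTLP setup. A minimal sketch, assuming the `opentelemetry-sdk` and `opentelemetry-exporter-otlp` packages and a placeholder collector endpoint:

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Point the OTLP exporter at your collector; Grafana, Datadog, and Honeycomb
# all accept OTLP either directly or via the OpenTelemetry Collector.
provider = TracerProvider(resource=Resource.create({"service.name": "llm-gateway"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)
```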
