Observability for AI APIs: Why Standard Tracing Is Not Enough — G8KEPR Blog
Architecture · 7 min read · February 15, 2026

Observability for AI APIs: Why Standard Tracing Is Not Enough

OpenTelemetry traces your API calls but not your model calls. Standard span attributes do not capture token counts, model versions, prompt hashes, or inference latency. Here is how to extend distributed tracing for AI workloads so you can debug what actually happened.

OpenTelemetry is the standard for distributed tracing in modern API architectures. It captures spans across service boundaries, measures latency, and allows you to reconstruct the call chain for any request. But standard OpenTelemetry instrumentation treats an LLM call as an opaque HTTP request — it captures the latency and status code, and nothing else that matters.

What Standard Traces Miss for AI Workloads

  • Token counts (input and output) — essential for cost attribution and budget monitoring
  • Model version — the same endpoint may route to different model versions; which version handled this request?
  • System prompt hash — did the system prompt change between this call and yesterday's call?
  • Prompt classification — was this request flagged for any security patterns?
  • Confidence scores and output schema validation results
  • Cache hit/miss for semantic caching
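Taken together, these gaps suggest a small attribute vocabulary. One way to name them, loosely following OpenTelemetry's emerging `gen_ai` semantic conventions — the exact keys and sample values below are illustrative, not a fixed schema:

```python
# Illustrative span-attribute vocabulary for an LLM call.
# "gen_ai.*" keys follow OpenTelemetry's draft gen_ai conventions;
# the "llm.*" keys and all sample values are placeholders.
AI_SPAN_ATTRIBUTES = {
    "gen_ai.usage.input_tokens": 812,           # token counts for cost attribution
    "gen_ai.usage.output_tokens": 245,
    "gen_ai.request.model": "claude-3-5-sonnet-20241022",  # version that served this request
    "llm.system_prompt_hash": "3f1a9c0d2b7e4f88",          # detect prompt drift between deploys
    "llm.security.classification": "clean",     # injection-scan verdict
    "llm.output.schema_valid": True,            # did the output pass schema validation?
    "llm.cache.hit": False,                     # semantic-cache result
}
```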

Adding AI-Specific Attributes

```python
import hashlib

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def call_llm(prompt: str, system: str):
    with tracer.start_as_current_span("llm.inference") as span:
        # Attributes known before the call: model, prompt fingerprint, rough size.
        span.set_attribute("llm.model", "claude-3-5-sonnet-20241022")
        span.set_attribute("llm.system_prompt_hash",
                           hashlib.sha256(system.encode()).hexdigest()[:16])
        span.set_attribute("llm.input_tokens_estimate", int(len(prompt.split()) * 1.3))

        response = await anthropic_client.messages.create(...)

        # Attributes known only after the call: exact usage, cost, stop reason.
        span.set_attribute("llm.input_tokens", response.usage.input_tokens)
        span.set_attribute("llm.output_tokens", response.usage.output_tokens)
        span.set_attribute("llm.cost_usd", calculate_cost(response.usage))
        span.set_attribute("llm.stop_reason", response.stop_reason)

        return response
```
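The `calculate_cost` helper above is left undefined. A minimal sketch, assuming per-million-token pricing — the rates in the table are illustrative placeholders, not official prices:

```python
# Hypothetical price table, USD per million tokens. Replace with your
# provider's current published rates.
PRICING = {
    "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
}

def calculate_cost(usage, model: str = "claude-3-5-sonnet-20241022") -> float:
    """Convert a response's token usage into an approximate USD cost
    suitable for the llm.cost_usd span attribute."""
    rates = PRICING[model]
    cost = (usage.input_tokens * rates["input"]
            + usage.output_tokens * rates["output"]) / 1_000_000
    return round(cost, 6)
```

Keeping the price table in one place means a model price change is a one-line config edit rather than a hunt through instrumentation code.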

The Security Observability Layer

Security events need their own trace attributes: whether the request was scanned for injection patterns, which patterns matched (if any), whether the output passed schema validation, and whether any circuit breakers or rate limits were triggered. Standard application traces do not include this information — it must be added at the gateway layer.
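A sketch of what attaching those gateway-layer results to an in-flight span could look like — the `scan_result` payload and its field names are hypothetical, so adapt the keys to whatever your scanner actually emits:

```python
def record_security_attributes(span, scan_result: dict, schema_valid: bool) -> None:
    """Attach gateway-layer security results to an in-flight span.

    `scan_result` is a hypothetical scanner payload with `matched_patterns`
    and `rate_limited` fields — adapt the keys to your gateway's output.
    """
    matched = scan_result.get("matched_patterns", [])
    span.set_attribute("llm.security.scanned", True)
    span.set_attribute("llm.security.patterns_matched", matched)
    span.set_attribute("llm.security.flagged", len(matched) > 0)
    span.set_attribute("llm.output.schema_valid", schema_valid)
    span.set_attribute("llm.gateway.rate_limited", scan_result.get("rate_limited", False))
```

Because these attributes ride on the same span as the inference call, a single trace query can answer "show me every flagged request that still returned a schema-valid response."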

G8KEPR attaches AI-specific span attributes to every proxied LLM call: model version, token counts, prompt hash, security scan results, and cost. These are exported via OTLP to your existing observability stack — Grafana, Datadog, Honeycomb, or any OTLP-compatible backend.
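Consuming these attributes requires nothing exotic on the receiving side: point a standard OTLP exporter at your collector. A minimal setup using the official `opentelemetry-sdk` and `opentelemetry-exporter-otlp` packages — the endpoint is an assumption for a local collector:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Batch spans and ship them over OTLP/gRPC to a local collector.
# Grafana, Datadog, and Honeycomb all accept OTLP, directly or via a collector.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
    )
)
trace.set_tracer_provider(provider)
```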


Ready to secure your AI stack?

14-day free trial — full platform access, no credit card required. Early access members get pricing locked in forever.