CI Gate: p95 < 200ms @ 50 VU

Performance
Benchmarks

We publish our performance targets, test methodology, and CI gate thresholds. Every pull request must pass p95 < 200ms at 50 VU before it can merge. Here is exactly what G8KEPR adds to your request path — and why.

<200ms — p95 CI gate
100+ req/s — throughput @ 50 VU
<1% — error rate target
<10ms — cache hit overhead

Context matters. The numbers on this page are targets enforced by the CI gate at 50 VU on the demo stack (1GB DigitalOcean droplet). Self-hosted deployments on production-grade hardware will perform better. Stress tests at 200–500 VU show higher latency — the demo server is not sized for that load. We link to all test scripts so you can run them yourself.

Enforced on Every Pull Request

CI Performance Gate

These thresholds are defined in tests/load/k6-ci.js and block merges if any threshold fails.

Smoke (1 VU · 30s)
Sanity check — catches regressions before load testing
p95 < 200ms · p99 < 500ms · throughput > 100 req/s · error rate < 1%

Load (50 VU · 2 min)
Normal traffic simulation — must pass to merge to main
p95 < 200ms · p99 < 500ms · throughput > 100 req/s · error rate < 1%

Stress (200 VU · 1 min)
Burst capacity — validates the system under spike load
p95 < 200ms · p99 < 500ms · throughput > 100 req/s · error rate < 1%
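The authoritative gate lives in tests/load/k6-ci.js as k6 threshold expressions. As an illustrative sketch only (not the shipped script), the same pass/fail logic looks like this in Python:

```python
# Illustrative re-implementation of the CI gate's pass/fail logic.
# The real thresholds are k6 `thresholds` entries in tests/load/k6-ci.js.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil(pct% of n)
    return ordered[int(rank) - 1]

def gate_passes(latencies_ms, errors, total, duration_s):
    """Apply the published gate: p95 < 200ms, p99 < 500ms,
    throughput > 100 req/s, error rate < 1%."""
    return (
        percentile(latencies_ms, 95) < 200
        and percentile(latencies_ms, 99) < 500
        and total / duration_s > 100
        and errors / total < 0.01
    )
```

A run with 12,600 requests over 120s (105 req/s), no errors, and latencies under the percentile caps passes; any single failing threshold blocks the merge.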

Per-Endpoint Performance Targets

Published targets from tests/load/benchmark_results.json

| Endpoint | Description | Category | p50 target | p95 target |
| --- | --- | --- | --- | --- |
| GET /health | Readiness probe | Infrastructure | 50ms | 100ms |
| POST /api/keys/validate | API key authentication | Auth | 50ms | 200ms |
| GET /api/gateway/proxy | Gateway passthrough | Gateway | 80ms | 200ms |
| GET /api/compliance/frameworks | Compliance data fetch | Compliance | 150ms | 500ms |
| GET /api/audit-logs | Audit log retrieval | Audit | 200ms | 500ms |
| POST /api/threat-intelligence/analyze | Full AI threat scan | AI Analysis | 300ms | 1000ms |

Threat detection p95 is higher because it runs optional ML analysis. It is async and does not block the gateway response.

What G8KEPR Does With Those Milliseconds

The Request Pipeline

Every request goes through only the steps it needs. Pass-through validation adds ~15–30ms; active threat scanning takes another 50–200ms but runs async, off the response path.

1. JWT + API key validation — 1–3ms
In-memory JWT signature check + Redis GET for key scope

2. Rate limit check — 2–5ms
Redis INCR with sliding window — single round-trip

3. Circuit breaker evaluation — <1ms
In-memory state check — no I/O

4. Threat detection (cached) — 5–10ms
Redis semantic cache hit — no model call needed

4b. Threat detection (uncached) — 50–200ms
Pattern matching + optional ML scoring. Async — does not block the response.

5. Audit log write — 2–8ms
Async PostgreSQL insert — does not add to request latency
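Step 3's in-memory evaluation can be sketched as a minimal circuit breaker. This is an illustration under assumed names and defaults, not G8KEPR's actual implementation:

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: trips open after `max_failures`
    consecutive failures, half-opens after `reset_s` seconds.
    Hypothetical class, not G8KEPR's actual code."""

    def __init__(self, max_failures=5, reset_s=30.0):
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None  # None => circuit closed

    def allow(self):
        # Pure in-memory state check — no I/O, sub-millisecond.
        if self.opened_at is None:
            return True
        # Half-open: let a probe request through after the cooldown.
        return time.monotonic() - self.opened_at >= self.reset_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
```

Because the state is a couple of attributes on an object in process memory, the check costs no network round-trip at all.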

Pass-through total: ~15–30ms of added overhead for JWT validation + rate limit + circuit breaker. Nothing more.
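Step 2's counter pattern can be sketched with an in-memory dict standing in for Redis. The real implementation issues a single INCR round-trip against a keyed counter that expires with the window; names here are hypothetical:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Sketch of a windowed rate limit. A dict keyed by
    (api_key, window_bucket) stands in for Redis INCR on an
    expiring counter. Illustrative only."""

    def __init__(self, limit=100, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.counters = defaultdict(int)

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        bucket = (api_key, int(now // self.window_s))
        self.counters[bucket] += 1  # in Redis: one INCR round-trip
        return self.counters[bucket] <= self.limit
```

The increment and the limit comparison travel together, which is why the step costs one round-trip (2–5ms against a local Redis) rather than two.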

Threat analysis: Runs async after the response is sent. Your users do not wait for the ML model.
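The hand-off in steps 4b and 5 can be illustrated with plain asyncio fire-and-forget tasks. This is a generic sketch (in a FastAPI app the equivalent is background task scheduling), not the actual G8KEPR code; the delays are stand-ins for the published ranges:

```python
import asyncio

async def threat_scan(payload):
    # Stand-in for uncached pattern matching + ML scoring (50-200ms).
    await asyncio.sleep(0.1)

async def audit_write(entry):
    # Stand-in for the async PostgreSQL insert (2-8ms).
    await asyncio.sleep(0.005)

async def handle_request(payload, background):
    # Fast, blocking part of the pipeline: auth, rate limit, breaker.
    response = {"status": "ok"}
    # Fire-and-forget: the response returns before these complete.
    background.append(asyncio.create_task(threat_scan(payload)))
    background.append(asyncio.create_task(audit_write(payload)))
    return response

async def main():
    background = []
    resp = await handle_request({"prompt": "hi"}, background)
    assert resp["status"] == "ok"      # response is ready immediately
    await asyncio.gather(*background)  # analysis finishes afterwards

asyncio.run(main())
```

The caller gets the response as soon as the pass-through checks finish; the scan and audit write complete on the event loop after the fact.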

Built for Throughput

Architecture Choices That Keep Overhead Low

Performance is a first-class design constraint, not an afterthought. These are the specific choices that give G8KEPR its overhead profile.

Async FastAPI

All endpoints are async/await — no thread blocking on I/O. Single worker can handle hundreds of concurrent connections without spawning threads.

Python asyncio · uvicorn · No GIL blocking
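The single-worker concurrency claim can be illustrated with generic asyncio (not G8KEPR code): one event-loop thread interleaves hundreds of simultaneous I/O waits without spawning worker threads.

```python
import asyncio
import time

async def fake_io_request(i):
    # Each "request" spends 50ms waiting on I/O; the event loop
    # interleaves all of them instead of blocking a thread per request.
    await asyncio.sleep(0.05)
    return i

async def serve(n):
    return await asyncio.gather(*(fake_io_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(500))
elapsed = time.perf_counter() - start
# 500 concurrent 50ms waits complete in roughly 50ms of wall time,
# all on the event-loop thread.
```

A thread-per-request model would need 500 threads (or a queue) to get the same wall-clock time.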

Redis Singleflight Cache

Identical concurrent requests to Redis are coalesced — only one cache miss fires even under burst. Semantic cache prevents re-running ML models for repeated prompt patterns.

Singleflight · Semantic dedup · <10ms cache hit
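The coalescing behaviour can be sketched in a few lines of asyncio: concurrent callers for the same key share one in-flight task, so the expensive lookup fires once. Illustrative names, not G8KEPR's implementation:

```python
import asyncio

class Singleflight:
    """Coalesce identical concurrent lookups: the first caller for a
    key starts the work, everyone else awaits the same task. Sketch."""

    def __init__(self):
        self._inflight = {}

    async def do(self, key, fn):
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            try:
                return await task
            finally:
                del self._inflight[key]
        return await task  # awaiting a shared Task is safe in asyncio

async def demo():
    sf = Singleflight()
    count = {"misses": 0}

    async def expensive_lookup():
        count["misses"] += 1
        await asyncio.sleep(0.01)  # simulated cache-miss / model work
        return "verdict"

    # Ten identical concurrent requests arrive in a burst...
    results = await asyncio.gather(
        *(sf.do("prompt-sig", expensive_lookup) for _ in range(10))
    )
    return results, count["misses"]

results, misses = asyncio.run(demo())  # misses == 1
```

All ten callers receive the same result, but only one cache miss reached the backend — the property that keeps burst traffic from stampeding the ML model.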

Connection Pooling

PostgreSQL connections are pooled via asyncpg — no per-request connect overhead. Pool sizing is tuned per deployment based on DB capacity.

asyncpg pool · No reconnect overhead · Configurable size
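The pooling idea reduces to borrowing pre-opened connections instead of reconnecting per request. A toy stdlib sketch (asyncpg's real pool is created with asyncpg.create_pool and handles health checks, sizing, and transactions):

```python
import asyncio

class PoolSketch:
    """Toy connection pool: pre-opened connections sit in an
    asyncio.Queue; requests borrow and return them. Illustrative
    only -- use asyncpg.create_pool() in practice."""

    def __init__(self, connect, size):
        self._q = asyncio.Queue()
        for _ in range(size):
            self._q.put_nowait(connect())

    async def acquire(self):
        return await self._q.get()  # waits if every connection is busy

    def release(self, conn):
        self._q.put_nowait(conn)

async def demo():
    opened = {"count": 0}

    def connect():
        # Stand-in for an expensive TCP + TLS + auth handshake.
        opened["count"] += 1
        return object()

    pool = PoolSketch(connect, size=2)
    for _ in range(100):  # 100 "requests" reuse the same 2 connections
        conn = await pool.acquire()
        pool.release(conn)
    return opened["count"]

connections_opened = asyncio.run(demo())  # == 2, not 100
```

One hundred requests cost two handshakes instead of one hundred, which is the whole per-request saving.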

Brotli + HTTP/3

API responses are Brotli-compressed at the edge, reducing wire bytes by ~30% vs gzip. HTTP/3 eliminates head-of-line blocking for clients that support it.

Brotli compression · HTTP/3 QUIC · ~30% smaller payloads

Why G8KEPR Adds Less Overhead Than You Expect

API gateway overhead is mostly network RTT, not processing. Self-hosted eliminates that entirely.

| Factor | Cloud-Only Gateway | G8KEPR Self-Hosted |
| --- | --- | --- |
| Network latency to gateway | Adds 20–80ms RTT — gateway is in another datacenter | Near-zero — deploy alongside your services |
| Authentication overhead | Remote auth call on every request — adds 50–200ms | In-process JWT check + local Redis lookup — 1–10ms |
| Rate limit check | Centralized rate-limit service — extra network hop | Local Redis — 2–5ms single round-trip |
| Threat analysis blocking | Synchronous scan blocks the response until complete | Async — response sent before analysis finishes |
| Cold start / scale-out | Vendor cold starts add unpredictable latency spikes | Persistent uvicorn workers — no cold starts |
| Hardware scaling | Cannot scale beyond vendor-provided instance types | Scale on your own hardware — upgrade without migration |
Open Methodology

Run It Yourself

All test scripts are in the repository. No black-box benchmarks — every number on this page is reproducible. Run against the demo API or your own self-hosted deployment.

tests/load/k6-ci.js

CI gate — smoke + 50 VU load + 200 VU stress

tests/load/k6-baseline.js

Authoritative baseline — 5 key endpoints at 50 VU

tests/load/k6-full.js

Full stress test — 0→500 VU ramp (staging only)

backend/tests/performance/benchmark_guard_performance.py

SemanticGuard cache performance (pytest)
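The real guard uses pytest-benchmark; as a hypothetical illustration of the assertion style only (a plain dict standing in for the semantic cache, stdlib timing instead of the benchmark fixture):

```python
import time

def measure_cache_hit(cache, key, iterations=1000):
    """Median wall-clock time of a cache hit, in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        cache.get(key)
        samples.append((time.perf_counter() - start) * 1000)
    return sorted(samples)[len(samples) // 2]

# In-memory dict standing in for the Redis semantic cache.
cache = {"prompt-signature": {"verdict": "benign"}}
median_ms = measure_cache_hit(cache, "prompt-signature")
assert median_ms < 10  # mirrors the published <10ms cache-hit target
```

The shipped test asserts the same shape of bound against the actual SemanticGuard cache path, Redis round-trip included.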

Test Environment

Demo server: 1GB RAM · 1 vCPU · DigitalOcean NYC
Test tools: k6 (Grafana) + pytest-benchmark
Backend: FastAPI + asyncpg + Redis 7 + PostgreSQL 15
Load scenarios: Smoke (1 VU) · Load (50 VU) · Stress (200 VU)
Baseline: 50 VU sustained for 2 minutes
Frequency: blocking CI gate on every pull request

Production note: Self-hosted on production hardware (4+ vCPU, 8GB RAM) will significantly outperform these demo numbers. Contact us for production benchmark guidance.

Open Benchmarks

Performance questions? Run the tests yourself.

All k6 scripts are in the repo. Run them against the demo API or your own deployment. If you want a production sizing conversation, we are happy to help.