We publish our performance targets, test methodology, and CI gate thresholds. Every pull request must pass p95 < 200ms at 50 VU before it can merge. Here is exactly what G8KEPR adds to your request path — and why.
Context matters. The numbers on this page are targets enforced by the CI gate at 50 VU on the demo stack (1GB DigitalOcean droplet). Self-hosted deployments on production-grade hardware will perform better. Stress tests at 200–500 VU show higher latency — the demo server is not sized for that load. We link to all test scripts so you can run them yourself.
These thresholds are defined in tests/load/k6-ci.js and block merges if any threshold fails.
Smoke test — sanity check; catches regressions before load testing
Load test (50 VU) — normal traffic simulation; must pass to merge to main
Stress test (200 VU) — burst capacity; validates the system under spike load
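k6 enforces these thresholds itself and exits non-zero when one fails, so the merge gate needs no extra tooling. If you also want the measured numbers surfaced in CI output, a few lines over k6's --summary-export JSON are enough. A minimal sketch in Python, assuming the standard export layout (key names can vary slightly across k6 versions):

```python
# Sketch: re-read a k6 --summary-export file and report the p95 gate.
# k6 already exits non-zero on threshold failure; this only surfaces the number.
import json
import sys

with open("summary.json") as f:
    summary = json.load(f)

p95_ms = summary["metrics"]["http_req_duration"]["p(95)"]
print(f"http_req_duration p95: {p95_ms:.1f} ms (published target: < 200 ms)")

if p95_ms >= 200:
    sys.exit("p95 gate failed")  # non-zero exit, mirroring k6's own behavior
```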
Published targets from tests/load/benchmark_results.json
GET /health — Readiness probe
POST /api/keys/validate — API key authentication
GET /api/gateway/proxy — Gateway passthrough
GET /api/compliance/frameworks — Compliance data fetch
GET /api/audit-logs — Audit log retrieval
POST /api/threat-intelligence/analyze — Full AI threat scan
Threat detection p95 is higher because it runs optional ML analysis. It is async and does not block the gateway response.
Every request goes through only the steps it needs. Pass-through validation adds ~15–30ms. Active threat scanning adds 50–200ms but runs async.
1. JWT validation — in-memory signature check plus a Redis GET for the key's scope
2. Rate limiting — Redis INCR with a sliding window; a single round-trip
3. Circuit breaker — in-memory state check; no I/O
4. Semantic cache — Redis semantic cache hit; no model call needed
5. Threat detection — pattern matching plus optional ML scoring; async, does not block the response
6. Audit logging — async PostgreSQL insert; does not add to request latency
Pass-through total: ~15–30ms added overhead for JWT validation + rate limit + circuit breaker. No more.
Threat analysis: Runs async after the response is sent. Your users do not wait for the ML model.
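To make that budget concrete, here is a minimal Python sketch of steps 1–3. It is illustrative only, not G8KEPR's actual code: the secret, key names, and limit are hypothetical, and the sliding window is approximated with fixed one-minute buckets (a true single-round-trip sliding window usually folds the bookkeeping into a short Redis Lua script).

```python
# Minimal sketch of pass-through steps 1-3 (illustrative, not G8KEPR internals).
# Requires: pip install pyjwt redis
import time

import jwt  # PyJWT
import redis.asyncio as redis

r = redis.Redis()
JWT_SECRET = "change-me"   # hypothetical shared secret
RATE_LIMIT = 1000          # hypothetical requests per window
circuit_open = False       # in-memory breaker state; checking it costs no I/O

async def pass_through(token: str, api_key: str) -> str:
    # Step 1 -- JWT validation: in-memory signature check (raises on a bad
    # token), then a single Redis GET to load the key's scope.
    jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
    scope = await r.get(f"keys:{api_key}:scope")
    if scope is None:
        raise PermissionError("unknown API key")

    # Step 2 -- rate limiting: one INCR round-trip per request. Fixed 60s
    # buckets here; a real sliding window packs this into a short Lua script.
    bucket = int(time.time()) // 60
    count = await r.incr(f"rl:{api_key}:{bucket}")
    if count == 1:
        await r.expire(f"rl:{api_key}:{bucket}", 120)
    if count > RATE_LIMIT:
        raise RuntimeError("rate limit exceeded")

    # Step 3 -- circuit breaker: pure in-memory check, no network hop.
    if circuit_open:
        raise RuntimeError("upstream circuit open; failing fast")

    return scope.decode()
```

The only network work is one GET and one INCR against Redis; the JWT and breaker checks never leave process memory, which is why the pass-through budget stays in the low tens of milliseconds.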
Performance is a first-class design constraint, not an afterthought. These are the specific choices that give G8KEPR its overhead profile.
Async-first core: All endpoints are async/await — no thread blocking on I/O. A single worker can handle hundreds of concurrent connections without spawning threads.
Request coalescing: Identical concurrent requests to Redis are coalesced — only one cache miss fires even under burst. The semantic cache prevents re-running ML models for repeated prompt patterns.
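The coalescing itself is a dozen lines of asyncio, and it also illustrates the async-first style from the previous point. All names below are hypothetical, and the semantic-similarity matching that decides what counts as an identical prompt is elided:

```python
# Sketch: coalescing identical concurrent lookups (hypothetical names).
import asyncio

_inflight: dict[str, asyncio.Task] = {}

async def get_or_compute(key: str, compute) -> bytes:
    # If an identical request is already in flight, piggyback on it instead
    # of firing a second cache miss or model call.
    task = _inflight.get(key)
    if task is None:
        task = asyncio.create_task(compute(key))
        _inflight[key] = task
        # Drop the entry once it resolves so later misses recompute.
        task.add_done_callback(lambda _: _inflight.pop(key, None))
    return await task
```

Under a burst, only the first caller runs compute; every concurrent duplicate awaits the same in-flight task.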
Connection pooling: PostgreSQL connections are pooled via asyncpg — no per-request connect overhead. Pool sizing is tuned per deployment based on DB capacity.
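A sketch of the pooling pattern, which doubles as an illustration of the fire-and-forget audit insert from step 6 above. The DSN, table name, and pool sizes are placeholders, not G8KEPR's configuration:

```python
# Sketch: a shared asyncpg pool plus a fire-and-forget audit insert.
import asyncio
import asyncpg

pool = None                 # set once in startup()
_audit_tasks: set = set()   # keep references so pending tasks aren't GC'd

async def startup() -> None:
    global pool
    # One pool per process; requests borrow connections instead of dialing.
    pool = await asyncpg.create_pool(
        "postgresql://g8kepr@localhost/g8kepr",  # placeholder DSN
        min_size=5, max_size=20,                 # tune to your DB's capacity
    )

def audit(api_key: str, path: str, status: int) -> None:
    # Schedule the insert and return immediately; the response never waits.
    task = asyncio.create_task(pool.execute(
        "INSERT INTO audit_logs (api_key, path, status) VALUES ($1, $2, $3)",
        api_key, path, status,
    ))
    _audit_tasks.add(task)
    task.add_done_callback(_audit_tasks.discard)
```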
Edge compression and HTTP/3: API responses are Brotli-compressed at the edge, reducing wire bytes by ~30% vs gzip. HTTP/3 eliminates head-of-line blocking for clients that support it.
Low processing overhead: For the hosted gateway, measured overhead is mostly network RTT, not processing. Self-hosting puts the gateway on your own network and eliminates that round trip entirely.
All test scripts are in the repository. No black-box benchmarks — every number on this page is reproducible. Run against the demo API or your own self-hosted deployment.
tests/load/k6-ci.js — CI gate: smoke + 50 VU load + 200 VU stress
tests/load/k6-baseline.js — Authoritative baseline: 5 key endpoints at 50 VU
tests/load/k6-full.js — Full stress test: 0→500 VU ramp (staging only)
backend/tests/performance/benchmark_guard_performance.py — SemanticGuard cache performance (pytest)
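If you just want a quick sanity probe without installing k6, a short sequential loop can eyeball /health latency. It is not comparable to the 50 VU k6 results; this sketch assumes httpx, with a placeholder base URL:

```python
# Quick sequential probe of GET /health (a sanity check, not a benchmark).
# Requires: pip install httpx
import statistics

import httpx

BASE_URL = "https://your-g8kepr-host"   # placeholder: demo API or self-hosted
N = 200

with httpx.Client(base_url=BASE_URL, timeout=10.0) as client:
    samples = []
    for _ in range(N):
        resp = client.get("/health")
        resp.raise_for_status()
        samples.append(resp.elapsed.total_seconds() * 1000)

p95 = statistics.quantiles(samples, n=100)[94]   # 95th percentile, in ms
print(f"GET /health p95 over {N} requests: {p95:.1f} ms")
```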
Production note: A self-hosted deployment on production hardware (4+ vCPU, 8 GB RAM) will significantly outperform these demo numbers. If you want production benchmark guidance or a sizing conversation, we are happy to help.