CI Gate: p95 < 200ms @ 50 VU

Performance Benchmarks

We publish our performance targets, test methodology, and CI gate thresholds. Every pull request must pass p95 < 200ms at 50 VU before it can merge. Here is exactly what G8KEPR adds to your request path — and why.

p95 CI gate: <200ms
Throughput @ 50 VU: 100+ req/s
Error rate target: <1%
Cache hit overhead: <10ms

Context matters. The numbers on this page are targets enforced by the CI gate at 50 VU on the demo stack (1GB DigitalOcean droplet). Self-hosted deployments on production-grade hardware will perform better. Stress tests at 200–500 VU show higher latency — the demo server is not sized for that load. We link to all test scripts so you can run them yourself.

Enforced on Every Pull Request

CI Performance Gate

These thresholds are defined in tests/load/k6-ci.js and block merges if any threshold fails.

Smoke (1 VU · 30s)
Sanity check: catches regressions before load testing
p95: < 200ms
p99: < 500ms
Throughput: > 100 req/s
Error rate: < 1%

Load (50 VU · 2 min)
Normal traffic simulation: must pass to merge to main
p95: < 200ms
p99: < 500ms
Throughput: > 100 req/s
Error rate: < 1%

Stress (200 VU · 1 min)
Burst capacity: validates the system under spike load
p95: < 200ms
p99: < 500ms
Throughput: > 100 req/s
Error rate: < 1%
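As a rough illustration of what a gate like this checks, the Python sketch below evaluates the four published thresholds against a list of request samples. The real gate is the k6 script in tests/load/k6-ci.js; the function names, the dict layout, and the nearest-rank percentile method used here are assumptions made for this example.

```python
import math

# Threshold values copied from the gate above; the dict layout is an assumption.
THRESHOLDS = {"p95_ms": 200.0, "p99_ms": 500.0, "min_rps": 100.0, "max_error_rate": 0.01}


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_gate(latencies_ms, errors, duration_s, thresholds=THRESHOLDS):
    """Return (passed, failures) for one run; any failed threshold fails the gate."""
    total = len(latencies_ms)
    failures = []
    if percentile(latencies_ms, 95) >= thresholds["p95_ms"]:
        failures.append("p95")
    if percentile(latencies_ms, 99) >= thresholds["p99_ms"]:
        failures.append("p99")
    if total / duration_s <= thresholds["min_rps"]:
        failures.append("throughput")
    if errors / total >= thresholds["max_error_rate"]:
        failures.append("error_rate")
    return not failures, failures
```

Because the gate is all-or-nothing, a run that meets the latency targets but drops below 100 req/s still fails, which matches the "block merges if any threshold fails" rule.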

Per-Endpoint Performance Targets

Published targets from tests/load/benchmark_results.json

The Request Pipeline

Every request goes through only the steps it needs. Pass-through validation adds ~15–30ms. Active threat scanning takes 50–200ms but runs asynchronously, so it does not delay the response.

JWT + API key validation: 1–3ms
In-memory JWT signature check + Redis GET for key scope

Rate limit check: 2–5ms
Redis INCR with sliding window, single round-trip

Circuit breaker evaluation: < 1ms
In-memory state check, no I/O

Threat detection (cached): 5–10ms
Redis semantic cache hit, no model call needed

Threat detection (uncached): 50–200ms
Pattern matching + optional ML scoring. Async, does not block the response.

Audit log write: 2–8ms
Async PostgreSQL insert, does not add to request latency

Pass-through total: ~15–30ms added overhead for JWT validation + rate limit + circuit breaker. No more.
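To make the rate limit step more concrete, here is a minimal Python sketch of one common way to combine per-window counters with a sliding window. The page does not describe G8KEPR's exact algorithm, so treat this as an assumption: a plain dict stands in for Redis, each counter bump would correspond to an INCR, and the class name, key layout, and limits are hypothetical.

```python
import math


class SlidingWindowCounter:
    """Approximate sliding-window limiter built from per-window counters.

    The dict stands in for Redis; in Redis each bump would be an INCR on a
    key such as "<client>:<window index>" (hypothetical key naming).
    """

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.counts = {}

    def allow(self, client_id, now):
        window = math.floor(now / self.window_s)
        elapsed_fraction = now / self.window_s - window
        previous = self.counts.get((client_id, window - 1), 0)
        current = self.counts.get((client_id, window), 0)
        # Weight the previous window by how much of it still overlaps
        # the sliding window that ends at `now`.
        estimated = previous * (1.0 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self.counts[(client_id, window)] = current + 1
        return True
```

The weighted estimate smooths the boundary between windows, so a client cannot double its allowance by bursting just before and just after a window rollover.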

Threat analysis: Runs async after the response is sent. Your users do not wait for the ML model.
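The "respond first, analyze afterwards" pattern can be sketched with plain asyncio. This is an illustration only: the function names are hypothetical, the sleep stands in for pattern matching or ML scoring, and the page does not say which mechanism G8KEPR uses to schedule the work (FastAPI's BackgroundTasks is another option).

```python
import asyncio

background_tasks = set()   # strong references so pending tasks are not garbage-collected
events = []                # records ordering for the demo


async def analyze_threat(request_id):
    await asyncio.sleep(0.05)  # stands in for pattern matching / ML scoring
    events.append(("analysis_done", request_id))


async def handle_request(request_id):
    # Auth, rate limit, and circuit breaker checks would run here, on the response path.
    task = asyncio.create_task(analyze_threat(request_id))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    events.append(("response_sent", request_id))
    return {"status": 200, "request_id": request_id}


async def main():
    response = await handle_request("req-1")
    # Waiting here is only so the demo can observe the analysis finishing.
    await asyncio.gather(*background_tasks)
    return response
```

Running `asyncio.run(main())` records the response before the analysis completes, which is the ordering the pipeline description relies on.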

Built for Throughput

Architecture Choices That Keep Overhead Low

Performance is a first-class design constraint, not an afterthought. These are the specific choices that give G8KEPR its overhead profile.

Async FastAPI

All endpoints are async/await, so no thread blocks on I/O. A single worker can handle hundreds of concurrent connections without spawning threads.

Python asyncio · uvicorn · No GIL blocking
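A small sketch of why one async worker can hold many connections: while a coroutine waits on I/O it yields to the event loop, so the waits overlap. The helper names are hypothetical and `asyncio.sleep` stands in for a Redis or database round-trip.

```python
import asyncio
import time


async def fake_io_call(delay_s):
    # Yields to the event loop while "waiting on the network".
    await asyncio.sleep(delay_s)
    return delay_s


async def serve_concurrently(n_requests, delay_s):
    """Run n_requests simulated I/O-bound handlers on one thread and time them."""
    started = time.perf_counter()
    results = await asyncio.gather(*(fake_io_call(delay_s) for _ in range(n_requests)))
    return len(results), time.perf_counter() - started
```

With 200 simulated requests of 50ms each, the total wall time stays close to a single wait rather than the 10 seconds a sequential loop would take. This only holds for I/O-bound work; CPU-bound code inside a handler still blocks the event loop.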

Redis Singleflight Cache

Identical concurrent requests to Redis are coalesced, so a burst of identical lookups triggers only one cache miss. The semantic cache avoids re-running ML models for repeated prompt patterns.

Singleflight · Semantic dedup · <10ms cache hit
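The coalescing idea can be sketched in a few lines of asyncio. This is a generic singleflight illustration, not G8KEPR's code: the class, the key name, and the simulated lookup are all hypothetical.

```python
import asyncio


class SingleFlight:
    """Coalesce concurrent calls for the same key into one in-flight call."""

    def __init__(self):
        self._in_flight = {}

    async def do(self, key, loader):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the real work.
            task = asyncio.ensure_future(loader())
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared work.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def slow_lookup():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.02)  # stands in for a cache miss + model call
        return "verdict"

    flight = SingleFlight()
    results = await asyncio.gather(
        *(flight.do("prompt-hash", slow_lookup) for _ in range(10))
    )
    return calls, results
```

In the demo, ten concurrent callers share one execution of `slow_lookup` and all receive the same result. Once the call finishes the key is cleared, so later requests start a fresh lookup.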

Connection Pooling

PostgreSQL connections are pooled via asyncpg — no per-request connect overhead. Pool sizing is tuned per deployment based on DB capacity.

asyncpg pool · No reconnect overhead · Configurable size
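To show what pooling saves, the sketch below uses a toy pool built on `asyncio.Queue` as a stand-in for asyncpg, so it runs without a database. The class and function names are hypothetical and the sleeps simulate the connection handshake and query time; with asyncpg itself the equivalent would be `asyncpg.create_pool(...)` with `min_size` and `max_size`.

```python
import asyncio
from contextlib import asynccontextmanager


class ToyPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size):
        self._size = size
        self._idle = asyncio.Queue()
        self.connects = 0

    async def open(self):
        for _ in range(self._size):
            await asyncio.sleep(0.01)  # pretend connection handshake
            self.connects += 1
            self._idle.put_nowait(f"conn-{self.connects}")

    @asynccontextmanager
    async def acquire(self):
        conn = await self._idle.get()  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # hand the connection back for reuse


async def run_queries(n_queries, pool_size):
    pool = ToyPool(pool_size)
    await pool.open()

    async def query(_):
        async with pool.acquire() as conn:
            await asyncio.sleep(0.001)  # pretend query
            return conn

    used = await asyncio.gather(*(query(i) for i in range(n_queries)))
    return pool.connects, set(used)
```

Fifty queries against a pool of four pay the handshake cost four times in total instead of once per request, which is the overhead the pool removes.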

Brotli + HTTP/3

API responses are Brotli-compressed at the edge, reducing wire bytes by ~30% vs gzip. HTTP/3 eliminates head-of-line blocking for clients that support it.

Brotli compression · HTTP/3 QUIC · ~30% smaller payloads

Why G8KEPR Adds Less Overhead Than You Expect

API gateway overhead is mostly network round-trip time (RTT), not processing. Self-hosting next to your services removes almost all of it.

Factor: Cloud-Only Gateway vs. G8KEPR Self-Hosted

Network latency to gateway
Cloud-Only Gateway: Add 20–80ms RTT; gateway is in another datacenter
G8KEPR Self-Hosted: Near-zero; deploy alongside your services

Authentication overhead
Cloud-Only Gateway: Remote auth call on every request; adds 50–200ms
G8KEPR Self-Hosted: In-process JWT check + Redis local lookup; 1–10ms

Rate limit check
Cloud-Only Gateway: Centralized rate-limit service; extra network hop
G8KEPR Self-Hosted: Local Redis; 2–5ms single round-trip

Threat analysis blocking
Cloud-Only Gateway: Synchronous scan blocks response until complete
G8KEPR Self-Hosted: Async; response sent before analysis finishes

Cold start / scale-out
Cloud-Only Gateway: Vendor cold starts add unpredictable spikes
G8KEPR Self-Hosted: Persistent uvicorn workers; no cold starts

Vendor lock-in on hardware
Cloud-Only Gateway: Cannot scale beyond vendor-provided instance types
G8KEPR Self-Hosted: Scale to your hardware; upgrade without migration

Open Methodology

Run It Yourself

All test scripts are in the repository. No black-box benchmarks — every number on this page is reproducible. Run against the demo API or your own self-hosted deployment.

tests/load/k6-ci.js
CI gate: smoke + 50 VU load + 200 VU stress

tests/load/k6-baseline.js
Authoritative baseline: 5 key endpoints at 50 VU

tests/load/k6-full.js
Full stress test: 0→500 VU ramp (staging only)

backend/tests/performance/benchmark_guard_performance.py
SemanticGuard cache performance (pytest)

Test Environment

Demo server: 1GB RAM · 1 vCPU · DigitalOcean NYC
Test tool: k6 (Grafana) + pytest-benchmark
Backend: FastAPI + asyncpg + Redis 7 + PostgreSQL 15
Load scenarios: Smoke (1 VU) · Load (50 VU) · Stress (200 VU)
Baseline VU count: 50 VU sustained for 2 minutes
Frequency: Blocking CI gate on every pull request

Production note: Self-hosted on production hardware (4+ vCPU, 8GB RAM) will significantly outperform these demo numbers. Contact us for production benchmark guidance.

Open Benchmarks

Performance questions? Run the tests yourself.

All k6 scripts are in the repo. Run them against the demo API or your own deployment. If you want a production sizing conversation, we are happy to help.
