Security · 17 min read

Why APISIX For AI Gateway Workloads

A pragmatic analysis of APISIX for LLM traffic, with the SafeLLM integration perspective.

Teams building LLM platforms inevitably face a critical infrastructure question: should AI traffic run behind a “regular API gateway,” should you adopt a specialized AI gateway product, or can you bridge both worlds with an extensible open-source gateway?

For many organizations — especially those that need transparency, body-level security inspection, and freedom from vendor lock-in — Apache APISIX is a compelling answer. It combines mature API gateway fundamentals with the deep extensibility required for AI-specific workloads, and it is the gateway SafeLLM chose as its primary reference integration.

This article is not marketing. It is a technical analysis of why APISIX fits AI gateway workloads, where its limitations are, how the SafeLLM integration works at the protocol level, and what you should consider before choosing it for your stack.

The AI Gateway Problem: Why Traditional Gateways Fall Short

Traditional API gateways (Kong, Envoy, AWS API Gateway, Azure APIM) were designed for REST/GraphQL traffic patterns. They excel at:

  • Header-based routing and authentication
  • Rate limiting by IP, API key, or JWT claims
  • TLS termination and load balancing
  • Request/response transformation at the header level

But LLM traffic has fundamentally different characteristics:

1. The Payload Is the Security Surface

In REST APIs, security decisions are typically made on headers, URLs, and query parameters. The request body is passed through to the backend with minimal inspection.

In LLM traffic, the request body IS the security surface. A prompt injection attack lives in the body. PII leakage happens through body content. Jailbreak attempts are embedded in the text payload. A gateway that cannot deeply inspect request bodies is blind to the most important security signals in AI traffic.

Most traditional gateways support forward-auth plugins that delegate authentication to external services — but forward-auth only forwards headers and URLs. It does not send the request body. This is a fundamental limitation for AI security.

2. Token Economics Replace Byte Economics

REST APIs are priced by request count or data transfer. LLM APIs are priced by token count — and costs scale dramatically with model capability and prompt length. A gateway for AI traffic needs to understand token-level economics:

  • Token counting for cost attribution per tenant
  • Token-based rate limiting (not just request-based)
  • Cache-aware routing to avoid paying for repeated prompts
  • Cost-aware model selection and failover
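The controls above differ from request-based limits mainly in the unit being budgeted. A minimal sketch of a per-tenant token budget, where the window size, budget, and the characters-per-token heuristic are illustrative assumptions, not SafeLLM or APISIX APIs:

```python
import time
from collections import defaultdict, deque

class TokenRateLimiter:
    """Sliding-window rate limiter that budgets tokens, not requests."""

    def __init__(self, tokens_per_minute):
        self.budget = tokens_per_minute
        self.events = defaultdict(deque)  # tenant -> deque of (timestamp, tokens)

    def allow(self, tenant, prompt, now=None):
        now = time.monotonic() if now is None else now
        # Rough heuristic: ~4 characters per token (assumption, not a real tokenizer).
        tokens = max(1, len(prompt) // 4)
        window = self.events[tenant]
        while window and now - window[0][0] > 60:
            window.popleft()  # evict events older than the 60s window
        used = sum(t for _, t in window)
        if used + tokens > self.budget:
            return False  # tenant would exceed its token budget
        window.append((now, tokens))
        return True

limiter = TokenRateLimiter(tokens_per_minute=1000)
```

The same request can cost 10 tokens or 10,000 depending on prompt length, which is why a request-count limiter systematically misprices LLM traffic.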

3. Streaming Is the Default, Not the Exception

LLM responses are typically streamed via Server-Sent Events (SSE). Traditional gateways either buffer the entire response (breaking streaming UX) or pass it through without inspection (creating a security blind spot).

An AI gateway must support streaming-aware security: inspecting response chunks as they flow without breaking the streaming experience. This requires a fundamentally different approach to response processing.
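One way to inspect a stream without buffering it end-to-end is to scan a rolling window over chunks as they arrive, so a pattern split across chunk boundaries is still caught. A hedged sketch of that approach (the pattern list, overlap size, and error type are illustrative assumptions):

```python
def guarded_stream(chunks, blocked_patterns, overlap=64):
    """Yield response chunks as they arrive, aborting if a blocked pattern
    appears -- including patterns split across chunk boundaries."""
    tail = ""  # carry-over so boundary-spanning matches are not missed
    for chunk in chunks:
        window = tail + chunk
        for pattern in blocked_patterns:
            if pattern in window:
                raise RuntimeError(f"stream blocked: matched {pattern!r}")
        yield chunk  # safe so far: forward immediately, preserving streaming UX
        tail = window[-overlap:]

# A secret split across two chunks is still detected:
try:
    list(guarded_stream(["Here is the SEC", "RET_KEY"], ["SECRET_KEY"]))
except RuntimeError as e:
    print(e)  # stream blocked: matched 'SECRET_KEY'
```

Note the trade-off this makes explicit: chunks already forwarded cannot be recalled, so a real implementation must decide how much to hold back versus how quickly to flush.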

4. Bursty, Long-Running Requests

LLM inference calls are orders of magnitude slower than typical API calls — 500ms to 30 seconds vs 10-100ms. Request volumes are bursty (a team enabling a new AI feature can 10x traffic overnight). A gateway optimized for high-volume, low-latency REST traffic may need different tuning for lower-volume, higher-latency AI workloads.

Why Apache APISIX Addresses These Challenges

APISIX is not an “AI gateway” by origin — it is a high-performance, extensible API gateway that happens to have the right architectural primitives for AI workloads. Here is why that matters.

Apache Governance and Open Source Transparency

APISIX is an Apache Software Foundation top-level project. For security-sensitive and regulated organizations, this has concrete implications:

  • Vendor-neutral governance — no single company controls the roadmap, licensing, or access.
  • Transparent development — all code, issues, and design discussions are public.
  • Long-term maintainability — ASF projects have structured incubation, graduation, and sustainability processes.
  • License clarity — Apache 2.0 license with well-understood patent provisions.

In regulated environments (EU financial services, healthcare, government), procurement teams often require open-source governance guarantees that go beyond “the code is on GitHub.” ASF governance provides that assurance.

APISIX Is a Real Data Plane, Not a Config Wrapper

Some gateways are primarily management UIs that wrap a reverse proxy (Nginx, Envoy) with limited ability to customize the data plane. APISIX is different — it is built directly on OpenResty (Nginx + LuaJIT), giving you:

  • Full Lua scripting on the hot path — you can write custom logic that executes inside the request processing pipeline with near-native performance.
  • Dynamic routing — routes, upstreams, and plugins can be modified at runtime via the Admin API without restarts or reloads.
  • Plugin chaining — multiple plugins execute in a defined order on each request, enabling composition of concerns (auth → body inspection → rate limit → proxy).

For AI traffic, this matters because:

  • Policy checks happen on every request, on the hot path. You need a data plane that can carry custom logic without collapsing under load.
  • AI traffic patterns change rapidly (new models, new tools, new tenants). Dynamic routing means you can adapt without downtime.
  • The ability to script custom behavior in Lua — specifically, reading and forwarding request bodies — is what makes the SafeLLM integration possible.

First-Class AI Gateway Direction

Starting in 2025, APISIX has explicitly invested in AI gateway capabilities. The project now includes plugin-level support for LLM traffic patterns:

  • Model/provider routing — route requests to different LLM providers based on model name, cost tier, or availability.
  • Token-aware controls — rate limiting and cost attribution based on token counts rather than just request counts.
  • Prompt-level policy hooks — the ability to inspect and act on prompt content at the gateway level.
  • Upstream failover — automatic failover between LLM providers when one is slow or unavailable.
  • AI plugin ecosystem — growing set of community and official plugins for AI-specific use cases.

This means teams do not have to force AI traffic into generic API assumptions from 2018. They can compose AI-specific controls at the gateway level, using purpose-built primitives.
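As an illustration of the routing primitive involved (the provider names and the table shape below are assumptions for the sketch, not APISIX configuration), model/provider routing with failover reduces to a lookup keyed on model name with a health-filtered fallback chain:

```python
# Sketch of model-based provider routing with failover.
# Model and upstream names are illustrative, not real endpoints.
ROUTES = {
    "gpt-4o":      ["openai-primary", "azure-openai-fallback"],
    "llama-3-70b": ["self-hosted", "bedrock-fallback"],
}

def pick_upstream(model, healthy):
    """Return the first healthy upstream configured for this model, or None."""
    for upstream in ROUTES.get(model, []):
        if upstream in healthy:
            return upstream
    return None  # no healthy provider: caller should surface a 503
```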

How SafeLLM Integrates With APISIX: The Technical Deep Dive

The SafeLLM + APISIX integration is not a loose coupling or a marketing partnership. It is a specific technical architecture that leverages APISIX’s Lua scripting capability to perform deep content inspection on every request.

Why serverless-pre-function, Not forward-auth

This is the most important technical detail of the integration, and it is frequently misunderstood.

APISIX (like most gateways) offers a forward-auth plugin that delegates authentication to an external service by forwarding the request headers and URL. The external service returns 200 (allow) or 403 (deny).

The problem: forward-auth does not send the request body. It only forwards headers and the URL path. For traditional API authentication (checking JWT tokens, API keys, IP allowlists), this is sufficient. For AI security, it is useless — the prompt content that needs to be inspected is in the body.

SafeLLM uses APISIX’s serverless-pre-function plugin instead. This plugin allows arbitrary Lua code to execute inside the Nginx request processing pipeline:

-- APISIX serverless-pre-function (simplified; in the route config this body
-- runs inside a `return function(conf, ctx) ... end` wrapper)
local http = require("resty.http")

-- Step 1: Read the full request body from Nginx memory
ngx.req.read_body()
local body = ngx.req.get_body_data()

if body and #body > 0 then
    -- Step 2: POST the body to SafeLLM /auth endpoint
    local httpc = http.new()
    local res, err = httpc:request_uri("http://127.0.0.1:8000/auth", {
        method = "POST",
        headers = {
            ["Content-Type"] = "application/json",
        },
        body = body,
    })

    -- Step 3: Enforce SafeLLM's decision
    if not res or res.status ~= 200 then
        -- Block the request - SafeLLM denied it
        return ngx.exit(403)
    end
    -- If 200, request continues to upstream
end

This flow has specific performance characteristics:

  • Body reading — ngx.req.read_body() reads the request body into Nginx’s shared memory. For typical prompt sizes (1-10KB), this is sub-millisecond.
  • Local HTTP call — the call to SafeLLM at 127.0.0.1:8000 is a loopback call with ~0.1ms network overhead. In Kubernetes, when SafeLLM runs as a sidecar pod, this becomes a pod-internal call with similar latency.
  • Decision enforcement — the 200/403 decision is binary and immediate. No complex response parsing required.

The Sidecar Architecture Pattern

SafeLLM is designed to run as a sidecar alongside APISIX, not as a separate centralized service. This architectural choice has specific advantages:

┌──────────────────────────────────────────────┐
│  Pod / Container Group                        │
│                                               │
│  ┌─────────────┐     ┌────────────────────┐  │
│  │   APISIX    │────▶│    SafeLLM         │  │
│  │  Gateway    │◀────│    Sidecar         │  │
│  │  :9080      │     │    :8000           │  │
│  └─────────────┘     └────────────────────┘  │
│        │                      │               │
│        │                      ▼               │
│        │              ┌──────────────┐        │
│        │              │    Redis     │        │
│        │              │   (Cache)    │        │
│        │              └──────────────┘        │
│        ▼                                      │
│  ┌─────────────┐                              │
│  │  Upstream   │                              │
│  │  (LLM API)  │                              │
│  └─────────────┘                              │
└──────────────────────────────────────────────┘

Why sidecar, not centralized service:

  • Latency — localhost/loopback communication is ~0.1ms. A centralized service adds network hops, DNS resolution, and load balancer overhead.
  • Scaling — SafeLLM scales automatically with APISIX. If you add more gateway replicas, you get more security processing capacity.
  • Isolation — each APISIX instance has its own SafeLLM sidecar. A failure in one sidecar does not affect other instances.
  • Simplicity — no need for service discovery, load balancing, or health checking between gateway and security service. They co-locate and share fate.

SafeLLM’s Security Pipeline in the APISIX Context

When a request arrives at APISIX and is forwarded to SafeLLM’s /auth endpoint, the full waterfall pipeline executes:

Request Body (prompt text)


┌─────────────────────────────────────────────┐
│ L0: Smart Cache                              │
│ • SHA-256 hash of normalized prompt          │
│ • Redis lookup (standalone or Sentinel HA)   │
│ • HIT → return cached verdict in <0.1ms      │
│ • MISS → continue to L1                      │
└─────────────────────┬───────────────────────┘
                      │ MISS

┌─────────────────────────────────────────────┐
│ L1: Keyword Guard                            │
│ • FlashText (Aho-Corasick) scan             │
│ • 80+ patterns: jailbreak, role-play,       │
│   system overrides, multilingual            │
│ • Hardening: NFKC, homoglyph, leetspeak,   │
│   skeleton generation                        │
│ • MATCH → block in <0.01ms                   │
│ • NO MATCH → continue to L1.5               │
└─────────────────────┬───────────────────────┘
                      │ NO MATCH

┌─────────────────────────────────────────────┐
│ L1.5: PII Shield                             │
│ • OSS: Regex + Luhn validation (~1-2ms)     │
│   - Email, phone, credit card, IBAN, SSN,   │
│     PESEL, NIP, IP, crypto addresses        │
│   - Obfuscation-resistant (spaced/dotted)   │
│ • Enterprise: GLiNER AI (25+ types, ~25ms)  │
│ • PII FOUND → block or redact               │
│ • CLEAN → continue to L2                     │
└─────────────────────┬───────────────────────┘
                      │ CLEAN

┌─────────────────────────────────────────────┐
│ L2: AI Guard (Enterprise Only)               │
│ • ONNX-compiled neural network               │
│ • CPU-only, no GPU required                  │
│ • Detects: jailbreak, indirect injection,   │
│   system prompt leakage                      │
│ • Threshold: 0.85-0.9 (configurable)        │
│ • Decision in 30-70ms                        │
│ • THREAT → block                             │
│ • SAFE → allow                               │
└─────────────────────┬───────────────────────┘
                      │ SAFE

              200 OK → APISIX forwards
              to upstream LLM

The waterfall design is critical for performance. The cheapest, fastest checks run first. If L0 cache has seen this prompt before, the entire pipeline is skipped. If L1 keywords match a known attack pattern, the request is blocked in microseconds without invoking regex scanning or neural inference. Only novel, non-obvious prompts reach the expensive AI layer.
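The ordering logic above can be sketched end to end. The hash normalization, keyword list, and verdict shape below are illustrative assumptions about the approach, not SafeLLM's actual implementation:

```python
import hashlib
import unicodedata

CACHE = {}  # stands in for Redis in this sketch
BLOCKED_KEYWORDS = ["ignore previous instructions", "you are now dan"]  # illustrative

def check(prompt):
    # L0: cache keyed on a hash of the normalized prompt --
    # a repeated prompt skips everything below.
    normalized = unicodedata.normalize("NFKC", prompt).lower().strip()
    key = hashlib.sha256(normalized.encode()).hexdigest()
    if key in CACHE:
        return CACHE[key]
    # L1: cheap substring scan runs before any expensive analysis
    # (the real L1 uses Aho-Corasick, which scans all patterns in one pass).
    verdict = "block" if any(k in normalized for k in BLOCKED_KEYWORDS) else "allow"
    # (L1.5 PII scanning and the L2 AI layer would run here before caching.)
    CACHE[key] = verdict
    return verdict
```

The key property is that each layer only sees traffic the cheaper layers could not decide, so average latency stays close to the cost of the cheapest layer.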

Real-World Performance Numbers

On a standard CPU (AMD Ryzen 5 PRO 3600, no GPU):

| Metric                    | Result     | Target Baseline |
|---------------------------|------------|-----------------|
| Throughput                | 1,206 RPS  | 100 RPS         |
| Average latency           | 10ms       | 25ms            |
| P95 latency               | 13.5ms     |                 |
| Total requests (60s)      | 72,380     |                 |
| Accuracy (with AI layers) | >95%       |                 |
| False positive rate       | <0.3%      |                 |

For context: a typical LLM inference call to GPT-4 or Claude takes 500ms-5000ms. SafeLLM’s full pipeline overhead of 10-75ms adds 1-2% to total request time. The APISIX gateway itself adds <1ms for routing and plugin execution. The combined gateway + security overhead is effectively invisible to end users.

Separation of Concerns: Why Architecture Matters

A common architecture mistake in AI platforms is placing all security logic in application code. The application team writes prompt injection checks, PII scanning, rate limiting, and audit logging inside the AI application. This leads to:

  • Duplicated controls — every AI application reimplements the same security logic.
  • Inconsistent policy — different applications apply different rules, or the same rules with different configurations.
  • Painful governance — security teams cannot audit or enforce policy without reading every application’s source code.
  • Slow iteration — updating a security rule means redeploying every application.

With APISIX + SafeLLM, responsibilities are cleanly separated:

Platform Team Owns the Gateway

  • Ingress routing and TLS termination
  • Authentication and authorization (JWT validation, API key management)
  • Rate limiting and traffic shaping
  • Load balancing and upstream health checks
  • Observability (access logs, metrics, tracing)

Security / AI Team Owns SafeLLM

  • Prompt content security (injection detection, jailbreak blocking)
  • PII detection and redaction
  • DLP output scanning
  • Cache policy and cache invalidation
  • Audit logging and compliance evidence
  • Security rule updates (keyword lists, PII entity configuration, AI model thresholds)

Application Team Focuses on Business Logic

  • Model selection and prompt engineering
  • Business-specific tool implementations
  • User experience and response formatting
  • Application-level error handling

This separation means:

  • Security rules can be updated without redeploying applications.
  • Platform reliability (rate limiting, failover) is managed independently from security policy.
  • Audit trails cover all AI traffic uniformly, regardless of which application generated it.
  • New AI applications automatically inherit security controls when they route through APISIX.

Why Not “Only Sidecar, No Gateway”?

SafeLLM can run in standalone mode — directly as a reverse proxy without APISIX. This is useful for local development and testing. But at production scale, the gateway-first architecture provides critical capabilities that a standalone sidecar cannot:

Standardized Ingress Controls

Without a gateway, each service manages its own ingress: TLS, authentication, rate limiting, IP allowlisting. This creates operational overhead and inconsistency. APISIX provides a single ingress point with uniform controls.

Centralized Policy Attachment

Gateway-level policy applies to all traffic that flows through it. If you add a new AI application behind APISIX, it immediately inherits all security controls. With sidecar-only deployment, each new service needs its own security configuration.

Consistent Observability

A gateway produces unified access logs, metrics, and traces for all AI traffic. Without a gateway, observability is fragmented across services, making it difficult to get a holistic view of AI platform health and security posture.

Traffic Governance

Canary deployments, traffic splitting, blue-green routing, A/B testing between models — these are gateway-level concerns. Implementing them at the sidecar level means reinventing gateway functionality in each service.
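For instance, APISIX's traffic-split plugin can express a canary between two model backends at the route level. The snippet below is a hedged sketch following the plugin's documented shape; the upstream node and weights are illustrative:

```yaml
# Canary: send ~10% of traffic to a new model backend (illustrative values).
plugins:
  traffic-split:
    rules:
      - weighted_upstreams:
          - upstream:
              type: roundrobin
              nodes:
                "llm-backend-canary:8080": 1
            weight: 10
          - weight: 90   # remaining traffic uses the route's default upstream
```

Doing the same at the sidecar level would mean teaching every service about weights, health, and rollback, which is exactly the duplication a gateway avoids.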

The APISIX Adoption Question

A valid concern, especially in European markets: APISIX has lower brand recognition than legacy gateway products (Kong, AWS API Gateway, Azure APIM, Nginx Plus). Some enterprise procurement teams default to “the gateway we already have.”

SafeLLM addresses this pragmatically with dual messaging:

Product message: SafeLLM is gateway-agnostic AI security. It works with any HTTP-capable gateway or reverse proxy. The /auth endpoint accepts standard HTTP POST requests — any system that can make an HTTP call can integrate with SafeLLM.

Reference deployment message: APISIX is the fastest fully working open reference path. The safellm-oss/examples/apisix-reference/ directory contains a complete Docker Compose stack that boots in under a minute and demonstrates the full security pipeline.

This keeps go-to-market broad while providing an opinionated, working default for teams that want to move quickly.

Making the APISIX Reference Operationally Real

A reference deployment is only valuable if it is maintained as executable infrastructure, not static documentation. SafeLLM’s APISIX reference stack is:

  • A Docker Compose stack that boots all components (APISIX, SafeLLM, Redis, upstream) with one command.
  • A smoke-test baseline — you can run predefined curl commands to validate health, routing, and security decisions.
  • An integration template — the route configuration, Lua scripts, and environment variables serve as a starting point for production configuration.
  • A regression baseline — CI/CD can boot the stack, run smoke tests, and verify that new SafeLLM releases do not break APISIX integration.

If the reference were only static docs, it would degrade into marketing material within months as the code evolves. Operational discipline — automated testing, version-pinned dependencies, documented configuration — is what keeps it valuable.

Configuration Deep Dive: APISIX + SafeLLM

For teams evaluating the stack, here is a practical configuration walkthrough.

APISIX Route Configuration

The core APISIX configuration defines routes that invoke SafeLLM via the serverless-pre-function plugin:

# config/apisix.yaml
routes:
  - uri: /api/*
    upstream:
      type: roundrobin
      nodes:
        "upstream-service:8080": 1
    plugins:
      serverless-pre-function:
        phase: rewrite
        functions:
          - |
            return function(conf, ctx)
              ngx.req.read_body()
              local body = ngx.req.get_body_data()
              if body and #body > 0 then
                local http = require("resty.http")
                local httpc = http.new()
                local res, err = httpc:request_uri(
                  "http://127.0.0.1:8000/auth",
                  {
                    method = "POST",
                    headers = { ["Content-Type"] = "application/json" },
                    body = body,
                  }
                )
                if not res or res.status ~= 200 then
                  ngx.status = 403
                  ngx.say('{"error":"blocked by SafeLLM"}')
                  return ngx.exit(403)
                end
              end
            end

Key points:

  • phase: rewrite ensures SafeLLM is called before the request is proxied to the upstream.
  • The Lua function reads the body, calls SafeLLM, and enforces the decision inline.
  • If SafeLLM returns anything other than 200, the request is blocked with a 403.
  • The serverless-pre-function has near-zero overhead beyond the SafeLLM call itself.

SafeLLM Environment Configuration

# .env for SafeLLM sidecar
# Core pipeline
ENABLE_CACHE=true
ENABLE_L1_KEYWORDS=true
ENABLE_L3_PII=true
ENABLE_L2_AI=false           # true for Enterprise

# Security posture
SHADOW_MODE=true             # Start with logging, switch to false for enforcement
FAIL_OPEN=false              # Deny if SafeLLM is unavailable
MAX_BODY_SIZE=1000000        # 1MB max request body
REQUEST_TIMEOUT=30           # 30s per request

# Redis (L0 Cache)
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
REDIS_TTL=3600               # 1 hour cache TTL
REDIS_TIMEOUT=0.5            # 500ms Redis timeout

# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json

# Metrics
ENABLE_METRICS=true          # Prometheus endpoint on /metrics

# Auth result header
ALLOW_HEADER=X-Auth-Result
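
The interaction between SHADOW_MODE and FAIL_OPEN determines what happens when a check blocks or the pipeline errors. A minimal sketch of the decision semantics as described by the comments above (the function and verdict names are illustrative, not SafeLLM's API):

```python
def enforce(verdict, shadow_mode, fail_open):
    """Map a pipeline verdict to an HTTP status.

    verdict: "allow", "block", or "error" (internal pipeline failure).
    shadow_mode: log-only rollout phase -- never blocks live traffic.
    fail_open: whether an internal error lets traffic through.
    """
    if verdict == "error":
        return 200 if fail_open else 403  # FAIL_OPEN=false denies on failure
    if verdict == "block":
        if shadow_mode:
            print("SHADOW: would have blocked")  # logged, not enforced
            return 200
        return 403
    return 200

# Recommended rollout: observe in shadow mode first, then flip to enforcement.
```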

Docker Compose Stack

# docker-compose.yml (reference)
services:
  apisix:
    image: apache/apisix:3.11-debian
    ports:
      - "9080:9080"
    volumes:
      - ./config/apisix.yaml:/usr/local/apisix/conf/apisix.yaml
    depends_on:
      sidecar:
        condition: service_healthy

  sidecar:
    image: ghcr.io/safellmio/safellm-apisix-gateway-sidecar:2.0.0
    env_file: .env
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s

Kubernetes Helm Deployment

For production Kubernetes deployments:

# Add SafeLLM Helm repository
helm repo add safellm https://safellm.github.io/charts

# Install with custom values
helm install safellm safellm/safellm-oss \
  -n safellm \
  --create-namespace \
  --set redis.enabled=true \
  --set replicaCount=3 \
  --set resources.requests.cpu=100m \
  --set resources.requests.memory=256Mi \
  --set resources.limits.cpu=500m \
  --set resources.limits.memory=512Mi

Resource requirements are intentionally modest — SafeLLM is CPU-optimized and does not require GPU. A production deployment with 3 replicas, anti-affinity rules, and Redis typically runs on 1.5 CPU cores and 1.5GB RAM total.

Monitoring: What to Watch

Prometheus Metrics (OSS)

SafeLLM exposes Prometheus metrics at /metrics when ENABLE_METRICS=true:

  • safellm_blocked_requests_total — counter of blocked requests, labeled by layer (L0/L1/L1.5/L2). Rising L1 counts may indicate an automated attack. Rising L1.5 counts may indicate PII exposure in application workflows.
  • safellm_scan_duration_seconds — histogram of scan latency per layer. Watch for p99 spikes that could indicate resource contention.
  • safellm_cache_hits_total — cache effectiveness. A hit rate below 10% in a customer support scenario may indicate the cache is misconfigured.
  • safellm_dlp_pii_detected_total — output-side PII detections. Any non-zero count warrants investigation.
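
These metrics translate naturally into alert rules. A sketch of two Prometheus alerting rules built on the metrics above; the thresholds, durations, and severity labels are illustrative assumptions, not shipped defaults:

```yaml
groups:
  - name: safellm
    rules:
      - alert: SafeLLMAttackBurst
        # Sustained rise in L1 keyword blocks often signals automated probing.
        expr: sum(rate(safellm_blocked_requests_total{layer="L1"}[5m])) > 1
        for: 10m
        labels:
          severity: warning
      - alert: SafeLLMOutputPIILeak
        # Any output-side PII detection warrants immediate investigation.
        expr: increase(safellm_dlp_pii_detected_total[15m]) > 0
        labels:
          severity: critical
```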

APISIX Metrics

APISIX provides its own Prometheus metrics for gateway-level observability:

  • Request rate, latency histograms, and error rates per route
  • Upstream health and response times
  • Plugin execution latency (including serverless-pre-function overhead)

Together, APISIX + SafeLLM metrics give a complete picture of both traffic health and security posture.

Practical Decision Framework

Choose APISIX + SafeLLM if:

  1. You need an open-source stack with transparent governance and no vendor lock-in.
  2. You need gateway-level body inspection for AI traffic security — not just header-based auth.
  3. You want a working reference deployment you can boot in minutes for pilots, presales demos, and integration testing.
  4. You need clean separation between routing/reliability concerns (APISIX) and content security concerns (SafeLLM).
  5. You are deploying in regulated environments (EU, healthcare, financial services) where audit trails and policy enforcement are mandatory.
  6. You prefer CPU-only infrastructure — no GPU requirements for security processing.

Consider alternatives if:

  • Your team cannot operate gateway infrastructure yet and needs a fully managed ingress from day one with zero ops capacity. (Consider a managed API gateway + SafeLLM standalone mode as a stepping stone.)
  • Your current platform mandates a specific gateway as a hard standard. (SafeLLM’s /auth endpoint works with any HTTP-capable gateway — you can integrate without APISIX.)
  • You need specialized AI gateway features that APISIX does not yet support (e.g., provider-specific billing integrations). (Evaluate whether SafeLLM’s gateway-agnostic approach can fill the gap.)

The Honest Assessment

APISIX is not the most well-known gateway in every market. In European enterprise segments, Kong, AWS API Gateway, and Azure APIM have stronger brand recognition. Some procurement processes default to “the gateway we already use.”

SafeLLM acknowledges this reality. The product is gateway-agnostic by design. APISIX is the reference deployment because it is open-source, extensible enough for body-level inspection, and provides the fastest path from zero to working demo. But SafeLLM works with any gateway that can make an HTTP POST to /auth.

The long-term win is not claiming that APISIX is universally the best gateway. The win is demonstrating that APISIX + SafeLLM is a low-friction, high-control starting point for serious AI gateway security — and that SafeLLM’s value persists regardless of which gateway you ultimately choose.

For teams ready to evaluate, the fastest path is the reference deployment: clone the repository, run docker compose up -d, and have a working AI security gateway in under 60 seconds. From there, you can explore shadow mode, test with real prompts, and decide whether the architecture fits your production requirements.
