Security Layers (Waterfall)

SafeLLM implements a “Waterfall” security model where queries pass through multiple specialized layers. This design ensures both maximum security and optimal performance by using fast, deterministic filters before applying more complex AI models.

The Waterfall Concept

Each layer in the pipeline has a specific responsibility. If a layer determines that a query is unsafe, it immediately blocks it (short-circuit), and the subsequent, more resource-intensive layers are not executed.

L0: Smart Cache (Performance Layer)

Goal: Near-zero latency for repeated queries.
Mechanism: Hashes the prompt (SHA256) and checks Redis for a previously stored safety decision.
Performance: < 0.1ms.

L1: Keyword Guard (Static Layer)

Goal: Block known malicious phrases and forbidden content.
Mechanism: Uses the FlashText algorithm for O(n) scanning. Includes “Skeleton Matching” to prevent bypasses using spaces or special characters.
Performance: < 0.01ms.

L1.5: PII Shield (Privacy Layer)

Goal: Prevent sensitive information from being sent to models or leaked in responses.
Mechanism:
- OSS: Regex-based patterns for common entities (Email, Cards, etc.).
- Enterprise (Paid): AI-powered GLiNER model for contextual detection and country-specific identifiers.
Performance: 1-25ms.

L2: AI Guard (Neural Layer)

Goal: Detect advanced prompt injection and jailbreak attempts.
Mechanism: Uses a specialized neural network (Prompt Guard) via ONNX Runtime to analyze the intent behind the text.
Performance: 30-70ms on CPU.

Layer Summary

Layer	Type	Efficiency	Precision	Best For
L0	Cache	Ultra Fast	High	Repeated queries
L1	Static	Ultra Fast	High	Known attacks, forbidden words
L1.5	Regex/AI	Fast/Med	Very High	PII, GDPR compliance
L2	Neural	Medium	Very High	Zero-day attacks, jailbreaks