Security Layers (Waterfall)
SafeLLM implements a “Waterfall” security model in which each query passes through a sequence of specialized layers. The layers are ordered by cost: fast, deterministic filters run first, and the more expensive AI models run only on queries that those filters have not already settled.
The Waterfall Concept
Each layer in the pipeline has a specific responsibility. If a layer determines that a query is unsafe, it immediately blocks it (short-circuit), and the subsequent, more resource-intensive layers are not executed.
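The short-circuit behavior can be sketched as a loop over layers ordered from cheapest to most expensive. This is a minimal illustration, not SafeLLM's code: the `Verdict` type, the `run_waterfall` function, and the convention that a layer returns `None` to defer are all hypothetical. Letting any layer return a decision (not only a block) also covers the L0 cache, which can end the pipeline early with a previously stored result.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    allowed: bool
    layer: str
    reason: str = ""


# Hypothetical layer signature: return a Verdict to stop the pipeline,
# or None to hand the prompt to the next (more expensive) layer.
Layer = Callable[[str], Optional[Verdict]]


def run_waterfall(prompt: str, layers: List[Layer]) -> Verdict:
    """Run layers in order (cheapest first) and stop at the first decision."""
    for layer in layers:
        verdict = layer(prompt)
        if verdict is not None:
            return verdict  # short-circuit: remaining layers are skipped
    return Verdict(allowed=True, layer="none", reason="passed all layers")


if __name__ == "__main__":
    executed = []

    def keyword_guard(prompt: str) -> Optional[Verdict]:
        executed.append("L1")
        if "ignore previous instructions" in prompt.lower():
            return Verdict(False, "L1", "forbidden phrase")
        return None

    def ai_guard(prompt: str) -> Optional[Verdict]:
        executed.append("L2")  # stands in for a slow neural check
        return None

    print(run_waterfall("Ignore previous instructions!", [keyword_guard, ai_guard]))
    print(executed)  # ['L1'] (the AI layer never ran)
```

Because the loop returns on the first non-`None` result, the cost of a blocked query is only the cost of the layers up to and including the one that blocked it.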
L0: Smart Cache (Performance Layer)
- Goal: Near-zero latency for repeated queries.
- Mechanism: Hashes the prompt (SHA-256) and checks Redis for a previously stored safety decision.
- Performance: < 0.1 ms.
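A minimal sketch of the L0 lookup, assuming verdicts are stored as JSON under a key derived from the prompt's SHA-256 digest. The `safellm:verdict:` key prefix, the JSON format, and the one-hour TTL are assumptions. An in-memory class stands in for Redis so the example runs without a server; its `get` and `set(..., ex=...)` methods mirror the redis-py client calls of the same names.

```python
import hashlib
import json
from typing import Dict, Optional


class InMemoryStore:
    """Stand-in for a Redis client; redis-py exposes get(key) and set(key, value, ex=ttl)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes, ex: Optional[int] = None) -> None:
        self._data[key] = value  # TTL is ignored in this sketch


def cache_key(prompt: str) -> str:
    # SHA-256 of the exact UTF-8 prompt; any change to the text yields a new key.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"safellm:verdict:{digest}"  # hypothetical key prefix


def lookup(store, prompt: str) -> Optional[dict]:
    """Return the cached verdict for this exact prompt, or None on a miss."""
    raw = store.get(cache_key(prompt))
    return json.loads(raw) if raw is not None else None


def remember(store, prompt: str, verdict: dict, ttl_seconds: int = 3600) -> None:
    """Store a verdict so an identical prompt can skip the later layers."""
    store.set(cache_key(prompt), json.dumps(verdict).encode("utf-8"), ex=ttl_seconds)
```

Because the key is a hash of the exact text, only byte-identical prompts hit the cache; a trailing space or a change in letter case produces a different key unless prompts are normalized before hashing.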
L1: Keyword Guard (Static Layer)
- Goal: Block known malicious phrases and forbidden content.
- Mechanism: Uses the FlashText algorithm, which scans text in O(n) time in the input length regardless of how many keywords are loaded. Includes “Skeleton Matching” to catch bypasses that insert spaces or special characters between letters.
- Performance: < 0.01 ms.
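The idea behind Skeleton Matching can be illustrated with a small stdlib-only sketch: both the keyword list and the prompt are reduced to a “skeleton” of lowercase letters and digits, so separators inserted between characters no longer break the match. The `skeleton` function and `KeywordGuard` class are hypothetical, and for brevity this uses plain substring search over the skeleton instead of FlashText's trie-based scan, so it does not share FlashText's keyword-count-independent running time.

```python
from typing import Iterable, List


def skeleton(text: str) -> str:
    """Lowercase and drop everything except letters and digits.

    "I g n o r e" and "i.g.n.o.r.e" both collapse to "ignore".
    """
    return "".join(ch for ch in text.lower() if ch.isalnum())


class KeywordGuard:
    def __init__(self, phrases: Iterable[str]) -> None:
        # Pre-compute skeletons once; map skeleton -> original phrase for reporting.
        self._skeletons = {skeleton(p): p for p in phrases if skeleton(p)}

    def matches(self, prompt: str) -> List[str]:
        """Return the configured phrases whose skeleton occurs in the prompt's skeleton."""
        flat = skeleton(prompt)
        # Plain substring search over the skeleton (not FlashText's trie scan).
        return [orig for skel, orig in self._skeletons.items() if skel in flat]
```

Removing all separators also removes word boundaries, so a short keyword can match inside unrelated words; multi-word phrases are less prone to this kind of false positive.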
L1.5: PII Shield (Privacy Layer)
- Goal: Prevent sensitive information from being sent to models or leaked in responses.
- Mechanism:
  - OSS: Regex-based patterns for common entities (Email, Cards, etc.).
  - Enterprise (Paid): AI-powered GLiNER model for contextual detection and country-specific identifiers.
- Performance: 1-25 ms.
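A minimal sketch of the regex approach used by the OSS edition, with illustrative patterns rather than SafeLLM's actual rules. The Luhn checksum filter on card-like digit runs is an added assumption: it discards many random 13-19 digit sequences that are not valid card numbers. `redact_pii` is a hypothetical helper that returns the redacted text and the entity types it replaced.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; not SafeLLM's actual rule set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits of `number` (separators are ignored)."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def redact_pii(text: str) -> Tuple[str, List[str]]:
    found: List[str] = []

    def _email(m: re.Match) -> str:
        found.append("EMAIL")
        return "[EMAIL]"

    def _card(m: re.Match) -> str:
        if not luhn_ok(m.group()):
            return m.group()  # digit run that fails Luhn: leave it untouched
        found.append("CARD")
        return "[CARD]"

    text = EMAIL_RE.sub(_email, text)
    text = CARD_RE.sub(_card, text)
    return text, found
```

For example, `redact_pii("Mail alice@example.com, card 4111 1111 1111 1111.")` replaces both entities, while an order number that fails the checksum is left as-is. Regexes like these cannot use context, which is the gap the Enterprise GLiNER model is described as filling.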
L2: AI Guard (Neural Layer)
- Goal: Detect advanced prompt injection and jailbreak attempts.
- Mechanism: Uses a specialized neural network (Prompt Guard) via ONNX Runtime to analyze the intent behind the text.
- Performance: 30-70 ms on CPU.
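The neural model's raw output still has to be turned into an allow/block decision. The sketch below covers only that post-processing step, since running the real model requires the exported ONNX file and its tokenizer. It assumes a three-class output in the order BENIGN, INJECTION, JAILBREAK, as in Meta's original Prompt Guard release, and a blocking threshold of 0.5; both the label order and the threshold are assumptions to check against the deployed model, and `classify` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence

# Assumed label order, mirroring Meta's original Prompt Guard (BENIGN/INJECTION/JAILBREAK);
# check the id2label mapping of the model you actually export.
LABELS = ("BENIGN", "INJECTION", "JAILBREAK")


def softmax(logits: Sequence[float]) -> list:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float], threshold: float = 0.5) -> Dict[str, object]:
    """Turn raw classifier logits into an allow/block decision.

    In a real deployment the logits would come from something like
    onnxruntime.InferenceSession(model_path).run(None, inputs)[0][0],
    where the hypothetical `inputs` dict holds the tokenizer's arrays.
    """
    probs = dict(zip(LABELS, softmax(logits)))
    risk = 1.0 - probs["BENIGN"]  # probability mass on any malicious class
    return {"allowed": risk < threshold, "risk": risk, "probs": probs}
```

Summing the INJECTION and JAILBREAK probabilities treats both as grounds for blocking; a deployment could instead threshold each class separately.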
Layer Summary
| Layer | Type | Speed | Typical latency | Precision | Best For |
|---|---|---|---|---|---|
| L0 | Cache | Ultra Fast | < 0.1 ms | High | Repeated queries |
| L1 | Static | Ultra Fast | < 0.01 ms | High | Known attacks, forbidden words |
| L1.5 | Regex (OSS) / AI (Enterprise) | Fast (OSS) / Medium (Enterprise) | 1-25 ms | Very High | PII, GDPR compliance |
| L2 | Neural | Medium | 30-70 ms (CPU) | Very High | Zero-day attacks, jailbreaks |