Skip to content

Security Layers (Waterfall)

SafeLLM implements a “Waterfall” security model where queries pass through multiple specialized layers. This design ensures both maximum security and optimal performance by using fast, deterministic filters before applying more complex AI models.

Each layer in the pipeline has a specific responsibility. If a layer determines that a query is unsafe, it immediately blocks it (short-circuit), and the subsequent, more resource-intensive layers are not executed.

  • Goal: Near-zero latency for repeated queries.
  • Mechanism: Hashes the prompt (SHA256) and checks Redis for a previously stored safety decision.
  • Performance: < 0.1ms.
  • Goal: Block known malicious phrases and forbidden content.
  • Mechanism: Uses the FlashText algorithm for O(n) scanning. Includes “Skeleton Matching” to prevent bypasses using spaces or special characters.
  • Performance: < 0.01ms.
  • Goal: Prevent sensitive information from being sent to models or leaked in responses.
  • Mechanism:
    • OSS: Regex-based patterns for common entities (Email, Cards, etc.).
    • Enterprise (Paid): AI-powered GLiNER model for contextual detection and country-specific identifiers.
  • Performance: 1-25ms.
  • Goal: Detect advanced prompt injection and jailbreak attempts.
  • Mechanism: Uses a specialized neural network (Prompt Guard) via ONNX Runtime to analyze the intent behind the text.
  • Performance: 30-70ms on CPU.
LayerTypeEfficiencyPrecisionBest For
L0CacheUltra FastHighRepeated queries
L1StaticUltra FastHighKnown attacks, forbidden words
L1.5Regex/AIFast/MedVery HighPII, GDPR compliance
L2NeuralMediumVery HighZero-day attacks, jailbreaks