Skip to content

L2: AI Guard

Enterprise (Paid) only: L2 AI Guard is available in the Enterprise edition. Contact sales@safellm.io for access.

The L2 layer is the most advanced level of protection in SafeLLM, dedicated to combating attacks that static rules cannot detect.

Traditional keyword filters can be fooled by appropriately constructing sentences (social engineering). The L2 layer uses a neural network model (Prompt Guard) compiled to the ONNX format, which analyzes the semantics and intent of the query.

  • Jailbreak: Attempts to force the model to break its system instructions (e.g., “Imagine you are a hacker…”).
  • Indirect Injection: Hidden instructions within data that may be processed by the LLM.
  • System Prompt Leakage: Attempts to extract secret instructions that define the model’s behavior.
  • Use L2 when you need protection against semantic attacks, social engineering, and advanced jailbreaks that don’t use specific “bad words”.
  • Best for public-facing LLM applications where users might actively try to bypass security filters.
  • Essential for high-risk environments where system prompt leakage must be prevented at all costs.
  • Over-blocking: Setting L2_THRESHOLD too low can lead to high false-positive rates, blocking legitimate user queries.
  • Latency sensitive apps: L2 adds 30-70ms per request. If your application requires sub-10ms response times, consider relying more on L1 or using a more powerful CPU.
  • Semantic Truncation: The model has a L2_MAX_LENGTH limit (default 512 tokens). Prompts longer than this are truncated before reaching the AI model. An attacker could potentially bypass the layer by sending a very long “filler” text followed by the actual jailbreak. You can increase this limit via L2_MAX_LENGTH if your infrastructure permits, but ensure your application also limits maximum prompt length.
ENABLE_L2_AI=true
L2_THRESHOLD=0.85
L2_MODEL_PATH="models/prompt_guard.onnx"
L2_MAX_LENGTH=512
SHADOW_MODE=true

Despite being an AI model, it has been optimized for CPU operation:

  • Latency: ~30-70ms (depending on text length).
  • Preloading: With PRELOAD_MODELS=true, the model is loaded once at startup and shared between workers, drastically reducing RAM consumption.
Terminal window
ENABLE_L2_AI=true
L2_THRESHOLD=0.9 # Blocking threshold (0.0 - 1.0)
L2_MODEL_PATH=models/prompt_guard.onnx

We recommend starting with L2_THRESHOLD=0.9 and monitoring logs in SHADOW_MODE to avoid False Positives. When you are confident in the rules, set SHADOW_MODE=false to enforce blocking.