Skip to content

L1: Keyword Guard

The L1 layer is an ultra-fast, deterministic keyword filter based on the FlashText algorithm (Aho-Corasick).

AI-based layers (L1.5, L2) are precise but computationally expensive (~20-70ms). L1 allows for the rejection of approximately 40% of known attacks and forbidden content in under 0.01ms.

  • O(n) Performance: Scanning time depends on the text length, not the number of forbidden words.
  • Multilingual: The default list contains phrases in Polish, English, and German.
  • Case-Insensitive: Automatic matching regardless of case.
  • Basic Security: To block known attack strings, structural headers (### instruction), or company-specific forbidden topics.
  • Cost Efficiency: L1 should always be enabled to catch simple attacks before they reach more expensive AI layers.
  • Regulatory Compliance: To enforce strict blocking of specific terms or restricted content.
  • Rigid Lists: Static keywords cannot catch semantic variations (e.g., it blocks “jailbreak” but might miss “break out of jail”). Use L2 for semantic protection.
  • Large Lists: While FlashText is fast, extremely large lists (millions of words) will increase the startup time and memory usage of the sidecar.
  • Inadvertent Blocking: Common words added to the blocklist might accidentally block legitimate queries. Always use SHADOW_MODE to test new keywords.
ENABLE_L1_KEYWORDS=true
L1_BLOCKED_PHRASES='["rm -rf", "drop table", "ignore instructions", "forget everything"]'
# Alternative CSV format:
# L1_BLOCKED_PHRASES="jailbreak,pwned,system prompt"

Hardening: Skeleton Matching & Unicode Resilience

Section titled “Hardening: Skeleton Matching & Unicode Resilience”

Standard keyword filters are easy to bypass by adding spaces, special characters, or using homoglyphs. SafeLLM features a multi-stage hardening mechanism:

  1. Unicode Normalization (NFKC): Prevents bypasses using different representations of the same character (e.g., combining accents or full-width characters).
  2. Homoglyph Resistance: Prevents attacks using visually similar characters from different alphabets (e.g., Cyrillic “а” vs Latin “a”).
  3. Leetspeak Mapping: Automatically maps common obfuscations like 4 -> a, @ -> a, 1 -> i, etc.
  4. Skeleton Generation: Creates a “skeleton” of the text by keeping only alphanumeric characters, effectively ignoring spaces, dots, and other separators.

Detection Example:

  • j a i l b r e a k -> BLOCKED
  • j.a.i.l.b.r.e.a.k -> BLOCKED
  • j @ 1 l b r 3 @ k -> BLOCKED
  • ### instruction -> BLOCKED (Structural bypass)

Based on our benchmarks (dataset: prompt_injections.csv with 546 samples):

  • Jailbreak Recall: ~38% (blocks nearly 40% of attacks in <0.01ms before any AI model is invoked).
  • Latency Overhead: < 0.01ms (Deterministic CPU processing).
  • Structural Awareness: Detects 80+ enterprise-grade patterns including role-play headers and system-level overrides.
Terminal window
ENABLE_L1_KEYWORDS=true
L1_BLOCKED_PHRASES=["hack", "ignore instructions", "forget everything"]

You can provide the list as a JSON string or a comma-separated list.