L1: Keyword Guard
The L1 layer is an ultra-fast, deterministic keyword filter based on the FlashText algorithm, a trie-based matcher in the same family as Aho-Corasick.
Why L1?
AI-based layers (L1.5, L2) are precise but computationally expensive (~20-70ms). L1 rejects approximately 40% of known attacks and forbidden content in under 0.01ms, before any of those layers run.
Main Features
- O(n) Performance: Scanning time depends on the text length, not the number of forbidden words.
- Multilingual: The default list contains phrases in Polish, English, and German.
- Case-Insensitive: Automatic matching regardless of case.
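The scaling property above can be illustrated with a minimal trie sketch. This is not the SafeLLM or FlashText implementation; `build_trie` and `find_phrases` are hypothetical names used only for illustration. The point it shows is that the work per text position is bounded by the longest phrase, regardless of how many phrases the trie holds.

```python
_END = "_end_"  # multi-character sentinel key; can never equal a single text character


def build_trie(phrases):
    """Build a character trie from lowercased phrases (illustrative helper)."""
    root = {}
    for phrase in phrases:
        node = root
        for ch in phrase.lower():
            node = node.setdefault(ch, {})
        node[_END] = phrase
    return root


def find_phrases(text, root):
    """Return blocked phrases found in text, case-insensitively, on word boundaries."""
    text = text.lower()
    n = len(text)
    found = []
    for start in range(n):
        # Only begin a match where the previous character is not alphanumeric.
        if start > 0 and text[start - 1].isalnum():
            continue
        node = root
        i = start
        while i < n and text[i] in node:
            node = node[text[i]]
            i += 1
            # Accept only if the phrase also ends on a word boundary.
            if _END in node and (i == n or not text[i].isalnum()):
                found.append(node[_END])
    return found


trie = build_trie(["drop table", "ignore instructions"])
hits = find_phrases("Please DROP TABLE users", trie)
```

Adding more phrases grows the trie (memory and build time) but does not add a per-phrase loop to the scan, which is the behavior the feature list describes.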
When to use
- Basic Security: To block known attack strings, structural headers (### instruction), or company-specific forbidden topics.
- Cost Efficiency: L1 should always be enabled to catch simple attacks before they reach more expensive AI layers.
- Regulatory Compliance: To enforce strict blocking of specific terms or restricted content.
Common pitfalls
- Rigid Lists: Static keywords cannot catch semantic variations (e.g., it blocks “jailbreak” but might miss “break out of jail”). Use L2 for semantic protection.
- Large Lists: While FlashText is fast, extremely large lists (millions of words) will increase the startup time and memory usage of the sidecar.
- Inadvertent Blocking: Common words added to the blocklist might accidentally block legitimate queries. Always use SHADOW_MODE to test new keywords.
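As a rough illustration of testing keywords safely, the sketch below assumes that SHADOW_MODE means "log matches without blocking". That semantics is an assumption, not something this page defines, and `decide` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("l1_keyword_guard")  # hypothetical logger name


def decide(matched_phrases, shadow_mode):
    """Return "allow" or "block" for a request, given the phrases L1 matched.

    Assumption: in shadow mode, matches are only logged so that new
    keywords can be evaluated against real traffic before enforcement.
    """
    if not matched_phrases:
        return "allow"
    if shadow_mode:
        logger.warning("shadow match (not blocked): %s", matched_phrases)
        return "allow"
    return "block"
```

Reviewing the logged shadow matches for a while before switching enforcement on is one way to spot common words that would block legitimate queries.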
Configuration Example
    ENABLE_L1_KEYWORDS=true
    L1_BLOCKED_PHRASES='["rm -rf", "drop table", "ignore instructions", "forget everything"]'
    # Alternative CSV format:
    # L1_BLOCKED_PHRASES="jailbreak,pwned,system prompt"
Hardening: Skeleton Matching & Unicode Resilience
Standard keyword filters are easy to bypass by adding spaces, special characters, or using homoglyphs. SafeLLM features a multi-stage hardening mechanism:
- Unicode Normalization (NFKC): Prevents bypasses using different representations of the same character (e.g., combining accents or full-width characters).
- Homoglyph Resistance: Prevents attacks using visually similar characters from different alphabets (e.g., Cyrillic “а” vs Latin “a”).
- Leetspeak Mapping: Automatically maps common obfuscations like 4->a, @->a, 1->i, etc.
- Skeleton Generation: Creates a “skeleton” of the text by keeping only alphanumeric characters, effectively ignoring spaces, dots, and other separators.
Detection Example:
    j a i l b r e a k -> BLOCKED
    j.a.i.l.b.r.e.a.k -> BLOCKED
    j @ 1 l b r 3 @ k -> BLOCKED
    ### instruction -> BLOCKED (Structural bypass)
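The hardening stages can be sketched roughly as follows. This is an illustrative approximation, not SafeLLM's actual code: the `HOMOGLYPHS` and `LEET` tables are tiny hypothetical samples (the `3->e` entry is inferred from the detection example), the stage order is an assumption, and structural patterns such as `### instruction` are not covered here.

```python
import unicodedata

# Hypothetical sample tables; a real deployment would need far larger ones.
HOMOGLYPHS = {"\u0430": "a", "\u0435": "e", "\u043e": "o"}  # Cyrillic a, e, o -> Latin
LEET = {"4": "a", "@": "a", "1": "i", "3": "e"}


def skeleton(text):
    """Reduce text to a lowercase alphanumeric skeleton."""
    # NFKC folds compatibility forms such as full-width letters.
    text = unicodedata.normalize("NFKC", text).lower()
    out = []
    for ch in text:
        ch = HOMOGLYPHS.get(ch, ch)  # map look-alike letters to Latin
        ch = LEET.get(ch, ch)        # undo common leetspeak substitutions
        if ch.isalnum():             # drop spaces, dots, and other separators
            out.append(ch)
    return "".join(out)


def is_blocked(text, phrases):
    """True if the skeleton of any phrase occurs in the skeleton of text."""
    skel = skeleton(text)
    phrase_skeletons = (skeleton(p) for p in phrases)
    return any(s and s in skel for s in phrase_skeletons)
```

With these sample tables, `j a i l b r e a k`, `j.a.i.l.b.r.e.a.k`, and `j @ 1 l b r 3 @ k` all reduce to the skeleton `jailbreak`. Because the skeleton discards word boundaries, plain substring matching on it can over-match short phrases, so this kind of check is best tried on real traffic before it is enforced.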
Efficiency & Performance
Based on our benchmarks (dataset: prompt_injections.csv with 546 samples):
- Jailbreak Recall: ~38% (blocks nearly 40% of attacks in <0.01ms before any AI model is invoked).
- Latency Overhead: < 0.01ms (Deterministic CPU processing).
- Structural Awareness: Detects 80+ enterprise-grade patterns including role-play headers and system-level overrides.
Configuration
    ENABLE_L1_KEYWORDS=true
    L1_BLOCKED_PHRASES=["hack", "ignore instructions", "forget everything"]
You can provide the list as a JSON string or a comma-separated list.
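A loader that accepts both formats might look like the sketch below. It is only an illustration under stated assumptions: `parse_blocked_phrases` is a hypothetical helper, and the rule "a value starting with `[` is JSON, anything else is CSV" is a guess at how the two formats could be told apart, not documented SafeLLM behavior.

```python
import json
from typing import List


def parse_blocked_phrases(raw: str) -> List[str]:
    """Parse an L1_BLOCKED_PHRASES value given as a JSON array or a CSV string."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        # Treat as JSON; malformed JSON raises json.JSONDecodeError.
        items = json.loads(raw)
        return [str(item).strip() for item in items if str(item).strip()]
    # Otherwise treat as a comma-separated list.
    return [part.strip() for part in raw.split(",") if part.strip()]


json_phrases = parse_blocked_phrases('["hack", "ignore instructions", "forget everything"]')
csv_phrases = parse_blocked_phrases("jailbreak,pwned,system prompt")
```

One consequence of the CSV form is that a phrase containing a comma cannot be expressed in it; the JSON form avoids that limitation.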