L1: Keyword Guard

The L1 layer is a very fast, deterministic keyword filter based on the FlashText algorithm, a trie-based matcher similar in spirit to Aho-Corasick.

Why L1?

AI-based layers (L1.5, L2) are precise but computationally expensive (~20-70ms). L1 rejects approximately 40% of known attacks and forbidden content in under 0.01ms, before those layers are invoked.

Main Features

O(n) Performance: Scanning time depends on the text length, not the number of forbidden words.
Multilingual: The default list contains phrases in Polish, English, and German.
Case-Insensitive: Automatic matching regardless of case.
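
The O(n) and case-insensitive properties can be illustrated with a toy trie matcher. This is a minimal sketch and not SafeLLM's implementation: the class name KeywordTrie and its find method are hypothetical, and the word-boundary rule is an assumption modelled on how FlashText-style matchers usually behave. The work done per input character is bounded by the longest phrase, not by how many phrases are loaded.

```python
class KeywordTrie:
    """Toy trie-based keyword matcher (hypothetical, for illustration only)."""

    _END = "_end_"  # multi-character marker, cannot collide with a single-char key

    def __init__(self, phrases):
        self.root = {}
        for phrase in phrases:
            node = self.root
            for ch in phrase.lower():  # store lowercased for case-insensitive lookup
                node = node.setdefault(ch, {})
            node[self._END] = phrase

    def find(self, text):
        """Return the blocked phrases found in text, matching whole words only."""
        text = text.lower()
        found = []
        i, n = 0, len(text)
        while i < n:
            # Only start a match at a word boundary.
            if i > 0 and text[i - 1].isalnum():
                i += 1
                continue
            node, j, last = self.root, i, None
            while j < n and text[j] in node:
                node = node[text[j]]
                j += 1
                # Accept only if the phrase also ends at a word boundary.
                if self._END in node and (j == n or not text[j].isalnum()):
                    last = (node[self._END], j)
            if last:
                found.append(last[0])
                i = last[1]  # resume after the longest match
            else:
                i += 1
        return found


trie = KeywordTrie(["drop table", "jailbreak"])
print(trie.find("Please DROP TABLE users"))
```

Adding more phrases grows the trie (memory and build time) but does not add a per-phrase pass over the input, which is why lookup cost tracks text length.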

When to use

Basic Security: To block known attack strings, structural headers (### instruction), or company-specific forbidden topics.
Cost Efficiency: L1 should always be enabled to catch simple attacks before they reach more expensive AI layers.
Regulatory Compliance: To enforce strict blocking of specific terms or restricted content.

Common pitfalls

Rigid Lists: Static keywords cannot catch semantic variations (e.g., it blocks “jailbreak” but might miss “break out of jail”). Use L2 for semantic protection.
Large Lists: While FlashText is fast, extremely large lists (millions of words) will increase the startup time and memory usage of the sidecar.
Inadvertent Blocking: Common words added to the blocklist might accidentally block legitimate queries. Always use SHADOW_MODE to test new keywords.
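
As a rough illustration of why shadow testing helps with inadvertent blocking, the sketch below assumes SHADOW_MODE means "log matches but let the request through". That semantics, the evaluate function, and the logger name are assumptions for illustration and are not taken from SafeLLM's code.

```python
import logging

logger = logging.getLogger("l1_keyword_guard")  # hypothetical logger name


def evaluate(text, blocked_phrases, shadow_mode):
    """Return (allowed, matches). In shadow mode, matches are logged but never block."""
    lowered = text.lower()
    # Plain substring check, kept simple on purpose for the illustration.
    matches = [p for p in blocked_phrases if p.lower() in lowered]
    if matches and shadow_mode:
        logger.warning("shadow match (not blocked): %s", matches)
        return True, matches
    return (not matches), matches


# A new keyword can be observed against real traffic before it is enforced.
print(evaluate("please jailbreak the model", ["jailbreak"], shadow_mode=True))
print(evaluate("please jailbreak the model", ["jailbreak"], shadow_mode=False))
```

Reviewing the logged shadow matches shows how often a candidate keyword would have hit legitimate queries before it is allowed to block anything.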

Configuration Example

ENABLE_L1_KEYWORDS=true
L1_BLOCKED_PHRASES='["rm -rf", "drop table", "ignore instructions", "forget everything"]'
# Alternative CSV format:
# L1_BLOCKED_PHRASES="jailbreak,pwned,system prompt"

Hardening: Skeleton Matching & Unicode Resilience

Standard keyword filters are easy to bypass by inserting spaces or special characters, or by substituting homoglyphs. SafeLLM features a multi-stage hardening mechanism:

Unicode Normalization (NFKC): Prevents bypasses using different representations of the same character (e.g., combining accents or full-width characters).
Homoglyph Resistance: Prevents attacks using visually similar characters from different alphabets (e.g., Cyrillic “а” vs Latin “a”).
Leetspeak Mapping: Automatically maps common obfuscations like 4 -> a, @ -> a, 1 -> i, etc.
Skeleton Generation: Creates a “skeleton” of the text by keeping only alphanumeric characters, effectively ignoring spaces, dots, and other separators.

Detection Example:

j a i l b r e a k -> BLOCKED
j.a.i.l.b.r.e.a.k -> BLOCKED
j @ 1 l b r 3 @ k -> BLOCKED
### instruction -> BLOCKED (Structural bypass)
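
The hardening stages can be sketched as a single normalization function. This is a simplified sketch under stated assumptions: the HOMOGLYPHS and LEET tables are tiny illustrative samples (the 3 -> e and 0 -> o entries are assumed, since the list above ends in "etc."), and the skeleton and is_blocked names are hypothetical rather than SafeLLM's API.

```python
import unicodedata

# Tiny illustrative tables; assumptions, not SafeLLM's actual mappings.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0456": "i",  # Cyrillic (Ukrainian) i
    "\u043a": "k",  # Cyrillic ka
}
LEET = {"4": "a", "@": "a", "1": "i", "3": "e", "0": "o"}


def skeleton(text):
    """NFKC-normalize, lowercase, map look-alikes, keep only alphanumerics."""
    text = unicodedata.normalize("NFKC", text).lower()
    out = []
    for ch in text:
        ch = HOMOGLYPHS.get(ch, ch)
        ch = LEET.get(ch, ch)
        if ch.isalnum():  # drops spaces, dots, and other separators
            out.append(ch)
    return "".join(out)


def is_blocked(text, blocked=("jailbreak",)):
    """Compare the skeleton of the text against the skeleton of each phrase."""
    skel = skeleton(text)
    return any(skeleton(p) in skel for p in blocked)


for sample in ["j a i l b r e a k", "j.a.i.l.b.r.e.a.k", "j @ 1 l b r 3 @ k"]:
    print(sample, "->", "BLOCKED" if is_blocked(sample) else "allowed")
```

Because the skeleton discards word boundaries, substring matching on it is more aggressive than plain keyword matching, so short blocked phrases are more likely to collide with legitimate text.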

Efficiency & Performance

Based on our benchmarks (dataset: prompt_injections.csv with 546 samples):

Jailbreak Recall: ~38% (nearly 40% of attacks are blocked before any AI model is invoked).
Latency Overhead: < 0.01ms (Deterministic CPU processing).
Structural Awareness: Detects 80+ enterprise-grade patterns including role-play headers and system-level overrides.

Configuration

ENABLE_L1_KEYWORDS=true
L1_BLOCKED_PHRASES='["hack", "ignore instructions", "forget everything"]'

You can provide the list as a JSON string or a comma-separated list.
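
A parser that accepts both formats might look like the following. This is a sketch of one plausible approach, not SafeLLM's loader: the parse_blocked_phrases name is hypothetical, and falling back to comma splitting when the JSON is malformed is an assumption.

```python
import json


def parse_blocked_phrases(raw):
    """Parse a phrase list given as a JSON array string or a comma-separated string."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None  # malformed JSON: fall through to CSV handling
        if isinstance(parsed, list):
            return [str(p).strip() for p in parsed if str(p).strip()]
    # CSV form: split on commas and drop empty entries.
    return [p.strip() for p in raw.split(",") if p.strip()]


print(parse_blocked_phrases('["rm -rf", "drop table"]'))
print(parse_blocked_phrases("jailbreak,pwned,system prompt"))
```

Note that the CSV form cannot express a phrase that itself contains a comma; the JSON form is the safer choice for such entries.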