L1.5: PII Shield
The L1.5 layer prevents personal data (PII) from being sent to LLM models and guards against PII leaking in model responses.
Detection Modes
Depending on the version, SafeLLM offers two detection engines:
1. [OSS] Fast Regex Detector
- Technology: Optimized regular expressions with Luhn checksum validation for credit card numbers.
- Purpose: Basic protection, ultra-high performance (~1-2ms).
- Detected Entities:
| Entity Type | Pattern Description | Examples |
|---|---|---|
| EMAIL_ADDRESS | Standard email format | user@domain.com |
| PHONE_NUMBER | International/domestic formats | +48 500 600 700, (555) 123-4567 |
| CREDIT_CARD | Visa, MasterCard, Amex, Discover (with Luhn validation) | 4111-1111-1111-1111 |
| IP_ADDRESS | IPv4 addresses | 192.168.1.100 |
| IBAN_CODE | International Bank Account Numbers | DE89370400440532013000 |
| CRYPTO | Bitcoin, Ethereum addresses | 0x742d35Cc6634... |
| US_SSN | US Social Security Numbers | 123-45-6789 |
| POLISH_PESEL | Polish national ID (11 digits) | 90010112345 |
| POLISH_NIP | Polish tax ID (10 digits) | 123-456-78-90 |
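To make the table concrete, here is a minimal Python sketch of a regex detector that adds a Luhn check for card numbers. The `PATTERNS` dictionary, the `detect` and `luhn_valid` helpers, and the regexes themselves are hypothetical simplifications covering only four of the entity types above; SafeLLM's actual patterns are not shown in this document.

```python
import re

# Hypothetical, simplified patterns for four of the entity types above.
# Illustrative only; these are not SafeLLM's actual regexes.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP_ADDRESS": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13-19 digits, optionally separated by single spaces or dashes.
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs found in `text`."""
    findings = []
    for entity, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Card-shaped numbers must also pass Luhn to count as a finding.
            if entity == "CREDIT_CARD" and not luhn_valid(value):
                continue
            findings.append((entity, value))
    return findings


print(detect("Mail user@domain.com, card 4111-1111-1111-1111"))
# → [('EMAIL_ADDRESS', 'user@domain.com'), ('CREDIT_CARD', '4111-1111-1111-1111')]
```

The Luhn step is what keeps a random 16-digit string from being reported as a card: changing the last digit of the example to `2` makes `detect` return an empty list.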
Obfuscation Detection
The regex detector includes aggressive patterns to catch obfuscation attempts:
- Credit cards with spaces between digits: `4 5 3 2 0 1 5 1...`
- SSNs with unusual separators: `1.2.3-4.5-6.7.8.9`
Obfuscated matches are validated with the Luhn checksum (credit cards) and SSA rules (SSNs) to minimize false positives.
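One way to handle obfuscated numbers, sketched below under stated assumptions: collect maximal runs of digits separated by any mix of spaces, dots, or dashes, strip the separators, then classify by digit count and validate with the Luhn checksum (cards) or the SSA structural rules (area not 000, 666, or 900-999; group not 00; serial not 0000). The `DIGIT_RUN` regex and the helper names are hypothetical and do not describe SafeLLM's implementation.

```python
import re

# Maximal runs of digits separated by any mix of spaces, dots, or dashes.
# Hypothetical normalization approach, not SafeLLM's actual patterns.
# Each repetition must consume a digit, so matching stays linear (no ReDoS).
DIGIT_RUN = re.compile(r"\d(?:[ .-]*\d)*")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def ssa_valid(digits: str) -> bool:
    """Structural SSA rules for a 9-digit SSN (SSNs carry no check digit)."""
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area[0] == "9":
        return False
    return group != "00" and serial != "0000"


def find_obfuscated(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, raw_text) for obfuscated card numbers and SSNs."""
    findings = []
    for match in DIGIT_RUN.finditer(text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)  # drop the separators
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.append(("CREDIT_CARD", raw))
        elif len(digits) == 9 and ssa_valid(digits):
            findings.append(("US_SSN", raw))
    return findings


print(find_obfuscated("ssn 1.2.3-4.5-6.7.8.9 and 9.0.0-1.2-3.4.5.6"))
# → [('US_SSN', '1.2.3-4.5-6.7.8.9')]
```

The second candidate normalizes to `900123456`, whose area number falls in the never-assigned 900-999 range, so it is discarded rather than reported.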
2. [Enterprise (Paid)] AI GLiNER Detector
- Technology: GLiNER (Generalist Model for Named Entity Recognition) language model.
- Purpose: Precise, context-aware detection with support for country-specific formats.
- Advantages: Detects over 25 entity types, including Polish identifiers: PESEL, NIP, REGON, and identity card numbers.
- Performance: ~20-25ms on CPU.
Configuration
| Variable | Description |
|---|---|
| ENABLE_L3_PII | Enables the PII layer. |
| USE_FAST_PII | true = Regex (OSS, default), false = GLiNER [Enterprise (Paid)]. |
| L3_PII_ENTITIES | List of entities to detect (e.g., ["EMAIL_ADDRESS", "POLISH_PESEL"]). |
| L3_PII_THRESHOLD | Confidence threshold for the AI model (default 0.7). |
| L3_PII_LANGUAGE | Language code for GLiNER analysis (default en). |
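A minimal sketch of how these variables might be read and applied, assuming a Python service. The `PIIConfig` class and helpers are hypothetical, as are three details the table does not specify: the default of `ENABLE_L3_PII` (assumed `false`), an empty entity list meaning "all entities", and AI predictions arriving as dicts with `label` and `score` keys (the shape GLiNER-style models typically return). It also mirrors the OSS note at the end of this page, where `USE_FAST_PII=false` is ignored.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class PIIConfig:
    enabled: bool = False  # assumption: the table gives no default
    use_fast_pii: bool = True  # regex detector is the documented default
    entities: list = field(default_factory=list)  # assumption: empty = all entities
    threshold: float = 0.7
    language: str = "en"


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_pii_config(env: Optional[Mapping[str, str]] = None, oss_build: bool = True) -> PIIConfig:
    env = os.environ if env is None else env
    use_fast = _env_bool(env, "USE_FAST_PII", True)
    if oss_build:
        use_fast = True  # OSS builds ignore USE_FAST_PII=false
    return PIIConfig(
        enabled=_env_bool(env, "ENABLE_L3_PII", False),
        use_fast_pii=use_fast,
        entities=json.loads(env.get("L3_PII_ENTITIES", "[]")),
        threshold=float(env.get("L3_PII_THRESHOLD", "0.7")),
        language=env.get("L3_PII_LANGUAGE", "en"),
    )


def filter_predictions(predictions: list, config: PIIConfig) -> list:
    """Keep AI predictions scoring at or above the threshold (assumed >=) for configured entities."""
    return [
        p for p in predictions
        if p["score"] >= config.threshold
        and (not config.entities or p["label"] in config.entities)
    ]
```

Passing an explicit `env` mapping instead of reading `os.environ` directly keeps the loader easy to test.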
When to use
- Input Filtering: To prevent users from sending sensitive data (such as their own SSN or email addresses) to external LLM providers.
- Privacy by Design: To ensure that PII is caught early in the pipeline, right after L1.
- Hybrid Security: Use Regex (Fast) for common patterns and AI (GLiNER) for context-aware detection in regulated industries.
Common pitfalls
- Regex Limitations: Regex can be bypassed by creative formatting (e.g., “e-mail at domain dot com”). Use the Enterprise GLiNER detector for better recall.
- Resource Consumption: GLiNER inference is CPU-intensive. In high-traffic environments, allocate sufficient CPU cores to the sidecar pods.
- Custom PII Length Limit: Custom regex patterns are skipped for texts longer than `CUSTOM_FAST_PII_MAX_TEXT_LENGTH` (default 20,000 chars) to prevent ReDoS attacks. The limit is configurable to balance security and performance. Standard PII patterns are always scanned regardless of text length.
- False Positives: Random strings that look like IDs (e.g., `ACME-1234-5678`) might be flagged as credit cards or SSNs. Use `CUSTOM_FAST_PII_PATTERNS` to define your own rules and reduce noise.
Custom Regex Patterns [Enterprise Only]
You can extend the PII detector with your own regular expressions for company-specific identifiers (e.g., internal IDs, project codes).
Configuration
To add custom patterns, use the CUSTOM_FAST_PII_PATTERNS environment variable. It accepts a JSON object that maps each entity name (key) to its regex pattern (value).
```bash
# Example: Adding internal ACME ID and Project Code
CUSTOM_FAST_PII_PATTERNS='{"ACME_ID": "ACME-[0-9]{4}", "PROJ_CODE": "PRJ-[A-Z]{3}"}'
```
Safety & Performance
To prevent ReDoS (Regular Expression Denial of Service) attacks, SafeLLM enforces several limits on custom regexes:
- Text Length Limit: Custom patterns are skipped for texts longer than `CUSTOM_FAST_PII_MAX_TEXT_LENGTH` (default: 20,000 chars).
- Pattern Count: A maximum of 50 custom patterns can be registered.
- Pattern Complexity: Maximum pattern length is 256 characters.
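The limits above can be sketched as a small loader and scanner. This is a hypothetical illustration, not SafeLLM's code: `load_custom_patterns` rejects the whole configuration when a limit is exceeded (the real behavior may differ, e.g. skipping only the offending pattern), and `scan_custom` returns no custom findings for oversized texts, leaving the standard patterns to run separately.

```python
import json
import re

# Limits as documented above.
MAX_CUSTOM_PATTERNS = 50
MAX_PATTERN_LENGTH = 256
DEFAULT_MAX_TEXT_LENGTH = 20_000


def load_custom_patterns(raw_json: str) -> dict:
    """Parse CUSTOM_FAST_PII_PATTERNS and compile each regex, enforcing the limits."""
    patterns = json.loads(raw_json)
    if not isinstance(patterns, dict):
        raise ValueError("CUSTOM_FAST_PII_PATTERNS must be a JSON object")
    if len(patterns) > MAX_CUSTOM_PATTERNS:
        raise ValueError(f"at most {MAX_CUSTOM_PATTERNS} custom patterns are allowed")
    compiled = {}
    for name, pattern in patterns.items():
        if len(pattern) > MAX_PATTERN_LENGTH:
            raise ValueError(f"pattern {name!r} exceeds {MAX_PATTERN_LENGTH} characters")
        compiled[name] = re.compile(pattern)
    return compiled


def scan_custom(text: str, compiled: dict, max_text_length: int = DEFAULT_MAX_TEXT_LENGTH) -> list:
    """Return (entity_name, match) pairs, or nothing if the text is too long."""
    if len(text) > max_text_length:
        # Skipped to bound ReDoS exposure; standard PII patterns still run separately.
        return []
    return [(name, m.group()) for name, rx in compiled.items() for m in rx.finditer(text)]


patterns = load_custom_patterns('{"ACME_ID": "ACME-[0-9]{4}", "PROJ_CODE": "PRJ-[A-Z]{3}"}')
print(scan_custom("Ticket ACME-1234 for PRJ-ABC", patterns))
# → [('ACME_ID', 'ACME-1234'), ('PROJ_CODE', 'PRJ-ABC')]
```

Note that a length cap bounds the cost of a pathological pattern but does not make it safe; keeping custom regexes free of nested quantifiers still matters.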
Configuration Example [OSS & Enterprise]
```bash
ENABLE_L3_PII=true
USE_FAST_PII=true
L3_PII_ENTITIES='["EMAIL_ADDRESS", "PHONE_NUMBER", "ACME_ID"]'
CUSTOM_FAST_PII_PATTERNS='{"ACME_ID": "ACME-[0-9]{4}"}'
CUSTOM_FAST_PII_MAX_TEXT_LENGTH=20000
```
Error Handling (Circuit Breaker)
PII detection (especially in AI mode) has a built-in circuit breaker. If the detection engine starts reporting errors (e.g., running out of memory), the layer switches to fail-open mode (letting traffic through) or fail-closed mode (blocking it), depending on the FAIL_OPEN setting.
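A minimal circuit-breaker sketch, assuming the FAIL_OPEN setting maps to a boolean `fail_open` flag. The `CircuitBreaker` class, its failure threshold, cool-down period, and half-open behavior are hypothetical choices for illustration. The detector is any callable returning a list of findings, and a request is blocked here when findings are non-empty purely to keep the example small.

```python
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Wraps a PII detector; trips after repeated errors (hypothetical sketch)."""

    def __init__(
        self,
        detector: Callable[[str], List],
        fail_open: bool,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.detector = detector
        self.fail_open = fail_open  # assumed to mirror the FAIL_OPEN setting
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def _is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_timeout:
            # Half-open: let one trial call through; a single failure re-trips it.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return False
        return True

    def allow(self, text: str) -> bool:
        """Return True if the request may proceed to the LLM."""
        if self._is_open():
            return self.fail_open  # breaker tripped: skip the detector entirely
        try:
            findings = self.detector(text)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return self.fail_open
        self.failures = 0
        return not findings  # illustration only: block when PII is found
```

While the breaker is open, the detector is not called at all, which is what relieves a struggling engine; `fail_open` only decides whether those requests pass or are blocked.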
OSS note: In the OSS build, USE_FAST_PII=false is ignored and the regex detector is always used.