L1.5: PII Shield

The L1.5 layer prevents personally identifiable information (PII) from being sent to LLMs and from leaking in their responses.

Detection Modes

Depending on the version, SafeLLM offers two detection engines:

1. [OSS] Fast Regex Detector

Technology: Optimized regular expressions, with Luhn checksum validation for credit card numbers.
Purpose: Baseline protection with very low latency (~1-2 ms).
Detected Entities:

Entity Type	Pattern Description	Examples
`EMAIL_ADDRESS`	Standard email format	`user@domain.com`
`PHONE_NUMBER`	International/domestic formats	`+48 500 600 700`, `(555) 123-4567`
`CREDIT_CARD`	Visa, MasterCard, Amex, Discover (with Luhn validation)	`4111-1111-1111-1111`
`IP_ADDRESS`	IPv4 addresses	`192.168.1.100`
`IBAN_CODE`	International Bank Account Numbers	`DE89370400440532013000`
`CRYPTO`	Bitcoin, Ethereum addresses	`0x742d35Cc6634...`
`US_SSN`	US Social Security Numbers	`123-45-6789`
`POLISH_PESEL`	Polish national ID (11 digits)	`90010112345`
`POLISH_NIP`	Polish tax ID (10 digits)	`123-456-78-90`
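The Luhn validation listed for `CREDIT_CARD` can be sketched as follows. This is a minimal illustration, not SafeLLM's actual implementation: the `CARD_RE` pattern and the function names `luhn_valid` and `find_cards` are assumptions made for this example.

```python
import re

# Hypothetical pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return candidate card numbers (digits only) that pass the Luhn check."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what lets a regex-only detector discard digit runs that merely look like card numbers: a candidate that matches the pattern but fails the checksum is dropped.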

Obfuscation Detection

The regex detector includes aggressive patterns to catch obfuscation attempts:

Credit cards with spaces between digits: 4 5 3 2 0 1 5 1...
SSNs with unusual separators: 1.2.3-4.5-6.7.8.9

Obfuscated matches are validated with the Luhn checksum (credit cards) and SSA issuance rules (SSNs) to minimize false positives.
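One possible way to combine a permissive pattern with SSA rules is sketched below. The pattern `OBFUSCATED_SSN_RE` and the helper names are hypothetical and are not taken from SafeLLM. The rules applied are the publicly documented SSA constraints: the area number is never 000, 666, or 900-999, the group number is never 00, and the serial number is never 0000.

```python
import re

# Hypothetical pattern: nine digits, each optionally preceded by one
# separator character (space, dot, or hyphen).
OBFUSCATED_SSN_RE = re.compile(r"(?<!\d)\d(?:[ .\-]?\d){8}(?!\d)")


def ssn_plausible(digits: str) -> bool:
    """Apply SSA structural rules to a nine-digit string."""
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssns(text: str) -> list[str]:
    """Return normalized (digits-only) SSN candidates that pass the SSA rules."""
    hits = []
    for match in OBFUSCATED_SSN_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if ssn_plausible(digits):
            hits.append(digits)
    return hits
```

Normalizing the match to digits before validating keeps the rule check independent of whichever separators were used to disguise the number.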

2. [Enterprise (Paid)] AI GLiNER Detector

Technology: GLiNER (Generalist Model for Named Entity Recognition) language model.
Purpose: Context-aware detection with support for country-specific formats.
Advantages: Detects more than 25 entity types, including Polish identifiers such as PESEL, NIP, REGON, and identity card numbers.
Performance: ~20-25 ms on CPU.

Configuration

Variable	Description
`ENABLE_L3_PII`	Enables the PII layer.
`USE_FAST_PII`	`true` = Regex (OSS, default), `false` = GLiNER [Enterprise (Paid)].
`L3_PII_ENTITIES`	List of entities to detect (e.g., `["EMAIL_ADDRESS", "POLISH_PESEL"]`).
`L3_PII_THRESHOLD`	Confidence threshold for the AI model (default 0.7).
`L3_PII_LANGUAGE`	Language code for GLiNER analysis (default `en`).
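As an illustration of how these variables fit together, the sketch below reads them into a typed settings object. It is not SafeLLM's loader: the `PiiSettings` class, the accepted truthy strings, and the default of `false` for `ENABLE_L3_PII` are assumptions. The defaults for `USE_FAST_PII`, `L3_PII_THRESHOLD`, and `L3_PII_LANGUAGE` come from the table above.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class PiiSettings:
    enabled: bool = False          # assumed default; not stated in the table
    use_fast: bool = True          # regex detector is the documented default
    entities: list[str] = field(default_factory=list)
    threshold: float = 0.7         # documented default
    language: str = "en"           # documented default


def _as_bool(value: str) -> bool:
    # Assumption: these strings count as "true"; everything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_pii_settings(env=None) -> PiiSettings:
    """Build settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return PiiSettings(
        enabled=_as_bool(env.get("ENABLE_L3_PII", "false")),
        use_fast=_as_bool(env.get("USE_FAST_PII", "true")),
        # Assumption: the entity list is encoded as a JSON array.
        entities=json.loads(env.get("L3_PII_ENTITIES", "[]")),
        threshold=float(env.get("L3_PII_THRESHOLD", "0.7")),
        language=env.get("L3_PII_LANGUAGE", "en"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.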

When to use

Input Filtering: To prevent users from sending sensitive data (such as their own SSNs or email addresses) to external LLM providers.
Privacy by Design: To ensure that PII is caught early in the pipeline, right after L1.
Hybrid Security: Use Regex (Fast) for common patterns and AI (GLiNER) for context-aware detection in regulated industries.

Common pitfalls

Regex Limitations: Regex can be bypassed by creative formatting (e.g., “e-mail at domain dot com”). Use Enterprise GLiNER for better recall.
Resource Consumption: GLiNER inference is CPU-intensive. In high-traffic environments, ensure that sufficient CPU cores are allocated to the sidecar pods.
Custom PII Length Limit: Custom regex patterns are skipped for texts longer than `CUSTOM_FAST_PII_MAX_TEXT_LENGTH` (default 20,000 characters) to prevent ReDoS attacks. The limit is configurable so that you can balance security and performance. Standard PII patterns are always scanned, regardless of text length.
False Positives: Random ID-like strings (e.g., `ACME-1234-5678`) might be flagged as credit cards or SSNs. Use `CUSTOM_FAST_PII_PATTERNS` to define your own rules and reduce noise.

Custom Regex Patterns [Enterprise Only]

You can extend the PII detector with your own regular expressions for company-specific identifiers (e.g., internal IDs or project codes).

Configuration

To add custom patterns, set the `CUSTOM_FAST_PII_PATTERNS` environment variable. It accepts a JSON object (dictionary) in which each key is an entity name and each value is the corresponding regex pattern.

# Example: Adding internal ACME ID and Project Code
CUSTOM_FAST_PII_PATTERNS='{"ACME_ID": "ACME-[0-9]{4}", "PROJ_CODE": "PRJ-[A-Z]{3}"}'

Safety & Performance

To prevent ReDoS (Regular Expression Denial of Service) attacks, SafeLLM enforces several limits on custom regexes:

Text Length Limit: Custom patterns are skipped for texts longer than CUSTOM_FAST_PII_MAX_TEXT_LENGTH (default: 20,000 chars).
Pattern Count: A maximum of 50 custom patterns can be registered.
Pattern Complexity: Maximum pattern length is 256 characters.
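The three limits above could be enforced along the following lines. This is a sketch under stated assumptions, not SafeLLM's code: the function names are hypothetical, and raising `ValueError` for an oversized configuration is a guess, since the source does not say whether offending patterns are rejected or silently ignored.

```python
import json
import re

MAX_PATTERNS = 50                 # documented pattern-count limit
MAX_PATTERN_LENGTH = 256          # documented pattern-length limit
DEFAULT_MAX_TEXT_LENGTH = 20_000  # documented default text-length limit


def load_custom_patterns(raw_json: str) -> dict[str, re.Pattern]:
    """Parse the JSON dictionary and compile patterns that respect the limits."""
    patterns = json.loads(raw_json)
    if len(patterns) > MAX_PATTERNS:
        raise ValueError(f"too many custom patterns: {len(patterns)} > {MAX_PATTERNS}")
    compiled = {}
    for name, pattern in patterns.items():
        if len(pattern) > MAX_PATTERN_LENGTH:
            raise ValueError(f"pattern {name!r} exceeds {MAX_PATTERN_LENGTH} characters")
        compiled[name] = re.compile(pattern)
    return compiled


def scan_custom(text, compiled, max_text_length=DEFAULT_MAX_TEXT_LENGTH):
    """Return (entity, match) pairs; custom patterns are skipped for overlong text."""
    if len(text) > max_text_length:
        return []
    return [
        (name, match.group())
        for name, rx in compiled.items()
        for match in rx.finditer(text)
    ]
```

Capping pattern length and text length bounds the worst-case work per request, but it does not by itself rule out catastrophic backtracking in a short pattern, so custom expressions should still be reviewed before deployment.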

Configuration Example [OSS & Enterprise]

ENABLE_L3_PII=true
USE_FAST_PII=true
L3_PII_ENTITIES='["EMAIL_ADDRESS", "PHONE_NUMBER", "ACME_ID"]'
CUSTOM_FAST_PII_PATTERNS='{"ACME_ID": "ACME-[0-9]{4}"}'
CUSTOM_FAST_PII_MAX_TEXT_LENGTH=20000

Error Handling (Circuit Breaker)

PII detection (especially in AI mode) has a built-in circuit breaker. If the detection engine starts reporting errors (e.g., because it runs out of RAM), the layer can switch to fail-open mode (letting traffic through) or fail-closed mode (blocking traffic), depending on the `FAIL_OPEN` setting.
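The fail-open versus fail-closed behavior can be illustrated with a small circuit breaker. This is a generic sketch of the pattern, not SafeLLM's implementation: the class name, the `max_failures` and `reset_after` parameters, and the choice to block any request with findings are all assumptions. Only the `FAIL_OPEN` semantics come from the text above.

```python
import time


class PiiCircuitBreaker:
    def __init__(self, fail_open, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.fail_open = fail_open        # mirrors the FAIL_OPEN setting
        self.max_failures = max_failures  # hypothetical trip threshold
        self.reset_after = reset_after    # hypothetical cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def _is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the breaker and try the detector again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def check(self, detector, text):
        """Return True if the request may proceed, False if it must be blocked."""
        if self._is_open():
            # Detector is considered unhealthy: do not call it at all.
            return self.fail_open
        try:
            findings = detector(text)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return self.fail_open
        self.failures = 0
        return not findings
```

With `fail_open=True` an unhealthy detector lets traffic through unscanned, which favors availability. With `fail_open=False` the same failure blocks traffic, which favors privacy.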

OSS note: In the OSS build, USE_FAST_PII=false is ignored and the regex detector is always used.