Skip to content

Testing Overview

SafeLLM follows a rigorous testing process to ensure performance, security, and reliability.

The project includes several levels of tests:

  1. Unit Tests: Testing individual components and layers (e.g., Cache, Keywords, PII) in isolation.
  2. Integration Tests: Testing the interaction between components, such as the full Waterfall Pipeline.
  3. End-to-End (E2E) Tests: Testing the entire stack, including APISIX, the Sidecar, and a mock upstream model.
  4. Benchmark Tests: Measuring latency, RPS (Requests Per Second), and memory usage.

You can run the entire test suite using the provided script:

Terminal window
cd safellm-oss
./run_tests.sh

This script will:

  • Activate the virtual environment.
  • Run all tests using pytest.
  • Generate a coverage report.

We maintain a set of “red team” prompts to verify the effectiveness of our security layers against:

  • Prompt Injection
  • Jailbreak attempts
  • PII leaks (both input and output)

These prompts are used in our integration tests to ensure that no security regression occurs during development.

For a step-by-step manual validation of the full APISIX -> Sidecar -> Upstream flow, see:

  • testing/manual-e2e