Executive Summary

llmverify is a heuristic-based monitoring and classification tool. It provides risk indicators and signals, not ground truth. It cannot prove an LLM is hallucinating, detect all prompt injection attacks, guarantee content safety, verify factual accuracy, or replace human review.

What llmverify IS

A runtime monitoring layer that tracks LLM behavior over time
A pattern-based classifier that identifies output characteristics
A risk indicator system that flags suspicious patterns
A developer tool for building more observable AI systems
A local-first library with zero data collection

What llmverify IS NOT

Not a fact-checking service
Not a content moderation AI
Not a security guarantee
Not a replacement for human review
Not an official audit tool for any provider

Specific Limitations by Feature

Hallucination Detection

What it does: Scores risk based on patterns like overconfident language, new entities, and fabricated JSON keys.

What it CANNOT do:

Verify if statements are factually true
Detect plausible-sounding lies
Identify subtle inaccuracies
Check against external knowledge bases

Accuracy: ~60-70% correlation with human-identified hallucinations in controlled tests. Real-world accuracy varies significantly by domain.

Prompt Injection Detection

What it does: Pattern-matches against known injection techniques.

What it CANNOT do:

Detect novel/zero-day attacks
Stop sophisticated adversarial prompts
Guarantee input safety
Replace proper input validation

Coverage: Detects ~80% of common injection patterns from public datasets. Novel attacks will bypass detection.

Recommendation: Use as one layer in defense-in-depth, not as sole protection.

PII Detection

What it does: Regex-based pattern matching for common PII formats.

What it CANNOT do:

Detect all PII (especially names without context)
Understand semantic PII (e.g., "my address is...")
Handle international formats comprehensively
Guarantee GDPR/HIPAA compliance

Accuracy: ~90% for structured PII (emails, phones, SSNs). ~60% for unstructured PII (names, addresses).

Runtime Health Monitoring

What it does: Tracks latency, token rate, and response structure over time.

What it CANNOT do:

Detect provider-side model changes with certainty
Distinguish network issues from model issues
Account for prompt complexity differences
Guarantee stable baselines

Best use case: Detecting anomalies in repetitive, stable workloads (CI pipelines, batch processing).

Worst use case: Highly variable prompts with diverse expected outputs.

General Limitations

English-optimized: Prompt injection and harmful content detection are English-optimized. PII detection works for most languages (universal formats). Multilingual support is on the roadmap.
Pattern-based only: Cannot detect anything not in its pattern library. Novel attacks, new PII formats, and emerging threats require package updates.
Not a guarantee: All results are risk indicators. Use as one layer in a defense-in-depth strategy, combined with human review for high-stakes decisions.

← Algorithms All Docs →