Executive Summary
llmverify is a heuristic-based monitoring and classification tool. It provides risk indicators and signals, not ground truth. It cannot prove an LLM is hallucinating, detect all prompt injection attacks, guarantee content safety, verify factual accuracy, or replace human review.
What llmverify IS
- A runtime monitoring layer that tracks LLM behavior over time
- A pattern-based classifier that identifies output characteristics
- A risk indicator system that flags suspicious patterns
- A developer tool for building more observable AI systems
- A local-first library with zero data collection
What llmverify IS NOT
- Not a fact-checking service
- Not a content moderation AI
- Not a security guarantee
- Not a replacement for human review
- Not an official audit tool for any provider
Specific Limitations by Feature
Hallucination Detection
What it does: Scores risk based on patterns like overconfident language, new entities, and fabricated JSON keys.
What it CANNOT do:
- Verify if statements are factually true
- Detect plausible-sounding lies
- Identify subtle inaccuracies
- Check against external knowledge bases
Accuracy: ~60-70% correlation with human-identified hallucinations in controlled tests. Real-world accuracy varies significantly by domain.
Prompt Injection Detection
What it does: Pattern-matches against known injection techniques.
What it CANNOT do:
- Detect novel/zero-day attacks
- Stop sophisticated adversarial prompts
- Guarantee input safety
- Replace proper input validation
Coverage: Detects ~80% of common injection patterns from public datasets. Novel attacks will bypass detection.
Recommendation: Use as one layer in defense-in-depth, not as sole protection.
PII Detection
What it does: Regex-based pattern matching for common PII formats.
What it CANNOT do:
- Detect all PII (especially names without context)
- Understand semantic PII (e.g., "my address is...")
- Handle international formats comprehensively
- Guarantee GDPR/HIPAA compliance
Accuracy: ~90% for structured PII (emails, phones, SSNs). ~60% for unstructured PII (names, addresses).
Runtime Health Monitoring
What it does: Tracks latency, token rate, and response structure over time.
What it CANNOT do:
- Detect provider-side model changes with certainty
- Distinguish network issues from model issues
- Account for prompt complexity differences
- Guarantee stable baselines
Best use case: Detecting anomalies in repetitive, stable workloads (CI pipelines, batch processing).
Worst use case: Highly variable prompts with diverse expected outputs.
General Limitations
- English-optimized: Prompt injection and harmful content detection are English-optimized. PII detection works for most languages (universal formats). Multilingual support is on the roadmap.
- Pattern-based only: Cannot detect anything not in its pattern library. Novel attacks, new PII formats, and emerging threats require package updates.
- Not a guarantee: All results are risk indicators. Use as one layer in a defense-in-depth strategy, combined with human review for high-stakes decisions.