Algorithms & Detection
Transparency document: exactly how each engine computes its scores. No magic, no black boxes.
Philosophy
llmverify uses heuristic-based scoring. These are pattern-matching algorithms, not AI models. They detect signals that correlate with issues — they do not prove issues exist.
All scores are indicators, not facts. A high hallucination risk score means "this output has patterns associated with hallucination" — not "this output is definitely hallucinated."
Runtime Health Engines
LatencyEngine
Purpose: Detect when LLM response time deviates from baseline.
```text
ratio = call.latencyMs / baseline.avgLatencyMs
deviation = max(0, ratio - 1)

if deviation < warnRatio (default 1.5):
    status = 'ok', value = 0
elif deviation < errorRatio (default 3.0):
    status = 'warn', value = deviation / errorRatio
else:
    status = 'error', value = min(1, deviation / errorRatio)
```
Output Range: 0–1 (0 = normal, 1 = severe deviation)
Limitations: Network variability can cause false positives. Does not account for prompt complexity differences.
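The deviation check above can be sketched in TypeScript. The `scoreLatency` function and the `LatencyCall`/`Baseline` shapes are illustrative assumptions for this document, not llmverify's actual API:

```typescript
// Hypothetical shapes for illustration — not the real llmverify types.
interface LatencyCall { latencyMs: number }
interface Baseline { avgLatencyMs: number }
type Status = 'ok' | 'warn' | 'error';

function scoreLatency(
  call: LatencyCall,
  baseline: Baseline,
  warnRatio = 1.5,
  errorRatio = 3.0,
): { status: Status; value: number } {
  const ratio = call.latencyMs / baseline.avgLatencyMs;
  // deviation measures how far above baseline the call is; 0 = at or below baseline
  const deviation = Math.max(0, ratio - 1);
  if (deviation < warnRatio) return { status: 'ok', value: 0 };
  if (deviation < errorRatio) return { status: 'warn', value: deviation / errorRatio };
  // deviation >= errorRatio, so deviation/errorRatio >= 1 and the score clamps to 1
  return { status: 'error', value: Math.min(1, deviation / errorRatio) };
}
```

Note that with the defaults, 'warn' fires only once the call is more than 2.5× the baseline latency (deviation ≥ 1.5 means ratio ≥ 2.5).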
TokenRateEngine
Purpose: Detect when tokens-per-second throughput drops.
```text
currentTPS = (call.responseTokens / call.latencyMs) * 1000
ratio = currentTPS / baseline.avgTokensPerSecond
deficit = max(0, 1 - ratio)

if deficit < warnThreshold (default 0.3):
    status = 'ok'
elif deficit < errorThreshold (default 0.6):
    status = 'warn'
else:
    status = 'error'
```
Output Range: 0–1 (0 = normal throughput, 1 = severe slowdown)
Limitations: Short responses have high variance. Token counting may differ from provider's count.
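The throughput check can be sketched the same way. The names are illustrative, and since the pseudocode above does not specify the reported `value`, this sketch assumes it is the deficit itself:

```typescript
// Hypothetical shapes for illustration — not the real llmverify types.
interface TokenCall { responseTokens: number; latencyMs: number }
interface TpsBaseline { avgTokensPerSecond: number }

function scoreTokenRate(
  call: TokenCall,
  baseline: TpsBaseline,
  warnThreshold = 0.3,
  errorThreshold = 0.6,
): { status: 'ok' | 'warn' | 'error'; value: number } {
  const currentTPS = (call.responseTokens / call.latencyMs) * 1000;
  const ratio = currentTPS / baseline.avgTokensPerSecond;
  // deficit = fraction of baseline throughput lost; 0 = at or above baseline
  const deficit = Math.max(0, 1 - ratio);
  if (deficit < warnThreshold) return { status: 'ok', value: deficit };
  if (deficit < errorThreshold) return { status: 'warn', value: deficit };
  return { status: 'error', value: deficit };
}
```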
FingerprintEngine
Purpose: Detect behavioral drift by comparing response structure patterns.
Computes a structural fingerprint of each response (sentence count, average word length, vocabulary diversity, punctuation patterns) and compares against baseline fingerprints using cosine similarity.
Limitations: Sensitive to prompt changes. Best for stable, repetitive workloads.
Security Engines
Prompt Injection Detection
Pattern matching against known attack signatures drawn from the OWASP LLM Top 10. Nine attack categories are covered: system override, context exfiltration, tool/function abuse, instruction manipulation, delimiter attacks, encoding bypasses, multi-turn attacks, indirect injection, and payload injection.
Method: Regex patterns + keyword scoring + structural analysis
Accuracy: 70-85% on known attack patterns
Limitations: Cannot detect novel/zero-day attacks. Sophisticated adversarial prompts may bypass detection.
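The regex + keyword-scoring approach can be sketched as a weighted rule table. These three signatures and their weights are invented for illustration; the real rule set is far larger:

```typescript
// Illustrative signatures only — not llmverify's actual rule set or weights.
const SIGNATURES: { category: string; pattern: RegExp; weight: number }[] = [
  { category: 'system override', pattern: /ignore (all )?(previous|prior) instructions/i, weight: 0.9 },
  { category: 'context exfiltration', pattern: /reveal (your )?(system prompt|instructions)/i, weight: 0.8 },
  { category: 'delimiter attack', pattern: /<\/?(system|assistant)>/i, weight: 0.6 },
];

function scoreInjection(prompt: string): { score: number; matches: string[] } {
  const matches = SIGNATURES.filter(s => s.pattern.test(prompt));
  // Sum the weights of matched signatures and clamp to [0, 1].
  const score = Math.min(1, matches.reduce((sum, m) => sum + m.weight, 0));
  return { score, matches: matches.map(m => m.category) };
}
```

This also illustrates the stated limitation: an attack phrased in a way no signature anticipates scores 0.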
PII Detection
Regex-based pattern matching for 25+ PII types, with Luhn checksum validation for credit card numbers, format and range validation for SSNs, and format validation for API keys.
Accuracy: ~90% for structured PII (emails, phones, SSNs). ~60% for unstructured PII (names, addresses).
False positive rate: ~5% (e.g., phone numbers in non-standard formats).
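The Luhn step is what keeps card detection accuracy high: a 16-digit string that matches the card regex but fails the checksum is rejected rather than flagged. A standard implementation (the function name here is illustrative):

```typescript
// Luhn checksum: confirms a regex-matched digit string is a plausible card number.
function luhnValid(candidate: string): boolean {
  const ds = candidate.replace(/[\s-]/g, ''); // strip spaces and dashes
  if (!/^\d{13,19}$/.test(ds)) return false;  // card numbers are 13-19 digits
  let sum = 0;
  for (let i = 0; i < ds.length; i++) {
    // Walk from the rightmost digit; double every second digit.
    let d = Number(ds[ds.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // digit-sum of the doubled value
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```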
Hallucination Risk Scoring
Heuristic scoring based on four signals: overconfident language ("definitely", "guaranteed", "100%"), entities that appear in the output but not in the prompt, internal contradictions, and fabricated JSON keys.
Accuracy: ~60-70% correlation with human-identified hallucinations in controlled tests.
Critical limitation: This is a risk indicator, not a fact-checker. It cannot verify factual accuracy.
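Two of the four signals can be sketched to show why this is an indicator rather than a fact-checker: it counts surface patterns and never consults any source of truth. The phrase list, entity heuristic, and weights below are invented for illustration:

```typescript
// Illustrative phrase list and weights — not llmverify's actual heuristics.
const OVERCONFIDENT = [/\bdefinitely\b/i, /\bguaranteed\b/i, /\b100%/, /\bcertainly\b/i];

function hallucinationRisk(prompt: string, output: string): number {
  // Signal 1: overconfident language in the output.
  const confidentHits = OVERCONFIDENT.filter(re => re.test(output)).length;
  // Signal 2: capitalized tokens in the output that never appear in the
  // prompt — a crude proxy for "new entities".
  const promptLower = prompt.toLowerCase();
  const entities = output.match(/\b[A-Z][a-z]{2,}\b/g) ?? [];
  const newEntities = entities.filter(e => !promptLower.includes(e.toLowerCase()));
  // Weighted sum, clamped to [0, 1]. A true statement written confidently
  // still scores > 0 — hence "risk indicator", not "fact-checker".
  return Math.min(1, confidentHits * 0.2 + newEntities.length * 0.1);
}
```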