Algorithms & Detection
Transparency document: exactly how each engine computes its scores. No magic, no black boxes.
Philosophy
llmverify uses heuristic-based scoring. These are pattern-matching algorithms, not AI models. They detect signals that correlate with issues — they do not prove issues exist.
All scores are indicators, not facts. A high hallucination risk score means "this output has patterns associated with hallucination" — not "this output is definitely hallucinated."
Runtime Health Engines
LatencyEngine
Purpose: Detect when LLM response time deviates from baseline.
```text
ratio = call.latencyMs / baseline.avgLatencyMs
deviation = max(0, ratio - 1)

if deviation < warnRatio (default 1.5):
    status = 'ok', value = 0
elif deviation < errorRatio (default 3.0):
    status = 'warn', value = deviation / errorRatio
else:
    status = 'error', value = min(1, deviation / errorRatio)
```
Output Range: 0–1 (0 = normal, 1 = severe deviation)
Limitations: Network variability can cause false positives. Does not account for prompt complexity differences.
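The deviation check above can be sketched in TypeScript. The `scoreLatency` function and the `LatencyCall`/`Baseline` shapes are illustrative assumptions for this document, not llmverify's actual API:

```typescript
// Hypothetical shapes for illustration — not the real llmverify types.
interface LatencyCall { latencyMs: number }
interface Baseline { avgLatencyMs: number }
type Status = 'ok' | 'warn' | 'error';

function scoreLatency(
  call: LatencyCall,
  baseline: Baseline,
  warnRatio = 1.5,
  errorRatio = 3.0,
): { status: Status; value: number } {
  const ratio = call.latencyMs / baseline.avgLatencyMs;
  // deviation measures how far above baseline the call is; 0 = at or below baseline
  const deviation = Math.max(0, ratio - 1);
  if (deviation < warnRatio) return { status: 'ok', value: 0 };
  if (deviation < errorRatio) return { status: 'warn', value: deviation / errorRatio };
  // deviation >= errorRatio, so deviation/errorRatio >= 1 and the score clamps to 1
  return { status: 'error', value: Math.min(1, deviation / errorRatio) };
}
```

Note that with the defaults, 'warn' fires only once the call is more than 2.5× the baseline latency (deviation ≥ 1.5 means ratio ≥ 2.5).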
TokenRateEngine
Purpose: Detect when tokens-per-second throughput drops.
```text
currentTPS = (call.responseTokens / call.latencyMs) * 1000
ratio = currentTPS / baseline.avgTokensPerSecond
deficit = max(0, 1 - ratio)

if deficit < warnThreshold (default 0.3):
    status = 'ok'
elif deficit < errorThreshold (default 0.6):
    status = 'warn'
else:
    status = 'error'
```
Output Range: 0–1 (0 = normal throughput, 1 = severe slowdown)
Limitations: Short responses have high variance. Token counting may differ from provider's count.
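The throughput check can be sketched the same way. The names are illustrative, and since the pseudocode above does not specify the reported `value`, this sketch assumes it is the deficit itself:

```typescript
// Hypothetical shapes for illustration — not the real llmverify types.
interface TokenCall { responseTokens: number; latencyMs: number }
interface TpsBaseline { avgTokensPerSecond: number }

function scoreTokenRate(
  call: TokenCall,
  baseline: TpsBaseline,
  warnThreshold = 0.3,
  errorThreshold = 0.6,
): { status: 'ok' | 'warn' | 'error'; value: number } {
  const currentTPS = (call.responseTokens / call.latencyMs) * 1000;
  const ratio = currentTPS / baseline.avgTokensPerSecond;
  // deficit = fraction of baseline throughput lost; 0 = at or above baseline
  const deficit = Math.max(0, 1 - ratio);
  if (deficit < warnThreshold) return { status: 'ok', value: deficit };
  if (deficit < errorThreshold) return { status: 'warn', value: deficit };
  return { status: 'error', value: deficit };
}
```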
FingerprintEngine
Purpose: Detect behavioral drift by comparing response structure patterns.
Computes a structural fingerprint of each response (sentence count, average word length, vocabulary diversity, punctuation patterns) and compares against baseline fingerprints using cosine similarity.
Limitations: Sensitive to prompt changes. Best for stable, repetitive workloads.
Security Engines
Prompt Injection Detection
Pattern matching against known attack signatures drawn from the OWASP LLM Top 10. Nine attack categories are covered: system override, context exfiltration, tool/function abuse, instruction manipulation, delimiter attacks, encoding bypasses, multi-turn attacks, indirect injection, and payload injection.
Method: Regex patterns + keyword scoring + structural analysis
Accuracy: 70-85% on known attack patterns
Limitations: Cannot detect novel/zero-day attacks. Sophisticated adversarial prompts may bypass detection.
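The regex + keyword-scoring approach can be sketched as a weighted rule table. These three signatures and their weights are invented for illustration; the real rule set is far larger:

```typescript
// Illustrative signatures only — not llmverify's actual rule set or weights.
const SIGNATURES: { category: string; pattern: RegExp; weight: number }[] = [
  { category: 'system override', pattern: /ignore (all )?(previous|prior) instructions/i, weight: 0.9 },
  { category: 'context exfiltration', pattern: /reveal (your )?(system prompt|instructions)/i, weight: 0.8 },
  { category: 'delimiter attack', pattern: /<\/?(system|assistant)>/i, weight: 0.6 },
];

function scoreInjection(prompt: string): { score: number; matches: string[] } {
  const matches = SIGNATURES.filter(s => s.pattern.test(prompt));
  // Sum the weights of matched signatures and clamp to [0, 1].
  const score = Math.min(1, matches.reduce((sum, m) => sum + m.weight, 0));
  return { score, matches: matches.map(m => m.category) };
}
```

This also illustrates the stated limitation: an attack phrased in a way no signature anticipates scores 0.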
PII Detection
Regex-based pattern matching for 25+ PII types, with Luhn checksum validation for credit card numbers, format and range validation for SSNs, and format validation for API keys.
Accuracy: ~90% for structured PII (emails, phones, SSNs). ~60% for unstructured PII (names, addresses).
False positive rate: ~5% (e.g., phone numbers in non-standard formats).
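The Luhn step is what keeps card detection accuracy high: a 16-digit string that matches the card regex but fails the checksum is rejected rather than flagged. A standard implementation (the function name here is illustrative):

```typescript
// Luhn checksum: confirms a regex-matched digit string is a plausible card number.
function luhnValid(candidate: string): boolean {
  const ds = candidate.replace(/[\s-]/g, ''); // strip spaces and dashes
  if (!/^\d{13,19}$/.test(ds)) return false;  // card numbers are 13-19 digits
  let sum = 0;
  for (let i = 0; i < ds.length; i++) {
    // Walk from the rightmost digit; double every second digit.
    let d = Number(ds[ds.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // digit-sum of the doubled value
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```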
Hallucination Risk Scoring
Heuristic scoring based on four signals: overconfident language ("definitely", "guaranteed", "100%"), entities that appear in the output but not in the prompt, internal contradictions, and fabricated JSON keys.
Accuracy: ~60-70% correlation with human-identified hallucinations in controlled tests.
Critical limitation: This is a risk indicator, not a fact-checker. It cannot verify factual accuracy.
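Two of the four signals can be sketched to show why this is an indicator rather than a fact-checker: it counts surface patterns and never consults any source of truth. The phrase list, entity heuristic, and weights below are invented for illustration:

```typescript
// Illustrative phrase list and weights — not llmverify's actual heuristics.
const OVERCONFIDENT = [/\bdefinitely\b/i, /\bguaranteed\b/i, /\b100%/, /\bcertainly\b/i];

function hallucinationRisk(prompt: string, output: string): number {
  // Signal 1: overconfident language in the output.
  const confidentHits = OVERCONFIDENT.filter(re => re.test(output)).length;
  // Signal 2: capitalized tokens in the output that never appear in the
  // prompt — a crude proxy for "new entities".
  const promptLower = prompt.toLowerCase();
  const entities = output.match(/\b[A-Z][a-z]{2,}\b/g) ?? [];
  const newEntities = entities.filter(e => !promptLower.includes(e.toLowerCase()));
  // Weighted sum, clamped to [0, 1]. A true statement written confidently
  // still scores > 0 — hence "risk indicator", not "fact-checker".
  return Math.min(1, confidentHits * 0.2 + newEntities.length * 0.1);
}
```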