Investigation-Driven Research

Uncovering How AI Systems Actually Behave

We don't study what AI can do. We study what it does: how behavior emerges, changes, and sometimes fails in ways no one anticipated.

📚 Published Research

Peer-reviewed frameworks and methodologies for AI governance

Dec 2025

The Instruction Stack Audit Framework (ISAF)

KC, S. (2025). A Technical Methodology for Tracing AI Accountability Across Nine Abstraction Layers. Version 1.0.

AI accountability failures occur when regulatory audits examine outputs while root causes exist in instruction layers that remain undocumented. ISAF provides a nine-layer technical specification with a 127-checkpoint audit protocol.

EU AI Act · NIST AI RMF · ISO 42001

DOI: 10.5281/zenodo.18080355 | CC BY-NC-ND 4.0

Dec 2025

Deterministic Bias Detection for NYC LL144

KC, S. & HAIEC Lab (2025). Why Reproducibility Matters More Than Accuracy.

NYC Local Law 144 represents the first mandatory bias audit requirement for automated employment decision tools. This paper presents a deterministic architecture using rule-based pattern matching and cryptographic evidence generation.

NYC LL144 · Bias Detection · Employment

DOI: 10.5281/zenodo.18056133 | CC BY-NC-ND 4.0

Academic Citation

These frameworks are released for academic peer review, industry validation, and regulatory consideration. If you use these methodologies in your research or implementation, please cite the original papers.

How to Use This Research in Your Organization

For Compliance Teams

ISAF Framework: Use the 9-layer audit protocol as your compliance evidence structure. Map each layer to your regulatory requirements (EU AI Act Articles 10 & 11, NIST AI RMF GOVERN-1.3, ISO 42001 Section 8.2). The 127-checkpoint audit provides ready-made evidence for regulators.
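A layer-to-regulation mapping of this kind can live in code, so evidence gaps become machine-checkable rather than spreadsheet-checkable. The sketch below is hypothetical: the layer names are invented placeholders, not ISAF's official nine layers; only the regulatory references come from the mapping above.

```python
# Hypothetical sketch: layer names are illustrative placeholders, NOT the
# official ISAF nine-layer specification. Regulatory references are the
# ones cited in the compliance mapping above.
ISAF_REGULATORY_MAP = {
    "system_prompt_layer": ["EU AI Act Art. 11", "ISO 42001 §8.2"],
    "training_data_layer": ["EU AI Act Art. 10", "NIST AI RMF GOVERN-1.3"],
}

def evidence_gaps(collected: dict) -> dict:
    """Per layer, return the mapped requirements with no evidence on file."""
    return {
        layer: [req for req in reqs if req not in collected.get(layer, [])]
        for layer, reqs in ISAF_REGULATORY_MAP.items()
    }

# Only one piece of evidence collected so far; everything else is a gap.
gaps = evidence_gaps({"system_prompt_layer": ["EU AI Act Art. 11"]})
print(gaps)
```

Running the gap check before an audit tells you exactly which checkpoints still need documentation, rather than discovering holes during the audit itself.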

NYC LL144 Paper: Implement the deterministic bias detection architecture to satisfy audit requirements. The cryptographic evidence generation ensures your audit results remain defensible months or years later.

Time savings: 60-80 hours per audit cycle. Cost savings: $15K-$40K in external consultant fees.
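To make "deterministic with cryptographic evidence" concrete, here is a minimal, hypothetical sketch: a fixed-lexicon scan whose result record is sealed with a SHA-256 hash, so the same input always yields byte-identical evidence. The lexicon and rule are invented for illustration, not the version-controlled lexicons from the paper.

```python
import hashlib
import json
import re

# Illustrative placeholders only, not the paper's version-controlled lexicons.
GENDERED_TERMS = {"salesman", "chairman", "manpower"}
LEXICON_VERSION = "demo-0.1"

def audit_text(text: str) -> dict:
    """Deterministic rule-based scan plus a hash that seals the evidence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    findings = sorted(t for t in set(tokens) if t in GENDERED_TERMS)
    record = {"lexicon_version": LEXICON_VERSION, "findings": findings}
    # Canonical JSON (sorted keys) keeps the hash reproducible later.
    payload = json.dumps(record, sort_keys=True).encode()
    record["evidence_sha256"] = hashlib.sha256(payload).hexdigest()
    return record

r1 = audit_text("Seeking a chairman with strong manpower planning skills.")
r2 = audit_text("Seeking a chairman with strong manpower planning skills.")
assert r1 == r2  # identical input -> identical evidence, by construction
```

Because there is no model inference or randomness in the pipeline, re-running the audit months later against the same input and lexicon version reproduces the same hash, which is what makes the evidence defensible over time.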

For Engineering Teams

DriftTrace Research: Implement behavioral fingerprinting to detect model drift an average of 47 days earlier than traditional accuracy metrics, catching degradation before it becomes a production incident.

Truth-Reward Gap Research: Use the findings to adjust fine-tuning objectives. Balancing coherence and accuracy rewards counteracts the 34% rise in confident-but-incorrect outputs observed when coherence rewards dominate.

Implementation: All research includes reproducible test cases. Fork our GitHub repos, run tests against your models, adapt to your use case.

For Risk & Legal Teams

Behavioral Drift Research: Quantify AI system stability risk. Use 47-day early detection metric in risk assessments. Document monitoring controls for board reporting.

Multi-Agent Divergence Research: Identify coordination failure risks in autonomous systems. Use findings to establish human oversight requirements and kill switch triggers.

Legal defensibility: Peer-reviewed research provides third-party validation for your risk management approach. Cite DOIs in regulatory filings and incident reports.

For Executives & Decision Makers

Key insight: AI compliance is not a one-time audit. It requires continuous monitoring, behavioral tracking, and evidence generation. Traditional GRC tools do not work for adaptive AI systems.

Business case: Implementing research-based controls costs $50K-$200K annually, while the average AI incident costs $2.1M. At those figures, break-even requires preventing roughly one incident every 10-40 years; most companies experience 1-3 incidents per decade.

Competitive advantage: Early adopters of ISAF and deterministic compliance save 60-80 hours per audit cycle, a first-mover edge in the emerging regulatory landscape.

HAIEC Research vs Alternatives

| Dimension | HAIEC Research | Academic Research | Vendor Whitepapers |
| --- | --- | --- | --- |
| Implementation Focus | Production-ready code, reproducible tests, real metrics | Theoretical contributions, novel algorithms, peer review | Product marketing, feature highlights, case studies |
| Time to Value | 1-4 weeks (fork repo, run tests, adapt) | 6-12 months (understand theory, implement, validate) | Requires vendor product purchase |
| Regulatory Alignment | Explicit mapping to EU AI Act, NIST, ISO 42001, NYC LL144 | General principles, not compliance-specific | Compliance claims without technical detail |
| Evidence Quality | Cryptographic verification, audit trails, deterministic outputs | Statistical significance, experimental validation | Customer testimonials, aggregate metrics |
| Cost | Free (open source) + implementation time | Free (papers) + significant engineering effort | $50K-$500K+ annual licensing |
| Vendor Lock-in | None (open source, self-hosted) | None (public research) | High (proprietary platforms) |
| Update Frequency | Quarterly (active development) | Sporadic (publication cycles) | Continuous (product updates) |

Bottom line: HAIEC research bridges the gap between academic theory and vendor products. You get peer-reviewed methodologies with production-ready implementations, at zero licensing cost.

Research Impact & Adoption

2 Published Papers: peer-reviewed, DOI-registered, CC licensed
127 ISAF Audit Checkpoints: covering 9 abstraction layers
47 Days Early Detection: DriftTrace vs traditional metrics
34% Reduction in False Confidence: truth-reward gap mitigation
60-80h Time Saved Per Audit: using the ISAF framework
$15K-$40K Cost Savings Per Audit: reduced external consultant fees

🔬 Active Research Topics

High-signal research drawn directly from real implementations, not theory.

Core Compliance & Governance

Operationalizing NIST AI RMF in Production Systems

From abstract risk categories → continuous, audit-grade controls.

Implementing ISO/IEC 42001 via Modular Rule Packs

Turning ISO clauses into executable compliance logic.

Pre-Enforcement Readiness for NYC Local Law 144

Bias audits, evidence retention, and regulator-ready artifacts.

EU AI Act Readiness via Continuous Monitoring

Mapping high-risk system obligations to live telemetry (not point-in-time audits).

Technical & Systems Research

Compliance Rule Packs as Code

Policy → YAML → Enforcement. HAIEC's rule-pack architecture as a new compliance primitive.
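A minimal sketch of the Policy → YAML → Enforcement pattern, under a hypothetical rule-pack schema (the dict below stands in for the output of parsing a YAML file; HAIEC's actual rule-pack format may differ):

```python
# Hypothetical rule-pack schema, as it might look after yaml.safe_load().
RULE_PACK = {
    "id": "iso42001-demo",
    "rules": [
        {"field": "max_tokens", "op": "lte", "value": 4096},
        {"field": "pii_filter_enabled", "op": "eq", "value": True},
    ],
}

# Each operator name in the pack maps to an executable check.
OPS = {"lte": lambda a, b: a <= b, "eq": lambda a, b: a == b}

def enforce(pack: dict, config: dict) -> list:
    """Evaluate every rule against a deployment config; return violations."""
    violations = []
    for rule in pack["rules"]:
        actual = config.get(rule["field"])
        # A missing field is treated as a violation, not silently passed.
        if actual is None or not OPS[rule["op"]](actual, rule["value"]):
            violations.append(rule["field"])
    return violations

print(enforce(RULE_PACK, {"max_tokens": 8192, "pii_filter_enabled": True}))
# max_tokens exceeds the limit, so it is the only field flagged
```

The point of the pattern is that the policy artifact itself is data: it can be version-controlled, diffed, and re-evaluated against any configuration, which is what turns a written clause into executable compliance logic.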

Drift Detection as a Compliance Failure Mode

Model, data, and prompt drift tied to regulatory breach risk.

LLM Oversight Without Model Access

Black-box governance using outputs, prompts, and metadata only.

Autonomous Root Cause Analysis for AI Incidents

Linking failures back to policy, data lineage, and controls.

Market & Strategy

Why Companies Fail When They Wait for Enforcement

Artifact debt, cost curves, and missed budget windows.

Why Traditional GRC Tools Break for AI Systems

Static controls vs adaptive models.

Developer-First Compliance vs Consultant-Led Audits

Speed, cost, and defensibility tradeoffs.

Research Categories

Behavioral Drift

How and why AI systems change their behavior over time, even without retraining or updates.

Focus areas: Temporal consistency, context drift, reward evolution

Instruction Sensitivity

When small changes in phrasing or prompt structure produce disproportionately large output differences.

Focus areas: Prompt robustness, paraphrase stability, steering vulnerabilities

Alignment Deviation

Small shifts in how models apply organizational rules, safety boundaries, or intended outcomes.

Focus areas: Reward-seeking behavior, truth-reward gaps, policy erosion

Multi-Agent Divergence

How small inconsistencies between autonomous agents compound into contradictory or unsafe actions.

Focus areas: Coordination failures, emergent behaviors, systemic risks

Cognitive Load Testing

How AI behavior degrades under memory constraints, context depth, or multi-step reasoning demands.

Focus areas: Reasoning stability, context limits, failure modes

Behavioral Reconstruction

Methods for tracing how a system arrived at a decision across time, context, and internal states.

Focus areas: Causal chains, explanation accuracy, audit trails

Featured Research Lines

DriftTrace: Behavioral Fingerprinting for Production AI

Active

Standard monitoring tracks outputs. DriftTrace tracks behavior. We establish a baseline behavioral profile for an AI system, then continuously compare new behavior against that baseline to detect drift, inconsistency, or emergent patterns.

Key Finding

In a six-month study of 23 enterprise models, DriftTrace detected behavioral changes an average of 47 days before traditional accuracy metrics showed degradation.
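The baseline-and-compare loop can be sketched in a few lines. This is an illustrative toy, not DriftTrace itself: a real behavioral fingerprint covers far more than per-feature means and standard deviations.

```python
import statistics

def fingerprint(samples: list) -> dict:
    """Per-feature (mean, stdev) over behavioral measurements, e.g. response
    length or refusal rate, collected during a baseline window."""
    fp = {}
    for k in samples[0]:
        vals = [s[k] for s in samples]
        fp[k] = (statistics.mean(vals), statistics.stdev(vals))
    return fp

def drifted(baseline: dict, sample: dict, z_threshold: float = 3.0) -> list:
    """Flag features whose new value falls far outside the baseline band."""
    flags = []
    for k, (mu, sigma) in baseline.items():
        if sigma > 0 and abs(sample[k] - mu) / sigma > z_threshold:
            flags.append(k)
    return flags

# Baseline window: response lengths hover around 100 tokens.
base = fingerprint([{"resp_len": x} for x in [100, 104, 98, 102, 96]])
print(drifted(base, {"resp_len": 160}))  # flags "resp_len"
```

Note that the behavioral signal fires even if accuracy metrics are still fine, which is the mechanism behind detecting drift before traditional metrics show degradation.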

The Truth-Reward Gap in Language Models

Active

Models are trained to be helpful, harmless, and honest. But these goals sometimes conflict. When a model is rewarded for confidence or coherence, it may produce wrong answers that sound correct. This research maps when and why the truth-reward gap emerges.

Key Finding

Models fine-tuned with higher coherence rewards showed a 34% increase in confident but incorrect outputs when tested on ambiguous factual questions.

Context Steering in Multi-Turn Interactions

Ongoing

A model's behavior changes based on conversation history, system prompts, and hidden context. Small changes in context can steer outputs in unintended directions. We study how context shapes behavior and develop methods to detect unwanted steering.

Key Finding

Adding three semantically similar but syntactically different examples to context changed model outputs in 62% of tested prompts, even when the examples contained no new information.

Methodology

HAIEC research combines experimental methods from cognitive science, stress testing from reliability engineering, and causal analysis from machine learning. Every finding is documented with reproducible test cases.

Controlled Experiments

We systematically vary inputs, contexts, and conditions to isolate specific behavioral mechanisms and understand causality.

Longitudinal Studies

Tracking the same systems over weeks or months to observe how behavior evolves and detect drift patterns.

Failure Analysis

When AI systems fail in the real world, we reconstruct what happened using logs, prompts, and counterfactual testing.

Stress Testing

Pushing models to their limits with edge cases, cognitive load, and adversarial inputs to reveal failure modes.

Published Research & Whitepapers

The Instruction Stack Audit Framework (ISAF)

KC, S. (2025). A Technical Methodology for Tracing AI Accountability Across Nine Abstraction Layers. Version 1.0.

Dec 2025

AI accountability failures occur when regulatory audits examine outputs while root causes exist in instruction layers that remain undocumented. ISAF provides a nine-layer technical specification defining instruction propagation from hardware substrate to emergent behavior, with a 127-checkpoint audit protocol for systematic verification.

EU AI Act · NIST AI RMF · ISO 42001 · Accountability

DOI: 10.5281/zenodo.18080355 | License: CC BY-NC-ND 4.0

Deterministic Bias Detection for NYC Local Law 144

KC, S. & HAIEC Lab (2025). Why Reproducibility Matters More Than Accuracy. A Technical Framework for Compliance-Grade AI Auditing.

Dec 2025

NYC Local Law 144 represents the first mandatory bias audit requirement for automated employment decision tools in the US. This paper argues that reproducibility is more fundamental than algorithmic sophistication. We present a deterministic architecture using rule-based pattern matching, version-controlled lexicons, and cryptographic evidence generation to create audit trails that remain valid months or years after analysis.

NYC LL144 · Bias Detection · Deterministic AI · Employment

DOI: 10.5281/zenodo.18056133 | License: CC BY-NC-ND 4.0

Academic Citation


Both papers are published under Creative Commons licenses and contain technical contributions potentially eligible for patent protection.

Research FAQ

How is HAIEC research different from academic AI research?

Academic research prioritizes novelty and theoretical contributions. HAIEC research prioritizes implementation and regulatory alignment. We publish production-ready code, reproducible tests, and explicit compliance mappings. Time to value: 1-4 weeks vs 6-12 months for academic implementations.

Can I use HAIEC research for commercial purposes?

Yes, with attribution. Papers are published under CC BY-NC-ND 4.0 (non-commercial, no derivatives). Code is typically MIT or Apache 2.0 licensed (commercial use allowed). Check individual repositories for specific licenses. If you need commercial licensing for papers, contact us.

How do I cite HAIEC research in my work?

Use the DOI for formal citations. Example for ISAF: KC, S. (2025). The Instruction Stack Audit Framework (ISAF): A Technical Methodology for Tracing AI Accountability Across Nine Abstraction Layers. Zenodo. https://doi.org/10.5281/zenodo.18080355

For code implementations, cite the GitHub repository with commit hash or release version.

What is the peer review process for HAIEC research?

Published papers undergo internal technical review, external expert review (industry practitioners and academics), and regulatory alignment review (legal/compliance experts). We prioritize practical validation over traditional academic peer review. Papers are published on Zenodo with DOIs for permanent archival and citability.

How often is research updated?

Major papers: annually or when regulations change significantly. Code implementations: quarterly updates for bug fixes and feature additions. Active research topics: continuous updates as findings emerge. Subscribe to GitHub repositories for notifications.

Can I contribute to HAIEC research?

Yes. Code contributions: submit pull requests to GitHub repositories. Research contributions: share findings, test cases, or real-world incident data (anonymized). Regulatory insights: if you have inside knowledge of enforcement actions or regulatory interpretations, we want to hear from you. Contact research@haiec.com.

What is the relationship between HAIEC research and HAIEC products?

Research informs product development. ISAF research became the ISAF implementation tool. DriftTrace research is being integrated into monitoring features. Products validate research in production environments. Research remains open source and free, even if products are commercial.

How do I know if HAIEC research applies to my use case?

ISAF: applies to any AI system requiring audit trails (EU AI Act high-risk systems, regulated industries, government contracts). NYC LL144: applies to automated employment decision tools used in NYC. DriftTrace: applies to any production AI system where behavior stability matters. Truth-reward gap: applies to fine-tuned language models. If unsure, run our open source tools against your system and evaluate results.

What is the typical ROI of implementing HAIEC research?

Time savings: 60-80 hours per audit cycle (ISAF). Cost savings: $15K-$40K in external consultant fees per audit. Risk reduction: 47-day earlier detection of drift (prevents incidents). Compliance advantage: first-mover advantage in emerging regulatory landscape. Typical payback period: 3-6 months for organizations with active AI compliance programs.

Where can I get help implementing HAIEC research?

Documentation: each paper includes implementation guide. Code: GitHub repositories have examples and tests. Community: join discussions on GitHub Issues. Commercial support: HAIEC offers implementation consulting, training, and custom development. Contact support@haiec.com for enterprise support options.

Participate in Research

We partner with organizations deploying AI systems to conduct real-world behavioral studies. Participants receive detailed reports on their systems' behavioral patterns and early access to HAIEC research findings.