The Compliance Officer's Guide to AI Claims Investigation Deployment

$308.6B: annual US insurance fraud loss (Coalition Against Insurance Fraud)
24: states that adopted the NAIC AI bulletin (as of March 2025)
4: NIST AI RMF functions to map a tool to (Govern, Map, Measure, Manage)
2-4 hrs: Hesper time to a defensible finding (vs 14+ days manual, Hesper internal benchmark)

The manual fraud investigation a carrier already signs off on often cannot reproduce its own evidence chain six months later, when a state examiner pulls the file. A well-built AI investigation can. That single fact inverts the question most compliance officers start with. The worry is usually "is the AI defensible?" - but the honest comparison is against a manual process whose decision trail lives in an investigator's memory and a few notes. Done right, autonomous investigation is the more auditable of the two.

This guide is written for the person who signs off before an AI claims-investigation system goes live: the compliance officer who owns the antifraud-plan filing with the state Department of Insurance and sits in the buying committee as the regulatory gate. If that role is nervous, the deployment stalls - and it should, until the regulatory questions are answered. The aim here is to make those questions concrete: which regulations actually apply, what each one requires of the system, and the exact checklist to put in front of a vendor.

It is a companion to the technical-gatekeeper view in the CIO checklist for an AI claims investigation rollout. The CIO owns security and integrations; the compliance officer owns regulatory sign-off. Both have to clear before production.

The compliance officer is the gatekeeper, not the rubber stamp

The compliance officer's job in an AI deployment is to confirm that the system's output can be defended to a regulator - in a market-conduct examination, an antifraud-plan filing, or bad-faith litigation. That is a higher bar than "does it work." It is "can we show our work." Insurance fraud is a $308.6 billion annual problem in the US and shows up in roughly 10% of property-casualty losses, per the Coalition Against Insurance Fraud; we set out the full picture in the State of Insurance Fraud Detection 2026 report. The pressure to investigate more claims is real, which is exactly why the governance around how you investigate them matters.

What changes with autonomous investigation is throughput and documentation at once. A manual SIU case takes 14+ days, an investigator carries 200+ cases, and the practical result is that only about 25% of flagged claims get a real investigation. Running the investigation autonomously in 2-4 hours lifts that flagged-claim coverage toward 100%. For a compliance officer the relevant part is not the speed - it is that every one of those investigations arrives with a complete, timestamped record instead of a backlog of flags nobody worked.

The reframe that matters

The compliance burden lands on the decision, not the flag. A detection score from FRISS, Shift, or Verisk raises a question; it is not a documented investigation and was never meant to be. Detection is upstream; investigation is downstream. The defensibility bar applies to the finding that gets acted on - and that is the artifact an AI investigation either produces well or does not.

The regulatory map: what actually applies to AI fraud investigation

Five regimes bear on an AI claims-investigation deployment in the US, plus the EU AI Act if the carrier operates in Europe. Most were not written for AI specifically - they govern how claims are investigated and how decisions are documented, and an AI system inherits those duties. The table below maps each one to what it requires of the system. Read it as the agenda for the vendor conversation.

Regime	Applies to	What it requires of the AI system
NAIC AI Model Bulletin (Dec 2023)	Insurer use of AI/ML, including third-party systems	Written governance program, oversight and approval, validation/testing/retesting, vendor diligence with audit rights, and records a department can request in an exam
California 10 CCR 2698.36	SIU investigation documentation (P&C)	A written investigation summary stating the basis for the findings, and documentation when a referred red flag is not investigated
New York 11 NYCRR 86	Fraud Prevention Plans / SIU (3,000+ policies)	A filed plan and full-time SIU; the insurer remains primarily responsible for contracted SIU work; annual SIU report due March 15
NAIC Model Act 680 (48 states)	Insurance Fraud Prevention / antifraud plans	The investigation output has to fit the antifraud plan the carrier files and stand up in a state SIU exam
NAIC Model Act 900 (UCSPA)	Unfair claims settlement practices	Reasonable standards for prompt investigation; no denying or refusing a claim without a reasonable investigation
EU AI Act (Aug 2, 2026)	Life/health risk-assessment and pricing in the EU	High-risk obligations (logging, human oversight, documentation) - note this targets pricing, not P&C fraud investigation

Two caveats keep this honest. The NAIC AI Model Bulletin was adopted in December 2023 and, per Quarles & Brady, 24 states had adopted it as of March 2025, generally without material change. And Colorado's SB21-169 and Regulation 10-1-1 - often cited as the template for insurer AI governance - actually govern unfair discrimination in underwriting and pricing, not fraud investigation. They are worth reading as the direction regulators are heading, not as a rule that classifies a fraud-investigation tool.

NIST AI RMF as the review structure compliance officers already trust

The cleanest framework to organize a review is one regulators recognize: the NIST AI Risk Management Framework (AI RMF 1.0), released January 26, 2023. It has four functions - Govern, Map, Measure, and Manage - and they map neatly onto what a compliance officer has to verify before approving a deployment. Using NIST as the spine turns a vague "is this AI safe" review into four answerable questions.

NIST AI RMF function	What it asks	What the compliance officer verifies
Govern	Is there a documented program, ownership, and oversight?	A written AI governance policy, named accountable owners, and human sign-off built into the workflow
Map	Is the context and impact understood?	Where the tool sits (downstream of detection), what decisions it touches, and which regulations apply to those decisions
Measure	Is performance tested and documented?	Validation and retesting evidence, error-rate documentation, and an evidence trail for each finding
Manage	Are risks prioritized and responded to?	Escalation paths, the human review step before any acted-on decision, and vendor audit rights

The Govern function is the cross-cutting one - NIST describes it as the function that enables the other three. For a fraud-investigation tool that translates into a simple test: can you show a written program, an accountable owner, and a human in the decision loop? If a vendor cannot help you populate the four rows of the table above, that is the finding.

The documented-decision requirement: where most AI fails the audit

This is the section that decides most deployments. California 10 CCR 2698.36 requires a written investigation summary that states the basis for the findings, per the California SIU regulations. NAIC Model Act 900 prohibits refusing a claim without a reasonable investigation. The NAIC AI bulletin tells insurers a department may request records about an AI system during an examination. Put together, the standard is not "the AI was confident." It is "show the documented basis, on demand."

That is exactly where a confidence score or a black-box recommendation fails. A number with no reconstructable reasoning behind it cannot satisfy a written-basis requirement, and it cannot survive an examiner asking why a claim was denied. The audit trail is the decision itself, written in a form a regulator can read.

Two case files, six months later: only one can still show its work. The audit trail is the compliance deliverable.

This is why Hesper is built audit-trail-native: every one of the 15+ investigation phases run on a flagged claim is logged with its sources, its reasoning, and a timestamp, so the finding can be reproduced and defended later. That design is the regulatory expression of the defensibility standard for fraud investigation AI, and the artifact it produces is the kind a DOI examiner can pull - covered in the audit-ready fraud report. The compliance test is whether the documented basis exists for every finding, not whether the model scored high.
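To make "logged with its sources, its reasoning, and a timestamp" concrete, here is a minimal sketch of what one such record could look like. It is not Hesper's implementation: the class names, field names, and the hash chain that links each record to the previous one are assumptions added for illustration. The source describes only the three logged elements.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PhaseRecord:
    """One logged investigation phase. All field names are hypothetical."""
    claim_id: str
    phase: str
    sources: tuple       # where the evidence came from
    reasoning: str       # the written basis for this step
    timestamp: str       # ISO 8601, UTC
    prev_hash: str       # digest of the previous record, "" for the first

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class AuditTrail:
    """Append-only list of phase records for a single claim."""

    def __init__(self, claim_id: str) -> None:
        self.claim_id = claim_id
        self.records = []

    def log(self, phase, sources, reasoning, when=None) -> PhaseRecord:
        when = when or datetime.now(timezone.utc)
        prev = self.records[-1].digest() if self.records else ""
        record = PhaseRecord(self.claim_id, phase, tuple(sources),
                             reasoning, when.isoformat(), prev)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Check that each record still points at the digest of the one before it."""
        prev = ""
        for record in self.records:
            if record.prev_hash != prev:
                return False
            prev = record.digest()
        return True


trail = AuditTrail("CLM-0001")  # hypothetical claim id
trail.log("policy_check", ["policy_admin_system"], "Policy was active on the loss date.")
trail.log("prior_claims", ["claims_history"], "No prior claims at this address.")
chain_intact = trail.verify()
```

The chain only detects edits to records that have a successor. A production system would also need to anchor the latest digest somewhere the application cannot rewrite, which this sketch leaves out.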

Human-in-the-loop is a compliance feature

Every regime in the map converges on one rule: AI must not be the sole decision-maker. The NAIC bulletin expects human oversight and approval. The EU AI Act requires human oversight for high-risk uses. Model Act 900's reasonable-investigation standard assumes a person stands behind the determination. A vendor that pitches fully autonomous adjudication is selling you a compliance problem.

The honest framing is that the AI runs the investigation and a human SIU lead makes the call. The investigator's role shifts from execution to decision-making: instead of spending 14+ days assembling evidence, they review a complete, documented investigation and decide. That shift is what makes the model both faster and more defensible - the throughput comes from automating the assembly, and the defensibility comes from keeping judgment with a named person.
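The division of labor described above can be expressed as a simple gate in the workflow. The sketch below is illustrative only: `Finding`, `release_decision`, and every field name are hypothetical, and a real claims system would enforce this in its own workflow engine. The point it shows is that the decision released is the reviewer's, never the AI recommendation on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Output of an investigation awaiting human review (hypothetical shape)."""
    claim_id: str
    ai_recommendation: str                    # e.g. "deny", "pay", "refer"
    documented_basis: str                     # written summary of the basis
    reviewer: Optional[str] = None            # named SIU lead
    reviewer_decision: Optional[str] = None   # the call the reviewer made


class SignOffRequired(Exception):
    """Raised when a finding is not yet fit to act on."""


def release_decision(finding: Finding) -> str:
    """Return the human decision, or refuse if the gate conditions are unmet."""
    if not finding.documented_basis.strip():
        raise SignOffRequired("no documented basis for the finding")
    if not finding.reviewer or not finding.reviewer_decision:
        raise SignOffRequired("no named human sign-off")
    # The AI recommendation is context for the reviewer, not the outcome.
    return finding.reviewer_decision
```

A reviewer may agree or disagree with the recommendation. Either way the record shows who decided and on what written basis.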

Why the human step also protects against false positives

Rules-based detection runs a false-positive rate in the 60-85% range, so most raw flags are not fraud. Acting on a score without an investigation is the bad-faith risk. A human reviewing a full investigation before any adverse action is both the regulatory requirement and the practical control that keeps a carrier from denying legitimate claims on a noisy signal.
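A quick worked example shows what that range means in claim counts. It assumes, as the paragraph above does, that the false-positive rate is the share of raw flags that turn out not to be fraud. The batch size of 1,000 flags and the function name are illustrative, not figures from any carrier.

```python
def expected_fraud_among_flags(flags: int, false_positive_percent: int) -> int:
    """Flags expected to be real fraud, given the share of flags that are not."""
    if not 0 <= false_positive_percent <= 100:
        raise ValueError("false_positive_percent must be between 0 and 100")
    return flags * (100 - false_positive_percent) // 100


flags = 1000  # illustrative batch of raw detection flags
worst_case = expected_fraud_among_flags(flags, 85)  # 150 real, 850 legitimate
best_case = expected_fraud_among_flags(flags, 60)   # 400 real, 600 legitimate
```

Under these assumptions, between 600 and 850 of every 1,000 flagged claims are legitimate. That is the population exposed if anyone acts on the flag alone.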

Third-party and vendor governance: you stay responsible

When the AI is a vendor system, the compliance duty does not transfer to the vendor. The NAIC AI bulletin asks insurers to perform due diligence on third-party AI and to secure audit rights. New York 11 NYCRR 86 is explicit that an insurer remains primarily responsible for SIU work it contracts out. Colorado's Regulation 10-1-1 likewise puts vendor oversight on the insurer. So the contract has to carry the governance: documented diligence, audit rights, and clear data-handling terms.

That makes data governance and security part of the compliance review, not a separate IT concern - the overlap with SOC 2 controls is covered in SOC 2 compliance for AI fraud investigation. The practical artifacts to ask for: the vendor's data-handling and retention terms, audit rights written into the agreement, and evidence of the system's testing and validation. None of this is exotic; it is the same diligence file the NAIC bulletin describes, applied to one specific deployment.

The 7-question vendor verification checklist

This is the checklist to put in front of any AI investigation vendor before sign-off. Each question ties to a regulation in the map, names what good looks like, and shows how Hesper answers it. It is written to be used verbatim in an RFP or a vendor review.

#	Verify before approval	Regulatory basis	How Hesper maps
1	Can you produce the documented basis for every finding on demand?	CA 10 CCR 2698.36; NAIC exam records	Audit-trail-native: every phase logged with sources, reasoning, timestamps
2	Is there a human review and sign-off before any acted-on decision?	NAIC bulletin; EU AI Act; Model 900	The SIU lead makes the call; the AI is not the adjudicator
3	Is there a written AI governance program with testing and validation docs?	NAIC bulletin; CO Reg 10-1-1	Model documentation and evaluation evidence provided on request
4	Are audit rights and vendor diligence written into the contract?	NAIC bulletin; NY 86.6; CO Reg 10-1-1	Audit rights, data-handling terms, and security posture in the agreement
5	How does the output appear in our antifraud plan and a state SIU exam?	Model 680 (48 states); CA 2698.30-.43; NY 86.6	Audit-ready report attaches to the claim file in the claims system
6	Does anyone act on a raw score without an investigation?	Model 900 (reasonable investigation)	Hesper investigates the flag end-to-end; flagged-claim coverage moves from ~25% toward 100%
7	If we operate in the EU, what is the AI Act scope and conformity path?	EU AI Act Annex III / Art 6 (Aug 2, 2026)	Annex III targets life/health pricing, not P&C fraud investigation; logging and human-oversight maintained regardless

A useful tell when running this checklist: a detection vendor answers questions 1 through 5 awkwardly, because a score was never meant to carry a documented investigation, while an investigation system answers them directly. That is not a knock on detection - it is the point. Detection raises the flag; investigation resolves it. Hesper is complementary to FRISS, Shift Technology, and Verisk, not a replacement, and it sits on the downstream side where the documented-decision duty lives.
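For teams that track vendor reviews in a script or spreadsheet export, the checklist can be held as data so that open items are computed rather than remembered. This is a minimal sketch: the wording of each item is abbreviated from the table above, and `VendorAnswer` and `open_items` are hypothetical names. It treats a "yes" with no evidence pointer as still open, which is an assumption about how strict the review should be.

```python
from dataclasses import dataclass

CHECKLIST = {
    1: "Documented basis for every finding on demand",
    2: "Human review and sign-off before any acted-on decision",
    3: "Written AI governance program with testing and validation docs",
    4: "Audit rights and vendor diligence in the contract",
    5: "Output fits the antifraud plan and a state SIU exam",
    6: "No action on a raw score without an investigation",
    7: "EU AI Act scope and conformity path",
}


@dataclass
class VendorAnswer:
    question: int
    satisfied: bool
    evidence: str  # pointer to a document or contract clause


def open_items(answers, operates_in_eu: bool):
    """List checklist items that are not yet cleared with evidence."""
    required = set(CHECKLIST)
    if not operates_in_eu:
        required.discard(7)  # question 7 only applies to EU operations
    cleared = {a.question for a in answers if a.satisfied and a.evidence.strip()}
    return [CHECKLIST[q] for q in sorted(required - cleared)]


answers = [
    VendorAnswer(1, True, "Sample audit-ready report, section 2"),
    VendorAnswer(2, True, "Workflow spec: reviewer approval step"),
    VendorAnswer(3, True, ""),  # claimed, but no document supplied
]
remaining = open_items(answers, operates_in_eu=False)
```

Here `remaining` holds items 3 through 6. Item 3 stays open because the vendor's answer came with no evidence attached.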

Key takeaways

The compliance officer is the gatekeeper for AI claims investigation: the test is whether the output can be defended to a regulator, which is a higher bar than whether it works.
Done right, an AI investigation produces a stronger, more reproducible audit trail than the manual process it replaces - that is the reframe that should drive the review.
Five US regimes apply (NAIC AI bulletin, CA 10 CCR 2698.36, NY 11 NYCRR 86, and NAIC Model Acts 680 and 900), plus the EU AI Act in Europe; most govern how decisions are documented, and the AI inherits those duties.
Every regime converges on one rule - AI must not be the sole decision-maker - so human-in-the-loop sign-off is a compliance feature, and the 60-85% false-positive rate of raw detection is why acting on a score alone is the real risk.
Use the 7-question checklist: documented basis on demand, human sign-off, written governance, audit rights, antifraud-plan fit, no action on a raw score, and EU AI Act scope if relevant.

Frequently asked questions

Which regulations apply to AI fraud investigation in insurance?

In the US, the main regimes are the NAIC AI Model Bulletin (adopted by the NAIC in December 2023 and by 24 states as of March 2025), state SIU rules such as California 10 CCR 2698.36 and New York 11 NYCRR 86, and NAIC Model Acts 680 (insurance fraud prevention, 48 states) and 900 (unfair claims settlement practices). Most of these were written to govern how claims are investigated and how decisions are documented, and an AI system inherits those duties. Carriers operating in the EU also face the EU AI Act, whose high-risk obligations take effect August 2, 2026, though that classification targets life and health pricing rather than P&C fraud investigation. The common thread is a documented, defensible basis for every decision that gets acted on.

Does the NAIC AI Model Bulletin apply to AI fraud investigation?

Yes. The NAIC AI Model Bulletin covers insurer use of AI and machine learning across the insurance lifecycle, including third-party systems, and that includes fraud investigation. It sets the expectation of a written governance program, oversight and approval, validation and retesting, and vendor due diligence with audit rights. It also advises insurers that a state department may request information about an AI system during an examination. As of March 2025, 24 states had adopted the bulletin, generally without material change. For a compliance officer, the bulletin is the backbone of the review: if you can show a documented program, human oversight, testing evidence, and vendor audit rights, you have satisfied its core expectations.

What should a compliance officer verify before approving an AI investigation vendor?

Seven things. First, that the vendor can produce the documented basis for every finding on demand (California 10 CCR 2698.36). Second, that a human reviews and signs off before any acted-on decision (NAIC bulletin, EU AI Act, Model 900). Third, that there is a written AI governance program with testing and validation documentation. Fourth, that audit rights and vendor diligence are in the contract (NAIC bulletin, NY 86.6). Fifth, how the output appears in your antifraud plan and a state SIU exam (Model 680). Sixth, that no one acts on a raw detection score without an investigation (Model 900). Seventh, the EU AI Act scope if you operate in Europe. Each maps to a specific regulation, so the checklist doubles as your documentation.

Can the AI make the final decision on a claim?

No, and it should not. Every relevant regime converges on the same rule: AI must not be the sole decision-maker. The NAIC bulletin expects human oversight and approval, the EU AI Act requires human oversight for high-risk uses, and the unfair-claims-practices standard in NAIC Model Act 900 assumes a person stands behind the determination. The defensible model is that the AI runs the investigation and a human SIU lead makes the call. The investigator's role shifts from assembling evidence over 14+ days to reviewing a complete, documented investigation and deciding. Keeping judgment with a named person is what makes the deployment both faster and defensible, and it is a compliance feature to highlight, not a limitation to hide.

Where does bad-faith exposure come from when AI is used in claims investigation?

The exposure comes from acting on a signal without a documented investigation. NAIC Model Act 900 prohibits refusing or denying a claim without a reasonable investigation and requires reasonable standards for prompt investigation. Because rules-based detection runs a false-positive rate in the 60-85% range, most raw flags are not fraud - so denying or delaying a claim on a score alone is precisely the bad-faith risk. The control is a real investigation plus a human review before any adverse action. An audit-trail-native system that logs every step and produces a written basis for the finding is what lets a carrier show it conducted a reasonable investigation if the decision is later challenged in litigation or an examination.

How many states have adopted the NAIC AI Model Bulletin?

As of March 2025, 24 states had adopted the NAIC AI Model Bulletin, according to a tracking analysis by Quarles & Brady, with Wisconsin reported as the 24th adopter that month. Most states adopted it with little or no material change from the NAIC model text. The bulletin was first adopted by the NAIC in December 2023. Because adoption is ongoing, a compliance officer should confirm the current status in the specific states where the carrier writes business rather than relying on a national count. The practical point is that the bulletin's expectations - written governance, oversight, testing, and vendor diligence - are now the mainstream regulatory baseline, not an outlier, so building to them is the safe default even in states that have not formally adopted it.

How does the NIST AI RMF help a compliance officer review an AI investigation tool?

The NIST AI Risk Management Framework (AI RMF 1.0, released January 26, 2023) gives a compliance officer a recognized structure to organize the review around four functions: Govern, Map, Measure, and Manage. Govern is the cross-cutting function that asks whether there is a documented program, named ownership, and oversight. Map asks whether the system's context and impact are understood, including where it sits in the workflow and which regulations apply. Measure asks whether performance is tested and documented. Manage asks whether risks are prioritized and responded to, including the human review step and vendor audit rights. Using NIST as the spine turns a vague "is this AI safe" question into four answerable ones, and it aligns the internal review with a framework regulators already recognize.

Do Colorado's SB21-169 and Regulation 10-1-1 apply to fraud-investigation AI?

Not directly. Colorado SB21-169 (signed July 6, 2021) and its implementing Regulation 10-1-1 govern the use of external consumer data and algorithms to prevent unfair discrimination in insurance underwriting and pricing - the life-insurance regulation took effect in November 2023. They do not regulate fraud-investigation AI specifically. The reason they come up is that they are widely treated as the template for how regulators expect insurers to govern AI generally: a board-overseen governance framework, documented testing, model-drift monitoring, vendor oversight, and an annual compliance report. A compliance officer reviewing a fraud-investigation deployment should read Colorado as the direction of travel for insurer AI governance, and build the same kind of documented program, without claiming the rule classifies the fraud tool itself.