The defensibility standard for fraud investigation AI: what "AI you can defend" actually requires

$308.6B: Annual US insurance fraud loss (Coalition Against Insurance Fraud via III, 2022)

$200M: Bad-faith verdict, automated denial (Sierra Health & Life, 2022, $160M punitive)

78% vs 4%: P&C insurers using vs scaling gen AI (Bain 2025 Claims Maturity Assessment, 81 insurers)

14+ days → 2-4 hrs: Investigation time per case (Manual SIU vs Hesper, internal benchmark)

"Defensible AI" is becoming a contested phrase in insurance, and a document-review vendor is currently the loudest voice defining it - around source-linked medical summaries that hold up in litigation. That bar is correct for document review and too low for fraud investigation. A defensible document summary needs observations linked to the underlying record. A defensible fraud finding needs that plus a complete evidence chain, a reconstructable decision trail, and explicit support for the denial or referral the carrier acts on.

The gap matters because of what happens downstream. A document summary is an internal artifact; a fraud finding becomes a denial, an SIU or SAR referral, or a paid claim - each of which is litigable and regulator-reviewable in a way a chronology is not. This post defines a three-tier defensibility standard, separating document review, flag-only detection, and full investigation, and argues that only an audit-trail-native, end-to-end investigation clears the highest bar.

It is written for Marcus, the SIU Director whose top fear is an AI conclusion he cannot defend in a deposition, an examination under oath, or a SAR filing, and for Lin, the Compliance Officer who owns the antifraud-plan filing and asks whether the output satisfies California 10 CCR 2698.36 and NAIC Model Act #680. For where this sits in the broader category, start with the guide to autonomous AI claims investigation.

"Defensible AI" is being defined by document vendors - and the bar is too low for fraud

The phrase doing the most work in insurance AI marketing right now is "defensible." Document-review vendors have built the cleanest version of it: AI whose output withstands legal scrutiny because every observation connects directly to the underlying record, the methodology is consistent across files, and the report is audit-ready. Wisedocs, a medical-record summarization vendor, has put this framing in front of the market through a sponsored piece in Claims Journal and a talk at the CLM Annual Conference. Their claimed gains are real for what they do: up to an 80% reduction in document-review time, roughly a 150% capacity increase, and chronology turnaround compressed from 14 days to 2.

Those are document-review numbers, and the bar behind them is the right bar for document review. The question a medical chronology has to answer is narrow: does this observation trace to a page in the record? Source-linked, consistent, audit-ready - that clears it. The problem is that "defensible AI in claims" is being treated as if it ends there, when document review is one phase of a fraud investigation, not the whole of it. The step from fraud detection to fraud resolution is the distinction the market is collapsing.

Look at where AI in claims actually sits today. Bain's 2025 Claims Maturity Assessment of 81 P&C insurers worldwide found that 78% use generative AI in some capacity, but only 4% have scaled it, and the common live use cases are summarizing lengthy documents, supporting customer communications, and detecting potential fraud, per Bain & Company. Summarization and detection. The layer that turns a summary and a flag into a resolved, defensible finding is the one almost no carrier has scaled - and it is the layer where the defensibility question is hardest, because that is where the carrier takes an action a court can review.

The distinction the phrase hides

A source-linked summary makes the document defensible. It does not make the fraud finding defensible. The summary organizes what the documents say; it does not investigate the claim, weigh the non-document signals, or support the decision the carrier acts on. Document review is one of the 15+ phases a full investigation runs. The defensible artifact at the investigation layer is the finding plus the chain behind it, not the chronology.

Why fraud investigation carries categorically higher stakes than document review

The investigation bar is higher because the consequence of being wrong is higher. A flawed document summary produces a re-read. A flawed fraud finding produces a denial that gets litigated, an SIU or SAR referral that names a person, or a paid claim that should not have been paid. Each of those is an action the carrier takes against a policyholder or one a regulator reviews, and each is exactly the surface on which bad-faith and unfair-claims-practices exposure lives.

Start with the size of the pool the action sits on top of. Insurance fraud costs the US an estimated $308.6 billion a year, including $45 billion in property and casualty and $34 billion in workers' compensation, per the Insurance Information Institute citing the Coalition Against Insurance Fraud. Roughly 10% of P&C claims involve some element of fraud. That is the input volume that has to clear the defensibility bar at the point of decision, not merely at the point of detection. The flag tells you to look; the finding is what you act on, and the finding is what gets tested.

The clearest illustration of the downside is an automated decision made without a defensible, individualized investigation trail. In one widely reported case, a jury returned a $200 million verdict against Sierra Health & Life - $40 million in compensatory damages and $160 million in punitive - over an automated process that denied a treatment claim without considering the insurer's duty of good faith, per Matt Sharp Law. The defendant was a health insurer, but the principle is line-agnostic: an action taken without a reconstructable, individualized basis is what converts a routine denial into catastrophic exposure.

The systematic version of that principle is the Unfair Claims Settlement Practices Act, NAIC Model #900, adopted in some form across the states. It makes it an unfair practice to refuse to pay a claim without conducting a reasonable investigation, and to fail to adopt and implement reasonable standards for the prompt investigation of claims. A denial is not defensible because a document was summarized accurately. It is defensible because a reasonable investigation supported it and the carrier can reconstruct what that investigation found. That is the bar a fraud finding has to clear, and it is a different bar from the one a chronology clears.

The three-tier defensibility standard

Defensibility is not one bar; it is three. The amount of evidence, traceability, and decision support required rises as you move from organizing documents, to flagging suspicion, to resolving a claim. Treating all three as the same standard is how a tool that clears the document bar gets sold as if it clears the investigation bar.

Tier 1: Document review

The job is to organize a record and link each observation back to its source page. Document-review vendors do this well, and a source-linked, consistently formatted chronology is the right defensible artifact for the task. The same source-linking logic shows up in adjacent fintech document tools - Ocrolus and similar parsers in lending KYC - and there the bar is identical: trace the observation to the record. What this tier does not produce is an investigation. It does not reach the signals that live outside the documents, and it does not support the action the carrier takes.

Tier 2: Flag-only detection

Detection vendors - FRISS, Shift Technology, Verisk - score a claim and surface it for review. That is genuinely useful and Hesper sits downstream of all three; detection is upstream, investigation is downstream. But a score is not a defensible finding. Rules-based detection runs a 60-85% false-positive rate, so the flag is noisy by construction: it tells an investigator to look, not what is true. The model features behind the score are rarely reconstructable as a per-decision trail, and the vendor hands the flag to a human rather than supporting the action taken. Detection defensibility means "we surfaced this for review." It is a real claim, and a lower one than the finding requires.
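To make the noise concrete, here is a back-of-envelope sketch. It takes the 60-85% range quoted above at face value and reads it as the share of flags that turn out to be unfounded, which is an interpretive assumption. The batch size of 1,000 flags and the function name are illustrative, not figures from any vendor.

```python
def expected_supported_flags(flags: int, false_positive_rate: float) -> int:
    """Expected number of flags that are NOT false positives."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return round(flags * (1.0 - false_positive_rate))


BATCH = 1_000  # hypothetical number of flagged claims

# Best case and worst case of the quoted 60-85% range.
best_case = expected_supported_flags(BATCH, 0.60)
worst_case = expected_supported_flags(BATCH, 0.85)

print(f"Of {BATCH} flags, roughly {worst_case} to {best_case} hold up")
```

Under those assumptions, most flags in any batch do not survive a closer look, which is why the flag alone cannot carry the weight of a denial or referral.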

Tier 3: Full investigation

The job is to take the flag and produce the finding the carrier acts on. That requires evidence across every signal type, a reconstructable decision trail, and explicit support for the denial or referral. Hesper runs 15+ investigation phases in parallel on every flagged claim - document forensics, OSINT, statement cross-reference, timeline reconstruction, financial-pattern analysis - and logs each decision with its sources, reasoning, and timestamps, lifting coverage from the roughly 25% a manual team reaches to 100% of flagged claims. This is the only tier whose output is the artifact a carrier defends in a deposition or an examination. The table below maps the three tiers against the requirements that separate them.

Defensibility requirement	Document review	Flag-only detection	Full investigation (Hesper)
Output type	Source-linked summary / chronology	Risk score or flag	Audit-ready finding + recommendation
Observations linked to source record	Yes	Partial (model features)	Yes, across all signal types
Evidence chain beyond documents (OSINT, statements, timeline, financial pattern)	No	No	Yes (15+ phases)
Reconstructable decision trail (sources + reasoning + timestamps)	For the summary	No (score is opaque; 60-85% false positives)	Yes, logged per decision
Supports the action taken (denial / SIU / SAR referral)	No	No - hands off to human	Yes
Coverage of flagged claims	n/a (per-doc)	Flags all; investigates none	100% investigated
Meets NAIC AI Bulletin traceability / auditability for the decision	For document outputs	Gap (opaque scoring)	Yes

"Source-linked summaries make the document defensible. They do not make the fraud finding defensible. The defensible artifact at the investigation layer is the finding plus the chain behind it - the evidence across every signal, the decision trail, and the support for the action taken." - Hesper AI product research

What an audit-ready finding actually requires: evidence chain, decision trail, action support

Three components separate a defensible finding from a defensible document. Miss any one and the finding is contestable, regardless of how well-sourced the underlying summary is.

The first is a complete evidence chain across every signal the case touched. A claim is not only its documents. It is the recorded statements that may contradict the documents, the OSINT that places a claimant somewhere the claim says they were not, the timeline that does or does not hold together, and the financial patterns that connect this claim to others. A document summary covers one of those. A defensible finding has to assemble all of them into a single chain, because the question on examination is not "what did the medical records say" but "what did the investigation find, across everything available."

The second is a reconstructable decision trail. For each conclusion the investigation reaches, there has to be a record of the sources it drew on, the reasoning that connected them, and the timestamp at which it happened. This is what lets an SIU Director sit in a deposition and reconstruct, step by step, how the finding was reached - and what lets a compliance officer hand a state DOI examiner a chain they can read. Hesper logs every decision the agent makes with sources, reasoning, and timestamps for exactly this reason. The how-to companion to this is our walkthrough on how to generate an audit-ready fraud investigation report in under an hour.
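As a rough illustration of what "sources, reasoning, and timestamps for each conclusion" could look like as a data structure, here is a minimal sketch. It is not Hesper's implementation; the class names, fields, and sample entries are all hypothetical, and a production trail would also need tamper-evidence and access controls that this sketch leaves out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class DecisionRecord:
    """One conclusion plus what is needed to reconstruct it later."""
    conclusion: str
    sources: tuple
    reasoning: str
    logged_at: datetime


class DecisionTrail:
    """Append-only log: records are added, never edited or removed."""

    def __init__(self) -> None:
        self._records: List[DecisionRecord] = []

    def log(self, conclusion: str, sources: Sequence[str], reasoning: str,
            logged_at: Optional[datetime] = None) -> DecisionRecord:
        # Refuse entries that could not be reconstructed afterwards.
        if not sources:
            raise ValueError("conclusion has no sources")
        if not reasoning.strip():
            raise ValueError("conclusion has no reasoning")
        record = DecisionRecord(conclusion, tuple(sources), reasoning,
                                logged_at or datetime.now(timezone.utc))
        self._records.append(record)
        return record

    def reconstruct(self) -> List[str]:
        """Return the trail in time order, one readable line per decision."""
        ordered = sorted(self._records, key=lambda r: r.logged_at)
        return [
            f"{r.logged_at.isoformat()} | {r.conclusion} | "
            f"reasoning: {r.reasoning} | sources: {', '.join(r.sources)}"
            for r in ordered
        ]


# Made-up example entries, logged out of order on purpose.
trail = DecisionTrail()
trail.log("Refer to SIU", ["recorded_statement_02", "invoice_p3"],
          "Statement and invoice disagree on the loss date",
          datetime(2024, 1, 2, 11, 30, tzinfo=timezone.utc))
trail.log("Invoice predates reported loss", ["invoice_p3"],
          "Invoice date is earlier than the loss date on the claim form",
          datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc))
for line in trail.reconstruct():
    print(line)
```

The design choice worth noting is that a conclusion without sources or reasoning is rejected at write time, so the trail cannot contain an entry that nobody could later explain.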

The third is explicit support for the action the carrier takes. A defensible finding does not stop at organizing information; it supports the denial, the payment, or the SIU/SAR referral with the reasoning behind it. This is the component detection and document tools both leave to the human, and it is the one the Unfair Claims Settlement Practices Act puts the most weight on. The action is what gets litigated, so the action is what has to be supported.

Those three components are also what makes the output map cleanly to the carrier's compliance obligations. An evidence chain plus a decision trail plus action support is, in regulatory terms, a documented decision - which is what California 10 CCR 2698.36 requires of an SIU investigation and what an antifraud plan filed under NAIC Model Act #680 has to describe. The defensibility standard and the compliance standard are the same standard viewed from two angles.
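The three components can be read as a checklist that a finding either passes or fails. The sketch below shows one way such a check might be expressed. The field names, the signal-type labels, and the action labels are assumptions made for illustration; they are not taken from any regulation or product schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical labels for the signal types and actions named in the text.
SIGNAL_TYPES = ("documents", "osint", "recorded_statements",
                "timeline", "financial_patterns")
ACTIONS = ("denial", "payment", "siu_referral", "sar_referral")


@dataclass
class Finding:
    evidence: Dict[str, List[str]] = field(default_factory=dict)
    conclusions: List[dict] = field(default_factory=list)
    action: Optional[str] = None
    action_rationale: str = ""


def defensibility_gaps(finding: Finding,
                       signals_touched=SIGNAL_TYPES) -> List[str]:
    """List which of the three components are missing; empty means none."""
    gaps = []
    # 1. Evidence chain across every signal the case touched.
    missing = [s for s in signals_touched if not finding.evidence.get(s)]
    if missing:
        gaps.append("evidence chain missing: " + ", ".join(missing))
    # 2. Reconstructable decision trail for each conclusion.
    if not finding.conclusions:
        gaps.append("decision trail is empty")
    for i, c in enumerate(finding.conclusions):
        if not (c.get("sources") and c.get("reasoning") and c.get("timestamp")):
            gaps.append(f"conclusion {i} lacks sources, reasoning, or timestamp")
    # 3. Explicit support for the action taken.
    if finding.action not in ACTIONS or not finding.action_rationale.strip():
        gaps.append("action is not explicitly supported")
    return gaps


# A summary-only output: documents covered, nothing else.
summary_only = Finding(evidence={"documents": ["chronology_p12"]})
for gap in defensibility_gaps(summary_only):
    print(gap)
```

A source-linked summary on its own fails all three checks here, which is the point of the section: the document is covered, the decision is not.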

What the regulator already expects

The defensibility bar is not a vendor invention. It is already written into the regulatory frame, and AI does not lower it - AI has to meet it. The NAIC adopted the Model Bulletin on the Use of Artificial Intelligence Systems by Insurers on December 4, 2023, and roughly two dozen states have adopted it since. It explicitly contemplates AI deployed across "claim management" and "fraud detection," not only underwriting, which puts AI fraud decisions squarely inside the regulator's field of view.

The bulletin names the risks directly. AI "can present unique risks to consumers, including the potential for inaccuracy, unfair discrimination, data vulnerability, and lack of transparency and explainability." It asks insurers to assess models for "interpretability, repeatability, robustness, regular tuning, reproducibility, traceability, model drift, and the auditability of these measurements where appropriate." Traceability and auditability are the regulator's own words. A black-box score that cannot reconstruct its own reasoning is in tension with that expectation; an evidence chain logged per decision is built for it.

The bulletin also ties AI-driven actions back to the older model laws: an insurer's use of AI "must not violate" the Unfair Trade Practices Act (Model #880) or the Unfair Claims Settlement Practices Act (Model #900). And it lists the documentation a regulator may request during an investigation or examination of a specific AI system. The practical reading for a compliance officer is simple: when a state DOI pulls an AI-assisted case on audit, it will expect a documented, traceable, auditable basis for the decision. That is the same artifact the SIU Director needs for a deposition.

Defensibility therefore has a data-handling dimension as well as an evidentiary one. The audit trail has to be secure, the data residency has to be defensible, and the vendor has to sit cleanly inside the carrier's third-party-oversight obligations - which is why the security review and the defensibility review are the same conversation for Lin and Priya. We cover that side in detail in SOC 2 and data handling for AI fraud investigation.

Where each AI tier clears the decision-level defensibility bar (illustrative)

Observations traced to source: all tiers
Evidence chain beyond documents: investigation only
Reconstructable decision trail: investigation only
Supports the action taken: investigation only
Meets NAIC traceability for decision: investigation only
100% flagged-claim coverage: investigation only

How to evaluate a vendor's defensibility claim

When a vendor says its AI is defensible, the right response is to ask which tier they mean. "Source-linked" is necessary but not sufficient at the investigation layer, and the difference is testable in five procurement questions Marcus and Lin can put to any vendor in writing.

Does the output include a reconstructable decision trail - sources, reasoning, and timestamps for each conclusion - or only a final score or summary?
Does the evidence chain cover non-document signals (OSINT, recorded statements, timeline reconstruction, financial patterns), or only the documents fed in?
Can an investigator see what the agent did, override it, and produce a documented trail for a state DOI examination?
Does the output map to the carrier's antifraud-plan filing under NAIC Model Act #680 and to documented-decision rules such as California 10 CCR 2698.36?
Does the vendor support the action the carrier takes - the denial or the SIU/SAR referral - or does it stop at organizing information and hand off to a human?
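The five questions can also be recorded as a simple yes/no rubric so that vendor answers are comparable across an evaluation. The sketch below is one hypothetical way to do that; the keys, the sample answers, and the rule that an unanswered question counts as unmet are assumptions for illustration, not part of any formal procurement standard.

```python
from typing import Dict, List

# Short keys for the five questions above (names are illustrative).
QUESTIONS = {
    "decision_trail": "Reconstructable trail of sources, reasoning, timestamps",
    "non_document_evidence": "Evidence chain covers non-document signals",
    "investigator_override": "Investigator can see, override, and document",
    "antifraud_plan_mapping": "Output maps to antifraud-plan and documented-decision rules",
    "action_support": "Supports the denial or SIU/SAR referral",
}


def unmet_questions(answers: Dict[str, bool]) -> List[str]:
    """Return the question keys a vendor did not answer with a yes.

    A question missing from `answers` is treated as unmet.
    """
    return [key for key in QUESTIONS if not answers.get(key, False)]


def clears_investigation_bar(answers: Dict[str, bool]) -> bool:
    """True only when all five questions are answered with a yes."""
    return not unmet_questions(answers)


# Hypothetical answers from a vendor that offers a trail for its output
# but nothing beyond it; the last question was left unanswered.
sample_answers = {
    "decision_trail": True,
    "non_document_evidence": False,
    "investigator_override": False,
    "antifraud_plan_mapping": False,
}

print(unmet_questions(sample_answers))
print(clears_investigation_bar(sample_answers))
```

Treating silence as a no keeps the rubric conservative: a vendor has to affirm each capability in writing for it to count.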

A document-review vendor will answer the first question for the summary and the others with a no, and that is the correct answer for their layer. A detection vendor will answer most of them with a no and a hand-off, which is correct for theirs. Those are not failures; they are the boundary of what each tier does. The point of the questions is to surface the boundary, so a carrier does not buy a document tool or a detection score expecting an investigation artifact.

This also clarifies how the tools fit together. A carrier runs detection and investigation together - Hesper is complementary to FRISS, Shift Technology, and Verisk, not a replacement. Detection produces the flag; investigation produces the defensible finding from it. The investigator's role shifts from executing the 14+-day manual workflow to reviewing and deciding on findings produced in 2-4 hours, which is where human judgment actually belongs. Defensibility should be a line item in any procurement, and for the full rubric it sits inside, see our 12-point checklist for evaluating AI fraud investigation vendors.

Key takeaways

A source-linked document summary clears the bar for document review, but a defensible fraud finding additionally requires an evidence chain across every signal, a reconstructable decision trail, and explicit support for the action the carrier takes.
Defensibility is three tiers, not one - document review, flag-only detection, and full investigation each require progressively more, and only full investigation produces the artifact a carrier defends in a deposition or examination.
The downstream stakes are categorically higher for investigation than for document review: a $200 million bad-faith verdict against Sierra Health & Life shows what an automated denial without an individualized investigation trail can cost.
The NAIC AI Model Bulletin, adopted December 4, 2023 and now in roughly two dozen states, already expects AI used in claim management and fraud detection to be traceable and auditable, and ties it to the Unfair Claims Settlement Practices Act.
Buyers can separate the tiers with five questions - on the decision trail, the non-document evidence chain, investigator override, the antifraud-plan mapping, and support for the action taken - and treat "source-linked" as necessary but not sufficient.

Frequently asked questions

What is defensible AI fraud investigation?

Defensible AI fraud investigation produces a finding a carrier can stand behind in a deposition, an examination under oath, a state DOI examination, or a SAR/SIU referral. Three things are required beyond a source-linked document summary: a complete evidence chain across every signal type the case touched (documents, OSINT, recorded statements, timeline, financial patterns), a reconstructable decision trail that logs the sources, reasoning, and timestamps behind each conclusion, and explicit support for the action the carrier takes - the denial, the payment, or the referral. A document summary that links observations to records is necessary but not sufficient; it makes the document defensible, not the decision. The NAIC Model Bulletin on AI, adopted December 4, 2023, frames the same idea in regulatory terms, asking insurers for traceability and auditability of AI-driven measurements.

Is a source-linked document summary enough to make a fraud finding defensible?

No. A source-linked summary clears the bar for document review, where the question is whether an observation traces to the underlying record. Fraud investigation asks a harder question: was the action - denial, referral, or payment - supported by a reasonable, documented investigation? Under the NAIC Unfair Claims Settlement Practices Act (Model #900), refusing to pay a claim without conducting a reasonable investigation is itself an unfair practice, and failing to adopt reasonable standards for prompt investigation is too. A summary does not investigate; it organizes documents. A defensible fraud finding needs the summary plus the evidence chain across non-document signals and the decision trail behind the recommendation. Document review is one of the 15+ phases a full investigation runs, not the whole of it.

What does the NAIC AI Model Bulletin expect of AI used in fraud decisions?

The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, adopted December 4, 2023 and since adopted by roughly two dozen states, explicitly covers AI used in claim management and fraud detection, not just underwriting. It tells insurers that AI-driven decisions must comply with the Unfair Trade Practices Act and the Unfair Claims Settlement Practices Act, and it expects models to be assessed for interpretability, repeatability, robustness, traceability, and the auditability of these measurements. It also lists the documentation a regulator may request during an examination of a specific AI system. The practical takeaway: regulators already expect AI fraud decisions to be documented, traceable, and auditable - the defensibility bar is not optional, and AI does not lower it.

What is the difference between fraud detection and fraud investigation?

Detection answers which claims look suspicious; investigation answers what is actually true about this claim and what should be done about it. A detection score or flag is not a defensible finding - rules-based detection systems run a 60-85% false-positive rate, so a flag tells an investigator to look, not what happened. Detection vendors such as FRISS, Shift Technology, and Verisk hand a flag to a human, who then does the 14+-day manual investigation. Defensible AI investigation takes the flag and produces the finding: the evidence chain, the decision trail, and the supported recommendation, in 2-4 hours rather than weeks. Detection is upstream; investigation is downstream. The two are complementary - a carrier runs detection and investigation together - but only the investigation layer produces the artifact a carrier defends.

What is the legal risk of an AI-driven claim denial?

The risk is bad-faith litigation and unfair-claims-practices exposure when a denial is not backed by a reasonable, documented investigation. In one widely reported case, a jury returned a $200 million verdict (including $160 million in punitive damages) against Sierra Health & Life over an automated process that denied a treatment claim without considering the duty of good faith. State law modeled on the NAIC Unfair Claims Settlement Practices Act (Model #900) makes denial without reasonable investigation an unfair practice, and the NAIC AI Model Bulletin extends that duty to AI-driven decisions. The mitigation is not avoiding AI; it is using AI that produces a documented, auditable investigation trail for every decision. An AI that denies without a reconstructable evidence chain increases exposure; an audit-trail-native investigation reduces it.

How should a carrier evaluate a vendor's defensibility claim?

Start with five questions. First, does the output include a reconstructable decision trail - sources, reasoning, and timestamps for each conclusion - or only a final score or summary? Second, does the evidence chain cover non-document signals (OSINT, recorded statements, timeline reconstruction, financial patterns), or only the documents fed in? Third, can an investigator see what the agent did, override it, and produce a trail for the state DOI? Fourth, does the output map to the carrier's antifraud-plan filing under NAIC Model Act #680 and to documented-decision rules such as California 10 CCR 2698.36? Fifth, does the vendor support the action taken - the denial or SIU/SAR referral - or stop at organizing information? Source-linked is necessary but not sufficient; defensibility lives in the chain behind the finding.