Hesper AI
2026 Industry Report

The State of Insurance Fraud Detection in 2026

Forty pages of benchmarks, architecture analysis, and ROI math on how US P&C carriers are closing the fraud investigation gap with autonomous AI agents - and what the new operating standards look like.

$308B
Annual US insurance fraud losses
Coalition Against Insurance Fraud
10%
Of P&C claims involve fraud
CAIF, industry estimates
25%
Of flagged claims fully investigated
Operational benchmarks
14+
Days per manual SIU investigation
Operational benchmarks
200+
Cases per investigator caseload
P&C carrier SIU norms
60-85%
False positive rate in rules-based detection
SIU confirmation rates
Published April 2026 · Hesper AI Threat Research · gethesperai.com/reports

Contents

  1. Executive summary
  2. Part 1: The scale of the problem
  3. Part 2: Why detection alone isn't working
  4. Part 3: The SIU capacity crisis
  5. Part 4: The autonomous AI shift
  6. Part 5: New benchmarks for 2026
  7. Part 6: The economics and ROI math
  8. Part 7: The buyer's evaluation framework
  9. Methodology and sources
  10. About Hesper AI
00 /

Executive summary

The US insurance industry loses an estimated $308 billion per year to fraud - a figure that has grown alongside claim volumes, AI-generated documents, and rising organized fraud ring activity. The industry's response has been substantial: every major carrier runs a detection platform (FRISS, Shift Technology, Verisk, or internal rules engines) and maintains a Special Investigations Unit. Yet approximately 75% of claims those systems flag as suspicious are never fully investigated.

The shortfall is not about intent or resourcing in the conventional sense. Carriers are not underfunding SIU. The problem is architectural: rules-based detection generates alerts at a rate that exceeds the capacity of manual investigation by an order of magnitude. The result is a structural backlog that grows every quarter.

This report documents what autonomous AI investigation agents are changing about that equation - how they shift throughput from ~10 investigations per investigator per month to 800+, how they close the coverage gap from 25% to 100% of flagged claims, and what the new buyer evaluation framework looks like for enterprise carriers selecting technology in this category.

Key findings are summarized below. The full analysis, with citations and operating benchmarks by carrier size and line of business, follows in Parts 1 through 7.

Report at a glance

  • Detection precision is capped. Rules-based fraud platforms run at 60-85% false positive rates by design - they are optimized for recall, not precision.
  • Capacity, not detection, is the binding constraint. Manual SIU investigations take 14+ days per case. At typical investigator caseloads of 200+, this limits throughput to ~10 investigations per investigator per month.
  • Only 25% of flagged claims receive full investigation. The remaining 75% close with abbreviated review or no investigation.
  • Autonomous AI investigation agents close the gap. Investigation time compresses from 14+ days to 2-4 hours; throughput rises to 800+ cases per investigator per month; coverage approaches 100% of flagged claims.
  • Cost per investigation drops ~94%. From ~$2,500 per manual case to ~$150 per AI-augmented case on a fully loaded basis.
  • Evaluation requires a new framework. Signal density, evidence citations, deployment timeline, regulatory posture, and outcome-based metrics matter more than alert-quality metrics in the new category.
01 /

The scale of the problem

Insurance fraud is a large, stable category that is becoming less stable. The headline number - approximately $308 billion in annual US losses per the Coalition Against Insurance Fraud - captures only the detected portion. The uncaptured portion is growing as AI-generated fraud tools commoditize at scale.

Annual losses by line of business

Fraud distribution across lines is not uniform. Insurance is the single largest category of document fraud globally, with property, auto, and workers' compensation accounting for the majority of P&C fraud losses. Measured separately, medical / health insurance fraud is larger than any individual P&C line, at an estimated $300 billion annually.

Estimated US insurance fraud losses by sector ($B/year)

Health / medical: ~$300B
P&C (all lines): ~$308B
Life insurance: ~$75B
Property claims: ~$45B
Auto claims: ~$29B
Workers' comp: ~$7.2B

Source: Coalition Against Insurance Fraud; NICB; industry estimates aggregated by Hesper AI Threat Research, April 2026.

The AI-generated fraud surge

The fastest-growing category of claim fraud in 2026 is AI-generated: synthetic documents, deepfake images, and fabricated medical records produced with commodity generative AI tools. Deepfake-involved insurance claims are up approximately 2,137% over the last three years per Hesper internal threat research. Producing a convincing fake document previously required a graphic designer; it now requires a smartphone and a reference template.

This shift has two consequences for carriers. First, the detection tools built around the prior threat model - OCR validation, rule-based signatures, manual visual inspection - miss the new failure mode. Second, the volume of fake-document production has outpaced the capacity of traditional forensic review, making document-level pixel forensics a floor requirement rather than a specialty capability.

The detected vs. uncaptured gap

Detected fraud is a small fraction of total fraud. Industry estimates put the detection rate at 22-41% depending on line of business. In practice, most carriers know their detection rate is lower than they would like but are constrained by investigation capacity - detection without follow-through is an accounting entry, not a recovery.

For the purposes of this report, the uncaptured portion is not the interesting question. The interesting question is: of the claims that are flagged as suspicious, how many actually receive investigation? That is the coverage gap that Part 3 quantifies.

02 /

Why detection alone isn't working

Rules-based fraud detection platforms are effective at what they were designed to do - surface suspicious claims at scale. They are not effective at what the industry implicitly expects them to do - resolve cases. Understanding the architectural difference between detection and investigation is the prerequisite for every technology decision in this category.

How rules-based detection works

A fraud scoring platform ingests a claim and evaluates it against a library of red-flag rules - late reporting, high loss amount, recent policy inception, prior claim history, network overlap with known fraudsters, provider pattern matches. Each rule contributes a weighted signal. Signals are combined into a risk score, typically on a 0-100 scale. Claims above a configured threshold generate alerts that route to the SIU queue.
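
To make the pipeline concrete, here is a minimal Python sketch of a rules-based scorer. The rule names, weights, and threshold are illustrative assumptions, not any vendor's actual configuration; production platforms evaluate ~30 tuned signals rather than four.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    days_to_report: int        # days between loss and first notice
    loss_amount: float         # claimed loss in USD
    days_since_inception: int  # policy age at date of loss
    prior_claims: int          # claimant's prior claim count

# Illustrative red-flag rules as (name, predicate, weight) triples.
# Real platforms evaluate ~30 such signals; these values are assumptions.
RULES = [
    ("late_reporting",   lambda c: c.days_to_report > 30,       18),
    ("high_loss_amount", lambda c: c.loss_amount > 25_000,      22),
    ("recent_inception", lambda c: c.days_since_inception < 60, 25),
    ("prior_claims",     lambda c: c.prior_claims >= 3,         15),
]

ALERT_THRESHOLD = 40  # configured per carrier; scores above it route to SIU

def score_claim(claim: Claim) -> tuple[int, list[str]]:
    """Combine weighted rule hits into a 0-100 risk score."""
    hits = [(name, weight) for name, pred, weight in RULES if pred(claim)]
    score = min(100, sum(weight for _, weight in hits))
    return score, [name for name, _ in hits]

score, fired = score_claim(Claim(45, 31_000, 20, 1))
if score >= ALERT_THRESHOLD:
    print(f"Alert: score={score}, signals={fired}")  # -> routed to the SIU queue
```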

This pipeline is fast, interpretable, auditable, and regulator-friendly. It is also the only architecture most carriers have for fraud detection at scale. FRISS, Shift Technology, Verisk, and ISO ClaimSearch all implement variations of this model, with varying levels of sophistication in the network-analysis layer and the rule-tuning interface.

The structural false positive rate

The consequence of rules-based design is a high false positive rate. A detection platform evaluating ~30 signals per claim cannot reliably distinguish "unusual" from "fraudulent." Many legitimate claims share characteristics with fraudulent ones - late reporting, high loss amount, recent policy inception, prior claims history. The signals are correlated with fraud but not diagnostic of it.

Most SIU teams we have spoken to report confirmation rates of 15-40% on referred cases, meaning 60-85% of alerts do not result in confirmed fraud when fully investigated. This is not a defect in any specific vendor's product. It is a ceiling imposed by the category: signal density is limited, the scoring model optimizes for recall, and the distinguishing evidence lives in documents, statements, and public records that the detection platform does not examine.

The detection system is designed to find every suspicious claim. The SIU is designed to investigate some of them. The gap between those two designs is the capacity crisis.

Signal density comparison

Category | Signals per claim | Output | Intent
Detection (rules-based) | ~30 | Risk score + alert | Triage
Detection + analytics (FRISS, Shift) | 30-50 | Risk score + explainability | Prioritization
Autonomous AI investigation | 200+ | Investigation report + evidence | Resolution

The signal-density delta (6-7x) is the mechanical difference between "finding suspicious claims" and "investigating them." Autonomous investigation adds document forensics, medical record analysis, OSINT, statement cross-referencing, and timeline reconstruction to the baseline detection signal set.

03 /

The SIU capacity crisis

Most US states require insurance carriers to maintain a Special Investigations Unit, and in practice every major carrier operates one. Yet the aggregate capacity of these units is structurally unable to keep pace with the detection systems that feed them. The numbers behind this imbalance are stable across carrier tiers and lines of business.

The per-investigator math

Metric | Industry benchmark | Source
Investigator caseload (active) | 200+ cases | P&C carrier SIU norms
Investigations closed / month | ~10 | CAIF surveys, carrier data
Avg time per investigation | 14+ days (60+ for complex) | Operational benchmarks
Cost per investigation | ~$2,500 (fully loaded) | Investigator time + vendors
Time on analysis / decision | ~12% of investigator workload | Internal time studies
Time on evidence gathering | ~35% | Internal time studies
Time on report writing | ~25% | Internal time studies
Time on admin / coordination | ~28% | Internal time studies

The coverage gap

Apply these numbers to a mid-size regional carrier: 10 SIU investigators, 2,000 flagged claims per month. Manual capacity is 10 investigators × ~10 investigations/month = 100 investigations. The remaining 1,900 flagged claims per month (~95% of detection volume) either receive abbreviated review or no review at all. Annualized: ~22,800 flagged claims per year that the detection system correctly surfaced but the investigation workflow could not absorb.

Monthly investigation capacity vs. flagged volume (mid-size regional carrier, illustrative)

Flagged claims / month: 2,000
Manual investigation capacity: ~100
Closed without full investigation: ~1,900

Top-20 carriers face the same ratio at larger absolute scale. A carrier processing 1 million claims per year at a 10% flag rate generates 100,000 referrals. With 50 investigators at ~10 investigations per month per investigator, annual capacity is ~6,000 investigations - 6% of flagged volume. The structural gap is the same; the absolute numbers are higher.
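
The same arithmetic as a minimal Python sketch; the two profiles are the illustrative carriers above, not measured deployments.

```python
def coverage_gap(flagged_per_month: float, investigators: int,
                 closed_per_investigator: float = 10.0) -> dict:
    """Monthly manual investigation capacity vs. flagged claim volume."""
    capacity = investigators * closed_per_investigator
    uninvestigated = max(0.0, flagged_per_month - capacity)
    return {
        "capacity_per_month": capacity,
        "uninvestigated_per_month": uninvestigated,
        "coverage_pct": round(100 * capacity / flagged_per_month, 1),
        "uninvestigated_per_year": 12 * uninvestigated,
    }

# Mid-size regional carrier: ~5% coverage, ~22,800 uninvestigated/year
print(coverage_gap(flagged_per_month=2_000, investigators=10))

# Top-20 carrier: 1M claims/year at a 10% flag rate ≈ 8,333 flagged/month,
# 50 investigators -> ~6% coverage
print(coverage_gap(flagged_per_month=8_333, investigators=50))
```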

Why more investigators doesn't scale

The obvious response to a capacity gap is more investigators. Three factors make this difficult. First, qualified SIU investigators are a constrained labor market with meaningful training and certification barriers. Second, the fully loaded cost of an SIU hire ($100-200K annually) scales poorly against the value per investigation. Third, investigator productivity is rate-limited by the workflow, not by hours-in-the-day. An investigator working 200 active cases is constrained by context-switching and evidence-gathering latency, not calendar hours.

This is why the capacity constraint has been stable for a decade despite persistent carrier efforts to expand SIU. The workflow imposes a per-investigator throughput ceiling that adding headcount cannot meaningfully raise.

04 /

The autonomous AI shift

Autonomous AI investigation agents address the workflow ceiling directly. Rather than adding another layer of scoring or an additional alert source to the SIU queue, they automate the evidence-gathering, analysis, and report-generation stages of the investigation itself - the work that consumes approximately 88% of an SIU investigator's time under manual workflows.

Architecture: 15+ investigation phases in parallel

A properly designed autonomous investigation agent decomposes a claim into 15 or more investigation phases and runs them concurrently against the relevant data sources; a minimal sketch of this fan-out pattern follows the phase list below. For a flagged auto injury claim, this includes:

  • Document forensics - pixel-level analysis of claim documents, medical records, repair estimates, and photographs
  • Medical record analysis - reviewing treatment against injury mechanism, flagging inconsistencies, identifying upcoding patterns
  • Database cross-referencing - NICB, ISO ClaimSearch, state DMV, NMVTIS, prior claims history
  • Public records and OSINT - court records, social media, business filings, property records
  • Statement analysis - recorded statements and EUOs cross-referenced against submitted documentation
  • Financial analysis - loss calculation, billing pattern analysis, motive indicators
  • Timeline reconstruction - chronological narrative from all sources
  • Network analysis - relationships between claimants, providers, attorneys, and known fraud rings
  • Report generation - structured investigation report with citations and confidence scores
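
Below is a minimal sketch of the fan-out pattern described above. The phase functions are hypothetical placeholders - Hesper's actual phase implementations, data-source integrations, and orchestration logic are not shown here - and a production agent would run 15+ phases rather than four.

```python
import concurrent.futures

# Hypothetical phase functions: each takes a claim record and returns
# findings with citations. Names are illustrative, not an actual API.
def document_forensics(claim: dict) -> dict: ...
def medical_record_analysis(claim: dict) -> dict: ...
def database_cross_reference(claim: dict) -> dict: ...
def osint_search(claim: dict) -> dict: ...

PHASES = [document_forensics, medical_record_analysis,
          database_cross_reference, osint_search]  # 15+ in practice

def investigate(claim: dict) -> dict:
    """Run all investigation phases concurrently and collect their findings."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(PHASES)) as pool:
        futures = {pool.submit(phase, claim): phase.__name__ for phase in PHASES}
        findings = {futures[f]: f.result()
                    for f in concurrent.futures.as_completed(futures)}
    # Downstream stages (timeline reconstruction, report generation) consume
    # `findings`; the deliverable is a cited report, not a score.
    return findings
```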

The output is not a risk score or an alert. The output is an investigation-ready report that an SIU investigator can review, adjust, and sign off on. The investigator's role shifts from performing the investigation to reviewing the findings - from execution to decision-making.

What stays human

Three parts of the workflow remain with human investigators: the fraud determination itself (required under state DOI rules and the NAIC model SIU regulation), the SAR filing and regulatory reporting (a human-signed filing), and testimony in any contested claim denial or criminal prosecution. AI investigation agents produce the evidence package and the recommendation; the investigator decides and represents.

This is not a technical limit but a regulatory and operational feature. Autonomous claim denial is non-compliant under existing state DOI rules, and vendors who describe fully autonomous fraud decisioning are either marketing poorly or operating outside compliance.

Before and after, stage by stage

Investigation stage | Manual duration | Autonomous AI duration | Compression
Referral & triage | 2-4 hours | Minutes | ~95%
Case planning | 2-4 hours | Minutes (auto-generated plan) | ~95%
Evidence gathering | 5-15 days | 2-4 hours (parallel) | ~95%
Analysis & findings | 1-3 days | Included in evidence stage | ~98%
Report generation | 4-8 hours | Auto-generated + 30-60 min review | ~90%
Resolution & SAR | 4-8 hours | Unchanged (human decision) | 0%
05 /

New benchmarks for 2026

The 2020-2025 SIU benchmarks were stable because the underlying workflow was stable. Autonomous AI investigation fundamentally changes the workflow, which means the benchmarks change. SIU leaders rebuilding capacity plans for 2026 and beyond need new numbers.

Metric | 2020-2025 benchmark | 2026 with AI investigation | Change
Investigator caseload | 200 active | 200+ active (review-oriented) | Unchanged
Cases closed / month | ~10 | 800+ | ~80x
Investigation coverage | 25% of flagged | 100% of flagged | 4x
Time per investigation | 14+ days | 2-4 hours | ~95% faster
Confirmed fraud / 100 flagged | ~14 | ~72 | 5x (via coverage)
Cost per investigation | ~$2,500 | ~$150 | ~94% lower
Recovery rate | 3-8x cost | 5-12x cost | Modest lift
Investigator time on analysis | ~12% of workload | ~80% of workload | ~7x

The throughput and cost changes are the headline. The confirmed-fraud-rate change is more subtle: AI investigation does not make detection more precise, but it removes the capacity ceiling that previously limited investigation coverage. Cases that were closed without investigation under manual workflows now receive investigation, and a meaningful minority of those turn out to involve actual fraud. The base of confirmed fraud widens as a direct consequence of closing the coverage gap.

Benchmarks by carrier size

Carrier size | Annual flagged volume | Manual capacity | AI-augmented capacity
Top-20 P&C | 50K-200K+ | 5K-15K investigations | 50K-200K+ (full coverage)
Mid-size regional | 5K-20K | 500-2K | 5K-20K (full coverage)
Small / specialty | 500-5K | 50-300 | 500-5K (full coverage)

Absolute numbers scale with carrier size; the ratios are stable. Smaller carriers face a disproportionately steep economic hurdle to manual SIU expansion, which is why the AI-augmented benchmark has its greatest impact in the mid-market and specialty lines.

06 /

The economics and ROI math

The economic case for autonomous AI investigation is driven by three compounding effects: direct cost reduction per investigation, claim leakage recovery from closing the coverage gap, and the downstream effect of rising detection precision as completed investigations feed back into rule tuning.

Direct cost savings

Per-investigation cost drops from approximately $2,500 (manual, fully loaded) to approximately $150 (AI-augmented). For a mid-size carrier processing 2,000 flagged claims per month (the arithmetic is repeated in a short sketch after the list):

  • Manual investigation cost (at 25% coverage): 500 investigations × $2,500 = $1.25M / month = $15M / year
  • AI investigation cost (at 100% coverage): 2,000 investigations × $150 = $300K / month = $3.6M / year
  • Direct cost differential: ~$11.4M / year in favor of AI investigation, at 4x investigation volume
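
The same arithmetic in a few lines of Python, using the illustrative figures above:

```python
flagged_per_month = 2_000
manual_annual = 0.25 * flagged_per_month * 2_500 * 12  # 25% coverage, $2,500/case
ai_annual     = 1.00 * flagged_per_month *   150 * 12  # 100% coverage, $150/case
print(manual_annual, ai_annual, manual_annual - ai_annual)
# -> 15,000,000  3,600,000  11,400,000 (the ~$11.4M/year differential)
```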

Claim leakage recovery

The larger economic effect is claim leakage recovery from closing the coverage gap. Flagged claims that close without investigation under manual workflows include a meaningful fraction of actual fraud. A conservative industry estimate: 15-20% of uninvestigated flagged claims involve actual fraud, with average claim inflation of 20-40% of claim value.

Apply to the same mid-size carrier: 1,500 uninvestigated flagged claims per month × $8,000 average claim value × 17.5% fraud rate × 30% average inflation = ~$630K per month in leakage = ~$7.6M annual. Carriers recovering this leakage through expanded investigation coverage typically see payback on AI investigation deployment within the first year, often within the first quarter for mid-size and large carriers.

Leakage recovery formula

Annual leakage recovered =
(flagged claims/year × coverage gap %)
    × (fraud rate among uninvestigated)
    × (avg claim value) × (avg inflation %)

For most mid-size carriers this resolves to $5-25M per year. For top-20 carriers, $50-200M+.
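
A minimal Python version of the formula, evaluated on the illustrative mid-size inputs from this section:

```python
def annual_leakage_recovered(flagged_per_year: float, coverage_gap_pct: float,
                             fraud_rate: float, avg_claim_value: float,
                             avg_inflation: float) -> float:
    """Annual leakage recovered per the formula above."""
    return (flagged_per_year * coverage_gap_pct) * fraud_rate \
        * avg_claim_value * avg_inflation

# 2,000 flagged/month = 24,000/year; 75% coverage gap; 17.5% fraud rate
# among uninvestigated; $8,000 avg claim value; 30% avg claim inflation.
print(annual_leakage_recovered(24_000, 0.75, 0.175, 8_000, 0.30))
# -> 7,560,000.0 (≈ $7.6M/year, matching the worked example in this part)
```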

07 /

The buyer's evaluation framework

Evaluation frameworks from the rules-based detection era do not transfer cleanly to autonomous investigation. Procurement teams need a new set of criteria. The framework below is organized around four categories: technical, operational, compliance, and economic.

Technical criteria

  1. Scope and output type. Ask what the system produces: a risk score or an investigation-ready report. Risk score = detection. Cited report with findings and recommendation = investigation. These are different categories.
  2. Signal density per case. Rules-based detection evaluates ~30 signals. Autonomous investigation should evaluate 200+. Ask for the full list of data sources and signals per case type.
  3. Evidence and citations. Every finding in the report should trace to a specific document, statement, database query, or public record. Black-box scoring without citations is a red flag - not because the underlying model is wrong, but because carriers cannot defend claim denials on unverifiable output.
  4. Integration path. The system should sit downstream of the existing detection stack (FRISS, Shift, Verisk, ISO ClaimSearch) via standard APIs, not require replacement. Integration timelines of 30-90 days are achievable with modern agents; 6-18 months suggests legacy-style deployments.

Operational criteria

  1. Investigation coverage commitment. The headline metric. Target: 95-100% of flagged claims receive full investigation within 24 hours of flagging.
  2. Throughput per investigator. Target: 800+ cases per investigator per month in a review-oriented role.
  3. Deployment timeline. 30-90 days is the modern standard. Ask for reference deployments with specific milestone schedules.
  4. Report quality and format. Audit-ready output means structured sections, full citations, timeline reconstruction, and a denial/payment recommendation - not a data dump for the investigator to rewrite.

Compliance criteria

  1. Data retention and privacy. Zero-retention architectures are the enterprise default. PHI and PII handling must be documented and auditable. SOC 2 Type II certification (or documented pursuit) is a minimum.
  2. Regulatory posture. State DOI rules and the NAIC model SIU regulation require human decision-making on fraud determinations. Any vendor with a workflow that includes autonomous claim denial is non-compliant.
  3. Audit trail. If a regulator or litigant requests the investigation record, the carrier must be able to produce the full chain - signals evaluated, sources queried, findings surfaced, decisions made.

Economic criterion

  1. Measurement. Detection metrics (precision, recall, alert volume) do not measure investigation value. Investigation metrics: confirmed fraud per flagged case, investigation coverage rate, average time-to-close, claim leakage reduction, recovery rate, SAR filing accuracy.
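
As an illustration, a carrier could compute these investigation metrics from its own pilot case records with a short script like the sketch below; the record fields are assumptions about what a claims system might export, not a defined schema.

```python
from statistics import mean

def investigation_metrics(cases: list[dict]) -> dict:
    """Outcome metrics over flagged-claim records. Assumed fields per case:
    investigated (bool), confirmed_fraud (bool), hours_to_close (float),
    leakage_avoided (float, USD)."""
    flagged = len(cases)
    investigated = [c for c in cases if c["investigated"]]
    confirmed = [c for c in investigated if c["confirmed_fraud"]]
    return {
        "coverage_rate": len(investigated) / flagged,
        "confirmed_per_flagged": len(confirmed) / flagged,
        "avg_hours_to_close": mean(c["hours_to_close"] for c in investigated),
        "leakage_avoided_usd": sum(c["leakage_avoided"] for c in confirmed),
    }

pilot = [
    {"investigated": True,  "confirmed_fraud": True,  "hours_to_close": 3.0, "leakage_avoided": 2_400.0},
    {"investigated": True,  "confirmed_fraud": False, "hours_to_close": 2.5, "leakage_avoided": 0.0},
    {"investigated": False, "confirmed_fraud": False, "hours_to_close": 0.0, "leakage_avoided": 0.0},
]
print(investigation_metrics(pilot))
```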

Red flags in vendor claims

  • "AI-powered" without a clear answer to "what does it do?" → probably detection in a new wrapper
  • Unwillingness to show a sample investigation report on a similar claim
  • "Replaces your SIU team" claims → not regulator-compliant
  • Deployment timelines of 6+ months for what should be data-in / report-out
  • Data retention beyond the active investigation without documented reason
  • Inability to provide reference deployments with measurable coverage and throughput outcomes
08 /

Methodology and sources

This report is based on a combination of public industry data and Hesper AI's internal threat research. Operational benchmarks are drawn from conversations with SIU directors, claims executives, and procurement leaders at US P&C carriers across the top-20, mid-size regional, and specialty segments. External statistics are attributed to their primary sources throughout.

Primary sources

  • Coalition Against Insurance Fraud - annual fraud loss estimates (insurancefraud.org)
  • National Insurance Crime Bureau (NICB) - claim pattern data and fraud ring statistics
  • National Association of Insurance Commissioners (NAIC) - regulatory framework and SIU model regulation
  • Association of Certified Fraud Examiners (ACFE) - Report to the Nations and fraud taxonomy
  • Insurance Information Institute (III) - industry-level statistics and trend analysis
  • State Departments of Insurance - SIU filing data where publicly available
  • Hesper AI Threat Research - internal operational benchmarks from deployed AI investigation agents, Q1 2026

Terms and definitions

Throughout this report, "fraud detection" refers to automated systems that flag suspicious claims for SIU review. "Fraud investigation" refers to the downstream process of confirming whether fraud occurred, gathering evidence, and producing a report that supports a claim decision. "Autonomous AI investigation" refers to systems that automate the evidence-gathering, analysis, and report-generation stages of investigation while preserving human decision authority on the final fraud determination.

Limitations

Operational benchmarks vary by carrier size, line of business, and claim mix. The numbers in this report are central tendencies and should be adjusted for specific carrier context during planning. ROI calculations in Part 6 use illustrative carrier profiles; actual carrier economics depend on flagged volume, average claim value, fraud rate, and current SIU cost structure.

09 /

About Hesper AI

Hesper AI builds autonomous claims investigation agents for insurance companies. Our platform sits downstream of detection systems (FRISS, Shift Technology, Verisk, or internal rules engines) and runs 15+ investigation phases in parallel on every flagged claim - document forensics, medical record analysis, OSINT, database cross-referencing, statement analysis, timeline reconstruction, network analysis, and report generation - producing investigation-ready output in 2-4 hours per case.

We are based in the US, founded in 2024, and support all major P&C lines of business plus specialty, life-and-health, and consumer lines. Our deployment model is API-integrated with existing claims management systems and does not require replacement of current detection or claims stacks.

Ready to see Hesper on your claims?

Bring a sample of your flagged claims from any line of business. We'll run them through the full pipeline and show you the investigation reports.

Request a demo →
Explore use cases

© 2026 Hesper AI. Published April 19, 2026. All data sources cited inline. Redistribution permitted with attribution.