The CFO ROI memo: making the case for AI claims investigation

Cost per investigated case: ~$2,500 → ~$150 (manual SIU vs AI, Hesper internal benchmark)

Flagged-claim coverage: ~25% → 100% (the loss-cost lever, manual vs AI)

Leakage as % of claims spend: 7-14% (EY claims quality assessments)

Annual US insurance fraud: $308B (Coalition Against Insurance Fraud)

A finance reviewer can sign off on AI claims investigation using three numbers and nothing about how the agent works: the cost per investigated case, the size of the coverage gap it closes, and the payback period. The mechanics of document forensics and OSINT are an engineering detail. The dollars are not. This is the memo a Claims VP or SIU Director hands up the chain for financial sign-off, and it is written so the finance reviewer can defend each line in a board setting.

The argument runs against a common instinct, which is to build the case on headcount savings. That is the weakest version, and it invites organizational resistance. The durable return sits somewhere else: most carriers fully investigate only about 25% of the claims they flag, and the other roughly 75% is where leakage concentrates. Closing that gap toward 100% is the financial event. Speed matters because it is what makes full coverage affordable, but coverage is the lever, not speed alone.

This piece covers the one-page summary, where the money leaks, the cost of inaction, the unit economics of an investigated case, how to model the business case, payback under three scenarios, build-versus-buy, and a pilot structure that funds proof rather than a leap. For the worked carrier scenarios behind the payback math, pair this with the ROI case studies for AI claims investigation, and for the loss-cost backdrop see the claims fraud and leakage pillar.

The one-page version

Hesper AI takes a flagged claim and runs the full SIU playbook end-to-end. The category is fraud resolution, not fraud detection, and the move from detection to resolution is what the memo is funding. The carrier has almost certainly already paid for detection. A finance reviewer is being asked to fund the layer that turns those flags into resolved cases.

The three numbers that carry the memo are these. First, an AI-investigated case runs about $150 versus roughly $2,500 manually, a Hesper internal benchmark. Second, coverage of flagged claims moves from about 25% to 100%, which is the line item that converts the spend into recovered and denied dollars. Third, payback follows from the gap between small incremental opex and the recoverable leakage that EY puts at 7% to 14% of total claims spend. Everything below is the supporting detail behind those three lines.

The distinction the memo turns on

The dollar leak is not slow investigations - it is the claims a carrier flags and then never fully works. Detection is upstream; investigation is downstream. A faster process on the roughly 25% of flags a team already reaches is a marginal gain. Investigating the roughly 75% it does not reach is the structural one. Frame the recovery story against the coverage gap, never against per-case speed in isolation.

Where the money actually leaks

Start with the size of the pool. Insurance fraud costs the US an estimated $308 billion a year, with $45 billion in property and casualty and $34 billion in workers compensation, per the Coalition Against Insurance Fraud. The Insurance Information Institute corroborates the same figures. Roughly 10% of property-casualty claims may involve fraud, per the National Insurance Crime Bureau. That is the industry pool. The carrier-specific leak is a subset of it, and it has a precise location.

EY's claims quality assessments put leakage at approximately 7% to 14% of a carrier's total claims spend, and one of EY's four named root causes is inadequate investigation of injury causation and liability, per EY. That root cause is the one a finance reviewer can act on directly. It maps to a capacity constraint, not a detection failure: the flags exist, but most of them are never fully investigated. For the full breakdown of where leakage hides on a claim file, see how carriers reduce claims leakage.

Here is the structural mechanism. Detection vendors flag suspicious claims, and rules-based systems do so with false-positive rates of 60-85%, so the flag pile is large and noisy. A manual SIU then investigates roughly 25% of those flags, because each case takes 14+ days and an investigator carries 200+ cases. The remaining roughly 75% are paid, denied without full work, or queued indefinitely. The leakage concentrates in that uninvestigated tail. The memo is not asking finance to fund faster detection. It is asking finance to fund the investigation capacity that converts the existing flag pile into resolved cases.

The cost of inaction

Doing nothing is not a zero-dollar line. It is the leakage that compounds on every uninvestigated flag, period after period, plus the stranded value of detection spend that produces flags no one can work. The carrier is already paying a detection vendor to surface suspicious claims. If only about 25% of those flags are ever fully investigated, the carrier is buying flags it cannot act on. That is the financial waste hiding in plain sight, and it does not appear as a line item anywhere on the income statement.

Quantify it as a formula the finance reviewer fills in, not as a Hesper-published savings number. Take the carrier's annual flagged-claim volume. Multiply by the share currently uninvestigated, which is roughly 75% on a 25% coverage baseline, and by the carrier's average paid amount on a flagged claim so the result is in dollars rather than claim counts. Apply a recovery-and-denial rate anchored to EY's 7% to 14% leakage band. The result is the illustrative annual leak sitting in the uninvestigated tail. This is illustrative arithmetic on the carrier's own inputs, not a guaranteed outcome. Each policyholder already absorbs about $900 a year in added premium from fraud, per the Coalition Against Insurance Fraud, which is the downstream cost of the same untreated leak.
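The fill-in formula can be sketched in a few lines of Python. This is a minimal sketch, not a Hesper model: the function name, the `avg_paid_per_claim` input (needed to turn a claim count into dollars), and every number in the example call are hypothetical placeholders for the carrier's own figures.

```python
def uninvestigated_leak(flagged_claims, coverage, avg_paid_per_claim, recovery_rate):
    """Illustrative annual dollars at risk in the uninvestigated tail."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a share between 0 and 1")
    # Flags that are never fully worked on the current coverage baseline.
    uninvestigated_claims = flagged_claims * (1.0 - coverage)
    return uninvestigated_claims * avg_paid_per_claim * recovery_rate


# Hypothetical carrier inputs; replace with your own book.
inputs = dict(flagged_claims=10_000, coverage=0.25, avg_paid_per_claim=20_000)

# Bracket the answer with the low and high ends of EY's 7%-14% band.
low = uninvestigated_leak(recovery_rate=0.07, **inputs)
high = uninvestigated_leak(recovery_rate=0.14, **inputs)
print(f"Illustrative annual leak: ${low:,.0f} to ${high:,.0f}")
```

Running both ends of the band, rather than a midpoint, keeps the output a range the reviewer can defend instead of a single promised figure.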

The unit economics of an investigated case

The reason full coverage was never affordable is unit cost. A manual SIU investigation runs about $2,500 per case over a cycle of 14+ days, and one investigator completes around 10 investigations per month. At that cost and throughput, investigating 100% of flags would require either a headcount bill the loss-recovery story cannot justify or a backlog that grows faster than the team clears it. So carriers ration investigation down to the roughly 25% a human team can reach. The rationing is an economic decision forced by the per-case cost, not a quality choice.

AI investigation changes the inputs. Hesper runs 15+ investigation phases in parallel - document forensics, OSINT, statement cross-referencing, timeline reconstruction, financial-pattern analysis - on every flagged claim, so per-case attention stops being the bottleneck. That compresses a case to 2-4 hours at about $150, and lifts throughput to 800+ cases per investigator per month. The order-of-magnitude drop in per-case cost is precisely what makes investigating 100% of flags affordable instead of rationing down to 25%. Coverage and cost are the same lever viewed from two sides.

Metric	Manual SIU	AI investigation (Hesper)
Cost per investigated case	~$2,500	~$150
Cycle time per case	14+ days	2-4 hours
Throughput per investigator / month	~10 cases	800+ cases
Flagged-claim coverage	~25%	100%
Investigation phases	Sequential, one analyst	15+ in parallel
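To see why the manual figures force rationing, the table's benchmarks can be run against a flag volume. The sketch below assumes a hypothetical 12,000 flags a year and reads "~10" and "800+" as exactly 10 and 800 cases per investigator per month. The function and constant names are illustrative only.

```python
import math

# Benchmarks quoted in this memo (Hesper internal benchmark).
MANUAL = {"cost_per_case": 2_500, "cases_per_investigator_month": 10}
AI = {"cost_per_case": 150, "cases_per_investigator_month": 800}


def full_coverage_bill(flags_per_year, mode):
    """Return (annual investigation cost, investigators needed) for 100% coverage."""
    annual_cost = flags_per_year * mode["cost_per_case"]
    yearly_capacity = mode["cases_per_investigator_month"] * 12
    investigators = math.ceil(flags_per_year / yearly_capacity)
    return annual_cost, investigators


FLAGS_PER_YEAR = 12_000  # hypothetical volume, not a benchmark

for label, mode in (("manual", MANUAL), ("AI", AI)):
    cost, staff = full_coverage_bill(FLAGS_PER_YEAR, mode)
    print(f"{label}: ${cost:,} per year, {staff} investigators for full coverage")
```

On these assumed inputs the manual path needs 100 investigators and $30,000,000 a year, against 2 investigators and $1,800,000 with AI, which is the rationing pressure the table describes.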

Coverage and per-case cost are not two arguments. They are one. The roughly 25% coverage ceiling exists because a $2,500, 14-day case cannot be run on every flag. Drop the case to $150 and hours, and 100% coverage becomes a budget line instead of a fantasy.
Source: Hesper AI product research

How to model the business case

A defensible memo models incremental investigated cases times the recovery-and-denial rate against incremental opex. It never leads with FTE displacement. Three inputs carry the model. First, the incremental flagged claims you can now investigate - the gap between current coverage and 100%. Second, the recovery-and-denial rate on those incremental investigations, anchored to a documented leakage range rather than an optimistic guess. Third, the incremental cost to run them at about $150 per case.

On the return side, anchor the recovery rate to EY's 7% to 14% of claims spend rather than inventing one, and apply it only to the incremental cases, not the whole book. Add the timing benefit: shorter cycle time tightens reserving and accelerates settlement and recovery, which improves when loss costs are recognized. On the cost side, include platform fees, integration, and change management, not just per-case cost. The hidden cost lines are real, and the build-versus-buy section covers them - the deeper treatment is in the hidden integration costs of legacy claims AI. For three fully worked carrier scenarios with the math laid out, the ROI case studies are the companion to this memo.
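The model described above can be laid out as a small structure so each input is visible to the reviewer. This is a sketch under stated assumptions: the class name, the `avg_paid_per_claim` input, and the example numbers are hypothetical. Only the $150 per-case cost and the 7% to 14% band come from this memo.

```python
from dataclasses import dataclass


@dataclass
class BusinessCase:
    flagged_claims: int            # annual flagged volume
    current_coverage: float        # share investigated today, e.g. 0.25
    avg_paid_per_claim: float      # hypothetical input: dollars per flagged claim
    recovery_rate: float           # anchor inside EY's 0.07-0.14 band
    fixed_costs: float             # platform fees + integration + change management
    target_coverage: float = 1.0
    cost_per_case: float = 150.0   # Hesper internal benchmark

    def incremental_cases(self):
        # Only the newly investigated flags count, never the whole book.
        return self.flagged_claims * (self.target_coverage - self.current_coverage)

    def incremental_return(self):
        return self.incremental_cases() * self.avg_paid_per_claim * self.recovery_rate

    def incremental_opex(self):
        return self.incremental_cases() * self.cost_per_case + self.fixed_costs

    def net_return(self):
        return self.incremental_return() - self.incremental_opex()


# Hypothetical carrier, low end of the EY band.
case = BusinessCase(10_000, 0.25, 20_000, 0.07, fixed_costs=500_000)
print(f"Incremental cases: {case.incremental_cases():,.0f}")
print(f"Illustrative net return: ${case.net_return():,.0f}")
```

Note that FTE displacement does not appear anywhere in the structure. The return side is driven entirely by incremental cases, which matches how the memo asks the case to be framed.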

Express the loss-ratio impact in basis points on the specific lines where fraud concentrates - workers comp and auto bodily injury - rather than at the book level. A book-level number dilutes the effect and reads as noise to a board. A line-level basis-point figure on the lines a carrier already worries about reads as a targeted intervention, which is what it is.
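The dilution effect is simple arithmetic: the same avoided-loss dollars divided by a larger premium base produce a smaller basis-point figure. The sketch below assumes avoided losses flow straight through to the loss ratio on earned premium. The function name and all dollar amounts are hypothetical.

```python
def loss_ratio_impact_bps(avoided_loss_dollars, earned_premium):
    """Loss-ratio improvement in basis points (1 bp = 0.01 percentage point)."""
    if earned_premium <= 0:
        raise ValueError("earned premium must be positive")
    return avoided_loss_dollars * 10_000 / earned_premium


AVOIDED = 2_000_000             # hypothetical avoided losses on one line
LINE_PREMIUM = 400_000_000      # hypothetical earned premium, workers comp line
BOOK_PREMIUM = 2_000_000_000    # hypothetical earned premium, whole book

print(f"Line level: {loss_ratio_impact_bps(AVOIDED, LINE_PREMIUM):.0f} bps")
print(f"Book level: {loss_ratio_impact_bps(AVOIDED, BOOK_PREMIUM):.0f} bps")
```

On these assumed inputs the same dollars read as 50 bps on the line and 10 bps on the book, which is why the line-level figure is the one to present.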

Payback period and sensitivity

No single payback number survives board scrutiny, so present a range. Model payback under conservative, base, and aggressive assumptions on three variables: flagged-claim volume, the recovery-and-denial rate within EY's 7% to 14% band, and how fast the pilot ramps to steady-state coverage. The mechanics are favorable in every scenario because the per-case cost drops by roughly an order of magnitude while coverage rises from about 25% to 100%, so the incremental opex stays small relative to the recoverable leakage.

Figure: Illustrative net year-1 return by recovery scenario (model on your own inputs, not a guaranteed outcome)

Conservative recovery (low end of EY band): net positive, longer payback
Base recovery (mid EY band): net positive within year 1
Aggressive recovery (high end of EY band): strong net year-1 return, fast payback

The chart above is illustrative and meant to show shape, not a promised figure. The bars represent relative net year-1 return across the EY leakage band applied to a carrier's own flagged volume; fill in the absolute dollars with your inputs. The per-case delta of roughly $2,350 saved on every investigated case - about $2,500 manual less about $150 with AI - is a Hesper internal benchmark, and it is why even the conservative bar stays net positive. Present payback as a sensitivity table tied to your flagged volume, and treat any single figure as illustrative. The realization ramp matters too, which is why the SIU director's first 90 days with AI is worth reading alongside the finance model.
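A sensitivity table of this kind can be generated month by month. The sketch below is one possible way to do it, not a Hesper method: it assumes a linear ramp to steady-state coverage, a one-time upfront cost, and hypothetical values for volume, average paid amount per claim, and ramp length. Only the recovery rates (7%, 10.5%, 14%) and the $150 per-case cost are taken from this memo.

```python
def payback_months(upfront_cost, monthly_cases, avg_paid_per_claim,
                   recovery_rate, ramp_months, cost_per_case=150.0, horizon=36):
    """First month in which cumulative net benefit covers the upfront cost.

    Returns None if payback is not reached within `horizon` months.
    """
    if ramp_months < 1:
        raise ValueError("ramp_months must be at least 1")
    net_per_case = avg_paid_per_claim * recovery_rate - cost_per_case
    cumulative = -upfront_cost
    for month in range(1, horizon + 1):
        ramp = min(1.0, month / ramp_months)  # assumed linear ramp to steady state
        cumulative += monthly_cases * ramp * net_per_case
        if cumulative >= 0:
            return month
    return None


# Hypothetical scenarios: (recovery rate within EY band, ramp in months).
SCENARIOS = {
    "conservative": (0.07, 6),
    "base": (0.105, 4),
    "aggressive": (0.14, 2),
}

table = {
    name: payback_months(
        upfront_cost=1_000_000,     # hypothetical platform + integration cost
        monthly_cases=600,          # hypothetical incremental investigated cases
        avg_paid_per_claim=20_000,  # hypothetical
        recovery_rate=rate,
        ramp_months=ramp,
    )
    for name, (rate, ramp) in SCENARIOS.items()
}
for name, months in table.items():
    print(f"{name}: payback in month {months}")
```

The absolute months depend entirely on the placeholder inputs. What carries over to a real model is the shape: a slower ramp and a lower recovery rate push payback out, and a scenario whose per-case return falls below $150 never pays back.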

Build vs. buy and hidden costs

A finance reviewer will ask whether to build the investigation layer internally. The honest answer is that building it is a multi-year R&D bet, not a finance-friendly capex line. An internal team would have to build an autonomous agent that runs 15+ investigation phases in parallel, produces an audit-ready trail satisfying California 10 CCR 2698.36 and the antifraud-plan filings under NAIC Model Act 680, and integrates with Guidewire or Duck Creek. The sticker price hides services markup, customization, training, integration latency, opportunity cost, and lock-in, all detailed in the hidden integration costs of legacy claims AI.

Buying converts that uncertain capital bet into a per-case opex line that scales with flagged volume. It also keeps the spend additive to the detection tools the carrier already owns: Hesper complements FRISS, Shift Technology, and Verisk rather than replacing them. FRISS scores and hands off to a human investigator; Shift Technology offers handler-assist that speeds the adjuster; Verisk flags through cross-carrier data. None of them closes the downstream investigation gap. The investigation layer is the layer no other vendor occupies, which is exactly why the carrier's unrealized return sits there. For the diligence questions a finance reviewer should run before trusting any per-case figure, use the AI fraud investigation vendor checklist.

How to de-risk the spend

Structure the first dollar as a scoped pilot with a measured baseline, so finance funds proof rather than a leap. Capture the baseline before the pilot: current flagged-claim volume, current coverage of those flags, current cycle time, and current cost per investigated case. Run the pilot on a defined slice of flagged claims, and measure the same four metrics. Coverage and per-case cost are the primary pilot KPIs because they are the two sides of the loss-cost lever the whole memo turns on.
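The baseline-then-pilot comparison can be captured as two snapshots of the same four metrics. This is a minimal sketch with hypothetical names. The baseline row uses the manual figures quoted in this memo, and the pilot row uses the memo's benchmark targets as stand-ins, not measured results.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PilotSnapshot:
    flagged_claims: int        # flags in the pilot slice
    coverage: float            # share of flags fully investigated
    cycle_time_hours: float    # average time to close an investigation
    cost_per_case: float       # fully loaded dollars per investigated case


def compare(baseline, pilot):
    """Change in each of the four metrics, pilot minus baseline."""
    before, after = asdict(baseline), asdict(pilot)
    return {metric: after[metric] - before[metric] for metric in before}


# Captured before the pilot starts (manual figures from this memo).
baseline = PilotSnapshot(flagged_claims=500, coverage=0.25,
                         cycle_time_hours=14 * 24, cost_per_case=2_500)
# Stand-in targets; overwrite with what the pilot actually measures.
pilot = PilotSnapshot(flagged_claims=500, coverage=1.0,
                      cycle_time_hours=4, cost_per_case=150)

deltas = compare(baseline, pilot)
for metric, change in deltas.items():
    print(f"{metric}: {change:+,.2f}")
```

Freezing the snapshot type is a small guard against editing the baseline after the fact, which is the failure mode that makes pilot results hard to defend.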

Frame the headcount story carefully throughout. The model does not depend on cutting investigators - it depends on coverage. With AI handling the mechanical 15+ phases, the investigator's role shifts from execution to decision-making: reviewing audit-ready output, handling exceptions, and making the determination. Headcount gets re-aimed at higher-judgment work, not eliminated, which is what keeps the case from triggering organizational resistance. The operational side of that transition belongs to the Claims VP, and the Claims VP deployment playbook is the artifact that pairs with this memo on the operations side.

The next decision after the pilot is not whether the technology works but how fast to scale coverage toward 100%. That is a finance question disguised as an operations one: each increment of coverage applies the same favorable unit economics to more of the flag pile, so the question becomes how quickly the carrier wants to convert its existing detection spend into resolved cases. A reviewer who funds a clean baseline and a measured pilot will have the data to answer that question with numbers rather than conviction, which is the position a finance function wants to be in before scaling any opex line.

Key takeaways

A finance reviewer can approve AI claims investigation on three numbers - cost per investigated case, the coverage gap it closes, and payback period - without needing to understand the agent's mechanics.
The dollar leak is the roughly 75% of flagged claims that go uninvestigated on a 25% coverage baseline, not the speed of the cases already worked, so coverage is the lever the memo must quantify.
AI investigation drops a case from about $2,500 and 14+ days to about $150 and 2-4 hours, which is what makes investigating 100% of flagged claims affordable instead of rationing to about 25%.
Model the business case as incremental investigated cases times a recovery rate anchored to EY's 7% to 14% leakage band against incremental opex, and never lead with FTE displacement.
Buying converts a multi-year R&D build into a per-case opex line that scales with flagged volume and stays complementary to FRISS, Shift, and Verisk rather than replacing them.

Frequently asked questions

Model three inputs: the incremental number of flagged claims you can now investigate, the recovery-and-denial rate on those incremental investigations, and the incremental cost to run them. Coverage is the lever - most carriers investigate roughly 25% of flagged claims manually, so the return comes from closing the gap toward 100%, not from speeding cases you already work. On the cost side, an AI-investigated case runs about $150 versus roughly $2,500 manually, a Hesper internal benchmark. Anchor your recovery rate to a documented leakage range - EY's claims quality assessments put leakage at 7% to 14% of carrier total claims spend. Build the case as incremental investigated cases times recovery rate, less incremental opex. Avoid leading with FTE reduction; it is politically fragile and not where the durable return sits.

Payback depends on three variables: your flagged-claim volume, the recovery-and-denial rate on newly investigated claims, and how fast the pilot ramps to steady-state coverage. Rather than quoting a single number, model it under conservative, base, and aggressive assumptions. The mechanics are favorable because the per-case cost drops by roughly an order of magnitude - about $2,500 manual to about $150 with AI, a Hesper internal benchmark - while coverage rises from about 25% to 100% of flagged claims. That combination means the incremental opex is small relative to the incremental recoverable leakage, which EY pegs at 7% to 14% of claims spend. Present payback as a sensitivity table tied to your own flagged volume, and treat any single figure as illustrative, not guaranteed.

Leakage is the money a carrier pays on claims it should have caught, reduced, or denied. EY's claims quality assessments put it at 7% to 14% of total claims spend, and one of EY's four named root causes is inadequate investigation of injury causation and liability. That is the structural source: detection vendors flag suspicious claims, but most carriers can only fully investigate about 25% of those flags. The other roughly 75% are paid, denied without full work, or queued indefinitely - and that uninvestigated tail is where leakage concentrates. Insurance fraud costs the US an estimated $308 billion annually, with $45 billion in property and casualty alone, per the Coalition Against Insurance Fraud. The leak is not slow investigation; it is uninvestigated flags.

Building it internally is a multi-year R&D commitment: an autonomous agent that runs 15+ investigation phases in parallel, produces an audit-ready trail satisfying California 10 CCR 2698.36 and NAIC Model Act 680, and integrates with Guidewire or Duck Creek is not a finance-friendly capex line. The hidden costs - services markup, customization, training, integration latency, opportunity cost, and lock-in - rarely show in the sticker price. Buying converts that into a per-case opex line that scales with flagged volume. It also keeps the spend complementary to detection tools you already own such as FRISS, Shift, and Verisk rather than duplicating them. For most finance reviewers, the buy decision turns a large uncertain capital bet into a measurable unit-economics line you can pilot and scale.

The model does not depend on cutting headcount, and a memo that leads with FTE reduction is usually the weakest version of the case. The durable return comes from coverage - investigating 100% of flagged claims instead of about 25% - which means the same team resolves far more cases. With AI handling the mechanical 15+ investigation phases, one investigator can work 800+ cases a month instead of around 10, and the investigator's role shifts from execution to decision-making: reviewing audit-ready output, handling exceptions, and making the call. Headcount gets re-aimed at higher-judgment work, not eliminated. For a finance reviewer, this matters because the business case rests on incremental recovered and denied dollars from closing the coverage gap, not on payroll savings that invite organizational resistance.

It works on the numerator - incurred losses and loss adjustment expense - by reducing leakage on flagged claims. Industry estimates put fraud at about 10% of property-casualty incurred losses and loss adjustment expenses, per the NICB, and EY measures total leakage at 7% to 14% of claims spend. Closing the coverage gap from about 25% to 100% of flagged claims attacks both: more fraudulent and inflated claims get caught, reduced, or denied before payment. Faster investigation - 2-4 hours versus 14+ days - also tightens reserving and accelerates settlement and recovery, which improves the timing of loss-cost recognition. Model the impact in basis points on the specific lines where fraud concentrates, such as workers comp and auto bodily injury, rather than at the book level.

On the cost side: per-case investigation cost, about $150 with AI versus about $2,500 manual as a Hesper internal benchmark, plus platform or subscription fees, integration and onboarding, and any change-management cost. On the return side: incremental investigated cases from closing the coverage gap, the recovery-and-denial rate applied to them anchored to EY's 7% to 14% leakage range, faster reserve release from shorter cycle time, and the stranded value you recover from detection spend you already make. Include a sensitivity table across conservative, base, and aggressive recovery assumptions and a payback period. Exclude FTE displacement as a headline driver. Note that the spend is complementary to existing detection vendors, so it is incremental coverage, not a tool swap.