Hesper AI
SIU Operations · May 13, 2026 · 11 min read

SIU KPIs in 2026: the 12 metrics every director should track

Legacy SIU dashboards reward activity over impact. Here are the 12 metrics that actually predict loss avoidance, audit defensibility, and capacity in 2026.

Nitish Badu · COO and Co-founder
  • $308B: annual US insurance fraud loss (Coalition Against Insurance Fraud)
  • 1.4%: SIU staffing growth 2021-2022, down from 2.5% the prior period
  • 25%: coverage rate of flagged claims (manual SIU baseline)
  • 60-85%: rules-based false-positive rate
Why closure counts are noise

An SIU dashboard that counts referrals received, cases opened, and cases closed measures activity, not impact. None of those three numbers tells the Claims VP whether the unit is bending the loss ratio, holding up under audit, or keeping pace with flagged-claim volume. The carriers that get this wrong end up with a 12-tile dashboard that turns green every quarter while coverage rate stays at 25% and backlog age keeps climbing.

The fix is structural. A 2026 SIU scorecard balances four quadrants - speed, quality, financial, capacity - and adds four governance metrics once an AI investigation agent enters the workflow. That is 12 core metrics plus 4 AI metrics, sized to fit a one-page monthly report for the Claims VP and a weekly operational view for the SIU lead.

This post defines all 12 metrics with formulas and benchmarks, then layers the four AI-augmented metrics on top. It is the operational counterpart to the SIU operations playbook and builds on the throughput benchmarks established in our 2026 SIU performance benchmarking piece.

Why legacy SIU dashboards reward activity over impact

The default SIU dashboard inherited from claims case management software counts events: referrals in, cases opened, cases closed, days open. Each of those is an activity, not an outcome. A unit can hit its referral target every month while investigating 25% of flagged claims and writing case summaries that fail audit sampling. The Insurance Information Institute summary of the Coalition Against Insurance Fraud 2022 benchmarking study reported SIU staffing growth of 1.4% from 2021 to 2022, down from 2.5% the prior period. Flagged-claim volume did not slow at the same rate, which is why activity counts look healthy while coverage falls.

Two numbers explain why closure counts are noise-dominated. First, roughly $308 billion in fraud loss runs through the US insurance system each year, with about 10% of P&C claims involving some form of fraud. Second, rules-based detection produces 60-85% false-positive rates at the alert layer. Most cases closed by an SIU under manual operations are closed as not-fraud, which means a quarter-over-quarter rise in "cases closed" can reflect more noise being processed, not more fraud being resolved.

The Claims VP and CFO do not need an activity report. They need to know how fast the unit converts a flag into a defensible decision, how much loss the SIU is actually avoiding per case, and whether the team is scaling with the book. Those three questions map cleanly to the four-quadrant scorecard below. The fourth quadrant - capacity - is usually the binding constraint, which is the through-line in our claims investigator backlog guide.

The four-quadrant SIU scorecard for 2026

Speed, quality, financial, capacity. Each quadrant gets three metrics, balanced so an improvement in one cannot mask a regression in another. Speed without quality produces fast wrong answers. Quality without speed inflates reserves. Financial without capacity overstates ROI per FTE while leaving 75% of flagged claims uninvestigated. Capacity without speed and quality means the unit is processing more cases but not better ones.

The 12-metric scorecard below is the full view. Each formula is implementable from a standard claims case management system event log plus a small QA sampling process. None of these metrics requires a new data warehouse.

| Metric | Quadrant | Formula | Target benchmark |
| --- | --- | --- | --- |
| Avg time-to-first-touch | Speed | mean(hours from flag created to first investigator action) | < 4 hours (AI-augmented) |
| Avg cycle time per case | Speed | mean(hours from open to decision) | 2-4 hours (AI); 14+ days (manual) |
| Median time-to-decision | Speed | median(open to closure timestamp) | Same-day (AI); multi-day (manual) |
| False-positive rate | Quality | cases closed not-fraud / cases investigated | Trend down; alerts run 60-85% FP |
| Case-quality score | Quality | % of cases meeting all six 10 CCR 2698.36 summary elements | 95%+ |
| Audit-defensibility rate | Quality | % of sampled closed cases passing QA vs antifraud plan | 90%+ |
| Loss-avoidance $ per case | Financial | sum(reserve cuts + denials + settlement reductions) / cases investigated | Track trend by line of business |
| ROI per investigator FTE | Financial | (annual loss avoidance + recoveries) / fully-loaded FTE cost | > 5x with AI; 1-2x manual |
| Recovery rate on adverse decisions | Financial | $ recovered or denied / $ identified as fraudulent | 60%+ |
| Caseload per investigator | Capacity | total open cases / FTE investigators | 200+ is current reality, not target |
| Backlog age | Capacity | median(today - flag-created date) for uninvestigated flags | < 30 days |
| Coverage rate | Capacity | cases investigated / flags raised | 100% (AI); ~25% (manual baseline) |

Detection KPIs are not investigation KPIs

Alert volume, hit rate, and model precision sit upstream of the SIU in the detection layer (FRISS, Shift, Verisk). They tell you how the model is performing. They do not tell you whether the investigation unit is converting flags into defensible decisions. The 12-metric scorecard is the downstream view the SIU director owns.

Quadrant 1 - Speed metrics

Speed is not throughput. It is the elapsed time between a flag being raised and a case being closed with a decision the carrier can defend. Slow decisions inflate reserves, weaken settlement leverage, and push cases past regulatory review windows. Three metrics cover the speed quadrant.

Metric 1 - Average time-to-first-touch (hours)

Formula: mean(hours from flag created to first investigator action). This is the queue-entry metric. Under manual operations, time-to-first-touch is measured in days, which is the single biggest source of preventable delay in the cycle. AI-augmented target: under 4 hours, because the agent picks up the flag autonomously and begins the 15+ parallel investigation phases without waiting for human triage.

Metric 2 - Average cycle time per case (days or hours)

Formula: mean(hours from case opened to decision rendered). Manual baseline runs 14+ days per case across the industry. AI-augmented target is 2-4 hours per case. Track in hours, not days, once the agent is in production - day-level granularity hides the variance that matters.

Metric 3 - Median time-to-decision (hours)

Formula: median(case-open to closure timestamp). Use the median, not the mean, because long-tail outliers distort the average for any SIU running litigated or complex cases in parallel with high-volume claims. AI target is same-day; manual median is typically multi-day.
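The three speed formulas above can be computed directly from case timestamps. A minimal sketch in Python, assuming a per-case record with flag, first-touch, open, and decision timestamps (the field names are illustrative, not a product schema):

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical case records pulled from a case management event log.
cases = [
    {"flagged": datetime(2026, 3, 1, 9, 0),
     "first_touch": datetime(2026, 3, 1, 11, 30),
     "opened": datetime(2026, 3, 1, 11, 30),
     "decided": datetime(2026, 3, 1, 15, 0)},
    {"flagged": datetime(2026, 3, 2, 8, 0),
     "first_touch": datetime(2026, 3, 2, 14, 0),
     "opened": datetime(2026, 3, 2, 14, 0),
     "decided": datetime(2026, 3, 3, 10, 0)},
]

def hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two events, in hours."""
    return (end - start).total_seconds() / 3600

# Metric 1: queue-entry delay from flag to first investigator action.
avg_time_to_first_touch = mean(hours(c["flagged"], c["first_touch"]) for c in cases)
# Metric 2: mean open-to-decision time, tracked in hours, not days.
avg_cycle_time = mean(hours(c["opened"], c["decided"]) for c in cases)
# Metric 3: median, so long-tail litigated cases do not distort the number.
median_time_to_decision = median(hours(c["opened"], c["decided"]) for c in cases)
```

Keeping the unit in hours end to end avoids the day-level rounding that hides variance once an agent is in production.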

Quadrant 2 - Quality metrics

A fast wrong answer is worse than a slow right one. Quality metrics measure whether each case file would hold up under audit and adverse-action review. They map directly to the regulatory documentation requirements that already govern SIU work.

Metric 4 - False-positive rate (%)

Formula: cases closed as not-fraud / total cases investigated. Rules-based detection produces 60-85% false positives at the alert layer. SIU-side investigated false-positive rate is a separate number and should trend down as detection tuning improves and as investigators get better at early-stage triage. A rising SIU-investigated FPR usually signals the detection layer needs retraining, not that the SIU is performing worse.

Metric 5 - Case-quality score

Formula: percentage of cases where the investigation file addresses all six elements required by California 10 CCR 2698.36 - facts suggesting fraud, alleged misrepresentations, materiality, witnesses, supporting documentation, and a completeness statement. Score 6/6 = 100, 5/6 = 83, and so on. Target: 95% or above. The California rubric is the most prescriptive in the country, which makes it a useful default even for carriers domiciled elsewhere.
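Scoring against the six 10 CCR 2698.36 elements is a checklist exercise. A sketch, assuming each case file is represented as a dict keyed by element (the key names here are invented for illustration):

```python
# The six summary elements required by California 10 CCR 2698.36.
ELEMENTS = [
    "facts_suggesting_fraud",
    "alleged_misrepresentations",
    "materiality",
    "witnesses",
    "supporting_documentation",
    "completeness_statement",
]

def case_quality_score(case_file: dict) -> int:
    """Percent of the six required elements present: 6/6 = 100, 5/6 = 83, etc."""
    present = sum(1 for element in ELEMENTS if case_file.get(element))
    return round(100 * present / len(ELEMENTS))

# A file addressing five of the six elements scores 83.
score = case_quality_score({e: True for e in ELEMENTS[:-1]})
```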

Metric 6 - Audit-defensibility rate (%)

Formula: percentage of sampled closed cases that pass QA review against the antifraud plan filed with the state DOI. Sample 5-10% of closed cases monthly. Target: 90%+. This is the metric that protects the carrier when the DOI requests a market-conduct exam.
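The monthly QA sampling step can be scripted. A sketch of the 5-10% sample draw and the resulting audit-defensibility rate, assuming closed cases are identified by ID and QA review returns pass/fail booleans (function names are my own, not a standard):

```python
import random

def sample_for_qa(closed_case_ids: list, rate: float = 0.10, seed=None) -> list:
    """Draw a monthly QA sample of closed cases (the post suggests 5-10%)."""
    rng = random.Random(seed)
    k = max(1, round(len(closed_case_ids) * rate))
    return rng.sample(closed_case_ids, k)

def audit_defensibility_rate(qa_results: list) -> float:
    """Share of sampled cases that passed QA against the filed antifraud plan."""
    return 100 * sum(qa_results) / len(qa_results)

sample = sample_for_qa(list(range(100)), rate=0.10, seed=42)
rate = audit_defensibility_rate([True] * 9 + [False])  # 9 of 10 pass
```

Seeding the sampler makes the monthly draw reproducible, which matters if a DOI examiner later asks how cases were selected.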

Quadrant 3 - Financial metrics

This is the quadrant the CFO cares about. If the SIU cannot articulate dollar impact per investigated case, the budget conversation is one-sided and the unit gets framed as a cost center. Three metrics make the financial view defensible.

Metric 7 - Loss-avoidance dollars per investigated case

Formula: sum(reserve reductions + denied indemnity + settlement reductions attributable to SIU work) / cases investigated. Track by line of business - the benchmark for auto bodily injury looks nothing like the benchmark for workers comp. The trend matters more than the absolute number.

Metric 8 - ROI per investigator FTE

Formula: (annual loss avoidance + recoveries attributable to investigator) / fully-loaded investigator cost. Under manual operations at ~$2,500 per case and ~10 investigations per investigator per month, ROI per FTE sits in the 1-2x range. AI-augmented operations at ~$150 per case and 800+ cases per investigator per month push this above 5x. A low ROI under manual workflows is usually a coverage problem, not a productivity problem - the unit is doing good work on the 25% it touches, but the other 75% is the source of the leakage.
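The ROI formula is plain arithmetic. A sketch with purely illustrative dollar inputs (these are not benchmarks; substitute your own loss-avoidance, recovery, and fully-loaded cost figures):

```python
def roi_per_fte(annual_loss_avoidance: float,
                recoveries: float,
                fully_loaded_fte_cost: float) -> float:
    """(annual loss avoidance + recoveries) / fully-loaded investigator cost."""
    return (annual_loss_avoidance + recoveries) / fully_loaded_fte_cost

# Illustrative inputs only: a manual-workflow investigator at low coverage...
manual = roi_per_fte(annual_loss_avoidance=200_000, recoveries=50_000,
                     fully_loaded_fte_cost=150_000)
# ...versus the same FTE cost with AI-augmented throughput.
ai = roi_per_fte(annual_loss_avoidance=900_000, recoveries=100_000,
                 fully_loaded_fte_cost=150_000)
```

With these assumed inputs the manual figure lands in the 1-2x band and the AI-augmented figure above 5x, consistent with the ranges cited above.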

Metric 9 - Recovery rate on adverse decisions

Formula: dollars actually recovered or denied / dollars identified as fraudulent. This tracks the gap between 'we caught it' and 'we kept the money.' A high identification number with a low recovery rate signals weakness in the post-SIU handoff to subrogation or legal, not a weakness in the investigation itself.

Quadrant 4 - Capacity metrics

Capacity is the constraint the rest of the scorecard runs into. Three numbers tell the Claims VP whether the SIU is scaling with the book or being outrun by it. This quadrant is where the 1.4% staffing growth statistic does most of its damage: the work keeps arriving, the FTE count does not. That dynamic is the reason 75% of flagged claims never get investigated, which we cover at length in why flagged insurance claims never get investigated.

Metric 10 - Caseload per investigator (queued)

Formula: total open cases / FTE investigators. Industry reality under manual operations: 200+ cases per investigator in the active queue. This is the current state, not a target. A unit that reports 50 cases per investigator is either over-staffed for its book or, more often, has a coverage problem that hides queued work in an upstream 'unassigned' bucket.

Metric 11 - Backlog age (median days)

Formula: median(today - flag-created date) across flags that have not been investigated. Operational tripwire: when median backlog age exceeds the policy review window the carrier filed in its antifraud plan, regulators notice. Target: under 30 days.

Metric 12 - Coverage rate (%)

Formula: cases investigated / flags raised. Manual baseline runs near 25%. AI-augmented target is 100%, because the cost-per-case math at ~$150 makes it economic to investigate every flag rather than triaging which ones merit human time. Coverage rate is the leading indicator that predicts most of the lagging financial metrics - a unit cannot avoid loss on a claim it never touched.
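The three capacity formulas reduce to a few lines. A sketch assuming open-case counts, FTE counts, and flag-created dates are available from the case management system:

```python
from datetime import date
from statistics import median

def caseload_per_investigator(open_cases: int, fte_investigators: int) -> float:
    """Metric 10: total open cases / FTE investigators."""
    return open_cases / fte_investigators

def backlog_age_days(uninvestigated_flag_dates: list, today: date) -> float:
    """Metric 11: median age in days of flags not yet investigated."""
    return median((today - d).days for d in uninvestigated_flag_dates)

def coverage_rate(cases_investigated: int, flags_raised: int) -> float:
    """Metric 12: share of raised flags that were actually investigated."""
    return 100 * cases_investigated / flags_raised
```

For example, 1,000 open cases across 5 FTEs is the 200-per-investigator queue the post describes, and 25 investigations against 100 flags is the 25% manual baseline.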

Coverage rate by SIU operating model

  • Manual SIU (industry baseline): 25%
  • Hybrid (rules + human triage): 45%
  • AI-assisted (agent pre-investigates): 80%
  • AI-augmented (full agent + human review): 100%

Four new metrics for AI-augmented SIU teams

Once an autonomous investigation agent is in the workflow, the 12-metric scorecard needs four additions. These are not optional - they are how the SIU lead retains accountability for AI-rendered conclusions and how the antifraud plan filed under NAIC Insurance Fraud Prevention Model Act 680 stays compliant when reviewed by the state DOI. The deployment side of this is covered in our claims VP AI investigation playbook.

  • Autonomous-resolution rate - percentage of cases closed without human override. Frames how much load the agent is actually carrying.
  • Human-override rate - percentage of AI conclusions overridden by the SIU lead on review. Healthy band: 5-15%. Below 5% suggests rubber-stamping; above 15% suggests the agent is not calibrated for the book.
  • Evidence-completeness score - percentage of the 15+ defined investigation phases that produced valid evidence, per case. Maps directly to the California completeness requirement.
  • Audit-trail integrity - binary or percentage. Every AI-rendered decision must be reconstructible from logs back to source inputs. Target: 100%. Ties directly to the 10 CCR 2698.36 documentation requirement.
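The four governance metrics can be derived from a per-case decision log. A sketch assuming each AI-decided case records whether it was overridden, how many of the defined investigation phases produced valid evidence, and whether its audit trail is fully reconstructible (the field names are assumptions, not a vendor schema):

```python
from statistics import mean

def governance_metrics(cases: list) -> dict:
    """Compute the four AI governance metrics from a per-case decision log."""
    ai_cases = [c for c in cases if c["ai_decided"]]
    n = len(ai_cases)
    return {
        # Share of AI-rendered conclusions that stood without human override.
        "autonomous_resolution_rate": 100 * sum(not c["overridden"] for c in ai_cases) / n,
        # Healthy band per the post: 5-15%.
        "human_override_rate": 100 * sum(c["overridden"] for c in ai_cases) / n,
        # Mean share of investigation phases that produced valid evidence.
        "evidence_completeness": mean(100 * c["phases_with_evidence"] / c["phases_total"]
                                      for c in ai_cases),
        # Share of cases reconstructible from logs back to source inputs; target 100%.
        "audit_trail_integrity": 100 * sum(c["audit_trail_complete"] for c in ai_cases) / n,
    }

metrics = governance_metrics([
    {"ai_decided": True, "overridden": False, "phases_with_evidence": 15,
     "phases_total": 15, "audit_trail_complete": True},
    {"ai_decided": True, "overridden": True, "phases_with_evidence": 12,
     "phases_total": 15, "audit_trail_complete": True},
])
```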

These four governance metrics also satisfy regulatory pressure. State DOIs are starting to ask about AI governance during antifraud plan reviews, and a carrier that can produce the autonomous-resolution rate, override rate, evidence-completeness, and audit-trail integrity numbers on demand is in a far stronger position than one running an AI pilot without instrumentation.

| Quadrant | Manual baseline | AI-augmented target |
| --- | --- | --- |
| Speed | 14+ days cycle time; first-touch in days | 2-4 hours cycle time; first-touch in minutes |
| Quality | FPR varies; quality scoring inconsistent; audit defensibility depends on individual investigator | Standardized 15+ phase coverage; case-quality 95%+; audit-trail integrity 100% |
| Financial | ~$2,500 per case; ~10 investigations per FTE per month; ROI 1-2x | ~$150 per case; 800+ cases per FTE per month; ROI 5x+ |
| Capacity | 200+ cases queued; 25% coverage; 1.4% staffing growth | Throughput-based capacity; 100% coverage of flagged claims |

The shift from a 12-tile activity dashboard to a four-quadrant scorecard is the single highest-leverage move an SIU director can make in 2026. The metrics themselves are not new; what is new is forcing each one to map to a regulator-defined or finance-defined outcome.

Hesper AI product research

Key takeaways

  • The legacy SIU dashboard of referrals received, cases opened, and cases closed measures activity, not impact, and is not the report the Claims VP or CFO actually needs.
  • A balanced 2026 scorecard covers four quadrants - speed, quality, financial, capacity - so that gains in one cannot mask regressions in another.
  • Capacity is the binding constraint: SIU staffing grew only 1.4% from 2021 to 2022 while flagged-claim volume kept rising, which is why coverage rate sits near 25% under manual operations.
  • Quality metrics should map directly to regulatory requirements, with California 10 CCR 2698.36's six summary elements and the antifraud plan filed under NAIC Model 680 providing a defensible rubric instead of a subjective grade.
  • Once an autonomous investigation agent enters the workflow, four metrics become non-optional: autonomous-resolution rate, human-override rate, evidence-completeness score, and audit-trail integrity.

Frequently asked questions

What is the difference between leading and lagging SIU KPIs?

Leading KPIs measure inputs that predict outcomes; lagging KPIs measure outcomes after the fact. Time-to-first-touch, backlog age, evidence-completeness score, and coverage rate are leading. Cases closed, dollars recovered, and audit-defensibility rate are lagging. A scorecard built only on lagging metrics tells the SIU director what happened last quarter; one with leading metrics tells them what next quarter will look like. The most useful 2026 scorecards put leading metrics in the operational view reviewed weekly with the team, and lagging metrics in the executive view reported monthly to the Claims VP. Coverage rate, backlog age, and time-to-first-touch are the three highest-leverage leading indicators for almost every P&C SIU.

How should an SIU director report these metrics to the Claims VP?

Build a one-page report keyed to the four quadrants, with each quadrant showing one headline metric and one trend line. Speed: median time-to-decision. Quality: audit-defensibility rate. Financial: loss-avoidance dollars per investigated case. Capacity: coverage rate. Each metric should have a 12-month trend and a benchmark line. Do not put 12 metrics on the first slide - those go on page two as supporting detail. The Claims VP cares about loss-ratio impact, regulatory exposure, and FTE productivity, and the four-headline view maps cleanly to those three concerns. Carriers running an AI investigation pilot should add the autonomous-resolution rate and evidence-completeness score as governance metrics on the same page.

Which SIU metrics do state regulators expect to see in an antifraud plan?

NAIC Model Act 680, adopted in most states, requires carriers to describe how SIU effectiveness is measured. The metrics that consistently appear on filings are case-quality score, audit-defensibility rate, coverage rate, and recovery rate on adverse decisions. California 10 CCR 2698.36 goes further and prescribes the six elements every SIU investigation summary must address: facts suggesting fraud, alleged misrepresentations, materiality, witnesses, supporting documentation, and a completeness statement. Many other state DOIs reference the California framework informally. Carriers using AI investigation agents should also document audit-trail integrity and evidence-completeness methodology in the plan, because reviewers are starting to ask about AI governance during antifraud plan reviews.

Does the four-quadrant scorecard apply to SIUs not yet using AI?

The four-quadrant scorecard - speed, quality, financial, capacity - is the right baseline whether or not AI is in the workflow. The four AI-augmented metrics only become necessary once an autonomous agent is making or recommending case-level decisions. That said, even pre-AI SIUs benefit from instrumenting evidence-completeness against the 15+ investigation phases, because it surfaces which cases are being closed on partial evidence. When the carrier later evaluates AI vendors, having clean historical data on the 12-metric scorecard is what makes the ROI case credible. Starting with the four quadrants now also makes the eventual AI rollout an extension of an existing measurement system rather than a brand-new reporting layer.

What is a realistic caseload per investigator benchmark?

Under manual operations, 200+ open cases per investigator is the current industry reality, not a target. Industry benchmarks suggest investigators complete around 10 investigations per month under manual workflows, which is why backlogs accumulate and coverage rates sit near 25% of flagged claims. The Coalition Against Insurance Fraud 2022 benchmarking study, summarized by the Insurance Information Institute, showed SIU staffing grew only 1.4% from 2021 to 2022 - far below claim-volume growth - so the caseload number keeps drifting up. With AI investigation in the workflow, the relevant capacity metric shifts from cases queued to cases per investigator per month, where 800+ becomes achievable because the investigator's role shifts from execution to decision-making.

How does an AI investigation agent change SIU metrics?

Three shifts. First, throughput replaces queue size as the meaningful capacity metric: 800+ cases per investigator per month, not 200+ cases queued. Second, evidence-completeness against the 15+ defined investigation phases replaces investigator-by-investigator quality variance. Third, audit-trail integrity becomes a binary gate: every AI-rendered decision must be reconstructible from logs back to source inputs, or the case cannot ship. Autonomous-resolution rate and human-override rate are the governance metrics on top: the SIU lead is accountable for the agent's outputs, and these two numbers show whether the human-in-the-loop check is calibrated. A healthy override rate sits in the 5-15% band, low enough to mean the agent is useful and high enough to mean review is real.

How often should each metric be reviewed?

Operational metrics - time-to-first-touch, backlog age, coverage rate, human-override rate - should be reviewed weekly with the SIU team. Quality and governance metrics - case-quality score, audit-defensibility rate, evidence-completeness, audit-trail integrity - belong in a monthly review. Financial metrics - loss-avoidance dollars per case, ROI per FTE, recovery rate on adverse decisions - sit in the monthly executive report and the quarterly board view. Cadence matters as much as the metric itself: a quarterly review of backlog age is too slow to act on; a weekly review of ROI per FTE is too noisy to be useful. Match the review frequency to the metric's volatility and the team's ability to influence it.


See Hesper AI on your documents

Request a demo and we'll run an analysis on your real document samples.