An SIU dashboard that counts referrals received, cases opened, and cases closed measures activity, not impact. None of those three numbers tells the Claims VP whether the unit is bending the loss ratio, holding up under audit, or keeping pace with flagged-claim volume. The carriers that get this wrong end up with a 12-tile dashboard that turns green every quarter while coverage rate stays at 25% and backlog age keeps climbing.
The fix is structural. A 2026 SIU scorecard balances four quadrants - speed, quality, financial, capacity - and adds four governance metrics once an AI investigation agent enters the workflow. That is 12 core metrics plus 4 AI metrics, sized to fit a one-page monthly report for the Claims VP and a weekly operational view for the SIU lead.
This post defines all 12 metrics with formulas and benchmarks, then layers the four AI-augmented metrics on top. It is the operational counterpart to the SIU operations playbook and builds on the throughput benchmarks established in our 2026 SIU performance benchmarking piece.
Why legacy SIU dashboards reward activity over impact
The default SIU dashboard inherited from claims case management software counts events: referrals in, cases opened, cases closed, days open. Each of those is an activity, not an outcome. A unit can hit its referral target every month while investigating 25% of flagged claims and writing case summaries that fail audit sampling. The Insurance Information Institute summary of the Coalition Against Insurance Fraud 2022 benchmarking study reported SIU staffing growth of 1.4% from 2021 to 2022, down from 2.5% in the prior period. Flagged-claim volume did not slow at the same rate, which is why activity counts look healthy while coverage falls.
Two numbers explain why closure counts are noise-dominated. First, roughly $308 billion in fraud loss runs through the US insurance system each year, with about 10% of P&C claims involving some form of fraud. Second, rules-based detection produces 60-85% false-positive rates at the alert layer. Most cases closed by an SIU under manual operations are closed as not-fraud, which means a quarter-over-quarter rise in "cases closed" can reflect more noise being processed, not more fraud being resolved.
The Claims VP and CFO do not need an activity report. They need to know how fast the unit converts a flag into a defensible decision, how much loss the SIU is actually avoiding per case, and whether the team is scaling with the book. Those three questions map cleanly to the four-quadrant scorecard below. The fourth quadrant - capacity - is usually the binding constraint, which is the through-line in our claims investigator backlog guide.
The four-quadrant SIU scorecard for 2026
Speed, quality, financial, capacity. Each quadrant gets three metrics, balanced so an improvement in one cannot mask a regression in another. Speed without quality produces fast wrong answers. Quality without speed inflates reserves. Financial without capacity overstates ROI per FTE while leaving 75% of flagged claims uninvestigated. Capacity without speed and quality means the unit is processing more cases but not better ones.
The 12-metric scorecard below is the full view. Each formula is implementable from a standard claims case management system event log plus a small QA sampling process. None of these metrics requires a new data warehouse.
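To ground the formulas, the sketches in each quadrant below assume a minimal one-row-per-flag extract from the case management system. This is a hypothetical shape, not any vendor's schema - the column names are illustrative and should be mapped to your own event log:

```python
import pandas as pd

# Hypothetical flag-level extract: one row per flagged claim.
# Column names are illustrative assumptions, not a vendor schema.
FLAG_COLUMNS = {
    "flag_id": "string",                    # unique flag identifier
    "flag_created": "datetime64[ns]",       # detection layer raises the flag
    "first_action": "datetime64[ns]",       # first investigator touch (NaT if none yet)
    "case_opened": "datetime64[ns]",        # SIU opens a case (NaT if never investigated)
    "decision_rendered": "datetime64[ns]",  # closure decision (NaT while open)
    "disposition": "string",                # "fraud" / "not_fraud" once decided
    "investigator_id": "string",            # assigned FTE (empty if unassigned)
}

def load_flags(path: str) -> pd.DataFrame:
    """Load the extract and coerce the timestamp columns."""
    flags = pd.read_csv(path)
    for col, dtype in FLAG_COLUMNS.items():
        if dtype.startswith("datetime"):
            flags[col] = pd.to_datetime(flags[col], errors="coerce")
    return flags
```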
Detection KPIs are not investigation KPIs
Alert volume, hit rate, and model precision sit upstream of the SIU in the detection layer (FRISS, Shift, Verisk). They tell you how the model is performing. They do not tell you whether the investigation unit is converting flags into defensible decisions. The 12-metric scorecard is the downstream view the SIU director owns.
Quadrant 1 - Speed metrics
Speed is not throughput. It is the elapsed time between a flag being raised and a case being closed with a decision the carrier can defend. Slow decisions inflate reserves, weaken settlement leverage, and push cases past regulatory review windows. Three metrics cover the speed quadrant.
Metric 1 - Average time-to-first-touch (hours)
Formula: mean(hours from flag created to first investigator action). This is the queue-entry metric. Under manual operations, time-to-first-touch is measured in days, which is the single biggest source of preventable delay in the cycle. AI-augmented target: under 4 hours, because the agent picks up the flag autonomously and begins the 15+ parallel investigation phases without waiting for human triage.
Metric 2 - Average cycle time per case (days or hours)
Formula: mean(hours from case opened to decision rendered). Manual baseline runs 14+ days per case across the industry. AI-augmented target is 2-4 hours per case. Track in hours, not days, once the agent is in production - day-level granularity hides the variance that matters.
Metric 3 - Median time-to-decision (hours)
Formula: median(hours from case opened to decision rendered). Use the median, not the mean, because long-tail outliers distort the average for any SIU running litigated or complex cases in parallel with high-volume claims. AI target is same-day; manual median is typically multi-day.
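A minimal pandas sketch of all three speed metrics, assuming the hypothetical flag-level extract outlined earlier (same illustrative column names):

```python
import pandas as pd

def speed_metrics(flags: pd.DataFrame) -> dict:
    """Metrics 1-3 over the hypothetical flag-level extract."""
    hours = lambda end, start: (end - start).dt.total_seconds() / 3600

    touched = flags.dropna(subset=["first_action"])
    decided = flags.dropna(subset=["case_opened", "decision_rendered"])
    cycle_hours = hours(decided["decision_rendered"], decided["case_opened"])

    return {
        # Metric 1: queue-entry delay, flag raised -> first investigator action
        "avg_time_to_first_touch_hrs": hours(
            touched["first_action"], touched["flag_created"]).mean(),
        # Metric 2: mean cycle time, case opened -> decision rendered
        "avg_cycle_time_hrs": cycle_hours.mean(),
        # Metric 3: median of the same span, robust to litigated long-tail cases
        "median_time_to_decision_hrs": cycle_hours.median(),
    }
```

Note that metric 3 deliberately reuses metric 2's span with median aggregation, so the two numbers diverge exactly when the long tail grows.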
Quadrant 2 - Quality metrics
A fast wrong answer is worse than a slow right one. Quality metrics measure whether each case file would hold up under audit and adverse-action review. They map directly to the regulatory documentation requirements that already govern SIU work.
Metric 4 - False-positive rate (%)
Formula: cases closed as not-fraud / total cases investigated. Rules-based detection produces 60-85% false positives at the alert layer. SIU-side investigated false-positive rate is a separate number and should trend down as detection tuning improves and as investigators get better at early-stage triage. A rising SIU-investigated FPR usually signals the detection layer needs retraining, not that the SIU is performing worse.
Metric 5 - Case-quality score
Formula: percentage of cases where the investigation file addresses all six elements required by California 10 CCR 2698.36 - facts suggesting fraud, alleged misrepresentations, materiality, witnesses, supporting documentation, and a completeness statement. Score 6/6 = 100, 5/6 = 83, and so on. Target: 95% or above. The California rubric is the most prescriptive in the country, which makes it a useful default even for carriers domiciled elsewhere.
Metric 6 - Audit-defensibility rate (%)
Formula: percentage of sampled closed cases that pass QA review against the antifraud plan filed with the state DOI. Sample 5-10% of closed cases monthly. Target: 90%+. This is the metric that protects the carrier when the DOI requests a market-conduct exam.
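A sketch of the quality quadrant, assuming two additional hypothetical inputs: one boolean column per 10 CCR 2698.36 element on each closed case, and a QA sample table with a passed_audit flag. Both are assumptions about how a carrier might encode its review, not a prescribed format:

```python
import pandas as pd

# The six California 10 CCR 2698.36 summary elements, encoded here as
# assumed boolean columns on each closed case.
CCR_ELEMENTS = ["facts", "misrepresentations", "materiality",
                "witnesses", "documentation", "completeness_statement"]

def quality_metrics(closed: pd.DataFrame, qa_sample: pd.DataFrame) -> dict:
    """Metrics 4-6. `closed` is decided cases; `qa_sample` is the monthly
    5-10% draw of closed cases with a boolean `passed_audit` result."""
    return {
        # Metric 4: share of investigated cases closed as not-fraud
        "false_positive_rate": (closed["disposition"] == "not_fraud").mean(),
        # Metric 5: mean share of the six elements present, scaled to 100
        "case_quality_score": closed[CCR_ELEMENTS].mean(axis=1).mean() * 100,
        # Metric 6: share of the QA sample that passes review against the
        # antifraud plan filed with the state DOI
        "audit_defensibility_rate": qa_sample["passed_audit"].mean(),
    }
```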
Quadrant 3 - Financial metrics
This is the quadrant the CFO cares about. If the SIU cannot articulate dollar impact per investigated case, the budget conversation is one-sided and the unit gets framed as a cost center. Three metrics make the financial view defensible.
Metric 7 - Loss-avoidance dollars per investigated case
Formula: sum(reserve reductions + denied indemnity + settlement reductions attributable to SIU work) / cases investigated. Track by line of business - the benchmark for auto bodily injury looks nothing like the benchmark for workers comp. The trend matters more than the absolute number.
Metric 8 - ROI per investigator FTE
Formula: (annual loss avoidance + recoveries attributable to investigator) / fully-loaded investigator cost. Under manual operations at ~$2,500 per case and ~10 investigations per investigator per month, ROI per FTE sits in the 1-2x range. AI-augmented operations at ~$150 per case and 800+ cases per investigator per month push this above 5x. A low ROI under manual workflows is usually a coverage problem, not a productivity problem - the unit is doing good work on the 25% it touches, but the other 75% is the source of the leakage.
Metric 9 - Recovery rate on adverse decisions
Formula: dollars actually recovered or denied / dollars identified as fraudulent. This tracks the gap between 'we caught it' and 'we kept the money.' A high identification number with a low recovery rate signals weakness in the post-SIU handoff to subrogation or legal, not a weakness in the investigation itself.
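A sketch of the financial quadrant, assuming per-case dollar columns (reserve_reduction, denied_indemnity, settlement_reduction, recovered, identified_fraud) populated from claims financials - again illustrative names, not a standard schema:

```python
import pandas as pd

def financial_metrics(closed: pd.DataFrame, n_fte: int,
                      fully_loaded_fte_cost: float) -> dict:
    """Metrics 7-9 over a year of decided cases with assumed dollar columns."""
    avoided = closed[["reserve_reduction", "denied_indemnity",
                      "settlement_reduction"]].sum(axis=1)
    kept = closed["recovered"].sum() + closed["denied_indemnity"].sum()
    return {
        # Metric 7: loss avoided per investigated case (track by line of business)
        "loss_avoidance_per_case": avoided.sum() / len(closed),
        # Metric 8: (annual avoidance + recoveries) / fully loaded investigator cost
        "roi_per_fte": (avoided.sum() + closed["recovered"].sum())
                       / (n_fte * fully_loaded_fte_cost),
        # Metric 9: dollars actually kept vs dollars identified as fraudulent
        "recovery_rate": kept / closed["identified_fraud"].sum(),
    }
```

One caution when reading the output: metric 7's denominator is investigated cases, not flags raised, so it can look healthy even at 25% coverage. Read it alongside metric 12.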
Quadrant 4 - Capacity metrics
Capacity is the constraint the rest of the scorecard runs into. Three numbers tell the Claims VP whether the SIU is scaling with the book or being outrun by it. This quadrant is where the 1.4% staffing growth statistic does most of its damage: the work keeps arriving, the FTE count does not. That dynamic is the reason 75% of flagged claims never get investigated, which we cover at length in why flagged insurance claims never get investigated.
Metric 10 - Caseload per investigator (queued)
Formula: total open cases / FTE investigators. Industry reality under manual operations: 200+ cases per investigator in the active queue. This is the current state, not a target. A unit that reports 50 cases per investigator is either over-staffed for its book or, more often, has a coverage problem that hides queued work in an upstream 'unassigned' bucket.
Metric 11 - Backlog age (median days)
Formula: median(today - flag-created date) across flags that have not been investigated. Operational tripwire: when median backlog age exceeds the policy review window the carrier filed in its antifraud plan, regulators notice. Target: under 30 days.
Metric 12 - Coverage rate (%)
Formula: cases investigated / flags raised. Manual baseline runs near 25%. AI-augmented target is 100%, because the cost-per-case math at ~$150 makes it economic to investigate every flag rather than triaging which ones merit human time. Coverage rate is the leading indicator that predicts most of the lagging financial metrics - a unit cannot avoid loss on a claim it never touched.
[Figure: coverage rate by SIU operating model]
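A sketch of the capacity quadrant from the same hypothetical flag-level extract. The FTE count is inferred here from distinct assigned investigators, which is an assumption - a roster table is the better source if you have one:

```python
import pandas as pd

def capacity_metrics(flags: pd.DataFrame, as_of: pd.Timestamp) -> dict:
    """Metrics 10-12 over the hypothetical flag-level extract."""
    open_cases = flags[flags["case_opened"].notna()
                       & flags["decision_rendered"].isna()]
    uninvestigated = flags[flags["case_opened"].isna()]
    n_fte = flags["investigator_id"].nunique()  # assumption: roster = assignees
    return {
        # Metric 10: open cases sitting with each investigator FTE
        "caseload_per_investigator": len(open_cases) / n_fte,
        # Metric 11: median days uninvestigated flags have been waiting
        "backlog_age_median_days": (
            as_of - uninvestigated["flag_created"]).dt.days.median(),
        # Metric 12: share of raised flags ever investigated (leading indicator)
        "coverage_rate": flags["case_opened"].notna().mean(),
    }
```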
Four new metrics for AI-augmented SIU teams
Once an autonomous investigation agent is in the workflow, the 12-metric scorecard needs four additions. These are not optional - they are how the SIU lead retains accountability for AI-rendered conclusions and how the antifraud plan filed under NAIC Insurance Fraud Prevention Model Act 680 stays compliant when reviewed by the state DOI. All four fall out of the agent's own decision log; a computation sketch follows the list. The deployment side of this is covered in our claims VP AI investigation playbook.
- Autonomous-resolution rate - percentage of cases closed without human override. Frames how much load the agent is actually carrying.
- Human-override rate - percentage of AI conclusions overridden by the SIU lead on review. Healthy band: 5-15%. Below 5% suggests rubber-stamping; above 15% suggests the agent is not calibrated for the book.
- Evidence-completeness score - percentage of the 15+ defined investigation phases that produced valid evidence, per case. Maps directly to the California completeness requirement.
- Audit-trail integrity - binary or percentage. Every AI-rendered decision must be reconstructible from logs back to source inputs. Target: 100%. Ties directly to the 10 CCR 2698.36 documentation requirement.
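Here is the promised sketch for the four governance metrics, assuming a per-case agent decision log with columns overridden, phases_with_valid_evidence, phases_defined, and audit_trail_complete - hypothetical names for whatever instrumentation your agent platform emits:

```python
import pandas as pd

def ai_governance_metrics(agent_log: pd.DataFrame) -> dict:
    """Governance metrics over a per-case agent decision log (assumed columns)."""
    return {
        # Share of cases closed with no human override
        "autonomous_resolution_rate": (~agent_log["overridden"]).mean(),
        # Share overridden on SIU-lead review; healthy band 5-15% per the text
        "human_override_rate": agent_log["overridden"].mean(),
        # Mean share of defined investigation phases that produced valid evidence
        "evidence_completeness_score": (agent_log["phases_with_valid_evidence"]
                                        / agent_log["phases_defined"]).mean(),
        # Share of decisions fully reconstructible from logs; target 100%
        "audit_trail_integrity": agent_log["audit_trail_complete"].mean(),
    }
```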
These four governance metrics also satisfy regulatory pressure. State DOIs are starting to ask about AI governance during antifraud plan reviews, and a carrier that can produce the autonomous-resolution rate, override rate, evidence-completeness, and audit-trail integrity numbers on demand is in a far stronger position than one running an AI pilot without instrumentation.
Key takeaways
- The legacy SIU dashboard of referrals received, cases opened, and cases closed measures activity, not impact, and is not the report the Claims VP or CFO actually needs.
- A balanced 2026 scorecard covers four quadrants - speed, quality, financial, capacity - so that gains in one cannot mask regressions in another.
- Capacity is the binding constraint: SIU staffing grew only 1.4% from 2021 to 2022 while flagged-claim volume kept rising, which is why coverage rate sits near 25% under manual operations.
- Quality metrics should map directly to regulatory requirements, with California 10 CCR 2698.36's six summary elements and the antifraud plan filed under NAIC Model 680 providing a defensible rubric instead of a subjective grade.
- Once an autonomous investigation agent enters the workflow, four metrics become non-optional: autonomous-resolution rate, human-override rate, evidence-completeness score, and audit-trail integrity.