May 25, 2026 · 12 min read · Nitish Badu, COO

SIU director's first 90 days with AI investigation: a week-by-week onboarding playbook

An SIU director's onboarding playbook for AI investigation across 90 days: baseline the unit, scope the pilot, redesign the analyst role, lock the audit trail, run QC, and measure lift.

Nitish Badu · COO and Co-founder
  • 14+ days: Manual SIU baseline per case. The cycle time you measure before you change anything.
  • 2-4 hours: AI investigation per case. The cycle-time target to report at day 90.
  • ~25% → 100%: Flagged-claim coverage shift. What makes the credible-referral duty achievable.
  • 70%: Of SIUs run fully remote. Per the 2024 CAIF benchmarking study.

An SIU director's first 90 days with AI investigation is not a software install - it is a redesign of how the unit runs. The agent changes throughput and coverage, but the director still owns case quality, the audit trail, the credible-referral duty, and the relationship with the state DOI. This playbook lays out the operational work across four phases - baseline the unit, scope the pilot and design routing, lock governance and quality control, then measure lift - and names what the director signs at each gate.

The money question belongs to a different seat. Budget ranges, the procurement cycle, the loss-ratio story, and the month-12 board report sit with the Claims VP - covered in the Claims VP playbook for deploying AI investigation. The Claims VP decides whether to deploy and funds it. The SIU director decides how the unit actually runs once the agent is live. Both roles are required; neither substitutes for the other.

The stakes are not small. US insurance fraud runs an estimated $308 billion a year per the Coalition Against Insurance Fraud, and roughly 10% of P&C claims involve fraud. The bottleneck is not detection - it is investigating the flags that detection systems already produce. For how SIUs operate in 2026, see the SIU operations pillar.

What changes for the SIU director (and what does not)

AI investigation changes two things in the unit: throughput and coverage. Manual SIU work runs 14+ days per case, and a typical investigator closes around 10 investigations per month. An autonomous investigation agent compresses the per-case cycle to 2-4 hours and runs 15+ investigation phases in parallel - document forensics, OSINT, statement cross-reference, timeline reconstruction, and financial pattern analysis - rather than working through them one after another. That is what lets coverage move from the roughly 25% of flagged claims most teams investigate today toward 100%.

What does not change is accountability. The director still owns case quality, signs off on referrals, defends the file in a deposition or examination under oath, and answers to the state DOI. The investigator's role shifts from execution to decision-making, but the judgment work stays human. An AI investigation that the director cannot read, override, and document is not usable, no matter how fast it runs.

Capacity was never going to come from hiring. SIU staffing growth slowed to 1.4% in 2021-2022, down from 2.5% the prior period, per the Insurance Information Institute. Meanwhile, the 2024 Property-Casualty Insurance Anti-Fraud Benchmarking Study from the Coalition Against Insurance Fraud and Aon found 80% of respondents now use predictive modeling to detect fraud, up from 55% in 2018. Detection is largely solved at carriers. Investigating the flags is the unsolved layer - and that is exactly the bottleneck the SIU lives with every day.

Detection is upstream; investigation is downstream

The agent does not replace FRISS, Shift Technology, or Verisk. Those tools flag suspicious claims; the investigation agent ingests the flags they produce and resolves each one end-to-end with an audit-ready file. It is complementary to them, not a replacement: the director keeps the existing detection feed, and the agent works the queue it generates.

Weeks 1-2: baseline the unit before you change anything

You cannot prove lift you did not baseline. Week one is measurement, not deployment. Before a single case routes to the agent, capture where the unit stands today across the metrics that the day-90 report will move. Skip this and the pilot produces a number with nothing to compare it against, which is the fastest way to lose a scale decision.

Capture six core operational metrics: investigation cycle time, coverage as a percent of flagged claims, throughput per investigator per month, backlog depth, the share of flagged claims that close without an investigation file, and the current quality-control sample rate. Add false-positive load, because rules-based detection runs 60-85% false positives and that volume is much of what clogs the queue. For the full metric set, the SIU KPIs to track in 2026 guide lists the dozen worth instrumenting.

Week-1 baseline vs day-90 target (operational metrics)

Cycle time per case: 14+ days → 2-4 hrs
Flagged-claim coverage: ~25% → 100%
Throughput / investigator / mo: ~10 → 800+
Flags closed without a file: high → near zero

Baseline across the whole team, not just the people in one office. The 2024 benchmarking study found 70% of SIUs are fully remote and another 29% are hybrid, so a distributed measurement process matters - you are capturing a starting point for analysts who may never sit in the same room. Record the numbers, date them, and have the Claims VP acknowledge them. That signed baseline is the reference point everything in week 13 is measured against.

Metric to baseline | Manual starting point | What it tells you
Cycle time per case | 14+ days | How long a referral sits before resolution
Coverage of flagged claims | ~25% | How many flags get a real investigation
Throughput per investigator | ~10 / month | Current capacity ceiling per head
Caseload per investigator | 200+ cases | The weight already on each desk
Flags closed without a file | Often high | Coverage gap and compliance exposure
QC sample rate | Set yours | Current quality-control discipline
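
One way to make the baseline concrete is to store it as a dated record and derive the ratios from raw counts. The sketch below is a minimal Python illustration, not Hesper AI's tooling: the class, its field names, the snapshot date, and the sample counts are all hypothetical, chosen only so the derived figures match the manual starting points quoted above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UnitBaseline:
    """Dated week-1 snapshot. All field names are illustrative assumptions."""
    as_of: date
    flagged_claims: int        # flags received in the measurement window
    investigated: int          # flags that received a real investigation
    closed_without_file: int   # flags closed with no investigation file
    investigators: int
    months_in_window: int
    median_cycle_days: float   # referral to resolution
    backlog_depth: int         # open referrals at the snapshot date
    qc_sample_rate: float      # share of closed cases QC currently reviews

    @property
    def coverage(self) -> float:
        return self.investigated / self.flagged_claims

    @property
    def throughput_per_investigator_month(self) -> float:
        return self.investigated / (self.investigators * self.months_in_window)

    @property
    def no_file_share(self) -> float:
        return self.closed_without_file / self.flagged_claims


# Hypothetical counts, not data from any carrier.
baseline = UnitBaseline(
    as_of=date(2026, 6, 1),
    flagged_claims=4800,
    investigated=1200,
    closed_without_file=3000,
    investigators=10,
    months_in_window=12,
    median_cycle_days=14.0,
    backlog_depth=600,
    qc_sample_rate=0.05,
)

print(f"coverage: {baseline.coverage:.0%}")
print(f"throughput: {baseline.throughput_per_investigator_month:.1f} per investigator per month")
```

Freezing the record keeps the signed baseline from being edited after the fact; a day-90 snapshot would be a second instance, not a mutation of the first.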

Weeks 3-6: scope the pilot and design case routing

Pick one or two lines where referral volume is highest and red flags are well-defined - usually auto and workers compensation. A focused pilot produces a cleaner signal than a thin spread across every line. Scope it to enough cases that the read on cycle time and false-positive resolution is trustworthy: a pilot of about 1,000 cases or more gives the unit a signal that holds up to a scale decision rather than a number built on noise.

Design the routing before the first case moves. The flags your detection vendors already produce become the agent's input queue - the agent ingests output from FRISS, Shift Technology, or Verisk ISO ClaimSearch rather than replacing any of them. The 2024 benchmarking study notes a rise in automation within investigative referrals, and California 10 CCR 2698.36 explicitly covers "automated or system-generated referrals," so routing flagged claims to an agent sits squarely inside the regulatory frame. For the criteria to evaluate the platform itself, the checklist for evaluating AI fraud investigation vendors is the 12-point reference.

Write the routing as a sequence with an explicit human-in-the-loop gate:

  1. Detection vendor or rules engine flags a claim on a piloted line (auto or workers comp).
  2. The flag auto-routes to the investigation agent, which runs the full investigation across 15+ phases.
  3. The agent produces an evidence pack with sources, reasoning, and timestamps and places it in the investigator review queue.
  4. The investigator reviews the pack and decides: clear, refer, or escalate. The agent never closes a case on its own.
  5. Escalations route to the SIU director, who holds final authority on referral and SAR decisions.
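
The routing sequence above can be sketched as a small state machine in which only a human decision moves a case out of review. This is an illustrative Python outline under stated assumptions: the status names, the `Case` fields, and `run_agent_investigation` are hypothetical stand-ins, not the API of any detection vendor or of the agent itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

PILOTED_LINES = {"auto", "workers_comp"}  # pilot scope named in the article


class Decision(Enum):
    CLEAR = "clear"
    REFER = "refer"
    ESCALATE = "escalate"


@dataclass
class Case:
    claim_id: str
    line: str
    status: str = "flagged"
    evidence_pack: Optional[dict] = None
    history: list = field(default_factory=list)


def run_agent_investigation(case: Case) -> dict:
    """Hypothetical stand-in for the agent; returns an empty evidence pack."""
    return {"claim_id": case.claim_id, "sources": [], "reasoning": [], "timestamps": []}


def route_flag(case: Case) -> Case:
    """Steps 1-3: piloted-line flags go to the agent, then to the review queue."""
    if case.line not in PILOTED_LINES:
        case.status = "manual_queue"  # outside pilot scope: existing process
        return case
    case.evidence_pack = run_agent_investigation(case)
    case.status = "investigator_review"  # the agent never closes a case itself
    case.history.append("agent_pack_ready")
    return case


def record_decision(case: Case, decision: Decision, investigator: str) -> Case:
    """Steps 4-5: the investigator decides; escalations go to the director."""
    if case.status != "investigator_review":
        raise ValueError("case is not awaiting investigator review")
    case.history.append(f"{investigator}:{decision.value}")
    if decision is Decision.ESCALATE:
        case.status = "director_review"
    elif decision is Decision.REFER:
        case.status = "referral_recommended"
    else:
        case.status = "cleared"
    return case


case = route_flag(Case("CLM-001", "auto"))
case = record_decision(case, Decision.ESCALATE, "investigator_a")
print(case.status)  # director_review
```

The design point is that `route_flag` has no code path to a closed status, so the human gate is enforced by structure rather than by policy alone.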

Define the pilot exit criteria the director signs before kickoff. State the cases-investigated target, the cycle-time and coverage thresholds, the QC pass rate the unit needs to see, and the conditions under which the pilot scales, holds, or stops. Those criteria are the operator's decision, not the vendor's. The agent sits inside the stack the director already runs - see how it fits among existing tools in the SIU director toolkit for 2026.

Weeks 3-6: redesign the investigator role (in parallel)

The investigator stops gathering and starts deciding. Write the new role definition before the first case routes, because the desk-level change is the part of onboarding most likely to stall if it is left implicit. The work that fills a manual investigation - pulling records, cross-referencing statements, reconstructing timelines, running database lookups - is exactly what the agent runs across 15+ phases in parallel. The investigator's role shifts from execution to decision-making.

What stays on the human desk is judgment: weighing the evidence the agent assembled, deciding on referral, conducting EUOs, signing the file, and handling the cases that do not fit a pattern. Field and desk investigators make up the majority of SIU staff per the 2024 benchmarking study, and that mix does not disappear - it gets re-aimed. The same investigator who closed about 10 investigations a month gains headroom toward the 800+ per month the agent can carry, and that headroom goes to the previously uninvestigated tier, not to a smaller payroll.

Onboarding succeeds or fails at the desk. When investigators see the agent as the analyst who does the gathering so they can do the deciding, adoption holds. When it is framed as a replacement, the unit resists the queue and the pilot signal degrades.

Source: Hesper AI product research

No headcount cut is part of this model - SIU staffing does not shrink, it gets re-aimed at higher-judgment work and the coverage gap. The commitment that there are no layoffs tied to the deployment is a Claims VP and executive ownership question; the Claims VP playbook covers how that commitment gets made and communicated. The director's job is the role definition itself: what each investigator now does with the agent in the loop.

Weeks 7-10: governance and the audit trail the director has to defend

An AI investigation is only usable if the investigator can see what the agent did, override it, and produce a documented trail for a deposition, EUO, or SAR filing. This is the section that answers the director's core veto question. The requirement is an audit-trail-native platform: every decision the agent makes is logged with its sources, its reasoning, and timestamps, so the file reads as a defensible chain, not a black-box conclusion.

California 10 CCR 2698.36 sets the bar in concrete terms. The SIU must investigate each credible referral of suspected insurance fraud it receives - including automated or system-generated referrals - and document an investigation summary. Per the regulation text on Cornell LII, that summary must answer what facts caused the belief of fraud, the suspected misrepresentations and who made them, materiality, the pertinent witnesses, and what documentation exists. An agent that produces that documented decision on every flagged claim is what makes the duty achievable at 100% coverage rather than the roughly 25% manual teams reach.

10 CCR 2698.36 documentation question | Where it comes from in the agent file
What facts caused the belief of fraud? | Flag rationale plus the agent's phase findings
What misrepresentations, and who made them? | Statement cross-reference and timeline phases
Is the misrepresentation material? | Financial pattern analysis and claim valuation
Who are the pertinent witnesses? | OSINT and party-relationship findings
What documentation supports the conclusion? | Logged sources and document-forensics output
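
As a rough illustration of how a reviewer might check a file against the five documentation questions above, the Python sketch below models the summary as a record and lists the sections still empty. The class and field names are hypothetical, the sample entries are invented for the example, and an empty section is only a prompt for human review, not a legal test of compliance.

```python
from dataclasses import dataclass, field, fields


@dataclass
class InvestigationSummary:
    """One field per documentation question; names are illustrative."""
    facts_causing_belief: list = field(default_factory=list)
    misrepresentations: list = field(default_factory=list)  # (statement, who made it)
    materiality: str = ""
    pertinent_witnesses: list = field(default_factory=list)
    supporting_documentation: list = field(default_factory=list)


def missing_sections(summary: InvestigationSummary) -> list:
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(summary) if not getattr(summary, f.name)]


# Invented example content, for illustration only.
draft = InvestigationSummary(
    facts_causing_belief=["Loss reported two days after policy inception"],
    materiality="Claimed amount depends on the disputed loss date",
)
print(missing_sections(draft))
# ['misrepresentations', 'pertinent_witnesses', 'supporting_documentation']
```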

Set the override workflow explicitly. The investigator can disagree with any agent finding, record the reason, and substitute a human decision, with the change logged alongside the original. Disputes route into a dispute log the unit owns. The escalation path ends with the SIU director as final authority on referral and SAR decisions - the agent informs that decision, it does not make it. That posture, written down before scale, is what a state DOI auditor expects to find.
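
The override requirement, where a human decision is logged alongside the original finding rather than replacing it, maps naturally onto an append-only log. The following is a minimal Python sketch under that assumption; `LogEntry`, `AuditLog`, and the sample findings are hypothetical and do not describe any specific platform's storage format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    case_id: str
    actor: str                       # "agent" or an investigator ID
    finding: str
    reasoning: str
    sources: tuple
    timestamp: datetime
    overrides: Optional[int] = None  # index of the entry this one supersedes


class AuditLog:
    """Append-only: entries are added, never edited or removed."""

    def __init__(self):
        self._entries = []

    def append(self, entry: LogEntry) -> int:
        if entry.overrides is not None:
            if not 0 <= entry.overrides < len(self._entries):
                raise ValueError("override must point at an existing entry")
            if self._entries[entry.overrides].case_id != entry.case_id:
                raise ValueError("override must stay within the same case")
        self._entries.append(entry)
        return len(self._entries) - 1

    def chain(self, case_id: str) -> list:
        """Full history for a case: original findings and overrides together."""
        return [e for e in self._entries if e.case_id == case_id]

    def disputes(self) -> list:
        """Every human override, i.e. the unit's dispute log."""
        return [e for e in self._entries if e.overrides is not None]


log = AuditLog()
now = datetime.now(timezone.utc)
first = log.append(LogEntry(
    "CLM-001", "agent", "invoice appears altered",
    "metadata date differs from printed date", ("invoice.pdf",), now))
log.append(LogEntry(
    "CLM-001", "investigator_a", "invoice genuine",
    "vendor confirmed a reissued copy", ("vendor call note",), now, overrides=first))

print(len(log.chain("CLM-001")), len(log.disputes()))  # 2 1
```

Because the override points at the entry it supersedes, the original agent finding and the investigator's reason stay visible in the same chain.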

Audit-trail-native is the load-bearing requirement

If the platform cannot reconstruct, on demand, every source and reasoning step behind a conclusion - with the investigator's override visible in the same chain - it does not clear the bar for a deposition, EUO, or SAR. This is the line that separates a usable AI investigation from a flag with extra steps.

Weeks 7-10: quality control and model review (in parallel)

Set a quality-control sampling cadence and a monthly model-quality review so the unit trusts the queue. A queue the investigators do not trust gets second-guessed case by case, which erases the throughput gain. The discipline that prevents that is the same one a well-run SIU already applies to manual work: sample, review, log, and adjust.

A workable starting cadence is to review every agent-flagged case as a matter of course and a fixed percentage of agent-cleared cases on a rolling sample. The cleared sample is where the unit catches whether the agent is closing anything it should not. Because rules-based detection runs 60-85% false positives, much of what the agent triages is noise the investigators no longer have to chase - the QC sample confirms that the triage is sound, not that work is being skipped.

Run a monthly model-quality review with a standing agenda: the QC sample results, the dispute log since last review, any patterns in overridden findings, and the routing-rule changes those patterns justify. A cluster of disputes on one flag type is the signal to adjust the routing rule for that flag, tighten the human gate, or feed the pattern back for tuning. The output of that meeting is a short, dated record of what changed and why - itself part of the defensible posture.

  • Review 100% of agent-flagged cases and a fixed rolling sample of agent-cleared cases.
  • Log every disputed agent finding with the investigator's reason and the resolution.
  • Hold a monthly model-quality review against a standing agenda the director owns.
  • Let a dispute cluster on one flag type trigger a documented routing-rule change.
  • Date and retain the review record as part of the audit posture.
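
The sampling rule in the first bullet can be sketched in a few lines of Python. This is an illustration only: the function name, the `(case_id, outcome)` input shape, and the 10% rate in the usage example are assumptions, since the article leaves the cleared-case percentage for the unit to set.

```python
import random


def select_qc_cases(cases, cleared_sample_rate, seed=None):
    """Pick every agent-flagged case plus a random share of agent-cleared cases.

    `cases` is a list of (case_id, outcome) pairs where outcome is
    "flagged" or "cleared". Returns (flagged_ids, sampled_cleared_ids).
    """
    if not 0 <= cleared_sample_rate <= 1:
        raise ValueError("cleared_sample_rate must be between 0 and 1")
    flagged = [cid for cid, outcome in cases if outcome == "flagged"]
    cleared = [cid for cid, outcome in cases if outcome == "cleared"]
    k = round(len(cleared) * cleared_sample_rate)
    if cleared and cleared_sample_rate > 0:
        k = max(1, k)  # never let a non-zero rate round down to no sample
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return flagged, rng.sample(cleared, k)


cases = [(f"F-{i}", "flagged") for i in range(10)] + \
        [(f"C-{i}", "cleared") for i in range(90)]
must_review, cleared_sample = select_qc_cases(cases, cleared_sample_rate=0.10, seed=7)
print(len(must_review), len(cleared_sample))  # 10 9
```

Recording the seed with each review cycle is one way to let an auditor reproduce exactly which cleared cases were drawn.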

Weeks 11-13: measure lift and refresh the antifraud plan

At day 90, report operational lift against the week-1 baseline and update the antifraud-plan filing to reflect the AI-assisted workflow. This is the operator's lift report to the Claims VP, not the month-12 board narrative - that one belongs to the Claims VP's seat. Keep it to the operational story: what moved, by how much, and what it means for the queue.

Operational metric | Week-1 baseline | Day-90 direction
Cycle time on piloted lines | 14+ days | Toward 2-4 hours
Coverage of flagged claims | ~25% | Toward 100%
Throughput per investigator | ~10 / month | Toward 800+ / month
Flags with a documented file | Partial | Every flag investigated
QC sample pass rate | Set in week 7 | Reported with disputes logged
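
To show how the rows above translate into a single lift figure per metric, here is a small Python sketch. The `MetricLift` class is hypothetical, and the inputs are the article's manual starting points and stated targets (using 14 days and the 4-hour end of the range); the resulting multiples are derived arithmetic, not published results, and a real report would substitute the unit's measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricLift:
    name: str
    baseline: float
    day90: float
    lower_is_better: bool = False

    @property
    def improvement_factor(self) -> float:
        """How many times better day 90 is than baseline (1.0 means no change)."""
        if self.lower_is_better:
            return self.baseline / self.day90
        return self.day90 / self.baseline


rows = [
    MetricLift("cycle time (hours)", baseline=14 * 24, day90=4, lower_is_better=True),
    MetricLift("coverage of flagged claims", baseline=0.25, day90=1.0),
    MetricLift("throughput per investigator per month", baseline=10, day90=800),
]

for row in rows:
    print(f"{row.name}: {row.improvement_factor:.0f}x")
```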

Include the QC results in the report: the agent-cleared sample reviewed, the disputed outputs logged, and any routing-rule changes made during the pilot. A coverage lift with no quality evidence behind it is not a lift the director should sign. The point of the day-90 report is to show that the queue moved faster and wider without the case files getting thinner.

Close the 90 days by refreshing the antifraud-plan filing. The NAIC Insurance Fraud Prevention Model Act #680 frames the antifraud plan SIUs maintain - the official text is in the NAIC Model Act #680 PDF - and the filing should now describe the AI-assisted investigation workflow, the human-in-the-loop gates, and the documented-decision posture the unit runs. Cost per case (roughly $2,500 manual versus $150 with AI) belongs in the Claims VP and CFO reporting, not the operator's lift report. Hand the money story to that seat and keep the day-90 report on the operations.

One illustrative way to size the coverage shift: a carrier flagging 50,000 claims a year investigates about 12,500 of them at 25% coverage. Taking coverage to 100% means all 50,000 get a documented investigation. That figure is derived from coverage percentage times flagged volume - illustrative, not a published number - but it is the shape of what changes when the agent covers the flags the manual team never reached.
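
The same sizing, written out as arithmetic in Python. The function name is hypothetical and the inputs are the illustrative figures from the paragraph above, not measured data.

```python
def investigated_claims(flagged_per_year: int, coverage: float) -> int:
    """Flags that receive a documented investigation at a given coverage rate."""
    return round(flagged_per_year * coverage)


flagged = 50_000
manual = investigated_claims(flagged, 0.25)  # 12,500 at 25% coverage
full = investigated_claims(flagged, 1.00)    # 50,000 at 100% coverage

# Flags that previously closed without a documented investigation.
print(full - manual)  # 37500
```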

Key takeaways

  • The first 90 days with AI investigation is a unit redesign across four phases - baseline, pilot and routing, governance and QC, and lift measurement - not a software install.
  • Baseline six operational metrics in weeks 1-2, because lift you did not measure first cannot be proven at day 90.
  • The investigator's role shifts from execution to decision-making, with judgment staying human and freed capacity re-aimed at the previously uninvestigated coverage gap.
  • An audit-trail-native platform that logs every source, reasoning step, and override is what makes the credible-referral duty under California 10 CCR 2698.36 achievable at 100% coverage.
  • The day-90 report is an operational lift story for the Claims VP and a refreshed antifraud-plan filing under NAIC Model Act #680, while the budget and board narrative stays with the Claims VP.

Frequently asked questions

How should an SIU director approach the first 90 days with AI investigation?

Treat it as a unit redesign across four phases, not a software install. Weeks 1-2: baseline the unit - cycle time (manual SIU runs 14+ days per case), coverage (most carriers investigate only about 25% of flagged claims), throughput (roughly 10 investigations per investigator per month), and backlog. Weeks 3-6: scope a pilot on one or two high-referral lines like auto and workers comp, design case routing with a human review gate, and redefine the investigator role from data gatherer to decision-maker. Weeks 7-10: lock the governance and audit-trail review standards and a quality-control sampling cadence. Weeks 11-13: measure lift against the week-1 baseline and refresh the antifraud plan. The budget case belongs to the Claims VP; the operational design belongs to the SIU director.

Does AI investigation replace SIU investigators?

No. The investigator's role shifts from execution to decision-making. Manual SIU work is dominated by data gathering - pulling records, cross-referencing statements, reconstructing timelines, running database lookups - which is exactly what an autonomous investigation agent runs across 15+ phases in parallel. The judgment work stays human: weighing evidence, deciding on referral, handling EUOs, and signing the case. Because manual teams investigate only about 25% of flagged claims, the freed capacity covers the previously uninvestigated tier, not a headcount cut. With SIU staffing growing just 1.4% in 2021-2022 per the Insurance Information Institute, capacity was never going to come from hiring. The productive move is reallocating investigators to higher-judgment work and the coverage gap.

How does the unit keep AI-assisted investigations defensible and compliant?

Require an audit-trail-native platform where every decision the agent makes is logged with its sources, reasoning, and timestamps, and where the investigator can see, override, and document each step. That is what makes a case defensible in a deposition, EUO, or SAR filing. California 10 CCR 2698.36 requires the SIU to investigate each credible referral of suspected fraud - including automated or system-generated referrals - and to document a summary covering what facts indicated fraud, the suspected misrepresentations and who made them, materiality, witnesses, and supporting documentation. An autonomous agent that produces that documented decision on every flagged claim makes the duty achievable at 100% coverage rather than the roughly 25% manual teams reach. Set a dispute-logging process with the SIU director as final authority.

Which lines should the pilot start with, and how should cases be routed?

Start with one or two lines where referral volume is highest and red flags are well-defined - usually auto and workers compensation. Route the flags your detection vendors already produce; the agent ingests output from FRISS, Shift Technology, or Verisk ISO ClaimSearch rather than replacing them. The Coalition Against Insurance Fraud's 2024 benchmarking study notes a rise in automation within investigative referrals, and California's SIU regulation explicitly covers automated or system-generated referrals, so routing flagged claims to an agent is consistent with the regulatory frame. Set a human review gate: the agent investigates and produces an evidence pack, the investigator reviews and decides. Scope the pilot to enough cases - about 1,000 or more - that the signal on cycle time and false-positive resolution is clean enough to support a scale decision.

Which metrics should the unit baseline before deployment?

Capture the unit's starting point before anything changes, or you cannot prove lift. The core six: investigation cycle time (manual baseline is 14+ days per case), coverage as a percent of flagged claims (typically about 25%), throughput per investigator per month (around 10 manually), backlog depth, the share of flagged claims that close without an investigation file, and your current quality-control sample rate. Add false-positive load - rules-based detection runs 60-85% false positives, which is much of what clogs the queue. The targets to move toward are 2-4 hours per case, 100% coverage, and 800+ cases per investigator per month. Baseline across your whole team; the 2024 benchmarking study found 70% of SIUs are fully remote, so distributed measurement matters.

How do the Claims VP and SIU director roles differ in an AI investigation deployment?

They own different seats. The Claims VP owns the budget case, the procurement cycle, the loss-ratio story, and the month-12 board report - the money and the mandate. The SIU director owns the operational design: baselining the unit, scoping the pilot from the investigator's vantage, designing case routing and escalation, redefining the investigator role, setting governance and audit-trail review standards, running quality control, and meeting the credible-referral obligations under California 10 CCR 2698.36 and the antifraud-plan requirements of NAIC Model Act #680. The Claims VP decides whether to deploy and funds it; the SIU director decides how the unit actually runs after the agent is live. Both roles are required; neither substitutes for the other.

What belongs in the day-90 report?

It is an operational lift report against the week-1 baseline, not a board ROI narrative. Show cycle time moving from 14+ days toward 2-4 hours per case on the piloted lines, coverage of flagged claims rising from roughly 25% toward 100%, throughput climbing from about 10 investigations per investigator per month toward the 800+ ceiling, and the share of flagged claims with a complete documented investigation file. Include the quality-control results: the agent-cleared sample reviewed, disputed outputs logged, and any routing-rule changes made. Close by refreshing the antifraud-plan filing to reflect the AI-assisted workflow under NAIC Model Act #680. Cost per case (roughly $2,500 manual versus $150 with AI) belongs in the Claims VP's and CFO's reporting, not the operator's lift report.
