---
title: "The general counsel's guide to AI fraud investigation: discovery, privilege, and admissibility"
description: "The Claims VP buys AI investigation for speed; the GC has to make it survive discovery, privilege, authentication, and a bad-faith theory. Here is the litigation spec."
date: "2026-06-30"
lastModified: "2026-06-30"
author: "Nitish Badu"
tags: ["Guides"]
canonical: "https://gethesperai.com/blog/general-counsel-ai-fraud-investigation-litigation/"
---

# The general counsel's guide to AI fraud investigation: discovery, privilege, and admissibility

> **TL;DR** When an AI-assisted claim file becomes a litigation exhibit, the general counsel inherits four exposures the buying team never priced: discovery of model logs after Estate of Lokken v. UnitedHealth Group, the work-product trap on routine investigations, admissibility (authentication under FRE 901, the business-records exception, and Daubert reliability), and bad faith. A detection score is a discovery liability with no evidentiary upside. An audit-trail-native investigation that logs sources, reasoning, and timestamps per decision is the reconstructable record those rules actually want.
>
> - Lokken made AI model design and per-claim logs discoverable
> - Routine AI investigation is a business record, not work product
> - FRE 901(b)(9) wants a process that 'produces an accurate result'

- **766 F. Supp. 3d 835** - Estate of Lokken v. UnitedHealth (D. Minn. 2025 - AI logs held discoverable)
- **FRE 901(b)(9)** - Authenticate a process or system (must show it 'produces an accurate result')
- **14+ days to 2-4 hrs** - Investigation time (the prompt-investigation defense under Model #900)
- **~25% to 100%** - Flagged-claim coverage (reasonable investigation of every flag, not a sample)

The Claims VP buys AI investigation for speed and coverage. The general counsel is the one who has to make it survive discovery, authentication, a hearsay objection, and a bad-faith theory the day a claim file becomes a litigation exhibit. Those are different jobs, and the gap between them is where most carriers are exposed right now. Speed of investigation is a procurement metric; defensibility of the resulting record is a litigation metric, and the two are not automatically aligned.

This guide is written to the legal seat, not the buyer. It walks through the four downstream exposures a GC inherits when AI touches a claims decision: discovery of model logs and the new retention duty; attorney-client privilege versus work product over a routine AI investigation; admissibility, meaning authentication and the business-records exception under Federal Rules of Evidence 901 and 803(6) plus reliability under Rule 702 and Daubert; and bad faith under the unfair-claims-practices framework. It closes with a concrete spec for what the audit trail has to contain to clear all of them at once.

The throughline is a distinction the detection vendors do not market: a risk score is a discovery liability with no evidentiary upside, while an audit-trail-native investigation is a reconstructable record that the same rules of evidence are built to admit. For the operational version of this argument - what makes an AI fraud finding defensible in the first place - see [the defensibility standard for fraud investigation AI](/blog/fraud-investigation-ai-defensibility-standard). This post is the litigation-machinery version of that standard.

## The GC inherits the AI investigation when a claim becomes an exhibit

The general counsel's exposure to AI claims investigation begins the moment a disputed claim moves toward litigation. At that point the AI record stops being an internal efficiency artifact and becomes a discoverable document, a potential exhibit, and a target for cross-examination. The same automation that compressed a 14+ day manual investigation into 2-4 hours, and lifted coverage from roughly 25% of flagged claims to 100%, is the automation a plaintiff's counsel will later ask the carrier to explain under oath.

That upside is real and worth keeping. Investigating every flagged claim quickly, rather than triaging down to the roughly one in four a stretched SIU team can reach, is itself a defense to an unfair-claims-practices theory. The problem is that the legal surface scales with the automation. Once an AI system materially influenced a decision, the carrier owns four distinct exposures, and each maps to a different body of law.

1. Discovery: the model's design records, training intent, override guidance, and per-claim outputs are increasingly discoverable in bad-faith and coverage litigation.
2. Privilege: a routine AI investigation run on every flagged claim is an ordinary business record, not work product, and attempts to cloak it invite waiver and a bad-faith inference.
3. Admissibility: the investigation record must be authenticated under FRE 901 and qualify under the FRE 803(6) business-records exception, while the method itself may face Rule 702 and Daubert.
4. Bad faith: under the unfair-claims-practices framework, a denial needs a reasonable, prompt, individualized investigation - and AI standing in for human judgment is the failure mode.

This is the GC's distinctive vantage point. The compliance officer signs off on the tool before deployment; the SIU director defends the finding in deposition; the GC owns the file once it is an exhibit. For the pre-deployment regulatory sign-off that sits upstream of this post, see [the compliance officer's AI investigation deployment guide](/blog/compliance-officer-ai-investigation-deployment). The two roles share the same file from opposite ends.

## Discovery: assume the logs are discoverable

Discovery is the first exposure, and the safe assumption is the broad one: once an AI tool materially influenced a claims decision, its model documentation, design intent, override guidance, and per-claim outputs are discoverable in bad-faith and coverage litigation. The federal scope rule, [FRCP 26(b)(1)](https://www.law.cornell.edu/rules/frcp/rule_26), reaches any non-privileged matter relevant to a claim or defense and proportional to the needs of the case. AI claims records fit that test cleanly.

The leading example is Estate of Lokken v. UnitedHealth Group, Inc., 766 F. Supp. 3d 835 (D. Minn. 2025). The court let a bad-faith claim proceed on the allegation that claim handling was, per the [Zelle analysis of the case](https://www.zellelaw.com/AI_Tools_and_Bad_Faith_Risk_in_Insurance_Claim_Handling_Lessons_from_Lokken), dominated by automated outputs despite promises of individualized expert or professional review. On the discovery side, the same court allowed the plaintiffs, per the [National Law Review report of the order](https://natlawreview.com/article/court-allows-discovery-insurers-use-ai-deny-claims) (No. 23-CV-3514, D. Minn.), to obtain discovery into how the AI tool worked, its development goals and anticipated benefit, and whether it was designed to supplant physician decision-making.

Coverage-litigation commentary points the same direction. A [National Law Review piece on coverage litigation in the age of AI](https://natlawreview.com/article/imagining-coverage-litigation-age-artificial-intelligence) observes that discovery in these disputes targets the source and the mechanism for reaching an adverse coverage determination, and advises insurers to be ready to document AI training, bias auditing, policies and procedures, and output monitoring. The discoverability question is settled enough that the GC's real decision is not whether to produce, but what the produced record looks like.

This is where the detection-versus-investigation distinction does legal work. A risk score from an upstream detection layer forces the carrier to produce model documents it often cannot tie back to the specific claim - the score's features are not a per-decision trail. An audit-trail-native investigation that runs 15+ phases and logs sources, reasoning, and timestamps for each one produces a record that answers the discovery demand on its own terms. One deepens the exposure; the other narrows it.

| Evidentiary stage | What the rule demands | Where a detection score fails | Where an audit-trail-native investigation clears it |
| --- | --- | --- | --- |
| Discovery (FRCP 26) | Relevant, proportional model and decision records | Model docs must be produced but cannot be tied to the specific claim | Per-claim record answers 'what did the system find and why' directly |
| Privilege (FRCP 26(b)(3)) | Materials prepared in anticipation of litigation | Routine score is plainly ordinary-course; no protection either way | Routine investigation framed as a business record built to be produced |
| Authentication (FRE 901) | Show the process 'produces an accurate result' | Non-reproducible features; no per-claim accuracy showing | Defined phases plus contemporaneous logs are reproducible |
| Business records (FRE 803(6)) | Records kept in the regular course; trustworthy | Opaque conclusion fails the (E) trustworthiness test | Custodian or SIU lead lays the foundation from the logged chain |
| Reliability (FRE 702 / Daubert) | Tested method, known error rate, reliable application | 60-85% false-positive rate is the error-rate argument | Phase-structured, logged application to the facts of this claim |
| Bad faith (Model #900) | Reasonable, prompt, individualized investigation | A flag is not an investigation | 100% coverage in hours with documented human sign-off |

### Spoliation and the new retention duty

Discovery brings a retention duty that did not exist before AI entered the decision chain. A litigation hold now has to reach model-call logs, prompts, model or phase versions, and the per-decision reasoning trail - all of which are discoverable artifacts. If those records are ephemeral by design, a carrier risks a spoliation argument over materials it never set out to keep. The fix is architectural: design the audit trail to be retained and reconstructable, not transient. A system that logs every phase as a durable record makes the hold a matter of preserving what already exists rather than reconstructing what was lost.
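
The retention point can be illustrated with a minimal sketch of a purge job that skips any claim under a litigation hold. Everything here is a hypothetical illustration rather than a description of any vendor's system: the `LogRecord` shape, the `purge_candidates` helper, and the seven-year retention window are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class LogRecord:
    claim_id: str
    kind: str             # e.g. "model_call", "prompt", "phase_reasoning"
    created_at: datetime


def purge_candidates(records, held_claim_ids, now, retention=timedelta(days=7 * 365)):
    """Return records past the retention window that are NOT under a hold.

    The seven-year default is an assumed placeholder, not a legal standard.
    """
    return [
        r for r in records
        if now - r.created_at > retention and r.claim_id not in held_claim_ids
    ]


now = datetime(2026, 6, 30, tzinfo=timezone.utc)
old = datetime(2015, 1, 1, tzinfo=timezone.utc)
records = [
    LogRecord("CLM-001", "model_call", old),    # old, but under a hold
    LogRecord("CLM-002", "prompt", old),        # old, no hold
    LogRecord("CLM-002", "phase_reasoning", now),  # recent
]

to_purge = purge_candidates(records, held_claim_ids={"CLM-001"}, now=now)
print([(r.claim_id, r.kind) for r in to_purge])  # [('CLM-002', 'prompt')]
```

The design choice worth noting is that the hold is a filter over records that already exist durably. If model-call logs were never persisted, there would be nothing for the hold to protect.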

## Privilege and work product: the anticipation-of-litigation line

Privilege is the exposure most likely to be mishandled, because the instinct to protect the AI investigation file usually backfires. A routine AI investigation run automatically on every flagged claim is an ordinary-course business activity, not litigation-driven work, so it is generally discoverable and not work product. Work-product protection under [FRCP 26(b)(3)](https://www.law.cornell.edu/rules/frcp/rule_26) covers materials prepared in anticipation of litigation by or for a party or its representative - and it expressly extends to the insurer or its agent. That is the hook, and it is also the limit.

The limit matters because routine claims investigation does not happen in anticipation of litigation; it happens because the carrier investigates claims as a matter of course. Trying to cloak that work as privileged invites two bad outcomes. First, a court can find waiver and order production anyway, often after the carrier has signaled it had something to hide. Second, an aggressive privilege posture can support a bad-faith inference - that the carrier was building a litigation file instead of fairly adjusting the claim.

> **The honest posture beats the protective one**
>
> Treat the routine AI investigation as a business record built to be produced and to look good when it is. Reserve work-product and privilege for the genuinely litigation-triggered escalation - the case where counsel directs additional work because litigation is actually anticipated. Mixing the two is what creates waiver and dual-purpose-document fights. The defensible AI record is one you would be comfortable handing to opposing counsel, because that is exactly what will happen.

### Waiver and the dual-purpose document problem

Two waiver risks recur with AI investigations. The first is mixing counsel's mental impressions into the routine investigation file. FRCP 26(b)(3) gives heightened protection to mental impressions, conclusions, and legal theories, but blending them into a record that is otherwise an ordinary business document creates a dual-purpose document that courts can compel in whole or in part. The second is over-claiming - asserting privilege across the entire AI investigation set, which a court can treat as a waiver of the protection that would have attached to the genuinely privileged subset. The cleaner architecture keeps the routine investigation record and the litigation-driven attorney work product as separate, identifiable layers from the start.
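
One way to picture the separate-layers architecture is a file in which every entry carries an explicit layer tag from the moment it is created. The sketch below is an assumption-laden illustration: the `Layer` values, the `FileEntry` fields, and `split_for_production` are hypothetical names, and a tag does not by itself make anything privileged. Whether protection attaches remains counsel's call on the facts.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    ROUTINE = "routine_investigation"   # ordinary-course business record
    COUNSEL = "counsel_directed"        # litigation-triggered, counsel-directed work


@dataclass(frozen=True)
class FileEntry:
    entry_id: str
    claim_id: str
    layer: Layer
    summary: str


def split_for_production(entries):
    """Separate routine entries from entries queued for counsel's privilege review."""
    produce = [e for e in entries if e.layer is Layer.ROUTINE]
    review = [e for e in entries if e.layer is Layer.COUNSEL]
    return produce, review


entries = [
    FileEntry("E1", "CLM-001", Layer.ROUTINE, "Phase log: policy timing check"),
    FileEntry("E2", "CLM-001", Layer.ROUTINE, "SIU reviewer sign-off"),
    FileEntry("E3", "CLM-001", Layer.COUNSEL, "Counsel-requested follow-up analysis"),
]

produce, review = split_for_production(entries)
print([e.entry_id for e in produce])  # ['E1', 'E2']
print([e.entry_id for e in review])   # ['E3']
```

Because the tag is set at write time rather than reconstructed later, counsel's mental impressions never land in the routine layer, which is the mixing problem described above.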

## Authentication and the business-records exception

Admissibility starts with two gates: authentication under FRE 901 and the hearsay exception under FRE 803(6). To get an AI investigation into evidence - or to rebut a claimant's challenge to a denial - the carrier authenticates the record and qualifies it as a business record. [FRE 901(a)](https://www.law.cornell.edu/rules/fre/rule_901) requires evidence sufficient to support a finding that the item is what the proponent claims, and 901(b)(9) speaks directly to computer processes: authentication of a process or system requires evidence describing it and showing that it produces an accurate result.

That standard is the reproducibility requirement most AI tools quietly fail. You cannot show a black-box score produces an accurate result for the specific claim if its features are not reconstructable per decision. A defined, phase-structured investigation with contemporaneous logs is reproducible - you can describe the process and demonstrate that, run on this claim's inputs, it produces this claim's record.
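
A minimal sketch of what "run on this claim's inputs, it produces this claim's record" can mean in practice: log a digest of each phase's inputs and outputs, then re-run the phase later and compare. The phase logic, the 30-day threshold, and all function names are hypothetical. The sketch demonstrates reproducibility only; whether the process produces an accurate result is a separate showing.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding, so equal data gives equal digests."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_phase(inputs: dict) -> dict:
    """Hypothetical deterministic phase: how soon after inception did the loss occur?"""
    days = inputs["loss_day"] - inputs["policy_start_day"]
    return {"phase": "policy_timing", "days_after_inception": days, "early_loss": days < 30}


def log_phase(inputs: dict) -> dict:
    """Record inputs, output, and digests of both at the time the phase runs."""
    output = run_phase(inputs)
    return {
        "inputs": inputs,
        "output": output,
        "input_digest": digest(inputs),
        "output_digest": digest(output),
    }


def reproduces(entry: dict) -> bool:
    """Re-run the phase on the logged inputs and compare against the logged digests."""
    return (
        digest(entry["inputs"]) == entry["input_digest"]
        and digest(run_phase(entry["inputs"])) == entry["output_digest"]
    )


entry = log_phase({"policy_start_day": 100, "loss_day": 112})
print(reproduces(entry))     # True

tampered = {**entry, "inputs": {"policy_start_day": 100, "loss_day": 200}}
print(reproduces(tampered))  # False
```

A non-deterministic scoring step would fail this check even with untouched logs, which is the practical difference between a reconstructable phase and a black-box score.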

The report then comes in under [FRE 803(6)](https://www.law.cornell.edu/rules/fre/rule_803), the business-records exception. The rule requires that the record was made at or near the time by, or from information transmitted by, someone with knowledge; that it was kept in the course of a regularly conducted activity; that making the record was a regular practice of that activity; and that these conditions are shown by the testimony of a custodian or other qualified witness, or by certification. An AI investigation that runs on every flagged claim is, by definition, kept in the course of a regularly conducted activity and made as a regular practice, so it satisfies 803(6)(B) and (C) on its face. The hard part is 803(6)(E): the exception is lost if the opponent shows that the source of information or the method or circumstances of preparation indicate a lack of trustworthiness.

> The trustworthiness escape hatch in 803(6)(E) is where opaque AI dies and reconstructable investigation survives. If your SIU lead can sit in the witness chair and walk the chain - here is the source, here is the reasoning, here is the timestamp, here is where I reviewed and signed off - the record holds. If the only honest answer is 'the model returned a number,' the trustworthiness objection has somewhere to land.
>
> - Hesper AI product research, Q2 2026

The practical requirement is a custodian or SIU lead who can lay the foundation from a record that is actually readable. The 15+ logged phases give that witness something concrete to authenticate and a contemporaneous chain to describe. For the operational mechanics of producing that record at speed, see [how AI produces an audit-ready fraud report without sacrificing speed](/blog/audit-ready-fraud-report-speed-ai).

## Daubert, Frye, and the AI method itself

When the AI method is offered through expert testimony, it faces a third gate: reliability under FRE 702 and the Daubert standard. [FRE 702](https://www.law.cornell.edu/rules/fre/rule_702), as amended December 1, 2023, requires the proponent to show by a preponderance of the evidence that the testimony rests on sufficient facts, is the product of reliable principles and methods, and reflects a reliable application of those methods to the facts of the case. The amendment sharpened the last point: 702(d) now makes the reliable application to this case an explicit, court-decided question rather than a matter of weight.

The [Daubert standard](https://www.law.cornell.edu/wex/daubert_standard) (Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993)) gives the trial court five gatekeeping factors: whether the technique can be and has been tested, whether it has been peer reviewed, its known or potential error rate, the existence of standards controlling its operation, and general acceptance. Two of those - testability and error rate - are exactly where AI methods struggle.

*Detection score vs. audit-trail-native investigation against the evidentiary tests*

| Evidentiary test | Detection score | Audit-trail-native investigation |
| --- | --- | --- |
| Discoverable per-claim reasoning | Weak | Yes |
| Reproducible (FRE 901(b)(9)) | Weak | Yes |
| Custodian can lay 803(6) foundation | Weak | Yes |
| Known error rate helps under Daubert | 60-85% false-positive rate | Per-claim |
| Supports individualized-review defense | No | Yes |

A rules-based detection score with a 60-85% false-positive rate hands the opposing expert the error-rate argument before the carrier says a word. A non-deterministic, prompt-only black box is hard to call testable, because the same inputs may not reproduce the same output. A defined, deterministic, phase-structured investigation with logged reasoning is testable and gives the court a reconstructable application to the facts of this specific claim - the 702(d) requirement the 2023 amendment foregrounded.

One state-law caveat the GC should keep in view: Daubert is the federal standard, but several states - including New York, Pennsylvania, Illinois, and Washington - still apply the Frye general-acceptance test. Because much coverage and bad-faith litigation is in state court, the carrier may face the older general-acceptance bar instead. A method that is novel, opaque, and not generally accepted in the field is more vulnerable under Frye, not less, which is another reason the defensible posture leans on a documented, conventional investigation process rather than the proprietary scoring model.

## Bad faith: speed and documentation as the defense

Bad faith is where the AI record becomes an affirmative defense rather than a liability - if it is built correctly. Under the unfair-claims-practices framework reflected in the NAIC [Unfair Claims Settlement Practices Act (Model #900)](https://content.naic.org/sites/default/files/model-law-900.pdf), an insurer must adopt and implement reasonable standards for the prompt investigation of claims and cannot refuse to pay a claim without conducting a reasonable investigation. AI that fully investigates flagged claims in 2-4 hours rather than 14+ days, and lifts coverage from roughly 25% to 100% of flags, strengthens the reasonable-and-prompt-investigation defense directly.

The danger is the inverse pattern, and Lokken is the warning. The bad-faith theory there was that claim handling was dominated by automated outputs despite promises of individualized expert or professional review. AI used to replace individualized judgment, with weak override governance, fuels an unreasonableness argument rather than rebutting it. The line between defense and liability is whether a human could actually read the AI finding and override it - and whether the record shows the human did so.

This is why the human-in-the-loop posture is doing legal work, not optics. The defensible design is a documented human sign-off on a finding the reviewer could genuinely evaluate, in a system where the investigator's role shifts from execution to decision-making. A detection score handed to an SIU team that can only reach a quarter of its flags produces the opposite - a nominal review that the volume makes pro forma. The Sierra Health bad-faith verdict and the NAIC AI Model Bulletin sit behind this point; both are covered in [the defensibility standard post](/blog/fraud-investigation-ai-defensibility-standard), which is the place to read that argument in full.

> **Under-investigating to dodge bad faith is its own exposure**
>
> Some carriers quietly cap investigation to avoid building a record that could be used against them. That logic inverts under Model #900: refusing to investigate is the unreasonable-investigation problem, not the cure. At roughly $150 per AI investigation versus $2,500 manually, investigating every flag fully is now cheaper than the selective-investigation posture that creates bad-faith exposure.

## What a defensible AI investigation record must contain

A defensible AI investigation record is one that clears discovery, privilege, authentication, the business-records exception, and Daubert at the same time - because the same underlying property satisfies all of them. That property is a contemporaneous, reconstructable, per-decision audit trail. The spec below maps each requirement back to the rule it answers, so the GC can hand it to the buying team as an acceptance criterion rather than a wish list.

| Requirement in the record | Rule it satisfies | What the record must show |
| --- | --- | --- |
| Per-decision sources logged | FRE 901(b)(9), 803(6) | Which data and documents the finding drew on, per phase |
| Reasoning chain captured | FRE 803(6)(E), 702(d) | How the sources connected to the conclusion, reconstructably |
| Timestamps on every phase | FRE 803(6)(A) | The record was made at or near the time of the activity |
| Model or phase version recorded | FRE 702, retention | Which version produced this finding, for reproducibility and holds |
| Human review and override visible | Model #900, Lokken | A reviewer could read and override; sign-off is in the same chain |
| Retention-ready, durable logs | FRCP 26, spoliation | Logs preserved rather than ephemeral, ready for a litigation hold |
| Deterministic phase structure | FRE 702 / Daubert | Testable method with a reconstructable application to this claim |

Two design notes for the GC. First, the human review must live in the same chain as the AI findings, not in a separate sign-off log, so the record shows the reviewer acted on what the AI actually produced. Second, the record has to be built to be produced. The defensible posture across discovery and privilege is the same record either way: an investigation file you would be comfortable handing to opposing counsel because it reads as a thorough, individualized, reasonable investigation.
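
To make the spec concrete as an acceptance criterion, here is a minimal sketch of a per-claim record with a completeness check. The class names, field names, and the `gaps` method are hypothetical illustrations, not a description of any product's schema. The one deliberate choice is that the human review lives inside the same record as the AI findings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PhaseRecord:
    phase: str
    sources: tuple          # documents and data the phase drew on
    reasoning: str          # how the sources connect to the finding
    recorded_at: datetime   # contemporaneous timestamp
    version: str            # model or phase version that produced the finding


@dataclass(frozen=True)
class HumanReview:
    reviewer: str
    decision: str           # "approved" or "overridden"
    note: str
    recorded_at: datetime


@dataclass
class InvestigationRecord:
    claim_id: str
    phases: list = field(default_factory=list)
    review: Optional[HumanReview] = None   # same chain as the findings

    def gaps(self):
        """List what is missing before the record meets the spec above."""
        problems = []
        if not self.phases:
            problems.append("no phases logged")
        for p in self.phases:
            if not p.sources:
                problems.append(f"{p.phase}: no sources")
            if not p.reasoning.strip():
                problems.append(f"{p.phase}: no reasoning")
            if not p.version:
                problems.append(f"{p.phase}: no version")
        if self.review is None:
            problems.append("no human review")
        elif self.phases and self.review.recorded_at < max(p.recorded_at for p in self.phases):
            problems.append("review predates last phase")
        return problems


t0 = datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc)
rec = InvestigationRecord("CLM-001")
rec.phases.append(
    PhaseRecord("policy_timing", ("policy.pdf",), "Loss 12 days after inception.", t0, "v1")
)
print(rec.gaps())  # ['no human review']

rec.review = HumanReview("siu_lead", "approved", "Reviewed sources.", t0.replace(hour=11))
print(rec.gaps())  # []
```

A check like the review-timestamp comparison is one way to show the reviewer acted on what the AI actually produced rather than signing off in advance.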

None of this displaces the detection layer. Hesper AI sits downstream of detection and is complementary to FRISS, Shift Technology, and Verisk - not a replacement. Detection is upstream and produces the flag; investigation is downstream and produces the record a court tests. The carrier runs both, and the move from fraud detection to fraud resolution is precisely the move from a score that is a discovery liability to a record that is an evidentiary asset.

*Figure: From flag to exhibit: how a per-decision audit trail clears discovery, authentication, 803(6), and Daubert at each stage. Source: Hesper AI Research.*

## Key takeaways

- Once AI materially influences a claims decision, the general counsel inherits four exposures the buying team never priced: discovery, privilege, admissibility, and bad faith.
- After Estate of Lokken v. UnitedHealth Group (766 F. Supp. 3d 835, D. Minn. 2025), AI model design records, training intent, override guidance, and per-claim outputs are discoverable in bad-faith and coverage litigation.
- A routine AI investigation run on every flagged claim is an ordinary business record, not work product, and trying to cloak it as privileged invites waiver and a bad-faith inference.
- FRE 901(b)(9) requires showing the process produces an accurate result, FRE 803(6)(E) excludes untrustworthy records, and Daubert tests error rate and reproducibility - all of which a reconstructable per-decision audit trail satisfies and an opaque score does not.
- Investigating 100% of flagged claims in 2-4 hours with documented human sign-off is a reasonable-and-prompt-investigation defense under Model #900; AI standing in for individualized judgment is the failure mode Lokken describes.

For related reading across this cluster, see [the defensibility standard for fraud investigation AI](/blog/fraud-investigation-ai-defensibility-standard) and [the compliance officer's AI investigation deployment guide](/blog/compliance-officer-ai-investigation-deployment).

## Frequently asked questions

### Are an insurer's AI claims-decision logs discoverable in bad-faith litigation?

Yes, increasingly. In Estate of Lokken v. UnitedHealth Group (766 F. Supp. 3d 835, D. Minn. 2025), the court allowed plaintiffs discovery into how the insurer's AI tool worked, its development goals, and whether it was designed to supplant professional judgment. Coverage-litigation commentary notes discovery in these disputes targets the source and the mechanism for an adverse decision, so model documentation, training materials, override guidance, and per-claim outputs are all in play once AI materially influenced the decision. The practical takeaway for a GC is to assume the AI record is discoverable and design it to read as a clean, reconstructable investigation. An audit-trail-native system that logs sources, reasoning, and timestamps per decision answers the demand. An opaque score forces you to produce model documents you cannot tie back to the specific claim.

### Is an AI-assisted fraud investigation protected by work-product or attorney-client privilege?

Usually not, if it runs as a routine business process. Work-product protection under FRCP 26(b)(3) covers materials prepared in anticipation of litigation, and it explicitly extends to an insurer or agent. But an AI investigation run automatically on every flagged claim is an ordinary-course business activity, not litigation-driven, so it is generally discoverable. Trying to cloak routine investigations as privileged invites two bad outcomes: a court finding of waiver, and a bad-faith inference that you were building a litigation file instead of fairly adjusting the claim. The defensible posture is to treat the routine AI investigation as a business record built to be produced and to look good when it is, and to reserve privilege for genuinely litigation-triggered escalations where counsel directs the work.

### Can an AI fraud finding be admitted into evidence?

Yes, if it is authenticated and qualifies under a hearsay exception. Authentication runs through FRE 901; for a computer process, 901(b)(9) requires evidence describing a process or system and showing that it produces an accurate result. The investigation report itself typically comes in under the FRE 803(6) business-records exception, which needs the record made at or near the time, kept in the regular course of business, made as a regular practice, and supported by foundation testimony from a custodian or other qualified witness. The catch is 803(6)(E): the exception is lost if the opponent shows that the source, method, or circumstances of preparation indicate a lack of trustworthiness. A reconstructable, contemporaneously logged investigation clears that bar; a black-box conclusion an SIU lead cannot explain does not. This is why an auditable record beats an opaque score for admissibility.

### Does an AI fraud-detection method have to pass a Daubert challenge?

If the method is offered through expert testimony, yes. FRE 702, amended December 1, 2023, requires the proponent to show by a preponderance that the testimony rests on reliable principles and methods and reflects a reliable application of those methods to the facts of the case. Daubert v. Merrell Dow Pharmaceuticals (509 U.S. 579, 1993) gives courts five gatekeeping factors, including whether the technique can be tested and its known or potential error rate. AI struggles most on error rate and reproducibility - a rules-based detection score with a 60-85% false-positive rate hands the opposing expert the error-rate argument. A defined, deterministic, phase-structured investigation with logged reasoning is testable and reconstructable for the specific claim. Note that some states still apply the Frye general-acceptance standard instead.

### How does using AI in claims handling affect bad-faith exposure?

It cuts both ways. Under the NAIC Unfair Claims Settlement Practices Act (Model #900), an insurer must adopt reasonable standards for the prompt investigation of claims and cannot refuse to pay without a reasonable investigation. AI that fully investigates flagged claims in hours rather than weeks, and lifts coverage from roughly 25% to 100% of flags, strengthens the reasonable-and-prompt-investigation defense. The danger is the opposite pattern. In Lokken, the bad-faith theory was that claim handling was dominated by automated outputs despite promises of individualized expert or professional review. AI used to replace individualized judgment, with weak override governance, fuels an unreasonableness argument. The defensible design is a documented human sign-off on a finding the reviewer could actually read and override.

### What records should an insurer keep to defend an AI-assisted claim decision?

Keep a contemporaneous, per-decision audit trail: the sources the investigation drew on, the reasoning that connected them, timestamps, the model or phase version used, and a visible record of the human reviewer's sign-off or override. This is what lets a custodian or SIU lead lay the FRE 803(6) foundation, satisfies FRE 901(b)(9)'s produces-an-accurate-result requirement, and supplies the testability a Daubert inquiry wants. It is also what a litigation hold now has to preserve - model-call logs and prompts are discoverable artifacts, so retention has to be designed in, not left ephemeral. National Law Review commentary advises insurers to be ready to document AI training, policies and procedures, bias auditing, and output monitoring. A system that is audit-trail-native by design produces these records as a byproduct of running.

### Is a fraud-detection score enough to defend a claim denial in court?

Generally no. A detection score from a tool like FRISS, Shift, or Verisk tells you a claim is suspicious; it is not, by itself, a documented investigation. Detection sits upstream; investigation sits downstream. The score's model features are rarely reconstructable as a per-claim decision trail, so you cannot easily authenticate it under FRE 901(b)(9) or rebut a trustworthiness challenge under FRE 803(6)(E), and the high false-positive rate invites a Daubert error-rate attack. Under Model #900, a denial needs a reasonable, individualized investigation, not just a flag. The defensible artifact is the downstream investigation that takes the flag and produces a reconstructable finding the carrier acted on. Detection and investigation are complementary layers; carriers run both, and only the investigation layer produces the record a court tests.
