Hesper AI
March 14, 2026 · 7 min read · Hesper AI Threat Research

KYC document fraud: detecting fake IDs at fintech onboarding

Fintechs lose billions to fake IDs and forged income documents at onboarding. The fraud techniques targeting KYC workflows and how to stop them.

  • 67%: share of synthetic identity fraud that relies on document manipulation (fabricated or altered IDs, payslips, and statements)
  • $20B: annual losses from synthetic identity fraud in the US (Federal Reserve and industry estimates, 2025)
  • 12%: estimated fake document rate at high-growth fintechs (based on post-deployment detection data)
  • 400%: rise in AI-generated identity documents since 2024 (driven by generative AI tools and templates)

The KYC fraud problem at fintechs

Know Your Customer workflows are the front door of every fintech. They determine who gets access to accounts, credit, and payment infrastructure. When that front door is breached by fraudulent documents, the downstream consequences are severe: credit losses, regulatory penalties, money laundering exposure, and reputational damage that can threaten a fintech's banking partnerships.

The scale of the problem has grown dramatically. Synthetic identity fraud - where criminals combine real and fabricated information to create entirely new identities - now accounts for an estimated $20 billion in annual losses in the United States alone, according to Federal Reserve research on synthetic identity payments fraud. And 67% of synthetic identity fraud cases rely on manipulated documents to pass the onboarding gate. The fraudster needs a fake ID, a forged payslip, or an altered bank statement to make the synthetic identity appear real.

The challenge is compounded by the speed at which fintechs operate. High-growth neobanks and lending platforms onboard thousands of customers per day, often with fully automated KYC flows. Manual document review at that scale is not feasible. And as we have documented in our analysis of document fraud statistics in 2026, the tools most commonly deployed - OCR validation, template matching, and basic liveness checks - were not built to detect the current generation of AI-produced fakes.

"We assumed our IDV provider was catching document fraud. After running a parallel analysis with pixel-level detection, we found that 8% of approved applications in the prior quarter had submitted manipulated documents. The IDV checks had passed them all."

- Head of Risk, European neobank (anonymised)

Types of document fraud at onboarding

Document fraud at the KYC stage falls into four broad categories, each with distinct manipulation techniques and detection challenges.

Fake identity documents are the most direct attack vector. These range from crudely edited scans of real IDs - where the name or date of birth has been replaced - to fully AI-generated identity documents that were never issued by any government. The latter category has exploded since 2024, with generative AI tools capable of producing photorealistic driver's licences and passports complete with holograms, microprint patterns, and correct formatting for specific jurisdictions.

Altered bank statements are used to inflate balances or fabricate transaction histories, typically to meet minimum balance requirements or to demonstrate income.

Forged payslips serve a similar purpose - they are submitted as proof of income during lending or account opening. For a detailed examination of payslip fraud techniques, see our post on detecting forged payslips with AI.

Synthetic identity documents combine elements from multiple sources - a real Social Security number paired with a fabricated name and a generated photo - into a single coherent identity package. FinCEN advisories on identity fraud note these are the hardest to catch because each individual document may look authentic in isolation. The fraud is only apparent when cross-document analysis reveals inconsistencies in pixel provenance, compression signatures, or rendering patterns.

| Document type | Common manipulation | Standard IDV detection | Pixel-level AI detection |
| --- | --- | --- | --- |
| Government-issued IDs | Name/DOB replacement, full AI generation, photo swap | Partial - catches poor edits only | Detects editing boundaries, generation artifacts, font anomalies |
| Bank statements | Balance inflation, transaction fabrication, template generation | ✗ Not examined by most IDV tools | Detects compression discontinuities, clone patterns, text rendering shifts |
| Payslips / pay stubs | Income inflation, employer fabrication, date alteration | ✗ Not examined by most IDV tools | Detects digit manipulation, font inconsistencies, layer boundaries |
| Synthetic identity docs | Cross-source composition, AI-generated photos, merged records | ✗ Each doc passes individually | Cross-document provenance analysis, generation signature matching |

Why standard KYC and IDV tools miss document-level fakes

Standard identity verification tools were designed around a different threat model. They excel at three tasks: (1) confirming that a submitted ID matches a known template for its jurisdiction, (2) performing liveness checks to verify the person holding the ID is present, and (3) extracting and cross-referencing data fields against external databases. These are valuable checks. But none of them examine the pixel-level integrity of the document itself.

Template matching catches IDs that have the wrong layout, font, or field placement for their type. But a fraudster who starts with a real template - or who uses an AI tool trained on real templates - will produce a document that passes template validation perfectly. The manipulation is in the content within the template, not in the template structure.

The gap is even wider for supporting documents. Most IDV providers do not analyze bank statements or payslips at all - they are designed for identity documents specifically. A fintech that relies solely on its IDV provider has no automated check on the income and financial documents that accompany the ID. This is the same fundamental limitation we explored in why OCR alone is not enough for document fraud detection: the tools read what documents say, but they do not examine whether the documents have been altered.

How pixel-level detection works for KYC documents

Pixel-level document forensics operates on the raw image data, before any text extraction or template matching. It examines three categories of evidence that standard IDV tools cannot access.

Generation artifacts are the first category. AI-generated documents carry statistical signatures in their pixel distributions, noise patterns, and compression characteristics that differ from photographs of real documents. These signatures are invisible to the human eye but reliably detectable by models trained specifically on document forensics. A generated passport photo, for instance, will have different noise grain characteristics than a photo captured by a camera or scanner.

Editing signatures are the second category. When a document is opened in an editor and specific fields are modified - a name replaced, a digit changed, a balance inflated - the editing operation leaves traces in the pixel data. Compression discontinuities appear at the boundary between edited and unedited regions. Font rendering shifts occur when new text is overlaid. Clone stamp patterns appear when content is duplicated to fill space. These artifacts persist even after the document is re-saved or re-compressed.
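As a toy illustration of how an edited region can stand out statistically, the sketch below splits a grayscale page into blocks, measures a crude noise energy per block, and flags blocks that are robust outliers. Everything in it is an assumption made for illustration: the function names, the 16-pixel block size, the 3.5 threshold, and the synthetic page are hypothetical, and production detectors rely on trained models rather than one hand-built statistic.

```python
import random
import statistics

BLOCK = 16  # block size in pixels (an assumption; real tools vary)


def block_noise_energy(image, top, left, size=BLOCK):
    """Mean squared difference between horizontally adjacent pixels in one block."""
    diffs = [
        (image[y][x + 1] - image[y][x]) ** 2
        for y in range(top, top + size)
        for x in range(left, left + size - 1)
    ]
    return sum(diffs) / len(diffs)


def flag_inconsistent_blocks(image, threshold=3.5, size=BLOCK):
    """Flag blocks whose noise energy is a robust outlier versus the rest of the page."""
    energies = {
        (top // size, left // size): block_noise_energy(image, top, left, size)
        for top in range(0, len(image) - size + 1, size)
        for left in range(0, len(image[0]) - size + 1, size)
    }
    median = statistics.median(energies.values())
    mad = statistics.median(abs(e - median) for e in energies.values())
    scale = 1.4826 * mad or 1.0  # avoid dividing by zero on perfectly uniform pages
    return sorted(
        pos for pos, e in energies.items() if abs(e - median) / scale >= threshold
    )


# Synthetic 64x64 "scan": mid-gray paper with scanner-like noise everywhere...
rng = random.Random(0)
scan = [[128 + rng.randint(-8, 8) for _ in range(64)] for _ in range(64)]

# ...except one block overwritten in an editor, which carries no scanner noise.
for y in range(16, 32):
    for x in range(32, 48):
        scan[y][x] = 128

suspicious = flag_inconsistent_blocks(scan)
print(suspicious)  # list of (block_row, block_col) positions
```

The overwritten region sits at block row 1, column 2 and has no noise energy at all, which is the kind of discontinuity the outlier test looks for. The median and MAD are used instead of the mean and standard deviation so that the tampered block does not distort its own baseline.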

Cross-document analysis is the third category. When a KYC submission includes multiple documents - an ID, a bank statement, and a payslip - pixel-level analysis can compare the provenance of each. Documents created by the same generation tool will share statistical fingerprints. Documents scanned on the same device will share noise characteristics. Inconsistencies across documents in a single submission are a strong fraud signal.
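One way to picture the comparison step is as a pairwise similarity check over per-document fingerprints. The sketch below assumes each document has already been reduced to a numeric generation-signature vector by some upstream model; the vectors, the 0.95 threshold, and the function names are hypothetical and are not taken from any specific product.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shared_fingerprint_pairs(fingerprints, threshold=0.95):
    """Return document pairs whose fingerprints are nearly identical."""
    flagged = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        similarity = cosine_similarity(fp_a, fp_b)
        if similarity >= threshold:
            flagged.append((name_a, name_b, round(similarity, 3)))
    return flagged


# Hypothetical fingerprints for one KYC submission (made-up numbers).
submission = {
    "passport": [0.12, 0.80, 0.33, 0.05],
    "bank_statement": [0.11, 0.79, 0.35, 0.06],
    "payslip": [0.70, 0.10, 0.05, 0.60],
}

pairs = shared_fingerprint_pairs(submission)
print(pairs)
```

Here the passport and the bank statement are flagged as sharing a fingerprint. How to read such a match depends on what the fingerprint encodes: a shared scanner noise pattern may be benign, while a shared generation signature across documents that claim different issuers is a reason to escalate.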

Integration patterns for fintech onboarding flows

For fintechs, the integration pattern is straightforward: add a pixel-level analysis step to your existing onboarding pipeline, alongside - not instead of - your current IDV checks. The architecture is API-first. When a document is uploaded during onboarding, it is sent simultaneously to your IDV provider and to the fraud detection API. Both return results within seconds. The combined signals give you both identity verification and document integrity verification.

  1. Customer uploads identity document, bank statement, and/or payslip during onboarding
  2. Documents are sent to your IDV provider (for identity checks) and to the fraud detection API (for pixel analysis) in parallel
  3. Fraud detection API returns a fraud score, verdict, and structured findings with pixel coordinates for each document
  4. Documents with scores below threshold continue through your normal approval flow
  5. Documents with scores above threshold are routed to a focused manual review queue with findings attached
  6. Cross-document analysis flags are surfaced when multiple documents in a submission show provenance inconsistencies
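The steps above can be sketched as a small async pipeline. This is a minimal sketch under stated assumptions, not a real client: `run_idv_check` and `run_pixel_analysis` are stand-ins for the two external services, the 0.7 threshold and the route names are hypothetical, and the handling of an IDV failure is an added assumption because the flow above does not specify it.

```python
import asyncio
from dataclasses import dataclass, field

FRAUD_THRESHOLD = 0.7  # assumed cut-off; tune per product risk


@dataclass
class FraudResult:
    score: float  # 0.0 (clean) to 1.0 (likely manipulated)
    verdict: str
    findings: list = field(default_factory=list)  # regions with pixel coordinates


async def run_idv_check(document: bytes) -> bool:
    """Stand-in for the IDV provider call (template, liveness, database checks)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return True


async def run_pixel_analysis(document: bytes) -> FraudResult:
    """Stand-in for the fraud detection API call."""
    await asyncio.sleep(0)  # placeholder for network I/O
    # Dummy rule so the example is deterministic: flag payloads containing b"edited".
    if b"edited" in document:
        finding = {"type": "font_anomaly", "box": [120, 40, 310, 72]}
        return FraudResult(0.92, "manipulated", [finding])
    return FraudResult(0.05, "clean")


async def screen_document(document: bytes) -> dict:
    # Step 2: send the document to both services in parallel.
    idv_passed, fraud = await asyncio.gather(
        run_idv_check(document), run_pixel_analysis(document)
    )
    # Steps 4 and 5: route on the fraud score (IDV failure handling is an assumption).
    if not idv_passed:
        route = "idv_failed"
    elif fraud.score >= FRAUD_THRESHOLD:
        route = "manual_review"
    else:
        route = "continue"
    return {"route": route, "score": fraud.score, "findings": fraud.findings}


clean = asyncio.run(screen_document(b"scan of an untouched ID"))
flagged = asyncio.run(screen_document(b"edited payslip"))
print(clean["route"], flagged["route"])  # continue manual_review
```

Using `asyncio.gather` keeps total latency close to the slower of the two calls rather than their sum, which is the main reason to prefer the parallel arrangement when onboarding time matters.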

Integration considerations

For regulated fintechs, ensure your fraud detection provider supports zero document retention (documents are analyzed and discarded, never stored), structured audit trails for compliance reporting, and webhook-based async processing for high-volume flows. Most teams complete the API integration in a single sprint. The key architectural decision is whether to run pixel analysis in parallel with IDV (faster) or sequentially after IDV passes (lower API volume).
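To make the zero-retention and audit-trail requirements concrete, here is one possible shape for an audit record: it stores a hash of the document and the analysis outcome, never the document bytes. The field names and the `build_audit_record` helper are hypothetical, and a real provider's audit format will differ.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(document: bytes, score: float, verdict: str, findings: list) -> str:
    """Serialize an audit entry that references the document by hash only."""
    record = {
        "document_sha256": hashlib.sha256(document).hexdigest(),
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "score": score,
        "verdict": verdict,
        "findings": findings,
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical usage: the raw bytes are dropped once the record is built.
raw = b"uploaded payslip bytes"
example = json.loads(build_audit_record(raw, 0.92, "manipulated", [{"type": "font_anomaly"}]))
del raw
print(sorted(example))
```

The hash lets a compliance team later confirm that a specific file was the one analyzed, without the provider having kept a copy of it.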

The parallel pattern is recommended for high-risk products (lending, credit cards) where the cost of a fraudulent approval is high. The sequential pattern - only running pixel analysis on documents that pass IDV - is appropriate for lower-risk products where a false approval is less costly and API volume is a concern.
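The trade-off can be put in rough numbers. The sketch below picks a pattern by product type and estimates fraud API volume for each; the product labels, the 10,000-document volume, and the 90% IDV pass rate are illustrative assumptions, not measured figures.

```python
HIGH_RISK_PRODUCTS = {"lending", "credit_card"}  # assumed product labels


def choose_pattern(product: str) -> str:
    """Parallel for high-risk products, sequential otherwise."""
    return "parallel" if product in HIGH_RISK_PRODUCTS else "sequential"


def fraud_api_calls(documents: int, idv_pass_rate: float, pattern: str) -> int:
    """Estimate how many documents reach the fraud detection API."""
    if pattern == "parallel":
        return documents  # every upload is analyzed
    return round(documents * idv_pass_rate)  # only documents that pass IDV


daily_documents = 10_000  # hypothetical volume
pass_rate = 0.9           # hypothetical IDV pass rate

for product in ("lending", "savings_account"):
    pattern = choose_pattern(product)
    calls = fraud_api_calls(daily_documents, pass_rate, pattern)
    print(product, pattern, calls)
```

Under these assumptions the sequential pattern saves about 1,000 calls a day, at the price of adding the IDV round trip to the latency of every document that goes on to pixel analysis.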

Key takeaways

  • 67% of synthetic identity fraud relies on manipulated documents to pass KYC gates - fake IDs, forged payslips, and altered bank statements.
  • Standard IDV tools verify identity but do not examine pixel-level document integrity - they miss AI-generated fakes that use correct templates.
  • Pixel-level forensics detects three categories of evidence: generation artifacts, editing signatures, and cross-document provenance inconsistencies.
  • The integration pattern is API-first and runs alongside existing IDV checks - parallel for high-risk products, sequential for lower-risk flows.
  • High-growth fintechs report 8–12% fake document rates at onboarding when pixel-level detection is deployed retroactively on previously approved applications.

Frequently asked questions

What types of document fraud occur at KYC onboarding?

The four main categories are: fake identity documents (edited or AI-generated IDs), altered bank statements (inflated balances or fabricated transactions), forged payslips (manipulated income figures), and synthetic identity documents (composited from multiple real and fabricated sources). Each requires different detection techniques, and standard IDV tools only partially address the first category.

Why do standard IDV tools miss fake documents?

Standard IDV tools are designed for identity verification - template matching, liveness detection, and database cross-referencing. They confirm that an ID matches the expected format for its jurisdiction and that the person presenting it is present. But they do not examine the pixel-level integrity of the document. A fake ID built on a correct template will pass template matching. And most IDV providers do not analyze bank statements or payslips at all.

How is pixel-level detection different from OCR?

OCR extracts text from a document and checks it for logical consistency - matching names, valid dates, amounts within ranges. Pixel-level detection analyzes the raw image before text extraction, looking for manipulation artifacts: compression discontinuities, font rendering anomalies, generation signatures, and clone stamp patterns. These two approaches are complementary. OCR catches logical errors; pixel analysis catches visual manipulation that produces logically consistent but fraudulent text.

Can AI-generated identity documents be detected?

Yes. AI-generated documents carry statistical signatures in their pixel data that differ from photographs of real documents. These include characteristic noise patterns, compression artifacts unique to generative models, and subtle rendering inconsistencies in fine details like microprint and hologram patterns. Detection models trained on document forensics identify these signatures reliably, even as generation quality improves - because the artifacts are inherent to the generation process.

How do fintechs integrate document fraud detection into onboarding?

The standard pattern is API-based: when documents are uploaded during onboarding, they are sent to the fraud detection API in parallel with the existing IDV provider. The fraud API returns a score, verdict, and pixel-coordinate findings within seconds. Documents below the fraud threshold continue through the normal flow. Documents above threshold are routed to manual review with findings attached. Most fintech engineering teams complete the integration in a single sprint.
