Technical · March 31, 2026 · 8 min read · Hesper AI Threat Research

How to tell if a PDF has been edited or tampered with

Edited PDFs leave forensic traces, but most are invisible to the naked eye. This guide shows how to check PDF metadata, spot visual artifacts, and why pixel-level AI is the only method that works reliably at scale.

Every edited PDF leaves traces. The question is whether you know where to look - and whether your method can scale beyond a handful of documents per day. This guide covers the three layers of PDF tampering detection: metadata inspection, visual analysis, and pixel-level forensics. Each catches a different class of edits. None is sufficient on its own.

Why PDF tampering matters

PDFs are the default format for financial documents - invoices, bank statements, payslips, tax returns, contracts, insurance claims. Organisations make consequential decisions based on the assumption that these documents are authentic. When that assumption is wrong, the consequences range from approving fraudulent loans to paying fabricated insurance claims to reimbursing fake expenses.

The problem has accelerated sharply since 2023. According to ACFE data and Hesper internal analysis, AI-generated document fraud has increased 3,000% in three years. The tools are free, require no technical skill, and produce output that is visually indistinguishable from authentic documents. The cost of producing a convincing fake has collapsed to near zero.

  • 73% of fraud cases involve tampered documents (invoices, bank statements, payslips, IDs)
  • 3,000% increase in AI-generated document fraud since 2023
  • 22-41% detection rate with traditional methods, depending on sector and review depth
  • Under 30 seconds for AI to analyze a document across 200+ fraud signals

How PDFs get edited

Understanding editing methods matters because each leaves different forensic signatures. There are three categories, each progressively harder to detect.

Text-layer editing

PDFs store text as vector objects positioned on a page. Tools like Adobe Acrobat, Foxit, and numerous free editors allow direct modification of these text objects. Change a number, delete a line, add a paragraph. The visual result updates immediately. This is the simplest form of tampering and the easiest to detect - because the editing tool's metadata is typically embedded in the modified file.

Image-layer manipulation

The document is converted to an image (rasterized), edited at the pixel level using tools like Photoshop, GIMP, or AI inpainting, then re-exported as a PDF. This destroys the original text layer entirely. The resulting PDF contains a flat image with no selectable text - or has a new OCR-generated text layer overlaid on the modified image. This method leaves compression artifacts and pixel-level evidence but no metadata trail from the original edit.

From-scratch generation

The document was never real. It is generated from a template (HTML/CSS, Canva, or similar) or by an AI tool that produces the entire document content and layout. There is no "original" to compare against. These documents are internally consistent - all amounts reconcile, formatting is clean, and the content is plausible. Detection requires comparing the document's structure and metadata against known genuine templates for that document type.

Checking PDF metadata

The fastest way to check if a PDF has been modified is to inspect its metadata. Every PDF contains a set of document properties that record how it was created and whether it was subsequently modified.

In most PDF readers, open File > Properties (or Document Properties). Look for these fields:

  • Creator - the application that originally created the PDF. Suspicious: Microsoft Word, Canva, or Chrome for a "bank" statement.
  • Producer - the library or tool that rendered the PDF. Suspicious: libraries like iTextSharp, FPDF, or wkhtmltopdf in a "bank" document.
  • Creation Date - when the file was first created. Suspicious: dates that don't match the statement period.
  • Modification Date - when the file was last saved. Suspicious: any date after creation, which indicates post-creation editing.
  • Author - the user or system that authored the document. Suspicious: personal names, generic values, or blank when a bank should be named.
  • Page Count - the number of pages in the document. Suspicious: a mismatch with the expected length for that document type.
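
These fields can also be checked programmatically. Below is a minimal, stdlib-only sketch that scans a PDF's raw bytes for Info-dictionary fields. It is illustrative, not production-grade: real PDFs may store metadata in compressed object streams or XMP, so a full parser such as pypdf is more robust, and the suspect-creator list here is an example, not an exhaustive rule set.

```python
import re

# Illustrative list of creators that are suspicious on a bank document.
SUSPECT_CREATORS = {"microsoft word", "canva", "chrome", "wkhtmltopdf"}

def extract_info_fields(pdf_bytes: bytes) -> dict:
    """Pull /Creator, /Producer, /CreationDate, /ModDate string literals
    from an uncompressed Info dictionary."""
    fields = {}
    for key in ("Creator", "Producer", "CreationDate", "ModDate"):
        m = re.search(rb"/" + key.encode() + rb"\s*\(([^)]*)\)", pdf_bytes)
        if m:
            fields[key] = m.group(1).decode("latin-1")
    return fields

def metadata_flags(fields: dict) -> list:
    """Apply the rules from the table above to the extracted fields."""
    flags = []
    creator = fields.get("Creator", "").lower()
    if any(s in creator for s in SUSPECT_CREATORS):
        flags.append(f"unexpected creator: {fields['Creator']}")
    mod, created = fields.get("ModDate"), fields.get("CreationDate")
    if mod and created and mod != created:
        flags.append("modified after creation")
    if not fields:
        flags.append("no Info dictionary (possibly stripped)")
    return flags
```

Running this on a statement whose Creator is a word processor, or whose ModDate postdates its CreationDate, surfaces the same red flags a manual File > Properties check would.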

Metadata is easy to strip

Metadata inspection is useful but not reliable on its own. Many editing tools can strip or overwrite metadata before saving. A savvy fraudster will clear the modification date, set the creator to match the expected bank software, and remove all traces. Metadata catches careless edits - it does not catch careful ones.

Visual clues (and their limits)

If you zoom to 300-400% on a document and examine the areas most likely to be edited (amount fields, dates, names), you may spot visual evidence of tampering. These signs are subtle and require training to identify consistently.

  • Font inconsistencies - characters in edited regions may render at slightly different weights, sizes, or with different anti-aliasing than surrounding text
  • Baseline misalignment - replaced text may sit slightly above or below the original text baseline
  • Compression blocks - JPEG artifacts that form visible rectangular blocks around edited regions (visible when zoomed in on a rasterized PDF)
  • Color mismatches - the background colour behind edited text may differ slightly from the surrounding page
  • Spacing irregularities - character kerning or word spacing that differs from the document's native font metrics
  • Blurred boundaries - AI inpainting often produces slightly softer edges at the boundary of the edited region compared to the sharp rendering of original text

The limitation is obvious: these clues require significant zoom, careful attention, and experience. In a study of document reviewers, trained analysts detected visual manipulation artifacts 55% of the time when given unlimited time. Under time pressure (2 minutes per document), detection rates dropped below 30%. For AI-generated documents, visual detection rates are below 10% because the entire document is generated at consistent quality - there are no editing boundaries to find.

Pixel-level forensics

Pixel-level forensics is the only approach that reliably detects all three editing methods. It works by analyzing the raw image data of each page for statistical patterns that are invisible to humans but measurable by trained models.

The core techniques:

  1. Error Level Analysis (ELA) - re-saves the image at a known compression level and measures the difference. Edited regions show different error levels than the rest of the image because they were saved at a different compression stage.
  2. Noise analysis - every camera and scanner introduces a characteristic noise pattern. Edited regions disrupt this pattern because the inserted content was generated by a different source (an editing tool, a different document, or an AI model).
  3. Font rendering forensics - compares the rendering characteristics of each character against a database of known font renderers. Characters inserted by an editing tool render differently from characters placed by the original document generator.
  4. Compression artifact mapping - identifies the quantization tables and block boundaries in the image data. Edits create inconsistencies in these patterns that are mathematically detectable but invisible at any zoom level.
  5. Statistical distribution analysis - measures the histogram, entropy, and spatial frequency characteristics of regions within the document. Manipulated regions deviate from the expected statistical profile of the surrounding content.
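
Error Level Analysis (technique 1) is simple enough to sketch with Pillow. This is a toy illustration, not a forensic pipeline: the function names are mine, and in practice the resave quality and any decision thresholds must be tuned to the document source.

```python
import io
from PIL import Image, ImageChops

def error_level_analysis(img: Image.Image, quality: int = 90) -> Image.Image:
    """Re-save the image at a known JPEG quality and return the
    per-pixel difference. Regions introduced at a different
    compression stage tend to show different error levels."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, "JPEG", quality=quality)
    resaved = Image.open(io.BytesIO(buf.getvalue()))
    return ImageChops.difference(img.convert("RGB"), resaved)

def region_mean_error(ela: Image.Image, box) -> float:
    """Average error intensity inside a (left, top, right, bottom) box."""
    hist = ela.crop(box).convert("L").histogram()
    count = sum(hist)
    return sum(i * c for i, c in enumerate(hist)) / count if count else 0.0
```

Comparing `region_mean_error` over an amount field against the rest of the page is the core of the ELA signal; a pasted or inpainted region typically stands out from its surroundings.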

These techniques work on all three editing methods. Text-layer edits leave font rendering artifacts. Image manipulation leaves compression and noise artifacts. Generated documents fail font rendering and statistical distribution checks because their rendering pipeline differs from genuine bank or corporate systems.
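
To give a flavor of the statistical checks, here is a stdlib-only per-block Shannon entropy map over a grayscale buffer. It is a toy sketch under my own naming; real systems combine many such statistics with learned models rather than a single entropy pass.

```python
import math
from collections import Counter

def shannon_entropy(pixels) -> float:
    """Shannon entropy (bits per pixel) of a sequence of 0-255 values."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def entropy_map(gray, width, height, block=16):
    """Per-block entropy over a flat, row-major grayscale buffer.
    Manipulated regions often deviate from neighboring blocks."""
    rows = []
    for by in range(0, height - block + 1, block):
        row = []
        for bx in range(0, width - block + 1, block):
            vals = [gray[(by + y) * width + (bx + x)]
                    for y in range(block) for x in range(block)]
            row.append(shannon_entropy(vals))
        rows.append(row)
    return rows
```

A flat, inpainted background block scores near zero bits per pixel, while genuine scanner noise scores much higher, so outlier blocks in the map point at candidate edit regions.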

One API call, 200+ signals

Hesper AI runs all of these forensic checks - plus layout validation, metadata analysis, and cross-document comparison - in a single API call, returning results in under 30 seconds. See how the detection pipeline works.

Detection methods compared

  • Metadata inspection - catches most text-layer edits, partially covers image manipulation, catches some generated documents. 1-2 min per document; scales.
  • Visual review (trained) - catches some text-layer edits and some image manipulation, rarely catches generated documents. 15-20 min per document; does not scale.
  • OCR + rule validation - misses text-layer edits and image manipulation, catches some generated documents. 5-10 sec per document; scales.
  • Pixel-level AI - catches most edits across all three categories. Under 30 sec per document; scales.

No single method catches everything. The strongest approach combines metadata inspection, OCR-based rule validation, and pixel-level forensics. This is what Hesper AI does - all three layers run in parallel on every document.

For a deeper look at why OCR misses manipulation, see why OCR alone isn't enough for document verification. To understand how these methods apply specifically to bank statements, see our guide on detecting fake bank statements.

Key takeaways

  • Every edited PDF leaves traces - but most are invisible to human reviewers, especially under time pressure.
  • Three editing methods exist: text-layer editing, image manipulation, and from-scratch generation. Each leaves different forensic signatures.
  • Metadata inspection is fast but easily defeated by stripping or overwriting document properties.
  • Visual inspection catches ~55% of edits with unlimited time, dropping below 30% under real-world time constraints.
  • Pixel-level forensics (compression analysis, noise patterns, font rendering) is the only reliable method across all editing types.
  • The strongest detection stack combines all three: metadata, OCR rules, and pixel-level AI - running in parallel in under 30 seconds.

Frequently asked questions

How do I check if a PDF has been edited?

Start with metadata: open File > Properties in your PDF reader and check the Modification Date, Creator, and Producer fields. If the modification date is after the creation date, the file was edited after creation. If the Creator shows an unexpected application (like Microsoft Word for a bank statement), the document may not be genuine. For deeper analysis, zoom to 300-400% on amount fields and look for font inconsistencies, compression blocks, or baseline misalignment. For definitive results, use a pixel-level analysis tool that can detect artifacts invisible to the human eye.

Does editing a PDF in Adobe Acrobat leave traces?

Often, yes. Adobe Acrobat typically records itself in the PDF's Producer or Creator metadata field. It also modifies the document's internal cross-reference table in a characteristic way. However, a user who knows to strip metadata before saving can remove these traces. Pixel-level forensics can still detect the edit because Acrobat's text rendering engine produces subtly different character shapes than the original document generator - even when the same font is used.

What is Error Level Analysis (ELA)?

Error Level Analysis re-saves an image at a known JPEG compression level and measures the difference between the original and re-saved version. In an unmodified image, all regions show similar error levels because they were compressed together. In an edited image, the manipulated region shows a different error level because it was introduced at a different compression stage. ELA is effective for detecting image-layer edits but does not catch text-layer modifications in vector PDFs.

Can AI-generated PDFs be detected?

Yes, though they require different detection methods than edited documents. AI-generated PDFs have no editing boundary to find - the entire document is synthetic. Detection relies on comparing the document's rendering characteristics against known genuine templates: font rendering engine fingerprints, PDF structure conventions used by specific banks or institutions, and statistical patterns in the pixel data that differ between AI generation pipelines and genuine document production systems. Hesper AI maintains a database of known document templates for this comparison.

Can I check if a PDF has been edited for free?

You can perform basic checks for free: inspect metadata in any PDF reader (File > Properties), zoom to 300%+ and look for visual inconsistencies, and try selecting text (if text isn't selectable in a document that should have a text layer, it may have been rasterized to hide edits). Online tools like FotoForensics offer free Error Level Analysis for images. However, these methods have significant limitations - they catch careless edits but miss professional-quality forgeries and AI-generated documents. For reliable detection at any volume, pixel-level AI analysis is required.
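
The "is text selectable" check can also be approximated in code by scanning content streams for text-showing operators. This is a stdlib-only heuristic of my own construction: real PDFs can use filters and encodings this sketch does not handle, so treat a negative result as a hint, not proof.

```python
import re
import zlib

def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: look for text-showing operators (Tj/TJ) in content
    streams, inflating Flate-compressed streams where possible.
    A scanned or rasterized PDF typically has none."""
    streams = re.findall(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.S)
    for raw in streams:
        try:
            # decompressobj tolerates trailing bytes after the stream.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # stream was not Flate-compressed
        if re.search(rb"\bTj\b|\bTJ\b", data):
            return True
    return False
```

A bank statement that reports no text layer deserves extra scrutiny, since rasterizing a natively digital document is a common way to hide text-layer edits.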

How does Hesper AI detect PDF tampering?

Hesper AI runs five parallel detection layers on every document: metadata validation (checking creator, producer, modification dates, and internal structure), compression forensics (detecting regions saved at different quality levels), font rendering analysis (comparing character rendering against known font engines), noise and statistical analysis (measuring pixel distribution patterns for anomalies), and cross-document comparison (matching the document's structure against known genuine templates). Results are returned in under 30 seconds as a fraud score, verdict, and structured findings with pixel coordinates.

See Hesper AI on your documents

Request a demo and we'll run an analysis on your real document samples.