Every edited PDF leaves traces. The question is whether you know where to look - and whether your method can scale beyond a handful of documents per day. This guide covers the three layers of PDF tampering detection: metadata inspection, visual analysis, and pixel-level forensics. Each catches a different class of edits. None is sufficient on its own.
Why PDF tampering matters
PDFs are the default format for financial documents - invoices, bank statements, payslips, tax returns, contracts, insurance claims. Organisations make consequential decisions based on the assumption that these documents are authentic. When that assumption is wrong, the consequences range from approving fraudulent loans to paying fabricated insurance claims to reimbursing fake expenses.
The problem has accelerated sharply since 2023. According to ACFE data and Hesper internal analysis, AI-generated document fraud has increased 3,000% in three years. The tools are free, require no technical skill, and produce output that is visually indistinguishable from authentic documents. The cost of producing a convincing fake has collapsed to near zero.
How PDFs get edited
Understanding editing methods matters because each leaves different forensic signatures. There are three categories, each progressively harder to detect.
Text-layer editing
PDFs store text as vector objects positioned on a page. Tools like Adobe Acrobat, Foxit, and numerous free editors allow direct modification of these text objects. Change a number, delete a line, add a paragraph. The visual result updates immediately. This is the simplest form of tampering and the easiest to detect - because the editing tool's metadata is typically embedded in the modified file.
Image-layer manipulation
The document is converted to an image (rasterized), edited at the pixel level using tools like Photoshop, GIMP, or AI inpainting, then re-exported as a PDF. This destroys the original text layer entirely. The resulting PDF contains a flat image with no selectable text - or has a new OCR-generated text layer overlaid on the modified image. This method leaves compression artifacts and pixel-level evidence but no metadata trail from the original edit.
From-scratch generation
The document was never real. It is generated from a template (HTML/CSS, Canva, or similar) or by an AI tool that produces the entire document content and layout. There is no "original" to compare against. These documents are internally consistent - all amounts reconcile, formatting is clean, and the content is plausible. Detection requires comparing the document's structure and metadata against known genuine templates for that document type.
Checking PDF metadata
The fastest way to check if a PDF has been modified is to inspect its metadata. Every PDF contains a set of document properties that record how it was created and whether it was subsequently modified.
In most PDF readers, open File > Properties (or Document Properties). Look for these fields:
- Created (CreationDate) - when the file was first generated
- Modified (ModDate) - a modification date later than the creation date means the file was changed after it was produced
- Creator - the application that created the original content, which should match the expected source (e.g. a bank's statement generator)
- Producer - the software that wrote the PDF file itself; a desktop editor such as Adobe Acrobat or Foxit here, on a document that should come straight from an automated system, is a red flag
Metadata is easy to strip
Metadata inspection is useful but not reliable on its own. Many editing tools can strip or overwrite metadata before saving. A savvy fraudster will clear the modification date, set the creator to match the expected bank software, and remove all traces. Metadata catches careless edits - it does not catch careful ones.
Visual clues (and their limits)
If you zoom to 300-400% on a document and examine the areas most likely to be edited (amount fields, dates, names), you may spot visual evidence of tampering. These signs are subtle and require training to identify consistently.
- Font inconsistencies - characters in edited regions may render at slightly different weights, sizes, or with different anti-aliasing than surrounding text
- Baseline misalignment - replaced text may sit slightly above or below the original text baseline
- Compression blocks - JPEG artifacts that form visible rectangular blocks around edited regions (visible when zoomed in on a rasterized PDF)
- Color mismatches - the background color behind edited text may differ slightly from the surrounding page
- Spacing irregularities - character kerning or word spacing that differs from the document's native font metrics
- Blurred boundaries - AI inpainting often produces slightly softer edges at the boundary of the edited region compared to the sharp rendering of original text
The limitation is obvious: these clues require significant zoom, careful attention, and experience. In a study of document reviewers, trained analysts detected visual manipulation artifacts 55% of the time when given unlimited time. Under time pressure (2 minutes per document), detection rates dropped below 30%. For AI-generated documents, visual detection rates are below 10% because the entire document is generated at consistent quality - there are no editing boundaries to find.
Pixel-level forensics
Pixel-level forensics is the only approach that reliably detects all three editing methods. It works by analyzing the raw image data of each page for statistical patterns that are invisible to humans but measurable by trained models.
The core techniques:
- Error Level Analysis (ELA) - re-saves the image at a known compression level and measures the difference. Edited regions show different error levels than the rest of the image because they were saved at a different compression stage.
- Noise analysis - every camera and scanner introduces a characteristic noise pattern. Edited regions disrupt this pattern because the inserted content was generated by a different source (an editing tool, a different document, or an AI model).
- Font rendering forensics - compares the rendering characteristics of each character against a database of known font renderers. Characters inserted by an editing tool render differently from characters placed by the original document generator.
- Compression artifact mapping - identifies the quantization tables and block boundaries in the image data. Edits create inconsistencies in these patterns that are mathematically detectable but invisible at any zoom level.
- Statistical distribution analysis - measures the histogram, entropy, and spatial frequency characteristics of regions within the document. Manipulated regions deviate from the expected statistical profile of the surrounding content.
These techniques work on all three editing methods. Text-layer edits leave font rendering artifacts. Image manipulation leaves compression and noise artifacts. Generated documents fail font rendering and statistical distribution checks because their rendering pipeline differs from genuine bank or corporate systems.
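Error Level Analysis, the first technique above, can be sketched in a few lines. This is a minimal illustration assuming the Pillow imaging library is installed; production forensics pipelines use considerably more sophisticated variants:

```python
import io
from PIL import Image, ImageChops

def error_level_analysis(img, quality=90, scale=15):
    """Re-save the image as JPEG at a known quality and amplify the difference.

    Regions edited after the document's last save tend to show a different
    error level than untouched regions, because they have been through a
    different number of compression passes.
    """
    original = img.convert("RGB")
    buf = io.BytesIO()
    original.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(original, resaved)
    # Amplify the per-pixel error so low levels become visible for inspection
    return diff.point(lambda value: min(255, value * scale))
```

Viewing the returned image, authentic regions appear uniformly dark while pasted or inpainted regions stand out with a distinctly different brightness. Interpreting the map still requires care: high-contrast edges and fresh scans also produce elevated error levels.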
One API call, 200+ signals
Hesper AI runs all of these forensic checks - plus layout validation, metadata analysis, and cross-document comparison - in a single API call, returning results in under 30 seconds. See how the detection pipeline works.
Detection methods compared
No single method catches everything. The strongest approach combines metadata inspection, OCR-based rule validation, and pixel-level forensics. This is what Hesper AI does - all three layers run in parallel on every document.
For a deeper look at why OCR misses manipulation, see why OCR alone isn't enough for document verification. To understand how these methods apply specifically to bank statements, see our guide on detecting fake bank statements.
Key takeaways
- Every edited PDF leaves traces - but most are invisible to human reviewers, especially under time pressure.
- Three editing methods exist: text-layer editing, image manipulation, and from-scratch generation. Each leaves different forensic signatures.
- Metadata inspection is fast but easily defeated by stripping or overwriting document properties.
- Visual inspection catches ~55% of edits with unlimited time, dropping below 30% under real-world time constraints.
- Pixel-level forensics (compression analysis, noise patterns, font rendering) is the only reliable method across all editing types.
- The strongest detection stack combines all three: metadata, OCR rules, and pixel-level AI - running in parallel in under 30 seconds.