A Vision Transformer (ViT) fine-tuned to distinguish AI-generated images from
real photographs. Benchmarked against four alternatives on our 217-image test set —
selected for its 94% accuracy and 0% false positive rate.
Known limitation: the classifier has a training cutoff. For images from
generators released after its training data was collected, it returns an AI
probability near 0%, not because the image is real but because it has never seen
that visual style. Frequency and metadata signals provide supporting evidence
when the classifier is uncertain.
| Generator | Classifier |
|---|---|
| Stable Diffusion 1.x / 2.x / XL | ✅ |
| Midjourney v1–v5 | ✅ |
| DALL-E 2 | ✅ |
| StyleGAN / GAN-based | ✅ |
| Midjourney v6 / DALL-E 3 | ⚠️ partial |
| Flux 1.x (Black Forest Labs) | ❌ |
| Sora (OpenAI) | ❌ |
| GPT Image / DALL-E 3 | ❌ |
| Gemini / Imagen 3 | ❌ |
| Kling / VEO / Runway | ❌ |
Weight: 46%. Strong for generators up to ~2023; frequency and metadata signals provide supporting evidence for newer generators.
Analyses the DCT/FFT frequency spectrum of the image. AI generators produce
characteristic patterns in high-frequency noise — a kind of spectral fingerprint —
that differs systematically from the organic noise produced by a camera sensor.
This signal is difficult to remove without visibly degrading image quality.
Weight: 19%. Especially effective after JPEG compression.
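As a rough illustration of this kind of spectral measurement, the sketch below computes the share of an image's FFT energy that lies above a radial cutoff frequency. The function name `high_frequency_ratio` and the `cutoff` value are assumptions made for this example; the detector's actual features and thresholds are not documented here.

```python
import numpy as np


def high_frequency_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above `cutoff` (in cycles per pixel).

    `gray` is a 2-D greyscale array. The cutoff of 0.25 is an illustrative
    assumption, not a value taken from the detector.
    """
    gray = np.asarray(gray, dtype=np.float64)
    gray = gray - gray.mean()  # drop the DC component so it cannot dominate

    # 2-D power spectrum with the zero frequency shifted to the centre.
    power = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2

    # Radial frequency of every bin, in cycles per pixel.
    h, w = gray.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    total = power.sum()
    if total == 0:
        return 0.0  # perfectly flat image: no energy to distribute
    return float(power[radius > cutoff].sum() / total)
```

A real detector would likely compare the shape of the spectrum against learned profiles rather than a single ratio; this only shows where such a feature comes from.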
Photo Response Non-Uniformity (PRNU): every camera sensor has a unique,
repeatable noise pattern caused by microscopic manufacturing differences.
Real photos retain this fingerprint; AI-generated images do not — they have
random or synthetic noise with no consistent sensor signature.
Weight: 11%. Most reliable when a reference camera is known.
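The reference-camera case can be sketched as follows, under strong simplifications: the noise residual is taken as the image minus a 3×3 box blur (production PRNU pipelines typically use a wavelet denoiser), and the fingerprint is the average residual over several images from the same camera. All function names here are hypothetical.

```python
import numpy as np


def noise_residual(gray: np.ndarray) -> np.ndarray:
    """Image minus a 3x3 box blur: a crude stand-in for a proper denoiser."""
    gray = np.asarray(gray, dtype=np.float64)
    padded = np.pad(gray, 1, mode="reflect")
    h, w = gray.shape
    blurred = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            blurred += padded[dy:dy + h, dx:dx + w]
    return gray - blurred / 9.0


def estimate_fingerprint(images: list) -> np.ndarray:
    """Average the residuals of several same-sized images from one camera."""
    return np.mean([noise_residual(img) for img in images], axis=0)


def prnu_correlation(gray: np.ndarray, fingerprint: np.ndarray) -> float:
    """Normalised correlation between an image's residual and a fingerprint."""
    r = noise_residual(gray)
    r = r - r.mean()
    f = fingerprint - fingerprint.mean()
    denom = np.linalg.norm(r) * np.linalg.norm(f)
    return float((r * f).sum() / denom) if denom else 0.0
```

A correlation well above zero suggests the image came from the reference sensor; a value near zero is what an image with no consistent sensor signature would be expected to produce.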
Reads embedded metadata. Strong evidence of AI generation: software tags
naming an AI tool (Stable Diffusion, Midjourney, Firefly, etc.), or exact
pixel dimensions that match AI generator defaults (1024×1024, 1024×1792, etc.)
combined with no camera data. Strong evidence of real photo: presence of camera
make, model, lens, aperture, ISO, and shutter speed fields.
Also reads IPTC press/agency fields (credit line, copyright,
byline) embedded by photo agencies and government photo offices. When a press
credit is detected — e.g. "The White House", "AFP", "Getty Images" — the score
is adjusted downward and missing camera EXIF fields are treated as normal
(press photos are routinely stripped of sensor data by distribution systems).
Weight: 10%. Conclusive only when an AI software tag is present.
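The rules above might be expressed roughly as below. This is a sketch over an already-parsed tag dictionary: the key names (`Software`, `Make`, `Credit`, and so on), the hint lists, and the returned labels are assumptions for illustration, and the real detector's rule order and scoring may differ.

```python
# Illustrative lists only; the real detector's lists are not documented here.
AI_SOFTWARE_HINTS = ("stable diffusion", "midjourney", "firefly")
AI_DEFAULT_SIZES = {(1024, 1024), (1024, 1792)}
CAMERA_FIELDS = ("Make", "Model", "LensModel", "FNumber",
                 "ISOSpeedRatings", "ExposureTime")
PRESS_CREDITS = ("the white house", "afp", "getty images")


def metadata_evidence(tags: dict, width: int, height: int) -> str:
    """Classify metadata into a coarse evidence label.

    `tags` maps tag names to values, as produced by some EXIF/IPTC reader.
    """
    software = str(tags.get("Software", "")).lower()
    if any(hint in software for hint in AI_SOFTWARE_HINTS):
        return "ai_software_tag"  # the only conclusive case

    if all(tags.get(field) for field in CAMERA_FIELDS):
        return "camera_exif"  # full set of camera fields present

    credit = str(tags.get("Credit", "")).lower()
    if any(name in credit for name in PRESS_CREDITS):
        return "press_credit"  # missing camera EXIF is treated as normal

    has_any_camera_data = any(tags.get(field) for field in CAMERA_FIELDS)
    if (width, height) in AI_DEFAULT_SIZES and not has_any_camera_data:
        return "ai_default_size"

    return "inconclusive"
```

The press-credit check runs before the dimension check so that an agency photo stripped of sensor data is not flagged merely for its size.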
Google DeepMind's SynthID embeds an imperceptible watermark into the frequency
domain of images generated by Imagen and Gemini. This detector looks for
SynthID-compatible patterns that survive resizing and moderate compression.
Weight: 5%. Specific to Google AI tools; low false-positive rate.
Checks for other invisible watermark patterns embedded by AI generators.
Several platforms (e.g. Stable Diffusion XL, Midjourney) optionally embed
low-amplitude watermarks that are detectable even after moderate image editing.
Acts as a tie-breaker; not weighted in the main score.
Compares the image against a local database of previously identified
AI-generated and manipulated images using perceptual hashing (pHash). A match —
even after cropping, resizing, or re-compression — returns an immediate positive
result with a link to the original fact-check. The database is updated daily
from Google Fact Check Tools, AFP, Snopes, PolitiFact, and the
Database for Known Fakes (DBKF) by Ontotext — which aggregates
22,000+ ClaimReviews from 35+ fact-checkers worldwide.
Effective only for images already in the database. Coverage grows with each daily update.
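To make the matching step concrete, here is a minimal sketch of a DCT-based 64-bit perceptual hash and a Hamming-distance lookup. It assumes the image has already been converted to a 32×32 greyscale array, and the names `phash64`, `lookup`, and the `max_distance` of 10 bits are illustrative choices, not the tool's actual parameters.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Unnormalised DCT-II basis (row = frequency index, column = sample)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * (i + 0.5) * k / n)


def phash64(gray32: np.ndarray) -> int:
    """64-bit hash: low-frequency DCT coefficients compared to their median."""
    gray32 = np.asarray(gray32, dtype=np.float64)
    if gray32.shape != (32, 32):
        raise ValueError("expected a 32x32 greyscale array")
    d = dct_matrix(32)
    coeffs = d @ gray32 @ d.T            # separable 2-D DCT
    low = coeffs[:8, :8].flatten()       # keep the 8x8 low-frequency block
    median = np.median(low[1:])          # ignore the DC term for the median
    value = 0
    for bit in low > median:
        value = (value << 1) | int(bit)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def lookup(query_hash: int, database: dict, max_distance: int = 10):
    """Return the verdict of the closest stored hash, or None if none is close."""
    best = None
    for known_hash, verdict in database.items():
        dist = hamming(query_hash, known_hash)
        if dist <= max_distance and (best is None or dist < best[0]):
            best = (dist, verdict)
    return None if best is None else best[1]
```

A linear scan is fine for a small local database; a larger one would probably need an index structure built for Hamming distance.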
Every image analysed with sufficient confidence is automatically added to the
local community database. A perceptual hash of the image — not the image itself —
is stored alongside the verdict. Future uploads of the same image (even after
cropping or re-compression) will instantly return the stored verdict without
re-running the full pipeline. No images or personal data are stored or shared.
Grows automatically over time as more images are analysed.
Divides the image into a grid of regions and estimates the dominant light source
direction in each region using luminance gradients. In a real photograph, light
usually comes from one dominant source, so shadows and highlights are consistent across the scene.
AI generators, and composite images that paste subjects from different photos,
frequently produce lighting that is physically impossible: shadows pointing in
different directions, or highlights inconsistent with shadow placement.
Measures the angular spread and circular variance of gradient directions across
regions. A spread above 70° between regions flags inconsistent lighting.
Weight: 5%. Most effective on images with clear shadows or directional lighting. Skipped on flat/low-contrast images.
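The two statistics named above can be sketched as follows, assuming the per-region light directions have already been estimated and are given in degrees. Interpreting "spread" as the largest pairwise angular difference is an assumption of this example, as are the function names.

```python
import math
from itertools import combinations


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def max_spread(angles_deg: list) -> float:
    """Largest pairwise angular difference between regions."""
    return max(
        (angular_difference(a, b) for a, b in combinations(angles_deg, 2)),
        default=0.0,
    )


def circular_variance(angles_deg: list) -> float:
    """1 - mean resultant length: 0 = identical directions, 1 = fully dispersed."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return 1.0 - math.hypot(c, s)


def lighting_inconsistent(angles_deg: list, threshold_deg: float = 70.0) -> bool:
    """Flag the image when the spread between regions exceeds the threshold."""
    return max_spread(angles_deg) > threshold_deg
```

Circular statistics matter here because angles wrap around: directions of 350° and 10° are only 20° apart, which a plain subtraction would miss.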
Detects images that mix AI-generated and real-photo regions — for example, a real
person's face pasted onto an AI-generated background, or an AI object inserted
into a real photograph. This is one of the most common manipulation techniques
in disinformation.
Divides the image into a 3×3 grid and computes three forensic signals per region:
Error Level Analysis (ELA) — how a region responds to re-compression;
noise variance — the texture randomness of the region;
and frequency energy ratio — the high-to-low frequency balance.
When these signals vary sharply between adjacent regions, the image is likely a composite.
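The sketch below illustrates the region-comparison idea for one of the three signals, noise variance. It uses a horizontal first difference as a crude high-pass filter and reports the largest ratio between adjacent grid cells. The function names and the choice of filter are assumptions; ELA and the frequency energy ratio could be compared across cells in the same way.

```python
import numpy as np


def grid_noise_variance(gray: np.ndarray, rows: int = 3, cols: int = 3) -> np.ndarray:
    """Noise variance per grid cell, estimated from horizontal pixel differences."""
    gray = np.asarray(gray, dtype=np.float64)
    residual = np.diff(gray, axis=1)  # crude high-pass: removes smooth content
    h, w = residual.shape
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            cell = residual[r * h // rows:(r + 1) * h // rows,
                            c * w // cols:(c + 1) * w // cols]
            out[r, c] = cell.var()
    return out


def max_adjacent_ratio(values: np.ndarray, eps: float = 1e-9) -> float:
    """Largest ratio between horizontally or vertically adjacent cells."""
    rows, cols = values.shape
    best = 1.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    a = values[r, c] + eps
                    b = values[rr, cc] + eps
                    best = max(best, a / b, b / a)
    return best
```

A ratio near 1 means neighbouring regions share similar noise statistics; a large ratio points at a region whose noise was produced differently from its surroundings.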
Weight: 4%. When composite and shadow signals fire together, the ensemble raises the overall score to at least Likely AI.
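One plausible way to combine the weighted signals is a weighted average with an override, sketched below. The weights are the ones stated in this document; the signal key names, the renormalisation over available signals, and the `LIKELY_AI_FLOOR` value of 0.70 are assumptions, since the ensemble formula and the verdict thresholds are not specified here.

```python
# Weights as stated for each detector in this document (they sum to 100%).
WEIGHTS = {
    "classifier": 0.46,
    "frequency": 0.19,
    "prnu": 0.11,
    "metadata": 0.10,
    "synthid": 0.05,
    "lighting": 0.05,
    "composite": 0.04,
}

# Hypothetical lower bound of the "Likely AI" band; the real value is not documented.
LIKELY_AI_FLOOR = 0.70


def ensemble_score(signals: dict, composite_fired: bool = False,
                   lighting_fired: bool = False) -> float:
    """Weighted average of per-detector scores in [0, 1].

    Detectors that were skipped (missing or None) are left out and the
    remaining weights are renormalised; that renormalisation is an assumption.
    """
    used = {k: v for k, v in signals.items() if k in WEIGHTS and v is not None}
    total_weight = sum(WEIGHTS[k] for k in used)
    score = (sum(WEIGHTS[k] * v for k, v in used.items()) / total_weight
             if total_weight else 0.0)

    # Override described above: both signals firing lifts the score to Likely AI.
    if composite_fired and lighting_fired:
        score = max(score, LIKELY_AI_FLOOR)
    return score
```

For example, `ensemble_score({"classifier": 0.1, "frequency": 0.2}, composite_fired=True, lighting_fired=True)` returns the floor value, because the override outranks a low weighted average.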
Searches Google's image index for prior appearances of the uploaded image.
Runs in parallel with the forensic pipeline so it adds no extra waiting time.
Only exact matches are treated as meaningful — these mean the same
image (or a crop/resize of it) has been published somewhere before. A real news
photograph will typically have exact matches on agency sites, news outlets, or
official pages.
An AI-generated image created for disinformation often has zero exact
matches — it was made specifically for this campaign and never published
before. Zero matches for a supposed official portrait of a public figure is a
significant red flag.
Visually similar images (other photos with similar content — e.g.
other politicians in suits in front of EU flags) are shown separately as context,
not as matches. They do not indicate the image has been published before.
Does not contribute to the AI score — provides editorial context. Requires Google Cloud Vision API.