Transparency

Detection Methodology

How Aiscern detects AI-generated content — the models, signals, accuracy benchmarks, and known limitations explained openly.

Accuracy Benchmarks

v4.0.0 (last validated: April 2026)

Benchmarks are measured on held-out test sets that were not used during training. Figures are balanced accuracy: the mean of the true positive rate and the true negative rate. Test sets include content from all major AI generators available at the time of evaluation.
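The averaged figure described above can be sketched in a few lines. This is an illustrative sketch, not Aiscern's evaluation code: the function name `balanced_accuracy` and the label convention (1 = AI-generated, 0 = human) are assumptions.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate.

    Assumed label convention: 1 = AI-generated, 0 = human.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = sum(1 for t in y_true if t == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one example of each class")
    tpr = tp / positives  # share of AI samples correctly flagged
    tnr = tn / negatives  # share of human samples correctly cleared
    return (tpr + tnr) / 2


# Toy data: 4 AI samples (3 caught) and 4 human samples (all cleared).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.875
```

Averaging the two rates keeps the figure meaningful even when a test set contains far more human samples than AI samples, or the reverse.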

Text Detection

HuggingFace roberta-base-openai-detector + Gemini 2.0 Flash ensemble

~85%

Image Detection

EfficientNet-B4 fine-tuned on Midjourney/DALL-E/SD datasets

~82%

Audio Detection

Wav2Vec2 + spectral fingerprint classifier

~79%

Video Detection

Frame-sampled image detection + temporal consistency analysis

~76%

* Accuracy varies by content type, generator, and compression level. Figures are updated with each model version release. See Known Limitations below.

Ensemble Approach

No single signal reliably distinguishes AI content from human content across all edge cases. Aiscern combines multiple independent signals through a trained ensemble model. Each signal is weighted based on its empirically measured reliability for the specific content type, then combined into a single confidence score.
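As a rough illustration of weighted combination, the sketch below averages per-signal scores using per-content-type weights. It is a simplification under stated assumptions: the production ensemble is described as a trained model, whereas this uses a plain normalized weighted average, and the function name, signal names, and weight values are all hypothetical.

```python
def combine_signals(scores, weights):
    """Weighted average of per-signal scores, each in [0, 1].

    `scores` and `weights` are dicts keyed by signal name. Signals missing
    from `scores` are skipped and the remaining weights are renormalized.
    """
    used = [name for name in weights if name in scores]
    total_weight = sum(weights[name] for name in used)
    if total_weight <= 0:
        raise ValueError("no usable signals")
    return sum(scores[name] * weights[name] for name in used) / total_weight


# Hypothetical weights for text content; real weights would be learned
# from each signal's measured reliability.
TEXT_WEIGHTS = {
    "perplexity": 0.3,
    "burstiness": 0.2,
    "vocabulary": 0.1,
    "structure": 0.1,
    "fingerprint": 0.3,
}

# Hypothetical per-signal scores (1.0 = strongly suggests AI).
scores = {
    "perplexity": 0.9,
    "burstiness": 0.8,
    "vocabulary": 0.6,
    "structure": 0.5,
    "fingerprint": 0.95,
}
confidence = combine_signals(scores, TEXT_WEIGHTS)
print(f"{confidence:.1%}")
```

Renormalizing over the signals that are actually present lets the same function handle inputs where one signal cannot be computed, for example an image with no detectable faces.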

Text Signals

Perplexity score

Measures how statistically predictable each word choice is under a language model. AI text tends to score low (more predictable); human writing tends to score higher.
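Perplexity has a standard definition: the exponential of the negative mean log-probability of the tokens. The sketch below assumes the per-token probabilities have already been obtained from some language model; the probability values are made up for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean(log p))."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token probabilities assigned by a language model.
predictable = [math.log(p) for p in (0.5, 0.5, 0.5, 0.5)]
surprising = [math.log(p) for p in (0.5, 0.05, 0.5, 0.02)]

print(perplexity(predictable))                           # close to 2.0
print(perplexity(surprising) > perplexity(predictable))  # True
```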

Burstiness

Variation in sentence length and complexity. Human writing has high burstiness; AI tends toward uniformity.
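One simple way to quantify this variation is the coefficient of variation of sentence lengths. The sketch below uses that definition as an assumption; the exact metric Aiscern uses is not specified here, and the naive punctuation-based sentence splitter is for illustration only.

```python
import re
import statistics


def burstiness(text):
    """Coefficient of variation (std / mean) of sentence lengths in words."""
    # Naive split on ., ! and ?; real text needs a proper sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The dog ran off across the wide field before anyone noticed. Why?"

print(burstiness(uniform))                       # 0.0
print(burstiness(varied) > burstiness(uniform))  # True
```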

Vocabulary diversity

Ratio of unique words to total words. AI text tends to reuse common, high-frequency words, which lowers this ratio.
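This ratio is commonly called the type-token ratio. A minimal sketch, assuming a simple lowercase word tokenizer (the function name and tokenizer are illustrative choices):

```python
import re


def type_token_ratio(text):
    """Unique words divided by total words (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# 8 words, 5 of them unique: the, cat, and, dog, bird.
print(type_token_ratio("the cat and the dog and the bird"))  # 0.625
```

The ratio falls naturally as texts get longer, so values are only comparable between passages of similar length.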

Structural patterns

AI text tends toward balanced paragraph lengths and consistent heading hierarchies uncommon in natural writing.

Model fingerprint

Specific token-choice patterns associated with known LLMs, detected via trained classifier.

Image Signals

Frequency artifacts

Fourier-domain analysis reveals the periodic artifacts left by diffusion model upsampling steps.
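To illustrate why a periodic artifact stands out in the Fourier domain, the sketch below runs a naive one-dimensional DFT over a toy row of pixel values. It is a simplified illustration only: real images are two-dimensional and would normally go through a 2D FFT (for example `numpy.fft.fft2`), and the synthetic row, the period of 4 pixels, and the function names are assumptions.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def dominant_frequency(signal):
    """Index of the strongest non-DC bin in the lower half of the spectrum."""
    mags = dft_magnitudes(signal)
    return max(range(1, len(signal) // 2 + 1), key=lambda k: mags[k])


# Toy pixel row: a smooth gradient plus a faint ripple repeating every 4 pixels.
n = 64
row = [0.2 * t / n + 0.2 * math.cos(2 * math.pi * t / 4) for t in range(n)]

# A period of 4 pixels over 64 samples lands in bin 64 / 4 = 16.
print(dominant_frequency(row))  # 16
```

The gradient spreads its energy across low-frequency bins, while the ripple concentrates in a single bin, which is what makes periodic artifacts easy to pick out.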

Facial geometry

Geometric consistency of landmarks — eye spacing, ear symmetry, catchlight positions.

Background coherence

Shadows, reflections, and perspective consistency between foreground subjects and background.

EXIF metadata

AI-generated images typically lack camera EXIF data. The absence of shutter speed, ISO, and GPS fields is treated as a strong signal.

Compression signature

JPEG blocking artifacts appear in atypical locations in AI images vs. real photography.

Interpreting Confidence Scores

90–100%

Very High — AI

Strong ensemble agreement. Multiple independent signals all point to AI generation.

70–89%

High — Likely AI

Most signals indicate AI. Some ambiguity — review flagged signals before acting.

45–69%

Uncertain

Signals are mixed. Do not use this result as evidence of AI use without additional review.

20–44%

Likely Human

Most signals point to human authorship. Low probability of AI generation.

0–19%

Very High — Human

Strong ensemble agreement on human origin. Multiple signals confirm natural content.
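The bands above can be expressed as a simple lookup. In this sketch the function name and the returned identifiers are hypothetical, and fractional scores are assumed to fall into the band below the next integer threshold (so 89.5 counts as 70–89%).

```python
def interpret_score(score):
    """Map a 0-100 AI-confidence score to one of the five bands."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "very_high_ai"      # 90-100%
    if score >= 70:
        return "likely_ai"         # 70-89%
    if score >= 45:
        return "uncertain"         # 45-69%
    if score >= 20:
        return "likely_human"      # 20-44%
    return "very_high_human"       # 0-19%


for s in (95, 72, 50, 30, 5):
    print(s, interpret_score(s))
```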

Known Limitations

We publish these limitations openly because we believe responsible use of AI detection requires honest understanding of what it cannot do. Never use a single detection result as sole evidence for high-stakes decisions.

  • Short text (under 150 words) has insufficient signal for reliable classification
  • Non-native English speakers may trigger false positives due to constrained vocabulary patterns
  • Heavily compressed images (under 50 KB) lose the frequency artifacts that detectors rely on
  • Human editing of AI content after generation significantly reduces detectability
  • Accuracy on hybrid content (AI inpainting on real photos) is currently below 70%
  • Very short audio clips (< 5 seconds) provide insufficient spectral data
  • Novel AI generators released after our last model update may evade detection until the next fine-tune
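Several of these limits are simple thresholds that a caller could check before trusting a result. The sketch below is a hypothetical pre-check, not part of any Aiscern API: the function name, parameter names, and warning strings are assumptions, and 50 KB is taken to mean 50 × 1024 bytes.

```python
MIN_WORDS = 150                # short-text limit from the list above
MIN_IMAGE_BYTES = 50 * 1024    # assumes "50 KB" means 50 * 1024 bytes
MIN_AUDIO_SECONDS = 5.0        # short-audio limit from the list above


def input_warnings(content_type, text="", size_bytes=0, duration_seconds=0.0):
    """Return warnings for inputs likely to give unreliable results."""
    warnings = []
    if content_type == "text" and len(text.split()) < MIN_WORDS:
        warnings.append("text under 150 words")
    elif content_type == "image" and size_bytes < MIN_IMAGE_BYTES:
        warnings.append("image under 50 KB")
    elif content_type == "audio" and duration_seconds < MIN_AUDIO_SECONDS:
        warnings.append("audio under 5 seconds")
    return warnings


print(input_warnings("text", text="Too short to classify."))
print(input_warnings("image", size_bytes=200_000))
```

A non-empty result does not mean the detection is wrong, only that it falls in a range where the limitations above apply and extra review is warranted.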

Model Update Cadence

Detection models are retrained quarterly or whenever a major new AI generator reaches significant market penetration. Model versions are tracked in our changelog. The accuracy figures on this page reflect the most recent production model. Fine-tuning data is sourced from public benchmarks, synthetic test sets, and anonymized user feedback (opt-in only).
