Detection Methodology
How Aiscern detects AI-generated content — the models, signals, accuracy benchmarks, and known limitations explained openly.
Accuracy Benchmarks
Benchmarks are measured on held-out test sets not used during training. Figures represent balanced accuracy (the average of the true positive rate and the true negative rate). Test sets include content from the major AI generators available at the time of evaluation.
Text: HuggingFace roberta-base-openai-detector + Gemini 2.0 Flash ensemble
Image: EfficientNet-B4 fine-tuned on Midjourney/DALL-E/SD datasets
Audio: Wav2Vec2 + spectral fingerprint classifier
Video: Frame-sampled image detection + temporal consistency analysis
* Accuracy varies by content type, generator, and compression level. Figures are updated with each model version release. See the Known Limitations section below.
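As an illustration of the averaging described above, the sketch below computes the mean of the true positive rate and the true negative rate (often called balanced accuracy) from confusion-matrix counts. The function name and the example counts are hypothetical and are not Aiscern benchmark data.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Average of true positive rate and true negative rate.

    tp/fn count AI samples that were flagged/missed;
    tn/fp count human samples that were passed/wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one AI sample and one human sample")
    tpr = tp / (tp + fn)  # share of AI samples correctly flagged
    tnr = tn / (tn + fp)  # share of human samples correctly passed
    return (tpr + tnr) / 2


# Hypothetical counts, for illustration only.
example = balanced_accuracy(tp=90, fn=10, tn=80, fp=20)
```

With these made-up counts the two rates are 0.9 and 0.8, so the result is about 0.85. Averaging the two rates keeps a test set dominated by one class from inflating the figure.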
Ensemble Approach
No single signal reliably distinguishes AI content from human content across all edge cases. Aiscern combines multiple independent signals through a trained ensemble model. Each signal is weighted based on its empirically measured reliability for the specific content type, then combined into a single confidence score.
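A minimal sketch of the weighting idea, assuming each signal has already been normalized to a 0 to 1 "looks like AI" score. The production ensemble is a trained model, so the plain weighted average, the signal names, and the weights below are illustrative assumptions only.

```python
from typing import Dict

# Hypothetical per-content-type weights. Real weights would come from
# measured reliability; these numbers are made up.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "text": {
        "perplexity": 0.3,
        "burstiness": 0.2,
        "vocabulary": 0.1,
        "structure": 0.1,
        "fingerprint": 0.3,
    },
}


def combine(signals: Dict[str, float], content_type: str) -> float:
    """Weighted average of the available signals, scaled to 0-100."""
    weights = WEIGHTS[content_type]
    # Use only the signals that were actually computed for this input.
    used = {name: w for name, w in weights.items() if name in signals}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted signals available")
    score = sum(signals[name] * w for name, w in used.items()) / total
    return round(100 * score, 1)
```

Renormalizing by the weights of the signals that are present lets the sketch return a score even when a signal cannot be computed for a given input.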
Text Signals
Perplexity score
Measures how statistically predictable each word choice is under a language model. AI text tends to score low (more predictable); human writing tends to score higher.
Burstiness
Variation in sentence length and complexity. Human writing has high burstiness; AI tends toward uniformity.
Vocabulary diversity
Ratio of unique words to total words. AI frequently reuses high-frequency vocabulary.
Structural patterns
AI text tends toward balanced paragraph lengths and consistent heading hierarchies uncommon in natural writing.
Model fingerprint
Specific token-choice patterns associated with known LLMs, detected via trained classifier.
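The first three text signals can be approximated with a few lines of standard-library Python. This is a rough sketch under stated assumptions, not Aiscern's implementation. It assumes per-token log-probabilities are supplied by some language model (which model is not specified here). It uses the coefficient of variation of sentence length as a stand-in for burstiness and ignores sentence complexity. The function names are hypothetical.

```python
import math
import re
import statistics
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, counted in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # a single sentence has no length variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def vocabulary_diversity(text: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0
```

Type-token ratio falls as text gets longer, so comparisons are only meaningful between passages of similar length.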
Image Signals
Frequency artifacts
Fourier-domain analysis reveals the periodic artifacts left by diffusion model upsampling steps.
Facial geometry
Geometric consistency of landmarks — eye spacing, ear symmetry, catchlight positions.
Background coherence
Shadows, reflections, and perspective consistency between foreground subjects and background.
EXIF metadata
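AI-generated images usually lack camera EXIF data. Missing shutter speed, ISO, and GPS fields are treated as a signal, although screenshots and images re-saved by social platforms often lack them too.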
Compression signature
JPEG blocking artifacts appear in atypical locations in AI images vs. real photography.
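To make the frequency-artifact idea concrete, the sketch below runs a plain discrete Fourier transform over one row of pixel intensities and reports how much of the spectral energy sits in the single strongest frequency. A strongly periodic pattern pushes this ratio toward 1. This is a 1-D, pure-Python simplification with hypothetical function names. A real detector would analyze 2-D spectra, and the threshold for "suspicious" is not something the source specifies.

```python
import cmath
import math
from typing import List


def dft_magnitudes(row: List[float]) -> List[float]:
    """Magnitudes of DFT bins 1..n//2 of a mean-centered signal."""
    n = len(row)
    if n < 2:
        return []
    mean = sum(row) / n
    centered = [v - mean for v in row]  # remove the DC component
    mags = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        mags.append(abs(coeff))
    return mags


def peak_ratio(row: List[float]) -> float:
    """Share of spectral energy held by the strongest single frequency."""
    energy = [m * m for m in dft_magnitudes(row)]
    total = sum(energy)
    if total == 0:
        return 0.0  # flat signal: no periodic content at all
    return max(energy) / total


# A pattern repeating every 4 pixels concentrates energy in one bin.
periodic_row = [0.0, 1.0, 0.0, -1.0] * 8
ratio = peak_ratio(periodic_row)
```

The naive transform is quadratic in the row length, which is fine for a demonstration but would be replaced by an FFT in practice.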
Interpreting Confidence Scores
90–100%
Very High — AI
Strong ensemble agreement. Multiple independent signals all point to AI generation.
70–89%
High — Likely AI
Most signals indicate AI. Some ambiguity — review flagged signals before acting.
45–69%
Uncertain
Signals are mixed. Do not use this result as evidence of AI use without additional review.
20–44%
Likely Human
Most signals point to human authorship. Low probability of AI generation.
0–19%
Very High — Human
Strong ensemble agreement on human origin. Multiple signals confirm natural content.
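The bands above translate directly into a lookup. The sketch below is one way to encode them, assuming scores are percentages from 0 to 100 and that a fractional score falls into the band whose lower bound it has reached. The band identifiers are hypothetical names, not an Aiscern API.

```python
def interpret(score: float) -> str:
    """Map a 0-100 confidence score to the band it falls in."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "very_high_ai"
    if score >= 70:
        return "likely_ai"
    if score >= 45:
        return "uncertain"  # mixed signals: needs additional review
    if score >= 20:
        return "likely_human"
    return "very_high_human"
```

For example, `interpret(72)` returns `"likely_ai"`, while `interpret(50)` returns `"uncertain"` and should not be treated as evidence of AI use on its own.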
Known Limitations
We publish these limitations openly because we believe responsible use of AI detection requires honest understanding of what it cannot do. Never use a single detection result as sole evidence for high-stakes decisions.
- Short text (under 150 words) has insufficient signal for reliable classification
- Text written by non-native English speakers may trigger false positives due to constrained vocabulary patterns
- Heavily compressed images (< 50KB) lose the frequency artifacts that detectors rely on
- Human editing of AI content after generation reduces detectability significantly
- Detection accuracy on hybrid content (AI inpainting on real photos) is currently below 70%
- Very short audio clips (< 5 seconds) provide insufficient spectral data
- Novel AI generators released after our last model update may evade detection until the next fine-tune
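Several of these limits are numeric, so a caller can check them before trusting a result. The sketch below flags inputs that fall under the thresholds listed above. The function and constant names are hypothetical, and reading "50KB" as 50 × 1024 bytes is an assumption.

```python
from typing import List, Optional

MIN_WORDS = 150              # short text threshold from the list above
MIN_IMAGE_BYTES = 50 * 1024  # assumption: "50KB" read as 50 KiB
MIN_AUDIO_SECONDS = 5.0      # short audio threshold from the list above


def reliability_warnings(
    content_type: str,
    word_count: Optional[int] = None,
    file_size_bytes: Optional[int] = None,
    duration_seconds: Optional[float] = None,
) -> List[str]:
    """Return warnings for inputs below the documented size limits."""
    warnings: List[str] = []
    if content_type == "text" and word_count is not None:
        if word_count < MIN_WORDS:
            warnings.append("text under 150 words: insufficient signal")
    if content_type == "image" and file_size_bytes is not None:
        if file_size_bytes < MIN_IMAGE_BYTES:
            warnings.append("image under 50KB: frequency artifacts may be lost")
    if content_type == "audio" and duration_seconds is not None:
        if duration_seconds < MIN_AUDIO_SECONDS:
            warnings.append("audio under 5 seconds: insufficient spectral data")
    return warnings
```

An empty list only means the input clears the size thresholds. It says nothing about the other limitations, such as human-edited or hybrid content.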
Model Update Cadence
Detection models are retrained quarterly or whenever a major new AI generator reaches significant market penetration. Model versions are tracked in our changelog. The accuracy figures on this page reflect the most recent production model. Fine-tuning data is sourced from public benchmarks, synthetic test sets, and anonymized user feedback (opt-in only).