
PAD & Quality

Research compiled: 2026-02-20
Coverage: 2020–2026 (focus on 2024–2025)


Part 1: Presentation Attack Detection (PAD / Face Anti-Spoofing)


Presentation Attack Detection (PAD) — also called face liveness detection or face anti-spoofing — covers two broad attack categories:

  • Physical attacks: print photos, replay video, 3D masks, partial paper masks
  • Digital attacks: deepfakes, face swaps, GAN-generated faces, face reenactment

Modern systems must handle both categories. The field has moved toward Unified Physical-Digital Attack Detection since ~2023.


| Dataset | Year | Subjects | Videos/Images | Attack Types |
| --- | --- | --- | --- | --- |
| CASIA-FASD | 2012 | 50 | 600 videos | Print, cut, replay |
| Replay-Attack | 2012 | 50 | 1,200 videos | Print, replay |
| OULU-NPU | 2017 | 55 | 5,940 videos | Print, replay (4 protocols) |
| SiW | 2018 | 165 | 4,478 videos | Print, replay, makeup, partial |
| SiW-M | 2020 | 493 | 14K+ videos | 13 attack types |
| CASIA-SURF CeFA | 2021 | 300 | Multi-modal, cross-ethnicity | 6 types |
| UniAttackData | 2024 | 1,800 | 28,706 videos | 2 physical + 12 digital |

UniAttackData (CVPR 2024 Challenge) is the current gold standard for unified attack detection, with 28,706 videos covering all modern attack types. It attracted 136 teams globally for its associated challenge.


  • ACER (Average Classification Error Rate) = (APCER + BPCER) / 2
  • APCER (Attack Presentation Classification Error Rate): proportion of attack presentations misclassified as bona fide
  • BPCER (Bona Fide Presentation Classification Error Rate): proportion of bona fide presentations misclassified as attack
  • HTER (Half Total Error Rate) = (FAR + FRR) / 2; the standard metric for cross-dataset testing

Standard: ISO/IEC 30107-3 defines PAD metrics and testing protocols.
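The definitions above map directly to a few lines of code. A minimal sketch (the 0/1 label and prediction encoding is an assumption chosen for illustration):

```python
import numpy as np

def pad_metrics(labels, preds):
    """ISO/IEC 30107-3 style PAD error rates.

    labels: 1 = attack presentation, 0 = bona fide presentation
    preds:  1 = classified as attack, 0 = classified as bona fide
    """
    labels, preds = np.asarray(labels), np.asarray(preds)
    # APCER: attack presentations wrongly accepted as bona fide
    apcer = float(np.mean(preds[labels == 1] == 0))
    # BPCER: bona fide presentations wrongly rejected as attack
    bpcer = float(np.mean(preds[labels == 0] == 1))
    acer = (apcer + bpcer) / 2  # ACER averages the two error types
    return apcer, bpcer, acer
```

HTER is computed analogously from FAR and FRR at a fixed decision threshold, usually selected on the source dataset in cross-dataset evaluation.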


A. CDCN / CDCN++ (CVPR 2020) — Influential Baseline


Paper: “Searching Central Difference Convolutional Networks for Face Anti-Spoofing”
Authors: Zitong Yu et al.
Type: CNN with central difference convolution + NAS
Key idea: Central Difference Convolution (CDC) captures gradient-level micro-texture information instead of raw intensity alone. CDC generalizes vanilla convolution:

CDC output = (1 − θ) · Σn w(pn) · x(p0 + pn)  +  θ · Σn w(pn) · (x(p0 + pn) − x(p0))

CDCN++ combines NAS-discovered architecture with Multiscale Attention Fusion Module (MAFM).
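A single-channel, valid-padding sketch of the CDC operator (a plain NumPy loop for clarity; the original is a drop-in replacement for a standard Conv2d layer):

```python
import numpy as np

def cdc2d(x, w, theta=0.7):
    """Central Difference Convolution on a 2-D array (valid padding).

    Using sum_n w[n]*(x[p0+n] - x[p0]) = vanilla - x[p0]*sum(w), the
    blended output simplifies to: vanilla - theta * center * sum(w).
    theta=0 recovers vanilla (cross-correlation) convolution.
    """
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    w_sum = w.sum()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            vanilla = (patch * w).sum()
            center = x[i + kh // 2, j + kw // 2]
            out[i, j] = vanilla - theta * center * w_sum
    return out
```

Note the defining property: with theta = 1, a constant (texture-free) region yields exactly zero response, which is why CDC highlights spoof micro-texture.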

Benchmark Results:

ProtocolACER
OULU-NPU Prot-10.2%
OULU-NPU Prot-21.5%
OULU-NPU Prot-32.4%
OULU-NPU Prot-44.6%
CASIA→Replay HTER6.5%

Competition: 1st place, ChaLearn Multi-Modal FAS Challenge @ CVPR 2020
Code: https://github.com/ZitongYu/CDCN
Status: Reference baseline; still widely cited in 2024–2025 papers.


B. Domain Generalization Methods (2022–2024)


Cross-dataset generalization remains the hardest open problem. Key approaches:

SSAN (CVPR 2022): Style/content separation with adversarial domain alignment.

S-Adapter (arXiv 2023/2024): “Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens”

  • Adds Statistical Adapter to ViT that gathers local discriminative and statistical information via token histograms
  • Token Style Regularization (TSR) reduces domain style variance
  • Outperforms SOTA in zero-shot and few-shot cross-domain protocols

Test-Time Domain Generalization / TTDG (CVPR 2024):

  • Test-Time Style Projection (TTSS) projects unseen samples into learned style space
  • Diverse Style Shifts Simulation (DSSS) synthesizes distribution shifts via hyperspherical feature space
  • Adapts at inference time without retraining

Gradient Alignment (CVPR 2024):

  • Applies Sharpness-Aware Minimization (SAM) for domain generalization
  • Aligns generalization gradients across domains for flat, robust minima

MMDG (CVPR 2024): Multi-Modal Domain Generalized FAS

  • Uncertainty-guided cross-Adapters (U-Adapter) fine-tunes ViT for modality reliability
  • “Suppress and Rebalance” for multi-modal cross-domain generalization

C. Vision Transformer (ViT) Approaches (2023–2024)


ViT-based methods have become dominant for generalization:

  • Multimodal ViT (IJCV 2024): “Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing” — uses MAE pretraining + ViT for multi-modal FAS
  • Hyp-OC (FG 2024): Hyperbolic One-Class Classification for FAS — treats FAS as anomaly detection in hyperbolic space
  • Head-Aware KD (2024): Knowledge distillation from large ViT → 5MB student model with 17× faster inference
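The distillation objective behind such compact students is typically the temperature-scaled KL divergence of Hinton-style KD; the head-aware reweighting itself is not reproduced in this sketch:

```python
import math

def softmax(logits, T=1.0):
    # Numerically stable temperature-scaled softmax
    zs = [z / T for z in logits]
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard distillation loss: T^2 * KL(teacher || student) over
    temperature-softened distributions. The T^2 factor keeps gradient
    magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * (math.log(pi + 1e-12) - math.log(qi + 1e-12))
                       for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary cross-entropy on hard labels, weighted by a mixing coefficient.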

D. Unified Physical-Digital Detection (2024–2025)


UniAttack (IJCV 2025): “Unified Physical-Digital Face Attack Detection”

  • Single model handles both physical spoofing and digital deepfakes
  • Competitive vs. specialized detectors for each attack type

Joint Physical-Digital (arXiv 2024): “Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues”

  • Synthesizes spoofing clues during training for both attack domains

Mixture-of-Attack-Experts (2025): Uses expert routing to handle diverse attack types with class regularization.


InstructFLIP (arXiv 2025): “Exploring Unified Vision-Language Model for Face Anti-spoofing”

  • Surpasses SOTA across multiple FAS benchmarks
  • Language-guided supervision captures spoof-related patterns
  • Substantially reduces training overhead

CLIP-based Domain Generalization (2024/2025):

  • Forensics Adapter: Adapts CLIP for generalizable face forgery detection (CVPR 2025)
  • “Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection” (CVPR 2025)
  • ViT-L/14 CLIP backbone significantly outperforms ViT-B/16 for deepfake detection

Self-Supervised Liveness Detection (arXiv 2024): “Transformer-based Self-Supervised Learning for Face Anti-Spoofing” — reduces dependency on labeled data.


| Model | Size/Params | FLOPs | Notes |
| --- | --- | --- | --- |
| Head-Aware KD student | 5 MB | — | 17× faster than teacher |
| MobileNetV3-based (2024) | ~3–5 MB | — | Mobile deployment |
| Paired-Sampling Contrastive | — | 4.46 GFLOPs | Trains in <1 hour |
| DenseNet201 (fine-tuned) | ~77M params | — | 98.5% NUAA, 97.71% Replay-Attack |
| MobileNetV2 | ~3.4M params | — | Best efficiency for real-time |

FaceCloseup (2025): Novel perspective-distortion liveness detection on mobile — 99.48% accuracy.


Problem: Digital attacks (deepfakes, face swaps) leave no physical-medium artifacts (e.g., print texture, screen moiré), so they require different detection cues than physical spoofs.

Key approaches:

  • GenConViT: Generative Convolutional ViT for video deepfake detection
  • AltFreezing + TALL: 3D CNN + Swin-Transformer capturing temporal inconsistencies
  • Hybrid transformer-CNN: Lightweight, real-time (Frontiers 2025)
  • CLIP ViT-L/14 fine-tuning: Freezes all but last 8 encoder layers; top CVPR 2025 performer

Morphing attacks blend two identities into a single face image — a unique threat to border control.
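At its simplest, a morph is a blend of two aligned face images. Real morphing tools first warp both faces to shared landmark geometry; the bare pixel average below (a toy illustration, not an attack recipe used in practice) only shows why a single image can match two contributing identities:

```python
def naive_morph(face_a, face_b, alpha=0.5):
    """Pixel-wise blend of two pre-aligned grayscale images given as
    nested lists. alpha controls the contribution of face_a; 0.5 mixes
    both identities equally, which is the hardest case for detectors."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(face_a, face_b)]
```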

MADation (WACV 2025): First adaptation of foundation models (CLIP + LoRA) to MAD task

MADPromptS (arXiv 2025): “Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation”

SynMorph (arXiv 2024): Synthetic morphing dataset with 2,450 identities, 100,000+ morphs — addresses privacy constraints in dataset creation.

AMONOT (ICPR 2024): Synthetic aging dataset to evaluate MAD robustness under aging.

NIST FATE MORPH 4B (2024): Guidelines for operational morph detection deployment:

  • Single-image detection (only morphed image available)
  • Differential detection (morphed + genuine reference available)

FATE (Face Analysis Technology Evaluation) replaced FRVT for face analysis tasks.

FATE PAD Part 10: Evaluates passive, software-based PAD algorithms

  • Dataset: ~20,000 attack presentations + ~21,000 bona fide presentations
  • 9 categories of presentation attacks
  • Two tasks: impersonation attacks + evasion attacks
  • Standard: ISO/IEC 30107-3

2024–2025 updates:

  • January 2025: New FATE SIDD report including results from IGD, Veridium, Mobbeel, ROC, Kasikorn Labs
  • Aware achieved top performer ranking while optimizing demographic parity
  • Paravision publishes detailed FATE PAD performance breakdowns
  • FATE Quality evaluations also ongoing in parallel

| Method | Type | Cross-Dataset | Production-Ready | ISO-Compliant |
| --- | --- | --- | --- | --- |
| CDCN/CDCN++ | CNN+NAS | Moderate (6.5% HTER) | Yes | No |
| S-Adapter | ViT adapter | Strong | Moderate | No |
| TTDG | ViT + test-time adaptation | Strong | No (test-time adapt) | No |
| InstructFLIP | VLM | Strong (multi-bench SOTA) | No | No |
| UniAttack | Unified model | Good | Moderate | Partial |
| MADation (CLIP+LoRA) | Foundation model | MAD-specific | Moderate | No |
| Lightweight KD | ViT→CNN distillation | Moderate | Yes (5 MB) | No |
| NIST FATE participants | Various | Evaluated by NIST | Yes | ISO 30107-3 |

Part 2: Face Image Quality Assessment (FIQA)


FIQA estimates how suitable a face image is for biometric recognition. Unlike PAD, FIQA does not detect attacks — it predicts whether a sample will yield reliable recognition scores. High-quality samples produce consistent, accurate embeddings; low-quality samples increase error rates.

Primary use: Automatic quality gates in enrollment, verification pipelines, and ID document capture.
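A quality gate is just a thresholding policy on the predicted quality score. A toy sketch with illustrative thresholds (the function name and values are assumptions, not standardized; production thresholds are tuned per FR system against error-vs-discard curves):

```python
def quality_gate(quality_score, enroll_threshold=0.6, retry_threshold=0.4):
    """Route a capture based on a hypothetical FIQA score in [0, 1]."""
    if quality_score >= enroll_threshold:
        return "accept"       # good enough for enrollment/verification
    if quality_score >= retry_threshold:
        return "retry"        # prompt the user to recapture
    return "reject"           # unusable sample
```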


FaceQnet

Type: Supervised CNN regression
Approach: Trains a ResNet to predict quality from pseudo-labels derived from face recognition errors.
Limitations: Requires ground-truth quality labels; coupled to a specific FR model.


SER-FIQ (CVPR 2020)

Paper: “Unsupervised Face Image Quality Assessment Based on Stochastic Embedding Robustness”
Type: Unsupervised, FR-model-coupled
Key idea: Run the same face image through the FR network N times with dropout active. Images whose embeddings are stable (low variance across runs) are high quality; unstable embeddings indicate low quality.
Metric: Standard deviation of the embeddings as a quality proxy.
Advantages: No quality labels needed; directly correlated with FR model behavior.
Code: https://github.com/pterhoer/FaceImageQuality
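The scoring step can be sketched in a few lines, assuming the N stochastic embeddings have already been collected; the `2·sigmoid(−mean distance)` mapping is one common SER-FIQ-style formulation:

```python
import numpy as np

def ser_fiq_score(stochastic_embeddings):
    """Quality from N stochastic forward passes of the SAME image.

    stochastic_embeddings: (N, D) array of embeddings produced with
    dropout active. Stable embeddings (small pairwise distances) map to
    a score near 1; unstable ones map toward 0.
    """
    e = np.asarray(stochastic_embeddings, dtype=float)
    n = len(e)
    dists = [np.linalg.norm(e[i] - e[j])
             for i in range(n) for j in range(i + 1, n)]
    m = float(np.mean(dists))          # mean pairwise Euclidean distance
    return 2.0 / (1.0 + np.exp(m))     # = 2 * sigmoid(-m), in (0, 1]
```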


SDD-FIQA (CVPR 2021)

Paper: “SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance”
Type: Unsupervised pseudo-label generation + regression
Key idea: Computes the Wasserstein distance between intra-class and inter-class similarity distributions to generate quality pseudo-labels, then trains a regression network on them.

Benchmark Results:

  • +13.9% AOC improvement over best competitor on LFW
  • +5.2% AOC on Adience
  • +2.4% AOC on IJB-C
  • Good generalization across different recognition systems
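The pseudo-label computation reduces to a 1-D Wasserstein distance between similarity samples. A sketch assuming equal-size genuine and impostor samples (for equal sizes, W1 is the mean absolute difference of sorted values):

```python
import numpy as np

def w1_distance(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples:
    mean absolute difference of the sorted values (empirical quantiles)."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def sdd_quality_label(intra_sims, inter_sims):
    """SDD-FIQA-style pseudo-label: the farther the intra-class (genuine)
    similarity distribution sits from the inter-class (impostor) one,
    the higher the quality assigned to the image."""
    return w1_distance(intra_sims, inter_sims)
```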

MagFace (CVPR 2021)

Type: Magnitude-aware FR loss — quality implicit in embedding magnitude
Key idea: Large-magnitude embeddings correspond to high quality; the FR model is trained to encode quality in the embedding norm.
Advantage: No separate quality model needed — quality is a byproduct of FR training.
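With a magnitude-aware encoder of this kind, the quality readout reduces to a norm computation on the un-normalized embedding; a minimal sketch (function name is illustrative):

```python
import math

def magnitude_quality(embedding):
    """Quality as the L2 norm of the UN-normalized embedding vector.
    No separate quality network runs -- the score falls out of the
    recognition model itself."""
    return math.sqrt(sum(v * v for v in embedding))
```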


CR-FIQA (CVPR 2023)

Paper: “CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability”
Type: Supervised, uses the FR training signal
Key idea: During FR training, monitors how confidently each sample is classified (certainty ratio); high certainty implies high quality. The quality predictor is trained on these internal observations.
Performance: Best AUC (lowest error) in almost all benchmark settings.

Benchmark results (XQLFW, CFP-FP, CPLFW, Adience, IJB-C):

  • Best or top-2 on most settings
  • Outperforms SER-FIQ, SDD-FIQA, FaceQnet

CLIB-FIQA (CVPR 2024)

Paper: “CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration”
Type: Supervised with calibration
Key idea: Adds confidence calibration to FIQA predictions, producing well-calibrated probability estimates rather than point estimates; addresses overconfidence in CR-FIQA-style methods.


IG-FIQA (2024)

Paper: “IG-FIQA: Improving Face Image Quality Assessment through Intra-class Variance Guidance robust to Inaccurate Pseudo-Labels”
Key idea: Mitigates noisy pseudo-labels by using intra-class variance as a regularizer.


ViT-FIQA

Paper: “ViT-FIQA: Assessing Face Image Quality using Vision Transformers”
Type: ViT backbone with learnable quality token
Key idea: Extends a standard FR-optimized ViT with a learnable quality token concatenated to the image patch tokens; the quality token aggregates into a scalar utility score.
Performance: Multiple top-3 rankings, particularly on Adience, CFP-FP, CPLFW, and XQLFW.


MR-FIQA

Paper: “MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data”
Key idea: Uses synthetic reference representations to generate high-precision quality labels, improving FIQA models trained on synthetic data.


Lightweight Ensemble FIQA

Paper: “A Lightweight Ensemble-Based Face Image Quality Assessment Method with Correlation-Aware […]”
Type: Ensemble with correlation awareness
Focus: Production efficiency with competitive quality-estimation accuracy.


| Method | Type | FR-model coupling | Labels needed | Key strength |
| --- | --- | --- | --- | --- |
| FaceQnet | CNN regression | Yes | Yes | Simple baseline |
| SER-FIQ | Stochastic robustness | Yes | No | No labels, intuitive |
| SDD-FIQA | Wasserstein pseudo-labels | Partial | No | Good generalization |
| MagFace | Magnitude in FR loss | Integral | No | Zero overhead |
| CR-FIQA | Certainty ratio | Yes | Semi | Best reported AUC |
| CLIB-FIQA | Calibrated confidence | Yes | Semi | Calibrated predictions |
| IG-FIQA | Intra-class variance | Yes | Semi | Robust to label noise |
| ViT-FIQA | ViT + quality token | Yes | Semi | Strong on CFP-FP, XQLFW |
| MR-FIQA | Multi-reference synthetic | Partial | No | Synthetic-data ready |

ISO/IEC 29794-5:2025 — Face Image Quality


Status: Published April 2025 (DIS 2024, FDIS late 2024)
Full title: “Information technology — Biometric sample quality — Part 5: Face image data”

Scope:

  • Quantifies how a face image’s properties conform with canonical face images
  • Three use-cases: ID document capture, enrollment, verification
  • Specifies quality components: illumination uniformity, pose, focus, expression, occlusion, etc.
  • Does not address comparison of two images (that’s ISO 19794-5)

Key quality factors addressed:

  • Illumination uniformity and direction
  • Facial pose (yaw, pitch, roll)
  • Focus / sharpness
  • Expression neutrality
  • Eye openness
  • Background uniformity
  • Head size and position
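None of the standard's normative component algorithms are reproduced here. As a flavor of what one component measures, below is a toy focus/sharpness score (Laplacian-variance style, an illustrative stand-in, not the ISO/IEC 29794-5 definition; conformant values come from implementations such as OFIQ):

```python
import numpy as np

def sharpness_component(gray):
    """Toy focus measure: variance of a 4-neighbour discrete Laplacian
    over the image interior. Blurry (low-contrast-edge) images score low;
    sharp images score high."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())
```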

OFIQ (Open Source Face Image Quality)

Developer: BSI (German Federal Office for Information Security) + eu-LISA
Purpose: Reference implementation of ISO/IEC 29794-5
Language: C/C++
Release: v1.0.0 November 2024; v1.0.1 current
Repository: https://github.com/BSI-OFIQ/OFIQ-Project

Key facts:

  • Open source, ISO-compliant
  • eu-LISA joined maintenance in August 2024 (EU biometric infrastructure)
  • OFIQ 2.0 planned by end of 2027
  • Presented at NIST IFPC 2025

OFIQ is the definitive production-grade FIQA implementation for ISO compliance in European border control and ID document systems.


NIST FATE Quality

Runs in parallel to FATE PAD. Evaluates:

  • Sample quality estimation accuracy
  • Correlation with recognition performance
  • Equal error rates at different quality thresholds

Separate from PAD but often evaluated by the same organizations.


Part 3: Cross-Cutting Themes and Trends (2024–2025)

All recent top-performing methods adapt large pretrained models:

  • CLIP (ViT-L/14) for deepfake detection and MAD
  • MAE-pretrained ViT for multi-modal FAS
  • LLM-guided supervision (InstructFLIP) for FAS
  • LoRA adaptation (MADation) for morphing detection

Foundation models provide strong generalization out-of-the-box, especially for unseen attack types.

Traditional FAS benchmarks (OULU-NPU, Replay-Attack) only test physical attacks. The field has shifted to unified models that detect both:

  • Physical: print, replay, mask
  • Digital: deepfakes, GAN faces, face swap

UniAttackData (2024) is the new reference benchmark.

3.3 Domain Generalization Remains Unsolved


Models still overfit to specific capture conditions, demographics, and attack types. Active research directions:

  • Test-time adaptation (TTDG)
  • Style normalization
  • Gradient alignment (SAM-based)
  • Language-guided domain-invariant features

A gap persists between academic SOTA (large ViTs) and production requirements (mobile hardware, <10 ms inference):

  • Knowledge distillation from ViT → 5MB CNN
  • MobileNetV2/V3 backbone FAS
  • Head-aware distillation with 17× speedup
  • Paired-sampling contrastive at 4.46 GFLOPs

Most academic SOTA methods are not ISO-compliant:

  • FIQA: OFIQ is the only ISO 29794-5 compliant implementation
  • PAD: FATE evaluation uses ISO 30107-3 metrics, but academic papers rarely implement full ISO testing protocols
  • Morphing: NIST FATE MORPH 4B provides operational guidelines


End of report. See also: sota-detection.md, sota-recognition.md, sota-vector-search.md