
PAD & Quality

Research compiled: 2026-02-20
Coverage: 2020–2026 (focus on 2024–2025)


Part 1: Presentation Attack Detection (PAD / Face Anti-Spoofing)


Presentation Attack Detection (PAD) — also called face liveness detection or face anti-spoofing — covers two broad attack categories:

  • Physical attacks: print photos, replay video, 3D masks, partial paper masks
  • Digital attacks: deepfakes, face swaps, GAN-generated faces, face reenactment

Modern systems must handle both categories. The field has moved toward Unified Physical-Digital Attack Detection since ~2023.


| Dataset | Year | Subjects | Videos/Images | Attack Types |
| --- | --- | --- | --- | --- |
| CASIA-FASD | 2012 | 50 | 600 videos | Print, cut, replay |
| Replay-Attack | 2012 | 50 | 1,200 videos | Print, replay |
| OULU-NPU | 2017 | 55 | 5,940 videos | Print, replay (4 protocols) |
| SiW | 2018 | 165 | 4,478 videos | Print, replay, makeup, partial |
| SiW-M | 2020 | 493 | 14K+ videos | 13 attack types |
| CASIA-SURF CeFA | 2021 | 300 | Multi-modal, cross-ethnicity | 6 types |
| UniAttackData | 2024 | 1,800 | 28,706 videos | 2 physical + 12 digital |

UniAttackData (CVPR 2024 Challenge) is the current gold standard for unified attack detection, with 28,706 videos covering all modern attack types. It attracted 136 teams globally for its associated challenge.


  • ACER (Average Classification Error Rate) = (APCER + BPCER) / 2
  • APCER (Attack Presentation Classification Error Rate): proportion of attack presentations misclassified as bona fide
  • BPCER (Bona Fide Presentation Classification Error Rate): proportion of bona fide presentations misclassified as attack
  • HTER (Half Total Error Rate) = (FAR + FRR) / 2; the standard metric for cross-dataset testing

Standard: ISO/IEC 30107-3 defines PAD metrics and testing protocols.
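The definitions above map directly to a few lines of code. A minimal sketch (the 0/1 label and prediction encoding is an assumption chosen for illustration):

```python
import numpy as np

def pad_metrics(labels, preds):
    """ISO/IEC 30107-3 style PAD error rates.

    labels: 1 = attack presentation, 0 = bona fide presentation
    preds:  1 = classified as attack, 0 = classified as bona fide
    """
    labels, preds = np.asarray(labels), np.asarray(preds)
    # APCER: attack presentations wrongly accepted as bona fide
    apcer = float(np.mean(preds[labels == 1] == 0))
    # BPCER: bona fide presentations wrongly rejected as attack
    bpcer = float(np.mean(preds[labels == 0] == 1))
    acer = (apcer + bpcer) / 2  # ACER averages the two error types
    return apcer, bpcer, acer
```

HTER is computed analogously from FAR and FRR at a fixed decision threshold, usually selected on the source dataset in cross-dataset evaluation.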


A. CDCN / CDCN++ (CVPR 2020) — Influential Baseline


Paper: “Searching Central Difference Convolutional Networks for Face Anti-Spoofing”
Authors: Zitong Yu et al.
Type: CNN with central difference convolution + NAS
Key idea: Central Difference Convolution (CDC) captures gradient-level micro-texture information instead of raw intensity alone. CDC generalizes vanilla convolution:

CDC output = (1 − θ) · Σn w(pn) · x(p0 + pn)  +  θ · Σn w(pn) · (x(p0 + pn) − x(p0))

CDCN++ combines NAS-discovered architecture with Multiscale Attention Fusion Module (MAFM).
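A single-channel, valid-padding sketch of the CDC operator (a plain NumPy loop for clarity; the original is a drop-in replacement for a standard Conv2d layer):

```python
import numpy as np

def cdc2d(x, w, theta=0.7):
    """Central Difference Convolution on a 2-D array (valid padding).

    Using sum_n w[n]*(x[p0+n] - x[p0]) = vanilla - x[p0]*sum(w), the
    blended output simplifies to: vanilla - theta * center * sum(w).
    theta=0 recovers vanilla (cross-correlation) convolution.
    """
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    w_sum = w.sum()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            vanilla = (patch * w).sum()
            center = x[i + kh // 2, j + kw // 2]
            out[i, j] = vanilla - theta * center * w_sum
    return out
```

Note the defining property: with theta = 1, a constant (texture-free) region yields exactly zero response, which is why CDC highlights spoof micro-texture.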

Benchmark Results:

ProtocolACER
OULU-NPU Prot-10.2%
OULU-NPU Prot-21.5%
OULU-NPU Prot-32.4%
OULU-NPU Prot-44.6%
CASIA→Replay HTER6.5%

Competition: 1st place, ChaLearn Multi-Modal FAS Challenge @ CVPR 2020
Code: https://github.com/ZitongYu/CDCN
Status: Reference baseline; still widely cited in 2024–2025 papers.


B. Domain Generalization Methods (2022–2024)


Cross-dataset generalization remains the hardest open problem. Key approaches:

SSAN (CVPR 2022): Style/content separation with adversarial domain alignment.

S-Adapter (arXiv 2023/2024): “Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens”

  • Adds Statistical Adapter to ViT that gathers local discriminative and statistical information via token histograms
  • Token Style Regularization (TSR) reduces domain style variance
  • Outperforms SOTA in zero-shot and few-shot cross-domain protocols

Test-Time Domain Generalization / TTDG (CVPR 2024):

  • Test-Time Style Projection (TTSS) projects unseen samples into learned style space
  • Diverse Style Shifts Simulation (DSSS) synthesizes distribution shifts via hyperspherical feature space
  • Adapts at inference time without retraining

Gradient Alignment (CVPR 2024):

  • Applies Sharpness-Aware Minimization (SAM) for domain generalization
  • Aligns generalization gradients across domains for flat, robust minima

MMDG (CVPR 2024): Multi-Modal Domain Generalized FAS

  • Uncertainty-guided cross-Adapters (U-Adapter) fine-tunes ViT for modality reliability
  • “Suppress and Rebalance” for multi-modal cross-domain generalization

C. Vision Transformer (ViT) Approaches (2023–2024)


ViT-based methods have become dominant for generalization:

  • Multimodal ViT (IJCV 2024): “Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing” — uses MAE pretraining + ViT for multi-modal FAS
  • Hyp-OC (FG 2024): Hyperbolic One-Class Classification for FAS — treats FAS as anomaly detection in hyperbolic space
  • Head-Aware KD (2024): Knowledge distillation from large ViT → 5MB student model with 17× faster inference
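The distillation objective behind such compact students is typically the temperature-scaled KL divergence of Hinton-style KD; the head-aware reweighting itself is not reproduced in this sketch:

```python
import math

def softmax(logits, T=1.0):
    # Numerically stable temperature-scaled softmax
    zs = [z / T for z in logits]
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard distillation loss: T^2 * KL(teacher || student) over
    temperature-softened distributions. The T^2 factor keeps gradient
    magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * (math.log(pi + 1e-12) - math.log(qi + 1e-12))
                       for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary cross-entropy on hard labels, weighted by a mixing coefficient.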

D. Unified Physical-Digital Detection (2024–2025)


UniAttack (IJCV 2025): “Unified Physical-Digital Face Attack Detection”

  • Single model handles both physical spoofing and digital deepfakes
  • Competitive vs. specialized detectors for each attack type

Joint Physical-Digital (arXiv 2024): “Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues”

  • Synthesizes spoofing clues during training for both attack domains

Mixture-of-Attack-Experts (2025): Uses expert routing to handle diverse attack types with class regularization.


InstructFLIP (arXiv 2025): “Exploring Unified Vision-Language Model for Face Anti-spoofing”

  • Surpasses SOTA across multiple FAS benchmarks
  • Language-guided supervision captures spoof-related patterns
  • Substantially reduces training overhead

CLIP-based Domain Generalization (2024/2025):

  • Forensics Adapter: Adapts CLIP for generalizable face forgery detection (CVPR 2025)
  • “Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection” (CVPR 2025)
  • ViT-L/14 CLIP backbone significantly outperforms ViT-B/16 for deepfake detection

Self-Supervised Liveness Detection (arXiv 2024): “Transformer-based Self-Supervised Learning for Face Anti-Spoofing” — reduces dependency on labeled data.


| Model | Size/Params | FLOPs | Notes |
| --- | --- | --- | --- |
| Head-Aware KD student | 5 MB | — | 17× faster than teacher |
| MobileNetV3-based (2024) | ~3–5 MB | — | Mobile deployment |
| Paired-Sampling Contrastive | — | 4.46 GFLOPs | Trains in <1 hour |
| DenseNet201 (fine-tuned) | ~77M params | — | 98.5% NUAA, 97.71% Replay-Attack |
| MobileNetV2 | ~3.4M params | — | Best efficiency for real-time |

FaceCloseup (2025): Novel perspective-distortion liveness detection on mobile — 99.48% accuracy.


Problem: Digital attacks (deepfakes, face swaps) leave no physical-medium artifacts (e.g., print texture, screen moiré), so they require different detection cues than physical spoofs.

Key approaches:

  • GenConViT: Generative Convolutional ViT for video deepfake detection
  • AltFreezing + TALL: 3D CNN + Swin-Transformer capturing temporal inconsistencies
  • Hybrid transformer-CNN: Lightweight, real-time (Frontiers 2025)
  • CLIP ViT-L/14 fine-tuning: Freezes all but last 8 encoder layers; top CVPR 2025 performer

Morphing attacks blend two identities into a single face image — a unique threat to border control.
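At its simplest, a morph is a blend of two aligned face images. Real morphing tools first warp both faces to shared landmark geometry; the bare pixel average below (a toy illustration, not an attack recipe used in practice) only shows why a single image can match two contributing identities:

```python
def naive_morph(face_a, face_b, alpha=0.5):
    """Pixel-wise blend of two pre-aligned grayscale images given as
    nested lists. alpha controls the contribution of face_a; 0.5 mixes
    both identities equally, which is the hardest case for detectors."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(face_a, face_b)]
```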

MADation (WACV 2025): First adaptation of foundation models (CLIP + LoRA) to MAD task

MADPromptS (arXiv 2025): “Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation”

SynMorph (arXiv 2024): Synthetic morphing dataset with 2,450 identities, 100,000+ morphs — addresses privacy constraints in dataset creation.

AMONOT (ICPR 2024): Synthetic aging dataset to evaluate MAD robustness under aging.

NIST FATE MORPH 4B (2024): Guidelines for operational morph detection deployment:

  • Single-image detection (only morphed image available)
  • Differential detection (morphed + genuine reference available)

FATE (Face Analysis Technology Evaluation) replaced FRVT for face analysis tasks.

FATE PAD Part 10: Evaluates passive, software-based PAD algorithms

  • Dataset: ~20,000 attack presentations + ~21,000 bona fide presentations
  • 9 categories of presentation attacks
  • Two tasks: impersonation attacks + evasion attacks
  • Standard: ISO/IEC 30107-3

2024–2025 updates:

  • January 2025: New FATE SIDD report including results from IGD, Veridium, Mobbeel, ROC, Kasikorn Labs
  • Aware achieved top performer ranking while optimizing demographic parity
  • Paravision publishes detailed FATE PAD performance breakdowns
  • FATE Quality evaluations also ongoing in parallel

| Method | Type | Cross-Dataset | Production-Ready | ISO-Compliant |
| --- | --- | --- | --- | --- |
| CDCN/CDCN++ | CNN+NAS | Moderate (6.5% HTER) | Yes | No |
| S-Adapter | ViT adapter | Strong | Moderate | No |
| TTDG | ViT + test-time adaptation | Strong | No (test-time adapt) | No |
| InstructFLIP | VLM | Strong (multi-bench SOTA) | No | No |
| UniAttack | Unified model | Good | Moderate | Partial |
| MADation (CLIP+LoRA) | Foundation model | MAD-specific | Moderate | No |
| Lightweight KD | ViT→CNN distillation | Moderate | Yes (5 MB) | No |
| NIST FATE participants | Various | Evaluated by NIST | Yes | ISO 30107-3 |

Part 2: Face Image Quality Assessment (FIQA)


FIQA estimates how suitable a face image is for biometric recognition. Unlike PAD, FIQA does not detect attacks — it predicts whether a sample will yield reliable recognition scores. High-quality samples produce consistent, accurate embeddings; low-quality samples increase error rates.

Primary use: Automatic quality gates in enrollment, verification pipelines, and ID document capture.
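A quality gate is just a thresholding policy on the predicted quality score. A toy sketch with illustrative thresholds (the function name and values are assumptions, not standardized; production thresholds are tuned per FR system against error-vs-discard curves):

```python
def quality_gate(quality_score, enroll_threshold=0.6, retry_threshold=0.4):
    """Route a capture based on a hypothetical FIQA score in [0, 1]."""
    if quality_score >= enroll_threshold:
        return "accept"       # good enough for enrollment/verification
    if quality_score >= retry_threshold:
        return "retry"        # prompt the user to recapture
    return "reject"           # unusable sample
```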


FaceQnet

Type: Supervised CNN regression
Approach: Trains a ResNet to predict quality from pseudo-labels derived from face recognition errors.
Limitations: Requires ground-truth quality labels; coupled to a specific FR model.


SER-FIQ (CVPR 2020)

Paper: “Unsupervised Face Image Quality Assessment Based on Stochastic Embedding Robustness”
Type: Unsupervised, FR-model-coupled
Key idea: Run the same face image through the FR network N times with dropout active. Images whose embeddings are stable (low variance across runs) are high quality; unstable embeddings indicate low quality.
Metric: Standard deviation of the embeddings as a quality proxy.
Advantages: No quality labels needed; directly correlated with FR model behavior.
Code: https://github.com/pterhoer/FaceImageQuality
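The scoring step can be sketched in a few lines, assuming the N stochastic embeddings have already been collected; the `2·sigmoid(−mean distance)` mapping is one common SER-FIQ-style formulation:

```python
import numpy as np

def ser_fiq_score(stochastic_embeddings):
    """Quality from N stochastic forward passes of the SAME image.

    stochastic_embeddings: (N, D) array of embeddings produced with
    dropout active. Stable embeddings (small pairwise distances) map to
    a score near 1; unstable ones map toward 0.
    """
    e = np.asarray(stochastic_embeddings, dtype=float)
    n = len(e)
    dists = [np.linalg.norm(e[i] - e[j])
             for i in range(n) for j in range(i + 1, n)]
    m = float(np.mean(dists))          # mean pairwise Euclidean distance
    return 2.0 / (1.0 + np.exp(m))     # = 2 * sigmoid(-m), in (0, 1]
```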


SDD-FIQA (CVPR 2021)

Paper: “SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance”
Type: Unsupervised pseudo-label generation + regression
Key idea: Computes the Wasserstein distance between intra-class and inter-class similarity distributions to generate quality pseudo-labels, then trains a regression network on them.

Benchmark Results:

  • +13.9% AOC improvement over best competitor on LFW
  • +5.2% AOC on Adience
  • +2.4% AOC on IJB-C
  • Good generalization across different recognition systems
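The pseudo-label computation reduces to a 1-D Wasserstein distance between similarity samples. A sketch assuming equal-size genuine and impostor samples (for equal sizes, W1 is the mean absolute difference of sorted values):

```python
import numpy as np

def w1_distance(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples:
    mean absolute difference of the sorted values (empirical quantiles)."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def sdd_quality_label(intra_sims, inter_sims):
    """SDD-FIQA-style pseudo-label: the farther the intra-class (genuine)
    similarity distribution sits from the inter-class (impostor) one,
    the higher the quality assigned to the image."""
    return w1_distance(intra_sims, inter_sims)
```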

MagFace (CVPR 2021)

Type: Magnitude-aware FR loss — quality implicit in embedding magnitude
Key idea: Large-magnitude embeddings correspond to high quality; the FR model is trained to encode quality in the embedding norm.
Advantage: No separate quality model needed — quality is a byproduct of FR training.
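With a magnitude-aware encoder of this kind, the quality readout reduces to a norm computation on the un-normalized embedding; a minimal sketch (function name is illustrative):

```python
import math

def magnitude_quality(embedding):
    """Quality as the L2 norm of the UN-normalized embedding vector.
    No separate quality network runs -- the score falls out of the
    recognition model itself."""
    return math.sqrt(sum(v * v for v in embedding))
```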


CR-FIQA (CVPR 2023)

Paper: “CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability”
Type: Supervised, uses the FR training signal
Key idea: During FR training, monitors how confidently each sample is classified (certainty ratio); high certainty implies high quality. The quality predictor is trained on these internal observations.
Performance: Best AUC (lowest error) in almost all benchmark settings.

Benchmark results (XQLFW, CFP-FP, CPLFW, Adience, IJB-C):

  • Best or top-2 on most settings
  • Outperforms SER-FIQ, SDD-FIQA, FaceQnet

CLIB-FIQA (CVPR 2024)

Paper: “CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration”
Type: Supervised with calibration
Key idea: Adds confidence calibration to FIQA predictions, producing well-calibrated probability estimates rather than point estimates; addresses overconfidence in CR-FIQA-style methods.


IG-FIQA (2024)

Paper: “IG-FIQA: Improving Face Image Quality Assessment through Intra-class Variance Guidance robust to Inaccurate Pseudo-Labels”
Key idea: Mitigates noisy pseudo-labels by using intra-class variance as a regularizer.


ViT-FIQA

Paper: “ViT-FIQA: Assessing Face Image Quality using Vision Transformers”
Type: ViT backbone with learnable quality token
Key idea: Extends a standard FR-optimized ViT with a learnable quality token concatenated to the image patch tokens; the quality token aggregates into a scalar utility score.
Performance: Multiple top-3 rankings, particularly on Adience, CFP-FP, CPLFW, and XQLFW.


MR-FIQA

Paper: “MR-FIQA: Face Image Quality Assessment with Multi-Reference Representations from Synthetic Data”
Key idea: Uses synthetic reference representations to generate high-precision quality labels, improving FIQA models trained on synthetic data.


Lightweight Ensemble FIQA

Paper: “A Lightweight Ensemble-Based Face Image Quality Assessment Method with Correlation-Aware […]”
Type: Ensemble with correlation awareness
Focus: Production efficiency with competitive quality-estimation accuracy.


| Method | Type | FR-model coupling | Labels needed | Key strength |
| --- | --- | --- | --- | --- |
| FaceQnet | CNN regression | Yes | Yes | Simple baseline |
| SER-FIQ | Stochastic robustness | Yes | No | No labels, intuitive |
| SDD-FIQA | Wasserstein pseudo-labels | Partial | No | Good generalization |
| MagFace | Magnitude in FR loss | Integral | No | Zero overhead |
| CR-FIQA | Certainty ratio | Yes | Semi | Best reported AUC |
| CLIB-FIQA | Calibrated confidence | Yes | Semi | Calibrated predictions |
| IG-FIQA | Intra-class variance | Yes | Semi | Robust to label noise |
| ViT-FIQA | ViT + quality token | Yes | Semi | Strong on CFP-FP, XQLFW |
| MR-FIQA | Multi-reference synthetic | Partial | No | Synthetic-data ready |

ISO/IEC 29794-5:2025 — Face Image Quality


Status: Published April 2025 (DIS 2024, FDIS late 2024)
Full title: “Information technology — Biometric sample quality — Part 5: Face image data”

Scope:

  • Quantifies how a face image’s properties conform with canonical face images
  • Three use-cases: ID document capture, enrollment, verification
  • Specifies quality components: illumination uniformity, pose, focus, expression, occlusion, etc.
  • Does not address comparison of two images (that’s ISO 19794-5)

Key quality factors addressed:

  • Illumination uniformity and direction
  • Facial pose (yaw, pitch, roll)
  • Focus / sharpness
  • Expression neutrality
  • Eye openness
  • Background uniformity
  • Head size and position
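None of the standard's normative component algorithms are reproduced here. As a flavor of what one component measures, below is a toy focus/sharpness score (Laplacian-variance style, an illustrative stand-in, not the ISO/IEC 29794-5 definition; conformant values come from implementations such as OFIQ):

```python
import numpy as np

def sharpness_component(gray):
    """Toy focus measure: variance of a 4-neighbour discrete Laplacian
    over the image interior. Blurry (low-contrast-edge) images score low;
    sharp images score high."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())
```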

OFIQ (Open Source Face Image Quality)

Developer: BSI (German Federal Office for Information Security) + eu-LISA
Purpose: Reference implementation of ISO/IEC 29794-5
Language: C/C++
Release: v1.0.0 November 2024; v1.0.1 current
Repository: https://github.com/BSI-OFIQ/OFIQ-Project

Key facts:

  • Open source, ISO-compliant
  • eu-LISA joined maintenance in August 2024 (EU biometric infrastructure)
  • OFIQ 2.0 planned by end of 2027
  • Presented at NIST IFPC 2025

OFIQ is the definitive production-grade FIQA implementation for ISO compliance in European border control and ID document systems.


NIST FATE Quality

Runs in parallel to FATE PAD. Evaluates:

  • Sample quality estimation accuracy
  • Correlation with recognition performance
  • Equal error rates at different quality thresholds

Separate from PAD but often evaluated by the same organizations.


Part 3: Cross-Cutting Themes and Trends (2024–2025)

All recent top-performing methods adapt large pretrained models:

  • CLIP (ViT-L/14) for deepfake detection and MAD
  • MAE-pretrained ViT for multi-modal FAS
  • LLM-guided supervision (InstructFLIP) for FAS
  • LoRA adaptation (MADation) for morphing detection

Foundation models provide strong generalization out-of-the-box, especially for unseen attack types.

Traditional FAS benchmarks (OULU-NPU, Replay-Attack) only test physical attacks. The field has shifted to unified models that detect both:

  • Physical: print, replay, mask
  • Digital: deepfakes, GAN faces, face swap

UniAttackData (2024) is the new reference benchmark.

3.3 Domain Generalization Remains Unsolved


Models still overfit to specific capture conditions, demographics, and attack types. Active research directions:

  • Test-time adaptation (TTDG)
  • Style normalization
  • Gradient alignment (SAM-based)
  • Language-guided domain-invariant features

A gap persists between academic SOTA (large ViTs) and production requirements (mobile hardware, <10 ms inference):

  • Knowledge distillation from ViT → 5MB CNN
  • MobileNetV2/V3 backbone FAS
  • Head-aware distillation with 17× speedup
  • Paired-sampling contrastive at 4.46 GFLOPs

Most academic SOTA methods are not ISO-compliant:

  • FIQA: OFIQ is the only ISO 29794-5 compliant implementation
  • PAD: FATE evaluation uses ISO 30107-3 metrics, but academic papers rarely implement full ISO testing protocols
  • Morphing: NIST FATE MORPH 4B provides operational guidelines


End of report. See also: sota-detection.md, sota-recognition.md, sota-vector-search.md