Model Selection

This document records the research, reasoning, and final choices for each ONNX model slot in the UFME pipeline. Revisit this when upgrading models or evaluating alternatives.

Summary

Required models

Slot	Model	Size	Input	Output	Licence	Source
Detection	SCRFD_10G_BNKPS	16.9 MB	`[1,3,640,640]`	9 outputs (scores/bboxes/kps)	MIT	InsightFace buffalo_l
Recognition	w600k_r50 (ArcFace)	174 MB	`[1,3,112,112]`	`[1,512]` float32	MIT	InsightFace buffalo_l
PAD	MiniFASNetV2	~600 KB (INT8)	`[1,3,80,80]`	`[1,3]` (real/2d/3d-spoof)	Apache 2.0	yakhyo/face-anti-spoofing
MAD	SelfMAD HRNet-W18	~85 MB	`[1,3,384,384]`	`[1,2]` logits (genuine/morph)	Research	LeonTodorov/SelfMAD — auto-exported via `scripts/export_mad.py`
Quality	eDifFIQA(T)	~2 MB	`[1,3,112,112]`	`[1,1]` quality scalar	CC-BY-4.0	OpenCV Zoo

Optional models

Slot	Model	Size	Input	Output	Licence	Source
Age estimation	InsightFace genderage	1.3 MB	`[N,3,96,96]`	`[1,3]` (age + gender logits)	MIT	InsightFace buffalo_l
Head pose	yakhyo ResNet-18	~45 MB	`[1,3,224,224]`	`[1,3,3]` rotation matrix	MIT	yakhyo/head-pose-estimation
Deepfake detection	ViT-base quantised	~90 MB	`[1,3,224,224]`	`[1,2]` logits (real/fake)	Apache 2.0	onnx-community/Deep-Fake-Detector-v2
Face attributes	InsightFace genderage	1.3 MB	`[N,3,96,96]`	`[1,3]` (gender logit)	MIT	InsightFace buffalo_l (shared file)
Super-resolution	Real-ESRGAN x4plus	~65 MB	`[1,3,H,W]`	`[1,3,H*4,W*4]`	BSD-3	onnx-community/realesrgan-x4plus
Mask-aware recognition	w600k_mbf (ArcFace MobileFaceNet)	~20 MB	`[1,3,112,112]`	`[1,512]` float32	Apache 2.0	InsightFace buffalo_sc

Detailed Analysis

1. Face Detection — SCRFD_10G_BNKPS

Chosen: SCRFD_10G in its BNKPS variant, which uses batch normalisation (BN) and outputs 5-point keypoints (KPS) alongside bounding boxes.

Why SCRFD over alternatives:

RetinaFace (2019): Predecessor architecture. SCRFD supersedes it in speed and accuracy on WiderFace. RetinaFace uses a heavier ResNet-50 backbone (~100 MB) and lacks SCRFD's search-optimised (NAS) allocation of compute across the network.
YOLO-Face / YOLOv8-Face (2023): Fast single-stage detectors but optimised for general object detection. On WiderFace Hard, SCRFD_10G achieves 92.3% AP vs ~90% for YOLOv8-face. YOLO models also lack native 5-point landmark output needed for alignment.
MTCNN (2016): Classic cascade detector. Much slower due to 3-stage cascade. Not competitive on accuracy (WiderFace Hard ~60% AP). Only useful for legacy systems.
CenterFace (2019): Lightweight anchor-free detector. Good speed but lower accuracy (~88% WiderFace Hard). Insufficient for production gallery enrolment quality.

Key properties:

WiderFace Hard AP: 92.3%
5-point landmark output (eyes, nose, mouth corners) for alignment
ONNX-native, no custom ops
GFLOPs: ~10 (suitable for CPU inference)
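
The 5-point landmarks are what the recognition stage uses for alignment. A minimal sketch of that step, assuming the pipeline warps faces onto the 112x112 five-point reference template commonly used with ArcFace models. The template constant and the function name `estimate_similarity` are illustrative and are not taken from the UFME codebase:

```python
import numpy as np

# Reference 5-point template for a 112x112 crop (left eye, right eye, nose,
# left mouth corner, right mouth corner). Assumed here; verify it against the
# alignment code the pipeline actually uses.
ARCFACE_TEMPLATE = np.array(
    [[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
     [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float64)


def estimate_similarity(src, dst=ARCFACE_TEMPLATE):
    """Least-squares similarity transform (scale, rotation, translation).

    Returns a 2x3 matrix M such that M @ [x, y, 1] maps src points onto dst.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    n = src.shape[0]
    a = np.zeros((2 * n, 4))
    # Unknowns are [p, q, tx, ty] with u = p*x - q*y + tx and v = q*x + p*y + ty.
    a[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    a[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    b = dst.reshape(-1)  # interleaved [u0, v0, u1, v1, ...]
    p, q, tx, ty = np.linalg.lstsq(a, b, rcond=None)[0]
    return np.array([[p, -q, tx], [q, p, ty]])


# Usage: landmarks detected on a face twice the template size, shifted by (100, 50).
landmarks = ARCFACE_TEMPLATE * 2.0 + np.array([100.0, 50.0])
m = estimate_similarity(landmarks)
aligned = landmarks @ m[:, :2].T + m[:, 2]  # lands back on the template
```

The resulting 2x3 matrix has the shape expected by affine warp routines such as OpenCV's `cv2.warpAffine`, which would produce the 112x112 crop fed to the recognition model.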

Download: Available as det_10g.onnx inside the buffalo_l.zip model pack from InsightFace HuggingFace.

2. Face Recognition — w600k_r50 (ArcFace-trained ResNet-50)

Chosen: ResNet-50 trained with ArcFace loss on WebFace600K (600K identities, 12.5M images).

Why w600k_r50 over alternatives:

AdaFace ViT-Base (2022): Quality-adaptive margin loss with ViT backbone. Excellent on low-quality benchmarks (IJB-S). However, the official ONNX export is not straightforward — requires custom attention export. The ResNet-50 ArcFace model is drop-in ONNX compatible and within 1-2% accuracy on standard benchmarks.
AdaFace IR-101 (2022): ResNet-101 variant. Higher accuracy (+0.5% on IJB-C) but 2x model size (~350 MB). Diminishing returns for the extra compute.
TopoFR (2024): Topological regularisation approach. SOTA on several benchmarks but no public ONNX model. PyTorch-only with custom loss functions. Would require significant export effort.
ArcFace IR-SE-100 (2019): Strong baseline but trained on MS1MV2 (85K identities) — smaller training set than WebFace600K.
buffalo_l w600k_r50: Production-proven in the InsightFace ecosystem, widely deployed, known ONNX compatibility.

Key properties:

LFW: 99.83%, CFP-FP: 99.26%, AgeDB-30: 98.10%, CALFW: 96.12%, CPLFW: 94.45% (measured — see docs/research/accuracy-validation.md)
512-dim L2-normalised embedding (matches our template spec)
ResNet-50 backbone → fast CPU inference (~15ms on AVX2)
Trained on WebFace600K, among the largest cleaned public face-recognition training sets
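
Because the embedding is L2-normalised, comparing two templates reduces to a dot product. A minimal sketch, assuming cosine similarity is the comparison score. The function names are illustrative, and no decision threshold is implied:

```python
import numpy as np


def l2_normalise(embedding):
    """Scale a 512-dim embedding to unit length."""
    embedding = np.asarray(embedding, dtype=np.float64)
    norm = np.linalg.norm(embedding)
    if norm == 0.0:
        raise ValueError("cannot normalise a zero-norm embedding")
    return embedding / norm


def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1]; equals the dot product of unit vectors."""
    return float(np.dot(l2_normalise(a), l2_normalise(b)))


# Usage with random stand-ins for two model outputs of shape [512].
rng = np.random.default_rng(0)
probe = rng.standard_normal(512)
reference = rng.standard_normal(512)
score = cosine_similarity(probe, reference)
```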

Download: Available as w600k_r50.onnx inside the buffalo_l.zip model pack from InsightFace HuggingFace.

Future upgrade path: When AdaFace or TopoFR publish official ONNX exports, benchmark against w600k_r50 on our target population. The pipeline is model-agnostic — swap the ONNX file and update model_id in config.

3. Presentation Attack Detection — MiniFASNetV2

Chosen: MiniFASNetV2 with INT8 quantisation from the face-anti-spoofing repository.

Why MiniFASNetV2 over alternatives:

CDCN (Central Difference Convolution Network) (2020): Higher accuracy on OULU-NPU Protocol 4 (ACER 0.2% vs MiniFAS 1.2%) but 10x larger model (~6 MB) and slower inference. Also requires depth map supervision during training.
FAS-SGTD (Spatio-Temporal Depth) (2020): Requires multi-frame video input. Not compatible with our single-image pipeline.
FLIP-MCL (2023): Foundation-model-based PAD. Very high accuracy but requires a ViT-L backbone (~300 MB). Overkill for a fast inline gate.
Silent Face Anti-Spoofing (Minivision) (2020): The original MiniFASNet source. Our chosen repo provides cleaner ONNX export and INT8 quantisation.

Key properties:

Model size: ~600 KB (INT8 ONNX) — negligible latency overhead
3-class output: real, 2D-spoof (print/screen), 3D-spoof (mask)
OULU-NPU Protocol 1 ACER: < 2%
ISO 30107-3 Level 1 compliant for print/replay attacks
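
The 3-class output can be turned into a liveness decision as sketched below. This assumes the model emits raw logits in the order real, 2D-spoof, 3D-spoof. The `live_threshold` parameter and the function names are hypothetical, not the pipeline's actual configuration:

```python
import numpy as np

# Class order as listed above; confirm against the exported model before use.
PAD_LABELS = ("real", "2d-spoof", "3d-spoof")


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pad_decision(output, live_threshold=0.5):
    """Map a [1, 3] output to (predicted label, P(real), is_live)."""
    probs = softmax(np.asarray(output, dtype=np.float64))[0]
    label = PAD_LABELS[int(np.argmax(probs))]
    p_real = float(probs[0])
    return label, p_real, p_real >= live_threshold


# Usage with a made-up output tensor.
label, p_real, is_live = pad_decision(np.array([[4.0, 0.5, 0.1]]))
```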

Limitation: Lower accuracy on sophisticated 3D mask attacks. For high-security deployments, consider CDCN as an upgrade.

Download: Pre-exported ONNX from the yakhyo/face-anti-spoofing GitHub releases.

4. Morphing Attack Detection — SelfMAD HRNet-W18

Chosen: SelfMAD with HRNet-W18 backbone, auto-exported to ONNX.

Why SelfMAD HRNet-W18 over alternatives:

MorphBuster (MixFaceNet) (2023): Lighter (~5 MB), but the pretrained weights are gated behind a researcher access form. Architecture and input specs are poorly documented.
MADation (CLIP ViT + LoRA) (2023): Also gated access. CLIP-based models are heavier and LoRA weight merging complicates ONNX export.
SPL-MAD (2023): Public checkpoint available but architecture/input specs undocumented. Would require code inspection.
MAD-DDPM (diffusion-based): Computationally expensive, not suitable for real-time inline use.
Differential MAD (D-MAD): Requires both probe and trusted reference image. Not applicable to our single-image pipeline.
SelfMAD HRNet-W18 (2024): Publicly downloadable checkpoint (no access gate), uses timm library (trivial ONNX export), SOTA single-image MAD performance (D-EER < 5% on FRGC-Morph).

Key properties:

Single-image detection (no reference needed)
timm.create_model('hrnet_w18', num_classes=2) — clean architecture
Input: [1, 3, 384, 384], preprocessing: divide by 255 only (no ImageNet mean/std)
Output: [1, 2] raw logits — softmax(logits)[1] = P(morphed)
~85 MB ONNX model, ~21M parameters
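
The input and output conventions above can be sketched as follows. RGB channel order and a caller-side resize to 384x384 are assumptions, and the function names are illustrative:

```python
import numpy as np


def preprocess_mad(image_u8):
    """HWC uint8 image (assumed RGB, already 384x384) -> [1, 3, 384, 384] float32.

    Only a divide-by-255 is applied; no ImageNet mean/std.
    """
    image_u8 = np.asarray(image_u8)
    if image_u8.shape != (384, 384, 3):
        raise ValueError(f"expected (384, 384, 3), got {image_u8.shape}")
    x = image_u8.astype(np.float32) / 255.0
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]


def morph_probability(logits):
    """softmax(logits)[1] for a [1, 2] logit tensor, i.e. P(morphed)."""
    row = np.asarray(logits, dtype=np.float64)[0]
    e = np.exp(row - row.max())
    return float(e[1] / e.sum())


# Usage with synthetic data in place of a real image and model output.
tensor = preprocess_mad(np.full((384, 384, 3), 255, dtype=np.uint8))
p_morph = morph_probability(np.array([[-2.0, 2.0]]))
```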

Limitation: MAD is an active research area. No single model dominates across all morph generation methods (landmark-based, GAN-based, diffusion-based). Re-evaluate quarterly.

Download: Automated via scripts/export_mad.py — downloads checkpoint from Google Drive, exports to ONNX, verifies against PyTorch output. Run make models or python scripts/export_mad.py directly.

5. Quality Assessment — eDifFIQA(T)

Chosen: eDifFIQA Tiny from the OpenCV Model Zoo — pre-exported ONNX, direct download.

Why eDifFIQA(T) over alternatives:

CR-FIQA(L) (2023): Strong accuracy (Pearson r > 0.85 with recognition error). But only available as PyTorch checkpoint requiring manual ONNX export with custom IResNet architecture code. Large model (~170 MB, iresnet100 backbone).
CR-FIQA(S) (2023): Lighter iresnet50 variant but still needs manual export.
OFIQ (BSI) (2024): Official ISO 29794-5 reference implementation. C++ only, requires building from source, designed as standalone CLI tool. Best used for offline compliance reporting, not inline quality gating.
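SER-FIQ (2020): Requires 10-100 stochastic forward passes through the recognition model per image. Too slow for inline use.
MagFace (2021): Derives quality from the embedding magnitude of its own recognition model, so the quality score cannot be used independently of that recogniser.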
eDifFIQA(L) (2024): Ranked #1 on NIST FATE-Quality Kiosk-to-Entry. Available as PyTorch weights on HuggingFace (iresnet100 backbone, needs export).
eDifFIQA(T) (2024): Already exported to ONNX in the OpenCV Model Zoo. MobileFaceNet backbone (~2 MB). CC-BY-4.0 licence. Zero export effort.

Key properties:

Pre-exported ONNX — direct HTTP download, no PyTorch/timm dependency
MobileFaceNet backbone — tiny model, fast inference
Input: [1, 3, 112, 112], normalised with mean=0.5, std=0.5
Output: scalar quality score (higher = better quality)
CC-BY-4.0 licence
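
A minimal sketch of the input normalisation and the inline gate. It assumes pixels are first scaled to [0, 1] before mean=0.5 and std=0.5 are applied. The `min_quality` threshold and the function names are hypothetical:

```python
import numpy as np


def preprocess_quality(face_u8):
    """Aligned 112x112 HWC uint8 face -> [1, 3, 112, 112] float32 in [-1, 1]."""
    x = np.asarray(face_u8).astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5  # mean=0.5, std=0.5
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]


def passes_quality_gate(model_output, min_quality):
    """Read the [1, 1] quality scalar and compare it with a threshold."""
    score = float(np.asarray(model_output).reshape(-1)[0])
    return score >= min_quality


# Usage with synthetic data.
black = preprocess_quality(np.zeros((112, 112, 3), dtype=np.uint8))
white = preprocess_quality(np.full((112, 112, 3), 255, dtype=np.uint8))
accepted = passes_quality_gate(np.array([[0.7]]), min_quality=0.4)
```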

OFIQ for compliance: The OfiqQualityAdapter in the codebase still supports the BSI OFIQ CLI binary for offline ISO 29794-5 compliance reporting. Use EdiffiqaAdapter for the inline pipeline quality gate.

Future upgrade path: If higher accuracy is needed, export eDifFIQA(L) from HuggingFace weights using a script similar to export_mad.py.

Download: Direct ONNX download from OpenCV Zoo via scripts/download_models.py.

6. Age Estimation — InsightFace genderage.onnx

Chosen: The genderage.onnx model from InsightFace’s buffalo_l pack — the same pack that provides det_10g.onnx and w600k_r50.onnx. No additional download required.

Why genderage over alternatives:

MiVOLO (2023): State-of-the-art multi-task age/gender model. However, the pre-trained weights are gated behind an access-request form. No publicly downloadable ONNX model.
SSR-Net (2018): Classic lightweight age estimator. Weaker accuracy than modern models and no maintained ONNX export.
InsightFace genderage: Already present in the buffalo_l model pack, 1.3 MB, no additional network access. MAE ~4 years on standard benchmarks.

Key properties:

Input: [N, 3, 96, 96] float32, normalised to [-1, 1]
Output: age = (out[-1] + 3.0) * 5.0 (InsightFace convention)
Model size: ~1.3 MB (shared file with face attributes adapter)
Optional stage: wired in only when models/genderage.onnx exists at startup
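
The input scaling and the age decode rule above can be sketched as below. The exact constant used to map pixels to [-1, 1] (127.5 here) is an assumption, and the function names are illustrative:

```python
import numpy as np


def preprocess_genderage(faces_u8):
    """[N, 96, 96, 3] uint8 crops -> [N, 3, 96, 96] float32 in [-1, 1]."""
    x = (np.asarray(faces_u8).astype(np.float32) - 127.5) / 127.5
    return np.transpose(x, (0, 3, 1, 2))


def decode_ages(output):
    """Apply age = (out[-1] + 3.0) * 5.0 to each row of the model output."""
    out = np.atleast_2d(np.asarray(output, dtype=np.float64))
    return (out[:, -1] + 3.0) * 5.0


# Usage with synthetic data in place of real crops and model output.
batch = preprocess_genderage(np.zeros((2, 96, 96, 3), dtype=np.uint8))
ages = decode_ages([[0.1, 0.9, 3.0]])
```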

Download: Included in buffalo_l.zip from InsightFace HuggingFace.

7. Head Pose Estimation — yakhyo ResNet-18

Chosen: head_pose_resnet18.onnx from the yakhyo/head-pose-estimation repository.

Why ResNet-18 over alternatives:

HopeNet: Uses Euler angle classification (binned). Errors compound at extreme angles — exactly the failure mode that matters for the yaw gate.
FSA-Net (2019): Lightweight but accuracy degrades for large yaw angles.
6DRepNet (2022): Same 6D rotation matrix representation as our chosen model, slightly heavier architecture.
yakhyo ResNet-18: Outputs a [1, 3, 3] rotation matrix decoded inside the ONNX graph from the ortho6D representation (gimbal-lock free). Clean ONNX export, MIT licence.

Key properties:

Input: [1, 3, 224, 224] float32, ImageNet normalisation
Output: [1, 3, 3] rotation matrix → pitch/yaw/roll in degrees
Yaw gate: |yaw| > max_head_pose_yaw (default 45°) → request rejected (HTTP 422)
Model size: ~45 MB
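
Converting the rotation matrix to angles and applying the yaw gate can be sketched as follows. The axis convention (R = Rz·Ry·Rx, with yaw about the y axis) is an assumption and must be checked against the adapter; the function names are illustrative, while `max_head_pose_yaw` mirrors the setting named above:

```python
import math

import numpy as np


def rotation_to_euler_deg(rotation):
    """Decompose a 3x3 rotation matrix into (pitch, yaw, roll) in degrees."""
    r = np.asarray(rotation, dtype=np.float64).reshape(3, 3)
    sy = math.hypot(r[0, 0], r[1, 0])
    if sy > 1e-6:
        pitch = math.atan2(r[2, 1], r[2, 2])
        yaw = math.atan2(-r[2, 0], sy)
        roll = math.atan2(r[1, 0], r[0, 0])
    else:
        # Yaw is close to +/-90 degrees; pitch and roll are no longer separable.
        pitch = math.atan2(-r[1, 2], r[1, 1])
        yaw = math.atan2(-r[2, 0], sy)
        roll = 0.0
    return tuple(math.degrees(v) for v in (pitch, yaw, roll))


def yaw_gate_passes(rotation, max_head_pose_yaw=45.0):
    """True when |yaw| is within the limit; callers reject the request otherwise."""
    _, yaw, _ = rotation_to_euler_deg(rotation)
    return abs(yaw) <= max_head_pose_yaw


def rot_y(degrees):
    """Pure rotation about the y axis, used here only to build test inputs."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
```

With this convention a head turned 30° passes the default gate and one turned 60° does not.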

Download: Direct ONNX download from yakhyo/head-pose-estimation GitHub releases.

8. Deepfake Detection — ViT-base quantised (Deep-Fake-Detector-v2)

Chosen: Quantised INT8 ViT-base from HuggingFace onnx-community/Deep-Fake-Detector-v2.

Why ViT-base over alternatives:

XceptionNet detectors: Trained on FaceForensics++ — limited generalisation to novel generation methods.
CLIP-based detectors: Strong cross-dataset generalisation but ~300 MB. Too heavy for an inline gate.
Meso4: ~27 KB but poor accuracy on modern deepfake methods.
Deep-Fake-Detector-v2 (ViT-base): Fine-tuned on a diverse real/AI-generated face dataset. Pre-exported as quantised ONNX on HuggingFace — direct download, no export step required.

Key properties:

Input: [1, 3, 224, 224] float32, mean=0.5, std=0.5
Output: [1, 2] logits — softmax(logits)[1] = P(deepfake)
Model size: ~90 MB (INT8 quantised)
Optional stage: wired in only when models/deepfake_vit_q.onnx exists
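
Optional stages are enabled purely by the presence of their model file. A minimal sketch of that discovery logic, using the two file names given in this document. The stage keys and the function name are hypothetical and do not reflect the pipeline's real wiring code:

```python
from pathlib import Path

# Stage key -> model file. File names are those named in this document;
# the keys are illustrative only.
OPTIONAL_MODELS = {
    "age": "genderage.onnx",
    "attributes": "genderage.onnx",  # same file, separate adapter
    "deepfake": "deepfake_vit_q.onnx",
}


def enabled_optional_stages(models_dir):
    """Return the sorted stage keys whose model file exists in models_dir."""
    root = Path(models_dir)
    return sorted(
        stage for stage, filename in OPTIONAL_MODELS.items()
        if (root / filename).is_file()
    )
```

Calling `enabled_optional_stages("models")` at startup would yield the stages to wire in; a missing file simply leaves that stage out rather than raising an error.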

Download: Direct ONNX download from HuggingFace onnx-community/Deep-Fake-Detector-v2.

9. Face Attributes — InsightFace genderage.onnx (gender head)

Chosen: The same genderage.onnx used for age estimation, loaded by a separate adapter that reads the gender output.

Each adapter satisfies a single-purpose port: AgeEstimationPort → age, FaceAttributesPort → gender. The model file is shared (1.3 MB); both adapters load it independently.

Key properties:

Input: [N, 3, 96, 96] float32, normalised to [-1, 1]
Output: P(female) = sigmoid(out[0]) — “female” when > 0.5
Optional stage: wired in only when models/genderage.onnx exists
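
The gender decode rule can be sketched as below. The "male" label for the complementary case is an assumption (the rule above only defines "female"), and the function names are illustrative:

```python
import math


def female_probability(gender_logit):
    """Numerically stable sigmoid of the gender logit out[0]."""
    x = float(gender_logit)
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gender_label(out):
    """Return "female" when P(female) > 0.5, otherwise "male" (assumed label)."""
    return "female" if female_probability(out[0]) > 0.5 else "male"


# Usage with a made-up model output row.
label = gender_label([2.0, -1.0, 0.4])
```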

10. Super-Resolution — Real-ESRGAN x4plus

Chosen: realesrgan_x4plus.onnx from HuggingFace onnx-community/realesrgan-x4plus.

Why Real-ESRGAN over alternatives:

SRCNN / EDSR: Classic SR. Weaker on face degradations (noise, JPEG compression, blur).
GFPGAN: Face-specific restoration. No standalone ONNX model available.
Real-ESRGAN x4plus: Trained on realistic degradation mixtures. Pre-exported ONNX, BSD-3 licence.

Key properties:

Input: [1, 3, H, W] float32, normalised to [0, 1]
Output: [1, 3, H*4, W*4] — 4x upscaled
Large images tiled (512px, 10px overlap) with overlap blending to suppress seams
Optional stage positioned before detect — upscales low-resolution inputs before face detection
Model size: ~65 MB
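
The tile layout for large inputs can be sketched as follows. This covers only where the 512px tiles with 10px overlap are placed, not the blending itself, and the placement scheme (last tile flush with the image edge) is an assumption rather than the adapter's confirmed behaviour:

```python
def tile_starts(length, tile=512, overlap=10):
    """Start offsets along one axis so that tiles cover [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the edge
    return starts


def tile_boxes(height, width, tile=512, overlap=10):
    """Return (y0, x0, y1, x1) boxes in row-major order for an H x W image."""
    boxes = []
    for y0 in tile_starts(height, tile, overlap):
        for x0 in tile_starts(width, tile, overlap):
            boxes.append((y0, x0, min(y0 + tile, height), min(x0 + tile, width)))
    return boxes


# Usage: a 600 x 1014 input needs a 2 x 2 grid of tiles.
boxes = tile_boxes(600, 1014)
```

Each box would be upscaled separately and its output pasted at four times the box coordinates, blending the overlapping strips to suppress seams.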

Download: Direct ONNX download from HuggingFace onnx-community/realesrgan-x4plus.

11. Mask-Aware Recognition — w600k_mbf (ArcFace MobileFaceNet)

Chosen: w600k_mbf.onnx from InsightFace’s buffalo_sc pack as an optional alternative to w600k_r50.

To activate mask-aware recognition: set model_paths.adaface = "models/w600k_mbf.onnx" in config.toml. No other change required.

Key properties:

Same embedding format as w600k_r50 (512-dim, L2-normalised, ArcFace loss, trained on WebFace600K)
MobileFaceNet backbone — ~20 MB, faster CPU inference
Better handling of occluded/masked faces vs ResNet-50
Apache 2.0 licence

Download: Included in buffalo_sc.zip from InsightFace GitHub releases.

Download Script

Model download and export are fully automated; no manual steps are required:

make models          # Download all models (or: python scripts/download_models.py)
make models-dummy    # Generate minimal dummies for testing (no network)

What happens:

Detection, Recognition, PAD, Quality — direct ONNX download via HTTP
MAD — scripts/export_mad.py downloads the SelfMAD HRNet-W18 PyTorch checkpoint from Google Drive, builds the model via timm, exports to ONNX, and verifies output correctness

Models are saved to models/ (git-ignored). The script supports:

--dummy flag to generate minimal ONNX models for testing without network access
--force flag to re-download/re-export even if files exist
SHA-256 verification of downloaded files
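
The SHA-256 check can be sketched as below. How the script stores its expected digests is not shown in this document, so the `expected_sha256` argument and both function names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large models are never loaded fully into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Compare the file digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A failed check would typically mean deleting the file and downloading it again, which is the situation the --force flag also covers.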

Dependencies for MAD export: torch, timm, gdown (installed automatically if missing)

Upgrading Models

To evaluate a new model:

Add the ONNX file to models/ with a descriptive name
Update src/infra/config.py with the new model path and model_id
Run the benchmark suite: make bench
Compare accuracy metrics against the current model
Update this document with findings