Model Selection
This document records the research, reasoning, and final choices for each ONNX model slot in the UFME pipeline. Revisit this when upgrading models or evaluating alternatives.
Summary
Required models
| Slot | Model | Size | Input | Output | Licence | Source |
|---|---|---|---|---|---|---|
| Detection | SCRFD_10G_BNKPS | 16.9 MB | [1,3,640,640] | 9 outputs (scores/bboxes/kps) | MIT | InsightFace buffalo_l |
| Recognition | w600k_r50 (ArcFace) | 174 MB | [1,3,112,112] | [1,512] float32 | MIT | InsightFace buffalo_l |
| PAD | MiniFASNetV2 | ~600 KB (INT8) | [1,3,80,80] | [1,3] (real/2d/3d-spoof) | Apache 2.0 | yakhyo/face-anti-spoofing |
| MAD | SelfMAD HRNet-W18 | ~85 MB | [1,3,384,384] | [1,2] logits (genuine/morph) | Research | LeonTodorov/SelfMAD — auto-exported via scripts/export_mad.py |
| Quality | eDifFIQA(T) | ~2 MB | [1,3,112,112] | [1,1] quality scalar | CC-BY-4.0 | OpenCV Zoo |
Optional models
| Slot | Model | Size | Input | Output | Licence | Source |
|---|---|---|---|---|---|---|
| Age estimation | InsightFace genderage | 1.3 MB | [N,3,96,96] | [N,3] (age + gender logits) | MIT | InsightFace buffalo_l |
| Head pose | yakhyo ResNet-18 | ~45 MB | [1,3,224,224] | [1,3,3] rotation matrix | MIT | yakhyo/head-pose-estimation |
| Deepfake detection | ViT-base quantised | ~90 MB | [1,3,224,224] | [1,2] logits (real/fake) | Apache 2.0 | onnx-community/Deep-Fake-Detector-v2 |
| Face attributes | InsightFace genderage | 1.3 MB | [N,3,96,96] | [N,3] (gender logit) | MIT | InsightFace buffalo_l (shared file) |
| Super-resolution | Real-ESRGAN x4plus | ~65 MB | [1,3,H,W] | [1,3,H*4,W*4] | BSD-3 | onnx-community/realesrgan-x4plus |
| Mask-aware recognition | w600k_mbf (ArcFace MobileFaceNet) | ~20 MB | [1,3,112,112] | [1,512] float32 | Apache 2.0 | InsightFace buffalo_sc |
Detailed Analysis
1. Face Detection — SCRFD_10G_BNKPS
Chosen: SCRFD_10G in its BNKPS variant: BatchNorm detection heads (which can be fused into the convolutions for inference) plus 5-point keypoint output alongside the bounding boxes.
Why SCRFD over alternatives:
- RetinaFace (2019): Predecessor architecture. SCRFD supersedes it in speed and accuracy on WiderFace. RetinaFace uses a heavier ResNet-50 backbone (~100 MB) and lacks SCRFD's searched redistribution of computation across backbone, neck, and head.
- YOLO-Face / YOLOv8-Face (2023): Fast single-stage detectors but optimised for general object detection. On WiderFace Hard, SCRFD_10G achieves 92.3% AP vs ~90% for YOLOv8-face. YOLO models also lack native 5-point landmark output needed for alignment.
- MTCNN (2016): Classic cascade detector. Much slower due to 3-stage cascade. Not competitive on accuracy (WiderFace Hard ~60% AP). Only useful for legacy systems.
- CenterFace (2019): Lightweight anchor-free detector. Good speed but lower accuracy (~88% WiderFace Hard). Insufficient for production gallery enrolment quality.
Key properties:
- WiderFace Hard AP: 92.3%
- 5-point landmark output (eyes, nose, mouth corners) for alignment
- ONNX-native, no custom ops
- GFLOPs: ~10 (suitable for CPU inference)
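The 5-point landmarks are what make alignment possible: a similarity transform maps them onto fixed reference positions in the 112x112 recognition crop. A minimal sketch, assuming the pipeline aligns to the standard ArcFace reference template used by InsightFace and that landmarks arrive in the order listed in the comment; estimate_similarity and apply_affine are hypothetical names.

```python
# Standard ArcFace reference positions for a 112x112 crop, in the order
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
ARCFACE_REF_112 = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src, dst=ARCFACE_REF_112):
    """Least-squares similarity transform (uniform scale, rotation, translation) mapping src onto dst.

    Returns a 2x3 affine matrix [[a, -b, tx], [b, a, ty]].
    """
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    num_a = num_b = denom = 0.0
    for (x, y), (u, v) in zip(src, dst):
        # Work with centred coordinates so translation drops out of the fit.
        x, y, u, v = x - sx, y - sy, u - dx, v - dy
        num_a += x * u + y * v
        num_b += x * v - y * u
        denom += x * x + y * y
    a, b = num_a / denom, num_b / denom
    # Translation maps the source centroid onto the destination centroid.
    tx = dx - (a * sx - b * sy)
    ty = dy - (b * sx + a * sy)
    return [[a, -b, tx], [b, a, ty]]


def apply_affine(m, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])
```

To warp the image itself, the matrix can be converted to a NumPy array and passed to OpenCV's cv2.warpAffine with dsize=(112, 112). The similarity model (no shear, no reflection) keeps the face's proportions intact.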
Download: Available as det_10g.onnx inside the buffalo_l.zip model pack from InsightFace HuggingFace.
2. Face Recognition — w600k_r50 (ArcFace-trained ResNet-50)
Chosen: ResNet-50 trained with ArcFace loss on WebFace600K (600K identities, 12.5M images).
Why w600k_r50 over alternatives:
- AdaFace ViT-Base (2022): Quality-adaptive margin loss with ViT backbone. Excellent on low-quality benchmarks (IJB-S). However, there is no straightforward official ONNX export; the attention layers need custom export work. The ResNet-50 ArcFace model is drop-in ONNX compatible and within 1-2% accuracy on standard benchmarks.
- AdaFace IR-101 (2022): ResNet-101 variant. Higher accuracy (+0.5% on IJB-C) but 2x model size (~350 MB). Diminishing returns for the extra compute.
- TopoFR (2024): Topological regularisation approach. SOTA on several benchmarks but no public ONNX model. PyTorch-only with custom loss functions. Would require significant export effort.
- ArcFace IR-SE-100 (2019): Strong baseline but trained on MS1MV2 (85K identities) — smaller training set than WebFace600K.
- buffalo_l w600k_r50: Production-proven in the InsightFace ecosystem, widely deployed, known ONNX compatibility.
Key properties:
- LFW: 99.83%, CFP-FP: 99.26%, AgeDB-30: 98.10%, CALFW: 96.12%, CPLFW: 94.45% (measured; see docs/research/accuracy-validation.md)
- 512-dim L2-normalised embedding (matches our template spec)
- ResNet-50 backbone → fast CPU inference (~15 ms on AVX2)
- Trained on WebFace600K, one of the largest cleaned public face-recognition training sets
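Because each embedding is L2-normalised, comparing a probe with a gallery template reduces to a dot product (cosine similarity). A minimal plain-Python sketch; the function names and MATCH_THRESHOLD are hypothetical, and a real threshold would come from calibration on the target population.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [v / norm for v in vec]


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is simply the dot product."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


MATCH_THRESHOLD = 0.4  # hypothetical placeholder, not a calibrated operating point


def is_match(probe, reference, threshold=MATCH_THRESHOLD):
    # Re-normalising is cheap and guards against templates stored without normalisation.
    return cosine_similarity(l2_normalise(probe), l2_normalise(reference)) >= threshold
```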
Download: Available as w600k_r50.onnx inside the buffalo_l.zip model pack from InsightFace HuggingFace.
Future upgrade path: When AdaFace or TopoFR publish official ONNX exports, benchmark against w600k_r50 on our target population. The pipeline is model-agnostic — swap the ONNX file and update model_id in config.
3. Presentation Attack Detection — MiniFASNetV2
Chosen: MiniFASNetV2 with INT8 quantisation from the face-anti-spoofing repository.
Why MiniFASNetV2 over alternatives:
- CDCN (Central Difference Convolution Network) (2020): Higher accuracy on OULU-NPU Protocol 4 (ACER 0.2% vs MiniFAS 1.2%) but 10x larger model (~6 MB) and slower inference. Also requires depth map supervision during training.
- FAS-SGTD (Spatio-Temporal Depth) (2020): Requires multi-frame video input. Not compatible with our single-image pipeline.
- FLIP-MCL (2023): Foundation-model-based PAD. Very high accuracy but requires a ViT-L backbone (~300 MB). Overkill for a fast inline gate.
- Silent Face Anti-Spoofing (Minivision) (2020): The original MiniFASNet source. Our chosen repo provides cleaner ONNX export and INT8 quantisation.
Key properties:
- Model size: ~600 KB (INT8 ONNX) — negligible latency overhead
- 3-class output: real, 2D-spoof (print/screen), 3D-spoof (mask)
- OULU-NPU Protocol 1 ACER: < 2%
- ISO 30107-3 Level 1 compliant for print/replay attacks
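Turning the 3-class output into a live/spoof decision takes a softmax and a threshold on the real-class probability. A minimal sketch, assuming the class order follows the summary table (real / 2D spoof / 3D spoof); class order can differ between MiniFASNet exports, so it should be confirmed against the model. pad_decision and min_live_score are hypothetical.

```python
import math

# Assumed class order, taken from the summary table; verify against the model.
CLASS_NAMES = ("real", "2d_spoof", "3d_spoof")
REAL_INDEX = CLASS_NAMES.index("real")


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pad_decision(logits, min_live_score=0.5):  # hypothetical threshold
    """Return (is_live, top_class, probabilities) for one [3]-length output row."""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    return probs[REAL_INDEX] >= min_live_score, CLASS_NAMES[top], probs
```

Reporting the top spoof class alongside the decision makes it possible to log whether rejections are print/replay or mask attacks.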
Limitation: Lower accuracy on sophisticated 3D mask attacks. For high-security deployments, consider CDCN as an upgrade.
Download: Pre-exported ONNX from the yakhyo/face-anti-spoofing GitHub releases.
4. Morphing Attack Detection — SelfMAD HRNet-W18
Chosen: SelfMAD with HRNet-W18 backbone, auto-exported to ONNX.
Why SelfMAD HRNet-W18 over alternatives:
- MorphBuster (MixFaceNet) (2023): Lighter (~5 MB), but the pretrained weights sit behind a researcher access form (gated download). Architecture and input specs are poorly documented.
- MADation (CLIP ViT + LoRA) (2023): Also gated access. CLIP-based models are heavier and LoRA weight merging complicates ONNX export.
- SPL-MAD (2023): Public checkpoint available but architecture/input specs undocumented. Would require code inspection.
- MAD-DDPM (diffusion-based): Computationally expensive, not suitable for real-time inline use.
- Differential MAD (D-MAD): Requires both probe and trusted reference image. Not applicable to our single-image pipeline.
- SelfMAD HRNet-W18 (2024): Publicly downloadable checkpoint (no access gate), uses the timm library (trivial ONNX export), SOTA single-image MAD performance (D-EER < 5% on FRGC-Morph).
Key properties:
- Single-image detection (no reference needed)
- timm.create_model('hrnet_w18', num_classes=2): clean architecture
- Input: [1, 3, 384, 384]; preprocessing: divide by 255 only (no ImageNet mean/std)
- Output: [1, 2] raw logits; softmax(logits)[1] = P(morphed)
- ~85 MB ONNX model, ~21M parameters
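With only two classes, the softmax in the list above collapses to a logistic function of the logit difference, so P(morphed) depends only on morph - genuine. A minimal sketch; p_morphed is a hypothetical helper, and the decision threshold applied to its output is left to configuration.

```python
import math


def p_morphed(logits):
    """P(morphed) for [genuine, morph] logits.

    For two classes, softmax(logits)[1] == sigmoid(logits[1] - logits[0]).
    """
    genuine, morph = logits
    z = morph - genuine
    if z >= 0:  # branching on the sign keeps math.exp from overflowing
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```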
Limitation: MAD is an active research area. No single model dominates across all morph generation methods (landmark-based, GAN-based, diffusion-based). Re-evaluate quarterly.
Download: Automated via scripts/export_mad.py — downloads checkpoint from Google Drive, exports to ONNX, verifies against PyTorch output. Run make models or python scripts/export_mad.py directly.
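The final step of the export compares the ONNX Runtime output with the PyTorch output for the same input. A minimal sketch of that comparison on flattened output lists, assuming an allclose-style tolerance test; the helper names and tolerance values are hypothetical rather than taken from export_mad.py.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference, useful for logging."""
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_match(onnx_out, torch_out, rtol=1e-3, atol=1e-5):  # hypothetical tolerances
    """Element-wise check |onnx - torch| <= atol + rtol * |torch|, the rule numpy.allclose applies."""
    if len(onnx_out) != len(torch_out):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(onnx_out, torch_out))
```

In practice the two lists would be the flattened result of onnxruntime's InferenceSession.run and of the PyTorch model's forward pass on an identical input tensor.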
5. Quality Assessment — eDifFIQA(T)
Chosen: eDifFIQA Tiny from the OpenCV Model Zoo, pre-exported to ONNX for direct download.
Why eDifFIQA(T) over alternatives:
- CR-FIQA(L) (2023): Strong accuracy (Pearson r > 0.85 with recognition error), but only available as a PyTorch checkpoint requiring manual ONNX export with custom IResNet architecture code. Large model (~170 MB, iresnet100 backbone).
- CR-FIQA(S) (2023): Lighter iresnet50 variant but still needs manual export.
- OFIQ (BSI) (2024): Official ISO 29794-5 reference implementation. C++ only, requires building from source, designed as standalone CLI tool. Best used for offline compliance reporting, not inline quality gating.
- SER-FIQ (2020): Requires 10-100x forward passes through the recognition model. Too slow for inline use.
- MagFace (2021): Quality is the feature magnitude of MagFace's own recognition model, so scoring quality here would mean running a second recogniser alongside w600k_r50.
- eDifFIQA(L) (2024): Ranked #1 on NIST FATE-Quality Kiosk-to-Entry. Available as PyTorch weights on HuggingFace (iresnet100 backbone, needs export).
- eDifFIQA(T) (2024): Already exported to ONNX in the OpenCV Model Zoo. MobileFaceNet backbone (~2 MB). CC-BY-4.0 licence. Zero export effort.
Key properties:
- Pre-exported ONNX — direct HTTP download, no PyTorch/timm dependency
- MobileFaceNet backbone — tiny model, fast inference
- Input: [1, 3, 112, 112], normalised with mean=0.5, std=0.5
- Output: scalar quality score (higher = better quality)
- CC-BY-4.0 licence
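The mean=0.5, std=0.5 normalisation is easiest to read as a mapping of pixel values: assuming pixels are first scaled to [0, 1], (v - 0.5) / 0.5 sends 0 to -1 and 255 to 1. A minimal dependency-free sketch of the HWC-to-CHW conversion; the function name and the RGB channel order are assumptions. Passing the ImageNet statistics (mean 0.485/0.456/0.406, std 0.229/0.224/0.225) instead gives the normalisation listed for the head-pose model.

```python
def to_chw_normalised(pixels_hwc, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Convert an HxWx3 RGB image (nested lists of 0-255 values) to a 3xHxW float tensor.

    Each value is scaled to [0, 1], then normalised per channel: (v / 255 - mean) / std.
    """
    height, width = len(pixels_hwc), len(pixels_hwc[0])
    return [
        [
            [(pixels_hwc[y][x][c] / 255.0 - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]
```

A production path would do the same with NumPy and add a leading batch axis to reach [1, 3, 112, 112].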
OFIQ for compliance: The OfiqQualityAdapter in the codebase still supports the BSI OFIQ CLI binary for offline ISO 29794-5 compliance reporting. Use EdiffiqaAdapter for the inline pipeline quality gate.
Future upgrade path: If higher accuracy is needed, export eDifFIQA(L) from HuggingFace weights using a script similar to export_mad.py.
Download: Direct ONNX download from OpenCV Zoo via scripts/download_models.py.
6. Age Estimation — InsightFace genderage.onnx
Chosen: The genderage.onnx model from InsightFace’s buffalo_l pack, the same pack that provides det_10g.onnx and w600k_r50.onnx. No additional download required.
Why genderage over alternatives:
- MiVOLO (2023): State-of-the-art multi-task age/gender model. However, the pre-trained weights are gated behind an access-request form. No publicly downloadable ONNX model.
- SSR-Net (2018): Classic lightweight age estimator. Weaker accuracy than modern models and no maintained ONNX export.
- InsightFace genderage: Already present in the buffalo_l model pack, 1.3 MB, no additional network access. MAE ~4 years on standard benchmarks.
Key properties:
- Input: [N, 3, 96, 96] float32, normalised to [-1, 1]
- Output: age = (out[-1] + 3.0) * 5.0 (InsightFace convention)
- Model size: ~1.3 MB (shared file with face attributes adapter)
- Optional stage: wired in only when models/genderage.onnx exists at startup
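Optional stages such as this one are enabled by file presence at startup. A minimal sketch of that check; the stage names and the OPTIONAL_STAGES mapping are hypothetical, while the file paths are the ones this document lists.

```python
from pathlib import Path

# Hypothetical stage names mapped to the model files this document lists for them.
OPTIONAL_STAGES = {
    "age_estimation": "models/genderage.onnx",
    "face_attributes": "models/genderage.onnx",
    "deepfake_detection": "models/deepfake_vit_q.onnx",
}


def enabled_optional_stages(root="."):
    """Return, sorted, the optional stages whose model file exists under root."""
    base = Path(root)
    return sorted(name for name, rel in OPTIONAL_STAGES.items() if (base / rel).is_file())
```

Because age estimation and face attributes share one file, dropping genderage.onnx into models/ enables both stages at once.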
Download: Included in buffalo_l.zip from InsightFace HuggingFace.
7. Head Pose Estimation — yakhyo ResNet-18
Chosen: head_pose_resnet18.onnx from the yakhyo/head-pose-estimation repository.
Why ResNet-18 over alternatives:
- HopeNet: Uses Euler angle classification (binned). Errors compound at extreme angles — exactly the failure mode that matters for the yaw gate.
- FSA-Net (2019): Lightweight but accuracy degrades for large yaw angles.
- 6DRepNet (2022): Same 6D rotation matrix representation as our chosen model, slightly heavier architecture.
- yakhyo ResNet-18: Outputs a (1, 3, 3) rotation matrix decoded inside the ONNX graph from the ortho6D representation (gimbal-lock free). Clean ONNX export, MIT licence.
Key properties:
- Input: [1, 3, 224, 224] float32, ImageNet normalisation
- Output: [1, 3, 3] rotation matrix → pitch/yaw/roll in degrees
- Yaw gate: |yaw| > max_head_pose_yaw (default 45°) → request rejected (HTTP 422)
- Model size: ~45 MB
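Converting the rotation matrix to angles and applying the yaw gate takes a few lines. A minimal sketch, assuming the pitch/yaw/roll decomposition used in 6DRepNet-style code (pitch about x, yaw about y, roll about z); the exported model's sign convention should be checked before relying on the sign of yaw. The function names are hypothetical; max_head_pose_yaw and its 45° default come from the list above.

```python
import math


def rotation_to_euler_deg(r):
    """Decompose a 3x3 rotation matrix (nested lists) into (pitch, yaw, roll) in degrees."""
    sy = math.sqrt(r[0][0] ** 2 + r[1][0] ** 2)
    if sy > 1e-6:
        pitch = math.atan2(r[2][1], r[2][2])
        yaw = math.atan2(-r[2][0], sy)
        roll = math.atan2(r[1][0], r[0][0])
    else:  # degenerate case: yaw close to +/-90 degrees, roll is not recoverable
        pitch = math.atan2(-r[1][2], r[1][1])
        yaw = math.atan2(-r[2][0], sy)
        roll = 0.0
    return tuple(math.degrees(angle) for angle in (pitch, yaw, roll))


def passes_yaw_gate(r, max_head_pose_yaw=45.0):
    """False means the request should be rejected (the pipeline answers HTTP 422)."""
    _, yaw, _ = rotation_to_euler_deg(r)
    return abs(yaw) <= max_head_pose_yaw
```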
Download: Direct ONNX download from yakhyo/head-pose-estimation GitHub releases.
8. Deepfake Detection — ViT-base quantised (Deep-Fake-Detector-v2)
Chosen: Quantised INT8 ViT-base from HuggingFace onnx-community/Deep-Fake-Detector-v2.
Why ViT-base over alternatives:
- XceptionNet detectors: Trained on FaceForensics++ — limited generalisation to novel generation methods.
- CLIP-based detectors: Strong cross-dataset generalisation but ~300 MB. Too heavy for an inline gate.
- Meso4: ~28K parameters but poor accuracy on modern deepfake methods.
- Deep-Fake-Detector-v2 (ViT-base): Fine-tuned on a diverse real/AI-generated face dataset. Pre-exported as quantised ONNX on HuggingFace — direct download, no export step required.
Key properties:
- Input: [1, 3, 224, 224] float32, mean=0.5, std=0.5
- Output: [1, 2] logits; softmax(logits)[1] = P(deepfake)
- Model size: ~90 MB (INT8 quantised)
- Optional stage: wired in only when models/deepfake_vit_q.onnx exists
Download: Direct ONNX download from HuggingFace onnx-community/Deep-Fake-Detector-v2.
9. Face Attributes — InsightFace genderage.onnx (gender head)
Chosen: The same genderage.onnx used for age estimation, loaded by a separate adapter that reads the gender output.
Each adapter satisfies a single-purpose port: AgeEstimationPort → age, FaceAttributesPort → gender. The model file is shared (1.3 MB); both adapters load it independently.
Key properties:
- Input: [N, 3, 96, 96] float32, normalised to [-1, 1]
- Output: P(female) = sigmoid(out[0]); “female” when > 0.5
- Optional stage: wired in only when models/genderage.onnx exists
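Because both adapters consume the same [N, 3] output, the split between the two ports comes down to which element each one decodes. A minimal sketch applying the formulas exactly as sections 6 and 9 state them; the function names are hypothetical, labelling the complementary case "male" is an assumption, and the formulas are worth confirming against the adapter source before reuse.

```python
import math


def _sigmoid(x):
    # Branch on the sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_age(row):
    """AgeEstimationPort side: age = (out[-1] + 3.0) * 5.0, as listed in section 6."""
    return (row[-1] + 3.0) * 5.0


def decode_gender(row):
    """FaceAttributesPort side: P(female) = sigmoid(out[0]); "female" when > 0.5."""
    p_female = _sigmoid(row[0])
    return ("female" if p_female > 0.5 else "male"), p_female


def decode_batch(rows):
    """Decode an [N, 3] output (one row per face) with both readings."""
    return [(decode_age(row), decode_gender(row)[0]) for row in rows]
```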
10. Super-Resolution — Real-ESRGAN x4plus
Chosen: realesrgan_x4plus.onnx from HuggingFace onnx-community/realesrgan-x4plus.
Why Real-ESRGAN over alternatives:
- SRCNN / EDSR: Classic SR. Weaker on face degradations (noise, JPEG compression, blur).
- GFPGAN: Face-specific restoration. No official ONNX export.
- Real-ESRGAN x4plus: Trained on realistic degradation mixtures. Pre-exported ONNX, BSD-3 licence.
Key properties:
- Input: [1, 3, H, W] float32, normalised to [0, 1]
- Output: [1, 3, H*4, W*4] (4x upscaled)
- Large images tiled (512px, 10px overlap) with overlap blending to suppress seams
- Optional stage positioned before detect — upscales low-resolution inputs before face detection
- Model size: ~65 MB
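Tiling needs matching coordinates in the input and in the 4x-upscaled output. A minimal sketch, assuming tiles advance by tile - overlap pixels and the last tile on each axis is shifted flush with the image edge; tile_starts and tile_grid are hypothetical names, and the blending of overlapping output regions is not shown.

```python
def tile_starts(size, tile=512, overlap=10):
    """Start offsets along one axis so consecutive tiles overlap by at least `overlap` px."""
    if size <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # final tile sits flush with the image edge
    return starts


def tile_grid(height, width, tile=512, overlap=10, scale=4):
    """Yield (input_box, output_box) pairs; each box is (y0, y1, x0, x1)."""
    for y0 in tile_starts(height, tile, overlap):
        for x0 in tile_starts(width, tile, overlap):
            y1, x1 = min(y0 + tile, height), min(x0 + tile, width)
            yield (y0, y1, x0, x1), (y0 * scale, y1 * scale, x0 * scale, x1 * scale)
```

Shifting the last tile back rather than padding keeps every tile at full size, at the cost of a wider overlap on the final row and column.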
Download: Direct ONNX download from HuggingFace onnx-community/realesrgan-x4plus.
11. Mask-Aware Recognition — w600k_mbf (ArcFace MobileFaceNet)
Chosen: w600k_mbf.onnx from InsightFace’s buffalo_sc pack as an optional alternative to w600k_r50.
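To activate mask-aware recognition: set model_paths.adaface = "models/w600k_mbf.onnx" in config.toml. No other configuration change is required, but embeddings from w600k_mbf cannot be compared with templates enrolled by w600k_r50, so existing gallery templates must be re-enrolled after switching.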
Key properties:
- Same embedding format as w600k_r50 (512-dim, L2-normalised, ArcFace, WebFace600K), but a separately trained network, so the two embedding spaces are not interchangeable
- MobileFaceNet backbone: ~20 MB, faster CPU inference
- Better handling of occluded/masked faces vs ResNet-50
- Apache 2.0 licence
Download: Included in buffalo_sc.zip from InsightFace GitHub releases.
Download Script
All model downloads are fully automated; no manual steps are required:
```shell
make models        # Download all models (or: python scripts/download_models.py)
make models-dummy  # Generate minimal dummies for testing (no network)
```
What happens:
- Detection, Recognition, PAD, Quality: direct ONNX download via HTTP
- MAD: scripts/export_mad.py downloads the SelfMAD HRNet-W18 PyTorch checkpoint from Google Drive, builds the model via timm, exports to ONNX, and verifies output correctness
Models are saved to models/ (git-ignored). The script supports:
- --dummy flag to generate minimal ONNX models for testing without network access
- --force flag to re-download/re-export even if files exist
- SHA-256 verification of downloaded files
Dependencies for MAD export: torch, timm, gdown (installed automatically if missing)
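The SHA-256 check can stream each file so that large models such as w600k_r50.onnx are never read into memory at once. A minimal sketch using hashlib; sha256_of and verify_model are hypothetical names, and the pinned hashes themselves live in the download script rather than here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large models do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_sha256):
    """Raise ValueError if the file does not match its pinned checksum (case-insensitive)."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{Path(path).name}: expected {expected_sha256}, got {actual}")
```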
Upgrading Models
To evaluate a new model:
1. Add the ONNX file to models/ with a descriptive name
2. Update src/infra/config.py with the new model path and model_id
3. Run the benchmark suite: make bench
4. Compare accuracy metrics against the current model
5. Update this document with findings