
Model Selection

This document records the research, reasoning, and final choices for each ONNX model slot in the UFME pipeline. Revisit this when upgrading models or evaluating alternatives.

| Slot | Model | Size | Input | Output | Licence | Source |
|---|---|---|---|---|---|---|
| Detection | SCRFD_10G_BNKPS | 16.9 MB | [1,3,640,640] | 9 outputs (scores/bboxes/kps) | MIT | InsightFace buffalo_l |
| Recognition | w600k_r50 (ArcFace) | 174 MB | [1,3,112,112] | [1,512] float32 | MIT | InsightFace buffalo_l |
| PAD | MiniFASNetV2 | ~600 KB (INT8) | [1,3,80,80] | [1,3] (real/2d/3d-spoof) | Apache 2.0 | yakhyo/face-anti-spoofing |
| MAD | SelfMAD HRNet-W18 | ~85 MB | [1,3,384,384] | [1,2] logits (genuine/morph) | Research | LeonTodorov/SelfMAD (auto-exported via scripts/export_mad.py) |
| Quality | eDifFIQA(T) | ~2 MB | [1,3,112,112] | [1,1] quality scalar | CC-BY-4.0 | OpenCV Zoo |

| Slot | Model | Size | Input | Output | Licence | Source |
|---|---|---|---|---|---|---|
| Age estimation | InsightFace genderage | 1.3 MB | [N,3,96,96] | [1,3] (age + gender logits) | MIT | InsightFace buffalo_l |
| Head pose | yakhyo ResNet-18 | ~45 MB | [1,3,224,224] | [1,3,3] rotation matrix | MIT | yakhyo/head-pose-estimation |
| Deepfake detection | ViT-base quantised | ~90 MB | [1,3,224,224] | [1,2] logits (real/fake) | Apache 2.0 | onnx-community/Deep-Fake-Detector-v2 |
| Face attributes | InsightFace genderage | 1.3 MB | [N,3,96,96] | [1,3] (gender logit) | MIT | InsightFace buffalo_l (shared file) |
| Super-resolution | Real-ESRGAN x4plus | ~65 MB | [1,3,H,W] | [1,3,H*4,W*4] | BSD-3 | onnx-community/realesrgan-x4plus |
| Mask-aware recognition | w600k_mbf (ArcFace MobileFaceNet) | ~20 MB | [1,3,112,112] | [1,512] float32 | Apache 2.0 | InsightFace buffalo_sc |

1. Face Detection

Chosen: SCRFD_10G with bounding boxes, keypoints, and batch-norm fused kernels.

Why SCRFD over alternatives:

  • RetinaFace (2019): Predecessor architecture. SCRFD supersedes it in speed and accuracy on WiderFace. RetinaFace uses a heavier ResNet-50 backbone (~100 MB) and lacks the BNKPS NAS-optimised blocks.
  • YOLO-Face / YOLOv8-Face (2023): Fast single-stage detectors but optimised for general object detection. On WiderFace Hard, SCRFD_10G achieves 92.3% AP vs ~90% for YOLOv8-face. YOLO models also lack native 5-point landmark output needed for alignment.
  • MTCNN (2016): Classic cascade detector. Much slower due to 3-stage cascade. Not competitive on accuracy (WiderFace Hard ~60% AP). Only useful for legacy systems.
  • CenterFace (2019): Lightweight anchor-free detector. Good speed but lower accuracy (~88% WiderFace Hard). Insufficient for production gallery enrolment quality.

Key properties:

  • WiderFace Hard AP: 92.3%
  • 5-point landmark output (eyes, nose, mouth corners) for alignment
  • ONNX-native, no custom ops
  • GFLOPs: ~10 (suitable for CPU inference)

Download: Available as det_10g.onnx inside the buffalo_l.zip model pack from InsightFace HuggingFace.


2. Face Recognition — w600k_r50 (ArcFace-trained ResNet-50)


Chosen: ResNet-50 trained with ArcFace loss on WebFace600K (600K identities, 12.5M images).

Why w600k_r50 over alternatives:

  • AdaFace ViT-Base (2022): Quality-adaptive margin loss with ViT backbone. Excellent on low-quality benchmarks (IJB-S). However, the official ONNX export is not straightforward — requires custom attention export. The ResNet-50 ArcFace model is drop-in ONNX compatible and within 1-2% accuracy on standard benchmarks.
  • AdaFace IR-101 (2022): ResNet-101 variant. Higher accuracy (+0.5% on IJB-C) but 2x model size (~350 MB). Diminishing returns for the extra compute.
  • TopoFR (2024): Topological regularisation approach. SOTA on several benchmarks but no public ONNX model. PyTorch-only with custom loss functions. Would require significant export effort.
  • ArcFace IR-SE-100 (2019): Strong baseline but trained on MS1MV2 (85K identities) — smaller training set than WebFace600K.
  • buffalo_l w600k_r50: Production-proven in the InsightFace ecosystem, widely deployed, known ONNX compatibility.

Key properties:

  • LFW: 99.83%, CFP-FP: 99.26%, AgeDB-30: 98.10%, CALFW: 96.12%, CPLFW: 94.45% (measured — see docs/research/accuracy-validation.md)
  • 512-dim L2-normalised embedding (matches our template spec)
  • ResNet-50 backbone → fast CPU inference (~15ms on AVX2)
  • Trained on WebFace600K — largest clean public training set
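Because the embeddings are L2-normalised, comparing two of them reduces to a dot product, which equals their cosine similarity. The sketch below illustrates that with toy 4-dimensional vectors standing in for real 512-dimensional embeddings; the function names are hypothetical and are not taken from the UFME codebase.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """For unit-length inputs the dot product is the cosine similarity."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


# Toy vectors; a real recognition output would have 512 components.
probe = l2_normalise([3.0, 4.0, 0.0, 0.0])
gallery = l2_normalise([4.0, 3.0, 0.0, 0.0])
score = cosine_similarity(probe, gallery)
```

Any match threshold applied to `score` depends on the deployment and is not specified here.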

Download: Available as w600k_r50.onnx inside the buffalo_l.zip model pack from InsightFace HuggingFace.

Future upgrade path: When AdaFace or TopoFR publish official ONNX exports, benchmark against w600k_r50 on our target population. The pipeline is model-agnostic — swap the ONNX file and update model_id in config.


3. Presentation Attack Detection — MiniFASNetV2


Chosen: MiniFASNetV2 with INT8 quantisation from the face-anti-spoofing repository.

Why MiniFASNetV2 over alternatives:

  • CDCN (Central Difference Convolution Network) (2020): Higher accuracy on OULU-NPU Protocol 4 (ACER 0.2% vs MiniFAS 1.2%) but 10x larger model (~6 MB) and slower inference. Also requires depth map supervision during training.
  • FAS-SGTD (Spatio-Temporal Depth) (2020): Requires multi-frame video input. Not compatible with our single-image pipeline.
  • FLIP-MCL (2023): Foundation-model-based PAD. Very high accuracy but requires a ViT-L backbone (~300 MB). Overkill for a fast inline gate.
  • Silent Face Anti-Spoofing (Minivision) (2020): The original MiniFASNet source. Our chosen repo provides cleaner ONNX export and INT8 quantisation.

Key properties:

  • Model size: ~600 KB (INT8 ONNX) — negligible latency overhead
  • 3-class output: real, 2D-spoof (print/screen), 3D-spoof (mask)
  • OULU-NPU Protocol 1 ACER: < 2%
  • ISO 30107-3 Level 1 compliant for print/replay attacks

Limitation: Lower accuracy on sophisticated 3D mask attacks. For high-security deployments, consider CDCN as an upgrade.

Download: Pre-exported ONNX from the yakhyo/face-anti-spoofing GitHub releases.


4. Morphing Attack Detection — SelfMAD HRNet-W18


Chosen: SelfMAD with HRNet-W18 backbone, auto-exported to ONNX.

Why SelfMAD HRNet-W18 over alternatives:

  • MorphBuster (MixFaceNet) (2023): Lighter (~5 MB), but the pretrained weights sit behind a researcher access form (gated download). Architecture and input specs are poorly documented.
  • MADation (CLIP ViT + LoRA) (2023): Also gated access. CLIP-based models are heavier and LoRA weight merging complicates ONNX export.
  • SPL-MAD (2023): Public checkpoint available but architecture/input specs undocumented. Would require code inspection.
  • MAD-DDPM (diffusion-based): Computationally expensive, not suitable for real-time inline use.
  • Differential MAD (D-MAD): Requires both probe and trusted reference image. Not applicable to our single-image pipeline.
  • SelfMAD HRNet-W18 (2024): Publicly downloadable checkpoint (no access gate), uses timm library (trivial ONNX export), SOTA single-image MAD performance (D-EER < 5% on FRGC-Morph).

Key properties:

  • Single-image detection (no reference needed)
  • timm.create_model('hrnet_w18', num_classes=2) — clean architecture
  • Input: [1, 3, 384, 384], preprocessing: divide by 255 only (no ImageNet mean/std)
  • Output: [1, 2] raw logits — softmax(logits)[1] = P(morphed)
  • ~85 MB ONNX model, ~21M parameters
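The pre- and post-processing listed above can be sketched in plain Python. This is a minimal illustration, assuming the logits arrive as a flat two-element list ordered genuine, morph as in the output spec; the helper names are hypothetical and the actual adapter may be structured differently.

```python
import math


def scale_pixels(pixels):
    """Map 8-bit pixel values to [0, 1] by dividing by 255 (no mean/std step)."""
    return [p / 255.0 for p in pixels]


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def morph_probability(logits):
    """Index 0 is 'genuine' and index 1 is 'morph', per the spec above."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    return softmax(logits)[1]


# Equal logits carry no evidence either way.
p_morph = morph_probability([0.0, 0.0])
```

Subtracting the largest logit before exponentiating leaves the result unchanged but avoids overflow for large raw values.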

Limitation: MAD is an active research area. No single model dominates across all morph generation methods (landmark-based, GAN-based, diffusion-based). Re-evaluate quarterly.

Download: Automated via scripts/export_mad.py — downloads checkpoint from Google Drive, exports to ONNX, verifies against PyTorch output. Run make models or python scripts/export_mad.py directly.


5. Face Quality Assessment

Chosen: eDifFIQA Tiny from the OpenCV Model Zoo (pre-exported ONNX, direct download).

Why eDifFIQA(T) over alternatives:

  • CR-FIQA(L) (2023): Strong accuracy (Pearson r > 0.85 with recognition error). But only available as PyTorch checkpoint requiring manual ONNX export with custom IResNet architecture code. Large model (~170 MB, iresnet100 backbone).
  • CR-FIQA(S) (2023): Lighter iresnet50 variant but still needs manual export.
  • OFIQ (BSI) (2024): Official ISO 29794-5 reference implementation. C++ only, requires building from source, designed as standalone CLI tool. Best used for offline compliance reporting, not inline quality gating.
  • SER-FIQ (2020): Requires 10-100x forward passes through the recognition model. Too slow for inline use.
  • MagFace (2021): Derives quality from the embedding magnitude of its own recognition model, so the score is coupled to that specific model.
  • eDifFIQA(L) (2024): Ranked #1 on NIST FATE-Quality Kiosk-to-Entry. Available as PyTorch weights on HuggingFace (iresnet100 backbone, needs export).
  • eDifFIQA(T) (2024): Already exported to ONNX in the OpenCV Model Zoo. MobileFaceNet backbone (~2 MB). CC-BY-4.0 license. Zero export effort.

Key properties:

  • Pre-exported ONNX — direct HTTP download, no PyTorch/timm dependency
  • MobileFaceNet backbone — tiny model, fast inference
  • Input: [1, 3, 112, 112], normalised with mean=0.5, std=0.5
  • Output: scalar quality score (higher = better quality)
  • CC-BY-4.0 license
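The input normalisation above can be sketched as follows. It assumes 8-bit pixels are first scaled to [0, 1] and then normalised with mean 0.5 and std 0.5, which maps them to [-1, 1]; that scaling step and the helper names are assumptions, and the channel order is left exactly as supplied by the caller.

```python
def normalise_pixel(value, mean=0.5, std=0.5):
    """Scale an 8-bit value to [0, 1], then apply (x - mean) / std."""
    scaled = value / 255.0
    return (scaled - mean) / std


def hwc_to_chw(image):
    """Convert rows of (c0, c1, c2) pixel tuples into three normalised planes.

    `image` is a list of rows, each row a list of 3-tuples (height x width x 3).
    The result is indexed [channel][row][column], matching a [3, H, W] tensor.
    """
    return [
        [[normalise_pixel(pixel[c]) for pixel in row] for row in image]
        for c in range(3)
    ]


# A 1 x 2 toy image instead of a real 112 x 112 face crop.
toy = [[(0, 255, 0), (255, 0, 0)]]
planes = hwc_to_chw(toy)
```

A real adapter would typically do this with array operations and add the leading batch dimension to reach [1, 3, 112, 112].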

OFIQ for compliance: The OfiqQualityAdapter in the codebase still supports the BSI OFIQ CLI binary for offline ISO 29794-5 compliance reporting. Use EdiffiqaAdapter for the inline pipeline quality gate.

Future upgrade path: If higher accuracy is needed, export eDifFIQA(L) from HuggingFace weights using a script similar to export_mad.py.

Download: Direct ONNX download from OpenCV Zoo via scripts/download_models.py.


6. Age Estimation — InsightFace genderage.onnx


Chosen: The genderage.onnx model from InsightFace’s buffalo_l pack — the same pack that provides det_10g.onnx and w600k_r50.onnx. No additional download required.

Why genderage over alternatives:

  • MiVOLO (2023): State-of-the-art multi-task age/gender model. However, the pre-trained weights are gated behind an access-request form. No publicly downloadable ONNX model.
  • SSR-Net (2018): Classic lightweight age estimator. Weaker accuracy than modern models and no maintained ONNX export.
  • InsightFace genderage: Already present in the buffalo_l model pack, 1.3 MB, no additional network access. MAE ~4 years on standard benchmarks.

Key properties:

  • Input: [N, 3, 96, 96] float32, normalised to [-1, 1]
  • Output: age = (out[-1] + 3.0) * 5.0 (InsightFace convention)
  • Model size: ~1.3 MB (shared file with face attributes adapter)
  • Optional stage: wired in only when models/genderage.onnx exists at startup

Download: Included in buffalo_l.zip from InsightFace HuggingFace.


7. Head Pose Estimation — yakhyo ResNet-18


Chosen: head_pose_resnet18.onnx from the yakhyo/head-pose-estimation repository.

Why ResNet-18 over alternatives:

  • HopeNet: Uses Euler angle classification (binned). Errors compound at extreme angles — exactly the failure mode that matters for the yaw gate.
  • FSA-Net (2019): Lightweight but accuracy degrades for large yaw angles.
  • 6DRepNet (2022): Same 6D rotation matrix representation as our chosen model, slightly heavier architecture.
  • yakhyo ResNet-18: Outputs a (1, 3, 3) rotation matrix decoded inside the ONNX graph from ortho6D representation (gimbal-lock free). Clean ONNX export, MIT licence.

Key properties:

  • Input: [1, 3, 224, 224] float32, ImageNet normalisation
  • Output: [1, 3, 3] rotation matrix → pitch/yaw/roll in degrees
  • Yaw gate: |yaw| > max_head_pose_yaw (default 45°) → request rejected (HTTP 422)
  • Model size: ~45 MB
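Turning the rotation matrix into angles and applying the yaw gate could look like the sketch below. The Euler decomposition and its axis and sign conventions are assumptions chosen for illustration (a common choice in head-pose code); the real adapter may use a different convention, so treat the function names and the mapping of axes to pitch, yaw, and roll as hypothetical.

```python
import math


def rotation_to_euler_degrees(rot):
    """Decompose a 3x3 rotation matrix (nested lists) into degrees.

    Returns (pitch, yaw, roll) under the convention assumed in this sketch.
    """
    sy = math.sqrt(rot[0][0] ** 2 + rot[1][0] ** 2)
    if sy > 1e-6:
        pitch = math.atan2(rot[2][1], rot[2][2])
        yaw = math.atan2(-rot[2][0], sy)
        roll = math.atan2(rot[1][0], rot[0][0])
    else:
        # Near-singular case: yaw is close to +/-90 degrees, so roll is fixed at 0.
        pitch = math.atan2(-rot[1][2], rot[1][1])
        yaw = math.atan2(-rot[2][0], sy)
        roll = 0.0
    return tuple(math.degrees(angle) for angle in (pitch, yaw, roll))


def exceeds_yaw_gate(yaw_degrees, max_head_pose_yaw=45.0):
    """True when the request should be rejected by the yaw gate."""
    return abs(yaw_degrees) > max_head_pose_yaw


def rotation_about_y(degrees):
    """Helper for the demo: pure rotation about the vertical axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


pitch, yaw, roll = rotation_to_euler_degrees(rotation_about_y(60.0))
rejected = exceeds_yaw_gate(yaw)
```

In the service, a rejected request would map to the HTTP 422 response described above.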

Download: Direct ONNX download from yakhyo/head-pose-estimation GitHub releases.


8. Deepfake Detection — ViT-base quantised (Deep-Fake-Detector-v2)


Chosen: Quantised INT8 ViT-base from HuggingFace onnx-community/Deep-Fake-Detector-v2.

Why ViT-base over alternatives:

  • XceptionNet detectors: Trained on FaceForensics++ — limited generalisation to novel generation methods.
  • CLIP-based detectors: Strong cross-dataset generalisation but ~300 MB. Too heavy for an inline gate.
  • Meso4: ~27 KB but poor accuracy on modern deepfake methods.
  • Deep-Fake-Detector-v2 (ViT-base): Fine-tuned on a diverse real/AI-generated face dataset. Pre-exported as quantised ONNX on HuggingFace — direct download, no export step required.

Key properties:

  • Input: [1, 3, 224, 224] float32, mean=0.5, std=0.5
  • Output: [1, 2] logits — softmax(logits)[1] = P(deepfake)
  • Model size: ~90 MB (INT8 quantised)
  • Optional stage: wired in only when models/deepfake_vit_q.onnx exists

Download: Direct ONNX download from HuggingFace onnx-community/Deep-Fake-Detector-v2.


9. Face Attributes — InsightFace genderage.onnx (gender head)


Chosen: The same genderage.onnx used for age estimation, loaded by a separate adapter that reads the gender output.

Each adapter satisfies a single-purpose port: AgeEstimationPort → age, FaceAttributesPort → gender. The model file is shared (1.3 MB); both adapters load it independently.

Key properties:

  • Input: [N, 3, 96, 96] float32, normalised to [-1, 1]
  • Output: P(female) = sigmoid(out[0]) — “female” when > 0.5
  • Optional stage: wired in only when models/genderage.onnx exists

10. Super-Resolution — Real-ESRGAN x4plus


Chosen: realesrgan_x4plus.onnx from HuggingFace onnx-community/realesrgan-x4plus.

Why Real-ESRGAN over alternatives:

  • SRCNN / EDSR: Classic SR. Weaker on face degradations (noise, JPEG compression, blur).
  • GFPGAN: Face-specific restoration. No standalone ONNX model available.
  • Real-ESRGAN x4plus: Trained on realistic degradation mixtures. Pre-exported ONNX, BSD-3 licence.

Key properties:

  • Input: [1, 3, H, W] float32, normalised to [0, 1]
  • Output: [1, 3, H*4, W*4] — 4x upscaled
  • Large images are split into 512 px tiles with a 10 px overlap, and the overlapping regions are blended to suppress seams
  • Optional stage positioned before detect — upscales low-resolution inputs before face detection
  • Model size: ~65 MB
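One way to lay out the tiles is sketched below. It treats the 10 px figure as a minimum overlap and places the last tile flush with the image edge; both choices, along with the function names, are assumptions made for illustration. Blending of the overlapping regions is not shown.

```python
def tile_starts(length, tile=512, overlap=10):
    """Start offsets along one axis so tiles cover [0, length) with overlap."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if length <= tile:
        return [0]  # a single (possibly smaller) tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile ends exactly at the edge
    return starts


def tile_boxes(height, width, tile=512, overlap=10):
    """Input-space boxes as (top, left, bottom, right), bottom/right exclusive."""
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


def upscaled_box(box, scale=4):
    """Where a tile lands in the 4x output image."""
    return tuple(v * scale for v in box)


boxes = tile_boxes(300, 1000)
```

For a 300 x 1000 input this yields two tiles that overlap by 24 px horizontally, since the last tile is pulled back to end at the right edge.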

Download: Direct ONNX download from HuggingFace onnx-community/realesrgan-x4plus.


11. Mask-Aware Recognition — w600k_mbf (ArcFace MobileFaceNet)


Chosen: w600k_mbf.onnx from InsightFace’s buffalo_sc pack as an optional alternative to w600k_r50.

To activate mask-aware recognition: set model_paths.adaface = "models/w600k_mbf.onnx" in config.toml. No other change required.

Key properties:

  • Same embedding format as w600k_r50 (512-dim, L2-normalised, ArcFace loss, trained on WebFace600K)
  • MobileFaceNet backbone — ~20 MB, faster CPU inference
  • Better handling of occluded/masked faces vs ResNet-50
  • Apache 2.0 licence

Download: Included in buffalo_sc.zip from InsightFace GitHub releases.


Model acquisition is fully automated; no manual steps are required:

make models # Download all models (or: python scripts/download_models.py)
make models-dummy # Generate minimal dummies for testing (no network)

What happens:

  1. Detection, Recognition, PAD, Quality: direct ONNX download via HTTP
  2. MAD: scripts/export_mad.py downloads the SelfMAD HRNet-W18 PyTorch checkpoint from Google Drive, builds the model via timm, exports to ONNX, and verifies output correctness

Models are saved to models/ (git-ignored). The script supports:

  • --dummy flag to generate minimal ONNX models for testing without network access
  • --force flag to re-download/re-export even if files exist
  • SHA-256 verification of downloaded files
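The SHA-256 check can be illustrated with the standard library alone. This is a sketch of the general technique, not the code in scripts/download_models.py; the function names are hypothetical, and the demo hashes a temporary file rather than a real model.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large models are never fully loaded in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Compare the file's digest with the expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()


# Demo with a throwaway file standing in for a downloaded model.
payload = b"dummy model bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

matches = verify_download(tmp_path, hashlib.sha256(payload).hexdigest())
mismatch = verify_download(tmp_path, "0" * 64)
os.remove(tmp_path)
```

A downloader would usually delete the file and fail loudly when the digest does not match, rather than leave a partial or tampered model in models/.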

Dependencies for MAD export: torch, timm, gdown (installed automatically if missing)

To evaluate a new model:

  1. Add the ONNX file to models/ with a descriptive name
  2. Update src/infra/config.py with the new model path and model_id
  3. Run the benchmark suite: make bench
  4. Compare accuracy metrics against the current model
  5. Update this document with findings