# Model Selection
This document records the research, reasoning, and final choices for each ONNX model slot in the UFME pipeline. Revisit this when upgrading models or evaluating alternatives.
## Summary

| Slot | Model | Size | Input | Output | License | Source |
|---|---|---|---|---|---|---|
| Detection | SCRFD_10G_BNKPS | 16.9 MB | [1,3,640,640] | 9 outputs (scores/bboxes/kps per stride) | MIT | InsightFace buffalo_l |
| Recognition | w600k_r50 (ArcFace) | 174 MB | [1,3,112,112] | [1,512] float32 | MIT | InsightFace buffalo_l |
| PAD | MiniFASNetV2 | ~600 KB (INT8) | [1,3,80,80] | [1,3] (real/2d-spoof/3d-spoof) | Apache 2.0 | yakhyo/face-anti-spoofing |
| MAD | SelfMAD HRNet-W18 | ~85 MB | [1,3,384,384] | [1,2] logits (genuine/morph) | Research | LeonTodorov/SelfMAD — auto-exported via scripts/export_mad.py |
| Quality | eDifFIQA(T) | ~2 MB | [1,3,112,112] | [1,1] quality scalar | CC-BY-4.0 | OpenCV Zoo |
## Detailed Analysis

### 1. Face Detection — SCRFD_10G_BNKPS

Chosen: SCRFD_10G with bounding boxes, keypoints, and batch-norm fused kernels.
Why SCRFD over alternatives:
- RetinaFace (2019): Predecessor architecture. SCRFD supersedes it in speed and accuracy on WiderFace. RetinaFace uses a heavier ResNet-50 backbone (~100 MB) and lacks the BNKPS NAS-optimised blocks.
- YOLO-Face / YOLOv8-Face (2023): Fast single-stage detectors but optimised for general object detection. On WiderFace Hard, SCRFD_10G achieves 92.3% AP vs ~90% for YOLOv8-face. YOLO models also lack native 5-point landmark output needed for alignment.
- MTCNN (2016): Classic cascade detector. Much slower due to 3-stage cascade. Not competitive on accuracy (WiderFace Hard ~60% AP). Only useful for legacy systems.
- CenterFace (2019): Lightweight anchor-free detector. Good speed but lower accuracy (~88% WiderFace Hard). Insufficient for production gallery enrolment quality.
Key properties:
- WiderFace Hard AP: 92.3%
- 5-point landmark output (eyes, nose, mouth corners) for alignment
- ONNX-native, no custom ops
- GFLOPs: ~10 (suitable for CPU inference)
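The detector's input contract can be sanity-checked with a small preprocessing sketch. Note the `(pixel - 127.5) / 128` normalisation is the InsightFace reference convention and is an assumption here (this document does not state the detector's preprocessing); a real pipeline would use `cv2.resize` instead of the dependency-free nearest-neighbour resize below.

```python
import numpy as np

def preprocess(img_bgr: np.ndarray, size: int = 640) -> np.ndarray:
    """Letterbox an HxWx3 uint8 BGR image into a [1, 3, size, size] float32
    blob, normalised the way the InsightFace SCRFD reference code does."""
    h, w = img_bgr.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize in pure NumPy to keep this sketch self-contained.
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img_bgr[ys][:, xs]
    canvas = np.zeros((size, size, 3), dtype=np.float32)
    canvas[:nh, :nw] = resized
    blob = (canvas - 127.5) / 128.0          # InsightFace convention (assumed)
    return blob.transpose(2, 0, 1)[None].astype(np.float32)
```

The resulting blob matches the `[1,3,640,640]` input shape in the summary table; the scale factor must be kept to map detections back to original image coordinates.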
Download: Available as `det_10g.onnx` inside the `buffalo_l.zip` model pack from InsightFace HuggingFace.
### 2. Face Recognition — w600k_r50 (ArcFace-trained ResNet-50)

Chosen: ResNet-50 trained with ArcFace loss on WebFace600K (600K identities, 12.5M images).
Why w600k_r50 over alternatives:
- AdaFace ViT-Base (2022): Quality-adaptive margin loss with ViT backbone. Excellent on low-quality benchmarks (IJB-S). However, the official ONNX export is not straightforward — requires custom attention export. The ResNet-50 ArcFace model is drop-in ONNX compatible and within 1-2% accuracy on standard benchmarks.
- AdaFace IR-101 (2022): ResNet-101 variant. Higher accuracy (+0.5% on IJB-C) but 2x model size (~350 MB). Diminishing returns for the extra compute.
- TopoFR (2024): Topological regularisation approach. SOTA on several benchmarks but no public ONNX model. PyTorch-only with custom loss functions. Would require significant export effort.
- ArcFace IR-SE-100 (2019): Strong baseline but trained on MS1MV2 (85K identities) — smaller training set than WebFace600K.
- buffalo_l w600k_r50: Production-proven in the InsightFace ecosystem, widely deployed, known ONNX compatibility.
Key properties:
- LFW: 99.83%, CFP-FP: 98.50%, AgeDB-30: 98.17%
- 512-dim L2-normalised embedding (matches our template spec)
- ResNet-50 backbone → fast CPU inference (~15ms on AVX2)
- Trained on WebFace600K — largest clean public training set
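Since the embeddings are 512-dim and L2-normalised, template matching reduces to cosine similarity (a dot product after normalisation). A minimal sketch; the 0.35 match threshold is an illustrative assumption, not a calibrated operating point:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings; after L2-normalisation
    this is just the dot product."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

def is_match(a: np.ndarray, b: np.ndarray, threshold: float = 0.35) -> bool:
    # Threshold is illustrative only; calibrate on the target population.
    return cosine_similarity(a, b) >= threshold
```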
Download: Available as `w600k_r50.onnx` inside the `buffalo_l.zip` model pack from InsightFace HuggingFace.
Future upgrade path: When AdaFace or TopoFR publish official ONNX exports, benchmark against w600k_r50 on our target population. The pipeline is model-agnostic — swap the ONNX file and update `model_id` in config.
### 3. Presentation Attack Detection — MiniFASNetV2

Chosen: MiniFASNetV2 with INT8 quantisation from the face-anti-spoofing repository.
Why MiniFASNetV2 over alternatives:
- CDCN (Central Difference Convolution Network) (2020): Higher accuracy on OULU-NPU Protocol 4 (ACER 0.2% vs MiniFAS 1.2%) but 10x larger model (~6 MB) and slower inference. Also requires depth map supervision during training.
- FAS-SGTD (Spatio-Temporal Depth) (2020): Requires multi-frame video input. Not compatible with our single-image pipeline.
- FLIP-MCL (2023): Foundation-model-based PAD. Very high accuracy but requires a ViT-L backbone (~300 MB). Overkill for a fast inline gate.
- Silent Face Anti-Spoofing (Minivision) (2020): The original MiniFASNet source. Our chosen repo provides cleaner ONNX export and INT8 quantisation.
Key properties:
- Model size: ~600 KB (INT8 ONNX) — negligible latency overhead
- 3-class output: real, 2D-spoof (print/screen), 3D-spoof (mask)
- OULU-NPU Protocol 1 ACER: < 2%
- ISO 30107-3 Level 1 compliant for print/replay attacks
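The 3-class output maps to a liveness decision as sketched below. This assumes the model emits raw logits with the class order from the table above (real, 2D-spoof, 3D-spoof); the 0.9 confidence threshold is an illustrative assumption, not a calibrated value.

```python
import numpy as np

REAL, SPOOF_2D, SPOOF_3D = 0, 1, 2  # class order per the summary table (assumed)

def pad_decision(logits: np.ndarray, min_confidence: float = 0.9):
    """Map the [1, 3] MiniFASNetV2 output to (is_live, class probabilities)."""
    z = logits.reshape(-1) - logits.max()      # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    # Accept only when 'real' wins AND its probability clears the threshold.
    is_live = probs.argmax() == REAL and probs[REAL] >= min_confidence
    return bool(is_live), probs
```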
Limitation: Lower accuracy on sophisticated 3D mask attacks. For high-security deployments, consider CDCN as an upgrade.
Download: Pre-exported ONNX from the yakhyo/face-anti-spoofing GitHub releases.
### 4. Morphing Attack Detection — SelfMAD HRNet-W18

Chosen: SelfMAD with HRNet-W18 backbone, auto-exported to ONNX.
Why SelfMAD HRNet-W18 over alternatives:
- MorphBuster (MixFaceNet) (2023): Lighter (~5 MB) but pretrained weights require researcher access form (gated download). Architecture and input specs poorly documented.
- MADation (CLIP ViT + LoRA) (2023): Also gated access. CLIP-based models are heavier and LoRA weight merging complicates ONNX export.
- SPL-MAD (2023): Public checkpoint available but architecture/input specs undocumented. Would require code inspection.
- MAD-DDPM (diffusion-based): Computationally expensive, not suitable for real-time inline use.
- Differential MAD (D-MAD): Requires both probe and trusted reference image. Not applicable to our single-image pipeline.
- SelfMAD HRNet-W18 (2024): Publicly downloadable checkpoint (no access gate), uses the `timm` library (trivial ONNX export), SOTA single-image MAD performance (D-EER < 5% on FRGC-Morph).
Key properties:
- Single-image detection (no reference needed)
- `timm.create_model('hrnet_w18', num_classes=2)` — clean architecture
- Input: `[1, 3, 384, 384]`, preprocessing: divide by 255 only (no ImageNet mean/std)
- Output: `[1, 2]` raw logits — `softmax(logits)[1]` = P(morphed)
- ~85 MB ONNX model, ~21M parameters
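The pre- and post-processing specs above are small enough to sketch directly; both follow the stated contract (divide by 255 only, morph probability at index 1 of the softmax):

```python
import numpy as np

def preprocess_mad(img_rgb: np.ndarray) -> np.ndarray:
    """Divide-by-255 only, per the spec (no ImageNet mean/std).
    Assumes img_rgb is already a 384x384x3 uint8 crop."""
    return (img_rgb.astype(np.float32) / 255.0).transpose(2, 0, 1)[None]

def morph_probability(logits: np.ndarray) -> float:
    """P(morphed) from the [1, 2] SelfMAD logits via a stable softmax;
    index 1 is the morph class, per the output spec above."""
    z = logits.reshape(-1) - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs[1])
```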
Limitation: MAD is an active research area. No single model dominates across all morph generation methods (landmark-based, GAN-based, diffusion-based). Re-evaluate quarterly.
Download: Automated via `scripts/export_mad.py` — downloads the checkpoint from Google Drive, exports to ONNX, verifies against PyTorch output. Run `make models` or `python scripts/export_mad.py` directly.
### 5. Quality Assessment — eDifFIQA(T)

Chosen: eDifFIQA Tiny from the OpenCV Model Zoo — pre-exported ONNX, direct download.
Why eDifFIQA(T) over alternatives:
- CR-FIQA(L) (2023): Strong accuracy (Pearson r > 0.85 with recognition error), but only available as a PyTorch checkpoint requiring manual ONNX export with custom `IResNet` architecture code. Large model (~170 MB, iresnet100 backbone).
- CR-FIQA(S) (2023): Lighter iresnet50 variant but still needs manual export.
- OFIQ (BSI) (2024): Official ISO 29794-5 reference implementation. C++ only, requires building from source, designed as standalone CLI tool. Best used for offline compliance reporting, not inline quality gating.
- SER-FIQ (2020): Requires 10-100x forward passes through the recognition model. Too slow for inline use.
- MagFace (2021): Couples the quality score to a specific recognition model's embedding magnitude, so it is not recognition-model-agnostic.
- eDifFIQA(L) (2024): Ranked #1 on NIST FATE-Quality Kiosk-to-Entry. Available as PyTorch weights on HuggingFace (iresnet100 backbone, needs export).
- eDifFIQA(T) (2024): Already exported to ONNX in the OpenCV Model Zoo. MobileFaceNet backbone (~2 MB). CC-BY-4.0 license. Zero export effort.
Key properties:
- Pre-exported ONNX — direct HTTP download, no PyTorch/timm dependency
- MobileFaceNet backbone — tiny model, fast inference
- Input:
[1, 3, 112, 112], normalised with mean=0.5, std=0.5 - Output: scalar quality score (higher = better quality)
- CC-BY-4.0 license
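The mean=0.5, std=0.5 normalisation maps pixel values to [-1, 1]; a sketch of the preprocessing plus a gate helper. The 0.5 gate threshold is an illustrative assumption and must be tuned per deployment:

```python
import numpy as np

def preprocess_quality(img_rgb: np.ndarray) -> np.ndarray:
    """Normalise a 112x112x3 uint8 aligned crop with mean=0.5, std=0.5,
    i.e. map [0, 255] -> [-1, 1], laid out as [1, 3, 112, 112]."""
    x = img_rgb.astype(np.float32) / 255.0
    x = (x - 0.5) / 0.5
    return x.transpose(2, 0, 1)[None]

def passes_quality_gate(score: float, threshold: float = 0.5) -> bool:
    """Higher score = better quality; threshold is deployment-specific."""
    return score >= threshold
```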
OFIQ for compliance: The `OfiqQualityAdapter` in the codebase still supports the BSI OFIQ CLI binary for offline ISO 29794-5 compliance reporting. Use `EdiffiqaAdapter` for the inline pipeline quality gate.
Future upgrade path: If higher accuracy is needed, export eDifFIQA(L) from HuggingFace weights using a script similar to `export_mad.py`.
Download: Direct ONNX download from OpenCV Zoo via `scripts/download_models.py`.
## Download Script

Downloading all five models is fully automated — no manual steps required:
```shell
make models        # Download all models (or: python scripts/download_models.py)
make models-dummy  # Generate minimal dummies for testing (no network)
```

What happens:
- Detection, Recognition, PAD, Quality — direct ONNX download via HTTP
- MAD — `scripts/export_mad.py` downloads the SelfMAD HRNet-W18 PyTorch checkpoint from Google Drive, builds the model via `timm`, exports to ONNX, and verifies output correctness
Models are saved to `models/` (git-ignored). The script supports:
- `--dummy` flag to generate minimal ONNX models for testing without network access
- `--force` flag to re-download/re-export even if files exist
- SHA-256 verification of downloaded files
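The SHA-256 verification step can be sketched with the standard library. `verify_sha256` is a hypothetical helper mirroring what `scripts/download_models.py` is described as doing, not its actual code:

```python
import hashlib
from pathlib import Path

def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in 1 MiB chunks so multi-hundred-MB models
    (e.g. the 174 MB recognition model) are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```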
Dependencies for MAD export: `torch`, `timm`, `gdown` (installed automatically if missing)
## Upgrading Models

To evaluate a new model:
- Add the ONNX file to `models/` with a descriptive name
- Update `src/infra/config.py` with the new model path and `model_id`
- Run the benchmark suite: `make bench`
- Compare accuracy metrics against the current model
- Update this document with findings
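As a hypothetical illustration of the config change in the second step (the field names and `ModelSlot` dataclass are illustrative assumptions; the real `src/infra/config.py` schema may differ):

```python
# Hypothetical sketch of a model-slot entry; not the actual config schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSlot:
    path: str      # ONNX file under models/
    model_id: str  # recorded with results so benchmarks stay traceable

RECOGNITION = ModelSlot(
    path="models/w600k_r50.onnx",
    model_id="w600k_r50-arcface-v1",  # bump when swapping the ONNX file
)
```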