Face Detection
Research compiled: 2026-02-20
1. Benchmark: WIDER Face
The standard benchmark for face detection is WIDER FACE, containing 32,203 images with 393,703 annotated faces across three difficulty splits: Easy, Medium, and Hard. The Hard split is the most challenging (small, occluded, low-resolution faces). All mAP values below are AP (average precision) on the validation set unless stated otherwise.
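The AP metric used throughout is the area under the precision-recall curve computed from scored detections matched against ground-truth faces. A minimal pure-Python sketch of all-point interpolated AP (the matching step that labels each detection as a true or false positive is assumed to have already run):

```python
def average_precision(matches, num_gt):
    """All-point interpolated AP, Pascal-VOC style.

    matches: list of (score, is_true_positive) for every detection.
    num_gt:  total number of ground-truth faces.
    """
    matches = sorted(matches, key=lambda m: m[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) pairs as the score threshold sweeps down
    for _, is_tp in matches:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    # Precision envelope: make precision monotonically non-increasing.
    for i in range(len(points) - 2, -1, -1):
        points[i] = (points[i][0], max(points[i][1], points[i + 1][1]))
    # Integrate precision over recall.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, two true positives out of three detections against two ground-truth faces, with the false positive ranked second, yields an AP of 5/6.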
2. Top Detection Models
2.1 SCRFD (Sample and Computation Redistribution for Efficient Face Detection)
- Paper: arXiv:2105.04714 — published at ICLR 2022
- Source: InsightFace / deepinsight
- Key idea: Redistributes training samples to harder detection stages (Sample Redistribution) and reallocates compute between backbone/neck/head (Computation Redistribution).
Model family performance on WIDER Face validation:
| Model | GFLOPs | Easy | Medium | Hard | Notes |
|---|---|---|---|---|---|
| SCRFD_500M | 0.5 | 90.57 | 88.12 | 68.51 | 0.57M params, ultra-lightweight |
| SCRFD_2.5GF | 2.5 | 93.78 | 92.16 | 77.87 | After sample + computation redistribution (SR+CR) |
| SCRFD_10G | 10 | 95.16 | 93.87 | 83.05 | Strong accuracy |
| SCRFD_34GF | 34 | 96.06 | 94.92 | 85.29 | Beats TinaFace by 3.86% AP on Hard, >3× faster |
Inference speed: SCRFD_500M and RetinaFace-MobileNet0.25 both achieve ~42–46 ms on VGA (640×480) images on CPU.
Landmarks: Outputs 5 keypoints (eyes, nose, mouth corners).
Production readiness: Yes — ONNX export supported. Part of InsightFace. Widely used in production pipelines. Note: buffalo_l and similar pretrained packages are non-commercial only; a commercial license is required for production use.
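Like most single-stage detectors, SCRFD emits many overlapping candidate boxes per face that are pruned with non-maximum suppression before the final output. A minimal greedy-NMS sketch (the 0.4 IoU threshold is illustrative, not SCRFD's published setting):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.4):
    """Greedy NMS: keep boxes in descending score order, dropping any
    candidate that overlaps an already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Here a near-duplicate of the top-scoring box is suppressed while a distant box survives.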
2.2 RetinaFace
- Paper: CVPR 2020 — classic single-stage multi-task face detector
- Source: deepinsight/insightface
Performance:
| Backbone | Easy | Medium | Hard |
|---|---|---|---|
| MobileNet-0.25 | ~91.7 | ~89 | ~72 |
| ResNet-50 | ~95 | ~93 | ~83 |
Inference speed: RetinaFace-MobileNet0.25: ~42 ms on VGA (CPU); ResNet-50 variant is slower.
Landmarks: 5 keypoints. The original paper also includes a dense 3D face regression branch (via an additional head) in some variants.
Production readiness: Yes — ONNX export. Well-established, many community wrappers.
2.3 YOLO5Face / YOLOv8-Face / YOLOv11-Face / YOLOv12-Face
YOLO-based detectors adapted for face detection are actively maintained community efforts.
YOLO5Face (2021, still widely used)
- Paper: arXiv:2105.12931
- YOLOv5x6 backbone achieves 96.67 / 95.08 / 86.55 (Easy/Medium/Hard) — among the best at time of release.
- Outputs 5 facial landmarks.
YOLOv8-Face (2023–2024)
- Community implementations: lindevs/yolov8-face, yakhyo/yolov8-face-onnx-inference
- Performance on WIDER Face (val):
| Model | Easy | Medium | Hard |
|---|---|---|---|
| YOLOv8n-Face | 94.5–94.6 | 92.2–92.3 | 79.0–79.6 |
| YOLOv8-Lite-s | 93.4 | 91.2 | 78.6 |
| YOLOv8-Lite-t | 90.4 | 87.7 | 73.3 |
- Outputs 5 landmarks.
- ONNX export: Yes (exported ONNX models are roughly 2× the size of the PyTorch weights due to the serialization format).
YOLOv11 / YOLOv12-Face (2024–2025)
- YOLOv11 (nano) marginally outperforms YOLOv12 on precision/mAP50 for face detection.
- YOLOv12 introduces Area Attention (A2) module + FlashAttention — achieves higher mAP at all scales with similar or better latency.
- ONNX YOLOv12-Face models released December 2025.
- Forensic face detection study (2025): YOLOv12 achieves superior latency and precision vs YOLOv8/YOLOv10 baselines on WIDER FACE subset.
Production readiness: Yes — all YOLO variants support ONNX/CoreML/TFLite export.
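YOLO-family detectors expect a fixed square input (commonly 640×640), so frames are scaled and padded to preserve aspect ratio ("letterboxing") and detections are mapped back to original-image coordinates afterwards. A sketch of that arithmetic (the 640 target size is illustrative):

```python
def letterbox_params(w, h, target=640):
    """Compute scale and symmetric padding to fit a (w, h) image into a
    target square while preserving aspect ratio, YOLO-letterbox style."""
    scale = min(target / w, target / h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x = (target - new_w) / 2  # split between left/right borders
    pad_y = (target - new_h) / 2  # split between top/bottom borders
    return scale, new_w, new_h, pad_x, pad_y

def unletterbox_box(box, scale, pad_x, pad_y):
    """Map a detection box from letterboxed coordinates back onto the
    original image: undo the padding, then undo the scaling."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

A 1280×720 frame, for instance, is scaled by 0.5 to 640×360 with 140 px of vertical padding, and a full-frame detection maps back exactly to the original bounds.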
2.4 YuNet (OpenCV Built-in)
- Paper: YuNet: A Tiny Millisecond-level Face Detector — published Machine Intelligence Research 2023
- Source: opencv/opencv_zoo
Performance:
| Metric | Value |
|---|---|
| WIDER Face Easy (AP, opencv_zoo eval) | 88.44% |
| WIDER Face Medium (AP, opencv_zoo eval) | 86.56% |
| WIDER Face Hard (AP, opencv_zoo eval) | 75.03% |
| WIDER Face Hard (AP, paper, single-scale) | 81.1% |
Speed: ~1.6 ms per frame at 320×320 on an Intel i7-12700K CPU; OpenCV's comparison reports roughly 5 ms vs. ~25 ms for traditional cascade methods.
Model size: only 75,856 parameters — less than one-fifth the parameter count of other small detectors.
Landmarks: 5 keypoints.
Production readiness: Excellent — ships natively with OpenCV DNN module, no extra dependencies. Zero-cost deployment. Ideal for edge/serverless. ONNX export supported.
2.5 BlazeFace / MediaPipe Face Detection + Face Mesh
- Source: Google / MediaPipe
- Architecture: BlazeFace (lightweight, SSD-inspired, GPU-friendly anchor scheme) + separate 3D landmark model.
BlazeFace performance:
- Competitive accuracy to heavier models.
- 200–1000+ FPS on high-end mobile phones (GPU-accelerated).
- Designed for real-time mobile/browser inference (TFLite, WebAssembly, GPU delegate).
Face Mesh (landmark model):
- Outputs 468 (legacy) or 478 3D face landmarks in real-time on mobile.
- Operates on face crops from BlazeFace detector.
- Includes iris landmarks in the 478-point version.
Landmarks: 5 (BlazeFace detector) → then 478 3D (Face Mesh landmark model).
Production readiness: Excellent — Google-maintained, used in billions of devices. TFLite + WASM. Not ONNX natively (TFLite format; community conversions exist).
2.6 InsightFace Buffalo Pack (Production Bundle)
The buffalo_l model pack bundles:
- Detection: SCRFD_10G (ONNX)
- 3D Landmark: 1k3d68.onnx — 68 3D landmark predictor
- Recognition: ArcFace R100 (ONNX)
- Attribute: gender/age model
Key detail: buffalo_l is widely used in open-source projects (e.g., immich) but is non-commercial research only. Commercial licensing available separately.
2.7 ASFD (Automatic and Scalable Face Detector)
- Paper: arXiv:2201.10781 — Tencent
- ASFD-D6 achieves ~96.7 / 96.2 / 92.1 (Easy/Medium/Hard test set) — near top of Papers with Code leaderboard.
- Large model (ResNeXt + NAS-searched neck), primarily a research benchmark leader.
- Not widely used in production pipelines.
3. Face Landmark Detection: Comparison by Point Count
| Points | Model Examples | Use Cases | Tradeoffs |
|---|---|---|---|
| 5 | SCRFD, RetinaFace, YuNet, YOLOv8-Face, BlazeFace | Face alignment for recognition, crop/warp preprocessing | Fastest; sufficient for alignment & recognition |
| 68 | dlib, 1k3d68 (InsightFace), face-alignment lib (adrianbulat) | Facial analysis, expression, detailed geometry | ~99.7 MB (dlib); 8–10% slower than 5-point |
| 468/478 | MediaPipe Face Mesh, TF face-landmarks-detection | Face swap, AR, expression detection, 3D reconstruction | Full face mesh; mobile-optimized (TFLite); ~9 MB TFLite model |
Research finding (2023): 68 landmarks are efficient for 3D face alignment — adding more points shows diminishing returns for face recognition downstream tasks. 5-point alignment is the practical standard for recognition pipelines.
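The 5-point alignment step fits a similarity transform (rotation, uniform scale, translation) from detected landmarks to a canonical template before cropping. In 2D this has a closed-form least-squares solution when points are treated as complex numbers; a minimal sketch (the point sets in the usage example are illustrative, not any specific library's template):

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform z -> a*z + t mapping the src
    points onto the dst points. Points are (x, y) tuples; the transform
    is returned as complex numbers (a encodes rotation + scale, t the
    translation)."""
    s = [complex(x, y) for x, y in src]
    d = [complex(x, y) for x, y in dst]
    ms = sum(s) / len(s)  # centroid of source points
    md = sum(d) / len(d)  # centroid of destination points
    num = sum((dp - md) * (sp - ms).conjugate() for sp, dp in zip(s, d))
    den = sum(abs(sp - ms) ** 2 for sp in s)
    a = num / den
    t = md - a * ms
    return a, t

def apply_transform(a, t, pt):
    """Apply the fitted transform to a single (x, y) point."""
    z = a * complex(*pt) + t
    return z.real, z.imag
```

With 5 landmarks this system is overdetermined, so the fit averages out small landmark noise — one reason 5-point alignment is robust enough to be the standard preprocessing for recognition.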
CVPR 2025: the T-FAKE paper demonstrates accurate 70- and 478-point landmark prediction in challenging conditions (thermal images), suggesting dense landmark detection is maturing.
ICCV 2025: “Heatmap Regression without Soft-Argmax for Facial Landmark Detection” advances accuracy on standard benchmarks beyond previous SOTA (STAR method).
4. NIST Evaluation Context
NIST runs the Face Recognition Technology Evaluation (FRTE), which focuses on recognition accuracy, not detection in isolation. The FRTE FIVE track covers face detection in video.
Key result (April 2025): NEC ranked #1 in 1:N Identification on a 12M-person gallery with a 0.07% identification error rate — but this is recognition, not detection.
Detection quality is evaluated separately via the FATE Quality program.
5. Summary Comparison Table
| Model | WF Easy | WF Medium | WF Hard | Speed | Landmarks | Size | ONNX | Production |
|---|---|---|---|---|---|---|---|---|
| SCRFD_500M | 90.6 | 88.1 | 68.5 | ~46 ms CPU VGA | 5 | ~1MB | Yes | Yes* |
| SCRFD_10G | 95.2 | 93.9 | 83.1 | ~80 ms CPU VGA | 5 | ~17MB | Yes | Yes* |
| RetinaFace MN0.25 | 91.7 | 89.0 | 72.0 | ~42 ms CPU VGA | 5 | ~2MB | Yes | Yes |
| RetinaFace R50 | 95.0 | 93.0 | 83.0 | slower | 5 | ~105MB | Yes | Yes |
| YuNet | 88.4 | 86.6 | 75.0 | 1.6 ms i7 320px | 5 | <1MB | Yes | Excellent |
| YOLOv8n-Face | 94.5 | 92.2 | 79.0 | fast | 5 | ~6MB | Yes | Yes |
| YOLOv12-Face | ~95+ | ~93+ | ~80+ | fast | 5 | varies | Yes | Yes |
| BlazeFace | competitive | competitive | — | 200-1000+ FPS mobile | 5 | ~2MB TFLite | No (TFLite) | Yes (mobile) |
| MediaPipe FaceMesh | N/A (landmark model) | — | — | real-time mobile | 478 3D | ~9MB TFLite | No (TFLite) | Yes (mobile) |
| InsightFace 1k3d68 | N/A (landmark model) | — | — | ~5ms GPU | 68 3D | ~72MB | Yes | Yes* |
| ASFD-D6 | 96.7 | 96.2 | 92.1 | slow | 5 | large | No | Research |
| YOLO5Face (YOLOv5x6) | 96.7 | 95.1 | 86.6 | moderate | 5 | large | Yes | Yes |
*Non-commercial license for InsightFace pretrained models; commercial license available.
6. UFME Recommendations
For UFME-specific recommendations based on this research, see Executive Summary.
Sources
- WIDER Face Benchmark — Papers with Code (Hard)
- SCRFD paper — arXiv:2105.04714
- InsightFace SCRFD page
- YuNet paper — Springer Machine Intelligence Research
- YuNet OpenCV Zoo
- YOLOv8-Face ONNX inference
- lindevs/yolov8-face — Pre-trained models
- MediaPipe Face Landmarker Guide
- InsightFace GitHub
- buffalo_l on HuggingFace
- 68 landmarks sufficient for 3D alignment — PMC
- ASFD paper — arXiv:2201.10781
- NIST FRTE 1:1 Verification
- NIST FRTE FIVE (video detection)
- YOLO5Face paper — arXiv:2105.12931
- SCRFD vs RetinaFace community comparison — InsightFace Issue #1639
- OpenCV Face Detection: Cascade vs YuNet