Face Recognition
Research compiled February 2026. Covers backbone architectures, training losses, benchmark results, and production deployment considerations.
1. Evolution of Loss Functions
1.1 Margin-Based Softmax Losses (Chronological)
| Loss | Year | Key Innovation | Margin Type |
|---|---|---|---|
| Softmax | baseline | Classification logits | None |
| SphereFace | 2017 | Angular margin in L2-normalized space | Multiplicative angular |
| CosFace / LMCL | 2018 | Additive cosine margin (simpler optimization) | Additive cosine |
| ArcFace | 2019 | Additive angular margin (geodesic correspondence) | Additive angular |
| Sub-center ArcFace | 2020 | K sub-centers per class, robust to label noise | Additive angular + sub-centers |
| MagFace | 2021 | Magnitude-aware margin; feature norm as quality proxy | Adaptive |
| ElasticFace | 2022 | Random margin from normal distribution; stochastic flexibility | Elastic (stochastic) |
| AdaFace | 2022 | Quality-adaptive margin via feature norm approximation | Adaptive (norm-based) |
| UniTSFace / USS | 2023 | Unified threshold S2S loss; explicit positive/negative separation threshold | Sample-to-sample |
| TopoFR | 2024 | Topological structure alignment (PTSA); persistent homology | Topology-aware |
1.2 ArcFace (2019, CVPR)
- Key idea: additive angular margin m added to the angle between the embedding and its class center on the normalized hypersphere
- Formula: L = -log(e^{s·cos(θ_{y_i}+m)} / (e^{s·cos(θ_{y_i}+m)} + Σ_{j≠y_i} e^{s·cos(θ_j)}))
- Angular margin has an exact geodesic-distance correspondence on the hypersphere
- Margin penalty is linear in the angle over the whole interval (unlike the non-linear margins of SphereFace/CosFace)
- Scale s = 64 and margin m = 0.5 are common settings
- Reference: ArcFace paper
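The margin-penalized logit above can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not the InsightFace implementation; `arcface_logits` and its argument names are hypothetical:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Apply the ArcFace additive angular margin to the target-class logits.

    embeddings: (N, d) features; weights: (C, d) class centers.
    Both are L2-normalized so their dot product equals cos(theta).
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = emb @ w.T                                   # (N, C) cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    # add the angular margin m only to each sample's ground-truth class angle
    target = theta[np.arange(len(labels)), labels] + m
    logits = cos.copy()
    logits[np.arange(len(labels)), labels] = np.cos(target)
    return s * logits                                 # scaled logits for softmax CE
```

Because cos is decreasing on [0, π], adding m shrinks the target logit, forcing the network to pull embeddings closer to their class center than plain softmax would.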
1.3 AdaFace (2022, CVPR Oral)
- Key idea: image quality approximated by feature norm; emphasizes easy samples for low-quality images and hard samples for high-quality images
- Low norm → emphasize near-boundary samples; high norm → emphasize away-from-boundary samples
- Particularly effective on low-quality / long-range capture scenarios
- Outperforms ArcFace on IJB-B, IJB-C, IJB-S, TinyFace
- Domain studies show AdaFace strongest on long-range/remote domains
- Reference: AdaFace arXiv | MSU Project Page | GitHub
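A sketch of the quality-adaptive margin, following the margin functions described in the AdaFace paper (g_angle = -m·ĥ, g_add = m·ĥ + m, with ĥ the batch-normalized feature norm); `adaface_margin` is a hypothetical name and the batch statistics here stand in for the running statistics the paper maintains:

```python
import numpy as np

def adaface_margin(feature_norms, m=0.4, h=0.33, batch_mean=None, batch_std=None):
    """Quality-adaptive margin terms from feature norms (AdaFace-style sketch).

    Returns (g_angle, g_add); the target logit becomes cos(theta + g_angle) - g_add.
    """
    if batch_mean is None:
        batch_mean = feature_norms.mean()
    if batch_std is None:
        batch_std = feature_norms.std() + 1e-6
    # normalized feature norm as a proxy for image quality, clipped to [-1, 1]
    norm_hat = np.clip((feature_norms - batch_mean) / (batch_std / h), -1.0, 1.0)
    g_angle = -m * norm_hat          # angular margin component
    g_add = m * norm_hat + m         # additive (cosine) margin component
    return g_angle, g_add
```

Low-norm (low-quality) samples get a positive angular margin (ArcFace-like, de-emphasizing hard samples); high-norm samples get a negative angular margin plus a cosine penalty, shifting emphasis toward hard samples.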
1.4 MagFace (2021, CVPR Oral)
- Key idea: an auxiliary loss promotes larger feature magnitude for higher-quality faces
- Feature magnitude then doubles as a face quality score, learned without explicit quality labels
- High-quality faces pulled to class center; low-quality pushed away
- Combined with ArcFace loss for quality-aware compactness
- Reference: GitHub
1.5 ElasticFace (2022, CVPRW)
- Key idea: random margin m ~ N(μ, σ²) drawn from a normal distribution per sample, per iteration
- Extended (ElasticFace+) with guidance that focuses training on harder classification samples
- Advances SOTA on 7/9 mainstream benchmarks
- Reference: arXiv | GitHub
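The per-sample random margin is straightforward to sketch; `elastic_margins` and its defaults are illustrative, not the authors' exact settings:

```python
import numpy as np

def elastic_margins(batch_size, mu=0.5, sigma=0.05, rng=None):
    """Draw per-sample margins m ~ N(mu, sigma^2), redrawn every iteration.

    Each sample gets its own margin (ElasticFace-Arc style), instead of the
    single fixed m used by ArcFace/CosFace.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(mu, sigma, size=batch_size)
```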
1.6 UniTSFace / USS Loss (2023, NeurIPS)
- Key idea: unified threshold for positive/negative separation in sample-to-sample loss; combines S2S with sample-to-class softmax
- Overcomes pairing complexity of pure S2S methods
- Architecture: ResNet backbone + USS + cosine-margin Softmax
- Pretrained R50 on CASIA-WebFace: 99.53% LFW, 50.25% MR-All
- Outperforms CosFace, ArcFace, VPL, AnchorFace, UNPG
- Reference: arXiv | GitHub | NeurIPS 2023
2. Backbone Architectures
2.1 CNN Backbones
ResNet-based (InsightFace / ArcFace series)
- IResNet (Improved Residual Network): IR-18, IR-34, IR-50, IR-100, IR-200
- IR-100 on MS1M + ArcFace: LFW 99.83%, CFP-FP 98.74%, AgeDB-30 98.28%, IJB-B 96.21% TAR@FAR=1e-4
- IR-100 on Glint360K: IJB-C 97.32% TAR@FAR=1e-4 (vs 96.21% on MS1M)
- Embedding: 512-dim standard
Sub-center ArcFace (ECCV 2020)
- K sub-centers per class; robust to noisy web training data
- Enables training on Celeb500K noisy web data
- SOTA on IJB-B, IJB-C, MegaFace
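The sub-center idea can be sketched as max-pooling over K centers per class; a minimal NumPy illustration with the hypothetical name `subcenter_cosines`:

```python
import numpy as np

def subcenter_cosines(embeddings, subcenters):
    """Sub-center pooling: per class, keep the best of K sub-centers.

    embeddings: (N, d); subcenters: (C, K, d). Noisy or hard samples can
    attach to a non-dominant sub-center instead of distorting the main one.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = subcenters / np.linalg.norm(subcenters, axis=2, keepdims=True)
    cos = np.einsum('nd,ckd->nck', emb, w)   # (N, C, K) cosines
    return cos.max(axis=2)                    # (N, C) max-pooled similarity
```

The pooled similarity then feeds the usual margin-softmax loss (e.g. ArcFace) unchanged.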
GhostFaceNets
- Lightweight: GhostNet modules + ArcFace loss
- LFW: 99.73%, AgeDB-30: 98.00%, CFP-FP: 96.83%
- Good accuracy/efficiency trade-off for mobile
2.2 Lightweight / Edge CNNs
EdgeFace (IEEE T-BIOM 2024)
- Architecture: EdgeNeXt hybrid (CNN+Transformer) + Low-Rank Linear (LoRaLin) module
- LoRaLin replaces full-rank linear layers with a product of two low-rank matrices
- Parameters: 1.77M
- LFW: 99.73%, IJB-B: 92.67%, IJB-C: 94.85%
- Won IJCB 2023 Efficient Face Recognition Competition (compact track, <2M params)
- ONNX-exportable; targets edge/embedded devices
- Reference: arXiv | GitHub | HuggingFace
PocketNet
- Neural Architecture Search (DARTS) on CASIA-WebFace
- Multi-step knowledge distillation training
- Serves as lightweight baseline for comparison
MobileFaceNet / MobileNet / EfficientNet-B0
- Common lightweight baselines
- Suitable for mobile deployment
2.3 Vision Transformer (ViT) Backbones
2.3 Vision Transformer (ViT) Backbones
TransFace (ICCV 2023)
- Key insight: ViTs are data-hungry and prone to overfitting on large-scale face datasets
- EHSM (Entropy-aware Hard Sample Mining): uses information entropy in local tokens to weight hard vs easy samples
- DPAP (Dominant Patch Amplitude Perturbation): top-K dominant patches randomly perturbed in amplitude for diversity
- Achieves stable ViT training for face recognition
- TransFace++ extends to operate on raw image bytes
- Reference: ICCV 2023 paper | GitHub
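The amplitude-perturbation idea behind DPAP can be illustrated generically with a Fourier-domain sketch: perturb a patch's amplitude spectrum while keeping its phase. This is a simplified, assumed version (TransFace applies it selectively to the top-K dominant patches, which is omitted here), and `perturb_amplitude` is a hypothetical name:

```python
import numpy as np

def perturb_amplitude(patch, noise_scale=0.1, rng=None):
    """Randomly perturb a patch's Fourier amplitude while keeping its phase.

    Phase carries most structural identity information, so scaling the
    amplitude adds diversity without destroying the face structure.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fft2(patch)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    amplitude *= 1.0 + noise_scale * rng.standard_normal(amplitude.shape)
    perturbed = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(perturbed))
```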
LVFace (ICCV 2025 Highlight) — ByteDance
- Architecture: ViT backbone with Progressive Cluster Optimization (PCO)
- PCO stages:
- Negative Class Sub-sampling (NCS): robust, fast alignment from initialization
- Feature Expectation Penalties: centroid stabilization
- Cluster Boundary Refinement: full-batch training without NCS constraints
- Training: WebFace42M, 64 GPUs, AdamW optimizer
- Results: SOTA, surpasses UniFace and TopoFR across multiple benchmarks
- 1st place on the MFR-Ongoing challenge leaderboard (launched at ICCV 2021; academic track)
- Reference: arXiv | GitHub
2.4 Topology-Aware
TopoFR (NeurIPS 2024)
- Key idea: encodes topological structure from training data into latent space using persistent homology
- PTSA (Persistent Topology Structure Alignment): aligns topological structures of input space and embedding space
- SDE (Structure Damage Estimation): identifies hard samples by measuring per-sample structure damage
- Addresses structure collapse from overfitting
- 2nd place ICCV21 MFR-Ongoing challenge (as of May 2024)
- Reference: arXiv | NeurIPS 2024 | GitHub
3. Benchmark Results Summary
3.1 Standard Benchmarks
| Model | Backbone | Loss | LFW | CFP-FP | AgeDB-30 | IJB-B TAR@1e-4 | IJB-C TAR@1e-4 | Embed Dim |
|---|---|---|---|---|---|---|---|---|
| ArcFace | IR-50 | ArcFace | 99.77% | 98.27% | 97.90% | ~94.0% | ~96.0% | 512 |
| ArcFace | IR-100 (MS1M) | ArcFace | 99.83% | 98.74% | 98.28% | 96.21% | ~97.0% | 512 |
| ArcFace | IR-100 (Glint360K) | ArcFace | ~99.85% | ~98.9% | ~98.5% | ~96.5% | 97.32% | 512 |
| AdaFace | IR-101 | AdaFace | 99.82% | 98.49% | 98.05% | 96.03% | 97.39% | 512 |
| Sub-center ArcFace | R-100 | Sub-ArcFace | ~99.8% | ~98.5% | ~98.3% | — | — | 512 |
| UniTSFace | R-50 | USS+ArcFace | 99.53% | — | — | — | — | 512 |
| TransFace | ViT-B | ArcFace/EHSM | ~99.8% | ~98.7% | ~98.3% | ~96.5% | ~97.5% | 512 |
| TopoFR | R-100 | PTSA+SDE | ~99.85%+ | — | — | — | SOTA+ | 512 |
| LVFace | ViT | PCO | SOTA | SOTA | SOTA | SOTA | SOTA | 512 |
| EdgeFace | EdgeNeXt | ArcFace | 99.73% | — | — | 92.67% | 94.85% | 512 |
| GhostFaceNet | GhostNet | ArcFace | 99.73% | 96.83% | 98.00% | — | — | 512 |
| ElasticFace | R-100 | Elastic | ~99.8%+ | ~98.5%+ | ~98.3%+ | — | SOTA (7/9) | 512 |
Note: exact numbers vary by training data (MS1M vs Glint360K vs WebFace42M) and implementation. See original papers for definitive figures.
3.2 IJB-C / IJB-S Domain-Specific Analysis
From OODFace robustness study (2024):
- AdaFace: best robust accuracy under appearance variations; best on long-range/remote domains
- CosFace-IR / ArcFace-IR: tied top performers for overall clean accuracy (avg 97.17%)
- AdaFace: highest average accuracy among open-source models across 10 appearance variation subcategories
- Domain-specific trained models outperform zero-shot foundation models on all face benchmarks
3.3 LFW Near-Saturation
LFW accuracy is near-saturated (>99.7% for top models); IJB-C TAR@FAR=1e-4 and IJB-S are more discriminative for SOTA comparisons.
4. Embedding Dimensions
| Dimension | Notes |
|---|---|
| 512 | Industry standard for ArcFace, AdaFace, IR-series; best accuracy for large models |
| 256 | Some architectures find 256 optimal (FN8 model); smaller storage footprint |
| 128 | FaceNet standard (~87.9% VAL in its dimensionality ablation); embedding information is sparse, so 64-dim may suffice |
| 64 | Can accommodate most key face information (embedding is sparse) |
Key insight: Increasing from 128→512 in FaceNet degraded accuracy; larger embeddings may need more training. For SOTA large models (IR-100, ViT), 512-dim is standard. For edge/mobile, 128-256 with distillation.
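Whatever the dimension, deployment-time matching is the same operation: L2-normalize and compare by cosine similarity against a calibrated threshold. A minimal sketch; `cosine_verify` is a hypothetical name and the threshold of 0.3 is purely illustrative (real thresholds are calibrated on a validation set at the desired FAR):

```python
import numpy as np

def cosine_verify(emb_a, emb_b, threshold=0.3):
    """Face verification on L2-normalized embeddings of any dimension (64-512).

    Returns (score, is_same_identity).
    """
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    score = float(a @ b)                # cosine similarity in [-1, 1]
    return score, score >= threshold
```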
5. Large-Scale Training: PartialFC
- PartialFC (InsightFace): sparse model-parallel architecture; dynamic subset sampling of class centers
- Trains 10M+ identities on 8 GPUs vs ArcFace max ~1M identities
- Achieves up to 29M identities (largest to date)
- Maintains same accuracy while providing several-times faster training
- Lower GPU memory utilization vs full softmax
- Reference: Analytics Vidhya guide
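The class-center sub-sampling above can be sketched as follows: keep every positive class present in the batch, and fill the rest of a fixed budget with randomly sampled negatives. A simplified illustration of the idea (not the InsightFace implementation); `sample_class_centers` and `sample_rate` are hypothetical names:

```python
import numpy as np

def sample_class_centers(labels, num_classes, sample_rate=0.1, rng=None):
    """Pick the subset of class centers used in one PartialFC-style step.

    All positive classes in the batch are always kept; negatives are randomly
    sub-sampled, so only sample_rate of the C-way softmax is materialized.
    """
    rng = np.random.default_rng() if rng is None else rng
    positives = np.unique(labels)
    budget = max(int(num_classes * sample_rate), len(positives))
    negatives = np.setdiff1d(np.arange(num_classes), positives)
    extra = rng.choice(negatives, size=budget - len(positives), replace=False)
    return np.sort(np.concatenate([positives, extra]))
```

This is what lets the softmax memory cost scale with the sampled subset rather than the full identity count.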
6. Synthetic Data for Training (FRCSyn)
- FRCSyn Challenge (WACV 2024 / CVPR 2024): explores synthetic data for face recognition training
- Addresses: data privacy, demographic bias, generalization to novel scenarios (age, pose, occlusion)
- Task 1: demographic bias mitigation; Task 2: overall performance with synthetic data
- 2nd edition runs as FRCSyn-onGoing, a continuously open benchmark (post-CVPR 2024)
- Winning solutions use diffusion-model-generated face images mixed with real data
- Reference: FRCSyn CVPR 2024 | arXiv
7. Production / Deployment
7.1 NIST FRVT / FRTE Top Performers (2025)
| Rank | Vendor | Notable Result |
|---|---|---|
| #1 (1:N) | NEC | 0.07% error rate, 12M-person database; #1 in aging tests (10+ and 12+ years) |
| #3 (1:1) | ROC | #1 Western vendor 1:1 Verification and Investigative Search |
| Top 11 | Keyless | 99.93% accuracy on 1.6M-identity database |
| Top-ranked | KBY-AI | Top global rank in FRVT 1:1 |
| Consistent | Innovatrics | Top performer in every NIST FRVT category |
Source: NIST FRVT | NEC press release | ROC announcement
7.2 ONNX / Runtime Deployment
Section titled “7.2 ONNX / Runtime Deployment”- InsightFace buffalo bundles: 5 pre-packaged ONNX model sets (40–150 MB); range from 1400 FPS (edge) to 350 FPS (best accuracy)
- InspireFace C/C++ SDK (2024): supports ARM, x86, CUDA, OpenCL, RKNN backends
- FaceONNX: complete face recognition + analytics library on ONNX Runtime
- Common ONNX optimizations:
- FP16 TensorRT: 1.8× extra FPS, <0.05% accuracy drop (buffalo_l)
- INT8 quantization: 4× smaller model, ArcFace embedding error +0.02%
- ONNX Runtime: ~3.2× speedup at batch=8 vs batch=1
7.3 ONNX Exportability
| Model | ONNX-exportable | Notes |
|---|---|---|
| ArcFace / IResNet | Yes | InsightFace ships ONNX models |
| AdaFace | Yes | Standard PyTorch→ONNX |
| EdgeFace | Yes | Edge-optimized, HuggingFace ONNX |
| GhostFaceNet | Yes | Lightweight CNN |
| MobileFaceNet | Yes | Mobile-first |
| TransFace / ViT-based | Yes (with opset ≥14) | Self-attention exportable in recent opsets |
| LVFace | Likely yes | ViT, standard PyTorch |
| TopoFR | Yes (inference model) | Topology terms are used only during training; the inference network exports normally |
8. ViT vs CNN for Face Recognition: 2025 State
- ViTs outperform CNNs in 13/15 performance evaluations including face recognition (when pretrained on large data)
- Fine-tuned ViTs on large datasets (WebFace42M, Glint360K) beat ResNet and EfficientNet families
- CNNs remain competitive in low-data regimes and resource-constrained deployment
- Hybrid models (ConvNeXt, CoAtNet, EdgeNeXt) increasingly popular: combine CNN efficiency with ViT global context
- ViT challenge: data-hungry; prone to overfitting face data → TransFace’s EHSM/DPAP and LVFace’s PCO address this
- For edge deployment: lightweight CNNs (EdgeFace, GhostFaceNet, MobileFaceNet) still preferred
9. Key Trends and Recommendations
For Maximum Accuracy
- Loss: AdaFace or TopoFR (best robustness); ArcFace strong baseline
- Backbone: IR-100 or ViT with PCO/EHSM on WebFace42M or Glint360K
- Training scale: PartialFC for >1M identity datasets
- Expected: LFW ~99.8%+, IJB-C TAR@FAR=1e-4 ~97%+
For Edge/Mobile
- EdgeFace: best <2M param model (IJB-C 94.85%)
- GhostFaceNet: comparable accuracy, GhostNet efficiency
- MobileFaceNet: proven mobile baseline
- Loss: ArcFace or AdaFace (both export to ONNX fine)
For Noisy Training Data
- Sub-center ArcFace: robust to label noise via K sub-centers
- AdaFace: quality-adaptive emphasis avoids unidentifiable noisy images
For Production Deployment
- InsightFace + ONNX + TensorRT is standard production stack
- buffalo_l (best accuracy) vs buffalo_s (edge speed)
- INT8 quantization viable with minimal accuracy loss
10. Reference Links
- ArcFace (CVPR 2019)
- AdaFace (CVPR 2022)
- AdaFace GitHub
- MagFace (CVPR 2021)
- ElasticFace (CVPRW 2022)
- Sub-center ArcFace (ECCV 2020)
- UniTSFace (NeurIPS 2023)
- TransFace (ICCV 2023)
- TopoFR (NeurIPS 2024)
- LVFace (ICCV 2025)
- EdgeFace (IEEE T-BIOM 2024)
- GhostFaceNets overview
- InsightFace (deepinsight)
- PartialFC guide
- NIST FRVT
- FRCSyn CVPR 2024
- Papers With Code - LFW SOTA
- LVFace GitHub (ByteDance)