Face Recognition
Research compiled February 2026. Covers backbone architectures, training losses, benchmark results, and production deployment considerations.
1. Evolution of Loss Functions
1.1 Margin-Based Softmax Losses (Chronological)
| Loss | Year | Key Innovation | Margin Type |
|---|---|---|---|
| Softmax | baseline | Classification logits | None |
| SphereFace | 2017 | Angular margin in L2-normalized space | Multiplicative angular |
| CosFace / LMCL | 2018 | Additive cosine margin (simpler optimization) | Additive cosine |
| ArcFace | 2019 | Additive angular margin (geodesic correspondence) | Additive angular |
| Sub-center ArcFace | 2020 | K sub-centers per class, robust to label noise | Additive angular + sub-centers |
| MagFace | 2021 | Magnitude-aware margin; feature norm as quality proxy | Adaptive |
| ElasticFace | 2022 | Random margin from normal distribution; stochastic flexibility | Elastic (stochastic) |
| AdaFace | 2022 | Quality-adaptive margin via feature norm approximation | Adaptive (norm-based) |
| UniTSFace / USS | 2023 | Unified threshold S2S loss; explicit positive/negative separation threshold | Sample-to-sample |
| TopoFR | 2024 | Topological structure alignment (PTSA); persistent homology | Topology-aware |
1.2 ArcFace (2019, CVPR)
- Key idea: additive angular margin m added to the angle between the embedding and its class center on the normalized hypersphere
- Formula: L = -log(e^{s·cos(θ_{y_i}+m)} / (e^{s·cos(θ_{y_i}+m)} + Σ_{j≠y_i} e^{s·cos(θ_j)}))
- Angular margin has an exact geodesic-distance correspondence on the hypersphere
- Margin penalty is linear in the angle over the whole interval (unlike the non-linear margins of SphereFace/CosFace)
- Scale s = 64 and margin m = 0.5 are common settings
- Reference: ArcFace paper
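The margin-penalized logit above can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not the InsightFace implementation; `arcface_logits` and its argument names are hypothetical:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Apply the ArcFace additive angular margin to the target-class logits.

    embeddings: (N, d) features; weights: (C, d) class centers.
    Both are L2-normalized so their dot product equals cos(theta).
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = emb @ w.T                                   # (N, C) cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    # add the angular margin m only to each sample's ground-truth class angle
    target = theta[np.arange(len(labels)), labels] + m
    logits = cos.copy()
    logits[np.arange(len(labels)), labels] = np.cos(target)
    return s * logits                                 # scaled logits for softmax CE
```

Because cos is decreasing on [0, π], adding m shrinks the target logit, forcing the network to pull embeddings closer to their class center than plain softmax would.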
1.3 AdaFace (2022, CVPR Oral)
- Key idea: image quality approximated by feature norm; emphasizes easy samples for low-quality images and hard samples for high-quality images
- Low norm → emphasize near-boundary samples; high norm → emphasize away-from-boundary samples
- Particularly effective on low-quality / long-range capture scenarios
- Outperforms ArcFace on IJB-B, IJB-C, IJB-S, TinyFace
- Domain studies show AdaFace strongest on long-range/remote domains
- Reference: AdaFace arXiv | MSU Project Page | GitHub
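A sketch of the quality-adaptive margin, following the margin functions described in the AdaFace paper (g_angle = -m·ĥ, g_add = m·ĥ + m, with ĥ the batch-normalized feature norm); `adaface_margin` is a hypothetical name and the batch statistics here stand in for the running statistics the paper maintains:

```python
import numpy as np

def adaface_margin(feature_norms, m=0.4, h=0.33, batch_mean=None, batch_std=None):
    """Quality-adaptive margin terms from feature norms (AdaFace-style sketch).

    Returns (g_angle, g_add); the target logit becomes cos(theta + g_angle) - g_add.
    """
    if batch_mean is None:
        batch_mean = feature_norms.mean()
    if batch_std is None:
        batch_std = feature_norms.std() + 1e-6
    # normalized feature norm as a proxy for image quality, clipped to [-1, 1]
    norm_hat = np.clip((feature_norms - batch_mean) / (batch_std / h), -1.0, 1.0)
    g_angle = -m * norm_hat          # angular margin component
    g_add = m * norm_hat + m         # additive (cosine) margin component
    return g_angle, g_add
```

Low-norm (low-quality) samples get a positive angular margin (ArcFace-like, de-emphasizing hard samples); high-norm samples get a negative angular margin plus a cosine penalty, shifting emphasis toward hard samples.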
1.4 MagFace (2021, CVPR Oral)
- Key idea: an auxiliary loss promotes larger feature magnitude for higher-quality faces
- Feature magnitude then doubles as a face quality score, learned without explicit quality labels
- High-quality faces pulled to class center; low-quality pushed away
- Combined with ArcFace loss for quality-aware compactness
- Reference: GitHub
1.5 ElasticFace (2022, CVPRW)
- Key idea: random margin m ~ N(μ, σ²) drawn from a normal distribution per sample, per iteration
- Extended (ElasticFace+) with guidance that focuses training on harder classification samples
- Advances SOTA on 7/9 mainstream benchmarks
- Reference: arXiv | GitHub
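The per-sample random margin is straightforward to sketch; `elastic_margins` and its defaults are illustrative, not the authors' exact settings:

```python
import numpy as np

def elastic_margins(batch_size, mu=0.5, sigma=0.05, rng=None):
    """Draw per-sample margins m ~ N(mu, sigma^2), redrawn every iteration.

    Each sample gets its own margin (ElasticFace-Arc style), instead of the
    single fixed m used by ArcFace/CosFace.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(mu, sigma, size=batch_size)
```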
1.6 UniTSFace / USS Loss (2023, NeurIPS)
- Key idea: unified threshold for positive/negative separation in sample-to-sample loss; combines S2S with sample-to-class softmax
- Overcomes pairing complexity of pure S2S methods
- Architecture: ResNet backbone + USS + cosine-margin Softmax
- Pretrained R50 on CASIA-WebFace: 99.53% LFW, 50.25% MR-All
- Outperforms CosFace, ArcFace, VPL, AnchorFace, UNPG
- Reference: arXiv | GitHub | NeurIPS 2023
2. Backbone Architectures
2.1 CNN Backbones
ResNet-based (InsightFace / ArcFace series)
- IResNet (Improved Residual Network): IR-18, IR-34, IR-50, IR-100, IR-200
- IR-100 on MS1M + ArcFace: LFW 99.83%, CFP-FP 98.74%, AgeDB-30 98.28%, IJB-B 96.21% TAR@FAR=1e-4
- IR-100 on Glint360K: IJB-C 97.32% TAR@FAR=1e-4 (vs 96.21% on MS1M)
- Embedding: 512-dim standard
Sub-center ArcFace (ECCV 2020)
- K sub-centers per class; robust to noisy web training data
- Enables training on Celeb500K noisy web data
- SOTA on IJB-B, IJB-C, MegaFace
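The sub-center idea can be sketched as max-pooling over K centers per class; a minimal NumPy illustration with the hypothetical name `subcenter_cosines`:

```python
import numpy as np

def subcenter_cosines(embeddings, subcenters):
    """Sub-center pooling: per class, keep the best of K sub-centers.

    embeddings: (N, d); subcenters: (C, K, d). Noisy or hard samples can
    attach to a non-dominant sub-center instead of distorting the main one.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = subcenters / np.linalg.norm(subcenters, axis=2, keepdims=True)
    cos = np.einsum('nd,ckd->nck', emb, w)   # (N, C, K) cosines
    return cos.max(axis=2)                    # (N, C) max-pooled similarity
```

The pooled similarity then feeds the usual margin-softmax loss (e.g. ArcFace) unchanged.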
GhostFaceNets
- Lightweight: GhostNet modules + ArcFace loss
- LFW: 99.73%, AgeDB-30: 98.00%, CFP-FP: 96.83%
- Good accuracy/efficiency trade-off for mobile
2.2 Lightweight / Edge CNNs
EdgeFace (IEEE T-BIOM 2024)
- Architecture: EdgeNeXt hybrid (CNN+Transformer) + Low-Rank Linear (LoRaLin) module
- LoRaLin replaces full-rank linear layers with a product of two low-rank matrices
- Parameters: 1.77M
- LFW: 99.73%, IJB-B: 92.67%, IJB-C: 94.85%
- Won IJCB 2023 Efficient Face Recognition Competition (compact track, <2M params)
- ONNX-exportable; targets edge/embedded devices
- Reference: arXiv | GitHub | HuggingFace
PocketNet
- Neural Architecture Search (DARTS) on CASIA-WebFace
- Multi-step knowledge distillation training
- Serves as lightweight baseline for comparison
MobileFaceNet / MobileNet / EfficientNet-B0
- Common lightweight baselines
- Suitable for mobile deployment
2.3 Vision Transformer (ViT) Backbones
2.3 Vision Transformer (ViT) Backbones
TransFace (ICCV 2023)
- Key insight: ViTs are data-hungry and prone to overfitting on large-scale face datasets
- EHSM (Entropy-aware Hard Sample Mining): uses information entropy in local tokens to weight hard vs easy samples
- DPAP (Dominant Patch Amplitude Perturbation): top-K dominant patches randomly perturbed in amplitude for diversity
- Achieves stable ViT training for face recognition
- TransFace++ extends to operate on raw image bytes
- Reference: ICCV 2023 paper | GitHub
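The amplitude-perturbation idea behind DPAP can be illustrated generically with a Fourier-domain sketch: perturb a patch's amplitude spectrum while keeping its phase. This is a simplified, assumed version (TransFace applies it selectively to the top-K dominant patches, which is omitted here), and `perturb_amplitude` is a hypothetical name:

```python
import numpy as np

def perturb_amplitude(patch, noise_scale=0.1, rng=None):
    """Randomly perturb a patch's Fourier amplitude while keeping its phase.

    Phase carries most structural identity information, so scaling the
    amplitude adds diversity without destroying the face structure.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fft2(patch)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    amplitude *= 1.0 + noise_scale * rng.standard_normal(amplitude.shape)
    perturbed = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(perturbed))
```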
LVFace (ICCV 2025 Highlight) — ByteDance
- Architecture: ViT backbone with Progressive Cluster Optimization (PCO)
- PCO stages:
- Negative Class Sub-sampling (NCS): robust, fast alignment from initialization
- Feature Expectation Penalties: centroid stabilization
- Cluster Boundary Refinement: full-batch training without NCS constraints
- Training: WebFace42M, 64 GPUs, AdamW optimizer
- Results: SOTA, surpasses UniFace and TopoFR across multiple benchmarks
- 1st place on the MFR-Ongoing challenge leaderboard (launched at ICCV 2021; academic track)
- Reference: arXiv | GitHub
2.4 Topology-Aware
TopoFR (NeurIPS 2024)
- Key idea: encodes topological structure from training data into latent space using persistent homology
- PTSA (Persistent Topology Structure Alignment): aligns topological structures of input space and embedding space
- SDE (Structure Damage Estimation): identifies hard samples by measuring per-sample structure damage
- Addresses structure collapse from overfitting
- 2nd place ICCV21 MFR-Ongoing challenge (as of May 2024)
- Reference: arXiv | NeurIPS 2024 | GitHub
3. Benchmark Results Summary
3.1 Standard Benchmarks
| Model | Backbone | Loss | LFW | CFP-FP | AgeDB-30 | IJB-B TAR@1e-4 | IJB-C TAR@1e-4 | Embed Dim |
|---|---|---|---|---|---|---|---|---|
| ArcFace | IR-50 | ArcFace | 99.77% | 98.27% | 97.90% | ~94.0% | ~96.0% | 512 |
| ArcFace | IR-100 (MS1M) | ArcFace | 99.83% | 98.74% | 98.28% | 96.21% | ~97.0% | 512 |
| ArcFace | IR-100 (Glint360K) | ArcFace | ~99.85% | ~98.9% | ~98.5% | ~96.5% | 97.32% | 512 |
| AdaFace | IR-101 | AdaFace | 99.82% | 98.49% | 98.05% | 96.03% | 97.39% | 512 |
| Sub-center ArcFace | R-100 | Sub-ArcFace | ~99.8% | ~98.5% | ~98.3% | — | — | 512 |
| UniTSFace | R-50 | USS+ArcFace | 99.53% | — | — | — | — | 512 |
| TransFace | ViT-B | ArcFace/EHSM | ~99.8% | ~98.7% | ~98.3% | ~96.5% | ~97.5% | 512 |
| TopoFR | R-100 | PTSA+SDE | ~99.85%+ | — | — | — | SOTA+ | 512 |
| LVFace | ViT | PCO | SOTA | SOTA | SOTA | SOTA | SOTA | 512 |
| EdgeFace | EdgeNeXt | ArcFace | 99.73% | — | — | 92.67% | 94.85% | 512 |
| GhostFaceNet | GhostNet | ArcFace | 99.73% | 96.83% | 98.00% | — | — | 512 |
| ElasticFace | R-100 | Elastic | ~99.8%+ | ~98.5%+ | ~98.3%+ | — | SOTA (7/9) | 512 |
Note: exact numbers vary by training data (MS1M vs Glint360K vs WebFace42M) and implementation. See original papers for definitive figures.
3.2 IJB-C / IJB-S Domain-Specific Analysis
From OODFace robustness study (2024):
- AdaFace: best robust accuracy under appearance variations; best on long-range/remote domains
- CosFace-IR / ArcFace-IR: tied top performers for overall clean accuracy (avg 97.17%)
- AdaFace: highest average accuracy among open-source models across 10 appearance variation subcategories
- Domain-specific trained models outperform zero-shot foundation models on all face benchmarks
3.3 LFW Near-Saturation
LFW accuracy is near-saturated (>99.7% for top models); IJB-C TAR@FAR=1e-4 and IJB-S are more discriminative for SOTA comparisons.
4. Embedding Dimensions
| Dimension | Notes |
|---|---|
| 512 | Industry standard for ArcFace, AdaFace, IR-series; best accuracy for large models |
| 256 | Some architectures find 256 optimal (FN8 model); smaller storage footprint |
| 128 | FaceNet standard (~87.9% VAL in its dimensionality ablation); embedding information is sparse, so 64-dim may suffice |
| 64 | Can accommodate most key face information (embedding is sparse) |
Key insight: Increasing from 128→512 in FaceNet degraded accuracy; larger embeddings may need more training. For SOTA large models (IR-100, ViT), 512-dim is standard. For edge/mobile, 128-256 with distillation.
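Whatever the dimension, deployment-time matching is the same operation: L2-normalize and compare by cosine similarity against a calibrated threshold. A minimal sketch; `cosine_verify` is a hypothetical name and the threshold of 0.3 is purely illustrative (real thresholds are calibrated on a validation set at the desired FAR):

```python
import numpy as np

def cosine_verify(emb_a, emb_b, threshold=0.3):
    """Face verification on L2-normalized embeddings of any dimension (64-512).

    Returns (score, is_same_identity).
    """
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    score = float(a @ b)                # cosine similarity in [-1, 1]
    return score, score >= threshold
```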
5. Large-Scale Training: PartialFC
- PartialFC (InsightFace): sparse model-parallel architecture; dynamic subset sampling of class centers
- Trains 10M+ identities on 8 GPUs vs ArcFace max ~1M identities
- Achieves up to 29M identities (largest to date)
- Maintains same accuracy while providing several-times faster training
- Lower GPU memory utilization vs full softmax
- Reference: Analytics Vidhya guide
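The class-center sub-sampling above can be sketched as follows: keep every positive class present in the batch, and fill the rest of a fixed budget with randomly sampled negatives. A simplified illustration of the idea (not the InsightFace implementation); `sample_class_centers` and `sample_rate` are hypothetical names:

```python
import numpy as np

def sample_class_centers(labels, num_classes, sample_rate=0.1, rng=None):
    """Pick the subset of class centers used in one PartialFC-style step.

    All positive classes in the batch are always kept; negatives are randomly
    sub-sampled, so only sample_rate of the C-way softmax is materialized.
    """
    rng = np.random.default_rng() if rng is None else rng
    positives = np.unique(labels)
    budget = max(int(num_classes * sample_rate), len(positives))
    negatives = np.setdiff1d(np.arange(num_classes), positives)
    extra = rng.choice(negatives, size=budget - len(positives), replace=False)
    return np.sort(np.concatenate([positives, extra]))
```

This is what lets the softmax memory cost scale with the sampled subset rather than the full identity count.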
6. Synthetic Data for Training (FRCSyn)
- FRCSyn Challenge (WACV 2024 / CVPR 2024): explores synthetic data for face recognition training
- Addresses: data privacy, demographic bias, generalization to novel scenarios (age, pose, occlusion)
- Task 1: demographic bias mitigation; Task 2: overall performance with synthetic data
- 2nd edition runs as FRCSyn-onGoing, a continuously open benchmark (post-CVPR 2024)
- Winning solutions use diffusion-model-generated face images mixed with real data
- Reference: FRCSyn CVPR 2024 | arXiv
7. Production / Deployment
7.1 NIST FRVT / FRTE Top Performers (2025)
| Rank | Vendor | Notable Result |
|---|---|---|
| #1 (1:N) | NEC | 0.07% error rate, 12M-person database; #1 in aging tests (10+ and 12+ years) |
| #3 (1:1) | ROC | #1 Western vendor 1:1 Verification and Investigative Search |
| Top 11 | Keyless | 99.93% accuracy on 1.6M-identity database |
| Top-ranked | KBY-AI | Top global rank in FRVT 1:1 |
| Consistent | Innovatrics | Top performer in every NIST FRVT category |
Source: NIST FRVT | NEC press release | ROC announcement
7.2 ONNX / Runtime Deployment
Section titled “7.2 ONNX / Runtime Deployment”- InsightFace buffalo bundles: 5 pre-packaged ONNX model sets (40–150 MB); range from 1400 FPS (edge) to 350 FPS (best accuracy)
- InspireFace C/C++ SDK (2024): supports ARM, x86, CUDA, OpenCL, RKNN backends
- FaceONNX: complete face recognition + analytics library on ONNX Runtime
- Common ONNX optimizations:
- FP16 TensorRT: 1.8× extra FPS, <0.05% accuracy drop (buffalo_l)
- INT8 quantization: 4× smaller model, ArcFace embedding error +0.02%
- ONNX Runtime: ~3.2× speedup at batch=8 vs batch=1
7.3 ONNX Exportability
| Model | ONNX-exportable | Notes |
|---|---|---|
| ArcFace / IResNet | Yes | InsightFace ships ONNX models |
| AdaFace | Yes | Standard PyTorch→ONNX |
| EdgeFace | Yes | Edge-optimized, HuggingFace ONNX |
| GhostFaceNet | Yes | Lightweight CNN |
| MobileFaceNet | Yes | Mobile-first |
| TransFace / ViT-based | Yes (with opset ≥14) | Self-attention exportable in recent opsets |
| LVFace | Likely yes | ViT, standard PyTorch |
| TopoFR | Yes (inference model) | Topology terms are used only during training; the inference network exports normally |
8. ViT vs CNN for Face Recognition: 2025 State
- ViTs outperform CNNs in 13/15 performance evaluations including face recognition (when pretrained on large data)
- Fine-tuned ViTs on large datasets (WebFace42M, Glint360K) beat ResNet and EfficientNet families
- CNNs remain competitive in low-data regimes and resource-constrained deployment
- Hybrid models (ConvNeXt, CoAtNet, EdgeNeXt) increasingly popular: combine CNN efficiency with ViT global context
- ViT challenge: data-hungry; prone to overfitting face data → TransFace’s EHSM/DPAP and LVFace’s PCO address this
- For edge deployment: lightweight CNNs (EdgeFace, GhostFaceNet, MobileFaceNet) still preferred
9. Key Trends and Recommendations
For Maximum Accuracy
- Loss: AdaFace or TopoFR (best robustness); ArcFace strong baseline
- Backbone: IR-100 or ViT with PCO/EHSM on WebFace42M or Glint360K
- Training scale: PartialFC for >1M identity datasets
- Expected: LFW ~99.8%+, IJB-C TAR@FAR=1e-4 ~97%+
For Edge/Mobile
- EdgeFace: best <2M param model (IJB-C 94.85%)
- GhostFaceNet: comparable accuracy, GhostNet efficiency
- MobileFaceNet: proven mobile baseline
- Loss: ArcFace or AdaFace (both export to ONNX fine)
For Noisy Training Data
- Sub-center ArcFace: robust to label noise via K sub-centers
- AdaFace: quality-adaptive emphasis avoids unidentifiable noisy images
For Production Deployment
- InsightFace + ONNX + TensorRT is standard production stack
- buffalo_l (best accuracy) vs buffalo_s (edge speed)
- INT8 quantization viable with minimal accuracy loss
10. Reference Links
- ArcFace (CVPR 2019)
- AdaFace (CVPR 2022)
- AdaFace GitHub
- MagFace (CVPR 2021)
- ElasticFace (CVPRW 2022)
- Sub-center ArcFace (ECCV 2020)
- UniTSFace (NeurIPS 2023)
- TransFace (ICCV 2023)
- TopoFR (NeurIPS 2024)
- LVFace (ICCV 2025)
- EdgeFace (IEEE T-BIOM 2024)
- GhostFaceNets overview
- InsightFace (deepinsight)
- PartialFC guide
- NIST FRVT
- FRCSyn CVPR 2024
- Papers With Code - LFW SOTA
- LVFace GitHub (ByteDance)