
Face Recognition

Research compiled February 2026. Covers backbone architectures, training losses, benchmark results, and production deployment considerations.


1.1 Margin-Based Softmax Losses (Chronological)

| Loss | Year | Key Innovation | Margin Type |
|---|---|---|---|
| Softmax | baseline | Classification logits | None |
| SphereFace | 2017 | Angular margin in L2-normalized space | Multiplicative angular |
| CosFace / LMCL | 2018 | Additive cosine margin (simpler optimization) | Additive cosine |
| ArcFace | 2019 | Additive angular margin (geodesic correspondence) | Additive angular |
| Sub-center ArcFace | 2020 | K sub-centers per class, robust to label noise | Additive angular + sub-centers |
| MagFace | 2021 | Magnitude-aware margin; feature norm as quality proxy | Adaptive |
| ElasticFace | 2022 | Random margin from normal distribution; stochastic flexibility | Elastic (stochastic) |
| AdaFace | 2022 | Quality-adaptive margin via feature norm approximation | Adaptive (norm-based) |
| UniTSFace / USS | 2023 | Unified threshold S2S loss; explicit positive/negative separation threshold | Sample-to-sample |
| TopoFR | 2024 | Topological structure alignment (PTSA); persistent homology | Topology-aware |

ArcFace (2019)

  • Key idea: additive angular margin m added to the angle between the embedding and its class center on the unit hypersphere
  • Formula: L = -log( e^{s·cos(θ_y + m)} / ( e^{s·cos(θ_y + m)} + Σ_{j≠y} e^{s·cos θ_j} ) ), where θ_y is the angle to the ground-truth class center
  • The additive angular margin corresponds exactly to geodesic distance on the hypersphere
  • Margin is constant in the angle over the whole interval (unlike the non-linear margins of SphereFace and CosFace)
  • Common settings: scale s = 64, margin m = 0.5
  • Reference: ArcFace paper
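The margin mechanics above can be sketched in NumPy for a single sample. This is illustrative only: real implementations operate on batches and add easy-margin handling for large angles, both omitted here.

```python
import numpy as np

def arcface_loss(embedding, weights, label, s=64.0, m=0.5):
    """Single-sample ArcFace-style loss sketch.

    embedding: (d,) feature vector; weights: (C, d) class centers;
    label: ground-truth class index.
    """
    # L2-normalize embedding and class centers -> cosine logits
    e = embedding / np.linalg.norm(embedding)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = w @ e                                  # cos(theta_j) for every class
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    # additive angular margin applied to the ground-truth class only
    logits = s * cos.copy()
    logits[label] = s * np.cos(theta[label] + m)
    # softmax cross-entropy on the scaled, margin-injected logits
    logits -= logits.max()
    p = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(p[label]))
```

Because cos(θ + m) < cos(θ) for typical angles, the margin lowers the target logit, so the network must pull embeddings closer to their class center to reduce the loss.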

AdaFace (2022)

  • Key idea: image quality approximated by feature norm; emphasizes easy samples for low-quality images and hard samples for high-quality images
  • Low norm → emphasize easy (far-from-boundary) samples; high norm → emphasize hard (near-boundary) samples
  • Particularly effective in low-quality / long-range capture scenarios
  • Outperforms ArcFace on IJB-B, IJB-C, IJB-S, TinyFace
  • Domain studies show AdaFace strongest on long-range/remote domains
  • Reference: AdaFace arXiv | MSU Project Page | GitHub
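A sketch of the norm-to-margin mapping in the AdaFace spirit: the feature norm is standardized against batch statistics, clipped to [-1, 1], and mapped to an angular component and an additive component. Hyper-parameters m and h follow the paper's stated defaults (m = 0.4, h = 0.33); exact batch-statistic handling differs in the official implementation.

```python
import numpy as np

def adaptive_margin(feat_norm, batch_mean, batch_std, m=0.4, h=0.33):
    """Map a feature norm (quality proxy) to AdaFace-style margin components.

    feat_norm: this sample's feature magnitude; batch_mean/batch_std: running
    batch statistics of feature magnitudes; h: concentration hyper-parameter.
    """
    # standardize the norm within the batch; h controls how fast it saturates
    norm_hat = np.clip(h * (feat_norm - batch_mean) / (batch_std + 1e-3),
                       -1.0, 1.0)
    g_angle = -m * norm_hat       # angular margin: shrinks as quality rises
    g_add = m * norm_hat + m      # additive margin: grows as quality rises
    return float(g_angle), float(g_add)
```

High-norm (high-quality) samples get the full additive margin, emphasizing hard, near-boundary cases; low-norm samples get no additive margin, so unidentifiable images do not dominate the gradient.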

MagFace (2021)

  • Key idea: an auxiliary regularizer promotes larger feature magnitude for higher-quality faces
  • Feature magnitude doubles as a face quality score, learned without explicit quality labels
  • High-quality faces pulled toward the class center; low-quality faces pushed away
  • Combined with an ArcFace-style margin for quality-aware compactness
  • Reference: GitHub

ElasticFace (2022)

  • Key idea: random margin m ~ N(μ, σ²) drawn from a normal distribution per sample per iteration
  • Extended variants add guidance that focuses training on harder classification samples
  • Advances SOTA on 7 of 9 mainstream benchmarks
  • Reference: arXiv | GitHub

UniTSFace / USS (2023)

  • Key idea: a single unified threshold separates positive from negative pairs in a sample-to-sample (S2S) loss, combined with a sample-to-class softmax
  • Avoids the pairing complexity of pure S2S methods
  • Architecture: ResNet backbone + USS + cosine-margin softmax
  • Pretrained R50 on CASIA-WebFace: 99.53% LFW, 50.25% MR-All
  • Outperforms CosFace, ArcFace, VPL, AnchorFace, UNPG
  • Reference: arXiv | GitHub | NeurIPS 2023
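The unified-threshold idea can be illustrated with a generic smooth-hinge penalty: positive-pair similarities are pushed above a shared threshold t, negatives below it. This is an illustrative objective in the spirit of USS, not the paper's exact loss; the names and the softplus formulation are my own.

```python
import numpy as np

def unified_threshold_s2s(pos_sims, neg_sims, t=0.3, s=16.0):
    """Illustrative unified-threshold sample-to-sample penalty.

    pos_sims / neg_sims: cosine similarities of positive / negative pairs;
    t: the single shared separation threshold; s: sharpness of the hinge.
    """
    pos = np.asarray(pos_sims, dtype=float)
    neg = np.asarray(neg_sims, dtype=float)
    # softplus hinge, computed stably: log(1 + e^x) = logaddexp(0, x)
    pos_term = np.logaddexp(0.0, s * (t - pos)).sum()  # penalize pos below t
    neg_term = np.logaddexp(0.0, s * (neg - t)).sum()  # penalize neg above t
    return float((pos_term + neg_term) / (len(pos) + len(neg)))
```

Because one threshold governs both sides, any pair can be scored against t directly, avoiding the combinatorial pair-mining that pure S2S losses require.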

ResNet-based (InsightFace / ArcFace series)

  • IResNet (Identity Residual Network): IR-18, IR-34, IR-50, IR-100, IR-200
  • IR-100 on MS1M + ArcFace: LFW 99.83%, CFP-FP 98.74%, AgeDB-30 98.28%, IJB-B 96.21% TAR@FAR=1e-4
  • IR-100 on Glint360K: IJB-C 97.32% TAR@FAR=1e-4 (vs 96.21% on MS1M)
  • Embedding: 512-dim standard

Sub-center ArcFace

  • K sub-centers per class; robust to noisy web training data
  • Enables training on noisy web data such as Celeb500K
  • SOTA on IJB-B, IJB-C, MegaFace

GhostFaceNet

  • Lightweight: GhostNet modules + ArcFace loss
  • LFW: 99.73%, CFP-FP: 96.83%, AgeDB-30: 98.00%
  • Good accuracy/efficiency trade-off for mobile

EdgeFace

  • Architecture: EdgeNeXt hybrid (CNN + Transformer) with Low-Rank Linear (LoRaLin) modules
  • LoRaLin replaces each FC layer with two low-rank matrices
  • Parameters: 1.77M
  • LFW: 99.73%, IJB-B: 92.67%, IJB-C: 94.85%
  • Won the IJCB 2023 Efficient Face Recognition Competition (compact track, <2M params)
  • ONNX-exportable; targets edge/embedded devices
  • Reference: arXiv | GitHub | HuggingFace

NAS-based lightweight models

  • Neural Architecture Search (DARTS) on CASIA-WebFace
  • Multi-step knowledge distillation during training
  • Serve as lightweight baselines for comparison

MobileFaceNet / MobileNet / EfficientNet-B0

  • Common lightweight baselines
  • Suitable for mobile deployment

TransFace (ICCV 2023)

  • Key insight: ViTs are data-hungry and prone to overfitting even on large-scale face data
  • EHSM (Entropy-aware Hard Sample Mining): uses the information entropy of local tokens to weight hard vs easy samples
  • DPAP (Dominant Patch Amplitude Perturbation): randomly perturbs the amplitude of the top-K dominant patches for diversity
  • Achieves stable ViT training for face recognition
  • TransFace++ extends the approach to operate on raw image bytes
  • Reference: ICCV 2023 paper | GitHub

LVFace (ICCV 2025 Highlight) — ByteDance

  • Architecture: ViT backbone with Progressive Cluster Optimization (PCO)
  • PCO stages:
    1. Negative Class Sub-sampling (NCS): robust, fast alignment from initialization
    2. Feature Expectation Penalties: centroid stabilization
    3. Cluster Boundary Refinement: full-batch training without NCS constraints
  • Training: WebFace42M, 64 GPUs, AdamW optimizer
  • Results: SOTA, surpasses UniFace and TopoFR across multiple benchmarks
  • 1st place on the ICCV 2021 MFR-Ongoing leaderboard (academic track)
  • Reference: arXiv | GitHub

TopoFR (NeurIPS 2024)

  • Key idea: encodes the topological structure of the training data into the latent space using persistent homology
  • PTSA (Persistent Topology Structure Alignment): aligns the topological structures of the input space and the embedding space
  • SDE (Structure Damage Estimation): identifies hard samples by measuring structure damage
  • Addresses structure collapse caused by overfitting
  • 2nd place on the ICCV 2021 MFR-Ongoing leaderboard (as of May 2024)
  • Reference: arXiv | NeurIPS 2024 | GitHub

| Model | Backbone | Loss | LFW | CFP-FP | AgeDB-30 | IJB-B TAR@1e-4 | IJB-C TAR@1e-4 | Embed Dim |
|---|---|---|---|---|---|---|---|---|
| ArcFace | IR-50 | ArcFace | 99.77% | 98.27% | 97.90% | ~94.0% | ~96.0% | 512 |
| ArcFace | IR-100 (MS1M) | ArcFace | 99.83% | 98.74% | 98.28% | 96.21% | ~97.0% | 512 |
| ArcFace | IR-100 (Glint360K) | ArcFace | ~99.85% | ~98.9% | ~98.5% | ~96.5% | 97.32% | 512 |
| AdaFace | IR-101 | AdaFace | 99.82% | 98.49% | 98.05% | 96.03% | 97.39% | 512 |
| Sub-center ArcFace | R-100 | Sub-ArcFace | ~99.8% | ~98.5% | ~98.3% | – | – | 512 |
| UniTSFace | R-50 | USS+ArcFace | 99.53% | – | – | – | – | 512 |
| TransFace | ViT-B | ArcFace/EHSM | ~99.8% | ~98.7% | ~98.3% | ~96.5% | ~97.5% | 512 |
| TopoFR | R-100 | PTSA+SDE | ~99.85% | – | – | – | SOTA | 512 |
| LVFace | ViT | PCO | SOTA | SOTA | SOTA | SOTA | SOTA | 512 |
| EdgeFace | EdgeNeXt | ArcFace | 99.73% | – | – | 92.67% | 94.85% | 512 |
| GhostFaceNet | GhostNet | ArcFace | 99.73% | 96.83% | 98.00% | – | – | 512 |
| ElasticFace | R-100 | Elastic | ~99.8% | ~98.5% | ~98.3% | SOTA on 7/9 benchmarks | – | 512 |

Note: exact numbers vary by training data (MS1M vs Glint360K vs WebFace42M) and implementation. See original papers for definitive figures.

3.2 IJB-C / IJB-S Domain-Specific Analysis


From OODFace robustness study (2024):

  • AdaFace: best robust accuracy under appearance variations; best on long-range/remote domains
  • CosFace-IR / ArcFace-IR: tied top performers for overall clean accuracy (avg 97.17%)
  • AdaFace: highest average accuracy among open-source models across 10 appearance variation subcategories
  • Domain-specific trained models outperform zero-shot foundation models on all face benchmarks

LFW accuracy is near-saturated (>99.7% for top models); IJB-C TAR@FAR=1e-4 and IJB-S are more discriminative for SOTA comparisons.


| Dimension | Notes |
|---|---|
| 512 | Industry standard for ArcFace, AdaFace, IR-series; best accuracy for large models |
| 256 | Some architectures find 256 optimal (FN8 model); smaller storage footprint |
| 128 | FaceNet standard; ~87.9% on FaceNet tasks; information-sparse (64-dim may suffice) |
| 64 | Can accommodate most key face information (the embedding is sparse) |

Key insight: Increasing from 128→512 in FaceNet degraded accuracy; larger embeddings may need more training. For SOTA large models (IR-100, ViT), 512-dim is standard. For edge/mobile, 128-256 with distillation.
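The storage side of the dimension trade-off is simple arithmetic; a quick helper makes the comparison concrete (function name and defaults are my own, for illustration):

```python
def gallery_bytes(num_identities, dim=512, bytes_per_value=4):
    """Raw storage for a gallery of embeddings.

    bytes_per_value: 4 for float32, 2 for fp16, 1 for int8.
    """
    return num_identities * dim * bytes_per_value

# 1M identities: 512-dim float32 needs ~2.05 GB,
# while 128-dim fp16 needs only 256 MB (8x smaller)
```

At web scale (tens of millions of identities), this gap also translates directly into ANN-index memory and search latency, which is why edge and large-gallery systems lean toward smaller or quantized embeddings.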


  • PartialFC (InsightFace): sparse model-parallel architecture; dynamic subset sampling of class centers
  • Trains 10M+ identities on 8 GPUs vs ArcFace max ~1M identities
  • Achieves up to 29M identities (largest to date)
  • Maintains same accuracy while providing several-times faster training
  • Lower GPU memory utilization vs full softmax
  • Reference: Analytics Vidhya guide
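The core PartialFC trick is negative class sub-sampling: keep the class centers that appear in the batch, and sample only a fraction of the remaining negatives for the softmax. A toy single-GPU version (the real implementation shards centers across GPUs in model-parallel fashion; names here are my own):

```python
import numpy as np

def sample_class_centers(num_classes, labels, sample_rate=0.1, rng=None):
    """PartialFC-style sampling: all positive centers + random negatives.

    labels: class indices present in the current batch.
    Returns indices of the class centers to use in this step's softmax.
    """
    rng = rng or np.random.default_rng()
    positives = np.unique(labels)                       # must be kept
    num_sampled = max(int(num_classes * sample_rate), len(positives))
    negatives = np.setdiff1d(np.arange(num_classes), positives)
    extra = rng.choice(negatives, num_sampled - len(positives), replace=False)
    return np.concatenate([positives, extra])
```

With sample_rate=0.1, each step touches only 10% of the center matrix, which is what makes 10M+ identity softmax layers fit in GPU memory.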

  • FRCSyn Challenge (WACV 2024 / CVPR 2024): explores synthetic data for face recognition training
  • Addresses: data privacy, demographic bias, generalization to novel scenarios (age, pose, occlusion)
  • Task 1: demographic bias mitigation; Task 2: overall performance with synthetic data
  • A second edition, FRCSyn-onGoing, continues post-CVPR 2024
  • Winning solutions use diffusion-model-generated face images mixed with real data
  • Reference: FRCSyn CVPR 2024 | arXiv

7.1 NIST FRVT / FRTE Top Performers (2025)

| Rank | Vendor | Notable Result |
|---|---|---|
| #1 (1:N) | NEC | 0.07% error rate on a 12M-person database; #1 in aging tests (10+ and 12+ years) |
| #3 (1:1) | ROC | #1 Western vendor in 1:1 Verification and Investigative Search |
| Top 11 | Keyless | 99.93% accuracy on a 1.6M-identity database |
| Top-ranked | KBY-AI | Top global rank in FRVT 1:1 |
| Consistent | Innovatrics | Top performer in every NIST FRVT category |

Source: NIST FRVT | NEC press release | ROC announcement

  • InsightFace buffalo bundles: 5 pre-packaged ONNX model sets (40–150 MB); range from 1400 FPS (edge) to 350 FPS (best accuracy)
  • InspireFace C/C++ SDK (2024): supports ARM, x86, CUDA, OpenCL, RKNN backends
  • FaceONNX: complete face recognition + analytics library on ONNX Runtime
  • Common ONNX optimizations:
    • FP16 TensorRT: 1.8× extra FPS, <0.05% accuracy drop (buffalo_l)
    • INT8 quantization: 4× smaller model, ArcFace embedding error +0.02%
    • ONNX Runtime: ~3.2× speedup at batch=8 vs batch=1
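A minimal inference sketch for an ArcFace-style ONNX model: preprocessing follows the common InsightFace convention (aligned 112x112 BGR input scaled to [-1, 1], NCHW layout); the input-name lookup via `get_inputs()` assumes a single-input model. The `onnxruntime` call is shown inside the guarded block so the helpers stay importable without the package.

```python
import numpy as np

def preprocess(bgr_image):
    """Aligned 112x112 BGR uint8 face -> 1x3x112x112 float32 blob."""
    rgb = bgr_image[:, :, ::-1].astype(np.float32)  # BGR -> RGB
    rgb = (rgb - 127.5) / 127.5                     # scale to [-1, 1]
    return rgb.transpose(2, 0, 1)[None]             # HWC -> NCHW, add batch

def embed(session, bgr_image):
    """Run an ArcFace-style ONNX session; return the L2-normalized embedding."""
    blob = preprocess(bgr_image)
    out = session.run(None, {session.get_inputs()[0].name: blob})[0][0]
    return out / np.linalg.norm(out)

if __name__ == "__main__":
    import onnxruntime as ort
    # model path is a placeholder; use any ArcFace-style recognition model
    session = ort.InferenceSession("w600k_r50.onnx",
                                   providers=["CPUExecutionProvider"])
    face = np.zeros((112, 112, 3), dtype=np.uint8)  # stand-in aligned face
    print(embed(session, face).shape)
```

Normalizing the output embedding once at extraction time lets downstream comparison reduce to a plain dot product.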

| Model | ONNX-exportable | Notes |
|---|---|---|
| ArcFace / IResNet | Yes | InsightFace ships ONNX models |
| AdaFace | Yes | Standard PyTorch→ONNX |
| EdgeFace | Yes | Edge-optimized; ONNX on HuggingFace |
| GhostFaceNet | Yes | Lightweight CNN |
| MobileFaceNet | Yes | Mobile-first |
| TransFace / ViT-based | Yes (opset ≥14) | Self-attention exportable in recent opsets |
| LVFace | Likely yes | Standard PyTorch ViT |
| TopoFR | Yes (inference model) | Topology loss computed only at training time |

8. ViT vs CNN for Face Recognition: 2025 State

  • ViTs outperform CNNs in 13/15 performance evaluations including face recognition (when pretrained on large data)
  • Fine-tuned ViTs on large datasets (WebFace42M, Glint360K) beat ResNet and EfficientNet families
  • CNNs remain competitive in low-data regimes and resource-constrained deployment
  • Hybrid models (ConvNeXt, CoAtNet, EdgeNeXt) increasingly popular: combine CNN efficiency with ViT global context
  • ViT challenge: data-hungry; prone to overfitting face data → TransFace’s EHSM/DPAP and LVFace’s PCO address this
  • For edge deployment: lightweight CNNs (EdgeFace, GhostFaceNet, MobileFaceNet) still preferred

Maximum accuracy (server / cloud)

  • Loss: AdaFace or TopoFR (best robustness); ArcFace remains a strong baseline
  • Backbone: IR-100 or ViT with PCO/EHSM on WebFace42M or Glint360K
  • Training scale: PartialFC for >1M identity datasets
  • Expected: LFW ~99.8%+, IJB-C TAR@FAR=1e-4 ~97%+
Edge / mobile

  • EdgeFace: best <2M-param model (IJB-C 94.85%)
  • GhostFaceNet: comparable accuracy, GhostNet efficiency
  • MobileFaceNet: proven mobile baseline
  • Loss: ArcFace or AdaFace (both export to ONNX fine)
Noisy training data

  • Sub-center ArcFace: robust to label noise via K sub-centers
  • AdaFace: quality-adaptive emphasis avoids unidentifiable noisy images
Production serving

  • InsightFace + ONNX Runtime + TensorRT is the standard production stack
  • buffalo_l (best accuracy) vs buffalo_s (edge speed)
  • INT8 quantization viable with minimal accuracy loss
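The "minimal accuracy loss" claim for INT8 is easy to sanity-check on the embedding side: symmetric int8 quantization of an L2-normalized 512-dim embedding barely perturbs cosine similarity. This demo quantizes stored embeddings (a related but distinct step from quantizing model weights); the helper names are my own.

```python
import numpy as np

def quantize_int8(emb):
    """Symmetric per-vector int8 quantization of an L2-normalized embedding."""
    scale = float(np.abs(emb).max()) / 127.0  # map the largest value to +/-127
    q = np.round(emb / scale).astype(np.int8)
    return q, scale

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Quantized embeddings cut gallery storage 4x vs float32 while keeping match scores essentially unchanged, which is why int8 galleries are common behind ANN indexes.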