Architecture

UFME (Universal Face Matching Engine) is an open-source biometric face matching engine built for large-scale deployments. It uses a strict hexagonal architecture to decouple biometric algorithms from infrastructure, enabling high-throughput 1:N face search with quality assessment, presentation attack detection, and morphing attack detection.

Design targets: Multi-million face gallery, 60 million annual 1:N searches, sub-second end-to-end latency.

Matching a probe face against a gallery of millions of entries in real time demands a system that solves several problems simultaneously:

  • Scale — Searching millions of vectors with sub-second latency rules out brute-force approaches. The index must compress vectors aggressively while preserving recall.
  • Quality — Input images vary from controlled passport photos to surveillance captures. Quality must be measured (not assumed) and the accept/reject decision must be a configurable policy, not embedded logic.
  • Security — Presentation attacks (print, replay, 3D mask, deepfake) and morphing attacks (blended document photos) must be detected before a template enters the gallery or produces a match.
  • Modularity — Detection models, extraction models, vector stores, and transport layers all evolve independently. Replacing any component must not require rewriting the pipeline.

UFME follows a strict Hexagonal Architecture (Ports and Adapters). The core domain defines Protocol interfaces for all operations; adapters implement those interfaces against concrete infrastructure.

                ┌─────────────────────────────────────┐
                │          Inbound Adapters           │
                │        REST / gRPC Gateways         │
                └──────────────┬──────────────────────┘
                ┌──────────────▼──────────────────────┐
                │         Biometric Pipeline          │
                │ Receive → [SR] → Detect → Align →   │
                │ [Head Pose] → PAD → MAD →           │
                │ [Deepfake] → Quality → Extract →    │
                │ [Age] → [Attributes] → Route →      │
                │ Respond                             │
                └──────────────┬──────────────────────┘
          ┌────────────────────┼────────────────────┐
          │                    │                    │
┌─────────▼─────────┐  ┌───────▼───────────┐  ┌─────▼─────────────┐
│  Core Domain      │  │  FAISS Cluster    │  │  ONNX Runtime     │
│  Pure functions   │  │  IVF-PQ shards    │  │  SCRFD detection  │
│  Frozen types     │  │  Event-sourced    │  │  w600k_r50        │
│  Split ports      │  │  EventLogPort     │  │  CPU / TensorRT   │
└───────────────────┘  └───────────────────┘  └───────────────────┘

Stages in [brackets] are optional — they are wired into the pipeline only when the corresponding model file is present at startup.

  • Inbound Adapters — REST and gRPC gateways that accept biometric requests and translate them into domain operations.
  • Core Domain — Pure logic with zero external dependencies. Frozen dataclasses, Protocol ports, and pure functions for template operations, scoring, and orchestration.
  • Outbound Adapters — ONNX Runtime inference, FAISS vector search (via split ports: VectorSearchPort for ANN search, VectorLookupPort for 1:1 template fetch, VectorMutationPort for enrol/delete), OFIQ quality assessment, PAD, MAD, age estimation, head pose, deepfake detection, face attributes, and super-resolution modules.
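
The split-port idea can be illustrated with a minimal sketch. The port names (VectorSearchPort, VectorLookupPort, VectorMutationPort) come from the text above; the method names, the `Candidate` type, the `identify` function, and the in-memory adapter are assumptions made for illustration and are not taken from the UFME codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical search hit: subject id plus similarity score."""
    subject_id: str
    score: float


class VectorSearchPort(Protocol):
    def search(self, probe: Sequence[float], top_k: int) -> list[Candidate]: ...


class VectorLookupPort(Protocol):
    def lookup(self, subject_id: str) -> tuple[float, ...] | None: ...


class VectorMutationPort(Protocol):
    def enrol(self, subject_id: str, template: Sequence[float]) -> None: ...
    def delete(self, subject_id: str) -> None: ...


class InMemoryVectorStore:
    """Toy adapter that satisfies all three ports (a stand-in for a FAISS adapter).

    It mutates a dict in place for brevity, which the real design avoids.
    """

    def __init__(self) -> None:
        self._templates: dict[str, tuple[float, ...]] = {}

    def search(self, probe, top_k):
        # Dot product equals cosine similarity when vectors are L2-normalised.
        scored = [
            Candidate(sid, sum(p * t for p, t in zip(probe, tpl)))
            for sid, tpl in self._templates.items()
        ]
        return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]

    def lookup(self, subject_id):
        return self._templates.get(subject_id)

    def enrol(self, subject_id, template):
        self._templates[subject_id] = tuple(template)

    def delete(self, subject_id):
        self._templates.pop(subject_id, None)


def identify(search: VectorSearchPort, probe: Sequence[float]) -> Candidate | None:
    """Core-domain style function: it depends only on the one port it needs."""
    hits = search.search(probe, top_k=1)
    return hits[0] if hits else None
```

Because `identify` asks only for a search capability, a read-only search replica could be passed in without also exposing enrol or delete, which is one plausible motivation for splitting the ports.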
| Layer | Technology | Role |
| --- | --- | --- |
| Orchestration | Python | API layer, pipeline coordination, service orchestration |
| Hot paths | Rust | FAISS shard bindings, SIMD operations, compaction |
| Inference | ONNX Runtime | CPU (AVX-512) and optional GPU (TensorRT FP16/INT8) |
| Vector store | FAISS IndexIVFPQ | Multi-million vectors with 32x PQ compression |
| Detection | SCRFD_10G | 5-point landmark detection (95.2% WiderFace Easy) |
| Recognition | w600k_r50 (ArcFace on WebFace600K) | 512-dim L2-normalised embeddings |
| Mask-aware recognition | w600k_mbf (ArcFace MobileFaceNet) | Optional; buffalo_sc, for occluded/masked faces |
| Quality | eDifFIQA(T) | ISO/IEC 29794-5 compliant face image quality scoring |
| Anti-spoofing | MiniFASNetV2 | Physical + digital attack detection (ISO 30107-3) |
| Morphing detection | HRNet-W18 (SelfMAD) | Document enrollment morphing attacks |
| Age estimation | InsightFace genderage.onnx | Age in years from aligned face crop; optional stage |
| Head pose | yakhyo ResNet-18 | Pitch/yaw/roll from rotation matrix; yaw gate rejects profiles |
| Deepfake detection | ViT-base (Deep-Fake-Detector-v2, quantised) | Binary genuine/deepfake classifier; optional stage |
| Face attributes | InsightFace genderage.onnx | Gender classification; optional stage |
| Super-resolution | Real-ESRGAN x4plus | 4x upscaling pre-detect; optional stage for low-res inputs |
| Transport | gRPC (internal), REST (external) | Inter-service and gateway communication |
| Deployment | Docker, Kubernetes, Terraform (AWS EKS) | Containerised microservices |

The architecture follows Rich Hickey’s “Simple Made Easy” philosophy — simplicity through decoupling, not through convenience.

  • Values over State — Immutable frozen dataclasses for templates and transaction records. No in-place mutation of indexes or templates.
  • Data over Objects — Templates are plain 512-dim float32 vectors. cosine_similarity(a, b), not template.match(other).
  • Queues over Direct Coupling — Pipeline stages are independent functions connected by async queues. Each stage takes a dict and returns a dict.
  • Composition over Inheritance — Pipeline stages compose via ports and adapters, never inherit.
  • Policy as Configuration — Thresholds, routing rules, and quality gates are configuration values, not embedded logic. Measurement and decision are always separated.
  • Epochal Time — FAISS indexes are immutable snapshots. Mutations are events that produce new index versions.
  • Optional stages — Every non-core capability (age, head pose, deepfake, attributes, super-resolution) is wired in only when its model file is present. Absent model = stage omitted. No configuration flag required.
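
A few of these principles (values over state, data over objects, policy as configuration) can be sketched in a handful of lines. `cosine_similarity(a, b)` is the function named above; the `Template` and `MatchPolicy` types, the `decide` function, and the threshold value in the usage note are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Immutable value: assigning to a field raises FrozenInstanceError."""
    subject_id: str
    vector: tuple[float, ...]  # 512 floats in UFME; shorter vectors work the same way


def cosine_similarity(a, b) -> float:
    """Measurement: a pure function over plain vectors, with no decision inside."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass(frozen=True)
class MatchPolicy:
    """Decision policy held as data, e.g. loaded from configuration (hypothetical)."""
    match_threshold: float


def decide(score: float, policy: MatchPolicy) -> str:
    """Decision: applies the configured threshold to an already-measured score."""
    return "match" if score >= policy.match_threshold else "no_match"
```

For example, `decide(cosine_similarity(probe.vector, ref.vector), MatchPolicy(0.4))` keeps the threshold out of the scoring code, so tuning it is a configuration change rather than a code change. The value 0.4 is a placeholder, not a recommended operating point.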

The biometric pipeline processes images through a sequence of composable, dict-in/dict-out stages. Required stages run on every request; optional stages (shown in [brackets] in the architecture diagram) are present only when the corresponding ONNX model file exists at startup.

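| # | Stage | Optional | Model |
| --- | --- | --- | --- |
| 1 | Receive | no | - |
| 2 | Super-resolution | yes | realesrgan_x4plus.onnx |
| 3 | Detect | no | det_10g.onnx |
| 4 | Align | no | - |
| 5 | Head pose | yes | head_pose_resnet18.onnx |
| 6 | PAD | no | MiniFASNetV2.onnx |
| 7 | MAD | no | mad_selfmad_hrnet_w18.onnx |
| 8 | Deepfake | yes | deepfake_vit_q.onnx |
| 9 | Quality | no | ediffiqa_tiny.onnx |
| 10 | Extract | no | w600k_r50.onnx |
| 11 | Age | yes | genderage.onnx |
| 12 | Face attributes | yes | genderage.onnx |
| 13 | Route | no | - |
| 14 | Respond | no | - |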
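
Wiring optional stages by model-file presence might look roughly like the sketch below. The model filenames and stage order are taken from the table; the snake_case stage names, the `plan_stages` function, and the choice to raise when a required model is missing are assumptions, since the source does not say how a missing required model is handled.

```python
from __future__ import annotations

from pathlib import Path

# (stage name, model filename or None, optional?) in pipeline order.
STAGE_SPECS = [
    ("receive", None, False),
    ("super_resolution", "realesrgan_x4plus.onnx", True),
    ("detect", "det_10g.onnx", False),
    ("align", None, False),
    ("head_pose", "head_pose_resnet18.onnx", True),
    ("pad", "MiniFASNetV2.onnx", False),
    ("mad", "mad_selfmad_hrnet_w18.onnx", False),
    ("deepfake", "deepfake_vit_q.onnx", True),
    ("quality", "ediffiqa_tiny.onnx", False),
    ("extract", "w600k_r50.onnx", False),
    ("age", "genderage.onnx", True),
    ("face_attributes", "genderage.onnx", True),
    ("route", None, False),
    ("respond", None, False),
]


def plan_stages(model_dir: Path) -> list[str]:
    """Return the stage names to wire in, given the models found in model_dir."""
    planned = []
    for name, model, optional in STAGE_SPECS:
        if model is None or (model_dir / model).is_file():
            planned.append(name)
        elif not optional:
            # Assumption: a missing required model is a startup error.
            raise FileNotFoundError(f"required model for stage {name!r} not found: {model}")
        # An optional stage whose model file is absent is simply omitted.
    return planned
```

Since age and face attributes share genderage.onnx in the table, a single file enables both stages under this scheme.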

Each stage is a pure function with no shared mutable state. The pipeline runner manages a context dict that accumulates results; each stage receives only the payload keys it declares — stages never see the full context. Stages are connected via async queues and run by a generic queue runner. Raw imagery is held in volatile memory only during pipeline processing and is never written to disk.
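
As a rough illustration of the runner described above, the following sketch connects stages with `asyncio` queues and hands each stage only the keys it declares. The `Stage` type, the `needs` field, the sentinel-based shutdown, and the two stand-in stages are all hypothetical; the actual runner may differ.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Callable

_STOP = object()  # sentinel that tells a worker to shut down


@dataclass(frozen=True)
class Stage:
    name: str
    needs: tuple[str, ...]        # payload keys the stage declares
    fn: Callable[[dict], dict]    # pure dict-in/dict-out function


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Generic queue runner: the same loop drives every stage."""
    while True:
        context = await inbox.get()
        if context is _STOP:
            await outbox.put(_STOP)
            return
        payload = {key: context[key] for key in stage.needs}  # stage sees only these keys
        result = stage.fn(payload)
        await outbox.put({**context, **result})  # new context dict, no in-place mutation


async def run_pipeline(stages: list[Stage], requests: list[dict]) -> list[dict]:
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for request in requests:
        await queues[0].put(request)
    await queues[0].put(_STOP)

    results = []
    while (item := await queues[-1].get()) is not _STOP:
        results.append(item)
    await asyncio.gather(*workers)
    return results


# Hypothetical stand-in stages; real stages would call model adapters.
def detect(payload: dict) -> dict:
    return {"face_count": len(payload["image"])}


def route(payload: dict) -> dict:
    return {"decision": "search" if payload["face_count"] == 1 else "reject"}


STAGES = [Stage("detect", ("image",), detect), Stage("route", ("face_count",), route)]
```

Running `asyncio.run(run_pipeline(STAGES, [{"image": ["face"]}]))` pushes one request through both stages. Projecting the payload in the runner, rather than trusting each stage to ignore extra keys, is what keeps stages from depending on the full context.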

See Pipeline Reference for complete stage inputs, outputs, and gate logic.

UFME uses a sharded, in-memory FAISS cluster with an epochal time model:

  • IndexIVFPQ — The 512-dim space is partitioned into Voronoi cells via k-means. Within each cell, vectors are compressed via Product Quantisation (M=64 sub-vectors, 8-bit codebook) from 2,048 bytes to 64 bytes per vector (32x compression).
  • Scatter-gather search — A query is broadcast to all shards (4-6 nodes) via gRPC. Each shard returns local top-K results. The aggregator merges and reranks the top candidates against exact stored vectors, recovering Recall@1 to 97%+. Vectors cross the Python–Rust boundary as raw little-endian bytes in protobuf (not repeated float) for zero-overhead serialization.
  • Epochal snapshots — Index mutations are captured as events via EventLogPort (append-only, ordered). New index versions are built from the event log, producing immutable snapshots. No in-place mutation of the live index.
  • Binning and filtering — Metadata-based pre-filtering via bitsets narrows the search space before distance computation. Logical partitions segment the gallery by use case.
  • Future scale — DiskANN for 500M+ vectors (SSD-backed, <5ms latency); CAGRA/cuVS for GPU-accelerated batch analytics (33-77x throughput).
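
The compression arithmetic, the raw-bytes vector encoding, and the scatter-gather merge with exact reranking can be sketched with the standard library alone. Function names, the `(score, subject_id)` tuple layout, and the `rerank_pool` parameter are assumptions for illustration; in the real system the per-shard results would come from FAISS over gRPC rather than from in-process lists.

```python
from __future__ import annotations

import heapq
import struct

DIM = 512
RAW_BYTES = DIM * 4                   # float32 storage: 2,048 bytes per vector
PQ_BYTES = 64 * 8 // 8                # M=64 sub-vectors x 8-bit codes: 64 bytes
COMPRESSION = RAW_BYTES // PQ_BYTES   # 32x


def encode_vector(vec) -> bytes:
    """Pack floats as raw little-endian float32, suitable for a protobuf bytes field."""
    return struct.pack(f"<{len(vec)}f", *vec)


def decode_vector(buf: bytes) -> tuple[float, ...]:
    """Inverse of encode_vector: 4 bytes per element."""
    return struct.unpack(f"<{len(buf) // 4}f", buf)


def dot(a, b) -> float:
    # Equals cosine similarity when both vectors are L2-normalised.
    return sum(x * y for x, y in zip(a, b))


def merge_and_rerank(probe, shard_results, exact_vectors, top_k, rerank_pool):
    """Merge per-shard hits, then rescore the best ones against exact vectors.

    shard_results: one list of (approx_score, subject_id) per shard.
    exact_vectors: subject_id -> uncompressed vector used for reranking.
    """
    # Gather: keep the best rerank_pool candidates across all shards (approximate scores).
    pool = heapq.nlargest(rerank_pool, (hit for shard in shard_results for hit in shard))
    # Rerank: exact scores correct ordering errors introduced by PQ quantisation.
    rescored = [(dot(probe, exact_vectors[sid]), sid) for _, sid in pool]
    return heapq.nlargest(top_k, rescored)
```

The rerank pool is deliberately larger than the final top-K: a true best match that PQ distortion pushed to second or third place in the approximate ranking can still be recovered by the exact rescoring step.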

See docs/design/faiss-design.md for the full vector store specification.
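
The epochal snapshot model described in the list above can be pictured as a fold over an append-only event log. In this sketch a read-only mapping stands in for a FAISS index, and the `Event` and `Snapshot` types, their fields, and `build_snapshot` are hypothetical names rather than the EventLogPort interface itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Event:
    seq: int                        # position in the append-only, ordered log
    kind: str                       # "enrol" or "delete"
    subject_id: str
    vector: tuple[float, ...] = ()


@dataclass(frozen=True)
class Snapshot:
    epoch: int                      # seq of the last event folded into this snapshot
    templates: Mapping[str, tuple[float, ...]]


EMPTY = Snapshot(epoch=0, templates=MappingProxyType({}))


def build_snapshot(base: Snapshot, events: list[Event]) -> Snapshot:
    """Fold events newer than base.epoch into a new snapshot; base is left untouched."""
    templates = dict(base.templates)  # copy, so the live snapshot is never mutated
    epoch = base.epoch
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq <= base.epoch:
            continue                  # already reflected in the base snapshot
        if event.kind == "enrol":
            templates[event.subject_id] = event.vector
        elif event.kind == "delete":
            templates.pop(event.subject_id, None)
        epoch = event.seq
    return Snapshot(epoch=epoch, templates=MappingProxyType(templates))
```

Searches in flight keep reading the snapshot they started with, and a newly built snapshot can replace it in a single reference swap once it is ready.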

| Document | Description |
| --- | --- |
| Domain design | Core domain model: frozen types, Protocol ports, pure functions |
| Pipeline design | Pipeline architecture: composable dict-in/dict-out stages |
| FAISS design | FAISS IVF-PQ: epochal snapshots, scatter-gather, event sourcing |
| Complecting audit | Hickey “Simple Made Easy” analysis of the design |
| Research: executive summary | SOTA gap analysis and recommendations (2026) |

| Document | Description |
| --- | --- |
| Face detection SOTA | SCRFD, YOLO, RetinaFace: comparison and rationale |
| Face recognition SOTA | AdaFace, TopoFR, ArcFace: comparison and rationale |
| Vector search SOTA | FAISS IVF-PQ, DiskANN, CAGRA at 200M+ scale |
| PAD and quality SOTA | Presentation attack detection and ISO quality assessment |