
Architecture

UFME (Universal Face Matching Engine) is an open-source biometric face-matching engine built for large-scale deployments. It uses a strict hexagonal architecture to decouple biometric algorithms from infrastructure, enabling high-throughput 1:N face search with quality assessment, presentation attack detection, and morphing attack detection.

Design targets: ~200 million face gallery, 60 million annual 1:N searches, sub-second end-to-end latency.

Matching a probe face against hundreds of millions of gallery entries in real time demands a system that solves several problems simultaneously:

  • Scale — Searching 200M+ vectors with sub-second latency rules out brute-force approaches. The index must compress vectors aggressively while preserving recall.
  • Quality — Input images vary from controlled passport photos to surveillance captures. Quality must be measured (not assumed) and the accept/reject decision must be a configurable policy, not embedded logic.
  • Security — Presentation attacks (print, replay, 3D mask, deepfake) and morphing attacks (blended document photos) must be detected before a template enters the gallery or produces a match.
  • Modularity — Detection models, extraction models, vector stores, and transport layers all evolve independently. Replacing any component must not require rewriting the pipeline.

UFME follows a strict Hexagonal Architecture (Ports and Adapters). The core domain defines Protocol interfaces for all operations; adapters implement those interfaces against concrete infrastructure.

             ┌─────────────────────────────────────┐
             │           Inbound Adapters          │
             │         REST / gRPC Gateways        │
             └──────────────────┬──────────────────┘
             ┌──────────────────▼──────────────────┐
             │          Biometric Pipeline         │
             │ Receive → PAD → Enrol Router → MAD →│
             │ Detect → Align → Quality → Extract →│
             │           Route → Respond           │
             └──────────────────┬──────────────────┘
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐
│    Core Domain    │ │   FAISS Cluster   │ │   ONNX Runtime    │
│  Pure functions   │ │   IVF-PQ shards   │ │   SCRFD + ViT     │
│   Frozen types    │ │   Event-sourced   │ │  AdaFace-trained  │
│    Split ports    │ │   EventLogPort    │ │  CPU / TensorRT   │
└───────────────────┘ └───────────────────┘ └───────────────────┘
  • Inbound Adapters — REST and gRPC gateways that accept biometric requests and translate them into domain operations.
  • Core Domain — Pure logic with zero external dependencies. Frozen dataclasses, Protocol ports, and pure functions for template operations, scoring, and orchestration.
  • Outbound Adapters — ONNX Runtime inference, FAISS vector search (via split ports: VectorSearchPort for ANN search, VectorLookupPort for 1:1 template fetch, VectorMutationPort for enrol/delete), OFIQ quality assessment, PAD, and MAD modules.
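The split ports described above can be sketched as `typing.Protocol` interfaces. The signatures below are illustrative, not the project's actual definitions:

```python
from __future__ import annotations
from typing import Protocol, Sequence, runtime_checkable

@runtime_checkable
class VectorSearchPort(Protocol):
    """ANN search: probe embedding in, top-K (subject_id, score) out."""
    def search(self, query: Sequence[float], k: int) -> list[tuple[str, float]]: ...

@runtime_checkable
class VectorLookupPort(Protocol):
    """1:1 template fetch for verification."""
    def fetch(self, subject_id: str) -> Sequence[float] | None: ...

@runtime_checkable
class VectorMutationPort(Protocol):
    """Enrol/delete; in the epochal model these append events rather than mutate."""
    def enrol(self, subject_id: str, template: Sequence[float]) -> None: ...
    def delete(self, subject_id: str) -> None: ...
```

An adapter (e.g. a FAISS-backed gRPC client) satisfies a port structurally, with no inheritance — consistent with the composition-over-inheritance principle below.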
| Layer | Technology | Role |
|---|---|---|
| Orchestration | Python | API layer, pipeline coordination, service orchestration |
| Hot paths | Rust | FAISS shard bindings, SIMD operations, compaction |
| Inference | ONNX Runtime | CPU (AVX-512) and optional GPU (TensorRT FP16/INT8) |
| Vector store | FAISS IndexIVFPQ | 200M vectors in ~12.5 GB RAM (32x PQ compression) |
| Detection | SCRFD_10G | 5-point landmark detection (95.2% WiderFace Easy) |
| Recognition | ViT + AdaFace | 512-dim L2-normalised embeddings, quality-adaptive margin |
| Quality | OFIQ | ISO/IEC 29794-5 compliant face image quality scoring |
| Anti-spoofing | Unified PAD (ViT) | Physical + digital attack detection (ISO 30107-3) |
| Morphing detection | CLIP + LoRA (MADation) | Document enrollment morphing attacks |
| Transport | gRPC (internal), REST (external) | Inter-service and gateway communication |
| Deployment | Docker, Kubernetes | Containerised microservices |

The architecture follows Rich Hickey’s “Simple Made Easy” philosophy — simplicity through decoupling, not through convenience.

  • Values over State — Immutable frozen dataclasses for templates and transaction records. No in-place mutation of indexes or templates.
  • Data over Objects — Templates are plain 512-dim float32 vectors. cosine_similarity(a, b), not template.match(other).
  • Queues over Direct Coupling — Pipeline stages are independent functions connected by async queues. Each stage takes a dict and returns a dict.
  • Composition over Inheritance — Pipeline stages compose via ports and adapters, never inherit.
  • Policy as Configuration — Thresholds, routing rules, and quality gates are configuration values, not embedded logic. Measurement and decision are always separated.
  • Epochal Time — FAISS indexes are immutable snapshots. Mutations are events that produce new index versions.
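The first two principles can be illustrated in a few lines. The names here are hypothetical; the real domain types are specified in docs/design/domain-design.md:

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Template:
    """Immutable value: a subject id plus a plain embedding vector."""
    subject_id: str
    vector: tuple[float, ...]   # 512-dim, L2-normalised in practice

def cosine_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Pure function over plain data -- no template.match(other) method."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Because `Template` is frozen, any attempt to mutate it in place raises, which is exactly the property the epochal time model relies on.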

The biometric pipeline processes images through a sequence of composable, dict-in/dict-out stages:

  1. Receive — Parse inbound payload into a plain dict. DELETE operations route directly to Route (bypassing the image pipeline).
  2. PAD — Unified presentation attack detection (physical + digital). Produces a spoof score; the gate decision is a separate configurable policy.
  3. Enrol Router — Routes ENROL operations to MAD, all others to Detect. Separates operation routing from spoof detection (single-concern gates).
  4. MAD — Morphing attack detection on the enrollment path only (CLIP + LoRA). Runs before Detect so morphed images are rejected before expensive face processing.
  5. Detect — SCRFD_10G locates the face and five landmark points.
  6. Align — Affine transform crops and aligns the face to a 112x112 pixel grid.
  7. Quality — OFIQ measures quality components (illumination, pose, focus, occlusion). The accept/reject gate is a separate policy.
  8. Extract — ViT produces a 512-dim L2-normalised embedding.
  9. Route — Dispatches to Search, Verify, Enrol, or Delete based on the operation type.
  10. Respond — Formats the result for the outbound transport.
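Stages 2 and 7 both separate measurement from decision. That split might look like the sketch below — the threshold names and default values are invented for illustration, not taken from the project's configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GatePolicy:
    """Thresholds live in configuration, never in the measuring stage."""
    pad_max_spoof_score: float = 0.30    # hypothetical default
    quality_min_score: float = 0.50      # hypothetical default

def pad_gate(payload: dict, policy: GatePolicy) -> dict:
    """Pure decision over a spoof score the PAD stage already produced."""
    passed = payload["spoof_score"] <= policy.pad_max_spoof_score
    return {**payload, "pad_passed": passed}

def quality_gate(payload: dict, policy: GatePolicy) -> dict:
    """Pure decision over a quality score the OFIQ stage already produced."""
    passed = payload["quality_score"] >= policy.quality_min_score
    return {**payload, "quality_passed": passed}
```

Swapping a threshold is then a configuration change, with no edits to the measuring stages.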

Each stage is a pure function with no shared mutable state. The pipeline runner manages a context dict (request metadata, routing, accumulated results) while each stage receives only the payload keys it requires — stages never see the full context. Stages are connected via async queues and run by a generic queue runner. No raw imagery is persisted to disk — images exist in volatile memory only during processing.
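A minimal sketch of such a queue runner, assuming synchronous stage functions and a `None` sentinel for shutdown (both are assumptions for illustration, not the project's actual contract):

```python
import asyncio

async def stage_runner(fn, inq: asyncio.Queue, outq: asyncio.Queue) -> None:
    """Generic runner: pull a payload dict, apply a pure stage, push the result."""
    while True:
        payload = await inq.get()
        if payload is None:              # sentinel: propagate shutdown downstream
            await outq.put(None)
            return
        await outq.put(fn(payload))

async def run_pipeline(stages, payloads):
    """Connect stage functions with asyncio.Queues and collect the outputs."""
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(stage_runner(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for payload in payloads:
        await queues[0].put(payload)
    await queues[0].put(None)            # end-of-stream
    results = []
    while (item := await queues[-1].get()) is not None:
        results.append(item)
    await asyncio.gather(*tasks)
    return results
```

Each stage stays a plain function of `dict -> dict`, so stages can be unit-tested without any queue machinery.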

See docs/design/pipeline-design.md for the full pipeline specification.

UFME uses a sharded, in-memory FAISS cluster with an epochal time model:

  • IndexIVFPQ — The 512-dim space is partitioned into Voronoi cells via k-means. Within each cell, vectors are compressed via Product Quantisation (M=64 sub-vectors, 8-bit codebook) from 2,048 bytes to 64 bytes per vector.
  • Scatter-gather search — A query is broadcast to all shards (4-6 nodes) via gRPC. Each shard returns local top-K results. The aggregator merges and reranks the top candidates against exact stored vectors, recovering Recall@1 to 97%+. Vectors cross the Python–Rust boundary as raw little-endian bytes in protobuf (not repeated float) for zero-overhead serialization. See docs/design/faiss-design.md §2 “Wire Format” for the full protobuf schema and accretion policy.
  • Epochal snapshots — Index mutations are captured as events via EventLogPort (append-only, ordered, with read-from-offset and GC support). New index versions are built from the event log, producing immutable snapshots. No in-place mutation of the live index.
  • Binning and filtering — Metadata-based pre-filtering via bitsets narrows the search space before distance computation. Logical partitions segment the gallery by use case.
  • Future scale — DiskANN for 500M+ vectors (SSD-backed, <5ms latency); CAGRA/cuVS for GPU-accelerated batch analytics (33-77x throughput).
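The aggregator's merge step can be sketched in pure Python: each shard returns its local top-K sorted by similarity (highest first), and the aggregator lazily merges the sorted streams before the exact rerank. The function name is illustrative:

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, k: int):
    """Merge per-shard top-K hit lists into a global top-K.

    Each element of shard_results is a list of (subject_id, similarity)
    pairs sorted by descending similarity, as returned by one shard.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, k))
```

`heapq.merge` exploits the fact that each shard's list is already sorted, so the aggregator never materialises more than it needs before the rerank stage.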

See docs/design/faiss-design.md for the full vector store specification.
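Based on the description above (append-only, ordered, read-from-offset, GC), an EventLogPort and a trivial in-memory adapter might look like this — the method names and signatures are assumptions, not the project's actual interface:

```python
from typing import Protocol, Sequence, runtime_checkable

@runtime_checkable
class EventLogPort(Protocol):
    """Append-only, ordered event log (illustrative signatures)."""
    def append(self, event: dict) -> int: ...             # returns the event's offset
    def read_from(self, offset: int) -> Sequence[dict]: ...
    def gc(self, up_to_offset: int) -> None: ...

class InMemoryEventLog:
    """Minimal adapter: a list as the ordered, append-only store."""
    def __init__(self) -> None:
        self._events: list[dict] = []
        self._base = 0   # offset of the first retained event after GC

    def append(self, event: dict) -> int:
        self._events.append(event)
        return self._base + len(self._events) - 1

    def read_from(self, offset: int) -> list[dict]:
        return self._events[max(0, offset - self._base):]

    def gc(self, up_to_offset: int) -> None:
        drop = max(0, up_to_offset - self._base)
        self._events = self._events[drop:]
        self._base += drop
```

Index builds replay `read_from(snapshot_offset)` to produce the next immutable snapshot; `gc` reclaims events already baked into a published snapshot.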

| Document | Description |
|---|---|
| docs/design/domain-design.md | Core domain model: frozen types, Protocol ports, pure functions |
| docs/design/pipeline-design.md | Pipeline architecture: composable dict-in/dict-out stages |
| docs/design/faiss-design.md | FAISS IVF-PQ: epochal snapshots, scatter-gather, event sourcing |
| docs/design/complecting-audit.md | Hickey "Simple Made Easy" analysis of the design |
| docs/research/executive-summary.md | SOTA gap analysis and recommendations (2026) |

| Document | Description |
|---|---|
| docs/research/sota-detection.md | Face detection SOTA (SCRFD, YOLO, etc.) |
| docs/research/sota-recognition.md | Face recognition SOTA (AdaFace, TopoFR, etc.) |
| docs/research/sota-vector-search.md | Vector search at 200M+ scale |
| docs/research/sota-pad-quality.md | PAD and quality assessment research |