Architecture

Overview

UFME (Universal Face Matching Engine) is an open-source biometric face matching engine built for large-scale deployments. It uses a strict hexagonal architecture to decouple biometric algorithms from infrastructure, enabling high-throughput 1:N face search with quality assessment, presentation attack detection (PAD), and morphing attack detection (MAD).

Design targets: a multi-million-face gallery, 60 million 1:N searches per year, and sub-second end-to-end latency.
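As a rough sanity check on the throughput target, the sketch below converts the annual search volume into an average rate. It assumes load is spread evenly over a 365-day year. `PEAK_FACTOR` is a hypothetical placeholder, not a UFME figure; real traffic is bursty.

```python
# Back-of-the-envelope conversion of the annual search target into a query rate.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s (leap years ignored)
PEAK_FACTOR = 10  # hypothetical peak-to-mean ratio, not a UFME figure


def mean_qps(searches_per_year: int) -> float:
    """Average 1:N searches per second if load were perfectly uniform."""
    return searches_per_year / SECONDS_PER_YEAR


avg = mean_qps(60_000_000)
print(f"mean: {avg:.2f} searches/s, assumed peak: {avg * PEAK_FACTOR:.0f} searches/s")
# prints "mean: 1.90 searches/s, assumed peak: 19 searches/s"
```

The average is under two searches per second. That suggests per-query latency over the full gallery, rather than aggregate throughput, is the harder constraint.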

Problem Statement

Matching a probe face against millions of gallery entries in real time (with a path to hundreds of millions) demands a system that solves several problems simultaneously:

Scale — Searching millions of vectors with sub-second latency rules out brute-force approaches. The index must compress vectors aggressively while preserving recall.
Quality — Input images vary from controlled passport photos to surveillance captures. Quality must be measured (not assumed) and the accept/reject decision must be a configurable policy, not embedded logic.
Security — Presentation attacks (print, replay, 3D mask, deepfake) and morphing attacks (blended document photos) must be detected before a template enters the gallery or produces a match.
Modularity — Detection models, extraction models, vector stores, and transport layers all evolve independently. Replacing any component must not require rewriting the pipeline.
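The sketch below puts the scale problem in numbers. It compares the bytes an exhaustive scan must read per query with the size of the Product Quantisation codes. It uses the 512-dim embeddings and the M=64, 8-bit PQ parameters from the Vector Store section. The 10-million gallery size is an assumption chosen for illustration.

```python
# Per-vector and per-gallery storage: exhaustive float32 scan vs PQ codes.
DIM = 512               # embedding dimension (w600k_r50)
FLOAT32_BYTES = 4
PQ_M, PQ_NBITS = 64, 8  # PQ sub-vectors and bits per sub-vector code

RAW_BYTES = DIM * FLOAT32_BYTES  # 2,048 bytes per raw embedding
PQ_BYTES = PQ_M * PQ_NBITS // 8  # 64 bytes per PQ code


def gallery_gb(n_vectors: int, bytes_per_vector: int) -> float:
    """Gallery size in gigabytes (10^9 bytes), excluding ids and index overhead."""
    return n_vectors * bytes_per_vector / 1e9


N = 10_000_000  # hypothetical gallery size
print(f"exhaustive scan reads {gallery_gb(N, RAW_BYTES):.2f} GB per query; "
      f"PQ codes total {gallery_gb(N, PQ_BYTES):.2f} GB ({RAW_BYTES // PQ_BYTES}x smaller)")
# prints "exhaustive scan reads 20.48 GB per query; PQ codes total 0.64 GB (32x smaller)"
```

With an IVF index, a query also scans only the codes in the probed cells rather than all of them, which reduces the per-query work further.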

Architecture

UFME follows a strict Hexagonal Architecture (Ports and Adapters). The core domain defines Protocol interfaces for all operations; adapters implement those interfaces against concrete infrastructure.

                ┌──────────────────────────────────────┐
                │         Inbound Adapters             │
                │      REST / gRPC Gateways            │
                └──────────────┬───────────────────────┘
                               │
                ┌──────────────▼───────────────────────┐
                │        Biometric Pipeline            │
                │  Receive → [SR] → Detect → Align →   │
                │  [Head Pose] → PAD → MAD →           │
                │  [Deepfake] → Quality → Extract →    │
                │  [Age] → [Attributes] → Route →      │
                │  Respond                             │
                └──────────────┬───────────────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
┌─────────▼─────────┐ ┌────────▼──────────┐ ┌───────▼───────────┐
│   Core Domain     │ │  FAISS Cluster    │ │  ONNX Runtime     │
│  Pure functions   │ │  IVF-PQ shards    │ │  SCRFD detection  │
│  Frozen types     │ │  Event-sourced    │ │  w600k_r50        │
│  Split ports      │ │  EventLogPort     │ │  CPU / TensorRT   │
└───────────────────┘ └───────────────────┘ └───────────────────┘

Stages in [brackets] are optional — they are wired into the pipeline only when the corresponding model file is present at startup.

Inbound Adapters — REST and gRPC gateways that accept biometric requests and translate them into domain operations.
Core Domain — Pure logic with zero external dependencies. Frozen dataclasses, Protocol ports, and pure functions for template operations, scoring, and orchestration.
Outbound Adapters — ONNX Runtime inference, FAISS vector search (via split ports: VectorSearchPort for ANN search, VectorLookupPort for 1:1 template fetch, VectorMutationPort for enrol/delete), OFIQ quality assessment, PAD, MAD, age estimation, head pose, deepfake detection, face attributes, and super-resolution modules.
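Below is a minimal sketch of the split vector ports as Python `typing.Protocol` classes. None of the following come from the UFME codebase; they are hypothetical:

- the method names (`search`, `fetch`, `enrol`, `delete`)
- the `SearchHit` type
- the toy `InMemoryVectorStore` adapter

The point is structural. A 1:N use case typed against `VectorSearchPort` alone cannot enrol or delete templates. Any adapter whose methods match a protocol satisfies it without inheriting from it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol, Sequence, runtime_checkable


@dataclass(frozen=True)
class SearchHit:
    template_id: str
    score: float


@runtime_checkable
class VectorSearchPort(Protocol):
    """ANN search only (hypothetical signature)."""
    def search(self, probe: Sequence[float], k: int) -> list[SearchHit]: ...


@runtime_checkable
class VectorLookupPort(Protocol):
    """1:1 template fetch by id (hypothetical signature)."""
    def fetch(self, template_id: str) -> Optional[tuple[float, ...]]: ...


@runtime_checkable
class VectorMutationPort(Protocol):
    """Enrol/delete (hypothetical signatures)."""
    def enrol(self, template_id: str, vector: Sequence[float]) -> None: ...
    def delete(self, template_id: str) -> None: ...


class InMemoryVectorStore:
    """Toy adapter: exact dot-product search over a dict, standing in for FAISS.

    It mutates its dict in place, unlike an event-sourced FAISS adapter.
    """

    def __init__(self) -> None:
        self._vectors: dict[str, tuple[float, ...]] = {}

    def search(self, probe: Sequence[float], k: int) -> list[SearchHit]:
        hits = [SearchHit(tid, sum(a * b for a, b in zip(probe, vec)))
                for tid, vec in self._vectors.items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:k]

    def fetch(self, template_id: str) -> Optional[tuple[float, ...]]:
        return self._vectors.get(template_id)

    def enrol(self, template_id: str, vector: Sequence[float]) -> None:
        self._vectors[template_id] = tuple(vector)

    def delete(self, template_id: str) -> None:
        self._vectors.pop(template_id, None)


def identify(index: VectorSearchPort, probe: Sequence[float], k: int = 5) -> list[SearchHit]:
    """A 1:N use case typed against the search port only."""
    return index.search(probe, k)
```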

Tech Stack

Layer	Technology	Role
Orchestration	Python	API layer, pipeline coordination, service orchestration
Hot paths	Rust	FAISS shard bindings, SIMD operations, compaction
Inference	ONNX Runtime	CPU (AVX-512) and optional GPU (TensorRT FP16/INT8)
Vector store	FAISS IndexIVFPQ	Multi-million vectors with 32x PQ compression
Detection	SCRFD_10G	5-point landmark detection (95.2% WiderFace Easy)
Recognition	w600k_r50 (ArcFace on WebFace600K)	512-dim L2-normalised embeddings
Mask-aware recognition	w600k_mbf (ArcFace MobileFaceNet)	Optional (from the buffalo_sc model pack); for occluded or masked faces
Quality	eDifFIQA(T)	ISO/IEC 29794-5 compliant face image quality scoring
Anti-spoofing	MiniFASNetV2	Physical + digital attack detection (ISO/IEC 30107-3)
Morphing detection	HRNet-W18 (SelfMAD)	Morphing attacks on document photos at enrolment
Age estimation	InsightFace genderage.onnx	Age in years from aligned face crop; optional stage
Head pose	yakhyo ResNet-18	Pitch/yaw/roll from rotation matrix; yaw gate rejects profile (side-view) faces
Deepfake detection	ViT-base (Deep-Fake-Detector-v2, quantised)	Binary genuine/deepfake classifier; optional stage
Face attributes	InsightFace genderage.onnx	Gender classification; optional stage
Super-resolution	Real-ESRGAN x4plus	4x upscaling pre-detect; optional stage for low-res inputs
Transport	gRPC (internal), REST (external)	Inter-service and gateway communication
Deployment	Docker, Kubernetes, Terraform (AWS EKS)	Containerised microservices

Design Principles

The architecture follows Rich Hickey’s “Simple Made Easy” philosophy — simplicity through decoupling, not through convenience.

Values over State — Immutable frozen dataclasses for templates and transaction records. No in-place mutation of indexes or templates.
Data over Objects — Templates are plain 512-dim float32 vectors. cosine_similarity(a, b), not template.match(other).
Queues over Direct Coupling — Pipeline stages are independent functions connected by async queues. Each stage takes a dict and returns a dict.
Composition over Inheritance — Pipeline stages compose via ports and adapters, never inherit.
Policy as Configuration — Thresholds, routing rules, and quality gates are configuration values, not embedded logic. Measurement and decision are always separated.
Epochal Time — FAISS indexes are immutable snapshots. Mutations are events that produce new index versions.
Optional Stages — Every non-core capability (age, head pose, deepfake, attributes, super-resolution) is wired in only when its model file is present. If the model is absent, the stage is omitted; no configuration flag is required.
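A short sketch can illustrate three of these principles: Values over State, Data over Objects, and Policy as Configuration. `Template`, `MatchPolicy`, `decide`, and the 0.4 threshold are hypothetical names and values. The sketch uses plain tuples rather than float32 arrays so that it runs with the standard library alone.

```python
from __future__ import annotations

import math
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Template:
    """Immutable value: an id plus an embedding (512 float32 values in production)."""
    subject_id: str
    vector: tuple[float, ...]


def cosine_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Pure function over plain data, instead of a template.match(other) method."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


@dataclass(frozen=True)
class MatchPolicy:
    """Decision thresholds as configuration (hypothetical default value)."""
    accept_threshold: float = 0.4


def decide(score: float, policy: MatchPolicy) -> bool:
    """Decision step, kept separate from the measurement that produced `score`."""
    return score >= policy.accept_threshold


probe = Template("probe", (1.0, 0.0))
candidate = Template("s-001", (0.6, 0.8))
score = cosine_similarity(probe.vector, candidate.vector)  # measurement
print(decide(score, MatchPolicy()), decide(score, MatchPolicy(accept_threshold=0.7)))
# prints "True False"

try:
    candidate.subject_id = "s-002"  # frozen dataclass: rebinding a field raises
except FrozenInstanceError:
    print("templates are immutable")
```

The same score yields different decisions under different policies. Changing the policy requires no change to the measurement code.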

Pipeline

The biometric pipeline processes images through a sequence of composable, dict-in/dict-out stages. Required stages run on every request; optional stages (Optional = yes in the table) are wired in only when the corresponding ONNX model file exists at startup.

#	Stage	Optional	Model
1	Receive	no	—
2	Super-resolution	yes	`realesrgan_x4plus.onnx`
3	Detect	no	`det_10g.onnx`
4	Align	no	—
5	Head pose	yes	`head_pose_resnet18.onnx`
6	PAD	no	`MiniFASNetV2.onnx`
7	MAD	no	`mad_selfmad_hrnet_w18.onnx`
8	Deepfake	yes	`deepfake_vit_q.onnx`
9	Quality	no	`ediffiqa_tiny.onnx`
10	Extract	no	`w600k_r50.onnx`
11	Age	yes	`genderage.onnx`
12	Face attributes	yes	`genderage.onnx`
13	Route	no	—
14	Respond	no	—

Each stage is a pure function with no shared mutable state. The pipeline runner manages a context dict that accumulates results; each stage receives only the payload keys it declares — stages never see the full context. Stages are connected via async queues and run by a generic queue runner. Raw imagery is held in volatile memory only during pipeline processing and is never written to disk.
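Here is a minimal asyncio sketch of a queue-connected runner. The names (`StageSpec`, `queue_runner`, `run_pipeline`), the shutdown sentinel, and the two stub stages are hypothetical. Each stage gets a payload built from only the keys it declares. Its output is merged into a new context dict rather than written into the old one.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Callable, Iterable

_DONE = object()  # sentinel that shuts the chain down in order


@dataclass(frozen=True)
class StageSpec:
    name: str
    inputs: tuple[str, ...]     # payload keys this stage declares
    fn: Callable[[dict], dict]  # pure dict-in/dict-out function


async def queue_runner(spec: StageSpec, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Generic runner: pull a context, call the stage on its declared keys, push a new context."""
    while True:
        context = await inbox.get()
        if context is _DONE:
            await outbox.put(_DONE)
            return
        payload = {key: context[key] for key in spec.inputs}  # never the full context
        result = spec.fn(payload)
        await outbox.put({**context, **result})  # new dict; upstream context untouched


async def run_pipeline(specs: list[StageSpec], requests: Iterable[dict]) -> list[dict]:
    queues = [asyncio.Queue() for _ in range(len(specs) + 1)]
    workers = [asyncio.create_task(queue_runner(spec, queues[i], queues[i + 1]))
               for i, spec in enumerate(specs)]
    for request in requests:
        await queues[0].put(dict(request))
    await queues[0].put(_DONE)
    results = []
    while (item := await queues[-1].get()) is not _DONE:
        results.append(item)
    await asyncio.gather(*workers)
    return results


# Usage with stub stages (real stages would call ONNX models).
stages = [
    StageSpec("detect", ("image",), lambda p: {"face_count": len(p["image"])}),
    StageSpec("quality", ("face_count",), lambda p: {"accepted": p["face_count"] == 1}),
]
print(asyncio.run(run_pipeline(stages, [{"image": [0.1]}, {"image": [0.1, 0.2]}])))
```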

See Pipeline Reference for complete stage inputs, outputs, and gate logic.

Vector Store

UFME uses a sharded, in-memory FAISS cluster with an epochal time model:

IndexIVFPQ — The 512-dim space is partitioned into Voronoi cells via k-means. Within each cell, each vector's residual (its offset from the cell centroid) is compressed via Product Quantisation (M=64 sub-vectors, 8 bits per sub-vector code, i.e. 256 centroids per sub-quantiser), shrinking it from 2,048 bytes (512 × float32) to 64 bytes (32x compression).
Scatter-gather search — A query is broadcast to all shards (4–6 nodes) via gRPC. Each shard returns its local top-K results. The aggregator merges them and reranks the top candidates against the exact stored vectors, recovering Recall@1 to 97%+. Vectors cross the Python–Rust boundary as raw little-endian float32 bytes in a protobuf `bytes` field (not `repeated float`), which avoids per-element encoding and decoding.
Epochal snapshots — Index mutations are captured as events via EventLogPort (append-only, ordered). New index versions are built from the event log, producing immutable snapshots. No in-place mutation of the live index.
Binning and filtering — Metadata-based pre-filtering via bitsets narrows the search space before distance computation. Logical partitions segment the gallery by use case.
Future scale — DiskANN for 500M+ vectors (SSD-backed, <5ms latency); CAGRA/cuVS for GPU-accelerated batch analytics (33-77x throughput).
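The sketch below covers the scatter, gather and rerank steps, plus the little-endian vector encoding. Shards are modelled as plain callables standing in for gRPC stubs. The names (`scatter_gather`, `rerank`, `encode_vector`, `decode_vector`) are hypothetical. Scores are inner products, which equal cosine similarity for L2-normalised embeddings.

```python
import heapq
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Mapping, Sequence, Tuple

Hit = Tuple[float, str]                              # (score, template id)
Shard = Callable[[Sequence[float], int], List[Hit]]  # stand-in for a gRPC shard stub


def encode_vector(vec: Sequence[float]) -> bytes:
    """Raw little-endian float32 bytes, as carried in a protobuf `bytes` field."""
    return struct.pack(f"<{len(vec)}f", *vec)


def decode_vector(buf: bytes) -> Tuple[float, ...]:
    return struct.unpack(f"<{len(buf) // 4}f", buf)


def scatter_gather(shards: Sequence[Shard], probe: Sequence[float],
                   k_local: int, k_merge: int) -> List[Hit]:
    """Send the probe to every shard in parallel, then merge the local top-K lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(shard, probe, k_local) for shard in shards]
        local_results = [future.result() for future in futures]
    return heapq.nlargest(k_merge, (hit for hits in local_results for hit in hits))


def rerank(candidates: Sequence[Hit], probe: Sequence[float],
           exact: Mapping[str, Sequence[float]], k: int) -> List[Hit]:
    """Rescore merged candidates with exact inner products against full-precision vectors."""
    rescored = [(sum(a * b for a, b in zip(probe, exact[tid])), tid) for _, tid in candidates]
    return sorted(rescored, reverse=True)[:k]
```

Take toy data where shard scores rank `a1` above `b1` but the exact vectors favour `b1`. The rerank step restores the correct order, which is the kind of approximation error that exact reranking is meant to correct.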

See docs/design/faiss-design.md for the full vector store specification.
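Epochal snapshots and bitset pre-filtering can be sketched with an in-memory log. `EventLog` is a hypothetical stand-in for `EventLogPort`. The event types, the `Snapshot` layout and the partition names are also assumptions. Each snapshot is a pure fold over a prefix of the log, so an older epoch stays valid while a newer one is built.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple, Union


@dataclass(frozen=True)
class Enrolled:
    template_id: str
    partition: str              # logical partition (use case), e.g. "border"
    vector: Tuple[float, ...]


@dataclass(frozen=True)
class Deleted:
    template_id: str


Event = Union[Enrolled, Deleted]


class EventLog:
    """Append-only, ordered in-memory log (hypothetical stand-in for EventLogPort)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events)  # sequence number = number of events so far

    def read(self, upto: int) -> Tuple[Event, ...]:
        return tuple(self._events[:upto])


@dataclass(frozen=True)
class Snapshot:
    epoch: int                        # how many log events were folded in
    ids: Tuple[str, ...]              # row order, as an index would store it
    partitions: Tuple[str, ...]
    vectors: Tuple[Tuple[float, ...], ...]

    def partition_bitset(self, partition: str) -> int:
        """Bit i is set when row i belongs to `partition` (a Python int used as a bitset)."""
        bits = 0
        for row, name in enumerate(self.partitions):
            if name == partition:
                bits |= 1 << row
        return bits


def build_snapshot(log: EventLog, epoch: int) -> Snapshot:
    """Fold the first `epoch` events into a new immutable snapshot."""
    live: Dict[str, Enrolled] = {}
    for event in log.read(epoch):
        if isinstance(event, Enrolled):
            live[event.template_id] = event
        else:
            live.pop(event.template_id, None)
    rows = list(live.values())
    return Snapshot(epoch,
                    tuple(r.template_id for r in rows),
                    tuple(r.partition for r in rows),
                    tuple(r.vector for r in rows))


def rows_in(bitset: int) -> Iterator[int]:
    """Yield the row numbers whose bits are set, i.e. the pre-filtered candidates."""
    row = 0
    while bitset:
        if bitset & 1:
            yield row
        bitset >>= 1
        row += 1
```

Building a second snapshot after a `Deleted` event leaves the first one untouched. In this model the newer snapshot would replace the older one as the live version.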

Design Documentation

Architecture and Design

Document	Description
Domain design	Core domain model: frozen types, Protocol ports, pure functions
Pipeline design	Pipeline architecture: composable dict-in/dict-out stages
FAISS design	FAISS IVF-PQ: epochal snapshots, scatter-gather, event sourcing
Complecting audit	Hickey “Simple Made Easy” analysis of the design
Research: executive summary	SOTA gap analysis and recommendations (2026)

Research

Document	Description
Face detection SOTA	SCRFD, YOLO, RetinaFace — comparison and rationale
Face recognition SOTA	AdaFace, TopoFR, ArcFace — comparison and rationale
Vector search SOTA	FAISS IVF-PQ, DiskANN, CAGRA at 200M+ scale
PAD and quality SOTA	Presentation attack detection and ISO quality assessment