Architecture
Overview
UFME (Universal Face Matching Engine) is an open source biometric face matching engine built for large-scale deployments. It uses a strict hexagonal architecture to decouple biometric algorithms from infrastructure, enabling high-throughput 1:N face search with quality assessment, presentation attack detection, and morphing attack detection.
Design targets: Multi-million face gallery, 60 million annual 1:N searches, sub-second end-to-end latency.
Problem Statement
Matching a probe face against hundreds of millions of gallery entries in real-time demands a system that solves several problems simultaneously:
- Scale — Searching millions of vectors with sub-second latency rules out brute-force approaches. The index must compress vectors aggressively while preserving recall.
- Quality — Input images vary from controlled passport photos to surveillance captures. Quality must be measured (not assumed) and the accept/reject decision must be a configurable policy, not embedded logic.
- Security — Presentation attacks (print, replay, 3D mask, deepfake) and morphing attacks (blended document photos) must be detected before a template enters the gallery or produces a match.
- Modularity — Detection models, extraction models, vector stores, and transport layers all evolve independently. Replacing any component must not require rewriting the pipeline.
Architecture
UFME follows a strict Hexagonal Architecture (Ports and Adapters). The core domain defines Protocol interfaces for all operations; adapters implement those interfaces against concrete infrastructure.
             ┌─────────────────────────────────────┐
             │          Inbound Adapters           │
             │        REST / gRPC Gateways         │
             └──────────────────┬──────────────────┘
                                │
             ┌──────────────────▼──────────────────┐
             │         Biometric Pipeline          │
             │  [SR] → Receive → Detect → Align →  │
             │  [Head Pose] → PAD → MAD →          │
             │  [Deepfake] → Quality → Extract →   │
             │  [Age] → [Attributes] → Route →     │
             │  Respond                            │
             └──────────────────┬──────────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐
│  Core Domain      │ │  FAISS Cluster    │ │  ONNX Runtime     │
│  Pure functions   │ │  IVF-PQ shards    │ │  SCRFD detection  │
│  Frozen types     │ │  Event-sourced    │ │  w600k_r50        │
│  Split ports      │ │  EventLogPort     │ │  CPU / TensorRT   │
└───────────────────┘ └───────────────────┘ └───────────────────┘

Stages in [brackets] are optional: they are wired into the pipeline only when the corresponding model file is present at startup.
- Inbound Adapters — REST and gRPC gateways that accept biometric requests and translate them into domain operations.
- Core Domain — Pure logic with zero external dependencies. Frozen dataclasses, Protocol ports, and pure functions for template operations, scoring, and orchestration.
- Outbound Adapters — ONNX Runtime inference, FAISS vector search (via split ports: VectorSearchPort for ANN search, VectorLookupPort for 1:1 template fetch, VectorMutationPort for enrol/delete), OFIQ quality assessment, PAD, MAD, age estimation, head pose, deepfake detection, face attributes, and super-resolution modules.
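As a rough illustration of the split-port idea, the sketch below declares the three vector ports as `typing.Protocol` classes and satisfies them with a toy in-memory adapter. The port names come from the list above; the method names (`search`, `lookup`, `enrol`, `delete`), the `Candidate` type, and `InMemoryVectorStore` are assumptions made for this example, not UFME's actual signatures.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class Candidate:
    """One search hit (hypothetical type): gallery id plus similarity score."""
    subject_id: str
    score: float


class VectorSearchPort(Protocol):
    """ANN search over the gallery (1:N). Method name is an assumption."""
    def search(self, probe: Sequence[float], top_k: int) -> list: ...


class VectorLookupPort(Protocol):
    """Fetch one stored template for 1:1 comparison. Method name is an assumption."""
    def lookup(self, subject_id: str) -> Optional[Sequence[float]]: ...


class VectorMutationPort(Protocol):
    """Enrol or delete templates. Method names are assumptions."""
    def enrol(self, subject_id: str, template: Sequence[float]) -> None: ...
    def delete(self, subject_id: str) -> None: ...


class InMemoryVectorStore:
    """Toy adapter: satisfies all three ports structurally, with no inheritance."""

    def __init__(self) -> None:
        self._templates = {}

    def enrol(self, subject_id, template):
        self._templates[subject_id] = tuple(template)

    def delete(self, subject_id):
        self._templates.pop(subject_id, None)

    def lookup(self, subject_id):
        return self._templates.get(subject_id)

    def search(self, probe, top_k):
        # Brute-force dot product; assumes L2-normalised vectors.
        scored = [
            Candidate(sid, sum(p * t for p, t in zip(probe, tpl)))
            for sid, tpl in self._templates.items()
        ]
        return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]


def identify(index: VectorSearchPort, probe, top_k=1):
    """Domain code depends only on the narrow port it actually needs."""
    return index.search(probe, top_k)


store = InMemoryVectorStore()
store.enrol("alice", [1.0, 0.0])
store.enrol("bob", [0.0, 1.0])
hits = identify(store, [1.0, 0.0])
```

Because `identify` is typed against `VectorSearchPort` alone, a read-only search replica could be passed in without exposing enrol or delete, which is the practical benefit of splitting the ports.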
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Orchestration | Python | API layer, pipeline coordination, service orchestration |
| Hot paths | Rust | FAISS shard bindings, SIMD operations, compaction |
| Inference | ONNX Runtime | CPU (AVX-512) and optional GPU (TensorRT FP16/INT8) |
| Vector store | FAISS IndexIVFPQ | Multi-million vectors with 32x PQ compression |
| Detection | SCRFD_10G | 5-point landmark detection (95.2% WiderFace Easy) |
| Recognition | w600k_r50 (ArcFace on WebFace600K) | 512-dim L2-normalised embeddings |
| Mask-aware recognition | w600k_mbf (ArcFace MobileFaceNet) | Optional; buffalo_sc, for occluded/masked faces |
| Quality | eDifFIQA(T) | ISO/IEC 29794-5 compliant face image quality scoring |
| Anti-spoofing | MiniFASNetV2 | Physical + digital attack detection (ISO/IEC 30107-3) |
| Morphing detection | HRNet-W18 (SelfMAD) | Document enrollment morphing attacks |
| Age estimation | InsightFace genderage.onnx | Age in years from aligned face crop; optional stage |
| Head pose | yakhyo ResNet-18 | Pitch/yaw/roll from rotation matrix; yaw gate rejects profiles |
| Deepfake detection | ViT-base (Deep-Fake-Detector-v2, quantised) | Binary genuine/deepfake classifier; optional stage |
| Face attributes | InsightFace genderage.onnx | Gender classification; optional stage |
| Super-resolution | Real-ESRGAN x4plus | 4x upscaling pre-detect; optional stage for low-res inputs |
| Transport | gRPC (internal), REST (external) | Inter-service and gateway communication |
| Deployment | Docker, Kubernetes, Terraform (AWS EKS) | Containerised microservices |
Design Principles
The architecture follows Rich Hickey’s “Simple Made Easy” philosophy — simplicity through decoupling, not through convenience.
- Values over State — Immutable frozen dataclasses for templates and transaction records. No in-place mutation of indexes or templates.
- Data over Objects — Templates are plain 512-dim float32 vectors. cosine_similarity(a, b), not template.match(other).
- Queues over Direct Coupling — Pipeline stages are independent functions connected by async queues. Each stage takes a dict and returns a dict.
- Composition over Inheritance — Pipeline stages compose via ports and adapters, never inherit.
- Policy as Configuration — Thresholds, routing rules, and quality gates are configuration values, not embedded logic. Measurement and decision are always separated.
- Epochal Time — FAISS indexes are immutable snapshots. Mutations are events that produce new index versions.
- Optional stages — Every non-core capability (age, head pose, deepfake, attributes, super-resolution) is wired in only when its model file is present. Absent model = stage omitted. No configuration flag required.
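Three of the principles above (values over state, data over objects, policy as configuration) can be shown together in a few lines. This is a minimal sketch: `MatchPolicy`, `decide`, and the threshold values are hypothetical placeholders rather than UFME's real types or tuned settings, and the vectors are 3-dimensional only to keep the example readable.

```python
import math
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class MatchPolicy:
    """Hypothetical policy record: the threshold is a config value, not code."""
    match_threshold: float


def cosine_similarity(a, b):
    """Plain function over plain vectors: data in, score out."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def decide(score, policy):
    """The decision is separate from the measurement and driven only by policy."""
    return score >= policy.match_threshold


probe = (1.0, 0.0, 0.0)
gallery = (0.6, 0.8, 0.0)

score = cosine_similarity(probe, gallery)      # measurement
strict = MatchPolicy(match_threshold=0.70)     # placeholder value
lenient = MatchPolicy(match_threshold=0.50)    # placeholder value

print(round(score, 2), decide(score, strict), decide(score, lenient))
```

The same score yields different outcomes under the two policies without any change to the measuring code, and attempting to assign to `strict.match_threshold` raises `FrozenInstanceError`.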
Pipeline
The biometric pipeline processes images through a sequence of composable, dict-in/dict-out stages. Required stages run on every request; optional stages (marked "yes" in the Optional column below) are present only when the corresponding ONNX model file exists at startup.
| # | Stage | Optional | Model |
|---|---|---|---|
| 1 | Receive | no | — |
| 2 | Super-resolution | yes | realesrgan_x4plus.onnx |
| 3 | Detect | no | det_10g.onnx |
| 4 | Align | no | — |
| 5 | Head pose | yes | head_pose_resnet18.onnx |
| 6 | PAD | no | MiniFASNetV2.onnx |
| 7 | MAD | no | mad_selfmad_hrnet_w18.onnx |
| 8 | Deepfake | yes | deepfake_vit_q.onnx |
| 9 | Quality | no | ediffiqa_tiny.onnx |
| 10 | Extract | no | w600k_r50.onnx |
| 11 | Age | yes | genderage.onnx |
| 12 | Face attributes | yes | genderage.onnx |
| 13 | Route | no | — |
| 14 | Respond | no | — |
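The wiring rule behind the stage table can be sketched as a small planning function that checks which model files exist. The snake_case stage names, the `plan_stages` helper, and the choice to raise an error when a required model is missing are all assumptions made for this illustration; the source only states that optional stages are omitted when their model file is absent.

```python
import tempfile
from pathlib import Path

# (stage name, model file or None, optional?) in the order given by the table.
STAGE_TABLE = [
    ("receive", None, False),
    ("super_resolution", "realesrgan_x4plus.onnx", True),
    ("detect", "det_10g.onnx", False),
    ("align", None, False),
    ("head_pose", "head_pose_resnet18.onnx", True),
    ("pad", "MiniFASNetV2.onnx", False),
    ("mad", "mad_selfmad_hrnet_w18.onnx", False),
    ("deepfake", "deepfake_vit_q.onnx", True),
    ("quality", "ediffiqa_tiny.onnx", False),
    ("extract", "w600k_r50.onnx", False),
    ("age", "genderage.onnx", True),
    ("face_attributes", "genderage.onnx", True),
    ("route", None, False),
    ("respond", None, False),
]


def plan_stages(model_dir):
    """Return the ordered stage names to wire, given the files in model_dir."""
    model_dir = Path(model_dir)
    planned = []
    for name, model_file, optional in STAGE_TABLE:
        present = model_file is None or (model_dir / model_file).is_file()
        if present:
            planned.append(name)
        elif not optional:
            # Assumption: a missing required model is treated as a startup error.
            raise FileNotFoundError(f"required model missing: {model_file}")
        # Optional stage without a model file: omitted, no flag needed.
    return planned


# Demo: a model directory holding every required model plus head pose only.
with tempfile.TemporaryDirectory() as tmp:
    for _, model_file, optional in STAGE_TABLE:
        if model_file and not optional:
            (Path(tmp) / model_file).touch()
    (Path(tmp) / "head_pose_resnet18.onnx").touch()
    planned = plan_stages(tmp)

print(planned)
```

Note that age and face attributes share `genderage.onnx` in the table, so under this scheme both stages appear or disappear together.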
Each stage is a pure function with no shared mutable state. The pipeline runner manages a context dict that accumulates results; each stage receives only the payload keys it declares — stages never see the full context. Stages are connected via async queues and run by a generic queue runner. Raw imagery is held in volatile memory only during pipeline processing and is never written to disk.
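One way to picture the runner described above is a chain of `asyncio` queues in which each worker hands its stage only the keys that stage declared. Everything here is a simplified sketch: the `stage` decorator, `run_stage`, `run_pipeline`, the `None` shutdown sentinel, and the stand-in `detect` and `align` functions are invented for illustration and do not reflect UFME's actual runner.

```python
import asyncio


def stage(*needs):
    """Attach the payload keys a stage declares; the runner passes only those."""
    def wrap(fn):
        fn.needs = needs
        return fn
    return wrap


@stage("image")
def detect(payload):
    # Stand-in for a detector: pretend the face box spans the whole input.
    return {"bbox": (0, 0, len(payload["image"]), 1)}


@stage("bbox")
def align(payload):
    # Records which keys it received, to show it never sees the full context.
    return {"aligned": f"crop{payload['bbox']}", "align_saw": sorted(payload)}


async def run_stage(fn, inbox, outbox):
    """Generic queue runner: pull a context, call the stage, push the result."""
    while True:
        context = await inbox.get()
        if context is None:                 # sentinel: pass shutdown downstream
            await outbox.put(None)
            return
        payload = {key: context[key] for key in fn.needs}
        # Build a new dict rather than mutating the incoming context.
        await outbox.put({**context, **fn(payload)})


async def run_pipeline(stages, requests):
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for request in requests:
        await queues[0].put(request)
    await queues[0].put(None)
    results = []
    while (item := await queues[-1].get()) is not None:
        results.append(item)
    await asyncio.gather(*workers)
    return results


results = asyncio.run(run_pipeline([detect, align], [{"image": b"abc"}]))
```

The sketch omits error handling: a stage that raises would stall the chain, so a real runner would need to catch failures and forward them as data.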
See Pipeline Reference for complete stage inputs, outputs, and gate logic.
Vector Store
UFME uses a sharded, in-memory FAISS cluster with an epochal time model:
- IndexIVFPQ — The 512-dim space is partitioned into Voronoi cells via k-means. Within each cell, vectors are compressed via Product Quantisation (M=64 sub-vectors, 8-bit codebook) from 2,048 bytes to 64 bytes per vector (32x compression).
- Scatter-gather search — A query is broadcast to all shards (4-6 nodes) via gRPC. Each shard returns local top-K results. The aggregator merges and reranks the top candidates against exact stored vectors, recovering Recall@1 to 97%+. Vectors cross the Python–Rust boundary as raw little-endian bytes in protobuf (not repeated float) for zero-overhead serialization.
- Epochal snapshots — Index mutations are captured as events via EventLogPort (append-only, ordered). New index versions are built from the event log, producing immutable snapshots. No in-place mutation of the live index.
- Binning and filtering — Metadata-based pre-filtering via bitsets narrows the search space before distance computation. Logical partitions segment the gallery by use case.
- Future scale — DiskANN for 500M+ vectors (SSD-backed, <5ms latency); CAGRA/cuVS for GPU-accelerated batch analytics (33-77x throughput).
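The gather and rerank steps from the list above can be sketched with the standard library alone. The helper names (`encode_vector`, `decode_vector`, `merge_top_k`, `rerank_exact`) and the toy scores are assumptions for illustration; the real aggregator talks to FAISS shards over gRPC, which this example replaces with hard-coded per-shard results.

```python
import heapq
import struct


def encode_vector(values):
    """Pack floats as raw little-endian float32 bytes (4 bytes per dimension)."""
    return struct.pack(f"<{len(values)}f", *values)


def decode_vector(raw):
    """Inverse of encode_vector."""
    return struct.unpack(f"<{len(raw) // 4}f", raw)


def merge_top_k(shard_results, k):
    """Gather step: merge per-shard (approx_score, subject_id) hits into a global top-k."""
    return heapq.nlargest(k, (hit for hits in shard_results for hit in hits))


def rerank_exact(probe, candidates, exact_vectors):
    """Rescore merged candidates against exact stored vectors.

    For L2-normalised vectors the dot product equals cosine similarity.
    """
    rescored = []
    for _, subject_id in candidates:
        stored = decode_vector(exact_vectors[subject_id])
        exact = sum(p * s for p, s in zip(probe, stored))
        rescored.append((exact, subject_id))
    return sorted(rescored, reverse=True)


# Toy data: approximate scores are distorted, as PQ-compressed distances can be.
probe = (1.0, 0.0)
exact_vectors = {
    "alice": encode_vector((1.0, 0.0)),
    "bob": encode_vector((0.8, 0.6)),
    "carol": encode_vector((0.0, 1.0)),
}
shard_results = [
    [(0.90, "bob"), (0.10, "carol")],   # shard 0 local top-K
    [(0.85, "alice")],                  # shard 1 local top-K
]

merged = merge_top_k(shard_results, k=2)
final = rerank_exact(probe, merged, exact_vectors)
full_size = len(encode_vector([0.0] * 512))   # uncompressed 512-dim template
```

In this toy case the approximate scores rank bob first, while the exact rerank restores alice to the top, which is the recall recovery the rerank step is meant to provide. A 512-dim float32 template encodes to 2,048 bytes, matching the uncompressed size quoted for the PQ compression ratio.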
See docs/design/faiss-design.md for the full vector store specification.
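The epochal snapshot model from this section can be illustrated by replaying an ordered event list into a read-only structure. The `Event` and `Snapshot` types, the `"enrol"`/`"delete"` event kinds, and `build_snapshot` are hypothetical stand-ins; a plain dict takes the place of a FAISS index, and the list `log` stands in for whatever sits behind `EventLogPort`.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; seq gives the total order in the log."""
    seq: int
    kind: str            # "enrol" or "delete" (assumed event kinds)
    subject_id: str
    template: tuple = ()


@dataclass(frozen=True)
class Snapshot:
    """Immutable index version, identified by the last event applied."""
    version: int
    templates: MappingProxyType   # read-only view of the replayed state


def build_snapshot(events):
    """Replay ordered events into a fresh snapshot; earlier snapshots are untouched."""
    state = {}
    version = 0
    for event in sorted(events, key=lambda e: e.seq):
        if event.kind == "enrol":
            state[event.subject_id] = event.template
        elif event.kind == "delete":
            state.pop(event.subject_id, None)
        version = event.seq
    return Snapshot(version, MappingProxyType(state))


log = [
    Event(1, "enrol", "alice", (1.0, 0.0)),
    Event(2, "enrol", "bob", (0.0, 1.0)),
]
v2 = build_snapshot(log)

log.append(Event(3, "delete", "alice"))   # append-only: history is never edited
v3 = build_snapshot(log)
```

After the delete event, `v3` no longer contains alice, while `v2` still does, so searches already running against the older version are unaffected by the mutation.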
Design Documentation
Architecture and Design
| Document | Description |
|---|---|
| Domain design | Core domain model: frozen types, Protocol ports, pure functions |
| Pipeline design | Pipeline architecture: composable dict-in/dict-out stages |
| FAISS design | FAISS IVF-PQ: epochal snapshots, scatter-gather, event sourcing |
| Complecting audit | Hickey “Simple Made Easy” analysis of the design |
| Research: executive summary | SOTA gap analysis and recommendations (2026) |
Research
| Document | Description |
|---|---|
| Face detection SOTA | SCRFD, YOLO, RetinaFace — comparison and rationale |
| Face recognition SOTA | AdaFace, TopoFR, ArcFace — comparison and rationale |
| Vector search SOTA | FAISS IVF-PQ, DiskANN, CAGRA at 200M+ scale |
| PAD and quality SOTA | Presentation attack detection and ISO quality assessment |