Architecture
Overview
UFME (Universal Face Matching Engine) is an open source biometric face matching engine built for large-scale deployments. It uses a strict hexagonal architecture to decouple biometric algorithms from infrastructure, enabling high-throughput 1:N face search with quality assessment, presentation attack detection, and morphing attack detection.
Design targets: a gallery of ~200 million faces, 60 million 1:N searches per year, and sub-second end-to-end latency.
Problem Statement
Matching a probe face against hundreds of millions of gallery entries in real time demands a system that solves several problems simultaneously:
- Scale — Searching 200M+ vectors with sub-second latency rules out brute-force approaches. The index must compress vectors aggressively while preserving recall.
- Quality — Input images vary from controlled passport photos to surveillance captures. Quality must be measured (not assumed) and the accept/reject decision must be a configurable policy, not embedded logic.
- Security — Presentation attacks (print, replay, 3D mask, deepfake) and morphing attacks (blended document photos) must be detected before a template enters the gallery or produces a match.
- Modularity — Detection models, extraction models, vector stores, and transport layers all evolve independently. Replacing any component must not require rewriting the pipeline.
Architecture
UFME follows a strict Hexagonal Architecture (Ports and Adapters). The core domain defines Protocol interfaces for all operations; adapters implement those interfaces against concrete infrastructure.
               ┌─────────────────────────────────────┐
               │          Inbound Adapters           │
               │        REST / gRPC Gateways         │
               └──────────────┬──────────────────────┘
                              │
               ┌──────────────▼──────────────────────┐
               │         Biometric Pipeline          │
               │ Receive → PAD → Enrol Router → MAD →│
               │ Detect → Align → Quality → Extract →│
               │ Route → Respond                     │
               └──────────────┬──────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
┌─────────▼─────────┐ ┌───────▼───────────┐ ┌─────▼─────────────┐
│  Core Domain      │ │  FAISS Cluster    │ │  ONNX Runtime     │
│  Pure functions   │ │  IVF-PQ shards    │ │  SCRFD + ViT      │
│  Frozen types     │ │  Event-sourced    │ │  AdaFace-trained  │
│  Split ports      │ │  EventLogPort     │ │  CPU / TensorRT   │
└───────────────────┘ └───────────────────┘ └───────────────────┘

- Inbound Adapters — REST and gRPC gateways that accept biometric requests and translate them into domain operations.
- Core Domain — Pure logic with zero external dependencies. Frozen dataclasses, Protocol ports, and pure functions for template operations, scoring, and orchestration.
- Outbound Adapters — ONNX Runtime inference, FAISS vector search (via split ports: `VectorSearchPort` for ANN search, `VectorLookupPort` for 1:1 template fetch, `VectorMutationPort` for enrol/delete), OFIQ quality assessment, PAD, and MAD modules.
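The split-port idea can be illustrated with a minimal sketch using `typing.Protocol`. Only the three port names come from the text above; the method names, signatures, the `Hit` type, and the `InMemoryVectorStore` adapter are hypothetical and exist purely for illustration. The toy adapter mutates a dict for brevity, which the real epochal store would not do.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical type): gallery id plus similarity score."""
    template_id: str
    score: float


# Method names and signatures below are assumptions, not the project's API.
class VectorSearchPort(Protocol):
    def search(self, probe: Sequence[float], top_k: int) -> list[Hit]: ...


class VectorLookupPort(Protocol):
    def fetch(self, template_id: str) -> tuple[float, ...]: ...


class VectorMutationPort(Protocol):
    def enrol(self, template_id: str, vector: Sequence[float]) -> None: ...
    def delete(self, template_id: str) -> None: ...


class InMemoryVectorStore:
    """Toy adapter that satisfies all three ports structurally, without inheritance."""

    def __init__(self) -> None:
        self._vectors: dict[str, tuple[float, ...]] = {}

    def search(self, probe, top_k):
        # Brute-force inner product; a real adapter would query FAISS instead.
        hits = [Hit(tid, sum(p * v for p, v in zip(probe, vec)))
                for tid, vec in self._vectors.items()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]

    def fetch(self, template_id):
        return self._vectors[template_id]

    def enrol(self, template_id, vector):
        self._vectors[template_id] = tuple(vector)

    def delete(self, template_id):
        self._vectors.pop(template_id, None)


def verify(lookup: VectorLookupPort, template_id: str, probe: Sequence[float]) -> float:
    """1:1 verification depends only on the lookup port, not on search or mutation."""
    reference = lookup.fetch(template_id)
    return sum(p * r for p, r in zip(probe, reference))
```

Because `verify` is typed against `VectorLookupPort` alone, a caller can hand it any object with a compatible `fetch` method, which is the practical benefit of splitting the ports.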
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Orchestration | Python | API layer, pipeline coordination, service orchestration |
| Hot paths | Rust | FAISS shard bindings, SIMD operations, compaction |
| Inference | ONNX Runtime | CPU (AVX-512) and optional GPU (TensorRT FP16/INT8) |
| Vector store | FAISS IndexIVFPQ | 200M vectors in ~12.5 GB RAM (32x PQ compression) |
| Detection | SCRFD_10G | 5-point landmark detection (95.2% WiderFace Easy) |
| Recognition | ViT + AdaFace | 512-dim L2-normalised embeddings, quality-adaptive margin |
| Quality | OFIQ | ISO/IEC 29794-5 compliant face image quality scoring |
| Anti-spoofing | Unified PAD (ViT) | Physical + digital attack detection (ISO/IEC 30107-3) |
| Morphing detection | CLIP + LoRA (MADation) | Document enrollment morphing attacks |
| Transport | gRPC (internal), REST (external) | Inter-service and gateway communication |
| Deployment | Docker, Kubernetes | Containerised microservices |
Design Principles
The architecture follows Rich Hickey’s “Simple Made Easy” philosophy: simplicity through decoupling, not through convenience.
- Values over State — Immutable frozen dataclasses for templates and transaction records. No in-place mutation of indexes or templates.
- Data over Objects — Templates are plain 512-dim float32 vectors. `cosine_similarity(a, b)`, not `template.match(other)`.
- Queues over Direct Coupling — Pipeline stages are independent functions connected by async queues. Each stage takes a dict and returns a dict.
- Composition over Inheritance — Pipeline stages compose via ports and adapters, never inherit.
- Policy as Configuration — Thresholds, routing rules, and quality gates are configuration values, not embedded logic. Measurement and decision are always separated.
- Epochal Time — FAISS indexes are immutable snapshots. Mutations are events that produce new index versions.
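A few of these principles can be shown together in a short sketch: a frozen dataclass as an immutable value, a free function for scoring, and a decision threshold held in configuration. The names `Template`, `MatchPolicy`, and `decide`, and the threshold value, are illustrative assumptions; only `cosine_similarity(a, b)` is named in the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Immutable value (hypothetical shape): subject reference plus embedding."""
    subject_id: str
    vector: tuple[float, ...]


@dataclass(frozen=True)
class MatchPolicy:
    """Decision threshold kept as configuration; 0.5 is an illustrative value."""
    accept_threshold: float = 0.5


def cosine_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Measurement: a pure function over plain vectors, not a method on an object."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def decide(score: float, policy: MatchPolicy) -> bool:
    """Decision: separate from measurement, driven entirely by the policy value."""
    return score >= policy.accept_threshold
```

Changing the accept threshold then means constructing a different `MatchPolicy`, with no change to the scoring code.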
Pipeline
The biometric pipeline processes images through a sequence of composable, dict-in/dict-out stages:
- Receive — Parse inbound payload into a plain dict. DELETE operations route directly to Route (bypassing the image pipeline).
- PAD — Unified presentation attack detection (physical + digital). Produces a spoof score; the gate decision is a separate configurable policy.
- Enrol Router — Routes ENROL operations to MAD, all others to Detect. Separates operation routing from spoof detection (single-concern gates).
- MAD — Morphing attack detection on the enrollment path only (CLIP + LoRA). Runs before Detect so morphed images are rejected before expensive face processing.
- Detect — SCRFD_10G locates the face and five landmark points.
- Align — Affine transform crops and aligns the face to a 112x112 pixel grid.
- Quality — OFIQ measures quality components (illumination, pose, focus, occlusion). The accept/reject gate is a separate policy.
- Extract — ViT produces a 512-dim L2-normalised embedding.
- Route — Dispatches to Search, Verify, Enrol, or Delete based on the operation type.
- Respond — Formats the result for the outbound transport.
Each stage is a pure function with no shared mutable state. The pipeline runner manages a context dict (request metadata, routing, accumulated results) while each stage receives only the payload keys it requires — stages never see the full context. Stages are connected via async queues and run by a generic queue runner. No raw imagery is persisted to disk — images exist in volatile memory only during processing.
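The runner described above might look roughly like the following `asyncio` sketch. It is an assumption-laden illustration, not the project's runner: the stage table, the two placeholder stages, the `_DONE` sentinel, and all function names are hypothetical, and the real stages would call ONNX or OFIQ rather than return constants.

```python
import asyncio


def detect(payload: dict) -> dict:
    # Placeholder for a real detector; sees only the keys it declared.
    return {"face_box": (0, 0, 112, 112)}


def quality(payload: dict) -> dict:
    # Placeholder measurement only; the accept/reject gate lives elsewhere.
    return {"quality_score": 0.9}


# Hypothetical stage table: (name, keys the stage may read, pure function).
STAGES = [
    ("detect", ("image",), detect),
    ("quality", ("image", "face_box"), quality),
]

_DONE = object()  # sentinel that tells each runner to shut down


async def run_stage(required, fn, inbox, outbox):
    """Generic queue runner: pass only the required keys, merge the result."""
    while True:
        context = await inbox.get()
        if context is _DONE:
            await outbox.put(_DONE)
            return
        payload = {key: context[key] for key in required}
        # Build a new context dict instead of mutating the old one.
        await outbox.put({**context, **fn(payload)})


async def run_pipeline(requests: list[dict]) -> list[dict]:
    queues = [asyncio.Queue() for _ in range(len(STAGES) + 1)]
    tasks = [
        asyncio.create_task(run_stage(required, fn, queues[i], queues[i + 1]))
        for i, (_, required, fn) in enumerate(STAGES)
    ]
    for request in requests:
        await queues[0].put(request)
    await queues[0].put(_DONE)
    results = []
    while (item := await queues[-1].get()) is not _DONE:
        results.append(item)
    await asyncio.gather(*tasks)
    return results
```

Calling `asyncio.run(run_pipeline([{"request_id": "r1", "image": b"..."}]))` returns one context dict per request, carrying the request metadata alongside the keys each stage contributed.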
See docs/design/pipeline-design.md for the full pipeline specification.
Vector Store
UFME uses a sharded, in-memory FAISS cluster with an epochal time model:
- IndexIVFPQ — The 512-dim space is partitioned into Voronoi cells via k-means. Within each cell, vectors are compressed via Product Quantisation (M=64 sub-vectors, 8-bit codebook) from 2,048 bytes to 64 bytes per vector.
- Scatter-gather search — A query is broadcast to all shards (4-6 nodes) via gRPC. Each shard returns local top-K results. The aggregator merges and reranks the top candidates against exact stored vectors, recovering Recall@1 to 97%+. Vectors cross the Python–Rust boundary as raw little-endian `bytes` in protobuf (not `repeated float`) for zero-overhead serialization. See docs/design/faiss-design.md §2 “Wire Format” for the full protobuf schema and accretion policy.
- Epochal snapshots — Index mutations are captured as events via `EventLogPort` (append-only, ordered, with read-from-offset and GC support). New index versions are built from the event log, producing immutable snapshots. No in-place mutation of the live index.
- Binning and filtering — Metadata-based pre-filtering via bitsets narrows the search space before distance computation. Logical partitions segment the gallery by use case.
- Future scale — DiskANN for 500M+ vectors (SSD-backed, <5ms latency); CAGRA/cuVS for GPU-accelerated batch analytics (33-77x throughput).
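The gather and rerank steps, together with the raw-bytes wire encoding, can be sketched with the standard library alone. This is an illustration under stated assumptions: the function names and the `(score, template_id)` tuple layout are hypothetical, the shard calls are replaced by in-memory lists, and the actual schema lives in the referenced design document.

```python
import heapq
import struct


def encode_vector(vector: list[float]) -> bytes:
    """Pack floats as raw little-endian float32, as would fit a protobuf bytes field."""
    return struct.pack(f"<{len(vector)}f", *vector)


def decode_vector(raw: bytes) -> tuple[float, ...]:
    """Inverse of encode_vector: 4 bytes per float32 element."""
    return struct.unpack(f"<{len(raw) // 4}f", raw)


def merge_shard_results(shard_results, top_k):
    """Gather step: merge per-shard (approx_score, template_id) lists into a global top-K."""
    merged = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(top_k, merged)


def rerank(probe, candidates, exact_vectors):
    """Re-score candidates by inner product against the exact (uncompressed) vectors."""
    rescored = [
        (sum(p * v for p, v in zip(probe, exact_vectors[tid])), tid)
        for _, tid in candidates
    ]
    return sorted(rescored, reverse=True)
```

Reranking only the merged candidates keeps the exact-vector work proportional to top-K rather than to gallery size, which is why the approximate PQ scores can be corrected cheaply after the merge.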
See docs/design/faiss-design.md for the full vector store specification.
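As a rough illustration of the epochal model, the sketch below folds an append-only event log into successive read-only snapshots. Everything here is hypothetical except the idea itself: `InMemoryEventLog` stands in for an `EventLogPort` adapter with assumed method names, garbage collection is omitted, and a plain mapping replaces the FAISS index.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Event:
    kind: str                      # "enrol" or "delete" (assumed event kinds)
    template_id: str
    vector: tuple[float, ...] = ()


class InMemoryEventLog:
    """Append-only, ordered log with read-from-offset (GC omitted for brevity)."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset: int) -> tuple[Event, ...]:
        return tuple(self._events[offset:])


@dataclass(frozen=True)
class Snapshot:
    version: int                               # number of events folded in
    vectors: Mapping[str, tuple[float, ...]]   # read-only view of the gallery


EMPTY = Snapshot(version=0, vectors=MappingProxyType({}))


def build_snapshot(previous: Snapshot, log: InMemoryEventLog) -> Snapshot:
    """Fold events after previous.version into a new snapshot; previous is untouched."""
    vectors = dict(previous.vectors)  # copy, so the old snapshot stays valid
    events = log.read_from(previous.version)
    for event in events:
        if event.kind == "enrol":
            vectors[event.template_id] = event.vector
        elif event.kind == "delete":
            vectors.pop(event.template_id, None)
    return Snapshot(previous.version + len(events), MappingProxyType(vectors))
```

Searches that started against an older snapshot keep a consistent view while a newer version is being built, since no snapshot is ever modified after construction.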
Design Documentation
Architecture and Design
| Document | Description |
|---|---|
| docs/design/domain-design.md | Core domain model: frozen types, Protocol ports, pure functions |
| docs/design/pipeline-design.md | Pipeline architecture: composable dict-in/dict-out stages |
| docs/design/faiss-design.md | FAISS IVF-PQ: epochal snapshots, scatter-gather, event sourcing |
| docs/design/complecting-audit.md | Hickey “Simple Made Easy” analysis of the design |
| docs/research/executive-summary.md | SOTA gap analysis and recommendations (2026) |
Research
| Document | Description |
|---|---|
| docs/research/sota-detection.md | Face detection SOTA (SCRFD, YOLO, etc.) |
| docs/research/sota-recognition.md | Face recognition SOTA (AdaFace, TopoFR, etc.) |
| docs/research/sota-vector-search.md | Vector search at 200M+ scale |
| docs/research/sota-pad-quality.md | PAD and quality assessment research |