# Deployment
## Docker Compose (development)

The quickest way to run UFME locally with all components:
```sh
docker compose up
```

This starts:
- REST gateway on port 8080
- 2 FAISS shard processes on ports 50051 and 50052
- Shared volume for model files
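The topology above can be sketched as a compose file; the service names, image tags, and volume name here are illustrative assumptions, not the project's actual `docker-compose.yml`:

```yaml
# Hypothetical sketch of the described topology; names are assumed.
services:
  gateway:
    image: ufme/gateway:latest
    ports:
      - "8080:8080"        # REST gateway
    volumes:
      - models:/models     # shared model files
  shard-0:
    image: ufme/shard:latest
    ports:
      - "50051:50051"      # gRPC FAISS shard
    volumes:
      - models:/models
  shard-1:
    image: ufme/shard:latest
    ports:
      - "50052:50052"
    volumes:
      - models:/models

volumes:
  models:
```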
Build the images first:

```sh
make build
```

## Environment variables
Create a `.env` file in the project root:
```sh
UFME__QUALITY__MIN_SCORE=0.5
UFME__PAD__ENABLED=true
UFME__MAD__ENABLED=true
UFME__SERVER__PORT=8080
```

## Kubernetes (production)
Kubernetes manifests and a Helm chart are in `deploy/`.
### Quick deploy to local cluster (minikube / kind)

```sh
make deploy-local
```

This runs `kubectl apply -k deploy/`, which creates:
- `Deployment` — REST gateway (1 replica, scales horizontally)
- `Deployment` — FAISS shard (4 replicas, each holding a partition of the gallery)
- `Service` — ClusterIP for internal gRPC communication between gateway and shards
- `Service` — LoadBalancer on port 8080 for external REST access
- `ConfigMap` — pipeline configuration
- `PersistentVolumeClaim` — model storage
### Scaling shards

Each FAISS shard holds an independent partition of the gallery. Scatter-gather search fans out to all shards in parallel and merges the results. To scale:
```sh
kubectl scale deployment ufme-shard --replicas=6
```

Update `config/config.yaml` to add the new shard endpoints, then restart the gateway.
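The shard list might then look like the sketch below. The key names (`faiss.shards`, `host`, `port`) and the headless-service DNS names are assumptions for illustration; check the shipped `config/config.yaml` for the actual schema:

```yaml
# Hypothetical shard endpoint list; one entry per shard replica.
faiss:
  shards:
    - host: ufme-shard-0.ufme-shard
      port: 50051
    - host: ufme-shard-1.ufme-shard
      port: 50051
    # ... continue up to the new replica count
```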
### Resource requirements

| Component | CPU | Memory | Notes |
|---|---|---|---|
| Gateway | 2 cores | 2 GB | Scales horizontally |
| FAISS shard (50M vectors) | 4 cores | 8 GB | ~3.1 GB index + headroom |
| FAISS shard (200M vectors) | 8 cores | 24 GB | ~12.5 GB index + headroom |
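Translated into Kubernetes terms, the 50M-vector shard row above would correspond to a container spec along these lines (a sketch, not a shipped manifest):

```yaml
# Sketch: requests/limits for one FAISS shard holding ~50M vectors.
resources:
  requests:
    cpu: "4"
    memory: 8Gi    # ~3.1 GB index plus search/merge headroom
  limits:
    cpu: "4"
    memory: 8Gi
```

Setting requests equal to limits gives the shard a Guaranteed QoS class, which helps avoid eviction of memory-heavy index holders.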
### Health checks

The gateway exposes `GET /health` for Kubernetes liveness and readiness probes:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```

## Index persistence
FAISS indexes are persisted as epochal snapshots. Each snapshot consists of:
```
snapshots/
└── v1/
    ├── manifest.json  # metadata: ntotal, timestamp, shard config
    ├── index.faiss    # serialised FAISS IndexIVFPQ binary
    ├── vectors.bin    # raw float32 embeddings (for exact reranking)
    └── id_map.jsonl   # subject_id → sequential FAISS index mapping
```

To restore from a snapshot on startup, mount the snapshot directory as a volume and set:
```yaml
faiss:
  snapshot_dir: /data/snapshots
  load_on_startup: true
```

## Observability
UFME logs structured JSON to stdout. The recommended stack is Loki + Grafana, or any ELK-compatible pipeline.
Key log fields:
| Field | Description |
|---|---|
| `operation` | `search` / `verify` / `enrol` / `delete` |
| `latency_ms` | End-to-end request latency in milliseconds |
| `quality_score` | Face quality score (0–1) |
| `pad_score` | Spoof probability (0–1) |
| `shard_count` | Number of shards queried |
| `candidates_returned` | Number of top-K candidates in the response |
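A single search request would therefore produce a log line along these lines (the field values here are illustrative, not real output):

```json
{"operation": "search", "latency_ms": 42.7, "quality_score": 0.91, "pad_score": 0.03, "shard_count": 4, "candidates_returned": 10}
```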