# Deployment
## Docker Compose (development)

The quickest way to run UFME locally with all components:
```sh
docker compose up
```

This starts:
- REST gateway on port 8080
- 2 FAISS shard processes on ports 50051 and 50052
- Shared volume for model files
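The topology above can be sketched as a compose file; the service names, image tags, and volume name here are illustrative assumptions, not the project's actual `docker-compose.yml`:

```yaml
# Hypothetical sketch of the described topology; names are assumed.
services:
  gateway:
    image: ufme/gateway:latest
    ports:
      - "8080:8080"        # REST gateway
    volumes:
      - models:/models     # shared model files
  shard-0:
    image: ufme/shard:latest
    ports:
      - "50051:50051"      # gRPC FAISS shard
    volumes:
      - models:/models
  shard-1:
    image: ufme/shard:latest
    ports:
      - "50052:50052"
    volumes:
      - models:/models

volumes:
  models:
```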
Build the images first:

```sh
make build
```

## Environment variables
Create a `.env` file in the project root:
```sh
UFME__QUALITY__MIN_SCORE=0.5
UFME__PAD__ENABLED=true
UFME__MAD__ENABLED=true
UFME__SERVER__PORT=8080
```

## Kubernetes (production)
Kubernetes manifests and a Helm chart are in `deploy/`.
### Quick deploy to local cluster (minikube / kind)

```sh
make deploy-local
```

This runs `kubectl apply -k deploy/`, which creates:
- `Deployment` — REST gateway (1 replica, scales horizontally)
- `Deployment` — FAISS shard (4 replicas, each holding a partition of the gallery)
- `Service` — ClusterIP for internal gRPC communication between gateway and shards
- `Service` — LoadBalancer on port 8080 for external REST access
- `ConfigMap` — pipeline configuration
- `PersistentVolumeClaim` — model storage
### Scaling shards

Each FAISS shard holds an independent partition of the gallery. Scatter-gather search fans out to all shards in parallel and merges the results. To scale:
```sh
kubectl scale deployment ufme-shard --replicas=6
```

Update `config/config.yaml` to add the new shard endpoints, then restart the gateway.
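The shard list might then look like the sketch below. The key names (`faiss.shards`, `host`, `port`) and the headless-service DNS names are assumptions for illustration; check the shipped `config/config.yaml` for the actual schema:

```yaml
# Hypothetical shard endpoint list; one entry per shard replica.
faiss:
  shards:
    - host: ufme-shard-0.ufme-shard
      port: 50051
    - host: ufme-shard-1.ufme-shard
      port: 50051
    # ... continue up to the new replica count
```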
### Resource requirements

| Component | CPU | Memory | Notes |
|---|---|---|---|
| Gateway | 2 cores | 2 GB | Scales horizontally |
| FAISS shard (50M vectors) | 4 cores | 8 GB | ~3.1 GB index + headroom |
| FAISS shard (200M vectors) | 8 cores | 24 GB | ~12.5 GB index + headroom |
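Translated into Kubernetes terms, the 50M-vector shard row above would correspond to a container spec along these lines (a sketch, not a shipped manifest):

```yaml
# Sketch: requests/limits for one FAISS shard holding ~50M vectors.
resources:
  requests:
    cpu: "4"
    memory: 8Gi    # ~3.1 GB index plus search/merge headroom
  limits:
    cpu: "4"
    memory: 8Gi
```

Setting requests equal to limits gives the shard a Guaranteed QoS class, which helps avoid eviction of memory-heavy index holders.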
### Health checks

The gateway exposes `GET /health` for Kubernetes liveness and readiness probes:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```

## Index persistence
FAISS indexes are persisted as epochal snapshots. Each snapshot consists of:
```
snapshots/
└── v1/
    ├── manifest.json  # metadata: ntotal, timestamp, shard config
    ├── index.faiss    # serialised FAISS IndexIVFPQ binary
    ├── vectors.bin    # raw float32 embeddings (for exact reranking)
    └── id_map.jsonl   # subject_id → sequential FAISS index mapping
```

To restore from a snapshot on startup, mount the snapshot directory as a volume and set:
```yaml
faiss:
  snapshot_dir: /data/snapshots
  load_on_startup: true
```

## Observability
UFME logs structured JSON to stdout. The recommended stack is Loki + Grafana, or any ELK-compatible pipeline.
Key log fields:
| Field | Description |
|---|---|
| `operation` | `search` / `verify` / `enrol` / `delete` |
| `latency_ms` | End-to-end request latency in milliseconds |
| `quality_score` | Face quality score (0–1) |
| `pad_score` | Spoof probability (0–1) |
| `shard_count` | Number of shards queried |
| `candidates_returned` | Number of top-K candidates in the response |
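A single search request would therefore produce a log line along these lines (the field values here are illustrative, not real output):

```json
{"operation": "search", "latency_ms": 42.7, "quality_score": 0.91, "pad_score": 0.03, "shard_count": 4, "candidates_returned": 10}
```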