Deployment

The quickest way to run UFME locally with all components:

docker compose up

This starts:

  • REST gateway on port 8080
  • 2 FAISS shard processes on ports 50051 and 50052
  • Shared volume for model files

If the images have not been built yet, build them before running docker compose up:

make build

Create a .env file in the project root:

UFME__QUALITY__MIN_SCORE=0.5
UFME__PAD__ENABLED=true
UFME__MAD__ENABLED=true
UFME__SERVER__PORT=8080
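
The double underscores in these variable names look like a nesting delimiter, so that UFME__QUALITY__MIN_SCORE would correspond to a min_score key under a quality section. That reading is an assumption, not something this page confirms, and the helper name nested_config below is hypothetical. A minimal sketch of such a mapping:

```python
def nested_config(environ, prefix="UFME__", delimiter="__"):
    """Build a nested dict from prefixed environment variables.

    Assumption: "__" separates nesting levels. Values are kept as strings;
    type conversion is left to whatever loads the configuration.
    """
    config = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split(delimiter)
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config


# The same four variables as in the .env file above.
env = {
    "UFME__QUALITY__MIN_SCORE": "0.5",
    "UFME__PAD__ENABLED": "true",
    "UFME__MAD__ENABLED": "true",
    "UFME__SERVER__PORT": "8080",
}
config = nested_config(env)
print(config["quality"]["min_score"])  # 0.5 (still a string)
```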

Kubernetes manifests and a Helm chart are in deploy/.

Quick deploy to local cluster (minikube / kind)

make deploy-local

This runs kubectl apply -k deploy/, which creates:

  • Deployment — REST gateway (1 replica, scales horizontally)
  • Deployment — FAISS shard (4 replicas, each holding a partition of the gallery)
  • Service — ClusterIP for internal gRPC communication between gateway and shards
  • Service — LoadBalancer on port 8080 for external REST access
  • ConfigMap — pipeline configuration
  • PersistentVolumeClaim — model storage

Each FAISS shard holds an independent partition of the gallery. Scatter-gather search fans out to all shards in parallel and merges results. To scale:

kubectl scale deployment ufme-shard --replicas=6

After scaling, add the new shard endpoints to config/config.yaml and restart the gateway so that it picks up the change.
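
The scatter-gather pattern described above can be sketched in a few lines. This is an illustration of the general technique, not UFME's gateway code: the shards are stand-in callables rather than gRPC clients, and the function names and the (distance, subject_id) result shape are assumptions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(shards, query, k):
    """Query every shard in parallel and merge the per-shard top-k lists.

    Assumption: each shard is a callable returning (distance, subject_id)
    pairs, where a smaller distance means a better match.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard(query, k), shards))
    all_hits = (hit for partial in partials for hit in partial)
    return heapq.nsmallest(k, all_hits)


def make_stub_shard(hits):
    """Hypothetical in-memory shard that ignores the query vector."""
    def search(query, k):
        return sorted(hits)[:k]
    return search


shards = [
    make_stub_shard([(0.12, "subj-a"), (0.40, "subj-b")]),
    make_stub_shard([(0.05, "subj-c"), (0.33, "subj-d")]),
]
top = scatter_gather(shards, query=None, k=3)
print(top)  # [(0.05, 'subj-c'), (0.12, 'subj-a'), (0.33, 'subj-d')]
```

Because every shard returns its own top-k, the merged top-k is exact with respect to the per-shard results regardless of how the gallery is partitioned.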

| Component | CPU | Memory | Notes |
| --- | --- | --- | --- |
| Gateway | 2 cores | 2 GB | Scales horizontally |
| FAISS shard (50M vectors) | 4 cores | 8 GB | ~3.1 GB index + headroom |
| FAISS shard (200M vectors) | 8 cores | 24 GB | ~12.5 GB index + headroom |
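
Both index sizes in the table work out to roughly 62.5 bytes per vector (12.5 GB / 200M, and about 3.1 GB / 50M). Assuming that ratio holds for other shard sizes and that GB means 10^9 bytes, a rough estimate for an intermediate shard could be computed as below. The constant is derived from the table, not from a measured UFME index, and the headroom in the Memory column is not modelled.

```python
# Derived from the table: 12.5 GB index / 200M vectors = 62.5 bytes per vector.
BYTES_PER_VECTOR = 62.5


def estimated_index_gb(num_vectors, bytes_per_vector=BYTES_PER_VECTOR):
    """Rough index size in GB (10^9 bytes), excluding any headroom."""
    return num_vectors * bytes_per_vector / 1e9


for n in (50_000_000, 100_000_000, 200_000_000):
    print(f"{n:>11,} vectors: ~{estimated_index_gb(n):.2f} GB index")
```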

The gateway exposes GET /health for Kubernetes liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
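
Outside Kubernetes, for example in a smoke test after docker compose up, the same endpoint can be polled from a script. The sketch below assumes only that GET /health answers with HTTP 200 once the gateway is ready; the function name and default timings are illustrative.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable or not healthy yet; retry after a pause
        time.sleep(interval_s)
    return False
```

With the local Compose setup this would be called as wait_until_healthy("http://localhost:8080/health").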

FAISS indexes are persisted as epochal snapshots. Each snapshot consists of:

snapshots/
└── v1/
    ├── manifest.json   # metadata: ntotal, timestamp, shard config
    ├── index.faiss     # serialised FAISS IndexIVFPQ binary
    ├── vectors.bin     # raw float32 embeddings (for exact reranking)
    └── id_map.jsonl    # subject_id → sequential FAISS index mapping
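
Before mounting a snapshot it can be worth checking that all four files are present. The following is a minimal sketch based only on the layout above. It assumes that snapshot directories are named v<N> and that a higher N is newer, which this page does not state; the function names are hypothetical.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("manifest.json", "index.faiss", "vectors.bin", "id_map.jsonl")


def latest_snapshot(snapshot_dir):
    """Return the highest-numbered v<N> directory, or None if there is none."""
    versions = [
        p for p in Path(snapshot_dir).iterdir()
        if p.is_dir() and p.name.startswith("v") and p.name[1:].isdigit()
    ]
    return max(versions, key=lambda p: int(p.name[1:]), default=None)


def check_snapshot(snapshot_path):
    """Return the parsed manifest, raising if a required file is missing."""
    snapshot_path = Path(snapshot_path)
    missing = [
        name for name in REQUIRED_FILES
        if not (snapshot_path / name).is_file()
    ]
    if missing:
        raise FileNotFoundError(f"{snapshot_path} is missing: {', '.join(missing)}")
    return json.loads((snapshot_path / "manifest.json").read_text())
```

The check only confirms that the files exist and that manifest.json parses; it does not verify that index.faiss, vectors.bin and id_map.jsonl agree with the ntotal recorded in the manifest.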

To restore from a snapshot on startup, mount the snapshot directory as a volume and set:

faiss:
  snapshot_dir: /data/snapshots
  load_on_startup: true

UFME logs structured JSON to stdout. Recommended stack: Loki + Grafana or any ELK-compatible pipeline.

Key log fields:

| Field | Description |
| --- | --- |
| operation | search / verify / enrol / delete |
| latency_ms | End-to-end request latency |
| quality_score | Face quality score (0–1) |
| pad_score | Spoof probability (0–1) |
| shard_count | Number of shards queried |
| candidates_returned | Top-K candidates in response |
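
Because each log line is a JSON object, ad hoc analysis is possible without a full logging stack. As a sketch, the snippet below computes the median latency_ms per operation from captured stdout. The sample lines are made up for illustration and show only two of the fields listed above; real UFME log lines may carry additional keys.

```python
import json
import statistics
from collections import defaultdict


def median_latency_by_operation(lines):
    """Return {operation: median latency_ms}, skipping non-JSON lines."""
    latencies = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray startup banner on stdout
        if not isinstance(record, dict):
            continue
        if "operation" in record and "latency_ms" in record:
            latencies[record["operation"]].append(record["latency_ms"])
    return {op: statistics.median(values) for op, values in latencies.items()}


# Made-up sample lines, not real UFME output.
sample = [
    '{"operation": "search", "latency_ms": 12.0}',
    '{"operation": "search", "latency_ms": 20.0}',
    '{"operation": "search", "latency_ms": 16.0}',
    '{"operation": "enrol", "latency_ms": 40.0}',
    'not a json line',
]
print(median_latency_by_operation(sample))  # {'search': 16.0, 'enrol': 40.0}
```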