Kubernetes Deployment
UFME ships with production-ready Kubernetes manifests (deploy/k8s/) and a Helm chart (deploy/helm/) that parameterises the same resources. Use the raw manifests for simple fixed deployments; use Helm for environments that require per-environment value overrides.
Architecture
                   ┌──────────────────────────────────┐
                   │  ufme namespace                  │
                   │                                  │
    Ingress ──────►│  ufme-api (Deployment)           │
                   │  2–8 replicas, HPA               │
                   │  :8080 (HTTP, Prometheus scrape) │
                   │         │ gRPC                   │
                   │         ▼                        │
                   │  ufme-shard (StatefulSet)        │
                   │  ufme-shard-0 … ufme-shard-4     │
                   │  :50051, headless Service        │
                   │                                  │
                   │  ufme-compaction (CronJob)       │
                   │  runs every hour                 │
                   └──────────────────────────────────┘

Three workload types:

| Workload | Kind | Default replicas | Purpose |
|---|---|---|---|
| ufme-api | Deployment | 2 (HPA: 2–8) | REST gateway, pipeline orchestration |
| ufme-shard | StatefulSet | 5 | FAISS IVF-PQ index shards |
| ufme-compaction | CronJob | — (hourly) | Rebuilds index from event log |
Namespace
All resources live in the `ufme` namespace:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: ufme

Apply with `kubectl apply -f deploy/k8s/namespace.yaml`.
FAISS shards — StatefulSet
Shards use a StatefulSet because:

- Stable pod identity: shards are addressed as `ufme-shard-0.ufme-shard.ufme.svc` via the headless Service. The API gateway uses these stable DNS names to fan out gRPC requests.
- Ordered startup: shard-0 starts before shard-1, ensuring predictable index restoration on cluster restart.
- Per-pod PVCs: `volumeClaimTemplates` provisions one `ReadWriteOnce` PVC per shard pod, automatically named `index-data-ufme-shard-{n}`. Default size: 10 Gi (adjustable via `shard.storage.size` in Helm values).
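The naming rules in the list above can be sketched in a few lines of Python. This is an illustration only, not UFME's own client code: the function names are hypothetical, and the `cluster.local` suffix assumes the default cluster domain.

```python
def shard_endpoints(replicas: int = 5,
                    statefulset: str = "ufme-shard",
                    service: str = "ufme-shard",
                    namespace: str = "ufme",
                    port: int = 50051) -> list[str]:
    """Return one gRPC target per shard pod, in ordinal order.

    Assumes the default cluster domain (cluster.local).
    """
    return [
        f"{statefulset}-{n}.{service}.{namespace}.svc.cluster.local:{port}"
        for n in range(replicas)
    ]


def shard_pvc_names(replicas: int = 5,
                    claim_template: str = "index-data",
                    statefulset: str = "ufme-shard") -> list[str]:
    """PVCs created from volumeClaimTemplates are named <template>-<pod name>."""
    return [f"{claim_template}-{statefulset}-{n}" for n in range(replicas)]


if __name__ == "__main__":
    for endpoint, pvc in zip(shard_endpoints(), shard_pvc_names()):
        print(endpoint, pvc)
```

Because both names are derived from the pod ordinal, a shard that is rescheduled to another node keeps the same DNS name and re-attaches to the same PVC.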
Headless Service
    spec:
      clusterIP: None   # headless: DNS returns pod IPs directly
      selector:
        app.kubernetes.io/name: ufme-shard
      ports:
        - name: grpc
          port: 50051

A headless Service with `clusterIP: None` gives each pod a stable DNS A record (`ufme-shard-{n}.ufme-shard.ufme.svc.cluster.local`).
Resource profile
| Resource | Request | Limit |
|---|---|---|
| CPU | 1 core | 2 cores |
| Memory | 4 Gi | 6 Gi |
| Storage (PVC) | 10 Gi | — |
The default 6 Gi memory limit is appropriate for development gallery sizes. For production galleries of 40M+ vectors, increase the limit to accommodate the compressed IVF-PQ index (~2.56 GB) plus any reranking vectors.
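As a rough sanity check on the ~2.56 GB figure, the sketch below multiplies the vector count by the size of one PQ code. The 64 bytes per code is an assumption (for example, 64 sub-quantisers at 8 bits each); this page does not state the PQ parameters, and the estimate ignores per-vector IDs, coarse centroids, and codebooks.

```python
def ivfpq_code_bytes(num_vectors: int, bytes_per_code: int = 64) -> int:
    """Approximate size of the PQ codes alone.

    bytes_per_code=64 is an assumed value, not taken from UFME's config.
    """
    return num_vectors * bytes_per_code


if __name__ == "__main__":
    size = ivfpq_code_bytes(40_000_000)
    print(f"{size / 1e9:.2f} GB")     # 2.56 GB
    print(f"{size / 2**30:.2f} GiB")  # 2.38 GiB
```

Kubernetes memory limits are expressed in binary units (Gi), so compare against the GiB figure when sizing the limit, and leave headroom for whatever else the shard process holds in memory.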
Node affinity
Shards benefit from dedicated high-memory nodes. Use node selectors and tolerations to isolate shard pods:

    shard:
      nodeSelector:
        role: shard
      tolerations:
        - key: dedicated
          value: faiss-shard
          effect: NoSchedule

Label your shard nodes with `kubectl label node <node> role=shard` and apply the matching taint.
API gateway — Deployment
The API gateway runs as a standard Deployment with a HorizontalPodAutoscaler (HPA).
HPA configuration
    spec:
      minReplicas: 2
      maxReplicas: 8
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70

Scale-out triggers when average CPU utilisation across the Deployment's pods exceeds 70% of their CPU requests. The minimum of 2 replicas provides availability during rolling updates.
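To see how the 70% target translates into replica counts, the following sketch applies the core scaling formula from the Kubernetes HPA documentation, `ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the configured bounds. It is a simplification: the real controller also applies a tolerance band and stabilisation windows, which are omitted here, and the function name is illustrative.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float = 70,
                     min_replicas: int = 2,
                     max_replicas: int = 8) -> int:
    """Core HPA formula, clamped to [min_replicas, max_replicas].

    Utilisation values are percentages of the pods' CPU requests.
    """
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(2, 140))  # 4: load is double the target
    print(desired_replicas(4, 35))   # 2: load is half the target
    print(desired_replicas(8, 100))  # 8: capped at maxReplicas
```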
Resource profile
| Resource | Request | Limit |
|---|---|---|
| CPU | 500m | 2 cores |
| Memory | 1 Gi | 4 Gi |
Health checks
Both liveness and readiness probes call `GET /health`:

| Probe | Path | Initial delay | Period |
|---|---|---|---|
| Liveness | /health | 10 s | 30 s |
| Readiness | /health | 5 s | 10 s |
Prometheus metrics
The API pod is annotated for Prometheus scraping:

    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"
      prometheus.io/path: "/metrics"

Compaction — CronJob
The compaction worker rebuilds the FAISS index from the event log on a schedule:
    spec:
      schedule: "0 */1 * * *"       # every hour
      concurrencyPolicy: Forbid     # never two compactions simultaneously
      backoffLimit: 2
      activeDeadlineSeconds: 3600

The job mounts two PVCs shared with the API:

| PVC | Mount | Size | Purpose |
|---|---|---|---|
| ufme-event-log | /data/events | 20 Gi | Append-only enrolment/delete events |
| ufme-index-output | /data/index | 50 Gi | New index snapshots written here |
After building a new snapshot, the compaction binary calls the `SwapSnapshot` RPC on each shard, which atomically replaces the live index.
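The timing implied by the schedule and deadline settings above can be worked through with a short sketch. It assumes the job starts exactly on the cron tick, which ignores controller and scheduling delay; the function names are illustrative.

```python
from datetime import datetime, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Next firing time of "0 */1 * * *": minute 0 of the following hour."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


def latest_finish(started: datetime, active_deadline_secs: int = 3600) -> datetime:
    """Latest time a run may still be active before the deadline ends it."""
    return started + timedelta(seconds=active_deadline_secs)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 10, 15, 30)
    run = next_hourly_run(now)
    print(run)                 # 2024-01-01 11:00:00
    print(latest_finish(run))  # 2024-01-01 12:00:00
```

With a 3600 s deadline on an hourly schedule, a run that uses its full deadline ends right at the next tick. Under `concurrencyPolicy: Forbid`, a tick that arrives while the previous job is still active is skipped, so a shorter deadline may be worth considering if compaction runs approach an hour.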
Resource profile
| Resource | Request | Limit |
|---|---|---|
| CPU | 2 cores | 4 cores |
| Memory | 8 Gi | 16 Gi |
PodDisruptionBudgets
PodDisruptionBudgets ensure availability during node maintenance:

    # API: at least 1 replica always available
    minAvailable: 1

    # Shard: at least 1 shard always available
    minAvailable: 1

Applied via `deploy/k8s/pdbs.yaml`.
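To see what `minAvailable: 1` permits in practice, the sketch below computes how many pods may be voluntarily evicted at once: healthy pods minus `minAvailable`, floored at zero. The function name is illustrative, and the replica counts are the defaults listed on this page.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PodDisruptionBudget permits at this moment."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    print(allowed_disruptions(healthy_pods=2, min_available=1))  # API: 1
    print(allowed_disruptions(healthy_pods=5, min_available=1))  # shards: 4
    print(allowed_disruptions(healthy_pods=1, min_available=1))  # 0, drain blocks
```

With the default five shards, this budget lets a node drain evict up to four shard pods at the same time. Whether that is acceptable depends on how the gallery is distributed across shards, which this page does not specify.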
Helm chart
The Helm chart at `deploy/helm/` is functionally equivalent to the raw manifests but parameterised.
Install
    helm install ufme ./deploy/helm \
      --namespace ufme \
      --create-namespace \
      --values deploy/helm/values.yaml

Key values

    api:
      replicas: 2
      hpa:
        enabled: true
        minReplicas: 2
        maxReplicas: 8
        targetCPUUtilization: 70

    shard:
      replicas: 5
      storage:
        size: 10Gi
        storageClass: ""   # use default storage class
        s3Bucket: ""       # optional: S3-backed snapshot storage

    compaction:
      schedule: "0 */1 * * *"
      intervalSecs: 3600

    config:
      thresholds:
        pad: "0.85"
        mad: "0.75"
        quality: "0.40"
        similarity: "0.45"
      models:
        scrfd: "/models/scrfd_10g.onnx"
        adaface: "/models/w600k_r50.onnx"
        pad: "/models/MiniFASNetV2.onnx"
        mad: "/models/mad_selfmad_hrnet_w18.onnx"
        quality: "/models/ediffiqa_tiny.onnx"

Staging vs production
Override values per environment:

    # staging (2 shards, smaller resources)
    helm upgrade ufme ./deploy/helm \
      --namespace ufme \
      --set shard.replicas=2 \
      --set shard.storage.size=5Gi

    # production (5 shards, dedicated nodes, S3 snapshots)
    helm upgrade ufme ./deploy/helm \
      --namespace ufme \
      --set shard.replicas=5 \
      --set shard.storage.s3Bucket=my-ufme-snapshots \
      --set shard.storage.size=100Gi

Deploy checklist
- Build Docker images: `make build`
- Push images to your registry and update `image.repository` in `values.yaml`
- Apply namespace: `kubectl apply -f deploy/k8s/namespace.yaml`
- Apply PVCs: `kubectl apply -f deploy/k8s/pvcs.yaml`
- Apply ConfigMap: `kubectl apply -f deploy/k8s/configmap.yaml`
- Install Helm chart or apply raw manifests
- Verify shard pods are Running: `kubectl get pods -n ufme`
- Verify API is reachable: `kubectl port-forward svc/ufme-api 8080:8080 -n ufme`
- Load demo gallery: `curl -X POST http://localhost:8080/api/v1/demo/load`