Kubernetes Deployment

UFME ships with production-ready Kubernetes manifests (deploy/k8s/) and a Helm chart (deploy/helm/) that parameterises the same resources. Use the raw manifests for simple fixed deployments; use Helm for environments that require per-environment value overrides.

Architecture

                  ┌──────────────────────────────────┐
                  │  ufme namespace                  │
                  │                                  │
   Ingress ──────►│  ufme-api (Deployment)           │
                  │  2–8 replicas, HPA               │
                  │  :8080 (HTTP, Prometheus scrape) │
                  │         │ gRPC                   │
                  │         ▼                        │
                  │  ufme-shard (StatefulSet)        │
                  │  ufme-shard-0 … ufme-shard-4     │
                  │  :50051, headless Service        │
                  │                                  │
                  │  ufme-compaction (CronJob)       │
                  │  runs every hour                 │
                  └──────────────────────────────────┘

Three workload types:

Workload	Kind	Default replicas	Purpose
`ufme-api`	Deployment	2 (HPA: 2–8)	REST gateway, pipeline orchestration
`ufme-shard`	StatefulSet	5	FAISS IVF-PQ index shards
`ufme-compaction`	CronJob	— (hourly)	Rebuilds index from event log

Namespace

All resources live in the ufme namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: ufme

Apply with kubectl apply -f deploy/k8s/namespace.yaml.

FAISS shards — StatefulSet

Shards use a StatefulSet because:

Stable pod identity — shards are addressed as ufme-shard-0.ufme-shard.ufme.svc via the headless Service. The API gateway uses these stable DNS names to fan out gRPC requests.
Ordered startup — shard-0 starts before shard-1, ensuring predictable index restoration on cluster restart.
Per-pod PVCs — volumeClaimTemplates provisions one ReadWriteOnce PVC per shard pod, automatically named index-data-ufme-shard-{n}. Default size: 10 Gi (adjustable via shard.storage.size in Helm values).
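
The three properties above map onto a handful of StatefulSet fields. The following is a minimal sketch rather than the shipped manifest: the image reference, container name, and mount path are placeholders chosen for illustration, while the names, port, replica count, and storage size come from this page.

```yaml
# Sketch only: image, container name and mount path are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ufme-shard
  namespace: ufme
spec:
  serviceName: ufme-shard              # headless Service that provides per-pod DNS names
  replicas: 5
  podManagementPolicy: OrderedReady    # Kubernetes default: shard-0 first, then shard-1, ...
  selector:
    matchLabels:
      app.kubernetes.io/name: ufme-shard
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ufme-shard
    spec:
      containers:
        - name: shard                                      # hypothetical container name
          image: registry.example.com/ufme-shard:latest    # placeholder image
          ports:
            - name: grpc
              containerPort: 50051
          volumeMounts:
            - name: index-data
              mountPath: /var/lib/ufme/index               # assumed mount path
  volumeClaimTemplates:
    - metadata:
        name: index-data               # yields PVCs named index-data-ufme-shard-{n}
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

PVCs created from volumeClaimTemplates are not deleted when the StatefulSet is scaled down or removed by default, so a shard that comes back with the same ordinal reattaches to its existing volume.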

Headless Service

spec:
  clusterIP: None          # headless — DNS returns pod IPs directly
  selector:
    app.kubernetes.io/name: ufme-shard
  ports:
    - name: grpc
      port: 50051

Because the StatefulSet names this headless Service (clusterIP: None) as its governing service, each pod gets a stable DNS A record (ufme-shard-{n}.ufme-shard.ufme.svc.cluster.local).

Resource profile

Resource	Request	Limit
CPU	1 core	2 cores
Memory	4 Gi	6 Gi
Storage (PVC)	10 Gi	—

The default 6 Gi memory limit suits development-sized galleries. For production galleries of 40M+ vectors, raise it so that the compressed IVF-PQ index (~2.56 GB) and any reranking vectors held in memory fit with headroom.
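
Expressed as a container spec fragment, the defaults in the table would look roughly like the sketch below. The ~2.56 GB figure is consistent with 40M vectors at 64 bytes per PQ code (40,000,000 × 64 B = 2.56 GB); the 64-byte code size is an inference from the arithmetic, not something this page states.

```yaml
# Fragment of the shard container spec; values mirror the resource table.
resources:
  requests:
    cpu: "1"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 6Gi    # development default; raise for production-sized galleries
```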

Node affinity

Shards benefit from dedicated high-memory nodes. Use node selectors and tolerations to isolate shard pods:

shard:
  nodeSelector:
    role: shard
  tolerations:
    - key: dedicated
      value: faiss-shard
      effect: NoSchedule

Label your shard nodes with kubectl label node <node> role=shard and apply the matching taint with kubectl taint nodes <node> dedicated=faiss-shard:NoSchedule.

API gateway — Deployment

The API gateway runs as a standard Deployment with a HorizontalPodAutoscaler.

HPA configuration

spec:
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Scale-out triggers when average CPU utilisation across the API pods exceeds 70% of their CPU request. The minimum of 2 replicas keeps the API available during rolling updates.
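
For context, the fragment above sits inside a complete HorizontalPodAutoscaler object along these lines. This is a sketch: the HPA's own name is an assumption, and the target is taken to be a Deployment called ufme-api, as in the workload table.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ufme-api               # assumed object name
  namespace: ufme
spec:
  scaleTargetRef:              # the workload whose replica count is managed
    apiVersion: apps/v1
    kind: Deployment
    name: ufme-api
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request, averaged over pods
```

Utilization targets are computed against each container's CPU request, so the HPA cannot evaluate this metric for pods that lack one.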

Resource profile

Resource	Request	Limit
CPU	500m	2 cores
Memory	1 Gi	4 Gi

Health checks

Both liveness and readiness probes call GET /health:

Probe	Path	Initial delay	Period
Liveness	`/health`	10 s	30 s
Readiness	`/health`	5 s	10 s
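
In the container spec, the probe table translates to something like the following sketch. The path, port, delays, and periods come from this page; everything else is left at Kubernetes defaults.

```yaml
# Fragment of the ufme-api container spec.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

A failing readiness probe removes the pod from Service endpoints, while a failing liveness probe restarts the container.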

Prometheus metrics

The API pod is annotated for Prometheus scraping:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"
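
For annotation-based pod discovery, these keys need to sit on the pod template rather than on the Deployment's own metadata. A sketch of the placement follows; the ufme-api label value is an assumption.

```yaml
# Fragment of the ufme-api Deployment.
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ufme-api   # assumed label
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
```

The prometheus.io/* annotations are a convention, not a Kubernetes feature: they take effect only if the Prometheus scrape configuration contains relabelling rules that read them.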

Compaction — CronJob

The compaction worker rebuilds the FAISS index from the event log on a schedule:

spec:
  schedule: "0 */1 * * *"    # every hour
  concurrencyPolicy: Forbid  # never two compactions simultaneously
  jobTemplate:
    spec:                    # Job-level settings live under jobTemplate.spec
      backoffLimit: 2
      activeDeadlineSeconds: 3600

It mounts two PVCs shared with the API, reading events from one and writing index snapshots to the other:

PVC	Mount	Size	Purpose
`ufme-event-log`	`/data/events`	20 Gi	Append-only enrolment/delete events
`ufme-index-output`	`/data/index`	50 Gi	New index snapshots written here

After building a new snapshot, the compaction binary calls the SwapSnapshot RPC on each shard, which atomically replaces the live index.
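
Putting the schedule and the two volumes together, a complete CronJob could look like the sketch below. The image, container name, volume names, and restartPolicy are assumptions; the schedule, PVC names, and mount paths come from this page.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ufme-compaction
  namespace: ufme
spec:
  schedule: "0 */1 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: Never         # assumption; Job pods must use Never or OnFailure
          containers:
            - name: compaction         # hypothetical container name
              image: registry.example.com/ufme-compaction:latest   # placeholder image
              volumeMounts:
                - name: event-log
                  mountPath: /data/events
                - name: index-output
                  mountPath: /data/index
          volumes:
            - name: event-log
              persistentVolumeClaim:
                claimName: ufme-event-log
            - name: index-output
              persistentVolumeClaim:
                claimName: ufme-index-output
```

Because the same claims are also mounted by the API pods, the backing storage class must either support ReadWriteMany or the pods must land on the same node.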

Resource profile

Resource	Request	Limit
CPU	2 cores	4 cores
Memory	8 Gi	16 Gi

PodDisruptionBudgets

PodDisruptionBudgets limit voluntary disruptions, such as node drains during maintenance, so that a minimum number of pods stays available:

# API: at least 1 replica always available
minAvailable: 1

# Shard: at least 1 shard always available
minAvailable: 1

Applied via deploy/k8s/pdbs.yaml.
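
As full objects, the two budgets would look roughly like this. It is a sketch, not the contents of pdbs.yaml: the object names and the ufme-api label are assumptions, while the shard label matches the headless Service selector shown earlier.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ufme-api               # assumed object name
  namespace: ufme
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ufme-api    # assumed label
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ufme-shard             # assumed object name
  namespace: ufme
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ufme-shard
```

A budget constrains only evictions requested through the eviction API; it does not protect against node crashes or other involuntary failures.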

Helm chart

The Helm chart at deploy/helm/ is functionally equivalent to the raw manifests but parameterised.

Install

helm install ufme ./deploy/helm \
  --namespace ufme \
  --create-namespace \
  --values deploy/helm/values.yaml

Key values

api:
  replicas: 2
  hpa:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    targetCPUUtilization: 70

shard:
  replicas: 5
  storage:
    size: 10Gi
    storageClass: ""      # use default storage class
    s3Bucket: ""          # optional: S3-backed snapshot storage

compaction:
  schedule: "0 */1 * * *"
  intervalSecs: 3600

config:
  thresholds:
    pad: "0.85"
    mad: "0.75"
    quality: "0.40"
    similarity: "0.45"
  models:
    scrfd: "/models/scrfd_10g.onnx"
    adaface: "/models/w600k_r50.onnx"
    pad: "/models/MiniFASNetV2.onnx"
    mad: "/models/mad_selfmad_hrnet_w18.onnx"
    quality: "/models/ediffiqa_tiny.onnx"

Staging vs production

Override values per environment:

# staging (2 shards, smaller resources)
helm upgrade ufme ./deploy/helm \
  --namespace ufme \
  --set shard.replicas=2 \
  --set shard.storage.size=5Gi

# production (5 shards, dedicated nodes, S3 snapshots)
helm upgrade ufme ./deploy/helm \
  --namespace ufme \
  --set shard.replicas=5 \
  --set shard.storage.s3Bucket=my-ufme-snapshots \
  --set shard.storage.size=100Gi
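
Once overrides include nested structures such as tolerations, a per-environment values file is usually easier to maintain than a chain of --set flags. The sketch below assumes a file named values-production.yaml (a hypothetical name) and reuses only keys that appear on this page.

```yaml
# values-production.yaml (hypothetical file name)
shard:
  replicas: 5
  storage:
    size: 100Gi
    s3Bucket: my-ufme-snapshots
  nodeSelector:
    role: shard
  tolerations:
    - key: dedicated
      value: faiss-shard
      effect: NoSchedule
```

Pass it after the base file, for example helm upgrade ufme ./deploy/helm --namespace ufme --values deploy/helm/values.yaml --values values-production.yaml; when the same key appears in several files, the last one wins. Kubernetes rejects changes to volumeClaimTemplates on an existing StatefulSet, so a different shard.storage.size may only take effect on a fresh install, depending on how the chart handles it.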

Deploy checklist

Build Docker images: make build
Push images to your registry and update image.repository in values.yaml
Apply namespace: kubectl apply -f deploy/k8s/namespace.yaml
Apply PVCs: kubectl apply -f deploy/k8s/pvcs.yaml
Apply ConfigMap: kubectl apply -f deploy/k8s/configmap.yaml
Install Helm chart or apply raw manifests
Verify shard pods are Running: kubectl get pods -n ufme
Verify API is reachable: kubectl port-forward svc/ufme-api 8080:8080 -n ufme, then curl http://localhost:8080/health from a second terminal
Load demo gallery: curl -X POST http://localhost:8080/api/v1/demo/load