
GCP / Terraform

The Terraform configuration at deploy/terraform/ provisions the GCP infrastructure for the UFME 200M face search benchmark. It creates 5 FAISS shard VMs and 1 orchestrator VM in a security-hardened network configuration: no external IP addresses, outbound internet via Cloud NAT, SSH via Identity-Aware Proxy (IAP) tunnel.


GCP project (europe-west2-a)
├── Networking
│   ├── Cloud Router (ufme-router)
│   ├── Cloud NAT (ufme-nat)              ← outbound internet for VMs
│   └── Firewall rules
│       ├── ufme-allow-grpc               ← port 50051, shard tag only
│       └── ufme-allow-iap-ssh            ← port 22, IAP source range 35.235.240.0/20
├── GCS bucket (ufme-benchmark-data)      ← benchmark data transfer
├── Service account (ufme-shard)          ← storage-ro access to GCS
├── Shard VMs × 5 (ufme-shard-0 … ufme-shard-4)
│   ├── Machine: n2-highmem-16 (16 vCPUs, 128 GB RAM)
│   ├── Disk: 150 GB pd-balanced
│   ├── OS: Ubuntu 22.04 LTS
│   └── Tags: ufme-shard
└── Orchestrator VM × 1 (ufme-orchestrator)
    ├── Machine: e2-standard-4 (4 vCPUs, 16 GB RAM)
    ├── Disk: 50 GB pd-balanced
    ├── OS: Ubuntu 22.04 LTS
    └── Tags: ufme-orchestrator

Cloud Router + Cloud NAT — VMs have no external IP addresses. Outbound traffic (package downloads, GCS transfers) routes through Cloud NAT via the Cloud Router. This eliminates public attack surface on all VMs.

Firewall: ufme-allow-grpc — permits TCP 50051 from VMs tagged ufme-orchestrator to VMs tagged ufme-shard. The benchmark runner fans gRPC requests out to all shards over this path.

Firewall: ufme-allow-iap-ssh — permits TCP 22 from the IAP source range 35.235.240.0/20 to both shard and orchestrator VMs. SSH sessions are tunnelled through Google’s IAP service with IAM-based access control; no VPN or bastion host is required.
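As a quick illustration of what the /20 source range in the SSH rule covers, the sketch below uses Python's standard ipaddress module to test whether an address falls inside 35.235.240.0/20. The helper name is_iap_source is hypothetical and not part of the repository.

```python
import ipaddress

# IAP TCP-forwarding source range used by the ufme-allow-iap-ssh rule.
IAP_RANGE = ipaddress.ip_network("35.235.240.0/20")


def is_iap_source(ip: str) -> bool:
    """Return True if the address lies inside the IAP source range."""
    return ipaddress.ip_address(ip) in IAP_RANGE


if __name__ == "__main__":
    for candidate in ("35.235.240.1", "35.235.255.254", "35.236.0.1", "10.0.0.5"):
        print(candidate, is_iap_source(candidate))
```

A /20 spans 4,096 addresses, here 35.235.240.0 through 35.235.255.255; anything outside it is not matched by this rule.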

A GCS bucket (ufme-benchmark-data) is provisioned for benchmark data transfer — primarily the tiled MS1MV3 vectors uploaded from a local machine and downloaded by the shard VMs. The bucket has a 30-day lifecycle deletion rule.

| Property | Value |
| --- | --- |
| Count | 5 (configurable via shard_count) |
| Machine type | n2-highmem-16 |
| vCPUs | 16 |
| RAM | 128 GB |
| Disk | 150 GB pd-balanced |
| OS | Ubuntu 22.04 LTS |

128 GB RAM accommodates both the IVF-PQ compressed index (~2.56 GB for 40M vectors at 64 B/vector) and the full-precision stored vectors for reranking (~82 GB at 40M × 2,048 B). The 150 GB disk holds the FAISS binary, Python dependencies, and snapshot files.
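The memory arithmetic above can be reproduced with a short sketch. It assumes vectors are split evenly across shards and uses decimal gigabytes, as the figures above do; the function name shard_memory_gb is hypothetical and the estimate ignores FAISS overhead such as coarse centroids and ID maps.

```python
import math

PQ_BYTES_PER_VECTOR = 64       # IVF-PQ code size quoted above
RAW_BYTES_PER_VECTOR = 2_048   # full-precision vector size quoted above
GB = 10**9                     # decimal gigabytes, matching the figures above


def shard_memory_gb(total_vectors: int, shard_count: int) -> dict:
    """Estimate per-shard memory, assuming an even split of vectors across shards."""
    per_shard = math.ceil(total_vectors / shard_count)
    pq_gb = per_shard * PQ_BYTES_PER_VECTOR / GB
    raw_gb = per_shard * RAW_BYTES_PER_VECTOR / GB
    return {
        "vectors": per_shard,
        "pq_gb": pq_gb,
        "raw_gb": raw_gb,
        "total_gb": pq_gb + raw_gb,
    }


if __name__ == "__main__":
    for shards in (5, 3):
        est = shard_memory_gb(200_000_000, shards)
        print(
            f"{shards} shards: {est['vectors']:,} vectors/shard, "
            f"PQ {est['pq_gb']:.2f} GB + raw {est['raw_gb']:.2f} GB "
            f"= {est['total_gb']:.2f} GB"
        )
```

Under these assumptions, 5 shards need roughly 84.5 GB each. Dropping to 3 shards while keeping 200M vectors gives roughly 141 GB per shard, which would not fit in 128 GB, so a smaller shard_count likely needs a smaller total_vectors as well.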

| Property | Value |
| --- | --- |
| Machine type | e2-standard-4 |
| vCPUs | 4 |
| RAM | 16 GB |
| Disk | 50 GB pd-balanced |

The orchestrator runs the benchmark runner (scripts/run_benchmark.py), which fans gRPC search requests across all 5 shards, aggregates results, and posts metrics to the benchmark UI.
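The fan-out and aggregation pattern described above can be sketched as follows. This is not the code in scripts/run_benchmark.py: the gRPC call is replaced by a stand-in function (fake_search_shard) that returns made-up hits, and it assumes results are (distance, id) pairs where a smaller distance is a better match.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Hit = Tuple[float, int]  # (distance, vector_id); assumes smaller distance is better


def fan_out_search(
    shard_addresses: List[str],
    search_shard: Callable[[str, list, int], List[Hit]],
    query: list,
    k: int,
) -> List[Hit]:
    """Query every shard in parallel and merge per-shard top-k into a global top-k."""
    with ThreadPoolExecutor(max_workers=len(shard_addresses)) as pool:
        futures = [pool.submit(search_shard, addr, query, k) for addr in shard_addresses]
        per_shard = [future.result() for future in futures]
    # Each shard returns its own best k hits, so the global best k lie in their union.
    return heapq.nsmallest(k, (hit for hits in per_shard for hit in hits))


def fake_search_shard(address: str, query: list, k: int) -> List[Hit]:
    """Stand-in for a gRPC search call to one shard (hypothetical data)."""
    shard_index = int(address.split(".")[-1].split(":")[0])
    return [(shard_index + 0.1 * rank, shard_index * 100 + rank) for rank in range(k)]


if __name__ == "__main__":
    addresses = [f"10.0.0.{i}:50051" for i in range(1, 6)]  # placeholder IPs
    print(fan_out_search(addresses, fake_search_shard, query=[0.0] * 4, k=3))
```

Asking each shard for k results, rather than fewer, is what makes the merged result exact with respect to what the shards returned: the true global top-k can never contain more than k hits from any single shard.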

| Component | Count | Hourly rate (per VM) | Hourly total |
| --- | --- | --- | --- |
| n2-highmem-16 (on-demand) | 5 | ~$1.18 | ~$5.90 |
| e2-standard-4 (on-demand) | 1 | ~$0.13 | ~$0.13 |
| Total | | | ~$6.03/hr |
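To see how the total scales with shard_count, the sketch below recomputes it from the approximate on-demand rates in the table. The rates are the table's estimates, not live pricing, and the function names are hypothetical.

```python
# Approximate on-demand hourly rates (USD) taken from the table above.
HOURLY_USD = {"n2-highmem-16": 1.18, "e2-standard-4": 0.13}


def hourly_cost(
    shard_count: int = 5,
    shard_type: str = "n2-highmem-16",
    orchestrator_type: str = "e2-standard-4",
) -> float:
    """Compute cost per hour for shard_count shards plus one orchestrator."""
    total = shard_count * HOURLY_USD[shard_type] + HOURLY_USD[orchestrator_type]
    return round(total, 2)


def run_cost(hours: float, shard_count: int = 5) -> float:
    """Compute cost of keeping the whole cluster running for a number of hours."""
    return round(hours * hourly_cost(shard_count), 2)


if __name__ == "__main__":
    print(hourly_cost())               # default 5-shard cluster
    print(hourly_cost(shard_count=3))  # reduced development cluster
    print(run_cost(hours=8))           # one working day
```

Disk and GCS storage charges are not included in this estimate.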

Stop VMs when not in use. Stopped instances are not billed for vCPUs or memory, but their persistent disks and GCS storage (~$0.02/GB/month) continue to accrue charges. To end compute charges entirely, terraform destroy removes all compute resources.


variable "project_id" # GCP project ID (required, no default)
variable "region" # default: "europe-west2"
variable "zone" # default: "europe-west2-a"
variable "shard_count" # default: 5
variable "shard_machine_type" # default: "n2-highmem-16"
variable "orchestrator_machine_type" # default: "e2-standard-4"
variable "shard_disk_size_gb" # default: 150
variable "orchestrator_disk_size_gb" # default: 50
variable "gcs_bucket_name" # default: "ufme-benchmark-data"
variable "repo_branch" # default: "main"
variable "total_vectors" # default: 200000000

Override variables via terraform.tfvars:

deploy/terraform/terraform.tfvars
project_id = "your-gcp-project-id"
shard_count = 3 # reduce for cost during development
region = "europe-west2"

After terraform apply, the following outputs are available:

terraform output shard_internal_ips # [10.x.x.1, 10.x.x.2, ...]
terraform output shard_names # [ufme-shard-0, ufme-shard-1, ...]
terraform output orchestrator_internal_ip # 10.x.x.10
terraform output shard_grpc_addresses # "10.x.x.1:50051 10.x.x.2:50051 ..."
terraform output gcs_bucket # gs://ufme-benchmark-data
terraform output estimated_hourly_cost # ~$6.03/hr (5x n2-highmem-16 ...)
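For scripted use, terraform output -json emits machine-readable values. The sketch below turns the shard IP list into gRPC addresses; it assumes shard_internal_ips is a list of strings, as the sample output above suggests, and both function names are hypothetical.

```python
import json
import subprocess
from typing import List

GRPC_PORT = 50051


def grpc_addresses_from_json(ips_json: str, port: int = GRPC_PORT) -> List[str]:
    """Convert a JSON list of IPs into host:port gRPC addresses."""
    ips = json.loads(ips_json)
    return [f"{ip}:{port}" for ip in ips]


def read_shard_ips_json(terraform_dir: str = "deploy/terraform") -> str:
    """Run `terraform output -json shard_internal_ips` and return the raw JSON."""
    result = subprocess.run(
        ["terraform", "output", "-json", "shard_internal_ips"],
        cwd=terraform_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Sample input with placeholder IPs; call read_shard_ips_json() for real values.
    sample = '["10.0.0.2", "10.0.0.3"]'
    print(" ".join(grpc_addresses_from_json(sample)))
```

The shard_grpc_addresses output already provides the space-separated form, so this is mainly useful when a script needs the addresses as a list.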

Prerequisites:

  • GCP project with billing enabled
  • gcloud CLI authenticated: gcloud auth application-default login
  • Terraform >= 1.5
  • GCP IAP API enabled: gcloud services enable iap.googleapis.com

Deploy from the Terraform directory:

cd deploy/terraform
terraform init
terraform plan -var="project_id=YOUR_PROJECT_ID"
terraform apply -var="project_id=YOUR_PROJECT_ID"

Connect to a VM over SSH through the IAP tunnel:

# Shard VM
gcloud compute ssh ufme-shard-0 \
  --project=YOUR_PROJECT_ID \
  --zone=europe-west2-a \
  --tunnel-through-iap

# Orchestrator
gcloud compute ssh ufme-orchestrator \
  --project=YOUR_PROJECT_ID \
  --zone=europe-west2-a \
  --tunnel-through-iap

Copy files to a VM through the IAP tunnel:

gcloud compute scp scripts/setup_shard_vm.sh ufme-shard-0:/tmp/ \
  --project=YOUR_PROJECT_ID \
  --zone=europe-west2-a \
  --tunnel-through-iap

Stop or start all VMs:

# Stop all VMs
gcloud compute instances stop ufme-shard-0 ufme-shard-1 ufme-shard-2 \
  ufme-shard-3 ufme-shard-4 ufme-orchestrator \
  --zone=europe-west2-a --project=YOUR_PROJECT_ID

# Start all VMs
gcloud compute instances start ufme-shard-0 ufme-shard-1 ufme-shard-2 \
  ufme-shard-3 ufme-shard-4 ufme-orchestrator \
  --zone=europe-west2-a --project=YOUR_PROJECT_ID
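The stop and start commands above hard-code five shard names. When shard_count has been overridden, the instance list can be generated instead. The sketch below only builds and prints the command; the helper names are hypothetical, and it assumes the ufme-shard-N naming shown above.

```python
import shlex
from typing import List


def instance_names(shard_count: int = 5) -> List[str]:
    """Names of all shard VMs plus the orchestrator, following the naming above."""
    return [f"ufme-shard-{i}" for i in range(shard_count)] + ["ufme-orchestrator"]


def gcloud_instances_command(
    action: str,
    project_id: str,
    zone: str = "europe-west2-a",
    shard_count: int = 5,
) -> List[str]:
    """Build the argument list for `gcloud compute instances stop|start`."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action!r}")
    return [
        "gcloud", "compute", "instances", action,
        *instance_names(shard_count),
        f"--zone={zone}",
        f"--project={project_id}",
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(gcloud_instances_command("stop", "YOUR_PROJECT_ID", shard_count=3)))
```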

Tear down the infrastructure:

cd deploy/terraform
terraform destroy -var="project_id=YOUR_PROJECT_ID"

This removes all compute instances, firewall rules, the Cloud Router and Cloud NAT, and the GCS bucket, including any benchmark data stored in it. Nothing persists after destroy, so copy out any results you need first.


After VMs are running, each shard must be initialised. The idempotent scripts/setup_shard_vm.sh script handles FAISS build, Rust build, IVF-PQ index training, and vector insertion. It is safe to re-run — guards check for existing installations before rebuilding.

# Copy setup script to shard
gcloud compute scp scripts/setup_shard_vm.sh ufme-shard-0:/tmp/ \
  --tunnel-through-iap --zone=europe-west2-a --project=YOUR_PROJECT_ID

# Launch in background (stdin redirect required to survive IAP session closure)
gcloud compute ssh ufme-shard-0 --tunnel-through-iap \
  --zone=europe-west2-a --project=YOUR_PROJECT_ID \
  --command='SHARD_ID=0 nohup bash /tmp/setup_shard_vm.sh > /tmp/setup.log 2>&1 </dev/null &'

# Monitor progress
gcloud compute ssh ufme-shard-0 --tunnel-through-iap \
  --zone=europe-west2-a --project=YOUR_PROJECT_ID \
  --command='tail -50 /tmp/setup.log'
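The commands above target ufme-shard-0 only. A minimal sketch for generating the same copy and launch commands for every shard follows. It assumes SHARD_ID matches the numeric suffix of the VM name, which the example above suggests for shard 0 but which should be confirmed against the script; setup_commands is a hypothetical helper.

```python
import shlex
from typing import List


def setup_commands(
    shard_id: int,
    project_id: str,
    zone: str = "europe-west2-a",
) -> List[List[str]]:
    """Build the scp and ssh argument lists that start setup on one shard."""
    vm = f"ufme-shard-{shard_id}"
    common = ["--tunnel-through-iap", f"--zone={zone}", f"--project={project_id}"]
    # Assumption: SHARD_ID equals the VM's numeric suffix.
    remote = (
        f"SHARD_ID={shard_id} nohup bash /tmp/setup_shard_vm.sh "
        "> /tmp/setup.log 2>&1 </dev/null &"
    )
    return [
        ["gcloud", "compute", "scp", "scripts/setup_shard_vm.sh", f"{vm}:/tmp/", *common],
        ["gcloud", "compute", "ssh", vm, *common, f"--command={remote}"],
    ]


if __name__ == "__main__":
    # Print the commands for review; pass each list to subprocess.run to execute.
    for shard_id in range(5):
        for command in setup_commands(shard_id, "YOUR_PROJECT_ID"):
            print(shlex.join(command))
```

Because each launch returns as soon as the background job starts, the shards can run setup concurrently rather than one after another.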

The script runs through 7 steps:

  1. Install system packages (build-essential, cmake, etc.)
  2. Build FAISS with CMake 3.31 (cmake.org binary, not Ubuntu’s outdated 3.22)
  3. Build Rust FAISS bindings
  4. Train IVF-PQ index (~105 min per shard — logs appear frozen during k-means)
  5. Insert 40M vectors
  6. Verify index integrity
  7. Start gRPC shard server on port 50051
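Once step 7 completes, a shard should accept TCP connections on port 50051. The sketch below is a simple reachability probe using only the standard library. It would need to run from the orchestrator VM, since the firewall only permits port 50051 from that tag, and an open port does not by itself prove the gRPC service is healthy. The function name grpc_port_open is hypothetical.

```python
import socket


def grpc_port_open(host: str, port: int = 50051, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder IPs; substitute the values from `terraform output shard_internal_ips`.
    for ip in ("10.0.0.2", "10.0.0.3"):
        print(ip, "open" if grpc_port_open(ip, timeout=1.0) else "not reachable")
```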