Configuration Tuning

UFME’s behaviour is controlled by thresholds and FAISS parameters in config.toml (or environment variables). This guide explains what each parameter does, when to change it, and how to measure the impact.

Similarity threshold

Parameter: thresholds.similarity (default: 0.45)

This is the most operationally sensitive setting. It controls the minimum cosine similarity score for a match to be reported.

Scenario	Range	Trade-off
High-security (border, law enforcement)	0.50–0.60	Fewer false matches, more genuine misses
Standard verification (access, KYC)	0.40–0.50	Balanced
Permissive search (person-of-interest)	0.30–0.40	More candidates, requires human review

Why it matters at scale

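In a 200M gallery, only one (or a few) of the templates belongs to the probe's subject, so the prior probability that any given comparison is a genuine match is roughly 1 in 200M. Almost every comparison in a search is therefore an impostor comparison. Even a 0.01% false accept rate (1 in 10,000 comparisons) produces ~20,000 false alarms per search (200M × 0.0001). Small threshold changes have a large operational impact.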
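The arithmetic above generalises. With a per-comparison false accept rate `far` and a gallery of N templates, a 1:N search is expected to return N × far false matches. The sketch below computes that figure and the probability of at least one false match. It assumes that impostor comparisons are independent, which is a simplification. The function names are illustrative.

```python
import math

GALLERY_SIZE = 200_000_000  # the 200M gallery discussed above

def expected_false_matches(gallery_size: int, far: float) -> float:
    """Expected number of impostor templates scoring above threshold in one 1:N search."""
    return gallery_size * far

def p_any_false_match(gallery_size: int, far: float) -> float:
    """P(at least one false match) = 1 - (1 - far)^N, assuming independent comparisons."""
    # log1p/expm1 keep the result accurate when far is tiny and N is huge.
    return -math.expm1(gallery_size * math.log1p(-far))

for far in (1e-4, 1e-6, 1e-8, 1e-9):
    print(
        f"FAR {far:.0e}: {expected_false_matches(GALLERY_SIZE, far):,.1f} expected false matches, "
        f"P(>=1) = {p_any_false_match(GALLERY_SIZE, far):.3f}"
    )
```

Keeping expected false alarms near one per search at 200M requires a per-comparison FAR around 1/200M (5e-9).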

How to calibrate

1. Collect representative genuine pairs (same person, different sessions) and impostor pairs (different people).
2. Run both sets through the pipeline and record the similarity scores.
3. Plot the two score distributions. Place the threshold where they overlap least; more precisely, at the score that gives your target false accept rate (FAR) on the impostor pairs.
4. Run `make accuracy-benchmark` to measure the true accept rate (TAR) at your chosen FAR.
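Steps 2 to 4 can be sketched in a few lines. This minimal version treats `score >= threshold` as a match, consistent with the similarity threshold being a minimum score. It picks the lowest threshold whose FAR on the impostor scores does not exceed a target, then reports TAR on the genuine scores. The scores and function names are hypothetical and not part of UFME. This does not replace `make accuracy-benchmark`.

```python
import math

def far(impostor_scores, threshold):
    """Fraction of impostor pairs accepted (score >= threshold counts as a match)."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)

def tar(genuine_scores, threshold):
    """Fraction of genuine pairs accepted at this threshold."""
    return sum(s >= threshold for s in genuine_scores) / len(genuine_scores)

def threshold_at_far(impostor_scores, target_far):
    """Lowest threshold whose FAR on these impostor scores does not exceed target_far."""
    scores = sorted(impostor_scores, reverse=True)
    allowed = math.floor(target_far * len(scores))  # false accepts we can tolerate
    if allowed >= len(scores):
        return scores[-1]  # every impostor may be accepted
    # Exclude the (allowed + 1)-th highest impostor score and anything tied with it.
    return math.nextafter(scores[allowed], math.inf)

# Illustrative scores only; use your own genuine and impostor pairs.
genuine = [0.70, 0.65, 0.60, 0.55, 0.50, 0.42]
impostor = [i / 1000 for i in range(500)]  # 0.000 .. 0.499

t = threshold_at_far(impostor, target_far=0.01)
print(f"threshold={t:.4f}  FAR={far(impostor, t):.4f}  TAR={tar(genuine, t):.3f}")
```

The impostor set limits how small a FAR you can resolve. Measuring FAR = 0.01% needs well over 10,000 impostor pairs, because at least one false accept must be observable.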

Quality threshold

Parameter: thresholds.quality_min_score (default: 0.40)

Range	Effect
0.20–0.30	Permissive: for poor capture conditions (CCTV, mobile)
0.40–0.50	Standard: balanced throughput and reliability
0.60–0.80	Strict: best accuracy, higher rejection rate

Lower the threshold if too many captures are rejected; raise it if poor-quality enrolments are degrading matching accuracy.
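To see what a candidate `quality_min_score` would do to throughput, replay a sample of quality scores from recent captures and count how many fall below each threshold. The sketch assumes that captures scoring below `quality_min_score` are rejected. The sample scores and function names are hypothetical.

```python
def rejection_rate(quality_scores, min_score):
    """Share of captures that would be rejected (score below min_score)."""
    return sum(q < min_score for q in quality_scores) / len(quality_scores)

def sweep(quality_scores, candidates=(0.20, 0.30, 0.40, 0.50, 0.60, 0.80)):
    """Rejection rate for each candidate threshold from the table above."""
    return {t: rejection_rate(quality_scores, t) for t in candidates}

# Hypothetical quality scores from a sample of recent captures
sample = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.90, 0.95]
for threshold, rate in sweep(sample).items():
    print(f"quality_min_score={threshold:.2f}  rejected={rate:.0%}")
```

A sweep like this shows only the throughput cost. Weigh it against matching accuracy, measured with the calibration procedure for the similarity threshold.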

PAD and MAD thresholds

PAD (presentation attack detection): thresholds.pad_spoof_score (default: 0.85). Images with a spoof score above this value are rejected.

MAD (morphing attack detection): thresholds.mad_morph_score (default: 0.75). Applied at enrolment only.

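Because PAD rejects images scoring above the threshold, raising it makes the check more permissive and lowering it makes it stricter. Do not relax the PAD threshold in production without a documented risk assessment. The MAD default is appropriate for document verification; relax it only if you see high false rejection rates on legitimate passport photos.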
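These rules can be read as a short screening function. This sketch follows the PAD rule as stated: reject when the spoof score is strictly above the threshold. The guide does not spell out the direction for MAD, so the sketch assumes it is the same. `screen` and `AttackThresholds` are hypothetical names, and the scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttackThresholds:
    pad_spoof_score: float = 0.85  # thresholds.pad_spoof_score default
    mad_morph_score: float = 0.75  # thresholds.mad_morph_score default

def screen(spoof_score, morph_score, *, enrolment, t=AttackThresholds()):
    """Return (accepted, reason) for one image.

    PAD runs on every image. MAD runs only at enrolment, so morph_score may be None otherwise.
    Assumption: as with PAD, a morph score above the MAD threshold means rejection.
    """
    if spoof_score > t.pad_spoof_score:
        return False, "pad"
    if enrolment and morph_score is not None and morph_score > t.mad_morph_score:
        return False, "mad"
    return True, "ok"

print(screen(0.90, None, enrolment=False))  # rejected by PAD
print(screen(0.20, 0.90, enrolment=False))  # MAD not applied outside enrolment
print(screen(0.20, 0.90, enrolment=True))   # rejected by MAD at enrolment
```

The dataclass is frozen, so the shared default instance cannot be mutated between calls.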

FAISS parameters

nprobe — the primary latency/recall knob

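Parameter: faiss.nprobe (default: 96). The number of IVF cells (out of nlist) scanned per query; higher values improve recall at the cost of latency.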

nprobe	Recall@1 (200M)	Latency (p50)
32	~94%	~1.0 ms
64	~96%	~1.6 ms
96	~97%	~2.1 ms
128	~97.5%	~2.8 ms
256	~98%	~5.5 ms

These figures come from the 200M benchmark on an n2-highmem-16 machine. Run `make accuracy-benchmark` to measure them on your own hardware.
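The toy IVF-Flat search below shows why recall rises with nprobe. Each query probes the nprobe cells whose centroids are most similar to it, so a larger nprobe scans a superset of candidates. Probing all nlist cells makes the search exhaustive. This is a simplified pure-Python sketch, not FAISS. Centroids are a random sample of the data rather than k-means output, and there is no product quantisation. The data is random, so its recall figures are not comparable to the benchmark table.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    # Cosine similarity, since all vectors are unit-normalised.
    return sum(x * y for x, y in zip(a, b))

def build_ivf(vectors, nlist, rng):
    # Simplification: centroids are sampled data points (FAISS trains them with k-means).
    centroids = rng.sample(vectors, nlist)
    cells = [[] for _ in range(nlist)]
    for idx, v in enumerate(vectors):
        best = max(range(nlist), key=lambda c: dot(v, centroids[c]))
        cells[best].append(idx)
    return centroids, cells

def ivf_search(query, vectors, centroids, cells, nprobe):
    # Scan only the nprobe cells whose centroids are most similar to the query.
    order = sorted(range(len(centroids)), key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [i for c in order[:nprobe] for i in cells[c]]
    return max(candidates, key=lambda i: dot(query, vectors[i]))

def exact_search(query, vectors):
    return max(range(len(vectors)), key=lambda i: dot(query, vectors[i]))

rng = random.Random(0)
dim, n, nlist = 16, 2000, 64
data = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
queries = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(50)]
truth = [exact_search(q, data) for q in queries]
centroids, cells = build_ivf(data, nlist, rng)

recall = {}
for nprobe in (1, 4, 16, 64):
    hits = sum(ivf_search(q, data, centroids, cells, nprobe) == t for q, t in zip(queries, truth))
    recall[nprobe] = hits / len(queries)
    print(f"nprobe={nprobe:3d}  recall@1={recall[nprobe]:.2f}")
```

Latency grows roughly with the number of candidates scanned, which is why the benchmark table shows diminishing recall gains as nprobe increases.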

nlist — number of IVF cells

Parameter: faiss.nlist (default: 16384)

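Set once when the index is built; changing it means retraining and rebuilding the index. Rule of thumb: sqrt(gallery_size). For 200M, sqrt(gallery_size) ≈ 14,142, and the default 16,384 (2^14) is the nearest power of two. For galleries under 1M, try 1,024 or 4,096.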
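The rule of thumb can be applied mechanically. The sketch rounds sqrt(gallery_size) to the nearest power of two in log space. That rounding is an illustrative choice, though it reproduces the 16,384 default for 200M. It also reports the training-sample range implied by FAISS's k-means defaults (`min_points_per_centroid=39`, `max_points_per_centroid=256`). FAISS warns below the lower bound and subsamples above the upper one. The function names are hypothetical.

```python
import math

def suggested_nlist(gallery_size: int) -> int:
    """sqrt(gallery_size), rounded to the nearest power of two in log2 space."""
    return 2 ** round(math.log2(math.sqrt(gallery_size)))

def training_sample_bounds(nlist: int) -> tuple[int, int]:
    # FAISS k-means defaults: 39 to 256 training points per centroid.
    return 39 * nlist, 256 * nlist

for n in (1_000_000, 10_000_000, 50_000_000, 200_000_000):
    nl = suggested_nlist(n)
    lo, hi = training_sample_bounds(nl)
    print(f"gallery={n:>11,}  nlist={nl:>6,}  train on {lo:,} to {hi:,} vectors")
```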

pq_m — product quantisation sub-vectors

Parameter: faiss.pq_m (default: 64)

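At pq_m=64 with 512-dim float32 vectors, each template shrinks from 2,048 bytes to 64 bytes (one 8-bit code per sub-vector), a 32x compression. pq_m must divide the vector dimension (here, 8 dimensions per sub-vector). Do not change it without benchmarking.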
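You can check the compression arithmetic directly. This sketch assumes float32 input vectors and 8-bit PQ codes, the common FAISS setting. It reports storage for the PQ codes only and excludes vector IDs, IVF centroids, and any other per-record data. The gallery figure is therefore a lower bound, not a RAM estimate. `pq_summary` is a hypothetical helper.

```python
def pq_summary(dim, pq_m, nbits=8, gallery_size=0):
    """Per-template and gallery-wide storage for product-quantised codes."""
    if dim % pq_m != 0:
        raise ValueError(f"pq_m={pq_m} must divide dim={dim}")
    raw_bytes = dim * 4                   # float32 template
    code_bytes = (pq_m * nbits + 7) // 8  # pq_m sub-quantizers of nbits each, rounded up to whole bytes
    return {
        "sub_vector_dim": dim // pq_m,
        "raw_bytes": raw_bytes,
        "code_bytes": code_bytes,
        "compression": raw_bytes / code_bytes,
        "gallery_code_gb": gallery_size * code_bytes / 1e9,
    }

s = pq_summary(dim=512, pq_m=64, gallery_size=200_000_000)
print(s)
```

For a 200M gallery, the codes alone come to about 12.8 GB before sharding.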

Scaling guidance

Gallery size	Setup	RAM per shard
< 100K	Single process (`mode = "local"`)	512 MB
100K–10M	Single process, IVF index	1–4 GB
10M–50M	2–4 shards	4–8 GB
50M–200M	4–6 shards	12–16 GB
200M+	6+ shards	16+ GB

Monitoring

Key Prometheus metrics (exposed at /metrics):

Metric	Watch for
`ufme_request_duration_seconds`	p99 > 500 ms
`ufme_pipeline_stage_duration_seconds`	Any stage > 100 ms
`ufme_faiss_search_duration_seconds`	p99 > 10 ms
`ufme_active_requests`	Sustained > 80% capacity
`ufme_queue_depth`	Growing over time (backpressure)
`ufme_gallery_vectors`	Unexpected drops (data loss)
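The p99 checks in this table can be read from a raw /metrics scrape. This assumes the `_duration_seconds` metrics are Prometheus histograms, with cumulative bucket counts per upper bound. In a running Prometheus server you would use the PromQL `histogram_quantile()` function over `rate()` of the bucket series instead. The sketch below applies the same linear interpolation inside the matching bucket. Raw scrape counts are cumulative since process start, so the result covers the process lifetime rather than a recent window. The bucket boundaries and counts are hypothetical.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with the +Inf bucket. Interpolates linearly within the bucket,
    taking 0 as the lower bound of the first bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative bucket counts for ufme_request_duration_seconds
request_buckets = [(0.1, 9_000), (0.25, 9_700), (0.5, 9_950), (1.0, 9_995), (math.inf, 10_000)]
p99 = histogram_quantile(0.99, request_buckets)
print(f"p99 = {p99 * 1000:.0f} ms, alert = {p99 > 0.5}")  # table threshold: p99 > 500 ms
```

The estimate can be no more precise than the bucket layout. A p99 that lands in a wide bucket is only interpolated, so choose bucket bounds near the alert thresholds (500 ms, 100 ms, 10 ms).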