Configuration Tuning
UFME’s behaviour is controlled by thresholds and FAISS parameters in config.toml (or environment variables). This guide explains what each parameter does, when to change it, and how to measure the impact.
Similarity threshold
Parameter: thresholds.similarity (default: 0.45)
This is the most operationally sensitive setting. It controls the minimum cosine similarity score for a match to be reported.
| Scenario | Range | Trade-off |
|---|---|---|
| High-security (border, law enforcement) | 0.50 — 0.60 | Fewer false matches, more genuine misses |
| Standard verification (access, KYC) | 0.40 — 0.50 | Balanced |
| Permissive search (person-of-interest) | 0.30 — 0.40 | More candidates, requires human review |
Why it matters at scale
In a 200M gallery, the prior probability of a random match is ~1 in 100M. Even a 0.01% false accept rate per comparison produces ~20,000 false alarms per search (200M × 0.0001). Small threshold changes have large operational impact.
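The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the false accept rate applies independently to every gallery comparison; the function name is illustrative and not part of UFME.

```python
def expected_false_alarms(gallery_size: int, far: float) -> float:
    """Expected number of impostor gallery entries scoring above the threshold in one 1:N search."""
    return gallery_size * far


# 200M gallery at a 0.01% per-comparison false accept rate
print(f"{expected_false_alarms(200_000_000, 0.0001):,.0f}")

# The same gallery at a 100x lower false accept rate
print(f"{expected_false_alarms(200_000_000, 0.000001):,.0f}")
```

Because the expected count scales linearly with both gallery size and false accept rate, a threshold that looks safe on a small test gallery can still flood reviewers at full scale.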
How to calibrate
- Collect representative genuine pairs (same person, different sessions) and impostor pairs (different people)
- Run both through the pipeline and record similarity scores
- Plot score distributions — the threshold should sit where overlap is minimal
- Use make accuracy-benchmark to measure TAR (true accept rate) at your chosen FAR (false accept rate)
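The calibration steps above can be sketched offline before running the full benchmark. The example below is a rough illustration, not UFME's own tooling: the function names are hypothetical, the scores are synthetic placeholders, and it assumes a match is reported when the score is greater than or equal to the threshold.

```python
def far_tar(genuine, impostor, threshold):
    """Return (FAR, TAR) at a threshold, counting score >= threshold as a reported match."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    tar = sum(s >= threshold for s in genuine) / len(genuine)
    return far, tar


def calibrate(genuine, impostor, target_far, candidates):
    """Pick the lowest candidate threshold whose FAR does not exceed target_far.

    TAR never increases as the threshold rises, so the lowest qualifying
    threshold also gives the best TAR among the candidates.
    """
    for threshold in sorted(candidates):
        far, tar = far_tar(genuine, impostor, threshold)
        if far <= target_far:
            return threshold, far, tar
    return None  # no candidate meets the target FAR


# Synthetic scores for illustration only; use your own recorded pairs.
genuine_scores = [0.72, 0.65, 0.58, 0.49, 0.44, 0.61]
impostor_scores = [0.12, 0.21, 0.33, 0.41, 0.05, 0.27, 0.18, 0.47]
candidate_thresholds = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]

print(calibrate(genuine_scores, impostor_scores, 0.125, candidate_thresholds))
```

In practice the impostor set has to be large enough to resolve the FAR you care about: measuring a FAR of 1 in a million needs well over a million impostor comparisons.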
Quality threshold
Parameter: thresholds.quality_min_score (default: 0.40)
| Value | Effect |
|---|---|
| 0.20 — 0.30 | Permissive — poor capture conditions (CCTV, mobile) |
| 0.40 — 0.50 | Standard — balanced throughput and reliability |
| 0.60 — 0.80 | Strict — best accuracy, higher rejection rate |
Lower the threshold if the rejection rate is too high; raise it if matching accuracy is degraded by poor-quality enrolments.
PAD and MAD thresholds
PAD (presentation attack detection): thresholds.pad_spoof_score (default: 0.85). Images with a spoof score above this value are rejected.
MAD (morphing attack detection): thresholds.mad_morph_score (default: 0.75). Applied at enrolment only.
Do not lower PAD below 0.5 in production without a documented risk assessment. The MAD default is appropriate for document verification; lower only if you see high false rejection rates on legitimate passport photos.
FAISS parameters
nprobe — the primary latency/recall knob
Parameter: faiss.nprobe (default: 96)
| nprobe | Recall@1 (200M) | Latency (p50) |
|---|---|---|
| 32 | ~94% | ~1.0 ms |
| 64 | ~96% | ~1.6 ms |
| 96 | ~97% | ~2.1 ms |
| 128 | ~97.5% | ~2.8 ms |
| 256 | ~98% | ~5.5 ms |
These numbers are from the 200M benchmark on n2-highmem-16. Run make accuracy-benchmark to measure on your hardware.
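One way to use the table is to pick the smallest nprobe that meets a recall target within a latency budget. The sketch below hard-codes the approximate figures from the 200M benchmark above; the helper name is hypothetical, and the values should be replaced with numbers measured on your own hardware.

```python
# (nprobe, approx. Recall@1, approx. p50 latency in ms) from the 200M benchmark table
BENCHMARK = [
    (32, 0.94, 1.0),
    (64, 0.96, 1.6),
    (96, 0.97, 2.1),
    (128, 0.975, 2.8),
    (256, 0.98, 5.5),
]


def pick_nprobe(min_recall, max_latency_ms):
    """Return the smallest benchmarked nprobe meeting both constraints, or None."""
    for nprobe, recall, latency_ms in BENCHMARK:
        if recall >= min_recall and latency_ms <= max_latency_ms:
            return nprobe
    return None


print(pick_nprobe(min_recall=0.97, max_latency_ms=3.0))
print(pick_nprobe(min_recall=0.98, max_latency_ms=3.0))
```

A result of None means the recall target and the latency budget cannot both be met at the benchmarked settings, so one of the two has to be relaxed.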
nlist — number of IVF cells
Parameter: faiss.nlist (default: 16384)
Set once when building the index. Rule of thumb: sqrt(gallery_size). For galleries under 1M, try 1,024 or 4,096.
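The rule of thumb can be turned into a small helper. Rounding the square root up to the next power of two is an assumption made here for illustration (it happens to reproduce the 16384 default for a 200M gallery and 1,024 for a 1M gallery); the function name is hypothetical.

```python
import math


def suggest_nlist(gallery_size: int) -> int:
    """sqrt(gallery_size), rounded up to a power of two (the rounding is an assumption)."""
    root = max(1, math.isqrt(gallery_size))
    # Smallest power of two that is >= root
    return 1 << (root - 1).bit_length()


for size in (1_000_000, 200_000_000):
    print(size, suggest_nlist(size))
```

Treat the result as a starting point for benchmarking rather than a final value, since nlist is fixed once the index is built.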
pq_m — product quantisation sub-vectors
Parameter: faiss.pq_m (default: 64)
At pq_m=64 with 512-dim vectors, each template is compressed to 64 bytes (32x compression). Do not change without benchmarking.
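The compression figure follows from simple arithmetic. The sketch below assumes the uncompressed vectors are stored as 32-bit floats and that each sub-vector is encoded with an 8-bit code; both are assumptions consistent with the 64-byte figure above, not confirmed UFME internals, and the function name is illustrative.

```python
def pq_footprint(dim: int, pq_m: int, bits_per_code: int = 8, bytes_per_float: int = 4):
    """Return (raw_bytes, compressed_bytes, compression_ratio) for one template."""
    if dim % pq_m != 0:
        raise ValueError("dim must be divisible by pq_m")
    raw_bytes = dim * bytes_per_float
    compressed_bytes = pq_m * bits_per_code // 8
    return raw_bytes, compressed_bytes, raw_bytes / compressed_bytes


raw, compressed, ratio = pq_footprint(dim=512, pq_m=64)
print(f"{raw} bytes -> {compressed} bytes ({ratio:.0f}x)")
```

Halving pq_m halves the per-template footprint but also means each code summarises twice as many dimensions, which is why a change needs an accuracy benchmark.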
Scaling guidance
| Gallery size | Setup | RAM per shard |
|---|---|---|
| < 100K | Single process (mode = "local") | 512 MB |
| 100K — 10M | Single process, IVF index | 1 — 4 GB |
| 10M — 50M | 2 — 4 shards | 4 — 8 GB |
| 50M — 200M | 4 — 6 shards | 12 — 16 GB |
| 200M+ | 6+ shards | 16+ GB |
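For capacity planning scripts, the table can be encoded as a simple lookup. This is only a sketch: the helper name is hypothetical, and treating each upper bound as exclusive (so a gallery of exactly 200M falls into the 200M+ row) is an assumption, since the table does not say how boundaries are handled.

```python
# (exclusive upper bound on gallery size, setup, RAM per shard), copied from the table
SCALING_TIERS = [
    (100_000, 'Single process (mode = "local")', "512 MB"),
    (10_000_000, "Single process, IVF index", "1 to 4 GB"),
    (50_000_000, "2 to 4 shards", "4 to 8 GB"),
    (200_000_000, "4 to 6 shards", "12 to 16 GB"),
]
LARGEST_TIER = ("6+ shards", "16+ GB")


def scaling_tier(gallery_size: int):
    """Return (setup, RAM per shard) for a gallery size, using exclusive upper bounds."""
    for upper_bound, setup, ram in SCALING_TIERS:
        if gallery_size < upper_bound:
            return setup, ram
    return LARGEST_TIER


print(scaling_tier(5_000_000))
print(scaling_tier(200_000_000))
```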
Monitoring
Key Prometheus metrics (exposed at /metrics):
| Metric | Watch for |
|---|---|
| ufme_request_duration_seconds | p99 > 500 ms |
| ufme_pipeline_stage_duration_seconds | Any stage > 100 ms |
| ufme_faiss_search_duration_seconds | p99 > 10 ms |
| ufme_active_requests | Sustained > 80% capacity |
| ufme_queue_depth | Growing over time (backpressure) |
| ufme_gallery_vectors | Unexpected drops (data loss) |
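The numeric limits in the table can be checked against a snapshot of already-aggregated values. The sketch below is illustrative only: the snapshot keys are hypothetical names for percentiles and ratios you would compute from the metrics yourself, and in a real deployment these checks would more likely be written as Prometheus alerting rules.

```python
# Limits from the table above, in seconds or as a ratio of capacity.
# The keys are hypothetical names for values derived from the listed metrics.
LIMITS = {
    "request_p99_s": 0.500,       # from ufme_request_duration_seconds
    "stage_max_s": 0.100,         # from ufme_pipeline_stage_duration_seconds
    "faiss_search_p99_s": 0.010,  # from ufme_faiss_search_duration_seconds
    "active_ratio": 0.80,         # ufme_active_requests divided by capacity
}


def breaches(snapshot: dict) -> list:
    """Return the sorted names of limits that the snapshot exceeds."""
    return sorted(
        name for name, limit in LIMITS.items() if snapshot.get(name, 0.0) > limit
    )


print(breaches({"request_p99_s": 0.62, "faiss_search_p99_s": 0.004}))
```

The queue depth and gallery size rows describe trends rather than fixed limits, so they are left out of this sketch and need a comparison across time windows instead.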