Request Flow Walkthrough

This page traces a request through every layer of the system. If you are new to UFME, read the Glossary first for term definitions.

At a high level, UFME has three layers: the REST gateway (accepts HTTP requests), the pipeline (processes the face image through a series of stages), and the vector store (searches or mutates the gallery).

flowchart LR
Client -->|multipart/form-data| Gateway
Gateway -->|XML envelope| Pipeline
Pipeline -->|template vector| VectorStore
VectorStore -->|ranked candidates| Pipeline
Pipeline -->|XML response| Gateway
Gateway -->|HTTP response| Client

Search is the most common operation. The client submits a photo and receives a ranked list of matching identities.

The gateway receives a POST /api/v1/search request with an image file and JSON metadata. It:

  1. Validates the request (content type, image presence, metadata format)
  2. Generates a unique trace_id for this request
  3. Translates the request into an XML envelope (the pipeline’s internal format)
  4. Pushes the envelope onto the receive_q (an asyncio queue)
  5. Creates an asyncio.Future keyed by trace_id and waits for the pipeline to resolve it
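
The hand-off in steps 2, 4, and 5 can be sketched as follows. This is a minimal illustration, not the actual gateway code: the `Gateway` class, its `pending` dict, the `submit` and `resolve` methods, and `fake_pipeline` are hypothetical names, and the envelope is shown as a plain dict rather than real XML.

```python
import asyncio
import uuid


class Gateway:
    """Toy gateway: queues envelopes and waits on one future per request."""

    def __init__(self) -> None:
        self.receive_q = asyncio.Queue()  # envelopes waiting for the pipeline
        self.pending = {}                 # trace_id -> asyncio.Future

    async def submit(self, body: str) -> str:
        trace_id = uuid.uuid4().hex  # unique id for this request
        future = asyncio.get_running_loop().create_future()
        self.pending[trace_id] = future
        await self.receive_q.put({"trace_id": trace_id, "body": body})
        try:
            return await future  # suspends until the pipeline resolves it
        finally:
            self.pending.pop(trace_id, None)

    def resolve(self, trace_id: str, response: str) -> None:
        """Called by the pipeline side once a response is ready."""
        future = self.pending.get(trace_id)
        if future is not None and not future.done():
            future.set_result(response)


async def fake_pipeline(gateway: Gateway) -> None:
    # Stand-in for the real pipeline: echo the body back as a response.
    envelope = await gateway.receive_q.get()
    gateway.resolve(envelope["trace_id"], f"<response>{envelope['body']}</response>")


async def demo() -> str:
    gateway = Gateway()  # created inside the running event loop
    worker = asyncio.create_task(fake_pipeline(gateway))
    result = await gateway.submit("<search/>")
    await worker
    return result
```

Keying the futures by trace_id lets many requests be in flight at once while each caller wakes only for its own response.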

The pipeline runner pulls the envelope from receive_q and passes it through the stages in order. Each stage is a pure async function that takes a dict and returns an updated dict.
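
A minimal sketch of that stage contract, assuming each stage returns a new dict instead of mutating its input. The stage bodies and the `run_stages` helper are hypothetical placeholders, not the real detector or aligner.

```python
import asyncio


async def detect(ctx: dict) -> dict:
    # Placeholder: a real stage would run the face detector here.
    return {**ctx, "faces": 1}


async def align(ctx: dict) -> dict:
    # Placeholder: a real stage would warp the face crop here.
    return {**ctx, "aligned": True}


async def run_stages(ctx: dict, stages) -> dict:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        ctx = await stage(ctx)
    return ctx
```

For example, `asyncio.run(run_stages({"trace_id": "t1"}, [detect, align]))` returns the original dict extended with the keys each stage added.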

flowchart TD
Receive["Receive\n(parse XML envelope)"]
Detect["Detect\n(SCRFD: find faces + landmarks)"]
Align["Align\n(affine warp to 112x112)"]
PAD["PAD Gate\n(MiniFASNetV2: spoof check)"]
Quality["Quality Gate\n(eDifFIQA: ISO quality score)"]
Extract["Extract\n(ArcFace: 512-dim embedding)"]
Search["Search\n(FAISS: top-K nearest neighbours)"]
Respond["Respond\n(build XML response)"]
Receive --> Detect
Detect --> Align
Align --> PAD
PAD -->|pass| Quality
PAD -->|fail: spoof detected| Respond
Quality -->|pass| Extract
Quality -->|fail: below threshold| Respond
Extract --> Search
Search --> Respond
style PAD fill:#fff3cd
style Quality fill:#fff3cd

Gates (yellow) can short-circuit the pipeline. If the PAD gate detects a spoof, or the quality gate scores the image below threshold, processing jumps straight to the respond stage and an error response is returned. The image never reaches extraction or search.
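
One way to model a gate is as an ordinary stage that records an error, with the runner skipping every remaining stage once an error is present. The sketch below assumes that convention; the `error` key, the `spoof_score` field, and the threshold value are illustrative assumptions, not the system's actual schema.

```python
import asyncio

SPOOF_THRESHOLD = 0.5  # hypothetical value, for illustration only


async def pad_gate(ctx: dict) -> dict:
    # A gate is a stage that may flag the request as failed.
    if ctx["spoof_score"] > SPOOF_THRESHOLD:
        return {**ctx, "error": "spoof detected"}
    return ctx


async def extract(ctx: dict) -> dict:
    # Placeholder for the embedding model.
    return {**ctx, "template": [0.0] * 512}


async def respond(ctx: dict) -> dict:
    return {**ctx, "status": "error" if "error" in ctx else "ok"}


async def run_with_gates(ctx: dict, stages) -> dict:
    for stage in stages:
        ctx = await stage(ctx)
        if "error" in ctx:
            break  # short-circuit: skip all remaining stages
    return await respond(ctx)  # respond always runs
```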

Optional stages (not shown above) may be inserted depending on which model files are present: super-resolution (before detect), head pose estimation (after align), deepfake detection (after align), age estimation (after align), and morphing detection (after quality, enrol only).
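
As a rough sketch of how the stage order could depend on which model files are present, the snippet below inserts two of the optional stages relative to their anchor stage. The file names and helper functions are hypothetical; the other optional stages would follow the same pattern.

```python
from pathlib import Path

BASE_STAGES = ["receive", "detect", "align", "pad",
               "quality", "extract", "search", "respond"]

# (stage name, model file, position, anchor stage); file names are made up.
OPTIONAL_STAGES = [
    ("super_resolution", "super_resolution.onnx", "before", "detect"),
    ("head_pose", "head_pose.onnx", "after", "align"),
]


def present_models(model_dir: str) -> set:
    """Names of the files found in the model directory."""
    return {p.name for p in Path(model_dir).iterdir() if p.is_file()}


def build_stage_order(present: set) -> list:
    order = list(BASE_STAGES)
    for name, filename, position, anchor in OPTIONAL_STAGES:
        if filename not in present:
            continue  # model missing: leave the stage out
        idx = order.index(anchor)
        order.insert(idx if position == "before" else idx + 1, name)
    return order
```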

The search stage sends the 512-dim template to the vector store. In sharded mode, this is a scatter-gather operation:

flowchart TD
API["API process"]
S0["Shard 0"]
S1["Shard 1"]
S2["Shard 2"]
S3["Shard 3"]
S4["Shard 4"]
Merge["Merge + rerank"]
API -->|"query vector (gRPC)"| S0
API -->|"query vector (gRPC)"| S1
API -->|"query vector (gRPC)"| S2
API -->|"query vector (gRPC)"| S3
API -->|"query vector (gRPC)"| S4
S0 -->|"top-50 PQ candidates"| Merge
S1 -->|"top-50 PQ candidates"| Merge
S2 -->|"top-50 PQ candidates"| Merge
S3 -->|"top-50 PQ candidates"| Merge
S4 -->|"top-50 PQ candidates"| Merge
Merge -->|"top-K exact reranked"| API

Each shard scans its local IVF-PQ index (probing nprobe cells) and returns up to local_k candidates. The API merges all candidates, optionally reranks with full-precision vectors, and returns the top K.
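
The merge and optional rerank can be sketched in plain Python. This assumes scores are similarities (higher is better) and that full-precision vectors are available in a dict keyed by subject id; the function and argument names are hypothetical, and a real deployment would use FAISS and NumPy instead of lists.

```python
import heapq
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def merge_and_rerank(query, shard_results, k, full_vectors=None) -> list:
    """shard_results: one list of (approx_score, subject_id) per shard."""
    # Merge: keep the best approximate score seen for each subject.
    merged = {}
    for candidates in shard_results:
        for score, subject_id in candidates:
            merged[subject_id] = max(score, merged.get(subject_id, -math.inf))
    # Optional rerank: replace approximate scores with exact cosine similarity.
    if full_vectors is not None:
        merged = {sid: cosine(query, full_vectors[sid]) for sid in merged}
    top = heapq.nlargest(k, ((score, sid) for sid, score in merged.items()))
    return [sid for _, sid in top]
```

Reranking can change the order because the approximate PQ scores are only estimates of the exact similarity.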

If a shard does not respond within deadline_seconds, the partial_result_policy determines behaviour: annotate (return results with a warning), reject (fail the request), or degrade (return partial results silently).
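
The three policies can be illustrated with `asyncio.wait` and a timeout. This is a sketch under assumptions: shard calls are modelled as coroutines that return a list of candidates, and the result shape (a dict with `candidates` and `warnings`) is made up for the example.

```python
import asyncio


async def fake_shard(delay: float, candidates: list) -> list:
    # Stand-in for a gRPC call to one shard.
    await asyncio.sleep(delay)
    return candidates


async def gather_shards(shard_calls, deadline_seconds: float,
                        partial_result_policy: str) -> dict:
    tasks = [asyncio.ensure_future(call) for call in shard_calls]
    done, pending = await asyncio.wait(tasks, timeout=deadline_seconds)
    for task in pending:
        task.cancel()  # stop waiting for shards that missed the deadline
    # Preserve shard order; result() re-raises if a shard call failed.
    candidates = [c for t in tasks if t in done for c in t.result()]
    if not pending:
        return {"candidates": candidates, "warnings": []}
    if partial_result_policy == "reject":
        raise TimeoutError(f"{len(pending)} shard(s) missed the deadline")
    if partial_result_policy == "annotate":
        warning = f"{len(pending)} shard(s) missed the deadline"
        return {"candidates": candidates, "warnings": [warning]}
    if partial_result_policy == "degrade":
        return {"candidates": candidates, "warnings": []}
    raise ValueError(f"unknown policy: {partial_result_policy}")
```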

The respond stage builds the XML response and resolves the gateway’s asyncio.Future for that trace_id. The gateway then returns the HTTP response to the client.


Enrolment runs a stricter pipeline than search because the template will persist in the gallery.

flowchart TD
Receive --> Detect
Detect --> Align
Align --> PAD
PAD -->|pass| MAD["MAD Gate\n(morphing detection)"]
PAD -->|fail| Respond
MAD -->|pass| Quality
MAD -->|fail: morphed image| Respond
Quality -->|pass| Extract
Quality -->|fail| Respond
Extract --> Enrol["Enrol\n(store template in gallery)"]
Enrol --> EventLog["Event Log\n(append EnrolEvent)"]
EventLog --> Respond
style PAD fill:#fff3cd
style MAD fill:#fff3cd
style Quality fill:#fff3cd

Key differences from search:

  • The MAD gate (morphing detection) is active, which prevents blended-identity photos from entering the gallery
  • The template is stored in the vector index, not used for a query
  • An EnrolEvent is appended to the event log (immutable audit trail)

Verification compares a probe against a single enrolled subject, rather than searching the entire gallery.

flowchart TD
Receive --> Detect
Detect --> Align
Align --> PAD
PAD -->|pass| Quality
PAD -->|fail| Respond
Quality -->|pass| Extract
Quality -->|fail| Respond
Extract --> Lookup["Lookup\n(fetch subject's template)"]
Lookup --> Compare["Compare\n(cosine similarity)"]
Compare --> Respond
style PAD fill:#fff3cd
style Quality fill:#fff3cd

The pipeline extracts a template from the probe, fetches the enrolled template for the given subject_id, computes cosine similarity, and returns match/no-match with the score.
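
The compare step reduces to a cosine similarity and a threshold check. The sketch below uses plain lists for clarity; the `verify` helper and its default threshold are hypothetical, since the real threshold depends on the model and the target error rates.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def verify(probe, enrolled, threshold: float = 0.4) -> dict:
    """Return a match decision together with the raw score."""
    score = cosine_similarity(probe, enrolled)
    return {"match": score >= threshold, "score": score}
```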


Deletion does not require an image.

flowchart TD
Receive --> Delete["Delete\n(remove from vector index)"]
Delete --> EventLog["Event Log\n(append DeleteEvent)"]
EventLog --> Respond

The template is removed from the FAISS index and a DeleteEvent is appended to the event log. The event log is append-only: deletions are recorded but old enrolment events are never removed, providing a full audit trail.
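
An append-only log of this kind can be sketched as a list that is only ever extended, with the current gallery membership derived by replaying it. The `Event` and `EventLog` classes and their fields are hypothetical; the real EnrolEvent and DeleteEvent records may carry different fields.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # "enrol" or "delete"
    subject_id: str
    timestamp: float


class EventLog:
    """Append-only: events can be added but never edited or removed."""

    def __init__(self) -> None:
        self._events = []

    def append(self, kind: str, subject_id: str) -> Event:
        event = Event(kind, subject_id, time.time())
        self._events.append(event)
        return event

    def events(self) -> tuple:
        return tuple(self._events)  # read-only view for auditors

    def active_subjects(self) -> set:
        # Replay the log in order to find who is currently enrolled.
        active = set()
        for event in self._events:
            if event.kind == "enrol":
                active.add(event.subject_id)
            elif event.kind == "delete":
                active.discard(event.subject_id)
        return active
```

After enrolling two subjects and deleting one, the log still holds all three events while `active_subjects()` reports only the remaining subject.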


The respond stage includes per-stage timing in the XML response (as a <timing> element). This is useful for identifying bottlenecks:

<timing>
<detect>12.3</detect>
<align>1.2</align>
<pad>8.7</pad>
<quality>5.1</quality>
<extract>15.4</extract>
<search>2.1</search>
</timing>

Values are in milliseconds. In a typical search on CPU, detection (12–20 ms) and extraction (15–25 ms) are the most expensive stages. FAISS search is usually under 3 ms even at 200M scale.
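
To find the bottleneck programmatically, a client can parse the <timing> element with the standard library. The helper name below is hypothetical, and the sample reuses the values shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE_TIMING = """
<timing>
<detect>12.3</detect>
<align>1.2</align>
<pad>8.7</pad>
<quality>5.1</quality>
<extract>15.4</extract>
<search>2.1</search>
</timing>
"""


def slowest_stage(timing_xml: str) -> tuple:
    """Return (stage_name, milliseconds) for the most expensive stage."""
    root = ET.fromstring(timing_xml.strip())
    timings = {child.tag: float(child.text) for child in root}
    stage = max(timings, key=timings.get)
    return stage, timings[stage]
```

For the sample above, `slowest_stage(SAMPLE_TIMING)` returns `("extract", 15.4)`.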