Request Flow Walkthrough

This page traces a request through every layer of the system. If you are new to UFME, read the Glossary first for term definitions.

System overview

At a high level, UFME has three layers: the REST gateway (accepts HTTP requests), the pipeline (processes the face image through a series of stages), and the vector store (searches or mutates the gallery).

flowchart LR
    Client -->|multipart/form-data| Gateway
    Gateway -->|XML envelope| Pipeline
    Pipeline -->|template vector| VectorStore
    VectorStore -->|ranked candidates| Pipeline
    Pipeline -->|XML response| Gateway
    Gateway -->|HTTP response| Client

1:N search — step by step

This is the most common operation. The client submits a photo and receives a ranked list of matching identities.

Step 1: REST gateway

The gateway receives a POST /api/v1/search request with an image file and JSON metadata. It:

Validates the request (content type, image presence, metadata format)
Generates a unique trace_id for this request
Translates the request into an XML envelope (the pipeline’s internal format)
Pushes the envelope onto the receive_q (an asyncio queue)
Creates an asyncio.Future keyed by trace_id and waits for the pipeline to resolve it
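
The hand-off between the gateway and the pipeline can be sketched as follows. This is a minimal illustration, not UFME's actual code: the names `pending`, `submit` and `fake_pipeline`, the tuple placed on the queue, and the timeout are all assumptions made for the example. The sketch registers the Future before enqueueing so the pipeline can never look up a trace_id that is not yet registered; the real gateway may order these steps differently.

```python
import asyncio
import uuid

# Hypothetical registry of in-flight requests: trace_id -> Future.
pending: dict[str, asyncio.Future] = {}


async def submit(receive_q: asyncio.Queue, envelope: str, timeout: float = 5.0) -> str:
    """Queue an envelope and wait until the pipeline resolves its Future."""
    trace_id = uuid.uuid4().hex
    future = asyncio.get_running_loop().create_future()
    pending[trace_id] = future              # register before enqueueing
    await receive_q.put((trace_id, envelope))
    try:
        # The timeout is an assumption of this sketch, not a documented setting.
        return await asyncio.wait_for(future, timeout)
    finally:
        pending.pop(trace_id, None)         # never leak entries on timeout or error


async def fake_pipeline(receive_q: asyncio.Queue) -> None:
    """Stand-in for the pipeline runner: wrap one envelope in a response."""
    trace_id, envelope = await receive_q.get()
    future = pending.get(trace_id)
    if future is not None and not future.done():
        future.set_result(f"<response>{envelope}</response>")


async def demo() -> str:
    receive_q: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(fake_pipeline(receive_q))
    result = await submit(receive_q, "<search/>")
    await worker
    return result
```

Running `asyncio.run(demo())` pushes one envelope through the stand-in pipeline and returns the string that the pipeline set on the Future.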

Step 2: Pipeline stages

The pipeline runner pulls the envelope from receive_q and passes it through stages. Each stage is a pure async function that takes a dict and returns a modified dict.

flowchart TD
    Receive["Receive\n(parse XML envelope)"]
    Detect["Detect\n(SCRFD: find faces + landmarks)"]
    Align["Align\n(affine warp to 112x112)"]
    PAD["PAD Gate\n(MiniFASNetV2: spoof check)"]
    Quality["Quality Gate\n(eDifFIQA: ISO quality score)"]
    Extract["Extract\n(ArcFace: 512-dim embedding)"]
    Search["Search\n(FAISS: top-K nearest neighbours)"]
    Respond["Respond\n(build XML response)"]

    Receive --> Detect
    Detect --> Align
    Align --> PAD
    PAD -->|pass| Quality
    PAD -->|fail: spoof detected| Respond
    Quality -->|pass| Extract
    Quality -->|fail: below threshold| Respond
    Extract --> Search
    Search --> Respond

    style PAD fill:#fff3cd
    style Quality fill:#fff3cd

Gates (yellow) can short-circuit the pipeline. If the PAD gate detects a spoof, or the quality gate scores the image below its threshold, processing stops immediately and an error response is returned; the image never reaches extraction or search.
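
A gate's short-circuit behaviour might look like the sketch below. It assumes, purely for illustration, that a gate signals failure by setting an `error` key in the dict. The `spoof_score` field, its 0.5 threshold, and the stage bodies are likewise hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Stage = Callable[[dict], Awaitable[dict]]


async def pad_gate(ctx: dict) -> dict:
    # Hypothetical rule: reject when the spoof score exceeds the threshold.
    if ctx["spoof_score"] > 0.5:
        return {**ctx, "error": "spoof detected"}
    return ctx


async def extract(ctx: dict) -> dict:
    # Placeholder embedding; a real stage would run the extraction model.
    return {**ctx, "template": [0.0] * 512}


async def respond(ctx: dict) -> dict:
    status = "error" if "error" in ctx else "ok"
    return {**ctx, "status": status}


async def run_pipeline(ctx: dict, stages: list[Stage]) -> dict:
    """Run stages in order; jump to respond as soon as a gate sets `error`."""
    for stage in stages:
        ctx = await stage(ctx)
        if "error" in ctx:
            break                           # skip all remaining stages
    return await respond(ctx)


passed = asyncio.run(run_pipeline({"spoof_score": 0.1}, [pad_gate, extract]))
blocked = asyncio.run(run_pipeline({"spoof_score": 0.9}, [pad_gate, extract]))
```

Here `passed` carries a template and the status `ok`, while `blocked` has the status `error` and no `template` key, because extraction never ran.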

Optional stages (not shown above) may be inserted depending on which model files are present: super-resolution (before detect), head pose estimation (after align), deepfake detection (after align), age estimation (after align), and morphing detection (between PAD and quality, enrol only, as shown in the enrol flow below).

Step 3: Vector search

The search stage sends the 512-dim template to the vector store. In sharded mode, this is a scatter-gather operation:

flowchart TD
    API["API process"]
    S0["Shard 0"]
    S1["Shard 1"]
    S2["Shard 2"]
    S3["Shard 3"]
    S4["Shard 4"]
    Merge["Merge + rerank"]

    API -->|"query vector (gRPC)"| S0
    API -->|"query vector (gRPC)"| S1
    API -->|"query vector (gRPC)"| S2
    API -->|"query vector (gRPC)"| S3
    API -->|"query vector (gRPC)"| S4
    S0 -->|"top-50 PQ candidates"| Merge
    S1 -->|"top-50 PQ candidates"| Merge
    S2 -->|"top-50 PQ candidates"| Merge
    S3 -->|"top-50 PQ candidates"| Merge
    S4 -->|"top-50 PQ candidates"| Merge
    Merge -->|"top-K exact reranked"| API

Each shard scans its local IVF-PQ index (probing nprobe cells) and returns up to local_k candidates. The API merges all candidates, optionally reranks with full-precision vectors, and returns the top K.
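
The merge and optional exact rerank could be sketched as below. The sketch assumes templates are L2-normalised, so the inner product equals cosine similarity, and that full-precision vectors are available by subject ID. The function name and data layout are hypothetical.

```python
def merge_and_rerank(
    query: list[float],
    candidate_ids: list[str],
    full_vectors: dict[str, list[float]],
    k: int,
) -> list[tuple[float, str]]:
    """Deduplicate candidates from all shards and rescore them exactly."""
    unique_ids = set(candidate_ids)         # a subject may appear more than once
    scored = [
        (sum(q * v for q, v in zip(query, full_vectors[subject_id])), subject_id)
        for subject_id in unique_ids
    ]
    scored.sort(reverse=True)               # highest similarity first
    return scored[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}
top = merge_and_rerank([1.0, 0.0], ["b", "c", "a", "c"], vectors, k=2)
```

With these toy vectors, `top` contains subject `a` first and subject `c` second; `b` is orthogonal to the query and drops out.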

If a shard does not respond within deadline_seconds, the partial_result_policy determines behaviour: annotate (return results with a warning), reject (fail the request), or degrade (return partial results silently).
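
The deadline handling can be illustrated with `asyncio.wait`. This is a sketch under stated assumptions: `query_shard` stands in for the gRPC call, the result dict and its `warning` field are invented for the example, and the rerank step is omitted. Only the three policy names come from the text above.

```python
import asyncio
import heapq


async def query_shard(delay: float, hits: list[tuple[float, str]]):
    """Stand-in for a gRPC call to one shard; returns (score, subject_id) pairs."""
    await asyncio.sleep(delay)
    return hits


async def scatter_gather(shard_calls, k: int, deadline_seconds: float, policy: str) -> dict:
    tasks = [asyncio.create_task(call) for call in shard_calls]
    done, late = await asyncio.wait(tasks, timeout=deadline_seconds)
    for task in late:
        task.cancel()                       # stop waiting on slow shards
    if late and policy == "reject":
        raise TimeoutError(f"{len(late)} shard(s) missed the deadline")
    candidates = [hit for task in done for hit in task.result()]
    result = {"hits": heapq.nlargest(k, candidates)}   # higher score is better
    if late and policy == "annotate":
        result["warning"] = f"partial result: {len(late)} shard(s) timed out"
    return result                           # "degrade" returns hits with no warning


async def demo(policy: str) -> dict:
    calls = [
        query_shard(0.01, [(0.91, "alice"), (0.40, "bob")]),
        query_shard(0.01, [(0.75, "carol")]),
        query_shard(1.00, [(0.99, "dave")]),   # too slow, misses the deadline
    ]
    return await scatter_gather(calls, k=2, deadline_seconds=0.2, policy=policy)
```

With `annotate`, the demo returns the two best hits from the responsive shards plus a warning. With `degrade`, it returns the same hits and no warning. With `reject`, it raises `TimeoutError`.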

Step 4: Response

The respond stage builds the XML response and uses it to resolve the gateway’s asyncio.Future for this trace_id. The gateway then returns the HTTP response to the client.

Enrol flow

Enrolment runs a stricter pipeline than search because the template will persist in the gallery.

flowchart TD
    Receive --> Detect
    Detect --> Align
    Align --> PAD
    PAD -->|pass| MAD["MAD Gate\n(morphing detection)"]
    PAD -->|fail| Respond
    MAD -->|pass| Quality
    MAD -->|fail: morphed image| Respond
    Quality -->|pass| Extract
    Quality -->|fail| Respond
    Extract --> Enrol["Enrol\n(store template in gallery)"]
    Enrol --> EventLog["Event Log\n(append EnrolEvent)"]
    EventLog --> Respond

    style PAD fill:#fff3cd
    style MAD fill:#fff3cd
    style Quality fill:#fff3cd

Key differences from search:

MAD gate is active (morphing detection) — prevents blended identity photos from entering the gallery
The template is stored in the vector index, not used for a query
An EnrolEvent is appended to the event log (immutable audit trail)

Verify flow (1:1)

Verification compares a probe against a single enrolled subject, rather than searching the entire gallery.

flowchart TD
    Receive --> Detect
    Detect --> Align
    Align --> PAD
    PAD -->|pass| Quality
    PAD -->|fail| Respond
    Quality -->|pass| Extract
    Quality -->|fail| Respond
    Extract --> Lookup["Lookup\n(fetch subject's template)"]
    Lookup --> Compare["Compare\n(cosine similarity)"]
    Compare --> Respond

    style PAD fill:#fff3cd
    style Quality fill:#fff3cd

The pipeline extracts a template from the probe, fetches the enrolled template for the given subject_id, computes cosine similarity, and returns match/no-match with the score.
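
The comparison step reduces to a cosine similarity and a threshold check. In this minimal sketch the threshold value of 0.4 is a placeholder rather than a UFME default, and the return shape is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero-norm vector")
    return dot / norm


def verify(probe: list[float], enrolled: list[float], threshold: float = 0.4) -> dict:
    # The threshold is an assumed placeholder; tune it against real score data.
    score = cosine_similarity(probe, enrolled)
    return {"match": score >= threshold, "score": score}


same = verify([1.0, 0.0], [1.0, 0.0])
different = verify([1.0, 0.0], [0.0, 1.0])
```

Identical vectors score 1.0 and match. Orthogonal vectors score 0.0 and do not match.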

Delete flow

Deletion does not require an image.

flowchart TD
    Receive --> Delete["Delete\n(remove from vector index)"]
    Delete --> EventLog["Event Log\n(append DeleteEvent)"]
    EventLog --> Respond

The template is removed from the FAISS index and a DeleteEvent is appended to the event log. The event log is append-only: deletions are recorded but old enrolment events are never removed, providing a full audit trail.
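
One way to picture the append-only log is as a list that is only ever extended and replayed. The `Event` fields and the `EventLog` class below are hypothetical; the source names only EnrolEvent and DeleteEvent and does not describe their layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str           # "enrol" or "delete" (assumed encoding)
    subject_id: str


class EventLog:
    """Append-only: events can be added and replayed, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def history(self) -> tuple[Event, ...]:
        return tuple(self._events)          # read-only snapshot for auditing

    def active_subjects(self) -> set[str]:
        """Replay the log to find subjects that are currently enrolled."""
        active: set[str] = set()
        for event in self._events:
            if event.kind == "enrol":
                active.add(event.subject_id)
            elif event.kind == "delete":
                active.discard(event.subject_id)
        return active


log = EventLog()
log.append(Event("enrol", "alice"))
log.append(Event("enrol", "bob"))
log.append(Event("delete", "alice"))
```

After these three appends, replaying the log yields only `bob` as enrolled, while the history still holds all three events, including the original enrolment of `alice`.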

Timing breakdown

The respond stage includes per-stage timing in the XML response (as a <timing> element). This is useful for identifying bottlenecks:

<timing>
  <detect>12.3</detect>
  <align>1.2</align>
  <pad>8.7</pad>
  <quality>5.1</quality>
  <extract>15.4</extract>
  <search>2.1</search>
</timing>

Values are in milliseconds. In a typical search on CPU, detection (12–20 ms) and extraction (15–25 ms) are the most expensive stages. FAISS search usually takes under 3 ms, even with a gallery of 200M templates.
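
A client can read the timing block with the standard library to find the slowest stage. The sketch below parses the example fragment shown above on its own; how the `<timing>` element is nested inside the full response is not specified here, so locating it in a real response is left as an assumption.

```python
import xml.etree.ElementTree as ET

TIMING_XML = """
<timing>
  <detect>12.3</detect>
  <align>1.2</align>
  <pad>8.7</pad>
  <quality>5.1</quality>
  <extract>15.4</extract>
  <search>2.1</search>
</timing>
"""


def stage_timings(xml_text: str) -> dict[str, float]:
    """Map each child element of <timing> to its duration in milliseconds."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: float(child.text) for child in root}


timings = stage_timings(TIMING_XML)
slowest = max(timings, key=timings.get)     # stage with the largest duration
total_ms = round(sum(timings.values()), 1)
```

For the example above, `slowest` is `extract` and `total_ms` is 44.8.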