Simplicity Audit
Philosophy: Rich Hickey’s “Simple Made Easy” (Strange Loop 2011)
“Simplicity is a prerequisite for reliability.” — Dijkstra (via Hickey)
Simple = one role, one task, no interleaving of concerns. Complecting = braiding independent concerns together so they cannot be reasoned about or changed independently.
Executive Summary
UFME’s VISION.md describes a sophisticated biometric system with several genuinely well-separated concerns (hexagonal architecture, stateless processing) and several areas where distinct concerns are complected together. The most significant complections are:
- Quality policy braided into the detection stage (algorithmic mechanism + business policy)
- Matching threshold braided into the aggregation stage (search mechanism + confidence policy)
- Metadata filtering braided into the FAISS index (storage + query policy)
- PAD (spoof detection) braided into the ingestion pipeline (security policy + data flow)
- Infrastructure concerns woven through domain descriptions
Section 1: System Architecture (Hexagonal Pattern)
What is Simple (Properly Separated)
- Ports and Adapters is fundamentally a simplifying pattern. The core domain speaks its own language; adapters translate. These are genuinely separate things.
- Inbound vs Outbound adapters cleanly separate the “what comes in” from “what goes out.” No complection here.
- The stated goal — “if the deployment replaces the upstream provider or the underlying network, the biometric core remains completely untouched” — is a correct expression of simplicity: the core is one thing, the transport is another.
What is Complected
Transaction state inside the Core Domain.
“Core Domain: Manages gallery partitioning, binning, filtering, and transaction state.”
Transaction state is the management of when and how a request evolves over time. Gallery partitioning and filtering are about what data to search. These are independent concerns braided into a single “Core Domain” description.
- State + Routing Policy: transaction state (lifecycle management) is entangled with gallery partitioning (data segmentation policy). A transaction completing does not require knowledge of which gallery partition was searched — those are separate facts.
- Mechanism + Policy: The core domain “manages” filtering. Management (mechanism) and the filter rules (policy) are different concerns. The rules about which gallery to search for an IABS vs IDENT1 request are policy — they belong in configuration or a policy layer, not inside the mechanism that executes the search.
How to Untangle
Separate three distinct things:
- Request lifecycle (pure state machine: received → validated → dispatched → aggregated → responded)
- Routing/partitioning rules (pure data: a map of request-type → gallery-ids)
- Search execution (pure function: (query-vector, gallery-ids, k) → ranked-results)
The core domain should only know about (3). Rules in (2) are passed in as values. Lifecycle in (1) is managed by the inbound adapter or an orchestration layer.
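The three-way split above can be sketched in Python. This is a minimal, hypothetical illustration: the state names, routing map, and pre-scored index are stand-ins, not the real design.

```python
from enum import Enum, auto

# (1) Request lifecycle: a pure state machine, owned by the orchestration layer.
class RequestState(Enum):
    RECEIVED = auto()
    VALIDATED = auto()
    DISPATCHED = auto()
    AGGREGATED = auto()
    RESPONDED = auto()

TRANSITIONS = {
    RequestState.RECEIVED: RequestState.VALIDATED,
    RequestState.VALIDATED: RequestState.DISPATCHED,
    RequestState.DISPATCHED: RequestState.AGGREGATED,
    RequestState.AGGREGATED: RequestState.RESPONDED,
}

def advance(state: RequestState) -> RequestState:
    return TRANSITIONS[state]

# (2) Routing rules: pure data, passed in as a value (illustrative gallery names).
ROUTING = {"IABS": ["gallery-immigration"], "IDENT1": ["gallery-law-enforcement"]}

# (3) Search execution: a pure function of its arguments. The query vector would
# drive a real ANN search; here the index is pre-scored for illustration.
def search(query_vector, gallery_ids, k, index):
    hits = [(gid, vid, score) for gid in gallery_ids for vid, score in index[gid]]
    return sorted(hits, key=lambda h: h[2], reverse=True)[:k]
```

The core domain implements only `search`; `ROUTING` is configuration handed to it, and `advance` lives with whoever owns the request lifecycle.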
Section 2: Algorithmic Pipeline
What is Simple (Properly Separated)
- Detection, Alignment, and Extraction are described as sequential stages — a pipeline of transforms. Each stage takes an image or intermediate artefact and produces the next. This is a clean composition of pure functions.
- The ViT architecture choice (local patches → global attention) is an internal detail of one stage. It does not leak into adjacent stages. Simple.
- ArcFace as a training loss is a concern of the training process, not the inference process. The VISION.md correctly notes it as a training detail, keeping it out of the runtime pipeline description.
What is Complected
1. Quality Assessment Policy braided into the Detection Stage.
“Quality Assessment: An auxiliary lightweight network calculates a quality score based on blur, illumination, and yaw/pitch/roll angles, rejecting sub-standard images before they consume heavy extraction compute.”
Two independent concerns are fused here:
- Measurement: Computing the quality score (a number). This is a pure function: image → quality-score. It belongs in the pipeline as a transform.
- Policy (Rejection): Deciding what “sub-standard” means and acting on it (drop the image, return an error, flag for human review). This is a business rule that will change. Different use cases (immigration search vs law enforcement) may have different quality thresholds.
By having the quality assessment stage reject images, the mechanism (measuring quality) is fused with the policy (acting on quality). You cannot change the rejection threshold without touching the measurement component, and you cannot reuse the measurement in a monitoring context without also triggering rejections.
Simple alternative: The quality stage produces a value ({:quality-score 0.73, :blur 0.1, :yaw 12.5}). A separate, configurable policy step decides what to do with that value. The pipeline becomes:
image → detect → align → measure-quality → [policy gate] → extract → vector
The [policy gate] is a pure function of (quality-value, policy-config) → pass | reject-with-reason. Policy lives in config, not in the measurement function.
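A minimal sketch of the measurement/policy split, with stubbed measurement values and hypothetical policy keys (`min_quality`, `max_yaw_degrees`):

```python
def measure_quality(image) -> dict:
    # Mechanism: pure measurement producing a value.
    # (Stub numbers for illustration; a real auxiliary network computes these.)
    return {"quality_score": 0.73, "blur": 0.1, "yaw": 12.5}

def quality_gate(quality: dict, policy: dict) -> dict:
    # Policy: compare the measured value against thresholds supplied as data.
    if quality["quality_score"] < policy["min_quality"]:
        return {"pass": False, "reason": "quality_below_threshold"}
    if abs(quality["yaw"]) > policy["max_yaw_degrees"]:
        return {"pass": False, "reason": "yaw_out_of_range"}
    return {"pass": True}
```

Changing the rejection threshold now means editing configuration, and `measure_quality` can be reused in a monitoring context without triggering any rejection.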
2. PAD (Presentation Attack Detection) braided into Data Ingestion.
“An auxiliary AI model sits ahead of the feature extraction pipeline.”
PAD is a security policy. Feature extraction is a biometric mechanism. The VISION.md describes PAD as positioned “ahead of” the pipeline — which is better than being inside it — but the description conflates the two concerns spatially (“sits ahead of the feature extraction pipeline”) rather than treating PAD as an independent, composable check.
The complection: PAD’s position is hardcoded to be before extraction. But PAD could also be run asynchronously for audit purposes, or could be a separate service the pipeline calls out to. Describing it as “sitting ahead of” a specific stage couples its position to the pipeline topology.
Simple alternative: PAD is a function image → {:pass true} | {:reject true, :reason :deepfake}. It is called by the orchestration layer as one step in a sequence. Its position relative to other steps is determined by the orchestrator (configuration/data), not by the PAD component itself.
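A sketch of PAD as one composable check among many, where the orchestrator owns the ordering. The step functions and result shape are illustrative assumptions:

```python
def pad_check(image) -> dict:
    # Hypothetical stub: a real PAD model would score the image.
    return {"pass": True}

def run_sequence(image, steps):
    # Orchestrator: the position of each check is data (the order of `steps`),
    # not something any individual component knows about.
    for step in steps:
        result = step(image)
        if not result.get("pass", True):
            return result
    return {"pass": True}
```

Running PAD synchronously before extraction, asynchronously for audit, or not at all is then a change to the `steps` list, not to the PAD component.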
Section 3: Vector Storage & Matching Engine
What is Simple (Properly Separated)
- The memory mathematics (512 dims × 4 bytes × 200M = 400 GB) is a straightforward derivation from fixed facts. No complection.
- The scatter-gather topology (fan-out to shards, aggregate results) is a clean description of mechanism. Fan-out and aggregation are treated as separate steps.
- Inner Product vs Euclidean distance: the VISION.md correctly notes this is a mathematical equivalence enabled by L2-normalisation. One concern (the ViT’s output normalisation) enables a simplification in another concern (the distance metric). This is composition, not complection.
What is Complected
1. Confidence Threshold braided into the Aggregation Stage.
“The central node merges the 20 local lists, sorts them by Cosine Similarity, applies a dynamic confidence threshold, and returns the definitive global match.”
The aggregation step does three things:
- Merge and rank: Pure function of lists → sorted list. Mechanism.
- Apply threshold: Decide what score constitutes a “match.” Policy.
- Return “definitive global match”: Implies a decision has been made. Business output.
The “Authority’s dynamic confidence threshold” is a policy value that will change (different operational modes, different legal standards, different risk appetites). By applying it inside the aggregation function, the aggregation mechanism is coupled to business policy. You cannot get the raw ranked list out of the aggregator without also accepting the policy interpretation.
Simple alternative: Aggregation returns a pure ranked list of (id, score) pairs. A downstream decision function takes (ranked-list, threshold-config) → match | no-match | candidates. The threshold is a value passed in, not embedded in the aggregator.
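A sketch of the aggregator/decision split. The `candidate_band` parameter is a hypothetical illustration of a third outcome between match and no-match:

```python
import heapq

def aggregate(shard_lists, k=10):
    # Mechanism: merge per-shard (id, score) lists into one global ranking.
    merged = [hit for shard in shard_lists for hit in shard]
    return heapq.nlargest(k, merged, key=lambda h: h[1])

def decide(ranked, threshold, candidate_band=0.05):
    # Policy: interpret the ranking. The threshold is a value passed in,
    # so operational modes can swap it without touching the aggregator.
    if not ranked:
        return {"decision": "no-match"}
    top_id, top_score = ranked[0]
    if top_score >= threshold:
        return {"decision": "match", "id": top_id, "score": top_score}
    if top_score >= threshold - candidate_band:
        return {"decision": "candidates", "ranked": ranked}
    return {"decision": "no-match"}
```

A monitoring consumer can now take the raw output of `aggregate` without accepting any policy interpretation.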
2. Metadata Filtering braided into the FAISS Index.
“Metadata (e.g., gender, age bracket, or recording event levels) is stored alongside the FAISS index IDs. The system uses pre-filtering bitsets to mask out non-relevant vectors before the distance calculation occurs.”
Two concerns fused:
- Storage: Where metadata lives (alongside FAISS IDs).
- Query policy: Which metadata attributes to filter on, and when.
Pre-filtering bitsets are a mechanism for efficient exclusion. The policy (which filters to apply for a given request type) is embedded in the construction of those bitsets at query time. If the operator adds a new metadata attribute (e.g., document type, nationality flag), the filtering logic — currently inside the FAISS interaction layer — must change.
Simple alternative: Metadata is a separate, queryable store. Before a search, a selector function takes (request-context, metadata-store) → bitset. This bitset is then passed as a pure value to the FAISS search function. The FAISS layer is only responsible for “search within this mask”; the policy for constructing the mask lives elsewhere.
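A sketch of the selector/search split, using a plain integer as the bitset and a pre-scored index for illustration (a real implementation would hand the mask to FAISS):

```python
def build_bitset(request_context, metadata_store):
    # Policy: decide which records are eligible, outside the index layer.
    bits = 0
    for vector_id, meta in metadata_store.items():
        if all(meta.get(key) == value
               for key, value in request_context["filters"].items()):
            bits |= 1 << vector_id
    return bits

def masked_search(query, bitset, index):
    # Mechanism: "search within this mask" — the index never sees filter policy.
    eligible = [(vid, score) for vid, score in index if bitset & (1 << vid)]
    return sorted(eligible, key=lambda h: h[1], reverse=True)
```

Adding a new metadata attribute changes only the metadata store and the request context; the search function is untouched.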
3. Gallery Partitioning conflated with Physical Sharding.
“The FAISS cluster is logically segmented. An immigration search (IABS) does not unnecessarily scan the law enforcement (IDENT1) partitions unless explicitly instructed.”
Logical partitioning (which gallery does this request belong to?) is a business/policy concern. Physical sharding (which nodes hold which vectors?) is an infrastructure/performance concern. The VISION.md blurs these: “logically segmented” in the context of “physically distributed across 20 nodes.”
Complection risk: If the logical partition map and the physical shard map are the same thing, then changing the business rule (“IABS searches also need to check IDENT1”) requires a physical re-sharding of the cluster, or vice versa: physical rebalancing for performance reasons changes which logical partitions are searched. These should be independent decisions.
Simple alternative: Maintain two separate maps:
- logical-partition → [vector-ids] (business/policy layer)
- shard-id → [vector-ids] (infrastructure layer)
A query is resolved by first resolving logical partitions to vector IDs, then routing vector IDs to shards. Changes to business rules do not require infrastructure changes, and vice versa.
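The two-map resolution can be sketched as a pure planning function. Partition names and IDs are illustrative:

```python
LOGICAL = {"IABS": [101, 102], "IDENT1": [201]}     # business/policy layer
SHARDS = {"shard-a": [101, 201], "shard-b": [102]}  # infrastructure layer

def plan_query(partition_ids, logical=LOGICAL, shards=SHARDS):
    # Resolve business partitions to vector IDs, then route IDs to shards.
    wanted = {vid for pid in partition_ids for vid in logical[pid]}
    return {sid: sorted(wanted & set(vids))
            for sid, vids in shards.items()
            if wanted & set(vids)}
```

Rebalancing shards edits only `SHARDS`; changing the business rule (“IABS also searches IDENT1”) edits only `LOGICAL` or the `partition_ids` passed in.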
Section 4: Enterprise Integration Layer
What is Simple (Properly Separated)
- Inbound XML Translation: XML in, gRPC out. This is a pure translation adapter — one job. Simple.
- The description of PAD as an “auxiliary AI model” correctly identifies it as a separable module.
What is Complected
The PAD module’s rejection behaviour is described as absolute.
“Rejecting malicious payloads instantly.”
The PAD component makes a binary security decision and acts on it (rejection). This fuses:
- Detection: image → spoof-score (measurement, mechanism)
- Classification: spoof-score → is-spoof? (policy: what threshold?)
- Action: is-spoof? → reject (policy: what to do on detection?)
All three are independent. A PAD model that is 90% confident of a spoof may warrant: (a) immediate rejection in a high-security context, (b) flagging for human review in an audit context, or (c) logging only in a monitoring context. Hardcoding “reject” inside the PAD component eliminates these options without restructuring the component.
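The three independent functions can be sketched directly; the mode names and action table are hypothetical:

```python
def spoof_score(image) -> float:
    # Measurement stub; a real PAD model scores the image.
    return 0.9

def classify(score: float, threshold: float) -> bool:
    # Policy value 1: what counts as a spoof.
    return score >= threshold

def act(is_spoof: bool, mode: str) -> str:
    # Policy value 2: what to do on detection, as a data table.
    if not is_spoof:
        return "pass"
    return {"high-security": "reject", "audit": "flag", "monitoring": "log"}[mode]
```

The same measurement now serves rejection, human review, and logging-only contexts without restructuring any component.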
Section 5: Infrastructure & Deployment
What is Simple (Properly Separated)
- Stateless Processing is a genuine simplification: no mutable state persists between requests for raw imagery. The system treats images as values (ephemeral, immutable, passed through).
- Docker + Kubernetes is a standard packaging/orchestration separation. Components remain unaware of their container context.
- The ONNX Runtime CPU/GPU flexibility: the inference component is described as configurable at deployment time. The algorithm does not care about the hardware; the hardware is a deployment-time value.
What is Complected
Operational security requirements entangled with technical role descriptions.
Governance and security policy concerns were embedded in the technical architecture document, complecting “what the system does” with “who is authorised to operate it.” These are separate facts.
More subtly, support tier definitions (operational concern) were tied to security policy (access control concern). The access requirements depend on what data is accessed, not on what support activity is performed. A bug fix in a test environment has different access requirements than the same fix in production with live biometric data. The complection is support-activity + environment + data-sensitivity → access-requirement, but the original design presented it as support-activity → access-requirement.
Summary Table
| Area | Complected Concerns | Simple Separation |
|---|---|---|
| Core Domain | State lifecycle + routing policy + search mechanism | Three separate components: state machine, policy map, search function |
| Quality Assessment | Measurement + rejection policy | Measure as a value; policy gate as a separate configurable step |
| PAD | Detection + classification threshold + rejection action | Three pure functions composed by orchestrator |
| Aggregation | Merge/rank + confidence threshold | Aggregator returns ranked list; decision function applies threshold |
| FAISS Filtering | Metadata storage + filter policy | Metadata store separate; selector function produces bitset as a value |
| Gallery Partitioning | Logical segmentation + physical sharding | Two independent maps; resolved separately |
| Operational Security | Support activity + environment context + data sensitivity | Access requirements as a function of (activity, environment, data-sensitivity) |
Guiding Principle for Implementation
From Hickey: “If you want everything to be familiar, you have to make everything the same. That’s not simplicity.”
The path forward for UFME:
- Make policy a value, not a behaviour. Thresholds, rejection rules, partition routing — all should be data passed into pure functions, not logic embedded in components.
- Pipelines are composition, not objects. Each stage produces a value consumed by the next. No stage should cause side effects (like rejection) — it should produce a richer value that an orchestration layer acts on.
- Separate the what from the when. “What gallery to search” (policy) and “when to execute the search” (mechanism) are different concerns. Keep them apart.
- The aggregator should not decide. Aggregation produces ranked facts. Decision-making (match/no-match) is a separate, policy-driven function applied downstream.
Resolution Status
This section tracks which original complections have been resolved in the current design.
| Complection | Status | Resolution | Design Doc |
|---|---|---|---|
| Transaction state inside Core Domain (state lifecycle + routing policy + search mechanism) | Resolved | Request lifecycle managed by orchestration layer; routing rules are policy data passed in; search is a pure function of (query, partitions, k). The route stage is split into a thin router + per-operation executors. | Pipeline Design |
| Quality measurement braided with rejection policy | Resolved | Quality measurement (quality.py) produces a value dict; quality gate (quality_gate.py) is a separate configurable policy step. OFIQ satisfies QualityPort as pure measurement. | Pipeline Design |
| PAD detection + classification + rejection fused | Resolved | PAD measurement (pad.py) produces a PadScore value; PAD gate (pad_gate.py) applies configurable APCER/BPCER thresholds. MAD (mad.py) is a separate stage for morphing. All three are independent composable stages. | Pipeline Design |
| Confidence threshold braided into aggregation | Resolved | Aggregator returns a pure ranked list of (id, score) pairs. Threshold application is a separate threshold_check pure function in domain ops, called by the orchestration layer. | Domain Design, FAISS Design |
| Metadata filtering braided into FAISS index | Resolved | Metadata is a separate queryable concern. A selector function computes (request_context, metadata) -> bitset before search. The bitset is passed as a value to the FAISS search function. Filter policy lives outside the FAISS adapter. | FAISS Design |
| Gallery partitioning conflated with physical sharding | Resolved | Logical partition map (partition_id -> [vector_ids]) and physical shard map (shard_id -> [vector_ids]) are separate. Partition is a data label resolved before routing to shards. | FAISS Design |
| Infrastructure concerns in domain descriptions | Resolved | Domain layer (src/core/) has zero infrastructure dependencies. Operational security, deployment topology, and infrastructure configuration are documented separately and do not appear in domain or pipeline code. | Domain Design |
Second-Pass Audit (2026-02-20)
A three-agent Rich Hickey review examined the implemented design documents (domain-design.md, pipeline-design.md, faiss-design.md) against the original audit’s principles. The first-pass findings above addressed the VISION.md’s high-level complections. This second pass inspects the concrete designs that resolved them — and finds nine new complections introduced or exposed during the detailed design phase.
The pattern is familiar: solutions to complections introduce their own complections at the next level of detail. As Hickey would say: “Every new thing you think you need is something else that can go wrong.”
Finding S2-1: Pipeline dicts accumulate all upstream fields (snowball pattern)
Complected: Each stage’s input + cumulative history of all prior stages.
A stage at position N in the pipeline receives a dict containing every key produced by stages 0 through N-1. The stage cannot distinguish “my inputs” from “someone else’s outputs that happen to be in this dict.” Adding a new key in an early stage silently changes the shape of every downstream stage’s input. Stages are nominally independent but structurally coupled through a shared, growing data surface.
Resolution: Context/payload split. The pipeline runner manages the cumulative context dict. Each stage declares requires (the keys it reads) and produces (the keys it writes). The runner extracts only the required keys for each stage and merges the produced keys back into the context. Stages are decoupled from upstream output shapes — they see exactly what they declared, nothing more.
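A minimal sketch of the context/payload split. The `Stage` shape and stage names are illustrative, not the design's exact API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Stage:
    name: str
    requires: frozenset   # keys this stage reads
    produces: frozenset   # keys this stage writes
    fn: Callable[[dict], dict]

def run(stages, ctx):
    for stage in stages:
        # The stage sees only what it declared, nothing more.
        payload = {k: ctx[k] for k in stage.requires}
        output = stage.fn(payload)
        # Only declared keys are merged back; the context stays a value.
        ctx = {**ctx, **{k: output[k] for k in stage.produces}}
    return ctx
```

An early stage adding a new key no longer changes the shape of any downstream stage's input.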
Finding S2-2: Error short-circuit logic distributed across every stage
Complected: Error handling semantics + stage transformation logic.
If error detection is embedded in each stage (check for error envelope, decide whether to skip, pass through), then every stage is complected with error flow control. The error propagation mechanism is braided into the transformation logic. You cannot reason about a stage’s behaviour without also reasoning about error states from stages it has never heard of.
Resolution: The runner checks for error envelopes before calling each stage. If the context contains an error, the runner short-circuits and skips remaining stages. Stages never see errors. They are pure transformations of valid inputs to valid outputs.
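A sketch of runner-owned short-circuiting, assuming an error envelope is simply an `"error"` key in the context:

```python
def run_with_short_circuit(stages, ctx):
    # The runner, not the stages, owns error flow control.
    for fn in stages:
        if "error" in ctx:
            return ctx  # skip remaining stages; stages never see errors
        ctx = {**ctx, **fn(ctx)}
    return ctx
```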
Finding S2-3: Gate functions close over global config
Complected: Pure routing logic + mutable global state.
Gate functions (quality gate, PAD gate) that close over a global config object cannot be tested in isolation — they depend on whatever the global config happens to contain at call time. The gate’s logic (compare score to threshold) is simple; the complection is the invisible dependency on shared mutable state.
Resolution: Config values are injected via partial application at pipeline construction time. A gate is constructed as make_quality_gate(threshold=0.35), returning a pure closure. The gate function has no runtime dependency on global state. It is a value.
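The partial-application pattern in miniature (payload shape is illustrative):

```python
from functools import partial

def quality_gate(payload: dict, *, threshold: float) -> dict:
    # All inputs arrive as arguments; no global config is consulted.
    return {"pass": payload["quality_score"] >= threshold}

def make_quality_gate(threshold: float):
    # Bind config at pipeline construction time; the returned gate is a value.
    return partial(quality_gate, threshold=threshold)
```

Tests construct a gate with any threshold they like; nothing in the gate depends on call-time ambient state.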
Finding S2-4: VectorStorePort merges read, write, and lookup
Complected: Search (latency-critical read) + mutation (eventually-consistent write) + point lookup.
A single VectorStorePort that bundles search(), add(), remove(), and get_by_id() forces every consumer to depend on capabilities it does not use. The search orchestrator, which only reads, carries a dependency on mutation methods. The enrol orchestrator, which only writes, carries a dependency on search methods. These are independent operational concerns with different consistency, latency, and scaling characteristics.
Resolution: Split into three ports:
- VectorSearchPort — search(query, k, partition, bitset) -> list[(id, score)]
- VectorLookupPort — get_by_id(id) -> template | None
- VectorMutationPort — add(id, vector, partition), remove(id)
Each orchestrator depends only on the ports it actually uses. The FAISS adapter implements all three; consumers are decoupled from each other.
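A sketch of the split using `typing.Protocol` (structural typing, so one adapter class can satisfy all three without inheriting from any of them). The method signatures are simplified:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class VectorSearchPort(Protocol):
    def search(self, query, k: int, partition: str, bitset) -> list: ...

@runtime_checkable
class VectorLookupPort(Protocol):
    def get_by_id(self, id): ...

@runtime_checkable
class VectorMutationPort(Protocol):
    def add(self, id, vector, partition: str) -> None: ...
    def remove(self, id) -> None: ...

class InMemoryStore:
    """One adapter may satisfy all three ports; each consumer names only one."""
    def __init__(self):
        self._vecs = {}
    def search(self, query, k, partition, bitset):
        return []  # stub: a real adapter queries the index
    def get_by_id(self, id):
        return self._vecs.get(id)
    def add(self, id, vector, partition):
        self._vecs[id] = vector
    def remove(self, id):
        self._vecs.pop(id, None)
```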
Finding S2-5: Pipeline dict shapes are unversioned/unvalidated
Complected: Data shape contract + runtime behaviour.
If the dict flowing between stages has no declared schema, the contract between stages is implicit — encoded only in the keys each stage happens to read and write. A typo in a key name, or a removed field, produces a silent KeyError at runtime rather than a clear contract violation at build/test time.
Resolution: TypedDict schemas defined per stage boundary. Each stage’s requires and produces declarations serve as the contract. Schema validation runs at test time to catch mismatches between declared contracts and actual stage implementations. The contract is data, not convention.
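A sketch of a test-time contract check, assuming the schema is a `TypedDict` and the declaration is a frozenset of keys (both names here are illustrative):

```python
from typing import TypedDict, get_type_hints

class DetectOutput(TypedDict):
    bbox: tuple
    confidence: float

def contract_matches(schema, produces: frozenset) -> bool:
    # Test-time check: the declared `produces` keys must equal the schema fields,
    # catching typos and removed fields before runtime.
    return set(get_type_hints(schema)) == set(produces)
```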
Finding S2-6: Event log contract not formalized as a port
Complected: Ground-truth storage semantics + implementation choice.
The FAISS design describes event sourcing with an append-only log as the ground truth for gallery mutations. But if the event log is accessed directly (file I/O, Kafka client) rather than through a port, the orchestration layer is complected with the storage mechanism. Switching from file-backed to Kafka — or testing without either — requires changing the domain code.
Resolution: EventLogPort protocol defined with append(event), read_from(offset), and current_offset(). File-backed and Kafka implementations both satisfy the same contract. The domain depends on the protocol, never on the implementation.
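A sketch of the port with an in-memory test double; the file- and Kafka-backed adapters would honour the same three methods:

```python
from typing import Protocol

class EventLogPort(Protocol):
    def append(self, event: dict) -> int: ...
    def read_from(self, offset: int) -> list: ...
    def current_offset(self) -> int: ...

class InMemoryEventLog:
    """Test double satisfying the same contract as the real adapters."""
    def __init__(self):
        self._events = []
    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event
    def read_from(self, offset):
        return self._events[offset:]
    def current_offset(self):
        return len(self._events)
```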
Finding S2-7: trace_id injection at Respond stage unresolved
Complected: Envelope/payload separation + respond stage needs.
The context/payload split (S2-1 resolution) cleanly separates envelope metadata (trace_id, timestamps) from stage payloads. But the Respond stage needs the trace_id to construct its output. If the stage cannot access envelope fields, it cannot do its job. If it can access all envelope fields, the separation is undermined.
Resolution: The Stage dataclass supports an inject_envelope_keys field for declarative envelope-to-payload injection at specific stages. The Respond stage declares inject_envelope_keys=["trace_id"]. The runner copies only those declared keys from the envelope into the stage’s input. The injection is explicit, minimal, and visible in the stage’s declaration — not a backdoor to the entire envelope.
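A self-contained sketch of declarative envelope injection (a simplified variant of the `Stage` shape, with illustrative field names):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Stage:
    name: str
    requires: frozenset
    fn: Callable[[dict], dict]
    inject_envelope_keys: tuple = ()

def call_stage(stage: Stage, envelope: dict, payload: dict) -> dict:
    # Copy only the declared envelope keys into the stage's input —
    # explicit and minimal, not a backdoor to the whole envelope.
    injected = {k: envelope[k] for k in stage.inject_envelope_keys}
    visible = {**{k: payload[k] for k in stage.requires}, **injected}
    return stage.fn(visible)
```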
Finding S2-8: Orchestrators call datetime.now() directly
Complected: Workflow logic + system clock.
An orchestrator that calls datetime.now() or time.time() directly cannot be tested deterministically. The workflow logic is simple; the complection is the hidden dependency on a non-deterministic external resource.
Resolution: ClockPort protocol injected into orchestrators. The production implementation returns UTC wall-clock time. Test implementations return deterministic, controllable timestamps. The orchestrator depends on a value supplier, not on the system clock.
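A sketch of the clock port with a deterministic test double; `make_audit_record` is a hypothetical orchestrator step showing time arriving through the port:

```python
from datetime import datetime, timezone
from typing import Protocol

class ClockPort(Protocol):
    def now(self) -> datetime: ...

class SystemClock:
    def now(self) -> datetime:
        return datetime.now(timezone.utc)  # production: UTC wall-clock

class FixedClock:
    """Deterministic test double: always returns the injected instant."""
    def __init__(self, at: datetime):
        self._at = at
    def now(self) -> datetime:
        return self._at

def make_audit_record(request_id: str, clock: ClockPort) -> dict:
    # Hypothetical orchestrator step: no direct datetime.now() call.
    return {"request_id": request_id, "at": clock.now().isoformat()}
```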
Finding S2-9: Queue abstraction claimed as port but not formalized
Complected: Pipeline infrastructure + implicit coupling to asyncio.Queue.
The pipeline design describes inter-stage communication via queues, but if the queue type is asyncio.Queue used directly, every pipeline component is coupled to asyncio’s specific API and event loop model. Replacing asyncio queues with multiprocessing queues, or ZeroMQ, or an in-memory channel, requires changing the pipeline runner and potentially the stages.
Resolution: QueuePort protocol defined with put(item) and async __aiter__(). The pipeline runner depends on QueuePort; the concrete implementation (asyncio.Queue, multiprocessing, etc.) is an infrastructure adapter. Stages and the runner are decoupled from the queue mechanism.
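A sketch of the port and its asyncio adapter, using a `None` sentinel to close the stream (the sentinel convention is an assumption for illustration):

```python
import asyncio
from typing import Protocol

class QueuePort(Protocol):
    async def put(self, item) -> None: ...
    def __aiter__(self): ...

class AsyncioQueueAdapter:
    """Infrastructure adapter wrapping asyncio.Queue behind the port."""
    def __init__(self, maxsize: int = 0):
        self._q = asyncio.Queue(maxsize=maxsize)
    async def put(self, item):
        await self._q.put(item)
    def __aiter__(self):
        return self
    async def __anext__(self):
        item = await self._q.get()
        if item is None:  # sentinel closes the stream
            raise StopAsyncIteration
        return item

async def demo():
    # The consumer iterates the port; it never touches asyncio.Queue directly.
    q = AsyncioQueueAdapter()
    for item in ("a", "b", None):
        await q.put(item)
    return [item async for item in q]
```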
Second-Pass Resolution Status
| # | Complection | Status | Resolution | Design Doc |
|---|---|---|---|---|
| S2-1 | Pipeline dict snowball (stage input + cumulative upstream history) | Resolved | Context/payload split. Runner manages context; stages declare requires/produces. | Pipeline Design |
| S2-2 | Error short-circuit in every stage (error handling + transformation logic) | Resolved | Runner checks for error envelopes before calling stages. Stages never see errors. | Pipeline Design |
| S2-3 | Gate functions close over global config (routing logic + mutable global state) | Resolved | Config injected via partial application at construction. Gates are pure closures. | Pipeline Design |
| S2-4 | VectorStorePort merges read/write/lookup (search + mutation + point lookup) | Resolved | Split into VectorSearchPort, VectorLookupPort, VectorMutationPort. | Domain Design |
| S2-5 | Pipeline dict shapes unvalidated (data shape contract + runtime behaviour) | Resolved | TypedDict schemas per stage boundary. requires/produces declarations validated at test time. | Pipeline Design |
| S2-6 | Event log contract not a port (storage semantics + implementation choice) | Resolved | EventLogPort protocol: append, read_from, current_offset. | Domain Design, FAISS Design |
| S2-7 | trace_id injection unresolved (envelope/payload separation + respond stage) | Resolved | Stage inject_envelope_keys for declarative, minimal envelope-to-payload injection. | Pipeline Design |
| S2-8 | Orchestrators call datetime.now() (workflow logic + system clock) | Resolved | ClockPort protocol injected into orchestrators. Tests use deterministic clocks. | Domain Design |
| S2-9 | Queue abstraction not formalized (pipeline infra + asyncio coupling) | Resolved | QueuePort protocol: put(), aiter(). Runner depends on port, not implementation. | Domain Design |
Third-Pass Audit (2026-02-20)
A three-agent Rich Hickey review (Simplicity Analyst, Data & State Architect, System Architecture Reviewer) examined the design documents holistically. The second-pass findings above addressed concrete design complections. This third pass inspects the documentation consistency, operational gaps, and remaining data-orientation tensions across all design docs.
Finding S3-1: Orchestrator definitions diverge between pipeline-design.md and domain-design.md
Complected: Two competing canonical definitions of the same component.
The pipeline-design.md contained a full “Orchestration Layer” section with orchestrator code examples using unsplit VectorStorePort and datetime.now(timezone.utc) — contradicting the resolved designs in domain-design.md (split ports, ClockPort). An implementer reading only pipeline-design.md would build against the stale design.
Resolution: The pipeline-design.md orchestration section now references domain-design.md Section 4 as canonical. Orchestrator code examples replaced with a port summary table and a functools.partial wiring pattern for injecting orchestrators into executor stages.
Finding S3-2: pad_gate complects PAD policy with operation routing
Complected: Spoof score evaluation (PAD concern) + ENROL-to-MAD routing (pipeline topology concern).
The original pad_gate checked spoof score and routed ENROL operations to MAD — two independent decisions in one function. Changing MAD requirements (e.g., requiring it for VERIFY) would require editing a PAD function.
Resolution: Split into two stages: pad_gate (pure PAD check: spoof score → respond or pass) and enrol_router (operation routing: ENROL → MAD, others → Detect). Each function has one responsibility. Pipeline Stage list updated with the new routing stage.
Finding S3-3: Runner context dict mutated in place
Complected: Value-oriented philosophy + place-oriented runner implementation.
The runner used ctx.update(output) and ctx.pop(key, None) — in-place mutation contradicting the “values over places” principle applied everywhere else. Latent concurrency hazard if the pipeline ever supports fan-out.
Resolution: Runner now uses ctx = {**ctx, **output} (dict merge producing new dict) and ctx = {k: v for k, v in ctx.items() if k not in stage.drops} (filtering producing new dict). Context is a value at every stage boundary.
Finding S3-4: EnrolResult.replaced reports intent, not outcome
Complected: Caller’s request intent + factual result assertion.
EnrolResult(replaced=request.replace_existing) set the result from the request flag, not from the actual outcome. Whether replacement occurred is a fact only the vector store knows.
Resolution: Added replaced: bool to EnrolAck. The orchestrator now uses replaced=ack.replaced — the result reflects what actually happened.
Finding S3-5: Event types diverge between domain-design.md and faiss-design.md
Complected: Two definitions of the same event types with different field names and shapes.
Domain EnrolEvent lacked the vector field (needed for index rebuilds), used timestamp vs FAISS’s enrolled_at, and used source_image_hash vs source_ref. DeleteEvent lacked reason_code.
Resolution: Domain event types updated to be canonical: enrolled_at, source_ref, vector field added to EnrolEvent, reason_code added to DeleteEvent. FAISS design doc references domain types as authoritative, with Rust structs derived via protobuf translation.
Finding S3-6: No backpressure design for queue pipeline
Complected: Queue decoupling benefit + unbounded memory growth risk.
Hickey advocates queues for decoupling, but unbounded queues are a liability. The extract stage, at 120 ms per item, is the bottleneck; without bounded queues, upstream stages fill memory under a sustained load of 1,900 searches/min.
Resolution: Added backpressure section to pipeline-design.md with bounded queue sizes per stage boundary, saturation behaviour (producer blocks), and inbound adapter returning HTTP 503 / gRPC RESOURCE_EXHAUSTED when receive queue is full.
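A minimal sketch of bounded-queue backpressure with asyncio; the queue size and the sleep standing in for the 120 ms extract bottleneck are illustrative, not the tuned values from pipeline-design.md:

```python
import asyncio

async def demo() -> list[str]:
    q: asyncio.Queue = asyncio.Queue(maxsize=2)   # bounded: put() blocks when full
    events: list[str] = []

    async def producer():
        for i in range(4):
            await q.put(f"job-{i}")       # blocks once two items are queued
            events.append(f"enqueued job-{i}")

    async def slow_consumer():
        for _ in range(4):
            job = await q.get()
            await asyncio.sleep(0.01)     # stand-in for the slow extract stage
            events.append(f"processed {job}")

    await asyncio.gather(producer(), slow_consumer())
    return events

events = asyncio.run(demo())
# The producer can never run ahead of the consumer by more than the queue bound,
# so memory stays bounded under sustained load.
```

The inbound adapter applies the same idea at the edge: when its receive queue is full, it refuses the request (HTTP 503 / gRPC RESOURCE_EXHAUSTED) instead of buffering without limit.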
Finding S3-7: Python/Rust wire format unspecified
Complected: Design completeness + implementation ambiguity at the serialisation boundary.
The protobuf schema for Python↔Rust communication (especially float32 vector encoding) was not specified. This is where precision bugs and endianness issues hide.
Resolution: Added wire format specification to faiss-design.md: bytes field type for vectors (raw little-endian f32), protobuf schema snippet for key messages, numpy .tobytes()/frombuffer() serialisation. Referenced from ARCHITECTURE.md.
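The bytes-field payload can be sketched with numpy as described in the resolution; protobuf message framing is omitted here, only the raw little-endian f32 encoding is shown:

```python
import numpy as np

vec = np.asarray([0.25, -1.5, 3.0], dtype="<f4")   # little-endian float32
payload = vec.tobytes()                             # 3 floats * 4 bytes = 12 bytes

# Decoding side (Rust would reinterpret the same bytes as &[f32]):
decoded = np.frombuffer(payload, dtype="<f4")
assert np.array_equal(decoded, vec)
```

Pinning the dtype string to "<f4" on both ends is what removes the endianness and precision ambiguity: the bytes on the wire are the bytes in memory.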
Finding S3-8: No accretion policy for stage dict shapes
Complected: Evolving stage contracts + silent consumer breakage.
No policy for how stage output shapes evolve. A renamed key breaks consumers silently.
Resolution: Added accretion policy to pipeline-design.md: “stages may add new keys but must never rename or remove existing keys” — the “provide more, require less” principle.
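The policy is testable as a superset check; the key names below are made up for illustration:

```python
# Keys a stage published in its previous version
published_v1 = frozenset({"embedding", "quality_score"})

def extract_stage_v2(ctx: dict) -> dict:
    # v2 adds `landmarks` but keeps every v1 key: accretion, not breakage
    return {"embedding": [0.1], "quality_score": 0.9, "landmarks": [(1, 2)]}

output_keys = frozenset(extract_stage_v2({}).keys())
assert published_v1 <= output_keys, "stage removed or renamed a published key"
```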
Finding S3-9: TypedDict schemas and requires/produces are redundant specifications
Complected: Two independent specs of the same contract that can drift.
TypedDict definitions and Stage requires/produces frozensets specify the same information separately.
Resolution: Added schema derivation note to pipeline-design.md: TypedDicts should be generated from requires/produces declarations at test time, eliminating the drift risk.
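The design doc proposes generating TypedDicts from the declarations; the equivalent drift check can be sketched as a test-time assertion that the two specs agree (names here are illustrative):

```python
from typing import TypedDict, get_type_hints

class ExtractOutput(TypedDict):
    embedding: list[float]
    quality_score: float

extract_produces = frozenset({"embedding", "quality_score"})

def assert_no_drift(schema: type, produces: frozenset) -> None:
    keys = frozenset(get_type_hints(schema).keys())
    # symmetric difference names exactly the keys present in one spec but not the other
    assert keys == produces, f"schema/declaration drift: {keys ^ produces}"

assert_no_drift(ExtractOutput, extract_produces)   # passes; any drift raises
```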
Finding S3-10: Executor stage dependency injection unspecified
Complected: Stage function signature (dict -> dict) + orchestrator dependency that has no injection path.
Executor stages needed to call domain orchestrators, but the stage function signature provided no mechanism for dependency injection. Gates used functools.partial for config, but executors had no equivalent pattern.
Resolution: Executor stages now use the same functools.partial pattern: partial(search_executor, search_orch) at construction time. Documented in pipeline-design.md Translation Boundary section.
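A sketch of the executor DI pattern: the orchestrator is bound at construction with functools.partial, so the runner still sees a plain dict -> dict stage. The orchestrator shape is an assumption:

```python
from functools import partial

class SearchOrchestrator:
    def search(self, embedding: list[float]) -> list[str]:
        return ["candidate-1", "candidate-2"]   # stand-in for the real search

def search_executor(orch: SearchOrchestrator, ctx: dict) -> dict:
    return {"candidates": orch.search(ctx["embedding"])}

# At pipeline construction time the dependency is baked in:
search_stage = partial(search_executor, SearchOrchestrator())

# The runner only ever sees dict -> dict:
out = search_stage({"embedding": [0.1, 0.2]})
```

This keeps dependency wiring at the edge of the pipeline while the stage body stays a pure function of its inputs, exactly as the gates already do for config.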
Finding S3-11: biometric.py bundles all 12+ ports in a single file
Complected: Four independent concern families (operations, vector store, inference, infrastructure) in one module.
ClockPort and QueuePort have nothing to do with biometrics. A change to QueuePort opens the same file as VectorSearchPort. This is “easy” (one import) but not “simple” (multiple concerns in one module).
Recommendation (not yet resolved): Split into concern-aligned modules during Phase 1 implementation:
- ports/operations.py — Search, Verify, Enrol, Delete
- ports/vector_store.py — VectorSearch, VectorLookup, VectorMutation
- ports/inference.py — Inference, Quality, PAD, MorphingDetection
- ports/infrastructure.py — EventLog, Queue, Clock
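The split can be sketched with typing.Protocol; the method names are illustrative, and in the real layout each class would live in its own module:

```python
from typing import Protocol, runtime_checkable
import time

# ports/infrastructure.py: nothing biometric here
@runtime_checkable
class ClockPort(Protocol):
    def now(self) -> float: ...

class QueuePort(Protocol):
    async def put(self, item: object) -> None: ...
    async def get(self) -> object: ...

# ports/vector_store.py: biometric vocabulary lives apart
class VectorSearchPort(Protocol):
    def search(self, vector: list[float], k: int) -> list[str]: ...

# An adapter satisfies a port structurally, without importing the other families.
class SystemClock:
    def now(self) -> float:
        return time.time()

assert isinstance(SystemClock(), ClockPort)
```

With this layout, a change to QueuePort no longer touches the file that defines VectorSearchPort, which is the simplicity the finding asks for.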
Third-Pass Resolution Status
| # | Complection | Status | Resolution | Design Doc |
|---|---|---|---|---|
| S3-1 | Orchestrator definitions diverge (two competing canonicals) | Resolved | pipeline-design.md references domain-design.md as canonical | Pipeline Design, Domain Design |
| S3-2 | pad_gate complects PAD + routing (two concerns in one gate) | Resolved | Split into pad_gate + enrol_router as separate stages | Pipeline Design |
| S3-3 | Runner context mutated in place (values vs places) | Resolved | Dict spread and comprehension; context is a value at every boundary | Pipeline Design |
| S3-4 | EnrolResult.replaced reports intent (request flag vs actual outcome) | Resolved | EnrolAck.replaced added; orchestrator uses ack.replaced | Domain Design |
| S3-5 | Event types diverge (domain vs FAISS definitions) | Resolved | Domain types canonical; unified field names; vector field added | Domain Design, FAISS Design |
| S3-6 | No backpressure (unbounded queues under load) | Resolved | Bounded queue sizes, saturation policy, 503/RESOURCE_EXHAUSTED | Pipeline Design |
| S3-7 | Wire format unspecified (Python/Rust boundary) | Resolved | Protobuf schema with bytes fields, LE f32, numpy serialisation | FAISS Design, ARCHITECTURE.md |
| S3-8 | No accretion policy (stage shapes can break silently) | Resolved | “Provide more, require less” policy documented | Pipeline Design |
| S3-9 | Redundant TypedDict/requires specs (two specs, one contract) | Resolved | Schema derivation from requires/produces at test time | Pipeline Design |
| S3-10 | Executor DI unspecified (no injection path for orchestrators) | Resolved | functools.partial pattern for executor stages | Pipeline Design |
| S3-11 | biometric.py bundles all ports (four concern families in one file) | Open | Recommended split into 4 concern-aligned modules during Phase 1 | — |
Combined Audit Summary
First-pass audit (VISION.md): 7 complections identified, 7 resolved. Second-pass audit (design docs): 9 complections identified, 9 resolved. Third-pass audit (cross-doc review): 11 complections identified, 10 resolved, 1 open recommendation. Total: 27 complections identified, 26 resolved, 1 open.
| Pass | Area | Complected Concerns | Simple Separation |
|---|---|---|---|
| 1st | Core Domain | State lifecycle + routing policy + search mechanism | Three separate components: state machine, policy map, search function |
| 1st | Quality Assessment | Measurement + rejection policy | Measure as a value; policy gate as a separate configurable step |
| 1st | PAD | Detection + classification threshold + rejection action | Three pure functions composed by orchestrator |
| 1st | Aggregation | Merge/rank + confidence threshold | Aggregator returns ranked list; decision function applies threshold |
| 1st | FAISS Filtering | Metadata storage + filter policy | Metadata store separate; selector function produces bitset as a value |
| 1st | Gallery Partitioning | Logical segmentation + physical sharding | Two independent maps; resolved separately |
| 1st | Operational Security | Support activity + environment context + data sensitivity | Access requirements as function of (activity, environment, data-sensitivity) |
| 2nd | Pipeline dicts | Stage input + cumulative upstream history | Context/payload split; stages declare requires/produces |
| 2nd | Error handling | Error flow control + stage transformation logic | Runner owns error short-circuit; stages are pure transforms |
| 2nd | Gate config | Pure routing logic + mutable global state | Partial application at construction; gates are pure closures |
| 2nd | VectorStorePort | Search + mutation + point lookup | Three separate ports: Search, Lookup, Mutation |
| 2nd | Dict schemas | Data shape contract + runtime behaviour | TypedDict schemas; requires/produces validated at test time |
| 2nd | Event log | Storage semantics + implementation choice | EventLogPort protocol with append/read_from/current_offset |
| 2nd | trace_id injection | Envelope/payload separation + respond stage needs | Declarative inject_envelope_keys on Stage dataclass |
| 2nd | Clock dependency | Workflow logic + system clock | ClockPort protocol; deterministic in tests |
| 2nd | Queue abstraction | Pipeline infra + asyncio coupling | QueuePort protocol; implementation is an adapter |
| 3rd | Orchestrator docs | Two competing canonical definitions | pipeline-design.md references domain-design.md as canonical |
| 3rd | pad_gate | PAD policy + operation routing in one function | Split into pad_gate + enrol_router stages |
| 3rd | Runner context | Value philosophy + place-oriented mutation | Dict spread; context is a value at every boundary |
| 3rd | EnrolResult.replaced | Request intent + factual outcome | EnrolAck.replaced; orchestrator uses actual outcome |
| 3rd | Event types | Domain vs FAISS definitions diverge | Domain types canonical; unified names; vector field added |
| 3rd | Backpressure | Queue decoupling + unbounded memory growth | Bounded queues, saturation policy, 503 responses |
| 3rd | Wire format | Design completeness + serialisation ambiguity | Protobuf schema, LE f32 bytes, numpy serialisation |
| 3rd | Accretion policy | Evolving contracts + silent breakage | “Provide more, require less” documented |
| 3rd | Schema redundancy | Two specs of one contract | Derive TypedDicts from requires/produces |
| 3rd | Executor DI | Stage signature + missing dependency path | functools.partial for executor stages |
| 3rd | Port file organisation | Four concern families in one module | Recommended 4-file split (open) |