May 20, 2026 · QuanData & AI — Office of the Chief Architect

The Blueprint of QuanData & AI's Agentic Orchestration for the Retail Industry

Classification: Internal — Strategic & Technical
Document Type: Reference Architecture & Implementation Doctrine
Version: 1.0 — May 2026
Owners: QuanData & AI — Office of the Chief Architect
Status: Authoritative; supersedes all prior agentic-retail technical notes

Cover Brief

This document defines, with executive clarity and engineering precision, how QuanData & AI will architect, deploy, and operate multi-agent orchestration systems for the retail industry through 2026 and beyond.

It is not a survey. It is a doctrine. Every pattern, every agent, every guardrail, every line of the build phasing has been selected because it is what the production-grade leaders of the agentic era — Anthropic, Walmart, Amazon, Shopify, Sierra, Decagon, Mercado Libre — are demonstrably running in market today.

The thesis is simple and the evidence is overwhelming:

A single agent is a single employee. A coordinated team of specialized agents is a new operational category. The retailers building this category in 2026 are pulling away from the retailers that are not.

1. Executive Synthesis
2. Strategic Context — The Retail Agentic Landscape
3. Theoretical Foundations — The Three Orchestration Patterns
4. The QuanData Agent Topology
5. The Specialist Agent Roster
6. The Tool Catalog
7. Reference Orchestration Flows
8. State, Memory & The Dreaming Layer
9. Outcomes — The Rubric-Driven Quality Loop
10. Guardrail Architecture — Defense In Depth
11. Integration & Infrastructure Stack
12. Cost Economics
13. The 12-Month Build Phasing
14. Failure Mode Taxonomy
Appendix A — Decision Matrices
Appendix B — Glossary
Appendix C — Source Index

1. Executive Synthesis

On May 6, 2026, at the Code with Claude event, Anthropic formalized multi-agent orchestration for Claude Managed Agents — up to 20 specialized agents running in parallel on a single task. This was not a research demo. It was the productization of an architecture Netflix, Harvey, Shopify, and Mercado Libre were already running at scale.

Across the retail industry, the verified outcomes are no longer ambiguous:

Retailer	Agent	Verified Outcome
Amazon	Rufus	~$12B incremental sales (2025); 300M+ users; +60% purchase likelihood
Walmart	Sparky	+35% AOV for engaged users; ~50% of app users engaged
Klarna	OpenAI Assistant	67% automation; $40M profit lift (2024) — with a 2025 partial walk-back to human-hybrid
Mercado Libre	Pago Assistant	~90% query containment without human handoff
Lowe's	Mylow Companion	Deployed to >1,700 stores for associate copilot
WeightWatchers (Sierra)	CS Agent	70% containment week-1; >4.5/5 CSAT

QuanData & AI's mandate is to deliver this category of outcome to our retail clients — through an architecture that is:

Specialized — narrow agents, not generalist bots
Parallel — fan-out where the workload allows
Audited — every action traceable to a principal, a tool, and a decision
Reversible — every mutation has a documented compensation path
Governed — human-in-loop gates on every blast-radius action
Self-improving — Dreaming layers that compound institutional knowledge across sessions
Outcome-graded — rubric-driven self-evaluation before any output reaches a customer

The blueprint that follows is the canonical implementation of that mandate.

2. Strategic Context — The Retail Agentic Landscape

2.1 The architectural verdict from the market

Two architectural facts now dominate the retail agentic landscape:

Consolidation beats sprawl. Walmart publicly admitted in mid-2025 that fragmented point-bots did not scale and consolidated to four "super agents" (Sparky for customer, Marty for sellers/advertisers/suppliers, an associate agent, and a developer agent) on a unified ML platform (Element) with MCP as the agent-to-agent protocol.
Retailers must own the MCP layer. Shopify exposes three official MCP servers (Storefront, Customer Accounts, Dev), and Adobe declared MCP the default agent protocol at Summit 2026. Walmart's experience shows why ownership matters: when it briefly piloted OpenAI's ChatGPT Instant Checkout, that flow converted 3× worse than click-out to walmart.com, and its Sparky-in-Gemini integration is deliberately walled off.

2.2 The competitive map

quadrantChart
    title Retail Agent Maturity — May 2026
    x-axis "Narrow Use Case" --> "Full Platform Strategy"
    y-axis "Pilot / Internal" --> "Verified Production Outcomes"
    quadrant-1 "Production Platform"
    quadrant-2 "Production Point-Solution"
    quadrant-3 "Pilot Point-Solution"
    quadrant-4 "Pilot Platform"
    Amazon Rufus: [0.85, 0.95]
    Walmart Sparky+Element: [0.92, 0.88]
    Shopify Sidekick/Magic: [0.88, 0.82]
    Mercado Libre Pago: [0.45, 0.90]
    Klarna OpenAI: [0.30, 0.55]
    Lowe's Mylow: [0.40, 0.78]
    Home Depot Magic Apron: [0.35, 0.65]
    Target Store Companion: [0.25, 0.55]
    Sephora Virtual Artist: [0.30, 0.60]
    Best Buy Agentic: [0.25, 0.45]
    Costco ML Forecast: [0.30, 0.85]
    IKEA Billie: [0.20, 0.35]

2.3 The vendor stratification

Tier	Vendors	Posture
Tier 1 — Platform Owners	Anthropic, OpenAI, Google, Microsoft	Define the substrate (LLMs, MCP, ACP, UCP)
Tier 1 — Hyperscale Retailers	Amazon, Walmart, Shopify, Alibaba	Build proprietary super-agents on top of substrate
Tier 2 — Vertical Agent Vendors	Sierra, Decagon, Ada, Salesforce Agentforce, Microsoft Copilot for Retail	Premium / white-glove agent platforms
Tier 2 — Commerce-Native Vendors	Mirakl, Klaviyo, Algolia, Constructor, Lily AI	Expose retail primitives as MCP-callable tools
Tier 3 — Adjacent Infrastructure	Riskified, Signifyd, Transcend, OneTrust	Fraud, privacy, governance — increasingly agent-aware

2.4 The three lessons QuanData & AI has internalized

flowchart LR
    A[Klarna 2025 Walk-Back] --> L1[Lesson 1<br/>Deflection without CSAT is a trap]
    B[Walmart Checkout 3× Drop] --> L2[Lesson 2<br/>Never surrender the funnel to a 3rd-party agent runtime]
    C[Walmart Element Consolidation] --> L3[Lesson 3<br/>Orchestrate from day one — point-bots do not scale]
    L1 --> R[QuanData Doctrine]
    L2 --> R
    L3 --> R
    R -->|enforces| D1[Every metric pairs with CSAT/quality]
    R -->|enforces| D2[QuanData owns the MCP layer]
    R -->|enforces| D3[Orchestrator + Specialist topology]
    style R fill:#0b3d91,stroke:#fff,color:#fff
    style D1 fill:#1f6feb,stroke:#fff,color:#fff
    style D2 fill:#1f6feb,stroke:#fff,color:#fff
    style D3 fill:#1f6feb,stroke:#fff,color:#fff

2.5 Regulatory perimeter

Regulation	Surface	QuanData Posture
EU AI Act	Discovery & personalization = minimal-risk; behavioral manipulation = prohibited	Build minimal-risk baseline; require Article-aligned documentation for any inferred-attribute pricing
EU DG COMP	Active algorithmic pricing inquiries (July 2025); OECD report October 2025	Surveillance-pricing prohibited by default; only category-level + cohort-level pricing permitted
FTC	Surveillance pricing scrutiny; deceptive-practices doctrine	Never price-discriminate by inferred income/race/health
GDPR / CCPA	Profile-based personalization requires lawful basis	Transcend Agentic Assist for DSARs; consent ledger consulted before PSA/MCA invocation
PCI-DSS	Card data scope explosion via agent-initiated payment	LLMs never see PAN/CVV; Stripe tokenization at the edge; agent only handles `pm_xxx` tokens
Agent-vs-Agent Fraud	Bot-driven returns, promo abuse, reseller arbitrage	Riskified + HUMAN-pattern defenses in production by Month 9
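
The PCI-DSS posture above (models never see PAN/CVV, agents handle only `pm_xxx` tokens) can be illustrated with a gateway-side guard. This is a minimal sketch under stated assumptions, not a certified control: the helper names `passesLuhn`, `redactCardNumbers`, and `isPaymentToken` are hypothetical, and the regular expressions are deliberately simple.

```typescript
// Hypothetical guard, for illustration only. Assumes it runs at the gateway,
// before any text is forwarded to a model.

// Luhn checksum: true when the digit string is a plausible card number.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Replace any 13 to 19 digit run (spaces or dashes allowed) that passes Luhn.
function redactCardNumbers(text: string): string {
  return text.replace(/\b(?:\d[ -]?){12,18}\d\b/g, (match) => {
    const digits = match.replace(/[ -]/g, "");
    return passesLuhn(digits) ? "[REDACTED_PAN]" : match;
  });
}

// Accept only opaque payment-method tokens of the form pm_<alphanumerics>.
function isPaymentToken(value: string): boolean {
  return /^pm_[A-Za-z0-9]+$/.test(value);
}
```

A caller would pass every inbound message through `redactCardNumbers` and reject any payment argument for which `isPaymentToken` is false. Digit runs that fail the Luhn check, such as most order numbers, are left untouched.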

3. Theoretical Foundations — The Three Orchestration Patterns

Three orchestration patterns have emerged as the production-validated set across Anthropic's Research system, Walmart's Element, Shopify's agentic platform, Harvey's legal stack, and Sierra's customer constellation. QuanData & AI uses all three, deployed by use-case fit, never as defaults.

3.1 Pattern A — The Pipeline

Agents run sequentially. Each step takes the prior step's structured output as input. Used when later steps strictly depend on earlier ones.

flowchart LR
    U[User / Trigger] --> A1[Agent 1<br/>Research]
    A1 -->|structured output| A2[Agent 2<br/>Analysis]
    A2 -->|structured output| A3[Agent 3<br/>Writing]
    A3 -->|structured output| A4[Agent 4<br/>Review]
    A4 --> O[Final Output]
    style A1 fill:#0b3d91,stroke:#fff,color:#fff
    style A2 fill:#0b3d91,stroke:#fff,color:#fff
    style A3 fill:#0b3d91,stroke:#fff,color:#fff
    style A4 fill:#0b3d91,stroke:#fff,color:#fff

Retail applications: order-status inquiry → policy lookup → reply composition; product-launch brief → copy → compliance → publish.
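
As a rough illustration of the pipeline pattern, the sketch below chains three typed stages for the order-status → policy → reply example. Everything here is hypothetical: the stage functions are stubs standing in for agents, the `OrderFacts` and `PolicyVerdict` shapes are invented for the example, and the 30-day return window is an assumed value.

```typescript
// Compose three stages so each consumes the previous stage's structured output.
function pipe3<A, B, C, D>(
  s1: (a: A) => B,
  s2: (b: B) => C,
  s3: (c: C) => D,
): (a: A) => D {
  return (a) => s3(s2(s1(a)));
}

interface OrderFacts { orderId: string; deliveredDaysAgo: number }
interface PolicyVerdict { orderId: string; returnable: boolean }

const RETURN_WINDOW_DAYS = 30; // assumption, not taken from the document

// Stage 1 (stub): order-status lookup.
const lookupOrderFacts = (orderId: string): OrderFacts => ({
  orderId,
  deliveredDaysAgo: orderId === "A-1" ? 10 : 45,
});

// Stage 2 (stub): policy lookup applied to the order facts.
const applyReturnPolicy = (f: OrderFacts): PolicyVerdict => ({
  orderId: f.orderId,
  returnable: f.deliveredDaysAgo <= RETURN_WINDOW_DAYS,
});

// Stage 3 (stub): reply composition.
const composeReply = (v: PolicyVerdict): string =>
  v.returnable
    ? `Order ${v.orderId} is eligible for return.`
    : `Order ${v.orderId} is outside the return window.`;

const answerReturnQuestion = pipe3(lookupOrderFacts, applyReturnPolicy, composeReply);
```

Because each stage's input type is the previous stage's output type, a stage cannot be reordered or skipped without a compile-time error, which is the property the pattern relies on.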

3.2 Pattern B — The Fan-Out (Commander / Workers)

A commander agent decomposes a task and dispatches subtasks to parallel workers, each in an isolated context window. Results are aggregated by the commander. This is the pattern Netflix uses for parallel build-log analysis and Anthropic uses in its Research feature.

flowchart TB
    C[Commander Agent<br/>decomposes + synthesizes] --> W1[Worker 1<br/>document A]
    C --> W2[Worker 2<br/>document B]
    C --> W3[Worker 3<br/>document C]
    C --> W4[Worker 4<br/>document D]
    C --> W5[Worker 5<br/>document E]
    W1 --> S[Synthesis Layer]
    W2 --> S
    W3 --> S
    W4 --> S
    W5 --> S
    S --> O[Aggregated Output]
    style C fill:#0b3d91,stroke:#fff,color:#fff
    style S fill:#7c3aed,stroke:#fff,color:#fff

Retail applications: parallel competitor-price scraping across N retailers; parallel SKU-level demand forecasting for an entire category; parallel campaign-copy generation across 12 locales.

Critical engineering constraint (Anthropic, Multi-Agent Research System, June 2025): in Anthropic's BrowseComp evaluation, token usage alone explained ~80% of performance variance. The real value of fan-out is therefore context compression, not just speed: each worker compresses its findings before returning them to the commander.
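
The commander / worker shape with per-worker compression might look like the following sketch. It is illustrative only: `SourceDoc`, `CompressedFinding`, `runWorker`, and `fanOut` are hypothetical names, and plain truncation stands in for model-based summarization so the example stays runnable.

```typescript
// Hypothetical shapes; the document does not define these types.
interface SourceDoc { id: string; text: string }
interface CompressedFinding { id: string; summary: string }

// Stand-in for model summarization: truncate to a character budget.
function compressText(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// One worker per document. Only the compressed finding leaves the worker.
async function runWorker(doc: SourceDoc, maxChars: number): Promise<CompressedFinding> {
  return { id: doc.id, summary: compressText(doc.text, maxChars) };
}

// Commander: dispatch all workers in parallel, then synthesize from summaries only.
async function fanOut(docs: SourceDoc[], maxChars: number): Promise<string> {
  const findings = await Promise.all(docs.map((d) => runWorker(d, maxChars)));
  return findings.map((f) => `${f.id}: ${f.summary}`).join("\n");
}
```

The commander never sees the full document text, only the bounded summaries, so its context grows with the number of workers rather than with the size of their inputs.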

3.3 Pattern C — The Specialist Team

Multiple agents with distinct specializations collaborate on a single complex deliverable. Each owns a domain. The orchestrator routes, the specialists execute, results are merged. This is the pattern Harvey uses for legal work and Sierra deploys with its "constellation of 15+ models."

flowchart TB
    O[Orchestrator] --> CS[Customer Service Agent]
    O --> PS[Personal Shopper Agent]
    O --> IR[Inventory Agent]
    O --> PP[Pricing Agent]
    O --> MC[Marketing Content Agent]
    O --> SC[Supply Chain Agent]
    O --> AI[Analytics Agent]
    O --> SA[Store Associate Copilot]
    O --> FR[Fraud + Risk Agent]
    CS -.shared state.-> M[(Conversation /<br/>Memory Layer)]
    PS -.shared state.-> M
    IR -.shared state.-> M
    PP -.shared state.-> M
    MC -.shared state.-> M
    SC -.shared state.-> M
    AI -.shared state.-> M
    SA -.shared state.-> M
    FR -.shared state.-> M
    style O fill:#7c3aed,stroke:#fff,color:#fff
    style M fill:#0b3d91,stroke:#fff,color:#fff

Retail applications: the QuanData reference roster is built on this pattern (see §4–§5).

3.4 The pattern-selection decision tree

flowchart TD
    Q1{Does the task have<br/>strict step ordering?}
    Q1 -- Yes --> Q2{Is it a single domain?}
    Q1 -- No --> Q3{Same operation on<br/>many items?}
    Q2 -- Yes --> P_Pipe[Pipeline<br/>e.g. order-status → policy → reply]
    Q2 -- No --> P_Spec1[Specialist Team<br/>w/ ordered handoffs]
    Q3 -- Yes --> P_Fan[Fan-Out<br/>e.g. SKU-level forecasts]
    Q3 -- No --> Q4{Multiple distinct<br/>domains needed?}
    Q4 -- Yes --> P_Spec2[Specialist Team]
    Q4 -- No --> P_Single[Single Agent<br/>or function call]
    style P_Pipe fill:#0b3d91,stroke:#fff,color:#fff
    style P_Fan fill:#7c3aed,stroke:#fff,color:#fff
    style P_Spec1 fill:#16a34a,stroke:#fff,color:#fff
    style P_Spec2 fill:#16a34a,stroke:#fff,color:#fff
    style P_Single fill:#6b7280,stroke:#fff,color:#fff
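
The decision tree above can be written as a small pure function. This is a sketch only; the `TaskTraits` field names and the `OrchestrationPattern` labels are hypothetical encodings of the four questions and five leaves in the diagram.

```typescript
type OrchestrationPattern =
  | "pipeline"
  | "specialist_team_ordered"
  | "fan_out"
  | "specialist_team"
  | "single_agent";

// One boolean per question in the decision tree.
interface TaskTraits {
  strictStepOrdering: boolean;       // Q1
  singleDomain: boolean;             // Q2
  sameOperationOnManyItems: boolean; // Q3
  multipleDistinctDomains: boolean;  // Q4
}

function selectPattern(t: TaskTraits): OrchestrationPattern {
  if (t.strictStepOrdering) {
    return t.singleDomain ? "pipeline" : "specialist_team_ordered";
  }
  if (t.sameOperationOnManyItems) return "fan_out";
  return t.multipleDistinctDomains ? "specialist_team" : "single_agent";
}
```

For example, SKU-level forecasting has no strict step ordering but repeats one operation across many items, so it resolves to `"fan_out"`.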

3.5 The Cognition Theorem — "Multi-agent reads, single-threaded writes"

The most important production lesson of the last twelve months is from Cognition's "Don't Build Multi-Agents" (June 2025) and the 2026 follow-up "Multi-Agents: What's Actually Working":

Parallel multi-agent execution is correct for reads and intelligence work. Single-threaded execution is correct for writes.

QuanData applies this as an inviolable rule: any agent that mutates inventory, payment, fulfillment, or pricing executes in a single-threaded, durable workflow (Temporal / Step Functions). Read paths fan out; write paths serialize.
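
One way to picture "read paths fan out; write paths serialize" is a planner that splits tool calls by effect and an executor that awaits writes one at a time. The sketch below is an assumption-laden illustration: `PlannedCall`, `planExecution`, and `executePlan` are hypothetical, and an in-process loop stands in for the durable workflow engine (Temporal / Step Functions), so it shows ordering only, not durability.

```typescript
type ToolEffect = "read" | "write";
interface PlannedCall { tool: string; effect: ToolEffect }
interface ExecutionPlan { parallelReads: PlannedCall[]; serialWrites: PlannedCall[] }

// Split calls by effect; writes keep their original relative order.
function planExecution(calls: PlannedCall[]): ExecutionPlan {
  return {
    parallelReads: calls.filter((c) => c.effect === "read"),
    serialWrites: calls.filter((c) => c.effect === "write"),
  };
}

// Reads run concurrently; each write starts only after the previous one resolves.
async function executePlan(
  plan: ExecutionPlan,
  invoke: (call: PlannedCall) => Promise<string>,
): Promise<string[]> {
  const readResults = await Promise.all(plan.parallelReads.map(invoke));
  const writeResults: string[] = [];
  for (const call of plan.serialWrites) {
    writeResults.push(await invoke(call));
  }
  return readResults.concat(writeResults);
}
```

In a real deployment the write loop would be a workflow with retries and compensation steps; the point of the sketch is only that no two mutations are ever in flight at once.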

4. The QuanData Agent Topology

4.1 System-level topology

flowchart TB
    subgraph CH[Channel Layer]
        WEB[Web / Storefront]
        APP[Mobile App]
        POS[POS / In-Store]
        SLK[Slack / Internal]
        EML[Email / SMS]
    end

    subgraph OL[Orchestration Layer]
        GW[API Gateway<br/>auth + rate-limit + redaction]
        OR[Orchestrator Agent<br/>intent · routing · audit · approval]
    end

    subgraph SL[Specialist Layer]
        CSA[Customer Service Agent]
        PSA[Personal Shopper Agent]
        IRA[Inventory + Replenishment Agent]
        PPA[Pricing + Promotions Agent]
        MCA[Marketing Content Agent]
        SCLA[Supply Chain + Logistics Agent]
        AIA[Analytics + Insights Agent]
        SAC[Store Associate Copilot]
        FRA[Fraud + Risk Agent]
    end

    subgraph TL[Tool Layer — MCP]
        T_SHOP[Shopify MCP]
        T_STRIPE[Stripe MCP]
        T_KLAV[Klaviyo MCP]
        T_ALG[Algolia MCP]
        T_INT[Internal MCP<br/>ERP · WMS · CDP · POS]
    end

    subgraph DL[Data Layer]
        OMS[(OMS / Commerce<br/>Shopify · SAP CC)]
        ERP[(ERP / WMS<br/>SAP · NetSuite)]
        CDP[(CDP<br/>RudderStack · Segment)]
        VEC[(Vector Store<br/>pgvector · Turbopuffer)]
        WH[(Warehouse<br/>Snowflake · BigQuery)]
        AUD[(Audit Log<br/>append-only)]
        DRM[(Dreaming Store<br/>distilled memory)]
    end

    CH --> GW
    GW --> OR
    OR --> CSA
    OR --> PSA
    OR --> IRA
    OR --> PPA
    OR --> MCA
    OR --> SCLA
    OR --> AIA
    OR --> SAC
    OR --> FRA

    CSA & PSA & IRA & PPA & MCA & SCLA & AIA & SAC & FRA --> T_SHOP
    CSA & PSA & IRA & PPA & MCA & SCLA & AIA & SAC & FRA --> T_STRIPE
    MCA & PSA --> T_KLAV
    PSA & SAC --> T_ALG
    CSA & IRA & PPA & MCA & SCLA & SAC & FRA --> T_INT

    T_SHOP --> OMS
    T_STRIPE --> OMS
    T_KLAV --> CDP
    T_ALG --> VEC
    T_INT --> ERP
    T_INT --> CDP
    T_INT --> WH

    OR -.audit.-> AUD
    OR -.nightly distill.-> DRM
    DRM -.context inject.-> OR

    style OR fill:#7c3aed,stroke:#fff,color:#fff
    style GW fill:#0b3d91,stroke:#fff,color:#fff
    style DRM fill:#dc2626,stroke:#fff,color:#fff
    style AUD fill:#0b3d91,stroke:#fff,color:#fff

4.2 The control plane vs the data plane

Plane	What it carries	Where it lives
Control	Intent, routing, approval status, audit IDs, agent identity	Orchestrator + Redis + Audit log
Data — Read	Catalog, inventory, customer profile, order history, embeddings	OMS / CDP / Vector store / Warehouse
Data — Write	New orders, refunds, returns, POs, price changes	OMS / ERP — always via durable workflow
Dreaming	Distilled summaries, learned merchant preferences, post-mortem patterns	Dreaming Store (Postgres + embeddings)

5. The Specialist Agent Roster

QuanData's reference deployment runs one orchestrator plus nine specialists. The ninth specialist, Fraud & Risk, is optional and is the tenth agent overall.

5.1 The roster at a glance

flowchart LR
    subgraph OR_BOX[Top-Level]
        ORCH[Orchestrator / Router]
    end
    subgraph CF[Customer-Facing]
        CSA[Customer Service]
        PSA[Personal Shopper]
        SAC[Store Associate Copilot]
    end
    subgraph MF[Merchant / Operator-Facing]
        IRA[Inventory + Replenishment]
        PPA[Pricing + Promotions]
        MCA[Marketing Content]
        SCLA[Supply Chain + Logistics]
        AIA[Analytics + Insights]
    end
    subgraph GR[Governance]
        FRA[Fraud + Risk]
    end
    ORCH --> CF
    ORCH --> MF
    ORCH --> GR
    style ORCH fill:#7c3aed,stroke:#fff,color:#fff
    style CF fill:#1f6feb,stroke:#fff,color:#fff
    style MF fill:#16a34a,stroke:#fff,color:#fff
    style GR fill:#dc2626,stroke:#fff,color:#fff

5.2 The capability matrix

Agent	Primary Surface	Read Sources	Write Targets	Latency SLO p95	Pattern
Orchestrator	All	All (meta)	Audit, Conversation State	200ms (routing only)	Specialist Team
Customer Service	Web/App/Email/SMS	OMS, WMS, Returns, Policy KB	Returns, Refunds (gated), Tickets	3s	Pipeline + Specialist
Personal Shopper	Web/App/Chat	Catalog, Embeddings, CDP, Inventory	Impressions log	2s	Specialist + Fan-Out (multi-locale)
Inventory & Replenishment	Slack/Web (merchant)	ERP, WMS, Sales	Draft POs (gated), Replenishment signals	10s (interactive) / batch	Fan-Out (per SKU)
Pricing & Promotions	Slack/Web (merchant)	Pricing rules, Competitor feed, Elasticity model	Price proposals, Promo drafts (gated)	10s / batch	Specialist
Marketing Content	Slack/Web (merchant)	PIM, Brand voice, Past performance	Content drafts (gated), Campaign push	30s (drafting)	Fan-Out (variants)
Supply Chain & Logistics	Slack/Web/App	TMS, Carrier APIs, Weather	Transfer orders, Claims, ETA updates	5s	Pipeline
Analytics & Insights	Slack/Web (merchant)	Warehouse, Semantic layer	Saved queries, Alert rules	5s (query)	Specialist
Store Associate Copilot	POS/Mobile	Store inventory, CDP	Reservations, Save-the-sale orders, Notes	2s (hard SLA)	Pipeline
Fraud & Risk	Internal	Order risk feed, Return history	Account flags, Verification requests	3s	Specialist

5.3 Per-agent specifications (canonical definitions)

5.3.1 Orchestrator / Router

Field	Specification
Scope	Single entry point. Classifies intent, decomposes multi-step tasks, dispatches to specialists, aggregates results, enforces guardrails.
Inputs	Raw user message, channel metadata, authenticated principal, conversation history pointer
Outputs	Final response + structured trace (`run_id`, sub-agent calls, tool calls, confidence, cost)
Tools	`classify_intent`, `route_to_agent`, `request_human_approval`, `log_audit_event`, `get_conversation_state`, `write_conversation_state`
Reads	Session store, customer profile snippet
Writes	Audit log, conversation state
Invoke	Always first; re-invoked at every turn; never bypassed, except on the Store Associate Copilot read path (§5.3.9), which is audited post-hoc
Model	Claude Haiku 4.5 (intent) → Sonnet 4.6 (synthesis)
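
A routing table is one plausible way to implement the dispatch step between `classify_intent` and `route_to_agent`. The sketch below is hypothetical: only `order_status` and `return` appear as intents in this document (Flow A); the other intent labels and the `ROUTES` mapping are invented for illustration.

```typescript
type AgentName = "CSA" | "PSA" | "IRA" | "PPA" | "MCA" | "SCLA" | "AIA" | "SAC" | "FRA";

// Illustrative intent labels; only the first two come from the document.
type Intent = "order_status" | "return" | "product_discovery" | "stockout_risk" | "promo_design";

// Every intent must map to exactly one specialist (enforced by Record).
const ROUTES: Record<Intent, AgentName> = {
  "order_status": "CSA",
  "return": "CSA",
  "product_discovery": "PSA",
  "stockout_risk": "IRA",
  "promo_design": "PPA",
};

interface RoutedSubtask { intent: Intent; agent: AgentName }

// A multi-intent message yields one sub-task per detected intent.
function routeIntents(intents: Intent[]): RoutedSubtask[] {
  return intents.map((intent) => ({ intent, agent: ROUTES[intent] }));
}
```

Declaring `ROUTES` as `Record<Intent, AgentName>` means adding a new intent without a route fails type-checking, which keeps the "single entry point" guarantee honest.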

5.3.2 Customer Service Agent (CSA)

Field	Specification
Scope	Order status, returns/refunds, exchanges, shipping issues, complaints, policy Q&A
Tools	`get_order`, `get_shipment_tracking`, `create_return`, `issue_refund`, `create_replacement_order`, `lookup_policy`, `escalate_to_human`
Guardrails	Refund auto-approve ≤ $250 AND LTV decile ≥ 3 AND no abuse flags; otherwise → approval queue
KPIs	Deflection rate, first-contact resolution, refund accuracy, CSAT (paired)
Model	Sonnet 4.6 (Opus 4.7 for complex disputes)
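
The refund guardrail in the table reduces to a three-part conjunction. A minimal sketch, assuming amounts in USD and a hypothetical `RefundRequest` shape:

```typescript
interface RefundRequest {
  amountUsd: number;
  ltvDecile: number;    // customer lifetime-value decile
  abuseFlags: string[]; // empty when the account has no abuse flags
}

type RefundRoute = "auto_approve" | "approval_queue";

const REFUND_AUTO_LIMIT_USD = 250;
const MIN_LTV_DECILE = 3;

// Auto-approve only when all three conditions hold; otherwise queue for a human.
function routeRefund(r: RefundRequest): RefundRoute {
  const eligible =
    r.amountUsd <= REFUND_AUTO_LIMIT_USD &&
    r.ltvDecile >= MIN_LTV_DECILE &&
    r.abuseFlags.length === 0;
  return eligible ? "auto_approve" : "approval_queue";
}
```

Keeping the thresholds as named constants rather than prompt text means the gate is enforced in code and cannot be talked around by a model.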

5.3.3 Personal Shopper Agent (PSA)

Field	Specification
Scope	Product discovery, outfit/bundle building, gift assistant, post-purchase upsell
Tools	`search_catalog`, `vector_search_products`, `get_customer_preferences`, `get_purchase_history`, `check_stock`, `apply_personalization_model`, `build_bundle`
Guardrails	Consent-checked per locale (GDPR/CCPA); never recommends out-of-stock; respects MAP/brand floors
KPIs	Recall@10, NDCG@10, attach rate (A/B), novelty, diversity
Model	Sonnet 4.6 + internal ranker (SageMaker/Vertex)

5.3.4 Inventory & Replenishment Agent (IRA)

Field	Specification
Scope	Stock visibility, reorder point math, PO drafting, allocation
Tools	`check_stock`, `get_sales_velocity`, `forecast_demand`, `compute_reorder_point`, `draft_purchase_order`, `submit_po_for_approval`, `list_open_pos`
Guardrails	Never writes on-hand counts; PO > $25k auto → human; reliability-scored suppliers only for auto-PO
KPIs	Forecast MAPE & WAPE, stockout rate, weeks-of-supply variance, PO acceptance rate
Pattern	Fan-Out (per-SKU forecasts in parallel)

5.3.5 Pricing & Promotions Agent (PPA)

Field	Specification
Scope	Price recommendations, markdown cadence, promo design, competitive repricing
Tools	`get_price_history`, `get_competitor_prices`, `simulate_price_change`, `propose_promotion`, `update_price`, `create_promo_code`, `schedule_promo`
Guardrails	Δ > ±10% → human; below MAP → human; margin < 8% → human; promo > 30% → category manager; > 50% → director
Compliance	Surveillance-pricing prohibited; only category/cohort segmentation; EU/G7 algorithmic-pricing logging
KPIs	Margin lift, sell-through vs target, elasticity-model R², merchant approval rate
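
The pricing guardrails above can be sketched as two checks: one that flags a price change for human review and one that picks the promo approver. The shapes and names are hypothetical, and the margin formula (gross margin as a share of the new price) is an assumption, since the document does not define how margin is computed.

```typescript
interface PriceProposal {
  currentPrice: number;
  newPrice: number;
  unitCost: number;
  mapPrice: number | null; // minimum advertised price, if one applies
}

type PriceFlag = "delta_over_10pct" | "below_map" | "margin_under_8pct";

// Any returned flag routes the proposal to a human.
function priceFlags(p: PriceProposal): PriceFlag[] {
  const flags: PriceFlag[] = [];
  const delta = Math.abs(p.newPrice - p.currentPrice) / p.currentPrice;
  if (delta > 0.10) flags.push("delta_over_10pct");
  if (p.mapPrice !== null && p.newPrice < p.mapPrice) flags.push("below_map");
  const margin = (p.newPrice - p.unitCost) / p.newPrice; // assumed definition
  if (margin < 0.08) flags.push("margin_under_8pct");
  return flags;
}

type PromoApprover = "category_manager" | "director" | null;

// Returns null when the table names no additional approver (discount <= 30%).
function promoApprover(discountPct: number): PromoApprover {
  if (discountPct > 50) return "director";
  if (discountPct > 30) return "category_manager";
  return null;
}
```

Returning a list of flags rather than a single boolean lets the approval card show the merchant every reason a proposal was escalated.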

5.3.6 Marketing Content Agent (MCA)

Field	Specification
Scope	Product descriptions, ad copy, email, SMS, push, landing-page hero (multilingual)
Tools	`get_product_attributes`, `get_brand_voice`, `generate_copy`, `generate_image_prompt`, `check_compliance`, `submit_for_review`, `push_to_klaviyo`, `push_to_meta_ads`, `push_to_google_ads`
Guardrails	Brand-voice score < 0.85 → forced review; compliance auto-checked (CAN-SPAM, GDPR, FTC claims)
KPIs	Brand-voice score, compliance pass rate, win-rate vs human in blind test, CTR uplift in live A/B
Pattern	Fan-Out (variants × channels × locales)

5.3.7 Supply Chain & Logistics Agent (SCLA)

Field	Specification
Scope	ETA prediction, carrier selection, exception handling, transfers, last-mile
Tools	`get_shipment_tracking`, `predict_eta`, `select_carrier`, `create_transfer_order`, `file_carrier_claim`, `notify_customer_delay`
KPIs	ETA MAE, on-time delivery, claim recovery rate

5.3.8 Analytics & Insights Agent (AIA)

Field	Specification
Scope	Conversational BI for merchants. NL → SQL → chart + narrative
Tools	`nl_to_sql`, `run_query`, `get_anomaly_alerts`, `compute_attribution`, `generate_chart`, `save_dashboard`
Guardrails	RLS-enforced queries; hallucinated-column rate target = 0; warehouse cost caps
KPIs	Execution accuracy, semantic equivalence, hallucinated-column rate
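
The hallucinated-column metric can be made concrete with a schema allowlist check. This sketch assumes a SQL parser has already extracted the column names each generated query references, and it assumes the rate is defined as the fraction of queries with at least one unknown column; neither detail is specified in the document, and the function names are hypothetical.

```typescript
// Columns referenced by a query that do not exist in the schema (case-insensitive).
function unknownColumns(referenced: string[], schemaColumns: string[]): string[] {
  const known = new Set(schemaColumns.map((c) => c.toLowerCase()));
  return referenced.filter((c) => !known.has(c.toLowerCase()));
}

// Assumed definition: share of queries that reference at least one unknown column.
function hallucinatedColumnRate(queries: string[][], schemaColumns: string[]): number {
  if (queries.length === 0) return 0;
  const bad = queries.filter((cols) => unknownColumns(cols, schemaColumns).length > 0).length;
  return bad / queries.length;
}
```

Running `unknownColumns` before `run_query` turns the "target = 0" goal into a hard pre-execution gate: a query with any unknown column is rejected rather than sent to the warehouse.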

5.3.9 Store Associate Copilot (SAC)

Field	Specification
Scope	Mobile/POS companion: find-in-store, clienteling, endless aisle, in-store returns, task list
Tools	`check_stock_nearby`, `reserve_item`, `lookup_customer`, `create_save_the_sale_order`, `get_clienteling_brief`, `print_label`, `get_task_list`
Guardrails	Hard 2s p95 SLA; bypasses orchestrator for read path, audits post-hoc
KPIs	Time-to-answer, save-the-sale conversion, associate CSAT

5.3.10 Fraud & Risk Agent (FRA) — optional 10th

Field	Specification
Scope	Order risk scoring, return-abuse detection, promo-abuse detection, chargeback assist
Tools	`score_order_risk`, `get_return_abuse_history`, `freeze_account`, `request_id_verification`
KPIs	AUC, FPR @ recall = 0.9, chargeback $ saved, agent-vs-agent fraud caught
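
The "FPR @ recall" KPI is computed by lowering the score threshold until the target recall is reached and reading off the false-positive rate at that point. The sketch below is one straightforward way to do that, assuming higher scores mean higher risk; `ScoredOrder` and `fprAtRecall` are hypothetical names.

```typescript
interface ScoredOrder { score: number; isFraud: boolean }

// False-positive rate at the highest threshold whose recall meets targetRecall.
function fprAtRecall(samples: ScoredOrder[], targetRecall: number): number {
  const positives = samples.filter((s) => s.isFraud).length;
  const negatives = samples.length - positives;
  if (positives === 0 || negatives === 0) {
    throw new Error("need at least one fraud and one legitimate sample");
  }
  // Walk from the riskiest score downward, treating tied scores as one group.
  const sorted = samples.slice().sort((a, b) => b.score - a.score);
  let tp = 0;
  let fp = 0;
  let prevScore: number | null = null;
  for (const s of sorted) {
    // Evaluate recall only at a boundary between distinct score values.
    if (prevScore !== null && s.score !== prevScore && tp / positives >= targetRecall) {
      return fp / negatives;
    }
    if (s.isFraud) tp++; else fp++;
    prevScore = s.score;
  }
  return fp / negatives; // threshold at the lowest score flags every sample
}
```

Calling `fprAtRecall(samples, 0.9)` on a labeled evaluation set gives the KPI as stated in the table; the metric is only meaningful when the set contains both classes, hence the guard.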

6. The Tool Catalog

All tools return a structured envelope: `{ok, data, error, audit_id}`. Mutating tools embed the `audit_id` in the downstream system of record.

// ===== Orchestrator =====
classify_intent(text: string, context: SessionCtx): {intent: Intent, confidence: number}
route_to_agent(agent: AgentName, payload: object): AgentResult
request_human_approval(action: PendingAction, sla_minutes: number): ApprovalTicket
log_audit_event(event: AuditEvent): void
get_conversation_state(session_id: string): SessionState
write_conversation_state(session_id: string, patch: Partial<SessionState>): void

// ===== Customer Service =====
get_order(order_id: string): Order                                          // Shopify / SAP CC / OMS
get_shipment_tracking(shipment_id: string): TrackingEvents                  // Shippo / EasyPost
create_return(order_id: string, lines: ReturnLine[], reason: string): RMA   // Loop / Narvar / OMS
issue_refund(order_id: string, amount: Money, reason: string): RefundResult // Stripe / Adyen
create_replacement_order(original_order_id: string, lines: Line[]): Order
lookup_policy(topic: string, locale: string): PolicyExcerpt                 // Internal KB (vector)
escalate_to_human(ticket: TicketDraft): TicketId                            // Zendesk / Gorgias

// ===== Personal Shopper =====
search_catalog(query: string, filters: CatalogFilter): SKU[]                // Algolia / Constructor
vector_search_products(embedding: number[] | string, k: number, filters?: CatalogFilter): SKU[]
get_customer_preferences(customer_id: string): Preferences                  // RudderStack / Segment
get_purchase_history(customer_id: string, limit?: number): Order[]
check_stock(sku: string, location_id?: string): StockLevel                  // WMS / ERP
apply_personalization_model(customer_id: string, candidates: string[]): ScoredSKU[]
build_bundle(seed_sku: string, slots: BundleSlot[]): Bundle

// ===== Inventory & Replenishment =====
get_sales_velocity(sku: string, window_days: number, location_id?: string): VelocityStats
forecast_demand(sku: string, horizon_days: number, location_id?: string): Forecast
compute_reorder_point(sku: string, service_level: number): ReorderPoint
draft_purchase_order(supplier_id: string, lines: POLine[]): PODraft         // SAP / NetSuite
submit_po_for_approval(po_draft_id: string): ApprovalTicket
list_open_pos(supplier_id?: string): PO[]

// ===== Pricing & Promotions =====
get_price_history(sku: string, days: number): PricePoint[]
get_competitor_prices(sku: string, competitors?: string[]): CompetitorPrice[]
simulate_price_change(sku: string, new_price: Money): {expected_units, expected_margin, conf}
propose_promotion(scope: PromoScope, mechanic: PromoMechanic): PromoDraft
update_price(sku: string, new_price: Money, valid_from: ISODate, valid_to?: ISODate): void
create_promo_code(definition: PromoDef): PromoCode                          // Shopify / Talon.One
schedule_promo(promo_id: string, start: ISODate, end: ISODate): void

// ===== Marketing Content =====
get_product_attributes(sku: string): PIMRecord                              // Akeneo / Salsify
get_brand_voice(brand_id: string): BrandVoiceProfile
generate_copy(brief: CopyBrief): CopyVariant[]
generate_image_prompt(sku: string, scene: string): ImagePrompt
check_compliance(text: string, locale: string, category: string): ComplianceReport
submit_for_review(asset_id: string, reviewers: string[]): ReviewTicket
push_to_klaviyo(campaign: KlaviyoCampaign): CampaignId                      // Klaviyo MCP
push_to_meta_ads(creative: MetaCreative, adset_id: string): AdId
push_to_google_ads(asset: GoogleAsset, group_id: string): AdId

// ===== Supply Chain & Logistics =====
predict_eta(shipment_id: string): ETA
select_carrier(origin: Address, dest: Address, weight: number, sla: SLA): CarrierQuote[]
create_transfer_order(from_loc: string, to_loc: string, lines: Line[]): TransferOrder
file_carrier_claim(shipment_id: string, reason: string, amount: Money): ClaimId
notify_customer_delay(order_id: string, new_eta: ISODate, reason: string): void

// ===== Analytics & Insights =====
nl_to_sql(question: string, schema_scope: string[]): {sql: string, confidence: number}
run_query(sql: string, role: Role): QueryResult                             // Snowflake / BigQuery (RLS)
get_anomaly_alerts(scope: AnomalyScope): Anomaly[]
compute_attribution(window_days: number, model: AttrModel): AttributionTable
generate_chart(data: QueryResult, hint?: ChartHint): VegaSpec
save_dashboard(spec: DashboardSpec): DashboardId

// ===== Store Associate Copilot =====
check_stock_nearby(sku: string, store_id: string, radius_km: number): NearbyStock[]
reserve_item(sku: string, store_id: string, customer_id: string, hours: number): ReservationId
lookup_customer(query: string): CustomerSummary
create_save_the_sale_order(customer_id: string, lines: Line[], ship_to: Address): Order
get_clienteling_brief(customer_id: string): ClientelingCard
print_label(printer_id: string, payload: LabelPayload): void
get_task_list(associate_id: string, store_id: string): Task[]

// ===== Fraud & Risk =====
score_order_risk(order_id: string): RiskScore                               // Signifyd / Forter / Sift
get_return_abuse_history(customer_id: string): AbuseSignals
freeze_account(customer_id: string, reason: string): void
request_id_verification(customer_id: string): VerificationLink              // Persona / Stripe Identity

Total: 64 tools across 9 specialists + orchestrator.
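
The structured envelope that every tool returns could be typed as a discriminated union, so callers must check `ok` before touching `data`. This is a sketch under assumptions: the document names the four envelope fields but not their types, so the `{code, message}` error shape and the `unwrapEnvelope` helper are hypothetical.

```typescript
// Success and failure variants share audit_id; `ok` discriminates between them.
type ToolEnvelope<T> =
  | { ok: true; data: T; error: null; audit_id: string }
  | { ok: false; data: null; error: { code: string; message: string }; audit_id: string };

// Return the payload on success; on failure, throw with the audit_id attached.
function unwrapEnvelope<T>(env: ToolEnvelope<T>): T {
  if (env.ok) return env.data;
  throw new Error(`${env.error.code}: ${env.error.message} (audit ${env.audit_id})`);
}
```

Under this reading, a signature such as `get_order(order_id: string): Order` describes the `data` payload, and the value on the wire would be `ToolEnvelope<Order>`. Carrying `audit_id` on both variants keeps failed calls traceable as well.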

7. Reference Orchestration Flows

These five flows are canonical. They are the patterns every QuanData retail deployment is benchmarked against.

7.1 Flow A — "Where's my order, and can I return one item?"

sequenceDiagram
    autonumber
    participant U as Customer
    participant O as Orchestrator
    participant CSA as Customer Service Agent
    participant OMS as OMS / Carrier
    participant POL as Policy KB
    participant AUD as Audit Log

    U->>O: "Where's my order, and can I return one item?"
    O->>O: classify_intent → {order_status, return}
    par Status lookup
        O->>CSA: order_status sub-task
        CSA->>OMS: get_order(order_id)
        CSA->>OMS: get_shipment_tracking(shipment_id)
        OMS-->>CSA: "Out for delivery, ETA 4pm"
    and Return setup
        O->>CSA: return sub-task
        CSA->>POL: lookup_policy("return_window")
        CSA->>OMS: create_return(order_id, line_2, "size_too_small")
        OMS-->>CSA: RMA-9981, refund $79
    end
    Note over CSA: Guardrail: $79 < $250 auto-threshold
    CSA-->>O: Combined result
    O->>AUD: log_audit_event(...)
    O-->>U: Tracking + RMA + QR label link

Latency budget: 3s p95. Both CSA sub-calls execute in parallel.

7.2 Flow B — Reorder a fast-moving SKU before stockout

sequenceDiagram
    autonumber
    participant AIA as Analytics Agent
    participant O as Orchestrator
    participant IRA as Inventory Agent
    participant ERP as ERP / WMS
    participant BUY as Buyer (Slack)
    participant SCLA as Supply Chain Agent

    Note over AIA: 06:00 scheduled anomaly scan
    AIA->>AIA: get_anomaly_alerts({type:"stockout_risk"})
    AIA-->>O: SKU-4421, days-of-cover=4, lead=14
    O->>IRA: replenish(SKU-4421)
    par Forecast & reorder math
        IRA->>IRA: forecast_demand(SKU-4421, 30)
        IRA->>IRA: compute_reorder_point(SKU-4421, 0.95)
    end
    IRA->>ERP: draft_purchase_order(SUP-12, [{SKU-4421, qty:1200}])
    Note over IRA: Guardrail: PO $48k > $25k → human
    IRA->>BUY: submit_po_for_approval (Slack card)
    BUY-->>IRA: ✅ Approve
    IRA->>ERP: submit PO
    IRA->>SCLA: register ASN callback for ETA visibility
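
The arithmetic behind Flow B can be sketched as follows. The document does not specify how `compute_reorder_point` works, so the formula below is the textbook one (expected lead-time demand plus safety stock), offered as an assumption; the function names are hypothetical. The z-score for a 95% one-sided service level is approximately 1.645.

```typescript
// Textbook reorder point: mean demand over the lead time plus safety stock.
function reorderPoint(
  meanDailyDemand: number,
  dailyDemandStdDev: number,
  leadTimeDays: number,
  zScore: number,
): number {
  const leadTimeDemand = meanDailyDemand * leadTimeDays;
  const safetyStock = zScore * dailyDemandStdDev * Math.sqrt(leadTimeDays);
  return leadTimeDemand + safetyStock;
}

// Stockout risk as in Flow B: cover runs out before a new PO could arrive.
function stockoutRisk(daysOfCover: number, leadTimeDays: number): boolean {
  return daysOfCover < leadTimeDays;
}

const PO_AUTO_LIMIT_USD = 25000;

// Guardrail from the IRA spec: any PO above $25k goes to a human.
function poNeedsHumanApproval(totalUsd: number): boolean {
  return totalUsd > PO_AUTO_LIMIT_USD;
}
```

With the values in the flow, `stockoutRisk(4, 14)` is true because four days of cover cannot bridge a fourteen-day lead time, and `poNeedsHumanApproval(48000)` is true, which is why the draft PO is sent to the buyer in Slack rather than submitted directly.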

7.3 Flow C — Flash promo on overstock category

sequenceDiagram
    autonumber
    participant M as Merchant
    participant O as Orchestrator
    participant AIA as Analytics
    participant PPA as Pricing Agent
    participant MCA as Marketing Content
    participant REV as Brand Lead

    M->>O: "We're long on summer outerwear. Move it."
    O->>AIA: identify overstock
    AIA->>AIA: run_query(category='outerwear-summer', WoS>20)
    AIA-->>O: 47 SKUs
    O->>PPA: design_promo(skus, target=60%_in_14d)
    loop top-10 SKUs
        PPA->>PPA: simulate_price_change @ {15%, 25%, 35%}
    end
    PPA->>PPA: propose_promotion → "SUMMER25", 14d
    Note over PPA: Guardrail: 3 SKUs margin<10% — flagged
    PPA-->>M: Proposal (3 flagged)
    M-->>PPA: ✅ Approve with exclusions
    PPA->>PPA: create_promo_code + schedule_promo
    O->>MCA: generate_campaign(promo, [email, meta, web])
    par Variant generation
        MCA->>MCA: get_product_attributes (batch)
        MCA->>MCA: get_brand_voice
        MCA->>MCA: generate_copy (5 variants/channel)
    end
    MCA->>MCA: check_compliance ✓
    MCA->>REV: submit_for_review (4h SLA)
    REV-->>MCA: ✅
    MCA->>MCA: push_to_klaviyo + push_to_meta_ads
    O->>AIA: register sell-through monitor

7.4 Flow D — Next week's lapsed-customer email

sequenceDiagram
    autonumber
    participant CRON as Calendar
    participant O as Orchestrator
    participant MCA as Marketing Content
    participant AIA as Analytics
    participant PSA as Personal Shopper
    participant REV as Marketing Lead
    participant KLAV as Klaviyo

    CRON->>O: weekly_lapsed_campaign
    O->>MCA: kick off
    MCA->>AIA: query_segment(last_purchase 90-180d, ltv≥6)
    AIA-->>MCA: 38,400 customers, by affinity
    par Segment picks (Fan-Out)
        MCA->>PSA: per_segment_picks(seg=A, k=6)
        MCA->>PSA: per_segment_picks(seg=B, k=6)
        MCA->>PSA: per_segment_picks(seg=C, k=6)
    end
    MCA->>MCA: generate_copy (3 subjects × 3 segments)
    MCA->>MCA: check_compliance (CAN-SPAM, GDPR)
    Note over MCA: brand_score=0.91, compliance=pass
    MCA->>REV: auto-approval candidate
    REV-->>MCA: ✅
    MCA->>KLAV: push_to_klaviyo (A/B/C, 10% holdout)
    O->>AIA: schedule 7d post-send eval
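
The 10% holdout in the flow above needs an assignment that is stable across retries, so the same customer is never both mailed and held out. One way to sketch this is hash-based bucketing. The function names, the salt format, and the choice of SHA-256 are assumptions for illustration; the source does not specify how the holdout is drawn.

```python
import hashlib

_BUCKETS = 10_000


def _bucket(customer_id: str, salt: str) -> int:
    """Map a customer to a stable bucket in [0, _BUCKETS)."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % _BUCKETS


def in_holdout(customer_id: str, campaign_id: str, rate: float = 0.10) -> bool:
    """True when the customer is withheld from the send for this campaign."""
    return _bucket(customer_id, f"holdout:{campaign_id}") < round(rate * _BUCKETS)


def subject_variant(customer_id: str, campaign_id: str, n_variants: int = 3) -> int:
    """Pick one of the subject-line variants, independent of the holdout draw."""
    return _bucket(customer_id, f"subject:{campaign_id}") % n_variants
```

Because the assignment depends only on the customer and campaign identifiers, the 7-day post-send evaluation can recompute who was held out without storing a separate assignment table.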

7.5 Flow E — Associate finds an item across nearby stores

sequenceDiagram
    autonumber
    participant A as Associate
    participant SAC as Store Associate Copilot
    participant WMS as Inventory
    participant CDP as Customer Profile
    participant OMS as OMS
    participant SCLA as Supply Chain
    participant O as Orchestrator (async audit)

    A->>SAC: scan barcode (sub-2s required)
    SAC->>WMS: check_stock(sku, this_store)
    WMS-->>SAC: 0
    SAC->>WMS: check_stock_nearby(sku, store, 25km)
    WMS-->>SAC: 3 stores (1, 2, 4 units)
    SAC->>CDP: lookup_customer(phone)
    CDP-->>SAC: customer + prefs
    SAC-->>A: options: reserve / ship-from-store (SFS) / ship-to-home (STH)
    A->>SAC: ship-from-store
    par
        SAC->>OMS: create_save_the_sale_order
    and
        SAC->>WMS: reserve_item (auto-release 4h)
    end
    SCLA-->>SCLA: async: carrier + label + customer notify
    SAC-->>O: post-hoc audit + clienteling note

8. State, Memory & The Dreaming Layer

8.1 The memory layer cake

flowchart TB
    subgraph EPH[Ephemeral Layer]
        CS[Conversation State<br/>Redis · 24h sliding]
        ST[Short-term Episodic<br/>last 20 turns]
    end
    subgraph WARM[Warm Layer]
        CP[Customer Profile<br/>Postgres + CDP]
        CAT[Catalog Embeddings<br/>pgvector / Turbopuffer]
        POL[Policy / KB Embeddings<br/>pgvector namespaced]
    end
    subgraph COLD[Cold / System-of-Record]
        OMS[(OMS · Orders · Returns)]
        WH[(Warehouse · Snowflake / BigQuery)]
        AUD[(Audit Log · 7yr retention)]
    end
    subgraph DREAM[Dreaming Layer]
        AM[Agent Memory<br/>Postgres + embeddings<br/>summaries · preferences · incidents]
        DSC[Dream Scheduler<br/>nightly background process]
    end

    CS --> ST
    ST --> CP
    CP --> CAT
    CAT --> POL
    POL --> OMS
    OMS --> WH
    WH --> AUD
    AUD -.feed.-> DSC
    DSC -->|distill| AM
    AM -->|inject context| CS

    style DREAM fill:#dc2626,stroke:#fff,color:#fff
    style EPH fill:#0b3d91,stroke:#fff,color:#fff
    style WARM fill:#1f6feb,stroke:#fff,color:#fff
    style COLD fill:#374151,stroke:#fff,color:#fff

8.2 The Dreaming process

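QuanData's Dreaming layer follows Anthropic's Managed Agents Dreaming primitive (announced May 6, 2026): a scheduled background process that reviews past sessions, extracts patterns, identifies recurring mistakes, and curates each agent's memory stores. The state diagram below shows the per-request lifecycle alongside the nightly Dreaming cycle.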

stateDiagram-v2
    [*] --> Idle
    Idle --> Invoked: user message / trigger
    Invoked --> ToolCalling: classify + plan
    ToolCalling --> Reflecting: tool returns
    Reflecting --> ToolCalling: needs more
    Reflecting --> Outcome_Grading: draft ready
    Outcome_Grading --> ToolCalling: rubric fail
    Outcome_Grading --> Responding: rubric pass
    Responding --> [*]
    [*] --> Dreaming: nightly cron
    Dreaming --> Distilling: read audit + traces
    Distilling --> Updating_Memory: extract summaries / preferences / incidents
    Updating_Memory --> [*]

Harvey reported a ~6× completion-rate lift from enabling Dreaming on their legal agents — without a model change. The lift is purely from agents carrying institutional knowledge across sessions.

8.3 Canonical schemas

session_state {
  session_id PK, principal_id, principal_type,
  current_intent, pending_actions JSONB,
  agent_scratchpad JSONB, last_tool_calls JSONB[],
  updated_at
}

agent_memory {
  id PK, scope_type ENUM('customer','merchant','sku','store'),
  scope_id, kind ENUM('summary','preference','incident','note'),
  content TEXT, embedding VECTOR(1536),
  source_run_id, confidence FLOAT,
  created_at, expires_at NULL, version INT
}

audit_event {
  id PK, run_id, parent_run_id, principal_id,
  agent, tool, input_hash, output_hash,
  approval_id NULL, decision, latency_ms, cost_usd,
  pii_redacted BOOL, created_at
}

approval {
  id PK, action_type, payload JSONB, threshold_breached TEXT,
  requested_by_agent, approver_pool TEXT[],
  status ENUM('pending','approved','rejected','expired'),
  decided_by, decided_at, sla_minutes
}
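
As an illustration of how the `approval` schema might behave at runtime, the sketch below models the status transitions and SLA expiry. It is a simplified assumption, not the production model: `requested_at` is a hypothetical field added here so the SLA deadline can be computed, and `resolve` is an illustrative helper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class ApprovalStatus(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"


@dataclass
class Approval:
    id: str
    action_type: str
    requested_at: datetime          # hypothetical field, not in the schema above
    sla_minutes: int
    status: ApprovalStatus = ApprovalStatus.PENDING
    decided_by: Optional[str] = None
    decided_at: Optional[datetime] = None


def resolve(approval: Approval, now: datetime,
            decision: Optional[ApprovalStatus] = None,
            approver: Optional[str] = None) -> Approval:
    """Apply a decision, or expire the request once its SLA has passed."""
    if approval.status is not ApprovalStatus.PENDING:
        return approval  # terminal states are never reopened
    deadline = approval.requested_at + timedelta(minutes=approval.sla_minutes)
    if now > deadline:
        approval.status = ApprovalStatus.EXPIRED  # fail closed: no silent approval
    elif decision in (ApprovalStatus.APPROVED, ApprovalStatus.REJECTED):
        approval.status = decision
        approval.decided_by = approver
        approval.decided_at = now
    return approval
```

Expiry is treated as a non-approval, so a mutating action whose approval times out is never executed by default.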

8.4 Memory engineering principles

Summaries, not transcripts. Long-term memory stores distilled rows, never raw chat history.
PII redacted pre-write. A deterministic redactor runs before any memory row is persisted.
Versioned. Every memory row carries a version so contradictions can be resolved.
Source-traceable. Every memory row links to a source_run_id in the audit log.
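
The four principles above can be sketched as a single write path. This is a minimal in-memory illustration under stated assumptions: the regex patterns are simplistic stand-ins for a real deterministic redactor, and `MemoryStore`, `MemoryRow`, and `redact` are hypothetical names rather than QuanData components.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative patterns only; a production redactor would cover far more PII types.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(text: str) -> str:
    """Deterministic, rule-based redaction applied before any write."""
    return _CARD.sub("[PAN]", _EMAIL.sub("[EMAIL]", text))


@dataclass(frozen=True)
class MemoryRow:
    scope_type: str
    scope_id: str
    kind: str
    content: str
    source_run_id: str
    version: int


class MemoryStore:
    def __init__(self) -> None:
        self._rows: List[MemoryRow] = []

    def write(self, scope_type: str, scope_id: str, kind: str,
              content: str, source_run_id: str) -> MemoryRow:
        if not source_run_id:
            raise ValueError("source_run_id is required for traceability")
        key = (scope_type, scope_id, kind)
        prior = [r for r in self._rows if (r.scope_type, r.scope_id, r.kind) == key]
        # Rows are appended, never overwritten; the version orders contradictions.
        row = MemoryRow(scope_type, scope_id, kind, redact(content),
                        source_run_id, len(prior) + 1)
        self._rows.append(row)
        return row
```

Callers pass a distilled summary as `content`; nothing in this path accepts a raw transcript.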

9. Outcomes — The Rubric-Driven Quality Loop

Per Anthropic's Outcomes primitive (May 6, 2026): every output is graded against a defined rubric before reaching the customer. QuanData operationalizes this as a per-agent rubric set.

9.1 The grading loop

flowchart LR
    A[Agent generates output] --> B{Rubric grader<br/>self-evaluation}
    B -- pass --> C[Release to customer]
    B -- fail · iter < N --> A
    B -- fail · iter ≥ N --> H[Escalate to human]
    style B fill:#7c3aed,stroke:#fff,color:#fff
    style H fill:#dc2626,stroke:#fff,color:#fff

9.2 Per-agent rubric examples

Agent	Rubric Dimensions	Auto-Pass Threshold
CSA	Empathy score · policy correctness · resolution completeness · CSAT proxy	≥0.85 on all 4
PSA	Relevance · constraint adherence (budget, size) · diversity · in-stock filter	100% constraint + ≥0.80 relevance
MCA	Brand voice score · compliance pass · variant diversity · CTA presence	≥0.85 brand + 100% compliance
PPA	Margin floor respected · MAP respected · elasticity-confidence ≥ 0.7	100% guardrails
AIA	Hallucinated-column rate · semantic-equivalence · row-count sanity	0 hallucinated columns
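
The auto-pass thresholds in the table can be expressed as data plus one predicate. The sketch below covers three of the five agents; the dictionary layout, dimension keys, and `auto_pass` are illustrative assumptions, and a missing score or check fails closed.

```python
from typing import Dict, Optional

RUBRICS = {
    "CSA": {
        "min_scores": {"empathy": 0.85, "policy_correctness": 0.85,
                       "resolution_completeness": 0.85, "csat_proxy": 0.85},
        "must_pass": [],
    },
    "PSA": {"min_scores": {"relevance": 0.80}, "must_pass": ["constraint_adherence"]},
    "MCA": {"min_scores": {"brand_voice": 0.85}, "must_pass": ["compliance"]},
}


def auto_pass(agent: str, scores: Dict[str, float],
              checks: Optional[Dict[str, bool]] = None) -> bool:
    """True only if every score meets its floor and every hard check passed."""
    rubric = RUBRICS[agent]
    checks = checks or {}
    scores_ok = all(scores.get(dim, 0.0) >= floor
                    for dim, floor in rubric["min_scores"].items())
    checks_ok = all(checks.get(name, False) for name in rubric["must_pass"])
    return scores_ok and checks_ok
```

For example, the Flow D result of brand_score=0.91 with compliance=pass satisfies the MCA rubric, while the same score without a recorded compliance result does not.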

9.3 The dual-track evaluation harness

flowchart TB
    subgraph OFF[Offline Track]
        GD[Golden Datasets<br/>per agent]
        REG[Nightly Regression]
        ALERT[2σ Drop → page]
    end
    subgraph ON[Online Track]
        AB[5% No-Agent Holdout]
        VAR[Variant A/B/C between agent versions]
        JUD[LLM-as-judge + 50/wk human spot-check]
    end
    subgraph RED[Red Team]
        INJ[Prompt-injection probes]
        EXF[PII exfiltration]
        JBR[Jailbreak — refund / price tools]
    end
    GD --> REG --> ALERT
    AB --> VAR --> JUD
    INJ --> ALERT
    EXF --> ALERT
    JBR --> ALERT
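
The "2σ Drop → page" rule in the offline track can be sketched as a comparison of the latest nightly score against a trailing baseline. The baseline window, the `min_history` guard, and the function name are assumptions; the source only states the 2σ threshold.

```python
from statistics import mean, stdev
from typing import Sequence


def is_two_sigma_drop(history: Sequence[float], latest: float,
                      sigmas: float = 2.0, min_history: int = 5) -> bool:
    """True when the latest score falls more than `sigmas` below the baseline mean."""
    if len(history) < min_history:
        return False            # too little data for a meaningful baseline
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        return latest < mu      # perfectly flat baseline: any drop is notable
    return latest < mu - sigmas * sd
```

Only drops trigger the page; an unusually high score is not treated as an alert in this sketch.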

10. Guardrail Architecture — Defense In Depth

The most likely failure mode of any agentic retail system is not a model defect. It is a guardrail gap. QuanData architects guardrails as a layer cake — no single layer is sufficient.

10.1 The guardrail layer cake

flowchart TB
    L1[Layer 1 — Input Sanitization<br/>PII tokenization · prompt-injection classifier · channel auth]
    L2[Layer 2 — Intent Classification + Routing<br/>only registered intents reach specialists]
    L3[Layer 3 — Tool Allow-Lists<br/>each agent restricted to its tool set]
    L4[Layer 4 — Tool-Side Guardrails<br/>refund cap · MAP enforce · margin floor · inventory write block]
    L5[Layer 5 — Human-In-Loop Approvals<br/>refunds >$250 · POs >$25k · promos >30% · price Δ > ±10%]
    L6[Layer 6 — Outcome Rubric Grader<br/>self-evaluation before release]
    L7[Layer 7 — Audit + Reversibility<br/>every mutation has audit_id + compensation path]
    L8[Layer 8 — Kill Switches<br/>per-agent flag + global pause]

    L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8

    style L1 fill:#0b3d91,stroke:#fff,color:#fff
    style L2 fill:#1f6feb,stroke:#fff,color:#fff
    style L3 fill:#1f6feb,stroke:#fff,color:#fff
    style L4 fill:#16a34a,stroke:#fff,color:#fff
    style L5 fill:#16a34a,stroke:#fff,color:#fff
    style L6 fill:#7c3aed,stroke:#fff,color:#fff
    style L7 fill:#7c3aed,stroke:#fff,color:#fff
    style L8 fill:#dc2626,stroke:#fff,color:#fff
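
Layers 3 and 8 are the simplest to show in code: a gate consulted before every tool call that enforces the per-agent allow-list and honors both kill switches. This is a hedged sketch; `ToolGate` and its exceptions are hypothetical, and the tool names are borrowed from the flows in Section 7 for illustration only.

```python
from typing import Dict, Set


class ToolNotAllowed(PermissionError):
    """The agent tried to call a tool outside its allow-list."""


class AgentPaused(RuntimeError):
    """A per-agent or global kill switch is active."""


class ToolGate:
    def __init__(self, allow_lists: Dict[str, Set[str]]) -> None:
        self._allow = allow_lists
        self._paused_agents: Set[str] = set()
        self._global_pause = False

    def pause_agent(self, agent: str) -> None:
        self._paused_agents.add(agent)

    def pause_all(self) -> None:
        self._global_pause = True

    def check(self, agent: str, tool: str) -> None:
        """Raise unless the agent is live and the tool is on its allow-list."""
        if self._global_pause or agent in self._paused_agents:
            raise AgentPaused(f"{agent} is paused")
        if tool not in self._allow.get(agent, set()):
            raise ToolNotAllowed(f"{agent} may not call {tool}")


gate = ToolGate({
    "PPA": {"simulate_price_change", "propose_promotion", "update_price"},
    "MCA": {"generate_copy", "check_compliance", "push_to_klaviyo"},
})
```

An agent missing from the allow-list map gets an empty tool set, so unknown agents are denied by default.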

10.2 The retail-specific guardrail register

Guardrail	Threshold / Rule	Mechanism
Refund auto-approve	≤ $250 AND LTV decile ≥ 3 AND no abuse	CSA tool wrapper
Price change	Δ > ±10% OR < MAP OR margin < 8% → human	PPA `update_price` pre-write
Promo depth	> 30% → category mgr; > 50% → director	PPA proposal stage
Inventory write	Agents cannot mutate on-hand directly; only transfer/PO/reservation	Tool layer enforces
Refund total per session	Cumulative > $1,000 → block	Orchestrator counter
PII handling	PAN, CVV, full SSN never reach LLM; tokenize at ingest	Pre-LLM redactor
Brand voice	Score < 0.85 → forced review	MCA `check_compliance`
Claims compliance	No "cures"/"clinically proven" unless category-approved	MCA per-category rules
Locale & regulation	EU: GDPR consent; CA: CCPA; alcohol/tobacco: age gate	Profile flags checked
Audit trail	Every tool call logged; mutations require `audit_id`	Audit middleware
Rate limits	Per-customer 60 turns/hr; per-merchant 1k tool-calls/min; per-agent circuit breaker @ 5% errors/60s	Gateway
Reversibility	All mutating actions must have `undo_*` or compensation	Tool registry metadata
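
Three rows of the register translate directly into deterministic checks that sit outside the model. The sketch below is illustrative: function names and return labels are hypothetical, margin is assumed to be measured against the new price, and the session cap is assumed to include the refund being requested.

```python
from typing import Optional


def refund_decision(amount: float, ltv_decile: int, abuse_flag: bool,
                    session_refund_total: float) -> str:
    """Return 'block', 'auto_approve', or 'human_review' for a refund request."""
    if session_refund_total + amount > 1000:
        return "block"                       # cumulative per-session cap
    if amount <= 250 and ltv_decile >= 3 and not abuse_flag:
        return "auto_approve"
    return "human_review"


def price_change_needs_human(old_price: float, new_price: float,
                             map_price: float, unit_cost: float) -> bool:
    """True when the change breaches the delta, MAP, or margin rule."""
    delta = abs(new_price - old_price) / old_price
    margin = (new_price - unit_cost) / new_price
    return delta > 0.10 or new_price < map_price or margin < 0.08


def promo_approver(depth: float) -> Optional[str]:
    """Route a promotion to the approver required by its discount depth."""
    if depth > 0.50:
        return "director"
    if depth > 0.30:
        return "category_manager"
    return None
```

Keeping these checks in the tool wrapper rather than in the prompt means a prompt-injected agent still cannot exceed them.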

10.3 Prompt injection — the OWASP LLM #1

Retail has the highest published prompt-injection vulnerability rate at 40% (2026 data). QuanData's defense is layered:

flowchart LR
    IN[Untrusted Input<br/>customer msg · review · scraped page · return reason] --> PG[PromptGuard 2<br/>~50ms classifier]
    PG -- benign --> LG[LlamaGuard 3 — 8B<br/>hazard classification]
    PG -- suspected --> Q[Quarantine + Manual Review]
    LG -- safe --> NEM[NeMo Guardrails<br/>orchestration]
    LG -- hazard --> Q
    NEM --> A[Trusted Agent Loop]
    style Q fill:#dc2626,stroke:#fff,color:#fff
    style PG fill:#7c3aed,stroke:#fff,color:#fff
    style LG fill:#7c3aed,stroke:#fff,color:#fff

Published benchmarks: layered defense reduces attack success from ~73% to ~9%. Even so, ~35% of adversarial evals still leak — therefore human-in-loop is required on every high-blast-radius action.
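
The screening order in the diagram can be sketched as a short-circuiting pipeline. The classifiers are passed in as plain callables because the real PromptGuard 2, LlamaGuard 3, and NeMo Guardrails interfaces are not shown in this document; the toy keyword check below is a placeholder, not a usable detector, and all names are hypothetical.

```python
from typing import Callable, Tuple

# A classifier returns True when the input should be stopped at that stage.
Classifier = Callable[[str], bool]


def screen_input(text: str, injection_check: Classifier,
                 hazard_check: Classifier) -> Tuple[str, str]:
    """Return (decision, stage); untrusted text reaches the agent only if both pass."""
    if injection_check(text):
        return ("quarantine", "injection_classifier")
    if hazard_check(text):
        return ("quarantine", "hazard_classifier")
    return ("forward", "agent_loop")


def toy_injection_check(text: str) -> bool:
    """Placeholder stand-in for a trained prompt-injection classifier."""
    return "ignore previous instructions" in text.lower()


def toy_hazard_check(text: str) -> bool:
    """Placeholder stand-in for a hazard classifier; flags nothing."""
    return False
```

The cheaper injection classifier runs first, so the heavier hazard model only sees inputs that already passed the first screen.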

11. Integration & Infrastructure Stack

11.1 The reference stack — opinionated

flowchart TB
    subgraph FE[Front End]
        NX[Next.js · Vercel]
        EXP[Mobile · Expo]
    end
    subgraph AR[Agent Runtime]
        MAS[Mastra TS<br/>chat-heavy]
        LG[LangGraph Python<br/>stateful + durable]
        TEMP[Temporal<br/>durable writes]
        SDK[Anthropic Agent SDK<br/>Claude-first builds]
    end
    subgraph MO[Models]
        HK[Claude Haiku 4.5<br/>routing]
        SO[Claude Sonnet 4.6<br/>main loop]
        OP[Claude Opus 4.7<br/>escalation]
        FB[GPT-5.5 / Gemini 2.x<br/>diversity / fallback]
    end
    subgraph TL_MCP[Tool Layer — MCP-Native]
        SH[Shopify MCP — official]
        ST[Stripe MCP — official]
        KL[Klaviyo MCP — official]
        AL[Algolia MCP — official]
        AD[Adobe Commerce MCP — official]
        SF[Salesforce B2C Commerce MCP]
    end
    subgraph TL_WRAP[Tool Layer — Wrapped]
        SAP[SAP S/4HANA via OData wrapper]
        NS[NetSuite via SuiteTalk wrapper]
        POS[Square · Lightspeed via REST wrapper]
    end
    subgraph DATA[Data]
        PG[(Postgres + pgvector)]
        TP[(Turbopuffer<br/>large catalogs)]
        SN[(Snowflake / BigQuery)]
        RU[RudderStack CDP]
        HT[Hightouch Reverse-ETL]
    end
    subgraph OBS[Observability]
        LF[Langfuse self-hosted]
        BT[Braintrust evals]
        PF[Promptfoo CI]
    end
    subgraph SEC[Security]
        PA[PromptArmor + PromptGuard 2]
        TR[Transcend Agentic Assist]
        SI[Signifyd / Riskified]
    end

    FE --> AR
    AR --> MO
    AR --> TL_MCP
    AR --> TL_WRAP
    TL_MCP --> DATA
    TL_WRAP --> DATA
    AR --> OBS
    FE --> SEC
    AR --> SEC

    style AR fill:#7c3aed,stroke:#fff,color:#fff
    style MO fill:#0b3d91,stroke:#fff,color:#fff
    style SEC fill:#dc2626,stroke:#fff,color:#fff

11.2 The MCP landscape — production-ready in May 2026

Server	Status	Notes
Shopify Dev / Storefront / Admin MCP	Production · MIT	Live in every Hydrogen 2026.1.4 + Oxygen store by default
Stripe MCP	Production	Treasury + agentic-commerce tools
Klaviyo MCP	Production	Co-built with Anthropic
Algolia MCP	Production	Integrates with Algolia Agent Studio
Adobe Commerce MCP	Production	Default agent protocol post-Summit 2026
Salesforce B2C Commerce MCP	GA / pilot	Fully hosted
Transcend Agentic Assist + MCP	Production	DSAR / privacy
BigCommerce · Commercetools · Medusa · Shopware	Community only — IMMATURE	Wrap yourself
SAP S/4HANA · NetSuite · Dynamics 365 · Oracle Retail	No official — IMMATURE	Wrap via iPaaS
POS (Square · Toast · Lightspeed)	No official — IMMATURE	Wrap REST

11.3 Vector database selection

Scale	Recommendation	Rationale
≤ 1M SKUs	pgvector	Sane default, no new infra, ~$0 incremental
1M–10M SKUs	Turbopuffer	Under $10/mo at this scale; sub-10ms warm; long-tail-friendly
10M+ SKUs · hard SLA	Pinecone	$99–199/mo at this scale; always-warm <50ms p95

11.4 Hosting matrix

Workload	Platform	Rationale
Storefront / UI	Vercel	Best DX, edge-optimized
Chat agent (TS, stateful)	Fly.io	Persistent VMs, WebSocket-friendly
Heavy agent loops (Python)	Modal	Unlimited sandbox, GPU, 0–50k concurrent
Durable writes (POs, refunds, fulfillment)	Temporal or AWS Step Functions	Crash-proof, replayable; effectively-once writes when each step is idempotent
Enterprise / regulated	AWS Step Functions + Bedrock	Compliance posture

12. Cost Economics

12.1 Per-conversation unit economics (May 2026 pricing)

Configuration	Input Tokens / Conv	Output Tokens / Conv	Cost / Conv
No caching · Sonnet 4.6 only	~53.6k	~4.8k	~$0.23
With prompt caching · 90% hit rate	~53.6k (mostly cached)	~4.8k	~$0.07–0.10
Opus 4.7 escalation @ 5%	+baseline	+baseline	+~$0.04
Haiku 4.5 routing only	~5k	~0.2k	~$0.006
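
The table's figures can be reproduced with simple token arithmetic. The per-million-token prices below are assumptions chosen because they reproduce the table (3/15 USD input/output for Sonnet, 1/5 USD for Haiku, cached input billed at one tenth of the input price); they are not quoted from this document, and cache-write surcharges are ignored.

```python
# Assumed USD prices per million tokens; not taken from the source document.
PRICES_PER_MTOK = {
    "sonnet": {"input": 3.00, "cached_input": 0.30, "output": 15.00},
    "haiku": {"input": 1.00, "cached_input": 0.10, "output": 5.00},
}


def conversation_cost(model: str, input_tokens: int, output_tokens: int,
                      cache_hit_rate: float = 0.0) -> float:
    """Cost in USD for one conversation at a given prompt-cache hit rate."""
    price = PRICES_PER_MTOK[model]
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    total = (fresh * price["input"]
             + cached * price["cached_input"]
             + output_tokens * price["output"])
    return total / 1_000_000


no_cache = conversation_cost("sonnet", 53_600, 4_800)
with_cache = conversation_cost("sonnet", 53_600, 4_800, cache_hit_rate=0.9)
routing = conversation_cost("haiku", 5_000, 200)
```

Under these assumptions the three values come to roughly $0.233, $0.103, and $0.006, which matches the uncached row, the upper end of the cached range, and the routing row. Output tokens are not cached, so they dominate the cost once the cache hit rate is high.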

12.2 Annual run-rate — 1,000 conversations / day

pie title Annual Cost Allocation — 1,000 conv/day with caching
    "LLM tokens (Sonnet + Haiku + 5% Opus)" : 30000
    "Vector DB (Turbopuffer 1M SKU)" : 1200
    "Observability (Langfuse self-host infra)" : 2400
    "Audit log infra (Postgres + S3)" : 1800
    "Eval & red-team compute" : 3600
    "Embedding refresh (catalog deltas)" : 600

Total realistic all-in: ~$3–5k / month for LLM usage and ~$1–3k / month for supporting infra at the 1k conv/day tier. Caching is the single largest lever: without it, LLM costs run roughly 3× higher.

12.3 Cost-protection guardrails

Cap	Threshold
Per-session token cap	200k input / 50k output
Per-session cost cap	$1.00 (hard kill)
Opus 4.7 escalation rate	< 10% of conversations
Warehouse query cost cap (AIA)	$0.50 per query auto, $5 with merchant approval
Subagent fan-out cap	20 parallel workers (matches Anthropic Managed Agents limit)
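
The per-session caps can be enforced by a small accumulator that the runtime updates after every model call. This is a sketch under the thresholds in the table; `SessionBudget` and `BudgetExceeded` are hypothetical names, and how the runtime terminates the session after the exception is left out.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a session crosses a hard cap and must be terminated."""


@dataclass
class SessionBudget:
    max_input_tokens: int = 200_000
    max_output_tokens: int = 50_000
    max_cost_usd: float = 1.00
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        """Add one model call's usage, then enforce the caps."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.cost_usd += cost_usd
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded("per-session cost cap exceeded")
        if (self.input_tokens > self.max_input_tokens
                or self.output_tokens > self.max_output_tokens):
            raise BudgetExceeded("per-session token cap exceeded")
```

Usage is recorded before the check, so the audit trail still reflects the call that tripped the cap.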

13. The 12-Month Build Phasing

QuanData & AI builds in three deliberate phases. Sequencing is non-negotiable.

gantt
    title QuanData Retail Orchestration — 12-Month Build
    dateFormat YYYY-MM-DD
    axisFormat %b %y
    section Foundation (Days 0-90)
    Data plumbing audit            :a1, 2026-06-01, 14d
    Orchestrator + audit + approval scaffold :a2, after a1, 21d
    Tool registry + envelope contract :a3, after a1, 21d
    Guardrail middleware           :a4, after a2, 14d
    CSA (returns + order status only) :a5, after a4, 28d
    PSA (chat + on-site reco)      :a6, after a5, 28d
    Eval harness v1                :a7, after a4, 60d
    section Merchant + Internal (Mo. 4-6)
    AIA — conversational BI         :b1, 2026-09-01, 45d
    IRA — recommend-only            :b2, after b1, 30d
    MCA — descriptions + subjects   :b3, 2026-09-15, 60d
    SAC pilot — 5-10 stores         :b4, 2026-10-01, 60d
    Red-team v1 + audit dashboards  :b5, 2026-09-01, 90d
    section Automation + Hard Problems (Mo. 7-12)
    PPA — live markdowns w/ approval :c1, 2026-12-01, 60d
    SCLA — ETA + carrier opt        :c2, 2026-12-01, 60d
    MCA — full campaign mode        :c3, 2027-01-01, 60d
    FRA — fraud + abuse             :c4, 2027-01-15, 75d
    SAC — full rollout              :c5, 2027-02-01, 90d
    IRA — auto-PO under threshold   :c6, 2027-03-01, 60d
    Dreaming layer — production     :c7, 2027-01-01, 90d

13.1 Why this sequencing

flowchart LR
    P1[Phase 1<br/>Foundation] -->|Lowest blast radius<br/>+ Fastest ROI| P2[Phase 2<br/>Merchant + Internal]
    P2 -->|Trust earned<br/>+ Audit muscle built| P3[Phase 3<br/>Automation + Hard]
    P1 -->|CSA: ticket-cost reduction<br/>PSA: proven LLM win| ROI1[Customer-Visible ROI<br/>by Day 90]
    P2 -->|AIA: merchants love it<br/>IRA: recommend-only<br/>MCA: low-stakes content| ROI2[Operator Trust<br/>by Month 6]
    P3 -->|PPA · SCLA · MCA full · FRA<br/>only after audit + eval mature| ROI3[Full Autonomy w/ Gates<br/>by Month 12]
    style P1 fill:#0b3d91,stroke:#fff,color:#fff
    style P2 fill:#1f6feb,stroke:#fff,color:#fff
    style P3 fill:#7c3aed,stroke:#fff,color:#fff

13.2 The five sequencing principles

Read before write. Every agent ships read-only first.
Approval before autonomy. Human-in-loop default; thresholds widen with proven precision.
Internal before external. Merchant-facing failures are cheaper than customer-facing failures.
One source of truth per domain. Agents never write to OMS / ERP / WMS data stores directly; every mutation goes through the system's native API with an audit_id embedded.
Kill switches always. Per-agent feature flag + global "agent pause" button on every deployment.

14. Failure Mode Taxonomy

This taxonomy is based on a corpus of 591 documented production multi-agent incidents (2023–2026). 40% of multi-agent pilots fail within six months of reaching production. QuanData designs against each failure mode below.

14.1 Failure mode distribution

pie title Production Multi-Agent Failure Modes
    "Context Blindness (truncated/missing info)" : 31.6
    "Rogue Actions (wrong tool/arg)" : 30.3
    "Silent Degradation (looks right, isn't)" : 24.9
    "Memory Corruption" : 8.1
    "Runaway Execution (loops, cost blowouts)" : 5.1

14.2 QuanData countermeasure register

Mode	Share	QuanData Countermeasure
Context blindness	31.6%	Selective retrieval · summarization at handoff · per-turn context budgets · Anthropic's compression-at-subagent pattern
Rogue actions	30.3%	Typed tool schemas · dry-run mode · pre-write guardrails · `audit_id` requirement
Silent degradation	24.9%	Outcome rubric grader · LLM-as-judge + 50/wk human spot-check · 5% no-agent holdout
Memory corruption	8.1%	Append-only audit · Dreaming summaries (not transcripts) · separate read/write agents
Runaway execution	5.1%	100-step cap · $1/session cost cap · 60s circuit-breaker on 5% error rate · 20-worker fan-out cap
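
The "60s circuit-breaker on 5% error rate" countermeasure can be sketched as a sliding-window counter. Two details are assumptions not stated in the source: the breaker opens at or above 5%, and it requires a minimum number of calls in the window (`min_calls`) so a single early failure does not trip it. Timestamps are passed in explicitly to keep the sketch deterministic.

```python
from collections import deque
from typing import Deque, Tuple


class CircuitBreaker:
    def __init__(self, window_s: float = 60.0, max_error_rate: float = 0.05,
                 min_calls: int = 20) -> None:
        self.window_s = window_s
        self.max_error_rate = max_error_rate
        self.min_calls = min_calls
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, is_error)

    def _trim(self, now: float) -> None:
        """Drop events that have aged out of the sliding window."""
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def record(self, ts: float, is_error: bool) -> None:
        self._events.append((ts, is_error))
        self._trim(ts)

    def is_open(self, now: float) -> bool:
        """True when the agent should stop taking new work."""
        self._trim(now)
        total = len(self._events)
        if total < self.min_calls:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total >= self.max_error_rate
```

In this sketch the breaker closes on its own once the failing calls age out of the window; a production version would more likely require an explicit half-open probe before resuming traffic.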

14.3 The seven mistakes the X-post framework explicitly warns against

Making every agent too general. QuanData rule: every agent has one job; narrow is powerful.
Not standardizing output formats. QuanData rule: the structured {ok, data, error, audit_id} envelope is canonical and enforced.
Running too many agents in parallel too early. QuanData rule: phased rollout — two agents → grade → add.
No error handling between agents. QuanData rule: every tool call returns a fail-safe envelope with documented compensation.
Ignoring token costs. QuanData rule: hard caps per session; prompt caching mandatory.
Letting deflection rates obscure CSAT decay. QuanData rule: every deflection KPI paired with CSAT/quality KPI (Klarna doctrine).
Surrendering the funnel to third-party agent runtimes. QuanData rule: QuanData clients own the MCP layer; external agents transact through it (Walmart doctrine).
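
The structured `{ok, data, error, audit_id}` envelope named in the list above can be sketched as a wrapper around any tool call. The `Envelope` class and `call_tool` helper are hypothetical, and generating the `audit_id` with `uuid4` is an assumption; the source only fixes the four field names.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Envelope:
    ok: bool
    data: Any
    error: Optional[str]
    audit_id: str


def call_tool(tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Envelope:
    """Run a tool and always return an envelope, even when the tool raises."""
    audit_id = str(uuid.uuid4())
    try:
        return Envelope(True, tool(*args, **kwargs), None, audit_id)
    except Exception as exc:
        # Failures become data, so the calling agent can branch on `ok`
        # instead of crashing mid-flow.
        return Envelope(False, None, f"{type(exc).__name__}: {exc}", audit_id)
```

Because every call yields an `audit_id`, failed attempts are as traceable as successful ones.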

15. Appendix A — Decision Matrices

15.1 Pattern selector

flowchart TD
    Start([New task to orchestrate])
    Q1{Mutates inventory /<br/>payment / fulfillment?}
    Q2{Strict step ordering?}
    Q3{Same op on many items?}
    Q4{Multiple distinct domains?}
    Out_Single[Single Threaded<br/>+ Temporal]
    Out_Pipe[Pipeline]
    Out_Fan[Fan-Out]
    Out_Spec[Specialist Team]
    Out_Fn[Function call<br/>— not an agent]

    Start --> Q1
    Q1 -- Yes --> Out_Single
    Q1 -- No --> Q2
    Q2 -- Yes · 1 domain --> Out_Pipe
    Q2 -- Yes · N domains --> Out_Spec
    Q2 -- No --> Q3
    Q3 -- Yes --> Out_Fan
    Q3 -- No --> Q4
    Q4 -- Yes --> Out_Spec
    Q4 -- No --> Out_Fn

    style Out_Single fill:#dc2626,stroke:#fff,color:#fff
    style Out_Pipe fill:#0b3d91,stroke:#fff,color:#fff
    style Out_Fan fill:#7c3aed,stroke:#fff,color:#fff
    style Out_Spec fill:#16a34a,stroke:#fff,color:#fff
    style Out_Fn fill:#6b7280,stroke:#fff,color:#fff
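
The decision tree above reads naturally as a function with early returns. The parameter names and return labels are hypothetical; the branch order mirrors the diagram exactly.

```python
def select_pattern(mutates_state: bool, strict_ordering: bool,
                   domains: int, same_op_many_items: bool) -> str:
    """Map a task's properties to an orchestration pattern."""
    if mutates_state:                 # inventory, payment, or fulfillment writes
        return "single_threaded_temporal"
    if strict_ordering:
        return "pipeline" if domains == 1 else "specialist_team"
    if same_op_many_items:
        return "fan_out"
    if domains > 1:
        return "specialist_team"
    return "function_call"            # not an agent at all
```

The write check comes first on purpose: a task that mutates state is single-threaded regardless of how parallel the rest of it looks.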

15.2 Framework selector

If…	Then
TS shop, chat-heavy UI	Mastra (or Vercel AI SDK if pure-chat)
Python shop, complex stateful flows	LangGraph + Temporal underneath
All-Claude builds	Anthropic Agent SDK directly
Need typed handoffs + voice	OpenAI Agents SDK
Money / inventory / fulfillment writes	Temporal under any framework

15.3 Model selector

Task Type	Model
Intent classification / routing	Claude Haiku 4.5
Standard tool-use loop	Claude Sonnet 4.6
Complex multi-step reasoning (disputes, fraud, escalation)	Claude Opus 4.7
Eval diversity / fallback	GPT-5.5 / Gemini 2.x
Image-heavy product workloads	Gemini 2.x

15.4 Vector DB selector

Catalog Size	Choice
≤ 1M SKUs	pgvector
1M–10M SKUs	Turbopuffer
10M+ SKUs · hard SLA	Pinecone

16. Appendix B — Glossary

Term	Definition
Agent	An LLM-driven process with a defined role, tools, and outputs
Orchestrator	The top-level agent that classifies intent, routes, audits, and aggregates
Specialist	A narrow agent owning a single domain (e.g., pricing, returns)
MCP (Model Context Protocol)	Anthropic-led open protocol for exposing tools to LLM agents
UCP (Universal Commerce Protocol)	Google-led protocol for discovery-through-purchase by agents
ACP (Agentic Commerce Protocol)	Stripe/OpenAI protocol for payment leg of agent transactions
Pipeline	Sequential agent pattern
Fan-Out	Commander + parallel-worker pattern
Specialist Team	Multiple specialists collaborating on a single deliverable
Dreaming	Scheduled background process that distills past sessions into memory
Outcomes	Rubric-based self-evaluation primitive
Audit ID	Unique identifier embedded in every mutating tool call for traceability
Compensation path	Documented reversal procedure for any mutating action
Dual-track eval	Offline regression + online A/B + 5% holdout, run simultaneously
Layer-cake guardrails	Eight independent enforcement layers from input sanitization to kill switch

17. Appendix C — Source Index

17.1 Foundational documents

Anthropic — Multi-Agent Research System (Jun 2025): https://www.anthropic.com/engineering/multi-agent-research-system
Anthropic — Code with Claude, Managed Agents announcement (May 6, 2026)
Cognition — Don't Build Multi-Agents (Jun 2025): https://cognition.ai/blog/dont-build-multi-agents
Cognition — Multi-Agents: What's Actually Working (2026): https://cognition.ai/blog/multi-agents-working
Shopify Engineering — Building Production-Ready Agentic Systems: https://shopify.engineering/building-production-ready-agentic-systems
Sierra — Constellation architecture: https://sierra.ai/about

Closing Note

This blueprint is QuanData & AI's commitment to building the agentic retail systems that the next decade of commerce will run on. It is anchored in what is shipping at Walmart, Amazon, Shopify, Mercado Libre, Sierra, and Decagon today — not what is being demoed at conferences.

Every pattern in this document is buildable as written. Every tool signature maps to a real API. Every guardrail threshold is concrete. Every phase reflects what a mid-size to enterprise retailer can actually staff.

The retailers that build this category in 2026 will pull away from the retailers that do not. QuanData & AI exists to make sure our clients are on the right side of that gap.

— Office of the Chief Architect QuanData & AI — May 2026
