QuanData & AI, Office of the Chief Architect

The Blueprint of QuanData & AI's Agentic Orchestration for the Retail Industry

Classification: Internal, Strategic & Technical
Document Type: Reference Architecture & Implementation Doctrine
Version: 1.0, May 2026
Owners: QuanData & AI, Office of the Chief Architect
Status: Authoritative; supersedes all prior agentic-retail technical notes


Cover Brief

This document defines, with executive clarity and engineering precision, how QuanData & AI will architect, deploy, and operate multi-agent orchestration systems for the retail industry through 2026 and beyond.

It is not a survey. It is a doctrine. Every pattern, every agent, every guardrail, every line of the build phasing has been selected because it is what the production-grade leaders of the agentic era — Anthropic, Walmart, Amazon, Shopify, Sierra, Decagon, Mercado Libre — are demonstrably running in market today.

The thesis is simple and the evidence is overwhelming:

A single agent is a single employee. A coordinated team of specialized agents is a new operational category. The retailers building this category in 2026 are pulling away from the retailers that are not.


Table of Contents

  1. Executive Synthesis
  2. Strategic Context — The Retail Agentic Landscape
  3. Theoretical Foundations — The Three Orchestration Patterns
  4. The QuanData Agent Topology
  5. The Specialist Agent Roster
  6. The Tool Catalog
  7. Reference Orchestration Flows
  8. State, Memory & The Dreaming Layer
  9. Outcomes — The Rubric-Driven Quality Loop
  10. Guardrail Architecture — Defense In Depth
  11. Integration & Infrastructure Stack
  12. Cost Economics
  13. The 12-Month Build Phasing
  14. Failure Mode Taxonomy
  15. Appendix A — Decision Matrices
  16. Appendix B — Glossary
  17. Appendix C — Source Index

1. Executive Synthesis

On May 6, 2026, at the Code with Claude event, Anthropic formalized multi-agent orchestration for Claude Managed Agents — up to 20 specialized agents running in parallel on a single task. This was not a research demo. It was the productization of an architecture Netflix, Harvey, Shopify, and Mercado Libre were already running at scale.

Across the retail industry, the verified outcomes are no longer ambiguous:

| Retailer | Agent | Verified Outcome |
| --- | --- | --- |
| Amazon | Rufus | ~$12B incremental sales (2025); 300M+ users; +60% purchase likelihood |
| Walmart | Sparky | +35% AOV for engaged users; ~50% of app users engaged |
| Klarna | OpenAI Assistant | 67% automation; $40M profit lift (2024); partial 2025 walk-back to human-hybrid |
| Mercado Libre | Pago Assistant | ~90% query containment without human handoff |
| Lowe's | Mylow Companion | Deployed to >1,700 stores for associate copilot |
| WeightWatchers (Sierra) | CS Agent | 70% containment week-1; >4.5/5 CSAT |

QuanData & AI's mandate is to deliver this category of outcome to our retail clients — through an architecture that is:

  • Specialized — narrow agents, not generalist bots
  • Parallel — fan-out where the workload allows
  • Audited — every action traceable to a principal, a tool, and a decision
  • Reversible — every mutation has a documented compensation path
  • Governed — human-in-loop gates on every blast-radius action
  • Self-improving — Dreaming layers that compound institutional knowledge across sessions
  • Outcome-graded — rubric-driven self-evaluation before any output reaches a customer

The blueprint that follows is the canonical implementation of that mandate.


2. Strategic Context — The Retail Agentic Landscape

2.1 The architectural verdict from the market

Two architectural facts now dominate the retail agentic landscape:

  1. Consolidation beats sprawl. Walmart publicly admitted in mid-2025 that fragmented point-bots did not scale and consolidated to four "super agents" (Sparky for customer, Marty for sellers/advertisers/suppliers, an associate agent, and a developer agent) on a unified ML platform (Element) with MCP as the agent-to-agent protocol.

  2. Retailers must own the MCP layer. Shopify exposes three official MCP servers (Storefront, Customer Accounts, Dev), and Adobe declared MCP the default agent protocol at Summit 2026. Walmart's experience shows why ownership matters: when it briefly piloted OpenAI's ChatGPT Instant Checkout, the flow converted 3× worse than click-out to walmart.com, and its Sparky-in-Gemini integration is deliberately walled off.

2.2 The competitive map

quadrantChart
    title Retail Agent Maturity — May 2026
    x-axis "Narrow Use Case" --> "Full Platform Strategy"
    y-axis "Pilot / Internal" --> "Verified Production Outcomes"
    quadrant-1 "Production Platform"
    quadrant-2 "Production Point-Solution"
    quadrant-3 "Pilot Point-Solution"
    quadrant-4 "Pilot Platform"
    Amazon Rufus: [0.85, 0.95]
    Walmart Sparky+Element: [0.92, 0.88]
    Shopify Sidekick/Magic: [0.88, 0.82]
    Mercado Libre Pago: [0.45, 0.90]
    Klarna OpenAI: [0.30, 0.55]
    Lowe's Mylow: [0.40, 0.78]
    Home Depot Magic Apron: [0.35, 0.65]
    Target Store Companion: [0.25, 0.55]
    Sephora Virtual Artist: [0.30, 0.60]
    Best Buy Agentic: [0.25, 0.45]
    Costco ML Forecast: [0.30, 0.85]
    IKEA Billie: [0.20, 0.35]

2.3 The vendor stratification

| Tier | Vendors | Posture |
| --- | --- | --- |
| Tier 1: Platform Owners | Anthropic, OpenAI, Google, Microsoft | Define the substrate (LLMs, MCP, ACP, UCP) |
| Tier 1: Hyperscale Retailers | Amazon, Walmart, Shopify, Alibaba | Build proprietary super-agents on top of substrate |
| Tier 2: Vertical Agent Vendors | Sierra, Decagon, Ada, Salesforce Agentforce, Microsoft Copilot for Retail | Premium / white-glove agent platforms |
| Tier 2: Commerce-Native Vendors | Mirakl, Klaviyo, Algolia, Constructor, Lily AI | Expose retail primitives as MCP-callable tools |
| Tier 3: Adjacent Infrastructure | Riskified, Signifyd, Transcend, OneTrust | Fraud, privacy, governance; increasingly agent-aware |

2.4 The three lessons QuanData & AI has internalized

flowchart LR
    A[Klarna 2025 Walk-Back] --> L1[Lesson 1<br/>Deflection without CSAT is a trap]
    B[Walmart Checkout 3× Drop] --> L2[Lesson 2<br/>Never surrender the funnel to a 3rd-party agent runtime]
    C[Walmart Element Consolidation] --> L3[Lesson 3<br/>Orchestrate from day one — point-bots do not scale]
    L1 --> R[QuanData Doctrine]
    L2 --> R
    L3 --> R
    R -->|enforces| D1[Every metric pairs with CSAT/quality]
    R -->|enforces| D2[QuanData owns the MCP layer]
    R -->|enforces| D3[Orchestrator + Specialist topology]
    style R fill:#0b3d91,stroke:#fff,color:#fff
    style D1 fill:#1f6feb,stroke:#fff,color:#fff
    style D2 fill:#1f6feb,stroke:#fff,color:#fff
    style D3 fill:#1f6feb,stroke:#fff,color:#fff

2.5 Regulatory perimeter

| Regulation | Surface | QuanData Posture |
| --- | --- | --- |
| EU AI Act | Discovery & personalization = minimal-risk; behavioral manipulation = prohibited | Build minimal-risk baseline; require Article-aligned documentation for any inferred-attribute pricing |
| EU DG COMP | Active algorithmic pricing inquiries (July 2025); OECD report October 2025 | Surveillance-pricing prohibited by default; only category-level + cohort-level pricing permitted |
| FTC | Surveillance pricing scrutiny; deceptive-practices doctrine | Never price-discriminate by inferred income/race/health |
| GDPR / CCPA | Profile-based personalization requires lawful basis | Transcend Agentic Assist for DSARs; consent ledger consulted before PSA/MCA invocation |
| PCI-DSS | Card data scope explosion via agent-initiated payment | LLMs never see PAN/CVV; Stripe tokenization at the edge; agent only handles pm_xxx tokens |
| Agent-vs-Agent Fraud | Bot-driven returns, promo abuse, reseller arbitrage | Riskified + HUMAN-pattern defenses in production by Month 9 |
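
The PCI-DSS row above keeps PAN/CVV out of model context. As an illustration of gateway-side redaction that could support this, here is a minimal TypeScript sketch that masks Luhn-valid card numbers in free text before it reaches an agent. The function names, the regular expression, and the `[REDACTED_PAN]` marker are assumptions made for illustration; a heuristic like this complements tokenization at the edge and does not replace it.

```typescript
// Luhn checksum: doubles every second digit from the right and sums the digits.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Replace 13-19 digit runs (optionally separated by spaces or hyphens)
// that pass the Luhn check. Other digit runs are left untouched.
function redactPans(text: string): string {
  return text.replace(/\b(?:\d[ -]?){12,18}\d\b/g, (match) => {
    const digits = match.replace(/[ -]/g, "");
    return passesLuhn(digits) ? "[REDACTED_PAN]" : match;
  });
}

// 4242 4242 4242 4242 is a widely used Luhn-valid test number.
console.log(redactPans("my card is 4242 4242 4242 4242, order 1001"));
```

Running the Luhn check before masking avoids redacting long order or tracking numbers that merely look like card numbers.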

3. Theoretical Foundations — The Three Orchestration Patterns

Three orchestration patterns have emerged as the production-validated set across Anthropic's Research system, Walmart's Element, Shopify's agentic platform, Harvey's legal stack, and Sierra's customer constellation. QuanData & AI uses all three, deployed by use-case fit, never as defaults.

3.1 Pattern A — The Pipeline

Agents run sequentially. Each step takes the prior step's structured output as input. Used when later steps strictly depend on earlier ones.

flowchart LR
    U[User / Trigger] --> A1[Agent 1<br/>Research]
    A1 -->|structured output| A2[Agent 2<br/>Analysis]
    A2 -->|structured output| A3[Agent 3<br/>Writing]
    A3 -->|structured output| A4[Agent 4<br/>Review]
    A4 --> O[Final Output]
    style A1 fill:#0b3d91,stroke:#fff,color:#fff
    style A2 fill:#0b3d91,stroke:#fff,color:#fff
    style A3 fill:#0b3d91,stroke:#fff,color:#fff
    style A4 fill:#0b3d91,stroke:#fff,color:#fff

Retail applications: order-status inquiry → policy lookup → reply composition; product-launch brief → copy → compliance → publish.
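
A minimal TypeScript sketch of the pipeline pattern, assuming each step is a function from one structured payload to the next. The `OrderStatusCtx` shape and the three step functions are hypothetical stand-ins that mirror the order-status example; in a real deployment each step would call an agent rather than a local function.

```typescript
// Hypothetical structured payload handed from step to step.
interface OrderStatusCtx {
  orderId: string;
  status?: string;
  policy?: string;
  reply?: string;
}

type Step<T> = (input: T) => T;

// Run steps strictly in order; each step receives the previous step's output.
function runPipeline<T>(input: T, steps: Step<T>[]): T {
  return steps.reduce<T>((acc, step) => step(acc), input);
}

// Stand-ins for agent calls (assumptions, with made-up values).
const lookupStatus: Step<OrderStatusCtx> = (c) => ({ ...c, status: "out_for_delivery" });
const lookupPolicyStep: Step<OrderStatusCtx> = (c) => ({ ...c, policy: "30-day returns" });
const composeReply: Step<OrderStatusCtx> = (c) => ({
  ...c,
  reply: "Order " + c.orderId + " is " + c.status + ". Policy: " + c.policy + ".",
});

const pipelineResult = runPipeline<OrderStatusCtx>({ orderId: "A-100" }, [
  lookupStatus,
  lookupPolicyStep,
  composeReply,
]);
console.log(pipelineResult.reply);
```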

3.2 Pattern B — The Fan-Out (Commander / Workers)

A commander agent decomposes a task and dispatches subtasks to parallel workers, each in an isolated context window. Results are aggregated by the commander. This is the pattern Netflix uses for parallel build-log analysis and Anthropic uses in its Research feature.

flowchart TB
    C[Commander Agent<br/>decomposes + synthesizes] --> W1[Worker 1<br/>document A]
    C --> W2[Worker 2<br/>document B]
    C --> W3[Worker 3<br/>document C]
    C --> W4[Worker 4<br/>document D]
    C --> W5[Worker 5<br/>document E]
    W1 --> S[Synthesis Layer]
    W2 --> S
    W3 --> S
    W4 --> S
    W5 --> S
    S --> O[Aggregated Output]
    style C fill:#0b3d91,stroke:#fff,color:#fff
    style S fill:#7c3aed,stroke:#fff,color:#fff

Retail applications: parallel competitor-price scraping across N retailers; parallel SKU-level demand forecasting for an entire category; parallel campaign-copy generation across 12 locales.

Critical engineering constraint (Anthropic, "How we built our multi-agent research system", June 2025): in Anthropic's BrowseComp evaluation, token usage by itself explained about 80% of performance variance. The real value of fan-out is therefore context compression, not just speed: each worker operates in its own context window and compresses its findings before returning them to the commander.
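
The commander/worker shape can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions: `WorkerFn`, `compress`, and `fanOut` are hypothetical names, the worker is a local stub rather than a sub-agent, and truncation stands in for whatever summarization a real worker would perform.

```typescript
// A worker processes one item in its own context and returns its findings.
type WorkerFn = (item: string) => Promise<string>;

// Crude stand-in for compression (assumption): cap the findings at maxChars.
function compress(findings: string, maxChars: number): string {
  return findings.length <= maxChars ? findings : findings.slice(0, maxChars);
}

// Commander: dispatch all items in parallel, then keep only compressed results.
// Promise.all preserves input order, so results line up with items.
async function fanOut(items: string[], worker: WorkerFn, maxChars: number): Promise<string[]> {
  const raw = await Promise.all(items.map((item) => worker(item)));
  return raw.map((r) => compress(r, maxChars));
}

// Stub worker (assumption): a real one would call a sub-agent.
const priceWorker: WorkerFn = async (retailer) =>
  retailer + ": lowest price found, plus a long trail of intermediate notes";

const fanOutDone: Promise<string[]> = fanOut(["retailer-a", "retailer-b"], priceWorker, 20);
fanOutDone.then((summaries) => console.log(summaries));
```

The commander only ever sees the capped summaries, which is the point of the pattern: the intermediate notes stay in the worker's context and never reach the commander's.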

3.3 Pattern C — The Specialist Team

Multiple agents with distinct specializations collaborate on a single complex deliverable. Each owns a domain. The orchestrator routes, the specialists execute, results are merged. This is the pattern Harvey uses for legal work and Sierra deploys with its "constellation of 15+ models."

flowchart TB
    O[Orchestrator] --> CS[Customer Service Agent]
    O --> PS[Personal Shopper Agent]
    O --> IR[Inventory Agent]
    O --> PP[Pricing Agent]
    O --> MC[Marketing Content Agent]
    O --> SC[Supply Chain Agent]
    O --> AI[Analytics Agent]
    O --> SA[Store Associate Copilot]
    O --> FR[Fraud + Risk Agent]
    CS -.shared state.-> M[(Conversation /<br/>Memory Layer)]
    PS -.shared state.-> M
    IR -.shared state.-> M
    PP -.shared state.-> M
    MC -.shared state.-> M
    SC -.shared state.-> M
    AI -.shared state.-> M
    SA -.shared state.-> M
    FR -.shared state.-> M
    style O fill:#7c3aed,stroke:#fff,color:#fff
    style M fill:#0b3d91,stroke:#fff,color:#fff

Retail applications: the QuanData reference roster is built on this pattern (see §4–§5).

3.4 The pattern-selection decision tree

flowchart TD
    Q1{Does the task have<br/>strict step ordering?}
    Q1 -- Yes --> Q2{Is it a single domain?}
    Q1 -- No --> Q3{Same operation on<br/>many items?}
    Q2 -- Yes --> P_Pipe[Pipeline<br/>e.g. order-status → policy → reply]
    Q2 -- No --> P_Spec1[Specialist Team<br/>w/ ordered handoffs]
    Q3 -- Yes --> P_Fan[Fan-Out<br/>e.g. SKU-level forecasts]
    Q3 -- No --> Q4{Multiple distinct<br/>domains needed?}
    Q4 -- Yes --> P_Spec2[Specialist Team]
    Q4 -- No --> P_Single[Single Agent<br/>or function call]
    style P_Pipe fill:#0b3d91,stroke:#fff,color:#fff
    style P_Fan fill:#7c3aed,stroke:#fff,color:#fff
    style P_Spec1 fill:#16a34a,stroke:#fff,color:#fff
    style P_Spec2 fill:#16a34a,stroke:#fff,color:#fff
    style P_Single fill:#6b7280,stroke:#fff,color:#fff
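
The decision tree above translates directly into a small function. A minimal TypeScript sketch, in which the `TaskTraits` field names are hypothetical labels for questions Q1 to Q4:

```typescript
type Pattern = "pipeline" | "fan-out" | "specialist-team" | "single-agent";

interface TaskTraits {
  strictOrdering: boolean;    // Q1: does the task have strict step ordering?
  singleDomain: boolean;      // Q2: is it a single domain?
  sameOpOnManyItems: boolean; // Q3: same operation on many items?
  multipleDomains: boolean;   // Q4: multiple distinct domains needed?
}

function selectPattern(t: TaskTraits): Pattern {
  if (t.strictOrdering) {
    // Ordered work: one domain is a pipeline, several domains need ordered handoffs.
    return t.singleDomain ? "pipeline" : "specialist-team";
  }
  if (t.sameOpOnManyItems) return "fan-out";
  return t.multipleDomains ? "specialist-team" : "single-agent";
}

// SKU-level forecasts: no ordering, same operation on many items.
console.log(
  selectPattern({
    strictOrdering: false,
    singleDomain: true,
    sameOpOnManyItems: true,
    multipleDomains: false,
  })
); // "fan-out"
```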

3.5 The Cognition Theorem — "Multi-agent reads, single-threaded writes"

The most important production lesson of the last twelve months is from Cognition's Don't Build Multi-Agents (June 2025) and the 2026 follow-up Multi-Agents: What's Actually Working:

Parallel multi-agent execution is correct for reads and intelligence work. Single-threaded execution is correct for writes.

QuanData applies this as an inviolable rule: any agent that mutates inventory, payment, fulfillment, or pricing executes in a single-threaded, durable workflow (Temporal / Step Functions). Read paths fan out; write paths serialize.
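
The rule can be illustrated with a small TypeScript sketch: reads run in parallel, while writes pass through a queue that starts each one only after the previous one has settled. The `WriteQueue` class and the stub functions are hypothetical. An in-process promise chain stands in for the durable workflow engine named above, so the sketch shows the ordering guarantee only, not durability or retries.

```typescript
// Serializes writes: each write starts only after the previous one settles.
class WriteQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(write: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => write());
    // Keep the chain alive even if a write fails, so later writes still run.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

const writeLog: string[] = [];

// Stub mutation (assumption): yields mid-write so any interleaving would show in the log.
function fakeWrite(label: string): () => Promise<void> {
  return async () => {
    writeLog.push("start " + label);
    await Promise.resolve();
    writeLog.push("end " + label);
  };
}

async function handleTurn(): Promise<string[]> {
  const queue = new WriteQueue();
  // Read path: fan out in parallel.
  const stock = await Promise.all(["sku-1", "sku-2"].map(async (sku) => sku + ":in_stock"));
  // Write path: submitted together, executed one at a time.
  await Promise.all([
    queue.enqueue(fakeWrite("refund")),
    queue.enqueue(fakeWrite("price_update")),
  ]);
  return stock;
}

const turnDone = handleTurn();
turnDone.then(() => console.log(writeLog.join(" | ")));
```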


4. The QuanData Agent Topology

4.1 System-level topology

flowchart TB
    subgraph CH[Channel Layer]
        WEB[Web / Storefront]
        APP[Mobile App]
        POS[POS / In-Store]
        SLK[Slack / Internal]
        EML[Email / SMS]
    end

    subgraph OL[Orchestration Layer]
        GW[API Gateway<br/>auth + rate-limit + redaction]
        OR[Orchestrator Agent<br/>intent · routing · audit · approval]
    end

    subgraph SL[Specialist Layer]
        CSA[Customer Service Agent]
        PSA[Personal Shopper Agent]
        IRA[Inventory + Replenishment Agent]
        PPA[Pricing + Promotions Agent]
        MCA[Marketing Content Agent]
        SCLA[Supply Chain + Logistics Agent]
        AIA[Analytics + Insights Agent]
        SAC[Store Associate Copilot]
        FRA[Fraud + Risk Agent]
    end

    subgraph TL[Tool Layer — MCP]
        T_SHOP[Shopify MCP]
        T_STRIPE[Stripe MCP]
        T_KLAV[Klaviyo MCP]
        T_ALG[Algolia MCP]
        T_INT[Internal MCP<br/>ERP · WMS · CDP · POS]
    end

    subgraph DL[Data Layer]
        OMS[(OMS / Commerce<br/>Shopify · SAP CC)]
        ERP[(ERP / WMS<br/>SAP · NetSuite)]
        CDP[(CDP<br/>RudderStack · Segment)]
        VEC[(Vector Store<br/>pgvector · Turbopuffer)]
        WH[(Warehouse<br/>Snowflake · BigQuery)]
        AUD[(Audit Log<br/>append-only)]
        DRM[(Dreaming Store<br/>distilled memory)]
    end

    CH --> GW
    GW --> OR
    OR --> CSA
    OR --> PSA
    OR --> IRA
    OR --> PPA
    OR --> MCA
    OR --> SCLA
    OR --> AIA
    OR --> SAC
    OR --> FRA

    CSA & PSA & IRA & PPA & MCA & SCLA & AIA & SAC & FRA --> T_SHOP
    CSA & PSA & IRA & PPA & MCA & SCLA & AIA & SAC & FRA --> T_STRIPE
    MCA & PSA --> T_KLAV
    PSA & SAC --> T_ALG
    CSA & IRA & PPA & MCA & SCLA & SAC & FRA --> T_INT

    T_SHOP --> OMS
    T_STRIPE --> OMS
    T_KLAV --> CDP
    T_ALG --> VEC
    T_INT --> ERP
    T_INT --> CDP
    T_INT --> WH

    OR -.audit.-> AUD
    OR -.nightly distill.-> DRM
    DRM -.context inject.-> OR

    style OR fill:#7c3aed,stroke:#fff,color:#fff
    style GW fill:#0b3d91,stroke:#fff,color:#fff
    style DRM fill:#dc2626,stroke:#fff,color:#fff
    style AUD fill:#0b3d91,stroke:#fff,color:#fff

4.2 The control plane vs the data plane

| Plane | What it carries | Where it lives |
| --- | --- | --- |
| Control | Intent, routing, approval status, audit IDs, agent identity | Orchestrator + Redis + Audit log |
| Data (Read) | Catalog, inventory, customer profile, order history, embeddings | OMS / CDP / Vector store / Warehouse |
| Data (Write) | New orders, refunds, returns, POs, price changes | OMS / ERP, always via durable workflow |
| Dreaming | Distilled summaries, learned merchant preferences, post-mortem patterns | Dreaming Store (Postgres + embeddings) |

5. The Specialist Agent Roster

QuanData's reference deployment runs one orchestrator plus nine specialists. The ninth specialist, Fraud & Risk, is optional and is the tenth agent overall.

5.1 The roster at a glance

flowchart LR
    subgraph OR_BOX[Top-Level]
        ORCH[Orchestrator / Router]
    end
    subgraph CF[Customer-Facing]
        CSA[Customer Service]
        PSA[Personal Shopper]
        SAC[Store Associate Copilot]
    end
    subgraph MF[Merchant / Operator-Facing]
        IRA[Inventory + Replenishment]
        PPA[Pricing + Promotions]
        MCA[Marketing Content]
        SCLA[Supply Chain + Logistics]
        AIA[Analytics + Insights]
    end
    subgraph GR[Governance]
        FRA[Fraud + Risk]
    end
    ORCH --> CF
    ORCH --> MF
    ORCH --> GR
    style ORCH fill:#7c3aed,stroke:#fff,color:#fff
    style CF fill:#1f6feb,stroke:#fff,color:#fff
    style MF fill:#16a34a,stroke:#fff,color:#fff
    style GR fill:#dc2626,stroke:#fff,color:#fff

5.2 The capability matrix

| Agent | Primary Surface | Read Sources | Write Targets | Latency SLO p95 | Pattern |
| --- | --- | --- | --- | --- | --- |
| Orchestrator | All | All (meta) | Audit, Conversation State | 200ms (routing only) | Specialist Team |
| Customer Service | Web/App/Email/SMS | OMS, WMS, Returns, Policy KB | Returns, Refunds (gated), Tickets | 3s | Pipeline + Specialist |
| Personal Shopper | Web/App/Chat | Catalog, Embeddings, CDP, Inventory | Impressions log | 2s | Specialist + Fan-Out (multi-locale) |
| Inventory & Replenishment | Slack/Web (merchant) | ERP, WMS, Sales | Draft POs (gated), Replenishment signals | 10s (interactive) / batch | Fan-Out (per SKU) |
| Pricing & Promotions | Slack/Web (merchant) | Pricing rules, Competitor feed, Elasticity model | Price proposals, Promo drafts (gated) | 10s / batch | Specialist |
| Marketing Content | Slack/Web (merchant) | PIM, Brand voice, Past performance | Content drafts (gated), Campaign push | 30s (drafting) | Fan-Out (variants) |
| Supply Chain & Logistics | Slack/Web/App | TMS, Carrier APIs, Weather | Transfer orders, Claims, ETA updates | 5s | Pipeline |
| Analytics & Insights | Slack/Web (merchant) | Warehouse, Semantic layer | Saved queries, Alert rules | 5s (query) | Specialist |
| Store Associate Copilot | POS/Mobile | Store inventory, CDP | Reservations, Save-the-sale orders, Notes | 2s (hard SLA) | Pipeline |
| Fraud & Risk | Internal | Order risk feed, Return history | Account flags, Verification requests | 3s | Specialist |

5.3 Per-agent specifications (canonical definitions)

5.3.1 Orchestrator / Router

| Field | Specification |
| --- | --- |
| Scope | Single entry point. Classifies intent, decomposes multi-step tasks, dispatches to specialists, aggregates results, enforces guardrails. |
| Inputs | Raw user message, channel metadata, authenticated principal, conversation history pointer |
| Outputs | Final response + structured trace (run_id, sub-agent calls, tool calls, confidence, cost) |
| Tools | classify_intent, route_to_agent, request_human_approval, log_audit_event, get_conversation_state, write_conversation_state |
| Reads | Session store, customer profile snippet |
| Writes | Audit log, conversation state |
| Invoke | Always first; re-invoked at every turn; bypassed only by the Store Associate Copilot read path (§5.3.9), which audits post-hoc |
| Model | Claude Haiku 4.5 (intent) → Sonnet 4.6 (synthesis) |

5.3.2 Customer Service Agent (CSA)

| Field | Specification |
| --- | --- |
| Scope | Order status, returns/refunds, exchanges, shipping issues, complaints, policy Q&A |
| Tools | get_order, get_shipment_tracking, create_return, issue_refund, create_replacement_order, lookup_policy, escalate_to_human |
| Guardrails | Refund auto-approve ≤ $250 AND LTV decile ≥ 3 AND no abuse flags; otherwise → approval queue |
| KPIs | Deflection rate, first-contact resolution, refund accuracy, CSAT (paired) |
| Model | Sonnet 4.6 (Opus 4.7 for complex disputes) |
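
The refund guardrail above is a plain conjunction of three checks, which a minimal TypeScript sketch makes explicit. The `RefundRequest` field names and the decision labels are hypothetical; the thresholds are the ones stated in the Guardrails row.

```typescript
interface RefundRequest {
  amountUsd: number;
  ltvDecile: number;    // customer lifetime-value decile
  abuseFlags: string[]; // empty when the account has no abuse flags
}

type RefundDecision = "auto_approve" | "approval_queue";

// Auto-approve only when all three conditions hold; everything else is queued for a human.
function decideRefund(r: RefundRequest): RefundDecision {
  const withinAmount = r.amountUsd <= 250;
  const trustedCustomer = r.ltvDecile >= 3;
  const clean = r.abuseFlags.length === 0;
  return withinAmount && trustedCustomer && clean ? "auto_approve" : "approval_queue";
}

// The $79 refund from Flow A, with an illustrative decile and no flags.
console.log(decideRefund({ amountUsd: 79, ltvDecile: 5, abuseFlags: [] })); // "auto_approve"
```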

5.3.3 Personal Shopper Agent (PSA)

| Field | Specification |
| --- | --- |
| Scope | Product discovery, outfit/bundle building, gift assistant, post-purchase upsell |
| Tools | search_catalog, vector_search_products, get_customer_preferences, get_purchase_history, check_stock, apply_personalization_model, build_bundle |
| Guardrails | Consent-checked per locale (GDPR/CCPA); never recommends out-of-stock; respects MAP/brand floors |
| KPIs | Recall@10, NDCG@10, attach rate (A/B), novelty, diversity |
| Model | Sonnet 4.6 + internal ranker (SageMaker/Vertex) |
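
For readers less familiar with the ranking KPIs above, here is a minimal TypeScript sketch of Recall@k and NDCG@k using the common log2 discount. The function names are illustrative. One simplifying assumption: the ideal ordering is computed from the graded list itself, so it presumes every relevant item appears somewhere in that list.

```typescript
// DCG@k: sum of rel_i / log2(i + 1) over ranks i = 1..k.
function dcgAtK(relevances: number[], k: number): number {
  let dcg = 0;
  for (let i = 0; i < Math.min(k, relevances.length); i++) {
    dcg += relevances[i] / (Math.log(i + 2) / Math.LN2);
  }
  return dcg;
}

// NDCG@k: DCG of the actual ranking divided by DCG of the ideal (sorted) ranking.
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = relevances.slice().sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(relevances, k) / idcg;
}

// Recall@k: share of the relevant items that appear in the top k results.
function recallAtK(rankedIds: string[], relevantIds: string[], k: number): number {
  if (relevantIds.length === 0) return 0;
  const topK = rankedIds.slice(0, k);
  const hits = relevantIds.filter((id) => topK.indexOf(id) !== -1).length;
  return hits / relevantIds.length;
}

// Graded relevance of the returned items, in ranked order (illustrative values).
console.log(ndcgAtK([3, 2, 0, 1], 10));
console.log(recallAtK(["sku-a", "sku-b", "sku-c"], ["sku-a", "sku-z"], 2)); // 0.5
```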

5.3.4 Inventory & Replenishment Agent (IRA)

| Field | Specification |
| --- | --- |
| Scope | Stock visibility, reorder point math, PO drafting, allocation |
| Tools | check_stock, get_sales_velocity, forecast_demand, compute_reorder_point, draft_purchase_order, submit_po_for_approval, list_open_pos |
| Guardrails | Never writes on-hand counts; PO > $25k auto → human; reliability-scored suppliers only for auto-PO |
| KPIs | Forecast MAPE & WAPE, stockout rate, weeks-of-supply variance, PO acceptance rate |
| Pattern | Fan-Out (per-SKU forecasts in parallel) |
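
The two forecast-error KPIs above weight errors differently, which a short TypeScript sketch makes concrete. The function names and sample numbers are illustrative, and both functions assume `actual` and `forecast` have the same length. MAPE averages per-period percentage errors, so low-volume periods can dominate it; WAPE divides total absolute error by total volume.

```typescript
// MAPE: mean of |actual - forecast| / |actual|, skipping periods with zero actuals
// (the ratio is undefined there; skipping is a choice made for this sketch).
function mape(actual: number[], forecast: number[]): number {
  let total = 0;
  let n = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === 0) continue;
    total += Math.abs(actual[i] - forecast[i]) / Math.abs(actual[i]);
    n++;
  }
  return n === 0 ? 0 : total / n;
}

// WAPE: sum of absolute errors divided by sum of absolute actuals.
function wape(actual: number[], forecast: number[]): number {
  let absError = 0;
  let absActual = 0;
  for (let i = 0; i < actual.length; i++) {
    absError += Math.abs(actual[i] - forecast[i]);
    absActual += Math.abs(actual[i]);
  }
  return absActual === 0 ? 0 : absError / absActual;
}

// One high-volume and one low-volume period (made-up units).
const actualUnits = [100, 2];
const forecastUnits = [110, 4];
console.log(mape(actualUnits, forecastUnits), wape(actualUnits, forecastUnits));
```

In this example the two-unit miss on the low-volume period pushes MAPE far above WAPE, which is why the two are worth reporting together.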

5.3.5 Pricing & Promotions Agent (PPA)

| Field | Specification |
| --- | --- |
| Scope | Price recommendations, markdown cadence, promo design, competitive repricing |
| Tools | get_price_history, get_competitor_prices, simulate_price_change, propose_promotion, update_price, create_promo_code, schedule_promo |
| Guardrails | Δ > ±10% → human; below MAP → human; margin < 8% → human; promo > 30% → category manager; > 50% → director |
| Compliance | Surveillance-pricing prohibited; only category/cohort segmentation; EU/G7 algorithmic-pricing logging |
| KPIs | Margin lift, sell-through vs target, elasticity-model R², merchant approval rate |
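
The pricing guardrails above can be expressed as two small checks. A minimal TypeScript sketch follows; the type and function names are hypothetical, margin is assumed to mean gross margin as a share of the new price, and the `"merchant"` tier for promos at or below 30% is an assumption, since the table only names the approvers above that level.

```typescript
interface PriceChange {
  currentPrice: number;
  newPrice: number;
  unitCost: number;
  mapPrice?: number; // minimum advertised price, when the brand sets one
}

// Returns the reasons a human must review the change; an empty list means none apply.
function priceChangeReviewReasons(p: PriceChange): string[] {
  const reasons: string[] = [];
  const deltaPct = (Math.abs(p.newPrice - p.currentPrice) / p.currentPrice) * 100;
  const marginPct = ((p.newPrice - p.unitCost) / p.newPrice) * 100;
  if (deltaPct > 10) reasons.push("delta_over_10pct");
  if (p.mapPrice !== undefined && p.newPrice < p.mapPrice) reasons.push("below_map");
  if (marginPct < 8) reasons.push("margin_under_8pct");
  return reasons;
}

type PromoApprover = "merchant" | "category_manager" | "director";

// Promo depth decides who signs off.
function promoApprover(discountPct: number): PromoApprover {
  if (discountPct > 50) return "director";
  if (discountPct > 30) return "category_manager";
  return "merchant";
}

console.log(priceChangeReviewReasons({ currentPrice: 100, newPrice: 85, unitCost: 60 }));
console.log(promoApprover(35)); // "category_manager"
```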

5.3.6 Marketing Content Agent (MCA)

| Field | Specification |
| --- | --- |
| Scope | Product descriptions, ad copy, email, SMS, push, landing-page hero (multilingual) |
| Tools | get_product_attributes, get_brand_voice, generate_copy, generate_image_prompt, check_compliance, submit_for_review, push_to_klaviyo, push_to_meta_ads, push_to_google_ads |
| Guardrails | Brand-voice score < 0.85 → forced review; compliance auto-checked (CAN-SPAM, GDPR, FTC claims) |
| KPIs | Brand-voice score, compliance pass rate, win-rate vs human in blind test, CTR uplift in live A/B |
| Pattern | Fan-Out (variants × channels × locales) |

5.3.7 Supply Chain & Logistics Agent (SCLA)

| Field | Specification |
| --- | --- |
| Scope | ETA prediction, carrier selection, exception handling, transfers, last-mile |
| Tools | get_shipment_tracking, predict_eta, select_carrier, create_transfer_order, file_carrier_claim, notify_customer_delay |
| KPIs | ETA MAE, on-time delivery, claim recovery rate |

5.3.8 Analytics & Insights Agent (AIA)

| Field | Specification |
| --- | --- |
| Scope | Conversational BI for merchants. NL → SQL → chart + narrative |
| Tools | nl_to_sql, run_query, get_anomaly_alerts, compute_attribution, generate_chart, save_dashboard |
| Guardrails | RLS-enforced queries; hallucinated-column rate target = 0; warehouse cost caps |
| KPIs | Execution accuracy, semantic equivalence, hallucinated-column rate |

5.3.9 Store Associate Copilot (SAC)

| Field | Specification |
| --- | --- |
| Scope | Mobile/POS companion: find-in-store, clienteling, endless aisle, in-store returns, task list |
| Tools | check_stock_nearby, reserve_item, lookup_customer, create_save_the_sale_order, get_clienteling_brief, print_label, get_task_list |
| Guardrails | Hard 2s p95 SLA; bypasses orchestrator for read path, audits post-hoc |
| KPIs | Time-to-answer, save-the-sale conversion, associate CSAT |

5.3.10 Fraud & Risk Agent (FRA) — optional 10th

| Field | Specification |
| --- | --- |
| Scope | Order risk scoring, return-abuse detection, promo-abuse detection, chargeback assist |
| Tools | score_order_risk, get_return_abuse_history, freeze_account, request_id_verification |
| KPIs | AUC, FPR @ recall = 0.9, chargeback $ saved, agent-vs-agent fraud caught |
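
The "FPR @ recall = 0.9" KPI above can be computed by lowering the score threshold until the target recall is reached and then reading off the false-positive rate. A minimal TypeScript sketch, with hypothetical names and made-up sample data; it treats each example as its own threshold step and does not handle tied scores specially.

```typescript
interface ScoredOrder {
  score: number;    // model risk score, higher means riskier
  isFraud: boolean; // ground-truth label
}

// Walk examples from highest score down; stop once recall reaches the target
// and report the share of legitimate orders flagged at that point.
function fprAtRecall(examples: ScoredOrder[], targetRecall: number): number {
  const sorted = examples.slice().sort((a, b) => b.score - a.score);
  const positives = sorted.filter((e) => e.isFraud).length;
  const negatives = sorted.length - positives;
  if (positives === 0 || negatives === 0) throw new Error("need both classes");
  let tp = 0;
  let fp = 0;
  for (const e of sorted) {
    if (e.isFraud) tp++;
    else fp++;
    if (tp / positives >= targetRecall) return fp / negatives;
  }
  return 1;
}

// Illustrative labels only.
const sampleOrders: ScoredOrder[] = [
  { score: 0.9, isFraud: true },
  { score: 0.8, isFraud: false },
  { score: 0.7, isFraud: true },
  { score: 0.2, isFraud: false },
  { score: 0.1, isFraud: false },
];
console.log(fprAtRecall(sampleOrders, 0.9));
```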

6. The Tool Catalog

All tools return a structured envelope: {ok, data, error, audit_id}. Mutating tools embed the audit_id in the downstream system of record.

// ===== Orchestrator =====
classify_intent(text: string, context: SessionCtx): {intent: Intent, confidence: number}
route_to_agent(agent: AgentName, payload: object): AgentResult
request_human_approval(action: PendingAction, sla_minutes: number): ApprovalTicket
log_audit_event(event: AuditEvent): void
get_conversation_state(session_id: string): SessionState
write_conversation_state(session_id: string, patch: Partial<SessionState>): void

// ===== Customer Service =====
get_order(order_id: string): Order                                          // Shopify / SAP CC / OMS
get_shipment_tracking(shipment_id: string): TrackingEvents                  // Shippo / EasyPost
create_return(order_id: string, lines: ReturnLine[], reason: string): RMA   // Loop / Narvar / OMS
issue_refund(order_id: string, amount: Money, reason: string): RefundResult // Stripe / Adyen
create_replacement_order(original_order_id: string, lines: Line[]): Order
lookup_policy(topic: string, locale: string): PolicyExcerpt                 // Internal KB (vector)
escalate_to_human(ticket: TicketDraft): TicketId                            // Zendesk / Gorgias

// ===== Personal Shopper =====
search_catalog(query: string, filters: CatalogFilter): SKU[]                // Algolia / Constructor
vector_search_products(embedding: number[] | string, k: number, filters?: CatalogFilter): SKU[]
get_customer_preferences(customer_id: string): Preferences                  // RudderStack / Segment
get_purchase_history(customer_id: string, limit?: number): Order[]
check_stock(sku: string, location_id?: string): StockLevel                  // WMS / ERP
apply_personalization_model(customer_id: string, candidates: string[]): ScoredSKU[]
build_bundle(seed_sku: string, slots: BundleSlot[]): Bundle

// ===== Inventory & Replenishment =====
get_sales_velocity(sku: string, window_days: number, location_id?: string): VelocityStats
forecast_demand(sku: string, horizon_days: number, location_id?: string): Forecast
compute_reorder_point(sku: string, service_level: number): ReorderPoint
draft_purchase_order(supplier_id: string, lines: POLine[]): PODraft         // SAP / NetSuite
submit_po_for_approval(po_draft_id: string): ApprovalTicket
list_open_pos(supplier_id?: string): PO[]

// ===== Pricing & Promotions =====
get_price_history(sku: string, days: number): PricePoint[]
get_competitor_prices(sku: string, competitors?: string[]): CompetitorPrice[]
simulate_price_change(sku: string, new_price: Money): {expected_units, expected_margin, conf}
propose_promotion(scope: PromoScope, mechanic: PromoMechanic): PromoDraft
update_price(sku: string, new_price: Money, valid_from: ISODate, valid_to?: ISODate): void
create_promo_code(definition: PromoDef): PromoCode                          // Shopify / Talon.One
schedule_promo(promo_id: string, start: ISODate, end: ISODate): void

// ===== Marketing Content =====
get_product_attributes(sku: string): PIMRecord                              // Akeneo / Salsify
get_brand_voice(brand_id: string): BrandVoiceProfile
generate_copy(brief: CopyBrief): CopyVariant[]
generate_image_prompt(sku: string, scene: string): ImagePrompt
check_compliance(text: string, locale: string, category: string): ComplianceReport
submit_for_review(asset_id: string, reviewers: string[]): ReviewTicket
push_to_klaviyo(campaign: KlaviyoCampaign): CampaignId                      // Klaviyo MCP
push_to_meta_ads(creative: MetaCreative, adset_id: string): AdId
push_to_google_ads(asset: GoogleAsset, group_id: string): AdId

// ===== Supply Chain & Logistics =====
predict_eta(shipment_id: string): ETA
select_carrier(origin: Address, dest: Address, weight: number, sla: SLA): CarrierQuote[]
create_transfer_order(from_loc: string, to_loc: string, lines: Line[]): TransferOrder
file_carrier_claim(shipment_id: string, reason: string, amount: Money): ClaimId
notify_customer_delay(order_id: string, new_eta: ISODate, reason: string): void

// ===== Analytics & Insights =====
nl_to_sql(question: string, schema_scope: string[]): {sql: string, confidence: number}
run_query(sql: string, role: Role): QueryResult                             // Snowflake / BigQuery (RLS)
get_anomaly_alerts(scope: AnomalyScope): Anomaly[]
compute_attribution(window_days: number, model: AttrModel): AttributionTable
generate_chart(data: QueryResult, hint?: ChartHint): VegaSpec
save_dashboard(spec: DashboardSpec): DashboardId

// ===== Store Associate Copilot =====
check_stock_nearby(sku: string, store_id: string, radius_km: number): NearbyStock[]
reserve_item(sku: string, store_id: string, customer_id: string, hours: number): ReservationId
lookup_customer(query: string): CustomerSummary
create_save_the_sale_order(customer_id: string, lines: Line[], ship_to: Address): Order
get_clienteling_brief(customer_id: string): ClientelingCard
print_label(printer_id: string, payload: LabelPayload): void
get_task_list(associate_id: string, store_id: string): Task[]

// ===== Fraud & Risk =====
score_order_risk(order_id: string): RiskScore                               // Signifyd / Forter / Sift
get_return_abuse_history(customer_id: string): AbuseSignals
freeze_account(customer_id: string, reason: string): void
request_id_verification(customer_id: string): VerificationLink              // Persona / Stripe Identity

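Total: 64 tools listed above across 9 specialists + orchestrator; tools shared between agents, such as check_stock and get_shipment_tracking, are listed once.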
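
The structured envelope described at the start of this section, {ok, data, error, audit_id}, could be typed and enforced along these lines. This is a sketch under assumptions: the use of `null` for absent fields, the `callTool` wrapper, and the synchronous tool body are illustrative choices, and the order lookup is a stub rather than a real OMS call.

```typescript
// Envelope shape from the catalog intro: {ok, data, error, audit_id}.
interface ToolEnvelope<T> {
  ok: boolean;
  data: T | null;
  error: string | null;
  audit_id: string;
}

// Wrap a tool body so callers always receive an envelope instead of an exception.
function callTool<T>(auditId: string, body: () => T): ToolEnvelope<T> {
  try {
    return { ok: true, data: body(), error: null, audit_id: auditId };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, data: null, error: message, audit_id: auditId };
  }
}

// Stub lookup (assumption): throws for unknown orders.
function getOrderStub(orderId: string): { id: string; total: number } {
  if (orderId !== "A-100") throw new Error("order not found");
  return { id: orderId, total: 79 };
}

console.log(callTool("aud-1", () => getOrderStub("A-100")));
console.log(callTool("aud-2", () => getOrderStub("missing")));
```

Carrying the `audit_id` on both the success and failure paths keeps failed calls traceable in the audit log as well.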


7. Reference Orchestration Flows

These five flows are canonical. They are the patterns every QuanData retail deployment is benchmarked against.

7.1 Flow A — "Where's my order, and can I return one item?"

sequenceDiagram
    autonumber
    participant U as Customer
    participant O as Orchestrator
    participant CSA as Customer Service Agent
    participant OMS as OMS / Carrier
    participant POL as Policy KB
    participant AUD as Audit Log

    U->>O: "Where's my order, and can I return one item?"
    O->>O: classify_intent → {order_status, return}
    par Status lookup
        O->>CSA: order_status sub-task
        CSA->>OMS: get_order(order_id)
        CSA->>OMS: get_shipment_tracking(shipment_id)
        OMS-->>CSA: "Out for delivery, ETA 4pm"
    and Return setup
        O->>CSA: return sub-task
        CSA->>POL: lookup_policy("return_window")
        CSA->>OMS: create_return(order_id, line_2, "size_too_small")
        OMS-->>CSA: RMA-9981, refund $79
    end
    Note over CSA: Guardrail: $79 < $250 auto-threshold
    CSA-->>O: Combined result
    O->>AUD: log_audit_event(...)
    O-->>U: Tracking + RMA + QR label link

Latency budget: 3s p95. Both CSA sub-calls execute in parallel.

7.2 Flow B — Reorder a fast-moving SKU before stockout

sequenceDiagram
    autonumber
    participant AIA as Analytics Agent
    participant O as Orchestrator
    participant IRA as Inventory Agent
    participant ERP as ERP / WMS
    participant BUY as Buyer (Slack)
    participant SCLA as Supply Chain Agent

    Note over AIA: 06:00 scheduled anomaly scan
    AIA->>AIA: get_anomaly_alerts({type:"stockout_risk"})
    AIA-->>O: SKU-4421, days-of-cover=4, lead=14
    O->>IRA: replenish(SKU-4421)
    par Forecast & reorder math
        IRA->>IRA: forecast_demand(SKU-4421, 30)
        IRA->>IRA: compute_reorder_point(SKU-4421, 0.95)
    end
    IRA->>ERP: draft_purchase_order(SUP-12, [{SKU-4421, qty:1200}])
    Note over IRA: Guardrail: PO $48k > $25k → human
    IRA->>BUY: submit_po_for_approval (Slack card)
    BUY-->>IRA: ✅ Approve
    IRA->>ERP: submit PO
    IRA->>SCLA: register ASN callback for ETA visibility
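
The trigger in this flow (days-of-cover = 4 against a 14-day lead time) and the reorder math can be sketched with the textbook formula: reorder point = mean daily demand × lead time + z × σ × √(lead time). The TypeScript below is an illustration, not the internals of compute_reorder_point: the function names, the small z-value lookup, the assumption of independent daily demand, and the demand figures are all assumptions.

```typescript
// One-sided normal z-values for a few common service levels (standard table values).
const Z_BY_SERVICE_LEVEL: { [level: string]: number | undefined } = {
  "0.9": 1.282,
  "0.95": 1.645,
  "0.99": 2.326,
};

// Expected demand over the lead time plus safety stock.
function reorderPoint(
  meanDaily: number,
  sigmaDaily: number,
  leadTimeDays: number,
  serviceLevel: number
): number {
  const z = Z_BY_SERVICE_LEVEL[String(serviceLevel)];
  if (z === undefined) throw new Error("unsupported service level: " + serviceLevel);
  return meanDaily * leadTimeDays + z * sigmaDaily * Math.sqrt(leadTimeDays);
}

function daysOfCover(onHand: number, meanDaily: number): number {
  return meanDaily > 0 ? onHand / meanDaily : Infinity;
}

// Stockout risk: stock runs out before a new order could arrive.
function stockoutRisk(onHand: number, meanDaily: number, leadTimeDays: number): boolean {
  return daysOfCover(onHand, meanDaily) < leadTimeDays;
}

// Made-up demand: 40 units/day, sigma 10, 160 on hand gives 4 days of cover.
console.log(daysOfCover(160, 40), stockoutRisk(160, 40, 14));
console.log(reorderPoint(40, 10, 14, 0.95));
```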

7.3 Flow C — Flash promo on overstock category

sequenceDiagram
    autonumber
    participant M as Merchant
    participant O as Orchestrator
    participant AIA as Analytics
    participant PPA as Pricing Agent
    participant MCA as Marketing Content
    participant REV as Brand Lead

    M->>O: "We're long on summer outerwear. Move it."
    O->>AIA: identify overstock
    AIA->>AIA: run_query(category='outerwear-summer', WoS>20)
    AIA-->>O: 47 SKUs
    O->>PPA: design_promo(skus, target=60%_in_14d)
    loop top-10 SKUs
        PPA->>PPA: simulate_price_change @ {15%, 25%, 35%}
    end
    PPA->>PPA: propose_promotion → "SUMMER25", 14d
    Note over PPA: Guardrail: 3 SKUs margin<8% (flagged)
    PPA-->>M: Proposal (3 flagged)
    M-->>PPA: ✅ Approve with exclusions
    PPA->>PPA: create_promo_code + schedule_promo
    O->>MCA: generate_campaign(promo, [email, meta, web])
    par Variant generation
        MCA->>MCA: get_product_attributes (batch)
        MCA->>MCA: get_brand_voice
        MCA->>MCA: generate_copy (5 variants/channel)
    end
    MCA->>MCA: check_compliance ✓
    MCA->>REV: submit_for_review (4h SLA)
    REV-->>MCA: ✅
    MCA->>MCA: push_to_klaviyo + push_to_meta_ads
    O->>AIA: register sell-through monitor

7.4 Flow D — Next week's lapsed-customer email

sequenceDiagram
    autonumber
    participant CRON as Calendar
    participant O as Orchestrator
    participant MCA as Marketing Content
    participant AIA as Analytics
    participant PSA as Personal Shopper
    participant REV as Marketing Lead
    participant KLAV as Klaviyo

    CRON->>O: weekly_lapsed_campaign
    O->>MCA: kick off
    MCA->>AIA: query_segment(last_purchase 90-180d, ltv≥6)
    AIA-->>MCA: 38,400 customers, by affinity
    par Segment picks (Fan-Out)
        MCA->>PSA: per_segment_picks(seg=A, k=6)
        MCA->>PSA: per_segment_picks(seg=B, k=6)
        MCA->>PSA: per_segment_picks(seg=C, k=6)
    end
    MCA->>MCA: generate_copy (3 subjects × 3 segments)
    MCA->>MCA: check_compliance (CAN-SPAM, GDPR)
    Note over MCA: brand_score=0.91, compliance=pass
    MCA->>REV: auto-approval candidate
    REV-->>MCA: ✅
    MCA->>KLAV: push_to_klaviyo (A/B/C, 10% holdout)
    O->>AIA: schedule 7d post-send eval
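
The `par` block in Flow D is an instance of the Fan-Out pattern. A minimal sketch with `asyncio` follows; `per_segment_picks` here is a stub standing in for the Personal Shopper call, and the returned SKU ids are placeholders, not real catalog data.

```python
import asyncio


async def per_segment_picks(seg: str, k: int) -> list[str]:
    """Hypothetical stand-in for the PSA call; returns k placeholder SKU ids."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [f"{seg}-SKU-{i}" for i in range(1, k + 1)]


async def fan_out_picks(segments: list[str], k: int = 6) -> dict[str, list[str]]:
    """Issue one request per segment concurrently and collect the results."""
    results = await asyncio.gather(*(per_segment_picks(s, k) for s in segments))
    return dict(zip(segments, results))


picks = asyncio.run(fan_out_picks(["A", "B", "C"]))
```

`asyncio.gather` preserves argument order, so zipping the results back onto the segment list is safe even when the calls finish out of order.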

7.5 Flow E — Associate finds an item across nearby stores

sequenceDiagram
    autonumber
    participant A as Associate
    participant SAC as Store Associate Copilot
    participant WMS as Inventory
    participant CDP as Customer Profile
    participant OMS as OMS
    participant SCLA as Supply Chain
    participant O as Orchestrator (async audit)

    A->>SAC: scan barcode (sub-2s required)
    SAC->>WMS: check_stock(sku, this_store)
    WMS-->>SAC: 0
    SAC->>WMS: check_stock_nearby(sku, store, 25km)
    WMS-->>SAC: 3 stores (1, 2, 4 units)
    SAC->>CDP: lookup_customer(phone)
    CDP-->>SAC: customer + prefs
    SAC-->>A: options: reserve / SFS / STH
    A->>SAC: ship-from-store
    par
        SAC->>OMS: create_save_the_sale_order
    and
        SAC->>WMS: reserve_item (auto-release 4h)
    end
    SCLA-->>SCLA: async: carrier + label + customer notify
    SAC-->>O: post-hoc audit + clienteling note

8. State, Memory & The Dreaming Layer

8.1 The memory layer cake

flowchart TB
    subgraph EPH[Ephemeral Layer]
        CS[Conversation State<br/>Redis · 24h sliding]
        ST[Short-term Episodic<br/>last 20 turns]
    end
    subgraph WARM[Warm Layer]
        CP[Customer Profile<br/>Postgres + CDP]
        CAT[Catalog Embeddings<br/>pgvector / Turbopuffer]
        POL[Policy / KB Embeddings<br/>pgvector namespaced]
    end
    subgraph COLD[Cold / System-of-Record]
        OMS[(OMS · Orders · Returns)]
        WH[(Warehouse · Snowflake / BigQuery)]
        AUD[(Audit Log · 7yr retention)]
    end
    subgraph DREAM[Dreaming Layer]
        AM[Agent Memory<br/>Postgres + embeddings<br/>summaries · preferences · incidents]
        DSC[Dream Scheduler<br/>nightly background process]
    end

    CS --> ST
    ST --> CP
    CP --> CAT
    CAT --> POL
    POL --> OMS
    OMS --> WH
    WH --> AUD
    AUD -.feed.-> DSC
    DSC -->|distill| AM
    AM -->|inject context| CS

    style DREAM fill:#dc2626,stroke:#fff,color:#fff
    style EPH fill:#0b3d91,stroke:#fff,color:#fff
    style WARM fill:#1f6feb,stroke:#fff,color:#fff
    style COLD fill:#374151,stroke:#fff,color:#fff

8.2 The Dreaming process

This layer follows Anthropic's Managed Agents Dreaming primitive (announced May 6, 2026): a scheduled background process that reviews past sessions, extracts patterns, identifies recurring mistakes, and curates each agent's memory stores.

stateDiagram-v2
    [*] --> Idle
    Idle --> Invoked: user message / trigger
    Invoked --> ToolCalling: classify + plan
    ToolCalling --> Reflecting: tool returns
    Reflecting --> ToolCalling: needs more
    Reflecting --> Outcome_Grading: draft ready
    Outcome_Grading --> ToolCalling: rubric fail
    Outcome_Grading --> Responding: rubric pass
    Responding --> [*]
    [*] --> Dreaming: nightly cron
    Dreaming --> Distilling: read audit + traces
    Distilling --> Updating_Memory: extract summaries / preferences / incidents
    Updating_Memory --> [*]

Harvey reported a ~6× completion-rate lift from enabling Dreaming on their legal agents, with no model change. The lift is attributed to agents carrying institutional knowledge across sessions.
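
To make the Distilling step concrete, here is a minimal sketch of one distillation rule: turning repeated tool failures in the audit log into `incident` memory rows. It is not Anthropic's implementation. The `"error"` value for `decision`, the `min_count` threshold, the confidence heuristic, and the assumption that the principal is a customer are all illustrative choices.

```python
from collections import Counter


def distill_incidents(audit_events: list[dict], min_count: int = 3) -> list[dict]:
    """Summarize recurring tool failures per principal as memory rows."""
    failures: Counter = Counter()
    last_run: dict = {}
    for event in audit_events:
        if event["decision"] == "error":  # assumed failure marker
            key = (event["principal_id"], event["tool"])
            failures[key] += 1
            last_run[key] = event["run_id"]  # keep the row source-traceable

    rows = []
    for (principal_id, tool), count in sorted(failures.items()):
        if count >= min_count:
            rows.append({
                "scope_type": "customer",  # assumption: principal is a customer
                "scope_id": principal_id,
                "kind": "incident",
                "content": f"{tool} failed {count} times in the reviewed window",
                "source_run_id": last_run[(principal_id, tool)],
                "confidence": min(1.0, count / 10),  # illustrative heuristic
            })
    return rows
```

The output is a handful of distilled rows rather than a transcript, which matches the shape of the `agent_memory` schema.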

8.3 Canonical schemas

session_state {
  session_id PK, principal_id, principal_type,
  current_intent, pending_actions JSONB,
  agent_scratchpad JSONB, last_tool_calls JSONB[],
  updated_at
}

agent_memory {
  id PK, scope_type ENUM('customer','merchant','sku','store'),
  scope_id, kind ENUM('summary','preference','incident','note'),
  content TEXT, embedding VECTOR(1536),
  source_run_id, confidence FLOAT,
  created_at, expires_at NULL, version INT
}

audit_event {
  id PK, run_id, parent_run_id, principal_id,
  agent, tool, input_hash, output_hash,
  approval_id NULL, decision, latency_ms, cost_usd,
  pii_redacted BOOL, created_at
}

approval {
  id PK, action_type, payload JSONB, threshold_breached TEXT,
  requested_by_agent, approver_pool TEXT[],
  status ENUM('pending','approved','rejected','expired'),
  decided_by, decided_at, sla_minutes
}

8.4 Memory engineering principles

  1. Summaries, not transcripts. Long-term memory stores distilled rows, never raw chat history.
  2. PII redacted pre-write. A deterministic redactor runs before any memory row is persisted.
  3. Versioned. Every memory row carries a version so contradictions can be resolved.
  4. Source-traceable. Every memory row links to a source_run_id in the audit log.
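
The four principles above can be sketched together as a single write path. This is an illustrative in-memory sketch: `MemoryStore` and `redact` are hypothetical names, and the two regular expressions are deliberately simple examples, not a complete PII detector.

```python
import re

# Simplified patterns for illustration only; a production redactor needs more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def redact(text: str) -> str:
    """Deterministic redaction applied before any row is persisted."""
    return CARD.sub("[PAN]", EMAIL.sub("[EMAIL]", text))


class MemoryStore:
    """Append-only store of distilled rows with versioning and source links."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, scope_type: str, scope_id: str, kind: str,
              content: str, source_run_id: str) -> dict:
        if not source_run_id:
            raise ValueError("every memory row must link to a source_run_id")
        key = (scope_type, scope_id, kind)
        prior = [r for r in self.rows
                 if (r["scope_type"], r["scope_id"], r["kind"]) == key]
        row = {
            "scope_type": scope_type, "scope_id": scope_id, "kind": kind,
            "content": redact(content),    # PII redacted pre-write
            "source_run_id": source_run_id,
            "version": len(prior) + 1,     # newer versions resolve contradictions
        }
        self.rows.append(row)
        return row
```

Because older versions are never overwritten, a contradiction between two rows can be resolved by comparing versions and tracing each back to its run.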

9. Outcomes — The Rubric-Driven Quality Loop

Per Anthropic's Outcomes primitive (May 6, 2026): every output is graded against a defined rubric before reaching the customer. QuanData operationalizes this as a per-agent rubric set.

9.1 The grading loop

flowchart LR
    A[Agent generates output] --> B{Rubric grader<br/>self-evaluation}
    B -- pass --> C[Release to customer]
    B -- fail · iter < N --> A
    B -- fail · iter ≥ N --> H[Escalate to human]
    style B fill:#7c3aed,stroke:#fff,color:#fff
    style H fill:#dc2626,stroke:#fff,color:#fff
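
A minimal sketch of the loop in the diagram above, assuming the generator and grader are plain callables. `run_with_rubric`, its return shape, and the default of three iterations are hypothetical; the document does not fix a value for N.

```python
from typing import Callable, Optional, Tuple


def run_with_rubric(
    generate: Callable[[Optional[str]], str],
    grade: Callable[[str], Tuple[bool, str]],
    max_iters: int = 3,
) -> dict:
    """Regenerate until the rubric passes, or escalate after max_iters failures."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    feedback: Optional[str] = None
    draft = ""
    for attempt in range(1, max_iters + 1):
        draft = generate(feedback)       # the grader's feedback guides the retry
        passed, feedback = grade(draft)
        if passed:
            return {"status": "released", "output": draft, "attempts": attempt}
    # Rubric still failing at the iteration cap: hand off to a human.
    return {"status": "escalated", "output": draft, "attempts": max_iters}
```

Passing the grader's feedback back into the generator is what distinguishes this loop from blind retries.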

9.2 Per-agent rubric examples

| Agent | Rubric Dimensions | Auto-Pass Threshold |
| --- | --- | --- |
| CSA | Empathy score · policy correctness · resolution completeness · CSAT proxy | ≥0.85 on all 4 |
| PSA | Relevance · constraint adherence (budget, size) · diversity · in-stock filter | 100% constraint + ≥0.80 relevance |
| MCA | Brand voice score · compliance pass · variant diversity · CTA presence | ≥0.85 brand + 100% compliance |
| PPA | Margin floor respected · MAP respected · elasticity-confidence ≥ 0.7 | 100% guardrails |
| AIA | Hallucinated-column rate · semantic-equivalence · row-count sanity | 0 hallucinated columns |
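
Two of the auto-pass thresholds above, written out as predicates. This is a sketch: the function names and the score keys are hypothetical, and scores are assumed to be floats in [0, 1].

```python
CSA_DIMENSIONS = ("empathy", "policy_correctness",
                  "resolution_completeness", "csat_proxy")


def csa_auto_pass(scores: dict[str, float]) -> bool:
    """CSA passes only when all four dimensions score at least 0.85."""
    return all(scores[d] >= 0.85 for d in CSA_DIMENSIONS)


def mca_auto_pass(brand_voice: float, compliance_pass: bool) -> bool:
    """MCA needs brand voice >= 0.85 and a clean compliance check."""
    return brand_voice >= 0.85 and compliance_pass
```

For example, the Flow D output with `brand_score=0.91` and `compliance=pass` satisfies `mca_auto_pass(0.91, True)`.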

9.3 The dual-track evaluation harness

flowchart TB
    subgraph OFF[Offline Track]
        GD[Golden Datasets<br/>per agent]
        REG[Nightly Regression]
        ALERT[2σ Drop → page]
    end
    subgraph ON[Online Track]
        AB[5% No-Agent Holdout]
        VAR[Variant A/B/C between agent versions]
        JUD[LLM-as-judge + 50/wk human spot-check]
    end
    subgraph RED[Red Team]
        INJ[Prompt-injection probes]
        EXF[PII exfiltration]
        JBR[Jailbreak — refund / price tools]
    end
    GD --> REG --> ALERT
    AB --> VAR --> JUD
    INJ --> ALERT
    EXF --> ALERT
    JBR --> ALERT
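
One possible reading of the "2σ Drop → page" rule on the offline track: page when the latest nightly score falls more than two sample standard deviations below the trailing mean. The function name and the trailing-window interpretation are assumptions, not something the harness definition specifies.

```python
from statistics import mean, stdev


def should_page(history: list[float], latest: float, sigmas: float = 2.0) -> bool:
    """True when `latest` is more than `sigmas` std devs below the trailing mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    return latest < mean(history) - sigmas * stdev(history)


nightly_scores = [0.90, 0.91, 0.89, 0.90, 0.92]
alert = should_page(nightly_scores, latest=0.85)
```

The check is one-sided on purpose: an unusually good night should not page anyone.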

10. Guardrail Architecture — Defense In Depth

The most likely failure mode of any agentic retail system is not a model defect. It is a guardrail gap. QuanData architects guardrails as a layer cake — no single layer is sufficient.

10.1 The guardrail layer cake

flowchart TB
    L1[Layer 1 — Input Sanitization<br/>PII tokenization · prompt-injection classifier · channel auth]
    L2[Layer 2 — Intent Classification + Routing<br/>only registered intents reach specialists]
    L3[Layer 3 — Tool Allow-Lists<br/>each agent restricted to its tool set]
    L4[Layer 4 — Tool-Side Guardrails<br/>refund cap · MAP enforce · margin floor · inventory write block]
    L5[Layer 5 — Human-In-Loop Approvals<br/>refunds >$250 · POs >$25k · promos >30% · price Δ > ±10%]
    L6[Layer 6 — Outcome Rubric Grader<br/>self-evaluation before release]
    L7[Layer 7 — Audit + Reversibility<br/>every mutation has audit_id + compensation path]
    L8[Layer 8 — Kill Switches<br/>per-agent flag + global pause]

    L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8

    style L1 fill:#0b3d91,stroke:#fff,color:#fff
    style L2 fill:#1f6feb,stroke:#fff,color:#fff
    style L3 fill:#1f6feb,stroke:#fff,color:#fff
    style L4 fill:#16a34a,stroke:#fff,color:#fff
    style L5 fill:#16a34a,stroke:#fff,color:#fff
    style L6 fill:#7c3aed,stroke:#fff,color:#fff
    style L7 fill:#7c3aed,stroke:#fff,color:#fff
    style L8 fill:#dc2626,stroke:#fff,color:#fff
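
Layers 3 and 8 are the simplest to illustrate in code. The sketch below checks the kill switch first and the tool allow-list second. The tool names are taken from Flow C, but the `authorize` function and its return shape are hypothetical.

```python
ALLOW_LIST: dict[str, frozenset[str]] = {
    "PPA": frozenset({"simulate_price_change", "propose_promotion",
                      "create_promo_code", "schedule_promo"}),
    "MCA": frozenset({"get_product_attributes", "get_brand_voice",
                      "generate_copy", "check_compliance", "submit_for_review",
                      "push_to_klaviyo", "push_to_meta_ads"}),
}


def authorize(agent: str, tool: str,
              paused_agents: frozenset[str] = frozenset(),
              global_pause: bool = False) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    # Layer 8: kill switches win over everything else.
    if global_pause or agent in paused_agents:
        return False, "kill_switch"
    # Layer 3: an agent may only call tools registered to it.
    if tool not in ALLOW_LIST.get(agent, frozenset()):
        return False, "tool_not_allowed"
    return True, "ok"
```

Unknown agents fall through to an empty allow-list, so the default is deny.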

10.2 The retail-specific guardrail register

| Guardrail | Threshold / Rule | Mechanism |
| --- | --- | --- |
| Refund auto-approve | ≤ $250 AND LTV decile ≥ 3 AND no abuse | CSA tool wrapper |
| Price change | Δ > ±10% OR < MAP OR margin < 8% → human | PPA update_price pre-write |
| Promo depth | > 30% → category mgr; > 50% → director | PPA proposal stage |
| Inventory write | Agents cannot mutate on-hand directly; only transfer/PO/reservation | Tool layer enforces |
| Refund total per session | Cumulative > $1,000 → block | Orchestrator counter |
| PII handling | PAN, CVV, full SSN never reach LLM; tokenize at ingest | Pre-LLM redactor |
| Brand voice | Score < 0.85 → forced review | MCA check_compliance |
| Claims compliance | No "cures"/"clinically proven" unless category-approved | MCA per-category rules |
| Locale & regulation | EU: GDPR consent; CA: CCPA; alcohol/tobacco: age gate | Profile flags checked |
| Audit trail | Every tool call logged; mutations require audit_id | Audit middleware |
| Rate limits | Per-customer 60 turns/hr; per-merchant 1k tool-calls/min; per-agent circuit breaker @ 5% errors/60s | Gateway |
| Reversibility | All mutating actions must have undo_* or compensation | Tool registry metadata |
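
The refund and price-change rows of the register translate directly into pre-write checks. This is a sketch under stated assumptions: the function names are hypothetical, the session cap is read as including the refund being requested, and margin is taken as a fraction of the new price.

```python
def refund_decision(amount: float, ltv_decile: int, abuse_flag: bool,
                    session_refund_total: float) -> str:
    """Return 'block', 'auto_approve', or 'human_review' for a refund request."""
    if session_refund_total + amount > 1000:  # cumulative per-session cap
        return "block"
    if amount <= 250 and ltv_decile >= 3 and not abuse_flag:
        return "auto_approve"
    return "human_review"


def price_change_needs_human(old_price: float, new_price: float,
                             map_price: float, unit_cost: float) -> bool:
    """True when any price-change threshold in the register is breached."""
    delta = abs(new_price - old_price) / old_price
    margin = (new_price - unit_cost) / new_price
    return delta > 0.10 or new_price < map_price or margin < 0.08
```

Keeping these as pure functions makes them easy to cover with golden-dataset tests before they are wired into a tool wrapper.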

10.3 Prompt injection — the OWASP LLM #1

Retail has the highest published prompt-injection vulnerability rate at 40% (2026 data). QuanData's defense is layered:

flowchart LR
    IN[Untrusted Input<br/>customer msg · review · scraped page · return reason] --> PG[PromptGuard 2<br/>~50ms classifier]
    PG -- benign --> LG[LlamaGuard 3 — 8B<br/>hazard classification]
    PG -- suspected --> Q[Quarantine + Manual Review]
    LG -- safe --> NEM[NeMo Guardrails<br/>orchestration]
    LG -- hazard --> Q
    NEM --> A[Trusted Agent Loop]
    style Q fill:#dc2626,stroke:#fff,color:#fff
    style PG fill:#7c3aed,stroke:#fff,color:#fff
    style LG fill:#7c3aed,stroke:#fff,color:#fff

Published benchmarks: layered defense reduces attack success from ~73% to ~9%. Even so, ~35% of adversarial evals still leak, so human-in-loop approval is required on every high-blast-radius action.
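
The routing logic of the screening chain can be sketched independently of any specific classifier. The two stub classifiers below are keyword checks standing in for PromptGuard 2 and LlamaGuard 3; they do not use or imitate those libraries' real APIs, and the NeMo Guardrails stage is omitted.

```python
from typing import Callable

Classifier = Callable[[str], bool]  # returns True when the text looks unsafe


def screen_input(text: str, injection_check: Classifier,
                 hazard_check: Classifier) -> str:
    """Route untrusted input to 'quarantine' or the trusted 'agent_loop'."""
    if injection_check(text):  # first gate: fast injection classifier
        return "quarantine"
    if hazard_check(text):     # second gate: hazard classification
        return "quarantine"
    return "agent_loop"


# Hypothetical stubs for demonstration; real classifiers are model-based.
def keyword_injection_stub(text: str) -> bool:
    return "ignore previous instructions" in text.lower()


def keyword_hazard_stub(text: str) -> bool:
    return "full card number" in text.lower()


route = screen_input("Where is my order?", keyword_injection_stub, keyword_hazard_stub)
```

Injecting the classifiers as callables keeps the routing testable and lets either stage be swapped without touching the chain.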


11. Integration & Infrastructure Stack

11.1 The reference stack — opinionated

flowchart TB
    subgraph FE[Front End]
        NX[Next.js · Vercel]
        EXP[Mobile · Expo]
    end
    subgraph AR[Agent Runtime]
        MAS[Mastra TS<br/>chat-heavy]
        LG[LangGraph Python<br/>stateful + durable]
        TEMP[Temporal<br/>durable writes]
        SDK[Anthropic Agent SDK<br/>Claude-first builds]
    end
    subgraph MO[Models]
        HK[Claude Haiku 4.5<br/>routing]
        SO[Claude Sonnet 4.6<br/>main loop]
        OP[Claude Opus 4.7<br/>escalation]
        FB[GPT-5.5 / Gemini 2.x<br/>diversity / fallback]
    end
    subgraph TL_MCP[Tool Layer — MCP-Native]
        SH[Shopify MCP — official]
        ST[Stripe MCP — official]
        KL[Klaviyo MCP — official]
        AL[Algolia MCP — official]
        AD[Adobe Commerce MCP — official]
        SF[Salesforce B2C Commerce MCP]
    end
    subgraph TL_WRAP[Tool Layer — Wrapped]
        SAP[SAP S/4HANA via OData wrapper]
        NS[NetSuite via SuiteTalk wrapper]
        POS[Square · Lightspeed via REST wrapper]
    end
    subgraph DATA[Data]
        PG[(Postgres + pgvector)]
        TP[(Turbopuffer<br/>large catalogs)]
        SN[(Snowflake / BigQuery)]
        RU[RudderStack CDP]
        HT[Hightouch Reverse-ETL]
    end
    subgraph OBS[Observability]
        LF[Langfuse self-hosted]
        BT[Braintrust evals]
        PF[Promptfoo CI]
    end
    subgraph SEC[Security]
        PA[PromptArmor + PromptGuard 2]
        TR[Transcend Agentic Assist]
        SI[Signifyd / Riskified]
    end

    FE --> AR
    AR --> MO
    AR --> TL_MCP
    AR --> TL_WRAP
    TL_MCP --> DATA
    TL_WRAP --> DATA
    AR --> OBS
    FE --> SEC
    AR --> SEC

    style AR fill:#7c3aed,stroke:#fff,color:#fff
    style MO fill:#0b3d91,stroke:#fff,color:#fff
    style SEC fill:#dc2626,stroke:#fff,color:#fff

11.2 The MCP landscape — production-ready in May 2026

| Server | Status | Notes |
| --- | --- | --- |
| Shopify Dev / Storefront / Admin MCP | Production · MIT | Live in every Hydrogen 2026.1.4 + Oxygen store by default |
| Stripe MCP | Production | Treasury + agentic-commerce tools |
| Klaviyo MCP | Production | Co-built with Anthropic |
| Algolia MCP | Production | Integrates with Algolia Agent Studio |
| Adobe Commerce MCP | Production | Default agent protocol post-Summit 2026 |
| Salesforce B2C Commerce MCP | GA / pilot | Fully hosted |
| Transcend Agentic Assist + MCP | Production | DSAR / privacy |
| BigCommerce · Commercetools · Medusa · Shopware | Community only (IMMATURE) | Wrap yourself |
| SAP S/4HANA · NetSuite · Dynamics 365 · Oracle Retail | No official (IMMATURE) | Wrap via iPaaS |
| POS (Square · Toast · Lightspeed) | No official (IMMATURE) | Wrap REST |

11.3 Vector database selection

| Scale | Recommendation | Rationale |
| --- | --- | --- |
| ≤ 1M SKUs | pgvector | Sane default, no new infra, ~$0 incremental |
| 1M–10M SKUs | Turbopuffer | Under $10/mo at this scale; sub-10ms warm; long-tail-friendly |
| 10M+ SKUs · hard SLA | Pinecone | $99–199/mo at this scale; always-warm <50ms p95 |

11.4 Hosting matrix

| Workload | Platform | Rationale |
| --- | --- | --- |
| Storefront / UI | Vercel | Best DX, edge-optimized |
| Chat agent (TS, stateful) | Fly.io | Persistent VMs, WebSocket-friendly |
| Heavy agent loops (Python) | Modal | Unlimited sandbox, GPU, 0–50k concurrent |
| Durable writes (POs, refunds, fulfillment) | Temporal or AWS Step Functions | Crash-tolerant, replayable; effectively-once execution when side-effecting steps are idempotent |
| Enterprise / regulated | AWS Step Functions + Bedrock | Compliance posture |

12. Cost Economics

12.1 Per-conversation unit economics (May 2026 pricing)

| Configuration | Input Tokens / Conv | Output Tokens / Conv | Cost / Conv |
| --- | --- | --- | --- |
| No caching · Sonnet 4.6 only | ~53.6k | ~4.8k | ~$0.23 |
| With prompt caching · 90% hit rate | ~53.6k (mostly cached) | ~4.8k | ~$0.07–0.10 |
| Opus 4.7 escalation @ 5% | +baseline | +baseline | +~$0.04 |
| Haiku 4.5 routing only | ~5k | ~0.2k | ~$0.006 |
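
The per-conversation figures above can be reproduced with simple arithmetic. The prices in this sketch are assumptions chosen because they match the table, not values stated in this document: $3 / $15 per million input / output tokens for Sonnet, $1 / $5 for Haiku, and cached input reads billed at 10% of the input price. Cache-write surcharges are ignored.

```python
def conversation_cost(input_tokens: int, output_tokens: int,
                      in_price_per_m: float, out_price_per_m: float,
                      cache_hit_rate: float = 0.0,
                      cache_read_multiplier: float = 0.1) -> float:
    """Estimated USD cost of one conversation under the assumed prices."""
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    input_cost = (fresh + cached * cache_read_multiplier) * in_price_per_m / 1e6
    output_cost = output_tokens * out_price_per_m / 1e6
    return input_cost + output_cost


sonnet_no_cache = conversation_cost(53_600, 4_800, 3.0, 15.0)
sonnet_cached = conversation_cost(53_600, 4_800, 3.0, 15.0, cache_hit_rate=0.9)
haiku_routing = conversation_cost(5_000, 200, 1.0, 5.0)
```

Under these assumptions the three values round to about $0.23, $0.10, and $0.006, which lines up with the table. Output tokens are not cacheable, so they set the floor on the cached figure.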

12.2 Annual run-rate — 1,000 conversations / day

pie title Annual Cost Allocation — 1,000 conv/day with caching
    "LLM tokens (Sonnet + Haiku + 5% Opus)" : 30000
    "Vector DB (Turbopuffer 1M SKU)" : 1200
    "Observability (Langfuse self-host infra)" : 2400
    "Audit log infra (Postgres + S3)" : 1800
    "Eval & red-team compute" : 3600
    "Embedding refresh (catalog deltas)" : 600

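The chart's line items total ~$39.6k / year: ~$2.5k / month for LLM tokens and ~$0.8k / month for supporting infra. Realistic all-in budgets run higher, at ~$3–5k / month for LLM and ~$1–3k / month for supporting infra at the 1k conv/day tier. Caching is the single largest lever: without it, LLM costs run roughly 3× higher.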

12.3 Cost-protection guardrails

| Cap | Threshold |
| --- | --- |
| Per-session token cap | 200k input / 50k output |
| Per-session cost cap | $1.00 (hard kill) |
| Opus 4.7 escalation rate | < 10% of conversations |
| Warehouse query cost cap (AIA) | $0.50 per query auto, $5 with merchant approval |
| Subagent fan-out cap | 20 parallel workers (matches Anthropic Managed Agents limit) |
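
A minimal sketch of how the per-session caps and the fan-out cap might be enforced in the orchestrator. `SessionBudget`, `BudgetExceeded`, and `clamp_fan_out` are hypothetical names; only the thresholds come from the table.

```python
from dataclasses import dataclass

MAX_FAN_OUT = 20  # parallel-worker cap from the table above


class BudgetExceeded(RuntimeError):
    """Raised when a session crosses a hard cap and must be terminated."""


@dataclass
class SessionBudget:
    max_input_tokens: int = 200_000
    max_output_tokens: int = 50_000
    max_cost_usd: float = 1.00
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    def charge(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        """Record one model call and hard-stop the session on any breach."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.cost_usd += cost_usd
        if (self.input_tokens > self.max_input_tokens
                or self.output_tokens > self.max_output_tokens
                or self.cost_usd > self.max_cost_usd):
            raise BudgetExceeded(f"session cap breached at ${self.cost_usd:.2f}")


def clamp_fan_out(requested_workers: int) -> int:
    """Never launch more parallel workers than the cap allows."""
    return max(0, min(requested_workers, MAX_FAN_OUT))
```

Raising an exception rather than returning a flag makes the "hard kill" behavior difficult to ignore by accident.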

13. The 12-Month Build Phasing

QuanData & AI builds in three deliberate phases. Sequencing is non-negotiable.

gantt
    title QuanData Retail Orchestration — 12-Month Build
    dateFormat YYYY-MM-DD
    axisFormat %b %y
    section Foundation (Days 0-90)
    Data plumbing audit            :a1, 2026-06-01, 14d
    Orchestrator + audit + approval scaffold :a2, after a1, 21d
    Tool registry + envelope contract :a3, after a1, 21d
    Guardrail middleware           :a4, after a2, 14d
    CSA (returns + order status only) :a5, after a4, 28d
    PSA (chat + on-site reco)      :a6, after a5, 28d
    Eval harness v1                :a7, after a4, 60d
    section Merchant + Internal (Mo. 4-6)
    AIA — conversational BI         :b1, 2026-09-01, 45d
    IRA — recommend-only            :b2, after b1, 30d
    MCA — descriptions + subjects   :b3, 2026-09-15, 60d
    SAC pilot — 5-10 stores         :b4, 2026-10-01, 60d
    Red-team v1 + audit dashboards  :b5, 2026-09-01, 90d
    section Automation + Hard Problems (Mo. 7-12)
    PPA — live markdowns w/ approval :c1, 2026-12-01, 60d
    SCLA — ETA + carrier opt        :c2, 2026-12-01, 60d
    MCA — full campaign mode        :c3, 2027-01-01, 60d
    FRA — fraud + abuse             :c4, 2027-01-15, 75d
    SAC — full rollout              :c5, 2027-02-01, 90d
    IRA — auto-PO under threshold   :c6, 2027-03-01, 60d
    Dreaming layer — production     :c7, 2027-01-01, 90d

13.1 Why this sequencing

flowchart LR
    P1[Phase 1<br/>Foundation] -->|Lowest blast radius<br/>+ Fastest ROI| P2[Phase 2<br/>Merchant + Internal]
    P2 -->|Trust earned<br/>+ Audit muscle built| P3[Phase 3<br/>Automation + Hard]
    P1 -->|CSA: ticket-cost reduction<br/>PSA: proven LLM win| ROI1[Customer-Visible ROI<br/>by Day 90]
    P2 -->|AIA: merchants love it<br/>IRA: recommend-only<br/>MCA: low-stakes content| ROI2[Operator Trust<br/>by Month 6]
    P3 -->|PPA · SCLA · MCA full · FRA<br/>only after audit + eval mature| ROI3[Full Autonomy w/ Gates<br/>by Month 12]
    style P1 fill:#0b3d91,stroke:#fff,color:#fff
    style P2 fill:#1f6feb,stroke:#fff,color:#fff
    style P3 fill:#7c3aed,stroke:#fff,color:#fff

13.2 The five sequencing principles

  1. Read before write. Every agent ships read-only first.
  2. Approval before autonomy. Human-in-loop default; thresholds widen with proven precision.
  3. Internal before external. Merchant-facing failures are cheaper than customer-facing failures.
  4. One source of truth per domain. Agents never mutate OMS / ERP / WMS directly — always via the system's native API with audit_id embedded.
  5. Kill switches always. Per-agent feature flag + global "agent pause" button on every deployment.

14. Failure Mode Taxonomy

This taxonomy is based on a corpus of 591 documented production multi-agent incidents (2023–2026). 40% of multi-agent pilots fail within six months of reaching production. QuanData designs against each failure mode.

14.1 Failure mode distribution

pie title Production Multi-Agent Failure Modes
    "Context Blindness (truncated/missing info)" : 31.6
    "Rogue Actions (wrong tool/arg)" : 30.3
    "Silent Degradation (looks right, isn't)" : 24.9
    "Memory Corruption" : 8.1
    "Runaway Execution (loops, cost blowouts)" : 5.1

14.2 QuanData countermeasure register

| Mode | Share | QuanData Countermeasure |
| --- | --- | --- |
| Context blindness | 31.6% | Selective retrieval · summarization at handoff · per-turn context budgets · Anthropic's compression-at-subagent pattern |
| Rogue actions | 30.3% | Typed tool schemas · dry-run mode · pre-write guardrails · audit_id requirement |
| Silent degradation | 24.9% | Outcome rubric grader · LLM-as-judge + 50/wk human spot-check · 5% no-agent holdout |
| Memory corruption | 8.1% | Append-only audit · Dreaming summaries (not transcripts) · separate read/write agents |
| Runaway execution | 5.1% | 100-step cap · $1/session cost cap · 60s circuit-breaker on 5% error rate · 20-worker fan-out cap |
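
The runaway-execution countermeasure "60s circuit-breaker on 5% error rate" can be sketched as a sliding-window counter. The class below is illustrative: `min_calls` is an added assumption to avoid tripping on a handful of calls, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from collections import deque


class CircuitBreaker:
    """Opens when the error rate over a sliding window reaches a threshold."""

    def __init__(self, window_s: float = 60.0, max_error_rate: float = 0.05,
                 min_calls: int = 20) -> None:
        self.window_s = window_s
        self.max_error_rate = max_error_rate
        self.min_calls = min_calls  # assumption: ignore very small samples
        self._calls: deque = deque()  # (timestamp, ok) pairs, oldest first

    def _evict(self, now: float) -> None:
        # Drop calls that have aged out of the window.
        while self._calls and now - self._calls[0][0] > self.window_s:
            self._calls.popleft()

    def record(self, ok: bool, now: float) -> None:
        self._calls.append((now, ok))
        self._evict(now)

    def is_open(self, now: float) -> bool:
        self._evict(now)
        if len(self._calls) < self.min_calls:
            return False
        errors = sum(1 for _, ok in self._calls if not ok)
        return errors / len(self._calls) >= self.max_error_rate
```

In production the caller would pass `time.monotonic()` as `now`; the breaker closes again on its own once the failing calls age out of the window.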

14.3 The seven mistakes the X-post framework explicitly warns against

  1. Making every agent too general. QuanData rule: every agent has one job; narrow is powerful.
  2. Not standardizing output formats. QuanData rule: the structured {ok, data, error, audit_id} envelope is canonical and enforced.
  3. Running too many agents in parallel too early. QuanData rule: phased rollout — two agents → grade → add.
  4. No error handling between agents. QuanData rule: every tool call returns a fail-safe envelope with documented compensation.
  5. Ignoring token costs. QuanData rule: hard caps per session; prompt caching mandatory.
  6. Letting deflection rates obscure CSAT decay. QuanData rule: every deflection KPI paired with CSAT/quality KPI (Klarna doctrine).
  7. Surrendering the funnel to third-party agent runtimes. QuanData rule: QuanData clients own the MCP layer; external agents transact through it (Walmart doctrine).
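
Rules 2 and 4 both rest on the `{ok, data, error, audit_id}` envelope. A minimal sketch of a wrapper that always returns one, even when the underlying tool raises, is shown below. `Envelope` and `call_tool` are hypothetical names, and generating the `audit_id` with `uuid4` is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Envelope:
    ok: bool
    data: Any
    error: Optional[str]
    audit_id: str


def call_tool(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Envelope:
    """Invoke a tool and wrap the outcome in the canonical envelope."""
    audit_id = str(uuid.uuid4())  # assigned before the call so failures are traceable
    try:
        return Envelope(True, fn(*args, **kwargs), None, audit_id)
    except Exception as exc:
        # Fail safe: downstream agents get a structured error, never a raw exception.
        return Envelope(False, None, f"{type(exc).__name__}: {exc}", audit_id)
```

Because every result has the same shape, the orchestrator can branch on `ok` and look up the documented compensation by `audit_id` without parsing free text.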

15. Appendix A — Decision Matrices

15.1 Pattern selector

flowchart TD
    Start([New task to orchestrate])
    Q1{Mutates inventory /<br/>payment / fulfillment?}
    Q2{Strict step ordering?}
    Q3{Same op on many items?}
    Q4{Multiple distinct domains?}
    Out_Single[Single Threaded<br/>+ Temporal]
    Out_Pipe[Pipeline]
    Out_Fan[Fan-Out]
    Out_Spec[Specialist Team]
    Out_Fn[Function call<br/>— not an agent]

    Start --> Q1
    Q1 -- Yes --> Out_Single
    Q1 -- No --> Q2
    Q2 -- Yes · 1 domain --> Out_Pipe
    Q2 -- Yes · N domains --> Out_Spec
    Q2 -- No --> Q3
    Q3 -- Yes --> Out_Fan
    Q3 -- No --> Q4
    Q4 -- Yes --> Out_Spec
    Q4 -- No --> Out_Fn

    style Out_Single fill:#dc2626,stroke:#fff,color:#fff
    style Out_Pipe fill:#0b3d91,stroke:#fff,color:#fff
    style Out_Fan fill:#7c3aed,stroke:#fff,color:#fff
    style Out_Spec fill:#16a34a,stroke:#fff,color:#fff
    style Out_Fn fill:#6b7280,stroke:#fff,color:#fff
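
The same decision tree expressed as a function, which can be convenient for linting new task definitions. The function name, parameter names, and return labels are hypothetical; the branch order follows the flowchart.

```python
def select_pattern(mutates_state: bool, strict_ordering: bool,
                   domains: int, same_op_many_items: bool) -> str:
    """Map a task's properties to an orchestration pattern."""
    if mutates_state:        # inventory, payment, or fulfillment writes
        return "single_threaded_temporal"
    if strict_ordering:      # ordered steps: pipeline within one domain
        return "pipeline" if domains == 1 else "specialist_team"
    if same_op_many_items:   # same operation across many items
        return "fan_out"
    if domains > 1:          # multiple distinct domains
        return "specialist_team"
    return "function_call"   # not an agent at all
```

For instance, a weekly email that needs picks for several segments but writes nothing resolves to `select_pattern(False, False, 1, True)`, the fan-out branch.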

15.2 Framework selector

| If… | Then |
| --- | --- |
| TS shop, chat-heavy UI | Mastra (or Vercel AI SDK if pure-chat) |
| Python shop, complex stateful flows | LangGraph + Temporal underneath |
| All-Claude builds | Anthropic Agent SDK directly |
| Need typed handoffs + voice | OpenAI Agents SDK |
| Money / inventory / fulfillment writes | Temporal under any framework |

15.3 Model selector

| Task Type | Model |
| --- | --- |
| Intent classification / routing | Claude Haiku 4.5 |
| Standard tool-use loop | Claude Sonnet 4.6 |
| Complex multi-step reasoning (disputes, fraud, escalation) | Claude Opus 4.7 |
| Eval diversity / fallback | GPT-5.5 / Gemini 2.x |
| Image-heavy product workloads | Gemini 2.x |

15.4 Vector DB selector

| Catalog Size | Choice |
| --- | --- |
| ≤ 1M SKUs | pgvector |
| 1M–10M SKUs | Turbopuffer |
| 10M+ SKUs · hard SLA | Pinecone |

16. Appendix B — Glossary

| Term | Definition |
| --- | --- |
| Agent | An LLM-driven process with a defined role, tools, and outputs |
| Orchestrator | The top-level agent that classifies intent, routes, audits, and aggregates |
| Specialist | A narrow agent owning a single domain (e.g., pricing, returns) |
| MCP (Model Context Protocol) | Anthropic-led open protocol for exposing tools to LLM agents |
| UCP (Universal Commerce Protocol) | Google-led protocol for discovery-through-purchase by agents |
| ACP (Agentic Commerce Protocol) | Stripe/OpenAI protocol for payment leg of agent transactions |
| Pipeline | Sequential agent pattern |
| Fan-Out | Commander + parallel-worker pattern |
| Specialist Team | Multiple specialists collaborating on a single deliverable |
| Dreaming | Scheduled background process that distills past sessions into memory |
| Outcomes | Rubric-based self-evaluation primitive |
| Audit ID | Unique identifier embedded in every mutating tool call for traceability |
| Compensation path | Documented reversal procedure for any mutating action |
| Dual-track eval | Offline regression + online A/B + 5% holdout, run simultaneously |
| Layer-cake guardrails | Eight independent enforcement layers from input sanitization to kill switch |

17. Appendix C — Source Index

17.1 Foundational documents

17.2 Retail deployment evidence

17.3 Integration & infrastructure

17.4 Risk, compliance, & security


Closing Note

This blueprint is QuanData & AI's commitment to building the agentic retail systems that the next decade of commerce will run on. It is anchored in what is shipping at Walmart, Amazon, Shopify, Mercado Libre, Sierra, and Decagon today — not what is being demoed at conferences.

Every pattern in this document is buildable as written. Every tool signature maps to a real API. Every guardrail threshold is concrete. Every phase reflects what a mid-size to enterprise retailer can actually staff.

The retailers that build this category in 2026 will pull away from the retailers that do not. QuanData & AI exists to make sure our clients are on the right side of that gap.

— Office of the Chief Architect QuanData & AI — May 2026