The Blueprint of QuanData & AI's Agentic Orchestration for the Retail Industry
- Classification: Internal — Strategic & Technical
- Document Type: Reference Architecture & Implementation Doctrine
- Version: 1.0 — May 2026
- Owners: QuanData & AI — Office of the Chief Architect
- Status: Authoritative; supersedes all prior agentic-retail technical notes
Cover Brief
This document defines, with executive clarity and engineering precision, how QuanData & AI will architect, deploy, and operate multi-agent orchestration systems for the retail industry through 2026 and beyond.
It is not a survey. It is a doctrine. Every pattern, every agent, every guardrail, every line of the build phasing has been selected because it is what the production-grade leaders of the agentic era — Anthropic, Walmart, Amazon, Shopify, Sierra, Decagon, Mercado Libre — are demonstrably running in market today.
The thesis is simple and the evidence is overwhelming:
A single agent is a single employee. A coordinated team of specialized agents is a new operational category. The retailers building this category in 2026 are pulling away from the retailers that are not.
Table of Contents
- Executive Synthesis
- Strategic Context — The Retail Agentic Landscape
- Theoretical Foundations — The Three Orchestration Patterns
- The QuanData Agent Topology
- The Specialist Agent Roster
- The Tool Catalog
- Reference Orchestration Flows
- State, Memory & The Dreaming Layer
- Outcomes — The Rubric-Driven Quality Loop
- Guardrail Architecture — Defense In Depth
- Integration & Infrastructure Stack
- Cost Economics
- The 12-Month Build Phasing
- Failure Mode Taxonomy
- Appendix A — Decision Matrices
- Appendix B — Glossary
- Appendix C — Source Index
1. Executive Synthesis
On May 6, 2026, at the Code with Claude event, Anthropic formalized multi-agent orchestration for Claude Managed Agents — up to 20 specialized agents running in parallel on a single task. This was not a research demo. It was the productization of an architecture Netflix, Harvey, Shopify, and Mercado Libre were already running at scale.
Across the retail industry, the verified outcomes are no longer ambiguous:
| Retailer | Agent | Verified Outcome |
|---|---|---|
| Amazon | Rufus | ~$12B incremental sales (2025); 300M+ users; +60% purchase likelihood |
| Walmart | Sparky | +35% AOV for engaged users; ~50% of app users engaged |
| Klarna | OpenAI Assistant | 67% automation; $40M profit lift (2024) — with a 2025 partial walk-back to human-hybrid |
| Mercado Libre | Pago Assistant | ~90% query containment without human handoff |
| Lowe's | Mylow Companion | Deployed to >1,700 stores for associate copilot |
| WeightWatchers (Sierra) | CS Agent | 70% containment week-1; >4.5/5 CSAT |
QuanData & AI's mandate is to deliver this category of outcome to our retail clients — through an architecture that is:
- Specialized — narrow agents, not generalist bots
- Parallel — fan-out where the workload allows
- Audited — every action traceable to a principal, a tool, and a decision
- Reversible — every mutation has a documented compensation path
- Governed — human-in-loop gates on every blast-radius action
- Self-improving — Dreaming layers that compound institutional knowledge across sessions
- Outcome-graded — rubric-driven self-evaluation before any output reaches a customer
The blueprint that follows is the canonical implementation of that mandate.
2. Strategic Context — The Retail Agentic Landscape
2.1 The architectural verdict from the market
Two architectural facts now dominate the retail agentic landscape:
- Consolidation beats sprawl. Walmart publicly admitted in mid-2025 that fragmented point-bots did not scale and consolidated to four "super agents" (Sparky for customer, Marty for sellers/advertisers/suppliers, an associate agent, and a developer agent) on a unified ML platform (Element) with MCP as the agent-to-agent protocol.
- Retailers must own the MCP layer. Shopify exposes three official MCP servers (Storefront, Customer Accounts, Dev), Adobe declared MCP the default agent protocol at Summit 2026, and Walmart's deliberately walled-off Sparky-in-Gemini integration showed why: when Walmart briefly piloted OpenAI's ChatGPT Instant Checkout, it converted 3× worse than click-out to walmart.com.
2.2 The competitive map
quadrantChart
title Retail Agent Maturity — May 2026
x-axis "Narrow Use Case" --> "Full Platform Strategy"
y-axis "Pilot / Internal" --> "Verified Production Outcomes"
quadrant-1 "Production Platform"
quadrant-2 "Production Point-Solution"
quadrant-3 "Pilot Point-Solution"
quadrant-4 "Pilot Platform"
Amazon Rufus: [0.85, 0.95]
Walmart Sparky+Element: [0.92, 0.88]
Shopify Sidekick/Magic: [0.88, 0.82]
Mercado Libre Pago: [0.45, 0.90]
Klarna OpenAI: [0.30, 0.55]
Lowe's Mylow: [0.40, 0.78]
Home Depot Magic Apron: [0.35, 0.65]
Target Store Companion: [0.25, 0.55]
Sephora Virtual Artist: [0.30, 0.60]
Best Buy Agentic: [0.25, 0.45]
Costco ML Forecast: [0.30, 0.85]
IKEA Billie: [0.20, 0.35]
2.3 The vendor stratification
| Tier | Vendors | Posture |
|---|---|---|
| Tier 1 — Platform Owners | Anthropic, OpenAI, Google, Microsoft | Define the substrate (LLMs, MCP, ACP, UCP) |
| Tier 1 — Hyperscale Retailers | Amazon, Walmart, Shopify, Alibaba | Build proprietary super-agents on top of substrate |
| Tier 2 — Vertical Agent Vendors | Sierra, Decagon, Ada, Salesforce Agentforce, Microsoft Copilot for Retail | Premium / white-glove agent platforms |
| Tier 2 — Commerce-Native Vendors | Mirakl, Klaviyo, Algolia, Constructor, Lily AI | Expose retail primitives as MCP-callable tools |
| Tier 3 — Adjacent Infrastructure | Riskified, Signifyd, Transcend, OneTrust | Fraud, privacy, governance — increasingly agent-aware |
2.4 The three lessons QuanData & AI has internalized
flowchart LR
A[Klarna 2025 Walk-Back] --> L1[Lesson 1<br/>Deflection without CSAT is a trap]
B[Walmart Checkout 3× Drop] --> L2[Lesson 2<br/>Never surrender the funnel to a 3rd-party agent runtime]
C[Walmart Element Consolidation] --> L3[Lesson 3<br/>Orchestrate from day one — point-bots do not scale]
L1 --> R[QuanData Doctrine]
L2 --> R
L3 --> R
R -->|enforces| D1[Every metric pairs with CSAT/quality]
R -->|enforces| D2[QuanData owns the MCP layer]
R -->|enforces| D3[Orchestrator + Specialist topology]
style R fill:#0b3d91,stroke:#fff,color:#fff
style D1 fill:#1f6feb,stroke:#fff,color:#fff
style D2 fill:#1f6feb,stroke:#fff,color:#fff
style D3 fill:#1f6feb,stroke:#fff,color:#fff
2.5 Regulatory perimeter
| Regulation | Surface | QuanData Posture |
|---|---|---|
| EU AI Act | Discovery & personalization = minimal-risk; behavioral manipulation = prohibited | Build minimal-risk baseline; require Article-aligned documentation for any inferred-attribute pricing |
| EU DG COMP | Active algorithmic pricing inquiries (July 2025); OECD report October 2025 | Surveillance-pricing prohibited by default; only category-level + cohort-level pricing permitted |
| FTC | Surveillance pricing scrutiny; deceptive-practices doctrine | Never price-discriminate by inferred income/race/health |
| GDPR / CCPA | Profile-based personalization requires lawful basis | Transcend Agentic Assist for DSARs; consent ledger consulted before PSA/MCA invocation |
| PCI-DSS | Card data scope explosion via agent-initiated payment | LLMs never see PAN/CVV; Stripe tokenization at the edge; agent only handles pm_xxx tokens |
| Agent-vs-Agent Fraud | Bot-driven returns, promo abuse, reseller arbitrage | Riskified + HUMAN-pattern defenses in production by Month 9 |
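As one illustration of the PCI-DSS row, the gateway's redaction step could mask card-like numbers before any text reaches a model. The sketch below is a hedged example, not the QuanData implementation: `passesLuhn` and `redactCardNumbers` are hypothetical names, and the approach (a digit-run pattern plus the Luhn checksum) complements edge tokenization rather than replacing it.

```typescript
// Hypothetical gateway-side redaction: mask anything that looks like a card
// number (13-19 digits, optionally separated by spaces or hyphens) and passes
// the Luhn checksum. Names are illustrative assumptions, not a QuanData API.

/** Returns true when the digit string satisfies the Luhn checksum. */
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  // Walk right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

/** Replaces Luhn-valid card-like numbers with a fixed placeholder. */
function redactCardNumbers(text: string): string {
  const candidate = /\b\d(?:[ -]?\d){12,18}\b/g;
  return text.replace(candidate, (match) => {
    const digits = match.replace(/[ -]/g, "");
    return passesLuhn(digits) ? "[REDACTED_PAN]" : match;
  });
}
```

Digit runs that fail the checksum (most order IDs, for example) pass through unchanged, which keeps false positives low at the cost of missing mistyped card numbers.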
3. Theoretical Foundations — The Three Orchestration Patterns
Three orchestration patterns have emerged as the production-validated set across Anthropic's Research system, Walmart's Element, Shopify's agentic platform, Harvey's legal stack, and Sierra's customer constellation. QuanData & AI uses all three, deployed by use-case fit, never as defaults.
3.1 Pattern A — The Pipeline
Agents run sequentially. Each step takes the prior step's structured output as input. Used when later steps strictly depend on earlier ones.
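A minimal sketch of the pipeline idea, assuming plain stub functions in place of agent calls. The `PipelineStep` type, the `chain` helper, and the record shapes are hypothetical and only illustrate how each step consumes the previous step's structured output.

```typescript
// Pipeline sketch: each step consumes the previous step's structured output.
// The steps are stub functions standing in for agent calls; all types and
// names are illustrative assumptions, not part of the QuanData catalog.

type PipelineStep<In, Out> = (input: In) => Out;

/** Composes two steps so the first step's output feeds the second. */
function chain<A, B, C>(
  first: PipelineStep<A, B>,
  second: PipelineStep<B, C>,
): PipelineStep<A, C> {
  return (input) => second(first(input));
}

interface OrderStatus { orderId: string; state: "in_transit" | "delivered" }
interface PolicyContext extends OrderStatus { returnWindowDays: number }

// Step 1: order-status inquiry (stubbed lookup).
const lookupStatus: PipelineStep<string, OrderStatus> = (orderId) => ({
  orderId,
  state: "in_transit",
});

// Step 2: policy lookup enriches the structured record.
const attachPolicy: PipelineStep<OrderStatus, PolicyContext> = (s) => ({
  ...s,
  returnWindowDays: 30, // placeholder value for illustration only
});

// Step 3: reply composition turns the record into customer-facing text.
const composeReply: PipelineStep<PolicyContext, string> = (p) =>
  `Order ${p.orderId} is ${p.state.replace("_", " ")}; returns accepted within ${p.returnWindowDays} days.`;

const answerOrderQuestion = chain(chain(lookupStatus, attachPolicy), composeReply);
```

Because every step has a typed input and output, a later step cannot run on anything but the earlier step's result, which is the strict dependency the pattern is meant for.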
flowchart LR
U[User / Trigger] --> A1[Agent 1<br/>Research]
A1 -->|structured output| A2[Agent 2<br/>Analysis]
A2 -->|structured output| A3[Agent 3<br/>Writing]
A3 -->|structured output| A4[Agent 4<br/>Review]
A4 --> O[Final Output]
style A1 fill:#0b3d91,stroke:#fff,color:#fff
style A2 fill:#0b3d91,stroke:#fff,color:#fff
style A3 fill:#0b3d91,stroke:#fff,color:#fff
style A4 fill:#0b3d91,stroke:#fff,color:#fff
Retail applications: order-status inquiry → policy lookup → reply composition; product-launch brief → copy → compliance → publish.
3.2 Pattern B — The Fan-Out (Commander / Workers)
A commander agent decomposes a task and dispatches subtasks to parallel workers, each in an isolated context window. Results are aggregated by the commander. This is the pattern Netflix uses for parallel build-log analysis and Anthropic uses in its Research feature.
flowchart TB
C[Commander Agent<br/>decomposes + synthesizes] --> W1[Worker 1<br/>document A]
C --> W2[Worker 2<br/>document B]
C --> W3[Worker 3<br/>document C]
C --> W4[Worker 4<br/>document D]
C --> W5[Worker 5<br/>document E]
W1 --> S[Synthesis Layer]
W2 --> S
W3 --> S
W4 --> S
W5 --> S
S --> O[Aggregated Output]
style C fill:#0b3d91,stroke:#fff,color:#fff
style S fill:#7c3aed,stroke:#fff,color:#fff
Retail applications: parallel competitor-price scraping across N retailers; parallel SKU-level demand forecasting for an entire category; parallel campaign-copy generation across 12 locales.
Critical engineering constraint (Anthropic, Multi-Agent Research System, June 2025): token usage explains ~80% of performance variance. The real value of fan-out is context compression, not just speed. Each worker compresses its findings before returning to the commander.
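The compression point above can be sketched as follows. This is a hedged illustration: `fanOut`, `WorkerFn`, and the character budget are hypothetical, and simple truncation stands in for whatever summarization a real worker would perform before returning to the commander.

```typescript
// Fan-out sketch: a commander dispatches one worker per item in parallel and
// each worker returns a compressed result rather than its full context.
// `WorkerFn`, `fanOut`, and the character budget are illustrative assumptions;
// truncation is a stand-in for real summarization.

type WorkerFn<Item> = (item: Item) => Promise<string>;

/** Cuts a worker's findings down to a fixed budget before they are returned. */
function compress(findings: string, maxChars: number): string {
  if (findings.length <= maxChars) return findings;
  return findings.slice(0, Math.max(0, maxChars - 3)) + "...";
}

/** Runs all workers concurrently and aggregates their compressed findings. */
async function fanOut<Item>(
  items: Item[],
  worker: WorkerFn<Item>,
  maxCharsPerWorker: number,
): Promise<string[]> {
  // Promise.all preserves input order regardless of completion order.
  return Promise.all(
    items.map(async (item) => compress(await worker(item), maxCharsPerWorker)),
  );
}

// Stub worker standing in for a competitor-price lookup.
const priceWorker: WorkerFn<string> = async (retailer) =>
  `${retailer}: lowest observed price 19.99 after scanning the full listing page`;
```

A call such as `fanOut(["retailer-a", "retailer-b"], priceWorker, 40)` returns one bounded string per retailer, so the commander's context grows with the number of workers rather than with the volume each worker read.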
3.3 Pattern C — The Specialist Team
Multiple agents with distinct specializations collaborate on a single complex deliverable. Each owns a domain. The orchestrator routes, the specialists execute, results are merged. This is the pattern Harvey uses for legal work and Sierra deploys with its "constellation of 15+ models."
flowchart TB
O[Orchestrator] --> CS[Customer Service Agent]
O --> PS[Personal Shopper Agent]
O --> IR[Inventory Agent]
O --> PP[Pricing Agent]
O --> MC[Marketing Content Agent]
O --> SC[Supply Chain Agent]
O --> AI[Analytics Agent]
O --> SA[Store Associate Copilot]
O --> FR[Fraud + Risk Agent]
CS -.shared state.-> M[(Conversation /<br/>Memory Layer)]
PS -.shared state.-> M
IR -.shared state.-> M
PP -.shared state.-> M
MC -.shared state.-> M
SC -.shared state.-> M
AI -.shared state.-> M
SA -.shared state.-> M
FR -.shared state.-> M
style O fill:#7c3aed,stroke:#fff,color:#fff
style M fill:#0b3d91,stroke:#fff,color:#fff
Retail applications: the QuanData reference roster is built on this pattern (see §4–§5).
3.4 The pattern-selection decision tree
flowchart TD
Q1{Does the task have<br/>strict step ordering?}
Q1 -- Yes --> Q2{Is it a single domain?}
Q1 -- No --> Q3{Same operation on<br/>many items?}
Q2 -- Yes --> P_Pipe[Pipeline<br/>e.g. order-status → policy → reply]
Q2 -- No --> P_Spec1[Specialist Team<br/>w/ ordered handoffs]
Q3 -- Yes --> P_Fan[Fan-Out<br/>e.g. SKU-level forecasts]
Q3 -- No --> Q4{Multiple distinct<br/>domains needed?}
Q4 -- Yes --> P_Spec2[Specialist Team]
Q4 -- No --> P_Single[Single Agent<br/>or function call]
style P_Pipe fill:#0b3d91,stroke:#fff,color:#fff
style P_Fan fill:#7c3aed,stroke:#fff,color:#fff
style P_Spec1 fill:#16a34a,stroke:#fff,color:#fff
style P_Spec2 fill:#16a34a,stroke:#fff,color:#fff
style P_Single fill:#6b7280,stroke:#fff,color:#fff
3.5 The Cognition Theorem — "Multi-agent reads, single-threaded writes"
The most important production lesson of the last twelve months is from Cognition's Don't Build Multi-Agents (June 2025) and the 2026 follow-up Multi-Agents: What's Actually Working:
Parallel multi-agent execution is correct for reads and intelligence work. Single-threaded execution is correct for writes.
QuanData applies this as an inviolable rule: any agent that mutates inventory, payment, fulfillment, or pricing executes in a single-threaded, durable workflow (Temporal / Step Functions). Read paths fan out; write paths serialize.
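A minimal in-process sketch of the rule, assuming a promise chain as the serialization mechanism. `SerialWriteQueue` and `readAll` are hypothetical names, and the in-memory queue is only a stand-in for a durable workflow engine such as the ones named above; it offers no durability or retries.

```typescript
// Sketch of "multi-agent reads, single-threaded writes". Reads run
// concurrently; writes are appended to one promise chain so they execute
// strictly one at a time, in submission order. This in-memory queue is an
// assumption of the sketch and provides no durability.

class SerialWriteQueue {
  private tail: Promise<unknown> = Promise.resolve();

  /** Enqueues a mutation; it starts only after every earlier one settles. */
  submit<T>(mutation: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => mutation());
    // Keep the chain usable even if this mutation rejects.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

/** Read path: independent lookups fan out in parallel. */
function readAll<T>(reads: Array<() => Promise<T>>): Promise<T[]> {
  return Promise.all(reads.map((read) => read()));
}
```

A failed mutation rejects only its own promise; later submissions still run, which mirrors the need for each write to carry its own compensation path.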
4. The QuanData Agent Topology
4.1 System-level topology
flowchart TB
subgraph CH[Channel Layer]
WEB[Web / Storefront]
APP[Mobile App]
POS[POS / In-Store]
SLK[Slack / Internal]
EML[Email / SMS]
end
subgraph OL[Orchestration Layer]
GW[API Gateway<br/>auth + rate-limit + redaction]
OR[Orchestrator Agent<br/>intent · routing · audit · approval]
end
subgraph SL[Specialist Layer]
CSA[Customer Service Agent]
PSA[Personal Shopper Agent]
IRA[Inventory + Replenishment Agent]
PPA[Pricing + Promotions Agent]
MCA[Marketing Content Agent]
SCLA[Supply Chain + Logistics Agent]
AIA[Analytics + Insights Agent]
SAC[Store Associate Copilot]
FRA[Fraud + Risk Agent]
end
subgraph TL[Tool Layer — MCP]
T_SHOP[Shopify MCP]
T_STRIPE[Stripe MCP]
T_KLAV[Klaviyo MCP]
T_ALG[Algolia MCP]
T_INT[Internal MCP<br/>ERP · WMS · CDP · POS]
end
subgraph DL[Data Layer]
OMS[(OMS / Commerce<br/>Shopify · SAP CC)]
ERP[(ERP / WMS<br/>SAP · NetSuite)]
CDP[(CDP<br/>RudderStack · Segment)]
VEC[(Vector Store<br/>pgvector · Turbopuffer)]
WH[(Warehouse<br/>Snowflake · BigQuery)]
AUD[(Audit Log<br/>append-only)]
DRM[(Dreaming Store<br/>distilled memory)]
end
CH --> GW
GW --> OR
OR --> CSA
OR --> PSA
OR --> IRA
OR --> PPA
OR --> MCA
OR --> SCLA
OR --> AIA
OR --> SAC
OR --> FRA
CSA & PSA & IRA & PPA & MCA & SCLA & AIA & SAC & FRA --> T_SHOP
CSA & PSA & IRA & PPA & MCA & SCLA & AIA & SAC & FRA --> T_STRIPE
MCA & PSA --> T_KLAV
PSA & SAC --> T_ALG
CSA & IRA & PPA & MCA & SCLA & SAC & FRA --> T_INT
T_SHOP --> OMS
T_STRIPE --> OMS
T_KLAV --> CDP
T_ALG --> VEC
T_INT --> ERP
T_INT --> CDP
T_INT --> WH
OR -.audit.-> AUD
OR -.nightly distill.-> DRM
DRM -.context inject.-> OR
style OR fill:#7c3aed,stroke:#fff,color:#fff
style GW fill:#0b3d91,stroke:#fff,color:#fff
style DRM fill:#dc2626,stroke:#fff,color:#fff
style AUD fill:#0b3d91,stroke:#fff,color:#fff
4.2 The control plane vs the data plane
| Plane | What it carries | Where it lives |
|---|---|---|
| Control | Intent, routing, approval status, audit IDs, agent identity | Orchestrator + Redis + Audit log |
| Data — Read | Catalog, inventory, customer profile, order history, embeddings | OMS / CDP / Vector store / Warehouse |
| Data — Write | New orders, refunds, returns, POs, price changes | OMS / ERP — always via durable workflow |
| Dreaming | Distilled summaries, learned merchant preferences, post-mortem patterns | Dreaming Store (Postgres + embeddings) |
5. The Specialist Agent Roster
QuanData's reference deployment runs one orchestrator + eight core specialists + an optional ninth specialist (Fraud & Risk), for ten agents in total.
5.1 The roster at a glance
flowchart LR
subgraph OR_BOX[Top-Level]
ORCH[Orchestrator / Router]
end
subgraph CF[Customer-Facing]
CSA[Customer Service]
PSA[Personal Shopper]
SAC[Store Associate Copilot]
end
subgraph MF[Merchant / Operator-Facing]
IRA[Inventory + Replenishment]
PPA[Pricing + Promotions]
MCA[Marketing Content]
SCLA[Supply Chain + Logistics]
AIA[Analytics + Insights]
end
subgraph GR[Governance]
FRA[Fraud + Risk]
end
ORCH --> CF
ORCH --> MF
ORCH --> GR
style ORCH fill:#7c3aed,stroke:#fff,color:#fff
style CF fill:#1f6feb,stroke:#fff,color:#fff
style MF fill:#16a34a,stroke:#fff,color:#fff
style GR fill:#dc2626,stroke:#fff,color:#fff
5.2 The capability matrix
| Agent | Primary Surface | Read Sources | Write Targets | Latency SLO p95 | Pattern |
|---|---|---|---|---|---|
| Orchestrator | All | All (meta) | Audit, Conversation State | 200ms (routing only) | Specialist Team |
| Customer Service | Web/App/Email/SMS | OMS, WMS, Returns, Policy KB | Returns, Refunds (gated), Tickets | 3s | Pipeline + Specialist |
| Personal Shopper | Web/App/Chat | Catalog, Embeddings, CDP, Inventory | Impressions log | 2s | Specialist + Fan-Out (multi-locale) |
| Inventory & Replenishment | Slack/Web (merchant) | ERP, WMS, Sales | Draft POs (gated), Replenishment signals | 10s (interactive) / batch | Fan-Out (per SKU) |
| Pricing & Promotions | Slack/Web (merchant) | Pricing rules, Competitor feed, Elasticity model | Price proposals, Promo drafts (gated) | 10s / batch | Specialist |
| Marketing Content | Slack/Web (merchant) | PIM, Brand voice, Past performance | Content drafts (gated), Campaign push | 30s (drafting) | Fan-Out (variants) |
| Supply Chain & Logistics | Slack/Web/App | TMS, Carrier APIs, Weather | Transfer orders, Claims, ETA updates | 5s | Pipeline |
| Analytics & Insights | Slack/Web (merchant) | Warehouse, Semantic layer | Saved queries, Alert rules | 5s (query) | Specialist |
| Store Associate Copilot | POS/Mobile | Store inventory, CDP | Reservations, Save-the-sale orders, Notes | 2s (hard SLA) | Pipeline |
| Fraud & Risk | Internal | Order risk feed, Return history | Account flags, Verification requests | 3s | Specialist |
5.3 Per-agent specifications (canonical definitions)
5.3.1 Orchestrator / Router
| Field | Specification |
|---|---|
| Scope | Single entry point. Classifies intent, decomposes multi-step tasks, dispatches to specialists, aggregates results, enforces guardrails. |
| Inputs | Raw user message, channel metadata, authenticated principal, conversation history pointer |
| Outputs | Final response + structured trace (run_id, sub-agent calls, tool calls, confidence, cost) |
| Tools | classify_intent, route_to_agent, request_human_approval, log_audit_event, get_conversation_state, write_conversation_state |
| Reads | Session store, customer profile snippet |
| Writes | Audit log, conversation state |
| Invoke | Always first; re-invoked at every turn; never bypassed, except by the Store Associate Copilot read path (§5.3.9), which audits post-hoc |
| Model | Claude Haiku 4.5 (intent) → Sonnet 4.6 (synthesis) |
5.3.2 Customer Service Agent (CSA)
| Field | Specification |
|---|---|
| Scope | Order status, returns/refunds, exchanges, shipping issues, complaints, policy Q&A |
| Tools | get_order, get_shipment_tracking, create_return, issue_refund, create_replacement_order, lookup_policy, escalate_to_human |
| Guardrails | Refund auto-approve ≤ $250 AND LTV decile ≥ 3 AND no abuse flags; otherwise → approval queue |
| KPIs | Deflection rate, first-contact resolution, refund accuracy, CSAT (paired) |
| Model | Sonnet 4.6 (Opus 4.7 for complex disputes) |
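The refund guardrail in the table above reduces to a three-way conjunction. The sketch below takes its thresholds from the Guardrails row; the `RefundRequest` shape and `decideRefund` name are assumptions made for illustration.

```typescript
// Sketch of the CSA refund gate. Thresholds come from the Guardrails row;
// the field names and the `RefundDecision` type are illustrative assumptions.

interface RefundRequest {
  amountUsd: number;
  ltvDecile: number;      // customer lifetime-value decile
  hasAbuseFlags: boolean; // any open abuse flag on the account
}

type RefundDecision = "auto_approve" | "approval_queue";

const REFUND_AUTO_LIMIT_USD = 250;
const MIN_LTV_DECILE = 3;

/** Auto-approves only when all three conditions hold; otherwise queues. */
function decideRefund(req: RefundRequest): RefundDecision {
  const eligible =
    req.amountUsd <= REFUND_AUTO_LIMIT_USD &&
    req.ltvDecile >= MIN_LTV_DECILE &&
    !req.hasAbuseFlags;
  return eligible ? "auto_approve" : "approval_queue";
}
```

Keeping the gate as a pure function makes it easy to unit-test and to log alongside the audit record for each refund decision.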
5.3.3 Personal Shopper Agent (PSA)
| Field | Specification |
|---|---|
| Scope | Product discovery, outfit/bundle building, gift assistant, post-purchase upsell |
| Tools | search_catalog, vector_search_products, get_customer_preferences, get_purchase_history, check_stock, apply_personalization_model, build_bundle |
| Guardrails | Consent-checked per locale (GDPR/CCPA); never recommends out-of-stock; respects MAP/brand floors |
| KPIs | Recall@10, NDCG@10, attach rate (A/B), novelty, diversity |
| Model | Sonnet 4.6 + internal ranker (SageMaker/Vertex) |
5.3.4 Inventory & Replenishment Agent (IRA)
| Field | Specification |
|---|---|
| Scope | Stock visibility, reorder point math, PO drafting, allocation |
| Tools | check_stock, get_sales_velocity, forecast_demand, compute_reorder_point, draft_purchase_order, submit_po_for_approval, list_open_pos |
| Guardrails | Never writes on-hand counts; any PO > $25k routes to human approval; auto-PO only for reliability-scored suppliers |
| KPIs | Forecast MAPE & WAPE, stockout rate, weeks-of-supply variance, PO acceptance rate |
| Pattern | Fan-Out (per-SKU forecasts in parallel) |
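This document does not define the formula behind `compute_reorder_point`. As a hedged illustration, the sketch below uses a common textbook formulation (lead-time demand plus a safety stock scaled by the standard-normal quantile of the service level). The `DemandStats` shape, the `reorderPoint` name, and the small quantile table are assumptions.

```typescript
// Textbook reorder-point sketch (an assumption; the document does not define
// the formula behind compute_reorder_point):
//   ROP = meanDailyDemand * leadTimeDays
//       + z * stdDevDailyDemand * sqrt(leadTimeDays)
// where z is the standard-normal quantile for the target service level.

// Standard-normal quantiles for a few common service levels.
const Z_BY_SERVICE_LEVEL = new Map<number, number>([
  [0.9, 1.2816],
  [0.95, 1.6449],
  [0.975, 1.96],
  [0.99, 2.3263],
]);

interface DemandStats {
  meanDailyDemand: number;
  stdDevDailyDemand: number;
  leadTimeDays: number;
}

/** Returns the reorder point in units, rounded up to a whole unit. */
function reorderPoint(stats: DemandStats, serviceLevel: number): number {
  const z = Z_BY_SERVICE_LEVEL.get(serviceLevel);
  if (z === undefined) {
    throw new Error(`No z-value tabulated for service level ${serviceLevel}`);
  }
  const cycleStock = stats.meanDailyDemand * stats.leadTimeDays;
  const safetyStock = z * stats.stdDevDailyDemand * Math.sqrt(stats.leadTimeDays);
  return Math.ceil(cycleStock + safetyStock);
}
```

For example, a SKU selling 40 units a day with a daily standard deviation of 10 and a 14-day lead time yields 622 units at a 0.95 service level: 560 units of lead-time demand plus roughly 61.5 units of safety stock, rounded up.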
5.3.5 Pricing & Promotions Agent (PPA)
| Field | Specification |
|---|---|
| Scope | Price recommendations, markdown cadence, promo design, competitive repricing |
| Tools | get_price_history, get_competitor_prices, simulate_price_change, propose_promotion, update_price, create_promo_code, schedule_promo |
| Guardrails | Δ > ±10% → human; below MAP → human; margin < 8% → human; promo > 30% → category manager; > 50% → director |
| Compliance | Surveillance-pricing prohibited; only category/cohort segmentation; EU/G7 algorithmic-pricing logging |
| KPIs | Margin lift, sell-through vs target, elasticity-model R², merchant approval rate |
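The PPA escalation tiers can be sketched as a function that returns the reviewers a proposal needs. Thresholds are taken from the Guardrails row; the `PriceProposal` shape, the reviewer labels, the margin definition (price minus unit cost, over price), and the choice to evaluate every rule rather than stop at the first are assumptions.

```typescript
// Sketch of the PPA guardrail routing. Thresholds come from the Guardrails
// row; the input shape, reviewer labels, and margin definition
// ((price - cost) / price) are illustrative assumptions.

interface PriceProposal {
  currentPrice: number;
  proposedPrice: number;
  unitCost: number;
  mapPrice?: number;         // minimum advertised price, if one applies
  promoDiscountPct?: number; // promo depth as a percentage
}

type Reviewer = "human" | "category_manager" | "director";

/** Returns the reviewers required before the proposal may be applied. */
function requiredReviewers(p: PriceProposal): Reviewer[] {
  const reviewers = new Set<Reviewer>();
  const deltaPct = (Math.abs(p.proposedPrice - p.currentPrice) / p.currentPrice) * 100;
  const marginPct = ((p.proposedPrice - p.unitCost) / p.proposedPrice) * 100;

  if (deltaPct > 10) reviewers.add("human");
  if (p.mapPrice !== undefined && p.proposedPrice < p.mapPrice) reviewers.add("human");
  if (marginPct < 8) reviewers.add("human");

  // Promo depth escalates to exactly one tier: the highest that applies.
  const promo = p.promoDiscountPct ?? 0;
  if (promo > 50) reviewers.add("director");
  else if (promo > 30) reviewers.add("category_manager");

  return Array.from(reviewers);
}
```

An empty array means the proposal clears every threshold in this sketch; whether such a change is then applied automatically is a policy decision outside the code.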
5.3.6 Marketing Content Agent (MCA)
| Field | Specification |
|---|---|
| Scope | Product descriptions, ad copy, email, SMS, push, landing-page hero (multilingual) |
| Tools | get_product_attributes, get_brand_voice, generate_copy, generate_image_prompt, check_compliance, submit_for_review, push_to_klaviyo, push_to_meta_ads, push_to_google_ads |
| Guardrails | Brand-voice score < 0.85 → forced review; compliance auto-checked (CAN-SPAM, GDPR, FTC claims) |
| KPIs | Brand-voice score, compliance pass rate, win-rate vs human in blind test, CTR uplift in live A/B |
| Pattern | Fan-Out (variants × channels × locales) |
5.3.7 Supply Chain & Logistics Agent (SCLA)
| Field | Specification |
|---|---|
| Scope | ETA prediction, carrier selection, exception handling, transfers, last-mile |
| Tools | get_shipment_tracking, predict_eta, select_carrier, create_transfer_order, file_carrier_claim, notify_customer_delay |
| KPIs | ETA MAE, on-time delivery, claim recovery rate |
5.3.8 Analytics & Insights Agent (AIA)
| Field | Specification |
|---|---|
| Scope | Conversational BI for merchants. NL → SQL → chart + narrative |
| Tools | nl_to_sql, run_query, get_anomaly_alerts, compute_attribution, generate_chart, save_dashboard |
| Guardrails | RLS-enforced queries; hallucinated-column rate target = 0; warehouse cost caps |
| KPIs | Execution accuracy, semantic equivalence, hallucinated-column rate |
5.3.9 Store Associate Copilot (SAC)
| Field | Specification |
|---|---|
| Scope | Mobile/POS companion: find-in-store, clienteling, endless aisle, in-store returns, task list |
| Tools | check_stock_nearby, reserve_item, lookup_customer, create_save_the_sale_order, get_clienteling_brief, print_label, get_task_list |
| Guardrails | Hard 2s p95 SLA; bypasses orchestrator for read path, audits post-hoc |
| KPIs | Time-to-answer, save-the-sale conversion, associate CSAT |
5.3.10 Fraud & Risk Agent (FRA) — optional 10th
| Field | Specification |
|---|---|
| Scope | Order risk scoring, return-abuse detection, promo-abuse detection, chargeback assist |
| Tools | score_order_risk, get_return_abuse_history, freeze_account, request_id_verification |
| KPIs | AUC, FPR @ recall = 0.9, chargeback $ saved, agent-vs-agent fraud caught |
6. The Tool Catalog
All tools return a structured envelope: {ok, data, error, audit_id}. Mutating tools embed the audit_id in the downstream system of record.
// ===== Orchestrator =====
classify_intent(text: string, context: SessionCtx): {intent: Intent, confidence: number}
route_to_agent(agent: AgentName, payload: object): AgentResult
request_human_approval(action: PendingAction, sla_minutes: number): ApprovalTicket
log_audit_event(event: AuditEvent): void
get_conversation_state(session_id: string): SessionState
write_conversation_state(session_id: string, patch: Partial<SessionState>): void
// ===== Customer Service =====
get_order(order_id: string): Order // Shopify / SAP CC / OMS
get_shipment_tracking(shipment_id: string): TrackingEvents // Shippo / EasyPost
create_return(order_id: string, lines: ReturnLine[], reason: string): RMA // Loop / Narvar / OMS
issue_refund(order_id: string, amount: Money, reason: string): RefundResult // Stripe / Adyen
create_replacement_order(original_order_id: string, lines: Line[]): Order
lookup_policy(topic: string, locale: string): PolicyExcerpt // Internal KB (vector)
escalate_to_human(ticket: TicketDraft): TicketId // Zendesk / Gorgias
// ===== Personal Shopper =====
search_catalog(query: string, filters: CatalogFilter): SKU[] // Algolia / Constructor
vector_search_products(embedding: number[] | string, k: number, filters?: CatalogFilter): SKU[]
get_customer_preferences(customer_id: string): Preferences // RudderStack / Segment
get_purchase_history(customer_id: string, limit?: number): Order[]
check_stock(sku: string, location_id?: string): StockLevel // WMS / ERP
apply_personalization_model(customer_id: string, candidates: string[]): ScoredSKU[]
build_bundle(seed_sku: string, slots: BundleSlot[]): Bundle
// ===== Inventory & Replenishment =====
get_sales_velocity(sku: string, window_days: number, location_id?: string): VelocityStats
forecast_demand(sku: string, horizon_days: number, location_id?: string): Forecast
compute_reorder_point(sku: string, service_level: number): ReorderPoint
draft_purchase_order(supplier_id: string, lines: POLine[]): PODraft // SAP / NetSuite
submit_po_for_approval(po_draft_id: string): ApprovalTicket
list_open_pos(supplier_id?: string): PO[]
// ===== Pricing & Promotions =====
get_price_history(sku: string, days: number): PricePoint[]
get_competitor_prices(sku: string, competitors?: string[]): CompetitorPrice[]
simulate_price_change(sku: string, new_price: Money): {expected_units, expected_margin, conf}
propose_promotion(scope: PromoScope, mechanic: PromoMechanic): PromoDraft
update_price(sku: string, new_price: Money, valid_from: ISODate, valid_to?: ISODate): void
create_promo_code(definition: PromoDef): PromoCode // Shopify / Talon.One
schedule_promo(promo_id: string, start: ISODate, end: ISODate): void
// ===== Marketing Content =====
get_product_attributes(sku: string): PIMRecord // Akeneo / Salsify
get_brand_voice(brand_id: string): BrandVoiceProfile
generate_copy(brief: CopyBrief): CopyVariant[]
generate_image_prompt(sku: string, scene: string): ImagePrompt
check_compliance(text: string, locale: string, category: string): ComplianceReport
submit_for_review(asset_id: string, reviewers: string[]): ReviewTicket
push_to_klaviyo(campaign: KlaviyoCampaign): CampaignId // Klaviyo MCP
push_to_meta_ads(creative: MetaCreative, adset_id: string): AdId
push_to_google_ads(asset: GoogleAsset, group_id: string): AdId
// ===== Supply Chain & Logistics =====
predict_eta(shipment_id: string): ETA
select_carrier(origin: Address, dest: Address, weight: number, sla: SLA): CarrierQuote[]
create_transfer_order(from_loc: string, to_loc: string, lines: Line[]): TransferOrder
file_carrier_claim(shipment_id: string, reason: string, amount: Money): ClaimId
notify_customer_delay(order_id: string, new_eta: ISODate, reason: string): void
// ===== Analytics & Insights =====
nl_to_sql(question: string, schema_scope: string[]): {sql: string, confidence: number}
run_query(sql: string, role: Role): QueryResult // Snowflake / BigQuery (RLS)
get_anomaly_alerts(scope: AnomalyScope): Anomaly[]
compute_attribution(window_days: number, model: AttrModel): AttributionTable
generate_chart(data: QueryResult, hint?: ChartHint): VegaSpec
save_dashboard(spec: DashboardSpec): DashboardId
// ===== Store Associate Copilot =====
check_stock_nearby(sku: string, store_id: string, radius_km: number): NearbyStock[]
reserve_item(sku: string, store_id: string, customer_id: string, hours: number): ReservationId
lookup_customer(query: string): CustomerSummary
create_save_the_sale_order(customer_id: string, lines: Line[], ship_to: Address): Order
get_clienteling_brief(customer_id: string): ClientelingCard
print_label(printer_id: string, payload: LabelPayload): void
get_task_list(associate_id: string, store_id: string): Task[]
// ===== Fraud & Risk =====
score_order_risk(order_id: string): RiskScore // Signifyd / Forter / Sift
get_return_abuse_history(customer_id: string): AbuseSignals
freeze_account(customer_id: string, reason: string): void
request_id_verification(customer_id: string): VerificationLink // Persona / Stripe Identity
Total: 64 tools across 9 specialists + orchestrator.
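The structured envelope described at the start of this section could be typed as a discriminated union, so callers must check `ok` before reading `data`. This is a sketch under assumptions: the field types, the `callTool` wrapper, and the placeholder audit-id format are hypothetical.

```typescript
// Sketch of the {ok, data, error, audit_id} envelope as a discriminated union,
// plus a wrapper that turns a throwing tool body into an envelope.
// Field types and the audit-id format are illustrative assumptions.

type ToolEnvelope<T> =
  | { ok: true; data: T; error: null; audit_id: string }
  | { ok: false; data: null; error: string; audit_id: string };

let auditCounter = 0;

/** Placeholder id generator; a real system would use its audit-log ids. */
function nextAuditId(): string {
  auditCounter += 1;
  return `aud_${auditCounter}`;
}

/** Runs a tool body and always returns an envelope, never a thrown error. */
function callTool<T>(body: () => T): ToolEnvelope<T> {
  const audit_id = nextAuditId();
  try {
    return { ok: true, data: body(), error: null, audit_id };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, data: null, error: message, audit_id };
  }
}

/** Callers narrow on `ok` before touching `data`. */
function describeResult(result: ToolEnvelope<number>): string {
  return result.ok
    ? `stock=${result.data} (${result.audit_id})`
    : `failed: ${result.error} (${result.audit_id})`;
}
```

Every call receives an `audit_id` whether it succeeds or fails, which matches the requirement that each action be traceable.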
7. Reference Orchestration Flows
These five flows are canonical. They are the patterns every QuanData retail deployment is benchmarked against.
7.1 Flow A — "Where's my order, and can I return one item?"
sequenceDiagram
autonumber
participant U as Customer
participant O as Orchestrator
participant CSA as Customer Service Agent
participant OMS as OMS / Carrier
participant POL as Policy KB
participant AUD as Audit Log
U->>O: "Where's my order, and can I return one item?"
O->>O: classify_intent → {order_status, return}
par Status lookup
O->>CSA: order_status sub-task
CSA->>OMS: get_order(order_id)
CSA->>OMS: get_shipment_tracking(shipment_id)
OMS-->>CSA: "Out for delivery, ETA 4pm"
and Return setup
O->>CSA: return sub-task
CSA->>POL: lookup_policy("return_window")
CSA->>OMS: create_return(order_id, line_2, "size_too_small")
OMS-->>CSA: RMA-9981, refund $79
end
Note over CSA: Guardrail: $79 < $250 auto-threshold
CSA-->>O: Combined result
O->>AUD: log_audit_event(...)
O-->>U: Tracking + RMA + QR label link
Latency budget: 3s p95. Both CSA sub-calls execute in parallel.
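Flow A can be sketched with two concurrent sub-tasks and a merge step. The stubs below return the values shown in the diagram; the result shapes and function names are assumptions, and the guardrail is simplified to the dollar threshold noted in the diagram.

```typescript
// Sketch of Flow A: two CSA sub-tasks run concurrently and the orchestrator
// merges them into one reply. Stubbed tools return the diagram's values;
// shapes and function names are illustrative assumptions.

interface StatusResult { summary: string }
interface ReturnResult { rmaId: string; refundUsd: number; autoApproved: boolean }

const AUTO_REFUND_LIMIT_USD = 250;

// Stub for get_order + get_shipment_tracking.
async function statusSubTask(_orderId: string): Promise<StatusResult> {
  return { summary: "Out for delivery, ETA 4pm" };
}

// Stub for lookup_policy + create_return, followed by the dollar-threshold
// part of the refund guardrail (the other conditions are omitted here).
async function returnSubTask(_orderId: string, _line: number): Promise<ReturnResult> {
  const rma = { rmaId: "RMA-9981", refundUsd: 79 };
  return { ...rma, autoApproved: rma.refundUsd <= AUTO_REFUND_LIMIT_USD };
}

/** Orchestrator step: run both sub-tasks in parallel, then compose one reply. */
async function handleOrderAndReturn(orderId: string, line: number): Promise<string> {
  const [status, ret] = await Promise.all([
    statusSubTask(orderId),
    returnSubTask(orderId, line),
  ]);
  const refundNote = ret.autoApproved
    ? `refund of $${ret.refundUsd} auto-approved`
    : `refund of $${ret.refundUsd} pending approval`;
  return `${status.summary}. Return ${ret.rmaId} created; ${refundNote}.`;
}
```

With both sub-tasks awaited through `Promise.all`, the end-to-end latency is bounded by the slower of the two rather than by their sum.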
7.2 Flow B — Reorder a fast-moving SKU before stockout
sequenceDiagram
autonumber
participant AIA as Analytics Agent
participant O as Orchestrator
participant IRA as Inventory Agent
participant ERP as ERP / WMS
participant BUY as Buyer (Slack)
participant SCLA as Supply Chain Agent
Note over AIA: 06:00 scheduled anomaly scan
AIA->>AIA: get_anomaly_alerts({type:"stockout_risk"})
AIA-->>O: SKU-4421, days-of-cover=4, lead=14
O->>IRA: replenish(SKU-4421)
par Forecast & reorder math
IRA->>IRA: forecast_demand(SKU-4421, 30)
IRA->>IRA: compute_reorder_point(SKU-4421, 0.95)
end
IRA->>ERP: draft_purchase_order(SUP-12, [{SKU-4421, qty:1200}])
Note over IRA: Guardrail: PO $48k > $25k → human
IRA->>BUY: submit_po_for_approval (Slack card)
BUY-->>IRA: ✅ Approve
IRA->>ERP: submit PO
IRA->>SCLA: register ASN callback for ETA visibility
7.3 Flow C — Flash promo on overstock category
sequenceDiagram
autonumber
participant M as Merchant
participant O as Orchestrator
participant AIA as Analytics
participant PPA as Pricing Agent
participant MCA as Marketing Content
participant REV as Brand Lead
M->>O: "We're long on summer outerwear. Move it."
O->>AIA: identify overstock
AIA->>AIA: run_query(category='outerwear-summer', WoS>20)
AIA-->>O: 47 SKUs
O->>PPA: design_promo(skus, target=60%_in_14d)
loop top-10 SKUs
PPA->>PPA: simulate_price_change @ {15%, 25%, 35%}
end
PPA->>PPA: propose_promotion → "SUMMER25", 14d
Note over PPA: Guardrail: 3 SKUs margin<10% — flagged
PPA-->>M: Proposal (3 flagged)
M-->>PPA: ✅ Approve with exclusions
PPA->>PPA: create_promo_code + schedule_promo
O->>MCA: generate_campaign(promo, [email, meta, web])
par Variant generation
MCA->>MCA: get_product_attributes (batch)
MCA->>MCA: get_brand_voice
MCA->>MCA: generate_copy (5 variants/channel)
end
MCA->>MCA: check_compliance ✓
MCA->>REV: submit_for_review (4h SLA)
REV-->>MCA: ✅
MCA->>MCA: push_to_klaviyo + push_to_meta_ads
O->>AIA: register sell-through monitor
7.4 Flow D — Next week's lapsed-customer email
sequenceDiagram
autonumber
participant CRON as Calendar
participant O as Orchestrator
participant MCA as Marketing Content
participant AIA as Analytics
participant PSA as Personal Shopper
participant REV as Marketing Lead
participant KLAV as Klaviyo
CRON->>O: weekly_lapsed_campaign
O->>MCA: kick off
MCA->>AIA: query_segment(last_purchase 90-180d, ltv≥6)
AIA-->>MCA: 38,400 customers, by affinity
par Segment picks (Fan-Out)
MCA->>PSA: per_segment_picks(seg=A, k=6)
MCA->>PSA: per_segment_picks(seg=B, k=6)
MCA->>PSA: per_segment_picks(seg=C, k=6)
end
MCA->>MCA: generate_copy (3 subjects × 3 segments)
MCA->>MCA: check_compliance (CAN-SPAM, GDPR)
Note over MCA: brand_score=0.91, compliance=pass
MCA->>REV: auto-approval candidate
REV-->>MCA: ✅
MCA->>KLAV: push_to_klaviyo (A/B/C, 10% holdout)
O->>AIA: schedule 7d post-send eval
7.5 Flow E — Associate finds an item across nearby stores
sequenceDiagram
autonumber
participant A as Associate
participant SAC as Store Associate Copilot
participant WMS as Inventory
participant CDP as Customer Profile
participant OMS as OMS
participant SCLA as Supply Chain
participant O as Orchestrator (async audit)
A->>SAC: scan barcode (sub-2s required)
SAC->>WMS: check_stock(sku, this_store)
WMS-->>SAC: 0
SAC->>WMS: check_stock_nearby(sku, store, 25km)
WMS-->>SAC: 3 stores (1, 2, 4 units)
SAC->>CDP: lookup_customer(phone)
CDP-->>SAC: customer + prefs
SAC-->>A: options: reserve / SFS / STH
A->>SAC: ship-from-store
par
SAC->>OMS: create_save_the_sale_order
and
SAC->>WMS: reserve_item (auto-release 4h)
end
SCLA-->>SCLA: async: carrier + label + customer notify
SAC-->>O: post-hoc audit + clienteling note
8. State, Memory & The Dreaming Layer
8.1 The memory layer cake
flowchart TB
subgraph EPH[Ephemeral Layer]
CS[Conversation State<br/>Redis · 24h sliding]
ST[Short-term Episodic<br/>last 20 turns]
end
subgraph WARM[Warm Layer]
CP[Customer Profile<br/>Postgres + CDP]
CAT[Catalog Embeddings<br/>pgvector / Turbopuffer]
POL[Policy / KB Embeddings<br/>pgvector namespaced]
end
subgraph COLD[Cold / System-of-Record]
OMS[(OMS · Orders · Returns)]
WH[(Warehouse · Snowflake / BigQuery)]
AUD[(Audit Log · 7yr retention)]
end
subgraph DREAM[Dreaming Layer]
AM[Agent Memory<br/>Postgres + embeddings<br/>summaries · preferences · incidents]
DSC[Dream Scheduler<br/>nightly background process]
end
CS --> ST
ST --> CP
CP --> CAT
CAT --> POL
POL --> OMS
OMS --> WH
WH --> AUD
AUD -.feed.-> DSC
DSC -->|distill| AM
AM -->|inject context| CS
style DREAM fill:#dc2626,stroke:#fff,color:#fff
style EPH fill:#0b3d91,stroke:#fff,color:#fff
style WARM fill:#1f6feb,stroke:#fff,color:#fff
style COLD fill:#374151,stroke:#fff,color:#fff
8.2 The Dreaming process
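QuanData's Dreaming layer follows Anthropic's Managed Agents Dreaming primitive (announced May 6, 2026): a scheduled background process that reviews past sessions, extracts patterns, identifies recurring mistakes, and curates each agent's memory stores. The diagram shows two separate lifecycles: the per-request agent loop and the nightly Dreaming run.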
stateDiagram-v2
[*] --> Idle
Idle --> Invoked: user message / trigger
Invoked --> ToolCalling: classify + plan
ToolCalling --> Reflecting: tool returns
Reflecting --> ToolCalling: needs more
Reflecting --> Outcome_Grading: draft ready
Outcome_Grading --> ToolCalling: rubric fail
Outcome_Grading --> Responding: rubric pass
Responding --> [*]
[*] --> Dreaming: nightly cron
Dreaming --> Distilling: read audit + traces
Distilling --> Updating_Memory: extract summaries / preferences / incidents
Updating_Memory --> [*]
Harvey reported a ~6× completion-rate lift from enabling Dreaming on their legal agents, without a model change. The lift is purely from agents carrying institutional knowledge across sessions.
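The two lifecycles in the state diagram can be read as a transition table. The sketch below is one possible encoding, not Anthropic's API: the state names come from the diagram, while the event names, the `Start`/`End` markers, and the `step` and `run` helpers are assumptions introduced for illustration.

```python
# Hypothetical encoding of the state diagram as a transition table.
# State names follow the diagram; event names are illustrative assumptions.
TRANSITIONS = {
    # Per-request agent loop
    ("Idle", "trigger"): "Invoked",
    ("Invoked", "plan"): "ToolCalling",
    ("ToolCalling", "tool_returns"): "Reflecting",
    ("Reflecting", "needs_more"): "ToolCalling",
    ("Reflecting", "draft_ready"): "Outcome_Grading",
    ("Outcome_Grading", "rubric_fail"): "ToolCalling",
    ("Outcome_Grading", "rubric_pass"): "Responding",
    ("Responding", "done"): "End",
    # Nightly Dreaming run, separate from the request path
    ("Start", "nightly_cron"): "Dreaming",
    ("Dreaming", "read_audit"): "Distilling",
    ("Distilling", "extract"): "Updating_Memory",
    ("Updating_Memory", "done"): "End",
}


def step(state: str, event: str) -> str:
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}") from None


def run(start: str, events: list) -> str:
    """Replay a sequence of events and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Rejecting edges that are not in the table makes it harder for a request-path agent to write long-term memory directly; in this reading, only the Dreaming run reaches `Updating_Memory`.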
8.3 Canonical schemas
session_state {
session_id PK, principal_id, principal_type,
current_intent, pending_actions JSONB,
agent_scratchpad JSONB, last_tool_calls JSONB[],
updated_at
}
agent_memory {
id PK, scope_type ENUM('customer','merchant','sku','store'),
scope_id, kind ENUM('summary','preference','incident','note'),
content TEXT, embedding VECTOR(1536),
source_run_id, confidence FLOAT,
created_at, expires_at NULL, version INT
}
audit_event {
id PK, run_id, parent_run_id, principal_id,
agent, tool, input_hash, output_hash,
approval_id NULL, decision, latency_ms, cost_usd,
pii_redacted BOOL, created_at
}
approval {
id PK, action_type, payload JSONB, threshold_breached TEXT,
requested_by_agent, approver_pool TEXT[],
status ENUM('pending','approved','rejected','expired'),
decided_by, decided_at, sla_minutes
}
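As a rough illustration of how two of these schemas might surface in application code, the sketch below models an `agent_memory` row as a validated dataclass and shows one way to derive the `input_hash` / `output_hash` values for `audit_event`. The class name, the validation rules, and the choice of SHA-256 over canonical JSON are assumptions, and the `embedding` column is omitted for brevity.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Enum values copied from the agent_memory schema above.
SCOPE_TYPES = {"customer", "merchant", "sku", "store"}
MEMORY_KINDS = {"summary", "preference", "incident", "note"}


@dataclass(frozen=True)
class AgentMemory:
    scope_type: str
    scope_id: str
    kind: str
    content: str
    source_run_id: str
    confidence: float
    version: int = 1
    expires_at: Optional[datetime] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Reject rows that would violate the schema's ENUM constraints.
        if self.scope_type not in SCOPE_TYPES:
            raise ValueError(f"unknown scope_type: {self.scope_type}")
        if self.kind not in MEMORY_KINDS:
            raise ValueError(f"unknown kind: {self.kind}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")


def payload_hash(payload: dict) -> str:
    """SHA-256 over canonical JSON, so equal payloads always hash equally."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing hashes rather than raw tool inputs keeps the audit log comparable across runs without duplicating payloads that may contain sensitive fields.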
8.4 Memory engineering principles
- Summaries, not transcripts. Long-term memory stores distilled rows, never raw chat history.
- PII redacted pre-write. A deterministic redactor runs before any memory row is persisted.
- Versioned. Every memory row carries a version so contradictions can be resolved.
- Source-traceable. Every memory row links to a source_run_id in the audit log.
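The principles above can be combined in a single write path: redact first, then append a new versioned row that carries its source run. The following is a minimal sketch under stated assumptions. The regular expressions are illustrative only and far from a complete PII rule set, and the in-memory `store` dictionary stands in for the real `agent_memory` table.

```python
import re

# Illustrative patterns only (assumptions). A production redactor needs a
# vetted, locale-aware rule set and should be tested against real traffic.
_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PAN", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),  # 13 to 19 digits
]


def redact(text: str) -> str:
    """Deterministically replace each matched pattern with a label token."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


def write_memory(store: dict, key: tuple, content: str, source_run_id: str) -> dict:
    """Append a redacted row; the version increments per key, nothing is overwritten."""
    history = store.setdefault(key, [])
    row = {
        "content": redact(content),      # PII redacted pre-write
        "version": len(history) + 1,     # versioned
        "source_run_id": source_run_id,  # source-traceable
    }
    history.append(row)
    return row


def current(store: dict, key: tuple) -> dict:
    """The highest version wins when rows contradict each other."""
    return store[key][-1]
```

Keeping older versions instead of overwriting them means a contradiction can be traced back to the run that introduced it.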
9. Outcomes — The Rubric-Driven Quality Loop
Per Anthropic's Outcomes primitive (May 6, 2026): every output is graded against a defined rubric before reaching the customer. QuanData operationalizes this as a per-agent rubric set.
9.1 The grading loop
flowchart LR
A[Agent generates output] --> B{Rubric grader<br/>self-evaluation}
B -- pass --> C[Release to customer]
B -- fail · iter < N --> A
B -- fail · iter ≥ N --> H[Escalate to human]
style B fill:#7c3aed,stroke:#fff,color:#fff
style H fill:#dc2626,stroke:#fff,color:#fff
9.2 Per-agent rubric examples
| Agent | Rubric Dimensions | Auto-Pass Threshold |
|---|---|---|
| CSA | Empathy score · policy correctness · resolution completeness · CSAT proxy | ≥0.85 on all 4 |
| PSA | Relevance · constraint adherence (budget, size) · diversity · in-stock filter | 100% constraint + ≥0.80 relevance |
| MCA | Brand voice score · compliance pass · variant diversity · CTA presence | ≥0.85 brand + 100% compliance |
| PPA | Margin floor respected · MAP respected · elasticity-confidence ≥ 0.7 | 100% guardrails |
| AIA | Hallucinated-column rate · semantic-equivalence · row-count sanity | 0 hallucinated columns |
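The grading loop in 9.1 and the thresholds in the table above fit together roughly as follows. This is a sketch, not the Outcomes primitive itself: `generate` and `grade` are hypothetical callables supplied by the caller, the threshold dictionary mirrors the MCA row, and `max_iters` plays the role of N in the diagram.

```python
from typing import Callable, Dict, Optional

# Thresholds taken from the MCA row; the dictionary layout is an assumption.
MCA_THRESHOLDS = {"brand_voice": 0.85, "compliance": 1.0}


def passes(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Every rubric dimension must meet its minimum; missing scores count as 0."""
    return all(scores.get(dim, 0.0) >= minimum for dim, minimum in thresholds.items())


def grade_loop(
    generate: Callable[[Optional[Dict[str, float]]], str],
    grade: Callable[[str], Dict[str, float]],
    thresholds: Dict[str, float],
    max_iters: int = 3,
) -> dict:
    """Regenerate until the rubric passes; escalate to a human after max_iters failures."""
    feedback = None
    for attempt in range(1, max_iters + 1):
        draft = generate(feedback)
        scores = grade(draft)
        if passes(scores, thresholds):
            return {"status": "released", "output": draft, "attempts": attempt}
        feedback = scores  # failed scores are fed back into the next attempt
    return {"status": "escalated", "output": None, "attempts": max_iters}
```

Returning no output on escalation is a deliberate choice in this sketch: a draft that failed the rubric N times never reaches the customer by accident.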
9.3 The dual-track evaluation harness
flowchart TB
subgraph OFF[Offline Track]
GD[Golden Datasets<br/>per agent]
REG[Nightly Regression]
ALERT[2σ Drop → page]
end
subgraph ON[Online Track]
AB[5% No-Agent Holdout]
VAR[Variant A/B/C between agent versions]
JUD[LLM-as-judge + 50/wk human spot-check]
end
subgraph RED[Red Team]
INJ[Prompt-injection probes]
EXF[PII exfiltration]
JBR[Jailbreak — refund / price tools]
end
GD --> REG --> ALERT
AB --> VAR --> JUD
INJ --> ALERT
EXF --> ALERT
JBR --> ALERT
10. Guardrail Architecture — Defense In Depth
The most likely failure mode of any agentic retail system is not a model defect. It is a guardrail gap. QuanData architects guardrails as a layer cake — no single layer is sufficient.
10.1 The guardrail layer cake
flowchart TB
L1[Layer 1 — Input Sanitization<br/>PII tokenization · prompt-injection classifier · channel auth]
L2[Layer 2 — Intent Classification + Routing<br/>only registered intents reach specialists]
L3[Layer 3 — Tool Allow-Lists<br/>each agent restricted to its tool set]
L4[Layer 4 — Tool-Side Guardrails<br/>refund cap · MAP enforce · margin floor · inventory write block]
L5[Layer 5 — Human-In-Loop Approvals<br/>refunds >$250 · POs >$25k · promos >30% · price Δ ±10%]
L6[Layer 6 — Outcome Rubric Grader<br/>self-evaluation before release]
L7[Layer 7 — Audit + Reversibility<br/>every mutation has audit_id + compensation path]
L8[Layer 8 — Kill Switches<br/>per-agent flag + global pause]
L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8
style L1 fill:#0b3d91,stroke:#fff,color:#fff
style L2 fill:#1f6feb,stroke:#fff,color:#fff
style L3 fill:#1f6feb,stroke:#fff,color:#fff
style L4 fill:#16a34a,stroke:#fff,color:#fff
style L5 fill:#16a34a,stroke:#fff,color:#fff
style L6 fill:#7c3aed,stroke:#fff,color:#fff
style L7 fill:#7c3aed,stroke:#fff,color:#fff
style L8 fill:#dc2626,stroke:#fff,color:#fff
10.2 The retail-specific guardrail register
| Guardrail | Threshold / Rule | Mechanism |
|---|---|---|
| Refund auto-approve | ≤ $250 AND LTV decile ≥ 3 AND no abuse flag | CSA tool wrapper |
| Price change | Δ beyond ±10% OR price < MAP OR margin < 8% → human | PPA update_price pre-write |
| Promo depth | > 30% → category mgr; > 50% → director | PPA proposal stage |
| Inventory write | Agents cannot mutate on-hand directly; only transfer/PO/reservation | Tool layer enforces |
| Refund total per session | Cumulative > $1,000 → block | Orchestrator counter |
| PII handling | PAN, CVV, full SSN never reach LLM; tokenize at ingest | Pre-LLM redactor |
| Brand voice | Score < 0.85 → forced review | MCA check_compliance |
| Claims compliance | No "cures"/"clinically proven" unless category-approved | MCA per-category rules |
| Locale & regulation | EU: GDPR consent; CA: CCPA; alcohol/tobacco: age gate | Profile flags checked |
| Audit trail | Every tool call logged; mutations require audit_id | Audit middleware |
| Rate limits | Per-customer 60 turns/hr; per-merchant 1k tool-calls/min; per-agent circuit breaker @ 5% errors/60s | Gateway |
| Reversibility | All mutating actions must have undo_* or compensation | Tool registry metadata |
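Several rows of the register reduce to small pure functions that a tool wrapper can call before any write. The sketch below covers the refund, price-change, and promo-depth rows. The function names and return labels are hypothetical, margin is assumed to mean gross margin on the new price, and the session cap is assumed to include the refund being requested.

```python
from dataclasses import dataclass


@dataclass
class RefundRequest:
    amount_usd: float
    ltv_decile: int
    abuse_flag: bool
    session_refund_total_usd: float  # refunds already issued in this session


def refund_decision(req: RefundRequest) -> str:
    # Session cap first: cumulative refunds above $1,000 are blocked outright.
    if req.session_refund_total_usd + req.amount_usd > 1000:
        return "block"
    if req.amount_usd <= 250 and req.ltv_decile >= 3 and not req.abuse_flag:
        return "auto_approve"
    return "human_review"


def price_change_decision(old_price: float, new_price: float,
                          map_price: float, unit_cost: float) -> str:
    delta = (new_price - old_price) / old_price
    margin = (new_price - unit_cost) / new_price  # assumed definition of margin
    if abs(delta) > 0.10 or new_price < map_price or margin < 0.08:
        return "human_review"
    return "auto_apply"


def promo_approver(depth_pct: float) -> str:
    """Route a promotion to the approver tier required by its discount depth."""
    if depth_pct > 50:
        return "director"
    if depth_pct > 30:
        return "category_manager"
    return "none"
```

Because these checks are deterministic and sit in the tool layer, they hold even when the model has been manipulated upstream.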
10.3 Prompt injection — the OWASP LLM #1
Retail has the highest published prompt-injection vulnerability rate at 40% (2026 data). QuanData's defense is layered:
flowchart LR
IN[Untrusted Input<br/>customer msg · review · scraped page · return reason] --> PG[PromptGuard 2<br/>~50ms classifier]
PG -- benign --> LG[LlamaGuard 3 — 8B<br/>hazard classification]
PG -- suspected --> Q[Quarantine + Manual Review]
LG -- safe --> NEM[NeMo Guardrails<br/>orchestration]
LG -- hazard --> Q
NEM --> A[Trusted Agent Loop]
style Q fill:#dc2626,stroke:#fff,color:#fff
style PG fill:#7c3aed,stroke:#fff,color:#fff
style LG fill:#7c3aed,stroke:#fff,color:#fff
Published benchmarks: layered defense reduces attack success from ~73% to ~9%. Even so, ~35% of adversarial evals still leak; human-in-loop is therefore required on every high-blast-radius action.
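The routing logic of the layered defense is simple even though the classifiers are not. The sketch below shows only the control flow: untrusted text passes through an ordered list of classifiers and is quarantined at the first one that flags it. The classifiers are injected as plain callables. The two keyword checks are toy stand-ins for demonstration and do not reflect how PromptGuard or LlamaGuard work; the NeMo Guardrails step is not modeled.

```python
from typing import Callable, Iterable, Tuple

Classifier = Callable[[str], bool]  # returns True when the text looks unsafe


def screen(text: str, layers: Iterable[Tuple[str, Classifier]]) -> dict:
    """Run classifiers in order; quarantine at the first layer that flags the text."""
    for name, is_unsafe in layers:
        if is_unsafe(text):
            return {"route": "quarantine", "flagged_by": name}
    return {"route": "agent_loop", "flagged_by": None}


# Toy stand-ins (assumptions). Real deployments would call the actual models.
def toy_injection_check(text: str) -> bool:
    return "ignore previous instructions" in text.lower()


def toy_hazard_check(text: str) -> bool:
    return "card number" in text.lower()


LAYERS = [
    ("injection_classifier", toy_injection_check),
    ("hazard_classifier", toy_hazard_check),
]
```

Recording which layer flagged the input gives the manual-review queue a starting point and makes per-layer false-positive rates measurable.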
11. Integration & Infrastructure Stack
11.1 The reference stack — opinionated
flowchart TB
subgraph FE[Front End]
NX[Next.js · Vercel]
EXP[Mobile · Expo]
end
subgraph AR[Agent Runtime]
MAS[Mastra TS<br/>chat-heavy]
LG[LangGraph Python<br/>stateful + durable]
TEMP[Temporal<br/>durable writes]
SDK[Anthropic Agent SDK<br/>Claude-first builds]
end
subgraph MO[Models]
HK[Claude Haiku 4.5<br/>routing]
SO[Claude Sonnet 4.6<br/>main loop]
OP[Claude Opus 4.7<br/>escalation]
FB[GPT-5.5 / Gemini 2.x<br/>diversity / fallback]
end
subgraph TL_MCP[Tool Layer — MCP-Native]
SH[Shopify MCP — official]
ST[Stripe MCP — official]
KL[Klaviyo MCP — official]
AL[Algolia MCP — official]
AD[Adobe Commerce MCP — official]
SF[Salesforce B2C Commerce MCP]
end
subgraph TL_WRAP[Tool Layer — Wrapped]
SAP[SAP S/4HANA via OData wrapper]
NS[NetSuite via SuiteTalk wrapper]
POS[Square · Lightspeed via REST wrapper]
end
subgraph DATA[Data]
PG[(Postgres + pgvector)]
TP[(Turbopuffer<br/>large catalogs)]
SN[(Snowflake / BigQuery)]
RU[RudderStack CDP]
HT[Hightouch Reverse-ETL]
end
subgraph OBS[Observability]
LF[Langfuse self-hosted]
BT[Braintrust evals]
PF[Promptfoo CI]
end
subgraph SEC[Security]
PA[PromptArmor + PromptGuard 2]
TR[Transcend Agentic Assist]
SI[Signifyd / Riskified]
end
FE --> AR
AR --> MO
AR --> TL_MCP
AR --> TL_WRAP
TL_MCP --> DATA
TL_WRAP --> DATA
AR --> OBS
FE --> SEC
AR --> SEC
style AR fill:#7c3aed,stroke:#fff,color:#fff
style MO fill:#0b3d91,stroke:#fff,color:#fff
style SEC fill:#dc2626,stroke:#fff,color:#fff
11.2 The MCP landscape — production-ready in May 2026
| Server | Status | Notes |
|---|---|---|
| Shopify Dev / Storefront / Admin MCP | Production · MIT | Live in every Hydrogen 2026.1.4 + Oxygen store by default |
| Stripe MCP | Production | Treasury + agentic-commerce tools |
| Klaviyo MCP | Production | Co-built with Anthropic |
| Algolia MCP | Production | Integrates with Algolia Agent Studio |
| Adobe Commerce MCP | Production | Default agent protocol post-Summit 2026 |
| Salesforce B2C Commerce MCP | GA / pilot | Fully hosted |
| Transcend Agentic Assist + MCP | Production | DSAR / privacy |
| BigCommerce · Commercetools · Medusa · Shopware | Community only — IMMATURE | Wrap yourself |
| SAP S/4HANA · NetSuite · Dynamics 365 · Oracle Retail | No official — IMMATURE | Wrap via iPaaS |
| POS (Square · Toast · Lightspeed) | No official — IMMATURE | Wrap REST |
11.3 Vector database selection
| Scale | Recommendation | Rationale |
|---|---|---|
| ≤ 1M SKUs | pgvector | Sane default, no new infra, ~$0 incremental |
| 1M–10M SKUs | Turbopuffer | Under $10/mo at this scale; sub-10ms warm; long-tail-friendly |
| 10M+ SKUs · hard SLA | Pinecone | $99–199/mo at this scale; always-warm <50ms p95 |
11.4 Hosting matrix
| Workload | Platform | Rationale |
|---|---|---|
| Storefront / UI | Vercel | Best DX, edge-optimized |
| Chat agent (TS, stateful) | Fly.io | Persistent VMs, WebSocket-friendly |
| Heavy agent loops (Python) | Modal | Unlimited sandbox, GPU, 0–50k concurrent |
| Durable writes (POs, refunds, fulfillment) | Temporal or AWS Step Functions | Crash-proof, replayable; exactly-once workflow semantics, provided each activity is idempotent |
| Enterprise / regulated | AWS Step Functions + Bedrock | Compliance posture |
12. Cost Economics
12.1 Per-conversation unit economics (May 2026 pricing)
| Configuration | Input Tokens / Conv | Output Tokens / Conv | Cost / Conv |
|---|---|---|---|
| No caching · Sonnet 4.6 only | ~53.6k | ~4.8k | ~$0.23 |
| With prompt caching · 90% hit rate | ~53.6k (mostly cached) | ~4.8k | ~$0.07–0.10 |
| Opus 4.7 escalation @ 5% | +baseline | +baseline | +~$0.04 |
| Haiku 4.5 routing only | ~5k | ~0.2k | ~$0.006 |
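The per-conversation figures can be reproduced with simple arithmetic. The per-million-token rates below are assumptions, chosen because they reproduce the table's numbers, and should be checked against current pricing. The cache-read multiplier is likewise assumed, and cache-write surcharges are ignored.

```python
# Assumed rates in USD per million tokens (illustrative, not authoritative).
SONNET_IN, SONNET_OUT = 3.00, 15.00
HAIKU_IN, HAIKU_OUT = 1.00, 5.00
CACHE_READ_MULTIPLIER = 0.10  # assumed: cached input billed at 10% of base input


def conversation_cost(input_tokens: int, output_tokens: int,
                      in_rate: float, out_rate: float,
                      cache_hit_rate: float = 0.0) -> float:
    """Cost in USD for one conversation, splitting input into cached and uncached."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    input_cost = (uncached * in_rate + cached * in_rate * CACHE_READ_MULTIPLIER) / 1_000_000
    output_cost = output_tokens * out_rate / 1_000_000
    return input_cost + output_cost


no_cache = conversation_cost(53_600, 4_800, SONNET_IN, SONNET_OUT)
with_cache = conversation_cost(53_600, 4_800, SONNET_IN, SONNET_OUT, cache_hit_rate=0.90)
routing = conversation_cost(5_000, 200, HAIKU_IN, HAIKU_OUT)

print(f"no caching:  ${no_cache:.3f}")
print(f"90% cached:  ${with_cache:.3f}")
print(f"routing:     ${routing:.3f}")
```

Under these assumptions the uncached case comes to about $0.23 and the cached case to about $0.10, the upper end of the table's range. Output tokens dominate once caching is on, so trimming response length becomes the next lever.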
12.2 Annual run-rate — 1,000 conversations / day
pie title Annual Cost Allocation — 1,000 conv/day with caching
"LLM tokens (Sonnet + Haiku + 5% Opus)" : 30000
"Vector DB (Turbopuffer 1M SKU)" : 1200
"Observability (Langfuse self-host infra)" : 2400
"Audit log infra (Postgres + S3)" : 1800
"Eval & red-team compute" : 3600
"Embedding refresh (catalog deltas)" : 600
Total realistic all-in: ~$3–5k / month for LLM and ~$1–3k / month for supporting infra at the 1k conv/day tier. Caching is the single largest lever; without it, costs run ~3× higher.
12.3 Cost-protection guardrails
| Cap | Threshold |
|---|---|
| Per-session token cap | 200k input / 50k output |
| Per-session cost cap | $1.00 (hard kill) |
| Opus 4.7 escalation rate | < 10% of conversations |
| Warehouse query cost cap (AIA) | $0.50 per query auto, $5 with merchant approval |
| Subagent fan-out cap | 20 parallel workers (matches Anthropic Managed Agents limit) |
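The session-level caps can be enforced by a small accumulator that every model call passes through. The sketch below is one way to do it; the class and exception names are hypothetical, and the choice to clamp fan-out rather than reject the request is an assumption.

```python
class BudgetExceeded(Exception):
    """Raised when a model call would push the session past a hard cap."""


class SessionBudget:
    def __init__(self, max_input_tokens: int = 200_000,
                 max_output_tokens: int = 50_000,
                 max_cost_usd: float = 1.00):
        self.max_input_tokens = max_input_tokens
        self.max_output_tokens = max_output_tokens
        self.max_cost_usd = max_cost_usd
        self.input_tokens = 0
        self.output_tokens = 0
        self.cost_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int, cost_usd: float) -> None:
        """Record one call; refuse it if any cap would be exceeded."""
        new_in = self.input_tokens + input_tokens
        new_out = self.output_tokens + output_tokens
        new_cost = self.cost_usd + cost_usd
        if (new_in > self.max_input_tokens
                or new_out > self.max_output_tokens
                or new_cost > self.max_cost_usd):
            raise BudgetExceeded("session cap reached; terminate the session")
        # Commit only after all checks pass, so a refused call leaves state intact.
        self.input_tokens, self.output_tokens, self.cost_usd = new_in, new_out, new_cost


def clamp_fan_out(requested_workers: int, cap: int = 20) -> int:
    """Limit parallel subagents to the fan-out cap."""
    return min(requested_workers, cap)
```

Checking before committing means the call that would breach the cap is never made, which is what makes the $1.00 figure a hard kill rather than an alert.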
13. The 12-Month Build Phasing
QuanData & AI builds in three deliberate phases. Sequencing is non-negotiable.
gantt
title QuanData Retail Orchestration — 12-Month Build
dateFormat YYYY-MM-DD
axisFormat %b %y
section Foundation (Days 0-90)
Data plumbing audit :a1, 2026-06-01, 14d
Orchestrator + audit + approval scaffold :a2, after a1, 21d
Tool registry + envelope contract :a3, after a1, 21d
Guardrail middleware :a4, after a2, 14d
CSA (returns + order status only) :a5, after a4, 28d
PSA (chat + on-site reco) :a6, after a5, 28d
Eval harness v1 :a7, after a4, 60d
section Merchant + Internal (Mo. 4-6)
AIA — conversational BI :b1, 2026-09-01, 45d
IRA — recommend-only :b2, after b1, 30d
MCA — descriptions + subjects :b3, 2026-09-15, 60d
SAC pilot — 5-10 stores :b4, 2026-10-01, 60d
Red-team v1 + audit dashboards :b5, 2026-09-01, 90d
section Automation + Hard Problems (Mo. 7-12)
PPA — live markdowns w/ approval :c1, 2026-12-01, 60d
SCLA — ETA + carrier opt :c2, 2026-12-01, 60d
MCA — full campaign mode :c3, 2027-01-01, 60d
FRA — fraud + abuse :c4, 2027-01-15, 75d
SAC — full rollout :c5, 2027-02-01, 90d
IRA — auto-PO under threshold :c6, 2027-03-01, 60d
Dreaming layer — production :c7, 2027-01-01, 90d
13.1 Why this sequencing
flowchart LR
P1[Phase 1<br/>Foundation] -->|Lowest blast radius<br/>+ Fastest ROI| P2[Phase 2<br/>Merchant + Internal]
P2 -->|Trust earned<br/>+ Audit muscle built| P3[Phase 3<br/>Automation + Hard]
P1 -->|CSA: ticket-cost reduction<br/>PSA: proven LLM win| ROI1[Customer-Visible ROI<br/>by Day 90]
P2 -->|AIA: merchants love it<br/>IRA: recommend-only<br/>MCA: low-stakes content| ROI2[Operator Trust<br/>by Month 6]
P3 -->|PPA · SCLA · MCA full · FRA<br/>only after audit + eval mature| ROI3[Full Autonomy w/ Gates<br/>by Month 12]
style P1 fill:#0b3d91,stroke:#fff,color:#fff
style P2 fill:#1f6feb,stroke:#fff,color:#fff
style P3 fill:#7c3aed,stroke:#fff,color:#fff
13.2 The five sequencing principles
- Read before write. Every agent ships read-only first.
- Approval before autonomy. Human-in-loop default; thresholds widen with proven precision.
- Internal before external. Merchant-facing failures are cheaper than customer-facing failures.
- One source of truth per domain. Agents never mutate OMS / ERP / WMS directly, always via the system's native API with audit_id embedded.
- Kill switches always. Per-agent feature flag + global "agent pause" button on every deployment.
14. Failure Mode Taxonomy
This taxonomy is based on a corpus of 591 documented production multi-agent incidents (2023–2026). 40% of multi-agent pilots fail within six months of production. QuanData designs against each failure mode listed below.
14.1 Failure mode distribution
pie title Production Multi-Agent Failure Modes
"Context Blindness (truncated/missing info)" : 31.6
"Rogue Actions (wrong tool/arg)" : 30.3
"Silent Degradation (looks right, isn't)" : 24.9
"Memory Corruption" : 8.1
"Runaway Execution (loops, cost blowouts)" : 5.1
14.2 QuanData countermeasure register
| Mode | Share | QuanData Countermeasure |
|---|---|---|
| Context blindness | 31.6% | Selective retrieval · summarization at handoff · per-turn context budgets · Anthropic's compression-at-subagent pattern |
| Rogue actions | 30.3% | Typed tool schemas · dry-run mode · pre-write guardrails · audit_id requirement |
| Silent degradation | 24.9% | Outcome rubric grader · LLM-as-judge + 50/wk human spot-check · 5% no-agent holdout |
| Memory corruption | 8.1% | Append-only audit · Dreaming summaries (not transcripts) · separate read/write agents |
| Runaway execution | 5.1% | 100-step cap · $1/session cost cap · 60s circuit-breaker on 5% error rate · 20-worker fan-out cap |
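The runaway-execution row names a circuit breaker that trips at a 5% error rate over 60 seconds. A minimal sliding-window sketch follows. The class name and the `min_calls` floor are assumptions; the floor is added here so that a single early failure does not open the breaker. Timestamps are passed in explicitly to keep the logic testable.

```python
from collections import deque


class CircuitBreaker:
    """Opens when the error rate over the last window_s seconds exceeds max_error_rate."""

    def __init__(self, window_s: float = 60.0, max_error_rate: float = 0.05,
                 min_calls: int = 20):
        self.window_s = window_s
        self.max_error_rate = max_error_rate
        self.min_calls = min_calls  # assumed floor, not specified in the register
        self._events = deque()      # (timestamp, is_error) pairs, oldest first

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the sliding window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()

    def record(self, now: float, is_error: bool) -> None:
        self._events.append((now, is_error))
        self._evict(now)

    def is_open(self, now: float) -> bool:
        self._evict(now)
        total = len(self._events)
        if total < self.min_calls:
            return False
        errors = sum(1 for _, err in self._events if err)
        return errors / total > self.max_error_rate
```

An orchestrator would check `is_open` before dispatching to an agent and fall back to a human queue while the breaker is open.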
14.3 The seven mistakes the X-post framework explicitly warns against
- Making every agent too general. QuanData rule: every agent has one job; narrow is powerful.
- Not standardizing output formats. QuanData rule: the structured {ok, data, error, audit_id} envelope is canonical and enforced.
- Running too many agents in parallel too early. QuanData rule: phased rollout: two agents → grade → add.
- No error handling between agents. QuanData rule: every tool call returns a fail-safe envelope with documented compensation.
- Ignoring token costs. QuanData rule: hard caps per session; prompt caching mandatory.
- Letting deflection rates obscure CSAT decay. QuanData rule: every deflection KPI paired with CSAT/quality KPI (Klarna doctrine).
- Surrendering the funnel to third-party agent runtimes. QuanData rule: QuanData clients own the MCP layer; external agents transact through it (Walmart doctrine).
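The envelope rule and the fail-safe rule from the list above can be enforced by one wrapper around every tool call. The sketch below is an illustration, not the production contract. Only the four field names come from the document; the `Envelope` class, the `call_tool` helper, and the use of a UUID as the audit ID are assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Envelope:
    ok: bool
    data: Any
    error: Optional[str]
    audit_id: str


def call_tool(tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Envelope:
    """Run a tool and always return an envelope, even when the tool raises."""
    audit_id = str(uuid.uuid4())  # assumed ID scheme
    try:
        result = tool(*args, **kwargs)
        return Envelope(ok=True, data=result, error=None, audit_id=audit_id)
    except Exception as exc:
        # Fail-safe: the error becomes data the calling agent can reason about.
        return Envelope(ok=False, data=None,
                        error=f"{type(exc).__name__}: {exc}", audit_id=audit_id)
```

For example, `call_tool(check_stock, "SKU-1")` returns an envelope whether the hypothetical `check_stock` succeeds or throws, so downstream agents branch on `ok` instead of parsing free-form error text.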
15. Appendix A — Decision Matrices
15.1 Pattern selector
flowchart TD
Start([New task to orchestrate])
Q1{Mutates inventory /<br/>payment / fulfillment?}
Q2{Strict step ordering?}
Q3{Same op on many items?}
Q4{Multiple distinct domains?}
Out_Single[Single Threaded<br/>+ Temporal]
Out_Pipe[Pipeline]
Out_Fan[Fan-Out]
Out_Spec[Specialist Team]
Out_Fn[Function call<br/>— not an agent]
Start --> Q1
Q1 -- Yes --> Out_Single
Q1 -- No --> Q2
Q2 -- Yes · 1 domain --> Out_Pipe
Q2 -- Yes · N domains --> Out_Spec
Q2 -- No --> Q3
Q3 -- Yes --> Out_Fan
Q3 -- No --> Q4
Q4 -- Yes --> Out_Spec
Q4 -- No --> Out_Fn
style Out_Single fill:#dc2626,stroke:#fff,color:#fff
style Out_Pipe fill:#0b3d91,stroke:#fff,color:#fff
style Out_Fan fill:#7c3aed,stroke:#fff,color:#fff
style Out_Spec fill:#16a34a,stroke:#fff,color:#fff
style Out_Fn fill:#6b7280,stroke:#fff,color:#fff
15.2 Framework selector
| If… | Then |
|---|---|
| TS shop, chat-heavy UI | Mastra (or Vercel AI SDK if pure-chat) |
| Python shop, complex stateful flows | LangGraph + Temporal underneath |
| All-Claude builds | Anthropic Agent SDK directly |
| Need typed handoffs + voice | OpenAI Agents SDK |
| Money / inventory / fulfillment writes | Temporal under any framework |
15.3 Model selector
| Task Type | Model |
|---|---|
| Intent classification / routing | Claude Haiku 4.5 |
| Standard tool-use loop | Claude Sonnet 4.6 |
| Complex multi-step reasoning (disputes, fraud, escalation) | Claude Opus 4.7 |
| Eval diversity / fallback | GPT-5.5 / Gemini 2.x |
| Image-heavy product workloads | Gemini 2.x |
15.4 Vector DB selector
| Catalog Size | Choice |
|---|---|
| ≤ 1M SKUs | pgvector |
| 1M–10M SKUs | Turbopuffer |
| 10M+ SKUs · hard SLA | Pinecone |
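The pattern selector in 15.1 is a plain decision tree and can be written as one function. The sketch below mirrors the flowchart's branch order; the function name, parameter names, and return labels are hypothetical.

```python
def select_pattern(*, mutates_critical_state: bool, strict_ordering: bool,
                   same_op_many_items: bool, multi_domain: bool) -> str:
    """Walk the 15.1 flowchart top to bottom and return the orchestration pattern."""
    # Q1: inventory, payment, or fulfillment writes always run single-threaded.
    if mutates_critical_state:
        return "single_threaded_temporal"
    # Q2: strict step ordering splits on the number of domains involved.
    if strict_ordering:
        return "specialist_team" if multi_domain else "pipeline"
    # Q3: the same operation over many items fans out.
    if same_op_many_items:
        return "fan_out"
    # Q4: several distinct domains without ordering still need specialists.
    if multi_domain:
        return "specialist_team"
    # Nothing above applies: a function call, not an agent.
    return "function_call"
```

Keyword-only arguments keep call sites readable, for example `select_pattern(mutates_critical_state=False, strict_ordering=False, same_op_many_items=True, multi_domain=False)` for a batch copy-generation task.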
16. Appendix B — Glossary
| Term | Definition |
|---|---|
| Agent | An LLM-driven process with a defined role, tools, and outputs |
| Orchestrator | The top-level agent that classifies intent, routes, audits, and aggregates |
| Specialist | A narrow agent owning a single domain (e.g., pricing, returns) |
| MCP (Model Context Protocol) | Anthropic-led open protocol for exposing tools to LLM agents |
| UCP (Universal Commerce Protocol) | Google-led protocol for discovery-through-purchase by agents |
| ACP (Agentic Commerce Protocol) | Stripe/OpenAI protocol for payment leg of agent transactions |
| Pipeline | Sequential agent pattern |
| Fan-Out | Commander + parallel-worker pattern |
| Specialist Team | Multiple specialists collaborating on a single deliverable |
| Dreaming | Scheduled background process that distills past sessions into memory |
| Outcomes | Rubric-based self-evaluation primitive |
| Audit ID | Unique identifier embedded in every mutating tool call for traceability |
| Compensation path | Documented reversal procedure for any mutating action |
| Dual-track eval | Offline regression + online A/B + 5% holdout, run simultaneously |
| Layer-cake guardrails | Eight independent enforcement layers from input sanitization to kill switch |
17. Appendix C — Source Index
17.1 Foundational documents
- Anthropic — Multi-Agent Research System (Jun 2025): https://www.anthropic.com/engineering/multi-agent-research-system
- Anthropic — Code with Claude, Managed Agents announcement (May 6, 2026)
- Cognition — Don't Build Multi-Agents (Jun 2025): https://cognition.ai/blog/dont-build-multi-agents
- Cognition — Multi-Agents: What's Actually Working (2026): https://cognition.ai/blog/multi-agents-working
- Shopify Engineering — Building Production-Ready Agentic Systems: https://shopify.engineering/building-production-ready-agentic-systems
- Sierra — Constellation architecture: https://sierra.ai/about
17.2 Retail deployment evidence
- Walmart — Meet Sparky: https://corporate.walmart.com/news/2025/06/06/walmart-the-future-of-shopping-is-agentic-meet-sparky
- Walmart Tech — Element + Wallaby: https://tech.walmart.com/content/walmart-global-tech/en_us/blog/post/wibey-announcement.html
- Constellation Research — Sparky +35% AOV: https://www.constellationr.com/insights/news/walmarts-sparky-ai-agent-increases-order-value
- Major Matters — Walmart Instant Checkout 3× flop: https://www.majormatters.co/p/walmart-instant-checkout-agentic-commerce-flop
- Fortune — Amazon Rufus $10B: https://fortune.com/2025/11/02/amazon-rufus-ai-shopping-assistant-chatbot-10-billion-sales-monetization/
- PPC.land — Rufus $12B in 2025: https://ppc.land/amazons-ai-shopping-assistant-drove-12-billion-in-sales-for-2025/
- Klarna — AI assistant first-month: https://www.klarna.com/international/press/klarna-ai-assistant-handles-two-thirds-of-customer-service-chats-in-its-first-month/
- OpenAI — Klarna case study: https://openai.com/index/klarna/
- PromptLayer — Klarna AI-first to human-hybrid: https://blog.promptlayer.com/klarna-customer-service-from-ai-first-to-human-hybrid-balance/
- Lowe's — Mylow Companion at scale: https://corporate.lowes.com/newsroom/press-releases/lowes-deploys-first-scale-ai-assistant-retail-associates-05-05-25
- Sierra — WeightWatchers case study: https://sierra.ai/customers/weightwatchers
- Decagon — Bilt case study: https://decagon.ai/case-studies/bilt
- Mercado Libre AI strategy — Motley Fool: https://www.fool.com/investing/2026/04/10/mercadolibre-is-investing-heavily-in-ai-will-this/
17.3 Integration & infrastructure
- Shopify Storefront MCP: https://shopify.dev/docs/apps/build/storefront-mcp/servers/storefront
- Shopify Admin MCP / AI Toolkit (April 2026, MIT): https://revize.app/blog/shopify-ai-toolkit-guide
- Klaviyo MCP: https://developers.klaviyo.com/en/docs/klaviyo_mcp_server
- Algolia MCP: https://www.algolia.com/developers/lp-mcp
- Adobe MCP as default commerce agent protocol: https://www.paz.ai/blog/adobe-mcp-commerce-default-agent-protocol
- Stripe UCP / ACP: https://docs.stripe.com/agentic-commerce/protocol
- Anthropic prompt caching: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
- Anthropic pricing 2026: https://www.finout.io/blog/anthropic-api-pricing
17.4 Risk, compliance, & security
- Scandiweb — EU AI Act for e-commerce: https://scandiweb.com/blog/eu-ai-act-for-ecommerce-frequently-asked
- OECD — Algorithmic Pricing G7 (Oct 2025): https://www.oecd.org/content/dam/oecd/en/publications/reports/2025/10/algorithmic-pricing-and-competition-in-g7-jurisdictions_f936689b/f36dacf8-en.pdf
- Riskified — Fraud detection for AI shopping agents: https://www.riskified.com/blog/fraud-detection-ai-agent/
- TokenMix — Prompt Injection Defense 2026: https://tokenmix.ai/blog/prompt-injection-defense-techniques-2026
- Latitude — AI Agent Failure Detection: https://latitude.so/blog/ai-agent-failure-detection-guide
- Augment Code — Why Multi-Agent LLM Systems Fail: https://www.augmentcode.com/guides/why-multi-agent-llm-systems-fail-and-how-to-fix-them
- Clyro — The 5 AI Agent Failure Modes: https://clyro.dev/blog/the-5-ai-agent-failure-modes-why-they-fail-in-production/
Closing Note
This blueprint is QuanData & AI's commitment to building the agentic retail systems that the next decade of commerce will run on. It is anchored in what is shipping at Walmart, Amazon, Shopify, Mercado Libre, Sierra, and Decagon today — not what is being demoed at conferences.
Every pattern in this document is buildable as written. Every tool signature maps to a real API. Every guardrail threshold is concrete. Every phase reflects what a mid-size to enterprise retailer can actually staff.
The retailers that build this category in 2026 will pull away from the retailers that do not. QuanData & AI exists to make sure our clients are on the right side of that gap.
— Office of the Chief Architect QuanData & AI — May 2026