Multiple Agents should only be introduced when a single Agent genuinely cannot handle the task. But "Multi-Agent" is not a magic switch — it multiplies capability while equally multiplying complexity, cost, and debugging difficulty. This article covers four main collaboration topologies (single / orchestrator-worker / peer network / pipeline), task decomposition logic, arbitration mechanism design, cost control strategies, and real-world Multi-Agent implementation in mobility scenarios.
Part II · Agent Building · Articles 02-04 of 12 · System Topology · ~5,800 words
A team enthusiastically built a "five-Agent collaboration system": an intent-understanding Agent, a route-planning Agent, an order-processing Agent, a customer service Agent, and an audit Agent. After launch, they discovered: 80% of user requests only required intent understanding + order processing — two steps. But every request had to traverse the full five-Agent chain, tripling latency and quadrupling cost, with no clear ownership when things broke. They ultimately refactored to: a single primary Agent handling most requests, with the route-planning Agent only invoked on-demand for complex itinerary tasks. That was the right design.
Multi-Agent is not a "more advanced architecture" — it's a tool for solving specific problems. The only legitimate reasons to introduce multiple Agents are:
① Naturally parallel tasks: two subtasks are independent and can run simultaneously, making parallel execution significantly faster than serial (e.g., querying trains and hotels at the same time).
② Context window exhaustion: a single task's information volume exceeds one model's context limit and must be decomposed before processing.
③ Specialization measurably improves quality: a specifically trained or prompted model handles a subtask category clearly better than a general Agent.
④ Risk isolation: high-risk operations (payments, data writes) are quarantined in a dedicated Agent so failures don't cascade into the overall flow.
⑤ Scale: concurrent task volume exceeds a single Agent's throughput, requiring horizontal scaling.
Multi-Agent debugging complexity scales as the square of the number of Agents. Before reaching for Multi-Agent, always ask: can this task be solved with one well-optimized Agent + tool calls? Most of the time, the answer is yes.
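For contrast, here is Topology 1 at its most minimal: a single Agent looping over tool calls. This is a hedged sketch; `fake_model` is a stand-in for a real LLM API call, and the tool, message format, and function names are all illustrative rather than taken from any specific framework.

```python
# Minimal single-Agent + tools loop (Topology 1). `fake_model` is a
# stand-in for a real LLM call; the tool, message format, and function
# names are all illustrative.

def search_trains(city: str) -> str:
    """Toy tool: pretend to query train availability."""
    return f"3 trains available to {city}"

TOOLS = {"search_trains": search_trains}

def fake_model(messages: list) -> dict:
    """Stub LLM: requests one tool call, then produces a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_trains", "args": {"city": "Beijing"}}
    return {"answer": "Booked based on: " + messages[-1]["content"]}

def run_agent(user_request: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = fake_model(messages)
        if "answer" in reply:                # the model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # dispatch tool
        messages.append({"role": "tool", "content": result})
    return "gave up after max_turns"
```

The entire "architecture" is one loop plus a tool registry, which is exactly why this topology is the cheapest to debug.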
Just as companies have flat, hierarchical, and matrix org structures, Agent systems have different collaboration topologies. Choosing the wrong topology causes problems in unexpected places — coordination overhead too high, unclear ownership when failures occur, or prohibitive scaling costs.
Diagram: four Multi-Agent topologies, from simple to complex; the bar for introducing each rises in turn.
Topology selection is determined by three axes: task coupling (how interdependent are subtasks?), quality requirements (does each stage need independent validation?), and team capability (can you debug distributed Agent interactions?). Start simple — most products begin with Topology 1 and graduate to Topology 2 only when a concrete bottleneck appears.
| Topology | Coordination Complexity | Debugging Difficulty | Parallelism | When to Use |
|---|---|---|---|---|
| Single Agent + tools | Lowest | Lowest | Low | MVP and most production scenarios |
| Orchestrator-worker | Medium | Medium | High | Decomposable tasks with concurrency needs |
| Peer network | High | Highest | Highest | Complex production systems with clear module boundaries |
| Pipeline | Low | Low | Low | Scenarios requiring sequential multi-stage quality checks |
Section 03
Task Decomposition: What Good Splitting Looks Like
"Help me plan a 3-day Beijing departure trip and book the train and hotel." Bad decomposition: split "plan itinerary," "book train," and "book hotel" into three independent Agents — but the itinerary must be decided before knowing which direction to book; the train must be booked before knowing the departure time for the hotel. Strong dependencies between tasks make parallel execution chaotic. Good decomposition: plan first (single-Agent decision), then parallel-execute train booking and hotel booking (two Workers running simultaneously) — exploiting parallelism while respecting dependency order.
Good task decomposition requires three conditions:
① Clear input/output boundaries per subtask: the interface between subtask A's output and subtask B's input must be defined in advance, not left to "the Agent will figure it out."
② High cohesion within subtasks, loose coupling between them: each subtask's internal logic is self-contained and does not depend on another subtask's intermediate state; when information must be shared, it passes through explicit interfaces, not shared mutable state.
③ Decomposition delivers an actual benefit: either a parallel speedup (two things done at once) or a specialization quality gain (a specific model handles a specific subtask measurably better). Otherwise decomposition only adds complexity.
Dependency Graph: The Diagram You Must Draw Before Splitting
Before splitting tasks, draw the dependency graph: identify which subtasks can run in parallel (no dependencies between them) and which must run in series (B depends on A's output). Only genuinely independent subtasks are worth parallelizing. Forcing dependent tasks to run in parallel only introduces synchronization problems and race conditions.
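The dependency graph can be computed rather than eyeballed. A minimal sketch using Python's standard-library `graphlib`, with task names from the trip example above; an edge means "depends on":

```python
# Derive serial/parallel structure from an explicit dependency graph.
# Task names follow the Beijing trip example; values are predecessors.
from graphlib import TopologicalSorter

deps = {
    "plan_itinerary": set(),                 # no dependencies: runs first
    "book_train":     {"plan_itinerary"},    # needs the plan
    "book_hotel":     {"plan_itinerary"},    # needs the plan, not the train
    "send_summary":   {"book_train", "book_hotel"},  # aggregation step
}

def parallel_waves(graph: dict) -> list:
    """Group tasks into waves; tasks in the same wave may run in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = set(ts.get_ready())   # everything whose deps are satisfied
        waves.append(ready)
        for task in ready:
            ts.done(task)
    return waves
```

For this graph the waves are serial planning, then the two bookings together, then aggregation, which matches the "good decomposition" above.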
Dependency graph example: serial planning first, then parallel execution, then aggregation.
Section 04
Arbitration: Resolving Agent Disagreements
A multi-Agent route planning system: Route Agent A recommends high-speed rail (fast but expensive); Route Agent B recommends coach (slow but cheap). The two Agents' conclusions contradict each other, and the main Agent has no arbitration logic — so it returned both answers to the user with "you can choose for yourself." The user complained furiously: "I asked you to decide for me, not make me think." A multi-Agent system without arbitration logic offloads its conflicts onto the user.
When multiple Agents produce conflicting conclusions, the system must have a defined arbitration mechanism — not expose the conflict to the user. Three common arbitration approaches, each suited to different contexts:
| Arbitration Method | When to Use | Pros | Risks |
|---|---|---|---|
| Orchestrator Decides | A main Agent exists; Worker opinions are advisory only | Short decision chain, fast, clear accountability | An Orchestrator misjudgment cannot be corrected |
| Confidence-Weighted Vote | Multiple peer Agents, each with a confidence score | Aggregates multiple perspectives, reduces single-point bias | Confidence scores themselves can be unreliable and mislead the vote |
| Human Escalation | High-risk operations, uniformly low confidence, or divergence above a threshold | Safest; the user retains control | Interrupts the automated flow, adds wait time |
Design your arbitration logic before deployment — not as a patch when conflicts appear in production. The key question: when two Agents disagree, who (or what) has the final word? The answer must be deterministic, not left to runtime improvisation. For high-stakes decisions (payment amounts, irreversible actions), always default to human escalation when confidence is below a defined threshold.
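As one illustration of a deterministic rule, confidence-weighted voting with a human-escalation fallback could be sketched as below. The thresholds (0.4, 0.1) and the escalation sentinel are assumptions for the sketch, not values from the text:

```python
# Confidence-weighted vote with escalation when nobody is confident or
# the top answers are too close to call. Thresholds are illustrative.

def arbitrate(proposals: list,
              min_confidence: float = 0.4,
              min_margin: float = 0.1) -> str:
    """proposals: (answer, confidence in [0, 1]) pairs, one per Agent."""
    # Escalate when confidence is uniformly low.
    if all(conf < min_confidence for _, conf in proposals):
        return "ESCALATE_TO_HUMAN"
    # Weighted vote: sum confidence per distinct answer.
    totals = {}
    for answer, conf in proposals:
        totals[answer] = totals.get(answer, 0.0) + conf
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Escalate when the disagreement exceeds the defined threshold.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return "ESCALATE_TO_HUMAN"
    return ranked[0][0]
```

Note the answer is always deterministic: a single winner or an explicit escalation, never "both answers, you pick."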
📋 PM Decision Checklist
Is an "Agent disagreement threshold" defined? How large must a divergence be to trigger arbitration, or does any disagreement trigger it?
Is the human-escalation wait time acceptable? In time-sensitive scenarios (e.g., a train departing soon), human arbitration may come too late.
What is the fallback when the Orchestrator Agent fails? The Orchestrator is a single point of failure and must have a degradation plan.
Is every arbitration's process and outcome logged? Arbitration logs are the only basis for improving the arbitration strategy later.
Do users need to know multiple Agents are collaborating? Usually not, but how failures are explained to them requires deliberate design.
A team launched an orchestrator-worker Agent system. The first month's bill: LLM costs were 7× budget. They assumed it was usage growth — but on close inspection, found a design problem: the Orchestrator was passing the full conversation context to every Worker on each task assignment, and every Worker was returning the full context back. Each task cycle tripled the context length. With 8 Workers, costs were effectively 24×. Context design is the most commonly overlooked cost killer in Multi-Agent systems.
Multi-Agent system costs come from four sources, each requiring a different mitigation strategy:
| Cost Source | How It Appears | Mitigation Strategy |
|---|---|---|
| Context Bloat | Agents pass the full conversation history to each other; token count grows linearly with turns | Pass only necessary structured summaries, not full context; Workers receive only task-relevant information |
| Unnecessary Large Model Calls | Simple format validation and routing decisions invoke GPT-4-class models | Use small models (e.g., GPT-3.5, Haiku) for routing and formatting; reserve large models for complex reasoning |
| Missing Result Caching | Identical queries repeatedly hit tools/models with no cache layer | Cache query-type results (ticket availability and traffic can be cached 30-60 s); model results for identical prompts can be briefly cached |
| Non-Critical Serial Execution | Tasks that could run asynchronously (notifications, logging) block the main flow | Make non-critical tasks asynchronous so they never block the main Agent's response path |
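The result-caching mitigation can be sketched as a small TTL wrapper around query-style calls. The class and the injected clock are illustrative; a production system would likely reach for an existing cache library instead:

```python
# TTL cache for query-style tool calls. The clock is injectable so the
# expiry behaviour is testable; a real system would pass time.monotonic.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}                # key -> (expires_at, value)

    def get_or_call(self, key, fn):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]               # fresh cached result: no call made
        value = fn()                    # miss or expired: recompute
        self._store[key] = (now + self.ttl, value)
        return value
```

Wrapping ticket-availability or traffic queries with a 30-60 s TTL, per the table above, turns repeated identical calls within that window into free cache hits.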
The single most impactful cost-control intervention is context management: never pass the full conversation history to Worker Agents. Pass only the structured task spec. Workers are specialists — they don't need the full context of every prior turn to do their job. This one change typically reduces token consumption by 50-80% in orchestrator-worker architectures.
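A minimal sketch of what a structured task spec might look like, assuming hypothetical field names; the point is that a Worker's prompt is built from a handful of fields, not from the conversation history:

```python
# Hypothetical structured task spec: the Orchestrator distills the
# conversation into a few fields per Worker instead of forwarding the
# full history. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    task_type: str            # e.g. "book_train"
    origin: str
    destination: str
    date: str
    constraints: list = field(default_factory=list)

def build_worker_prompt(spec: TaskSpec) -> str:
    """What the Worker receives: a few hundred tokens, not the dialogue."""
    return (f"Task: {spec.task_type}\n"
            f"Route: {spec.origin} -> {spec.destination} on {spec.date}\n"
            f"Constraints: {'; '.join(spec.constraints) or 'none'}")
```

The spec's size is bounded by its fields regardless of how long the user conversation has run, which is exactly what caps per-Worker token cost.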
Tiered Model Usage Strategy
Not every Agent needs the same model, and using the most expensive model everywhere doesn't produce the best system. A sensible tiered approach:
Tiered model usage reference:
① Large models (GPT-4o / Claude Sonnet): core reasoning, complex planning, and intent understanding that requires high accuracy.
② Mid-size models (Claude Haiku / GPT-3.5): task routing, format conversion, simple extraction, and standardized Worker operations.
③ Rules/scripts: format validation, data cleaning, and other deterministic operations; never invoke a model where deterministic code suffices.
Core principle: use the smallest sufficient model for each task. Task complexity determines the model tier, not "use the biggest model just to be safe."
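The tier decision can be as simple as a lookup. A sketch with illustrative step names and model identifiers, none of which come from the text:

```python
# Tier selection as a plain lookup: route each step to the cheapest
# sufficient handler. Step names and model identifiers are illustrative.

DETERMINISTIC_STEPS = {"validate_format", "clean_data"}       # tier ③
SMALL_MODEL_STEPS = {"route_intent", "extract_fields",
                     "convert_format"}                        # tier ②

def pick_handler(step: str) -> str:
    if step in DETERMINISTIC_STEPS:
        return "rules"            # never call a model for deterministic work
    if step in SMALL_MODEL_STEPS:
        return "claude-haiku"     # routing / extraction / conversion
    return "claude-sonnet"        # default: real reasoning gets the big model
```

Making the mapping explicit also makes cost regressions reviewable: a step moving up a tier is a visible diff, not a silent prompt change.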
A mobility platform's Agent system handles complex multi-modal travel requests: cross-city journeys involving trains, rides, and hotels require querying multiple information sources simultaneously and then integrated planning. With serial single-Agent processing, user wait times exceeded 15 seconds, generating high negative review rates. After introducing parallel architecture, the three information types were queried simultaneously, bringing total response time below 4 seconds with significant satisfaction improvement. Yet for simple "book me a ride" requests, the same platform still uses a single Agent — no reason to introduce complex architecture for a simple task.
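The parallel fan-out with per-source timeouts and default fallbacks (so one slow source never blocks the whole response) might be sketched like this; the query coroutine is a stub standing in for real API calls, and the names and 3-second timeout are illustrative:

```python
# Parallel fan-out over three information sources with a per-source
# timeout and a default value on failure. Stubs simulate real APIs.
import asyncio

async def query_source(name: str, delay: float) -> str:
    await asyncio.sleep(delay)           # simulate network latency
    return f"{name}: ok"

async def safe_query(name: str, delay: float, timeout: float = 3.0) -> str:
    try:
        return await asyncio.wait_for(query_source(name, delay), timeout)
    except Exception:
        # Default value on timeout/error: never leave the caller waiting.
        return f"{name}: unavailable"

async def gather_travel_info() -> list:
    # All three run concurrently, so total latency tracks the slowest
    # source instead of the sum of all three.
    return await asyncio.gather(
        safe_query("trains", 0.05),
        safe_query("hotels", 0.03),
        safe_query("rides", 0.01),
    )
```

This is the mechanism behind the 15-second-to-4-second improvement described above: total latency becomes max(sources) rather than sum(sources).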
Multi-Agent design in mobility scenarios follows one core principle: choose the architecture to match each task's complexity, rather than sizing everything for the most complex case. Here is a reference layered architecture:
Mobility-scenario Multi-Agent layered architecture: route by complexity, then dispatch to a single Agent or parallel Workers.
| Agent Role | Responsibility | Recommended Model | Design Notes |
|---|---|---|---|
| Intent Router | Analyzes request complexity, decides between the single-Agent and multi-Agent paths, extracts key parameters | Mid-size model (Haiku) | Routing must be fast (<500 ms); misrouting is costly and needs test-set coverage |
| Orchestrator | Receives complex tasks, decomposes them into subtasks, dispatches to Workers, synthesizes collected results into a plan | Large model (Sonnet / GPT-4o) | Pass only structured task parameters to Workers, never full context; set task timeouts |
| Query Workers | Query specific sources in parallel (trains/hotels/rides), return structured results | Mid-size model / direct API | Results cacheable for 30-60 s; on failure, return a default value rather than leaving the Orchestrator waiting |
| Exception Handler | Handles refund disputes, itinerary anomalies, and edge cases outside the rules | Large model + human escalation | Deployed independently so failures don't affect the main flow; requires complete case logs |
Key insight from real deployments: the Intent Router is the most critical Agent in this architecture. Its job — deciding whether a request needs single or multi-Agent processing — determines the cost and latency of every user interaction. Invest heavily in testing and validating the router. A misrouted simple request through the full multi-Agent pipeline is pure waste; a complex request misrouted to single-Agent processing results in degraded output quality.
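One way to make router quality testable, as the paragraph above recommends: keep a labelled routing set and gate deploys on its accuracy. The heuristic router, markers, and examples below are purely illustrative; a production router would use a small model:

```python
# A trivially testable Intent Router: heuristic routing plus a labelled
# test set. Markers, examples, and labels are illustrative assumptions.

MULTI_AGENT_MARKERS = ("cross-city", "train and hotel",
                       "multi-day", "itinerary")

def route(request: str) -> str:
    text = request.lower()
    if any(marker in text for marker in MULTI_AGENT_MARKERS):
        return "multi_agent"
    return "single_agent"     # default: the simple path, cheapest by far

# Labelled routing cases: misroutes are caught before deploy, not after.
ROUTER_TESTS = [
    ("Book me a ride to the airport", "single_agent"),
    ("Plan a multi-day itinerary with train and hotel", "multi_agent"),
]

def router_accuracy() -> float:
    hits = sum(route(req) == want for req, want in ROUTER_TESTS)
    return hits / len(ROUTER_TESTS)
```

Whatever the router's internals, the labelled set is the durable asset: every observed misroute in production becomes a new case in it.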
The greatest value of Multi-Agent in mobility scenarios is not "making AI smarter" — it's "maintaining quality at high concurrency." Running multiple queries in parallel is 3-5× faster than serial, which is a real and tangible experience gap in time-sensitive travel scenarios. But the price is real: every Agent's output needs validation, inter-Agent communication formats need standardization, and observability requirements multiply. Recommended evolution path: first make the single-Agent system stable and high-quality. When you identify a specific parallel bottleneck (e.g., slow multi-source queries), introduce Multi-Agent only for that specific bottleneck. Do not design a full multi-Agent system from day one.
Glossary
Bilingual Terminology Glossary
Core concepts from this article, with concise definitions.
Multi-Agent System (MAS)
A system architecture in which multiple AI Agents collaborate to complete tasks, each with independent responsibilities and reasoning capability.
Orchestrator-Worker Pattern
A hierarchical collaboration pattern where an Orchestrator Agent decomposes tasks and delegates to Worker Agents.
Peer-to-Peer Agent Network
A network topology where Agents collaborate as peers without a single coordination center, suited for complex systems with clear module boundaries.
Pipeline Topology
A topology where tasks are passed sequentially through multiple Agents, each handling one processing stage; suited for multi-stage quality validation.
Task Decomposition
The process of breaking a complex task into subtasks; good decomposition produces independent, parallelizable subtasks with clear input/output boundaries.
Dependency Graph
A directed graph depicting dependencies between tasks, used to determine which subtasks can run in parallel and which must run serially.
Arbitration Mechanism
The decision rules applied when multiple Agents produce conflicting conclusions, e.g., Orchestrator decision, confidence-weighted voting, or human escalation.
Intent Routing
A dispatching mechanism that determines which Agent or processing path handles a request based on its complexity and type.
Context Bloat
A common cost trap where Agents pass excessive redundant context to each other, causing token consumption to grow rapidly.
Tiered Model Strategy
Using models of different capability/cost tiers based on task complexity; always using the smallest sufficient model for each task type.
Parallel Execution
Running multiple independent subtasks simultaneously; significantly reduces total response time compared to serial execution and is the primary value driver of Multi-Agent architecture.
Single Point of Failure (SPOF)
A critical node whose failure makes the whole system unavailable; the Orchestrator in an orchestrator-worker architecture is a classic SPOF.
Result Caching
Temporarily storing tool-call or model-call results to avoid re-executing identical queries within a short window, reducing cost and latency.
Human Escalation
The mechanism of transferring a task to human handling when an Agent cannot resolve it or confidence is below threshold.
Loose Coupling
Agents or modules interacting through explicit interfaces without directly depending on each other's internals, enabling independent modification and replacement.
Race Condition
A concurrency issue where multiple Agents accessing shared resources concurrently produce inconsistent results due to non-deterministic execution order.
Observability
The ability to understand a system's internal state through logs, traces, and metrics; especially critical in Multi-Agent systems, where failures are hard to reproduce.
Task Timeout
A maximum execution time set for a subtask; triggers a fallback upon expiry, preventing a single slow Worker from blocking the overall process.