An Agent that can only "talk" is useless. Its value comes from "doing" — querying orders, sending notifications, calling map APIs, writing to databases. The mechanism that enables this is called Tool Calling. And as the number of tools grows and different models and platforms need to interoperate, Anthropic's Model Context Protocol (MCP) is emerging as the industry's unified integration standard. This article covers how tool calling works, permission design, error handling, and MCP's core concepts and strategic significance at the product level.
Part II · Agent 搭建第 02-03 篇 · 共 12 篇能力扩展 / Capability Extension约 5,500 字
这里有一个常见的误解:很多人以为 Agent 是"直接调用了 API"。实际上,调用 API 的是你的服务器,Agent 做的是:告诉你它想调用哪个工具、传入什么参数,然后等你把结果告诉它,再用这个结果继续推理和回答。
A user asks the customer service Agent: "Can you check how many high-speed rail tickets are left for Beijing to Shanghai tomorrow?" The Agent cannot reason its way to a ticket count — it has to query a real system. But how? A common misconception: many people think the Agent "directly calls the API." In reality, your server calls the API. What the Agent does is: declare which tool it wants to call and with what parameters, wait for you to return the result, then use that result to continue reasoning and formulate the answer.
Tool calling involves five steps. The entire process is essentially a "structured delegation protocol" — the model tells the external system what it wants to do, the external system does it, and returns the result to the model.
工具调用五步流程 · 模型只声明意图,服务器负责执行 · The five-step tool calling flow — the model declares; the server executes
This design has an important implication: the security of tool calling is determined by your server, not by the model. The model can "request" any tool it believes is needed, but your server can reject, rate-limit, audit, or insert human confirmation steps before any operation. Control always stays with you.
The model decides which tool to call by analyzing the user request against the tool descriptions. The result (JSON returned by the tool) is added to the conversation context, and the model continues generating based on this data. A complete task may involve multiple tool calls, forming a "reason → call → reason → call" loop.
Section 02
工具描述:被严重低估的工程细节
Tool Descriptions: The Underestimated Engineering Detail
一个团队上线了一个"查询用户行程"的工具,工具描述写的是:"get_trip_info",没有任何说明。Agent 在测试时经常调错工具,或者参数传错格式。排查了两天,最后发现:只要把工具描述改成"根据用户 ID 和日期范围查询历史行程记录,返回出发地、目的地、时间和状态",调用准确率从 60% 提升到了 95%。
工具描述不是给程序员看的注释,它是给模型看的"使用说明书"。
A team deployed a "query user itinerary" tool with the description: "get_trip_info" — no explanation whatsoever. During testing, the Agent frequently called the wrong tool or passed parameters in the wrong format. After two days of debugging, they discovered: simply rewriting the description to "Query historical trip records by user ID and date range, returning origin, destination, time, and status" raised call accuracy from 60% to 95%. Tool descriptions are not comments for developers — they are the instruction manual for the model.
The model selects tools based entirely on their description text. The clearer the description, the more accurate the model's judgment. A good tool description answers three questions: what the tool can do, when it should be used, and what the input/output format looks like.
Bad tool descriptions produce bad tool selection. Good descriptions include: what data the tool returns, when to use it (and when NOT to), any constraints or side effects, and parameter format examples. Treat tool description writing as a first-class engineering task, not an afterthought.
工具描述的四个黄金法则
Four Golden Rules for Tool Descriptions
法则 Rule
含义 What It Means
反例 Counter-Example
明确边界 Explicit Scope
说清楚能做什么、不能做什么,避免模型误调
"查询用户信息" → 到底查什么信息?行程?账户?偏好?
说明副作用 State Side Effects
写操作必须明确标注"会修改数据",读操作标注"只读不写"
不说明是否写入,模型可能在不该的时候调用写工具
给参数示例 Include Examples
在参数描述里写明格式和合法取值范围,如 "YYYY-MM-DD"
只写"date: string",模型可能传入"明天"或"2024年5月"
消歧义 Disambiguate
有多个相似工具时,写清楚每个工具的适用场景差异
get_trip 和 get_order 两个工具描述相似,模型经常选错
These four rules apply universally across any tool set. They are especially critical when the Agent has access to many tools — the more tools available, the more likely the model is to make the wrong selection without clear, differentiated descriptions.
Section 03
权限三级:哪些工具能自主调用
Three-Level Permissions: What Can Run Autonomously
A smart assistant was tasked with "automatically handling itinerary changes." The Agent determined the user's train was delayed, so it automatically refunded the ticket and booked a new one — but the new train wasn't what the user wanted at all, and the cancellation fee had already been deducted. The user complained. This wasn't a model reasoning failure — it was a permission boundary design failure. Auto-refunding and auto-rebooking should never be allowed to execute autonomously without explicit confirmation.
The core principle of tool permission design: reversibility determines the boundary of autonomous execution. The harder an action is to undo, the more it requires human confirmation. This is the most important line in product safety design.
🟢 可自主执行 · Auto-OK
查询余票余座
查询路况实时信息
查询用户订单列表
查询退款资格(不退款)
天气 / 地图信息查询
查看用户偏好记录
🟡 需用户确认 · Confirm First
下单 / 预订座位
发起支付
发送通知短信 / 推送
修改个人信息
申请退款(金额 <500元)
加入等候列表
🔴 必须明确授权 · Explicit Auth
退款(金额 ≥500元)
删除订单 / 账户数据
批量操作多个订单
更改支付方式
申请开具发票
代理人操作(代他人购票)
Three-tier permission model: ① Query operations (read-only, no side effects) can run autonomously. ② Write operations with reversible consequences require user confirmation before execution. ③ Irreversible, high-value, or high-risk operations require explicit authorization — the user must actively confirm with full understanding of the consequences. This model should be hardcoded into your tool infrastructure, not left to the model's judgment.
Permission boundaries must not rely on the model's "own judgment." Models can be overconfident and execute when they shouldn't. The correct design: hardcode permission logic in the tool execution layer (server side) — certain tools, when called, are automatically intercepted by the server and return a "requires user confirmation" signal, regardless of whether the model thinks it should proceed. This is called "permissions enforced at the tool layer, not the prompt layer" — prompts can be circumvented; tool-layer enforcement cannot.
At 3 AM during peak booking traffic, the ticketing system API timed out. The Agent called the available-seats query tool, waited 10 seconds, got nothing back. The Agent didn't know what to do — so it entered a retry loop, retrying every 5 seconds, 20 times in total, eventually crashing the API endpoint. Error handling isn't "how to tell the user something went wrong." It's "how the entire system behaves when something goes wrong."
工具调用的错误,分三类,每类的处理策略不同:
Tool call failures fall into three categories, each requiring a different handling strategy:
错误类型 Error Type
典型表现 Examples
推荐处理策略 Recommended Strategy
连接/超时错误 Network / Timeout
接口超时、网络断连、服务不可达
带退避的重试(指数退避,最多 3 次);重试失败后降级到备选工具或告知用户稍后重试
格式/参数错误 Format / Param Error
模型传入格式不对、必填参数缺失、类型不匹配
不重试,将错误信息返回给模型,让模型自我纠正参数后重新调用;最多允许 2 次自我纠错
业务逻辑错误 Business Logic Error
订单不存在、余票为零、用户无权限退款
不重试,将结构化的业务错误信息返回给模型,由模型生成对用户友好的解释和备选方案
Key principle: not all errors warrant retrying. Retrying a business logic error (e.g., "order not found") wastes tokens and time — the answer won't change. Only network/timeout errors are worth retrying, and always with exponential backoff and a maximum retry count. Never let the Agent retry indefinitely.
For write operations (payments, orders, messages), retries carry a hidden risk: if the first call actually succeeded but returned a timeout, and the Agent retries, the operation executes twice — double payment, duplicate message. The design pattern that prevents this is idempotency: generate a unique idempotency key for each write operation; if the server receives the same key again, it executes only once and returns the cached result of the first call for all subsequent duplicates.
Has each tool type been given a defined "failure fallback behavior"? Agents must not silently stall on tool failure. Is there a hard cap on retries? Unbounded retries are one of the most common causes of Agent system failures. Are write-operation tools idempotent? Payment, ordering, and notification tools are the top priority. Are error messages model-friendly? Errors must include enough context for the model to determine its next action — returning only "500 Internal Error" is insufficient. Is there a maximum tool-call-count guardrail to prevent runaway loops that exhaust budgets?
Imagine: your company has 30 internal tools. You integrate Claude and write 30 tool adapters for it. Next month you integrate GPT-4o — another 30 adapters. The month after, Gemini — 30 more. Meanwhile, the tool team updates 10 of the tool APIs, and you have to synchronize the changes across three adapter sets. This is exactly the problem MCP (Model Context Protocol) is designed to solve: one tool, usable by any MCP-compatible model or platform — write once, run everywhere.
MCP(Model Context Protocol,模型上下文协议)是 Anthropic 于 2024 年底提出并开源的一套标准协议,目标是定义 AI 模型与外部工具/资源之间的统一通信接口。用一句话解释:MCP 之于 AI 工具,就像 USB 之于硬件设备——统一接口,消除适配成本。
MCP (Model Context Protocol) is an open standard proposed and open-sourced by Anthropic in late 2024. Its goal is to define a unified communication interface between AI models and external tools/resources. In one sentence: MCP is to AI tools what USB is to hardware devices — a unified connector that eliminates adaptation cost.
MCP 的核心价值:任意模型 × 任意工具,通过统一协议互通 · MCP decouples models from tools via a universal protocol layer
MCP Server: The provider of a tool or resource, implementing the MCP server interface. This could be a database query tool, a file read/write service, or an integration layer for a third-party SaaS. Once MCP-compliant, the tool can be called by any MCP client.
MCP Client: The caller of tools, typically the Agent runtime or model integration layer. Claude, MCP-compatible LangGraph setups, and similar systems act as MCP clients, discovering and calling any MCP Server's tools through the unified protocol.
Resources & Prompts: MCP supports not only tool calling but also active resource reading (files, databases, code repositories) and predefined prompt templates. This enables Agents to access virtually any type of external information source in a standardized way.
MCP's ecosystem is growing rapidly — Anthropic, Cursor, Zed, GitHub Copilot, Cloudflare, and other leading companies have announced MCP support. For product owners, this means:
MCP 的产品层含义
① 工具资产可复用:你一次把内部工具 MCP 化,未来接入任何支持 MCP 的模型或平台,不需要重新适配;
② 降低切换成本:今天用 Claude,明天想换 GPT-4o,工具层不需要动,只换模型接入层;
③ 接入外部生态:未来会有越来越多的第三方工具以 MCP Server 的形式开放,你可以直接接入,不需要自己开发所有能力;
Four product-level implications of MCP: ① Tool asset reusability — MCP-ify your tools once, then integrate with any MCP-compatible model or platform without re-adaptation. ② Lower switching cost — swap the model layer without touching the tool layer. ③ Access to the external ecosystem — a growing library of third-party MCP Servers means you don't have to build every capability in-house. ④ Understanding now = early positioning — MCP is evolving rapidly; investing in understanding its design principles now establishes a first-mover advantage before the ecosystem matures.
一个最小 MCP Server 示例
A Minimal MCP Server Example
MCP Server 并不复杂。以 Python 为例,一个能被任意 MCP 客户端调用的工具服务器,核心代码大概是这个样子(实际项目需要参考官方 SDK):
An MCP Server is not complex. In Python, a tool server callable by any MCP client looks roughly like this (refer to the official SDK for production use):
The key point: the docstring is what the model reads to understand the tool — it IS the tool description. Write it with the same care you'd give a user-facing product specification.
A mobility Agent launched with 18 tools, but the model kept confusing "query order," "query itinerary," and "query user info." The descriptions were too similar and some tools had overlapping functionality. Eventually the 18 had to be consolidated into 9, each with a more singular responsibility and differentiated description, before call accuracy stabilized. More tools is not better — clear boundaries matter more than quantity.
出行场景的工具集,可以按"行为意图"分成四个域,每个域内的工具保持独立、无重叠:
A mobility scenario tool set can be organized into four domains by "action intent," with each domain's tools kept independent and non-overlapping:
工具域 Domain
典型工具 Typical Tools
权限级别 Permission Level
关键设计要点 Design Notes
信息查询 Information
余票查询、路况查询、天气查询、运价查询
🟢 自主执行
全部只读;高频调用,注意缓存策略减少 API 成本
订单管理 Order Mgmt
下单、查订单、改签、取消预订
🟡 需确认
下单/改签写操作必须经过用户确认;查询可自主;写操作必须幂等
支付财务 Payment
发起支付、申请退款、查支付记录、开发票
🔴 明确授权
所有写操作必须有幂等键;退款金额阈值触发人工复核;日志完整留存
通知沟通 Notifications
发短信通知、推送行程提醒、发票推送
🟡 需确认
防止重复发送(幂等键);用户授权范围内的通知才能发;频率限制
Domain-based tool organization serves two purposes: it helps the model make correct tool selection decisions (tools within a domain are clearly differentiated), and it enables consistent permission enforcement (all tools in the Payment domain require explicit authorization). Design the tool set architecture before writing individual tool implementations.
Tool description quality determines whether the Agent chooses the right "player." Permission boundary precision determines how much damage occurs when something goes wrong.
🏗 架构师视角 · Architect's Perspective
MCP 正在成为行业标准,这一点现在已经比较确定。对出行类 Agent 产品来说,值得提前投入的方向是:把核心工具(余票查询、订单管理、路况 API)封装成 MCP Server,而不是只做针对某个特定模型的私有适配。这样,未来切换模型或者接入新的 AI 平台时,工具层的成本接近于零。
另一个角度:随着越来越多的第三方平台(地图服务、航旅服务、酒店平台)以 MCP Server 形式开放能力,出行 Agent 可以通过接入这些公共 MCP Server 快速扩展能力范围,而不需要为每一个平台写单独的集成代码。这是 MCP 生态成熟后最直接的商业价值。
MCP is becoming the industry standard — this is now fairly certain. For mobility Agent products, the strategic investment worth making now: wrap your core tools (ticket availability, order management, traffic API) as MCP Servers rather than building private adapters for specific models only. When you switch models or integrate new AI platforms in the future, the tool layer cost approaches zero. A second angle: as more third-party platforms (maps, aviation, hotels) open their capabilities as MCP Servers, mobility Agents can rapidly expand their capability surface by plugging into public MCP Servers — without writing custom integration code for each platform. This is the most direct commercial value of a mature MCP ecosystem.
Glossary
中英术语对照表
Bilingual Terminology Glossary
本篇涉及的核心概念,中英对照及简明释义。
工具调用
Tool Calling / Function Calling
Agent 声明调用外部工具的机制,模型只声明意图,服务器负责实际执行。
The mechanism by which an Agent declares its intent to call an external tool; the model declares, the server executes.
MCP 协议
Model Context Protocol (MCP)
Anthropic 提出的开源标准,定义 AI 模型与外部工具/资源间的统一通信接口。
An open standard by Anthropic defining a unified interface between AI models and external tools/resources — the "USB for AI tools."
MCP Server
MCP Server
实现了 MCP 协议服务端接口的工具/资源提供方,可被任意 MCP 客户端调用。
A tool or resource provider that implements the MCP server interface, callable by any MCP-compatible client.
MCP Client
MCP Client
调用 MCP Server 工具的一方,通常是 Agent 运行时或模型接入层。
The caller of MCP Server tools, typically the Agent runtime or model integration layer.
幂等性
Idempotency
同一操作执行多次与执行一次结果相同,是写操作安全重试的核心保障。
The property that executing the same operation multiple times produces the same result as once; critical for safe retries on write operations.
幂等键
Idempotency Key
每次写操作生成的唯一标识符,服务端用于去重,防止重复执行。
A unique identifier generated per write operation; used by the server to deduplicate and prevent double execution.
指数退避
Exponential Backoff
重试时按指数增长的间隔等待(如 1s、2s、4s),避免持续冲击故障服务。
A retry strategy where wait intervals grow exponentially (e.g., 1s, 2s, 4s) to avoid continuously hammering a failing service.
工具描述
Tool Description
给模型看的工具说明文本,决定模型如何选择和调用工具,是 Agent 准确率的关键。
The natural language description of a tool presented to the model; determines how the model selects and calls the tool — a key driver of Agent accuracy.
权限最小化
Principle of Least Privilege
Agent 只拥有完成当前任务所需的最小权限,限制潜在损害范围。
The Agent is granted only the minimum permissions needed for the current task, limiting the blast radius of any failure or attack.
降级策略
Fallback / Graceful Degradation
工具调用失败时自动切换到备选方案,保障系统基本可用。
Automatically switching to an alternative approach when a tool call fails, maintaining baseline system availability.
结构化输出
Structured Output
模型以规定格式(如 JSON)返回工具调用参数,便于服务器解析和执行。
Model responses in a prescribed format (e.g., JSON) for tool call parameters, enabling server-side parsing and execution.
人在回路
Human-in-the-Loop (HITL)
在 Agent 执行关键操作前插入人工确认步骤,保障高风险操作的安全性。
Inserting a human confirmation step before the Agent executes critical operations, ensuring safety for high-risk actions.
写入操作
Write Operation
会修改系统状态的操作(下单、支付、删除),需要比读操作更严格的权限控制。
Operations that modify system state (ordering, payment, deletion), requiring stricter permission controls than read operations.
工具域
Tool Domain
按业务意图分组的工具集合,同域内工具边界清晰,降低模型选错工具的概率。
A grouping of tools by business intent, with clear intra-domain boundaries to reduce the probability of the model selecting the wrong tool.
副作用
Side Effect
工具调用对系统状态产生的持久改变(如扣款、发消息),需要在工具描述中明确标注。
Persistent changes to system state caused by a tool call (e.g., deducting payment, sending a message); must be explicitly noted in the tool description.
提示词注入
Prompt Injection
攻击者通过精心构造的输入,操控 Agent 执行非预期行为的安全威胁。
A security threat where attackers craft malicious inputs to manipulate the Agent into executing unintended behaviors.
余票查询
Seat / Ticket Availability Query
实时查询交通工具剩余座位或票量的工具调用,出行 Agent 的高频只读操作。
A real-time query for remaining seats or ticket inventory; a high-frequency read-only operation in mobility Agents.
工具合并
Tool Consolidation
将职责重叠的工具合并为边界更清晰的单一工具,降低模型选错的概率。
Merging tools with overlapping responsibilities into a single, more clearly bounded tool to reduce model selection errors.