What Is the Fundamental Difference Between an Agent and a Chatbot?
Getting this question right is the prerequisite for everything that follows. In one sentence: an ordinary conversational AI is a stateless question-answering machine, while an Agent is a goal-driven autonomous executor.
User says: "Help me arrange a business trip to Shanghai next week: leave Monday, return Wednesday, a hotel near the client's office, budget under ¥500."
A chatbot: one question in, one answer out
- Replies with a paragraph of advice, telling the user to "check some platform"
- Won't proactively check ticket or room availability
- Doesn't remember preferences stated earlier in the conversation
- The next conversation starts from scratch
- The user must perform every actual action themselves
An Agent: takes the goal and executes autonomously
- Calls the calendar tool to confirm next week's Monday and Wednesday dates
- Calls the map tool to locate the client's office
- Searches for hotels under ¥500 near that location
- Checks high-speed rail ticket availability
- Assembles the results into an itinerary the user can confirm in one step
This comparison illustrates two key characteristics of an Agent: first, it can invoke multiple tools in sequence to complete a compound task; second, it maintains state throughout execution, remembering what it has done and what comes next. Ordinary conversational AI can do neither.
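To make these two characteristics concrete, here is a minimal sketch of such a loop in Python. Everything in it is an illustrative assumption rather than a real framework's API: `call_model` stands in for an LLM call, and the tool registry is faked with stub lambdas.

```python
# A minimal sketch of an Agent loop: sequential tool calls plus persistent
# state across steps. All names here are illustrative, not a real framework.

def call_model(messages):
    """Placeholder for an LLM call. Assumed to return either a tool request
    like {"tool": "search_hotels", "args": {...}} or a final answer
    like {"answer": "..."}."""
    raise NotImplementedError

TOOLS = {  # stub tools standing in for real integrations
    "check_calendar": lambda args: {"monday": "2024-06-03", "wednesday": "2024-06-05"},
    "locate_office":  lambda args: {"lat": 31.23, "lng": 121.47},
    "search_hotels":  lambda args: [{"name": "Hotel A", "price": 480}],
    "search_trains":  lambda args: [{"train": "G2", "seats": 12}],
}

def run_agent(goal: str, max_steps: int = 10):
    # The message list IS the Agent's working state: every tool call and
    # its result is appended, so later steps build on earlier ones.
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if "answer" in decision:                      # task judged complete
            return decision["answer"]
        tool = TOOLS[decision["tool"]]                # run the requested tool
        result = tool(decision.get("args", {}))
        messages.append({"role": "tool",              # persist what happened
                         "content": f"{decision['tool']} -> {result}"})
    return "Stopped: step budget exhausted."          # termination guard
```

A chatbot, by contrast, is just the `call_model` line with no loop, no tools, and no accumulated state.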
The Perception Layer: How Does an Agent Make Sense of the Outside World?
Perception is the entry point of the Agent loop. The quality of incoming information sets the ceiling on the quality of all downstream reasoning and action: garbage in, garbage out.
A travel Agent received a voice message: "Book me a flight from Capital Airport to Shanghai tomorrow." Voice-to-text introduced a misspelling, turning "Capital" into "Captial." The Agent passed this string to the airport lookup tool, which returned no results. The Agent concluded there were no flights and told the user: "No available flights found."
The problem wasn't the reasoning, and it wasn't the tool. It was the very first step: the input data was never cleaned or validated.
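A perception-layer guard between ASR and the tool call can catch exactly this class of error. Below is a minimal sketch; the `KNOWN_AIRPORTS` list is a hypothetical stand-in for a real gazetteer, and production systems would use a richer matching strategy than `difflib`.

```python
# Hypothetical perception-layer guard: validate entities against a known
# vocabulary (with fuzzy matching) before any tool sees them.
import difflib

KNOWN_AIRPORTS = ["Beijing Capital Airport", "Shanghai Hongqiao", "Shanghai Pudong"]

def validate_airport(raw_name: str) -> str | None:
    """Return the canonical airport name, or None if no confident match."""
    matches = difflib.get_close_matches(raw_name, KNOWN_AIRPORTS, n=1, cutoff=0.8)
    return matches[0] if matches else None

# The ASR typo still fuzzy-matches the canonical entry, so the downstream
# tool receives a name it actually recognizes.
print(validate_airport("Beijing Captial Airport"))  # -> "Beijing Capital Airport"
print(validate_airport("Gibberish Airport"))        # -> None: ask user to clarify
```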
What Types of Input Can an Agent Perceive?
Text
User messages, system logs, JSON returned by APIs, document content. The most common input type, but it still requires format normalization.
Voice / Audio
Handled after ASR (automatic speech recognition) converts it to text. Transcription quality directly affects comprehension accuracy: accents, background noise, and proper nouns are common failure points.
Image / File
Itinerary screenshots, ID photos, PDF contracts. Multimodal models can understand images directly, or text can be extracted first and then processed.
Structured Data
Database query results, JSON returned by tools, the user's order history. This data must be converted into a format the model can understand, not just dumped in raw.
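As a hedged illustration of "don't dump it in raw": extract only the decision-relevant fields and render them compactly before they enter the model's context. The payload shape and field names below are invented for the example.

```python
# Hypothetical example: a hotel-search tool returns a large JSON payload,
# but only a few fields matter for the booking decision. Extract just
# those before the result ever reaches the model's context window.

raw_tool_output = {
    "query_id": "abc-123", "latency_ms": 412, "currency": "CNY",
    "results": [
        {"id": "h1", "name": "Hotel A", "price": 480, "distance_km": 0.6,
         "amenities": ["wifi", "gym"], "internal_score": 0.91, "raw_html": "..."},
        {"id": "h2", "name": "Hotel B", "price": 520, "distance_km": 0.3,
         "amenities": ["wifi"], "internal_score": 0.88, "raw_html": "..."},
    ],
}

def summarize_hotels(payload: dict, budget: int) -> str:
    """Keep only decision-relevant fields, already filtered by budget."""
    lines = [
        f"- {h['name']}: ¥{h['price']}, {h['distance_km']} km from target"
        for h in payload["results"] if h["price"] <= budget
    ]
    if not lines:
        return "No hotels within budget."
    return "Hotels within budget:\n" + "\n".join(lines)

print(summarize_hotels(raw_tool_output, budget=500))
# Hotels within budget:
# - Hotel A: ¥480, 0.6 km from target
```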
The Most Common Perception-Layer Pitfalls
Perception-layer problems are often the hardest to detect because the system doesn't throw an error: it silently proceeds with bad input and eventually produces a baffling output that makes you think the reasoning is broken.
| Problem | Symptom | Fix |
|---|---|---|
| Malformed input | Special characters, emoji, or mixed languages in user input cause parsing failures | Clean and normalize formats during input preprocessing |
| ASR errors | Proper nouns, place names, and personal names get misrecognized, derailing intent understanding | Add a domain-specific vocabulary and post-correct high-frequency misrecognitions |
| Context loss | In multi-turn dialogue the Agent forgets earlier turns; the user repeats information and is still misunderstood | Design a sound context-management mechanism and persist key information |
| Raw tool output | A tool returns thousands of words of raw JSON that get dumped into the model wholesale, drowning the key information | Summarize tool results first and pass only the key fields to the model |
Three Questions to Always Ask When Designing the Perception Layer
- What types of input will our Agent receive? Does each type have a corresponding preprocessing and validation mechanism?
- Are tool results passed to the model directly, or do they first go through summarization and formatting?
- When input quality falls short (for example, ASR confidence is too low), does the system have a fallback or a way to ask the user for clarification? (A minimal sketch of such a guard follows.)
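For the third question, one simple approach is a confidence gate. A minimal sketch, assuming the ASR service reports a per-utterance confidence score (the threshold and response shape are illustrative assumptions):

```python
# Hypothetical degradation path for low-quality perception input:
# below a confidence threshold, ask the user instead of guessing.

ASR_CONFIDENCE_THRESHOLD = 0.85  # assumed value; tune per deployment

def handle_voice_input(transcript: str, confidence: float) -> dict:
    if confidence < ASR_CONFIDENCE_THRESHOLD:
        # Degrade gracefully: echo what was heard and ask for confirmation
        # rather than passing a possibly garbled string to tools.
        return {"action": "clarify",
                "message": f'I heard: "{transcript}". Is that right?'}
    return {"action": "proceed", "text": transcript}

print(handle_voice_input("book flight from Captial Airport", 0.62))
# {'action': 'clarify', 'message': 'I heard: "book flight from Captial Airport". Is that right?'}
```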
The Reasoning Layer: How Does an Agent "Think It Through" Before Acting?
Reasoning is the Agent's brain: given the information it has perceived, it decides what to do next. This is also where Agents differ most from ordinary conversational AI.
User says: "Check whether there's a direct flight from Beijing to Chengdu, and if not, recommend the shortest connection." A conditional branch is hidden in this task: check first, then decide the next step based on the result. Different reasoning modes handle it very differently.
Three Main Reasoning Modes
Chain of Thought · Step-by-Step Derivation
The model writes out each step of its thinking before reaching a conclusion. Suited to logical analysis, mathematical reasoning, and rule-based judgment. The benefit is that the reasoning process is visible and checkable; the cost is extra token consumption and relatively slower responses.
ReAct (Reason + Act) · Alternating Thought and Action
The Agent thinks one step, acts one step, takes in the result, then thinks and acts again, advancing in a loop. Suited to tasks that need real-time information (ticket availability, traffic conditions), since the plan adjusts dynamically to the real data tools return. This is currently the most mainstream Agent reasoning mode (a minimal sketch follows these three modes).
Reflection · Post-Execution Self-Assessment
After finishing a task, the Agent reviews its own output ("Did I get this right? Did I miss anything?") and decides whether to revise. Suited to quality-critical scenarios that can tolerate extra time, such as generating important documents or making complex decisions.
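Since ReAct is the mainstream mode, here is a compact sketch of its think-act-observe loop. The `llm_step` function and the scratchpad format are assumptions for illustration, not any specific library's interface.

```python
# Minimal ReAct-style loop (illustrative only): the model alternates between
# a "thought" and either an "action" (tool call) or a "final" answer, and
# each observation is fed back before the next thought.

def llm_step(scratchpad: str) -> dict:
    """Assumed model call. Returns e.g.
    {"thought": "...", "action": "find_flights", "args": {...}} or
    {"thought": "...", "final": "..."}."""
    raise NotImplementedError

def react(question: str, tools: dict, max_turns: int = 8) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm_step(scratchpad)
        scratchpad += f"Thought: {step['thought']}\n"
        if "final" in step:                       # model decides it is done
            return step["final"]
        observation = tools[step["action"]](**step.get("args", {}))
        # The observation is appended so the NEXT thought can react to real
        # data, e.g. "no direct flights found -> search for connections".
        scratchpad += f"Action: {step['action']}\nObservation: {observation}\n"
    return "Gave up: turn budget exhausted."
```

This is how the Beijing-to-Chengdu task's conditional branch gets resolved: the model only decides to search for connections after it observes that the direct-flight query came back empty.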
Common Reasoning-Layer Failures
Reasoning-layer failures are deceptive. They don't announce themselves like system errors; they appear as conclusions that are wrong but sound reasonable.
Broken Reasoning Chain
The Agent skips a critical reasoning step and jumps to a conclusion. The conclusion may look reasonable, but the path to it has holes.
Premature Conclusion
The Agent answers without querying real-time data, for example claiming "tickets available" without ever calling the ticketing tool. This is hallucination dressed up as reasoning.
Infinite Loop
The Agent retries the same tool, gets the same failure, and retries again. Without an exit condition it loops indefinitely (see the bounded-retry sketch after this list).
Goal Drift
In multi-step tasks, the Agent "forgets" the original goal mid-execution and ends up solving a related but different problem.
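The infinite-loop failure has a standard cure: cap the retries and back off between attempts. A minimal sketch (the constants are arbitrary):

```python
# Bounded retry with exponential backoff: the exit condition that prevents
# the "retry forever" failure mode.
import time

def call_with_retries(tool, args: dict, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**args)
        except Exception as exc:                 # in practice, catch narrowly
            if attempt == max_attempts:
                # Explicit exit: surface the failure instead of looping.
                raise RuntimeError(
                    f"{tool.__name__} failed {max_attempts} times") from exc
            time.sleep(2 ** attempt)             # 2s, 4s, ... between attempts
```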
In the right scenario, simple CoT is more reliable than elaborate multi-agent reasoning.
Three Key Questions for Reasoning-Layer Design
- What type of tasks does your Agent mainly handle: logic-heavy, real-time information, or quality-critical? Choosing the right reasoning mode matters more than prompt tuning.
- Do you monitor the reasoning process itself? Can you inspect the intermediate result of each step? (You can't debug what you can't see.)
- When a task can't be completed, does the Agent have a clear exit path rather than retrying indefinitely?
The Action Layer: What Can an Agent Do, and Where Are Its Limits?
The action layer is where an Agent makes real impact. It doesn't just talk; it genuinely changes the state of the outside world. It is also the part of Agent design that demands the most caution.
A travel app's Agent was authorized to auto-renew memberships. One month a user's account balance ran low, the Agent determined that "the user needs to renew," and it automatically charged the linked bank card. The user had no idea anything had happened until the bank's deduction SMS arrived.
The problem wasn't that the Agent "did something wrong"; logically, it acted within its authorized scope. The problem was the absence of a human confirmation gate in front of a high-risk action.
Three Categories of Agent Actions
Query
Read-Only · Safe to Execute Autonomously
Checking fares, seat availability, traffic, weather, or user preferences. These operations read but never write; even when they fail, they don't change the state of the world, so the Agent can call them autonomously without user confirmation.
Write
Create / Modify · Requires Confirmation
Placing orders, changing itineraries, updating user information. These operations change database state, so the Agent should confirm with the user before executing, making clear what it is about to do.
Irreversible
Irreversible Operations · Requires Explicit Authorization
Initiating payments, issuing refunds, deleting data, sending messages to external parties. These are difficult or impossible to undo once executed. They require explicit user authorization, and the consequences must be clearly communicated.
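One way to encode these three categories is a risk-tier gate in front of every tool call. The tiers, tool names, and policy below are illustrative assumptions, not a prescribed design:

```python
# Hypothetical permission gate: every tool is tagged with a risk tier, and
# the executor enforces the confirmation policy for that tier.
from enum import Enum

class Risk(Enum):
    QUERY = "autonomous"          # read-only: execute freely
    WRITE = "confirm"             # state-changing: show the user, get a yes
    IRREVERSIBLE = "authorize"    # explicit authorization + stated consequences

TOOL_RISK = {  # assumed tool names for illustration
    "search_trains": Risk.QUERY,
    "book_hotel": Risk.WRITE,
    "charge_card": Risk.IRREVERSIBLE,
}

def run_tool(name: str, args: dict):
    raise NotImplementedError  # placeholder for real tool dispatch

def execute(tool_name: str, args: dict, user_confirmed: bool = False,
            user_authorized: bool = False) -> dict:
    risk = TOOL_RISK[tool_name]
    if risk is Risk.WRITE and not user_confirmed:
        return {"status": "pending",
                "ask": f"Confirm {tool_name} with {args}?"}
    if risk is Risk.IRREVERSIBLE and not user_authorized:
        return {"status": "blocked",
                "ask": f"{tool_name} cannot be undone. Explicitly authorize?"}
    return {"status": "ok", "result": run_tool(tool_name, args)}
```

The design point: the gate lives in the executor, not in the prompt, so a confused model cannot talk its way past it.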
Action Permission Boundaries: A Reference Checklist
| Action Type | Example | Permission Required | Failure Impact |
|---|---|---|---|
| Read-only | Check seat availability, traffic, weather | Autonomous execution | No side effects; safe to retry |
| State mutation | Change preference settings, update itinerary notes | Implicit authorization (the user has performed this action before) | Overwritable; limited impact |
| Business write | Place an order, book a hotel, buy a ticket | Explicit confirmation (the user sees the details and clicks confirm) | Cancelable; a refund process exists |
| Financial action | Pay, refund, top up | Mandatory confirmation with the amount shown explicitly | Involves funds; hard to reverse |
| Irreversible | Delete an account, wipe history | Double confirmation plus a waiting period | Unrecoverable; severe impact |
The Memory Layer: How Does an Agent Remember Things?
Memory is what makes an Agent "feel like a person." It determines whether the Agent greets you like a stranger every time or genuinely knows you.
A user once told a travel app's Agent: "I don't like seats near the toilet; prioritize window seats." The next time they booked, the Agent remembered none of it, and the user had to say it all again. The third time it happened, the user deleted the app.
This isn't a feature problem; it's a memory architecture problem. Information like user preferences needs to live in long-term memory, not just in the current conversation's context.
Four Types of Memory
Working Memory (Short-Term)
The context of the current conversation: what the user said, what the Agent did, what the tools returned. It disappears when the conversation ends. Human analogy: whatever your brain is processing right now.
Episodic Memory (History)
Summaries and key events from past conversations: "the user booked a ticket to Shanghai last week and once complained about seat assignment." Stored in a database and read explicitly. Human analogy: remembering what happened last week.
Semantic Memory (Knowledge Base)
Structured knowledge: product rules, ticketing policies, route databases, user preference settings. This is the Agent's encyclopedia, typically accessed through RAG (retrieval-augmented generation).
Procedural Memory (Skills)
How the Agent uses its tools and which workflows it follows. This layer is encoded in the system prompt and the tool definitions, and it is the Agent's most stable "muscle memory."
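The seat-preference story earlier comes down to a routing decision: which store does a piece of information belong in? A minimal sketch of the working-versus-long-term split, with a plain dict standing in for a real persistent database:

```python
# Illustrative memory split: working memory lives and dies with the session;
# preferences are written through to a persistent store keyed by user.

class AgentMemory:
    def __init__(self, user_id: str, long_term_store: dict):
        self.user_id = user_id
        self.working = []                    # working memory: this session only
        self.long_term = long_term_store     # stand-in for a real database

    def remember_turn(self, role: str, text: str):
        self.working.append((role, text))    # discarded when the session ends

    def remember_preference(self, key: str, value: str):
        # Write through to long-term storage so the NEXT session sees it.
        self.long_term.setdefault(self.user_id, {})[key] = value

    def recall_preferences(self) -> dict:
        return self.long_term.get(self.user_id, {})

db = {}                                      # pretend persistent store
session1 = AgentMemory("user-42", db)
session1.remember_preference("seat", "window, away from toilet")
del session1                                 # session ends; working memory gone

session2 = AgentMemory("user-42", db)        # new session, same user
print(session2.recall_preferences())         # {'seat': 'window, away from toilet'}
```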
Termination: When Should an Agent Stop?
An Agent that cannot stop is a dangerous Agent. Termination sounds simple, but designed poorly it leads to infinite loops, exhausted resources, and even operations that should never have run.
A team's Agent monitored inventory and reordered stock automatically. One day the inventory API broke and kept returning "query failed." With no maximum retry limit, the Agent began retrying at 2 PM and didn't stop until 3 AM, generating tens of thousands of useless calls and adding thousands of yuan to the API bill.
Termination logic isn't optional; it's the safety valve of any Agent system.
Three Termination Conditions You Must Design
Success Condition
The Agent determines the task is complete, outputs the final result, and stops. Key point: the completion criteria must be explicitly defined. "The user is satisfied" isn't measurable; "the order was created and the user received confirmation" is.
Error Exit
Tool calls fail more than N consecutive times, total execution time exceeds a threshold, or token spend exceeds budget. Any one of these should make the Agent stop and report the error, not keep retrying.
Human-in-the-Loop
The Agent judges that its confidence is too low, that it has hit an edge case with no playbook, or that an action's risk exceeds its autonomous scope. It should pause and ask the user to step in, not push ahead.
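All three conditions can be wired into the loop as a single guard checked on every iteration. A sketch with assumed budget numbers and state fields:

```python
# Illustrative termination guard combining the three stop conditions:
# success, error exit (retries / time / tokens), and human-in-the-loop.
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    max_failures: int = 3
    max_seconds: float = 120.0
    max_tokens: int = 50_000
    started: float = field(default_factory=time.monotonic)

def should_stop(state: dict, budget: Budget) -> str | None:
    """Return a stop reason, or None to keep looping."""
    if state.get("task_complete"):                     # success condition
        return "done"
    if state.get("consecutive_failures", 0) >= budget.max_failures:
        return "error: tool kept failing"              # error exit
    if time.monotonic() - budget.started > budget.max_seconds:
        return "error: time budget exceeded"
    if state.get("tokens_used", 0) > budget.max_tokens:
        return "error: token budget exceeded"
    if state.get("confidence", 1.0) < 0.5 or state.get("needs_human"):
        return "paused: requesting human input"        # human-in-the-loop
    return None
```

Had the inventory Agent above run `should_stop` each iteration, the failure would have ended at three retries instead of thirteen hours.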
The Full Loop: Every Layer Is a Potential Failure Point
Put perception, reasoning, action, memory, and termination side by side and the complexity of an Agent system becomes clear: every layer can fail, and errors cascade and amplify between layers.
This loop carries an important implication: an error at any layer is not automatically caught by the next one. When perception takes in bad data, reasoning reasons over that bad data, and action then executes on flawed reasoning; errors cascade and amplify.
This is why Agent systems need defensive measures at every layer, not just one final check at the output. Good defensive architecture catches and handles an error at the layer where it occurs, so it never propagates downstream.
An Agent isn't a "smarter chatbot." It is a complete autonomous system with perception, reasoning, action, memory, and termination. Understanding these five layers is the foundation for designing reliable Agent products.
Each layer has its own design requirements and failure modes. As a product owner you don't need to write code, but you do need to ask the right questions, layer by layer, when reviewing a design: Is perception input validated? Does reasoning have exit conditions? Are action permission boundaries defined? What goes into memory, and when is it retrieved?
Being able to ask these questions already puts you ahead of most product leads in this space.