What Is the Fundamental Difference Between an Agent and a Chatbot?
Getting this question right is the prerequisite for everything that follows. In one sentence: an ordinary conversational AI is a stateless question-answering machine, while an Agent is a goal-driven autonomous executor.
User says: "Help me arrange a business trip to Shanghai next week: leave Monday, return Wednesday, a hotel near the client's office, budget under ¥500."
A chatbot: one question in, one answer out
- Replies with a paragraph of advice, telling the user to "check some platform"
- Won't proactively check ticket or room availability
- Doesn't remember preferences stated earlier in the conversation
- The next conversation starts from scratch
- The user must perform every actual action themselves
An Agent: takes the goal and executes autonomously
- Calls the calendar tool to confirm next week's Monday and Wednesday dates
- Calls the map tool to locate the client's office
- Searches for hotels under ¥500 near that location
- Checks high-speed rail ticket availability
- Assembles the results into an itinerary the user can confirm in one step
This comparison illustrates two key characteristics of an Agent: first, it can invoke multiple tools in sequence to complete a compound task; second, it maintains state throughout execution, remembering what it has done and what comes next. Ordinary conversational AI can do neither.
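To make these two characteristics concrete, here is a minimal sketch of such a loop in Python. Everything in it is an illustrative assumption rather than a real framework's API: `call_model` stands in for an LLM call, and the tool registry is faked with stub lambdas.

```python
# A minimal sketch of an Agent loop: sequential tool calls plus persistent
# state across steps. All names here are illustrative, not a real framework.

def call_model(messages):
    """Placeholder for an LLM call. Assumed to return either a tool request
    like {"tool": "search_hotels", "args": {...}} or a final answer
    like {"answer": "..."}."""
    raise NotImplementedError

TOOLS = {  # stub tools standing in for real integrations
    "check_calendar": lambda args: {"monday": "2024-06-03", "wednesday": "2024-06-05"},
    "locate_office":  lambda args: {"lat": 31.23, "lng": 121.47},
    "search_hotels":  lambda args: [{"name": "Hotel A", "price": 480}],
    "search_trains":  lambda args: [{"train": "G2", "seats": 12}],
}

def run_agent(goal: str, max_steps: int = 10):
    # The message list IS the Agent's working state: every tool call and
    # its result is appended, so later steps build on earlier ones.
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if "answer" in decision:                      # task judged complete
            return decision["answer"]
        tool = TOOLS[decision["tool"]]                # run the requested tool
        result = tool(decision.get("args", {}))
        messages.append({"role": "tool",              # persist what happened
                         "content": f"{decision['tool']} -> {result}"})
    return "Stopped: step budget exhausted."          # termination guard
```

A chatbot, by contrast, is just the `call_model` line with no loop, no tools, and no accumulated state.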
The Perception Layer: How Does an Agent Make Sense of the Outside World?
Perception is the entry point of the Agent loop. The quality of incoming information sets the ceiling on the quality of all downstream reasoning and action: garbage in, garbage out.
A travel Agent received a voice message: "Book me a flight from Capital Airport to Shanghai tomorrow." Voice-to-text introduced a misspelling, turning "Capital" into "Captial." The Agent passed this string to the airport lookup tool, which returned no results. The Agent concluded there were no flights and told the user: "No available flights found."
The problem wasn't the reasoning, and it wasn't the tool. It was the very first step: the input data was never cleaned or validated.
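A perception-layer guard between ASR and the tool call can catch exactly this class of error. Below is a minimal sketch; the `KNOWN_AIRPORTS` list is a hypothetical stand-in for a real gazetteer, and production systems would use a richer matching strategy than `difflib`.

```python
# Hypothetical perception-layer guard: validate entities against a known
# vocabulary (with fuzzy matching) before any tool sees them.
import difflib

KNOWN_AIRPORTS = ["Beijing Capital Airport", "Shanghai Hongqiao", "Shanghai Pudong"]

def validate_airport(raw_name: str) -> str | None:
    """Return the canonical airport name, or None if no confident match."""
    matches = difflib.get_close_matches(raw_name, KNOWN_AIRPORTS, n=1, cutoff=0.8)
    return matches[0] if matches else None

# The ASR typo still fuzzy-matches the canonical entry, so the downstream
# tool receives a name it actually recognizes.
print(validate_airport("Beijing Captial Airport"))  # -> "Beijing Capital Airport"
print(validate_airport("Gibberish Airport"))        # -> None: ask user to clarify
```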
What Types of Input Can an Agent Perceive?
Text
User messages, system logs, JSON returned by APIs, document content. The most common input type, but it still requires format normalization.
Voice / Audio
Handled after ASR (automatic speech recognition) converts it to text. Transcription quality directly affects comprehension accuracy: accents, background noise, and proper nouns are common failure points.
Image / File
Itinerary screenshots, ID photos, PDF contracts. Multimodal models can understand images directly, or text can be extracted first and then processed.
Structured Data
Database query results, JSON returned by tools, the user's order history. This data must be converted into a format the model can understand, not just dumped in raw.
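As a hedged illustration of "don't dump it in raw": extract only the decision-relevant fields and render them compactly before they enter the model's context. The payload shape and field names below are invented for the example.

```python
# Hypothetical example: a hotel-search tool returns a large JSON payload,
# but only a few fields matter for the booking decision. Extract just
# those before the result ever reaches the model's context window.

raw_tool_output = {
    "query_id": "abc-123", "latency_ms": 412, "currency": "CNY",
    "results": [
        {"id": "h1", "name": "Hotel A", "price": 480, "distance_km": 0.6,
         "amenities": ["wifi", "gym"], "internal_score": 0.91, "raw_html": "..."},
        {"id": "h2", "name": "Hotel B", "price": 520, "distance_km": 0.3,
         "amenities": ["wifi"], "internal_score": 0.88, "raw_html": "..."},
    ],
}

def summarize_hotels(payload: dict, budget: int) -> str:
    """Keep only decision-relevant fields, already filtered by budget."""
    lines = [
        f"- {h['name']}: ¥{h['price']}, {h['distance_km']} km from target"
        for h in payload["results"] if h["price"] <= budget
    ]
    if not lines:
        return "No hotels within budget."
    return "Hotels within budget:\n" + "\n".join(lines)

print(summarize_hotels(raw_tool_output, budget=500))
# Hotels within budget:
# - Hotel A: ¥480, 0.6 km from target
```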
The Most Common Perception-Layer Pitfalls
Perception-layer problems are often the hardest to detect because the system doesn't throw an error: it silently proceeds with bad input and eventually produces a baffling output that makes you think the reasoning is broken.
| Problem | Symptom | Fix |
|---|---|---|
| Malformed input | Special characters, emoji, or mixed languages in user input cause parsing failures | Clean and normalize formats during input preprocessing |
| ASR errors | Proper nouns, place names, and personal names get misrecognized, derailing intent understanding | Add a domain-specific vocabulary and post-correct high-frequency misrecognitions |
| Context loss | In multi-turn dialogue the Agent forgets earlier turns; the user repeats information and is still misunderstood | Design a sound context-management mechanism and persist key information |
| Raw tool output | A tool returns thousands of words of raw JSON that get dumped into the model wholesale, drowning the key information | Summarize tool results first and pass only the key fields to the model |
Three Questions to Always Ask When Designing the Perception Layer
- What types of input will our Agent receive? Does each type have a corresponding preprocessing and validation mechanism?
- Are tool results passed to the model directly, or do they first go through summarization and formatting?
- When input quality falls short (for example, ASR confidence is too low), does the system have a fallback or a way to ask the user for clarification? (A minimal sketch of such a guard follows.)
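For the third question, one simple approach is a confidence gate. A minimal sketch, assuming the ASR service reports a per-utterance confidence score (the threshold and response shape are illustrative assumptions):

```python
# Hypothetical degradation path for low-quality perception input:
# below a confidence threshold, ask the user instead of guessing.

ASR_CONFIDENCE_THRESHOLD = 0.85  # assumed value; tune per deployment

def handle_voice_input(transcript: str, confidence: float) -> dict:
    if confidence < ASR_CONFIDENCE_THRESHOLD:
        # Degrade gracefully: echo what was heard and ask for confirmation
        # rather than passing a possibly garbled string to tools.
        return {"action": "clarify",
                "message": f'I heard: "{transcript}". Is that right?'}
    return {"action": "proceed", "text": transcript}

print(handle_voice_input("book flight from Captial Airport", 0.62))
# {'action': 'clarify', 'message': 'I heard: "book flight from Captial Airport". Is that right?'}
```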
The Reasoning Layer: How Does an Agent "Think It Through" Before Acting?
Reasoning is the Agent's brain: given the information it has perceived, it decides what to do next. This is also where Agents differ most from ordinary conversational AI.
User says: "Check whether there's a direct flight from Beijing to Chengdu, and if not, recommend the shortest connection." A conditional branch is hidden in this task: check first, then decide the next step based on the result. Different reasoning modes handle it very differently.
Three Main Reasoning Modes
Chain of Thought · Step-by-Step Derivation
The model writes out each step of its thinking before reaching a conclusion. Suited to logical analysis, mathematical reasoning, and rule-based judgment. The benefit is that the reasoning process is visible and checkable; the cost is extra token consumption and relatively slower responses.
ReAct (Reason + Act) · Alternating Thought and Action
The Agent thinks one step, acts one step, takes in the result, then thinks and acts again, advancing in a loop. Suited to tasks that need real-time information (ticket availability, traffic conditions), since the plan adjusts dynamically to the real data tools return. This is currently the most mainstream Agent reasoning mode (a minimal sketch follows these three modes).
Reflection · Post-Execution Self-Assessment
After finishing a task, the Agent reviews its own output ("Did I get this right? Did I miss anything?") and decides whether to revise. Suited to quality-critical scenarios that can tolerate extra time, such as generating important documents or making complex decisions.
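Since ReAct is the mainstream mode, here is a compact sketch of its think-act-observe loop. The `llm_step` function and the scratchpad format are assumptions for illustration, not any specific library's interface.

```python
# Minimal ReAct-style loop (illustrative only): the model alternates between
# a "thought" and either an "action" (tool call) or a "final" answer, and
# each observation is fed back before the next thought.

def llm_step(scratchpad: str) -> dict:
    """Assumed model call. Returns e.g.
    {"thought": "...", "action": "find_flights", "args": {...}} or
    {"thought": "...", "final": "..."}."""
    raise NotImplementedError

def react(question: str, tools: dict, max_turns: int = 8) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm_step(scratchpad)
        scratchpad += f"Thought: {step['thought']}\n"
        if "final" in step:                       # model decides it is done
            return step["final"]
        observation = tools[step["action"]](**step.get("args", {}))
        # The observation is appended so the NEXT thought can react to real
        # data, e.g. "no direct flights found -> search for connections".
        scratchpad += f"Action: {step['action']}\nObservation: {observation}\n"
    return "Gave up: turn budget exhausted."
```

This is how the Beijing-to-Chengdu task's conditional branch gets resolved: the model only decides to search for connections after it observes that the direct-flight query came back empty.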
Common Reasoning-Layer Failures
Reasoning-layer failures are deceptive. They don't announce themselves like system errors; they appear as conclusions that are wrong but sound reasonable.
Broken Reasoning Chain
The Agent skips a critical reasoning step and jumps to a conclusion. The conclusion may look reasonable, but the path to it has holes.
Premature Conclusion
The Agent answers without querying real-time data, for example claiming "tickets available" without ever calling the ticketing tool. This is hallucination dressed up as reasoning.
Infinite Loop
The Agent retries the same tool, gets the same failure, and retries again. Without an exit condition it loops indefinitely (see the bounded-retry sketch after this list).
Goal Drift
In multi-step tasks, the Agent "forgets" the original goal mid-execution and ends up solving a related but different problem.
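The infinite-loop failure has a standard cure: cap the retries and back off between attempts. A minimal sketch (the constants are arbitrary):

```python
# Bounded retry with exponential backoff: the exit condition that prevents
# the "retry forever" failure mode.
import time

def call_with_retries(tool, args: dict, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**args)
        except Exception as exc:                 # in practice, catch narrowly
            if attempt == max_attempts:
                # Explicit exit: surface the failure instead of looping.
                raise RuntimeError(
                    f"{tool.__name__} failed {max_attempts} times") from exc
            time.sleep(2 ** attempt)             # 2s, 4s, ... between attempts
```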
In the right scenario, simple CoT is more reliable than elaborate multi-agent reasoning.
Three Key Questions for Reasoning-Layer Design
- What type of tasks does your Agent mainly handle: logic-heavy, real-time information, or quality-critical? Choosing the right reasoning mode matters more than prompt tuning.
- Do you monitor the reasoning process itself? Can you inspect the intermediate result of each step? (You can't debug what you can't see.)
- When a task can't be completed, does the Agent have a clear exit path rather than retrying indefinitely?
The Action Layer: What Can an Agent Do, and Where Are Its Limits?
The action layer is where an Agent makes real impact. It doesn't just talk; it genuinely changes the state of the outside world. It is also the part of Agent design that demands the most caution.
A travel app's Agent was authorized to auto-renew memberships. One month a user's account balance ran low, the Agent determined that "the user needs to renew," and it automatically charged the linked bank card. The user had no idea anything had happened until the bank's deduction SMS arrived.
The problem wasn't that the Agent "did something wrong"; logically, it acted within its authorized scope. The problem was the absence of a human confirmation gate in front of a high-risk action.
Three Categories of Agent Actions
Query
Read-Only · Safe to Execute Autonomously
Checking fares, seat availability, traffic, weather, or user preferences. These operations read but never write; even when they fail, they don't change the state of the world, so the Agent can call them autonomously without user confirmation.
Write
Create / Modify · Requires Confirmation
Placing orders, changing itineraries, updating user information. These operations change database state, so the Agent should confirm with the user before executing, making clear what it is about to do.
Irreversible
Irreversible Operations · Requires Explicit Authorization
Initiating payments, issuing refunds, deleting data, sending messages to external parties. These are difficult or impossible to undo once executed. They require explicit user authorization, and the consequences must be clearly communicated.
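One way to encode these three categories is a risk-tier gate in front of every tool call. The tiers, tool names, and policy below are illustrative assumptions, not a prescribed design:

```python
# Hypothetical permission gate: every tool is tagged with a risk tier, and
# the executor enforces the confirmation policy for that tier.
from enum import Enum

class Risk(Enum):
    QUERY = "autonomous"          # read-only: execute freely
    WRITE = "confirm"             # state-changing: show the user, get a yes
    IRREVERSIBLE = "authorize"    # explicit authorization + stated consequences

TOOL_RISK = {  # assumed tool names for illustration
    "search_trains": Risk.QUERY,
    "book_hotel": Risk.WRITE,
    "charge_card": Risk.IRREVERSIBLE,
}

def run_tool(name: str, args: dict):
    raise NotImplementedError  # placeholder for real tool dispatch

def execute(tool_name: str, args: dict, user_confirmed: bool = False,
            user_authorized: bool = False) -> dict:
    risk = TOOL_RISK[tool_name]
    if risk is Risk.WRITE and not user_confirmed:
        return {"status": "pending",
                "ask": f"Confirm {tool_name} with {args}?"}
    if risk is Risk.IRREVERSIBLE and not user_authorized:
        return {"status": "blocked",
                "ask": f"{tool_name} cannot be undone. Explicitly authorize?"}
    return {"status": "ok", "result": run_tool(tool_name, args)}
```

The design point: the gate lives in the executor, not in the prompt, so a confused model cannot talk its way past it.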
Action Permission Boundaries: A Reference Checklist
| Action Type | Example | Permission Required | Failure Impact |
|---|---|---|---|
| Read-only | Check seat availability, traffic, weather | Autonomous execution | No side effects; safe to retry |
| State mutation | Change preference settings, update itinerary notes | Implicit authorization (the user has performed this action before) | Overwritable; limited impact |
| Business write | Place an order, book a hotel, buy a ticket | Explicit confirmation (the user sees the details and clicks confirm) | Cancelable; a refund process exists |
| Financial action | Pay, refund, top up | Mandatory confirmation with the amount shown explicitly | Involves funds; hard to reverse |
| Irreversible | Delete an account, wipe history | Double confirmation plus a waiting period | Unrecoverable; severe impact |
The Memory Layer: How Does an Agent Remember Things?
Memory is what makes an Agent "feel like a person." It determines whether the Agent greets you like a stranger every time or genuinely knows you.
A user once told a travel app's Agent: "I don't like seats near the toilet; prioritize window seats." The next time they booked, the Agent remembered none of it, and the user had to say it all again. The third time it happened, the user deleted the app.
This isn't a feature problem; it's a memory architecture problem. Information like user preferences needs to live in long-term memory, not just in the current conversation's context.
Four Types of Memory
Working Memory (Short-Term)
The context of the current conversation: what the user said, what the Agent did, what the tools returned. It disappears when the conversation ends. Human analogy: whatever your brain is processing right now.
Episodic Memory (History)
Summaries and key events from past conversations: "the user booked a ticket to Shanghai last week and once complained about seat assignment." Stored in a database and read explicitly. Human analogy: remembering what happened last week.
Semantic Memory (Knowledge Base)
Structured knowledge: product rules, ticketing policies, route databases, user preference settings. This is the Agent's encyclopedia, typically accessed through RAG (retrieval-augmented generation).
Procedural Memory (Skills)
How the Agent uses its tools and which workflows it follows. This layer is encoded in the system prompt and the tool definitions, and it is the Agent's most stable "muscle memory."
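The seat-preference story earlier comes down to a routing decision: which store does a piece of information belong in? A minimal sketch of the working-versus-long-term split, with a plain dict standing in for a real persistent database:

```python
# Illustrative memory split: working memory lives and dies with the session;
# preferences are written through to a persistent store keyed by user.

class AgentMemory:
    def __init__(self, user_id: str, long_term_store: dict):
        self.user_id = user_id
        self.working = []                    # working memory: this session only
        self.long_term = long_term_store     # stand-in for a real database

    def remember_turn(self, role: str, text: str):
        self.working.append((role, text))    # discarded when the session ends

    def remember_preference(self, key: str, value: str):
        # Write through to long-term storage so the NEXT session sees it.
        self.long_term.setdefault(self.user_id, {})[key] = value

    def recall_preferences(self) -> dict:
        return self.long_term.get(self.user_id, {})

db = {}                                      # pretend persistent store
session1 = AgentMemory("user-42", db)
session1.remember_preference("seat", "window, away from toilet")
del session1                                 # session ends; working memory gone

session2 = AgentMemory("user-42", db)        # new session, same user
print(session2.recall_preferences())         # {'seat': 'window, away from toilet'}
```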
Termination: When Should an Agent Stop?
An Agent that cannot stop is a dangerous Agent. Termination sounds simple, but designed poorly it leads to infinite loops, exhausted resources, and even operations that should never have run.
A team's Agent monitored inventory and reordered stock automatically. One day the inventory API broke and kept returning "query failed." With no maximum retry limit, the Agent began retrying at 2 PM and didn't stop until 3 AM, generating tens of thousands of useless calls and adding thousands of yuan to the API bill.
Termination logic isn't optional; it's the safety valve of any Agent system.
Three Termination Conditions You Must Design
Success Condition
The Agent determines the task is complete, outputs the final result, and stops. Key point: the completion criteria must be explicitly defined. "The user is satisfied" isn't measurable; "the order was created and the user received confirmation" is.
Error Exit
Tool calls fail more than N consecutive times, total execution time exceeds a threshold, or token spend exceeds budget. Any one of these should make the Agent stop and report the error, not keep retrying.
Human-in-the-Loop
The Agent judges that its confidence is too low, that it has hit an edge case with no playbook, or that an action's risk exceeds its autonomous scope. It should pause and ask the user to step in, not push ahead.
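All three conditions can be wired into the loop as a single guard checked on every iteration. A sketch with assumed budget numbers and state fields:

```python
# Illustrative termination guard combining the three stop conditions:
# success, error exit (retries / time / tokens), and human-in-the-loop.
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    max_failures: int = 3
    max_seconds: float = 120.0
    max_tokens: int = 50_000
    started: float = field(default_factory=time.monotonic)

def should_stop(state: dict, budget: Budget) -> str | None:
    """Return a stop reason, or None to keep looping."""
    if state.get("task_complete"):                     # success condition
        return "done"
    if state.get("consecutive_failures", 0) >= budget.max_failures:
        return "error: tool kept failing"              # error exit
    if time.monotonic() - budget.started > budget.max_seconds:
        return "error: time budget exceeded"
    if state.get("tokens_used", 0) > budget.max_tokens:
        return "error: token budget exceeded"
    if state.get("confidence", 1.0) < 0.5 or state.get("needs_human"):
        return "paused: requesting human input"        # human-in-the-loop
    return None
```

Had the inventory Agent above run `should_stop` each iteration, the failure would have ended at three retries instead of thirteen hours.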
The Full Loop: Every Layer Is a Potential Failure Point
Put perception, reasoning, action, memory, and termination side by side and the complexity of an Agent system becomes clear: every layer can fail, and errors cascade and amplify between layers.
This loop carries an important implication: an error at any layer is not automatically caught by the next one. When perception takes in bad data, reasoning reasons over that bad data, and action then executes on flawed reasoning; errors cascade and amplify.
This is why Agent systems need defensive measures at every layer, not just one final check at the output. Good defensive architecture catches and handles an error at the layer where it occurs, so it never propagates downstream.
An Agent isn't a "smarter chatbot." It is a complete autonomous system with perception, reasoning, action, memory, and termination. Understanding these five layers is the foundation for designing reliable Agent products.
Each layer has its own design requirements and failure modes. As a product owner you don't need to write code, but you do need to ask the right questions, layer by layer, when reviewing a design: Is perception input validated? Does reasoning have exit conditions? Are action permission boundaries defined? What goes into memory, and when is it retrieved?
Being able to ask these questions already puts you ahead of most product leads in this space.