AI 为什么会"一本正经地说错话"?
Why Does AI State Incorrect Information with Absolute Confidence?
幻觉(Hallucination)是 AI 领域里听起来玄乎、但其实原理很好理解的概念。搞清楚它怎么产生的,是设计 Agent 系统的第一步。
"Hallucination" sounds mystical, but the mechanism behind it is easy to understand. Working out how it arises is the first step in designing an Agent system.
有人问 ChatGPT:"北京到上海的高铁 G1 次,2024年6月1日还有没有二等座?" AI 给出了非常具体的回答:有余票,价格 553 元,16:28 发车,20:56 到达。
Someone asked ChatGPT: "Are there second-class seats available on the G1 high-speed train from Beijing to Shanghai on June 1, 2024?" The AI gave a very specific answer: tickets available, price ¥553, departure at 16:28, arrival at 20:56.
听起来非常权威。但这些信息是假的——AI 根本无法查询实时余票,它只是生成了一段"看起来像是真实车次信息"的内容。
It sounded completely authoritative. But all of this was fabricated — the AI cannot check real-time ticket availability; it simply generated content that looked like real train schedule information.
模型的本质是什么?
What Is a Language Model, Fundamentally?
理解幻觉,要先理解模型在做什么。大模型的工作原理,用一句话解释就是:根据前面所有的内容,预测"接下来最可能出现的词"是什么。它不是在查资料,不是在做逻辑推导,也不是在"思考"——它是在做一个极其复杂的概率计算。
To understand hallucination, you first need to understand what the model is doing. In one sentence: a large language model predicts "the most likely next word" based on everything that came before it. It is not looking up information, not doing logical reasoning, and not "thinking" — it is performing an extraordinarily complex probability calculation.
给定"北京到上海高铁G1次,2024年"这串词,接下来最可能跟的是什么?模型学过大量车票信息相关的文本,所以它知道车次信息应该包含价格、发车时间、到达时间……于是它生成了这些——哪怕它并不知道 2024 年 6 月 1 日那天的真实情况。
Given the phrase "G1 high-speed train from Beijing to Shanghai, 2024," what is most likely to follow? The model has been trained on large amounts of ticket-related text, so it knows that train information should include price, departure time, arrival time... and so it generates exactly that — even though it has no idea what the actual situation was on June 1, 2024.
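用一个玩具代码示例来体会这件事(Python;logits 数值纯属虚构,仅演示"按概率选词"的机制,与任何真实模型无关):
To get a feel for this, here is a toy sketch in Python (the logit values are invented purely to illustrate the "pick by probability" mechanism — they come from no real model):

```python
import math

# 玩具示例:真实模型在数万个候选词上做同样的计算;
# 这里的 logits 数值纯属编造,仅示意"按概率选词"的机制。
logits = {"553": 2.1, "的": 0.3, "可能": 0.5, "已售完": -1.0}

def softmax(scores: dict) -> dict:
    exp = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exp.values())
    return {tok: v / total for tok, v in exp.items()}

probs = softmax(logits)
next_token = max(probs, key=probs.get)
print(probs)       # 每个候选词的概率
print(next_token)  # "553" —— 高概率不等于真实,模型只是选了"最像"的词
```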
幻觉为什么这么难发现?
Why Is Hallucination So Hard to Detect?
普通的错误,错得很明显——比如把"北京"打成"北蟹",一眼就看出来了。但幻觉不一样:模型生成的错误内容,往往格式正确、语言流畅、语气笃定,和正确答案放在一起根本看不出区别。
Ordinary errors are obvious — if someone types "Bejing" instead of "Beijing," you spot it immediately. Hallucination is different: the incorrect content a model generates is typically well-formatted, fluently written, and stated with complete confidence. Placed next to a correct answer, it looks indistinguishable.
更危险的是:模型没有"我不确定"的内置自觉。它不会在错误答案前加上"我不太确定,但是……"——它会用和说出正确答案时完全一样的语气,给出错误的信息。
What makes this even more dangerous: the model has no built-in sense of uncertainty. It won't preface a wrong answer with "I'm not sure, but..." — it delivers incorrect information in exactly the same confident tone it uses for correct information.
「基于真实知识的回答」和「生成得很像的幻觉」,这两件事看起来像,但本质上完全不同。
An answer grounded in real knowledge and a hallucination that merely mimics one may look the same from the outside, but they are fundamentally different operations.
幻觉有哪几种?
Types of Hallucination
不是所有"说错了"都是同一种问题,区分清楚才知道怎么应对:
Not all "wrong answers" are the same type of problem. Identifying the category helps you choose the right countermeasure:
| 幻觉类型 / Type | 表现 / What It Looks Like | 例子 / Example | 风险 / Risk |
|---|---|---|---|
| 事实性幻觉 Factual Hallucination | 编造了不存在的事实 Fabricates nonexistent facts | 说某篇论文存在,但实际上根本没有这篇论文 Cites a paper that doesn't exist | 高 High |
| 时效性错误 Temporal Error | 给出的信息是过时的 Gives outdated information | 票价、法规条文、人事变动——训练截止后的内容不知道 Fares, regulations, personnel changes — anything after the training cutoff | 高 High |
| 精度幻觉 Precision Hallucination | 数字算错,或给出假精确数字 Miscalculates, or gives falsely precise numbers | 说"这条路线全程 312.7 公里",但这个数字是捏造的 "The route is 312.7 km" — a fabricated figure | 高 High |
| 逻辑跳跃 Reasoning Gap | 推理过程有漏洞,但结论听起来合理 Flawed reasoning with a plausible-sounding conclusion | 推荐路线时跳过了某个重要的换乘节点 Skips a key transfer when recommending a route | 中 Medium |
| 过度自信 Overconfidence | 对不确定的内容给出了确定的答案 States uncertain content with certainty | "这个限号规定每天都执行"——但实际有例外情况 "The restriction applies every day" — but there are exceptions | 中 Medium |
| 格式幻觉 Format Hallucination | 内容对,但格式是捏造的 Correct content, fabricated format | 要求输出 JSON,但某个字段名和约定的不一致 JSON requested, but a field name deviates from the agreed schema | 低 Low |
在开始设计 Agent 之前,先问自己这几个问题
Before Designing Your Agent, Ask Yourself These Questions
- 我的 Agent 需要回答有确定正确答案的问题吗?(比如票价、法规、库存数量)——如果是,必须接入实时数据源,不能依赖模型"背诵" Does my Agent need to answer questions with definitive correct answers (e.g., fares, regulations, inventory)? If so, it must connect to real-time data sources — not rely on the model's "memory."
- 用户会把 Agent 的回答当作可以直接行动的依据吗?——如果是,需要在高风险输出前加人工确认或数据验证 Will users treat the Agent's answers as direct basis for action? If so, add human confirmation or data validation before high-risk outputs.
- 如果 Agent 说错了,后果有多严重?——后果越严重,需要的护栏越多 If the Agent is wrong, how severe are the consequences? The more severe, the more guardrails you need.
模型在哪些地方最容易出错?
Where Are Models Most Likely to Fail?
每个模型都有自己不擅长的事。不了解这些边界,你就会在错误的地方信任它,然后在出问题时感到莫名其妙。
Every model has things it is bad at. If you don't know these boundaries, you will trust it in the wrong places — and be baffled when things break.
问 AI:"如果我 2024年3月15日购买了一张6个月有效期的火车票,有效期截止到哪一天?"大多数模型会说:2024年9月15日。听起来对,对吧?但等等——这是需要数日期的计算,包含月份的天数差异,还要考虑"6个月后"是否刚好落在那一天……
Ask an AI: "If I bought a train ticket with a 6-month validity period on March 15, 2024, what is the expiration date?" Most models will say: September 15, 2024. Sounds right, doesn't it? But wait — this requires counting actual calendar days, accounting for month lengths, and determining whether "6 months later" lands precisely on that date...
这种"看起来简单但其实需要精确推算"的问题,是模型最常翻车的场景之一。更危险的是——它算完了还是很自信,根本不会告诉你"我可能算错了"。
This type of "looks simple but requires precise calculation" problem is one of the most common failure scenarios for models. What makes it more dangerous: the model remains completely confident after calculating — it will never tell you "I might have gotten this wrong."
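这类题的正确做法是交给代码算。一个只用标准库的示意(假设的规则:跨月时天数不足,则取目标月份的最后一天):
The right move is to let code do this math. A standard-library-only sketch (assumed rule: clamp to the last day of the month when the target month is shorter):

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """精确的"加 N 个月":由代码计算,而不是让模型"生成"一个日期。"""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])  # 天数不足取月末
    return date(year, month, day)

print(add_months(date(2024, 3, 15), 6))   # 2024-09-15
print(add_months(date(2024, 8, 31), 6))   # 2025-02-28 —— 边界情况,模型最容易错
```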
五个最容易翻车的场景
Five High-Failure Scenarios
精确数学计算
Precise Math
加减乘除、日期计算、价格汇总。模型在做这些的时候,是"生成答案"而非"真的算"——复杂计算错误率出乎意料地高。
Basic arithmetic, date math, price totals. The model is "generating an answer," not actually computing — error rates on complex calculations are surprisingly high.
→ 解法:让模型调用计算器工具 / Use a calculator tool
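「让模型调用计算器」具体长什么样?一个本地可运行的示意(Python;`calculate` 的名字和参数设计是假设的示例,并非某个框架的固定 API)——模型只负责产出 `expression` 参数,真正的计算由代码完成:
What does "give the model a calculator" look like? A locally runnable sketch (the `calculate` name and parameter design are illustrative assumptions, not any framework's fixed API) — the model only produces the `expression` argument; the arithmetic happens in code:

```python
import ast
import operator as op

_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calculate(expression: str) -> float:
    """安全地计算四则运算表达式——模型只负责给出 expression 参数。"""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("不支持的表达式")
    return _eval(ast.parse(expression, mode="eval").body)

# 模型生成的是工具调用参数,而不是答案本身:
# {"name": "calculate", "arguments": {"expression": "553 * 3 + 120"}}
print(calculate("553 * 3 + 120"))  # 1779 —— 由代码精确计算,而非模型"背诵"
```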
实时信息
Real-Time Information
今天的天气、实时余票、最新政策、当前油价。模型的知识截止到训练日期,之后发生的事它一无所知,但它可能不会承认这一点。
Today's weather, live ticket availability, latest regulations, current prices. The model's knowledge has a training cutoff — it knows nothing about events after that date, yet may not admit it.
→ 解法:接入实时数据源 / Connect to live data sources
超长文本中段
Lost in the Middle
就算模型支持 10 万字上下文,放在文本中间位置的信息常常会被"遗漏"。研究显示这叫"迷失在中间"(Lost in the Middle)效应。
Even with 100K-token context windows, information placed in the middle of a long document is frequently overlooked. Research calls this the "Lost in the Middle" effect.
→ 解法:关键信息放开头/结尾;使用 RAG / Place key info at start or end; use RAG
多步逻辑推理
Multi-Step Reasoning
需要连续推理 5 步以上的问题,模型容易在中途走偏,最终结论看起来合理但前提已经错了。
Problems requiring 5+ consecutive reasoning steps often cause the model to drift mid-process — the final conclusion sounds reasonable, but the premises are already wrong.
→ 解法:让模型显式写出每步推理 / Force explicit step-by-step reasoning
小众垂直领域
Niche Domain Knowledge
模型学习的数据以大众内容为主,小众领域(比如特定城市的地方政策、特定行业的专业规范)的知识往往不全甚至有误。
Training data skews toward mainstream content. Niche domain knowledge — local regulations, industry-specific standards — is often incomplete or incorrect.
→ 解法:接入垂直知识库(RAG) / Inject domain knowledge via RAG
"迷失在中间"效应
The "Lost in the Middle" Effect
斯坦福大学的研究发现,当你给模型一个很长的文档时,它对文档开头和结尾的内容记得最清楚,中间部分的信息则明显容易被忽略。
Stanford research found that when given a long document, models remember the content at the beginning and end most clearly, while information in the middle is significantly more likely to be missed.
对于 Agent 产品来说,这意味着:如果你把重要的系统指令或关键约束条件塞在一个超长的 Prompt 中间,模型很可能"忘了"它。重要的约束要放在 Prompt 的开头,或者用结构清晰的格式强调。
For Agent products, this means: if you bury critical system instructions or key constraints in the middle of a long prompt, the model will likely "forget" them. Place important constraints at the beginning of the prompt, or use clearly structured formatting to emphasize them.
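一个把约束放在开头、并在结尾复述一次的提示词组装示意(Python;结构与措辞均为假设的示例,可按自己的 Agent 调整):
A sketch of prompt assembly that puts constraints first and restates them at the end (structure and wording are illustrative assumptions — adapt to your Agent):

```python
def build_prompt(constraints: list[str], context: str, question: str) -> str:
    """关键约束放在开头,结尾复述,避免"迷失在中间"。"""
    rules = "\n".join(f"- {c}" for c in constraints)
    return (
        f"【必须遵守的约束】\n{rules}\n\n"
        f"【参考资料】\n{context}\n\n"
        f"【用户问题】\n{question}\n\n"
        f"【再次提醒】回答前请逐条核对上面的约束:\n{rules}"
    )

print(build_prompt(
    ["只输出 JSON", "票价必须来自工具返回值,不得自行估算"],
    "(此处是很长的背景文档……)",
    "G1 次今天还有二等座吗?",
))
```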
不同模型表现一样吗?Agent 场景怎么选?
Do All Models Perform the Same? How to Choose for Agent Use Cases?
"哪个模型最聪明"是个伪问题。对 Agent 产品来说,真正要问的是:"这个模型在我的具体场景下,稳不稳定?"
某团队在内部测试时发现,GPT-4 在回答复杂问题时表现出色,但在一个需要频繁调用工具、严格按照 JSON 格式输出的 Agent 流程里,它有时候会在 JSON 里多塞一段解释文字,导致下游解析失败。
One team found in internal testing that GPT-4 excelled at complex questions, but in an Agent workflow requiring frequent tool calls and strict JSON output, it would sometimes insert explanatory text inside the JSON — causing downstream parsing failures.
而另一个在"聪明程度"测试上分数略低的模型,在工具调用的格式遵从性上却非常稳定——几乎每次都能给出规范的结构化输出。最后他们换了模型。对 Agent 系统来说,稳定比"聪明"更重要。
A slightly lower-scoring model on intelligence benchmarks was rock-solid on format adherence — producing well-structured output almost every time. They switched models. For Agent systems, consistency beats cleverness.
Agent 场景里真正重要的几个维度
What Actually Matters in Agent Scenarios
| 评估维度 / Dimension | 重要性 / Importance | 具体指的是什么 / What It Means |
|---|---|---|
| 指令遵从能力 Instruction Following | ⭐⭐⭐⭐⭐ 最重要 Most critical | 你让它"只输出 JSON,不要解释",它能不能做到?Agent 流程的每一步都依赖这一点 Can it actually comply with "output JSON only, no explanations"? Every step of an Agent workflow depends on this |
| 工具调用准确率 Function Calling Accuracy | ⭐⭐⭐⭐⭐ 最重要 Most critical | 调用工具时,传入的参数格式对不对、字段有没有缺失、调用的工具选对了吗 Are parameters correctly formatted, fields complete, and the right tool selected? |
| 长上下文稳定性 Long-Context Consistency | ⭐⭐⭐⭐ | 对话轮次多了之后,有没有"忘记"前面说过的事,或者前后矛盾 After many turns, does it "forget" earlier content or contradict itself? |
| 拒绝幻觉能力 Hallucination Refusal | ⭐⭐⭐⭐ | 当它不确定时,能不能说"我不知道",而不是编造一个答案 When uncertain, can it say "I don't know" instead of fabricating an answer? |
| 语言生成质量 Generation Quality | ⭐⭐⭐ | 回复是否流畅自然——对最终用户体验有影响,但不是最核心的 Is the reply fluent and natural — matters for user experience, but not the core |
| 推理能力 Reasoning Capability | ⭐⭐⭐ | 能不能处理复杂的多步骤任务——对于 Agent 复杂任务规划很重要 Can it handle complex multi-step tasks — important for Agent task planning |
| 响应速度 Latency | ⭐⭐ | 对时间敏感的场景影响体验;但可以用流式输出(Streaming)来弥补 Affects time-sensitive experiences, but streaming output can compensate |
主流模型在 Agent 场景下的特点
Major Models for Agent Use Cases
OpenAI
工具调用成熟度高,生态最完善,文档和社区资源最丰富。是目前 Agent 产品的主流选择,但价格相对较高,速度有时不稳定。
Most mature function-calling ecosystem, richest community resources. The mainstream choice for Agent products — but relatively expensive, and latency can be inconsistent.
适合:核心决策链路 / Best for: core decision pipeline
Anthropic
指令遵从能力强,在需要"严格按格式输出"的场景下表现出色。长上下文处理能力好,幻觉率相对较低。
Strong instruction-following, excellent at strict format compliance. Good long-context handling and relatively lower hallucination rates.
适合:格式遵从性要求高的 Agent / Best for: format-strict Agents
Google
超长上下文(100万 token)是核心优势,处理超大文档不需要切分。多模态能力强,可以处理图片、视频等。
1M-token context window is the flagship advantage — process massive documents without chunking. Strong multimodal capabilities for image and video inputs.
适合:大文档/多模态场景 / Best for: large docs, multimodal
国产模型 / Chinese Models
中文理解和生成能力强,对国内场景的知识覆盖更好,有些模型针对工具调用做了专项优化。价格通常也更有竞争力。
Superior Chinese language understanding, better coverage of China-specific knowledge. Some are optimized specifically for function calling. Generally more cost-competitive.
适合:中文产品/成本敏感场景 / Best for: Chinese-language, cost-sensitive
不要只用一个模型——"路由策略"是高级玩法
Don't Use Just One Model — "Routing Strategy" Is the Advanced Play
成熟的 Agent 产品不会只用一个模型。更聪明的做法是:根据任务类型,用不同的模型处理不同的子任务。
Mature Agent products don't use just one model. The smarter approach: route different subtasks to different models based on task type.
简单意图识别 → 用小模型
Simple Intent Recognition → Small Model
用户说"我要查票"还是"我要退票"——这种简单分类,不需要最贵的模型,用轻量级的模型处理,速度快、成本低。
Is the user saying "check tickets" or "refund"? Simple classification like this doesn't need the most expensive model — a lightweight model handles it fast and cheap.
复杂任务规划 → 用大模型
Complex Task Planning → Flagship Model
帮用户规划多城市行程、处理复杂的退改签策略——这时候用旗舰模型,确保推理质量。
Planning multi-city itineraries, handling complex rebooking policies — use a flagship model here to ensure reasoning quality.
重复性格式化任务 → 用专项优化模型
Repetitive Formatting Tasks → Task-Optimized Model
把工具调用的结果格式化输出给用户——这种高度结构化的任务,可以用针对指令遵从优化的小模型,稳定且便宜。
Formatting tool-call results for the user — this highly structured task suits a small model optimized for instruction following: stable and cheap.
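路由策略的最简形态大致如下(Python;模型名仅为占位示例,真实系统里路由常由意图分类器或规则引擎决定):
In its simplest form, a routing strategy looks roughly like this (model names are placeholders; in production the route is often decided by an intent classifier or rule engine):

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str       # "intent" / "planning" / "formatting"
    payload: str

ROUTES = {
    "intent":     "small-fast-model",     # 简单分类:便宜、快
    "planning":   "flagship-model",       # 复杂规划:推理质量优先
    "formatting": "format-tuned-model",   # 结构化输出:遵从性优先
}

def route(task: Task) -> str:
    """返回应当处理该任务的模型名;未知类型兜底到旗舰模型。"""
    return ROUTES.get(task.kind, "flagship-model")

print(route(Task("intent", "我要退票")))        # small-fast-model
print(route(Task("planning", "规划三城行程")))  # flagship-model
```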
产品层面怎么规避幻觉风险?
How to Mitigate Hallucination Risk at the Product Layer?
很多团队想到"降低幻觉",第一反应是"优化 Prompt"。这是必要的,但远远不够。更可靠的方式是在系统架构层面设置护栏——不要假设模型会永远输出正确的内容,而是假设它随时可能出错,然后提前设计好应对方案。
When teams think "reduce hallucination," the first instinct is "improve the prompt." Necessary, but far from sufficient. The more reliable approach is guardrails at the system architecture level: don't assume the model will always output correct content — assume it can fail at any moment, and design the response in advance.
银行 ATM 机会核对你的密码,不是因为不信任你,而是因为密码错误是可以发生的,所以需要验证机制。给 Agent 设置护栏,逻辑完全一样:不是因为不信任模型,而是因为幻觉是可以发生的。
A bank ATM verifies your PIN not because it doesn't trust you, but because PIN errors can happen, so a verification mechanism is needed. Adding guardrails to an Agent follows the same logic: not distrust in the model, but recognition that hallucinations can and do happen.
五层护栏,从轻到重
Five Layers of Guardrails, from Light to Heavy
输入层:控制好"喂给模型的东西"
Input Layer: Control What You Feed the Model
在把用户请求发给模型之前,先做预处理:过滤掉明显的干扰信息、规范化格式、确保系统提示(System Prompt)里的关键约束在最显眼的位置。垃圾进,垃圾出——输入质量决定输出质量的上限。
Preprocess every user request before it reaches the model: filter obvious noise, normalize formats, and keep the key constraints in the System Prompt in the most prominent position. Garbage in, garbage out — input quality caps output quality.
Prompt 层:给模型明确的约束和"逃生路线"
Prompt Layer: Explicit Constraints Plus an "Escape Route"
除了告诉模型"你要做什么",还要明确告诉它"当你不确定时,应该怎么处理"。比如:"如果你不确定票价是否准确,请说'我需要查询实时数据,请稍候',而不是给出一个可能不准确的数字。"
Beyond telling the model what to do, explicitly tell it what to do when it is uncertain. For example: "If you are not sure the fare is accurate, say 'I need to check real-time data, one moment' — do not give a possibly inaccurate number."
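写进系统提示的"逃生路线"大致如下(措辞为假设的示例,可按业务调整):
An "escape route" written into the system prompt might look like this (wording is an illustrative assumption — adapt to your product):

```python
# 系统提示中的"逃生路线"示意(措辞为示例):
SYSTEM_PROMPT = """你是票务助手。必须遵守:
1. 票价、余票等实时信息只能来自工具返回值,禁止凭记忆回答。
2. 如果工具不可用或你不确定,请回答:
   "我需要查询实时数据,请稍候",不要给出可能不准确的数字。
3. 输出必须是 JSON,不要附加解释文字。"""
```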
工具层:让模型"查"而不是"记"
Tool Layer: Make the Model Look It Up, Not Recall It
所有需要精确性的信息——票价、余票、路况、政策——都通过工具调用(Function Calling)从数据源实时获取,不依赖模型的"知识"。工具就是模型的"外挂手册",让它知道去哪儿查,而不是凭记忆猜。
Everything that must be precise — fares, availability, traffic, policies — is fetched live from data sources via Function Calling, never from the model's "knowledge." Tools are the model's external reference manual: it learns where to look instead of guessing from memory.
输出层:格式校验 + 范围校验
Output Layer: Format Validation + Range Validation
模型的输出在到达下一步之前,先过一道验证:格式校验(JSON 结构是否完整?必填字段有没有?)和范围校验(价格是不是一个合理的数字?日期是不是在有效范围内?)。不合格的输出触发重试,而不是直接传下去。
Before the model's output reaches the next step, run it through validation: format checks (is the JSON structure complete? are required fields present?) and range checks (is the price a plausible number? is the date within a valid window?). Failing output triggers a retry instead of being passed along.
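格式校验 + 范围校验 + 重试的一个极简示意(Python;字段名与阈值均为假设的示例):
A minimal sketch of format checks, range checks, and retry (field names and thresholds are illustrative assumptions):

```python
import json

def validate_fare_reply(raw: str) -> dict:
    data = json.loads(raw)                      # 格式校验:必须是合法 JSON
    for field in ("train_no", "price", "date"):
        if field not in data:                   # 格式校验:必填字段
            raise ValueError(f"缺少字段: {field}")
    if not (0 < data["price"] < 10000):         # 范围校验:价格是否合理
        raise ValueError(f"价格越界: {data['price']}")
    return data

def answer_with_retry(generate, max_retries: int = 2) -> dict:
    """generate() 代表一次模型调用;校验失败则重试,而不是把坏输出传下去。"""
    for _ in range(max_retries + 1):
        try:
            return validate_fare_reply(generate())
        except (ValueError, json.JSONDecodeError):
            continue
    raise RuntimeError("多次重试仍不合格,触发降级流程")
```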
决策层:高风险操作必须人工确认
Decision Layer: High-Risk Operations Require Human Confirmation
支付、退票、删除数据——这些不可逆操作,不管模型多么"确定",都必须先给用户看清楚要做什么,用户确认后才执行。这是最后一道护栏,也是最重要的一道。
Payments, refunds, data deletion — for these irreversible operations, no matter how "certain" the model is, show the user exactly what is about to happen and execute only after they confirm. This is the last guardrail, and the most important one.
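决策层"确认门禁"的示意(动作集合与接口均为假设的示例):
A sketch of the decision-layer confirmation gate (the action set and interface are illustrative assumptions):

```python
# 不可逆操作必须先经过用户确认(集合内容为示例):
IRREVERSIBLE = {"pay", "refund", "delete_account"}

def execute(action: str, params: dict, user_confirmed: bool) -> str:
    if action in IRREVERSIBLE and not user_confirmed:
        # 不直接执行:先把将要发生的事完整展示给用户
        return f"即将执行 {action},参数 {params}。请确认后再继续。"
    return f"已执行 {action}"

print(execute("refund", {"order_id": "A123"}, user_confirmed=False))
```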
❌ 无护栏:用户问票价 → 模型直接回答 / Without guardrails: user asks the fare → model answers directly
- 可能返回过时的训练数据里的价格 / May return stale training-data prices
- 可能随机生成一个"看起来合理"的数字 / May generate a plausible-sounding number
- 用户信以为真,到了才发现价格不对 / User trusts it, arrives and finds it's wrong
- 投诉、退款、信任损失 / Complaints, refunds, trust loss
✅ 有护栏:用户问票价 → 触发工具调用 / With guardrails: user asks the fare → a tool call is triggered
- 模型识别需要实时数据,调用票务查询工具 / Model triggers live ticket lookup tool
- 工具返回真实价格,模型基于真实数据回答 / Tool returns real price, model answers from real data
- 输出通过范围校验(价格是否合理?) / Output passes range validation
- 给用户展示的是真实且验证过的价格 / User sees real, validated price
有些团队的做法是:让模型输出内容之后,再让同一个模型检查自己的输出对不对。这是行不通的——模型在检查自己的输出时,会倾向于觉得它是对的。就像让一个作者校对自己的文章,他往往看不到自己的错误。
Some teams have the model check its own output for correctness after generating it. This doesn't work — when checking its own output, the model tends to agree with itself. It's like asking an author to proofread their own writing — they rarely catch their own mistakes.
护栏设计的四个必问问题
Four Must-Ask Questions for Guardrail Design
- 我们系统里哪些信息是必须实时准确的(票价/余票/状态)?这些必须通过工具调用获取,不能依赖模型知识 Which information in our system must be real-time accurate (fares/availability/status)? These must be retrieved via tool calls — not from model knowledge.
- 模型的输出在到达用户之前,有没有独立的格式和范围校验?(不能由模型自己验证自己) Is there an independent format and range validation step before model output reaches the user? (Cannot be self-validated by the model.)
- 当模型给出一个我们没有预期的输出时,系统会怎么处理?(是报错、降级、还是直接传给用户) When the model produces unexpected output, what does the system do? (Error out, degrade gracefully, or pass it to the user?)
- 有没有定期抽查模型输出,看有没有出现幻觉的情况?(不能假设上线后就没问题了) Do we regularly sample-check model outputs for hallucination? (Cannot assume everything is fine after launch.)
工具调用有多可靠?边界在哪里?
How Reliable Is Function Calling? Where Are Its Boundaries?
工具调用(Function Calling)是 Agent 系统的核心能力——正是因为模型能调用工具,Agent 才能真正"做事"而不只是"说话"。但很多团队高估了工具调用的可靠性,在边界情况下被坑。
Function Calling is the core capability of an Agent system — it is precisely because the model can call tools that an Agent can actually do things rather than just talk. But many teams overestimate its reliability and get burned by edge cases.
给模型定义了一个查询订单的工具,要求传入 order_id(字符串)和 user_id(整数)。99% 的时候,模型调用这个工具完全没问题。
A tool is defined for querying orders, requiring order_id (string) and user_id (integer). 99% of the time, the model calls it perfectly.
但有时候,用户的 user_id 包含了字母(比如"USR_001"这种格式),模型有时会自作主张地把它转成数字,或者把两个字段搞混……下游系统收到了错误的参数,查询失败,但错误日志只显示"接口调用异常",根本找不到原因。
But sometimes, when user_id contains letters (like "USR_001"), the model may autonomously convert it to a number, or swap two fields... The downstream system receives wrong parameters, the query fails, and the error log just says "API call exception" — impossible to trace.
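这个坑的对应修法:把 user_id 如实声明为字符串,并在工具入口做显式格式校验,而不是指望模型自己转换(Python;字段与校验规则均为假设的示例):
The corresponding fix: declare user_id as the string it really is, and validate its format explicitly at the tool boundary instead of hoping the model converts it correctly (fields and validation rules are illustrative assumptions):

```python
import re

def get_order(order_id: str, user_id: str) -> dict:
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order_id 必须是非空字符串")
    # user_id 允许 'USR_001' 这类格式——用显式规则校验,而不是让模型自行转换
    if not re.fullmatch(r"USR_\d{3}", str(user_id)):
        raise ValueError(f"user_id 格式不合法: {user_id!r},应形如 'USR_001'")
    return {"order_id": order_id, "user_id": user_id, "status": "paid"}

print(get_order("ORD_42", "USR_001"))   # 正常
# get_order("ORD_42", 1)                # 抛出带明确原因的 ValueError,让模型重试
```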
工具调用可能出现哪些问题?
What Can Go Wrong with Function Calling?
参数类型错误
Type Mismatch
要求整数,模型传了字符串;要求数组,模型传了单个值。在边界情况下出现率不低。
Expected integer, got string; expected array, got single value. Occurs frequently in edge cases.
工具选择错误
Wrong Tool Selected
当你提供了多个功能相似的工具时,模型可能选错了一个。比如"取消预订"和"退款"是两个工具,模型有时会搞混。
With multiple similar tools available, the model may pick the wrong one — for example, "Cancel booking" and "Refund" are separate tools that it sometimes mixes up.
多余文字输出
Extraneous Text Output
要求"只返回工具调用,不要额外解释",但模型在调用工具前后加了一段文字,导致解析失败。
Instructed to return only a function call, but the model adds explanatory text — causing downstream parsing to fail.
参数值捏造
Parameter Fabrication
当模型不确定某个参数的值时,它有时会"猜"一个听起来合理的值,而不是说"我没有这个信息"。
When unsure of a parameter value, the model sometimes guesses a plausible-sounding one instead of admitting "I don't have this information."
让工具调用更可靠的五个实践
Five Practices for More Reliable Function Calling
| 实践 / Practice | 做什么 / What to Do | 为什么有效 / Why It Works |
|---|---|---|
| 工具描述写清楚 Clear Tool Descriptions | 每个工具的 description 字段要精确说明用途、参数含义、边界情况 Precisely state each tool's purpose, parameter meanings, and edge cases in its description field | 模型主要靠描述来判断调用哪个工具——描述越清楚,选对的概率越高 The model relies mainly on descriptions to choose tools — the clearer the description, the better the selection |
| 减少相似工具数量 Minimize Similar Tools | 功能相近的工具合并,或者用清晰的命名区分 Merge near-duplicate tools, or separate them with clear naming | 选择越多越容易混淆,保持工具集精简,每个工具职责单一 More options, more confusion — keep the toolset lean, one responsibility per tool |
| 参数做强校验 Strict Parameter Validation | 在工具接收到参数后,先做类型和范围校验,不合规的参数直接报错返回 Validate types and ranges on receipt; reject non-compliant parameters with an explicit error | 让模型知道"这次调用参数不对",它会重新尝试;而不是用错误参数执行 The model learns "these parameters were wrong" and retries, instead of executing with bad input |
| 必填参数不要有默认值 No Defaults for Required Params | 如果参数是必须的,不要给它设默认值 If a parameter is required, give it no default | 有默认值时,模型可能"懒得传";强制必填,逼模型给出真实的参数 With a default, the model may "not bother"; making it required forces real values |
| 记录完整调用日志 Full Call Logging | 每次工具调用的参数和返回值都要完整记录 Log every call's parameters and return values in full | 出了问题时,可以精确复现"模型传了什么参数",不用靠猜 When something breaks, you can reproduce exactly what the model passed — no guessing |
工具描述写得好不好,差距有多大?
How Much Does Tool Description Quality Matter?
这是实际工程里最常被低估的细节之一。来看一个具体对比:
This is one of the most underestimated details in real-world engineering. Here's a concrete comparison:
❌ 模糊的描述 / Vague Description
查询列车 / query_train
description: "查询列车信息"
params: from, to, date
模型不清楚 from/to 是城市名还是车站代码,date 是什么格式,也不知道什么时候该用这个工具。
Model doesn't know if from/to are city names or station codes, what date format to use, or when to call this vs. other tools.
✅ 清晰的描述 / Clear Description
search_trains
description: "查询两城市间高铁/动车班次。当用户询问余票、时刻表或想要订票时使用。不适用于已购票查询(用 get_order)"
明确了:① 什么时候用 ② 和其他工具的区别 ③ 参数格式要求
Explicitly defines: ① when to use this tool ② how it differs from other tools ③ required parameter formats
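把这条"好描述"落成一份完整的工具定义,大致如下(JSON Schema 风格;参数名与字段组织方式因框架而异,此处仅为示意):
Fleshing the good description out into a full tool definition looks roughly like this (JSON-Schema style; parameter names and exact field layout vary by framework — illustrative only):

```python
# 假设的完整工具定义示例(字段组织方式因框架而异):
SEARCH_TRAINS_TOOL = {
    "name": "search_trains",
    "description": (
        "查询两城市间高铁/动车班次。当用户询问余票、时刻表或想要订票时使用。"
        "不适用于已购票查询(用 get_order)。"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "from_city": {"type": "string", "description": "出发城市中文名,如'北京'"},
            "to_city":   {"type": "string", "description": "到达城市中文名,如'上海'"},
            "date":      {"type": "string", "description": "出发日期,格式 YYYY-MM-DD"},
        },
        "required": ["from_city", "to_city", "date"],  # 必填参数不设默认值
    },
}
```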
了解模型的能力边界,不是为了对 AI 失望,而是为了把它用在对的地方、用正确的方式用。
Understanding model capability boundaries isn't about being disappointed in AI — it's about using it in the right places, in the right ways.
模型不擅长的事(精确计算、实时信息、超长文本中段内容)——给它工具,让它查,而不是让它猜。模型容易出错的地方——在架构层面设护栏,不要假设它永远正确。
What models can't do well (precise math, real-time info, middle-of-document content) — give them tools to look it up, not guess. Where models are error-prone — set guardrails at the architecture level, don't assume it's always right.
工具描述写得清楚、参数做强校验、完整记录日志——这三件事做好了,能解决 80% 的工具调用问题。
Clear tool descriptions, strict parameter validation, complete call logging — getting these three things right solves 80% of Function Calling problems.
最终的原则是:对模型的能力持客观态度,既不神话它,也不妖魔化它。知道它能做什么、不能做什么,然后做好系统设计的托底工作。
The ultimate principle: approach model capabilities objectively — neither mythologize nor demonize. Know what it can and cannot do, then design solid system-level safeguards accordingly.