How Do Users Actually Evaluate an Agent's Intelligence?
Users don't look at the code or the accuracy report. They score by intuition, and that intuition often comes from a few details you would not expect.
User asks: "Find me a high-speed train from Beijing to Chengdu, leaving tomorrow afternoon."
Agent A replies: "I found 3 high-speed trains tomorrow afternoon: G89 at 14:05 (8h 12m), G311 at 15:30 (7h 56m), and G7 at 16:48 (8h 30m). G311 is the fastest, and second class still has seats. Which one would you like to book?"
Agent B replies: "There are multiple high-speed trains from Beijing to Chengdu. I recommend comparing departure times and prices to find the one that suits you best."
Agent B said nothing wrong, but it also did nothing. What the user experienced: this Agent is useless. Agent A showed the full process of searching, comparing, and recommending. What the user felt: it's actually helping me.
The Five Real Dimensions Users Use to Judge Intelligence
Studying user behavior shows that users do not judge an Agent's intelligence by technical metrics. They use five perceptual dimensions, each of which has little to do with your tech stack and everything to do with product design.
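| Dimension | How Users Judge | Design Implication |
|---|---|---|
| Response Speed | After more than 3 seconds with no feedback at all, users start to wonder "is it stuck?" | Any operation that takes longer than 2 seconds must show progress feedback ("Searching...") |
| Language Fluency | Mechanical repetition and stiff boilerplate make users feel at once that "this is a machine" | Keep the tone natural, and adjust formality to the situation |
| Intent Accuracy | The user says one thing and the Agent understands another; nothing damages trust more | When intent is unclear, ask for clarification instead of guessing and acting |
| Error Handling | What the Agent says when something goes wrong, and how it says it, directly decides whether the user keeps using it | State honestly what went wrong and offer an actionable next step |
| Memory Continuity | "Does it remember what I told it last time?" is a frequent expectation | Persist core preferences, and keep context consistent across multi-turn conversations |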
Show the Thinking Process, or Just Deliver the Result?
This is not an either-or question. The answer depends on how complex the task is and on the user's state of mind at that moment.
A user asks the Agent to plan a three-day business trip starting in Shanghai, stopping in Hangzhou, and ending in Xiamen: fastest route for each leg, budget under ¥3,000, with different transport modes for the two legs. There's no single right answer, so the reasoning the Agent shows matters more than the final itinerary. When users see "it checked three routes, compared time and cost, and found that Shanghai to Hangzhou by high-speed rail is two hours faster than flying…", they immediately feel: this Agent is genuinely thinking.
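When to Show Process vs. When to Just Deliver
Showing the process hurts when it makes users wait and lowers efficiency:
- The user asks "What's the weather today?" and the Agent displays its lookup steps, which is completely unnecessary
- There are so many reasoning steps that the user cannot find the final conclusion
- Technical jargon appears in the process and confuses ordinary users
- The process is shown every single time, and users get tired of it
Showing the process helps when it builds understanding and raises trust:
- Complex multi-step tasks: users feel the Agent is handling the request carefully
- Contested results: users can see the evidence and accept the result more easily
- Uncertain reasoning: "I checked X and Y, and chose X because…" is more convincing
- The user explicitly asks for an explanation: always worth showing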
A practical design principle: default to results; show progress for complex tasks; show process when users ask "why." These three modes cover the vast majority of scenarios, so users are not kept waiting and chances to build trust are not wasted.
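The three-mode principle above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the names `ResponseMode` and `choose_mode`, and the use of a step count with a threshold of 3 to decide what counts as a "complex" task, are hypothetical and not taken from any particular product.

```python
from enum import Enum


class ResponseMode(Enum):
    RESULT = "result"      # just the answer
    PROGRESS = "progress"  # the answer, with short status updates while working
    PROCESS = "process"    # the answer, with the reasoning behind it


def choose_mode(step_count: int, user_asked_why: bool,
                complex_threshold: int = 3) -> ResponseMode:
    """Pick how much of the Agent's work to surface to the user.

    step_count and complex_threshold are hypothetical stand-ins for
    whatever signal a product uses to detect a multi-step task.
    """
    if user_asked_why:
        # An explicit request for an explanation always wins.
        return ResponseMode.PROCESS
    if step_count >= complex_threshold:
        return ResponseMode.PROGRESS
    return ResponseMode.RESULT
```

A weather lookup (`choose_mode(1, False)`) falls through to a plain result, while a multi-leg itinerary gets progress updates, and any "why?" follow-up gets the full process.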
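When the Agent Fails: How to Design the User Experience
Errors are not the frightening part. Making users feel they have lost control is. Good error-experience design can turn "the moment of stumbling" into "a chance to earn trust."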
During a peak period, a user tries to buy a ticket through the Agent. The payment gateway times out and the order is not created.
Bad design: "Sorry, an unknown error occurred. Please try again." The user doesn't know what happened, whether money was deducted, what to do next, or whether it's safe to retry.
Good design: "Payment timed out. The order was not created and no charge was made to your account. This may be due to peak traffic. You can: ① retry in 30 seconds (your ticket hold is still active); ② switch to Alipay; ③ let me send a booking link to your phone so you can finish manually later." This explains the problem clearly, addresses the core worry (no charge), and offers three actionable paths.
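Four Principles for Designing Error Experiences
Address the core worry first, then explain the cause
When a payment fails, what users worry about most is "was I charged?", not "why did it time out?" Say "no charge was made" first, then explain the cause. The order matters.
Give a specific next step the user can actually take
"Please try again later" is the least useful error message: users don't know how long to wait, what to retry, or whether retrying is risky. A specific next step means an action they can tap, a definite time, and a definite expected outcome.
Don't make users redo steps they have already completed
After a payment failure, users should not have to re-enter origin, destination, and date. That information has already been collected and should be kept for them. Only the step that failed needs to be redone.
Keep the tone consistent; don't suddenly turn "robotic"
An Agent that normally converses fluently but switches to "ERROR_CODE: PAYMENT_TIMEOUT_001" style when something breaks creates a jarring break in tone, and users feel this AI doesn't understand people at all. Error messages should match the product's overall voice too.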
The opportunity: to show users that this product was built with genuine care.
Error Experience Design Checklist
- Does every error type in your product have purpose-designed copy, or does everything use the same generic "an error occurred, please try again"?
- For payment and other financial errors, does the very first sentence state clearly whether a charge was made?
- After an error, does the product automatically preserve what the user has already entered or selected, so nothing has to be re-entered?
- Does every error message include at least one action the user can take immediately, such as a button or link?
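One way to make the four principles and the checklist above hard to skip is to encode them in the shape of the error message itself. The sketch below is a hypothetical structure, not an existing library: `ErrorNotice` and its field names are assumptions. It forces the reassurance to come first, refuses to render without at least one next step, and carries the user's earlier input along so nothing has to be re-entered.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorNotice:
    reassurance: str        # the user's core worry, answered first
    cause: str              # what went wrong, in plain language
    next_steps: list[str]   # concrete actions the user can take now
    preserved_input: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        if not self.next_steps:
            raise ValueError("an error notice needs at least one next step")
        lines = [self.reassurance, self.cause, "You can:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.next_steps, 1)]
        if self.preserved_input:
            lines.append("Your earlier entries are saved; "
                         "nothing needs to be re-entered.")
        return "\n".join(lines)


# Example values mirror the payment-timeout scenario in the text.
notice = ErrorNotice(
    reassurance="No charge was made to your account, and no order was created.",
    cause="The payment timed out, possibly because of peak traffic.",
    next_steps=[
        "Retry in 30 seconds (your ticket hold is still active)",
        "Switch to Alipay",
        "Get a booking link on your phone and finish later",
    ],
    preserved_input={"from": "Beijing", "to": "Chengdu", "date": "tomorrow"},
)
print(notice.render())
```

Because the constructor has no slot for a raw error code, a message in the "ERROR_CODE: ..." style cannot be produced by accident; tone still has to be reviewed by a person.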
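What Makes the Travel Scenario Uniquely Demanding?
Hailing a ride, buying tickets, planning an itinerary: these scenarios share one trait. They are time-sensitive and high-pressure, and users have very little tolerance for error.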
A user asks an Agent whether a restaurant is open. The Agent gets it wrong and says "yes, it's open." The user goes and finds it closed. That is mildly annoying but causes no serious loss, and may even be a little funny.
The same error in a travel context: a user needs to catch a flight. The Agent says "you have plenty of time." The user leaves at a relaxed pace and misses the flight. The cost: a missed-flight fee, a rebooking fee, possibly a missed important meeting, and permanent distrust of this app.
This is what makes travel unique: the requirement for accuracy is extreme, because wrong information directly translates into real losses and intense negative emotions.
Four Special Requirements Travel Places on Agents
Real-Time Accuracy over Fluency
Users prefer a blunt Agent with accurate information over a fluent one with stale information. Real-time availability, traffic, and pricing must come from live APIs, never from estimates.
Surface Uncertainty Explicitly
"ETA approx. 35 minutes, but current traffic is volatile, so I recommend leaving early" is more honest and more useful than "35-minute ETA." Travel users understand uncertainty; hiding it costs trust.
Double-Confirm Key Actions
Booking, cancellation, rebooking: any action involving money must be clearly shown to the user for confirmation before it runs, with the exact amount, the exact train or flight, and the exact time. In travel, "the Agent decided automatically for me" is not a feature. It is the user's nightmare.
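The confirmation requirement can be sketched as a gate that every financial action has to pass through. The names `PendingAction` and `execute` are hypothetical, the fare is a made-up example value, and the final "Done" string stands in for a real booking call that this sketch does not make.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PendingAction:
    kind: str        # "book", "cancel", or "rebook"
    service: str     # train or flight number, e.g. "G311"
    departs_at: str  # departure time as shown to the user
    amount: Decimal  # exact amount in yuan

    def summary(self) -> str:
        return (f"Please confirm: {self.kind} {self.service}, "
                f"departing {self.departs_at}, amount ¥{self.amount:.2f}.")


def execute(action: PendingAction, user_confirmed: bool) -> str:
    """Run a financial action only after explicit confirmation."""
    if not user_confirmed:
        # Show the exact details and wait; never act silently.
        return action.summary()
    # Placeholder for the real booking call (not implemented here).
    return f"Done: {action.kind} {action.service}."


# Hypothetical fare, for illustration only.
action = PendingAction("book", "G311", "15:30", Decimal("553.5"))
print(execute(action, user_confirmed=False))
print(execute(action, user_confirmed=True))
```

`Decimal` is used rather than `float` so the amount shown to the user is exactly the amount that would be charged.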
Peak-Period Fallback
During holiday peaks, services may respond slowly or even go down. Design the Agent's fallback plan in advance: which features are switched off during peaks, which fall back to cached data, and which proactively tell users "there may be delays right now."
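The "fall back to cached data" part of such a plan might look like the sketch below. Everything here is an assumption for illustration: `CachedFallback`, the five-minute `max_age_seconds` default, and the `fetch_live` callable that stands in for a real ticketing or traffic API. The key design point is the returned `is_stale` flag, which lets the caller tell the user that the data may be delayed instead of passing cached data off as live.

```python
import time
from typing import Callable, Optional


class CachedFallback:
    """Serve live data when possible; otherwise a labeled cached copy."""

    def __init__(self, fetch_live: Callable[[], str],
                 max_age_seconds: float = 300):
        self._fetch_live = fetch_live
        self._max_age = max_age_seconds
        self._cached: Optional[str] = None
        self._cached_at = 0.0

    def get(self) -> tuple[str, bool]:
        """Return (data, is_stale)."""
        try:
            data = self._fetch_live()
        except Exception:  # broad on purpose: any upstream failure degrades
            age = time.monotonic() - self._cached_at
            if self._cached is not None and age <= self._max_age:
                return self._cached, True  # caller must warn the user
            raise  # nothing recent enough to show safely
        self._cached, self._cached_at = data, time.monotonic()
        return data, False
```

When `is_stale` is true, the reply should say so, for example "Seat availability as of a few minutes ago; live data is currently delayed." If the cache is older than the limit, the error propagates and the normal error-message design takes over.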
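Building User Acceptance of Agent Autonomy, Step by Step
Trust is built one step at a time, and no step can be skipped. This is the principle most easily overlooked in Agent product design, and the one most likely to end in a major failure.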
A travel app launched a "fully automated travel assistant." Once the user granted authorization, the app would automatically book the optimal option and place the order directly, with no confirmation. The logic was sound: spare users the friction of confirming again and again. But on day one, complaints poured in:
"It booked me into business class. I have no budget for business class."
"It booked a 6 a.m. flight. I told it I don't accept departures before 6."
"It canceled a ticket it called a duplicate order, but I had deliberately bought two different tickets."
The problem wasn't that the Agent did anything wrong. Technically, it was executing the optimal solution each time. The problem was that users were forced to accept its autonomous decisions before they had built any trust in it.
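The Three-Stage Trust Ladder
Agent as Advisor
The Agent recommends; the user decides. "I found 3 options and recommend option 2 because… Which one would you like?" The user stays in control throughout, and the Agent contributes information. This is where trust begins.
Agent as Executor
After the user confirms, the Agent completes the operation. "I've found an option for you. Once you confirm, I'll complete the booking." The user still makes the decision, and the Agent takes on the work of carrying it out. This is the natural next step once some trust has accumulated.
Agent as Autonomous Agent
The user authorizes the Agent to act on its own within a defined scope, without confirming each time. "From now on, book the fastest option for all my business trips, budget under 2,000, no need to ask me every time." This is the highest form of trust, and it can only be reached in a healthy way after the first two stages.
These three stages aren't fixed product versions. They're a psychological journey each user needs to take. In the same product, different users may be at different stages: some are still in the "advisor" stage, while others are ready to let the Agent act with full authority. Product design must support all three states at once and offer a clear upgrade path.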
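Treating the ladder as per-user state rather than a product version could be sketched as follows. The names `TrustStage`, `Mandate`, and `needs_confirmation` are hypothetical, and the two mandate fields (a budget ceiling and an earliest departure hour) are just the constraints that appear in the complaints above; a real mandate would cover more. As a simplification, the advisor stage is modeled as "always ask," even though an advisor-stage Agent would not execute at all.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class TrustStage(IntEnum):
    ADVISOR = 1     # Agent recommends; the user decides and acts
    EXECUTOR = 2    # Agent acts, but only after the user confirms
    AUTONOMOUS = 3  # Agent acts alone inside a user-defined scope


@dataclass
class Mandate:
    max_price: float         # budget ceiling the user granted
    earliest_departure: int  # earliest acceptable departure hour (0-23)


def needs_confirmation(stage: TrustStage, price: float, departure_hour: int,
                       mandate: Optional[Mandate] = None) -> bool:
    """Decide whether the Agent must ask before acting for this user."""
    if stage < TrustStage.AUTONOMOUS or mandate is None:
        return True
    within_scope = (price <= mandate.max_price
                    and departure_hour >= mandate.earliest_departure)
    # Anything outside the granted scope falls back to asking.
    return not within_scope
```

With `Mandate(max_price=2000, earliest_departure=7)`, a ¥1,800 ticket leaving at 09:00 goes through without a prompt, while a business-class fare over budget or a 06:00 departure drops back to a confirmation step.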
First-Time Use: Managing User Expectations
Users' expectations of AI tend to be either too high or too low, and both lead to disappointment. A good first-time experience calibrates both biases at once.
Psychology research shows that first-time users of an AI product arrive with two opposite expectations: some expect "it can do everything" (an image shaped by science fiction), while others assume "AI is surely still pretty dumb" (having been burned by bad AI customer service in the past). Both expectations will be shattered by reality in the first few interactions. The only difference is whether the shattering comes as disappointment or as a pleasant surprise.
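Five Core Elements of Effective Agent Onboarding
Say what it can do in one sentence
Don't use PR language ("your intelligent travel assistant"). Describe concrete capabilities ("I check tickets, book tickets, and plan itineraries for you, just by talking, with no hunting around in the app").
Deliver a "wow" moment in the very first interaction
The Agent should answer the user's very first message with real capability that impresses: fast, accurate, and more considerate than a bare result. That first success is the foundation for all the trust that follows.
State up front what it cannot do
Don't wait for users to hit a boundary before they discover a limit. Tell them during onboarding: it currently cannot plan trips abroad, cannot refund a ticket for a trip that has already departed, and cannot complete identity verification on your behalf. This is not a sign of weakness; it is honesty.
Explain when the user will need to step in
"For anything involving payment, I'll ask you to confirm the amount before placing the order." Told in advance, users won't be confused or feel interrupted when the confirmation dialog appears.
Show users how to talk to it
Many users don't know how to talk to an Agent: should they say "check tickets" or "find me a high-speed train from Beijing to Shanghai tomorrow, I'd like to leave in the afternoon"? Good onboarding uses a few concrete sample conversations to help users get a feel for the dialogue.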
User trust in an Agent is a design problem, not a technical one. Your model may already be good enough, but if the product design hasn't helped users build the right mental model (when it will fail, how failures are handled, which decisions it makes on its own), users cannot build trust.
Trust builds slowly and breaks fast. In high-pressure scenarios like travel, one failure at a critical moment can outweigh twenty good experiences before it. Product design must always ask: at every critical moment, have we prepared for the worst case?
📖 Chinese-English Glossary
Glossary of Key Terms · User Experience / Trust Design / Agent UX