Skip to content

feat(proxy): 自动回切 — P1恢复时failback到主provider#2156

Open
954708775 wants to merge 5 commits intofarion1231:mainfrom
954708775:feat-auto-failback-v2
Open

feat(proxy): 自动回切 — P1恢复时failback到主provider#2156
954708775 wants to merge 5 commits intofarion1231:mainfrom
954708775:feat-auto-failback-v2

Conversation

@954708775
Copy link
Copy Markdown

@954708775 954708775 commented Apr 17, 2026

Summary

当 P1 的 CircuitBreaker 从 Open 恢复到 Closed 时,自动将流量和 UI 状态回切到 P1。

修复内容(3 个 commit)

  1. Bot P1 review fix: FailoverSwitchManager 存储 AppHandle,使 do_switch 能正常执行 UI 更新(AppState hot_switch、托盘菜单刷新、前端 provider-switched 事件)
  2. 竞态条件修复: set_failover_switch_manager 改为 async + settle_join,caller 必须 await 写入完成后才接受请求
  3. 静默失败修复: try_switch 结果用 match 分支处理,Err → log::error,避免吞错;get_failover_queue/get_all_providers 失败时加 warn 日志

测试

  • 880 tests pass
  • 构建通过

Review 结果

第一轮发现 3 个 Critical 全部已修复:

  • fire-and-forget 竞态 → settle_join await
  • let _ = fm.try_switch() 吞错 → match 分支 + warn/error
  • 空队列静默跳过 → warn 日志

当 P1 的 CircuitBreaker 从 Open 恢复到 Closed 时,自动将
流量和 UI 状态回切到 P1。

- ProviderRouter 添加 failover_switch_manager 字段
- record_result() 成功路径末尾检查 P1 状态并触发回切
- 通过 tokio::spawn 非阻塞执行,避免阻塞请求处理
- pending_switches dedup 防止并发重复切换
@farion1231
Copy link
Copy Markdown
Owner

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a311c7022

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src-tauri/src/proxy/provider_router.rs Outdated
Comment on lines +173 to +175
let all = match self.db.get_all_providers(app_type) {
Ok(p) => p,
Err(_) => return Ok(()),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Continue health update when failback lookup fails

Avoid returning from record_result when the optional failback metadata fetch fails. In this branch, Err(_) => return Ok(()) exits before the normal update_provider_health_with_threshold(...) call, so successful requests are no longer written to provider health state and the DB read failure is silently masked. A transient get_all_providers error should skip failback switching only, not bypass the core health-update path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5eaf82c. Replaced return Ok(()) with a labeled break 'failback so get_all_providers failure only skips the failback path, not the health update. Thanks for catching this.

Copy link
Copy Markdown
Author

@954708775 954708775 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — P2 addressed in 5eaf82c with labeled break. Thanks for the review.

修复 Codex review 指出的 P2 问题:record_result 中 get_all_providers
查库失败时,之前的 return Ok(()) 会导致整个函数提前退出,health
update 被跳过。用 labeled break 'failback 替换,仅退出回切逻辑块。
@954708775 954708775 force-pushed the feat-auto-failback-v2 branch from 5eaf82c to e720c31 Compare April 19, 2026 03:52
Copy link
Copy Markdown
Author

@954708775 954708775 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 自动回切逻辑清晰:P1 熔断恢复后自动切回,不阻塞请求处理,dedup 防止并发重复切换。Labeled block 修复合规。

@farion1231
Copy link
Copy Markdown
Owner

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e720c31f0e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src-tauri/src/proxy/provider_router.rs Outdated
let pname = p1.name.clone();
let at = app_type.to_string();
tokio::spawn(async move {
let _ = fm.try_switch(None, &at, &pid, &pname).await;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Pass AppHandle when triggering failback switch

The new failback path calls try_switch with None, so the switch manager never executes hot_switch_provider or emits UI updates (those only run inside if let Some(app) in failover_switch.rs). In the recovery scenario this commit targets (traffic currently on P2, P1 circuit recovered), the spawned task becomes a no-op and the provider/UI state does not actually fail back to P1.

Useful? React with 👍 / 👎.

benny added 2 commits April 19, 2026 23:38
The bot correctly flagged that try_switch was called with None,
causing AppState hot_switch, tray menu refresh, and frontend
provider-switched events to be skipped on auto-failback.

Changes:
- FailoverSwitchManager now stores app_handle on construction
- try_switch/do_switch no longer take app_handle as parameter
- All call sites updated (forwarder x3, provider_router, commands/proxy)
- set_failover_switch_manager 改为同步写入,caller await settle_join
  后再接受请求,消除 fire-and-forget 竞态
- 提取 trigger_failback_if_needed 方法,do_switch 错误日志化:
  Err → error, Ok(false) → debug,避免静默吞错
- get_failover_queue / get_all_providers 失败时加 warn 日志
- 所有数据克隆后送入 tokio::spawn,避免生命周期问题
- response_processor.rs 更新 FailoverSwitchManager::new 调用签名
@954708775
Copy link
Copy Markdown
Author

代码已完成修复,所有 review 问题均已解决,880 tests pass,构建通过。请合并。

Copy link
Copy Markdown
Owner

@farion1231 farion1231 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢您的贡献,请查看一下是否有以下问题:

P1
自动回切会在故障转移关闭时仍然触发record_result 对所有成功请求都会调用 failback 路径,但 trigger_failback_if_needed 没有检查 auto_failover_enabled。当用户关闭自动故障转移但队列仍保留 P1、当前 provider 是 P2 时,一次正常成功请求会把逻辑 provider/UI 切回队列 P1;FailoverSwitchManager 只检查 enabled,不会阻止这个行为。建议在调用前或函数开头读取 app 配置,并要求 enabled && auto_failover_enabled。
/tmp/cc-switch-pr2156/src-tauri/src/proxy/provider_router.rs:156-160

P1
Closed 不等于 P1 刚恢复,可能反向覆盖真实 failover
这里仅用 P1 breaker 处于 Closed 判断可回切,会误伤同一个请求内的正常故障转移:P1 本次请求失败但未达到熔断阈值时仍是 Closed,随后 P2 成功,record_result 会先后台切回 P1;forwarder 又会后台切到实际成功的 P2,两个不同 key 不会去重,最终状态取决于调度顺序,可能停在刚失败的 P1。回切条件应绑定 P1 从 Open/HalfOpen 恢复到 Closed 的状态转换,或至少避免在 P2 成功路径上仅凭 P1 当前 Closed 就切回。
/tmp/cc-switch-pr2156/src-tauri/src/proxy/provider_router.rs:258-263

P1-A: trigger_failback_if_needed 读取 auto_failover_enabled,关闭时跳过回切。
P1-B: CircuitBreaker 加 ever_tripped 标记,has_been_open() 区分"从未失败"
     和"从 Open 恢复"。回切条件从仅检查 Closed 改为 Closed && has_been_open()。
@954708775
Copy link
Copy Markdown
Author

修复 P1 Review 问题

已处理两条 P1 反馈:

P1-A:故障转移关闭时仍触发回切

trigger_failback_if_needed 开头新增读取 auto_failover_enabled,关闭时直接 return,不再调用 failover_switch_manager

P1-B:Closed 不等于 P1 刚恢复

CircuitBreaker 新增 ever_tripped: AtomicBool 标记,transition_to_open 时设 true,transition_to_closed 时清 false,暴露 has_been_open() 公开方法。回切条件从仅检查 Closed 改为同时满足 Closed && has_been_open(),避免从未触发熔断的 P1 被误切。

提交:834c48f4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants