fix: pop_record() preserves tool_calls/tool pairing to prevent 400 error on context overflow#7232
Open
CompilError-bts wants to merge 9 commits intoAstrBotDevs:masterfrom
Open
fix: pop_record() preserves tool_calls/tool pairing to prevent 400 error on context overflow#7232CompilError-bts wants to merge 9 commits intoAstrBotDevs:masterfrom
CompilError-bts wants to merge 9 commits intoAstrBotDevs:masterfrom
Conversation
…ssue-7225 fix: keep tool call message pairs when trimming context
Contributor
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The
_pop_earliest_unitand_peek_earliest_unit_counthelpers duplicate almost the same logic for determining the unit span; consider extracting a shared internal helper that computes the[start_idx, end_idx]range to reduce repetition and keep the two in sync. - The magic numbers
2and3in the truncation loop (the "pop around 2 records" behavior) are non-obvious; defining them as named constants and expanding the docstring to describe this policy would make the intent clearer and future changes less error-prone.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_pop_earliest_unit` and `_peek_earliest_unit_count` helpers duplicate almost the same logic for determining the unit span; consider extracting a shared internal helper that computes the `[start_idx, end_idx]` range to reduce repetition and keep the two in sync.
- The magic numbers `2` and `3` in the truncation loop (the "pop around 2 records" behavior) are non-obvious; defining them as named constants and expanding the docstring to describe this policy would make the intent clearer and future changes less error-prone.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/provider.py" line_range="176-185" />
<code_context>
+ def _pop_earliest_unit() -> int:
</code_context>
<issue_to_address>
**suggestion:** Reduce duplication between `_pop_earliest_unit` and `_peek_earliest_unit_count` to keep the unit-selection logic in one place.
Both functions reimplement the same logic for computing the `[start_idx, end_idx]` window (role checks, tool-call grouping, consecutive `tool` messages). This duplication risks divergence if only one is updated. Consider extracting a helper (e.g. `_earliest_unit_span() -> tuple[int | None, int | None]` or a function parameterized by `dry_run: bool`) so the “atomic unit” rules live in one place and are easier to maintain.
Suggested implementation:
```python
def _first_non_system_index() -> int | None:
for idx, record in enumerate(context):
if record.get("role") != "system":
return idx
return None
def _earliest_unit_span() -> tuple[int | None, int | None]:
"""
Compute the [start_idx, end_idx] of the earliest atomic unit in `context`.
Rules:
- Skip leading `system` messages.
- If the first non-system message is an `assistant` with tool_calls, group it
together with all immediately following `tool` messages.
- If the first non-system message is a `tool`, group it together with all
immediately following `tool` messages.
- Otherwise the unit is just that single message.
"""
start_idx = _first_non_system_index()
if start_idx is None:
return None, None
record = context[start_idx]
role = record.get("role")
end_idx = start_idx
# Keep assistant(tool_calls) + following tool messages atomic
if role == "assistant" and _has_tool_calls(record):
i = start_idx + 1
while i < len(context) and context[i].get("role") == "tool":
end_idx = i
i += 1
# Keep consecutive tool messages atomic
elif role == "tool":
i = start_idx + 1
while i < len(context) and context[i].get("role") == "tool":
end_idx = i
i += 1
return start_idx, end_idx
def _pop_earliest_unit() -> int:
start_idx, end_idx = _earliest_unit_span()
if start_idx is None or end_idx is None:
return 0
# Pop from the end of the span backwards to preserve indices
count = end_idx - start_idx + 1
for _ in range(count):
context.pop(start_idx)
return count
```
```python
def _peek_earliest_unit_count() -> int:
"""
Return the size (number of messages) of the earliest atomic unit
without mutating `context`.
"""
start_idx, end_idx = _earliest_unit_span()
if start_idx is None or end_idx is None:
return 0
return end_idx - start_idx + 1
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/provider/provider.py" line_range="222-229" />
<code_context>
+ end_idx += 1
+ return end_idx - start_idx + 1
+
+ removed = 0
+ while removed < 2:
+ next_unit_count = _peek_earliest_unit_count()
+ if next_unit_count == 0:
break
-
- for idx in reversed(indexs_to_pop):
- context.pop(idx)
+ # Keep behavior close to the old "pop around 2 records" strategy,
+ # while still preserving tool-call atomicity.
+ if removed > 0 and removed + next_unit_count > 3:
+ break
+ removed_now = _pop_earliest_unit()
</code_context>
<issue_to_address>
**suggestion:** The removal loop’s stopping conditions are subtle; consider making the policy more explicit or parameterized.
The combined conditions `while removed < 2` and `if removed > 0 and removed + next_unit_count > 3: break` make the effective policy ("usually 2, sometimes 3 records depending on unit boundaries") hard to see. Consider expressing this via named constants (e.g., `TARGET_RECORDS = 2`, `MAX_RECORDS = 3`) and rewriting the loop in terms of those, so the “remove up to MAX_RECORDS while preserving atomic units” intent is immediately clear.
Suggested implementation:
```python
return end_idx - start_idx + 1
# Removal policy: try to remove around TARGET_RECORDS messages,
# but allow up to MAX_RECORDS to keep tool-call/message units atomic.
TARGET_RECORDS = 2
MAX_RECORDS = 3
removed = 0
while removed < TARGET_RECORDS:
next_unit_count = _peek_earliest_unit_count()
if next_unit_count == 0:
break
# Keep behavior close to the old "pop around 2 records" strategy,
# while still preserving tool-call atomicity.
if removed > 0 and removed + next_unit_count > MAX_RECORDS:
break
```
```python
removed_now = _pop_earliest_unit()
if removed_now == 0:
break
removed += removed_now
```
</issue_to_address>
### Comment 3
<location path="astrbot/core/provider/provider.py" line_range="165" />
<code_context>
- indexs_to_pop.append(idx)
- poped += 1
- if poped == 2:
+ """弹出最早的非 system 记录,同时保持 tool_calls 与 tool 配对完整。"""
+
+ def _has_tool_calls(message: dict) -> bool:
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring `pop_record` to use a single helper for computing the earliest message unit span and a simpler capped-removal loop to preserve existing behavior more clearly.
You can simplify this without changing behavior by (1) unifying the “unit span” logic and (2) simplifying the removal loop.
### 1. Extract shared “unit span” logic
Both `_pop_earliest_unit` and `_peek_earliest_unit_count` compute the same span. You can replace them with a single helper that returns the `(start_idx, end_idx)` for the earliest unit:
```python
async def pop_record(self, context: list) -> None:
"""弹出最早的非 system 记录,同时保持 tool_calls 与 tool 配对完整。"""
def _has_tool_calls(message: dict) -> bool:
return bool(message.get("tool_calls"))
def _get_earliest_unit_span() -> tuple[int, int] | None:
# find first non-system
start_idx = None
for idx, record in enumerate(context):
if record.get("role") != "system":
start_idx = idx
break
if start_idx is None:
return None
record = context[start_idx]
role = record.get("role")
end_idx = start_idx
# assistant with tool_calls, or contiguous tools
if role == "assistant" and _has_tool_calls(record):
while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
end_idx += 1
elif role == "tool":
while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
end_idx += 1
return start_idx, end_idx
```
This removes the duplicated logic and guarantees a single definition of what a “unit” is.
If you prefer to avoid nested helpers, this can also be a private method on the class:
```python
def _get_earliest_unit_span(self, context: list) -> tuple[int, int] | None:
...
```
and then call it from `pop_record`.
### 2. Simplify the removal loop
You can keep the “~2 records with slack up to 3” behavior using one consistent cap and the shared span helper, instead of a separate peek/pop pair:
```python
async def pop_record(self, context: list) -> None:
"""弹出最早的非 system 记录,同时保持 tool_calls 与 tool 配对完整。"""
def _has_tool_calls(message: dict) -> bool:
return bool(message.get("tool_calls"))
def _get_earliest_unit_span() -> tuple[int, int] | None:
start_idx = None
for idx, record in enumerate(context):
if record.get("role") != "system":
start_idx = idx
break
if start_idx is None:
return None
record = context[start_idx]
role = record.get("role")
end_idx = start_idx
if role == "assistant" and _has_tool_calls(record):
while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
end_idx += 1
elif role == "tool":
while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
end_idx += 1
return start_idx, end_idx
removed = 0
min_removed = 2
max_removed = 3 # keep old "~2 records" behavior with slack
while True:
span = _get_earliest_unit_span()
if span is None:
break
start_idx, end_idx = span
unit_len = end_idx - start_idx + 1
# don't exceed the cap once we've already removed something
if removed and removed + unit_len > max_removed:
break
del context[start_idx : end_idx + 1]
removed += unit_len
if removed >= min_removed:
break
```
This keeps all current behavior (atomic tool-call units, up to ~2–3 records removed) but:
- Removes duplicated span logic.
- Eliminates the need for `_peek_earliest_unit_count` and `_pop_earliest_unit`.
- Uses a single clear “cap” (`max_removed`) instead of interacting conditions and a “magic 3” inside a loop condition.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Contributor
There was a problem hiding this comment.
Code Review
This pull request refactors the pop_record method to ensure that assistant tool calls and their associated tool responses are removed atomically, maintaining the integrity of the conversation context. It also includes comprehensive new test cases and updates existing file URI tests for better cross-platform compatibility. Feedback was provided regarding code duplication in the pop_record implementation; refactoring the internal helper functions into a single range-finding utility would improve maintainability and efficiency.
…ssue-7225-fwjo6t Preserve tool-call atomicity when popping context records and add tests
Author
|
已修改合并 |
…p-for-clarity-t0rtc8 refactor: clarify pop_record removal policy constants
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation / 动机
当对话上下文超过模型 token 上限时,
Provider.pop_record()会触发应急截断(#7225)。原实现固定从头部盲删 2 条非 system 消息,不检查assistant(tool_calls)与tool回复的配对关系,可能导致孤立tool消息残留,使 OpenAI 兼容 API 返回 400 错误(Messages with role 'tool' must be a response to a preceding message with 'tool_calls')。此前
ContextTruncator.fix_messages()已在常规截断路径(#5416 / #5417)中解决了同类问题,但pop_record()走的是另一条独立路径,未经过该修复。此外,
test_file_uri_to_path系列测试在 Windows 上构造 localhost file URI 的方式存在缺陷:通过字符串拼接file://localhost{as_posix()}生成file://localhostC:/...,缺少 localhost 后的/,导致 URI 无效。本 PR 一并修复。Modifications / 改动点
astrbot/core/provider/provider.py:重写Provider.pop_record()_pop_earliest_unit():将assistant(tool_calls)及其后续连续tool消息作为原子单位整体删除_peek_earliest_unit_count():预判下一个待删单元大小,当累计删除数即将超过 3 时提前停止,避免过度截断tool消息:将连续的孤立tool消息作为一个单位整体清除tests/test_openai_source.py:新增 5 个单元测试覆盖pop_record()的核心场景test_pop_record_removes_assistant_tool_calls_with_following_tools_atomically:基本 tool 链原子删除test_pop_record_removes_leading_orphan_tool_messages:孤立 tool 消息清理test_pop_record_normal_messages_no_regression:普通对话无回归test_pop_record_assistant_with_multiple_tool_calls:多 tool_calls 配对test_pop_record_only_system_messages:仅含 system 消息的边界情况tests/test_openai_source.py:修复 Windows localhost file URI 测试构造urlparse/urlunparse基于Path.as_uri()规范构造 URI,替代字符串拼接,确保生成file://localhost/C:/...格式This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
新增测试覆盖的场景(本地验证):
system → assistant(tc) → tool → usersystem → usersystem → tool(orphan) → user → assistant → usersystem → assistant → usersystem → u1 → a1 → u2 → a2system → u2 → a2system → assistant(tc1,tc2) → tool1 → tool2 → usersystem → usersystemsystem(不变)Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Adjust context truncation logic to preserve assistant tool_call and tool message pairing while keeping truncation bounded, and update tests including Windows file URI handling.
Bug Fixes:
Tests: