fix: pop_record() preserves tool_calls/tool pairing to prevent 400 error on context overflow#7232

Open
CompilError-bts wants to merge 9 commits into AstrBotDevs:master from CompilError-bts:master

Conversation

@CompilError-bts CompilError-bts commented Mar 31, 2026

Motivation / 动机

When the conversation context exceeds the model's token limit, Provider.pop_record() performs emergency truncation (#7225). The original implementation blindly removed a fixed 2 non-system messages from the head without checking the pairing between assistant(tool_calls) messages and their tool replies, which could leave orphan tool messages behind and cause OpenAI-compatible APIs to return a 400 error (Messages with role 'tool' must be a response to a preceding message with 'tool_calls').

ContextTruncator.fix_messages() already solved the same problem on the regular truncation path (#5416 / #5417), but pop_record() is a separate, independent path that did not go through that fix.

In addition, the test_file_uri_to_path test series constructed localhost file URIs incorrectly on Windows: string concatenation of file://localhost{as_posix()} produced file://localhostC:/..., missing the / after localhost and making the URI invalid. This PR fixes that as well.
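A minimal sketch (not the project's code) of how the old blind head removal strands a tool reply — popping the first two non-system messages removes the user turn and the assistant(tool_calls) message, leaving the tool response orphaned:

```python
# Hypothetical message history; shapes follow the OpenAI chat format.
context = [
    {"role": "system", "content": "sys"},
    {"role": "user", "content": "question"},
    {"role": "assistant", "tool_calls": [{"id": "tc1"}]},
    {"role": "tool", "tool_call_id": "tc1", "content": "result"},
    {"role": "user", "content": "follow-up"},
]

# Old behavior: blindly pop the first two non-system messages.
removed, i = 0, 0
while i < len(context) and removed < 2:
    if context[i]["role"] != "system":
        context.pop(i)
        removed += 1
    else:
        i += 1

# The remaining tool message has no preceding assistant(tool_calls)
# message, which OpenAI-compatible APIs reject with a 400 error.
print([m["role"] for m in context])  # ['system', 'tool', 'user']
```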

Modifications / 改动点

  • astrbot/core/provider/provider.py: rewrite Provider.pop_record()

    • Introduce _pop_earliest_unit(): remove an assistant(tool_calls) message together with its following consecutive tool messages as one atomic unit
    • Introduce _peek_earliest_unit_count(): check the size of the next unit before deleting it, and stop early when the cumulative removal count would exceed 3, to avoid over-truncation
    • Handle leading orphan tool messages: clear consecutive orphan tool messages at the head as a single unit
  • tests/test_openai_source.py: add 5 unit tests covering the core pop_record() scenarios

    • test_pop_record_removes_assistant_tool_calls_with_following_tools_atomically: basic atomic removal of a tool chain
    • test_pop_record_removes_leading_orphan_tool_messages: orphan tool message cleanup
    • test_pop_record_normal_messages_no_regression: no regression for normal conversations
    • test_pop_record_assistant_with_multiple_tool_calls: pairing with multiple tool_calls
    • test_pop_record_only_system_messages: edge case with only system messages
  • tests/test_openai_source.py: fix the Windows localhost file URI test construction

    • Build the URI with urlparse/urlunparse on top of Path.as_uri() instead of string concatenation, ensuring the file://localhost/C:/... format
  • This is NOT a breaking change. / 这不是一个破坏性变更。
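The URI construction fix can be sketched as follows (the path is illustrative; PureWindowsPath keeps the sketch runnable on any platform, whereas the tests would use a real Path on Windows):

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse, urlunparse

path = PureWindowsPath("C:/Users/test/file.txt")

# Broken: string concatenation drops the "/" after the authority.
bad_uri = f"file://localhost{path.as_posix()}"
print(bad_uri)  # file://localhostC:/Users/test/file.txt

# Fixed: start from the standard file URI and swap in the authority.
parts = urlparse(path.as_uri())  # file:///C:/Users/test/file.txt
good_uri = urlunparse(parts._replace(netloc="localhost"))
print(good_uri)  # file://localhost/C:/Users/test/file.txt
```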

Screenshots or Test Results / 运行截图或测试结果

$ ruff format .
All checks passed!

$ ruff check .
All checks passed!

Scenarios covered by the new tests (verified locally):

| Test | Input | Expected output | Result |
| --- | --- | --- | --- |
| Atomic tool-chain removal | system → assistant(tc) → tool → user | system → user | ✅ Pass |
| Orphan tool cleanup | system → tool(orphan) → user → assistant → user | system → assistant → user | ✅ Pass |
| Normal conversation, no regression | system → u1 → a1 → u2 → a2 | system → u2 → a2 | ✅ Pass |
| Multiple tool_calls | system → assistant(tc1,tc2) → tool1 → tool2 → user | system → user | ✅ Pass |
| system only | system | system (unchanged) | ✅ Pass |
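The first two table rows can be reproduced with a simplified stand-in for the atomic-unit removal (the function below is illustrative, not the project's actual helper):

```python
def pop_earliest_unit(context: list) -> int:
    """Remove the earliest non-system message; if it carries tool_calls
    (or is itself an orphan tool message), also remove the consecutive
    tool messages that follow it. Returns how many messages were removed."""
    start = next(
        (i for i, m in enumerate(context) if m.get("role") != "system"), None
    )
    if start is None:
        return 0
    end = start
    role = context[start].get("role")
    if (role == "assistant" and context[start].get("tool_calls")) or role == "tool":
        while end + 1 < len(context) and context[end + 1].get("role") == "tool":
            end += 1
    del context[start : end + 1]
    return end - start + 1

# Row 1: the assistant(tool_calls) + tool pair is removed as one unit.
chain = [
    {"role": "system", "content": "sys"},
    {"role": "assistant", "tool_calls": [{"id": "tc"}]},
    {"role": "tool", "tool_call_id": "tc", "content": "r"},
    {"role": "user", "content": "next"},
]
pop_earliest_unit(chain)
print([m["role"] for m in chain])  # ['system', 'user']
```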

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Adjust context truncation logic to preserve assistant tool_call and tool message pairing while keeping truncation bounded, and update tests including Windows file URI handling.

Bug Fixes:

  • Ensure emergency context truncation via pop_record removes assistant tool_calls and their corresponding tool messages atomically to prevent invalid tool-only histories and API 400 errors.
  • Fix Windows localhost file URI construction and path resolution tests to validate using normalized Path comparisons instead of string literals.

Tests:

  • Add unit tests covering pop_record behavior with tool_call chains, orphan tool messages, normal conversations, multiple tool_calls, and system-only contexts.
  • Adjust file URI to path tests on Windows to assert normalized Path equality for local and UNC-style file URIs.

@auto-assign auto-assign bot requested review from Fridemn and Soulter March 31, 2026 12:34
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Mar 31, 2026
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • The _pop_earliest_unit and _peek_earliest_unit_count helpers duplicate almost the same logic for determining the unit span; consider extracting a shared internal helper that computes the [start_idx, end_idx] range to reduce repetition and keep the two in sync.
  • The magic numbers 2 and 3 in the truncation loop (the "pop around 2 records" behavior) are non-obvious; defining them as named constants and expanding the docstring to describe this policy would make the intent clearer and future changes less error-prone.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_pop_earliest_unit` and `_peek_earliest_unit_count` helpers duplicate almost the same logic for determining the unit span; consider extracting a shared internal helper that computes the `[start_idx, end_idx]` range to reduce repetition and keep the two in sync.
- The magic numbers `2` and `3` in the truncation loop (the "pop around 2 records" behavior) are non-obvious; defining them as named constants and expanding the docstring to describe this policy would make the intent clearer and future changes less error-prone.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/provider.py" line_range="176-185" />
<code_context>
+        def _pop_earliest_unit() -> int:
</code_context>
<issue_to_address>
**suggestion:** Reduce duplication between `_pop_earliest_unit` and `_peek_earliest_unit_count` to keep the unit-selection logic in one place.

Both functions reimplement the same logic for computing the `[start_idx, end_idx]` window (role checks, tool-call grouping, consecutive `tool` messages). This duplication risks divergence if only one is updated. Consider extracting a helper (e.g. `_earliest_unit_span() -> tuple[int | None, int | None]` or a function parameterized by `dry_run: bool`) so the “atomic unit” rules live in one place and are easier to maintain.

Suggested implementation:

```python
        def _first_non_system_index() -> int | None:
            for idx, record in enumerate(context):
                if record.get("role") != "system":
                    return idx
            return None

        def _earliest_unit_span() -> tuple[int | None, int | None]:
            """
            Compute the [start_idx, end_idx] of the earliest atomic unit in `context`.

            Rules:
            - Skip leading `system` messages.
            - If the first non-system message is an `assistant` with tool_calls, group it
              together with all immediately following `tool` messages.
            - If the first non-system message is a `tool`, group it together with all
              immediately following `tool` messages.
            - Otherwise the unit is just that single message.
            """
            start_idx = _first_non_system_index()
            if start_idx is None:
                return None, None

            record = context[start_idx]
            role = record.get("role")
            end_idx = start_idx

            # Keep assistant(tool_calls) + following tool messages atomic
            if role == "assistant" and _has_tool_calls(record):
                i = start_idx + 1
                while i < len(context) and context[i].get("role") == "tool":
                    end_idx = i
                    i += 1
            # Keep consecutive tool messages atomic
            elif role == "tool":
                i = start_idx + 1
                while i < len(context) and context[i].get("role") == "tool":
                    end_idx = i
                    i += 1

            return start_idx, end_idx

        def _pop_earliest_unit() -> int:
            start_idx, end_idx = _earliest_unit_span()
            if start_idx is None or end_idx is None:
                return 0

            # Pop from the end of the span backwards to preserve indices
            count = end_idx - start_idx + 1
            for _ in range(count):
                context.pop(start_idx)
            return count

```

```python
        def _peek_earliest_unit_count() -> int:
            """
            Return the size (number of messages) of the earliest atomic unit
            without mutating `context`.
            """
            start_idx, end_idx = _earliest_unit_span()
            if start_idx is None or end_idx is None:
                return 0
            return end_idx - start_idx + 1

```
</issue_to_address>

### Comment 2
<location path="astrbot/core/provider/provider.py" line_range="222-229" />
<code_context>
+                    end_idx += 1
+            return end_idx - start_idx + 1
+
+        removed = 0
+        while removed < 2:
+            next_unit_count = _peek_earliest_unit_count()
+            if next_unit_count == 0:
                 break
-
-        for idx in reversed(indexs_to_pop):
-            context.pop(idx)
+            # Keep behavior close to the old "pop around 2 records" strategy,
+            # while still preserving tool-call atomicity.
+            if removed > 0 and removed + next_unit_count > 3:
+                break
+            removed_now = _pop_earliest_unit()
</code_context>
<issue_to_address>
**suggestion:** The removal loop’s stopping conditions are subtle; consider making the policy more explicit or parameterized.

The combined conditions `while removed < 2` and `if removed > 0 and removed + next_unit_count > 3: break` make the effective policy ("usually 2, sometimes 3 records depending on unit boundaries") hard to see. Consider expressing this via named constants (e.g., `TARGET_RECORDS = 2`, `MAX_RECORDS = 3`) and rewriting the loop in terms of those, so the “remove up to MAX_RECORDS while preserving atomic units” intent is immediately clear.

Suggested implementation:

```python
            return end_idx - start_idx + 1

        # Removal policy: try to remove around TARGET_RECORDS messages,
        # but allow up to MAX_RECORDS to keep tool-call/message units atomic.
        TARGET_RECORDS = 2
        MAX_RECORDS = 3

        removed = 0
        while removed < TARGET_RECORDS:
            next_unit_count = _peek_earliest_unit_count()
            if next_unit_count == 0:
                break
            # Keep behavior close to the old "pop around 2 records" strategy,
            # while still preserving tool-call atomicity.
            if removed > 0 and removed + next_unit_count > MAX_RECORDS:
                break

```

```python
            removed_now = _pop_earliest_unit()
            if removed_now == 0:
                break
            removed += removed_now

```
</issue_to_address>

### Comment 3
<location path="astrbot/core/provider/provider.py" line_range="165" />
<code_context>
-            indexs_to_pop.append(idx)
-            poped += 1
-            if poped == 2:
+        """弹出最早的非 system 记录,同时保持 tool_calls 与 tool 配对完整。"""
+
+        def _has_tool_calls(message: dict) -> bool:
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring `pop_record` to use a single helper for computing the earliest message unit span and a simpler capped-removal loop to preserve existing behavior more clearly.

You can simplify this without changing behavior by (1) unifying the “unit span” logic and (2) simplifying the removal loop.

### 1. Extract shared “unit span” logic

Both `_pop_earliest_unit` and `_peek_earliest_unit_count` compute the same span. You can replace them with a single helper that returns the `(start_idx, end_idx)` for the earliest unit:

```python
async def pop_record(self, context: list) -> None:
    """弹出最早的非 system 记录,同时保持 tool_calls 与 tool 配对完整。"""

    def _has_tool_calls(message: dict) -> bool:
        return bool(message.get("tool_calls"))

    def _get_earliest_unit_span() -> tuple[int, int] | None:
        # find first non-system
        start_idx = None
        for idx, record in enumerate(context):
            if record.get("role") != "system":
                start_idx = idx
                break
        if start_idx is None:
            return None

        record = context[start_idx]
        role = record.get("role")
        end_idx = start_idx

        # assistant with tool_calls, or contiguous tools
        if role == "assistant" and _has_tool_calls(record):
            while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
                end_idx += 1
        elif role == "tool":
            while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
                end_idx += 1

        return start_idx, end_idx
```

This removes the duplicated logic and guarantees a single definition of what a “unit” is.

If you prefer to avoid nested helpers, this can also be a private method on the class:

```python
def _get_earliest_unit_span(self, context: list) -> tuple[int, int] | None:
    ...
```

and then call it from `pop_record`.

### 2. Simplify the removal loop

You can keep the “~2 records with slack up to 3” behavior using one consistent cap and the shared span helper, instead of a separate peek/pop pair:

```python
async def pop_record(self, context: list) -> None:
    """弹出最早的非 system 记录,同时保持 tool_calls 与 tool 配对完整。"""

    def _has_tool_calls(message: dict) -> bool:
        return bool(message.get("tool_calls"))

    def _get_earliest_unit_span() -> tuple[int, int] | None:
        start_idx = None
        for idx, record in enumerate(context):
            if record.get("role") != "system":
                start_idx = idx
                break
        if start_idx is None:
            return None

        record = context[start_idx]
        role = record.get("role")
        end_idx = start_idx

        if role == "assistant" and _has_tool_calls(record):
            while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
                end_idx += 1
        elif role == "tool":
            while end_idx + 1 < len(context) and context[end_idx + 1].get("role") == "tool":
                end_idx += 1

        return start_idx, end_idx

    removed = 0
    min_removed = 2
    max_removed = 3  # keep old "~2 records" behavior with slack

    while True:
        span = _get_earliest_unit_span()
        if span is None:
            break

        start_idx, end_idx = span
        unit_len = end_idx - start_idx + 1

        # don't exceed the cap once we've already removed something
        if removed and removed + unit_len > max_removed:
            break

        del context[start_idx : end_idx + 1]
        removed += unit_len

        if removed >= min_removed:
            break
```

This keeps all current behavior (atomic tool-call units, up to ~2–3 records removed) but:

- Removes duplicated span logic.
- Eliminates the need for `_peek_earliest_unit_count` and `_pop_earliest_unit`.
- Uses a single clear “cap” (`max_removed`) instead of interacting conditions and a “magic 3” inside a loop condition.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the pop_record method to ensure that assistant tool calls and their associated tool responses are removed atomically, maintaining the integrity of the conversation context. It also includes comprehensive new test cases and updates existing file URI tests for better cross-platform compatibility. Feedback was provided regarding code duplication in the pop_record implementation; refactoring the internal helper functions into a single range-finding utility would improve maintainability and efficiency.

@CompilError-bts
Author

Modified and merged the suggested changes.

