
fix(provider): preserve tool-call pairing after context truncation (#7225) #7237

Open
Yaohua-Leo wants to merge 3 commits into AstrBotDevs:master from Yaohua-Leo:fix/7225-tool-call-truncation

Conversation


@Yaohua-Leo (Contributor) commented Mar 31, 2026

Summary

Review summary for issue #7225

Branch

fix/7225-tool-call-truncation

Changed files

  • astrbot/core/provider/provider.py
  • tests/test_openai_source.py

Commit

Subject: fix(provider): clean orphaned tool messages after truncation
Author: LehaoLin <linlehao@cuhk.edu.cn>
Date: Tue Mar 31 22:16:35 2026 +0800

Testing

Executed

  • Command: python -m pytest tests/test_openai_source.py -k "context_length or content_moderated_removes_images"
    Purpose: Try the smallest repository-native OpenAI provider test slice from the existing global Python environment.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py -k "context_length or content_moderated_removes_images"
    Purpose: Re-run the same repository-native slice in a repo-local Python 3.12 virtualenv after installing requirements.txt, pytest, and pytest-asyncio.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py::test_handle_api_error_context_length_removes_orphaned_tool_messages
    Purpose: Validate the issue-specific regression where emergency truncation used to leave an orphaned tool message.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py::test_handle_api_error_context_length_preserves_remaining_valid_messages
    Purpose: Check the nearby boundary case that valid non-tool messages still survive emergency truncation.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py
    Purpose: Run the full OpenAI provider test module to catch adjacent regressions in error-handling branches.
  • Command: .venv\Scripts\python.exe -m ruff check astrbot/core/provider/provider.py tests/test_openai_source.py
    Purpose: Run static checks on the changed implementation and regression tests.

Results

  • Command: python -m pytest tests/test_openai_source.py -k "context_length or content_moderated_removes_images"
    Status: failed then fixed
    Summary: The first attempt failed before test collection because the global Python environment was missing sqlalchemy. I created a repo-local Python 3.12 .venv, installed the runtime requirements plus pytest tooling, and reran the equivalent repository-native check successfully.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py -k "context_length or content_moderated_removes_images"
    Status: passed
    Summary: Three selected tests passed, covering the new maximum-context regression cases plus one pre-existing content-moderation check.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py::test_handle_api_error_context_length_removes_orphaned_tool_messages
    Status: passed
    Summary: Confirmed that after pop_record() removes the earliest user/assistant(tool_calls) pair, orphaned tool messages are now removed from the dict-based context before retry.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py::test_handle_api_error_context_length_preserves_remaining_valid_messages
    Status: passed
    Summary: Confirmed that emergency truncation still preserves unrelated valid messages after the oldest conversation turn is removed.
  • Command: .venv\Scripts\python.exe -m pytest tests/test_openai_source.py
    Status: passed
    Summary: All 16 tests in tests/test_openai_source.py passed.
  • Command: .venv\Scripts\python.exe -m ruff check astrbot/core/provider/provider.py tests/test_openai_source.py
    Status: passed
    Summary: Ruff reported no lint errors in the changed files.

Not run

  • Command or area: Full repository test suite
    Reason: The fix is isolated to the OpenAI provider emergency truncation path, so I kept validation focused on the smallest relevant built-in module instead of running the entire suite.
  • Command or area: End-to-end chat flow against a real OpenAI-compatible provider
    Reason: This workspace does not have a test-specific remote provider session configured for safe reproducible E2E verification.

Residual risk

  • Risk: The regression coverage is strong for dict-based maximum context length retries in ProviderOpenAIOfficial, but it does not exercise a live multi-round agent conversation end to end.
  • Follow-up: If maintainers want extra confidence, run one manual agent session that forces context trimming and confirms the retry path no longer sends a lone tool message.
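
The cleanup validated above can be sketched in standalone form. This is a minimal illustration of the idea, not the repository's actual `_fix_tool_call_pairs_in_dict_context` implementation: after truncation, keep a tool-call id only when both halves of its pair survive, so the retried request never contains a lone tool message or a dangling assistant `tool_calls` entry.

```python
def fix_tool_call_pairs(messages):
    """Drop tool-call chains broken by truncation (illustrative sketch)."""
    # ids still announced by assistant tool_calls entries
    announced = {tc["id"] for m in messages if m.get("role") == "assistant"
                 for tc in (m.get("tool_calls") or [])}
    # ids still answered by tool messages
    answered = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    valid = announced & answered  # ids with both halves of the pair intact

    cleaned = []
    for m in messages:
        if m.get("role") == "tool":
            if m["tool_call_id"] in valid:  # keep only fully paired responses
                cleaned.append(m)
        elif m.get("role") == "assistant" and m.get("tool_calls"):
            kept = [tc for tc in m["tool_calls"] if tc["id"] in valid]
            if kept:
                cleaned.append({**m, "tool_calls": kept})
            elif m.get("content"):  # plain assistant text may survive alone
                cleaned.append({k: v for k, v in m.items() if k != "tool_calls"})
        else:
            cleaned.append(m)
    return cleaned

# Truncation removed the tool response for call_2: the dangling
# tool_calls entry is pruned, while the answered call_1 pair survives.
history = [
    {"role": "user", "content": "run tools"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "tool_a", "arguments": "{}"}},
        {"id": "call_2", "type": "function",
         "function": {"name": "tool_b", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "result a"},
]
cleaned = fix_tool_call_pairs(history)
```

This mirrors the invariant the regression tests assert: every tool message has a matching assistant tool_calls id and vice versa.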


Summary by Sourcery

Ensure OpenAI provider context truncation maintains valid conversation structure when handling maximum context length errors.

Bug Fixes:

  • Remove orphaned tool messages left behind after truncating the earliest user/assistant(tool_calls) pair in dict-based message histories.
  • Preserve remaining non-tool messages when emergency context trimming occurs on maximum context length errors.

Tests:

  • Add regression tests covering removal of orphaned tool messages and preservation of valid messages after context-length-based truncation in the OpenAI provider.

@gemini-code-assist bot left a comment
Code Review

This pull request introduces a mechanism to clean up message history by removing orphaned tool call chains—specifically assistant messages with tool calls that lack corresponding tool responses—during context management. It also adds unit tests to ensure that context length errors are handled correctly by preserving valid message pairs while pruning orphaned ones. A review comment suggests simplifying the conditional logic used to identify assistant messages with tool calls by leveraging Python's idiomatic truthiness for list checks.

Comment on lines +202 to +206
if (
role == "assistant"
and message.get("tool_calls") is not None
and len(message.get("tool_calls")) > 0
):
Severity: medium

The condition to identify an assistant message with tool calls can be simplified. In Python, an empty list is falsy, so message.get("tool_calls") is sufficient to check for both the existence of the key and that the list is not empty. This also avoids redundant calls to .get() and len().

            if role == "assistant" and message.get("tool_calls"):
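
The equivalence the bot describes can be checked directly; an empty list is falsy in Python, so the single truthiness test covers the missing-key, empty-list, and populated cases alike (the message shapes below are illustrative):

```python
# Both forms agree for missing, empty, and populated tool_calls.
samples = [
    {"role": "assistant"},
    {"role": "assistant", "tool_calls": []},
    {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
]
results = []
for message in samples:
    verbose = (message.get("tool_calls") is not None
               and len(message.get("tool_calls")) > 0)
    concise = bool(message.get("tool_calls"))
    results.append((verbose, concise))
```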

@Yaohua-Leo marked this pull request as ready for review March 31, 2026 14:35
@dosubot bot added the labels size:L (This PR changes 100-499 lines, ignoring generated files) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner) on Mar 31, 2026
@sourcery-ai bot left a comment

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tests/test_openai_source.py" line_range="168-169" />
<code_context>
         await provider.terminate()


+@pytest.mark.asyncio
+async def test_handle_api_error_context_length_removes_orphaned_tool_messages():
+    provider = _make_provider()
+    try:
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for multiple tool calls and their tool responses to ensure whole tool-call chains are cleaned up correctly after truncation.

The current test covers a single assistant `tool_calls` → tool message pair. To better exercise `_fix_tool_call_pairs_in_dict_context`, please add a variant where one assistant message has multiple `tool_calls` with multiple corresponding `tool` messages (e.g., `call_1`, `call_2`). Then simulate truncation that drops part of a chain (some tools, or the assistant but not all tools) and assert that `payloads['messages']` never contains partial or orphaned tool chains—only fully intact chains or none at all.

```suggestion
@pytest.mark.asyncio
async def test_handle_api_error_context_length_removes_orphaned_multi_tool_chains():
    provider = _make_provider()
    try:
        payloads = {
            "messages": [
                {"role": "system", "content": "system"},
                {"role": "user", "content": "Run multiple tools"},
                {
                    "role": "assistant",
                    "content": "",
                    "tool_calls": [
                        {
                            "id": "call_1",
                            "type": "function",
                            "function": {"name": "tool_a", "arguments": "{}"},
                        },
                        {
                            "id": "call_2",
                            "type": "function",
                            "function": {"name": "tool_b", "arguments": "{}"},
                        },
                    ],
                },
                {
                    "role": "tool",
                    "tool_call_id": "call_1",
                    "name": "tool_a",
                    "content": "result a",
                },
                {
                    "role": "tool",
                    "tool_call_id": "call_2",
                    "name": "tool_b",
                    "content": "result b",
                },
            ]
        }

        # Case 1: truncate away the assistant message but leave tool messages
        truncated = {
            "messages": payloads["messages"][:2] + payloads["messages"][3:]
        }
        provider._fix_tool_call_pairs_in_dict_context(truncated)

        # No orphan tool messages should remain
        assert all(m["role"] != "tool" for m in truncated["messages"])

        # Case 2: truncate away one tool message from a multi-tool chain
        truncated = {
            "messages": payloads["messages"][:-1]
        }
        provider._fix_tool_call_pairs_in_dict_context(truncated)

        # The remaining context must not contain partial tool-call chains:
        # every tool_call id present on assistant messages must be present
        # on tool messages, and vice versa.
        tool_call_ids = {
            tc["id"]
            for m in truncated["messages"]
            if m["role"] == "assistant"
            for tc in m.get("tool_calls", [])
        }
        tool_msg_ids = {
            m["tool_call_id"]
            for m in truncated["messages"]
            if m["role"] == "tool"
        }
        assert tool_call_ids == tool_msg_ids
    finally:
        await provider.terminate()


@pytest.mark.asyncio
async def test_handle_api_error_context_length_removes_orphaned_tool_messages():
```
</issue_to_address>


