fix(prompt): reserve system budget in message_fit_in #14164

Open

hyl64 wants to merge 3 commits into infiniflow:main from hyl64:fix/issue-13607-message-fit-in

Conversation

@hyl64 (Contributor) commented Apr 16, 2026

Summary

This PR fixes the message_fit_in() truncation bug reported in #13607.

Changes:

  • fix the user-message truncation branch to reserve room for the system prompt token budget (a sketch follows below)
  • guard the zero-token edge case to avoid dividing by zero in the truncation ratio check
  • add focused regression tests covering both the user-dominant truncation path and the zero-token boundary case
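
For context, here is a minimal sketch of the adjusted control flow. It assumes a tiktoken encoder, a two-message [system, user] layout, and the variable names used in the review (ll, ll2, total); the function name and signature are illustrative, not the exact code in rag/prompts/generator.py.

import tiktoken

def message_fit_in_sketch(msg, max_length=4096):
    """Trim msg so system + last-message tokens fit within max_length (illustrative)."""
    encoder = tiktoken.get_encoding("cl100k_base")

    ll = len(encoder.encode(msg[0]["content"]))    # system prompt tokens
    ll2 = len(encoder.encode(msg[-1]["content"]))  # last (user) message tokens
    total = ll + ll2

    if total <= 0:           # zero-token guard: avoids dividing by zero below
        return 0, msg
    if total <= max_length:  # already fits, nothing to trim
        return total, msg

    if ll / total > 0.8:
        # System prompt dominates: trim it, reserving room for the user message.
        preserve = max(0, max_length - ll2)
        msg[0]["content"] = encoder.decode(encoder.encode(msg[0]["content"])[:preserve])
    else:
        # User message dominates: trim it, reserving the system prompt's budget
        # (the reported bug trimmed against the wrong variable on this path).
        preserve = max(0, max_length - ll)
        msg[-1]["content"] = encoder.decode(encoder.encode(msg[-1]["content"])[:preserve])
    return max_length, msg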

Validation

pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py

Result: 2 passed

Closes #13607

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Apr 16, 2026
@coderabbitai (Bot) commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c968059-14c0-486c-86b9-d647ecdc6c49

📥 Commits

Reviewing files that changed from the base of the PR and between 40494c4 and 73d562d.

📒 Files selected for processing (2)
  • rag/prompts/generator.py
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py
✅ Files skipped from review due to trivial changes (1)
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py

📝 Walkthrough

Adjusted token-truncation logic in message_fit_in: compute total = ll + ll2, guard early-return when total <= 0, use ll / total for role selection, and change preserved-slice calculations for system and last-message truncation. Added unit tests exercising these cases.

Changes

Cohort / File(s) | Summary
  • Core logic (rag/prompts/generator.py): Compute total = ll + ll2 and early-return if total <= 0; use ll / total for role selection; add trim_content helper; adjust truncation branches to allocate tokens between system and last message using max(0, max_length - ...) logic.
  • Tests (test/unit_test/rag/prompts/test_generator_message_fit_in.py): New pytest module that stubs encoders and token counters for deterministic tests covering normal truncation, zero-token edge case, single-message handling, and scenarios where system or user messages dominate the budget.
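
The stubbed encoder/token counter approach described above could look roughly like the following. This is a hypothetical test shape: the patch point, the encoder attribute name, and the message_fit_in signature are assumptions, not the actual test module.

import pytest

from rag.prompts import generator

class FakeEncoder:
    """Counts one token per character so budgets are trivial to reason about."""
    def encode(self, text):
        return list(text)

    def decode(self, tokens):
        return "".join(tokens)

@pytest.fixture
def fake_encoder(monkeypatch):
    enc = FakeEncoder()
    # Hypothetical patch point; the attribute the real module exposes may differ.
    monkeypatch.setattr(generator, "encoder", enc, raising=False)
    return enc

def test_zero_token_messages_short_circuit(fake_encoder):
    msg = [{"role": "system", "content": ""}, {"role": "user", "content": ""}]
    used, out = generator.message_fit_in(msg, 10)  # positional max_length assumed
    assert used == 0
    assert out == msg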

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I count the hops of tokens in line,
ll and ll2 now sit in a line.
Totals checked, slices snug and neat,
Messages fit — a cozy treat. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage | ⚠️ Warning
    Explanation: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%.
    Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check | ✅ Passed: The title clearly summarizes the main change: fixing a bug in message_fit_in to reserve the system message budget when truncating.
  • Description check | ✅ Passed: The description covers the problem, changes made, and validation steps, but is missing explicit checkbox selections from the template's 'Type of change' section.
  • Linked Issues check | ✅ Passed: The PR addresses all key objectives from #13607: fixing truncation to reserve system budget, guarding division-by-zero, preventing negative slicing, and adding regression tests.
  • Out of Scope Changes check | ✅ Passed: All changes are within scope: the core fix in message_fit_in and corresponding regression tests directly address the reported bug in #13607.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added 🐞 bug Something isn't working, pull request that fix bug. 🧪 test Pull requests that update test cases. labels Apr 16, 2026
@coderabbitai (Bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rag/prompts/generator.py`:
- Around line 92-94: Add a debug log just before the early-return branch that
checks total <= 0 in the function where total = ll + ll2; log the values of ll,
ll2, total and the msg to record when this degenerate zero-token path is taken
(use the module/project logger, e.g. logger.debug or
logging.getLogger(__name__).debug) and then return as currently implemented.
Ensure the log call is placed immediately before the return 0, msg and follows
the project's logging conventions.
- Around line 96-103: The new branch handling total <= 0 must emit a log entry
(use the existing logging import) with context (e.g., values of total,
max_length, ll, ll2 and which message was preserved) so new flows are
observable; also prevent negative-slice overflow by clamping slice lengths
before slicing: compute preserve_len = max(0, max_length - ll2) (for the
system/message at msg_[0]) and preserve_len_last = max(0, max_length - ll) (for
msg_[-1]) and use those non-negative lengths when doing
encoder.encode(m)[:preserve_len] (or set the content to an empty string when
preserve_len == 0) so Python never interprets a negative slice as "drop from the
end"; update the code paths around msg_[0]/msg_[-1] manipulation (variables
msg_, msg, ll, ll2, encoder, max_length) accordingly and ensure the returned
used-token count still reflects the true max_length or the actual encoded
length.
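
A tiny standalone illustration (not project code) of the negative-slice pitfall that the clamp guards against:

tokens = list(range(10))
max_length, ll2 = 4, 7

bad = tokens[:max_length - ll2]           # tokens[:-3] keeps the first 7 items: still over budget
good = tokens[:max(0, max_length - ll2)]  # tokens[:0] correctly yields []

print(bad)   # [0, 1, 2, 3, 4, 5, 6]
print(good)  # []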

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5363c31a-4bba-47fb-809b-648da81a4e86

📥 Commits

Reviewing files that changed from the base of the PR and between 8a874c7 and ecc526b.

📒 Files selected for processing (2)
  • rag/prompts/generator.py
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py

Comment threads:
  • rag/prompts/generator.py
  • rag/prompts/generator.py (Outdated)
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 16, 2026
@hyl64 (Contributor, Author) commented Apr 16, 2026

Addressed the CodeRabbit suggestions in a narrow follow-up:

  • added a debug log for the degenerate total <= 0 path in message_fit_in
  • clamped preserve lengths to avoid Python negative-slice behavior when max_length is smaller than the preserved counterpart
  • added a regression test for the negative-slice case

Validation:

  • pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py
  • 3 passed

@coderabbitai (Bot) left a comment

♻️ Duplicate comments (1)
rag/prompts/generator.py (1)

104-115: ⚠️ Potential issue | 🔴 Critical

Clamp the over-budget side too, not just the slice length.

At Line 106 and Line 112, max(0, ...) only avoids Python’s negative-slice behavior. It still returns an oversized prompt when the preserved side already exceeds max_length: if ll >= max_length, the fallback branch empties the last message but keeps the full system prompt; if ll2 >= max_length, the ll / total > 0.8 branch can do the symmetric thing. In both cases used_tokens is reported as max_length even though the actual prompt is still over budget, so downstream callers can still hit context-length errors.

🛠️ Proposed fix
     if ll / total > 0.8:
-        m = msg_[0]["content"]
-        preserve_len = max(0, max_length - ll2)
-        m = encoder.decode(encoder.encode(m)[:preserve_len])
-        msg[0]["content"] = m
+        if ll2 >= max_length:
+            msg[0]["content"] = ""
+            msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:max_length])
+            return max_length, msg
+        preserve_len = max(0, max_length - ll2)
+        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:preserve_len])
         return max_length, msg
 
-    m = msg_[-1]["content"]
-    preserve_len_last = max(0, max_length - ll)
-    m = encoder.decode(encoder.encode(m)[:preserve_len_last])
-    msg[-1]["content"] = m
+    if ll >= max_length:
+        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:max_length])
+        msg[-1]["content"] = ""
+        return max_length, msg
+    preserve_len_last = max(0, max_length - ll)
+    msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:preserve_len_last])
     return max_length, msg
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rag/prompts/generator.py` around lines 104 - 115, The code currently only
clamps the preserved slice length (using max(0, ...)) which can still leave the
other side of the prompt over budget; update both branches in the trimming logic
in generator.py so that when you compute preserve_len (from ll2) or
preserve_len_last (from ll) you also enforce that the other part is trimmed so
the combined token count <= max_length (e.g., compute preserve = min(preserve,
max_length) and then trim the opposite message/content to fit the remaining
tokens), and return the actual used token count (recompute from ll/ll2 or
encoder.encode lengths) instead of always returning max_length; make these
changes where ll / total > 0.8, msg_[0]/msg[0], and the fallback using
msg_[-1]/msg[-1] so both the system and last message are clamped to guarantee
the final prompt is never over budget.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 420049a7-d51a-4e0c-8cce-d178510ceca8

📥 Commits

Reviewing files that changed from the base of the PR and between ecc526b and 40494c4.

📒 Files selected for processing (2)
  • rag/prompts/generator.py
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Apr 16, 2026
@hyl64 (Contributor, Author) commented Apr 16, 2026

Follow-up for the remaining over-budget edge case:

  • capped the preserved side itself when either the system message or the trailing message alone already exceeds max_length
  • changed message_fit_in to return the actual used token count after trimming instead of always returning max_length (see the sketch below)
  • added regression coverage for the dominant-last-message case
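
A rough sketch of that recount step, with a hypothetical helper name and an encoder passed in explicitly; the actual implementation may wire this differently.

def recount_used_tokens(msg, encoder):
    """Re-count the system and trailing messages after trimming so the caller
    sees the tokens actually used, not an assumed max_length."""
    return sum(len(encoder.encode(m["content"])) for m in (msg[0], msg[-1]))

# At the end of message_fit_in, instead of `return max_length, msg`:
#     return recount_used_tokens(msg, encoder), msg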

Validation:

  • pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py
  • 4 passed

@hyl64 (Contributor, Author) commented Apr 16, 2026

@KevinHuSh @JinHai-CN @TeslaZY @Lynn-Inf could you help take a look at this bugfix PR when convenient?

This one fixes #13607 in rag/prompts/generator.py and I've already addressed the follow-up CodeRabbit points around over-budget trimming and actual used-token counting.

Local validation:

  • pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py
  • 4 passed

@hyl64 (Contributor, Author) commented Apr 17, 2026

@KevinHuSh @JinHai-CN @TeslaZY quick nudge on this bugfix PR when convenient.

This is the fix for #13607. I addressed the follow-up CodeRabbit concerns and revalidated the focused regression tests locally.


Labels

🐞 bug Something isn't working, pull request that fix bug. size:M This PR changes 30-99 lines, ignoring generated files. 🧪 test Pull requests that update test cases.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: message_fit_in truncates user message with wrong variable, causing context length overflow

1 participant