fix(prompt): reserve system budget in message_fit_in #14164

Open

hyl64 wants to merge 3 commits into infiniflow:main from hyl64:fix/issue-13607-message-fit-in

Conversation

@hyl64 (Contributor) commented Apr 16, 2026

Summary

This PR fixes the message_fit_in() truncation bug reported in #13607.

Changes:

  • fix the user-message truncation branch to reserve room for the system prompt token budget (a sketch follows below)
  • guard the zero-token edge case to avoid dividing by zero in the truncation ratio check
  • add focused regression tests covering both the user-dominant truncation path and the zero-token boundary case
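
For context, here is a minimal sketch of the adjusted control flow. It assumes a tiktoken encoder, a two-message [system, user] layout, and the variable names used in the review (ll, ll2, total); the function name and signature are illustrative, not the exact code in rag/prompts/generator.py.

import tiktoken

def message_fit_in_sketch(msg, max_length=4096):
    """Trim msg so system + last-message tokens fit within max_length (illustrative)."""
    encoder = tiktoken.get_encoding("cl100k_base")

    ll = len(encoder.encode(msg[0]["content"]))    # system prompt tokens
    ll2 = len(encoder.encode(msg[-1]["content"]))  # last (user) message tokens
    total = ll + ll2

    if total <= 0:           # zero-token guard: avoids dividing by zero below
        return 0, msg
    if total <= max_length:  # already fits, nothing to trim
        return total, msg

    if ll / total > 0.8:
        # System prompt dominates: trim it, reserving room for the user message.
        preserve = max(0, max_length - ll2)
        msg[0]["content"] = encoder.decode(encoder.encode(msg[0]["content"])[:preserve])
    else:
        # User message dominates: trim it, reserving the system prompt's budget
        # (the reported bug trimmed against the wrong variable on this path).
        preserve = max(0, max_length - ll)
        msg[-1]["content"] = encoder.decode(encoder.encode(msg[-1]["content"])[:preserve])
    return max_length, msg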

Validation

pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py

Result: 2 passed

Closes #13607

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Apr 16, 2026
@coderabbitai (Bot) commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c968059-14c0-486c-86b9-d647ecdc6c49

📥 Commits

Reviewing files that changed from the base of the PR and between 40494c4 and 73d562d.

📒 Files selected for processing (2)
  • rag/prompts/generator.py
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py
✅ Files skipped from review due to trivial changes (1)
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py

📝 Walkthrough

Adjusted token-truncation logic in message_fit_in: compute total = ll + ll2, guard early-return when total <= 0, use ll / total for role selection, and change preserved-slice calculations for system and last-message truncation. Added unit tests exercising these cases.

Changes

Cohort / File(s) | Summary
  • Core logic (rag/prompts/generator.py): Compute total = ll + ll2 and early-return if total <= 0; use ll / total for role selection; add trim_content helper; adjust truncation branches to allocate tokens between system and last message using max(0, max_length - ...) logic.
  • Tests (test/unit_test/rag/prompts/test_generator_message_fit_in.py): New pytest module that stubs encoders and token counters for deterministic tests covering normal truncation, zero-token edge case, single-message handling, and scenarios where system or user messages dominate the budget.
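
The stubbed encoder/token counter approach described above could look roughly like the following. This is a hypothetical test shape: the patch point, the encoder attribute name, and the message_fit_in signature are assumptions, not the actual test module.

import pytest

from rag.prompts import generator

class FakeEncoder:
    """Counts one token per character so budgets are trivial to reason about."""
    def encode(self, text):
        return list(text)

    def decode(self, tokens):
        return "".join(tokens)

@pytest.fixture
def fake_encoder(monkeypatch):
    enc = FakeEncoder()
    # Hypothetical patch point; the attribute the real module exposes may differ.
    monkeypatch.setattr(generator, "encoder", enc, raising=False)
    return enc

def test_zero_token_messages_short_circuit(fake_encoder):
    msg = [{"role": "system", "content": ""}, {"role": "user", "content": ""}]
    used, out = generator.message_fit_in(msg, 10)  # positional max_length assumed
    assert used == 0
    assert out == msg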

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I count the hops of tokens in line,
ll and ll2 now sit in a line.
Totals checked, slices snug and neat,
Messages fit — a cozy treat. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage | ⚠️ Warning
    Explanation: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%.
    Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check | ✅ Passed: The title clearly summarizes the main change: fixing a bug in message_fit_in to reserve the system message budget when truncating.
  • Description check | ✅ Passed: The description covers the problem, changes made, and validation steps, but is missing explicit checkbox selections from the template's 'Type of change' section.
  • Linked Issues check | ✅ Passed: The PR addresses all key objectives from #13607: fixing truncation to reserve system budget, guarding division-by-zero, preventing negative slicing, and adding regression tests.
  • Out of Scope Changes check | ✅ Passed: All changes are within scope: the core fix in message_fit_in and corresponding regression tests directly address the reported bug in #13607.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added 🐞 bug Something isn't working, pull request that fix bug. 🧪 test Pull requests that update test cases. labels Apr 16, 2026
@coderabbitai (Bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rag/prompts/generator.py`:
- Around line 92-94: Add a debug log just before the early-return branch that
checks total <= 0 in the function where total = ll + ll2; log the values of ll,
ll2, total and the msg to record when this degenerate zero-token path is taken
(use the module/project logger, e.g. logger.debug or
logging.getLogger(__name__).debug) and then return as currently implemented.
Ensure the log call is placed immediately before the return 0, msg and follows
the project's logging conventions.
- Around line 96-103: The new branch handling total <= 0 must emit a log entry
(use the existing logging import) with context (e.g., values of total,
max_length, ll, ll2 and which message was preserved) so new flows are
observable; also prevent negative-slice overflow by clamping slice lengths
before slicing: compute preserve_len = max(0, max_length - ll2) (for the
system/message at msg_[0]) and preserve_len_last = max(0, max_length - ll) (for
msg_[-1]) and use those non-negative lengths when doing
encoder.encode(m)[:preserve_len] (or set the content to an empty string when
preserve_len == 0) so Python never interprets a negative slice as "drop from the
end"; update the code paths around msg_[0]/msg_[-1] manipulation (variables
msg_, msg, ll, ll2, encoder, max_length) accordingly and ensure the returned
used-token count still reflects the true max_length or the actual encoded
length.
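
A tiny standalone illustration (not project code) of the negative-slice pitfall that the clamp guards against:

tokens = list(range(10))
max_length, ll2 = 4, 7

bad = tokens[:max_length - ll2]           # tokens[:-3] keeps the first 7 items: still over budget
good = tokens[:max(0, max_length - ll2)]  # tokens[:0] correctly yields []

print(bad)   # [0, 1, 2, 3, 4, 5, 6]
print(good)  # []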

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5363c31a-4bba-47fb-809b-648da81a4e86

📥 Commits

Reviewing files that changed from the base of the PR and between 8a874c7 and ecc526b.

📒 Files selected for processing (2)
  • rag/prompts/generator.py
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py

Comment threads:
  • rag/prompts/generator.py
  • rag/prompts/generator.py (Outdated)
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 16, 2026
@hyl64 (Contributor, Author) commented Apr 16, 2026

Addressed the CodeRabbit suggestions in a narrow follow-up:

  • added a debug log for the degenerate total <= 0 path in message_fit_in
  • clamped preserve lengths to avoid Python negative-slice behavior when max_length is smaller than the preserved counterpart
  • added a regression test for the negative-slice case

Validation:

  • pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py
  • 3 passed

@coderabbitai (Bot) left a comment

♻️ Duplicate comments (1)
rag/prompts/generator.py (1)

104-115: ⚠️ Potential issue | 🔴 Critical

Clamp the over-budget side too, not just the slice length.

At Line 106 and Line 112, max(0, ...) only avoids Python’s negative-slice behavior. It still returns an oversized prompt when the preserved side already exceeds max_length: if ll >= max_length, the fallback branch empties the last message but keeps the full system prompt; if ll2 >= max_length, the ll / total > 0.8 branch can do the symmetric thing. In both cases used_tokens is reported as max_length even though the actual prompt is still over budget, so downstream callers can still hit context-length errors.

🛠️ Proposed fix
     if ll / total > 0.8:
-        m = msg_[0]["content"]
-        preserve_len = max(0, max_length - ll2)
-        m = encoder.decode(encoder.encode(m)[:preserve_len])
-        msg[0]["content"] = m
+        if ll2 >= max_length:
+            msg[0]["content"] = ""
+            msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:max_length])
+            return max_length, msg
+        preserve_len = max(0, max_length - ll2)
+        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:preserve_len])
         return max_length, msg
 
-    m = msg_[-1]["content"]
-    preserve_len_last = max(0, max_length - ll)
-    m = encoder.decode(encoder.encode(m)[:preserve_len_last])
-    msg[-1]["content"] = m
+    if ll >= max_length:
+        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:max_length])
+        msg[-1]["content"] = ""
+        return max_length, msg
+    preserve_len_last = max(0, max_length - ll)
+    msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:preserve_len_last])
     return max_length, msg
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rag/prompts/generator.py` around lines 104 - 115, The code currently only
clamps the preserved slice length (using max(0, ...)) which can still leave the
other side of the prompt over budget; update both branches in the trimming logic
in generator.py so that when you compute preserve_len (from ll2) or
preserve_len_last (from ll) you also enforce that the other part is trimmed so
the combined token count <= max_length (e.g., compute preserve = min(preserve,
max_length) and then trim the opposite message/content to fit the remaining
tokens), and return the actual used token count (recompute from ll/ll2 or
encoder.encode lengths) instead of always returning max_length; make these
changes where ll / total > 0.8, msg_[0]/msg[0], and the fallback using
msg_[-1]/msg[-1] so both the system and last message are clamped to guarantee
the final prompt is never over budget.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 420049a7-d51a-4e0c-8cce-d178510ceca8

📥 Commits

Reviewing files that changed from the base of the PR and between ecc526b and 40494c4.

📒 Files selected for processing (2)
  • rag/prompts/generator.py
  • test/unit_test/rag/prompts/test_generator_message_fit_in.py

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Apr 16, 2026
@hyl64 (Contributor, Author) commented Apr 16, 2026

Follow-up for the remaining over-budget edge case:

  • capped the preserved side itself when either the system message or the trailing message alone already exceeds max_length
  • changed message_fit_in to return the actual used token count after trimming instead of always returning max_length (see the sketch below)
  • added regression coverage for the dominant-last-message case
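
A rough sketch of that recount step, with a hypothetical helper name and an encoder passed in explicitly; the actual implementation may wire this differently.

def recount_used_tokens(msg, encoder):
    """Re-count the system and trailing messages after trimming so the caller
    sees the tokens actually used, not an assumed max_length."""
    return sum(len(encoder.encode(m["content"])) for m in (msg[0], msg[-1]))

# At the end of message_fit_in, instead of `return max_length, msg`:
#     return recount_used_tokens(msg, encoder), msg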

Validation:

  • pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py
  • 4 passed

@hyl64 (Contributor, Author) commented Apr 16, 2026

@KevinHuSh @JinHai-CN @TeslaZY @Lynn-Inf could you help take a look at this bugfix PR when convenient?

This one fixes #13607 in rag/prompts/generator.py and I've already addressed the follow-up CodeRabbit points around over-budget trimming and actual used-token counting.

Local validation:

  • pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py
  • 4 passed

@hyl64 (Contributor, Author) commented Apr 17, 2026

@KevinHuSh @JinHai-CN @TeslaZY quick nudge on this bugfix PR when convenient.

This is the fix for #13607. I addressed the follow-up CodeRabbit concerns and revalidated the focused regression tests locally.


Labels

🐞 bug Something isn't working, pull request that fix bug. size:M This PR changes 30-99 lines, ignoring generated files. 🧪 test Pull requests that update test cases.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: message_fit_in truncates user message with wrong variable, causing context length overflow

1 participant