fix(prompt): reserve system budget in message_fit_in #14164
hyl64 wants to merge 3 commits into infiniflow:main
Conversation
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough: Adjusted token-truncation logic in rag/prompts/generator.py.
✅ 1 file skipped from review due to trivial changes.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rag/prompts/generator.py`:
- Around line 92-94: Add a debug log just before the early-return branch that
checks total <= 0 in the function where total = ll + ll2; log the values of ll,
ll2, total and the msg to record when this degenerate zero-token path is taken
(use the module/project logger, e.g. logger.debug or
logging.getLogger(__name__).debug) and then return as currently implemented.
Ensure the log call is placed immediately before the return 0, msg and follows
the project's logging conventions.
- Around line 96-103: The new branch handling total <= 0 must emit a log entry
(use the existing logging import) with context (e.g., values of total,
max_length, ll, ll2 and which message was preserved) so new flows are
observable; also prevent negative-slice overflow by clamping slice lengths
before slicing: compute preserve_len = max(0, max_length - ll2) (for the
system/message at msg_[0]) and preserve_len_last = max(0, max_length - ll) (for
msg_[-1]) and use those non-negative lengths when doing
encoder.encode(m)[:preserve_len] (or set the content to an empty string when
preserve_len == 0) so Python never interprets a negative slice as "drop from the
end"; update the code paths around msg_[0]/msg_[-1] manipulation (variables
msg_, msg, ll, ll2, encoder, max_length) accordingly and ensure the returned
used-token count still reflects the true max_length or the actual encoded
length.
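Taken together, the two suggestions above might land in `message_fit_in()` roughly as follows. This is a minimal sketch, not code from `rag/prompts/generator.py`: the function shape, the `msg_`/`msg` relationship, and the early exit for prompts that already fit are assumptions reconstructed from the comments, and the follow-up review below tightens the remaining `>= max_length` edge cases.

```python
import logging

logger = logging.getLogger(__name__)


def message_fit_in_sketch(msg, max_length, encoder):
    """Sketch of message_fit_in() with both review suggestions applied.

    Assumptions: msg is a list of {"role", "content"} dicts with the system
    prompt first and the latest turn last; encoder is any tokenizer exposing
    encode()/decode(); msg_ and msg refer to the same list, as the review's
    variable names suggest.
    """
    msg_ = msg
    ll = len(encoder.encode(msg_[0]["content"]))
    ll2 = len(encoder.encode(msg_[-1]["content"]))
    total = ll + ll2

    if total <= 0:
        # Suggestion 1: make the degenerate zero-token path observable.
        logger.debug("message_fit_in: zero-token path ll=%s ll2=%s total=%s msg=%s",
                     ll, ll2, total, msg)
        return 0, msg

    if total <= max_length:  # assumed early exit when everything already fits
        return total, msg

    if ll / total > 0.8:
        # Suggestion 2: clamp the budget so Python never treats a negative
        # slice length as "drop from the end".
        preserve_len = max(0, max_length - ll2)
        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:preserve_len])
        return max_length, msg

    preserve_len_last = max(0, max_length - ll)
    msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:preserve_len_last])
    return max_length, msg
```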
ℹ️ Review info
⚙️ Run configuration: Organization UI | Review profile: CHILL | Plan: Pro | Run ID: 5363c31a-4bba-47fb-809b-648da81a4e86
📒 Files selected for processing (2): rag/prompts/generator.py, test/unit_test/rag/prompts/test_generator_message_fit_in.py
Addressed the CodeRabbit suggestions in a narrow follow-up:
Validation:
♻️ Duplicate comments (1)
rag/prompts/generator.py (1)
104-115: ⚠️ Potential issue | 🔴 Critical
Clamp the over-budget side too, not just the slice length.
At Line 106 and Line 112, `max(0, ...)` only avoids Python's negative-slice behavior. It still returns an oversized prompt when the preserved side already exceeds `max_length`: if `ll >= max_length`, the fallback branch empties the last message but keeps the full system prompt; if `ll2 >= max_length`, the `ll / total > 0.8` branch can do the symmetric thing. In both cases `used_tokens` is reported as `max_length` even though the actual prompt is still over budget, so downstream callers can still hit context-length errors.
🛠️ Proposed fix
```diff
     if ll / total > 0.8:
-        m = msg_[0]["content"]
-        preserve_len = max(0, max_length - ll2)
-        m = encoder.decode(encoder.encode(m)[:preserve_len])
-        msg[0]["content"] = m
+        if ll2 >= max_length:
+            msg[0]["content"] = ""
+            msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:max_length])
+            return max_length, msg
+        preserve_len = max(0, max_length - ll2)
+        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:preserve_len])
         return max_length, msg

-    m = msg_[-1]["content"]
-    preserve_len_last = max(0, max_length - ll)
-    m = encoder.decode(encoder.encode(m)[:preserve_len_last])
-    msg[-1]["content"] = m
+    if ll >= max_length:
+        msg[0]["content"] = encoder.decode(encoder.encode(msg_[0]["content"])[:max_length])
+        msg[-1]["content"] = ""
+        return max_length, msg
+    preserve_len_last = max(0, max_length - ll)
+    msg[-1]["content"] = encoder.decode(encoder.encode(msg_[-1]["content"])[:preserve_len_last])
     return max_length, msg
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rag/prompts/generator.py` around lines 104-115: the code currently only clamps the preserved slice length (using `max(0, ...)`), which can still leave the other side of the prompt over budget. Update both branches of the trimming logic so that when you compute `preserve_len` (from `ll2`) or `preserve_len_last` (from `ll`) you also enforce that the other part is trimmed, keeping the combined token count <= `max_length` (e.g., compute `preserve = min(preserve, max_length)` and then trim the opposite message to fit the remaining tokens), and return the actual used token count (recomputed from `ll`/`ll2` or the encoded lengths) instead of always returning `max_length`. Make these changes where `ll / total > 0.8` (`msg_[0]`/`msg[0]`) and in the fallback using `msg_[-1]`/`msg[-1]`, so both the system and last message are clamped and the final prompt is never over budget.
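To make the failure mode concrete, here is a small runnable illustration. The `FakeEncoder` (one token per character) and the literal budgets are assumptions chosen for clarity, not code from this PR:

```python
class FakeEncoder:
    """Toy tokenizer: one token per character, so counts are easy to eyeball."""
    def encode(self, text):
        return list(text)

    def decode(self, tokens):
        return "".join(tokens)


encoder = FakeEncoder()
max_length = 30
system = "s" * 40   # ll = 40: the system prompt alone exceeds the budget
user = "u" * 20     # ll2 = 20; ll / total ≈ 0.67, so the fallback branch runs

# Pre-fix fallback: only the slice length is clamped, not the system prompt.
preserve_len_last = max(0, max_length - 40)   # -> 0: last message emptied
kept = [system, encoder.decode(encoder.encode(user)[:preserve_len_last])]
assert sum(len(encoder.encode(t)) for t in kept) > max_length   # 40 > 30: still over budget

# Proposed fix: ll >= max_length hard-clamps the system prompt too.
kept = [encoder.decode(encoder.encode(system)[:max_length]), ""]
assert sum(len(encoder.encode(t)) for t in kept) <= max_length  # 30 <= 30: fits
```

The same arithmetic with `ll2 >= max_length` shows the symmetric leak in the `ll / total > 0.8` branch.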
ℹ️ Review info
⚙️ Run configuration: Organization UI | Review profile: CHILL | Plan: Pro | Run ID: 420049a7-d51a-4e0c-8cce-d178510ceca8
📒 Files selected for processing (2): rag/prompts/generator.py, test/unit_test/rag/prompts/test_generator_message_fit_in.py
Follow-up for the remaining over-budget edge case:
Validation:
@KevinHuSh @JinHai-CN @TeslaZY @Lynn-Inf could you help take a look at this bugfix PR when convenient? This one fixes #13607.
Local validation:
@KevinHuSh @JinHai-CN @TeslaZY quick nudge on this bugfix PR when convenient. This is the fix for #13607.
Summary
This PR fixes the `message_fit_in()` truncation bug reported in #13607.
Changes:
Validation
Result: 2 passed
Closes #13607
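For reference, a minimal sketch of what the two validation tests might look like. The import path and the `(msg, max_length)` signature are assumptions; the actual assertions live in `test/unit_test/rag/prompts/test_generator_message_fit_in.py` and may differ:

```python
# Hedged sketch only: the import path and the (msg, max_length) signature are
# assumptions, not confirmed by this thread.
from rag.prompts.generator import message_fit_in


def test_within_budget_is_untouched():
    msg = [{"role": "system", "content": "short system prompt"},
           {"role": "user", "content": "hi"}]
    used, fitted = message_fit_in(msg, max_length=4000)
    assert fitted[0]["content"] == "short system prompt"  # nothing truncated
    assert used <= 4000


def test_over_budget_reserves_system_share():
    msg = [{"role": "system", "content": "sys " * 5000},
           {"role": "user", "content": "usr " * 5000}]
    used, fitted = message_fit_in(msg, max_length=100)
    assert used <= 100  # reported usage must respect the budget
```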