### Self Checks

**RAGFlow workspace code commit ID:** Latest main branch

**RAGFlow image version:** nightly

**Other environment information:** N/A; code-level bug found by reading the source
### Actual behavior
In `rag/prompts/generator.py`, the `message_fit_in()` function has a bug on line 98, in its else branch (taken when the user message is the dominant portion and needs truncation).
The code currently reads:

```python
ll = num_tokens_from_string(msg_[0]["content"])    # system prompt tokens
ll2 = num_tokens_from_string(msg_[-1]["content"])  # user message tokens
if ll / (ll + ll2) > 0.8:
    # system prompt is large: truncate it, leaving room for the user message
    m = msg_[0]["content"]
    m = encoder.decode(encoder.encode(m)[: max_length - ll2])  # correct
    ...
# else: user message is large
m = msg_[-1]["content"]
m = encoder.decode(encoder.encode(m)[: max_length - ll2])  # BUG: should be max_length - ll
```
The else branch truncates the user message to `max_length - ll2` tokens, but `ll2` is the token count of the user message itself. It should subtract `ll` (the system prompt token count) instead, to leave room for the system prompt.
Consequences:

- If `ll2 > max_length`, the slice stop index `max_length - ll2` becomes negative. Python interprets a negative stop index as counting from the end, so `encoder.encode(m)[: max_length - ll2]` keeps the first `max_length` tokens of the user message rather than the intended `max_length - ll` (see the snippet below)
- The system prompt's token count is never accounted for, so the total can still exceed `max_length`, leading to "maximum context length" errors from the LLM API

Additionally, if both messages are empty (`ll + ll2 == 0`), the division on line 91 raises a `ZeroDivisionError`.
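A quick, self-contained illustration of the negative-slice behavior, using hypothetical token counts (the integer list stands in for `encoder.encode(...)` output):

```python
max_length = 8   # hypothetical context budget (tokens)
ll = 3           # system prompt token count
ll2 = 12         # user message token count (larger than the budget)

tokens = list(range(ll2))               # stand-in for encoder.encode(user_message)
truncated = tokens[: max_length - ll2]  # stop index is 8 - 12 = -4

print(len(truncated))  # 8: the first max_length tokens survive, not max_length - ll
# Total sent to the LLM: 8 (user) + 3 (system) = 11 tokens > max_length == 8
```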
### Expected behavior

The user message should be truncated to `max_length - ll` tokens (leaving room for the system prompt), symmetric with how the if-branch truncates the system prompt to `max_length - ll2` (leaving room for the user message).
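A minimal sketch of the corrected branch, using the names from the excerpt above. Note the `max(0, ...)` clamp and the empty-message guard are suggested hardening, not part of the current source, and the `return` shape is assumed from the surrounding function:

```python
# Suggested guard for the ll / (ll + ll2) ratio on line 91:
if ll + ll2 == 0:
    return max_length, msg

...

# else branch: the user message dominates, so truncate it while
# reserving ll tokens for the system prompt.
m = msg_[-1]["content"]
m = encoder.decode(encoder.encode(m)[: max(0, max_length - ll)])  # was: max_length - ll2
msg[-1]["content"] = m
```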
### Steps to reproduce
1. Set up a chat assistant with a model that has a 16K context window
2. Send a query that retrieves many chunks, producing a large user message
3. `message_fit_in()` is called but truncates the user message incorrectly, leaving the combined messages over the context window
4. The LLM API returns "maximum context length exceeded" error
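The bug can also be reproduced at the unit level, without a full chat assistant (import path as stated in this report; the exact signature and return shape of `message_fit_in` are assumptions):

```python
from rag.prompts.generator import message_fit_in

msgs = [
    {"role": "system", "content": "You are a helpful assistant. " * 40},
    {"role": "user", "content": "retrieved chunk text " * 20000},  # oversized context
]

count, fitted = message_fit_in(msgs, max_length=16 * 1024)
# Expected: total tokens across fitted messages <= 16384.
# Actual: the user message alone is cut to ~max_length tokens and the
# system prompt is added on top, so the total still overflows.
```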
### Additional information
This bug is likely a root cause (or a contributing factor) of issues #13549 and #13082, where users report context-length overflow despite small chunk sizes and low topN settings.