
[Bug]: message_fit_in truncates user message with wrong variable, causing context length overflow #13607


Description

@mango766

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

Latest main branch

RAGFlow image version

nightly

Other environment information

N/A - code-level bug found by reading the source

Actual behavior

In rag/prompts/generator.py, the message_fit_in() function has a bug on line 98 in its else branch (when the user message is the dominant portion and needs truncation).

The code currently reads:

ll = num_tokens_from_string(msg_[0]["content"])    # system prompt tokens
ll2 = num_tokens_from_string(msg_[-1]["content"])  # user message tokens
if ll / (ll + ll2) > 0.8:
    # system prompt is large: truncate system, leaving room for user
    m = msg_[0]["content"]
    m = encoder.decode(encoder.encode(m)[: max_length - ll2])  # correct
    ...
# else: user message is large
m = msg_[-1]["content"]
m = encoder.decode(encoder.encode(m)[: max_length - ll2])  # BUG: should be max_length - ll

The else branch truncates the user message to max_length - ll2 tokens, but ll2 is the token count of the user message itself. It should subtract ll (the system prompt token count) to leave room for the system prompt.

Consequences:

  • If ll2 > max_length, the slice index max_length - ll2 becomes negative; Python treats a negative stop as counting from the end, so the slice silently keeps everything except the last ll2 - max_length tokens (i.e. max_length tokens) instead of truncating to a budget that leaves room for the system prompt (see the sketch below)
  • The system prompt's token count is never accounted for, so the total can still exceed max_length, leading to "maximum context length" errors from the LLM API

Additionally, if both messages are empty (ll + ll2 == 0), line 91 causes a ZeroDivisionError.
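A standalone illustration of the arithmetic, using plain Python lists in place of token sequences (the numbers are hypothetical and chosen only for the example):

max_length = 16000   # model context window (hypothetical)
ll = 1200            # system prompt tokens (hypothetical)
ll2 = 17000          # user message tokens, already larger than max_length

user_tokens = list(range(ll2))  # stand-in for encoder.encode(user_message)

# Current code: max_length - ll2 == -1000, so the slice keeps everything
# except the last 1000 items, i.e. 16000 tokens. Adding the 1200-token
# system prompt gives 17200 tokens -- over the 16000 limit.
kept_buggy = user_tokens[: max_length - ll2]
print(len(kept_buggy) + ll)   # 17200

# Proposed fix: reserve ll tokens for the system prompt.
kept_fixed = user_tokens[: max_length - ll]
print(len(kept_fixed) + ll)   # 16000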

Expected behavior

The user message should be truncated to max_length - ll tokens (leaving room for the system prompt), symmetric with how the if-branch truncates the system prompt to max_length - ll2 (leaving room for the user message).
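A minimal sketch of the fix, paraphrasing the snippet quoted above (variable names follow that snippet rather than the exact source; the max(0, ...) clamp is an optional extra guard against negative slice indices, not part of the original code):

# else: user message is large
m = msg_[-1]["content"]
# reserve ll tokens for the system prompt; clamp so the stop index never goes negative
m = encoder.decode(encoder.encode(m)[: max(0, max_length - ll)])
msg_[-1]["content"] = m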

Steps to reproduce

1. Set up a chat assistant with a model that has a 16K context window
2. Send a query that retrieves many chunks, producing a large user message
3. message_fit_in is called but truncates the user message against the wrong budget (max_length - ll2 instead of max_length - ll), so the combined prompt still exceeds the context window
4. The LLM API returns a "maximum context length exceeded" error
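For a code-level reproduction, something along these lines should show the problem directly (this assumes message_fit_in(msg, max_length) returns a (token_count, messages) tuple; adjust to the actual signature in rag/prompts/generator.py if it differs):

from rag.prompts.generator import message_fit_in

msg = [
    {"role": "system", "content": "You are a helpful assistant. " * 50},       # a few hundred tokens
    {"role": "user", "content": "retrieved chunk text, lots of it. " * 5000},  # well beyond a 16K window
]

count, fitted = message_fit_in(msg, max_length=16000)
# Expected: the fitted messages total at most 16000 tokens.
# Observed: the user message alone is cut to ~16000 tokens, the system prompt
# is not accounted for, and the downstream LLM call fails with a
# "maximum context length exceeded" error.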

Additional information

This bug is likely a root cause (or contributing factor) for issues #13549 and #13082 where users report context length overflow despite having small chunk sizes and low topN settings.
