
[Bug]: message_fit_in truncates user message with wrong variable, causing context length overflow #13607


Description

@mango766

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

Latest main branch

RAGFlow image version

nightly

Other environment information

N/A - code-level bug found by reading the source

Actual behavior

In rag/prompts/generator.py, the message_fit_in() function has a bug on line 98 in its else branch (when the user message is the dominant portion and needs truncation).

The code currently reads:

ll = num_tokens_from_string(msg_[0]["content"])    # system prompt tokens
ll2 = num_tokens_from_string(msg_[-1]["content"])  # user message tokens
if ll / (ll + ll2) > 0.8:
    # system prompt is large: truncate system, leaving room for user
    m = msg_[0]["content"]
    m = encoder.decode(encoder.encode(m)[: max_length - ll2])  # correct
    ...
# else: user message is large
m = msg_[-1]["content"]
m = encoder.decode(encoder.encode(m)[: max_length - ll2])  # BUG: should be max_length - ll

The else branch truncates the user message to max_length - ll2 tokens, but ll2 is the token count of the user message itself. It should subtract ll (the system prompt token count) to leave room for the system prompt.

Consequences:

  • If ll2 > max_length, the slice index max_length - ll2 becomes negative; Python treats a negative stop as counting from the end, so the slice silently keeps everything except the last ll2 - max_length tokens (i.e. max_length tokens) instead of truncating to a budget that leaves room for the system prompt (see the sketch below)
  • The system prompt's token count is never accounted for, so the total can still exceed max_length, leading to "maximum context length" errors from the LLM API

Additionally, if both messages are empty (ll + ll2 == 0), line 91 causes a ZeroDivisionError.
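A standalone illustration of the arithmetic, using plain Python lists in place of token sequences (the numbers are hypothetical and chosen only for the example):

max_length = 16000   # model context window (hypothetical)
ll = 1200            # system prompt tokens (hypothetical)
ll2 = 17000          # user message tokens, already larger than max_length

user_tokens = list(range(ll2))  # stand-in for encoder.encode(user_message)

# Current code: max_length - ll2 == -1000, so the slice keeps everything
# except the last 1000 items, i.e. 16000 tokens. Adding the 1200-token
# system prompt gives 17200 tokens -- over the 16000 limit.
kept_buggy = user_tokens[: max_length - ll2]
print(len(kept_buggy) + ll)   # 17200

# Proposed fix: reserve ll tokens for the system prompt.
kept_fixed = user_tokens[: max_length - ll]
print(len(kept_fixed) + ll)   # 16000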

Expected behavior

The user message should be truncated to max_length - ll tokens (leaving room for the system prompt), symmetric with how the if-branch truncates the system prompt to max_length - ll2 (leaving room for the user message).
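A minimal sketch of the fix, paraphrasing the snippet quoted above (variable names follow that snippet rather than the exact source; the max(0, ...) clamp is an optional extra guard against negative slice indices, not part of the original code):

# else: user message is large
m = msg_[-1]["content"]
# reserve ll tokens for the system prompt; clamp so the stop index never goes negative
m = encoder.decode(encoder.encode(m)[: max(0, max_length - ll)])
msg_[-1]["content"] = m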

Steps to reproduce

1. Set up a chat assistant with a model that has a 16K context window
2. Send a query that retrieves many chunks, producing a large user message
3. message_fit_in is called but truncates the user message against the wrong budget (max_length - ll2 instead of max_length - ll), so the combined prompt still exceeds the context window
4. The LLM API returns a "maximum context length exceeded" error
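For a code-level reproduction, something along these lines should show the problem directly (this assumes message_fit_in(msg, max_length) returns a (token_count, messages) tuple; adjust to the actual signature in rag/prompts/generator.py if it differs):

from rag.prompts.generator import message_fit_in

msg = [
    {"role": "system", "content": "You are a helpful assistant. " * 50},       # a few hundred tokens
    {"role": "user", "content": "retrieved chunk text, lots of it. " * 5000},  # well beyond a 16K window
]

count, fitted = message_fit_in(msg, max_length=16000)
# Expected: the fitted messages total at most 16000 tokens.
# Observed: the user message alone is cut to ~16000 tokens, the system prompt
# is not accounted for, and the downstream LLM call fails with a
# "maximum context length exceeded" error.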

Additional information

This bug is likely a root cause (or contributing factor) for issues #13549 and #13082 where users report context length overflow despite having small chunk sizes and low topN settings.
