Fix: private dataset authorization bypass in shared dataset access checks #14645

Merged
KevinHuSh merged 37 commits into infiniflow:main from jony376:fix/private-datasets-remain-accessible-to-tenant-members
May 9, 2026

Conversation

@jony376
Contributor

@jony376 jony376 commented May 7, 2026

Related issues

Closes #14644

What problem does this PR solve?

This PR fixes an authorization bug where datasets marked with permission = me could still be accessed by other members of the same tenant through APIs that relied on KnowledgebaseService.accessible() or DocumentService.accessible().

Before this change, those shared access helpers only checked tenant membership and did not enforce the dataset's permission mode. As a result, a non-owner who knew a private dataset_id could still reach downstream document and chunk operations even though the dataset was intended to be owner-only.

This change updates the central access checks so that:

  • dataset owners always retain access
  • joined tenant members only get access when the dataset permission is TEAM
  • private datasets with permission = me remain inaccessible to non-owners
  • document-level access follows the same dataset permission rules

The PR also adds regression coverage for private-vs-team dataset access behavior.
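
In rough outline, the updated checks look like this (paraphrased from the diff for readability; the actual code may differ slightly, and `StatusEnum`, `TenantPermission`, and `TenantService` come from the existing `api.db` modules):

```python
# Paraphrase of the updated access checks, not the literal diff.
class KnowledgebaseService:
    @classmethod
    def accessible(cls, kb_id, user_id):
        e, kb = cls.get_by_id(kb_id)
        if not e or kb.status != StatusEnum.VALID.value:
            return False                                   # missing or invalid dataset
        if kb.tenant_id == user_id:
            return True                                    # owner always retains access
        if kb.permission != TenantPermission.TEAM.value:
            return False                                   # permission = me stays owner-only
        joined_tenants = TenantService.get_joined_tenants_by_user_id(user_id)
        return any(t["tenant_id"] == kb.tenant_id for t in joined_tenants)

class DocumentService:
    @classmethod
    def accessible(cls, doc_id, user_id):
        e, doc = cls.get_by_id(doc_id)
        if not e:
            return False                                   # unknown document
        return KnowledgebaseService.accessible(doc.kb_id, user_id)
```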

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

Testing

  • Added test/unit_test/api/db/services/test_dataset_access_permissions.py
  • Attempted to run: python -m pytest test\\unit_test\\api\\db\\services\\test_dataset_access_permissions.py -q
  • Local execution in this workspace is currently blocked during test collection because the environment is missing the strenum dependency

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. 🐞 bug Something isn't working, pull request that fix bug. 🧪 test Pull requests that update test cases. labels May 7, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

📝 Walkthrough

Document authorization now delegates to knowledgebase-level checks: knowledgebase existence, VALID status, owner equality, permission flag (TEAM vs ME), and tenant membership via TenantService. DocumentService.accessible loads the document and defers to KnowledgebaseService.accessible, returning False if the document is missing.

Changes

Dataset Access Permission Enforcement

| Layer / File(s) | Summary |
| --- | --- |
| Knowledgebase Permission & Status Validation<br>`api/db/services/knowledgebase_service.py` | KnowledgebaseService.accessible now loads the KB, requires StatusEnum.VALID, grants access when kb.tenant_id == user_id, denies when kb.permission is not TenantPermission.TEAM.value, and otherwise checks tenant membership via TenantService.get_joined_tenants_by_user_id. |
| Knowledge Base Query Methods<br>`api/db/services/knowledgebase_service.py` | get_kb_by_id and get_kb_by_name apply the accessible check and return [] on denial or [kb.to_dict()] for the first authorized KB; previous UserTenant join queries removed. |
| Document Access Delegation<br>`api/db/services/document_service.py` | DocumentService.accessible loads the document and delegates authorization to KnowledgebaseService.accessible(doc.kb_id, user_id), returning False if the document cannot be loaded. |
| Access Control Test Coverage<br>`test/unit_test/api/db/services/test_dataset_access_permissions.py` | New tests and helpers: verify private datasets are inaccessible to non-owner tenant members, team datasets allow joined-tenant members, and document access follows KB accessibility. Tests use monkeypatch to stub get_by_id and TenantService responses. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant DocumentService
  participant KnowledgebaseService
  participant TenantService
  participant DB
  Client->>DocumentService: request with doc_id, user_id
  DocumentService->>DB: get document by doc_id
  DB-->>DocumentService: doc (with kb_id) or None
  alt doc exists
    DocumentService->>KnowledgebaseService: accessible(kb_id, user_id)
    KnowledgebaseService->>DB: get_by_id(kb_id)
    DB-->>KnowledgebaseService: kb or None
    alt kb exists and VALID
      KnowledgebaseService->>KnowledgebaseService: if tenant_id == user_id -> True
      alt not owner and permission == TEAM
        KnowledgebaseService->>TenantService: get_joined_tenants_by_user_id(user_id)
        TenantService-->>KnowledgebaseService: joined_tenants
        KnowledgebaseService-->>DocumentService: True/False
      else not TEAM
        KnowledgebaseService-->>DocumentService: False
      end
    else
      KnowledgebaseService-->>DocumentService: False
    end
    DocumentService-->>Client: True/False
  else
    DocumentService-->>Client: False
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

  • #14644 — Centralizes KB-level checks to prevent tenant-member bypass for private (me) datasets; this PR's changes address that authorization bypass.

Possibly related PRs

Suggested reviewers

  • wangq8
  • yuzhichang
  • JinHai-CN

Poem

🐰 I nibble through the access snare,
Owner's key held safe with care,
Team doors open where they're due,
Private holes are closed and true.
Hop on—permissions now renew.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (5)
api/db/services/knowledgebase_service.py (3)

513-516: ⚡ Quick win

get_kb_by_id fetches the knowledgebase twice.

Line 513 calls cls.get_by_id(kb_id), and Line 514 calls cls.accessible(kb_id, user_id) which unconditionally calls cls.get_by_id(kb_id) again inside accessible. Every get_kb_by_id invocation therefore pays two DB round-trips for the same row. The already-fetched kb object is available and can be used to short-circuit the second fetch.

♻️ Avoid the second fetch
 `@classmethod`
 `@DB.connection_context`()
 def get_kb_by_id(cls, kb_id, user_id):
-    e, kb = cls.get_by_id(kb_id)
-    if not e or not cls.accessible(kb_id, user_id):
-        return []
-    return [kb.to_dict()]
+    if not cls.accessible(kb_id, user_id):
+        return []
+    e, kb = cls.get_by_id(kb_id)
+    if not e:
+        return []
+    return [kb.to_dict()]

Alternatively, expose a private _accessible(cls, kb, user_id) overload that accepts an already-loaded model instance and avoid the redundant lookup altogether.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/db/services/knowledgebase_service.py` around lines 513 - 516,
get_kb_by_id currently calls cls.get_by_id(kb_id) then calls
cls.accessible(kb_id, user_id) which calls cls.get_by_id again; change
get_kb_by_id to reuse the already-fetched kb by checking accessibility against
the loaded instance: either add a private overload _accessible(cls, kb, user_id)
that accepts the model instance and use that from get_kb_by_id, or modify
accessible to accept an optional kb parameter and pass the kb returned from
cls.get_by_id(kb_id) to avoid the second DB fetch; ensure you still return []
when not found or not accessible and return [kb.to_dict()] when allowed.

481-502: 💤 Low value

Add logging for the new accessible flow.

The new access-control path (denied for private dataset, denied for non-member, granted for team member) has no log output, making it difficult to diagnose authorization failures in production. As per coding guidelines: "Add logging for new flows".

📋 Suggested logging additions
 `@classmethod`
 `@DB.connection_context`()
 def accessible(cls, kb_id, user_id):
+    import logging
     e, kb = cls.get_by_id(kb_id)
     if not e:
+        logging.debug("accessible: kb_id=%s not found", kb_id)
         return False

     if kb.status != StatusEnum.VALID.value:
+        logging.debug("accessible: kb_id=%s status=%s invalid", kb_id, kb.status)
         return False

     if kb.tenant_id == user_id:
         return True

     if kb.permission != TenantPermission.TEAM.value:
+        logging.debug("accessible: kb_id=%s is private, denying user_id=%s", kb_id, user_id)
         return False

     joined_tenants = TenantService.get_joined_tenants_by_user_id(user_id)
-    return any(tenant["tenant_id"] == kb.tenant_id for tenant in joined_tenants)
+    result = any(tenant["tenant_id"] == kb.tenant_id for tenant in joined_tenants)
+    if not result:
+        logging.debug("accessible: kb_id=%s tenant_id=%s not in joined tenants for user_id=%s", kb_id, kb.tenant_id, user_id)
+    return result
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/db/services/knowledgebase_service.py` around lines 481 - 502, The
accessible(cls, kb_id, user_id) method lacks logs for its access-control
decisions; add structured debug/info logs at each decision point (e.g., when
get_by_id returns no kb, when kb.status != StatusEnum.VALID, when tenant_id
matches user_id and access is granted, when permission != TenantPermission.TEAM
and access denied, and when checking TenantService.get_joined_tenants_by_user_id
results) including kb_id, user_id, kb.tenant_id and the reason for grant/deny;
use the module logger (the logger used elsewhere in this file) and keep messages
concise and consistent with existing logging style.

527-531: 💤 Low value

get_kb_by_name re-fetches each already-loaded kb inside accessible.

For every knowledgebase returned by cls.query(...) at Line 527, cls.accessible(kb.id, user_id) at Line 529 issues another get_by_id DB call for the same row. While in practice the loop usually terminates after the first match, this is the same redundant-fetch pattern as in get_kb_by_id.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/db/services/knowledgebase_service.py` around lines 527 - 531,
get_kb_by_name is re-fetching each knowledgebase because it calls
cls.accessible(kb.id, user_id) which does a DB get_by_id; change the check to
avoid a redundant fetch by either (A) updating accessible to accept a KB object
(e.g., accessible(kb_or_id, user_id)) and branch on type, or (B) add a new
helper like accessible_obj(kb, user_id) that contains the same permission logic
but operates on the already-loaded kb; then update get_kb_by_name to call the
object-aware check (replace cls.accessible(kb.id, user_id) with the new/updated
call) so you use the in-memory kb and return kb.to_dict() without an extra DB
round-trip.
test/unit_test/api/db/services/test_dataset_access_permissions.py (1)

32-72: ⚡ Quick win

Missing test for the dataset owner always-accessible path.

KnowledgebaseService.accessible grants access when kb.tenant_id == user_id (the owner path, Line 495-496 in knowledgebase_service.py). There is no test covering this branch. If that condition is ever accidentally removed or inverted, the regression would go undetected.

✅ Suggested additional test
def test_owner_can_always_access_private_dataset(monkeypatch):
    kb = SimpleNamespace(
        id="kb-private",
        tenant_id="owner-1",
        permission=TenantPermission.ME.value,
        status=StatusEnum.VALID.value,
    )

    monkeypatch.setattr(KnowledgebaseService, "get_by_id", classmethod(lambda cls, kb_id: (True, kb)))

    # Owner should always have access regardless of permission mode
    assert _unwrapped_kb_accessible()(KnowledgebaseService, "kb-private", "owner-1") is True
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unit_test/api/db/services/test_dataset_access_permissions.py` around
lines 32 - 72, Add a unit test that covers the owner-access branch: create a
SimpleNamespace kb with id "kb-private", tenant_id "owner-1", permission
TenantPermission.ME.value and status StatusEnum.VALID.value, monkeypatch
KnowledgebaseService.get_by_id to return (True, kb), then assert that
_unwrapped_kb_accessible()(KnowledgebaseService, "kb-private", "owner-1") is
True (name the test e.g. test_owner_can_always_access_private_dataset) to ensure
KnowledgebaseService.accessible treats the tenant owner as always allowed.
api/db/services/document_service.py (1)

680-684: 💤 Low value

Add logging for the new DocumentService.accessible flow.

The updated delegation path (document not found, or KB access denied) produces no log output. As per coding guidelines: "Add logging for new flows".

📋 Suggested logging additions
 `@classmethod`
 `@DB.connection_context`()
 def accessible(cls, doc_id, user_id):
     e, doc = cls.get_by_id(doc_id)
     if not e:
+        logging.debug("DocumentService.accessible: doc_id=%s not found", doc_id)
         return False
     return KnowledgebaseService.accessible(doc.kb_id, user_id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/db/services/document_service.py` around lines 680 - 684, Add logging to
DocumentService.accessible: after calling cls.get_by_id(doc_id) log an
info/debug message when the document is not found (include doc_id and user_id)
and return False; if the document is found, log that access is being delegated
to KnowledgebaseService.accessible (include doc_id, doc.kb_id and user_id) and
optionally log the result returned by KnowledgebaseService.accessible before
returning it. Use the module's logger (or processLogger) consistent with other
services.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4a6eadbb-50cc-4b7e-91e4-097193bd7958

📥 Commits

Reviewing files that changed from the base of the PR and between 057806d and 8f05050.

📒 Files selected for processing (3)
  • api/db/services/document_service.py
  • api/db/services/knowledgebase_service.py
  • test/unit_test/api/db/services/test_dataset_access_permissions.py

@wangq8 wangq8 added the ci Continue Integration label May 8, 2026
@wangq8 wangq8 marked this pull request as draft May 8, 2026 05:32
@wangq8 wangq8 marked this pull request as ready for review May 8, 2026 05:32
@wangq8
Collaborator

wangq8 commented May 8, 2026

LGTM

@wangq8
Collaborator

wangq8 commented May 8, 2026

Hi @jony376 please help fix the unit test failure.

voidborne-d and others added 17 commits May 8, 2026 04:45
…flow#14185)

## Summary

Fixes infiniflow#14159 — files from different datasets can overwrite each other in
Azure Blob storage.

## Problem

Both `azure_spn_conn.py` and `azure_sas_conn.py` ignore the `bucket`
parameter in all storage operations (`put`, `get`, `rm`, `obj_exist`,
`get_presigned_url`). Files are stored flat using only the filename, so
two datasets containing a file with the same name will overwrite each
other.

The MinIO and S3 implementations correctly use the bucket (typically the
knowledge base ID) as a path prefix to create logical folder isolation:
- MinIO: uses `use_prefix_path` decorator → `{orig_bucket}/{fnm}`
- S3: uses `use_prefix_path` decorator → `{prefix_path}/{bucket}/{fnm}`

## Fix

Prepend `{bucket}/` to the file path in all 5 operations across both
Azure connector files:

| File | Methods fixed |
|------|---------------|
| `azure_spn_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |
| `azure_sas_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |

This matches the existing convention where `bucket` is the knowledge
base ID used as a directory prefix.
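
For illustration, the prefixing in one of the touched methods looks roughly like this (simplified sketch only; `container_client` stands in for whichever Azure blob client the connector holds, and the surrounding error handling is omitted):

```python
# Illustrative sketch of the {bucket}/{filename} prefixing, not the full connector.
def put(self, bucket, fnm, binary):
    blob_path = f"{bucket}/{fnm}"   # previously just fnm, so same-named files collided
    return self.container_client.upload_blob(name=blob_path, data=binary, overwrite=True)

def obj_exist(self, bucket, fnm):
    blob_path = f"{bucket}/{fnm}"
    return self.container_client.get_blob_client(blob_path).exists()
```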

## ⚠️ Migration Note

Existing Azure SPN/SAS deployments have files stored without the bucket
prefix. After this fix, new files will be stored under
`{bucket}/{filename}` while existing files remain at `{filename}`. A
one-time migration script or manual file move may be needed for existing
deployments. New deployments are unaffected.

## Testing

- Verified the fix is consistent across all 5 methods in both files
- The `health()` method is intentionally left unchanged as it uses a
hardcoded test filename without bucket semantics

Co-authored-by: Jin Hai <[email protected]>
…tdated models (infiniflow#14637)

### What problem does this PR solve?

Feat: support local provider for code exec component & remove some
outdated models

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?

Fix: Change route name

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?

Fixes infiniflow#14412.

`common.metadata_utils.meta_filter` evaluates user-defined metadata
conditions in Python after `DocMetadataService.get_flatted_meta_by_kbs`
loads the entire `meta_fields` table into memory. Past a few thousand
documents per knowledge base this becomes a memory bottleneck and a
wasted ES round-trip — every filter request currently fetches up to
10000 metadata rows even when the resulting `doc_ids` list is tiny.

This PR adds an ES push-down path that translates the same filter
language into a `bool` query and returns just the matching document IDs.

**Changes**

- `common/metadata_es_filter.py` *(new)*: pure-Python translator from
the RAGflow filter list to ES DSL. Covers every operator the in-memory
path supports (`=`, `≠`, `>`, `<`, `≥`, `≤`, `in`, `not in`, `contains`,
`not contains`, `start with`, `end with`, `empty`, `not empty`) with
`case_insensitive: true` on `prefix` and `wildcard` for parity with the
existing lower-cased Python comparisons. User wildcard metacharacters
are escaped before being injected into `wildcard` patterns. Negative
operators (`≠`, `not in`, `not contains`, ranges) are wrapped with an
`exists` guard so they do not accidentally match documents missing the
key, matching the legacy `if k not in metas` behaviour.
- `api/db/services/doc_metadata_service.py`: new
`DocMetadataService.filter_doc_ids_by_meta_pushdown(kb_ids, filters,
logic)` that returns the doc IDs ES matched, or `None` to signal the
caller should fall back to the in-memory path. Returns `None` when the
active doc store is Infinity (`meta_fields` is a JSON column, not a
dotted-object mapping), when any filter cannot be expressed in DSL
(`UnsupportedMetaFilter`), or when the ES request or metadata index
lookup errors.
- `common/metadata_utils.py`: `apply_meta_data_filter` accepts an
optional `kb_ids` argument. When supplied, conditions go through
push-down first via a new `_try_meta_pushdown` helper; on `None` the
function falls back to the original `meta_filter` call. Default
behaviour is unchanged for callers that don't pass `kb_ids`.
- Updated all four callers (`agent/tools/retrieval.py`,
`api/db/services/dialog_service.py` ×2,
`api/apps/services/dataset_api_service.py`, `api/apps/sdk/session.py`)
to forward `kb_ids` so the push-down path is exercised in production.
- `test/unit_test/common/test_metadata_es_filter.py` *(new)*: 35 unit
tests covering every operator's DSL shape, value coercion
(`ast.literal_eval`, lowercasing, ISO-date pass-through), wildcard
escaping, OR-logic wrapping that protects negative clauses, and the
doc-ID extractor.

**Behaviour preserved**

- The in-memory `meta_filter` is untouched and still services every
fallback case (Infinity backend, unknown operators, ES outages).
- The eligibility / credibility / issue-multiplier semantics described
in the LLM-driven `auto` and `semi_auto` modes still hand the LLM the
full in-memory `metas` dict to choose conditions from. Only the
*evaluation* of those generated conditions is pushed down.
- Existing tests in
`test/unit_test/common/test_metadata_filter_operators.py` continue to
pass (14/14).

**Test plan**

- `pytest test/unit_test/common/test_metadata_es_filter.py` — 35 passed.
- `pytest test/unit_test/common/test_metadata_filter_operators.py` — 14
passed.
- `ruff check` clean on every modified file.
- Reviewer please validate the ES query shapes against a live cluster —
particularly `case_insensitive` on `wildcard` and `prefix` (requires ES
7.10+) and the `exists` + `must_not` pairing for `≠`.

**Notes**

- The first cut caps each push-down request at 10000 results, matching
the existing `get_flatted_meta_by_kbs` limit, and logs a warning when
the cap is hit. A `search_after` follow-up would let us drop the cap
entirely once the push-down path is validated.
- Operator parity with the in-memory path is exact for the canonical
unicode operators (`≥`, `≤`, `≠`) used internally; the ASCII aliases
(`>=`, `<=`, `!=`) are normalised by `convert_conditions` before they
reach the translator.
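
To make the push-down concrete, here is a hand-written example of the kind of ES DSL the translator emits. The field names and exact clause shapes below are illustrative assumptions; the authoritative mapping lives in `common/metadata_es_filter.py`.

```python
# Hypothetical example: meta_fields.author = "alice" AND meta_fields.title contains "report".
# case_insensitive on wildcard/prefix requires ES 7.10+, as noted in the test plan above.
es_query = {
    "bool": {
        "must": [
            {"term": {"meta_fields.author": "alice"}},
            {"wildcard": {"meta_fields.title": {"value": "*report*", "case_insensitive": True}}},
        ]
    }
}

# A negative operator ("not contains") is wrapped with an exists guard so documents
# missing the key do not match, mirroring the legacy in-memory "if k not in metas" behaviour.
negative_clause = {
    "bool": {
        "must": [{"exists": {"field": "meta_fields.status"}}],
        "must_not": [{"wildcard": {"meta_fields.status": {"value": "*draft*", "case_insensitive": True}}}],
    }
}
```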

### Type of change

- [x] Performance Improvement

---------

Co-authored-by: sxxtony <[email protected]>
## Summary
Update the Turkish locale file to match the latest English locale keys.

## Changes
- Add missing Turkish translations for the new Skills and Skill Search
sections
- Add newly introduced common, header, dataset, settings, and agent
workflow strings
- Align renamed flow keys such as file format options and list
operations with the English source
- Add empty-state strings for skill spaces

## Validation
- Compared web/src/locales/en.ts and web/src/locales/tr.ts: 0 missing
keys, 0 extra keys
- Checked jsonjoy-builder locale: Turkish is already complete
- Checked translated README variants: no new Turkish-specific
documentation gap found
- VS Code diagnostics: no errors in web/src/locales/tr.ts

Co-authored-by: bakiburakogun <[email protected]>
… vllm (infiniflow#14614)

### What problem does this PR solve?

Three Go drivers had `CheckConnection` returning a hardcoded `no such
method` error, even though each one already has a working `ListModels`
that hits the configured base URL with the configured API key. So the
"Check connection" button in the model provider UI always failed for
these three providers, even when the underlying setup was fine.

Affected drivers:

- `internal/entity/models/ollama.go`
- `internal/entity/models/lmstudio.go`
- `internal/entity/models/vllm.go`

This is a real user-facing gap because Ollama and LM Studio are two of
the most popular local LLM runners, and vLLM is widely used for
self-hosted deployments.

### What this PR includes

For each of the three drivers, replace the stub with a small
implementation that calls `ListModels` and returns its error:

```go
func (o *OllamaModel) CheckConnection(apiConfig *APIConfig) error {
    _, err := o.ListModels(apiConfig)
    return err
}
```

This is the exact pattern that xai, moonshot, deepseek, aliyun, and
gitee already use for the same method.

No JSON change. No factory change. No interface change.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### How was this tested?

- `go build ./internal/entity/models/...` in a clean go 1.25 image (the
go.mod minimum) returns exit 0.
- The full ModelDriver interface still resolves on each driver
(NewInstance, Name, ChatWithMessages, ChatStreamlyWithSender, Encode,
Rerank, ListModels, Balance, CheckConnection).
- Pattern parity with the existing xai, moonshot, deepseek, aliyun, and
gitee CheckConnection methods.

Closes infiniflow#14609
### What problem does this PR solve?

The SiliconFlow Go driver shipped with a stub \`Balance\` method that
returned \`no such method\`, even though SiliconFlow exposes a public
\`GET /v1/user/info\` endpoint that returns the account balance per
currency.

So the "Balance" panel in the model provider UI always shows an error
for SiliconFlow tenants, while it already works for
Moonshot and Gitee. This PR fills the gap.

### What this PR includes

- \`conf/models/siliconflow.json\`: add \`\"balance\": \"user/info\"\`
under \`url_suffix\` so the driver builds the URL from config.
- \`internal/entity/models/siliconflow.go\`: replace the \`Balance\`
stub with a real implementation. Adds a small local response type that
matches the upstream shape.

No factory change. No interface change.

### How the driver works

- Validate \`apiConfig\` and the API key, resolve the region with a
default fallback, and build the URL from \`BaseURL[region] +
URLSuffix.Balance\`.
- GET the URL with \`Authorization: Bearer <api_key>\`.
- Parse the upstream response. SiliconFlow returns balance fields as
strings, so the driver parses them with \`strconv.ParseFloat\`. It
prefers \`totalBalance\` over \`balance\` when both are present.
- Return \`{\"balance\": <float>, \"currency\": \"CNY\"}\`, the same
shape the Moonshot driver returns. The UI can render it
  with no provider-specific code.

### Edge cases

- Missing or empty API key returns a clear local error before any HTTP
call.
- An unknown region falls back to the default base URL.
- Empty \`balance\` and \`totalBalance\` returns a clear "no balance
info in response" error rather than a zero-value silent success.
- Non-numeric balance string returns a clear parse error.
- Non-200 responses propagate the upstream status line and body.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
returns exit 0.
- The full method set on \`SiliconflowModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing Moonshot and Gitee Balance
implementations.

Closes infiniflow#14642
### What problem does this PR solve?

1. **Implement `OpenRouter` Provider:** Fully support OpenRouter AI
models (e.g., `gemma`, `minimax`). Includes robust handling of
Server-Sent Events (SSE) streams, error event interception, and proper
parsing of both `reasoning_content` and standard `content`.
2. **Fix BaseURL Resolution Bug:** Fixed a critical edge case in region
configuration parsing. Added a strict empty string check
(`*apiConfig.Region != ""`) alongside the `nil` check. This ensures that
if the UI passes an empty string, the system correctly falls back to the
`"default"` region, preventing `unsupported protocol scheme ""` errors
during HTTP requests.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
Closes infiniflow#14631 

### What problem does this PR solve?

The DeepSeek Go driver shipped with a stub \`Balance\` method that
returned \`no such method\`, even though DeepSeek exposes a public \`GET
/user/balance\` endpoint that works with the same Bearer token used for
chat.

So the "Balance" panel in the model provider UI always shows an error
for DeepSeek tenants, while it already works for Moonshot and Gitee.
This PR fills the gap.

### What this PR includes

- \`conf/models/deepseek.json\`: add \`\"balance\": \"user/balance\"\`
under \`url_suffix\` so the driver can build the URL from config the
same way the other endpoints do.
- \`internal/entity/models/deepseek.go\`: replace the \`Balance\` stub
with a real implementation. Adds a small local response type
\`deepseekBalanceResponse\` that matches the upstream shape.

No factory change. No interface change.

### How the driver works

- Validate \`apiConfig\` and the API key, resolve the region (with a
\`default\` fallback), and build the URL from \`BaseURL[region] +
URLSuffix.Balance\`.
- GET the URL with \`Authorization: Bearer <api_key>\`.
- Parse the upstream response:

  \`\`\`json
  {
    \"is_available\": true,
    \"balance_infos\": [
      {\"currency\": \"USD\", \"total_balance\": \"10.00\", ...},
      {\"currency\": \"CNY\", \"total_balance\": \"70.00\", ...}
    ]
  }
  \`\`\`

\`total_balance\` is a string in the upstream API, so the driver parses
it with \`strconv.ParseFloat\`.
- Return the first balance entry as \`{\"balance\": <float>,
\"currency\": <string>}\`, the same shape the Moonshot driver returns.
The UI can render it with no provider-specific code.

### Edge cases

- Missing or empty API key returns a clear local error before any HTTP
call.
- Empty \`balance_infos\` returns a clear \"no balance info in
response\" error rather than a zero-value silent success.
- Non-numeric \`total_balance\` returns a clear parse error.
- Non-200 responses propagate the upstream status line and body so the
user can see why the call failed.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
(the go.mod minimum) returns exit 0.
- The full method set on \`DeepSeekModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing Moonshot and Gitee Balance
implementations.
…nfiniflow#14636)

### What problem does this PR solve?

The NVIDIA Go driver added in infiniflow#14623 has a real chat path, but
\`ListModels\` and \`CheckConnection\` are stubs that always return \`no
such method\`. So:

- The model picker cannot auto-populate available NVIDIA NIM model ids.
Users have to type the full id by hand (e.g.
  \`abacusai/dracarys-llama-3.1-70b-instruct\`).
- The "Check connection" button always fails for NVIDIA, even when the
base URL is reachable and the API key is accepted.

NVIDIA NIM is OpenAI-compatible. \`/v1/models\` works with the same
Bearer token used for chat. The
\`conf/models/nvidia.json\` file already wires the \`models\`
url_suffix, so no config change is needed.

### What this PR includes

- \`internal/entity/models/nvidia.go\`:
  - \`ListModels\` now calls
    \`GET ${BaseURL}/${URLSuffix.Models}\`, parses
    \`response.data[*].id\`, and returns the list. Same shape
    as the moonshot, xai, and openai drivers.
  - \`CheckConnection\` now calls \`ListModels\` and returns its
    error. Same pattern xai, moonshot, deepseek, aliyun, and
    gitee already use.

\`Balance\`, \`Encode\`, and \`Rerank\` are still stubs in this PR and
can be added in follow-ups.

No JSON change. No factory change. No interface change.

### How the implementation works

- Region resolution falls back to \`default\` when the supplied region
is unknown, so a stray region value does not break a valid request.
- The Authorization header is only set when \`apiConfig\` and \`ApiKey\`
are non-nil and non-empty. This avoids a nil-pointer dereference and
lets self-hosted NIM deployments without a key still work.
- Non-200 responses propagate the upstream status line and body so the
user sees a real error message.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### How was this tested?

- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
(the go.mod minimum) returns exit 0.
- The full method set on \`NvidiaModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing xai, moonshot, deepseek, aliyun,
gitee, and openai drivers.

Closes infiniflow#14635
…nfiniflow#14347)

## Summary

Fixes file collision between different datasets when using Azure Blob
storage (SPN or SAS authentication).

## Bug

azure_spn_conn.py and azure_sas_conn.py ignored the bucket parameter
entirely, storing all files flat with just the filename. This caused
files with the same name from different datasets (knowledge bases) to
overwrite each other.

## Fix

Prepend bucket/ as a path prefix in all methods (put, rm, get, obj_exist, get_presigned_url, health) to match the behavior of
MinIO and S3 implementations.

## Changes

- **rag/utils/azure_spn_conn.py**: Added {bucket}/ prefix to file paths
in all operations
- **rag/utils/azure_sas_conn.py**: Same fix applied for consistency
(also noted in the original issue)

## Testing

Manual verification: files from different datasets now stored under
distinct bucket/ prefixes, preventing collisions.

Fixes infiniflow#14159

Co-authored-by: Hunter <[email protected]>
Co-authored-by: Jin Hai <[email protected]>
### What problem does this PR solve?

The Gitee AI Go driver shipped with a stub \`Rerank\` method that
returned \`Rerank not implemented\`, even though
\`conf/models/gitee.json\` already wires the rerank URL suffix at
\`\"rerank\": \"rerank\"\`. The same config did not list any
rerank model, so the picker had nothing to select.

So a Gitee tenant could not use BAAI/bge-reranker-v2-m3 as a reranker
through the Go layer today, even though the
infrastructure was one config entry and one method body away.

### What this PR includes

- \`conf/models/gitee.json\`: add \`BAAI/bge-reranker-v2-m3\` to the
\`models\` array.
- \`internal/entity/models/gitee.go\`: replace the \`Rerank\` stub with
a real implementation. Adds two small local types
that match the OpenAI-compatible \`/rerank\` shape already used by the
SiliconFlow and ZhipuAI drivers.

No factory change. No interface change.

### How the driver works

- Validate \`apiConfig\` and the API key, validate the model name,
resolve the region with a default fallback, build the
  URL from \`BaseURL[region] + URLSuffix.Rerank\`.
- Use a per-call \`context.WithTimeout(30s)\` and
\`http.NewRequestWithContext\`, matching the pattern the
  recently merged Aliyun Encode and the OpenAI driver already use.
- Send \`{model, query, documents, top_n, return_documents:false}\` in
the body.
- Parse \`results[*].relevance_score\` and copy each score into the
output slice indexed by \`results[*].index\`, so the
output order matches the input order even if the API returns items in a
different order.
- Empty input returns \`[]float64{}\` with no HTTP call.
- An out-of-range result index returns a clear error rather than
silently skipping the entry.
- Non-200 responses propagate the upstream status line and body.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
returns exit 0.
- The full method set on \`GiteeModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing SiliconFlow Rerank and the recently
merged ZhipuAI Rerank (infiniflow#14608).

Closes infiniflow#14655
### What problem does this PR solve?

Use right key in error text.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
)

### What problem does this PR solve?

migrate document preview/download to RESTful API

### Type of change
- [x] Refactoring
…iflow#14649)

### Related issues
Closes infiniflow#14648

### What problem does this PR solve?

This PR fixes an authorization flaw in `POST /files/link-to-datasets`.

Before this change, the endpoint only checked whether the supplied
`file_ids` and `kb_ids` existed. It did not verify whether the
authenticated user was actually allowed to access those files or target
datasets. As a result, an authenticated user who knew valid IDs could
relink another user's files to arbitrary datasets.

This was especially risky because the relinking flow is state-changing:
the background worker removes existing file-document mappings and then
recreates documents under the attacker-supplied dataset IDs.

This change makes the route enforce the same permission model already
used by nearby file and document operations (see the sketch after this list):

- each resolved file must pass `check_file_team_permission(...)`
- each target dataset must pass `check_kb_team_permission(...)`
- authorization is enforced before scheduling background relinking work
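
In outline, the route now does something like the following before any work is scheduled (helper call shapes are assumed from the nearby file/document routes; `schedule_relink` is a stand-in name for the existing background step):

```python
# Sketch of the enforcement order; not the literal diff.
for file in files:                      # files resolved from the supplied file_ids
    if not check_file_team_permission(file, current_user.id):
        return get_json_result(data=False, message="No authorization.",
                               code=RetCode.AUTHENTICATION_ERROR)
for kb in kbs:                          # datasets resolved from the supplied kb_ids
    if not check_kb_team_permission(kb, current_user.id):
        return get_json_result(data=False, message="No authorization.",
                               code=RetCode.AUTHENTICATION_ERROR)
schedule_relink(files, kbs)             # only reached after every check passes
```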

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Testing

- Added regression coverage in
`test/testcases/test_web_api/test_file_app/test_file2document_routes_unit.py`
- Covered:
  - unauthorized file access is rejected
  - unauthorized dataset access is rejected
- existing success path still returns immediately after scheduling
background work
- Attempted to run:
- `python -m pytest
test\\testcases\\test_web_api\\test_file_app\\test_file2document_routes_unit.py
-q`
- Local execution in this workspace is currently blocked by missing test
dependencies during bootstrap, including `ragflow_sdk`

---------

Co-authored-by: jony376 <[email protected]>
### What problem does this PR solve?

Since API is updated, CLI login failed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <[email protected]>
### What problem does this PR solve?

1. implement `rerank`, `embedding`, `balance`, `checkConnet` method for
`OpenRouter`
2. delete `chat` method in `internal/entity/models/volcengine.go`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
pandadev66 and others added 15 commits May 8, 2026 04:45
### What problem does this PR solve?

The Aliyun Go driver shipped with a stub \`Encode\` method that returned
\`no such method\`, even though \`conf/models/aliyun.json\` already
wires the OpenAI-compatible embeddings URL suffix at
\`compatible-mode/v1/embeddings\`. The same config also did not list any
embedding models, so the picker had nothing to select.

So an Aliyun tenant who wanted to use Tongyi text-embedding-v3 or v4 in
the Go layer could not, even though the upstream endpoint is public and
uses the standard \`POST /v1/embeddings\` shape that the SiliconFlow and
ZhipuAI
drivers already support.

This PR fills the gap.

### What this PR includes

- \`conf/models/aliyun.json\`: add \`text-embedding-v4\` and
\`text-embedding-v3\` to the \`models\` array.
- \`internal/entity/models/aliyun.go\`: replace the \`Encode\` stub with
a real implementation. Adds a small local response type that matches the
OpenAI-compatible shape.

No factory change. No interface change.

### How the driver works

- Validate \`apiConfig\` and the API key, validate the model name,
resolve the region with a default fallback, build the
  URL from \`BaseURL[region] + URLSuffix.Embedding\`.
- Send all input texts in one request as the \`input\` array, the same
OpenAI-compatible shape the SiliconFlow \`Encode\`
  uses.
- Parse \`data[*].embedding\` and copy each slice into a \`[][]float64\`
indexed by \`data[*].index\` so the output order matches the input order
even if the API returns items in a different order.
- Handle both \`float64\` and \`float32\` element types.
- Empty input returns \`[][]float64{}\` with no HTTP call.
- Non-200 responses propagate the upstream status line and body.
- A final pass checks every input slot got a vector and returns a clear
error if any slot is still nil.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
returns exit 0.
- The full method set on \`AliyunModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing SiliconFlow Encode implementation.

Closes infiniflow#14646

---------

Co-authored-by: Jin Hai <[email protected]>
…niflow#14618) (infiniflow#14625)

### What problem does this PR solve?

Closes infiniflow#14618.

The `GET /v1/document/get/<doc_id>` endpoint in
`api/apps/document_app.py` was protected only by `@login_required` and
called `DocumentService.get_by_id(doc_id)` without verifying that the
document's knowledge base belonged to the requesting user's tenant. Any
authenticated user who knew (or guessed) a document ID could download
files belonging to any other tenant — a cross-tenant IDOR.

This PR adds a `DocumentService.accessible(doc_id, current_user.id)`
check before serving the file. The helper already exists and joins
`Document` → `Knowledgebase` → `UserTenant` to verify the requesting
user belongs to the tenant that owns the document's KB. The same pattern
is already used by `api/apps/restful_apis/document_api.py` and mirrors
the tenant scoping in the SDK route at `api/apps/sdk/doc.py`.

The check returns the existing `"Document not found!"` error for both
non-existent and inaccessible documents, so attackers cannot use the
response to enumerate valid doc IDs across tenants.
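
The added guard boils down to something like this (simplified; the error helper name is assumed from the surrounding app code and may differ):

```python
# Sketch of the guard in GET /v1/document/get/<doc_id>; not the literal diff.
if not DocumentService.accessible(doc_id, current_user.id):
    # Same response for "does not exist" and "belongs to another tenant",
    # so the endpoint cannot be used to enumerate valid doc IDs.
    return get_data_error_result(message="Document not found!")
e, doc = DocumentService.get_by_id(doc_id)
if not e:
    return get_data_error_result(message="Document not found!")
# ... serve the file as before ...
```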

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Other (please describe): Security fix (cross-tenant IDOR /
authorization bypass)
…l tag (infiniflow#14613)

## Summary

- **Collapsible thinking**: Replace `<section>` with `<details>` for
`<think>` content, so model thinking output is collapsed by default
(click to expand). Works for all models that output `<think>` tags
(Qwen3, DeepSeek, Gemini, Claude, etc.).
- **Fix double thinking tags**: When reasoning/deep research mode is
enabled in knowledge base chat, both the retrieval progress and model
thinking were wrapped in `<think>` tags, producing two "Thinking..."
blocks. Now retrieval progress uses a dedicated `<retrieving>` tag
rendered as a separate "Retrieving..." collapsible with a distinct green
accent.

### Before
- Thinking content displayed as flat gray-bordered `<section>`,
occupying significant screen space
- Deep research + model thinking both use `<think>` → two identical
"Thinking..." blocks

### After
- Thinking content collapsed by default in a `<details>` element, click
"Thinking..." to expand
- Deep research shows "Retrieving..." (green border), model thinking
shows "Thinking..." (gray border)

## Changes

**Backend (`api/db/services/dialog_service.py`)**
- Deep research callback: replace `start_to_think`/`end_to_think` marker
flags with direct `<retrieving>`/`</retrieving>` answer text

**Frontend**
- `web/src/utils/chat.ts`: `replaceThinkToSection()` now uses
`<details>` instead of `<section>`; add new
`replaceRetrievingToSection()`
- 4 tsx files: import and pipe `replaceRetrievingToSection`, whitelist
`details`, `summary`, `retrieving` in DOMPurify `ADD_TAGS`
- 4 less files: `section.think` → `details.think` with `<summary>`
styles; add `details.retrieving` with green accent; dark mode and RTL
variants

## Test plan
- [ ] Open a chat WITHOUT knowledge base, ask a question to a model with
thinking (e.g. Qwen3) → thinking content should be collapsed by default,
click "Thinking..." to expand
- [ ] Open a chat WITH knowledge base and reasoning enabled, ask a
question → "Retrieving..." (green) shows retrieval progress,
"Thinking..." (gray) shows model thinking, each independently
collapsible
- [ ] Verify dark mode renders correctly for both collapsible blocks
- [ ] Verify RTL layout renders correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: wanghualoong <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
…low#14663)

### What problem does this PR solve?

add compatibility route for document download under /v1

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
…nfiniflow#14600)

## Summary

- When an agent workflow has multiple `UserFillUp` pause points,
`canvas.run()` calls `reset(True)` on **all** components at the start of
each run. This clears outputs from components that completed in prior
runs, so downstream references like `{Agent:XXX@content}` resolve to
`None`.
- This fix only resets components on the **current execution path**
(`self.path`), preserving outputs from previously completed components.

## Problem

In a multi-step agent (e.g. draft email → user confirms → send email):

1. First `run()`: Agent drafts content, UserFillUp pauses for user input
→ Agent output is saved
2. Second `run()`: User submits input, but `reset(True)` clears **all**
components including the Agent that already completed
3. Email component references `{Agent:XXX@content}` → gets `None`
instead of the draft

This affects **all** agents that reference upstream component outputs
after a UserFillUp pause point.

## Fix

```python
# Before: reset ALL components
for k, cpn in self.components.items():
    self.components[k]["obj"].reset(True)

# After: only reset components on current execution path
path_set = set(self.path)
for k, cpn in self.components.items():
    if k in path_set:
        self.components[k]["obj"].reset(True)
```

`self.path` already tracks the current execution path. For agents
without UserFillUp (single run), `path` contains all components, so
behavior is unchanged.

## Test plan

- [x] Agent with single UserFillUp: outputs from prior components are
preserved after resume
- [x] Agent with multiple UserFillUp: each resume preserves all
previously completed outputs
- [x] Agent without UserFillUp: behavior unchanged (all components in
path, all reset)
- [x] Webhook-triggered agents: unaffected (path includes all components
on first run)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: wanghualoong <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
### What problem does this PR solve?

Update the type of tenant_rerank_id in validation.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?

1. Update API URL
2. Add password encryption

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <[email protected]>
…ck (infiniflow#14669)

### What problem does this PR solve?

1. **Fix Global State Pollution in Local Providers (Critical Bug):** -
Resolved a severe concurrency and architecture issue in
`model_service.go`. Previously, `ListSupportedModels` would permanently
overwrite the global provider singleton with a localized URL instance
(`driver.NewInstance`). This caused cross-request contamination in
multi-tenant environments.
- Fixed `CheckProviderConnection` for local models (LM Studio, vLLM,
Ollama). It now properly creates a localized driver copy and injects the
`base_url` before testing the connection, entirely eliminating the
false-positive `missing base URL` error without polluting the global
state.
2. **Implement `VolcEngine` Embeddings:** - Fully implemented the
`Encode` method for the `volcengine` provider, enabling text embedding
capabilities for VolcEngine models.
3. **Enhance Region Validation in `SiliconFlow`:** - Added a strict
empty string check (`*apiConfig.Region != ""`) alongside the existing
`nil` check when parsing regions. This ensures that if an empty string
is passed, the system safely falls back to the `"default"` region,
preventing malformed URL requests and `unsupported protocol scheme`
errors.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?

```
RAGFlow(user)> logout;
SUCCESS
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <[email protected]>
…rash (infiniflow#14651) (infiniflow#14666)

### What problem does this PR solve?

Fixes infiniflow#14651.

`kb_prompt()` in `rag/prompts/generator.py` crashes with
`AttributeError: 'NoneType' object has no attribute 'items'` during
agent citation generation when a retrieved chunk carries
`document_metadata: null`.

**Root cause.** The crash happens at `rag/prompts/generator.py:132-133`:

```python
meta = ck.get("document_metadata", {})
for k, v in meta.items():
```

`dict.get(key, default)` only returns the default when the key is
*missing*. When the key is present with an explicit `None` value,
`.get()` returns `None`, and `.items()` crashes.

**How the chunk gets `None`.** It's a round-trip inside RAGFlow itself,
not bad input from retrieval:

1. The agent stores retrieved chunks via `agent/canvas.py:814`, which
routes them through `chunks_format()`.
2. `rag/prompts/generator.py:61` canonicalizes the field with
`chunk.get("document_metadata")` (no default), so chunks without
metadata become `{"document_metadata": None, ...}`.
3. `agent/component/agent_with_tools.py:314` feeds those canonicalized
chunks back into `kb_prompt()` for citation generation, and
`.get("document_metadata", {})` no longer protects us.

**Fix.** One-line change at `rag/prompts/generator.py:132`: use
`ck.get("document_metadata") or {}` so an explicit `None` is also
coerced to `{}`.

The line-61 `None` is intentionally part of the API/UI contract — the
frontend handles it via optional chaining
(`web/src/components/markdown-content/index.tsx:184`,
`web/src/pages/next-search/search-view.tsx:217`) — so the fix belongs at
the consumer, not the producer.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
…nfiniflow#14684)

### What problem does this PR solve?

Do not bypass threshold for rerank when metadata filter is enabled

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?

Restrict file move operations: prevent moving a folder to itself or to
one of its own subfolders.
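
A generic way to express that guard, assuming the service can walk a folder's parent chain (the real check operates on file records, so the details will differ):

```python
# Hypothetical helper: reject moving a folder into itself or any of its own subfolders.
def is_illegal_move(src_folder_id, dest_folder_id, get_parent_id):
    """get_parent_id(folder_id) returns the parent folder id, or None at the root."""
    current = dest_folder_id
    while current is not None:
        if current == src_folder_id:
            return True     # destination is the source itself or one of its descendants
        current = get_parent_id(current)
    return False
```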

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
…nfiniflow#14592)

Closes infiniflow#14590

## Self Checks

- [x] I have searched for existing issues [search for existing
issues](https://github.com/infiniflow/ragflow/issues), including closed
ones.
- [x] I confirm that I am using English to submit this report ([Language
Policy](infiniflow#5910)).
- [x] Non-English title submissions will be closed directly (非英文标题的提交将会被直接关闭) ([Language
Policy](infiniflow#5910)).
- [x] Please do not modify this template :) and fill in all the required
fields.

## RAGFlow workspace code commit ID

`a1b2c3d4e5f67890123456789abcdef12345678`

## RAGFlow image version

`0.13.1`

## Other environment information

- Hardware parameters: N/A
- OS type: Linux 6.17.0-22-generic
- Others: API key authentication via `Authorization: Bearer <token>`

## Actual behavior

The chatbot API endpoints:

- `POST /chatbots/<dialog_id>/completions`
- `GET /chatbots/<dialog_id>/info`

validate only that the bearer token exists in `APIToken`, but do not
verify that `dialog_id` belongs to the same tenant as that token.

Current flow (simplified):

1. Route extracts bearer token and checks `APIToken.query(beta=token)`.
2. If token exists, request is accepted.
3. Downstream service resolves dialog globally by ID
(`DialogService.get_by_id(dialog_id)` in `conversation_service.py`).
4. No tenant ownership check is enforced for `dialog_id`.

Impact: Any user with a valid API key can attempt arbitrary `dialog_id`
values and access/invoke chatbots outside their own tenant boundary if
IDs are known/guessed/leaked.

Security classification:

- Vulnerability class: Broken Access Control (IDOR, OWASP Top 10 A01)
- Severity recommendation: Critical
- Exploit prerequisite: any valid API key + discoverable target
`dialog_id`

## Expected behavior

Requests to `/chatbots/<dialog_id>/completions` and
`/chatbots/<dialog_id>/info` must be authorized only when:

1. bearer token is valid, and
2. `dialog_id` belongs to the same `tenant_id` as the token.

Otherwise, reject with authorization failure (e.g., 403 or
404-equivalent policy).

## Steps to reproduce

1. Prepare two tenants:
   - Tenant A with API key `TOKEN_A`
   - Tenant B with chatbot `dialog_id = DIALOG_B`
2. Send request from Tenant A to Tenant B chatbot completion endpoint:

```bash
curl -X POST "https://<host>/chatbots/DIALOG_B/completions" \
  -H "Authorization: Bearer TOKEN_A" \
  -H "Content-Type: application/json" \
  -d '{"question":"hello","stream":false}'
```

3. Observe request is processed (or reaches dialog resolution) without
tenant ownership rejection.
4. Repeat against info endpoint:

```bash
curl -X GET "https://<host>/chatbots/DIALOG_B/info" \
  -H "Authorization: Bearer TOKEN_A"
```

5. Observe the same missing ownership enforcement.

## Additional information

Affected code paths:

- `api/apps/sdk/session.py`
  - `chatbot_completions(dialog_id)`
  - `chatbots_inputs(dialog_id)`
- `api/db/services/conversation_service.py`
  - `async_iframe_completion(...)` uses global dialog lookup

Suggested fix (a sketch follows the list):

1. In both chatbot endpoints:
   - Resolve `tenant_id = objs[0].tenant_id` from validated token.
- Fetch dialog with tenant-scoped query
(`DialogService.query(id=dialog_id, tenant_id=tenant_id)`).
   - Reject if dialog is not found/owned by tenant.
2. Defense in depth:
- Require and enforce `tenant_id` in service-layer dialog resolution for
external flows.
- Avoid global `get_by_id(dialog_id)` where user-controlled dialog IDs
are reachable.
3. Add regression tests:
   - Positive: same-tenant token + dialog succeeds.
   - Negative: cross-tenant token + dialog fails for both endpoints.
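
A minimal sketch of the tenant-scoped lookup from step 1 (only `APIToken.query` and `DialogService.query` come from the affected code paths; the JSON error helper and messages are assumptions):

```python
# Sketch of the suggested ownership check for both chatbot endpoints.
objs = APIToken.query(beta=token)
if not objs:
    return get_json_result(data=False, message="Authentication error.",
                           code=RetCode.AUTHENTICATION_ERROR)
tenant_id = objs[0].tenant_id
dialogs = DialogService.query(id=dialog_id, tenant_id=tenant_id)
if not dialogs:
    # Treat cross-tenant access like "not found" so dialog IDs cannot be probed.
    return get_json_result(data=False, message="Dialog not found.",
                           code=RetCode.AUTHENTICATION_ERROR)
dialog = dialogs[0]
```
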
### What problem does this PR solve?

Refactor : Allow search multiple datasets
1. support /datasets/search
2. get rid of /graph/search, use /graph

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 8, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
internal/entity/models/siliconflow.go (1)

475-497: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate apiConfig before using it in ListModels.

This method still dereferences apiConfig.ApiKey unconditionally later, so ListModels(nil) or a missing key crashes instead of returning a regular error. The new region guard only fixes part of that path.

Proposed fix
 func (z *SiliconflowModel) ListModels(apiConfig *APIConfig) ([]string, error) {
+	if apiConfig == nil || apiConfig.ApiKey == nil || *apiConfig.ApiKey == "" {
+		return nil, fmt.Errorf("api key is required")
+	}
+
 	var region = "default"
-	if apiConfig.Region != nil && *apiConfig.Region != "" {
+	if apiConfig.Region != nil && *apiConfig.Region != "" {
 		region = *apiConfig.Region
 	}
@@
 	req.Header.Set("Content-Type", "application/json")
 	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", *apiConfig.ApiKey))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/siliconflow.go` around lines 475 - 497, The ListModels
method currently dereferences apiConfig and apiConfig.ApiKey without validation;
update ListModels to first check that apiConfig is not nil and that
apiConfig.ApiKey is non-nil and non-empty, returning a clear error if the config
or key is missing, adjust the region selection to handle a nil apiConfig (use
default region when apiConfig==nil), and only set the Authorization header when
a valid ApiKey exists; reference the ListModels function, APIConfig type and
apiConfig.ApiKey/Region, and URL building using z.BaseURL and z.URLSuffix.Models
to locate the changes.
api/utils/validation_utils.py (1)

855-878: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the type mismatch in SearchDatasetsReq.tenant_rerank_id — it should be int | None, not str | None.

SearchDatasetReq.tenant_rerank_id is correctly typed as int | None (line 857), but SearchDatasetsReq.tenant_rerank_id is incorrectly str | None (line 877). Both models route through the same downstream service layer, which calls get_model_config_by_id(req["tenant_rerank_id"]) — a function that expects an int parameter (see api/db/joint_services/tenant_model_service.py:24). The database also stores this as IntegerField.

With the current str | None typing on SearchDatasetsReq, requests with numeric string IDs sent to POST /datasets/search will fail at service invocation, while the same payload to POST /datasets/<dataset_id>/search (which has the correct int | None type) will succeed.

🛠️ Fix
    class SearchDatasetsReq(BaseModel):
         model_config = ConfigDict(extra="ignore")
 
         dataset_ids: Annotated[list[str], Field(..., min_length=1)]
         question: Annotated[str, StringConstraints(strip_whitespace=True, min_length=1), Field(...)]
         doc_ids: Annotated[list[str], Field(default=[])]
         page: Annotated[int, Field(default=1, ge=1)]
         size: Annotated[int, Field(default=30, ge=1)]
         top_k: Annotated[int, Field(default=1024, ge=1)]
         similarity_threshold: Annotated[float, Field(default=0.0, ge=0.0, le=1.0)]
         vector_similarity_weight: Annotated[float, Field(default=0.3, ge=0.0, le=1.0)]
         use_kg: Annotated[bool, Field(default=False)]
         cross_languages: Annotated[list[str], Field(default=[])]
         keyword: Annotated[bool, Field(default=False)]
         search_id: Annotated[str | None, Field(default=None)]
         rerank_id: Annotated[str | None, Field(default=None)]
-        tenant_rerank_id: Annotated[str | None, Field(default=None)]
+        tenant_rerank_id: Annotated[int | None, Field(default=None)]
         meta_data_filter: Annotated[dict | None, Field(default=None)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/utils/validation_utils.py` around lines 855 - 878,
SearchDatasetsReq.tenant_rerank_id is incorrectly typed as str | None; change it
to int | None to match SearchDatasetReq and downstream expectations (the
model/service function get_model_config_by_id expects an int and DB stores
IntegerField). Locate the SearchDatasetsReq class and replace the annotation for
tenant_rerank_id with Annotated[int | None, Field(default=None)] so requests to
POST /datasets/search pass an integer (or None) through to
get_model_config_by_id.
api/apps/restful_apis/file2document_api.py (1)

119-147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Folder file_ids bypass check_file_team_permission — only their expanded inner files are checked.

In the folder expansion branch (lines 119–125, pre-existing), file_id is never appended to all_file_ids — only its inner files are. Because the new auth loop at lines 128–147 only iterates all_file_ids, a caller can submit any reachable folder ID and the folder itself is never passed to check_file_team_permission. Only the inner files are verified. If check_file_team_permission relies on tenant membership (similar to how the old accessible() worked before this PR), a tenant member can bypass folder-level ownership the same way private datasets were previously bypassed.

The fix is to auth-check every original file_id (including folders) before expansion:

🛡️ Proposed fix
+        user_id = current_user.id
+        for file_id in file_ids:
+            file = files_set[file_id]
+            if not check_file_team_permission(file, user_id):
+                logger.warning(
+                    "user_id=%s resource_type=file resource_id=%s action=authorize_file result=denied file_ids=%s kb_ids=%s",
+                    user_id,
+                    file_id,
+                    file_ids,
+                    kb_ids,
+                )
+                return get_data_error_result(message="No authorization.")
+
         # Expand folders to their innermost file IDs
         all_file_ids = []
         for file_id in file_ids:
             file = files_set[file_id]
             if file.type == FileType.FOLDER.value:
                 all_file_ids.extend(FileService.get_all_innermost_file_ids(file_id, []))
             else:
                 all_file_ids.append(file_id)

-        user_id = current_user.id
         for file_id in all_file_ids:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/apps/restful_apis/file2document_api.py` around lines 119 - 147, The
folder IDs from file_ids are never checked because only expanded inner IDs are
appended to all_file_ids; update the logic so every original file_id (including
folders) is authorization-checked before or during expansion: either append the
folder id to all_file_ids before calling
FileService.get_all_innermost_file_ids(file_id, []) or run
check_file_team_permission(file, user_id) against the folder entry immediately
after retrieving it (the code that calls FileService.get_all_innermost_file_ids,
iterates file_ids, and the authorization loop over all_file_ids should ensure
both folder and inner-file IDs are validated), referencing all_file_ids,
file_ids, FileService.get_all_innermost_file_ids, and
check_file_team_permission.
internal/cli/filesystem/dataset.go (1)

323-328: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale comments contradict the actual URL built by BuildURL — remove or update them.

Both lines 326 and 610 contain:

// Call retrieval API (useAPIBase=false because the route is /v1/chunk/retrieval_test, not /api/v1/...)

These comments document the old intent: use NonAPIBase() to produce {host}/v1/chunk/retrieval_test. After this change, BuildURL("/chunk/retrieval_test") produces {host}/api/v1/chunk/retrieval_test. The comments are actively misleading and are the primary evidence for the routing regression flagged at http_client.go. Once the backend path question is resolved, update these comments to reflect the final routing intent.

🗑️ Remove stale comments at both locations
-	// Call retrieval API (useAPIBase=false because the route is /v1/chunk/retrieval_test, not /api/v1/...)
 	resp, err := p.httpClient.Request("POST", "/chunk/retrieval_test", "auto", nil, payload)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/filesystem/dataset.go` around lines 323 - 328, Update/remove the
stale inline comment that claims the request uses NonAPIBase/`/v1/...` in the
block calling p.httpClient.Request("POST", "/chunk/retrieval_test", ...); locate
the call sites (e.g., the Request invocation in dataset.go and the similar
comment at the other location around line ~610) and either delete the misleading
comment or replace it with a succinct note that reflects the current URL
behavior (e.g., that BuildURL now produces {host}/api/v1/..., or indicate the
chosen routing intent once the backend path is finalized). Ensure references to
BuildURL, NonAPIBase, and httpClient.Request are updated so the comment matches
the actual URL construction.
internal/cli/user_parser.go (1)

112-169: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

LIST USERS and LIST SERVICES are now silently broken; dead handlers remain in the dispatcher.

case TokenUsers, case TokenServices, and case TokenRoles are no longer present in parseListCommand, so LIST USERS and LIST SERVICES now return "unknown LIST target" at parse time. Their corresponding command-handler cases ("list_users", "list_services") in ExecuteAdminCommand are now unreachable dead code.

If this removal is intentional, the orphaned handler arms in client.go's ExecuteAdminCommand should be pruned and a release note / migration hint added. If unintentional, the parser cases need to be restored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/user_parser.go` around lines 112 - 169, The parseListCommand
parser removed handling for TokenUsers, TokenServices (and TokenRoles) causing
"LIST USERS" / "LIST SERVICES" to error and leaving dead handlers in
ExecuteAdminCommand ("list_users", "list_services"); either restore the parser
branches in parseListCommand to call the appropriate command constructors (e.g.
return NewCommand("list_users") / NewCommand("list_services") or the equivalent
functions used for other LIST cases) so those dispatcher arms become reachable,
OR remove the unreachable handler cases in ExecuteAdminCommand and
TokenRoles-related code and add a release note/migration hint documenting the
removal; locate parseListCommand, TokenUsers/TokenServices/TokenRoles,
ExecuteAdminCommand, and the "list_users"/"list_services" handler names to apply
the chosen fix.
internal/cli/http_client.go (1)

73-79: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Go code calls non-existent Python endpoints — endpoints were migrated but not updated in client code.

The Python backend migrated /chunk/retrieval_test to /retrieval (commit 35f6d81) and possibly changed the /kb/list endpoint structure, but the Go CLI code was not updated to match:

  • internal/cli/filesystem/dataset.go lines 327, 611: Request("POST", "/chunk/retrieval_test", ...) → endpoint does not exist; should call /retrieval
  • internal/cli/benchmark.go line 232: Request("POST", "/kb/list", ...) → endpoint does not exist; verify correct endpoint path
  • internal/cli/benchmark.go line 245: Request("POST", "/chunk/retrieval_test", ...) → endpoint does not exist; should call /retrieval

The stale comments in dataset.go about useAPIBase=false and /v1/chunk/retrieval_test should be removed. With BuildURL now always using APIBase() to produce /api/v1/... paths, calls to these outdated endpoint names will receive 404 errors.

Update the endpoint paths to match the current Python backend API structure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/http_client.go` around lines 73 - 79, Update the obsolete
endpoint strings and remove stale comments: replace any Request("POST",
"/chunk/retrieval_test", ...) calls in internal/cli/filesystem/dataset.go (lines
around 327 and 611) and internal/cli/benchmark.go (around line 245) to
Request("POST", "/retrieval", ...) to match the migrated Python backend; verify
and correct the Request("POST", "/kb/list", ...) call in
internal/cli/benchmark.go (around line 232) to the current KB list endpoint used
by the backend; also remove the stale comments about useAPIBase=false and
/v1/chunk/retrieval_test in dataset.go since HTTPClient.BuildURL() already
prepends APIBase(), and run a quick smoke test to ensure no 404s from these
updated Request(...) calls.
internal/cli/filesystem/file.go (1)

546-551: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't force API auth for deletes.

Line 551 is now inconsistent with the rest of this provider: list/cat/root traversal use "auto", but delete still forces "api". That breaks DeleteFile/DeleteFolderByPath for normal web-login sessions.

Suggested fix
-	resp, err := p.httpClient.Request("DELETE", "/files", "api", nil, payload)
+	resp, err := p.httpClient.Request("DELETE", "/files", "auto", nil, payload)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/filesystem/file.go` around lines 546 - 551, The DeleteFile
implementation forces API auth by passing "api" to p.httpClient.Request, which
is inconsistent with other methods (list/cat/root traversal) that use "auto" and
breaks DeleteFile/DeleteFolderByPath for web-login sessions; change the auth
argument in p.httpClient.Request inside DeleteFile (and any analogous delete
helper like DeleteFolderByPath) from "api" to "auto" so deletes use the same
auto-detection of auth as the rest of FileProvider.
internal/cli/user_command.go (1)

1786-1807: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Derive auth kind in the API-token-aware commands.

Lines 1807 and 1929 still force "web" even though both functions explicitly accept either APIToken or LoginToken. With an API-token-only session, the guard passes and the request still goes out with the wrong auth mode.

Suggested fix
+	authKind := "web"
+	if c.HTTPClient.useAPIToken && c.HTTPClient.LoginToken == "" {
+		authKind = "api"
+	}
-	resp, err := c.HTTPClient.Request("GET", url, "web", nil, nil)
+	resp, err := c.HTTPClient.Request("GET", url, authKind, nil, nil)
+	authKind := "web"
+	if c.HTTPClient.useAPIToken && c.HTTPClient.LoginToken == "" {
+		authKind = "api"
+	}
-	resp, err := c.HTTPClient.Request("POST", url, "web", nil, payload)
+	resp, err := c.HTTPClient.Request("POST", url, authKind, nil, payload)

Also applies to: 1879-1929

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/user_command.go` around lines 1786 - 1807, The request calls in
CheckProviderConnection (and the similar function around lines 1879-1929)
hardcode the auth kind as "web"; change them to derive authKind from the
HTTPClient: if c.HTTPClient.APIToken != "" use "api", else if
c.HTTPClient.LoginToken != "" use "web" (or the existing login token value
semantics), assign that to a local authKind variable and pass authKind into
c.HTTPClient.Request instead of the fixed "web" string so API-token-only
sessions use the correct auth mode.
🟠 Major comments (23)
web/src/routes.tsx-20-20 (1)

20-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix double-slash route composition in Routes.Dataset.

Line 20 builds Routes.Dataset as '/dataset//files' because Routes.Files already includes a leading slash. This can break matching/navigation for dataset-file routes.

Proposed fix
-  Dataset = `${Routes.DatasetBase}/${Routes.Files}`,
+  Dataset = `${Routes.DatasetBase}${Routes.Files}`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/src/routes.tsx` at line 20, Routes.Dataset is constructed by inserting an
extra '/' between Routes.DatasetBase and Routes.Files which causes a double
slash because Routes.Files already begins with a leading slash; fix by removing
the inserted slash so you concatenate Routes.DatasetBase and Routes.Files
directly (or alternatively remove the leading slash from Routes.Files) to ensure
Routes.Dataset becomes a single valid path; update the code that defines the
Routes.Dataset symbol accordingly.
web/src/hooks/use-document-request.ts-216-228 (1)

216-228: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use datasetId in the query key to prevent stale filter cache.

Line 221 keys by knowledgeId, but Line 216 fetches by datasetId = knowledgeId || id. If knowledgeId is empty and route id changes, cached filter data can bleed across datasets.

Suggested fix
   const { data } = useQuery({
     queryKey: [
       DocumentApiAction.FetchDocumentFilter,
       debouncedSearchString,
-      knowledgeId,
+      datasetId,
     ],
+    enabled: !!datasetId,
     queryFn: async () => {
-      if (!datasetId) {
-        return;
-      }
-      const { data } = await documentFilter(datasetId);
+      const { data } = await documentFilter(datasetId!);
       if (data.code === 0) {
         return data.data;
       }
     },
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/src/hooks/use-document-request.ts` around lines 216 - 228, The query
cache key currently uses knowledgeId but the fetch uses datasetId (datasetId =
knowledgeId || id), causing stale cache when knowledgeId is empty and id
changes; update the useQuery queryKey to include datasetId (or replace
knowledgeId with datasetId) so the key uniquely reflects the actual dataset used
by the query (reference: datasetId, useQuery, queryKey, knowledgeId, id,
DocumentApiAction.FetchDocumentFilter).
api/apps/restful_apis/document_api.py-1903-1913 (1)

1903-1913: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve document downloads via File2DocumentService, not (tenant_id, doc_id).

This new route authenticates against the document record, but it still reads storage using settings.STORAGE_IMPL.get(tenant_id, doc_id) and trusts ?ext= for the MIME type. That keeps the old attachment-storage contract alive and can break downloads for valid documents whose blob is stored under a different bucket/key than the document id. Reuse the same storage-address lookup as get() and derive the extension from the persisted document name instead.

Suggested fix
 async def download_attachment(tenant_id=None, doc_id=None, attachment_id=None):
@@
     try:
         # Keep backward compatibility with older callers and unit tests that still
         # pass `attachment_id` instead of the route parameter name.
         doc_id = doc_id or attachment_id
         if not DocumentService.accessible(doc_id, current_user.id):
             return get_data_error_result(message="Document not found!")
-        ext = request.args.get("ext", "markdown")
-        data = await thread_pool_exec(settings.STORAGE_IMPL.get, tenant_id, doc_id)
+
+        e, doc = DocumentService.get_by_id(doc_id)
+        if not e:
+            return get_data_error_result(message="Document not found!")
+
+        bucket, name = File2DocumentService.get_storage_address(doc_id=doc_id)
+        data = await thread_pool_exec(settings.STORAGE_IMPL.get, bucket, name)
         response = await make_response(data)
-        content_type = CONTENT_TYPE_MAP.get(ext, f"application/{ext}")
+
+        ext_match = re.search(r"\.([^.]+)$", doc.name.lower())
+        ext = ext_match.group(1) if ext_match else None
+        content_type = CONTENT_TYPE_MAP.get(ext) if ext else None
         apply_safe_file_response_headers(response, content_type, ext)
 
         return response
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/apps/restful_apis/document_api.py` around lines 1903 - 1913, The handler
must stop calling settings.STORAGE_IMPL.get(tenant_id, doc_id) and instead
resolve the file storage address via File2DocumentService like the
DocumentService.get() path does: fetch the File2Document record for the document
(e.g. via File2DocumentService.get_by_document_id or equivalent), use the
storage address (bucket/key/object id) from that record when calling
settings.STORAGE_IMPL.get (or the same storage client method used by the
existing get() flow) and derive the response extension/MIME from the persisted
filename on the File2Document/Document record rather than request.args["ext"];
keep the existing DocumentService.accessible check and then pass the fetched
blob into make_response and apply_safe_file_response_headers as before.
internal/entity/models/volcengine.go-410-451 (1)

410-451: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate modelName and apiConfig before building requests.

This new path dereferences *modelName and *apiConfig.ApiKey without any guard. A missing model or API config will panic on the first iteration instead of returning a normal error.

Proposed fix
 func (z *VolcEngine) Encode(modelName *string, texts []string, apiConfig *APIConfig, embeddingConfig *EmbeddingConfig) ([][]float64, error) {
 	if len(texts) == 0 {
 		return [][]float64{}, nil
 	}
+	if modelName == nil || *modelName == "" {
+		return nil, fmt.Errorf("model name is required")
+	}
+	if apiConfig == nil || apiConfig.ApiKey == nil || *apiConfig.ApiKey == "" {
+		return nil, fmt.Errorf("api key is required")
+	}
 
 	var region = "default"
 	if apiConfig != nil && apiConfig.Region != nil && *apiConfig.Region != "" {
 		region = *apiConfig.Region
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/volcengine.go` around lines 410 - 451, The Encode
method dereferences modelName and apiConfig.ApiKey without validation and can
panic; add upfront checks at the start of VolcEngine.Encode to return a clear
error if modelName is nil or empty, or if apiConfig is nil or apiConfig.ApiKey
is nil/empty, and then use the validated values (e.g., store local vars model :=
*modelName and apiKey := *apiConfig.ApiKey) when building the request and
headers (referencing Encode, modelName, apiConfig, ApiKey) so no nil pointer
dereference can occur during the loop.
internal/entity/models/vllm.go-199-207 (1)

199-207: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle missing reasoning_content gracefully instead of failing.

The current code treats an absent reasoning_content field as a hard error when Thinking is enabled, even when message.content is valid. This breaks compatibility with OpenAI-compatible backends and models that omit this field.

Other providers in the codebase handle this: openai.go and xai.go make it optional with soft checks, while siliconflow.go and gitee.go include a fallback using GetThinkingAndAnswer() to extract thinking from response content when the field is missing.

Update vllm.go to follow the same pattern: make reasoning_content optional and fall back to parsing thinking content when absent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/vllm.go` around lines 199 - 207, When Thinking is
enabled in vllm.go (check chatModelConfig and chatModelConfig.Thinking) don't
treat a missing or non-string "reasoning_content" in messageMap as a fatal
error; instead, if messageMap["reasoning_content"] is absent or not a string,
call the existing GetThinkingAndAnswer(message.Content) fallback to extract
thinking and answer, assign the derived thinking to reasonContent (trim a
leading newline if present), and continue without returning an error—only fail
on unexpected failures from the fallback. Ensure you update the logic around
reasonContent, the type assertion of messageMap["reasoning_content"], and the
code path that currently returns fmt.Errorf("invalid content format") so vllm.go
mirrors the optional handling used in openai.go/xai.go and the fallback used in
siliconflow.go/gitee.go.
internal/entity/models/lmstudio.go-183-192 (1)

183-192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Implement fallback for missing reasoning_content field.

LM Studio responses may omit reasoning_content. The current code fails immediately instead of falling back to think-tag parsing as implemented in gitee.go and siliconflow.go. Update the code to first attempt extracting reasoning_content, then fall back to GetThinkingAndAnswer() if the field is absent:

Proposed fix
 	var reasonContent string
 	if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-		reasonContent, ok = messageMap["reasoning_content"].(string)
-		if !ok {
-			return nil, fmt.Errorf("invalid content format")
-		}
-		if reasonContent != "" && reasonContent[0] == '\n' {
-			reasonContent = reasonContent[1:]
-		}
+		if rc, ok := messageMap["reasoning_content"].(string); ok {
+			reasonContent = rc
+			if reasonContent != "" && reasonContent[0] == '\n' {
+				reasonContent = reasonContent[1:]
+			}
+		} else {
+			reasoning, answer := GetThinkingAndAnswer(chatModelConfig.ModelClass, &content)
+			if reasoning != nil {
+				reasonContent = *reasoning
+				content = *answer
+			}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/lmstudio.go` around lines 183 - 192, The code in
lmstudio.go currently returns an error when "reasoning_content" is missing;
change the logic in the block that reads messageMap["reasoning_content"] (within
the chatModelConfig != nil && chatModelConfig.Thinking check) to first try to
read reasonContent from messageMap as now, but if the key is absent or not a
string, do NOT return an error — instead call GetThinkingAndAnswer(messageMap)
(the same fallback used in gitee.go and siliconflow.go) to extract thinking and
answer, and then set reasonContent from the returned thinking value (trimming a
leading newline if present); ensure you still preserve the branch that returns
an error only for truly malformed values (if GetThinkingAndAnswer also indicates
failure, propagate that error).
internal/entity/models/ollama.go-182-191 (1)

182-191: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make reasoning_content optional with fallback to content parsing.

Other drivers in the codebase (siliconflow, gitee) handle missing reasoning_content gracefully by falling back to GetThinkingAndAnswer() when the field is absent. The current ollama implementation hard-fails instead, converting valid Ollama responses into errors when Thinking is enabled but the response lacks a separate reasoning_content field.

Align with the existing pattern by attempting to extract reasoning_content and, if unavailable, parsing reasoning from the content itself using the fallback mechanism already used in siliconflow.go and gitee.go.

Proposed fix
 	var reasonContent string
 	if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-		reasonContent, ok = messageMap["reasoning_content"].(string)
-		if !ok {
-			return nil, fmt.Errorf("invalid content format")
-		}
-		if reasonContent != "" && reasonContent[0] == '\n' {
-			reasonContent = reasonContent[1:]
-		}
+		if rc, ok := messageMap["reasoning_content"].(string); ok {
+			reasonContent = rc
+			if reasonContent != "" && reasonContent[0] == '\n' {
+				reasonContent = reasonContent[1:]
+			}
+		} else {
+			reasoning, answer := GetThinkingAndAnswer(chatModelConfig.ModelClass, &content)
+			if reasoning != nil {
+				reasonContent = *reasoning
+				content = *answer
+			}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/ollama.go` around lines 182 - 191, The current block
in ollama.go hard-fails when "reasoning_content" is missing while
chatModelConfig.Thinking is enabled; instead, attempt to read
messageMap["reasoning_content"] into reasonContent and if the cast fails or the
key is absent, call the existing GetThinkingAndAnswer(content) fallback (same
approach used in siliconflow/gitee) to parse reasoning from the main content and
set reasonContent and answer accordingly; remove the fmt.Errorf("invalid content
format") early return and ensure trimming of a leading newline on the resolved
reasonContent as already done.
internal/entity/models/nvidia.go-168-177 (1)

168-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strict reasoning_content extraction will fail valid responses.

When Thinking is enabled, this aborts the entire chat request whenever reasoning_content is missing, null, or non-string in the JSON — but NIM/OpenAI-style responses can legitimately omit that field (e.g., short answers, model variants that don't return separate reasoning, or streamed-only reasoning). The user gets "invalid content format" instead of the assistant's actual answer in content.

Treat the absence of reasoning_content as "no reasoning returned" rather than an error.

🛡️ Proposed fix
 	var reasonContent string
 	if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-		reasonContent, ok = messageMap["reasoning_content"].(string)
-		if !ok {
-			return nil, fmt.Errorf("invalid content format")
-		}
-		if reasonContent != "" && reasonContent[0] == '\n' {
-			reasonContent = reasonContent[1:]
-		}
+		if rc, ok := messageMap["reasoning_content"].(string); ok {
+			reasonContent = strings.TrimPrefix(rc, "\n")
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/nvidia.go` around lines 168 - 177, The current block
in the NVIDIA response parsing treats a missing/non-string "reasoning_content"
as a hard error; instead, when chatModelConfig != nil &&
chatModelConfig.Thinking != nil && *chatModelConfig.Thinking, change the logic
that reads messageMap["reasoning_content"] so that if the key is absent, nil, or
not a string you set reasonContent = "" (i.e., "no reasoning returned") and
continue rather than returning fmt.Errorf("invalid content format"); if the
value is a string keep the existing trim-of-leading-newline behavior. This
change touches the variables/functionality named messageMap, reasoning_content,
reasonContent, and the chatModelConfig.Thinking check.
internal/entity/models/openrouter.go-171-180 (1)

171-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same strict reasoning extraction as nvidia.go — fails valid responses missing the field.

When Thinking=true but the upstream omits reasoning (legitimate for short answers or non-reasoning variants), this aborts the request even though content was returned successfully. Use a tolerant assertion.

🛡️ Proposed fix
 	var reasonContent string
 	if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-		reasonContent, ok = messageMap["reasoning"].(string)
-		if !ok {
-			return nil, fmt.Errorf("invalid content format")
-		}
-		if reasonContent != "" && reasonContent[0] == '\n' {
-			reasonContent = reasonContent[1:]
-		}
+		if rc, ok := messageMap["reasoning"].(string); ok {
+			reasonContent = strings.TrimPrefix(rc, "\n")
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 171 - 180, The current
extraction for reasonContent when chatModelConfig != nil &&
chatModelConfig.Thinking != nil && *chatModelConfig.Thinking is too strict: it
assumes messageMap["reasoning"] exists and is a string and returns an error if
not, which breaks valid responses that omit reasoning; update the logic around
reasonContent/messageMap["reasoning"] to be tolerant—check for existence and
type with an ok-style assertion, and if the key is missing or not a string
simply leave reasonContent empty (or nil) instead of returning an error, but
still trim a leading newline when a non-empty string is present; keep the checks
on chatModelConfig.Thinking and only change the handling of
messageMap["reasoning"] so it mirrors the tolerant approach used in nvidia.go.
internal/entity/models/openrouter.go-60-72 (1)

60-72: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing region fallback to "default" (also at Line 200).

If apiConfig.Region points at a region key that isn't present in o.BaseURL (typo, mis-config, or a regional config that only the chat or models endpoint defines), o.BaseURL[region] returns "" and the request is sent to /<URLSuffix.Chat> — failing with a confusing parse error rather than falling back. The NVIDIA driver guards this at lines 68–71; mirror that here, both in ChatWithMessages and in ChatStreamlyWithSender at Line 200.

🛡️ Proposed fix
 	var region = "default"
 	if apiConfig.Region != nil && *apiConfig.Region != "" {
 		region = *apiConfig.Region
 	}
 
-	url := fmt.Sprintf("%s/%s", o.BaseURL[region], o.URLSuffix.Chat)
+	baseURL := o.BaseURL[region]
+	if baseURL == "" {
+		baseURL = o.BaseURL["default"]
+	}
+	if baseURL == "" {
+		return nil, fmt.Errorf("openrouter: no base URL configured for region %q", region)
+	}
+	url := fmt.Sprintf("%s/%s", baseURL, o.URLSuffix.Chat)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 60 - 72, The code computes
region from apiConfig.Region but uses o.BaseURL[region] without verifying the
key exists, so when the map lookup yields "" the URL becomes malformed; to fix,
in both ChatWithMessages and ChatStreamlyWithSender ensure after setting region
(from apiConfig.Region) you check if o.BaseURL[region] == "" and if so set
region = "default" (mirroring the NVIDIA guard), then build url :=
fmt.Sprintf("%s/%s", o.BaseURL[region], o.URLSuffix.Chat); this guarantees a
safe fallback when the requested region key is missing.
internal/entity/models/openrouter.go-286-289 (1)

286-289: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Logging every SSE line leaks user content and reasoning into logs.

common.Info(line) writes raw data: … events — i.e., assistant tokens, reasoning traces, and anything echoed in the response — to the application log on every streamed chat. That's a privacy/PII risk (logs are commonly aggregated, retained, or shipped to third parties) and adds significant log noise on a hot path. Either remove this entirely or downgrade to Debug-level and gate on a verbose-stream flag.

🔒 Proposed fix
 	for scanner.Scan() {
 		line := scanner.Text()
-		common.Info(line)
 
 		// SSE data line starts with "data:"
 		if !strings.HasPrefix(line, "data:") {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 286 - 289, The loop
reading SSE lines using scanner := bufio.NewScanner(resp.Body) currently logs
each line with common.Info(line), which leaks user content and reasoning; change
this to stop logging raw SSE data — either remove the common.Info call entirely
or downgrade it to common.Debug and guard it with a verbose/verboseStream
boolean flag (check the flag before logging) so only explicit verbose debugging
enables the output; update the code paths around scanner.Scan() and the
common.Info(line) invocation accordingly.
internal/entity/models/openrouter.go-202-205 (1)

202-205: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix delimiter in model type routing: OpenRouter uses / not _.

OpenRouter model identifiers follow the provider/model-name format (e.g., qwen/qwen-110b-chat, z-ai/glm-5.1). The code splits on "_" instead of "/", which means strings.Split(modelName, "_")[0] will never equal "qwen" or "glm" for actual OpenRouter models—the condition at line 203 is unreachable dead code. Change the delimiter from "_" to "/" to correctly route Qwen and GLM models through AsyncChat.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 202 - 205, The routing
check incorrectly splits OpenRouter model identifiers on "_" so modelType =
strings.Split(modelName, "_")[0] never matches providers like "qwen" or "glm";
update the split delimiter to "/" (i.e., strings.Split(modelName, "/")[0]) in
the block that sets url = fmt.Sprintf("%s/%s", o.BaseURL[region],
o.URLSuffix.AsyncChat) so Qwen and GLM models are correctly routed; keep
references to modelName, modelType, o.BaseURL[region], and o.URLSuffix.AsyncChat
when making the change.
internal/entity/models/openrouter.go-112-115 (1)

112-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

reasoning is always added to the request body, even when Thinking is unset/false.

The block lives inside if chatModelConfig != nil but is not gated on chatModelConfig.Thinking. Every chat call will request reasoning.effort=low, which (a) wastes tokens/cost on models that bill reasoning, (b) contradicts an explicit Thinking=false, and (c) is inconsistent with ChatStreamlyWithSender (Lines 248–258) which correctly gates the same field on Thinking.

🛠️ Proposed fix
-		if chatModelConfig.DoSample != nil {
-			reqBody["do_sample"] = *chatModelConfig.DoSample
-		}
-
-		reqBody["reasoning"] = map[string]interface{}{
-			"effort": "low",
-		}
+		if chatModelConfig.DoSample != nil {
+			reqBody["do_sample"] = *chatModelConfig.DoSample
+		}
+
+		if chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
+			reqBody["reasoning"] = map[string]interface{}{"effort": "low"}
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 112 - 115, The code
currently always adds reqBody["reasoning"] = {"effort":"low"} whenever
chatModelConfig != nil; change this so the reasoning field is only added when
chatModelConfig.Thinking is true (i.e., mirror the gating used in
ChatStreamlyWithSender). Locate the block that sets reqBody["reasoning"] and
wrap it in a conditional that checks chatModelConfig.Thinking (or similar
boolean on chatModelConfig) so that reasoning is omitted when Thinking is
unset/false.
agent/sandbox/result_protocol.py-32-33 (1)

32-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Python wrapper argument injection causes runtime failure with JSON booleans/null values.

Line 32 injects raw JSON into Python code (main(**{args_json})). When args_json contains JSON booleans or null values, the resulting code attempts to reference undefined Python names (true, false, null) and fails with NameError at runtime.

Proposed patch
 if __name__ == "__main__":
     import base64
     import json

-    result = main(**{args_json})
+    __ragflow_args = json.loads({args_json!r})
+    result = main(**__ragflow_args)
     payload = json.dumps({"present": True, "value": result, "type": "json"}, ensure_ascii=False, separators=(",", ":"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/sandbox/result_protocol.py` around lines 32 - 33, The code injects raw
JSON text into Python call site (main(**{args_json})), causing NameError for
JSON literals like true/false/null; replace that injection with parsing the JSON
string at runtime (use json.loads on args_json) and then call main with the
resulting dict (e.g., kwargs = json.loads(args_json); main(**kwargs)), and
ensure the module imports json and uses the parsed object when building
result/payload (refactoring the main, args_json, and payload usage accordingly).
agent/sandbox/providers/local.py-69-177 (1)

69-177: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add structured logs around the provider lifecycle.

This new execution flow is silent on initialize, execute start/end, timeout, artifact rejection, and destroy. A few structured logs keyed by instance_id, language, timeout, exit code, and rejection reason would make this operable without logging user code or artifact contents.

As per coding guidelines, "Add logging for new flows".
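
A minimal sketch of the logging shape this could take around `execute_code`; the class stub, `_run` helper, and field set are assumptions for illustration, not the provider's real internals:

```python
import logging
import time

logger = logging.getLogger(__name__)


class LocalSandboxProvider:
    """Illustrative stub; only the structured-logging shape is the point here."""

    def _run(self, script_path: str, timeout: int):
        raise NotImplementedError  # stands in for the real execution helper

    def execute_code(self, instance_id: str, language: str, script_path: str, timeout: int):
        # Structured start/end/timeout logs keyed by instance_id; user code and
        # artifact contents are deliberately not logged.
        logger.info("sandbox_execute start instance_id=%s language=%s timeout=%s",
                    instance_id, language, timeout)
        started = time.monotonic()
        try:
            exit_code, output = self._run(script_path, timeout)
        except TimeoutError:
            logger.warning("sandbox_execute timeout instance_id=%s timeout=%s",
                           instance_id, timeout)
            raise
        logger.info("sandbox_execute end instance_id=%s exit_code=%s duration_s=%.2f",
                    instance_id, exit_code, time.monotonic() - started)
        return exit_code, output
```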

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/sandbox/providers/local.py` around lines 69 - 177, Add structured
logging to the provider lifecycle by instrumenting initialize, create_instance,
execute_code, and destroy_instance: in initialize() log start/end with config
keys (e.g., work_dir, timeout, max_memory_mb) and set self._initialized; in
create_instance() log instance_id, language and work_dir when creating the
instance; in execute_code() add an execution-start log with instance_id,
language, script_path and timeout, an execution-end log with instance_id,
exit_code, execution_time and whether a structured result was present, and a
timeout/failure log inside the subprocess TimeoutExpired except branch (include
instance_id and timeout); also log artifact collection/rejection outcomes from
_collect_artifacts()/_validate_output_size() with instance_id and
rejection_reason; finally in destroy_instance() log instance_id and success when
removing the directory. Use structured key/value fields (instance_id, language,
timeout, exit_code, rejection_reason) and the existing symbols initialize,
create_instance, execute_code, destroy_instance, _validate_output_size,
_collect_artifacts and _prepare_script to locate insertion points.
agent/sandbox/providers/local.py-258-287 (1)

258-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a cumulative artifact size cap.

The current checks limit each artifact, but not the total payload returned in artifacts. With the defaults, one run can still load and base64-encode roughly 200 MiB of files into memory before response overhead. Track aggregate collected bytes and fail once the total budget is exceeded.
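
One possible shape for that aggregate check, as a self-contained sketch (cap names and limit values are illustrative, not the provider's actual settings):

```python
import base64
from pathlib import Path

MAX_ARTIFACT_BYTES = 20 * 1024 * 1024        # per-file cap (illustrative value)
MAX_TOTAL_ARTIFACT_BYTES = 50 * 1024 * 1024  # aggregate cap (illustrative value)


def collect_artifacts(work_dir: str) -> list[dict]:
    artifacts, total = [], 0
    for path in sorted(Path(work_dir).iterdir()):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > MAX_ARTIFACT_BYTES:
            raise RuntimeError(f"artifact {path.name} exceeds the per-file limit")
        if total + size > MAX_TOTAL_ARTIFACT_BYTES:
            raise RuntimeError("total artifact payload exceeds the aggregate budget")
        total += size
        # Only read and encode the file after both budget checks have passed.
        artifacts.append({
            "name": path.name,
            "content": base64.b64encode(path.read_bytes()).decode(),
        })
    return artifacts
```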

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/sandbox/providers/local.py` around lines 258 - 287, In
_collect_artifacts, add tracking of the cumulative size of collected artifacts
(e.g., cumulative_bytes = 0) and enforce a new total-cap property (suggest name
self.max_total_artifact_bytes) by checking cumulative_bytes + size <=
self.max_total_artifact_bytes before reading/encoding each file; if the sum
would exceed the cap, raise a RuntimeError like "Total artifact payload exceeds
X bytes". Perform this check right after computing size (and before
path.read_bytes()/base64 encoding) and increment cumulative_bytes by size when
appending the artifact so you never load files that would push the total over
the budget.
agent/sandbox/providers/local.py-126-150 (1)

126-150: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Enforce max_output_bytes while reading child output.

communicate() buffers the entire stdout/stderr in memory before _validate_output_size() runs, so a program can exhaust the worker long before the 1 MiB cap is checked. Please stream the pipes and terminate once the combined byte count crosses the limit, or redirect output to bounded temp files.
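
A simplified sketch of the streaming variant (timeout handling and the existing resource limits are elided; names are illustrative):

```python
import subprocess


def run_with_output_cap(cmd: list[str], max_output_bytes: int) -> tuple[bytes, int]:
    # Read the child's combined output in chunks and kill it once the byte
    # budget is exceeded, instead of buffering everything via communicate().
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    chunks, total = [], 0
    try:
        for chunk in iter(lambda: proc.stdout.read(65536), b""):
            total += len(chunk)
            if total > max_output_bytes:
                proc.kill()
                raise RuntimeError(f"child output exceeded {max_output_bytes} bytes")
            chunks.append(chunk)
    finally:
        proc.stdout.close()
        proc.wait()
    return b"".join(chunks), proc.returncode
```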

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/sandbox/providers/local.py` around lines 126 - 150, The code currently
calls process.communicate() which buffers all stdout/stderr in memory before
calling _validate_output_size, allowing a child to exhaust memory; replace the
communicate() usage in the run/execute flow to stream the process stdout/stderr
(via process.stdout and process.stderr reads or iterators), accumulate byte
counts and write into temporary bounded buffers or files, and if the combined
byte count exceeds the configured max_output_bytes call the same termination
logic (os.killpg/process.kill) and stop reading; finally compute execution_time
and call _validate_output_size with the truncated outputs or paths to the temp
files. Target the block around process = subprocess.Popen(...) and the
subsequent communicate/timeout handling, keeping the existing timeout behavior
and preexec_fn (_limit_child_process) and env (_build_child_env) usage. Ensure
reads respect text/encoding settings and raise the same TimeoutError when the
timeout elapses.
agent/sandbox/providers/local.py-134-136 (1)

134-136: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

preexec_fn is unsafe in a threaded service process and logging is missing.

Using preexec_fn=self._limit_child_process can deadlock child creation when the parent has multiple threads (fork() behavior in Python's subprocess). Additionally, the execute_code method lacks logging for the execution flow, which violates the coding guideline requiring logging for new flows in Python files.

Suggested fixes:

  1. Remove preexec_fn and use a separate launcher/helper process for resource limiting instead (a sketch follows this list)
  2. Add logging statements to track execution lifecycle (start, timeout, completion, errors)
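
A minimal, POSIX-only sketch of such a launcher (module name, argument order, and limits are assumptions for illustration):

```python
# sandbox_launcher.py -- hypothetical helper module. Applies the resource limits
# inside the freshly spawned process and then execs the real command, so the
# threaded parent never needs a post-fork preexec_fn hook.
import os
import resource
import sys


def main() -> None:
    max_memory_bytes = int(sys.argv[1])
    cpu_seconds = int(sys.argv[2])
    command = sys.argv[3:]

    resource.setrlimit(resource.RLIMIT_AS, (max_memory_bytes, max_memory_bytes))
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    os.execvp(command[0], command)


if __name__ == "__main__":
    main()
```

The provider would then spawn something like `[sys.executable, "sandbox_launcher.py", str(max_mem_bytes), str(cpu_seconds), sys.executable, script_path]` with a plain `subprocess.Popen(..., start_new_session=True)` and no `preexec_fn`.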
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/sandbox/providers/local.py` around lines 134 - 136, The use of
preexec_fn=self._limit_child_process in the subprocess call is unsafe in a
threaded parent and must be removed; instead, refactor resource limiting to run
in a separate launcher/helper process (e.g., add a small helper that applies
_limit_child_process before exec and invoke that helper as the subprocess
entrypoint) and keep start_new_session=os.name == "posix" as appropriate. Also
add logging calls inside execute_code (and around the subprocess launch flow
that uses _build_child_env and start_new_session) to log execution start
(including command/instance id), timeouts, successful completion (exit code and
runtime), and exceptions/errors (including subprocess stderr) so the execution
lifecycle is traceable. Ensure references to _limit_child_process, execute_code,
_build_child_env and start_new_session are updated to reflect the helper-process
approach and that all new flows include the required logger calls.
common/metadata_es_filter.py-474-506 (1)

474-506: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject non-scalar in / not in members before building ES clauses.

_csv_or_list() currently lets dicts and nested lists/tuples through. Those values then reach _terms_string_or_numeric() and produce invalid term/terms payloads instead of raising UnsupportedMetaFilter, so malformed filters won't fall back cleanly to the in-memory path.

Proposed fix
     normalised: List[Any] = []
     for m in members:
-        if isinstance(m, str):
+        if isinstance(m, (list, tuple, dict, set)):
+            raise UnsupportedMetaFilter("membership members must be scalar", flt)
+        if isinstance(m, str):
             normalised.append(m.lower().strip())
-        else:
+        elif isinstance(m, (int, float, bool)):
             normalised.append(m)
+        else:
+            raise UnsupportedMetaFilter("membership members must be scalar", flt)
     return normalised
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@common/metadata_es_filter.py` around lines 474 - 506, _csv_or_list currently
permits dicts and nested lists/tuples to pass through, which later causes
invalid ES term/terms payloads in _terms_string_or_numeric; update _csv_or_list
to validate the resolved members list and raise
UnsupportedMetaFilter(f"membership value contains non-scalar item: {type(m)}",
flt) (or similar) if any member is a dict or a list/tuple (i.e., not a scalar
type like str/int/float/bool/None), ensuring non-scalar values are rejected
before ES clause construction and allowing proper fallback to the in-memory
path.
api/db/services/conversation_service.py-247-249 (1)

247-249: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace assert with explicit runtime checks in the iframe access guard.

Lines 248–249 use assert for session-to-dialog binding validation, but assertions are stripped under python -O. This security check can silently disappear in optimized deployments. Replace with explicit if...: raise and add logging for the mismatch case per the coding guideline to add logging for new flows.

Proposed fix
         e, conv = API4ConversationService.get_by_id(session_id)
-        assert e, "Session not found!"
-        assert conv.dialog_id == dialog_id, "Session does not belong to this dialog"
+        if not e:
+            raise AssertionError("Session not found!")
+        if conv.dialog_id != dialog_id:
+            logger.warning(
+                "Iframe session/dialog mismatch: session_id=%s expected_dialog_id=%s actual_dialog_id=%s",
+                session_id,
+                dialog_id,
+                conv.dialog_id,
+            )
+            raise AssertionError("Session does not belong to this dialog")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/db/services/conversation_service.py` around lines 247 - 249, Replace the
two assertions after calling API4ConversationService.get_by_id(session_id) with
explicit runtime checks: verify that the returned existence flag (e) is truthy
and that conv.dialog_id equals dialog_id, and if either check fails raise an
appropriate exception (e.g., ValueError or a more specific HTTP/auth exception)
instead of using assert; also add a process- or module-level log call (e.g.,
process_logger.error or logger.warning) that records the session_id, dialog_id
and the failure reason before raising to satisfy the new logging requirement in
the iframe access guard.
internal/cli/filesystem/skill.go-1176-1179 (1)

1176-1179: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep skill config lookup on "auto" auth.

Line 1178 is the lone "web" call in this provider. getDefaultEmbdID runs during skill indexing, and because this helper suppresses request errors, API-token sessions will silently fall back to an empty embd_id here.

Suggested fix
 	resp, err := p.httpClient.Request("GET",
 		fmt.Sprintf("/skills/config?embd_id=&space_id=%s", url.QueryEscape(spaceID)),
-		"web", nil, nil)
+		"auto", nil, nil)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/filesystem/skill.go` around lines 1176 - 1179, The GET to
"/skills/config" in skill.go is using hardcoded "web" auth which forces
anonymous tokens and causes getDefaultEmbdID to fall back to empty embd_id;
update the p.httpClient.Request call in the skill config lookup to use "auto"
(replace the "web" string with "auto") so authentication is chosen automatically
during skill indexing (locate the call in the provider method that requests
"/skills/config" and change the auth argument to "auto").
internal/cli/user_command.go-403-405 (1)

403-405: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the active auth mode in getDatasetID.

Line 405 hardcodes "web", but this helper is reused by dataset operations that otherwise support API-token sessions. After this change, commands that resolve a dataset name first will fail for API-token-only users even though ListDatasets already handles that case correctly.

Suggested fix
 func (c *RAGFlowClient) getDatasetID(datasetName string) (string, error) {
-	resp, err := c.HTTPClient.Request("GET", "/datasets", "web", nil, nil)
+	authKind := "web"
+	if c.HTTPClient.useAPIToken && c.HTTPClient.LoginToken == "" {
+		authKind = "api"
+	}
+	resp, err := c.HTTPClient.Request("GET", "/datasets", authKind, nil, nil)
 	if err != nil {
 		return "", fmt.Errorf("failed to list datasets: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/user_command.go` around lines 403 - 405, getDatasetID currently
hardcodes the auth mode string "web" when calling c.HTTPClient.Request, which
breaks API-token-only sessions; change the call in getDatasetID to use the
client's active auth mode instead of the literal "web" (e.g., replace the
hardcoded string with the same helper or property used by ListDatasets — call
the client's getActiveAuthMode()/activeAuthMode/AuthMode accessor that
ListDatasets uses) so c.HTTPClient.Request uses the correct auth mode for the
current session.
api/apps/services/dataset_api_service.py-1147-1151 (1)

1147-1151: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Collect tenant IDs for all selected datasets (don’t break on first match).

Lines 1147-1151 stop after the first matching tenant, so multi-dataset searches can miss valid datasets owned by other tenants and return incomplete results.

Proposed patch
-    tenant_ids = []
-    tenants = UserTenantService.query(user_id=tenant_id)
-    for tenant in tenants:
-        if any(KnowledgebaseService.query(tenant_id=tenant.tenant_id, id=kb_id) for kb_id in kb_ids):
-            tenant_ids.append(tenant.tenant_id)
-            break
-    else:
-        return False, "Only owner of datasets authorized for this operation."
+    # Access has already been validated per dataset above, so scope retrieval
+    # to all dataset owner tenants.
+    tenant_ids = list({kb.tenant_id for kb in kbs})
+    if not tenant_ids:
+        return False, "Only owner of datasets authorized for this operation."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/apps/services/dataset_api_service.py` around lines 1147 - 1151, The loop
over tenants stops at the first match because of the break, causing tenant_ids
to miss other tenants owning datasets in kb_ids; in the block using
UserTenantService.query and KnowledgebaseService.query, remove the break so you
collect all matching tenants. Also ensure you avoid duplicates when appending to
tenant_ids (e.g., check if tenant.tenant_id not already in tenant_ids) so every
tenant owning any kb_id is included.

Comment on lines +151 to +159
	choices, ok := result["choices"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("no choices in response")
	}

	firstChoice, ok := choices[0].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("no choices in response")
	}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Panic on empty choices array.

Line 152 only validates that choices is []interface{}, but Line 156 immediately indexes choices[0]. An empty array (well-formed JSON, just zero choices — e.g., a content-filtered response) will panic the request goroutine. The NVIDIA driver guards this at line 149 with len(choices) == 0.

🛡️ Proposed fix
 	choices, ok := result["choices"].([]interface{})
-	if !ok {
+	if !ok || len(choices) == 0 {
 		return nil, fmt.Errorf("no choices in response")
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 	choices, ok := result["choices"].([]interface{})
-	if !ok {
+	if !ok || len(choices) == 0 {
 		return nil, fmt.Errorf("no choices in response")
 	}
 	firstChoice, ok := choices[0].(map[string]interface{})
 	if !ok {
 		return nil, fmt.Errorf("no choices in response")
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 151 - 159, The code parses
result["choices"] into choices but does not guard against an empty slice before
indexing; update the parsing in the function handling OpenRouter responses so
that after asserting choices, you check len(choices) == 0 and return a
descriptive error (e.g., "no choices in response") before attempting to access
choices[0]; ensure the same guard is used before converting choices[0] to
firstChoice (map[string]interface{}) to avoid a panic.

Comment on lines +270 to +273

req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", *apiConfig.ApiKey))


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Possible nil-pointer dereference on apiConfig.ApiKey (also at Lines 503, 559, 607).

ChatStreamlyWithSender, Rerank, ListModels, and Balance dereference *apiConfig.ApiKey without first checking that apiConfig is non-nil and ApiKey is non-nil. Only ChatWithMessages has that guard (Lines 60–62). Any caller that passes nil (e.g., the getModelConfig path that builds apiConfig only when an instance is found, or future direct callers) will panic the request goroutine.

🛡️ Proposed fix (apply to all four call sites)
+	if apiConfig == nil || apiConfig.ApiKey == nil || *apiConfig.ApiKey == "" {
+		return fmt.Errorf("api key is nil or empty")
+	}
 	req.Header.Set("Content-Type", "application/json")
 	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", *apiConfig.ApiKey))

(Adjust the return per method: return nil plus the error for methods with an (X, error) signature, or just the error for error-only signatures, and place the check before http.NewRequest so failures short-circuit.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 270 - 273, Several
functions (ChatStreamlyWithSender, Rerank, ListModels, Balance) dereference
*apiConfig.ApiKey without null checks and can panic; add an early guard in each
function to check apiConfig != nil and apiConfig.ApiKey != nil before any use
(place the guard before calling http.NewRequest), and return the appropriate
zero-value plus an error (e.g., nil, error for functions returning (X, error) or
error for error-only signatures) so the failure short-circuits safely; update
all four call sites to use this same pattern.

Comment on lines +582 to +590
// convert result["data"] to []map[string]interface{}
models := make([]string, 0)
for _, model := range result["data"].([]interface{}) {
modelMap := model.(map[string]interface{})
modelName := modelMap["id"].(string)
models = append(models, modelName)
}

return models, nil

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Three chained unchecked type assertions — any unexpected payload will panic.

result["data"].([]interface{}), model.(map[string]interface{}), and modelMap["id"].(string) all use the single-return form, so any field type mismatch (or data being null/missing) crashes the goroutine. Compare with the NVIDIA ListModels (Lines 389–405), which uses the comma-ok form throughout and skips malformed entries.

🛡️ Proposed fix
-	// convert result["data"] to []map[string]interface{}
-	models := make([]string, 0)
-	for _, model := range result["data"].([]interface{}) {
-		modelMap := model.(map[string]interface{})
-		modelName := modelMap["id"].(string)
-		models = append(models, modelName)
-	}
+	data, ok := result["data"].([]interface{})
+	if !ok {
+		return nil, fmt.Errorf("invalid models list format")
+	}
+	models := make([]string, 0, len(data))
+	for _, item := range data {
+		m, ok := item.(map[string]interface{})
+		if !ok {
+			continue
+		}
+		id, ok := m["id"].(string)
+		if !ok {
+			continue
+		}
+		models = append(models, id)
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	// convert result["data"] to []map[string]interface{}
-	models := make([]string, 0)
-	for _, model := range result["data"].([]interface{}) {
-		modelMap := model.(map[string]interface{})
-		modelName := modelMap["id"].(string)
-		models = append(models, modelName)
-	}
+	data, ok := result["data"].([]interface{})
+	if !ok {
+		return nil, fmt.Errorf("invalid models list format")
+	}
+	models := make([]string, 0, len(data))
+	for _, item := range data {
+		m, ok := item.(map[string]interface{})
+		if !ok {
+			continue
+		}
+		id, ok := m["id"].(string)
+		if !ok {
+			continue
+		}
+		models = append(models, id)
+	}
 	return models, nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/entity/models/openrouter.go` around lines 582 - 590, The code
performing three unchecked type assertions (result["data"].([]interface{}),
model.(map[string]interface{}), modelMap["id"].(string)) can panic on malformed
payloads; update the conversion in the function that builds models to use
comma-ok type assertions for each step (check result["data"] exists and is
[]interface{}, each item is map[string]interface{}, and the "id" is a string),
skip entries that fail these checks (log or silently ignore) and continue, and
only append validated model names to the models slice so the goroutine cannot
panic on unexpected payloads.

Comment on lines 90 to +99
 def get_presigned_url(self, bucket, fnm, expires):
+    blob_name = f"{bucket}/{fnm}"
     for _ in range(10):
         try:
-            return self.conn.get_presigned_url("GET", bucket, fnm, expires)
+            return self.conn.get_presigned_url("GET", bucket, blob_name, expires)
         except Exception:
-            logging.exception(f"fail get {bucket}/{fnm}")
+            logging.exception(f"fail get {blob_name}")
             self.__open__()
             time.sleep(1)
-    return
+    return None

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether ContainerClient exposes get_presigned_url and find callers.
python - <<'PY'
try:
    from azure.storage.blob import ContainerClient
    print("has get_presigned_url:", hasattr(ContainerClient, "get_presigned_url"))
    print("sas-related members:", [m for m in dir(ContainerClient) if 'sas' in m.lower() or 'url' in m.lower()])
except Exception as e:
    print("import failed:", e)
PY
rg -nP --type=py -C3 'RAGFlowAzureSasBlob\(\)\.get_presigned_url|azure_sas_conn.*get_presigned_url'

Repository: infiniflow/ragflow

Length of output: 101


🌐 Web query:

azure-storage-blob ContainerClient get_presigned_url method API

💡 Result:

There is no azure.storage.blob.ContainerClient.get_presigned_url method in the Azure Storage Blobs Python SDK. What you likely want instead is to generate a SAS token/URL and then use it as a "pre-signed" URL:

- For a container SAS (service SAS), create it with generate_container_sas and the desired permissions/expiry, then append it to the container URL with a ? delimiter [1].
- The ContainerClient.from_container_url helper accepts a full container URL including a SAS token [2].

Example (Python, container SAS URL):

from azure.storage.blob import generate_container_sas, ContainerSasPermissions
from datetime import datetime, timezone, timedelta

start_time = datetime.now(timezone.utc)
expiry_time = start_time + timedelta(days=1)
sas_token = generate_container_sas(
    account_name=container_client.account_name,
    container_name=container_client.container_name,
    account_key=account_key,
    permission=ContainerSasPermissions(read=True),
    expiry=expiry_time,
    start=start_time,
)
container_url_with_sas = f"{container_client.url}?{sas_token}"

This SAS token is what effectively acts as a time-limited pre-signed URL; you can then create a ContainerClient from the SAS URL [1][2]. If you meant a "pre-signed URL for a specific blob" (not the container), use the blob SAS generation helpers (e.g., generate_blob_sas) and append the resulting token to the blob URL; the same SAS-in-URL pattern applies [3][1].

Citations:


🏁 Script executed:

# Examine the actual implementation in the file being reviewed
cat -n rag/utils/azure_sas_conn.py | head -120

# Search for where get_presigned_url is defined or used
rg -n "get_presigned_url" --type=py

# Check requirements or setup files to see the azure-storage-blob version
fd -e txt -e py -e cfg -e toml | xargs rg -l "azure-storage-blob" 2>/dev/null | head -5

Repository: infiniflow/ragflow

Length of output: 5288


🏁 Script executed:

# Look at azure_spn_conn.py to see what self.conn is there
sed -n '1,150p' rag/utils/azure_spn_conn.py | cat -n

Repository: infiniflow/ragflow

Length of output: 5435


🏁 Script executed:

# Check if FileSystemClient has get_presigned_url in Azure SDK docs/patterns
# Also check what the minio_conn implementation does (since it seems to work)
sed -n '217,225p' rag/utils/minio_conn.py | cat -n

# Check s3_conn for comparison
sed -n '185,195p' rag/utils/s3_conn.py | cat -n

Repository: infiniflow/ragflow

Length of output: 1124


ContainerClient does not have a get_presigned_url method — this will always fail at runtime.

The code calls self.conn.get_presigned_url("GET", bucket, blob_name, expires) on line 94, but azure.storage.blob.ContainerClient does not expose this method. The standard Azure SDK approach is to use generate_blob_sas() to create a SAS token, then manually construct a pre-signed URL by appending the token to the blob URL.

As written, every invocation will catch an AttributeError, retry 10 times, and return None. The same issue exists in azure_spn_conn.py line 118 with FileSystemClient.

Replace with proper SAS token generation using generate_blob_sas() from azure.storage.blob.
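
For reference, a minimal sketch of that generate_blob_sas() approach (not the project's actual implementation): it assumes the connection class keeps the storage account name and key on self (self.account_name and self.account_key are illustrative attribute names, not existing fields), that self.conn is the ContainerClient, and that expires is a number of seconds; the original retry/self.__open__() handling is omitted for brevity.

from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

def get_presigned_url(self, bucket, fnm, expires):
    blob_name = f"{bucket}/{fnm}"
    # Generate a read-only SAS token that expires after `expires` seconds.
    sas_token = generate_blob_sas(
        account_name=self.account_name,
        container_name=self.conn.container_name,
        blob_name=blob_name,
        account_key=self.account_key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(seconds=expires),
    )
    # Appending the SAS token to the blob URL yields the time-limited "pre-signed" URL.
    return f"{self.conn.url}/{blob_name}?{sas_token}"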

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rag/utils/azure_sas_conn.py` around lines 90 - 99, The get_presigned_url
method currently calls self.conn.get_presigned_url which ContainerClient does
not provide; replace the implementation in get_presigned_url to generate a SAS
token using azure.storage.blob.generate_blob_sas and then build the full blob
URL + "?" + sas_token (use the blob name constructed as f"{bucket}/{fnm}" and
the ContainerClient.url or blob_client.url from self.conn to form the base URL),
preserving the retry logic and calls to self.__open__ on failure; also apply the
same change pattern for the analogous call in azure_spn_conn.py where
FileSystemClient is mistakenly used, ensuring you import generate_blob_sas and
use the proper account/key and permissions/expiry parameters when creating the
SAS token.

Comment on lines 73 to 83
 for _ in range(3):
     try:
-        f = self.conn.create_file(f"{bucket}/{fnm}")
+        f = self.conn.create_file(f"{blob}")
         f.append_data(binary, offset=0, length=len(binary))
         return f.flush_data(len(binary))
     except Exception:
-        logging.exception(f"Fail put {bucket}/{fnm}")
+        logging.exception(f"Fail put {blob}")
         self.__open__()
         time.sleep(1)
         return None
 return None

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Retry loop is short-circuited — put() never retries.

return None on line 82 lives inside the except branch, so the very first failure exits the function and the for _ in range(3): loop is effectively dead. If the first create_file fails, no retry happens; if it succeeds, the explicit return None after the loop is unreachable. Compare with azure_sas_conn.py:put(), which correctly lets the loop iterate.

🔧 Proposed fix
     def put(self, bucket, fnm, binary, tenant_id=None):
         blob = f"{bucket}/{fnm}"
         for _ in range(3):
             try:
-                f = self.conn.create_file(f"{blob}")
+                f = self.conn.create_file(blob)
                 f.append_data(binary, offset=0, length=len(binary))
                 return f.flush_data(len(binary))
             except Exception:
                 logging.exception(f"Fail put {blob}")
                 self.__open__()
                 time.sleep(1)
-                return None
         return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rag/utils/azure_spn_conn.py` around lines 73 - 83, The put() method's retry
loop is short-circuited because the except block returns None immediately;
remove the inner return so exceptions trigger the retry (call
logging.exception(...), self.__open__(), time.sleep(1) and continue), and only
return None after the for-loop if all attempts fail; keep the successful path
returning f.flush_data(len(binary)) from the try block. Reference the put()
implementation, self.conn.create_file, f.append_data, f.flush_data,
logging.exception, and self.__open__(), and model the loop behavior on
azure_sas_conn.py:put().

Comment on lines 105 to 112
 def obj_exist(self, bucket, fnm):
+    blob = f"{bucket}/{fnm}"
     try:
-        client = self.conn.get_file_client(f"{bucket}/{fnm}")
+        client = self.conn.get_blob_client(f"{blob}")
         return client.exists()
     except Exception:
-        logging.exception(f"Fail put {bucket}/{fnm}")
+        logging.exception(f"Fail put {blob}")
     return False

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look for FileSystemClient bindings and confirm available client accessors.
fd -t f 'requirements*.txt|pyproject.toml|setup.py|setup.cfg' --exec rg -nH 'azure-storage-file-datalake' {} \;
rg -nP --type=py -C2 '\bFileSystemClient\b'
python - <<'PY'
import inspect
try:
    from azure.storage.filedatalake import FileSystemClient
    methods = [m for m in dir(FileSystemClient) if 'client' in m.lower()]
    print("client-accessor methods on FileSystemClient:", methods)
    print("has get_blob_client:", hasattr(FileSystemClient, "get_blob_client"))
    print("has get_file_client:", hasattr(FileSystemClient, "get_file_client"))
except Exception as e:
    print("import failed:", e)
PY

Repository: infiniflow/ragflow

Length of output: 1071


🏁 Script executed:

# Read the complete azure_spn_conn.py file to understand the context
cat -n rag/utils/azure_spn_conn.py

Repository: infiniflow/ragflow

Length of output: 5435


🏁 Script executed:

# Search for uses of get_file_client and get_blob_client in the codebase
rg -n 'get_file_client|get_blob_client' -t py

Repository: infiniflow/ragflow

Length of output: 346


🏁 Script executed:

# Look for any tests related to this file
fd -t f 'test.*azure|azure.*test' --search-path . -i

Repository: infiniflow/ragflow

Length of output: 44


FileSystemClient has no get_blob_client, so obj_exist will always fail.

self.conn is a FileSystemClient from azure.storage.filedatalake (line 56). This class exposes get_file_client() and get_directory_client(), not get_blob_client() (which belongs to BlobServiceClient). Line 108 will raise AttributeError at runtime, caught by the broad except handler and logged misleadingly as "Fail put" instead of "Fail obj_exist". The get() method on line 96 correctly uses get_file_client() on the same object.

🔧 Proposed fix
     def obj_exist(self, bucket, fnm):
         blob = f"{bucket}/{fnm}"
         try:
-            client = self.conn.get_blob_client(f"{blob}")
+            client = self.conn.get_file_client(f"{blob}")
             return client.exists()
         except Exception:
-            logging.exception(f"Fail put {blob}")
+            logging.exception(f"Fail obj_exist {blob}")
         return False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rag/utils/azure_spn_conn.py` around lines 105 - 112, The obj_exist method
uses self.conn.get_blob_client which doesn't exist on FileSystemClient and will
always raise; change obj_exist to use the same pattern as get() by calling
self.conn.get_file_client(f"{bucket}/{fnm}") and then use the
FileClient.exists() result, and update the exception log message to something
like "Fail obj_exist {bucket}/{fnm}" to reflect the correct operation; ensure
you reference obj_exist, self.conn, get_file_client, and exists() when making
the change.

@codecov

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.16%. Comparing base (5e96c5c) to head (509c6de).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14645      +/-   ##
==========================================
- Coverage   95.59%   94.16%   -1.43%     
==========================================
  Files          10       10              
  Lines         703      703              
  Branches      112      112              
==========================================
- Hits          672      662      -10     
- Misses         21       25       +4     
- Partials       10       16       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels May 8, 2026
@jony376
Contributor Author

jony376 commented May 8, 2026

Hi, @wangq8
Thanks for your feedback. I fixed the test failure and resolved the conflicts. Please take another look at this PR. Thanks.

@KevinHuSh KevinHuSh marked this pull request as draft May 9, 2026 04:21
@KevinHuSh KevinHuSh marked this pull request as ready for review May 9, 2026 04:21
@KevinHuSh KevinHuSh merged commit 3b6eeab into infiniflow:main May 9, 2026
3 checks passed

Labels

🐞 bug Something isn't working, pull request that fix bug. ci Continue Integration size:M This PR changes 30-99 lines, ignoring generated files. 🧪 test Pull requests that update test cases.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Private datasets remain accessible to tenant members through document APIs