Fix: private dataset authorization bypass in shared dataset access checks #14645
Conversation
📝 Walkthrough

Document authorization now delegates to knowledgebase-level checks: knowledgebase existence, VALID status, owner equality, permission flag (TEAM vs ME), and tenant membership via TenantService. DocumentService.accessible loads the document and defers to KnowledgebaseService.accessible, returning False if the document is missing.

Changes

Dataset Access Permission Enforcement
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant DocumentService
    participant KnowledgebaseService
    participant TenantService
    participant DB
    Client->>DocumentService: request with doc_id, user_id
    DocumentService->>DB: get document by doc_id
    DB-->>DocumentService: doc (with kb_id) or None
    alt doc exists
        DocumentService->>KnowledgebaseService: accessible(kb_id, user_id)
        KnowledgebaseService->>DB: get_by_id(kb_id)
        DB-->>KnowledgebaseService: kb or None
        alt kb exists and VALID
            KnowledgebaseService->>KnowledgebaseService: if tenant_id == user_id -> True
            alt not owner and permission == TEAM
                KnowledgebaseService->>TenantService: get_joined_tenants_by_user_id(user_id)
                TenantService-->>KnowledgebaseService: joined_tenants
                KnowledgebaseService-->>DocumentService: True/False
            else not TEAM
                KnowledgebaseService-->>DocumentService: False
            end
        else
            KnowledgebaseService-->>DocumentService: False
        end
        DocumentService-->>Client: True/False
    else
        DocumentService-->>Client: False
    end
```
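The flow above can be sketched in simplified Python (an illustrative, in-memory model — dict-backed stores stand in for the DB and TenantService; all names here are stand-ins, not the project's actual signatures):

```python
# Simplified sketch of the two-level accessibility check described in the
# walkthrough. Dicts replace the DB; "1"/"team" stand in for the real enums.
from types import SimpleNamespace

VALID = "1"
TEAM = "team"

KBS = {
    "kb-1": SimpleNamespace(tenant_id="owner-1", permission=TEAM, status=VALID),
}
DOCS = {"doc-1": SimpleNamespace(kb_id="kb-1")}
JOINED = {"user-2": [{"tenant_id": "owner-1"}]}  # user-2 joined owner-1's team


def kb_accessible(kb_id, user_id):
    kb = KBS.get(kb_id)
    if kb is None or kb.status != VALID:
        return False
    if kb.tenant_id == user_id:        # owner is always allowed
        return True
    if kb.permission != TEAM:          # private dataset: deny non-owners
        return False
    return any(t["tenant_id"] == kb.tenant_id for t in JOINED.get(user_id, []))


def doc_accessible(doc_id, user_id):
    doc = DOCS.get(doc_id)
    if doc is None:                    # missing document -> False
        return False
    return kb_accessible(doc.kb_id, user_id)
```

Note the ordering: the owner check runs before the permission-flag check, so a private (ME) dataset still admits its owner while denying everyone else.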
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🧹 Nitpick comments (5)
api/db/services/knowledgebase_service.py (3)
513-516: ⚡ Quick win

`get_kb_by_id` fetches the knowledgebase twice.

Line 513 calls `cls.get_by_id(kb_id)`, and Line 514 calls `cls.accessible(kb_id, user_id)`, which unconditionally calls `cls.get_by_id(kb_id)` again inside `accessible`. Every `get_kb_by_id` invocation therefore pays two DB round-trips for the same row. The already-fetched `kb` object is available and can be used to short-circuit the second fetch.

♻️ Avoid the second fetch

```diff
 @classmethod
 @DB.connection_context()
 def get_kb_by_id(cls, kb_id, user_id):
-    e, kb = cls.get_by_id(kb_id)
-    if not e or not cls.accessible(kb_id, user_id):
-        return []
-    return [kb.to_dict()]
+    if not cls.accessible(kb_id, user_id):
+        return []
+    e, kb = cls.get_by_id(kb_id)
+    if not e:
+        return []
+    return [kb.to_dict()]
```

Alternatively, expose a private `_accessible(cls, kb, user_id)` overload that accepts an already-loaded model instance and avoids the redundant lookup altogether.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/db/services/knowledgebase_service.py` around lines 513 - 516, get_kb_by_id currently calls cls.get_by_id(kb_id) then calls cls.accessible(kb_id, user_id) which calls cls.get_by_id again; change get_kb_by_id to reuse the already-fetched kb by checking accessibility against the loaded instance: either add a private overload _accessible(cls, kb, user_id) that accepts the model instance and use that from get_kb_by_id, or modify accessible to accept an optional kb parameter and pass the kb returned from cls.get_by_id(kb_id) to avoid the second DB fetch; ensure you still return [] when not found or not accessible and return [kb.to_dict()] when allowed.
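As a sketch of the suggested `_accessible` overload (names, the dict-backed store, and the explicit `joined_tenants` argument are all illustrative assumptions, not the project's actual API):

```python
# Hypothetical shape of a permission check that operates on an
# already-loaded kb instance, so get_kb_by_id issues a single fetch.
from types import SimpleNamespace

VALID, TEAM = "1", "team"  # stand-ins for the real status/permission enums


def _accessible(kb, user_id, joined_tenants):
    """Permission check against a loaded kb instance (no extra DB fetch)."""
    if kb is None or kb.status != VALID:
        return False
    if kb.tenant_id == user_id:
        return True
    if kb.permission != TEAM:
        return False
    return any(t["tenant_id"] == kb.tenant_id for t in joined_tenants)


def get_kb_by_id(kb_id, user_id, db, joined_tenants):
    kb = db.get(kb_id)  # single fetch; reused for the permission check
    if kb is None or not _accessible(kb, user_id, joined_tenants):
        return []
    return [vars(kb)]
```

The same object-aware helper would also serve the `get_kb_by_name` loop below, which re-fetches each already-loaded row today.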
481-502: 💤 Low value

Add logging for the new `accessible` flow.

The new access-control path (denied for private dataset, denied for non-member, granted for team member) has no log output, making it difficult to diagnose authorization failures in production. As per coding guidelines: "Add logging for new flows".

📋 Suggested logging additions

```diff
 @classmethod
 @DB.connection_context()
 def accessible(cls, kb_id, user_id):
+    import logging
     e, kb = cls.get_by_id(kb_id)
     if not e:
+        logging.debug("accessible: kb_id=%s not found", kb_id)
         return False
     if kb.status != StatusEnum.VALID.value:
+        logging.debug("accessible: kb_id=%s status=%s invalid", kb_id, kb.status)
         return False
     if kb.tenant_id == user_id:
         return True
     if kb.permission != TenantPermission.TEAM.value:
+        logging.debug("accessible: kb_id=%s is private, denying user_id=%s", kb_id, user_id)
         return False
     joined_tenants = TenantService.get_joined_tenants_by_user_id(user_id)
-    return any(tenant["tenant_id"] == kb.tenant_id for tenant in joined_tenants)
+    result = any(tenant["tenant_id"] == kb.tenant_id for tenant in joined_tenants)
+    if not result:
+        logging.debug("accessible: kb_id=%s tenant_id=%s not in joined tenants for user_id=%s", kb_id, kb.tenant_id, user_id)
+    return result
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/db/services/knowledgebase_service.py` around lines 481 - 502, The accessible(cls, kb_id, user_id) method lacks logs for its access-control decisions; add structured debug/info logs at each decision point (e.g., when get_by_id returns no kb, when kb.status != StatusEnum.VALID, when tenant_id matches user_id and access is granted, when permission != TenantPermission.TEAM and access denied, and when checking TenantService.get_joined_tenants_by_user_id results) including kb_id, user_id, kb.tenant_id and the reason for grant/deny; use the module logger (the logger used elsewhere in this file) and keep messages concise and consistent with existing logging style.
527-531: 💤 Low value

`get_kb_by_name` re-fetches each already-loaded `kb` inside `accessible`.

For every knowledgebase returned by `cls.query(...)` at Line 527, `cls.accessible(kb.id, user_id)` at Line 529 issues another `get_by_id` DB call for the same row. While in practice the loop usually terminates after the first match, this is the same redundant-fetch pattern as in `get_kb_by_id`.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/db/services/knowledgebase_service.py` around lines 527 - 531, get_kb_by_name is re-fetching each knowledgebase because it calls cls.accessible(kb.id, user_id) which does a DB get_by_id; change the check to avoid a redundant fetch by either (A) updating accessible to accept a KB object (e.g., accessible(kb_or_id, user_id)) and branch on type, or (B) add a new helper like accessible_obj(kb, user_id) that contains the same permission logic but operates on the already-loaded kb; then update get_kb_by_name to call the object-aware check (replace cls.accessible(kb.id, user_id) with the new/updated call) so you use the in-memory kb and return kb.to_dict() without an extra DB round-trip.

test/unit_test/api/db/services/test_dataset_access_permissions.py (1)
32-72: ⚡ Quick win

Missing test for the dataset owner always-accessible path.

`KnowledgebaseService.accessible` grants access when `kb.tenant_id == user_id` (the owner path, Lines 495-496 in `knowledgebase_service.py`). There is no test covering this branch. If that condition is ever accidentally removed or inverted, the regression would go undetected.

✅ Suggested additional test

```python
def test_owner_can_always_access_private_dataset(monkeypatch):
    kb = SimpleNamespace(
        id="kb-private",
        tenant_id="owner-1",
        permission=TenantPermission.ME.value,
        status=StatusEnum.VALID.value,
    )
    monkeypatch.setattr(KnowledgebaseService, "get_by_id", classmethod(lambda cls, kb_id: (True, kb)))
    # Owner should always have access regardless of permission mode
    assert _unwrapped_kb_accessible()(KnowledgebaseService, "kb-private", "owner-1") is True
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/unit_test/api/db/services/test_dataset_access_permissions.py` around lines 32 - 72, add a unit test that covers the owner-access branch: create a SimpleNamespace kb with id "kb-private", tenant_id "owner-1", permission TenantPermission.ME.value and status StatusEnum.VALID.value, monkeypatch KnowledgebaseService.get_by_id to return (True, kb), then assert that _unwrapped_kb_accessible()(KnowledgebaseService, "kb-private", "owner-1") is True (name the test e.g. test_owner_can_always_access_private_dataset) to ensure KnowledgebaseService.accessible treats the tenant owner as always allowed.

api/db/services/document_service.py (1)
680-684: 💤 Low value

Add logging for the new `DocumentService.accessible` flow.

The updated delegation path (document not found, or KB access denied) produces no log output. As per coding guidelines: "Add logging for new flows".

📋 Suggested logging additions

```diff
 @classmethod
 @DB.connection_context()
 def accessible(cls, doc_id, user_id):
     e, doc = cls.get_by_id(doc_id)
     if not e:
+        logging.debug("DocumentService.accessible: doc_id=%s not found", doc_id)
         return False
     return KnowledgebaseService.accessible(doc.kb_id, user_id)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/db/services/document_service.py` around lines 680 - 684, Add logging to DocumentService.accessible: after calling cls.get_by_id(doc_id) log an info/debug message when the document is not found (include doc_id and user_id) and return False; if the document is found, log that access is being delegated to KnowledgebaseService.accessible (include doc_id, doc.kb_id and user_id) and optionally log the result returned by KnowledgebaseService.accessible before returning it. Use the module's logger (or processLogger) consistent with other services.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4a6eadbb-50cc-4b7e-91e4-097193bd7958
📒 Files selected for processing (3)
- api/db/services/document_service.py
- api/db/services/knowledgebase_service.py
- test/unit_test/api/db/services/test_dataset_access_permissions.py
LGTM
Hi @jony376 please help to fix the unit test failure.
…flow#14185)

## Summary
Fixes infiniflow#14159 — files from different datasets can overwrite each other in Azure Blob storage.

## Problem
Both `azure_spn_conn.py` and `azure_sas_conn.py` ignore the `bucket` parameter in all storage operations (`put`, `get`, `rm`, `obj_exist`, `get_presigned_url`). Files are stored flat using only the filename, so two datasets containing a file with the same name will overwrite each other.

The MinIO and S3 implementations correctly use the bucket (typically the knowledge base ID) as a path prefix to create logical folder isolation:
- MinIO: uses `use_prefix_path` decorator → `{orig_bucket}/{fnm}`
- S3: uses `use_prefix_path` decorator → `{prefix_path}/{bucket}/{fnm}`

## Fix
Prepend `{bucket}/` to the file path in all 5 operations across both Azure connector files:

| File | Methods fixed |
|------|---------------|
| `azure_spn_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |
| `azure_sas_conn.py` | `put`, `get`, `rm`, `obj_exist`, `get_presigned_url` |

This matches the existing convention where `bucket` is the knowledge base ID used as a directory prefix.

## ⚠️ Migration Note
Existing Azure SPN/SAS deployments have files stored without the bucket prefix. After this fix, new files will be stored under `{bucket}/(unknown)` while existing files remain at `(unknown)`. A one-time migration script or manual file move may be needed for existing deployments. New deployments are unaffected.

## Testing
- Verified the fix is consistent across all 5 methods in both files
- The `health()` method is intentionally left unchanged as it uses a hardcoded test filename without bucket semantics

Co-authored-by: Jin Hai <[email protected]>
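The isolation the fix provides can be shown with a minimal sketch (a plain dict stands in for the blob container; the `FakeAzureStore` name and its methods are illustrative, not the real Azure SDK surface):

```python
# Minimal model of the bucket-prefix convention: files from different
# datasets with the same name land under distinct {bucket}/ keys.
class FakeAzureStore:
    def __init__(self):
        self._blobs = {}

    @staticmethod
    def _path(bucket, fnm):
        return f"{bucket}/{fnm}"  # the prefix added by the fix

    def put(self, bucket, fnm, data):
        self._blobs[self._path(bucket, fnm)] = data

    def get(self, bucket, fnm):
        return self._blobs.get(self._path(bucket, fnm))
```

Without the `_path` prefix, the second `put` of a same-named file from another dataset would silently overwrite the first — which is exactly the collision described above.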
…tdated models (infiniflow#14637)

### What problem does this PR solve?
Feat: support local provider for code exec component & remove some outdated models

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: Change route name

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes infiniflow#14412. `common.metadata_utils.meta_filter` evaluates user-defined metadata conditions in Python after `DocMetadataService.get_flatted_meta_by_kbs` loads the entire `meta_fields` table into memory. Past a few thousand documents per knowledge base this becomes a memory bottleneck and a wasted ES round-trip — every filter request currently fetches up to 10000 metadata rows even when the resulting `doc_ids` list is tiny. This PR adds an ES push-down path that translates the same filter language into a `bool` query and returns just the matching document IDs.

**Changes**
- `common/metadata_es_filter.py` *(new)*: pure-Python translator from the RAGflow filter list to ES DSL. Covers every operator the in-memory path supports (`=`, `≠`, `>`, `<`, `≥`, `≤`, `in`, `not in`, `contains`, `not contains`, `start with`, `end with`, `empty`, `not empty`) with `case_insensitive: true` on `prefix` and `wildcard` for parity with the existing lower-cased Python comparisons. User wildcard metacharacters are escaped before being injected into `wildcard` patterns. Negative operators (`≠`, `not in`, `not contains`, ranges) are wrapped with an `exists` guard so they do not accidentally match documents missing the key, matching the legacy `if k not in metas` behaviour.
- `api/db/services/doc_metadata_service.py`: new `DocMetadataService.filter_doc_ids_by_meta_pushdown(kb_ids, filters, logic)` that returns the doc IDs ES matched, or `None` to signal the caller should fall back to the in-memory path. Returns `None` when the active doc store is Infinity (`meta_fields` is a JSON column, not a dotted-object mapping), when any filter cannot be expressed in DSL (`UnsupportedMetaFilter`), or when the ES request or metadata index lookup errors.
- `common/metadata_utils.py`: `apply_meta_data_filter` accepts an optional `kb_ids` argument. When supplied, conditions go through push-down first via a new `_try_meta_pushdown` helper; on `None` the function falls back to the original `meta_filter` call. Default behaviour is unchanged for callers that don't pass `kb_ids`.
- Updated all four callers (`agent/tools/retrieval.py`, `api/db/services/dialog_service.py` ×2, `api/apps/services/dataset_api_service.py`, `api/apps/sdk/session.py`) to forward `kb_ids` so the push-down path is exercised in production.
- `test/unit_test/common/test_metadata_es_filter.py` *(new)*: 35 unit tests covering every operator's DSL shape, value coercion (`ast.literal_eval`, lowercasing, ISO-date pass-through), wildcard escaping, OR-logic wrapping that protects negative clauses, and the doc-ID extractor.

**Behaviour preserved**
- The in-memory `meta_filter` is untouched and still services every fallback case (Infinity backend, unknown operators, ES outages).
- The eligibility / credibility / issue-multiplier semantics described in the LLM-driven `auto` and `semi_auto` modes still hand the LLM the full in-memory `metas` dict to choose conditions from. Only the *evaluation* of those generated conditions is pushed down.
- Existing tests in `test/unit_test/common/test_metadata_filter_operators.py` continue to pass (14/14).

**Test plan**
- `pytest test/unit_test/common/test_metadata_es_filter.py` — 35 passed.
- `pytest test/unit_test/common/test_metadata_filter_operators.py` — 14 passed.
- `ruff check` clean on every modified file.
- Reviewer please validate the ES query shapes against a live cluster — particularly `case_insensitive` on `wildcard` and `prefix` (requires ES 7.10+) and the `exists` + `must_not` pairing for `≠`.

**Notes**
- The first cut caps each push-down request at 10000 results, matching the existing `get_flatted_meta_by_kbs` limit, and logs a warning when the cap is hit. A `search_after` follow-up would let us drop the cap entirely once the push-down path is validated.
- Operator parity with the in-memory path is exact for the canonical unicode operators (`≥`, `≤`, `≠`) used internally; the ASCII aliases (`>=`, `<=`, `!=`) are normalised by `convert_conditions` before they reach the translator.

### Type of change
- [x] Performance Improvement

---------

Co-authored-by: sxxtony <[email protected]>
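The translation idea can be sketched for a couple of operators (an illustrative toy, not the real `metadata_es_filter` module — the field naming, the AND-only combination, and the tiny operator subset are assumptions):

```python
# Toy translator: AND-combine a few metadata filter operators into an
# Elasticsearch bool query, with an exists guard on the negative operator
# so documents missing the key are not accidentally matched.
def to_es_bool(filters):
    must, must_not = [], []
    for f in filters:
        field = f"meta_fields.{f['key']}"
        op, value = f["op"], f["value"]
        if op == "=":
            must.append({"term": {field: value}})
        elif op == "in":
            must.append({"terms": {field: value}})
        elif op == "≠":
            must.append({"exists": {"field": field}})  # guard: key must exist
            must_not.append({"term": {field: value}})
        else:
            # mirror the fallback contract: unsupported operators bail out
            raise ValueError(f"cannot push down operator {op!r}")
    return {"bool": {"must": must, "must_not": must_not}}
```

The `exists` guard on `≠` is the interesting part: a bare `must_not` + `term` would also match documents that have no such key at all, diverging from the in-memory `if k not in metas` behaviour the PR preserves.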
## Summary
Update the Turkish locale file to match the latest English locale keys.

## Changes
- Add missing Turkish translations for the new Skills and Skill Search sections
- Add newly introduced common, header, dataset, settings, and agent workflow strings
- Align renamed flow keys such as file format options and list operations with the English source
- Add empty-state strings for skill spaces

## Validation
- Compared web/src/locales/en.ts and web/src/locales/tr.ts: 0 missing keys, 0 extra keys
- Checked jsonjoy-builder locale: Turkish is already complete
- Checked translated README variants: no new Turkish-specific documentation gap found
- VS Code diagnostics: no errors in web/src/locales/tr.ts

Co-authored-by: bakiburakogun <[email protected]>
… vllm (infiniflow#14614)

### What problem does this PR solve?
Three Go drivers had `CheckConnection` returning a hardcoded `no such method` error, even though each one already has a working `ListModels` that hits the configured base URL with the configured API key. So the "Check connection" button in the model provider UI always failed for these three providers, even when the underlying setup was fine.

Affected drivers:
- `internal/entity/models/ollama.go`
- `internal/entity/models/lmstudio.go`
- `internal/entity/models/vllm.go`

This is a real user-facing gap because Ollama and LM Studio are two of the most popular local LLM runners, and vLLM is widely used for self-hosted deployments.

### What this PR includes
For each of the three drivers, replace the stub with a small implementation that calls `ListModels` and returns its error:

```go
func (o *OllamaModel) CheckConnection(apiConfig *APIConfig) error {
	_, err := o.ListModels(apiConfig)
	return err
}
```

This is the exact pattern that xai, moonshot, deepseek, aliyun, and gitee already use for the same method. No JSON change. No factory change. No interface change.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)

### How was this tested?
- `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0.
- The full ModelDriver interface still resolves on each driver (NewInstance, Name, ChatWithMessages, ChatStreamlyWithSender, Encode, Rerank, ListModels, Balance, CheckConnection).
- Pattern parity with the existing xai, moonshot, deepseek, aliyun, and gitee CheckConnection methods.

Closes infiniflow#14609
### What problem does this PR solve?
The SiliconFlow Go driver shipped with a stub `Balance` method that returned `no such method`, even though SiliconFlow exposes a public `GET /v1/user/info` endpoint that returns the account balance per currency. So the "Balance" panel in the model provider UI always shows an error for SiliconFlow tenants, while it already works for Moonshot and Gitee. This PR fills the gap.

### What this PR includes
- `conf/models/siliconflow.json`: add `"balance": "user/info"` under `url_suffix` so the driver builds the URL from config.
- `internal/entity/models/siliconflow.go`: replace the `Balance` stub with a real implementation. Adds a small local response type that matches the upstream shape.

No factory change. No interface change.

### How the driver works
- Validate `apiConfig` and the API key, resolve the region with a default fallback, and build the URL from `BaseURL[region] + URLSuffix.Balance`.
- GET the URL with `Authorization: Bearer <api_key>`.
- Parse the upstream response. SiliconFlow returns balance fields as strings, so the driver parses them with `strconv.ParseFloat`. It prefers `totalBalance` over `balance` when both are present.
- Return `{"balance": <float>, "currency": "CNY"}`, the same shape the Moonshot driver returns. The UI can render it with no provider-specific code.

### Edge cases
- Missing or empty API key returns a clear local error before any HTTP call.
- An unknown region falls back to the default base URL.
- Empty `balance` and `totalBalance` returns a clear "no balance info in response" error rather than a zero-value silent success.
- Non-numeric balance string returns a clear parse error.
- Non-200 responses propagate the upstream status line and body.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?
- `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0.
- The full method set on `SiliconflowModel` still matches the `ModelDriver` interface.
- Pattern parity with the existing Moonshot and Gitee Balance implementations.

Closes infiniflow#14642
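The parsing rule described above (string-typed fields, `totalBalance` preferred over `balance`, a hard error rather than a silent zero) can be sketched in Python for brevity — the real driver is Go and uses `strconv.ParseFloat`; the helper name here is illustrative:

```python
# Sketch of the balance-field selection and parsing described in the PR.
def parse_balance(info: dict) -> float:
    """Prefer totalBalance over balance; both arrive as strings upstream."""
    raw = info.get("totalBalance") or info.get("balance")
    if not raw:
        # mirror the driver's "no balance info in response" error
        raise ValueError("no balance info in response")
    return float(raw)  # raises ValueError on a non-numeric string
```

Treating empty strings as missing (via the falsy check) is what turns a zero-value silent success into a clear error, matching the edge-case list above.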
### What problem does this PR solve?
1. **Implement `OpenRouter` Provider:** Fully support OpenRouter AI models (e.g., `gemma`, `minimax`). Includes robust handling of Server-Sent Events (SSE) streams, error event interception, and proper parsing of both `reasoning_content` and standard `content`.
2. **Fix BaseURL Resolution Bug:** Fixed a critical edge case in region configuration parsing. Added a strict empty string check (`*apiConfig.Region != ""`) alongside the `nil` check. This ensures that if the UI passes an empty string, the system correctly falls back to the `"default"` region, preventing `unsupported protocol scheme ""` errors during HTTP requests.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
Closes infiniflow#14631

### What problem does this PR solve?
The DeepSeek Go driver shipped with a stub `Balance` method that returned `no such method`, even though DeepSeek exposes a public `GET /user/balance` endpoint that works with the same Bearer token used for chat. So the "Balance" panel in the model provider UI always shows an error for DeepSeek tenants, while it already works for Moonshot and Gitee. This PR fills the gap.

### What this PR includes
- `conf/models/deepseek.json`: add `"balance": "user/balance"` under `url_suffix` so the driver can build the URL from config the same way the other endpoints do.
- `internal/entity/models/deepseek.go`: replace the `Balance` stub with a real implementation. Adds a small local response type `deepseekBalanceResponse` that matches the upstream shape.

No factory change. No interface change.

### How the driver works
- Validate `apiConfig` and the API key, resolve the region (with a `default` fallback), and build the URL from `BaseURL[region] + URLSuffix.Balance`.
- GET the URL with `Authorization: Bearer <api_key>`.
- Parse the upstream response:

```json
{
  "is_available": true,
  "balance_infos": [
    {"currency": "USD", "total_balance": "10.00", ...},
    {"currency": "CNY", "total_balance": "70.00", ...}
  ]
}
```

`total_balance` is a string in the upstream API, so the driver parses it with `strconv.ParseFloat`.
- Return the first balance entry as `{"balance": <float>, "currency": <string>}`, the same shape the Moonshot driver returns. The UI can render it with no provider-specific code.

### Edge cases
- Missing or empty API key returns a clear local error before any HTTP call.
- Empty `balance_infos` returns a clear "no balance info in response" error rather than a zero-value silent success.
- Non-numeric `total_balance` returns a clear parse error.
- Non-200 responses propagate the upstream status line and body so the user can see why the call failed.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?
- `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0.
- The full method set on `DeepSeekModel` still matches the `ModelDriver` interface.
- Pattern parity with the existing Moonshot and Gitee Balance implementations.
…nfiniflow#14636)

### What problem does this PR solve?
The NVIDIA Go driver added in infiniflow#14623 has a real chat path, but `ListModels` and `CheckConnection` are stubs that always return `no such method`. So:
- The model picker cannot auto-populate available NVIDIA NIM model ids. Users have to type the full id by hand (e.g. `abacusai/dracarys-llama-3.1-70b-instruct`).
- The "Check connection" button always fails for NVIDIA, even when the base URL is reachable and the API key is accepted.

NVIDIA NIM is OpenAI-compatible. `/v1/models` works with the same Bearer token used for chat. The `conf/models/nvidia.json` file already wires the `models` url_suffix, so no config change is needed.

### What this PR includes
- `internal/entity/models/nvidia.go`:
  - `ListModels` now calls `GET ${BaseURL}/${URLSuffix.Models}`, parses `response.data[*].id`, and returns the list. Same shape as the moonshot, xai, and openai drivers.
  - `CheckConnection` now calls `ListModels` and returns its error. Same pattern xai, moonshot, deepseek, aliyun, and gitee already use.

`Balance`, `Encode`, and `Rerank` are still stubs in this PR and can be added in follow-ups. No JSON change. No factory change. No interface change.

### How the implementation works
- Region resolution falls back to `default` when the supplied region is unknown, so a stray region value does not break a valid request.
- The Authorization header is only set when `apiConfig` and `ApiKey` are non-nil and non-empty. This avoids a nil-pointer dereference and lets self-hosted NIM deployments without a key still work.
- Non-200 responses propagate the upstream status line and body so the user sees a real error message.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)

### How was this tested?
- `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0.
- The full method set on `NvidiaModel` still matches the `ModelDriver` interface.
- Pattern parity with the existing xai, moonshot, deepseek, aliyun, gitee, and openai drivers.

Closes infiniflow#14635
…nfiniflow#14347)

## Summary
Fixes file collision between different datasets when using Azure Blob storage (SPN or SAS authentication).

## Bug
`azure_spn_conn.py` and `azure_sas_conn.py` ignored the `bucket` parameter entirely, storing all files flat with just the filename. This caused files with the same name from different datasets (knowledge bases) to overwrite each other.

## Fix
Prepend `bucket/` as a path prefix in all methods (`put`, `rm`, `get`, `obj_exist`, `get_presigned_url`, `health`) to match the behavior of MinIO and S3 implementations.

## Changes
- **rag/utils/azure_spn_conn.py**: Added `{bucket}/` prefix to file paths in all operations
- **rag/utils/azure_sas_conn.py**: Same fix applied for consistency (also noted in the original issue)

## Testing
Manual verification: files from different datasets now stored under distinct `bucket/` prefixes, preventing collisions.

Fixes infiniflow#14159

Co-authored-by: Hunter <[email protected]>
Co-authored-by: Jin Hai <[email protected]>
### What problem does this PR solve?
The Gitee AI Go driver shipped with a stub \`Rerank\` method that
returned \`Rerank not implemented\`, even though
\`conf/models/gitee.json\` already wires the rerank URL suffix at
\`\"rerank\": \"rerank\"\`. The same config did not list any
rerank model, so the picker had nothing to select.
So a Gitee tenant could not use BAAI/bge-reranker-v2-m3 as a reranker
through the Go layer today, even though the
infrastructure was one config entry and one method body away.
### What this PR includes
- \`conf/models/gitee.json\`: add \`BAAI/bge-reranker-v2-m3\` to the
\`models\` array.
- \`internal/entity/models/gitee.go\`: replace the \`Rerank\` stub with
a real implementation. Adds two small local types
that match the OpenAI-compatible \`/rerank\` shape already used by the
SiliconFlow and ZhipuAI drivers.
No factory change. No interface change.
### How the driver works
- Validate \`apiConfig\` and the API key, validate the model name,
resolve the region with a default fallback, build the
URL from \`BaseURL[region] + URLSuffix.Rerank\`.
- Use a per-call \`context.WithTimeout(30s)\` and
\`http.NewRequestWithContext\`, matching the pattern the
recently merged Aliyun Encode and the OpenAI driver already use.
- Send \`{model, query, documents, top_n, return_documents:false}\` in
the body.
- Parse \`results[*].relevance_score\` and copy each score into the
output slice indexed by \`results[*].index\`, so the
output order matches the input order even if the API returns items in a
different order.
- Empty input returns \`[]float64{}\` with no HTTP call.
- An out-of-range result index returns a clear error rather than
silently skipping the entry.
- Non-200 responses propagate the upstream status line and body.
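The index-based reordering step above can be sketched outside Go. The following is an illustrative Python version, assuming the OpenAI-compatible `/rerank` response carries `results[*].index` and `results[*].relevance_score` as the description says; it is not the driver itself.

```python
# Illustrative Python sketch (the driver itself is Go) of the ordering step
# described above: each relevance score is written into the output slot
# given by results[*].index, so output order follows input order even when
# the API returns results reordered.

def order_scores(results: list[dict], num_docs: int) -> list[float]:
    scores = [0.0] * num_docs
    for r in results:
        idx = r["index"]
        if idx < 0 or idx >= num_docs:
            # Out-of-range index is a clear error, not a silent skip.
            raise ValueError(f"result index {idx} out of range")
        scores[idx] = r["relevance_score"]
    return scores

# Results arrive reordered; the output still matches the input order.
resp = [{"index": 2, "relevance_score": 0.9},
        {"index": 0, "relevance_score": 0.1},
        {"index": 1, "relevance_score": 0.5}]
print(order_scores(resp, 3))  # → [0.1, 0.5, 0.9]
```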
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### How was this tested?
- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
returns exit 0.
- The full method set on \`GiteeModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing SiliconFlow Rerank and the recently
merged ZhipuAI Rerank (infiniflow#14608).
Closes infiniflow#14655
### What problem does this PR solve? Use right key in error text. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
…iflow#14649) ### Related issues Closes infiniflow#14648 ### What problem does this PR solve? This PR fixes an authorization flaw in `POST /files/link-to-datasets`. Before this change, the endpoint only checked whether the supplied `file_ids` and `kb_ids` existed. It did not verify whether the authenticated user was actually allowed to access those files or target datasets. As a result, an authenticated user who knew valid IDs could relink another user's files to arbitrary datasets. This was especially risky because the relinking flow is state-changing: the background worker removes existing file-document mappings and then recreates documents under the attacker-supplied dataset IDs. This change makes the route enforce the same permission model already used by nearby file and document operations: - each resolved file must pass `check_file_team_permission(...)` - each target dataset must pass `check_kb_team_permission(...)` - authorization is enforced before scheduling background relinking work ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Testing - Added regression coverage in `test/testcases/test_web_api/test_file_app/test_file2document_routes_unit.py` - Covered: - unauthorized file access is rejected - unauthorized dataset access is rejected - existing success path still returns immediately after scheduling background work - Attempted to run: - `python -m pytest test\\testcases\\test_web_api\\test_file_app\\test_file2document_routes_unit.py -q` - Local execution in this workspace is currently blocked by missing test dependencies during bootstrap, including `ragflow_sdk` --------- Co-authored-by: jony376 <[email protected]>
### What problem does this PR solve? Since API is updated, CLI login failed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <[email protected]>
### What problem does this PR solve? 1. implement `rerank`, `embedding`, `balance`, `checkConnet` method for `OpenRouter` 2. delete `chat` method in `internal/entity/models/volcengine.go` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring
### What problem does this PR solve?
The Aliyun Go driver shipped with a stub \`Encode\` method that returned
\`no such method\`, even though \`conf/models/aliyun.json\` already
wires the OpenAI-compatible embeddings URL suffix at
\`compatible-mode/v1/embeddings\`. The same config also did not list any
embedding models, so the picker had nothing to select.
So an Aliyun tenant who wanted to use Tongyi text-embedding-v3 or v4 in
the Go layer could not, even though the upstream endpoint is public and
uses the standard \`POST /v1/embeddings\` shape that the SiliconFlow and
ZhipuAI
drivers already support.
This PR fills the gap.
### What this PR includes
- \`conf/models/aliyun.json\`: add \`text-embedding-v4\` and
\`text-embedding-v3\` to the \`models\` array.
- \`internal/entity/models/aliyun.go\`: replace the \`Encode\` stub with
a real implementation. Adds a small local response type that matches the
OpenAI-compatible shape.
No factory change. No interface change.
### How the driver works
- Validate \`apiConfig\` and the API key, validate the model name,
resolve the region with a default fallback, build the
URL from \`BaseURL[region] + URLSuffix.Embedding\`.
- Send all input texts in one request as the \`input\` array, the same
OpenAI-compatible shape the SiliconFlow \`Encode\`
uses.
- Parse \`data[*].embedding\` and copy each slice into a \`[][]float64\`
indexed by \`data[*].index\` so the output order matches the input order
even if the API returns items in a different order.
- Handle both \`float64\` and \`float32\` element types.
- Empty input returns \`[][]float64{}\` with no HTTP call.
- Non-200 responses propagate the upstream status line and body.
- A final pass checks every input slot got a vector and returns a clear
error if any slot is still nil.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### How was this tested?
- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
returns exit 0.
- The full method set on \`AliyunModel\` still matches the
\`ModelDriver\` interface.
- Pattern parity with the existing SiliconFlow Encode implementation.
Closes infiniflow#14646
---------
Co-authored-by: Jin Hai <[email protected]>
…niflow#14618) (infiniflow#14625) ### What problem does this PR solve? Closes infiniflow#14618. The `GET /v1/document/get/<doc_id>` endpoint in `api/apps/document_app.py` was protected only by `@login_required` and called `DocumentService.get_by_id(doc_id)` without verifying that the document's knowledge base belonged to the requesting user's tenant. Any authenticated user who knew (or guessed) a document ID could download files belonging to any other tenant — a cross-tenant IDOR. This PR adds a `DocumentService.accessible(doc_id, current_user.id)` check before serving the file. The helper already exists and joins `Document` → `Knowledgebase` → `UserTenant` to verify the requesting user belongs to the tenant that owns the document's KB. The same pattern is already used by `api/apps/restful_apis/document_api.py` and mirrors the tenant scoping in the SDK route at `api/apps/sdk/doc.py`. The check returns the existing `"Document not found!"` error for both non-existent and inaccessible documents, so attackers cannot use the response to enumerate valid doc IDs across tenants. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Other (please describe): Security fix (cross-tenant IDOR / authorization bypass)
…l tag (infiniflow#14613) ## Summary - **Collapsible thinking**: Replace `<section>` with `<details>` for `<think>` content, so model thinking output is collapsed by default (click to expand). Works for all models that output `<think>` tags (Qwen3, DeepSeek, Gemini, Claude, etc.). - **Fix double thinking tags**: When reasoning/deep research mode is enabled in knowledge base chat, both the retrieval progress and model thinking were wrapped in `<think>` tags, producing two "Thinking..." blocks. Now retrieval progress uses a dedicated `<retrieving>` tag rendered as a separate "Retrieving..." collapsible with a distinct green accent. ### Before - Thinking content displayed as flat gray-bordered `<section>`, occupying significant screen space - Deep research + model thinking both use `<think>` → two identical "Thinking..." blocks ### After - Thinking content collapsed by default in a `<details>` element, click "Thinking..." to expand - Deep research shows "Retrieving..." (green border), model thinking shows "Thinking..." (gray border) ## Changes **Backend (`api/db/services/dialog_service.py`)** - Deep research callback: replace `start_to_think`/`end_to_think` marker flags with direct `<retrieving>`/`</retrieving>` answer text **Frontend** - `web/src/utils/chat.ts`: `replaceThinkToSection()` now uses `<details>` instead of `<section>`; add new `replaceRetrievingToSection()` - 4 tsx files: import and pipe `replaceRetrievingToSection`, whitelist `details`, `summary`, `retrieving` in DOMPurify `ADD_TAGS` - 4 less files: `section.think` → `details.think` with `<summary>` styles; add `details.retrieving` with green accent; dark mode and RTL variants ## Test plan - [ ] Open a chat WITHOUT knowledge base, ask a question to a model with thinking (e.g. Qwen3) → thinking content should be collapsed by default, click "Thinking..." to expand - [ ] Open a chat WITH knowledge base and reasoning enabled, ask a question → "Retrieving..." (green) shows retrieval progress, "Thinking..." 
(gray) shows model thinking, each independently collapsible - [ ] Verify dark mode renders correctly for both collapsible blocks - [ ] Verify RTL layout renders correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: wanghualoong <[email protected]> Co-authored-by: Claude Opus 4.6 <[email protected]>
…low#14663) ### What problem does this PR solve? add compatibility route for document download under /v1 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
…nfiniflow#14600)

## Summary

- When an agent workflow has multiple `UserFillUp` pause points, `canvas.run()` calls `reset(True)` on **all** components at the start of each run. This clears outputs from components that completed in prior runs, so downstream references like `{Agent:XXX@content}` resolve to `None`.
- This fix only resets components on the **current execution path** (`self.path`), preserving outputs from previously completed components.

## Problem

In a multi-step agent (e.g. draft email → user confirms → send email):

1. First `run()`: Agent drafts content, UserFillUp pauses for user input → Agent output is saved
2. Second `run()`: User submits input, but `reset(True)` clears **all** components including the Agent that already completed
3. Email component references `{Agent:XXX@content}` → gets `None` instead of the draft

This affects **all** agents that reference upstream component outputs after a UserFillUp pause point.

## Fix

```python
# Before: reset ALL components
for k, cpn in self.components.items():
    self.components[k]["obj"].reset(True)

# After: only reset components on current execution path
path_set = set(self.path)
for k, cpn in self.components.items():
    if k in path_set:
        self.components[k]["obj"].reset(True)
```

`self.path` already tracks the current execution path. For agents without UserFillUp (single run), `path` contains all components, so behavior is unchanged.

## Test plan

- [x] Agent with single UserFillUp: outputs from prior components are preserved after resume
- [x] Agent with multiple UserFillUp: each resume preserves all previously completed outputs
- [x] Agent without UserFillUp: behavior unchanged (all components in path, all reset)
- [x] Webhook-triggered agents: unaffected (path includes all components on first run)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: wanghualoong <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>
### What problem does this PR solve? Update the type of tenant_rerank_id in validation. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve? 1. Update API URL 2. Add password encryption ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <[email protected]>
…ck (infiniflow#14669) ### What problem does this PR solve? 1. **Fix Global State Pollution in Local Providers (Critical Bug):** - Resolved a severe concurrency and architecture issue in `model_service.go`. Previously, `ListSupportedModels` would permanently overwrite the global provider singleton with a localized URL instance (`driver.NewInstance`). This caused cross-request contamination in multi-tenant environments. - Fixed `CheckProviderConnection` for local models (LM Studio, vLLM, Ollama). It now properly creates a localized driver copy and injects the `base_url` before testing the connection, entirely eliminating the false-positive `missing base URL` error without polluting the global state. 2. **Implement `VolcEngine` Embeddings:** - Fully implemented the `Encode` method for the `volcengine` provider, enabling text embedding capabilities for VolcEngine models. 3. **Enhance Region Validation in `SiliconFlow`:** - Added a strict empty string check (`*apiConfig.Region != ""`) alongside the existing `nil` check when parsing regions. This ensures that if an empty string is passed, the system safely falls back to the `"default"` region, preventing malformed URL requests and `unsupported protocol scheme` errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve? ``` RAGFlow(user)> logout; SUCCESS ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <[email protected]>
…rash (infiniflow#14651) (infiniflow#14666)

### What problem does this PR solve?

Fixes infiniflow#14651. `kb_prompt()` in `rag/prompts/generator.py` crashes with `AttributeError: 'NoneType' object has no attribute 'items'` during agent citation generation when a retrieved chunk carries `document_metadata: null`.

**Root cause.** The crash happens at `rag/prompts/generator.py:132-133`:

```python
meta = ck.get("document_metadata", {})
for k, v in meta.items():
```

`dict.get(key, default)` only returns the default when the key is *missing*. When the key is present with an explicit `None` value, `.get()` returns `None`, and `.items()` crashes.

**How the chunk gets `None`.** It's a round-trip inside RAGFlow itself, not bad input from retrieval:

1. The agent stores retrieved chunks via `agent/canvas.py:814`, which routes them through `chunks_format()`.
2. `rag/prompts/generator.py:61` canonicalizes the field with `chunk.get("document_metadata")` (no default), so chunks without metadata become `{"document_metadata": None, ...}`.
3. `agent/component/agent_with_tools.py:314` feeds those canonicalized chunks back into `kb_prompt()` for citation generation, and `.get("document_metadata", {})` no longer protects us.

**Fix.** One-line change at `rag/prompts/generator.py:132`: use `ck.get("document_metadata") or {}` so an explicit `None` is also coerced to `{}`. The line-61 `None` is intentionally part of the API/UI contract — the frontend handles it via optional chaining (`web/src/components/markdown-content/index.tsx:184`, `web/src/pages/next-search/search-view.tsx:217`) — so the fix belongs at the consumer, not the producer.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
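The `.get()` pitfall at the heart of this bug is easy to reproduce standalone:

```python
# dict.get(key, default) falls back only when the key is ABSENT; a key
# present with an explicit None still returns None, which is what crashed
# kb_prompt(). `ck.get(...) or {}` coerces that None as well.

chunk = {"document_metadata": None}

meta = chunk.get("document_metadata", {})
print(meta)  # → None — calling meta.items() would raise AttributeError

meta = chunk.get("document_metadata") or {}
print(meta)  # → {} — safe to iterate
```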
…nfiniflow#14684) ### What problem does this PR solve? Do not bypass threshold for rerank when metadata filter is enabled ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve? Restrict file move operations: prevent moving a folder to itself or to one of its own subfolders. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)
…nfiniflow#14592) Closes infiniflow#14590 ## Self Checks - [x] I have searched for existing issues [search for existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones. - [x] I confirm that I am using English to submit this report ([Language Policy](infiniflow#5910)). - [x] Non-english title submitions will be closed directly ( 非英文标题的提交将会被直接关闭 ) ([Language Policy](infiniflow#5910)). - [x] Please do not modify this template :) and fill in all the required fields. ## RAGFlow workspace code commit ID `a1b2c3d4e5f67890123456789abcdef12345678` ## RAGFlow image version `0.13.1` ## Other environment information - Hardware parameters: N/A - OS type: Linux 6.17.0-22-generic - Others: API key authentication via `Authorization: Bearer <token>` ## Actual behavior The chatbot API endpoints: - `POST /chatbots/<dialog_id>/completions` - `GET /chatbots/<dialog_id>/info` validate only that the bearer token exists in `APIToken`, but do not verify that `dialog_id` belongs to the same tenant as that token. Current flow (simplified): 1. Route extracts bearer token and checks `APIToken.query(beta=token)`. 2. If token exists, request is accepted. 3. Downstream service resolves dialog globally by ID (`DialogService.get_by_id(dialog_id)` in `conversation_service.py`). 4. No tenant ownership check is enforced for `dialog_id`. Impact: Any user with a valid API key can attempt arbitrary `dialog_id` values and access/invoke chatbots outside their own tenant boundary if IDs are known/guessed/leaked. Security classification: - Vulnerability class: Broken Access Control (IDOR, OWASP Top 10 A01) - Severity recommendation: Critical - Exploit prerequisite: any valid API key + discoverable target `dialog_id` ## Expected behavior Requests to `/chatbots/<dialog_id>/completions` and `/chatbots/<dialog_id>/info` must be authorized only when: 1. bearer token is valid, and 2. `dialog_id` belongs to the same `tenant_id` as the token. 
Otherwise, reject with authorization failure (e.g., 403 or 404-equivalent policy). ## Steps to reproduce 1. Prepare two tenants: - Tenant A with API key `TOKEN_A` - Tenant B with chatbot `dialog_id = DIALOG_B` 2. Send request from Tenant A to Tenant B chatbot completion endpoint: ```bash curl -X POST "https://<host>/chatbots/DIALOG_B/completions" \ -H "Authorization: Bearer TOKEN_A" \ -H "Content-Type: application/json" \ -d '{"question":"hello","stream":false}' ``` 3. Observe request is processed (or reaches dialog resolution) without tenant ownership rejection. 4. Repeat against info endpoint: ```bash curl -X GET "https://<host>/chatbots/DIALOG_B/info" \ -H "Authorization: Bearer TOKEN_A" ``` 5. Observe the same missing ownership enforcement. ## Additional information Affected code paths: - `api/apps/sdk/session.py` - `chatbot_completions(dialog_id)` - `chatbots_inputs(dialog_id)` - `api/db/services/conversation_service.py` - `async_iframe_completion(...)` uses global dialog lookup Suggested fix: 1. In both chatbot endpoints: - Resolve `tenant_id = objs[0].tenant_id` from validated token. - Fetch dialog with tenant-scoped query (`DialogService.query(id=dialog_id, tenant_id=tenant_id)`). - Reject if dialog is not found/owned by tenant. 2. Defense in depth: - Require and enforce `tenant_id` in service-layer dialog resolution for external flows. - Avoid global `get_by_id(dialog_id)` where user-controlled dialog IDs are reachable. 3. Add regression tests: - Positive: same-tenant token + dialog succeeds. - Negative: cross-tenant token + dialog fails for both endpoints.
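The tenant-scoped lookup suggested above can be sketched briefly. This is a minimal sketch assuming the `DialogService.query(id=..., tenant_id=...)` shape named in the report; the helper name and return convention are illustrative, not RAGFlow's exact code.

```python
# Illustrative sketch of the suggested fix: resolve the dialog only within
# the tenant that owns the API token, instead of a global get_by_id lookup.
# `resolve_tenant_dialog` is a hypothetical helper.

def resolve_tenant_dialog(dialog_service, token_tenant_id: str, dialog_id: str):
    """Return the dialog only if it belongs to the token's tenant, else None."""
    dialogs = dialog_service.query(id=dialog_id, tenant_id=token_tenant_id)
    if not dialogs:
        # Same response for missing and cross-tenant IDs: no enumeration oracle.
        return None
    return dialogs[0]

# Usage with a stand-in service: a cross-tenant dialog ID resolves to None.
class FakeDialogService:
    def query(self, id, tenant_id):
        return [{"id": id}] if (id, tenant_id) == ("DIALOG_B", "tenant_b") else []

print(resolve_tenant_dialog(FakeDialogService(), "tenant_a", "DIALOG_B"))  # → None
```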
### What problem does this PR solve? Refactor : Allow search multiple datasets 1. support /datasets/search 2. get rid of /graph/search, use /graph ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
internal/entity/models/siliconflow.go (1)
475-497: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate `apiConfig` before using it in `ListModels`.

This method still dereferences `apiConfig.ApiKey` unconditionally later, so `ListModels(nil)` or a missing key crashes instead of returning a regular error. The new region guard only fixes part of that path.

Proposed fix
```diff
 func (z *SiliconflowModel) ListModels(apiConfig *APIConfig) ([]string, error) {
+	if apiConfig == nil || apiConfig.ApiKey == nil || *apiConfig.ApiKey == "" {
+		return nil, fmt.Errorf("api key is required")
+	}
+
 	var region = "default"
-	if apiConfig.Region != nil && *apiConfig.Region != "" {
+	if apiConfig.Region != nil && *apiConfig.Region != "" {
 		region = *apiConfig.Region
 	}
@@
 	req.Header.Set("Content-Type", "application/json")
 	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", *apiConfig.ApiKey))
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/siliconflow.go` around lines 475 - 497, The ListModels method currently dereferences apiConfig and apiConfig.ApiKey without validation; update ListModels to first check that apiConfig is not nil and that apiConfig.ApiKey is non-nil and non-empty, returning a clear error if the config or key is missing, adjust the region selection to handle a nil apiConfig (use default region when apiConfig==nil), and only set the Authorization header when a valid ApiKey exists; reference the ListModels function, APIConfig type and apiConfig.ApiKey/Region, and URL building using z.BaseURL and z.URLSuffix.Models to locate the changes.

api/utils/validation_utils.py (1)
855-878: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the type mismatch in `SearchDatasetsReq.tenant_rerank_id` — it should be `int | None`, not `str | None`.

`SearchDatasetReq.tenant_rerank_id` is correctly typed as `int | None` (line 857), but `SearchDatasetsReq.tenant_rerank_id` is incorrectly `str | None` (line 877). Both models route through the same downstream service layer, which calls `get_model_config_by_id(req["tenant_rerank_id"])` — a function that expects an `int` parameter (see `api/db/joint_services/tenant_model_service.py:24`). The database also stores this as `IntegerField`.

With the current `str | None` typing on `SearchDatasetsReq`, requests with numeric string IDs sent to `POST /datasets/search` will fail at service invocation, while the same payload to `POST /datasets/<dataset_id>/search` (which has the correct `int | None` type) will succeed.

🛠️ Fix
```diff
 class SearchDatasetsReq(BaseModel):
     model_config = ConfigDict(extra="ignore")

     dataset_ids: Annotated[list[str], Field(..., min_length=1)]
     question: Annotated[str, StringConstraints(strip_whitespace=True, min_length=1), Field(...)]
     doc_ids: Annotated[list[str], Field(default=[])]
     page: Annotated[int, Field(default=1, ge=1)]
     size: Annotated[int, Field(default=30, ge=1)]
     top_k: Annotated[int, Field(default=1024, ge=1)]
     similarity_threshold: Annotated[float, Field(default=0.0, ge=0.0, le=1.0)]
     vector_similarity_weight: Annotated[float, Field(default=0.3, ge=0.0, le=1.0)]
     use_kg: Annotated[bool, Field(default=False)]
     cross_languages: Annotated[list[str], Field(default=[])]
     keyword: Annotated[bool, Field(default=False)]
     search_id: Annotated[str | None, Field(default=None)]
     rerank_id: Annotated[str | None, Field(default=None)]
-    tenant_rerank_id: Annotated[str | None, Field(default=None)]
+    tenant_rerank_id: Annotated[int | None, Field(default=None)]
     meta_data_filter: Annotated[dict | None, Field(default=None)]
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/utils/validation_utils.py` around lines 855 - 878, SearchDatasetsReq.tenant_rerank_id is incorrectly typed as str | None; change it to int | None to match SearchDatasetReq and downstream expectations (the model/service function get_model_config_by_id expects an int and DB stores IntegerField). Locate the SearchDatasetsReq class and replace the annotation for tenant_rerank_id with Annotated[int | None, Field(default=None)] so requests to POST /datasets/search pass an integer (or None) through to get_model_config_by_id.

api/apps/restful_apis/file2document_api.py (1)
119-147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Folder `file_ids` bypass `check_file_team_permission` — only their expanded inner files are checked.

In the folder expansion branch (lines 119–125, pre-existing), `file_id` is never appended to `all_file_ids` — only its inner files are. Because the new auth loop at lines 128–147 only iterates `all_file_ids`, a caller can submit any reachable folder ID and the folder itself is never passed to `check_file_team_permission`. Only the inner files are verified. If `check_file_team_permission` relies on tenant membership (similar to how the old `accessible()` worked before this PR), a tenant member can bypass folder-level ownership the same way private datasets were previously bypassed.

The fix is to auth-check every original `file_id` (including folders) before expansion:

🛡️ Proposed fix
```diff
+    user_id = current_user.id
+    for file_id in file_ids:
+        file = files_set[file_id]
+        if not check_file_team_permission(file, user_id):
+            logger.warning(
+                "user_id=%s resource_type=file resource_id=%s action=authorize_file result=denied file_ids=%s kb_ids=%s",
+                user_id,
+                file_id,
+                file_ids,
+                kb_ids,
+            )
+            return get_data_error_result(message="No authorization.")
+
     # Expand folders to their innermost file IDs
     all_file_ids = []
     for file_id in file_ids:
         file = files_set[file_id]
         if file.type == FileType.FOLDER.value:
             all_file_ids.extend(FileService.get_all_innermost_file_ids(file_id, []))
         else:
             all_file_ids.append(file_id)

-    user_id = current_user.id
     for file_id in all_file_ids:
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/apps/restful_apis/file2document_api.py` around lines 119 - 147, The folder IDs from file_ids are never checked because only expanded inner IDs are appended to all_file_ids; update the logic so every original file_id (including folders) is authorization-checked before or during expansion: either append the folder id to all_file_ids before calling FileService.get_all_innermost_file_ids(file_id, []) or run check_file_team_permission(file, user_id) against the folder entry immediately after retrieving it (the code that calls FileService.get_all_innermost_file_ids, iterates file_ids, and the authorization loop over all_file_ids should ensure both folder and inner-file IDs are validated), referencing all_file_ids, file_ids, FileService.get_all_innermost_file_ids, and check_file_team_permission.

internal/cli/filesystem/dataset.go (1)
323-328: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale comments contradict the actual URL built by `BuildURL` — remove or update them.

Both lines 326 and 610 contain:

```go
// Call retrieval API (useAPIBase=false because the route is /v1/chunk/retrieval_test, not /api/v1/...)
```

These comments document the old intent: use `NonAPIBase()` to produce `{host}/v1/chunk/retrieval_test`. After this change, `BuildURL("/chunk/retrieval_test")` produces `{host}/api/v1/chunk/retrieval_test`. The comments are actively misleading and are the primary evidence for the routing regression flagged at `http_client.go`. Once the backend path question is resolved, update these comments to reflect the final routing intent.

🗑️ Remove stale comments at both locations
```diff
-	// Call retrieval API (useAPIBase=false because the route is /v1/chunk/retrieval_test, not /api/v1/...)
 	resp, err := p.httpClient.Request("POST", "/chunk/retrieval_test", "auto", nil, payload)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/filesystem/dataset.go` around lines 323 - 328, Update/remove the stale inline comment that claims the request uses NonAPIBase/`/v1/...` in the block calling p.httpClient.Request("POST", "/chunk/retrieval_test", ...); locate the call sites (e.g., the Request invocation in dataset.go and the similar comment at the other location around line ~610) and either delete the misleading comment or replace it with a succinct note that reflects the current URL behavior (e.g., that BuildURL now produces {host}/api/v1/..., or indicate the chosen routing intent once the backend path is finalized). Ensure references to BuildURL, NonAPIBase, and httpClient.Request are updated so the comment matches the actual URL construction.

internal/cli/user_parser.go (1)
112-169: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`LIST USERS` and `LIST SERVICES` are now silently broken; dead handlers remain in the dispatcher.

`case TokenUsers`, `case TokenServices`, and `case TokenRoles` are no longer present in `parseListCommand`, so `LIST USERS` and `LIST SERVICES` now return `"unknown LIST target"` at parse time. Their corresponding command-handler cases (`"list_users"`, `"list_services"`) in `ExecuteAdminCommand` are now unreachable dead code.

If this removal is intentional, the orphaned handler arms in `client.go`'s `ExecuteAdminCommand` should be pruned and a release note / migration hint added. If unintentional, the parser cases need to be restored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/user_parser.go` around lines 112 - 169, The parseListCommand parser removed handling for TokenUsers, TokenServices (and TokenRoles) causing "LIST USERS" / "LIST SERVICES" to error and leaving dead handlers in ExecuteAdminCommand ("list_users", "list_services"); either restore the parser branches in parseListCommand to call the appropriate command constructors (e.g. return NewCommand("list_users") / NewCommand("list_services") or the equivalent functions used for other LIST cases) so those dispatcher arms become reachable, OR remove the unreachable handler cases in ExecuteAdminCommand and TokenRoles-related code and add a release note/migration hint documenting the removal; locate parseListCommand, TokenUsers/TokenServices/TokenRoles, ExecuteAdminCommand, and the "list_users"/"list_services" handler names to apply the chosen fix.

internal/cli/http_client.go (1)
73-79: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Go code calls non-existent Python endpoints — endpoints were migrated but not updated in client code.

The Python backend migrated `/chunk/retrieval_test` to `/retrieval` (commit 35f6d81) and possibly changed the `/kb/list` endpoint structure, but the Go CLI code was not updated to match:

- `internal/cli/filesystem/dataset.go` lines 327, 611: `Request("POST", "/chunk/retrieval_test", ...)` → endpoint does not exist; should call `/retrieval`
- `internal/cli/benchmark.go` line 232: `Request("POST", "/kb/list", ...)` → endpoint does not exist; verify correct endpoint path
- `internal/cli/benchmark.go` line 245: `Request("POST", "/chunk/retrieval_test", ...)` → endpoint does not exist; should call `/retrieval`

The stale comments in dataset.go about `useAPIBase=false` and `/v1/chunk/retrieval_test` should be removed. With `BuildURL` now always using `APIBase()` to produce `/api/v1/...` paths, calls to these outdated endpoint names will receive 404 errors.

Update the endpoint paths to match the current Python backend API structure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/http_client.go` around lines 73 - 79, Update the obsolete endpoint strings and remove stale comments: replace any Request("POST", "/chunk/retrieval_test", ...) calls in internal/cli/filesystem/dataset.go (lines around 327 and 611) and internal/cli/benchmark.go (around line 245) to Request("POST", "/retrieval", ...) to match the migrated Python backend; verify and correct the Request("POST", "/kb/list", ...) call in internal/cli/benchmark.go (around line 232) to the current KB list endpoint used by the backend; also remove the stale comments about useAPIBase=false and /v1/chunk/retrieval_test in dataset.go since HTTPClient.BuildURL() already prepends APIBase(), and run a quick smoke test to ensure no 404s from these updated Request(...) calls.

internal/cli/filesystem/file.go (1)
546-551: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't force API auth for deletes.

Line 551 is now inconsistent with the rest of this provider: list/cat/root traversal use `"auto"`, but delete still forces `"api"`. That breaks `DeleteFile`/`DeleteFolderByPath` for normal web-login sessions.

Suggested fix
- resp, err := p.httpClient.Request("DELETE", "/files", "api", nil, payload) + resp, err := p.httpClient.Request("DELETE", "/files", "auto", nil, payload)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/filesystem/file.go` around lines 546 - 551, The DeleteFile implementation forces API auth by passing "api" to p.httpClient.Request, which is inconsistent with other methods (list/cat/root traversal) that use "auto" and breaks DeleteFile/DeleteFolderByPath for web-login sessions; change the auth argument in p.httpClient.Request inside DeleteFile (and any analogous delete helper like DeleteFolderByPath) from "api" to "auto" so deletes use the same auto-detection of auth as the rest of FileProvider.

internal/cli/user_command.go (1)

1786-1807: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Derive auth kind in the API-token-aware commands.

Lines 1807 and 1929 still force `"web"` even though both functions explicitly accept either `APIToken` or `LoginToken`. With an API-token-only session, the guard passes and the request still goes out with the wrong auth mode.

Suggested fix

```diff
+ authKind := "web"
+ if c.HTTPClient.useAPIToken && c.HTTPClient.LoginToken == "" {
+     authKind = "api"
+ }
- resp, err := c.HTTPClient.Request("GET", url, "web", nil, nil)
+ resp, err := c.HTTPClient.Request("GET", url, authKind, nil, nil)
```

```diff
+ authKind := "web"
+ if c.HTTPClient.useAPIToken && c.HTTPClient.LoginToken == "" {
+     authKind = "api"
+ }
- resp, err := c.HTTPClient.Request("POST", url, "web", nil, payload)
+ resp, err := c.HTTPClient.Request("POST", url, authKind, nil, payload)
```

Also applies to: 1879-1929
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/user_command.go` around lines 1786 - 1807, The request calls in CheckProviderConnection (and the similar function around lines 1879-1929) hardcode the auth kind as "web"; change them to derive authKind from the HTTPClient: if c.HTTPClient.APIToken != "" use "api", else if c.HTTPClient.LoginToken != "" use "web" (or the existing login token value semantics), assign that to a local authKind variable and pass authKind into c.HTTPClient.Request instead of the fixed "web" string so API-token-only sessions use the correct auth mode.
🟠 Major comments (23)
web/src/routes.tsx (1)

20-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix double-slash route composition in `Routes.Dataset`.

Line 20 builds `Routes.Dataset` as `'/dataset//files'` because `Routes.Files` already includes a leading slash. This can break matching/navigation for dataset-file routes.

Proposed fix

```diff
- Dataset = `${Routes.DatasetBase}/${Routes.Files}`,
+ Dataset = `${Routes.DatasetBase}${Routes.Files}`,
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@web/src/routes.tsx` at line 20, Routes.Dataset is constructed by inserting an extra '/' between Routes.DatasetBase and Routes.Files which causes a double slash because Routes.Files already begins with a leading slash; fix by removing the inserted slash so you concatenate Routes.DatasetBase and Routes.Files directly (or alternatively remove the leading slash from Routes.Files) to ensure Routes.Dataset becomes a single valid path; update the code that defines the Routes.Dataset symbol accordingly.

web/src/hooks/use-document-request.ts (1)

216-228: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use `datasetId` in the query key to prevent stale filter cache.

Line 221 keys by `knowledgeId`, but Line 216 fetches by `datasetId = knowledgeId || id`. If `knowledgeId` is empty and route `id` changes, cached filter data can bleed across datasets.

Suggested fix

```diff
   const { data } = useQuery({
     queryKey: [
       DocumentApiAction.FetchDocumentFilter,
       debouncedSearchString,
-      knowledgeId,
+      datasetId,
     ],
+    enabled: !!datasetId,
     queryFn: async () => {
-      if (!datasetId) {
-        return;
-      }
-      const { data } = await documentFilter(datasetId);
+      const { data } = await documentFilter(datasetId!);
       if (data.code === 0) {
         return data.data;
       }
     },
   });
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@web/src/hooks/use-document-request.ts` around lines 216 - 228, The query cache key currently uses knowledgeId but the fetch uses datasetId (datasetId = knowledgeId || id), causing stale cache when knowledgeId is empty and id changes; update the useQuery queryKey to include datasetId (or replace knowledgeId with datasetId) so the key uniquely reflects the actual dataset used by the query (reference: datasetId, useQuery, queryKey, knowledgeId, id, DocumentApiAction.FetchDocumentFilter).

api/apps/restful_apis/document_api.py (1)

1903-1913: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve document downloads via `File2DocumentService`, not `(tenant_id, doc_id)`.

This new route authenticates against the document record, but it still reads storage using `settings.STORAGE_IMPL.get(tenant_id, doc_id)` and trusts `?ext=` for the MIME type. That keeps the old attachment-storage contract alive and can break downloads for valid documents whose blob is stored under a different bucket/key than the document id. Reuse the same storage-address lookup as `get()` and derive the extension from the persisted document name instead.

Suggested fix

```diff
 async def download_attachment(tenant_id=None, doc_id=None, attachment_id=None):
 @@
     try:
         # Keep backward compatibility with older callers and unit tests that still
         # pass `attachment_id` instead of the route parameter name.
         doc_id = doc_id or attachment_id
         if not DocumentService.accessible(doc_id, current_user.id):
             return get_data_error_result(message="Document not found!")
-        ext = request.args.get("ext", "markdown")
-        data = await thread_pool_exec(settings.STORAGE_IMPL.get, tenant_id, doc_id)
+
+        e, doc = DocumentService.get_by_id(doc_id)
+        if not e:
+            return get_data_error_result(message="Document not found!")
+
+        bucket, name = File2DocumentService.get_storage_address(doc_id=doc_id)
+        data = await thread_pool_exec(settings.STORAGE_IMPL.get, bucket, name)
         response = await make_response(data)
-        content_type = CONTENT_TYPE_MAP.get(ext, f"application/{ext}")
+
+        ext_match = re.search(r"\.([^.]+)$", doc.name.lower())
+        ext = ext_match.group(1) if ext_match else None
+        content_type = CONTENT_TYPE_MAP.get(ext) if ext else None
         apply_safe_file_response_headers(response, content_type, ext)
         return response
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/apps/restful_apis/document_api.py` around lines 1903 - 1913, The handler must stop calling settings.STORAGE_IMPL.get(tenant_id, doc_id) and instead resolve the file storage address via File2DocumentService like the DocumentService.get() path does: fetch the File2Document record for the document (e.g. via File2DocumentService.get_by_document_id or equivalent), use the storage address (bucket/key/object id) from that record when calling settings.STORAGE_IMPL.get (or the same storage client method used by the existing get() flow) and derive the response extension/MIME from the persisted filename on the File2Document/Document record rather than request.args["ext"]; keep the existing DocumentService.accessible check and then pass the fetched blob into make_response and apply_safe_file_response_headers as before.

internal/entity/models/volcengine.go (1)

410-451: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate `modelName` and `apiConfig` before building requests.

This new path dereferences `*modelName` and `*apiConfig.ApiKey` without any guard. A missing model or API config will panic on the first iteration instead of returning a normal error.

Proposed fix

```diff
 func (z *VolcEngine) Encode(modelName *string, texts []string, apiConfig *APIConfig, embeddingConfig *EmbeddingConfig) ([][]float64, error) {
     if len(texts) == 0 {
         return [][]float64{}, nil
     }
+    if modelName == nil || *modelName == "" {
+        return nil, fmt.Errorf("model name is required")
+    }
+    if apiConfig == nil || apiConfig.ApiKey == nil || *apiConfig.ApiKey == "" {
+        return nil, fmt.Errorf("api key is required")
+    }
     var region = "default"
     if apiConfig != nil && apiConfig.Region != nil && *apiConfig.Region != "" {
         region = *apiConfig.Region
     }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/volcengine.go` around lines 410 - 451, The Encode method dereferences modelName and apiConfig.ApiKey without validation and can panic; add upfront checks at the start of VolcEngine.Encode to return a clear error if modelName is nil or empty, or if apiConfig is nil or apiConfig.ApiKey is nil/empty, and then use the validated values (e.g., store local vars model := *modelName and apiKey := *apiConfig.ApiKey) when building the request and headers (referencing Encode, modelName, apiConfig, ApiKey) so no nil pointer dereference can occur during the loop.

internal/entity/models/vllm.go (1)

199-207: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle missing `reasoning_content` gracefully instead of failing.

The current code treats an absent `reasoning_content` field as a hard error when `Thinking` is enabled, even when `message.content` is valid. This breaks compatibility with OpenAI-compatible backends and models that omit this field.

Other providers in the codebase handle this: `openai.go` and `xai.go` make it optional with soft checks, while `siliconflow.go` and `gitee.go` include a fallback using `GetThinkingAndAnswer()` to extract thinking from response content when the field is missing.

Update `vllm.go` to follow the same pattern: make `reasoning_content` optional and fall back to parsing thinking content when absent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/vllm.go` around lines 199 - 207, When Thinking is enabled in vllm.go (check chatModelConfig and chatModelConfig.Thinking) don't treat a missing or non-string "reasoning_content" in messageMap as a fatal error; instead, if messageMap["reasoning_content"] is absent or not a string, call the existing GetThinkingAndAnswer(message.Content) fallback to extract thinking and answer, assign the derived thinking to reasonContent (trim a leading newline if present), and continue without returning an error—only fail on unexpected failures from the fallback. Ensure you update the logic around reasonContent, the type assertion of messageMap["reasoning_content"], and the code path that currently returns fmt.Errorf("invalid content format") so vllm.go mirrors the optional handling used in openai.go/xai.go and the fallback used in siliconflow.go/gitee.go.

internal/entity/models/lmstudio.go (1)

183-192: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Implement fallback for missing `reasoning_content` field.

LM Studio responses may omit `reasoning_content`. The current code fails immediately instead of falling back to think-tag parsing as implemented in gitee.go and siliconflow.go. Update the code to first attempt extracting `reasoning_content`, then fall back to `GetThinkingAndAnswer()` if the field is absent:

Proposed fix

```diff
     var reasonContent string
     if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-        reasonContent, ok = messageMap["reasoning_content"].(string)
-        if !ok {
-            return nil, fmt.Errorf("invalid content format")
-        }
-        if reasonContent != "" && reasonContent[0] == '\n' {
-            reasonContent = reasonContent[1:]
-        }
+        if rc, ok := messageMap["reasoning_content"].(string); ok {
+            reasonContent = rc
+            if reasonContent != "" && reasonContent[0] == '\n' {
+                reasonContent = reasonContent[1:]
+            }
+        } else {
+            reasoning, answer := GetThinkingAndAnswer(chatModelConfig.ModelClass, &content)
+            if reasoning != nil {
+                reasonContent = *reasoning
+                content = *answer
+            }
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/lmstudio.go` around lines 183 - 192, The code in lmstudio.go currently returns an error when "reasoning_content" is missing; change the logic in the block that reads messageMap["reasoning_content"] (within the chatModelConfig != nil && chatModelConfig.Thinking check) to first try to read reasonContent from messageMap as now, but if the key is absent or not a string, do NOT return an error — instead call GetThinkingAndAnswer(messageMap) (the same fallback used in gitee.go and siliconflow.go) to extract thinking and answer, and then set reasonContent from the returned thinking value (trimming a leading newline if present); ensure you still preserve the branch that returns an error only for truly malformed values (if GetThinkingAndAnswer also indicates failure, propagate that error).

internal/entity/models/ollama.go (1)

182-191: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make `reasoning_content` optional with fallback to content parsing.

Other drivers in the codebase (siliconflow, gitee) handle missing `reasoning_content` gracefully by falling back to `GetThinkingAndAnswer()` when the field is absent. The current ollama implementation hard-fails instead, converting valid Ollama responses into errors when `Thinking` is enabled but the response lacks a separate `reasoning_content` field.

Align with the existing pattern by attempting to extract `reasoning_content` and, if unavailable, parsing reasoning from the content itself using the fallback mechanism already used in siliconflow.go and gitee.go.

Proposed fix

```diff
     var reasonContent string
     if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-        reasonContent, ok = messageMap["reasoning_content"].(string)
-        if !ok {
-            return nil, fmt.Errorf("invalid content format")
-        }
-        if reasonContent != "" && reasonContent[0] == '\n' {
-            reasonContent = reasonContent[1:]
-        }
+        if rc, ok := messageMap["reasoning_content"].(string); ok {
+            reasonContent = rc
+            if reasonContent != "" && reasonContent[0] == '\n' {
+                reasonContent = reasonContent[1:]
+            }
+        } else {
+            reasoning, answer := GetThinkingAndAnswer(chatModelConfig.ModelClass, &content)
+            if reasoning != nil {
+                reasonContent = *reasoning
+                content = *answer
+            }
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/ollama.go` around lines 182 - 191, The current block in ollama.go hard-fails when "reasoning_content" is missing while chatModelConfig.Thinking is enabled; instead, attempt to read messageMap["reasoning_content"] into reasonContent and if the cast fails or the key is absent, call the existing GetThinkingAndAnswer(content) fallback (same approach used in siliconflow/gitee) to parse reasoning from the main content and set reasonContent and answer accordingly; remove the fmt.Errorf("invalid content format") early return and ensure trimming of a leading newline on the resolved reasonContent as already done.

internal/entity/models/nvidia.go (1)

168-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strict `reasoning_content` extraction will fail valid responses.

When `Thinking` is enabled, this aborts the entire chat request whenever `reasoning_content` is missing, `null`, or non-string in the JSON — but NIM/OpenAI-style responses can legitimately omit that field (e.g., short answers, model variants that don't return separate reasoning, or streamed-only reasoning). The user gets `"invalid content format"` instead of the assistant's actual answer in `content`.

Treat the absence of `reasoning_content` as "no reasoning returned" rather than an error.

🛡️ Proposed fix

```diff
     var reasonContent string
     if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-        reasonContent, ok = messageMap["reasoning_content"].(string)
-        if !ok {
-            return nil, fmt.Errorf("invalid content format")
-        }
-        if reasonContent != "" && reasonContent[0] == '\n' {
-            reasonContent = reasonContent[1:]
-        }
+        if rc, ok := messageMap["reasoning_content"].(string); ok {
+            reasonContent = strings.TrimPrefix(rc, "\n")
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/nvidia.go` around lines 168 - 177, The current block in the NVIDIA response parsing treats a missing/non-string "reasoning_content" as a hard error; instead, when chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking, change the logic that reads messageMap["reasoning_content"] so that if the key is absent, nil, or not a string you set reasonContent = "" (i.e., "no reasoning returned") and continue rather than returning fmt.Errorf("invalid content format"); if the value is a string keep the existing trim-of-leading-newline behavior. This change touches the variables/functionality named messageMap, reasoning_content, reasonContent, and the chatModelConfig.Thinking check.

internal/entity/models/openrouter.go (1)

171-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same strict reasoning extraction as `nvidia.go` — fails valid responses missing the field.

When `Thinking=true` but the upstream omits `reasoning` (legitimate for short answers or non-reasoning variants), this aborts the request even though `content` was returned successfully. Use a tolerant assertion.

🛡️ Proposed fix

```diff
     var reasonContent string
     if chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
-        reasonContent, ok = messageMap["reasoning"].(string)
-        if !ok {
-            return nil, fmt.Errorf("invalid content format")
-        }
-        if reasonContent != "" && reasonContent[0] == '\n' {
-            reasonContent = reasonContent[1:]
-        }
+        if rc, ok := messageMap["reasoning"].(string); ok {
+            reasonContent = strings.TrimPrefix(rc, "\n")
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/openrouter.go` around lines 171 - 180, The current extraction for reasonContent when chatModelConfig != nil && chatModelConfig.Thinking != nil && *chatModelConfig.Thinking is too strict: it assumes messageMap["reasoning"] exists and is a string and returns an error if not, which breaks valid responses that omit reasoning; update the logic around reasonContent/messageMap["reasoning"] to be tolerant—check for existence and type with an ok-style assertion, and if the key is missing or not a string simply leave reasonContent empty (or nil) instead of returning an error, but still trim a leading newline when a non-empty string is present; keep the checks on chatModelConfig.Thinking and only change the handling of messageMap["reasoning"] so it mirrors the tolerant approach used in nvidia.go.

internal/entity/models/openrouter.go (1)

60-72: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing region fallback to `"default"` (also at Line 200).

If `apiConfig.Region` points at a region key that isn't present in `o.BaseURL` (typo, mis-config, or a regional config that only the chat or models endpoint defines), `o.BaseURL[region]` returns `""` and the request is sent to `/<URLSuffix.Chat>` — failing with a confusing parse error rather than falling back. The NVIDIA driver guards this at lines 68–71; mirror that here, both in `ChatWithMessages` and in `ChatStreamlyWithSender` at Line 200.

🛡️ Proposed fix

```diff
     var region = "default"
     if apiConfig.Region != nil && *apiConfig.Region != "" {
         region = *apiConfig.Region
     }
-    url := fmt.Sprintf("%s/%s", o.BaseURL[region], o.URLSuffix.Chat)
+    baseURL := o.BaseURL[region]
+    if baseURL == "" {
+        baseURL = o.BaseURL["default"]
+    }
+    if baseURL == "" {
+        return nil, fmt.Errorf("openrouter: no base URL configured for region %q", region)
+    }
+    url := fmt.Sprintf("%s/%s", baseURL, o.URLSuffix.Chat)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/openrouter.go` around lines 60 - 72, The code computes region from apiConfig.Region but uses o.BaseURL[region] without verifying the key exists, so when the map lookup yields "" the URL becomes malformed; to fix, in both ChatWithMessages and ChatStreamlyWithSender ensure after setting region (from apiConfig.Region) you check if o.BaseURL[region] == "" and if so set region = "default" (mirroring the NVIDIA guard), then build url := fmt.Sprintf("%s/%s", o.BaseURL[region], o.URLSuffix.Chat); this guarantees a safe fallback when the requested region key is missing.

internal/entity/models/openrouter.go (1)

286-289: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Logging every SSE line leaks user content and reasoning into logs.

`common.Info(line)` writes raw `data: …` events — i.e., assistant tokens, reasoning traces, and anything echoed in the response — to the application log on every streamed chat. That's a privacy/PII risk (logs are commonly aggregated, retained, or shipped to third parties) and adds significant log noise on a hot path. Either remove this entirely or downgrade to `Debug`-level and gate on a verbose-stream flag.

🔒 Proposed fix

```diff
     for scanner.Scan() {
         line := scanner.Text()
-        common.Info(line)
         // SSE data line starts with "data:"
         if !strings.HasPrefix(line, "data:") {
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/openrouter.go` around lines 286 - 289, The loop reading SSE lines using scanner := bufio.NewScanner(resp.Body) currently logs each line with common.Info(line), which leaks user content and reasoning; change this to stop logging raw SSE data — either remove the common.Info call entirely or downgrade it to common.Debug and guard it with a verbose/verboseStream boolean flag (check the flag before logging) so only explicit verbose debugging enables the output; update the code paths around scanner.Scan() and the common.Info(line) invocation accordingly.

internal/entity/models/openrouter.go (1)

202-205: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix delimiter in model type routing: OpenRouter uses `/` not `_`.

OpenRouter model identifiers follow the `provider/model-name` format (e.g., `qwen/qwen-110b-chat`, `z-ai/glm-5.1`). The code splits on `"_"` instead of `"/"`, which means `strings.Split(modelName, "_")[0]` will never equal `"qwen"` or `"glm"` for actual OpenRouter models — the condition at line 203 is unreachable dead code. Change the delimiter from `"_"` to `"/"` to correctly route Qwen and GLM models through `AsyncChat`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/openrouter.go` around lines 202 - 205, The routing check incorrectly splits OpenRouter model identifiers on "_" so modelType = strings.Split(modelName, "_")[0] never matches providers like "qwen" or "glm"; update the split delimiter to "/" (i.e., strings.Split(modelName, "/")[0]) in the block that sets url = fmt.Sprintf("%s/%s", o.BaseURL[region], o.URLSuffix.AsyncChat) so Qwen and GLM models are correctly routed; keep references to modelName, modelType, o.BaseURL[region], and o.URLSuffix.AsyncChat when making the change.

internal/entity/models/openrouter.go (1)

112-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

`reasoning` is always added to the request body, even when Thinking is unset/false.

The block lives inside `if chatModelConfig != nil` but is not gated on `chatModelConfig.Thinking`. Every chat call will request `reasoning.effort=low`, which (a) wastes tokens/cost on models that bill reasoning, (b) contradicts an explicit `Thinking=false`, and (c) is inconsistent with `ChatStreamlyWithSender` (Lines 248–258) which correctly gates the same field on `Thinking`.

🛠️ Proposed fix

```diff
-    if chatModelConfig.DoSample != nil {
-        reqBody["do_sample"] = *chatModelConfig.DoSample
-    }
-
-    reqBody["reasoning"] = map[string]interface{}{
-        "effort": "low",
-    }
+    if chatModelConfig.DoSample != nil {
+        reqBody["do_sample"] = *chatModelConfig.DoSample
+    }
+
+    if chatModelConfig.Thinking != nil && *chatModelConfig.Thinking {
+        reqBody["reasoning"] = map[string]interface{}{"effort": "low"}
+    }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/entity/models/openrouter.go` around lines 112 - 115, The code currently always adds reqBody["reasoning"] = {"effort":"low"} whenever chatModelConfig != nil; change this so the reasoning field is only added when chatModelConfig.Thinking is true (i.e., mirror the gating used in ChatStreamlyWithSender). Locate the block that sets reqBody["reasoning"] and wrap it in a conditional that checks chatModelConfig.Thinking (or similar boolean on chatModelConfig) so that reasoning is omitted when Thinking is unset/false.

agent/sandbox/result_protocol.py (1)

32-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Python wrapper argument injection causes runtime failure with JSON booleans/null values.

Line 32 injects raw JSON into Python code (`main(**{args_json})`). When `args_json` contains JSON booleans or null values, the resulting code attempts to reference undefined Python names (`true`, `false`, `null`) and fails with NameError at runtime.

Proposed patch

```diff
 if __name__ == "__main__":
     import base64
     import json
-    result = main(**{args_json})
+    __ragflow_args = json.loads({args_json!r})
+    result = main(**__ragflow_args)
     payload = json.dumps({"present": True, "value": result, "type": "json"}, ensure_ascii=False, separators=(",", ":"))
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/sandbox/result_protocol.py` around lines 32 - 33, The code injects raw JSON text into Python call site (main(**{args_json})), causing NameError for JSON literals like true/false/null; replace that injection with parsing the JSON string at runtime (use json.loads on args_json) and then call main with the resulting dict (e.g., kwargs = json.loads(args_json); main(**kwargs)), and ensure the module imports json and uses the parsed object when building result/payload (refactoring the main, args_json, and payload usage accordingly).

agent/sandbox/providers/local.py (1)

69-177: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add structured logs around the provider lifecycle.

This new execution flow is silent on initialize, execute start/end, timeout, artifact rejection, and destroy. A few structured logs keyed by `instance_id`, language, timeout, exit code, and rejection reason would make this operable without logging user code or artifact contents.

As per coding guidelines, "Add logging for new flows".
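A minimal sketch of the kind of structured line this suggests — the event name and field set (`instance_id`, `rejection_reason`, etc.) are illustrative choices, not the provider's actual API:

```python
import logging

logger = logging.getLogger("sandbox.local")


def log_execute_end(instance_id, language, timeout_s, exit_code, rejection_reason=None):
    """Emit one greppable key=value log line; never echo user code or artifacts."""
    fields = {
        "event": "sandbox_execute_end",
        "instance_id": instance_id,
        "language": language,
        "timeout_s": timeout_s,
        "exit_code": exit_code,
    }
    if rejection_reason is not None:
        fields["rejection_reason"] = rejection_reason
    line = " ".join(f"{key}={value}" for key, value in fields.items())
    logger.info(line)
    return line
```

Keeping the payload to key=value pairs makes the lines easy to filter in aggregated logs without risking PII from user code or artifact contents.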
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/sandbox/providers/local.py` around lines 69 - 177, Add structured logging to the provider lifecycle by instrumenting initialize, create_instance, execute_code, and destroy_instance: in initialize() log start/end with config keys (e.g., work_dir, timeout, max_memory_mb) and set self._initialized; in create_instance() log instance_id, language and work_dir when creating the instance; in execute_code() add an execution-start log with instance_id, language, script_path and timeout, an execution-end log with instance_id, exit_code, execution_time and whether a structured result was present, and a timeout/failure log inside the subprocess TimeoutExpired except branch (include instance_id and timeout); also log artifact collection/rejection outcomes from _collect_artifacts()/_validate_output_size() with instance_id and rejection_reason; finally in destroy_instance() log instance_id and success when removing the directory. Use structured key/value fields (instance_id, language, timeout, exit_code, rejection_reason) and the existing symbols initialize, create_instance, execute_code, destroy_instance, _validate_output_size, _collect_artifacts and _prepare_script to locate insertion points.

agent/sandbox/providers/local.py (1)

258-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a cumulative artifact size cap.

The current checks limit each artifact, but not the total payload returned in `artifacts`. With the defaults, one run can still load and base64-encode roughly 200 MiB of files into memory before response overhead. Track aggregate collected bytes and fail once the total budget is exceeded.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/sandbox/providers/local.py` around lines 258 - 287, In _collect_artifacts, add tracking of the cumulative size of collected artifacts (e.g., cumulative_bytes = 0) and enforce a new total-cap property (suggest name self.max_total_artifact_bytes) by checking cumulative_bytes + size <= self.max_total_artifact_bytes before reading/encoding each file; if the sum would exceed the cap, raise a RuntimeError like "Total artifact payload exceeds X bytes". Perform this check right after computing size (and before path.read_bytes()/base64 encoding) and increment cumulative_bytes by size when appending the artifact so you never load files that would push the total over the budget.

agent/sandbox/providers/local.py (1)

126-150: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Enforce `max_output_bytes` while reading child output.

`communicate()` buffers the entire stdout/stderr in memory before `_validate_output_size()` runs, so a program can exhaust the worker long before the 1 MiB cap is checked. Please stream the pipes and terminate once the combined byte count crosses the limit, or redirect output to bounded temp files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/sandbox/providers/local.py` around lines 126 - 150, The code currently calls process.communicate() which buffers all stdout/stderr in memory before calling _validate_output_size, allowing a child to exhaust memory; replace the communicate() usage in the run/execute flow to stream the process stdout/stderr (via process.stdout and process.stderr reads or iterators), accumulate byte counts and write into temporary bounded buffers or files, and if the combined byte count exceeds the configured max_output_bytes call the same termination logic (os.killpg/process.kill) and stop reading; finally compute execution_time and call _validate_output_size with the truncated outputs or paths to the temp files. Target the block around process = subprocess.Popen(...) and the subsequent communicate/timeout handling, keeping the existing timeout behavior and preexec_fn (_limit_child_process) and env (_build_child_env) usage. Ensure reads respect text/encoding settings and raise the same TimeoutError when the timeout elapses.agent/sandbox/providers/local.py-134-136 (1)
134-136:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
preexec_fnis unsafe in a threaded service process and logging is missing.Using
preexec_fn=self._limit_child_processcan deadlock child creation when the parent has multiple threads (fork() behavior in Python's subprocess). Additionally, theexecute_codemethod lacks logging for the execution flow, which violates the coding guideline requiring logging for new flows in Python files.Suggested fixes:
- Remove `preexec_fn` and use a separate launcher/helper process for resource limiting instead
- Add logging statements to track the execution lifecycle (start, timeout, completion, errors)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agent/sandbox/providers/local.py` around lines 134 - 136, The use of preexec_fn=self._limit_child_process in the subprocess call is unsafe in a threaded parent and must be removed; instead, refactor resource limiting to run in a separate launcher/helper process (e.g., add a small helper that applies _limit_child_process before exec and invoke that helper as the subprocess entrypoint) and keep start_new_session=os.name == "posix' as appropriate. Also add logging calls inside execute_code (and around the subprocess launch flow that uses _build_child_env and start_new_session) to log execution start (including command/instance id), timeouts, successful completion (exit code and runtime), and exceptions/errors (including subprocess stderr) so the execution lifecycle is traceable. Ensure references to _limit_child_process, execute_code, _build_child_env and start_new_session are updated to reflect the helper-process approach and that all new flows include the required logger calls.common/metadata_es_filter.py-474-506 (1)
474-506: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Reject non-scalar `in`/`not in` members before building ES clauses.
`_csv_or_list()` currently lets dicts and nested lists/tuples through. Those values then reach `_terms_string_or_numeric()` and produce invalid `term`/`terms` payloads instead of raising `UnsupportedMetaFilter`, so malformed filters won't fall back cleanly to the in-memory path.
Proposed fix
```diff
         normalised: List[Any] = []
         for m in members:
-            if isinstance(m, str):
+            if isinstance(m, (list, tuple, dict, set)):
+                raise UnsupportedMetaFilter("membership members must be scalar", flt)
+            if isinstance(m, str):
                 normalised.append(m.lower().strip())
-            else:
+            elif isinstance(m, (int, float, bool)):
                 normalised.append(m)
+            else:
+                raise UnsupportedMetaFilter("membership members must be scalar", flt)
         return normalised
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@common/metadata_es_filter.py` around lines 474 - 506, _csv_or_list currently permits dicts and nested lists/tuples to pass through, which later causes invalid ES term/terms payloads in _terms_string_or_numeric; update _csv_or_list to validate the resolved members list and raise UnsupportedMetaFilter(f"membership value contains non-scalar item: {type(m)}", flt) (or similar) if any member is a dict or a list/tuple (i.e., not a scalar type like str/int/float/bool/None), ensuring non-scalar values are rejected before ES clause construction and allowing proper fallback to the in-memory path.api/db/services/conversation_service.py-247-249 (1)
247-249: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Replace `assert` with explicit runtime checks in the iframe access guard.
Lines 248–249 use `assert` for session-to-dialog binding validation, but assertions are stripped under `python -O`. This security check can silently disappear in optimized deployments. Replace with explicit `if ...: raise` and add logging for the mismatch case, per the coding guideline to add logging for new flows.
Proposed fix
```diff
     e, conv = API4ConversationService.get_by_id(session_id)
-    assert e, "Session not found!"
-    assert conv.dialog_id == dialog_id, "Session does not belong to this dialog"
+    if not e:
+        raise AssertionError("Session not found!")
+    if conv.dialog_id != dialog_id:
+        logger.warning(
+            "Iframe session/dialog mismatch: session_id=%s expected_dialog_id=%s actual_dialog_id=%s",
+            session_id,
+            dialog_id,
+            conv.dialog_id,
+        )
+        raise AssertionError("Session does not belong to this dialog")
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/db/services/conversation_service.py` around lines 247 - 249, Replace the two assertions after calling API4ConversationService.get_by_id(session_id) with explicit runtime checks: verify that the returned existence flag (e) is truthy and that conv.dialog_id equals dialog_id, and if either check fails raise an appropriate exception (e.g., ValueError or a more specific HTTP/auth exception) instead of using assert; also add a process- or module-level log call (e.g., process_logger.error or logger.warning) that records the session_id, dialog_id and the failure reason before raising to satisfy the new logging requirement in the iframe access guard.internal/cli/filesystem/skill.go-1176-1179 (1)
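As a standalone illustration of the pattern (illustrative names only, not the service's real signatures), a binding check written with explicit raises keeps working under `python -O`, where bare asserts are stripped:

```python
def check_session_binding(found: bool, session_dialog_id: str, dialog_id: str) -> None:
    """Guard that survives python -O, unlike a bare assert statement."""
    if not found:
        raise PermissionError("Session not found!")
    if session_dialog_id != dialog_id:
        raise PermissionError("Session does not belong to this dialog")
```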
1176-1179: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Keep skill config lookup on `"auto"` auth.
Line 1178 is the lone `"web"` call in this provider. `getDefaultEmbdID` runs during skill indexing, and because this helper suppresses request errors, API-token sessions will silently fall back to an empty `embd_id` here.
Suggested fix
```diff
 	resp, err := p.httpClient.Request("GET", fmt.Sprintf("/skills/config?embd_id=&space_id=%s", url.QueryEscape(spaceID)),
-		"web", nil, nil)
+		"auto", nil, nil)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/filesystem/skill.go` around lines 1176 - 1179, The GET to "/skills/config" in skill.go is using hardcoded "web" auth which forces anonymous tokens and causes getDefaultEmbdID to fall back to empty embd_id; update the p.httpClient.Request call in the skill config lookup to use "auto" (replace the "web" string with "auto") so authentication is chosen automatically during skill indexing (locate the call in the provider method that requests "/skills/config" and change the auth argument to "auto").internal/cli/user_command.go-403-405 (1)
403-405: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Use the active auth mode in `getDatasetID`.
Line 405 hardcodes `"web"`, but this helper is reused by dataset operations that otherwise support API-token sessions. After this change, commands that resolve a dataset name first will fail for API-token-only users even though `ListDatasets` already handles that case correctly.
Suggested fix
```diff
 func (c *RAGFlowClient) getDatasetID(datasetName string) (string, error) {
-	resp, err := c.HTTPClient.Request("GET", "/datasets", "web", nil, nil)
+	authKind := "web"
+	if c.HTTPClient.useAPIToken && c.HTTPClient.LoginToken == "" {
+		authKind = "api"
+	}
+	resp, err := c.HTTPClient.Request("GET", "/datasets", authKind, nil, nil)
 	if err != nil {
 		return "", fmt.Errorf("failed to list datasets: %w", err)
 	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/user_command.go` around lines 403 - 405, getDatasetID currently hardcodes the auth mode string "web" when calling c.HTTPClient.Request, which breaks API-token-only sessions; change the call in getDatasetID to use the client's active auth mode instead of the literal "web" (e.g., replace the hardcoded string with the same helper or property used by ListDatasets — call the client's getActiveAuthMode()/activeAuthMode/AuthMode accessor that ListDatasets uses) so c.HTTPClient.Request uses the correct auth mode for the current session.api/apps/services/dataset_api_service.py-1147-1151 (1)
1147-1151: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Collect tenant IDs for all selected datasets (don't break on first match).
Lines 1147-1151 stop after the first matching tenant, so multi-dataset searches can miss valid datasets owned by other tenants and return incomplete results.
Proposed patch
```diff
-        tenant_ids = []
-        tenants = UserTenantService.query(user_id=tenant_id)
-        for tenant in tenants:
-            if any(KnowledgebaseService.query(tenant_id=tenant.tenant_id, id=kb_id) for kb_id in kb_ids):
-                tenant_ids.append(tenant.tenant_id)
-                break
-        else:
-            return False, "Only owner of datasets authorized for this operation."
+        # Access has already been validated per dataset above, so scope retrieval
+        # to all dataset owner tenants.
+        tenant_ids = list({kb.tenant_id for kb in kbs})
+        if not tenant_ids:
+            return False, "Only owner of datasets authorized for this operation."
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/apps/services/dataset_api_service.py` around lines 1147 - 1151, The loop over tenants stops at the first match because of the break, causing tenant_ids to miss other tenants owning datasets in kb_ids; in the block using UserTenantService.query and KnowledgebaseService.query remove the break so you collect all matching tenant. Also ensure you avoid duplicates when appending to tenant_ids (e.g., check if tenant.tenant_id not already in tenant_ids) so every tenant owning any kb_id is included.
```go
	choices, ok := result["choices"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("no choices in response")
	}

	firstChoice, ok := choices[0].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("no choices in response")
	}
```
Panic on empty choices array.
Line 152 only validates that choices is []interface{}, but Line 156 immediately indexes choices[0]. An empty array (well-formed JSON, just zero choices — e.g., a content-filtered response) will panic the request goroutine. The NVIDIA driver guards this at line 149 with len(choices) == 0.
🛡️ Proposed fix

```diff
 	choices, ok := result["choices"].([]interface{})
-	if !ok {
+	if !ok || len(choices) == 0 {
 		return nil, fmt.Errorf("no choices in response")
 	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/entity/models/openrouter.go` around lines 151 - 159, The code parses
result["choices"] into choices but does not guard against an empty slice before
indexing; update the parsing in the function handling OpenRouter responses so
that after asserting choices, you check len(choices) == 0 and return a
descriptive error (e.g., "no choices in response") before attempting to access
choices[0]; ensure the same guard is used before converting choices[0] to
firstChoice (map[string]interface{}) to avoid a panic.
```go
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", *apiConfig.ApiKey))
```
Possible nil-pointer dereference on apiConfig.ApiKey (also at Lines 503, 559, 607).
ChatStreamlyWithSender, Rerank, ListModels, and Balance dereference *apiConfig.ApiKey without first checking that apiConfig is non-nil and ApiKey is non-nil. Only ChatWithMessages has that guard (Lines 60–62). Any caller that passes nil (e.g., the getModelConfig path that builds apiConfig only when an instance is found, or future direct callers) will panic the request goroutine.
🛡️ Proposed fix (apply to all four call sites)

```diff
+	if apiConfig == nil || apiConfig.ApiKey == nil || *apiConfig.ApiKey == "" {
+		return fmt.Errorf("api key is nil or empty")
+	}
 	req.Header.Set("Content-Type", "application/json")
 	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", *apiConfig.ApiKey))
```

(Adjust the return type per method — nil, error for non-error-only signatures — and place the check before `http.NewRequest` so failures short-circuit.)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/entity/models/openrouter.go` around lines 270 - 273, Several
functions (ChatStreamlyWithSender, Rerank, ListModels, Balance) dereference
*apiConfig.ApiKey without null checks and can panic; add an early guard in each
function to check apiConfig != nil and apiConfig.ApiKey != nil before any use
(place the guard before calling http.NewRequest), and return the appropriate
zero-value plus an error (e.g., nil, error for functions returning (X, error) or
error for error-only signatures) so the failure short-circuits safely; update
all four call sites to use this same pattern.
```go
	// convert result["data"] to []map[string]interface{}
	models := make([]string, 0)
	for _, model := range result["data"].([]interface{}) {
		modelMap := model.(map[string]interface{})
		modelName := modelMap["id"].(string)
		models = append(models, modelName)
	}

	return models, nil
```
Three chained unchecked type assertions — any unexpected payload will panic.
result["data"].([]interface{}), model.(map[string]interface{}), and modelMap["id"].(string) all use the single-return form, so any field type mismatch (or data being null/missing) crashes the goroutine. Compare with the NVIDIA ListModels (Lines 389–405), which uses the comma-ok form throughout and skips malformed entries.
🛡️ Proposed fix

```diff
-	// convert result["data"] to []map[string]interface{}
-	models := make([]string, 0)
-	for _, model := range result["data"].([]interface{}) {
-		modelMap := model.(map[string]interface{})
-		modelName := modelMap["id"].(string)
-		models = append(models, modelName)
-	}
+	data, ok := result["data"].([]interface{})
+	if !ok {
+		return nil, fmt.Errorf("invalid models list format")
+	}
+	models := make([]string, 0, len(data))
+	for _, item := range data {
+		m, ok := item.(map[string]interface{})
+		if !ok {
+			continue
+		}
+		id, ok := m["id"].(string)
+		if !ok {
+			continue
+		}
+		models = append(models, id)
+	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/entity/models/openrouter.go` around lines 582 - 590, The code
performing three unchecked type assertions (result["data"].([]interface{}),
model.(map[string]interface{}), modelMap["id"].(string)) can panic on malformed
payloads; update the conversion in the function that builds models to use
comma-ok type assertions for each step (check result["data"] exists and is
[]interface{}, each item is map[string]interface{}, and the "id" is a string),
skip entries that fail these checks (log or silently ignore) and continue, and
only append validated model names to the models slice so the goroutine cannot
panic on unexpected payloads.
```diff
     def get_presigned_url(self, bucket, fnm, expires):
+        blob_name = f"{bucket}/{fnm}"
         for _ in range(10):
             try:
-                return self.conn.get_presigned_url("GET", bucket, fnm, expires)
+                return self.conn.get_presigned_url("GET", bucket, blob_name, expires)
             except Exception:
-                logging.exception(f"fail get {bucket}/{fnm}")
+                logging.exception(f"fail get {blob_name}")
                 self.__open__()
                 time.sleep(1)
-        return
+        return None
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Confirm whether ContainerClient exposes get_presigned_url and find callers.
python - <<'PY'
try:
    from azure.storage.blob import ContainerClient
    print("has get_presigned_url:", hasattr(ContainerClient, "get_presigned_url"))
    print("sas-related members:", [m for m in dir(ContainerClient) if 'sas' in m.lower() or 'url' in m.lower()])
except Exception as e:
    print("import failed:", e)
PY
rg -nP --type=py -C3 'RAGFlowAzureSasBlob\(\)\.get_presigned_url|azure_sas_conn.*get_presigned_url'
```

Repository: infiniflow/ragflow
Length of output: 101
🌐 Web query:
azure-storage-blob ContainerClient get_presigned_url method API
💡 Result:
There is no azure.storage.blob.ContainerClient.get_presigned_url method in the Azure Storage Blobs Python SDK. What you likely want instead is to generate a SAS token/URL and then use it as a “pre-signed” URL: - For a container SAS (service SAS), create it with generate_container_sas and permissions/expiry, then append it to the container URL with? delimiter [1]. - The ContainerClient.from_container_url helper accepts a full container URL including a SAS token [2]. Example (Python, container SAS URL): from azure.storage.blob import generate_container_sas, ContainerSasPermissions from datetime import datetime, timezone, timedelta start_time = datetime.now(timezone.utc) expiry_time = start_time + timedelta(days=1) sas_token = generate_container_sas( account_name=container_client.account_name, container_name=container_client.container_name, account_key=account_key, permission=ContainerSasPermissions(read=True), expiry=expiry_time, start=start_time,) container_url_with_sas = f"{container_client.url}?{sas_token}" This SAS token is what effectively acts as a time-limited pre-signed URL; you can then create a ContainerClient from the SAS URL [1][2]. If you meant “pre-signed URL for a specific blob” (not the container), use the blob SAS generation helpers (e.g., generate_blob_sas) and append the resulting token to the blob URL; the same SAS-in-URL pattern applies [3][1].
Citations:
- 1: https://learn.microsoft.com/en-us/azure/storage/blobs/sas-service-create-python
- 2: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.ContainerClient?view=azure-python&viewFallbackFrom=storage-py-v12
- 3: https://stackoverflow.com/questions/62523166/how-can-i-generate-an-azure-blob-sas-url-in-python
🏁 Script executed:

```shell
# Examine the actual implementation in the file being reviewed
cat -n rag/utils/azure_sas_conn.py | head -120
# Search for where get_presigned_url is defined or used
rg -n "get_presigned_url" --type=py
# Check requirements or setup files to see the azure-storage-blob version
fd -e txt -e py -e cfg -e toml | xargs rg -l "azure-storage-blob" 2>/dev/null | head -5
```

Repository: infiniflow/ragflow
Length of output: 5288
🏁 Script executed:

```shell
# Look at azure_spn_conn.py to see what self.conn is there
sed -n '1,150p' rag/utils/azure_spn_conn.py | cat -n
```

Repository: infiniflow/ragflow
Length of output: 5435
🏁 Script executed:

```shell
# Check if FileSystemClient has get_presigned_url in Azure SDK docs/patterns
# Also check what the minio_conn implementation does (since it seems to work)
sed -n '217,225p' rag/utils/minio_conn.py | cat -n
# Check s3_conn for comparison
sed -n '185,195p' rag/utils/s3_conn.py | cat -n
```

Repository: infiniflow/ragflow
Length of output: 1124
ContainerClient does not have a get_presigned_url method — this will always fail at runtime.
The code calls self.conn.get_presigned_url("GET", bucket, blob_name, expires) on line 94, but azure.storage.blob.ContainerClient does not expose this method. The standard Azure SDK approach is to use generate_blob_sas() to create a SAS token, then manually construct a pre-signed URL by appending the token to the blob URL.
As written, every invocation will catch an AttributeError, retry 10 times, and return None. The same issue exists in azure_spn_conn.py line 118 with FileSystemClient.
Replace with proper SAS token generation using generate_blob_sas() from azure.storage.blob.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rag/utils/azure_sas_conn.py` around lines 90 - 99, The get_presigned_url
method currently calls self.conn.get_presigned_url which ContainerClient does
not provide; replace the implementation in get_presigned_url to generate a SAS
token using azure.storage.blob.generate_blob_sas and then build the full blob
URL + "?" + sas_token (use the blob name constructed as f"{bucket}/{fnm}" and
the ContainerClient.url or blob_client.url from self.conn to form the base URL),
preserving the retry logic and calls to self.__open__ on failure; also apply the
same change pattern for the analogous call in azure_spn_conn.py where
FileSystemClient is mistakenly used, ensuring you import generate_blob_sas and
use the proper account/key and permissions/expiry parameters when creating the
SAS token.
```diff
         for _ in range(3):
             try:
-                f = self.conn.create_file(f"{bucket}/{fnm}")
+                f = self.conn.create_file(f"{blob}")
                 f.append_data(binary, offset=0, length=len(binary))
                 return f.flush_data(len(binary))
             except Exception:
-                logging.exception(f"Fail put {bucket}/{fnm}")
+                logging.exception(f"Fail put {blob}")
                 self.__open__()
                 time.sleep(1)
                 return None
         return None
```
Retry loop is short-circuited — put() never retries.
return None on line 82 lives inside the except branch, so the very first failure exits the function and the for _ in range(3): loop is effectively dead. If the first create_file fails, no retry happens; if it succeeds the explicit return None after the loop is unreachable. Compare with azure_sas_conn.py:put() which correctly lets the loop iterate.
🔧 Proposed fix

```diff
     def put(self, bucket, fnm, binary, tenant_id=None):
         blob = f"{bucket}/{fnm}"
         for _ in range(3):
             try:
-                f = self.conn.create_file(f"{blob}")
+                f = self.conn.create_file(blob)
                 f.append_data(binary, offset=0, length=len(binary))
                 return f.flush_data(len(binary))
             except Exception:
                 logging.exception(f"Fail put {blob}")
                 self.__open__()
                 time.sleep(1)
-                return None
         return None
```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rag/utils/azure_spn_conn.py` around lines 73 - 83, The put() method's retry
loop is short-circuited because the except block returns None immediately;
remove the inner return so exceptions trigger the retry (call
logging.exception(...), self.__open__(), time.sleep(1) and continue), and only
return None after the for-loop if all attempts fail; keep the successful path
returning f.flush_data(len(binary)) from the try block. Reference the put()
implementation, self.conn.create_file, f.append_data, f.flush_data,
logging.exception, and self.__open__(), and model the loop behavior on
azure_sas_conn.py:put().
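The intended retry shape (attempt, back off on failure, give up only after the loop is exhausted) can be sketched generically; `put_with_retries` is a hypothetical helper, not the connector's real interface:

```python
import time

def put_with_retries(do_put, attempts=3, delay=0.01):
    """Call do_put() up to `attempts` times, retrying on any exception."""
    for _ in range(attempts):
        try:
            return do_put()  # success path returns immediately
        except Exception:
            time.sleep(delay)  # back off, then let the loop retry
    return None  # reached only after every attempt has failed
```

The key difference from the buggy code above is that nothing returns from inside the `except` branch, so the `for` loop actually gets its remaining iterations.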
```diff
     def obj_exist(self, bucket, fnm):
         blob = f"{bucket}/{fnm}"
         try:
-            client = self.conn.get_file_client(f"{bucket}/{fnm}")
+            client = self.conn.get_blob_client(f"{blob}")
             return client.exists()
         except Exception:
-            logging.exception(f"Fail put {bucket}/{fnm}")
+            logging.exception(f"Fail put {blob}")
             return False
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Look for FileSystemClient bindings and confirm available client accessors.
fd -t f 'requirements*.txt|pyproject.toml|setup.py|setup.cfg' --exec rg -nH 'azure-storage-file-datalake' {} \;
rg -nP --type=py -C2 '\bFileSystemClient\b'
python - <<'PY'
import inspect
try:
    from azure.storage.filedatalake import FileSystemClient
    methods = [m for m in dir(FileSystemClient) if 'client' in m.lower()]
    print("client-accessor methods on FileSystemClient:", methods)
    print("has get_blob_client:", hasattr(FileSystemClient, "get_blob_client"))
    print("has get_file_client:", hasattr(FileSystemClient, "get_file_client"))
except Exception as e:
    print("import failed:", e)
PY
```

Repository: infiniflow/ragflow
Length of output: 1071
🏁 Script executed:

```shell
# Read the complete azure_spn_conn.py file to understand the context
cat -n rag/utils/azure_spn_conn.py
```

Repository: infiniflow/ragflow
Length of output: 5435
🏁 Script executed:

```shell
# Search for uses of get_file_client and get_blob_client in the codebase
rg -n 'get_file_client|get_blob_client' -t py
```

Repository: infiniflow/ragflow
Length of output: 346
🏁 Script executed:

```shell
# Look for any tests related to this file
fd -t f 'test.*azure|azure.*test' --search-path . -i
```

Repository: infiniflow/ragflow
Length of output: 44
FileSystemClient has no get_blob_client — obj_exist will always fail.
self.conn is a FileSystemClient from azure.storage.filedatalake (line 56). This class exposes get_file_client() and get_directory_client(), not get_blob_client() (which belongs to BlobServiceClient). Line 108 will raise AttributeError at runtime, caught by the broad except handler and logged misleadingly as "Fail put" instead of "Fail obj_exist". The get() method on line 96 correctly uses get_file_client() on the same object.
🔧 Proposed fix

```diff
     def obj_exist(self, bucket, fnm):
         blob = f"{bucket}/{fnm}"
         try:
-            client = self.conn.get_blob_client(f"{blob}")
+            client = self.conn.get_file_client(f"{blob}")
             return client.exists()
         except Exception:
-            logging.exception(f"Fail put {blob}")
+            logging.exception(f"Fail obj_exist {blob}")
             return False
```
return False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@rag/utils/azure_spn_conn.py` around lines 105 - 112, The obj_exist method
uses self.conn.get_blob_client which doesn't exist on FileSystemClient and will
always raise; change obj_exist to use the same pattern as get() by calling
self.conn.get_file_client(f"{bucket}/{fnm}") and then use the
FileClient.exists() result, and update the exception log message to something
like "Fail obj_exist {bucket}/{fnm}" to reflect the correct operation; ensure
you reference obj_exist, self.conn, get_file_client, and exists() when making
the change.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main   #14645      +/-   ##
==========================================
- Coverage   95.59%   94.16%   -1.43%     
==========================================
  Files          10       10              
  Lines         703      703              
  Branches      112      112              
==========================================
- Hits          672      662      -10     
- Misses         21       25       +4     
- Partials       10       16       +6     
```

☔ View full report in Codecov by Sentry.
Hi, @wangq8
Related issues
Closes #14644
What problem does this PR solve?
This PR fixes an authorization bug where datasets marked with `permission = me` could still be accessed by other members of the same tenant through APIs that relied on `KnowledgebaseService.accessible()` or `DocumentService.accessible()`.
Before this change, those shared access helpers only checked tenant membership and did not enforce the dataset's permission mode. As a result, a non-owner who knew a private `dataset_id` could still reach downstream document and chunk operations even though the dataset was intended to be owner-only.
This change updates the central access checks so that:
- Datasets with `TEAM` permission remain accessible to members of the owner's tenant
- Datasets with `permission = me` remain inaccessible to non-owners
The PR also adds regression coverage for private-vs-team dataset access behavior.
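Based on the walkthrough at the top of this page, the enforced check behaves roughly like the simplified sketch below. Plain dicts stand in for the real service and model objects, so names here are illustrative only:

```python
def kb_accessible(kb, user_id, joined_tenant_ids):
    """Simplified sketch of the knowledgebase-level access check described above."""
    if kb is None or kb["status"] != "VALID":
        return False  # missing or invalid knowledgebase: deny
    if kb["tenant_id"] == user_id:
        return True  # the owner always has access
    if kb["permission"] == "TEAM":
        # non-owners get access only if they joined the owner's tenant
        return kb["tenant_id"] in joined_tenant_ids
    return False  # permission == "me": owner-only
```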
Type of change
Testing
- Added `test/unit_test/api/db/services/test_dataset_access_permissions.py`
- Ran `python -m pytest test\unit_test\api\db\services\test_dataset_access_permissions.py -q`
- `strenum` dependency