fix(api): check kb ownership in /dify/retrieval #15028

Merged
wangq8 merged 4 commits into infiniflow:main from dripsmvcp:fix/15027-dify-retrieval-tenant-check
May 21, 2026
Conversation

@dripsmvcp
Contributor

@dripsmvcp dripsmvcp commented May 20, 2026

POST /api/v1/dify/retrieval resolved the caller via @apikey_required (injecting tenant_id) but then fetched the requested knowledge_id with no tenant filter and ran the full retrieval pipeline against kb.tenant_id (the owner). Any valid Dify-compatible API key could retrieve chunks from any tenant whose KB UUID was known. Adds the missing ownership check.

Root Cause

api/apps/sdk/dify_retrieval.py line 253: KnowledgebaseService.get_by_id(kb_id) fetched the KB by id alone, then the handler used kb.tenant_id (the OWNER) to build the embedding model and call the retriever. The caller tenant_id was only used downstream at line 278 for retrieval_by_children, well after cross-tenant data was already retrieved.

grep confirmed there was no KnowledgebaseService.accessible call anywhere in the handler.

Fix

Two-line guard immediately after the existing get_by_id lookup, mirroring the pattern PR #14749 lands for the sibling sdk/doc.py routes (download, parse, stop_parsing, retrieval_test):

  e, kb = KnowledgebaseService.get_by_id(kb_id)
  if not e:
      return build_error_result(message="Knowledgebase not found!", code=RetCode.NOT_FOUND)
+ if not KnowledgebaseService.accessible(kb_id, tenant_id):
+     return build_error_result(message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)

  if kb.tenant_embd_id:
  ...

KnowledgebaseService.accessible already handles solo-tenant ownership, team membership via TenantService.get_joined_tenants_by_user_id, and the permission=ME distinction. No behavior change for legitimate callers; cross-tenant callers now receive RetCode.AUTHENTICATION_ERROR (109).
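The ordering of the two checks can be sketched in isolation. The snippet below is a minimal, self-contained illustration and not the RAGFlow handler: `KB`, `KBS`, `accessible`, `lookup_kb`, and the `SUCCESS = 0` code are hypothetical stand-ins, and this `accessible` models direct ownership only, whereas the real `KnowledgebaseService.accessible` is described above as also covering team membership and permissions. The 404 and 109 values are the ones quoted in this description.

```python
from dataclasses import dataclass
from enum import IntEnum


class RetCode(IntEnum):
    # Hypothetical subset; 404 and 109 follow the values quoted in this PR.
    SUCCESS = 0  # assumed value, for illustration only
    NOT_FOUND = 404
    AUTHENTICATION_ERROR = 109


@dataclass
class KB:
    id: str
    tenant_id: str


# Hypothetical in-memory store standing in for the database.
KBS = {"kb-owner": KB(id="kb-owner", tenant_id="tenant-a")}


def accessible(kb_id: str, tenant_id: str) -> bool:
    # Simplified assumption: only the owning tenant may access the KB.
    kb = KBS.get(kb_id)
    return kb is not None and kb.tenant_id == tenant_id


def lookup_kb(kb_id: str, tenant_id: str) -> dict:
    kb = KBS.get(kb_id)
    if kb is None:
        # Existence is checked first, so a missing KB reports "not found".
        return {"code": RetCode.NOT_FOUND, "message": "Knowledgebase not found!"}
    if not accessible(kb_id, tenant_id):
        # The guard runs before any retrieval work touches the owner's data.
        return {"code": RetCode.AUTHENTICATION_ERROR, "message": "No authorization."}
    return {"code": RetCode.SUCCESS, "records": [f"chunk from {kb.id}"]}
```

In this sketch, `lookup_kb("kb-owner", "tenant-b")` returns the 109 error without a `records` key, while the owning tenant `tenant-a` receives the records.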

Test Plan

  • Regression test added: test/unit_test/api/apps/sdk/test_dify_retrieval.py
    • test_cross_tenant_request_is_rejected -- attacker tenant calling owner tenant KB gets 109; retriever is not invoked
    • test_same_tenant_request_succeeds -- owner tenant gets the records back
    • test_missing_knowledge_base_returns_not_found -- missing KB returns 404 BEFORE the access check fires (legit callers see the clearer message)
  • All 3 tests pass after the fix
  • Cross-tenant test FAILS on pre-fix main (KeyError on result[code] because handler leaks records dict instead of returning auth error)
  • ruff check clean on both changed files
  • No drive-by reformatting in dify_retrieval.py -- only the 2 added lines
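The two assertions in the cross-tenant case (error code 109 and an untouched retriever) can be sketched with `unittest.mock`. This is an illustrative stand-in rather than the contents of test_dify_retrieval.py: `handle_retrieval`, its parameters, and the tenant names are hypothetical.

```python
from unittest.mock import MagicMock

AUTHENTICATION_ERROR = 109  # value quoted in the PR description


def handle_retrieval(kb_owner, caller_tenant, retriever, question):
    """Hypothetical handler: authorize first, retrieve only afterwards."""
    if kb_owner != caller_tenant:
        return {"code": AUTHENTICATION_ERROR, "message": "No authorization."}
    return {"records": retriever.retrieval(question)}


def test_cross_tenant_request_is_rejected():
    retriever = MagicMock()
    result = handle_retrieval("owner", "attacker", retriever, "secret?")
    assert result["code"] == AUTHENTICATION_ERROR
    # The retriever must never run for a denied caller.
    retriever.retrieval.assert_not_called()


def test_same_tenant_request_succeeds():
    retriever = MagicMock()
    retriever.retrieval.return_value = [{"content": "chunk"}]
    result = handle_retrieval("owner", "owner", retriever, "q")
    assert result["records"] == [{"content": "chunk"}]
    retriever.retrieval.assert_called_once_with("q")
```

Asserting on the mock, and not only on the return code, is what catches a handler that runs retrieval first and rejects afterwards.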

Post-fix output

test_cross_tenant_request_is_rejected           PASSED [ 33%]
test_same_tenant_request_succeeds               PASSED [ 66%]
test_missing_knowledge_base_returns_not_found   PASSED [100%]

============================== 3 passed in 0.04s ===============================

Closes #15027

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label May 20, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between a2b993f and 6449797.

📒 Files selected for processing (3)
  • api/apps/sdk/dify_retrieval.py
  • test/testcases/restful_api/test_dify_retrieval_routes_unit.py
  • test/unit_test/api/apps/sdk/test_dify_retrieval.py

Walkthrough

Adds a tenant authorization check to the Dify retrieval handler: after fetching a knowledge base, the handler verifies KnowledgebaseService.accessible(kb_id, tenant_id) and returns RetCode.AUTHENTICATION_ERROR ("No authorization.") on denial. Regression and unit tests cover denial, owner access, and missing-KB cases.

Changes

Tenant Authorization in Dify Retrieval

Layer / File(s) | Summary
Authorization Gate Implementation (api/apps/sdk/dify_retrieval.py) | After fetching the KB by ID, the handler calls KnowledgebaseService.accessible(kb_id, tenant_id); if false it logs a warning and returns error code 109 (AUTHENTICATION_ERROR) with message No authorization. before any retrieval or embedding setup.
Authorization Regression Test Suite (test/unit_test/api/apps/sdk/test_dify_retrieval.py) | Adds test scaffolding and validates three scenarios: (1) non-owner tenant denied (code 109), no records, retriever not called, warning audit log emitted without leaking payload; (2) owner tenant succeeds, retriever invoked, returns one expected record; (3) missing knowledge_id returns 404 and accessible() is not called.
Unit test mocks updated (test/testcases/restful_api/test_dify_retrieval_routes_unit.py) | Adds monkeypatches to set KnowledgebaseService.accessible to always return True for existing retrieval and exception-mapping tests to maintain consistent access behavior during those tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

🐞 bug, size:M, 🧪 test, 🐖api

Suggested reviewers

  • JinHai-CN
  • wangq8
  • Lynn-Inf

Poem

"I'm a rabbit guarding KB beds,
I hop and check each tenant's threads. 🐰
If sneaky paws try to peep inside,
I warn and close the gate with pride.
Tests hum — the secrets hide."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.32%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title 'fix(api): check kb ownership in /dify/retrieval' accurately summarizes the main security fix: adding a tenant authorization check to prevent cross-tenant knowledge base access.
Description check | ✅ Passed | The PR description comprehensively covers the security vulnerability, root cause analysis, the two-line fix with context, and detailed test plan with passing test results.
Linked Issues check | ✅ Passed | The PR directly addresses issue #15027 by implementing the missing KnowledgebaseService.accessible() check after KB lookup, preventing cross-tenant IDOR and returning RetCode.AUTHENTICATION_ERROR as expected.
Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing the IDOR vulnerability: the core fix in dify_retrieval.py, comprehensive regression tests, and updates to existing tests to mock accessibility checks.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/apps/sdk/dify_retrieval.py`:
- Around line 256-257: The authorization-denied branch after calling
KnowledgebaseService.accessible(kb_id, tenant_id) must emit an audit log entry;
update the branch that returns build_error_result(...) to first log an audit
event (e.g., using the project's audit/logger) containing non-sensitive context
such as kb_id and tenant_id, the attempted action (access check), and caller
identity if available, but do not include request payloads or secrets; ensure
you use the same logging convention as other audit logs in the codebase and keep
the log message concise and machine-parseable.
📥 Commits

Reviewing files that changed from the base of the PR and between 0a0bae2 and bd648ac.

📒 Files selected for processing (2)
  • api/apps/sdk/dify_retrieval.py
  • test/unit_test/api/apps/sdk/test_dify_retrieval.py

Comment thread api/apps/sdk/dify_retrieval.py
dripsmvcp added a commit to dripsmvcp/ragflow that referenced this pull request May 20, 2026
Per CodeRabbit review on PR infiniflow#15028: the new authorization-denied branch
in api/apps/sdk/dify_retrieval.py should emit an audit log entry so
operators can detect repeated cross-tenant access attempts. Matches the
logger.warning convention used by the existing rejection branch in
api/apps/restful_apis/search_api.py.

The log line includes caller_tenant and knowledge_id; it deliberately
excludes the request payload to avoid leaking attempted query strings.
Test extended with caplog assertions covering both the presence and the
sanitization of the log.
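A denial log of this shape can be sketched with the standard `logging` module. The logger name, the `log_denied_access` helper, the message format, and the `ListHandler` class (a rough stand-in for pytest's `caplog`) are all assumptions made for illustration; only the idea of logging `caller_tenant` and `knowledge_id` while omitting the request payload comes from the commit message.

```python
import logging

logger = logging.getLogger("dify_retrieval_sketch")  # hypothetical logger name
logger.setLevel(logging.WARNING)


def log_denied_access(caller_tenant: str, knowledge_id: str, payload: dict) -> None:
    # payload is accepted only to show that it is deliberately never logged.
    logger.warning(
        "dify retrieval denied: caller_tenant=%s knowledge_id=%s",
        caller_tenant,
        knowledge_id,
    )


class ListHandler(logging.Handler):
    """Collects formatted messages so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
log_denied_access("tenant-b", "kb-owner", {"query": "confidential question"})
```

Checking both that the identifiers appear and that the query text does not mirrors the presence and sanitization assertions the commit describes.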
@JinHai-CN
Contributor

@dripsmvcp A previous PR included a very large binary file, so I reset the head of the main branch. Would you please re-submit the commit from the current HEAD of the main branch? Thank you very much.

@dripsmvcp
Contributor Author

Sure thing @JinHai-CN
Thanks for the review

@dripsmvcp dripsmvcp force-pushed the fix/15027-dify-retrieval-tenant-check branch from ab7e235 to 7ad5bbf Compare May 20, 2026 06:02
@dripsmvcp
Contributor Author

@JinHai-CN done — rebased the two commits onto the current HEAD of main (7783487) and force-pushed. PR #15028 now contains only:

f8506d9 fix(api): check kb ownership in /dify/retrieval
7ad5bbf fix(api): log warning on /dify/retrieval cross-tenant denial
+246 / -0 across two files; no binaries. Tests still pass and ruff is clean. Thanks for the quick turnaround on resetting main.

@wangq8 wangq8 added the ci Continue Integration label May 21, 2026
@wangq8 wangq8 marked this pull request as draft May 21, 2026 03:01
@wangq8 wangq8 marked this pull request as ready for review May 21, 2026 03:01
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 21, 2026
@codecov

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.12%. Comparing base (fec0b96) to head (6449797).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15028   +/-   ##
=======================================
  Coverage   93.12%   93.12%           
=======================================
  Files          10       10           
  Lines         713      713           
  Branches      116      116           
=======================================
  Hits          664      664           
  Misses         29       29           
  Partials       20       20           

dripsmvcp added 4 commits May 21, 2026 12:40
`POST /api/v1/dify/retrieval` resolved the caller via @apikey_required
(injecting `tenant_id`) but then fetched the requested `knowledge_id`
with no tenant filter and ran the full retrieval pipeline against
`kb.tenant_id` (the owner). Any valid Dify-compatible API key could
retrieve chunks from any tenant whose KB UUID was known.

Mirror the pattern PR infiniflow#14749 adds to the sibling sdk/doc.py routes:
after the existing `KnowledgebaseService.get_by_id` lookup, call
`KnowledgebaseService.accessible(kb_id, tenant_id)` and return
`RetCode.AUTHENTICATION_ERROR` ("No authorization.") when it returns
False. No behavior change for owners or for team members already
allowed by the existing accessible() rules.

Closes infiniflow#15027

Per CodeRabbit review on PR infiniflow#15028: the new authorization-denied branch
in api/apps/sdk/dify_retrieval.py should emit an audit log entry so
operators can detect repeated cross-tenant access attempts. Matches the
logger.warning convention used by the existing rejection branch in
api/apps/restful_apis/search_api.py.

The log line includes caller_tenant and knowledge_id; it deliberately
excludes the request payload to avoid leaking attempted query strings.
Test extended with caplog assertions covering both the presence and the
sanitization of the log.

The _stub helper only replaced sys.modules["common.settings"]. Because
'common' was already imported (and its 'settings' attribute already
bound to the real module from earlier in the test session), the
'from common import settings' in dify_retrieval.py resolved via
attribute lookup, bypassing our stub. Result: settings.retriever was
the real None placeholder and test_same_tenant_request_succeeds hit
AttributeError on .retrieval().

Fix: when stubbing a submodule, also setattr the stub on the parent
package via monkeypatch (auto-reverted on test teardown).

The new ownership check added in df02085 calls accessible() between
get_by_id and the rest of the retrieval pipeline. The three success-
path tests in restful_api/test_dify_retrieval_routes_unit.py only
mocked get_by_id, so accessible() fell through to the real DB and
failed with 'Can't connect to MySQL'.

Stub accessible() to return True alongside each get_by_id mock that
returns an owner KB.
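The stubbing pitfall described in the `_stub` commit message above can be reproduced without RAGFlow. The sketch below builds a throwaway package named `demo_pkg` (a hypothetical name, as are `current_retriever` and the stub values) and shows that `from demo_pkg import settings` resolves through the attribute already bound on the parent package, so replacing the `sys.modules` entry alone has no effect.

```python
import sys
import types

# Build a throwaway package so the sketch does not touch real modules.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # an (empty) __path__ marks the module as a package
real = types.ModuleType("demo_pkg.settings")
real.retriever = None
pkg.settings = real  # what an earlier "import demo_pkg.settings" would have bound
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.settings"] = real


def current_retriever():
    # Resolved via attribute lookup on demo_pkg when the attribute exists.
    from demo_pkg import settings
    return settings.retriever


stub = types.ModuleType("demo_pkg.settings")
stub.retriever = "stub-retriever"

# Replacing only the sys.modules entry is not enough: the parent package
# still points at the real module.
sys.modules["demo_pkg.settings"] = stub
before = current_retriever()

# Rebinding the attribute on the parent package makes the stub visible.
setattr(pkg, "settings", stub)
after = current_retriever()

# Clean up the throwaway modules.
del sys.modules["demo_pkg"], sys.modules["demo_pkg.settings"]
```

Here `before` is still the real module's `None`, and `after` is the stub's value. In a pytest suite, `monkeypatch.setattr` on the parent package gives the same rebinding and reverts it on teardown, which is the approach the commit message describes.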
@dripsmvcp dripsmvcp force-pushed the fix/15027-dify-retrieval-tenant-check branch from a2b993f to 6449797 Compare May 21, 2026 03:41
@wangq8 wangq8 merged commit 440153c into infiniflow:main May 21, 2026
2 checks passed
Labels

ci Continue Integration lgtm This PR has been approved by a maintainer size:XS This PR changes 0-9 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug]: IDOR — cross-tenant KB exposure via POST /api/v1/dify/retrieval

3 participants