fix(api): check kb ownership in /dify/retrieval#15028
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a tenant authorization check to the Dify retrieval handler: after fetching a knowledge base, the handler verifies KnowledgebaseService.accessible(kb_id, tenant_id) and returns RetCode.AUTHENTICATION_ERROR ("No authorization.") on denial. Regression and unit tests cover denial, owner access, and missing-KB cases. ChangesTenant Authorization in Dify Retrieval
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/apps/sdk/dify_retrieval.py`:
- Around line 256-257: The authorization-denied branch after calling
KnowledgebaseService.accessible(kb_id, tenant_id) must emit an audit log entry;
update the branch that returns build_error_result(...) to first log an audit
event (e.g., using the project's audit/logger) containing non-sensitive context
such as kb_id and tenant_id, the attempted action (access check), and caller
identity if available, but do not include request payloads or secrets; ensure
you use the same logging convention as other audit logs in the codebase and keep
the log message concise and machine-parseable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: de7797d1-357d-41e1-a5f0-195770f3ab9d
📒 Files selected for processing (2)
api/apps/sdk/dify_retrieval.pytest/unit_test/api/apps/sdk/test_dify_retrieval.py
Per CodeRabbit review on PR infiniflow#15028: the new authorization-denied branch in api/apps/sdk/dify_retrieval.py should emit an audit log entry so operators can detect repeated cross-tenant access attempts. Matches the logger.warning convention used by the existing rejection branch in api/apps/restful_apis/search_api.py. The log line includes caller_tenant and knowledge_id; it deliberately excludes the request payload to avoid leaking attempted query strings. Test extended with caplog assertions covering both the presence and the sanitization of the log.
|
@dripsmvcp Since previous PR includes a binary file which is very big. So, I reset the head of main branch. Would you please re-submit the commit from current HEAD of main branch? Thank you very much. |
|
Sure thing @JinHai-CN |
Per CodeRabbit review on PR infiniflow#15028: the new authorization-denied branch in api/apps/sdk/dify_retrieval.py should emit an audit log entry so operators can detect repeated cross-tenant access attempts. Matches the logger.warning convention used by the existing rejection branch in api/apps/restful_apis/search_api.py. The log line includes caller_tenant and knowledge_id; it deliberately excludes the request payload to avoid leaking attempted query strings. Test extended with caplog assertions covering both the presence and the sanitization of the log.
ab7e235 to
7ad5bbf
Compare
|
@JinHai-CN done — rebased the two commits onto the current HEAD of main (7783487) and force-pushed. PR #15028 now contains only: f8506d9 fix(api): check kb ownership in /dify/retrieval |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #15028 +/- ##
=======================================
Coverage 93.12% 93.12%
=======================================
Files 10 10
Lines 713 713
Branches 116 116
=======================================
Hits 664 664
Misses 29 29
Partials 20 20 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
`POST /api/v1/dify/retrieval` resolved the caller via @apikey_required (injecting `tenant_id`) but then fetched the requested `knowledge_id` with no tenant filter and ran the full retrieval pipeline against `kb.tenant_id` (the owner). Any valid Dify-compatible API key could retrieve chunks from any tenant whose KB UUID was known. Mirror the pattern PR infiniflow#14749 adds to the sibling sdk/doc.py routes: after the existing `KnowledgebaseService.get_by_id` lookup, call `KnowledgebaseService.accessible(kb_id, tenant_id)` and return `RetCode.AUTHENTICATION_ERROR` ("No authorization.") when it returns False. No behavior change for owners or for team members already allowed by the existing accessible() rules. Closes infiniflow#15027
Per CodeRabbit review on PR infiniflow#15028: the new authorization-denied branch in api/apps/sdk/dify_retrieval.py should emit an audit log entry so operators can detect repeated cross-tenant access attempts. Matches the logger.warning convention used by the existing rejection branch in api/apps/restful_apis/search_api.py. The log line includes caller_tenant and knowledge_id; it deliberately excludes the request payload to avoid leaking attempted query strings. Test extended with caplog assertions covering both the presence and the sanitization of the log.
The _stub helper only replaced sys.modules["common.settings"]. Because 'common' was already imported (and its 'settings' attribute already bound to the real module from earlier in the test session), the 'from common import settings' in dify_retrieval.py resolved via attribute lookup, bypassing our stub. Result: settings.retriever was the real None placeholder and test_same_tenant_request_succeeds hit AttributeError on .retrieval(). Fix: when stubbing a submodule, also setattr the stub on the parent package via monkeypatch (auto-reverted on test teardown).
The new ownership check added in df02085 calls accessible() between get_by_id and the rest of the retrieval pipeline. The three success- path tests in restful_api/test_dify_retrieval_routes_unit.py only mocked get_by_id, so accessible() fell through to the real DB and failed with 'Can't connect to MySQL'. Stub accessible() to return True alongside each get_by_id mock that returns an owner KB.
a2b993f to
6449797
Compare
POST /api/v1/dify/retrieval resolved the caller via @apikey_required (injecting tenant_id) but then fetched the requested knowledge_id with no tenant filter and ran the full retrieval pipeline against kb.tenant_id (the owner). Any valid Dify-compatible API key could retrieve chunks from any tenant whose KB UUID was known. Adds the missing ownership check.
Root Cause
api/apps/sdk/dify_retrieval.py line 253: KnowledgebaseService.get_by_id(kb_id) fetched the KB by id alone, then the handler used kb.tenant_id (the OWNER) to build the embedding model and call the retriever. The caller tenant_id was only used downstream at line 278 for retrieval_by_children, well after cross-tenant data was already retrieved.
grep confirmed there was no KnowledgebaseService.accessible call anywhere in the handler.
Fix
Two-line guard immediately after the existing get_by_id lookup, mirroring the pattern PR #14749 lands for the sibling sdk/doc.py routes (download, parse, stop_parsing, retrieval_test):
...
KnowledgebaseService.accessible already handles solo-tenant ownership, team membership via TenantService.get_joined_tenants_by_user_id, and the permission=ME distinction. No behavior change for legitimate callers; cross-tenant callers now receive RetCode.AUTHENTICATION_ERROR (109).
Test Plan
Post-fix output
Closes #15027