fix: secure document thumbnail image access#14766
Conversation
📝 WalkthroughWalkthroughAdds authenticated GET /documents/<doc_id>/thumbnail, helpers to map image IDs to accessible documents and to build safe thumbnail URLs/headers, rewrites listing/thumbnails URLs to the per-document endpoint, tightens GET /documents/images/<image_id> with access checks, and updates test fixtures and tests. ChangesAuthenticated Thumbnail Endpoints with Access Control
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as DocumentAPI
participant Store as DocStore
participant DocSvc as DocumentService
participant Storage
Client->>API: GET /documents/images/<image_id> or /documents/<doc_id>/thumbnail
API->>Store: query image owner via OrderByExpr (per-tenant index)
Store-->>API: doc_id (or not found)
API->>DocSvc: DocumentService.accessible(doc_id, current_user.id)
DocSvc-->>API: accessible? (True/False)
API->>Storage: STORAGE_IMPL.get(bucket, object_name) (if accessible)
Storage-->>API: bytes
API-->>Client: HTTP response with safe headers and Content-Type from filename extension
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/testcases/test_web_api/test_document_app/test_document_metadata.py (1)
534-543: ⚡ Quick winPlease add a direct unit test for
_get_accessible_chunk_image_doc_id().This test replaces the new authorization helper with a lambda, so it never exercises the tenant/index lookup that actually closes the vulnerability. Given that this PR is a security fix, I’d keep route coverage here but add a focused unit test for the helper itself.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py` around lines 534 - 543, Add a focused unit test that directly exercises _get_accessible_chunk_image_doc_id rather than stubbing it in the route test: call module._get_accessible_chunk_image_doc_id with image IDs representing (a) a chunk image that is accessible (mock the authorization/index/tenant lookup functions it uses to return a valid doc id) and assert it returns the expected "doc-..." id, and (b) a chunk image that is not accessible (mock the lookups to deny access) and assert it returns None; use monkeypatch to replace the specific helper functions or data sources the helper calls (not the helper itself) so the tenant/index resolution logic runs, and keep assertions on returned values only.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1708-1712: The image response builds a Response and sets
Content-Type directly, but it must reuse apply_safe_file_response_headers to
enforce safe headers (prevent serving active types like SVG). Replace the direct
response.headers.set(...) call and the return with a call to
apply_safe_file_response_headers(response, filename=doc.thumbnail) (or the
helper's actual signature) after make_response, or call the helper first then
set any fallback Content-Type with CONTENT_TYPE_MAP if needed; ensure you still
await thread_pool_exec(STORAGE_IMPL.get, doc.kb_id, doc.thumbnail) and use
make_response(response_body) before invoking apply_safe_file_response_headers so
the thumbnail and content type are derived via the helper instead of setting
headers manually.
In `@test/testcases/test_web_api/test_document_app/conftest.py`:
- Line 243: Replace the no-op stub with one that returns the same tuple shape as
the real helper: implement api_utils_mod.check_duplicate_ids as a small stub
that accepts *args, **kwargs and returns (unique_ids, duplicate_messages) — e.g.
return (args[0] if args else kwargs.get('ids'), []) so callers unpacking
(unique_ids, duplicate_messages) get valid values; apply the same change to the
other fixture instance.
---
Nitpick comments:
In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`:
- Around line 534-543: Add a focused unit test that directly exercises
_get_accessible_chunk_image_doc_id rather than stubbing it in the route test:
call module._get_accessible_chunk_image_doc_id with image IDs representing (a) a
chunk image that is accessible (mock the authorization/index/tenant lookup
functions it uses to return a valid doc id) and assert it returns the expected
"doc-..." id, and (b) a chunk image that is not accessible (mock the lookups to
deny access) and assert it returns None; use monkeypatch to replace the specific
helper functions or data sources the helper calls (not the helper itself) so the
tenant/index resolution logic runs, and keep assertions on returned values only.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bb6daffb-5261-4c30-b498-d5836a8fb6d6
📒 Files selected for processing (4)
api/apps/restful_apis/document_api.pytest/testcases/test_web_api/test_common.pytest/testcases/test_web_api/test_document_app/conftest.pytest/testcases/test_web_api/test_document_app/test_document_metadata.py
There was a problem hiding this comment.
Pull request overview
This PR addresses an authorization bypass in document image serving by moving document thumbnails to an authenticated, document-scoped endpoint and restricting the legacy image route to authorized access patterns.
Changes:
- Added authenticated
GET /api/v1/documents/<doc_id>/thumbnailand updated thumbnail URL generation to use it. - Secured the legacy
GET /api/v1/documents/images/<image_id>route with authentication plus doc-store-backed authorization checks. - Added/updated web API and unit tests to cover invalid-auth and new thumbnail/image behaviors.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
api/apps/restful_apis/document_api.py |
Adds authenticated thumbnail endpoint, tightens legacy image access, and updates thumbnail URL generation. |
test/testcases/test_web_api/test_common.py |
Adds a client helper for the new document thumbnail endpoint. |
test/testcases/test_web_api/test_document_app/test_document_metadata.py |
Adds invalid-auth coverage for thumbnails and new unit tests for the new/secured handlers. |
test/testcases/test_web_api/test_document_app/conftest.py |
Extends module stubs to support the new imports/authorization logic in unit tests. |
Comments suppressed due to low confidence (2)
api/apps/restful_apis/document_api.py:1236
list_thumbnailsnow requires authentication, but the OpenAPI docstring no longer includes thesecuritysection like other authenticated endpoints in this module. This can cause generated API docs/clients to treat it as public. Please update the docstring to reflect that it requiresApiKeyAuth/auth.
@manager.route("/thumbnails", methods=["GET"]) # noqa: F821
@login_required
def list_thumbnails():
"""
Get thumbnails for documents.
---
tags:
- Documents
parameters:
- in: query
name: doc_ids
type: array
required: true
description: List of document IDs to get thumbnails for.
responses:
200:
description: Successfully retrieved thumbnails
api/apps/restful_apis/document_api.py:1733
get_document_imageis now@login_required, but its OpenAPI docstring does not declare anysecurityrequirement. Please add the security stanza (consistent with other routes) so the API schema/docs match runtime behavior.
@manager.route("/documents/images/<image_id>", methods=["GET"]) # noqa: F821
@login_required
async def get_document_image(image_id):
"""
Get a document image by ID.
---
tags:
- Documents
parameters:
- name: image_id
in: path
required: true
schema:
type: string
description: The image ID (format: bucket-name-image-name)
responses:
200:
| if not DocumentService.accessible(doc_id, current_user.id): | ||
| logging.warning("get_document_thumbnail: access denied for doc_id=%s user_id=%s", doc_id, current_user.id) | ||
| return get_data_error_result(message="Document not found!") | ||
|
|
||
| e, doc = DocumentService.get_by_id(doc_id) | ||
| if not e or not doc.thumbnail or doc.thumbnail.startswith(IMG_BASE64_PREFIX): |
| def _get_accessible_chunk_image_doc_id(image_id: str) -> str | None: | ||
| tenants = UserTenantService.query(user_id=current_user.id) | ||
| if not tenants: | ||
| return None | ||
|
|
||
| accessible_kbs, _ = KnowledgebaseService.get_by_tenant_ids( | ||
| [tenant.tenant_id for tenant in tenants], | ||
| current_user.id, | ||
| 0, | ||
| 0, | ||
| "update_time", | ||
| True, | ||
| "", | ||
| ) | ||
|
|
||
| kb_ids_by_tenant = {} | ||
| for kb in accessible_kbs: | ||
| kb_ids_by_tenant.setdefault(kb["tenant_id"], []).append(kb["id"]) | ||
|
|
||
| for tenant_id, kb_ids in kb_ids_by_tenant.items(): | ||
| index_name = search.index_name(tenant_id) | ||
| for kb_id in kb_ids: | ||
| if not settings.docStoreConn.index_exist(index_name, kb_id): | ||
| continue | ||
|
|
||
| result = settings.docStoreConn.search( | ||
| ["doc_id"], | ||
| [], | ||
| {"img_id": image_id}, | ||
| [], | ||
| OrderByExpr(), | ||
| 0, | ||
| 1, | ||
| index_name, | ||
| [kb_id], | ||
| ) | ||
| fields = settings.docStoreConn.get_fields(result, ["doc_id"]) | ||
| if not fields: | ||
| continue | ||
|
|
||
| doc_id = next(iter(fields.values())).get("doc_id") | ||
| if doc_id and DocumentService.accessible(doc_id, current_user.id): | ||
| return doc_id | ||
|
|
||
| return None |
| common_settings_mod = ModuleType("common.settings") | ||
| common_settings_mod.STORAGE_IMPL = SimpleNamespace(get=lambda *_args, **_kwargs: b"", obj_exist=lambda *_args, **_kwargs: False) | ||
| common_settings_mod.docStoreConn = SimpleNamespace( | ||
| index_exist=lambda *_args, **_kwargs: False, | ||
| search=lambda *_args, **_kwargs: {}, | ||
| get_fields=lambda *_args, **_kwargs: {}, | ||
| ) | ||
| monkeypatch.setitem(sys.modules, "common.settings", common_settings_mod) | ||
|
|
||
| metadata_utils_mod = ModuleType("common.metadata_utils") | ||
| metadata_utils_mod.convert_conditions = lambda *_args, **_kwargs: {} | ||
| metadata_utils_mod.meta_filter = lambda *_args, **_kwargs: True | ||
| metadata_utils_mod.turn2jsonschema = lambda value: value | ||
| monkeypatch.setitem(sys.modules, "common.metadata_utils", metadata_utils_mod) | ||
|
|
||
| rag_nlp_mod = ModuleType("rag.nlp") | ||
| rag_nlp_mod.search = SimpleNamespace(index_name=lambda tenant_id: f"ragflow_{tenant_id}") | ||
| monkeypatch.setitem(sys.modules, "rag.nlp", rag_nlp_mod) | ||
|
|
| def _pass_through_decorator(func): | ||
| return func | ||
|
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/testcases/test_web_api/test_document_app/conftest.py`:
- Around line 81-82: The stub _check_duplicate_ids currently returns
deduplicated ids using list(set(ids)) which drops insertion order; change it to
preserve original order by using an ordered-dedup approach (e.g.
list(dict.fromkeys(ids))) so unique_ids maintain the input sequence; keep the
second return value as the empty list and do this replacement inside the
_check_duplicate_ids function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 12a306fd-ced0-43a3-a322-668cb73a4c88
📒 Files selected for processing (3)
api/apps/restful_apis/document_api.pytest/testcases/test_web_api/test_document_app/conftest.pytest/testcases/test_web_api/test_document_app/test_document_metadata.py
🚧 Files skipped from review as they are similar to previous changes (2)
- test/testcases/test_web_api/test_document_app/test_document_metadata.py
- api/apps/restful_apis/document_api.py
| def _check_duplicate_ids(ids, *_args, **_kwargs): | ||
| return list(set(ids)), [] |
There was a problem hiding this comment.
Preserve insertion order in _check_duplicate_ids stub.
list(set(ids)) deduplicates but does not preserve the input order, so any test that asserts on positional ordering of unique_ids (or iterates through them deterministically) may become flaky. The real helper conventionally preserves order while deduplicating; using dict.fromkeys matches that behavior with negligible cost.
♻️ Proposed fix
def _check_duplicate_ids(ids, *_args, **_kwargs):
- return list(set(ids)), []
+ return list(dict.fromkeys(ids)), []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/testcases/test_web_api/test_document_app/conftest.py` around lines 81 -
82, The stub _check_duplicate_ids currently returns deduplicated ids using
list(set(ids)) which drops insertion order; change it to preserve original order
by using an ordered-dedup approach (e.g. list(dict.fromkeys(ids))) so unique_ids
maintain the input sequence; keep the second return value as the empty list and
do this replacement inside the _check_duplicate_ids function.
What problem does this PR solve?
Fixes #14763
This PR fixes an authorization bypass in document image serving.
Previously, GET /api/v1/documents/images/<image_id> had no authentication and accepted a user-controlled storage key, which allowed unauthenticated and cross-tenant access to document thumbnails and even arbitrary objects stored in the same bucket. This change moves document thumbnails to an authenticated document-scoped endpoint, validates access with DocumentService.accessible(...), updates thumbnail URL generation, and restricts the legacy image route to authorized chunk-image access only.
Type of change