fix: secure document thumbnail image access by Parvezkhan0 · Pull Request #14766 · infiniflow/ragflow

Parvezkhan0 · 2026-05-11T05:30:23Z

What problem does this PR solve?

Fixes #14763
This PR fixes an authorization bypass in document image serving.
Previously, GET /api/v1/documents/images/<image_id> had no authentication and accepted a user-controlled storage key, which allowed unauthenticated and cross-tenant access to document thumbnails and even arbitrary objects stored in the same bucket. This change moves document thumbnails to an authenticated document-scoped endpoint, validates access with DocumentService.accessible(...), updates thumbnail URL generation, and restricts the legacy image route to authorized chunk-image access only.
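A minimal sketch of the bug class and the fix. It uses hypothetical in-memory stand-ins (`STORAGE`, `DOCS`, `accessible`) instead of RAGFlow's real `STORAGE_IMPL` and `DocumentService`, and the status codes are illustrative. The legacy handler treats the path segment as a storage key. The fixed handler derives the key from a server-side document record, and only does so after an access check. In this sketch it also returns the same "not found" answer for denied and missing documents.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-ins for object storage and the document table.
STORAGE: Dict[Tuple[str, str], bytes] = {
    ("kbalice", "thumba.png"): b"alice-thumbnail",
    ("kbbob", "thumbb.png"): b"bob-thumbnail",
    ("kbbob", "secret.pdf"): b"bob-private-file",
}


@dataclass
class Doc:
    id: str
    kb_id: str
    owner: str
    thumbnail: Optional[str]


DOCS: Dict[str, Doc] = {
    "doc-a": Doc("doc-a", "kbalice", "alice", "thumba.png"),
    "doc-b": Doc("doc-b", "kbbob", "bob", "thumbb.png"),
}


def legacy_get_image(image_id: str) -> Optional[bytes]:
    # Old pattern: the path segment *is* the storage key ("<bucket>-<object>"),
    # so any caller who can guess a key can read that object.
    parts = image_id.split("-")
    if len(parts) != 2:
        return None
    return STORAGE.get((parts[0], parts[1]))


def accessible(doc_id: str, user_id: str) -> bool:
    # Stand-in for DocumentService.accessible(doc_id, user_id).
    doc = DOCS.get(doc_id)
    return doc is not None and doc.owner == user_id


def get_document_thumbnail(doc_id: str, user_id: Optional[str]) -> Tuple[int, bytes]:
    if user_id is None:
        return 401, b"unauthorized"  # the role @login_required plays
    if not accessible(doc_id, user_id):
        return 404, b"Document not found!"  # same answer as a missing document
    doc = DOCS[doc_id]
    if not doc.thumbnail:
        return 404, b"Document not found!"
    # The storage key comes from the server-side record, never from the request.
    return 200, STORAGE[(doc.kb_id, doc.thumbnail)]
```

With the legacy handler, `legacy_get_image("kbbob-secret.pdf")` returns Bob's private file to anyone. With the document-scoped handler, Alice can only reach thumbnails of documents she can access.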

Type of change

Bug Fix (non-breaking change which fixes an issue)

coderabbitai · 2026-05-11T05:30:57Z

📝 Walkthrough


Adds authenticated GET /documents/<doc_id>/thumbnail, helpers to map image IDs to accessible documents and to build safe thumbnail URLs/headers, rewrites listing/thumbnails URLs to the per-document endpoint, tightens GET /documents/images/<image_id> with access checks, and updates test fixtures and tests.

Changes

Authenticated Thumbnail Endpoints with Access Control

Layer / File(s)	Summary
Required Imports `api/apps/restful_apis/document_api.py`	Adds the `OrderByExpr` import used for image-to-document resolution.
Document Listing Thumbnail URL Rewrite `api/apps/restful_apis/document_api.py`	Rewrites non-base64 thumbnail URLs in the dataset document listing to `/api/v1/documents/<doc_id>/thumbnail`.
Thumbnails Auth & Filtering `api/apps/restful_apis/document_api.py`	Adds `@login_required` to `/thumbnails`, filters the requested `doc_ids` to those accessible to `current_user`, and rewrites returned thumbnail URLs to the per-document endpoint.
URL & Access Resolution Helpers `api/apps/restful_apis/document_api.py`	Adds a thumbnail URL generator, a safe response header/Content-Type applier, and `_get_accessible_chunk_image_doc_id()`, which queries the doc store with `OrderByExpr()` and verifies `DocumentService.accessible(...)`.
New Document Thumbnail Endpoint `api/apps/restful_apis/document_api.py`	Adds `GET /documents/<doc_id>/thumbnail`: validates document existence and KB access, rejects missing or inline base64 (non-binary) thumbnails, fetches bytes from `settings.STORAGE_IMPL`, and sets Content-Type and headers from the filename extension.
Image Endpoint Access Control `api/apps/restful_apis/document_api.py`	Updates `GET /documents/images/<image_id>` to validate the `image_id` format, resolve the owning accessible document via the helper, parse bucket/object, fetch bytes, and set Content-Type from the object name's extension.
Test Fixtures and Dependency Stubs `test/testcases/test_web_api/test_document_app/conftest.py`	Centralizes `sys.modules` stubs and loader helpers, including a deterministic `rag.nlp.search.index_name`, parser stubs, DB/service stubs, and utility helpers used when loading `document_api.py` in tests.
Test API Helper `test/testcases/test_web_api/test_common.py`	Adds a `document_thumbnail()` helper that calls the new thumbnail endpoint and returns the raw response.
Test Cases and Authorization `test/testcases/test_web_api/test_document_app/test_document_metadata.py`	Adds `test_thumbnail_auth_invalid` and `test_get_document_thumbnail_success_and_exception_unit`; updates image tests to exercise the accessible-image branches and Content-Type expectations.
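The listing and `/thumbnails` rows above can be sketched as follows. This is a simplified illustration, not the PR's code. The value of `IMG_BASE64_PREFIX` is an assumption, and `accessible` is passed in as a stand-in for `DocumentService.accessible`. The PR does not say whether the real endpoint drops documents without a thumbnail or returns an empty value; this sketch drops them.

```python
from typing import Callable, Dict, Iterable, Optional

# Assumption: RAGFlow's IMG_BASE64_PREFIX marks inline data-URI thumbnails;
# the exact string used here is illustrative.
IMG_BASE64_PREFIX = "data:image/png;base64,"


def thumbnail_url(doc_id: str) -> str:
    # Per-document endpoint: the client never sees bucket or object names.
    return f"/api/v1/documents/{doc_id}/thumbnail"


def list_thumbnails(
    doc_ids: Iterable[str],
    user_id: str,
    stored_thumbnails: Dict[str, Optional[str]],
    accessible: Callable[[str, str], bool],
) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for doc_id in doc_ids:
        if not accessible(doc_id, user_id):
            continue  # drop documents the caller may not read
        thumb = stored_thumbnails.get(doc_id)
        if not thumb:
            continue
        # Inline base64 thumbnails pass through; stored objects get the scoped URL.
        result[doc_id] = thumb if thumb.startswith(IMG_BASE64_PREFIX) else thumbnail_url(doc_id)
    return result
```

The design point is that the response never echoes a storage key. Every non-inline thumbnail resolves through a route that repeats the access check on fetch.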

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API as DocumentAPI
  participant Store as DocStore
  participant DocSvc as DocumentService
  participant Storage
  Client->>API: GET /documents/images/<image_id> or /documents/<doc_id>/thumbnail
  API->>Store: (image route only) query image owner via OrderByExpr (per-tenant index)
  Store-->>API: doc_id (or not found)
  API->>DocSvc: DocumentService.accessible(doc_id, current_user.id)
  DocSvc-->>API: accessible? (True/False)
  API->>Storage: STORAGE_IMPL.get(bucket, object_name)  (if accessible)
  Storage-->>API: bytes
  API-->>Client: HTTP response with safe headers and Content-Type from filename extension
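The ordering in the diagram matters: authorization happens before any storage read. Below is a hedged sketch of the image route's flow. `resolve_accessible_doc` stands in for `_get_accessible_chunk_image_doc_id`, and `fetch` stands in for `STORAGE_IMPL.get`. The `<bucket>-<object>` split rule is an assumption based on the docstring's "bucket-name-image-name" description.

```python
from typing import Callable, Optional, Tuple


def parse_image_id(image_id: str) -> Optional[Tuple[str, str]]:
    # Assumed format "<bucket>-<object>" with exactly one hyphen and no empty parts.
    parts = image_id.split("-")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def get_document_image(
    image_id: str,
    resolve_accessible_doc: Callable[[str], Optional[str]],
    fetch: Callable[[str, str], bytes],
) -> Tuple[int, bytes]:
    parsed = parse_image_id(image_id)
    if parsed is None:
        return 400, b"invalid image id"
    # Authorize before touching storage: no accessible owning doc means no bytes.
    if resolve_accessible_doc(image_id) is None:
        return 404, b"Image not found."
    bucket, name = parsed
    return 200, fetch(bucket, name)
```

A test can record `fetch` calls to confirm that a denied request never reaches storage at all.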

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

infiniflow/ragflow#14344: Also modifies document thumbnail and image endpoints in api/apps/restful_apis/document_api.py.

Suggested labels

lgtm

Suggested reviewers

wangq8

Poem

🐰 A rabbit hops through thumbnails new,
Per-document endpoints guard the view.
Resolving image owners, access in sight,
Storage returns bytes only when right.
Hooray — small paws keep data tight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 19.44% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: secure document thumbnail image access' accurately summarizes the main change—adding authentication and authorization to document thumbnail endpoints to prevent cross-tenant data exfiltration.
Description check	✅ Passed	The PR description addresses both required template sections: it explains the authorization bypass being fixed (issue `#14763`) and correctly marks the change type as a bug fix.
Linked Issues check	✅ Passed	The code changes comprehensively address all objectives from issue `#14763`: authentication is enforced via `@login_required`, authorization uses DocumentService.accessible(...), thumbnails are served only by document lookup, URL builders are updated, and tests cover anonymous/cross-tenant denial.
Out of Scope Changes check	✅ Passed	All changes are narrowly scoped to fixing the authorization bypass: new authenticated thumbnail endpoint, updated URL generation, hardened legacy image route, test infrastructure refactoring, and related test coverage—no unrelated modifications.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

test/testcases/test_web_api/test_document_app/test_document_metadata.py (1)
534-543: ⚡ Quick win

Please add a direct unit test for _get_accessible_chunk_image_doc_id().

This test replaces the new authorization helper with a lambda, so it never exercises the tenant/index lookup that actually closes the vulnerability. Given that this PR is a security fix, I’d keep route coverage here but add a focused unit test for the helper itself.
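Here is a sketch of the kind of focused test this nitpick asks for. To stay self-contained, it exercises a dependency-injected copy of the helper's logic from the diff: tenant lookup, grouping KBs by tenant, the per-KB index check, the owner lookup, and the final `accessible` check. `deps`, `find_doc_id`, and `make_deps` are hypothetical. In the real suite, the same cases would instead monkeypatch `UserTenantService.query`, `KnowledgebaseService.get_by_tenant_ids`, `settings.docStoreConn`, and `DocumentService.accessible` on the loaded module, so that the helper itself runs unmodified.

```python
from types import SimpleNamespace
from typing import Optional


def get_accessible_chunk_image_doc_id(image_id: str, user_id: str, deps) -> Optional[str]:
    # Same control flow as _get_accessible_chunk_image_doc_id in the diff, but the
    # collaborators come from `deps` so a test can substitute fakes.
    tenants = deps.query_tenants(user_id)
    if not tenants:
        return None
    kb_ids_by_tenant = {}
    for kb in deps.accessible_kbs([t.tenant_id for t in tenants], user_id):
        kb_ids_by_tenant.setdefault(kb["tenant_id"], []).append(kb["id"])
    for tenant_id, kb_ids in kb_ids_by_tenant.items():
        index_name = deps.index_name(tenant_id)
        for kb_id in kb_ids:
            if not deps.index_exist(index_name, kb_id):
                continue
            doc_id = deps.find_doc_id(index_name, kb_id, image_id)
            if doc_id and deps.accessible(doc_id, user_id):
                return doc_id
    return None


def make_deps(doc_accessible: bool) -> SimpleNamespace:
    # One tenant, one KB, one indexed chunk image owned by "doc-1".
    chunks = {("ragflow_t1", "kb1", "kb1-img1"): "doc-1"}
    return SimpleNamespace(
        query_tenants=lambda user_id: [SimpleNamespace(tenant_id="t1")],
        accessible_kbs=lambda tenant_ids, user_id: [{"tenant_id": "t1", "id": "kb1"}],
        index_name=lambda tenant_id: f"ragflow_{tenant_id}",
        index_exist=lambda index_name, kb_id: True,
        find_doc_id=lambda index_name, kb_id, image_id: chunks.get((index_name, kb_id, image_id)),
        accessible=lambda doc_id, user_id: doc_accessible,
    )


def test_returns_doc_id_when_accessible():
    assert get_accessible_chunk_image_doc_id("kb1-img1", "u1", make_deps(True)) == "doc-1"


def test_returns_none_when_document_not_accessible():
    assert get_accessible_chunk_image_doc_id("kb1-img1", "u1", make_deps(False)) is None


def test_returns_none_for_unknown_image():
    assert get_accessible_chunk_image_doc_id("kb1-other", "u1", make_deps(True)) is None
```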

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around lines 1708-1712: The image response builds a Response and sets Content-Type directly, but it should reuse apply_safe_file_response_headers to enforce safe headers (preventing active types such as SVG from being served). Replace the direct response.headers.set(...) call and the return with a call to apply_safe_file_response_headers(response, filename=doc.thumbnail) (or the helper's actual signature) after make_response. Alternatively, call the helper first and then set any fallback Content-Type from CONTENT_TYPE_MAP if needed. Keep awaiting thread_pool_exec(STORAGE_IMPL.get, doc.kb_id, doc.thumbnail), and call make_response(response_body) before apply_safe_file_response_headers, so that the thumbnail's content type is derived by the helper instead of being set manually.
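The PR does not show the signature of `apply_safe_file_response_headers`. The function below is a hypothetical `safe_image_headers(filename)` that illustrates the policy the comment asks for. It serves only passive raster types inline, forces everything else (including SVG, which can carry script) to download, and always sends `X-Content-Type-Options: nosniff`. The allowlist contents are an assumption.

```python
import os
from typing import Dict

# Hypothetical allowlist: only passive raster formats are served inline.
SAFE_INLINE_TYPES: Dict[str, str] = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def safe_image_headers(filename: str) -> Dict[str, str]:
    ext = os.path.splitext(filename)[1].lower()
    # Stop browsers from second-guessing the declared type.
    headers = {"X-Content-Type-Options": "nosniff"}
    content_type = SAFE_INLINE_TYPES.get(ext)
    if content_type:
        headers["Content-Type"] = content_type
    else:
        # Unknown or active types (e.g. .svg, .html) are downloaded, not rendered.
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = "attachment"
    return headers
```

Routing both the thumbnail and chunk-image responses through one helper like this keeps the header policy in a single place. With a Quart-style response object, applying it could look like `response.headers.update(safe_image_headers(doc.thumbnail))`.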

In `@test/testcases/test_web_api/test_document_app/conftest.py`:
- Line 243: Replace the no-op stub with one that returns the same tuple shape as
the real helper: implement api_utils_mod.check_duplicate_ids as a small stub
that accepts *args, **kwargs and returns (unique_ids, duplicate_messages) — e.g.
return (args[0] if args else kwargs.get('ids'), []) so callers unpacking
(unique_ids, duplicate_messages) get valid values; apply the same change to the
other fixture instance.

---

Nitpick comments:
In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`:
- Around line 534-543: Add a focused unit test that directly exercises
_get_accessible_chunk_image_doc_id rather than stubbing it in the route test:
call module._get_accessible_chunk_image_doc_id with image IDs representing (a) a
chunk image that is accessible (mock the authorization/index/tenant lookup
functions it uses to return a valid doc id) and assert it returns the expected
"doc-..." id, and (b) a chunk image that is not accessible (mock the lookups to
deny access) and assert it returns None; use monkeypatch to replace the specific
helper functions or data sources the helper calls (not the helper itself) so the
tenant/index resolution logic runs, and keep assertions on returned values only.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bb6daffb-5261-4c30-b498-d5836a8fb6d6

📥 Commits

Reviewing files that changed from the base of the PR and between 46897d6 and b42a0e4.

📒 Files selected for processing (4)

api/apps/restful_apis/document_api.py
test/testcases/test_web_api/test_common.py
test/testcases/test_web_api/test_document_app/conftest.py
test/testcases/test_web_api/test_document_app/test_document_metadata.py

Copilot

Pull request overview

This PR addresses an authorization bypass in document image serving by moving document thumbnails to an authenticated, document-scoped endpoint and restricting the legacy image route to authorized access patterns.

Changes:

Added authenticated GET /api/v1/documents/<doc_id>/thumbnail and updated thumbnail URL generation to use it.
Secured the legacy GET /api/v1/documents/images/<image_id> route with authentication plus doc-store-backed authorization checks.
Added/updated web API and unit tests to cover invalid-auth and new thumbnail/image behaviors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
`api/apps/restful_apis/document_api.py`	Adds authenticated thumbnail endpoint, tightens legacy image access, and updates thumbnail URL generation.
`test/testcases/test_web_api/test_common.py`	Adds a client helper for the new document thumbnail endpoint.
`test/testcases/test_web_api/test_document_app/test_document_metadata.py`	Adds invalid-auth coverage for thumbnails and new unit tests for the new/secured handlers.
`test/testcases/test_web_api/test_document_app/conftest.py`	Extends module stubs to support the new imports/authorization logic in unit tests.

Comments suppressed due to low confidence (2)

api/apps/restful_apis/document_api.py:1236

list_thumbnails now requires authentication, but the OpenAPI docstring no longer includes the security section like other authenticated endpoints in this module. This can cause generated API docs/clients to treat it as public. Please update the docstring to reflect that it requires ApiKeyAuth/auth.

@manager.route("/thumbnails", methods=["GET"])  # noqa: F821
@login_required
def list_thumbnails():
    """
    Get thumbnails for documents.
    ---
    tags:
      - Documents
    parameters:
      - in: query
        name: doc_ids
        type: array
        required: true
        description: List of document IDs to get thumbnails for.
    responses:
      200:
        description: Successfully retrieved thumbnails

api/apps/restful_apis/document_api.py:1733

get_document_image is now @login_required, but its OpenAPI docstring does not declare any security requirement. Please add the security stanza (consistent with other routes) so the API schema/docs match runtime behavior.

@manager.route("/documents/images/<image_id>", methods=["GET"])  # noqa: F821
@login_required
async def get_document_image(image_id):
    """
    Get a document image by ID.
    ---
    tags:
      - Documents
    parameters:
      - name: image_id
        in: path
        required: true
        schema:
          type: string
        description: The image ID (format: bucket-name-image-name)
    responses:
      200:
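Both suppressed Copilot comments point at the same drift: a handler gains `@login_required` but its OpenAPI docstring keeps no `security:` stanza. Below is a hedged sketch of a small audit that would catch this. The `_login_required` marker is hypothetical (the real decorator likely does not set it), and `ApiKeyAuth: []` is assumed from Copilot's wording, not confirmed against the module.

```python
import inspect


def login_required(func):
    # Hypothetical stand-in that only tags the function so the audit can find it.
    func._login_required = True
    return func


def missing_security_stanza(funcs):
    """Return names of auth-protected handlers whose OpenAPI docstring lacks `security:`."""
    missing = []
    for func in funcs:
        if not getattr(func, "_login_required", False):
            continue
        doc = inspect.getdoc(func) or ""
        if not any(line.strip() == "security:" for line in doc.splitlines()):
            missing.append(func.__name__)
    return missing


@login_required
def list_thumbnails():
    """
    Get thumbnails for documents.
    ---
    tags:
      - Documents
    security:
      - ApiKeyAuth: []
    """


@login_required
def get_document_image(image_id):
    """
    Get a document image by ID.
    ---
    tags:
      - Documents
    """
```

Here, `missing_security_stanza([list_thumbnails, get_document_image])` flags only `get_document_image`.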

+        if not DocumentService.accessible(doc_id, current_user.id):
+            logging.warning("get_document_thumbnail: access denied for doc_id=%s user_id=%s", doc_id, current_user.id)
+            return get_data_error_result(message="Document not found!")
+
+        e, doc = DocumentService.get_by_id(doc_id)
+        if not e or not doc.thumbnail or doc.thumbnail.startswith(IMG_BASE64_PREFIX):


+def _get_accessible_chunk_image_doc_id(image_id: str) -> str | None:
+    tenants = UserTenantService.query(user_id=current_user.id)
+    if not tenants:
+        return None
+
+    accessible_kbs, _ = KnowledgebaseService.get_by_tenant_ids(
+        [tenant.tenant_id for tenant in tenants],
+        current_user.id,
+        0,
+        0,
+        "update_time",
+        True,
+        "",
+    )
+
+    kb_ids_by_tenant = {}
+    for kb in accessible_kbs:
+        kb_ids_by_tenant.setdefault(kb["tenant_id"], []).append(kb["id"])
+
+    for tenant_id, kb_ids in kb_ids_by_tenant.items():
+        index_name = search.index_name(tenant_id)
+        for kb_id in kb_ids:
+            if not settings.docStoreConn.index_exist(index_name, kb_id):
+                continue
+
+            result = settings.docStoreConn.search(
+                ["doc_id"],
+                [],
+                {"img_id": image_id},
+                [],
+                OrderByExpr(),
+                0,
+                1,
+                index_name,
+                [kb_id],
+            )
+            fields = settings.docStoreConn.get_fields(result, ["doc_id"])
+            if not fields:
+                continue
+
+            doc_id = next(iter(fields.values())).get("doc_id")
+            if doc_id and DocumentService.accessible(doc_id, current_user.id):
+                return doc_id
+
+    return None


+    common_settings_mod = ModuleType("common.settings")
+    common_settings_mod.STORAGE_IMPL = SimpleNamespace(get=lambda *_args, **_kwargs: b"", obj_exist=lambda *_args, **_kwargs: False)
+    common_settings_mod.docStoreConn = SimpleNamespace(
+        index_exist=lambda *_args, **_kwargs: False,
+        search=lambda *_args, **_kwargs: {},
+        get_fields=lambda *_args, **_kwargs: {},
+    )
+    monkeypatch.setitem(sys.modules, "common.settings", common_settings_mod)
+
+    metadata_utils_mod = ModuleType("common.metadata_utils")
+    metadata_utils_mod.convert_conditions = lambda *_args, **_kwargs: {}
+    metadata_utils_mod.meta_filter = lambda *_args, **_kwargs: True
+    metadata_utils_mod.turn2jsonschema = lambda value: value
+    monkeypatch.setitem(sys.modules, "common.metadata_utils", metadata_utils_mod)
+
+    rag_nlp_mod = ModuleType("rag.nlp")
+    rag_nlp_mod.search = SimpleNamespace(index_name=lambda tenant_id: f"ragflow_{tenant_id}")
+    monkeypatch.setitem(sys.modules, "rag.nlp", rag_nlp_mod)
+


+    def _pass_through_decorator(func):
+        return func
+


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/testcases/test_web_api/test_document_app/conftest.py`:
- Around lines 81-82: The stub _check_duplicate_ids currently returns
deduplicated ids using list(set(ids)) which drops insertion order; change it to
preserve original order by using an ordered-dedup approach (e.g.
list(dict.fromkeys(ids))) so unique_ids maintain the input sequence; keep the
second return value as the empty list and do this replacement inside the
_check_duplicate_ids function.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 12a306fd-ced0-43a3-a322-668cb73a4c88

📥 Commits

Reviewing files that changed from the base of the PR and between b42a0e4 and ba0a0f0.

📒 Files selected for processing (3)

api/apps/restful_apis/document_api.py
test/testcases/test_web_api/test_document_app/conftest.py
test/testcases/test_web_api/test_document_app/test_document_metadata.py

🚧 Files skipped from review as they are similar to previous changes (2)

test/testcases/test_web_api/test_document_app/test_document_metadata.py
api/apps/restful_apis/document_api.py

coderabbitai · 2026-05-11T06:27:23Z

+def _check_duplicate_ids(ids, *_args, **_kwargs):
+    return list(set(ids)), []


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve insertion order in _check_duplicate_ids stub.

list(set(ids)) deduplicates but does not preserve the input order, so any test that asserts on positional ordering of unique_ids (or iterates through them deterministically) may become flaky. The real helper conventionally preserves order while deduplicating; using dict.fromkeys matches that behavior with negligible cost.

♻️ Proposed fix

 def _check_duplicate_ids(ids, *_args, **_kwargs):
-    return list(set(ids)), []
+    return list(dict.fromkeys(ids)), []


fix: secure document thumbnail image access

b42a0e4

Copilot AI review requested due to automatic review settings May 11, 2026 05:30

dosubot Bot added the size:L (this PR changes 100-499 lines, ignoring generated files), 🐞 bug (something isn't working; pull request that fixes a bug), and 🧪 test (pull requests that update test cases) labels May 11, 2026

Copilot started reviewing on behalf of Parvezkhan0 May 11, 2026 05:30

coderabbitai Bot reviewed May 11, 2026


Copilot AI reviewed May 11, 2026


fix: address thumbnail access review feedback

ba0a0f0

coderabbitai Bot reviewed May 11, 2026

Parvezkhan0 wants to merge 2 commits into infiniflow:main from Parvezkhan0:fix/document-thumbnail-auth-14763

Parvezkhan0 commented May 11, 2026 (edited by JinHai-CN)
