Refactor: migrate document thumbnails API#14344
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis pull request refactors thumbnail and image serving by migrating HTTP handlers from the main document app to the RESTful API layer, implementing new endpoints with URL rewriting logic, and introducing comprehensive test coverage for the relocated functionality. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
api/apps/restful_apis/document_api.py (2)
915-916: Redundant local imports.
IMG_BASE64_PREFIXandDocumentServiceare already imported at the top of this module (lines 26 and 29). The function-scoped re-imports are dead weight.♻️ Proposed cleanup
def list_thumbnails(): ... - from api.constants import IMG_BASE64_PREFIX - from api.db.services.document_service import DocumentService - doc_ids = request.args.getlist("doc_ids")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 915 - 916, Remove the redundant function-scoped imports of IMG_BASE64_PREFIX and DocumentService inside the function where they were re-imported; these symbols are already imported at module top, so delete the lines "from api.constants import IMG_BASE64_PREFIX" and "from api.db.services.document_service import DocumentService" in that scope and rely on the module-level imports to avoid duplicate imports and dead weight.
896-931: No logging on the new flow.The error/success paths here have no
logging.*calls, unlike neighboring handlers (upload_document,list_docs,update_metadata_config) which log validation failures and exceptions. Add at minimum an error log for the missing-IDs branch and anlogging.exceptionin the except branch.As per coding guidelines:
**/*.py: Add logging for new flows.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 896 - 931, The list_thumbnails handler lacks logging: import the logging module and add a logging.error when doc_ids is empty in list_thumbnails with context (e.g., include request info or the missing parameter name and any user/kb context), and replace/augment the generic except return by calling logging.exception inside the except block (include doc_ids and any relevant variables) before returning server_error_response; ensure the log messages reference list_thumbnails and DocumentService.get_thumbnails/IMG_BASE64_PREFIX so it's clear which flow failed.test/testcases/test_web_api/test_document_app/test_document_metadata.py (1)
166-184: Thumbnail assertion for a.txtupload is fairly weak.A plain
.txthas no renderable thumbnail, sothumbnail()in the upload path returns"". The endpoint then produces"/v1/document/image/{kb_id}-"(trailing dash + empty value) or an empty string, and the test only asserts key presence. Consider either:
- uploading an image file so you can assert the value starts with
IMG_BASE64_PREFIX, or- at minimum asserting the mapped value is a string (catches
None/type regressions):res = document_thumbnails(WebApiAuth, params={"doc_ids": [doc_id]}) assert res["code"] == 0, res assert doc_id in res["data"], res + assert isinstance(res["data"][doc_id], str), resNot blocking — the happy-path key check is still useful coverage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py` around lines 166 - 184, The test test_thumbnails_with_doc_ids only checks that doc_id is present but doesn't validate the thumbnail value (upload_documents uses thumbnail() which returns "" for .txt), causing weak coverage; update the test to either upload an image file instead of a .txt so you can assert the returned thumbnail string starts with IMG_BASE64_PREFIX (or whatever base64/image prefix constant you use), or at minimum assert that res["data"][doc_id] is an instance of str to catch None/type regressions; reference the existing helpers upload_documents and document_thumbnails and the test name test_thumbnails_with_doc_ids when making the change.test/testcases/test_web_api/test_common.py (1)
432-440: Minor consistency nit: prefer composing the URL withVERSION.The rest of this module uses
f"/api/{VERSION}/..."(or named URL constants) rather than a hardcoded/api/v1/.... For consistency and to keep futureVERSIONbumps painless:♻️ Proposed tweak
- res = requests.get(url=f"{HOST_ADDRESS}/api/v1/thumbnails", params=params, headers=headers, auth=auth, data=data) + res = requests.get(url=f"{HOST_ADDRESS}/api/{VERSION}/thumbnails", params=params, headers=headers, auth=auth, data=data)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_web_api/test_common.py` around lines 432 - 440, In document_thumbnails replace the hardcoded "/api/v1/thumbnails" URL with the module's VERSION usage (e.g., build the URL using VERSION) so it follows the same pattern as other helpers; update the f-string that currently uses f"{HOST_ADDRESS}/api/v1/thumbnails" to use f"{HOST_ADDRESS}/api/{VERSION}/thumbnails" (or the shared named URL constant if present) so future VERSION bumps remain consistent with the rest of the module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 896-931: Add authentication, tenant scoping, and logging to the
list_thumbnails handler: decorate list_thumbnails with `@login_required` and
`@add_tenant_id_to_kwargs`, accept tenant_id from kwargs, log the missing doc_ids
case before returning the ARGUMENT_ERROR, call
DocumentService.get_thumbnails(doc_ids) as before but then filter the returned
docs by keeping only those where
KnowledgebaseService.accessible(kb_id=d["kb_id"], user_id=tenant_id) is True
(import KnowledgebaseService), and add logging in the exception handler before
returning server_error_response(e) so failures are recorded; keep the existing
thumbnail prefix logic and produced response shape.
---
Nitpick comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 915-916: Remove the redundant function-scoped imports of
IMG_BASE64_PREFIX and DocumentService inside the function where they were
re-imported; these symbols are already imported at module top, so delete the
lines "from api.constants import IMG_BASE64_PREFIX" and "from
api.db.services.document_service import DocumentService" in that scope and rely
on the module-level imports to avoid duplicate imports and dead weight.
- Around line 896-931: The list_thumbnails handler lacks logging: import the
logging module and add a logging.error when doc_ids is empty in list_thumbnails
with context (e.g., include request info or the missing parameter name and any
user/kb context), and replace/augment the generic except return by calling
logging.exception inside the except block (include doc_ids and any relevant
variables) before returning server_error_response; ensure the log messages
reference list_thumbnails and DocumentService.get_thumbnails/IMG_BASE64_PREFIX
so it's clear which flow failed.
In `@test/testcases/test_web_api/test_common.py`:
- Around line 432-440: In document_thumbnails replace the hardcoded
"/api/v1/thumbnails" URL with the module's VERSION usage (e.g., build the URL
using VERSION) so it follows the same pattern as other helpers; update the
f-string that currently uses f"{HOST_ADDRESS}/api/v1/thumbnails" to use
f"{HOST_ADDRESS}/api/{VERSION}/thumbnails" (or the shared named URL constant if
present) so future VERSION bumps remain consistent with the rest of the module.
In `@test/testcases/test_web_api/test_document_app/test_document_metadata.py`:
- Around line 166-184: The test test_thumbnails_with_doc_ids only checks that
doc_id is present but doesn't validate the thumbnail value (upload_documents
uses thumbnail() which returns "" for .txt), causing weak coverage; update the
test to either upload an image file instead of a .txt so you can assert the
returned thumbnail string starts with IMG_BASE64_PREFIX (or whatever
base64/image prefix constant you use), or at minimum assert that
res["data"][doc_id] is an instance of str to catch None/type regressions;
reference the existing helpers upload_documents and document_thumbnails and the
test name test_thumbnails_with_doc_ids when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0df7d55e-0ea7-4207-91e1-ded64c9c5e90
📒 Files selected for processing (5)
api/apps/document_app.pyapi/apps/restful_apis/document_api.pytest/testcases/test_web_api/test_common.pytest/testcases/test_web_api/test_document_app/test_document_metadata.pyweb/src/utils/api.ts
| @manager.route("/thumbnails", methods=["GET"]) # noqa: F821 | ||
| def list_thumbnails(): | ||
| """ | ||
| Get thumbnails for documents. | ||
| --- | ||
| tags: | ||
| - Documents | ||
| parameters: | ||
| - in: query | ||
| name: doc_ids | ||
| type: array | ||
| required: true | ||
| description: List of document IDs to get thumbnails for. | ||
| responses: | ||
| 200: | ||
| description: Successfully retrieved thumbnails | ||
| 400: | ||
| description: Missing document IDs | ||
| """ | ||
| from api.constants import IMG_BASE64_PREFIX | ||
| from api.db.services.document_service import DocumentService | ||
|
|
||
| doc_ids = request.args.getlist("doc_ids") | ||
| if not doc_ids: | ||
| return get_json_result(data=False, message='Lack of "Document ID"', code=RetCode.ARGUMENT_ERROR) | ||
|
|
||
| try: | ||
| docs = DocumentService.get_thumbnails(doc_ids) | ||
|
|
||
| for doc_item in docs: | ||
| if doc_item["thumbnail"] and not doc_item["thumbnail"].startswith(IMG_BASE64_PREFIX): | ||
| doc_item["thumbnail"] = f"/v1/document/image/{doc_item['kb_id']}-{doc_item['thumbnail']}" | ||
|
|
||
| return get_json_result(data={d["id"]: d["thumbnail"] for d in docs}) | ||
| except Exception as e: | ||
| return server_error_response(e) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect the handler as it existed just before this PR's changes on the target branch.
git fetch origin main --depth=1 >/dev/null 2>&1 || true
git show origin/main:api/apps/document_app.py 2>/dev/null \
| awk '/@manager\.route\("\/thumbnails"/{flag=1} flag{print; if(/^def |^async def /){c++}; if(c>=1 && /^$/){exit}}'Repository: infiniflow/ragflow
Length of output: 331
Add authentication and authorization to the /thumbnails endpoint.
The list_thumbnails handler is missing @login_required and tenant/KB scoping, unlike all other handlers in this module:
- Unauthenticated public access. Any caller can retrieve thumbnail data and leak document
kb_idvalues. While the old handler was also undecorated, this endpoint must enforce authentication. - Cross-tenant enumeration. No permission check scopes
DocumentService.get_thumbnails(doc_ids)to the caller's tenant, allowing an authenticated user to enumerate documents across other tenants via direct-object-reference. - Missing logging. No logging for invalid arguments or exceptions, violating the coding guideline to add logging for new flows.
Add @login_required, @add_tenant_id_to_kwargs, and filter results by calling KnowledgebaseService.accessible(kb_id=d["kb_id"], user_id=tenant_id) before returning. Also add logging for the missing doc_ids case and exception handler. Sync the implementation with pattern used in list_docs and other authenticated endpoints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 896 - 931, Add
authentication, tenant scoping, and logging to the list_thumbnails handler:
decorate list_thumbnails with `@login_required` and `@add_tenant_id_to_kwargs`,
accept tenant_id from kwargs, log the missing doc_ids case before returning the
ARGUMENT_ERROR, call DocumentService.get_thumbnails(doc_ids) as before but then
filter the returned docs by keeping only those where
KnowledgebaseService.accessible(kb_id=d["kb_id"], user_id=tenant_id) is True
(import KnowledgebaseService), and add logging in the exception handler before
returning server_error_response(e) so failures are recorded; keep the existing
thumbnail prefix logic and produced response shape.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #14344 +/- ##
=======================================
Coverage 95.30% 95.30%
=======================================
Files 10 10
Lines 703 703
Branches 112 112
=======================================
Hits 670 670
Misses 16 16
Partials 17 17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
1078e39 to
42608a6
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
api/apps/restful_apis/document_api.py (1)
982-1017:⚠️ Potential issue | 🟠 MajorAuth, tenant scoping, and logging still missing on
list_thumbnails.This handler still has no
@login_required/@add_tenant_id_to_kwargs, no permission check against the caller's tenant before returning thumbnails, and no logging on the argument-error or exception paths (the latter violates the repo's "Add logging for new flows" guideline). As-is, an unauthenticated caller can fetch thumbnails for arbitrarydoc_idsand enumeratekb_idvalues across tenants via IDOR. Please align this with the patterns used inlist_docs/ other endpoints in this module: decorate with@login_required+@add_tenant_id_to_kwargs, and filter results byKnowledgebaseService.accessible(kb_id=d["kb_id"], user_id=tenant_id)before returning.Also note that
IMG_BASE64_PREFIX(line 1001) andDocumentService(line 1002) are already imported at the top of the file (lines 26 and 30), so the local re-imports can be removed.As per coding guidelines: "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 982 - 1017, Add auth, tenant scoping, permission checks, logging, and remove duplicate imports in list_thumbnails: decorate the handler with `@login_required` and `@add_tenant_id_to_kwargs`, remove the local imports of IMG_BASE64_PREFIX and DocumentService, and call DocumentService.get_thumbnails(doc_ids) using the injected tenant_id; after fetching docs filter them with KnowledgebaseService.accessible(kb_id=d["kb_id"], user_id=tenant_id) (only include accessible docs), keep the existing base64-prefix logic using the top-of-file IMG_BASE64_PREFIX, and on argument errors and in the except block emit logs (use the module logger) before returning get_json_result(...) or server_error_response(e). Ensure you reference list_thumbnails, KnowledgebaseService.accessible, DocumentService.get_thumbnails, `@login_required`, and `@add_tenant_id_to_kwargs` in the changes.
🧹 Nitpick comments (1)
api/apps/restful_apis/document_api.py (1)
1386-1390: Drop the redundant in-function imports.
settings,thread_pool_exec,get_data_error_result, andserver_error_responseare all already imported at the top of this module (lines 35–36, 40, 43), so re-importing them inside the handler just adds noise and makes the call path slightly slower on each request. Onlymake_responseactually needs a local import here (top-levelfrom quart import requestdoesn't pull it in) — and even that could be promoted to the module-levelquartimport for consistency with the rest of the file.♻️ Proposed cleanup
- try: - from quart import make_response - - from common import settings - from common.misc_utils import thread_pool_exec - from api.utils.api_utils import get_data_error_result, server_error_response - - arr = image_id.split("-") + try: + from quart import make_response + + arr = image_id.split("-")Or move
make_responseup next tofrom quart import requestat line 19.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 1386 - 1390, Remove the redundant in-function imports of settings, thread_pool_exec, get_data_error_result, and server_error_response inside the handler; these are already imported at module level, so delete those local import statements and only keep make_response if needed, or better yet move make_response up to the module-level import alongside the existing "from quart import request" to keep imports consistent; update the handler to use the already-imported symbols (settings, thread_pool_exec, get_data_error_result, server_error_response) and ensure only make_response is imported once at module scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1362-1401: The get_document_image handler is unauthenticated and
unlogged; protect it by adding the authentication/authorization check (e.g.,
apply `@login_required` and verify the caller has access to the target tenant/KB
before calling settings.STORAGE_IMPL.get) in get_document_image, and log
failures: when image_id parsing fails or get_data_error_result is returned, log
the bad image_id and reason; in the exception handler log the image_id and the
caught exception (stack/trace) before returning server_error_response(e). Ensure
imports for the auth decorator/current_user/authorization helper and the module
logger (or use process_logger) are added and reference get_document_image,
settings.STORAGE_IMPL.get, get_data_error_result, and server_error_response so
the checks and logs are co-located with the storage fetch.
---
Duplicate comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 982-1017: Add auth, tenant scoping, permission checks, logging,
and remove duplicate imports in list_thumbnails: decorate the handler with
`@login_required` and `@add_tenant_id_to_kwargs`, remove the local imports of
IMG_BASE64_PREFIX and DocumentService, and call
DocumentService.get_thumbnails(doc_ids) using the injected tenant_id; after
fetching docs filter them with KnowledgebaseService.accessible(kb_id=d["kb_id"],
user_id=tenant_id) (only include accessible docs), keep the existing
base64-prefix logic using the top-of-file IMG_BASE64_PREFIX, and on argument
errors and in the except block emit logs (use the module logger) before
returning get_json_result(...) or server_error_response(e). Ensure you reference
list_thumbnails, KnowledgebaseService.accessible,
DocumentService.get_thumbnails, `@login_required`, and `@add_tenant_id_to_kwargs` in
the changes.
---
Nitpick comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1386-1390: Remove the redundant in-function imports of settings,
thread_pool_exec, get_data_error_result, and server_error_response inside the
handler; these are already imported at module level, so delete those local
import statements and only keep make_response if needed, or better yet move
make_response up to the module-level import alongside the existing "from quart
import request" to keep imports consistent; update the handler to use the
already-imported symbols (settings, thread_pool_exec, get_data_error_result,
server_error_response) and ensure only make_response is imported once at module
scope.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 97420aee-1998-4a51-ba4e-2442001d622d
📒 Files selected for processing (6)
api/apps/document_app.pyapi/apps/restful_apis/document_api.pytest/testcases/test_web_api/test_common.pytest/testcases/test_web_api/test_document_app/test_document_metadata.pyweb/src/locales/zh.tsweb/src/utils/api.ts
✅ Files skipped from review due to trivial changes (2)
- web/src/utils/api.ts
- web/src/locales/zh.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- test/testcases/test_web_api/test_common.py
- test/testcases/test_web_api/test_document_app/test_document_metadata.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
api/apps/restful_apis/document_api.py (2)
1362-1395:⚠️ Potential issue | 🟠 Major
get_document_imagestill has no authentication and no logging.Already raised previously and not yet addressed. Without
@login_requiredand a KB-access check onbkt(which is akb_id), any caller who can guess or read a thumbnail URL can pull bytes directly out ofsettings.STORAGE_IMPLacross tenants (IDOR). Both theget_data_error_result(...)branch and theexcept Exceptionarm should also log the offendingimage_idand the failure cause; right now failures are silent.As per coding guidelines: "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 1362 - 1395, The endpoint get_document_image is missing authentication and tenant/Kb access checks and lacks logging; add the `@login_required` decorator to the get_document_image route, extract the bucket part (kb_id) from image_id and call the project's KB access check helper (e.g., check_kb_access(kb_id) or ensure_user_has_access_to_kb(kb_id)) before calling settings.STORAGE_IMPL.get via thread_pool_exec, and if the image_id format is invalid or access is denied return the appropriate get_data_error_result/403 while logging the offending image_id; also wrap exceptions to log image_id and the caught exception before returning server_error_response(e) so all failure paths log both image_id and error context.
982-1017:⚠️ Potential issue | 🔴 Critical
list_thumbnailsstill lacks authentication, tenant scoping, and logging.Already raised in a previous review and not addressed: handler is undecorated (no
@login_required/@add_tenant_id_to_kwargs),DocumentService.get_thumbnails(doc_ids)is not scoped to the caller's tenant/KB (cross-tenant enumeration), and there is no logging on the missing-doc_idspath or theexceptbranch — violating the repo guideline to add logging for new flows.As per coding guidelines: "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 982 - 1017, The list_thumbnails handler lacks authentication/tenant scoping and logging: add the appropriate decorators (e.g., `@login_required` and `@add_tenant_id_to_kwargs`) to list_thumbnails, extract the tenant/kb id from the handler kwargs and pass it into DocumentService.get_thumbnails (e.g., call DocumentService.get_thumbnails(doc_ids, tenant_id=tenant_id) or the KB-specific parameter expected by that method) to prevent cross-tenant enumeration, and add logging statements for the missing doc_ids branch and inside the except branch (include context like doc_ids and tenant_id); preserve the existing thumbnail prefix logic that uses IMG_BASE64_PREFIX and return behavior.
🧹 Nitpick comments (1)
api/apps/restful_apis/document_api.py (1)
1386-1389: Reusearrinstead of splitting twice.
image_id.split("-")is called on line 1386 and again on line 1389; the second call is redundant and risks drifting from the validation above. Also, sinceimage_idisf"{kb_id}-{thumbnail}", this only works becausekb_idis currently a hex UUID with no-. Worth a brief comment orrsplit("-", 1)to make the contract explicit and tolerate future ID format changes.♻️ Proposed tweak
- arr = image_id.split("-") - if len(arr) != 2: + arr = image_id.split("-", 1) + if len(arr) != 2: return get_data_error_result(message="Image not found.") - bkt, nm = image_id.split("-") + bkt, nm = arr🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 1386 - 1389, Replace the redundant second split by reusing the earlier result: after computing arr = image_id.split("-") and validating len(arr) == 2, assign bkt, nm = arr (or use arr[0], arr[1]) instead of calling image_id.split("-") again; alternatively use image_id.rsplit("-", 1) once and destructure into bkt, nm to make the contract explicit for future ID formats, and keep the existing get_data_error_result(message="Image not found.") validation intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1011-1015: The thumbnail URLs returned by list_thumbnails are
built with the wrong path ("/api/v1/documents/image/...") and thus 404 because
the handler get_document_image is mounted at "/documents/images/<image_id>";
update the URL rewrite inside the loop in list_thumbnails (the block iterating
docs where you set doc_item["thumbnail"]) to use the plural "images" path (e.g.,
"/api/v1/documents/images/{kb_id}-{thumbnail}") so it matches get_document_image
(also ensure consistency with list_docs which already uses "images").
---
Duplicate comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1362-1395: The endpoint get_document_image is missing
authentication and tenant/Kb access checks and lacks logging; add the
`@login_required` decorator to the get_document_image route, extract the bucket
part (kb_id) from image_id and call the project's KB access check helper (e.g.,
check_kb_access(kb_id) or ensure_user_has_access_to_kb(kb_id)) before calling
settings.STORAGE_IMPL.get via thread_pool_exec, and if the image_id format is
invalid or access is denied return the appropriate get_data_error_result/403
while logging the offending image_id; also wrap exceptions to log image_id and
the caught exception before returning server_error_response(e) so all failure
paths log both image_id and error context.
- Around line 982-1017: The list_thumbnails handler lacks authentication/tenant
scoping and logging: add the appropriate decorators (e.g., `@login_required` and
`@add_tenant_id_to_kwargs`) to list_thumbnails, extract the tenant/kb id from the
handler kwargs and pass it into DocumentService.get_thumbnails (e.g., call
DocumentService.get_thumbnails(doc_ids, tenant_id=tenant_id) or the KB-specific
parameter expected by that method) to prevent cross-tenant enumeration, and add
logging statements for the missing doc_ids branch and inside the except branch
(include context like doc_ids and tenant_id); preserve the existing thumbnail
prefix logic that uses IMG_BASE64_PREFIX and return behavior.
---
Nitpick comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1386-1389: Replace the redundant second split by reusing the
earlier result: after computing arr = image_id.split("-") and validating
len(arr) == 2, assign bkt, nm = arr (or use arr[0], arr[1]) instead of calling
image_id.split("-") again; alternatively use image_id.rsplit("-", 1) once and
destructure into bkt, nm to make the contract explicit for future ID formats,
and keep the existing get_data_error_result(message="Image not found.")
validation intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 94db55f8-6859-45dc-8994-f512056dc652
📒 Files selected for processing (2)
api/apps/document_app.pyapi/apps/restful_apis/document_api.py
🚧 Files skipped from review as they are similar to previous changes (1)
- api/apps/document_app.py
There was a problem hiding this comment.
♻️ Duplicate comments (3)
api/apps/restful_apis/document_api.py (3)
1133-1135:⚠️ Potential issue | 🔴 CriticalURL still uses singular
imagewhile the handler is mounted at pluralimages— thumbnails will 404.Line 1135 rewrites thumbnails to
/api/v1/documents/image/..., butget_document_imageat line 1484 is registered as/documents/images/<image_id>.list_docson line 655 already correctly uses the plural form. Every URL returned by this endpoint will 404, defeating the whole purpose of the migration.🔧 Proposed fix
for doc_item in docs: if doc_item["thumbnail"] and not doc_item["thumbnail"].startswith(IMG_BASE64_PREFIX): - doc_item["thumbnail"] = f"/api/v1/documents/image/{doc_item['kb_id']}-{doc_item['thumbnail']}" + doc_item["thumbnail"] = f"/api/v1/documents/images/{doc_item['kb_id']}-{doc_item['thumbnail']}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 1133 - 1135, The URL rewrite for thumbnails uses the wrong singular path and will 404; update the loop that rewrites doc_item["thumbnail"] to use the plural route segment so it matches the registered handler (change "/api/v1/documents/image/..." to "/api/v1/documents/images/..."); ensure you only rewrite when thumbnail is not base64 (retain the IMG_BASE64_PREFIX check) and keep the same id construction using doc_item["kb_id"] and doc_item["thumbnail"] so the result matches what get_document_image expects (and aligns with list_docs behavior).
1484-1517:⚠️ Potential issue | 🟠 Major
get_document_imageis still unauthenticated, unlogged, and double-splitsimage_id.This was previously flagged and marked as resolved in commit 6f3bbba, but the handler in the current state still has no
@login_required, no tenant/KB authorization, and no logging on the bad-input or exception paths. Becauseimage_idis{kb_id}-{filename}, any caller who can read a thumbnail URL (now emitted unauthenticated bylist_thumbnails) can pull bytes directly out ofsettings.STORAGE_IMPLfor arbitrary buckets/objects — same IDOR risk as before.Additionally, line 1511 calls
image_id.split("-")a second time afterarrwas already produced on line 1508; just unpackarr.🔒 Suggested change
-@manager.route("/documents/images/<image_id>", methods=["GET"]) # noqa: F821 -async def get_document_image(image_id): +@manager.route("/documents/images/<image_id>", methods=["GET"]) # noqa: F821 +@login_required +@add_tenant_id_to_kwargs +async def get_document_image(tenant_id, image_id): ... try: arr = image_id.split("-") if len(arr) != 2: + logging.error(f"Invalid image_id format: {image_id}") return get_data_error_result(message="Image not found.") - bkt, nm = image_id.split("-") + bkt, nm = arr + if not KnowledgebaseService.accessible(kb_id=bkt, user_id=tenant_id): + return get_error_data_result(message="No authorization.", code=RetCode.AUTHENTICATION_ERROR) data = await thread_pool_exec(settings.STORAGE_IMPL.get, bkt, nm) response = await make_response(data) response.headers.set("Content-Type", "image/JPEG") return response except Exception as e: + logging.exception(f"get_document_image failed for {image_id}") return server_error_response(e)As per coding guidelines: "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 1484 - 1517, The get_document_image handler is unauthenticated, lacks tenant/KB authorization, logs no failures, and double-splits image_id; fix it by adding the same `@login_required` decorator and the tenant/KB authorization check used by list_thumbnails (reject if caller not authorized for the kb), replace the second split by unpacking arr into bkt,nm (arr = image_id.split("-"); if len(arr)!=2 -> log warning and return get_data_error_result), add logging for the bad-input path and for exceptions (use the module logger to log image_id and exception details before returning server_error_response), and keep using thread_pool_exec(settings.STORAGE_IMPL.get, bkt, nm) and make_response as before.
1104-1139:⚠️ Potential issue | 🔴 Critical
list_thumbnailsis still unauthenticated and unlogged — gate access and add logging.The handler still has no
@login_required, no@add_tenant_id_to_kwargs, no tenant scoping onDocumentService.get_thumbnails(doc_ids), and no logging on the bad-input or exception paths. As-is, any caller can enumerate thumbnails (and leakkb_idvalues) for arbitrarydoc_idsacross tenants by direct-object-reference, which is a regression on the rest of this module's auth posture and is the same defect previously raised on this segment.Per repo guidelines: "Add logging for new flows".
🔒 Suggested change
-@manager.route("/thumbnails", methods=["GET"]) # noqa: F821 -def list_thumbnails(): +@manager.route("/thumbnails", methods=["GET"]) # noqa: F821 +@login_required +@add_tenant_id_to_kwargs +def list_thumbnails(tenant_id): ... - from api.constants import IMG_BASE64_PREFIX - from api.db.services.document_service import DocumentService - doc_ids = request.args.getlist("doc_ids") if not doc_ids: + logging.warning("list_thumbnails called without doc_ids") return get_json_result(data=False, message='Lack of "Document ID"', code=RetCode.ARGUMENT_ERROR) try: docs = DocumentService.get_thumbnails(doc_ids) - - for doc_item in docs: + docs = [d for d in docs if KnowledgebaseService.accessible(kb_id=d["kb_id"], user_id=tenant_id)] + for doc_item in docs: if doc_item["thumbnail"] and not doc_item["thumbnail"].startswith(IMG_BASE64_PREFIX): doc_item["thumbnail"] = f"/api/v1/documents/images/{doc_item['kb_id']}-{doc_item['thumbnail']}" return get_json_result(data={d["id"]: d["thumbnail"] for d in docs}) except Exception as e: + logging.exception("list_thumbnails failed") return server_error_response(e)Note: the local re-imports of
IMG_BASE64_PREFIXandDocumentServiceon lines 1123–1124 are redundant — they're already imported at the top of the file.As per coding guidelines: "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 1104 - 1139, The list_thumbnails handler is missing authentication, tenant scoping, and logging; add the `@login_required` and `@add_tenant_id_to_kwargs` decorators to list_thumbnails, ensure you call DocumentService.get_thumbnails with the tenant context (e.g. pass tenant_id from kwargs or call a tenant-scoped method) instead of the current unauthenticated call, add logging calls (use the module logger) for the missing doc_ids branch and inside the exception handler before returning server_error_response, and remove the redundant local imports of IMG_BASE64_PREFIX and DocumentService since they exist at the top of the file; keep existing return shapes (get_json_result/RetCode) and reference the same symbols (list_thumbnails, DocumentService.get_thumbnails, IMG_BASE64_PREFIX, server_error_response) when implementing these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 1133-1135: The URL rewrite for thumbnails uses the wrong singular
path and will 404; update the loop that rewrites doc_item["thumbnail"] to use
the plural route segment so it matches the registered handler (change
"/api/v1/documents/image/..." to "/api/v1/documents/images/..."); ensure you
only rewrite when thumbnail is not base64 (retain the IMG_BASE64_PREFIX check)
and keep the same id construction using doc_item["kb_id"] and
doc_item["thumbnail"] so the result matches what get_document_image expects (and
aligns with list_docs behavior).
- Around line 1484-1517: The get_document_image handler is unauthenticated,
lacks tenant/KB authorization, logs no failures, and double-splits image_id; fix
it by adding the same `@login_required` decorator and the tenant/KB authorization
check used by list_thumbnails (reject if caller not authorized for the kb),
replace the second split by unpacking arr into bkt,nm (arr =
image_id.split("-"); if len(arr)!=2 -> log warning and return
get_data_error_result), add logging for the bad-input path and for exceptions
(use the module logger to log image_id and exception details before returning
server_error_response), and keep using
thread_pool_exec(settings.STORAGE_IMPL.get, bkt, nm) and make_response as
before.
- Around line 1104-1139: The list_thumbnails handler is missing authentication,
tenant scoping, and logging; add the `@login_required` and
`@add_tenant_id_to_kwargs` decorators to list_thumbnails, ensure you call
DocumentService.get_thumbnails with the tenant context (e.g. pass tenant_id from
kwargs or call a tenant-scoped method) instead of the current unauthenticated
call, add logging calls (use the module logger) for the missing doc_ids branch
and inside the exception handler before returning server_error_response, and
remove the redundant local imports of IMG_BASE64_PREFIX and DocumentService
since they exist at the top of the file; keep existing return shapes
(get_json_result/RetCode) and reference the same symbols (list_thumbnails,
DocumentService.get_thumbnails, IMG_BASE64_PREFIX, server_error_response) when
implementing these changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f4527cc9-97de-4c78-b758-ab6d06b058c9
📒 Files selected for processing (4)
api/apps/document_app.pyapi/apps/restful_apis/document_api.pytest/testcases/test_web_api/test_common.pyweb/src/utils/api.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- test/testcases/test_web_api/test_common.py
- api/apps/document_app.py
What problem does this PR solve?
Before migration: GET /v1/document/thumbnails
After migration: GET /api/v1/thumbnails
Type of change