Refa: unify document create flows under REST documents API #14345
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough
Consolidates document creation into a dataset-scoped REST endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant API as REST API\n/datasets/{id}/documents
    participant Validator as Validation & Storage
    participant DB as Document DB
    rect rgba(100, 200, 100, 0.5)
        note over Client,DB: Mode = "local" (multipart file upload)
        Client->>API: POST ?type=local (multipart file)
        API->>Validator: Validate file, check duplicates
        Validator->>Validator: Infer FileType, choose storage key
        Validator->>Validator: Persist blob, generate thumbnail
        API->>DB: Insert document record
        API->>Client: Return mapped document
    end
    rect rgba(100, 150, 200, 0.5)
        note over Client,DB: Mode = "web" (HTML page crawl)
        Client->>API: POST ?type=web (form: name, url)
        API->>Validator: Validate name & URL, check duplicates
        Validator->>Validator: Download HTML, convert to PDF blob
        Validator->>Validator: Persist PDF blob, generate thumbnail
        API->>DB: Insert document record
        API->>Client: Return mapped document
    end
    rect rgba(200, 150, 100, 0.5)
        note over Client,DB: Mode = "empty" (virtual document)
        Client->>API: POST ?type=empty (JSON: name)
        API->>Validator: Validate name, check duplicates
        API->>DB: Create virtual document record (FileType.VIRTUAL, empty location)
        API->>Client: Return virtual document
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
web/src/hooks/use-document-request.ts (1)
520-544: ⚠️ Potential issue | 🟠 Major: Refresh the document list after a successful web crawl.
This mutation now creates a document through the unified API, but unlike `useCreateDocument()` it only shows a toast. The new document won't become visible until a manual refresh, and if the user is not on page 1 it should also reset pagination so the freshly created row can appear.
Suggested fix
```diff
 export const useNextWebCrawl = () => {
   const { knowledgeId } = useGetKnowledgeSearchParams();
+  const { setPaginationParams, page } = useSetPaginationParams();
+  const queryClient = useQueryClient();
   const {
     data,
     isPending: loading,
     mutateAsync,
@@
       const ret = await webCrawlDocument(knowledgeId, formData);
       const code = get(ret, 'code');
       if (code === 0) {
+        if (page === 1) {
+          queryClient.invalidateQueries({
+            queryKey: [DocumentApiAction.FetchDocumentList],
+          });
+        } else {
+          setPaginationParams();
+        }
         message.success(i18n.t('message.uploaded'));
       }
       return code;
     },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/src/hooks/use-document-request.ts` around lines 520 - 544, The web crawl mutation in useNextWebCrawl (mutationFn using webCrawlDocument) currently only shows a toast and returns a code, so newly created documents don't appear until a manual refresh; update the success path to trigger the same post-create behavior as useCreateDocument: after detecting code === 0, call the documents list refresh and reset pagination to page 1 (e.g., invoke the documents-list hook's mutate/refresh method or the same refresh handler used by useCreateDocument and set currentPage = 1) so the new row becomes visible immediately; ensure you reference useNextWebCrawl, webCrawlDocument, and the documents list refresh/mutate function from the documents hook when implementing this.
docs/references/http_api_reference.md (1)
1376-1439: ⚠️ Potential issue | 🟠 Major: Document the non-list response shapes for `type=web` and `type=empty`.
This section adds two new creation modes, but the only success shape documented below is still the `local` upload array. The backend returns a single document object for both `type=web` and `type=empty`, so API consumers following this page will parse the response incorrectly for those modes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/references/http_api_reference.md` around lines 1376 - 1439, The docs only show the multipart/local upload response (an array) but omit that POST /api/v1/datasets/{dataset_id}/documents with query param `type=web` and `type=empty` returns a single document object; update the "Request" and response examples/parameters to explicitly document the non-list response shape for `type=web` and `type=empty` (single JSON document object), adjust the example curl outputs and the "Request parameters" response description accordingly, and include a brief example JSON response for both `web` and `empty` modes so consumers parse them as an object instead of an array.
🧹 Nitpick comments (4)
api/apps/restful_apis/document_api.py (3)
336-339: Dead code: parser-override regexes can never match in the `web` branch.
In this branch the stored filename is always `f"{name}.pdf"` (line 308) before `duplicate_name` is applied, so the suffixes `\.(ppt|pptx|pages)$` and `\.(eml)$` will never match. These two `re.search` blocks appear to have been copied from the `local` upload path, where arbitrary suffixes can appear. They can be dropped from the web branch.
♻️ Proposed diff
```diff
 if doc["type"] == FileType.VISUAL:
     doc["parser_id"] = ParserType.PICTURE.value
 if doc["type"] == FileType.AURAL:
     doc["parser_id"] = ParserType.AUDIO.value
-if re.search(r"\.(ppt|pptx|pages)$", filename):
-    doc["parser_id"] = ParserType.PRESENTATION.value
-if re.search(r"\.(eml)$", filename):
-    doc["parser_id"] = ParserType.EMAIL.value
```
Similarly, the `VISUAL`/`AURAL` overrides are unreachable for PDF output; you can arguably drop those too. Left conservative here in case `html2pdf` can emit non-PDF in some edge case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 336 - 339, In document_api.py, remove the unreachable parser-override regex checks that set doc["parser_id"] for presentations and emails (the re.search(r"\.(ppt|pptx|pages)$", filename) and re.search(r"\.(eml)$", filename) blocks) from the web upload branch where filename is forced to f"{name}.pdf"; these conditions can never match in the web path so delete those two if-blocks (leave other parser_id/override logic intact).
281-387: Add info-level logging for the new `web` and `empty` flows.
Both new branches only log on error paths (`logging.error`). Per the "`**/*.py`: Add logging for new flows" guideline, consider adding a couple of info logs at key success milestones (e.g., dataset + mode on entry, and a success log before `get_result(...)`) to aid operational debugging of the unified endpoint.
As per coding guidelines: "`**/*.py`: Add logging for new flows".
♻️ Suggested additions
```diff
 if upload_type == "web":
+    logging.info(f"upload_document web-crawl requested: dataset_id={dataset_id}")
     form = await request.form
     ...
-    DocumentService.insert(doc)
+    DocumentService.insert(doc)
+    logging.info(f"Web-crawled document created: dataset_id={dataset_id}, doc_id={doc['id']}, name=(unknown)")
     FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
 if upload_type == "empty":
+    logging.info(f"upload_document empty requested: dataset_id={dataset_id}")
     req = await get_request_json()
     ...
-    FileService.add_file_from_kb(doc.to_dict(), kb_folder["id"], kb.tenant_id)
+    FileService.add_file_from_kb(doc.to_dict(), kb_folder["id"], kb.tenant_id)
+    logging.info(f"Empty document created: dataset_id={dataset_id}, doc_id={doc.id}, name={name}")
 return get_result(data=map_doc_keys(doc))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 281 - 387, Add info-level logs for both the "web" and "empty" upload branches: log dataset id and upload_type on entry (inside the upload_type == "web" and upload_type == "empty" blocks) and log a success message just before the successful get_result returns (before the return that calls get_result in the "web" branch and before the get_result in the "empty" branch). Use the existing logger (e.g., logging.info or the module logger) and include identifying fields like dataset_id, kb.id or current_user.id and the resulting document id/name (from doc or filename) to aid tracing; place logs near the symbols shown (upload_type == "web", blob handling and the return get_result(data=map_doc_keys_with_run_status(...)) in the web flow, and around DocumentService.insert(...) and return get_result(data=map_doc_keys(...)) in the empty flow).
298-305: Consider offloading blocking I/O to thread pool to prevent event loop stalls.
`html2pdf(url)` at line 298 performs a synchronous HTTP fetch via Selenium (`driver.get`) that can block indefinitely on an unresponsive site; the `timeout` parameter (default=2) only applies to DOM staleness after the page loads, not to the HTTP connection itself. The subsequent `FileService.*` calls execute synchronous database queries via the Peewee ORM, further stalling the async handler. Since this is now the unified front door for document creation and will receive higher traffic, run both the PDF conversion and the file service operations on the thread pool, mirroring the pattern already established at line 411 with `await thread_pool_exec(FileService.upload_document, ...)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 298 - 305, The synchronous html2pdf(url) call and subsequent synchronous Peewee FileService calls (FileService.get_root_folder, FileService.init_knowledgebase_docs, FileService.get_kb_folder, FileService.new_a_file_from_kb) must be executed off the event loop using the thread pool; replace the direct calls with await thread_pool_exec(html2pdf, url) for the PDF conversion and await thread_pool_exec(FileService.get_root_folder, current_user.id), await thread_pool_exec(FileService.init_knowledgebase_docs, root_id, current_user.id), await thread_pool_exec(FileService.get_kb_folder, current_user.id), and await thread_pool_exec(FileService.new_a_file_from_kb, kb.tenant_id, kb.name, kb_root_id) (matching the existing pattern used at the other call site) so blocking I/O runs in threads and does not stall the async handler.
test/testcases/test_web_api/test_document_app/conftest.py (1)
133-190: Extract shared fixture setup to avoid ~60 lines of duplication.
The `document_rest_api_module` fixture duplicates almost all of `document_app_module` (stubs for `common`, `deepdoc.parser.*`, `xgboost`, `api.apps`, dummy manager). Only the final `module_path` differs. Consider factoring the stub setup into a helper so future changes (adding a new stub, fixing a path) don't need to be mirrored in two places.
♻️ Proposed refactor sketch
```python
def _install_document_api_stubs(monkeypatch, repo_root):
    common_pkg = ModuleType("common")
    common_pkg.__path__ = [str(repo_root / "common")]
    monkeypatch.setitem(sys.modules, "common", common_pkg)

    deepdoc_pkg = ModuleType("deepdoc")
    deepdoc_parser_pkg = ModuleType("deepdoc.parser")
    deepdoc_parser_pkg.__path__ = []

    class _Stub:
        pass

    deepdoc_parser_pkg.PdfParser = _Stub
    deepdoc_pkg.parser = deepdoc_parser_pkg
    monkeypatch.setitem(sys.modules, "deepdoc", deepdoc_pkg)
    monkeypatch.setitem(sys.modules, "deepdoc.parser", deepdoc_parser_pkg)

    for modname, attr in [
        ("deepdoc.parser.excel_parser", "RAGFlowExcelParser"),
        ("deepdoc.parser.html_parser", "RAGFlowHtmlParser"),
        ("deepdoc.parser.mineru_parser", "MinerUParser"),
        ("deepdoc.parser.paddleocr_parser", "PaddleOCRParser"),
    ]:
        m = ModuleType(modname)
        setattr(m, attr, _Stub)
        monkeypatch.setitem(sys.modules, modname, m)

    monkeypatch.setitem(sys.modules, "xgboost", ModuleType("xgboost"))

    stub_apps = ModuleType("api.apps")
    stub_apps.current_user = SimpleNamespace(id="user-1")
    stub_apps.login_required = lambda func: func
    monkeypatch.setitem(sys.modules, "api.apps", stub_apps)


def _load_module(name, module_path):
    spec = importlib.util.spec_from_file_location(name, module_path)
    module = importlib.util.module_from_spec(spec)
    module.manager = _DummyManager()
    spec.loader.exec_module(module)
    return module


@pytest.fixture()
def document_app_module(monkeypatch):
    repo_root = Path(__file__).resolve().parents[4]
    _install_document_api_stubs(monkeypatch, repo_root)
    return _load_module("test_document_app_unit", repo_root / "api" / "apps" / "document_app.py")


@pytest.fixture()
def document_rest_api_module(monkeypatch):
    repo_root = Path(__file__).resolve().parents[4]
    _install_document_api_stubs(monkeypatch, repo_root)
    return _load_module(
        "test_document_api_unit",
        repo_root / "api" / "apps" / "restful_apis" / "document_api.py",
    )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_web_api/test_document_app/conftest.py` around lines 133 - 190, The document_rest_api_module fixture duplicates almost all setup from document_app_module; extract the shared monkeypatch/stub logic into a helper (e.g., _install_document_api_stubs) that creates the ModuleType stubs (common, deepdoc.parser.*, xgboost, api.apps) and a loader helper (e.g., _load_module) that creates a spec, sets module.manager = _DummyManager and execs the module; then simplify document_app_module and document_rest_api_module to call _install_document_api_stubs(monkeypatch, repo_root) and return _load_module(...) for their respective module paths, keeping the existing symbols ModuleType, _DummyManager, document_app_module and document_rest_api_module names so tests continue to find the same fixtures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/testcases/test_web_api/test_document_app/test_create_document.py`:
- Around line 139-140: The assertion in test_create_document.py currently
expects "Can't find this dataset!" but the API handler (in
api/apps/restful_apis/document_api.py) returns a message formatted as "Can't
find the dataset with ID {dataset_id}!", so update the test in
test_create_document.py (the failing assertions for res["code"] and
res["message"]) to either assert the exact message using the dataset_id used in
the test (match "Can't find the dataset with ID {dataset_id}!") or assert using
a substring check (e.g., verify "Can't find the dataset" in res["message"])
similar to the pattern used in test_upload_documents.py::test_invalid_kb_id.
- Around line 104-127: The tests fail because upload_document calls
KnowledgebaseService.get_by_id and check_kb_team_permission before filename
validation; add the same mocks used in other tests so the handler proceeds to
name checks: in both test_filename_too_long and test_filename_whitespace
monkeypatch module.KnowledgebaseService.get_by_id to return a successful (True,
<kb object>) response and monkeypatch module.check_kb_team_permission to return
True (or the expected success tuple), mirroring the stubs used in
test_duplicate_name/test_success so the filename validation in upload_document
runs and returns code 101.
In `@test/testcases/test_web_api/test_document_app/test_upload_documents.py`:
- Around line 451-460: The test_invalid_url currently only monkeypatches request
so the real KnowledgebaseService.get_by_id and check_kb_team_permission run and
block the test; update the test to also stub KnowledgebaseService.get_by_id
(return a dummy kb object for "kb1") and check_kb_team_permission (return True
or a permitted response) on the module under test before calling
module.upload_document(dataset_id="kb1") so execution reaches the upload_type ==
"web" branch and triggers is_valid_url error handling; reference the test
function name test_invalid_url and the target function upload_document to locate
where to add these monkeypatches.
---
Outside diff comments:
In `@docs/references/http_api_reference.md`:
- Around line 1376-1439: The docs only show the multipart/local upload response
(an array) but omit that POST /api/v1/datasets/{dataset_id}/documents with query
param `type=web` and `type=empty` returns a single document object; update the
"Request" and response examples/parameters to explicitly document the non-list
response shape for `type=web` and `type=empty` (single JSON document object),
adjust the example curl outputs and the "Request parameters" response
description accordingly, and include a brief example JSON response for both
`web` and `empty` modes so consumers parse them as an object instead of an
array.
In `@web/src/hooks/use-document-request.ts`:
- Around line 520-544: The web crawl mutation in useNextWebCrawl (mutationFn
using webCrawlDocument) currently only shows a toast and returns a code, so
newly created documents don't appear until a manual refresh; update the success
path to trigger the same post-create behavior as useCreateDocument: after
detecting code === 0, call the documents list refresh and reset pagination to
page 1 (e.g., invoke the documents-list hook's mutate/refresh method or the same
refresh handler used by useCreateDocument and set currentPage = 1) so the new
row becomes visible immediately; ensure you reference useNextWebCrawl,
webCrawlDocument, and the documents list refresh/mutate function from the
documents hook when implementing this.
---
Nitpick comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 336-339: In document_api.py, remove the unreachable
parser-override regex checks that set doc["parser_id"] for presentations and
emails (the re.search(r"\.(ppt|pptx|pages)$", filename) and
re.search(r"\.(eml)$", filename) blocks) from the web upload branch where
filename is forced to f"{name}.pdf"; these conditions can never match in the web
path so delete those two if-blocks (leave other parser_id/override logic
intact).
- Around line 281-387: Add info-level logs for both the "web" and "empty" upload
branches: log dataset id and upload_type on entry (inside the upload_type ==
"web" and upload_type == "empty" blocks) and log a success message just before
the successful get_result returns (before the return that calls get_result in
the "web" branch and before the get_result in the "empty" branch). Use the
existing logger (e.g., logging.info or the module logger) and include
identifying fields like dataset_id, kb.id or current_user.id and the resulting
document id/name (from doc or filename) to aid tracing; place logs near the
symbols shown (upload_type == "web", blob handling and the return
get_result(data=map_doc_keys_with_run_status(...)) in the web flow, and around
DocumentService.insert(...) and return get_result(data=map_doc_keys(...)) in the
empty flow).
- Around line 298-305: The synchronous html2pdf(url) call and subsequent
synchronous Peewee FileService calls (FileService.get_root_folder,
FileService.init_knowledgebase_docs, FileService.get_kb_folder,
FileService.new_a_file_from_kb) must be executed off the event loop using the
thread pool; replace the direct calls with await thread_pool_exec(html2pdf, url)
for the PDF conversion and await thread_pool_exec(FileService.get_root_folder,
current_user.id), await thread_pool_exec(FileService.init_knowledgebase_docs,
root_id, current_user.id), await thread_pool_exec(FileService.get_kb_folder,
current_user.id), and await thread_pool_exec(FileService.new_a_file_from_kb,
kb.tenant_id, kb.name, kb_root_id) (matching the existing pattern used at the
other call site) so blocking I/O runs in threads and does not stall the async
handler.
In `@test/testcases/test_web_api/test_document_app/conftest.py`:
- Around line 133-190: The document_rest_api_module fixture duplicates almost
all setup from document_app_module; extract the shared monkeypatch/stub logic
into a helper (e.g., _install_document_api_stubs) that creates the ModuleType
stubs (common, deepdoc.parser.*, xgboost, api.apps) and a loader helper (e.g.,
_load_module) that creates a spec, sets module.manager = _DummyManager and execs
the module; then simplify document_app_module and document_rest_api_module to
call _install_document_api_stubs(monkeypatch, repo_root) and return
_load_module(...) for their respective module paths, keeping the existing
symbols ModuleType, _DummyManager, document_app_module and
document_rest_api_module names so tests continue to find the same fixtures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 480ce29e-8d2c-4a4d-adc7-e73d3a96fe9f
📒 Files selected for processing (10)
- api/apps/document_app.py
- api/apps/restful_apis/document_api.py
- docs/references/http_api_reference.md
- test/testcases/test_web_api/test_common.py
- test/testcases/test_web_api/test_document_app/conftest.py
- test/testcases/test_web_api/test_document_app/test_create_document.py
- test/testcases/test_web_api/test_document_app/test_upload_documents.py
- web/src/hooks/use-document-request.ts
- web/src/services/knowledge-service.ts
- web/src/utils/api.ts
```python
def test_invalid_url(self, document_rest_api_module, monkeypatch):
    module = document_rest_api_module
    monkeypatch.setattr(
        module,
        "request",
        _DummyRequest(form={"name": "doc", "url": "not-a-url"}, args={"type": "web"}),
    )
    res = _run(module.upload_document(dataset_id="kb1"))
    assert res["code"] == 101
    assert res["message"] == "The URL format is invalid"
```
test_invalid_url unlikely to reach URL validation — KB/permission gate now sits in front of it.
In the unified handler (see context snippet 1 at document_api.py:271-279), KnowledgebaseService.get_by_id(dataset_id) and check_kb_team_permission(...) execute before the if upload_type == "web": branch that houses the is_valid_url check. This test only monkeypatches request, so the real KnowledgebaseService.get_by_id("kb1") runs and will typically short-circuit with "Can't find the dataset with ID kb1!", not "The URL format is invalid".
Other web-mode tests in this class (test_invalid_kb_id, test_no_permission, test_download_failure, …) correctly stub both. Please align this one.
🐛 Suggested fix
```diff
 def test_invalid_url(self, document_rest_api_module, monkeypatch):
     module = document_rest_api_module
+    kb = SimpleNamespace(id="kb1", tenant_id="tenant1", name="kb", parser_id="parser", pipeline_id="pipe", parser_config={})
+    monkeypatch.setattr(module.KnowledgebaseService, "get_by_id", lambda _kb_id: (True, kb))
+    monkeypatch.setattr(module, "check_kb_team_permission", lambda *_a, **_k: True)
     monkeypatch.setattr(
         module,
         "request",
         _DummyRequest(form={"name": "doc", "url": "not-a-url"}, args={"type": "web"}),
     )
     res = _run(module.upload_document(dataset_id="kb1"))
     assert res["code"] == 101
     assert res["message"] == "The URL format is invalid"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/testcases/test_web_api/test_document_app/test_upload_documents.py`
around lines 451 - 460, The test_invalid_url currently only monkeypatches
request so the real KnowledgebaseService.get_by_id and check_kb_team_permission
run and block the test; update the test to also stub
KnowledgebaseService.get_by_id (return a dummy kb object for "kb1") and
check_kb_team_permission (return True or a permitted response) on the module
under test before calling module.upload_document(dataset_id="kb1") so execution
reaches the upload_type == "web" branch and triggers is_valid_url error
handling; reference the test function name test_invalid_url and the target
function upload_document to locate where to add these monkeypatches.
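The gate-before-validation problem above is easiest to see with a standalone sketch. The handler, service, and permission check below are hypothetical stand-ins (not the real RAGFlow code) that mirror the shapes in the suggested fix; swapping the gate functions for permissive stubs is exactly what `monkeypatch.setattr` does in the test:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the module under test.
def _real_get_by_id(kb_id):
    return (False, None)  # a real lookup would miss for an unknown "kb1"

module = SimpleNamespace(
    KnowledgebaseService=SimpleNamespace(get_by_id=_real_get_by_id),
    check_kb_team_permission=lambda kb, user_id: False,
)

def upload_document(dataset_id, url):
    """Simplified handler: the KB/permission gate runs before URL validation."""
    ok, kb = module.KnowledgebaseService.get_by_id(dataset_id)
    if not ok:
        return {"code": 102, "message": f"Can't find the dataset with ID {dataset_id}!"}
    if not module.check_kb_team_permission(kb, "user-1"):
        return {"code": 102, "message": "No permission."}
    if not url.startswith(("http://", "https://")):
        return {"code": 101, "message": "The URL format is invalid"}
    return {"code": 0, "message": ""}

# Without stubbing, the gate short-circuits before the URL check:
assert upload_document("kb1", "not-a-url")["message"].startswith("Can't find")

# Stub both gates (the monkeypatch.setattr calls in the suggested fix)
# so execution actually reaches URL validation:
module.KnowledgebaseService.get_by_id = lambda _kb_id: (True, SimpleNamespace(id="kb1"))
module.check_kb_team_permission = lambda *_a, **_k: True
assert upload_document("kb1", "not-a-url") == {"code": 101, "message": "The URL format is invalid"}
```

The takeaway for the test file: every stub a handler's earlier branches depend on must be in place, or the assertion exercises the wrong branch.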
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

```
@@ Coverage Diff @@
##             main   #14345   +/-   ##
=======================================
  Coverage   95.30%   95.30%
=======================================
  Files          10       10
  Lines         703      703
  Branches      112      112
=======================================
  Hits          670      670
  Misses         16       16
  Partials       17       17
```

☔ View full report in Codecov by Sentry.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/testcases/test_web_api/test_document_app/test_create_document.py (1)
16-26: ⚠️ Potential issue | 🔴 Critical: Missing `SimpleNamespace` import — all unit tests will fail with `NameError`.
`SimpleNamespace` is referenced on lines 119, 131, 144, 159, 175, 192, and 222 but is never imported. Every test in `TestDocumentCreateUnit` will raise `NameError: name 'SimpleNamespace' is not defined` at collection/execution time.
🐛 Proposed fix
```diff
 import asyncio
 import string
 from concurrent.futures import ThreadPoolExecutor, as_completed
+from types import SimpleNamespace

 import pytest
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_web_api/test_document_app/test_create_document.py` around lines 16 - 26, The tests fail because SimpleNamespace is used in TestDocumentCreateUnit but not imported; add the missing import by importing SimpleNamespace from the types module (e.g., add "from types import SimpleNamespace") alongside the other top-level imports so references in TestDocumentCreateUnit (lines where SimpleNamespace is used) resolve correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@test/testcases/test_web_api/test_document_app/test_create_document.py`:
- Around line 16-26: The tests fail because SimpleNamespace is used in
TestDocumentCreateUnit but not imported; add the missing import by importing
SimpleNamespace from the types module (e.g., add "from types import
SimpleNamespace") alongside the other top-level imports so references in
TestDocumentCreateUnit (lines where SimpleNamespace is used) resolve correctly.
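For context on why the missing import is fatal everywhere: `types.SimpleNamespace` is the stdlib attribute-bag the tests use for fake KB records, and every reference raises `NameError` without the import. A minimal illustration:

```python
from types import SimpleNamespace

# An ad-hoc record like the fake KB objects in the tests.
kb = SimpleNamespace(id="kb1", tenant_id="tenant1", parser_config={})
assert kb.id == "kb1"
assert kb.tenant_id == "tenant1"

# Attributes can be added or replaced freely, unlike a namedtuple.
kb.name = "my-kb"
assert kb.name == "my-kb"
```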
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 085daaee-dac7-400f-9651-15e1d9dbbff1
📒 Files selected for processing (2)
- test/testcases/test_web_api/test_document_app/conftest.py
- test/testcases/test_web_api/test_document_app/test_create_document.py
🚧 Files skipped from review as they are similar to previous changes (1)
- test/testcases/test_web_api/test_document_app/conftest.py
Actionable comments posted: 5
🧹 Nitpick comments (1)
api/apps/restful_apis/document_api.py (1)
296-403: Add logging to the new web/empty upload flows.
The new helpers have no `logging.*` calls, while `_upload_local_documents` logs validation failures (`No file part!`, `No file selected!`, etc.) and the existing `upload_document` dispatcher logs its own errors. Please add parity logging for validation errors (missing `name`/`url`, length overflow, invalid URL, duplicate name) and for successful creations so these flows are observable.
As per coding guidelines, "Add logging for new flows" for `**/*.py`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/restful_apis/document_api.py` around lines 296 - 403, Add logging calls to the new upload flows: in _upload_web_document and _upload_empty_document, log validation failures (missing name, missing url, name too long, invalid URL, duplicate name) at warning level and log successful document creation (after DocumentService.insert / FileService.add_file_from_kb completes) at info level including the doc id/name/kb id; use the module logger (e.g., logging.getLogger(__name__)) consistent with other handlers and place logs immediately where current get_error_data_result or return get_result calls happen to mirror _upload_local_documents behavior.
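As a rough sketch of the parity logging this comment asks for (the handler body, return codes, and field names are placeholders, not the actual RAGFlow helpers), validation failures go to `warning` and successful creations to `info` on the module logger:

```python
import logging

logger = logging.getLogger(__name__)

def upload_empty_document(dataset_id, name):
    """Illustrative only: shows where the review suggests placing logs."""
    if not name:
        # Validation failure: log at warning with identifying fields.
        logger.warning("empty-document create rejected: missing name (dataset_id=%s)", dataset_id)
        return {"code": 101, "message": "Document name can't be empty."}
    doc_id = "doc-123"  # placeholder for the inserted record's id
    # Success milestone: log at info just before returning the result.
    logger.info("empty document created: dataset_id=%s, doc_id=%s, name=%s", dataset_id, doc_id, name)
    return {"code": 0, "data": {"id": doc_id, "name": name}}
```

The `%s`-style lazy formatting defers string building until a handler actually needs the message, which keeps hot paths cheap when the level is disabled.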
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 322-360: The blob can be orphaned if STORAGE_IMPL.put succeeds but
later DocumentService.insert or FileService.add_file_from_kb fails, and
unsupported-type RuntimeError is being turned into a 500 by the broad except; to
fix: (1) change the flow in the function so you either insert the DB record
(DocumentService.insert) before calling STORAGE_IMPL.put, or keep the current
order but wrap the DB calls in a try/except that on any failure calls
settings.STORAGE_IMPL.rm(dataset_id, location) to delete the uploaded blob; (2)
replace the thrown RuntimeError for unsupported file types with an explicit
early return using get_error_data_result(..., code=RetCode.ARGUMENT_ERROR) (or
catch RuntimeError specifically and map it to get_error_data_result) so clients
receive an argument error instead of server_error_response; update references to
STORAGE_IMPL.put, settings.STORAGE_IMPL.rm, DocumentService.insert,
FileService.add_file_from_kb, RuntimeError, server_error_response and
get_error_data_result/RetCode.ARGUMENT_ERROR accordingly.
- Around line 313-315: The handler currently calls html2pdf(url) synchronously
which blocks the async event loop; update the async handler to run html2pdf via
the existing thread_pool_exec helper (same pattern as _upload_local_documents)
and await its result, and likewise offload the blocking STORAGE_IMPL.obj_exist
and STORAGE_IMPL.put calls to thread_pool_exec so all CPU/IO-heavy work is
executed on the thread pool; locate the synchronous calls to html2pdf,
STORAGE_IMPL.obj_exist and STORAGE_IMPL.put in the async function and wrap them
with thread_pool_exec (await the returned futures) to avoid blocking the Quart
event loop.
- Around line 317-320: The KB folder resolution is inconsistent:
_upload_web_document calls FileService.get_root_folder,
FileService.init_knowledgebase_docs, and FileService.get_kb_folder with
current_user.id but then calls FileService.new_a_file_from_kb with kb.tenant_id,
causing wrong folder linkage for team members; update _upload_web_document to
use kb.tenant_id for all folder-resolution calls (replace current_user.id with
kb.tenant_id when invoking get_root_folder, init_knowledgebase_docs, and
get_kb_folder) so its behavior matches _upload_empty_document and ensures the
created file is placed under the KB owner’s folders.
- Around line 385-399: The DocumentService.insert call is storing the enum
member FileType.VIRTUAL instead of its string value; update the inserted dict so
the "type" field uses FileType.VIRTUAL.value to match the rest of the codebase
(see usage patterns in document_service and file_service) so comparisons and
stored values remain consistent.
- Around line 347-354: The conditional block that sets doc["parser_id"] for
FileType.VISUAL, FileType.AURAL and regex checks for .ppt/.pptx/.pages/.eml is
dead because filename is always constructed as f"{name}.pdf" and doc["type"]
will always be FileType.PDF; remove this entire block (the if branches
referencing doc["type"], FileType.VISUAL/FileType.AURAL, and the re.search
checks) so the default kb.parser_id remains in effect (see references to
filename construction and doc["type"] and parser_id). If you actually need to
handle presentations/emails for crawled pages, change the logic to recover the
original source filename from the URL and run the type checks against that
source filename instead of the generated f"{name}.pdf".
---
Nitpick comments:
In `@api/apps/restful_apis/document_api.py`:
- Around line 296-403: Add logging calls to the new upload flows: in
_upload_web_document and _upload_empty_document, log validation failures
(missing name, missing url, name too long, invalid URL, duplicate name) at
warning level and log successful document creation (after DocumentService.insert
/ FileService.add_file_from_kb completes) at info level including the doc
id/name/kb id; use the module logger (e.g., logging.getLogger(__name__))
consistent with other handlers and place logs immediately where current
get_error_data_result or return get_result calls happen to mirror
_upload_local_documents behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d3d4fdd0-a072-45cc-a2a4-7de059e28fef
📒 Files selected for processing (1)
api/apps/restful_apis/document_api.py
```python
        blob = html2pdf(url)
        if not blob:
            return server_error_response(ValueError("Download failure."))
```
Blocking Selenium call on the async event loop.
`html2pdf(url)` launches headless Chrome via Selenium (see `api/utils/web_utils.py:157-188`) and can take many seconds. Running it synchronously inside this async handler blocks the Quart event loop for every concurrent request. `_upload_local_documents` already offloads heavy work via `thread_pool_exec`; do the same here (and ideally also for the subsequent `STORAGE_IMPL.obj_exist` / `STORAGE_IMPL.put` calls, which are blocking I/O).
🛠️ Suggested change

```diff
-        blob = html2pdf(url)
+        blob = await thread_pool_exec(html2pdf, url)
         if not blob:
             return server_error_response(ValueError("Download failure."))
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        blob = await thread_pool_exec(html2pdf, url)
        if not blob:
            return server_error_response(ValueError("Download failure."))
```
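As a generic illustration of the offloading pattern, here is a minimal self-contained sketch using `asyncio.to_thread` in place of the project's `thread_pool_exec` helper; the `html2pdf_stub` function is a hypothetical stand-in for the real Selenium-backed converter:

```python
import asyncio
import time

def html2pdf_stub(url: str) -> bytes:
    """Stand-in for the Selenium-backed html2pdf; blocks its calling thread."""
    time.sleep(0.05)  # simulate headless-Chrome latency
    return b"%PDF-1.4 stub for " + url.encode()

async def upload_web_document(url: str) -> bytes:
    # Offload the blocking call so the event loop stays free for other requests;
    # asyncio.to_thread plays the role of the project's thread_pool_exec helper.
    blob = await asyncio.to_thread(html2pdf_stub, url)
    if not blob:
        raise ValueError("Download failure.")
    return blob

blob = asyncio.run(upload_web_document("https://example.com"))
print(blob[:8])  # b'%PDF-1.4'
```

The same wrapper works for any blocking storage call: `await asyncio.to_thread(storage.put, bucket, key, blob)` keeps the handler responsive while the I/O runs on a worker thread.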
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 313 - 315, The handler
currently calls html2pdf(url) synchronously which blocks the async event loop;
update the async handler to run html2pdf via the existing thread_pool_exec
helper (same pattern as _upload_local_documents) and await its result, and
likewise offload the blocking STORAGE_IMPL.obj_exist and STORAGE_IMPL.put calls
to thread_pool_exec so all CPU/IO-heavy work is executed on the thread pool;
locate the synchronous calls to html2pdf, STORAGE_IMPL.obj_exist and
STORAGE_IMPL.put in the async function and wrap them with thread_pool_exec
(await the returned futures) to avoid blocking the Quart event loop.
```python
    try:
        filename = duplicate_name(DocumentService.query, name=f"{name}.pdf", kb_id=kb.id)
        filetype = filename_type(filename)
        if filetype == FileType.OTHER.value:
            raise RuntimeError("This type of file has not been supported yet!")

        location = filename
        while settings.STORAGE_IMPL.obj_exist(dataset_id, location):
            location += "_"
        settings.STORAGE_IMPL.put(dataset_id, location, blob)

        doc = {
            "id": get_uuid(),
            "kb_id": kb.id,
            "parser_id": kb.parser_id,
            "pipeline_id": kb.pipeline_id,
            "parser_config": kb.parser_config,
            "created_by": current_user.id,
            "type": filetype,
            "name": filename,
            "location": location,
            "size": len(blob),
            "thumbnail": thumbnail(filename, blob),
            "suffix": Path(filename).suffix.lstrip("."),
        }
        if doc["type"] == FileType.VISUAL:
            doc["parser_id"] = ParserType.PICTURE.value
        if doc["type"] == FileType.AURAL:
            doc["parser_id"] = ParserType.AUDIO.value
        if re.search(r"\.(ppt|pptx|pages)$", filename):
            doc["parser_id"] = ParserType.PRESENTATION.value
        if re.search(r"\.(eml)$", filename):
            doc["parser_id"] = ParserType.EMAIL.value

        DocumentService.insert(doc)
        FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
        return get_result(data=map_doc_keys_with_run_status(doc, run_status="0"))
    except Exception as e:
        return server_error_response(e)
```
Orphaned storage blob on partial failure + wrong error code for unsupported type.
Two smaller correctness/UX points inside the try block:
- If `STORAGE_IMPL.put` (line 331) succeeds but `DocumentService.insert` / `add_file_from_kb` later fails, the blob stays in object storage with no DB record referencing it. Consider ordering the DB insert before the storage write, or adding a compensating `STORAGE_IMPL.rm(dataset_id, location)` in an exception path.
- `RuntimeError("This type of file has not been supported yet!")` (line 326) is a client-input error, but the bare `except` at line 359 maps it to `server_error_response` (HTTP 500). Return a `get_error_data_result(..., code=RetCode.ARGUMENT_ERROR)` for that specific case so clients can distinguish it from server faults. (This case becomes unreachable if you drop the dead parser_id block suggested above; in that case this is moot.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 322 - 360, The blob can
be orphaned if STORAGE_IMPL.put succeeds but later DocumentService.insert or
FileService.add_file_from_kb fails, and unsupported-type RuntimeError is being
turned into a 500 by the broad except; to fix: (1) change the flow in the
function so you either insert the DB record (DocumentService.insert) before
calling STORAGE_IMPL.put, or keep the current order but wrap the DB calls in a
try/except that on any failure calls settings.STORAGE_IMPL.rm(dataset_id,
location) to delete the uploaded blob; (2) replace the thrown RuntimeError for
unsupported file types with an explicit early return using
get_error_data_result(..., code=RetCode.ARGUMENT_ERROR) (or catch RuntimeError
specifically and map it to get_error_data_result) so clients receive an argument
error instead of server_error_response; update references to STORAGE_IMPL.put,
settings.STORAGE_IMPL.rm, DocumentService.insert, FileService.add_file_from_kb,
RuntimeError, server_error_response and
get_error_data_result/RetCode.ARGUMENT_ERROR accordingly.
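A minimal sketch of the compensating-delete option, with an in-memory stand-in for `STORAGE_IMPL`; all class and function names here are illustrative, not the project's actual API:

```python
class FakeStorage:
    """In-memory stand-in for settings.STORAGE_IMPL."""
    def __init__(self):
        self.blobs = {}
    def put(self, bucket: str, key: str, blob: bytes) -> None:
        self.blobs[(bucket, key)] = blob
    def rm(self, bucket: str, key: str) -> None:
        self.blobs.pop((bucket, key), None)

def create_document(storage, bucket, key, blob, insert_record):
    """Write the blob, then the DB record; roll back the blob if the record fails."""
    storage.put(bucket, key, blob)
    try:
        insert_record()  # stand-in for DocumentService.insert + add_file_from_kb
    except Exception:
        storage.rm(bucket, key)  # compensating delete: no orphaned blob remains
        raise

storage = FakeStorage()
def failing_insert():
    raise RuntimeError("db down")

try:
    create_document(storage, "ds1", "page.pdf", b"%PDF", failing_insert)
except RuntimeError:
    pass
print(len(storage.blobs))  # 0: the blob was cleaned up
```

Inserting the DB record first avoids the compensating delete entirely, at the cost of a short window where a record references a not-yet-written blob; either ordering is defensible as long as one side cleans up the other on failure.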
```python
        if doc["type"] == FileType.VISUAL:
            doc["parser_id"] = ParserType.PICTURE.value
        if doc["type"] == FileType.AURAL:
            doc["parser_id"] = ParserType.AUDIO.value
        if re.search(r"\.(ppt|pptx|pages)$", filename):
            doc["parser_id"] = ParserType.PRESENTATION.value
        if re.search(r"\.(eml)$", filename):
            doc["parser_id"] = ParserType.EMAIL.value
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Inspect FileType enum definition and filename_type return type
ast-grep --pattern $'class FileType($_):
  $$$'
rg -nP --type=py -C3 '\bdef\s+filename_type\s*\('
```

Repository: infiniflow/ragflow
🏁 Script executed:

```shell
cd api/apps/restful_apis && sed -n '320,360p' document_api.py | cat -n
```

🏁 Script executed:

```shell
rg -n "def filename_type" api/utils/file_utils.py -A 15
```

🏁 Script executed:

```shell
grep -n "FileType" api/apps/restful_apis/document_api.py | head -20
```

🏁 Script executed:

```shell
sed -n '58,85p' api/utils/file_utils.py | cat -n
```

🏁 Script executed:

```shell
rg "new_kb_from_web_page" api/apps/restful_apis/document_api.py -B 5
```
Remove this dead code block: `filename` is always `.pdf`, making these branches unreachable.
The filename is constructed as `f"{name}.pdf"` (line 323), so `filename_type(filename)` always returns `FileType.PDF.value`. This means:

- `doc["type"]` is always `"pdf"` and can never equal `FileType.VISUAL` or `FileType.AURAL`
- The regex patterns checking for `.ppt`/`.pptx`/`.pages` and `.eml` can never match a `.pdf` filename

These overrides are dead code and should be removed entirely. The default `kb.parser_id` set on line 336 is what actually applies here.
If presentation/email handling is needed for crawled pages, recover the original source filename from the URL and run the type checks against that instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 347 - 354, The
conditional block that sets doc["parser_id"] for FileType.VISUAL, FileType.AURAL
and regex checks for .ppt/.pptx/.pages/.eml is dead because filename is always
constructed as f"{name}.pdf" and doc["type"] will always be FileType.PDF; remove
this entire block (the if branches referencing doc["type"],
FileType.VISUAL/FileType.AURAL, and the re.search checks) so the default
kb.parser_id remains in effect (see references to filename construction and
doc["type"] and parser_id). If you actually need to handle presentations/emails
for crawled pages, change the logic to recover the original source filename from
the URL and run the type checks against that source filename instead of the
generated f"{name}.pdf".
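If the original source filename is ever needed, a sketch of recovering a suffix from the URL path; the `source_suffix` helper name is hypothetical:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def source_suffix(url: str) -> str:
    """Best-effort suffix of the resource a URL points at ('' if none)."""
    path = urlparse(url).path          # drops query string and fragment
    return PurePosixPath(path).suffix.lstrip(".").lower()

print(source_suffix("https://example.com/slides/q3-review.pptx"))  # pptx
print(source_suffix("https://example.com/"))                       # (empty)
```

Parser selection could then test `source_suffix(url) in ("ppt", "pptx", "pages")` instead of matching against the generated `.pdf` filename.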
```python
        doc = DocumentService.insert(
            {
                "id": get_uuid(),
                "kb_id": kb.id,
                "parser_id": kb.parser_id,
                "pipeline_id": kb.pipeline_id,
                "parser_config": kb.parser_config,
                "created_by": current_user.id,
                "type": FileType.VIRTUAL,
                "name": name,
                "suffix": Path(name).suffix.lstrip("."),
                "location": "",
                "size": 0,
            }
        )
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Confirm how `type` is persisted / compared for documents across the codebase
ast-grep --pattern $'class FileType($_):
  $$$'
rg -nP --type=py -C2 '"type"\s*:\s*FileType\.[A-Z_]+\b'
rg -nP --type=py -C2 '\btype\s*==\s*FileType\.[A-Z_]+\.value\b'
```
Store `FileType.VIRTUAL.value`, not the enum member.
`FileType` is a `StrEnum`, so enum members are strings internally. However, throughout the codebase, in file inserts (`user_account_service.py:88`, `file_service.py:205`, `file_api_service.py:326`) and all type comparisons (`document_service.py:256/532/553`, `file_service.py:396`, `file2document_api.py:103`, `file_api_service.py:236/255/313`, `file_api.py:294`, `document_app.py:227`), the pattern is consistently to use `.value`. This location is the only exception and should align with the standard pattern.
Suggested change

```diff
-                "type": FileType.VIRTUAL,
+                "type": FileType.VIRTUAL.value,
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        doc = DocumentService.insert(
            {
                "id": get_uuid(),
                "kb_id": kb.id,
                "parser_id": kb.parser_id,
                "pipeline_id": kb.pipeline_id,
                "parser_config": kb.parser_config,
                "created_by": current_user.id,
                "type": FileType.VIRTUAL.value,
                "name": name,
                "suffix": Path(name).suffix.lstrip("."),
                "location": "",
                "size": 0,
            }
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/apps/restful_apis/document_api.py` around lines 385 - 399, The
DocumentService.insert call is storing the enum member FileType.VIRTUAL instead
of its string value; update the inserted dict so the "type" field uses
FileType.VIRTUAL.value to match the rest of the codebase (see usage patterns in
document_service and file_service) so comparisons and stored values remain
consistent.
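To illustrate why the distinction matters, a small self-contained demo; the `str`-backed enum here stands in for the project's `FileType` (member values are assumed for illustration):

```python
from enum import Enum

class FileType(str, Enum):  # str-backed enum, like the project's FileType (values assumed)
    PDF = "pdf"
    VIRTUAL = "virtual"

# The member compares equal to its value and is an instance of str...
assert FileType.VIRTUAL == "virtual"
assert isinstance(FileType.VIRTUAL, str)

# ...but it is still the enum member: repr() shows the member, and serializers
# or ORMs that stringify fields may persist "FileType.VIRTUAL" rather than "virtual".
print(repr(FileType.VIRTUAL))        # <FileType.VIRTUAL: 'virtual'>
print(repr(FileType.VIRTUAL.value))  # 'virtual'
```

Passing `.value` explicitly removes any dependence on how a given serializer treats enum members, which is presumably why the rest of the codebase does so.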
LGTM
What problem does this PR solve?
Unify document create flows under the REST documents API.
Type of change