fix(dify): guard retrieval argument error behavior by Achieve3318 · Pull Request #14169 · infiniflow/ragflow

Achieve3318 · 2026-04-16T13:47:02Z

What problem does this PR solve?

The Dify-compatible /dify/retrieval endpoint recently gained stricter parsing and validation for its request payload, including:

- Normalized types for retrieval_setting.top_k and retrieval_setting.score_threshold.
- A clear separation between malformed arguments and missing required fields.

Previously, no unit test explicitly guarded the exact error code and message contract for these cases.
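The parsing and error separation described above can be sketched as follows. This is a minimal illustration, not the code in api/apps/sdk/dify_retrieval.py: the function names `parse_retrieval_options` and `check_retrieval_args`, the default values, and the choice to report missing fields before malformed ones are all assumptions. Only the two message prefixes come from this PR.

```python
from typing import Any, Optional, Tuple

# Hypothetical defaults; the real ones in api/apps/sdk/dify_retrieval.py may differ.
DEFAULT_SCORE_THRESHOLD = 0.0
DEFAULT_TOP_K = 1024
REQUIRED_FIELDS = ("knowledge_id", "query")


def parse_retrieval_options(raw: Any) -> Tuple[dict, float, int]:
    """Coerce retrieval_setting into (setting, similarity_threshold, top).

    Raises ValueError with a short reason when a value cannot be coerced.
    """
    setting = {} if raw is None else raw
    if not isinstance(setting, dict):
        raise ValueError("retrieval_setting must be an object")
    try:
        threshold = float(setting.get("score_threshold", DEFAULT_SCORE_THRESHOLD))
    except (TypeError, ValueError):
        raise ValueError("score_threshold must be a number") from None
    try:
        top = int(setting.get("top_k", DEFAULT_TOP_K))
    except (TypeError, ValueError):
        raise ValueError("top_k must be an integer") from None
    return setting, threshold, top


def check_retrieval_args(req: dict) -> Optional[str]:
    """Return an error message, or None when the request is acceptable.

    Missing required fields are reported before malformed options; this
    ordering is an assumption of the sketch.
    """
    missing = [name for name in REQUIRED_FIELDS if not req.get(name)]
    if missing:
        return "required arguments are missing: " + ", ".join(missing)
    try:
        parse_retrieval_options(req.get("retrieval_setting"))
    except ValueError as exc:
        return f"invalid or malformed arguments: {exc}"
    return None
```

Keeping the two failure classes in separate branches is what lets a test pin each prefix independently, whatever detail text follows the colon.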

What does this PR change?

Adds a guard-style unit test to test_dify_retrieval_routes_unit.py:
- test_retrieval_argument_error_messages:
  - Sends a request with malformed numeric options:
    - retrieval_setting = {"top_k": "not-int", "score_threshold": "not-float"}
    - Asserts code == RetCode.ARGUMENT_ERROR and message contains
      "invalid or malformed arguments:".
  - Sends a request with required fields missing:
    - Empty payload ({})
    - Asserts code == RetCode.ARGUMENT_ERROR and message contains
      "required arguments are missing:".

This test encodes the intended behavior of the Dify retrieval API so future refactors cannot silently regress error handling.
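The shape of the guard test can be shown without loading the ragflow app. The real test drives `module.retrieval` through the file's own helpers (`_set_request_json`, `_run`, `inspect.unwrap`); this sketch swaps in a hypothetical stub, `fake_retrieval`, and a stand-in `RetCode` whose numeric values are placeholders. What carries over is the assertion style: check the code and a stable message prefix, not the full message.

```python
from enum import IntEnum


class RetCode(IntEnum):
    # Stand-in for ragflow's RetCode; these numeric values are placeholders.
    SUCCESS = 0
    ARGUMENT_ERROR = 1


def fake_retrieval(req: dict) -> dict:
    """Hypothetical stub that follows the same error contract as the endpoint."""
    missing = [k for k in ("knowledge_id", "query") if not req.get(k)]
    if missing:
        return {"code": RetCode.ARGUMENT_ERROR,
                "message": "required arguments are missing: " + ", ".join(missing)}
    setting = req.get("retrieval_setting", {})
    try:
        int(setting.get("top_k", 1024))
        float(setting.get("score_threshold", 0.0))
    except (TypeError, ValueError, AttributeError) as exc:
        return {"code": RetCode.ARGUMENT_ERROR,
                "message": f"invalid or malformed arguments: {exc}"}
    return {"code": RetCode.SUCCESS, "message": "success"}


def test_retrieval_argument_error_messages():
    # Case 1: required fields present, numeric options malformed.
    malformed = fake_retrieval({
        "knowledge_id": "kb-1",
        "query": "hello",
        "retrieval_setting": {"top_k": "not-int", "score_threshold": "not-float"},
    })
    assert malformed["code"] == RetCode.ARGUMENT_ERROR
    assert "invalid or malformed arguments:" in malformed["message"]

    # Case 2: empty payload, so required fields are missing.
    missing = fake_retrieval({})
    assert missing["code"] == RetCode.ARGUMENT_ERROR
    assert "required arguments are missing:" in missing["message"]
```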

Type of change

Tests (add coverage and guardrails for existing behavior)

coderabbitai · 2026-04-16T13:47:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 269220e3-a13f-4201-90f8-8566ddfa93bc

📥 Commits

Reviewing files that changed from the base of the PR and between 4b49a2a and 1bccae4.

📒 Files selected for processing (2)

api/apps/sdk/dify_retrieval.py
test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py

🚧 Files skipped from review as they are similar to previous changes (2)

test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py
api/apps/sdk/dify_retrieval.py

📝 Walkthrough

The PR expands the /dify/retrieval endpoint to accept GET and POST, adds GET query-parameter normalization, introduces typed retrieval-option parsing/validation, removes decorator-based validation in favor of in-handler checks, and adds unit tests covering parsing and argument-error cases.

Changes

Cohort / File(s)	Summary
API Endpoint `api/apps/sdk/dify_retrieval.py`	Endpoint methods changed to support GET and POST; removed `validate_request` decorator; added `_read_retrieval_request()` to normalize GET query params into a request dict; added `_parse_retrieval_options()` to coerce/validate `retrieval_setting` into `similarity_threshold` (float) and `top` (int); explicit in-handler argument error handling and Quart/Werkzeug bad-request compatibility; updated OpenAPI-style docstring.
Unit Tests `test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py`	Added four `pytest.mark.p2` tests verifying GET query parsing/type coercion, POST JSON passthrough, and multiple `RetCode.ARGUMENT_ERROR` cases (malformed numeric values, missing `knowledge_id`/`query`, invalid `retrieval_setting` type).
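The GET normalization step summarized in the table can be illustrated with the standard library. The flat parameter names (`knowledge_id`, `query`, `use_kg`, `top_k`, `score_threshold`), the truthy spellings for `use_kg`, and the function name `normalize_get_query` are assumptions; the real `_read_retrieval_request()` reads Quart's request object and may name or nest parameters differently. The sketch leaves numeric options as strings so that typed coercion happens in one shared parser for both GET and POST.

```python
from typing import Any, Dict
from urllib.parse import parse_qs

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings for use_kg


def normalize_get_query(query_string: str) -> Dict[str, Any]:
    """Map flat GET parameters onto the POST-style request body (hypothetical)."""
    # parse_qs drops blank values by default, so "query=" counts as missing.
    # When a key repeats, the last value wins (an arbitrary choice).
    params = {key: values[-1] for key, values in parse_qs(query_string).items()}
    req: Dict[str, Any] = {}
    for key in ("knowledge_id", "query"):
        if key in params:
            req[key] = params[key]
    if "use_kg" in params:
        req["use_kg"] = params["use_kg"].strip().lower() in TRUTHY
    # Numeric options stay strings; the shared parser coerces them later.
    setting = {key: params[key] for key in ("top_k", "score_threshold") if key in params}
    if setting:
        req["retrieval_setting"] = setting
    return req
```

For example, `normalize_get_query("knowledge_id=kb-1&query=hello+world&top_k=5")` yields a dict with `query` decoded to `"hello world"` and `retrieval_setting` equal to `{"top_k": "5"}`.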

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble query strings and JSON delight,
GET or POST, I fetch knowledge just right.
I tidy the numbers, I guard every key,
If formats go sideways, I warn gently — teehee! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change—adding guard-style tests for the retrieval endpoint's argument error behavior—making it specific and relevant to the changeset.
Description check	✅ Passed	The description follows the template with all required sections complete: it explains the problem (missing test coverage for error handling), lists changes (new test cases), and specifies the change type (Tests).


coderabbitai

🧹 Nitpick comments (3)

api/apps/sdk/dify_retrieval.py (2)

224-230: Minor: Unused retrieval_setting variable from tuple unpacking.

The retrieval_setting variable returned by _parse_retrieval_options is unpacked but never used afterward. Consider using _ for the unused value.

♻️ Minor cleanup

     try:
-        retrieval_setting, similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {}))
+        _, similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {}))
     except ValueError as e:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@api/apps/sdk/dify_retrieval.py` around lines 224 - 230, The tuple unpacking
assigns an unused retrieval_setting from _parse_retrieval_options; change the
unpack to use a throwaway name (e.g., `_`) instead of `retrieval_setting` so it
reads `_, similarity_threshold, top =
_parse_retrieval_options(req.get("retrieval_setting", {}))`; update the
unpacking at the try block that calls `_parse_retrieval_options` to avoid the
unused variable warning while keeping `similarity_threshold` and `top` intact.

40-78: Consider adding debug logging for POST requests as well.

The function logs GET request normalization details (lines 62-69), but POST requests pass through without equivalent logging. Per coding guidelines, new flows should include logging.

📝 Suggested enhancement for consistency

     return req
-    return await get_request_json()
+    req = await get_request_json()
+    logger.debug(
+        "Dify retrieval POST payload: knowledge_id=%s query_len=%s use_kg=%s",
+        req.get("knowledge_id"),
+        len(req.get("query", "")) if isinstance(req.get("query"), str) else 0,
+        req.get("use_kg"),
+    )
+    return req
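One alternative to the suggested diff, sketched here with hypothetical names (`summarize_request` and the logger name are not from the PR), is to route both the GET and POST branches through a single helper. That keeps the logged fields in sync and ensures the raw query text is never written to the log, only its length, mirroring the `safe_query` idea in the prompt below.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("dify_retrieval_sketch")  # hypothetical logger name


def summarize_request(method: str, req: Dict[str, Any]) -> Dict[str, Any]:
    """Build and log a summary that omits the query text itself."""
    query = req.get("query")
    setting = req.get("retrieval_setting")
    setting = setting if isinstance(setting, dict) else {}
    summary = {
        "method": method,
        "knowledge_id": req.get("knowledge_id"),
        "safe_query": f"len={len(query)}" if isinstance(query, str) else None,
        "use_kg": req.get("use_kg"),
        "top_k": setting.get("top_k"),
        "score_threshold": setting.get("score_threshold"),
    }
    # %-style arguments defer formatting until DEBUG is actually enabled.
    logger.debug("Dify retrieval %s request: %s", method, summary)
    return summary
```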

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@api/apps/sdk/dify_retrieval.py` around lines 40 - 78, The POST path in
_read_retrieval_request lacks the same normalization debug log as the GET path;
update the POST flow to await get_request_json(), extract knowledge_id, query,
use_kg and retrieval_setting from the returned dict (using the same keys as the
GET branch), compute safe_query (e.g., "len=N" when query is a str), and emit
the same logger.debug call (with knowledge_id, safe_query, use_kg,
retrieval_setting.get("top_k"), retrieval_setting.get("score_threshold")) before
returning the req so POST requests are logged consistently.

test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py (1)

394-417: Consider adding edge case tests for more comprehensive coverage.

The guard test covers the primary cases well. For stronger protection, consider adding:

- Partial missing fields (e.g., {"knowledge_id": "kb-1"} without query)
- Invalid retrieval_setting type (e.g., {"retrieval_setting": "not-a-dict"})

📝 Additional test cases

# Case 3: partial required fields
_set_request_json(monkeypatch, module, {"knowledge_id": "kb-1"})
res_partial = _run(inspect.unwrap(module.retrieval)("tenant-1"))
assert res_partial["code"] == module.RetCode.ARGUMENT_ERROR
assert "query" in res_partial["message"]

# Case 4: invalid retrieval_setting type
_set_request_json(monkeypatch, module, {
    "knowledge_id": "kb-1",
    "query": "hello",
    "retrieval_setting": "not-a-dict"
})
res_bad_type = _run(inspect.unwrap(module.retrieval)("tenant-1"))
assert res_bad_type["code"] == module.RetCode.ARGUMENT_ERROR
assert "retrieval_setting must be an object" in res_bad_type["message"]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py`
around lines 394 - 417, Add two additional edge-case assertions to the retrieval
argument guard tests: call module.retrieval (via inspect.unwrap and _run) with a
partially missing required field (e.g., only {"knowledge_id": "kb-1"}) and
assert the response code equals module.RetCode.ARGUMENT_ERROR and that the
message mentions the missing "query"; and call module.retrieval with
retrieval_setting of the wrong type (e.g., a string) and assert
module.RetCode.ARGUMENT_ERROR and that the message indicates "retrieval_setting
must be an object" (use the same helpers _set_request_json and the existing
test_retrieval_argument_error_messages or a new test function to add these
cases).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3398f5df-9df3-4475-b1ff-6bdf44b7a441

📥 Commits

Reviewing files that changed from the base of the PR and between f906a20 and 4b49a2a.

📒 Files selected for processing (2)

api/apps/sdk/dify_retrieval.py
test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py

Achieve3318 · 2026-04-16T13:57:18Z

Hi @yingfeng, could you review my PR?

Achieve3318 · 2026-04-17T02:33:45Z

Hi @yingfeng, could you review my PR, please?

Achieve3318 · 2026-04-19T22:55:48Z

Hi @yingfeng, could you review my PR?

Achieve3318 · 2026-04-24T13:37:31Z

Hi @yingfeng, could you review my PR, please?

codecov · 2026-05-11T04:45:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.16%. Comparing base (827ccec) to head (cae7612).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #14169   +/-   ##
=======================================
  Coverage   94.16%   94.16%           
=======================================
  Files          10       10           
  Lines         703      703           
  Branches      112      112           
=======================================
  Hits          662      662           
  Misses         25       25           
  Partials       16       16
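The headline figure can be reproduced from the diff table, assuming Codecov's default settings: partially covered lines count as not covered, and percentages are rounded down to two decimals. Under those assumptions, 662 hits out of 703 lines gives 94.1678...%, which truncates to the reported 94.16%.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage as hits / (hits + misses + partials), rounded down to 0.01%."""
    lines = hits + misses + partials
    ratio = Decimal(hits) * 100 / Decimal(lines)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
```

With `coverage_percent(662, 25, 16)` the line total is 703, matching the Lines row, and the result is `Decimal("94.16")`. Rounding to nearest instead would give 94.17.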

dosubot (bot) added the labels size:L (This PR changes 100-499 lines, ignoring generated files.), 🐖api (The modified files are located under directory 'api/apps/sdk'), and 🧪 test (Pull requests that update test cases.) on Apr 16, 2026

coderabbitai Bot reviewed Apr 16, 2026

Fix Dify retrieval request handling and strengthen argument guard tests.

1bccae4

Support Dify retrieval normalization and argument validation consistency across GET/POST flows, and add regression tests for malformed, missing, and mistyped retrieval arguments.

Achieve3318 force-pushed the fix-dify-retrieval-arg-errors branch from 407eca7 to 1bccae4 on April 16, 2026 13:56

Merge branch 'main' into fix-dify-retrieval-arg-errors

cae7612

KevinHuSh added the ci Continue Integration label May 11, 2026

KevinHuSh marked this pull request as draft May 11, 2026 04:02

KevinHuSh marked this pull request as ready for review May 11, 2026 04:02

KevinHuSh merged commit 16354f4 into infiniflow:main May 11, 2026
3 checks passed

KevinHuSh merged 2 commits into infiniflow:main from Achieve3318:fix-dify-retrieval-arg-errors

2 participants
