
fix(dify): guard retrieval argument error behavior#14169

Merged
KevinHuSh merged 2 commits into infiniflow:main from Achieve3318:fix-dify-retrieval-arg-errors
May 11, 2026

Conversation

@Achieve3318
Contributor

What problem does this PR solve?

The Dify-compatible /dify/retrieval endpoint recently gained stricter parsing and validation for its request payload, including:

  • Normalized retrieval_setting.top_k and retrieval_setting.score_threshold types.
  • Clear separation between malformed arguments and missing required fields.

Previously, there was no unit test explicitly guarding the exact error code and message contract for these cases.

What does this PR change?

  • Add a guard-style unit test in test_dify_retrieval_routes_unit.py:
    • test_retrieval_argument_error_messages:
      • Sends a request with malformed numeric options:
        • retrieval_setting = {"top_k": "not-int", "score_threshold": "not-float"}
        • Asserts code == RetCode.ARGUMENT_ERROR and message contains
          "invalid or malformed arguments:".
      • Sends a request with required fields missing:
        • Empty payload ({})
        • Asserts code == RetCode.ARGUMENT_ERROR and message contains
          "required arguments are missing:".

This test encodes the intended behavior of the Dify retrieval API so future refactors cannot silently regress error handling.
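
The contract being guarded can be sketched in isolation. The following is a minimal illustration, not the project's handler: `RetCode` is a stand-in enum with placeholder values, `check_retrieval_args` is a hypothetical function, and the order of the two checks is an assumption. Only the two message prefixes and the field names are taken from the description above.

```python
from enum import IntEnum


class RetCode(IntEnum):
    # Stand-in for the project's RetCode; the numeric values are placeholders.
    SUCCESS = 0
    ARGUMENT_ERROR = 101


REQUIRED_FIELDS = ("knowledge_id", "query")
NUMERIC_OPTIONS = (("top_k", int), ("score_threshold", float))


def check_retrieval_args(payload: dict) -> dict:
    """Hypothetical validator that mimics the two error prefixes in this PR."""
    # Missing required fields are reported first (assumed ordering).
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        return {
            "code": RetCode.ARGUMENT_ERROR,
            "message": "required arguments are missing: " + ", ".join(missing),
        }

    setting = payload.get("retrieval_setting") or {}
    if not isinstance(setting, dict):
        malformed = ["retrieval_setting"]
    else:
        malformed = []
        for key, cast in NUMERIC_OPTIONS:
            if key in setting:
                try:
                    cast(setting[key])
                except (TypeError, ValueError):
                    malformed.append(key)
    if malformed:
        return {
            "code": RetCode.ARGUMENT_ERROR,
            "message": "invalid or malformed arguments: " + ", ".join(malformed),
        }
    return {"code": RetCode.SUCCESS, "message": ""}
```

A guard test then asserts only on the code and the message prefix, so the list of offending fields can change without breaking the test.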

Type of change

  • Tests (add coverage and guardrails for existing behavior)

@dosubot dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.), 🐖api (The modified files are located under directory 'api/apps/sdk'), and 🧪 test (Pull requests that update test cases.) labels Apr 16, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 269220e3-a13f-4201-90f8-8566ddfa93bc

📥 Commits

Reviewing files that changed from the base of the PR and between 4b49a2a and 1bccae4.

📒 Files selected for processing (2)
  • api/apps/sdk/dify_retrieval.py
  • test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py
  • api/apps/sdk/dify_retrieval.py

📝 Walkthrough


The PR expands the /dify/retrieval endpoint to accept GET and POST, adds GET query-parameter normalization, introduces typed retrieval-option parsing/validation, removes decorator-based validation in favor of in-handler checks, and adds unit tests covering parsing and argument-error cases.
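
As a rough illustration of what GET query-parameter normalization could look like, the sketch below folds flat query-string values into the nested shape a JSON POST body would have. `normalize_get_args` is a hypothetical helper, and the assumption that GET passes `top_k` and `score_threshold` as flat parameters is not confirmed by this PR; only the key names come from the review comments.

```python
from typing import Mapping


def normalize_get_args(args: Mapping[str, str]) -> dict:
    """Hypothetical helper: build a POST-shaped request dict from query params.

    Numeric options are left as strings here; coercion is a separate step.
    """
    req: dict = {}
    for key in ("knowledge_id", "query"):
        if key in args:
            req[key] = args[key]
    if "use_kg" in args:
        # Accepted truthy spellings are an assumption for this sketch.
        req["use_kg"] = args["use_kg"].strip().lower() in ("1", "true", "yes")
    setting = {k: args[k] for k in ("top_k", "score_threshold") if k in args}
    if setting:
        req["retrieval_setting"] = setting
    return req
```

Keeping normalization separate from type coercion means the same validation path can serve both GET and POST requests.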

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| API Endpoint: api/apps/sdk/dify_retrieval.py | Endpoint methods changed to support GET and POST; removed validate_request decorator; added _read_retrieval_request() to normalize GET query params into a request dict; added _parse_retrieval_options() to coerce/validate retrieval_setting into similarity_threshold (float) and top (int); explicit in-handler argument error handling and Quart/Werkzeug bad-request compatibility; updated OpenAPI-style docstring. |
| Unit Tests: test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py | Added four pytest.mark.p2 tests verifying GET query parsing/type coercion, POST JSON passthrough, and multiple RetCode.ARGUMENT_ERROR cases (malformed numeric values, missing knowledge_id/query, invalid retrieval_setting type). |
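
The typed option parsing described in the summary can be approximated as follows. This is a sketch under stated assumptions, not the code in dify_retrieval.py: the function name drops the leading underscore to mark it as illustrative, the two default values are placeholders, and the wording of the numeric-error message is invented. The three-element return shape and the "retrieval_setting must be an object" message are taken from the review comments in this conversation.

```python
# Placeholder defaults; the real values are not stated in this conversation.
DEFAULT_SCORE_THRESHOLD = 0.0
DEFAULT_TOP_K = 1024


def parse_retrieval_options(retrieval_setting):
    """Coerce retrieval_setting into (setting, similarity_threshold, top).

    Raises ValueError so the caller can map it to an argument error.
    """
    if retrieval_setting is None:
        retrieval_setting = {}
    if not isinstance(retrieval_setting, dict):
        raise ValueError("retrieval_setting must be an object")
    try:
        similarity_threshold = float(
            retrieval_setting.get("score_threshold", DEFAULT_SCORE_THRESHOLD)
        )
        top = int(retrieval_setting.get("top_k", DEFAULT_TOP_K))
    except (TypeError, ValueError) as exc:
        # Message wording here is illustrative only.
        raise ValueError(f"invalid retrieval_setting value: {exc}") from exc
    return retrieval_setting, similarity_threshold, top
```

Raising a single exception type keeps the handler's `except ValueError` branch as the only place where parse failures are turned into a response.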

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble query strings and JSON delight,
GET or POST, I fetch knowledge just right.
I tidy the numbers, I guard every key,
If formats go sideways, I warn gently — teehee! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change, adding guard-style tests for the retrieval endpoint's argument error behavior, making it specific and relevant to the changeset. |
| Description check | ✅ Passed | The description follows the template with all required sections complete: it explains the problem (missing test coverage for error handling), lists changes (new test cases), and specifies the change type (Tests). |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
api/apps/sdk/dify_retrieval.py (2)

224-230: Minor: Unused retrieval_setting variable from tuple unpacking.

The retrieval_setting variable returned by _parse_retrieval_options is unpacked but never used afterward. Consider using _ for the unused value.

♻️ Minor cleanup
     try:
-        retrieval_setting, similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {}))
+        _, similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {}))
     except ValueError as e:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/sdk/dify_retrieval.py` around lines 224 - 230, The tuple unpacking
assigns an unused retrieval_setting from _parse_retrieval_options; change the
unpack to use a throwaway name (e.g., `_`) instead of `retrieval_setting` so it
reads `_, similarity_threshold, top =
_parse_retrieval_options(req.get("retrieval_setting", {}))`; update the
unpacking at the try block that calls `_parse_retrieval_options` to avoid the
unused variable warning while keeping `similarity_threshold` and `top` intact.

40-78: Consider adding debug logging for POST requests as well.

The function logs GET request normalization details (lines 62-69), but POST requests pass through without equivalent logging. Per coding guidelines, new flows should include logging.

📝 Suggested enhancement for consistency
     return req
-    return await get_request_json()
+    req = await get_request_json()
+    logger.debug(
+        "Dify retrieval POST payload: knowledge_id=%s query_len=%s use_kg=%s",
+        req.get("knowledge_id"),
+        len(req.get("query", "")) if isinstance(req.get("query"), str) else 0,
+        req.get("use_kg"),
+    )
+    return req
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/apps/sdk/dify_retrieval.py` around lines 40 - 78, The POST path in
_read_retrieval_request lacks the same normalization debug log as the GET path;
update the POST flow to await get_request_json(), extract knowledge_id, query,
use_kg and retrieval_setting from the returned dict (using the same keys as the
GET branch), compute safe_query (e.g., "len=N" when query is a str), and emit
the same logger.debug call (with knowledge_id, safe_query, use_kg,
retrieval_setting.get("top_k"), retrieval_setting.get("score_threshold")) before
returning the req so POST requests are logged consistently.
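
One way to keep the GET and POST logs consistent is to build the log fields in a single helper that both branches call. The sketch below assumes that approach; `summarize_request` and `log_request` are hypothetical names, and the logger name is a placeholder. It records the query length rather than the query text, matching the `len=N` convention mentioned in the prompt above.

```python
import logging

logger = logging.getLogger("dify_retrieval_sketch")  # placeholder logger name


def summarize_request(req: dict) -> dict:
    """Collect the loggable fields of a request without exposing the query text."""
    query = req.get("query")
    setting = req.get("retrieval_setting")
    if not isinstance(setting, dict):
        setting = {}
    return {
        "knowledge_id": req.get("knowledge_id"),
        "safe_query": f"len={len(query)}" if isinstance(query, str) else "len=0",
        "use_kg": req.get("use_kg"),
        "top_k": setting.get("top_k"),
        "score_threshold": setting.get("score_threshold"),
    }


def log_request(method: str, req: dict) -> None:
    """Emit one debug line in the same format for either HTTP method."""
    s = summarize_request(req)
    logger.debug(
        "Dify retrieval %s request: knowledge_id=%s query=%s use_kg=%s "
        "top_k=%s score_threshold=%s",
        method,
        s["knowledge_id"],
        s["safe_query"],
        s["use_kg"],
        s["top_k"],
        s["score_threshold"],
    )
```

Because the summary tolerates a non-dict `retrieval_setting`, logging can run before validation without raising on malformed input.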
test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py (1)

394-417: Consider adding edge case tests for more comprehensive coverage.

The guard test covers the primary cases well. For stronger protection, consider adding:

  1. Partial missing fields (e.g., {"knowledge_id": "kb-1"} without query)
  2. Invalid retrieval_setting type (e.g., {"retrieval_setting": "not-a-dict"})
📝 Additional test cases
# Case 3: partial required fields
_set_request_json(monkeypatch, module, {"knowledge_id": "kb-1"})
res_partial = _run(inspect.unwrap(module.retrieval)("tenant-1"))
assert res_partial["code"] == module.RetCode.ARGUMENT_ERROR
assert "query" in res_partial["message"]

# Case 4: invalid retrieval_setting type
_set_request_json(monkeypatch, module, {
    "knowledge_id": "kb-1",
    "query": "hello",
    "retrieval_setting": "not-a-dict"
})
res_bad_type = _run(inspect.unwrap(module.retrieval)("tenant-1"))
assert res_bad_type["code"] == module.RetCode.ARGUMENT_ERROR
assert "retrieval_setting must be an object" in res_bad_type["message"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py`
around lines 394 - 417, Add two additional edge-case assertions to the retrieval
argument guard tests: call module.retrieval (via inspect.unwrap and _run) with a
partially missing required field (e.g., only {"knowledge_id": "kb-1"}) and
assert the response code equals module.RetCode.ARGUMENT_ERROR and that the
message mentions the missing "query"; and call module.retrieval with
retrieval_setting of the wrong type (e.g., a string) and assert
module.RetCode.ARGUMENT_ERROR and that the message indicates "retrieval_setting
must be an object" (use the same helpers _set_request_json and the existing
test_retrieval_argument_error_messages or a new test function to add these
cases).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3398f5df-9df3-4475-b1ff-6bdf44b7a441

📥 Commits

Reviewing files that changed from the base of the PR and between f906a20 and 4b49a2a.

📒 Files selected for processing (2)
  • api/apps/sdk/dify_retrieval.py
  • test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py

Support Dify retrieval normalization and argument validation consistency across GET/POST flows, and add regression tests for malformed, missing, and mistyped retrieval arguments.
@Achieve3318 Achieve3318 force-pushed the fix-dify-retrieval-arg-errors branch from 407eca7 to 1bccae4 April 16, 2026 13:56
@Achieve3318
Contributor Author

Hi @yingfeng, could you review my PR?

@Achieve3318
Contributor Author

Hi @yingfeng, could you review my PR, please?


@KevinHuSh KevinHuSh added the ci Continue Integration label May 11, 2026
@KevinHuSh KevinHuSh marked this pull request as draft May 11, 2026 04:02
@KevinHuSh KevinHuSh marked this pull request as ready for review May 11, 2026 04:02
@codecov

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.16%. Comparing base (827ccec) to head (cae7612).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14169   +/-   ##
=======================================
  Coverage   94.16%   94.16%           
=======================================
  Files          10       10           
  Lines         703      703           
  Branches      112      112           
=======================================
  Hits          662      662           
  Misses         25       25           
  Partials       16       16           


@KevinHuSh KevinHuSh merged commit 16354f4 into infiniflow:main May 11, 2026
3 checks passed

Labels

  • 🐖api: The modified files are located under directory 'api/apps/sdk'
  • ci: Continue Integration
  • size:L: This PR changes 100-499 lines, ignoring generated files.
  • 🧪 test: Pull requests that update test cases.


2 participants