fix(dify): guard retrieval argument error behavior#14169
fix(dify): guard retrieval argument error behavior#14169KevinHuSh merged 2 commits intoinfiniflow:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThe PR expands the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
api/apps/sdk/dify_retrieval.py (2)
224-230: Minor: Unusedretrieval_settingvariable from tuple unpacking.The
retrieval_settingvariable returned by_parse_retrieval_optionsis unpacked but never used afterward. Consider using_for the unused value.♻️ Minor cleanup
try: - retrieval_setting, similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {})) + _, similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {})) except ValueError as e:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/dify_retrieval.py` around lines 224 - 230, The tuple unpacking assigns an unused retrieval_setting from _parse_retrieval_options; change the unpack to use a throwaway name (e.g., `_`) instead of `retrieval_setting` so it reads `_ , similarity_threshold, top = _parse_retrieval_options(req.get("retrieval_setting", {}))`; update the unpacking at the try block that calls `_parse_retrieval_options` to avoid the unused variable warning while keeping `similarity_threshold` and `top` intact.
40-78: Consider adding debug logging for POST requests as well.The function logs GET request normalization details (lines 62-69), but POST requests pass through without equivalent logging. Per coding guidelines, new flows should include logging.
📝 Suggested enhancement for consistency
return req - return await get_request_json() + req = await get_request_json() + logger.debug( + "Dify retrieval POST payload: knowledge_id=%s query_len=%s use_kg=%s", + req.get("knowledge_id"), + len(req.get("query", "")) if isinstance(req.get("query"), str) else 0, + req.get("use_kg"), + ) + return req🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/dify_retrieval.py` around lines 40 - 78, The POST path in _read_retrieval_request lacks the same normalization debug log as the GET path; update the POST flow to await get_request_json(), extract knowledge_id, query, use_kg and retrieval_setting from the returned dict (using the same keys as the GET branch), compute safe_query (e.g., "len=N" when query is a str), and emit the same logger.debug call (with knowledge_id, safe_query, use_kg, retrieval_setting.get("top_k"), retrieval_setting.get("score_threshold")) before returning the req so POST requests are logged consistently.test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py (1)
394-417: Consider adding edge case tests for more comprehensive coverage.The guard test covers the primary cases well. For stronger protection, consider adding:
- Partial missing fields (e.g.,
{"knowledge_id": "kb-1"}withoutquery)- Invalid
retrieval_settingtype (e.g.,{"retrieval_setting": "not-a-dict"})📝 Additional test cases
# Case 3: partial required fields _set_request_json(monkeypatch, module, {"knowledge_id": "kb-1"}) res_partial = _run(inspect.unwrap(module.retrieval)("tenant-1")) assert res_partial["code"] == module.RetCode.ARGUMENT_ERROR assert "query" in res_partial["message"] # Case 4: invalid retrieval_setting type _set_request_json(monkeypatch, module, { "knowledge_id": "kb-1", "query": "hello", "retrieval_setting": "not-a-dict" }) res_bad_type = _run(inspect.unwrap(module.retrieval)("tenant-1")) assert res_bad_type["code"] == module.RetCode.ARGUMENT_ERROR assert "retrieval_setting must be an object" in res_bad_type["message"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py` around lines 394 - 417, Add two additional edge-case assertions to the retrieval argument guard tests: call module.retrieval (via inspect.unwrap and _run) with a partially missing required field (e.g., only {"knowledge_id": "kb-1"}) and assert the response code equals module.RetCode.ARGUMENT_ERROR and that the message mentions the missing "query"; and call module.retrieval with retrieval_setting of the wrong type (e.g., a string) and assert module.RetCode.ARGUMENT_ERROR and that the message indicates "retrieval_setting must be an object" (use the same helpers _set_request_json and the existing test_retrieval_argument_error_messages or a new test function to add these cases).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@api/apps/sdk/dify_retrieval.py`:
- Around line 224-230: The tuple unpacking assigns an unused retrieval_setting
from _parse_retrieval_options; change the unpack to use a throwaway name (e.g.,
`_`) instead of `retrieval_setting` so it reads `_ , similarity_threshold, top =
_parse_retrieval_options(req.get("retrieval_setting", {}))`; update the
unpacking at the try block that calls `_parse_retrieval_options` to avoid the
unused variable warning while keeping `similarity_threshold` and `top` intact.
- Around line 40-78: The POST path in _read_retrieval_request lacks the same
normalization debug log as the GET path; update the POST flow to await
get_request_json(), extract knowledge_id, query, use_kg and retrieval_setting
from the returned dict (using the same keys as the GET branch), compute
safe_query (e.g., "len=N" when query is a str), and emit the same logger.debug
call (with knowledge_id, safe_query, use_kg, retrieval_setting.get("top_k"),
retrieval_setting.get("score_threshold")) before returning the req so POST
requests are logged consistently.
In
`@test/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py`:
- Around line 394-417: Add two additional edge-case assertions to the retrieval
argument guard tests: call module.retrieval (via inspect.unwrap and _run) with a
partially missing required field (e.g., only {"knowledge_id": "kb-1"}) and
assert the response code equals module.RetCode.ARGUMENT_ERROR and that the
message mentions the missing "query"; and call module.retrieval with
retrieval_setting of the wrong type (e.g., a string) and assert
module.RetCode.ARGUMENT_ERROR and that the message indicates "retrieval_setting
must be an object" (use the same helpers _set_request_json and the existing
test_retrieval_argument_error_messages or a new test function to add these
cases).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3398f5df-9df3-4475-b1ff-6bdf44b7a441
📒 Files selected for processing (2)
api/apps/sdk/dify_retrieval.pytest/testcases/test_http_api/test_dataset_management/test_dify_retrieval_routes_unit.py
Support Dify retrieval normalization and argument validation consistency across GET/POST flows, and add regression tests for malformed, missing, and mistyped retrieval arguments.
407eca7 to
1bccae4
Compare
|
hi, @yingfeng , Could you review my PR? |
|
Hi, @yingfeng , Could you review my PR, please? |
|
hi, @yingfeng , Could you review my PR? |
|
Hi, @yingfeng , Could you review my PR, please? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #14169 +/- ##
=======================================
Coverage 94.16% 94.16%
=======================================
Files 10 10
Lines 703 703
Branches 112 112
=======================================
Hits 662 662
Misses 25 25
Partials 16 16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
What problem does this PR solve?
The Dify-compatible
/dify/retrievalendpoint recently gained stricter parsing and validation for its request payload, including:retrieval_setting.top_kandretrieval_setting.score_thresholdtypes.Previously, there was no unit test explicitly guarding the exact error code and message contract for these cases.
What does this PR change?
test_dify_retrieval_routes_unit.py:test_retrieval_argument_error_messages:retrieval_setting = {"top_k": "not-int", "score_threshold": "not-float"}code == RetCode.ARGUMENT_ERRORand message contains"invalid or malformed arguments:".{})code == RetCode.ARGUMENT_ERRORand message contains"required arguments are missing:".This test encodes the intended behavior of the Dify retrieval API so future refactors cannot silently regress error handling.
Type of change