Fix: Support extra_body.reference_metadata on agent completions (#14308) #14314
spider-yamet wants to merge 6 commits into
Conversation
Important: Review skipped. Draft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Organization UI | Review profile: CHILL | Plan: Pro.

Note: Reviews paused. This branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough: Adds validation for `extra_body.reference_metadata` on the agent completion endpoints and enriches agent references with document metadata.
Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant Server as SDK Session Handler
participant DocSvc as DocMetadataService
participant Stream as SSE/Response Stream
Client->>Server: POST /agents/{id}/completions (extra_body.reference_metadata)
Server->>Server: validate reference_metadata, derive session_id
alt include_reference_metadata == true
Server->>DocSvc: fetch document metadata for referenced doc_ids
DocSvc-->>Server: return document metadata
end
Server->>Stream: enrich SSE chunks or final JSON with document_metadata
Stream-->>Client: SSE chunks / JSON response with enriched references
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/apps/sdk/session.py (1)
541-575: ⚠️ Potential issue | 🔴 Critical: Serialize non-string streaming answers before yielding.

When `agent_completion(...)` yields a dict without `reference` and without `return_trace`, this path reaches `yield answer` with `answer` still being a dict, which can break the SSE response iterator. Convert non-string payloads to SSE immediately.

Proposed fix

     else:
         ans = answer
+    answer = _to_sse(ans)
     data = ans.get("data", {})

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/session.py` around lines 541 - 575, The stream may yield non-string dicts which break SSE; ensure any non-string "answer" is converted via _to_sse before yielding: inside the agent_completion loop in session.py, after computing ans/data and handling reference transformation, detect if answer is not a str (or ans is a dict) and set answer = _to_sse(ans) when appropriate (i.e. for events "message" and "message_end" or any branch that yields answer), and also apply this conversion in the node_finished/return_trace branch before yield; update the yield sites so they always emit a string SSE payload from _to_sse(ans) rather than a raw dict.
🧹 Nitpick comments (1)
api/apps/sdk/session.py (1)
1319-1363: Add low-cardinality logging for metadata enrichment. This new metadata lookup/enrichment flow can silently no-op when no document IDs, matching metadata, or requested fields are found. Add debug logs with counts/field names only; avoid logging metadata values. As per coding guidelines, `**/*.py`: Add logging for new flows.

Example logging placement

 if not doc_ids_by_kb:
+    logging.debug("No document ids found while enriching reference metadata")
     return {}
 meta_by_doc = {}
 @@
 filtered = {}
 for doc_id, meta in meta_by_doc.items():
     if metadata_fields is not None:
         meta = {k: v for k, v in meta.items() if k in metadata_fields}
     if meta:
         filtered[doc_id] = meta
+logging.debug(
+    "Reference metadata enrichment matched %d document(s)",
+    len(filtered),
+)
 return filtered

Also applies to: 1366-1429
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/session.py` around lines 1319 - 1363, Add low-cardinality debug logging to the metadata enrichment flow by instrumenting the caller block that calls _get_reference_metadata_by_doc (the code that assigns meta_by_doc and iterates over chunks) and inside _get_reference_metadata_by_doc itself: log counts (number of chunks processed, number of unique dataset_ids/kb_ids, number of document_ids collected, number of documents returned in meta_by_doc, and number of documents after filtering) and the set/list of requested metadata_fields (only names) where applicable, but never log actual metadata values; use the module logger (e.g., logger.debug) or getLogger(__name__) and place logs at early exits (no doc ids, no meta map, no metadata_fields) and before returning so you can observe sizes for both the initial caller block and the related block referenced in the review (lines 1366-1429).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/sdk/session.py`:
- Around line 443-452: The current checks using "or {}" plus "if extra_body and
not isinstance(...)" allow falsy non-object values (e.g., [], false) to be
treated as absent; update the logic in the session handling code (variables
extra_body, reference_metadata, include_reference_metadata, metadata_fields) so
you only default to {} when the incoming value is None and you explicitly reject
non-dict/non-list types otherwise: replace "extra_body = req.get('extra_body')
or {}" / "if extra_body and not isinstance(extra_body, dict)" with an explicit
None check (e.g., extra_body = req.get('extra_body'); if extra_body is None:
extra_body = {} ; elif not isinstance(extra_body, dict): return
get_error_data_result(...)), and do the same for reference_metadata and
reference_metadata.fields (use metadata_fields is None before treating as absent
and then type-check for list), and apply identical fixes in
agents_completion_openai_compatibility and agent_completions where the same
pattern appears.
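For illustration, the explicit-None pattern described above can be sketched as follows. The `parse_extra_body` name and the plain-string error are hypothetical; the real handler would return `get_error_data_result(...)` instead.

```python
def parse_extra_body(req):
    """Illustrative: default only when extra_body is truly absent (None)."""
    extra_body = req.get("extra_body")
    if extra_body is None:
        extra_body = {}  # absent: safe to default
    elif not isinstance(extra_body, dict):
        # present but the wrong type: reject instead of silently dropping it
        return None, "extra_body must be an object."
    return extra_body, None
```

With `req.get("extra_body") or {}`, a falsy non-dict like `[]` or `false` would silently become `{}` and pass validation; the explicit check rejects it.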
---
Outside diff comments:
In `@api/apps/sdk/session.py`:
- Around line 541-575: The stream may yield non-string dicts which break SSE;
ensure any non-string "answer" is converted via _to_sse before yielding: inside
the agent_completion loop in session.py, after computing ans/data and handling
reference transformation, detect if answer is not a str (or ans is a dict) and
set answer = _to_sse(ans) when appropriate (i.e. for events "message" and
"message_end" or any branch that yields answer), and also apply this conversion
in the node_finished/return_trace branch before yield; update the yield sites so
they always emit a string SSE payload from _to_sse(ans) rather than a raw dict.
---
Nitpick comments:
In `@api/apps/sdk/session.py`:
- Around line 1319-1363: Add low-cardinality debug logging to the metadata
enrichment flow by instrumenting the caller block that calls
_get_reference_metadata_by_doc (the code that assigns meta_by_doc and iterates
over chunks) and inside _get_reference_metadata_by_doc itself: log counts
(number of chunks processed, number of unique dataset_ids/kb_ids, number of
document_ids collected, number of documents returned in meta_by_doc, and number
of documents after filtering) and the set/list of requested metadata_fields
(only names) where applicable, but never log actual metadata values; use the
module logger (e.g., logger.debug) or getLogger(__name__) and place logs at
early exits (no doc ids, no meta map, no metadata_fields) and before returning
so you can observe sizes for both the initial caller block and the related block
referenced in the review (lines 1366-1429).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8968fd69-fc60-42d4-b7de-50d23f665ba9
📒 Files selected for processing (1)
api/apps/sdk/session.py
@6ba3i I would appreciate your feedback :)
6ba3i left a comment
Thanks a lot for the contribution and for picking this up.
I agree with the direction here. Supporting extra_body.reference_metadata on the agent completion endpoints is the right behavior, and matching the chat-side capability makes sense.
I tested a smaller version of the fix locally and it works, so I think this can be tightened a bit while keeping the same functionality. My main suggestion is to keep the change as targeted as possible:
- parse `extra_body.reference_metadata` through one small shared helper instead of duplicating the validation block
- enrich agent references only where needed
- avoid the broader helper/refactor layer when a smaller agent-specific patch is enough
- keep the number of changed lines as low as possible
This is the smaller patch I am suggesting instead:
diff --git a/api/apps/sdk/session.py b/api/apps/sdk/session.py
index 82e048ff1..c9177a658 100644
--- a/api/apps/sdk/session.py
+++ b/api/apps/sdk/session.py
@@ -440,6 +440,9 @@ async def chat_completion_openai_like(tenant_id, chat_id):
@token_required
async def agents_completion_openai_compatibility(tenant_id, agent_id):
req = await get_request_json()
+ err, include_reference_metadata, metadata_fields = _parse_reference_metadata(req)
+ if err:
+ return err
messages = req.get("messages", [])
if not messages:
return get_error_data_result("You must provide at least one message.")
@@ -461,18 +464,27 @@ async def agents_completion_openai_compatibility(tenant_id, agent_id):
)
question = next((m["content"] for m in reversed(messages) if m["role"] == "user"), "")
+ session_id = req.pop("session_id", req.get("id", "")) or req.get("metadata", {}).get("id", "")
stream = req.pop("stream", False)
if stream:
+ body = completion_openai(
+ tenant_id,
+ agent_id,
+ question,
+ session_id=session_id,
+ stream=True,
+ **req,
+ )
+ if include_reference_metadata:
+ async def generate():
+ async for answer in body:
+ yield _build_agent_openai_response(answer, metadata_fields=metadata_fields)
+
+ body = generate()
+
resp = Response(
- completion_openai(
- tenant_id,
- agent_id,
- question,
- session_id=req.pop("session_id", req.get("id", "")) or req.get("metadata", {}).get("id", ""),
- stream=True,
- **req,
- ),
+ body,
mimetype="text/event-stream",
)
resp.headers.add_header("Cache-control", "no-cache")
@@ -486,10 +498,12 @@ async def agents_completion_openai_compatibility(tenant_id, agent_id):
tenant_id,
agent_id,
question,
- session_id=req.pop("session_id", req.get("id", "")) or req.get("metadata", {}).get("id", ""),
+ session_id=session_id,
stream=False,
**req,
):
+ if include_reference_metadata:
+ response = _build_agent_openai_response(response, metadata_fields=metadata_fields)
return jsonify(response)
return None
@@ -499,6 +513,9 @@ async def agents_completion_openai_compatibility(tenant_id, agent_id):
@token_required
async def agent_completions(tenant_id, agent_id):
req = await get_request_json()
+ err, include_reference_metadata, metadata_fields = _parse_reference_metadata(req)
+ if err:
+ return err
return_trace = bool(req.get("return_trace", False))
if req.get("stream", True):
@@ -512,10 +529,14 @@ async def agent_completions(tenant_id, agent_id):
except Exception:
continue
+ data = ans.get("data", {})
+ if include_reference_metadata and data.get("reference") is not None:
+ data["reference"] = _build_agent_reference(data["reference"], metadata_fields=metadata_fields)
+ answer = "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"
+
event = ans.get("event")
if event == "node_finished":
if return_trace:
- data = ans.get("data", {})
trace_items.append(
{
"component_id": data.get("component_id"),
@@ -553,7 +574,10 @@ async def agent_completions(tenant_id, agent_id):
full_content += ans["data"]["content"]
if ans.get("data", {}).get("reference", None):
- reference.update(ans["data"]["reference"])
+ ref = ans["data"]["reference"]
+ if include_reference_metadata:
+ ref = _build_agent_reference(ref, metadata_fields=metadata_fields)
+ reference.update(ref)
if ans.get("event") == "node_finished":
data = ans.get("data", {})
@@ -1302,3 +1326,92 @@ def _build_reference_chunks(reference, include_metadata=False, metadata_fields=N
chunk["document_metadata"] = meta
return chunks
+
+
+def _parse_reference_metadata(req):
+ extra_body = req.get("extra_body") or {}
+ if extra_body and not isinstance(extra_body, dict):
+ return get_error_data_result("extra_body must be an object."), False, None
+ reference_metadata = extra_body.get("reference_metadata") or {}
+ if reference_metadata and not isinstance(reference_metadata, dict):
+ return get_error_data_result("reference_metadata must be an object."), False, None
+ metadata_fields = reference_metadata.get("fields")
+ if metadata_fields is not None and not isinstance(metadata_fields, list):
+ return get_error_data_result("reference_metadata.fields must be an array."), False, None
+ return None, bool(reference_metadata.get("include", False)), metadata_fields
+
+
+def _build_agent_reference(reference, metadata_fields=None):
+ if not isinstance(reference, dict):
+ return reference
+ raw_chunks = reference.get("chunks")
+ if isinstance(raw_chunks, dict):
+ chunks = [chunk for chunk in raw_chunks.values() if isinstance(chunk, dict)]
+ elif isinstance(raw_chunks, list):
+ chunks = [chunk for chunk in raw_chunks if isinstance(chunk, dict)]
+ else:
+ return reference
+
+ if metadata_fields is not None:
+ metadata_fields = {f for f in metadata_fields if isinstance(f, str)}
+ if not metadata_fields:
+ return reference
+
+ doc_ids_by_kb = {}
+ for chunk in chunks:
+ kb_id = chunk.get("kb_id", chunk.get("dataset_id"))
+ doc_id = chunk.get("doc_id", chunk.get("document_id"))
+ if not kb_id or not doc_id:
+ continue
+ doc_ids_by_kb.setdefault(kb_id, set()).add(doc_id)
+
+ if not doc_ids_by_kb:
+ return reference
+
+ meta_by_doc = {}
+ for kb_id, doc_ids in doc_ids_by_kb.items():
+ meta_map = DocMetadataService.get_metadata_for_documents(list(doc_ids), kb_id)
+ if meta_map:
+ meta_by_doc.update(meta_map)
+
+ if not meta_by_doc:
+ return reference
+
+ for chunk in chunks:
+ doc_id = chunk.get("doc_id", chunk.get("document_id"))
+ meta = meta_by_doc.get(doc_id)
+ if not meta:
+ continue
+ if metadata_fields is not None:
+ meta = {k: v for k, v in meta.items() if k in metadata_fields}
+ if meta:
+ chunk["document_metadata"] = meta
+ return reference
+
+
+def _build_agent_openai_response(answer, metadata_fields=None):
+ if isinstance(answer, str):
+ if answer.strip() in {"data:[DONE]", "data: [DONE]"}:
+ return answer
+ if not answer.startswith("data:"):
+ return answer
+ try:
+ payload = json.loads(answer[5:].strip())
+ except Exception:
+ return answer
+ payload = _build_agent_openai_response(payload, metadata_fields=metadata_fields)
+ return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"
+
+ if not isinstance(answer, dict):
+ return answer
+
+ for choice in answer.get("choices", []):
+ if not isinstance(choice, dict):
+ continue
+ delta = choice.get("delta")
+ if isinstance(delta, dict) and delta.get("reference") is not None:
+ delta["reference"] = _build_agent_reference(delta["reference"], metadata_fields=metadata_fields)
+ message = choice.get("message")
+ if isinstance(message, dict) and message.get("reference") is not None:
+ message["reference"] = _build_agent_reference(message["reference"], metadata_fields=metadata_fields)
+ return answer

I think this keeps the feature intact while making the change easier to review and more tightly scoped to the actual missing behavior.
@token_required
async def agents_completion_openai_compatibility(tenant_id, agent_id):
    req = await get_request_json()
    extra_body = req.get("extra_body", {})
This parsing block is duplicated again below in agent_completions(...), and I think it can be reduced to a tiny shared helper.
Suggested change here:
err, include_reference_metadata, metadata_fields = _parse_reference_metadata(req)
if err:
return err
Then add this helper near the bottom of the file:
def _parse_reference_metadata(req):
extra_body = req.get("extra_body") or {}
if extra_body and not isinstance(extra_body, dict):
return get_error_data_result("extra_body must be an object."), False, None
reference_metadata = extra_body.get("reference_metadata") or {}
if reference_metadata and not isinstance(reference_metadata, dict):
return get_error_data_result("reference_metadata must be an object."), False, None
metadata_fields = reference_metadata.get("fields")
if metadata_fields is not None and not isinstance(metadata_fields, list):
return get_error_data_result("reference_metadata.fields must be an array."), False, None
return None, bool(reference_metadata.get("include", False)), metadata_fields
    return chunks

meta_by_doc = _get_reference_metadata_by_doc(chunks, metadata_fields)
if not meta_by_doc:
I think this helper section can be made quite a bit smaller.
For this issue, we do not really need to refactor _build_reference_chunks() or add the extra helper stack around _get_reference_metadata_by_doc(...) and _to_sse(). The feature can stay isolated to the agent path with a smaller helper set.
I would suggest:
- keep `_build_reference_chunks()` unchanged
- add a tiny `_parse_reference_metadata(req)`
- add an agent-only `_build_agent_reference(...)`
- add a small `_build_agent_openai_response(...)`
- skip `_get_reference_metadata_by_doc(...)`
- skip `_to_sse()`
Suggested helper set:
def _parse_reference_metadata(req):
extra_body = req.get("extra_body") or {}
if extra_body and not isinstance(extra_body, dict):
return get_error_data_result("extra_body must be an object."), False, None
reference_metadata = extra_body.get("reference_metadata") or {}
if reference_metadata and not isinstance(reference_metadata, dict):
return get_error_data_result("reference_metadata must be an object."), False, None
metadata_fields = reference_metadata.get("fields")
if metadata_fields is not None and not isinstance(metadata_fields, list):
return get_error_data_result("reference_metadata.fields must be an array."), False, None
return None, bool(reference_metadata.get("include", False)), metadata_fields
def _build_agent_reference(reference, metadata_fields=None):
if not isinstance(reference, dict):
return reference
raw_chunks = reference.get("chunks")
if isinstance(raw_chunks, dict):
chunks = [chunk for chunk in raw_chunks.values() if isinstance(chunk, dict)]
elif isinstance(raw_chunks, list):
chunks = [chunk for chunk in raw_chunks if isinstance(chunk, dict)]
else:
return reference
if metadata_fields is not None:
metadata_fields = {f for f in metadata_fields if isinstance(f, str)}
if not metadata_fields:
return reference
doc_ids_by_kb = {}
for chunk in chunks:
kb_id = chunk.get("kb_id", chunk.get("dataset_id"))
doc_id = chunk.get("doc_id", chunk.get("document_id"))
if not kb_id or not doc_id:
continue
doc_ids_by_kb.setdefault(kb_id, set()).add(doc_id)
if not doc_ids_by_kb:
return reference
meta_by_doc = {}
for kb_id, doc_ids in doc_ids_by_kb.items():
meta_map = DocMetadataService.get_metadata_for_documents(list(doc_ids), kb_id)
if meta_map:
meta_by_doc.update(meta_map)
if not meta_by_doc:
return reference
for chunk in chunks:
doc_id = chunk.get("doc_id", chunk.get("document_id"))
meta = meta_by_doc.get(doc_id)
if not meta:
continue
if metadata_fields is not None:
meta = {k: v for k, v in meta.items() if k in metadata_fields}
if meta:
chunk["document_metadata"] = meta
return reference
def _build_agent_openai_response(answer, metadata_fields=None):
if isinstance(answer, str):
if answer.strip() in {"data:[DONE]", "data: [DONE]"}:
return answer
if not answer.startswith("data:"):
return answer
try:
payload = json.loads(answer[5:].strip())
except Exception:
return answer
payload = _build_agent_openai_response(payload, metadata_fields=metadata_fields)
return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"
if not isinstance(answer, dict):
return answer
for choice in answer.get("choices", []):
if not isinstance(choice, dict):
continue
delta = choice.get("delta")
if isinstance(delta, dict) and delta.get("reference") is not None:
delta["reference"] = _build_agent_reference(delta["reference"], metadata_fields=metadata_fields)
message = choice.get("message")
if isinstance(message, dict) and message.get("reference") is not None:
message["reference"] = _build_agent_reference(message["reference"], metadata_fields=metadata_fields)
return answer
I think that keeps the requested feature fully intact, but with a smaller and more targeted patch.
    ans = json.loads(answer[5:])  # remove "data:"
except Exception:
    continue
else:
I think this can stay local to the agent path without needing the extra _to_sse() helper.
Suggested change here:
data = ans.get("data", {})
if include_reference_metadata and data.get("reference") is not None:
data["reference"] = _build_agent_reference(data["reference"], metadata_fields=metadata_fields)
answer = "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"
It avoids adding another formatting helper just for this.
@token_required
async def agent_completions(tenant_id, agent_id):
    req = await get_request_json()
    extra_body = req.get("extra_body", {})
this is the same extra_body.reference_metadata parsing logic again, so I think it should be replaced with the same small helper instead of duplicating the whole block.
Suggested replacement:
err, include_reference_metadata, metadata_fields = _parse_reference_metadata(req)
if err:
return err
That keeps the validation consistent across both agent endpoints and reduces the amount of code changed for this feature.
@6ba3i Thanks for the review. I updated the patch to follow your suggestion more closely.

I would appreciate your feedback. Thanks
Actionable comments posted: 2
🧹 Nitpick comments (1)
api/apps/sdk/session.py (1)
1356-1402: Add debug logging for the new metadata enrichment flow. This helper introduces the new agent reference metadata path, but there is no local signal for whether enrichment was skipped, found no metadata, or attached metadata to chunks.

Proposed logging addition

 if not doc_ids_by_kb:
+    logging.debug("[agent_reference_metadata] skipped enrichment: no document ids in reference chunks")
     return reference
 meta_by_doc = {}
 for kb_id, doc_ids in doc_ids_by_kb.items():
     meta_map = DocMetadataService.get_metadata_for_documents(list(doc_ids), kb_id)
     if meta_map:
         meta_by_doc.update(meta_map)
 if not meta_by_doc:
+    logging.debug(
+        "[agent_reference_metadata] no metadata found for %s reference documents across %s knowledge bases",
+        sum(len(doc_ids) for doc_ids in doc_ids_by_kb.values()),
+        len(doc_ids_by_kb),
+    )
     return reference
+attached_count = 0
 for chunk in chunks:
     doc_id = chunk.get("doc_id", chunk.get("document_id"))
     meta = meta_by_doc.get(doc_id)
     if not meta:
         continue
 @@
     if metadata_fields is not None:
         meta = {k: v for k, v in meta.items() if k in metadata_fields}
     if meta:
         chunk["document_metadata"] = meta
+        attached_count += 1
+logging.debug(
+    "[agent_reference_metadata] attached metadata to %s/%s reference chunks",
+    attached_count,
+    len(chunks),
+)
 return reference

As per coding guidelines, `**/*.py`: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/session.py` around lines 1356 - 1402, In _build_agent_reference, add debug logging (using the module logger) to signal each enrichment outcome: log when the input is not a dict or chunks are missing/invalid and enrichment is skipped, log when metadata_fields filters out everything, log the doc_ids_by_kb contents before calling DocMetadataService.get_metadata_for_documents and log when meta_map is empty for a kb_id, and finally log how many chunks were enriched (and which doc_ids or kb_ids) after populating document_metadata; reference the function name _build_agent_reference, the variables metadata_fields, chunks, doc_ids_by_kb, meta_map/meta_by_doc and the call to DocMetadataService.get_metadata_for_documents to locate where to add these debug messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/sdk/session.py`:
- Around line 531-540: The streaming branch assigns ans = answer which can be a
dict and later yields answer directly to the SSE stream; ensure any non-string
ans/answer is JSON-serialized before it's used for SSE output. Update the logic
around ans, answer, data, include_reference_metadata and the place that
constructs the SSE payload (the code that currently does answer =
"data:"+json.dumps(ans,...)+ "\n\n" and the subsequent yield/send) so that if
ans is not a string you call json.dumps(ans, ensure_ascii=False) (and still
apply _build_agent_reference when include_reference_metadata is true) and only
yield the serialized string to the SSE send function.
- Line 467: The session_id extraction line can raise AttributeError when
req["metadata"] is null or not a dict; update the logic around the session_id
assignment (the session_id = ... statement in api/apps/sdk/session.py) to first
guard that metadata is a mapping (e.g., check isinstance(req.get("metadata"),
dict) or use a safe default {}) before calling .get("id"), then fall back to
req.pop("session_id") or req.get("id") as currently intended so
metadata.get("id") is only called on a dict.
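A minimal sketch of the metadata guard described above; the `derive_session_id` helper name is hypothetical, since the real code inlines this logic at the `session_id = ...` assignment.

```python
def derive_session_id(req):
    """Illustrative: guard against metadata being null or a non-dict."""
    metadata = req.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}  # only call .get on a real mapping
    return req.pop("session_id", req.get("id", "")) or metadata.get("id", "")
```

Without the `isinstance` check, a payload like `{"metadata": null}` would raise `AttributeError` on `metadata.get("id")`.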
---
Nitpick comments:
In `@api/apps/sdk/session.py`:
- Around line 1356-1402: In _build_agent_reference, add debug logging (using the
module logger) to signal each enrichment outcome: log when the input is not a
dict or chunks are missing/invalid and enrichment is skipped, log when
metadata_fields filters out everything, log the doc_ids_by_kb contents before
calling DocMetadataService.get_metadata_for_documents and log when meta_map is
empty for a kb_id, and finally log how many chunks were enriched (and which
doc_ids or kb_ids) after populating document_metadata; reference the function
name _build_agent_reference, the variables metadata_fields, chunks,
doc_ids_by_kb, meta_map/meta_by_doc and the call to
DocMetadataService.get_metadata_for_documents to locate where to add these debug
messages.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 70727b1b-4771-42c9-b61a-c7a110f0eeab
📒 Files selected for processing (1)
api/apps/sdk/session.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
api/apps/sdk/session.py (1)
1418-1421: Log skipped SSE metadata enrichment. Malformed `data:` payloads currently fall back silently, making the new metadata flow hard to diagnose when enrichment is requested.

Proposed fix

 try:
     payload = json.loads(answer[5:].strip())
-except Exception:
+except Exception as e:
+    logging.debug("Skipping agent reference metadata enrichment for non-JSON SSE chunk: %s", e)
     return answer

As per coding guidelines, `**/*.py`: Add logging for new flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/session.py` around lines 1418 - 1421, The silent fallback on malformed SSE JSON should log the failure: around the payload = json.loads(answer[5:].strip()) line, catch the exception as e (except Exception as e) and emit a log entry including the exception and the raw answer (and any context like session id if available) before returning answer; use the module's existing logger (or create one via logging.getLogger(__name__) if none exists) and log at warning/debug level so malformed `data:` payloads can be diagnosed while keeping the current return behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/sdk/session.py`:
- Around line 1355-1359: The current code coerces
reference_metadata.get("include") with bool(), which treats non-empty strings
like "false" as True; update the validation so that if reference_metadata
contains an "include" key it must be a boolean (True/False) otherwise return
get_error_data_result("reference_metadata.include must be a boolean."), False,
None; keep the existing return of metadata_fields and use
reference_metadata.get("include", False) only after confirming its type. Locate
the variables reference_metadata, metadata_fields and the return site in this
block to implement the check.
---
Nitpick comments:
In `@api/apps/sdk/session.py`:
- Around line 1418-1421: The silent fallback on malformed SSE JSON should log
the failure: around the payload = json.loads(answer[5:].strip()) line, catch the
exception as e (except Exception as e) and emit a log entry including the
exception and the raw answer (and any context like session id if available)
before returning answer; use the module's existing logger (or create one via
logging.getLogger(__name__) if none exists) and log at warning/debug level so
malformed `data:` payloads can be diagnosed while keeping the current return
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 306b6f1e-19b3-4f35-9817-e31e343b1acb
📒 Files selected for processing (1)
api/apps/sdk/session.py
Actionable comments posted: 2
🧹 Nitpick comments (1)
api/apps/sdk/session.py (1)
1342-1441: Add logging for the new enrichment flows. Per coding guidelines, new flows should include logging. The three new helpers (`_parse_reference_metadata`, `_build_agent_reference`, `_build_agent_openai_response`) are silent: validation failures return API errors without a log entry, and enrichment success/no-op paths have no debug trace. This makes it hard to diagnose cases where a client enables `reference_metadata.include` but receives chunks without `document_metadata` (e.g., empty `doc_ids_by_kb`, empty `meta_by_doc`, or a JSON parse failure on an SSE frame).

Suggested minimum:
- `logging.warning` (or `logging.info`) in `_parse_reference_metadata` when a validation error is returned, including which field failed.
- `logging.debug` in `_build_agent_reference` when no enrichment happens (unknown `chunks` shape, no kb/doc ids, no metadata found) and when chunks are enriched (count).
- `logging.debug` (or `logging.exception`) in `_build_agent_openai_response` on the silent `except Exception` at line 1424; a malformed SSE payload is currently swallowed.

As per coding guidelines: "Add logging for new flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/apps/sdk/session.py` around lines 1342 - 1441, Add logging to the new enrichment helpers: in _parse_reference_metadata, log a warning (logging.warning) when returning validation errors and include which field failed (extra_body, reference_metadata, reference_metadata.fields, reference_metadata.include); in _build_agent_reference, add debug logs for the early no-op returns (unknown chunks shape, empty doc_ids_by_kb, empty meta_by_doc) and log a debug/info summary when enrichment occurs (number of chunks enriched and/or docs found) and reference the variables chunks, doc_ids_by_kb, meta_by_doc; in _build_agent_openai_response, log the JSON parse failure in the except block (logging.debug or logging.exception) including the raw SSE payload string and ensure the rest of the behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/apps/sdk/session.py`:
- Around line 1383-1412: The current extraction of kb_id and doc_id uses
chunk.get("kb_id", chunk.get("dataset_id")) and chunk.get("doc_id",
chunk.get("document_id")), which does not fall back when the first key exists
but is None; change these to use explicit None-safe fallbacks (e.g. kb_id =
chunk.get("kb_id") or chunk.get("dataset_id") and doc_id = chunk.get("doc_id")
or chunk.get("document_id")) in both places where kb_id/doc_id are computed (the
initial doc_ids_by_kb collection loop and the later per-chunk metadata
assignment before looking up meta_by_doc), so valid dataset_id/document_id
values are used when the primary key is present but None and ensure
DocMetadataService.get_metadata_for_documents receives the correct doc_id list.
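The pitfall this comment describes can be shown in isolation: `dict.get(key, default)` only falls back to the default when the key is absent, not when the key is present with a `None` value. A self-contained demonstration (the chunk values are made up):

```python
# A chunk where the primary keys exist but hold None, as the review describes.
chunk = {"kb_id": None, "dataset_id": "ds-1",
         "doc_id": None, "document_id": "doc-1"}

# Current pattern: the default is ignored because the key "kb_id" is present.
kb_id_current = chunk.get("kb_id", chunk.get("dataset_id"))

# Suggested None-safe pattern: `or` falls through on None.
# (Note it also falls through on "" and 0, which is fine for id strings.)
kb_id = chunk.get("kb_id") or chunk.get("dataset_id")
doc_id = chunk.get("doc_id") or chunk.get("document_id")
```

Here `kb_id_current` ends up `None` while the `or` form recovers `"ds-1"` and `"doc-1"`, so the downstream metadata lookup receives usable ids.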
- Around line 536-546: The guard around ans.get("data", {}) is unsafe because
ans["data"] can be None; update the handling in the block that builds reference
metadata so that you coerce data to a dict (e.g., data = ans.get("data") or {})
before accessing data.get("reference"), then only call _build_agent_reference
and rewrite answer when data is a dict and contains a non-None "reference"
(respecting include_reference_metadata). Apply the same fix to the other
occurrences that use ans.get("data", {}) (the similar blocks around the ans
handling later in this function) to prevent AttributeError when upstream emits
data: null.
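A minimal reproduction of the hazard and the suggested coercion; `ans` below mimics an upstream frame that emits `data: null`:

```python
import json

# Upstream SSE frame where "data" is present but null.
ans = json.loads('{"event": "message", "data": null}')

# Unsafe: ans.get("data", {}) returns None because the key exists,
# so a later data.get("reference") would raise AttributeError.
unsafe = ans.get("data", {})

# Suggested guard: coerce None to an empty dict before any attribute access.
data = ans.get("data") or {}
reference = data.get("reference")  # safely None instead of raising
```

The same one-line coercion applies at every `ans.get("data", {})` site the comment flags.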
---
Nitpick comments:
In `@api/apps/sdk/session.py`:
- Around line 1342-1441: Add logging to the new enrichment helpers: in
_parse_reference_metadata, log a warning (logging.warning) when returning
validation errors and include which field failed (extra_body,
reference_metadata, reference_metadata.fields, reference_metadata.include); in
_build_agent_reference, add debug logs for the early no-op returns (unknown
chunks shape, empty doc_ids_by_kb, empty meta_by_doc) and log a debug/info
summary when enrichment occurs (number of chunks enriched and/or docs found) and
reference the variables chunks, doc_ids_by_kb, meta_by_doc; in
_build_agent_openai_response, log the JSON parse failure in the except block
(logging.debug or logging.exception) including the raw SSE payload string and
ensure the rest of the behavior is unchanged.
ℹ️ Review info
Run ID: 4243bf3b-c10a-4a1f-a280-014e8e56e656
📒 Files selected for processing (1)
api/apps/sdk/session.py
@6ba3i May I know your opinion? :)

We’re currently rebuilding our API system. Please feel free to open a new pull request once that work is complete. See #14157.

Since #14157 has been merged, this feature should be able to be merged/finalized as well :)
Force-pushed from efddb76 to 56ecf78
What problem does this PR solve?
This PR adds support for `extra_body.reference_metadata` on agent completion endpoints so agent responses can return document metadata alongside reference chunks, matching the existing chat completion behavior.

Fixes: #14308
Before this change:
- `/api/v1/chats/{chat_id}/completions` supported `extra_body.reference_metadata`
- `/api/v1/agents/{agent_id}/completions` did not parse or apply it
- Agent references could not include fields such as `author` or `created_time`

This change keeps the existing agent reference structure intact and only enriches reference chunks with `document_metadata` when:
- `extra_body.reference_metadata.include` is `true`
- `fields` are provided

It also adds request validation for malformed `extra_body.reference_metadata` payloads on the agent completion paths.

Type of change
Implementation details
- Added `extra_body.reference_metadata` parsing and validation to:
  - `/api/v1/agents/{agent_id}/completions`
  - `/api/v1/agents_openai/{agent_id}/chat/completions`
- Enriched reference chunks with `document_metadata` without changing the existing response schema

API compatibility
This is a non-breaking API consistency fix:
- Malformed `reference_metadata` payloads now return clear validation errors, consistent with chat completions

Tests
Validated with focused isolated tests covering:
- valid and malformed `extra_body` / `reference_metadata` payloads
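The two enrichment conditions described above can be illustrated with a hypothetical request body. Only the `extra_body.reference_metadata` shape comes from this PR; the endpoint path is quoted from it, and the remaining fields (`question`, `stream`) are assumed placeholders:

```python
# Hypothetical payload for POST /api/v1/agents/{agent_id}/completions.
payload = {
    "question": "Who wrote the onboarding guide?",  # assumed field name
    "stream": False,
    "extra_body": {
        "reference_metadata": {
            "include": True,                        # condition 1: include is true
            "fields": ["author", "created_time"],   # condition 2: fields provided
        }
    },
}

ref = payload["extra_body"]["reference_metadata"]
# Enrichment only runs when both conditions hold.
should_enrich = bool(ref.get("include")) and bool(ref.get("fields"))
```

When either condition fails, the agent response keeps its existing reference structure with no `document_metadata` added.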