Merged
6 changes: 5 additions & 1 deletion dingo/model/llm/llm_document_parsing_ocr.py
@@ -103,12 +103,16 @@ def process_response(cls, response: str) -> EvalDetail:
         json_match = re.search(r'\{[\s\S]*"errors"[\s\S]*\}', response)
         types = []
         names = []
+        parse_ok = False
+        errors_nonempty = False

         if json_match:
             try:
                 json_str = json_match.group()
                 result_data = json.loads(json_str)
                 errors = result_data.get("errors", [])
+                parse_ok = True
+                errors_nonempty = len(errors) > 0
Comment on lines 113 to +115
Contributor

high

The current implementation is not robust against unexpected JSON structures from the LLM. If the errors field is null or a number, len(errors) raises a TypeError. If it is a string or an object, len(errors) succeeds, but the subsequent loop fails: iterating yields strings, and error.get(...) raises an AttributeError. It's safer to verify that errors is a list before proceeding.

Suggested change
-                errors = result_data.get("errors", [])
-                parse_ok = True
-                errors_nonempty = len(errors) > 0
+                errors = result_data.get("errors", [])
+                if isinstance(errors, list):
+                    parse_ok = True
+                    errors_nonempty = bool(errors)
+                else:
+                    errors = []
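
The guarded parsing suggested above can be tried in isolation. The following is a minimal sketch, not code from the PR: `extract_errors` is a hypothetical helper name, and catching `json.JSONDecodeError` is an assumption, since the `except` clause of the original method is not visible in this diff. Only the regex and the `isinstance` check come from the diff and the suggestion.

```python
import json
import re


def extract_errors(response: str):
    """Return (parse_ok, errors) for a raw LLM response.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    # Same pattern as in the diff: grab a JSON object mentioning "errors".
    match = re.search(r'\{[\s\S]*"errors"[\s\S]*\}', response)
    if not match:
        return False, []
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:  # assumed handler, not shown in the diff
        return False, []
    errors = data.get("errors", [])
    # Treat anything other than a list as a parse failure instead of
    # letting len() or the later loop raise.
    if not isinstance(errors, list):
        return False, []
    return True, errors


if __name__ == "__main__":
    for raw in ('{"errors": []}', '{"errors": null}', '{"errors": "none"}'):
        print(raw, "->", extract_errors(raw))
```

With this shape, a `null` or string-valued `errors` field yields `parse_ok = False` and an empty list, so the later loop over `errors` never sees a non-dict item caused by a wrong container type.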


                 for error in errors:
                     error_category = error.get("error_category", "")
@@ -123,7 +127,7 @@ def process_response(cls, response: str) -> EvalDetail:
             log.error("未找到JSON内容")  # "JSON content not found"

         result = EvalDetail(metric=cls.__name__)
-        result.status = False
+        result.status = (not parse_ok) or errors_nonempty
Contributor

medium

The updated logic for result.status correctly identifies issues, but the subsequent code for constructing result.label has two problems:

  1. If no errors are found (status is False), the label becomes ["."] because types and names are empty. It should ideally be ["QUALITY_GOOD"].
  2. If parsing fails (parse_ok is False), the label also becomes ["."], which is not descriptive.

Returning early for the success case and setting a specific label for parsing errors would improve the output quality.

Suggested change
-        result.status = (not parse_ok) or errors_nonempty
+        result.status = (not parse_ok) or errors_nonempty
+        if not result.status:
+            result.label = ["QUALITY_GOOD"]
+            result.reason = [json_str] if 'json_str' in locals() else [response]
+            return result
+        if not parse_ok:
+            types, names = ["QUALITY_BAD"], ["ParseError"]
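
To make the three outcomes of this suggestion concrete, here is a small standalone sketch. `build_status_and_label` is a hypothetical function, not part of the PR, and the label format (joined types, a dot, joined names) is an assumption inferred from the `["."]` result described in the comment above; the actual label construction is outside the visible diff.

```python
def build_status_and_label(parse_ok, errors_nonempty, types, names):
    """Return (status, label) following the suggested control flow.

    Hypothetical helper for illustration only.
    """
    # status is True when something is wrong: parsing failed or errors exist.
    status = (not parse_ok) or errors_nonempty
    if not status:
        # Clean parse with an empty error list: report good quality.
        return status, ["QUALITY_GOOD"]
    if not parse_ok:
        # Parsing failed, so there are no categories to report.
        types, names = ["QUALITY_BAD"], ["ParseError"]
    tmp_type = ".".join(types)
    tmp_name = ".".join(names)
    # Assumed label format; the real one is not shown in the diff.
    return status, [f"{tmp_type}.{tmp_name}"]


if __name__ == "__main__":
    print(build_status_and_label(True, False, [], []))
    print(build_status_and_label(False, False, [], []))
    print(build_status_and_label(True, True, ["TypeA"], ["name_a"]))
```

Under these assumptions, the empty-label case disappears: a clean response maps to `["QUALITY_GOOD"]`, and a failed parse maps to `["QUALITY_BAD.ParseError"]`.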


tmp_type = '.'.join(types)
tmp_name = '.'.join(names)