Improve script failure messages with error-code guidance and CI coverage #237

Open
Copilot wants to merge 12 commits into dev from copilot/better-error-code-usage

Copilot AI commented May 22, 2026

Script failures were surfacing raw errors without using the available return/error codes. This change turns those codes into actionable diagnostics for common failure modes such as network/download issues, disk exhaustion, and native segfaults, and ensures the new guidance test runs in GitHub Actions.

  • Error-code guidance

    • Detects meaningful error codes from script return values and error text.
    • Maps common failure classes to short, actionable suggestions:
      • disk space / No space left on device
      • network / download / timeout / TLS / DNS failures
      • segmentation faults / signal-based native crashes
      • command-not-found / not-executable / killed / interrupted
  • Script execution path

    • Enriches ScriptExecutionError with detected error_code and structured guidance.
    • Hooks the guidance into the existing script failure path in script_action.py at the point where script return codes are already available.
  • CLI reporting

    • Extends top-level error reporting to print:
      • detected error code
      • likely cause
      • targeted suggestions
    • Keeps the existing rerun command and issue-reporting context intact.
  • Focused coverage

    • Adds a small test covering guidance detection and emitted error-reporting output.
    • Wires .github/scripts/test_error_guidance.py into .github/workflows/test-mlc-core-actions.yaml so it runs in CI as part of the existing core actions workflow.
    • Ignores generated .mlc-log.txt to avoid accidental artifact commits.
{
    "return": 1,
    "error": "Command execution failed with error code 28. No space left on device."
}

A failing result like this now produces guidance along the lines of:

  • detected error code: 28
  • likely cause: disk space exhaustion
  • suggestions: free space in work/cache directories and retry
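
The detection and mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `guess_guidance`, the regular expressions, the table entries, and all suggestion strings other than the disk-space one are assumptions made for this example, and the real `error_codes.py` may be organized differently.

```python
import re

# Hypothetical text patterns, checked first because the message text is
# usually more specific than the numeric code.
TEXT_PATTERNS = [
    (re.compile(r"no space left on device", re.I),
     "disk space exhaustion",
     "free space in work/cache directories and retry"),
    (re.compile(r"timed out|could not resolve host|\b(?:ssl|tls)\b", re.I),
     "network or download failure",
     "check connectivity and proxy settings, then retry"),
    (re.compile(r"segmentation fault", re.I),
     "native crash (segmentation fault)",
     "check native dependency versions"),
]

# Hypothetical fallback table keyed by shell exit status.
# 126/127 are the shell statuses for "not executable" / "not found";
# 128 + N conventionally means the process was terminated by signal N.
CODE_TABLE = {
    126: ("command not executable", "check file permissions"),
    127: ("command not found", "install the tool or fix PATH"),
    130: ("interrupted (SIGINT)", "rerun without interrupting"),
    137: ("killed (SIGKILL), often out of memory", "reduce memory use and retry"),
    139: ("native crash (segmentation fault)", "check native dependency versions"),
}

ERROR_CODE_RE = re.compile(r"error code (\d+)", re.I)


def guess_guidance(result):
    """Return a guidance dict for a failed script result, or None."""
    text = result.get("error") or ""
    match = ERROR_CODE_RE.search(text)
    # Prefer the code embedded in the message; fall back to the return value.
    code = int(match.group(1)) if match else result.get("return")
    for pattern, cause, suggestion in TEXT_PATTERNS:
        if pattern.search(text):
            return {"error_code": code, "likely_cause": cause,
                    "suggestions": [suggestion]}
    if code in CODE_TABLE:
        cause, suggestion = CODE_TABLE[code]
        return {"error_code": code, "likely_cause": cause,
                "suggestions": [suggestion]}
    return None


result = {
    "return": 1,
    "error": "Command execution failed with error code 28. No space left on device.",
}
print(guess_guidance(result))
```

Checking the message text before the numeric table is a deliberate choice in this sketch: a bare code can be ambiguous across tools, while a phrase such as "No space left on device" is not.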

✅ PR Checklist

  • Target branch is dev

📌 Note: PRs must be raised against dev. Do not commit directly to main.

✅ Testing & CI

  • I have tested the changes in my local environment, or have clearly stated in the PR description that I have not.
  • The change includes a GitHub Action to test the script (if one can be added).
  • No existing GitHub Actions are failing because of this change.

📚 Documentation

  • README or help docs are updated for new features or changes.
  • CLI help messages are meaningful and complete.

📁 File Hygiene & Output Handling

  • No unintended files (e.g., logs, cache, temp files, pycache, output folders) are committed.

🛡️ Safety & Security

  • No secrets or credentials are committed.
  • Paths, shell commands, and environment handling are safe and portable.

🙌 Contribution Hygiene

  • PR title and description are concise and clearly state the purpose of the change.
  • Related issues (if any) are properly referenced using Fixes # or Closes #.
  • All reviewer feedback has been addressed.

Copilot AI and others added 6 commits May 22, 2026 11:33
Agent-Logs-Url: https://github.com/mlcommons/mlcflow/sessions/65d4f25c-8e7e-4b32-a7cf-a2489b2c4da4

Co-authored-by: arjunsuresh <4791823+arjunsuresh@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve error message handling using error codes" to "Improve script failure messages with error-code guidance" on May 22, 2026
Copilot AI requested a review from arjunsuresh May 22, 2026 11:38
github-actions Bot commented May 22, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions

🤖 AI PR Review Summary

Adds error code detection and actionable guidance for script execution failures. Introduces parsing of error messages to extract error codes and provide user-friendly suggestions for common errors like disk space, segmentation faults, and network issues. Integrates this guidance into the error reporting mechanism and adds unit tests for validation. Risks include potential mismatches in error code extraction patterns and incomplete coverage of error scenarios, but overall design improves user experience by providing clearer error diagnostics.

@github-actions

🤖 AI PR Review Summary

Adds comprehensive error code detection and actionable guidance for script execution failures. Introduces new utility functions in error_codes.py to parse error codes from messages and provide user-friendly suggestions. Enhances ScriptExecutionError to carry error guidance, integrates guidance generation in script_action.py, and improves error reporting in main.py to log detailed guidance. Adds unit tests for error guidance detection and reporting. Risks include maintaining accurate error pattern matching and ensuring guidance remains relevant and clear.

@arjunsuresh marked this pull request as ready for review May 22, 2026 13:36
@arjunsuresh requested a review from a team as a code owner May 22, 2026 13:36
@arjunsuresh changed the base branch from main to dev May 22, 2026 13:37
@github-actions

🤖 AI PR Review Summary

Adds comprehensive error code detection and actionable guidance for script failures, enhancing error reporting with detailed suggestions. Introduces new utility functions in error_codes.py to parse error codes from messages and return codes, and integrates these into script execution error handling and logging. Risks include potential mismatches in error code extraction patterns and the need to maintain updated error guidance as new error types emerge.

@github-actions

🤖 AI PR Review Summary

Adds comprehensive error code detection and actionable guidance for script execution failures. Introduces error code parsing from messages, enriches ScriptExecutionError with error guidance, and enhances error reporting with detailed suggestions. Includes unit tests for error guidance detection and logging. Risks include maintaining accuracy of error pattern matching and ensuring guidance remains relevant across diverse error scenarios.

@arjunsuresh left a comment

Where is .github/scripts/test_error_guidance.py used in GitHub Actions?

@github-actions

🤖 AI PR Review Summary

Adds comprehensive error code detection and actionable guidance for script execution failures. Introduces error code parsing from messages, enriches ScriptExecutionError with error guidance, and enhances error reporting with detailed suggestions. Includes unit tests for error guidance functionality. Risks include maintaining accuracy of error pattern matching and ensuring guidance remains relevant across diverse environments.

@@ -0,0 +1,60 @@
import io

@copilot where's this test used in github actions?

It wasn’t wired into GitHub Actions before. I’ve added it to .github/workflows/test-mlc-core-actions.yaml, so it now runs as Test 3c - error guidance handling in CI. Commit: bfd555a. No UI change.

Agent-Logs-Url: https://github.com/mlcommons/mlcflow/sessions/1631b761-e890-491f-8cb0-2fa4749c7327

Co-authored-by: arjunsuresh <4791823+arjunsuresh@users.noreply.github.com>
Copilot AI changed the title from "Improve script failure messages with error-code guidance" to "Improve script failure messages with error-code guidance and CI coverage" on May 22, 2026
@github-actions

🤖 AI PR Review Summary

Adds error code detection and actionable guidance for script execution failures. Introduces get_error_guidance to parse error codes from return codes and messages, providing user-friendly suggestions for common errors like disk space, segmentation faults, and network issues. Integrates this guidance into ScriptExecutionError and _report_error for improved error reporting. Adds unit tests for error guidance and updates CI workflow to run these tests. Risks include ensuring error code detection patterns cover relevant cases and that error guidance messages remain accurate and helpful.
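
To make the integration described in this summary concrete, here is a hedged sketch of an exception that carries an error code and guidance, plus a reporter that prints them. The class below is a stand-in that only borrows the name `ScriptExecutionError`; its constructor signature, the `report_error` helper, and the guidance dict layout are assumptions for illustration, not the project's actual API.

```python
import io
import sys


class ScriptExecutionError(Exception):
    """Stand-in exception; the constructor signature is assumed."""

    def __init__(self, message, error_code=None, guidance=None):
        super().__init__(message)
        self.error_code = error_code
        self.guidance = guidance or {}


def report_error(exc, stream=sys.stderr):
    """Hypothetical reporter: print the error, then any attached guidance."""
    print(f"Error: {exc}", file=stream)
    code = getattr(exc, "error_code", None)
    if code is not None:
        print(f"  detected error code: {code}", file=stream)
    guidance = getattr(exc, "guidance", None) or {}
    if guidance.get("likely_cause"):
        print(f"  likely cause: {guidance['likely_cause']}", file=stream)
    for suggestion in guidance.get("suggestions", []):
        print(f"  suggestion: {suggestion}", file=stream)


# Usage: capture the report in a buffer, as a test might do.
buffer = io.StringIO()
try:
    raise ScriptExecutionError(
        "Command execution failed with error code 28. No space left on device.",
        error_code=28,
        guidance={
            "likely_cause": "disk space exhaustion",
            "suggestions": ["free space in work/cache directories and retry"],
        },
    )
except ScriptExecutionError as exc:
    report_error(exc, stream=buffer)
print(buffer.getvalue())
```

Reading the attributes with `getattr` keeps the reporter usable for exceptions that carry no guidance, so existing error output is unaffected when nothing is detected.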

Copilot AI requested a review from arjunsuresh May 22, 2026 15:19

Development

Successfully merging this pull request may close these issues.

Better usage of error codes in error messages

2 participants