Feature/opencre clean #1404
Conversation
1dd8ac1 to
4ae637f
Compare
61dc06a to
9550a64
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
|
I think this is probably good to go. I just want to test it a bit locally. |
|
Sorry for the delay, my weekend isn't going how I anticipated, I'll try to get to testing and hopefully approving/merging this tomorrow. |
No worries at all, thanks for the update. |
|
Could you apply the following patch? (Let me know if there's anything you disagree with.)

patch/diff:

diff --git a/.github/json/scripts/generate_checklist_json.py b/.github/json/scripts/generate_checklist_json.py
index 926b06d..f9479b5 100644
--- a/.github/json/scripts/generate_checklist_json.py
+++ b/.github/json/scripts/generate_checklist_json.py
@@ -25,6 +25,11 @@ REFERENCE_PREFIX = (
OPENCRE_STANDARD = "OWASP Web Security Testing Guide (WSTG)"
OPENCRE_BASE_URL = "https://www.opencre.org/rest/v1/standard"
+OPENCRE_LOOKUP_DESCRIPTION = (
+ "OpenCRE is queried with `GET /rest/v1/standard/<OWASP Web Security Testing Guide "
+ "(WSTG)>?section=<WSTG-ID>` (plus `&page=` when the section spans multiple pages)."
+)
+CRE_IDS_CELL_MAX_LEN = 240
DEFAULT_CONCURRENCY_LIMIT = 4
RETRY_COUNT = 3
REQUEST_TIMEOUT = 30
@@ -44,6 +49,23 @@ def get_concurrency_limit() -> int:
CONCURRENCY_LIMIT = get_concurrency_limit()
+
+
+def emit_markdown_report(text: str) -> None:
+ """Print markdown to stdout and append to GITHUB_STEP_SUMMARY when set."""
+ print(text, flush=True)
+ summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
+ if summary_path:
+ try:
+ with open(summary_path, "a", encoding="utf-8") as fh:
+ fh.write(text)
+ except OSError as exc:
+ print(
+ f"Warning: could not write GITHUB_STEP_SUMMARY: {exc}",
+ file=sys.stderr,
+ )
+
+
class OpenCRELookupError(Exception):
"""Raised when an OpenCRE request cannot be resolved."""
@@ -183,6 +205,69 @@ def load_existing_cre_ids(path: Path) -> dict[str, list[str]]:
return existing_cre_ids
+def _opencre_failure_is_404(message: str) -> bool:
+ return "404" in message
+
+
+def _opencre_failure_response_code(message: str) -> str:
+ """Best-effort HTTP status from OpenCRE error text; ``—`` when not present."""
+ m = re.search(r"returned (\d{3})\b", message)
+ if m:
+ return m.group(1)
+ if "404" in message:
+ return "404"
+ return "—"
+
+
+def _sort_opencre_failures_guide_order(
+ rows: list[tuple[str, str]], guide_rank: dict[str, int]
+) -> list[tuple[str, str]]:
+ """Order failures like the checklist (chapter order, then markdown file order)."""
+ sentinel = len(guide_rank) + 1
+ return sorted(
+ rows,
+ key=lambda r: (guide_rank.get(r[0], sentinel), r[0]),
+ )
+
+
+def _emit_opencre_failure_report(
+ failures: list[tuple[str, str]], guide_order_ids: list[str]
+) -> None:
+ if not failures:
+ return
+ guide_rank = {tid: i for i, tid in enumerate(guide_order_ids)}
+ lines: list[str] = [
+ "## Checklist JSON: OpenCRE lookup failures\n\n",
+ f"{OPENCRE_LOOKUP_DESCRIPTION}\n\n",
+ f"**{len(failures)}** WSTG test ID(s) could not be fetched from OpenCRE; "
+ "existing `cre_ids` in `checklist.json` are kept when present.\n\n",
+ ]
+ not_found = _sort_opencre_failures_guide_order(
+ [(tid, msg) for tid, msg in failures if _opencre_failure_is_404(msg)],
+ guide_rank,
+ )
+ other = _sort_opencre_failures_guide_order(
+ [(tid, msg) for tid, msg in failures if not _opencre_failure_is_404(msg)],
+ guide_rank,
+ )
+
+ def append_table(title: str, rows: list[tuple[str, str]]) -> None:
+ lines.append(f"### {title}\n\n")
+ if not rows:
+ lines.append("_None._\n\n")
+ return
+ lines.append("| WSTG ID | Response Code |\n")
+ lines.append("| --- | --- |\n")
+ for tid, msg in rows:
+ code = _opencre_failure_response_code(msg)
+ lines.append(f"| `{tid}` | {code} |\n")
+ lines.append("\n")
+
+ append_table(f"HTTP 404 ({len(not_found)})", not_found)
+ append_table(f"Other errors ({len(other)})", other)
+ emit_markdown_report("".join(lines))
+
+
def enrich_with_opencre(checklist: OrderedDict) -> OrderedDict:
all_tests = []
existing_cre_ids = load_existing_cre_ids(OUTPUT_PATH)
@@ -221,16 +306,10 @@ def enrich_with_opencre(checklist: OrderedDict) -> OrderedDict:
results[returned_id] = cre_ids
except Exception as exc:
message = str(exc)
- print(f"WARNING: OpenCRE lookup failed for {test_id}: {message}")
results[test_id] = None
failures.append((test_id, message))
- if failures:
- print(
- "WARNING: OpenCRE enrichment failures for the following test IDs:"
- )
- for failed_id, message in failures:
- print(f" - {failed_id}: {message}")
+ _emit_opencre_failure_report(failures, unique_ids)
for test in all_tests:
if not isinstance(test, dict):
@@ -392,21 +471,22 @@ def _empty_objective_entries(
def _write_empty_objectives_report(entries: list[tuple[str, str, str, str]]) -> None:
- """
- In GitHub Actions, append to the job summary. Locally, print to stderr.
- Never raises for missing env or IO errors beyond logging to stderr.
- """
+ """Build markdown for Test Objectives quality; emit to stdout and job summary."""
lines: list[str] = []
if not entries:
- lines.append("## Checklist JSON: Test Objectives\n\n")
+ lines.append("## Checklist JSON: WSTG markdown — Test Objectives\n\n")
lines.append(
- "All generated entries have at least one non-blank objective.\n"
+ "All checklist rows include at least one non-blank objective parsed from "
+ "each chapter's `## Test Objectives` section.\n"
)
else:
- lines.append("## Checklist JSON: empty or blank Test Objectives\n\n")
lines.append(
- "These IDs have no non-blank objective strings; the Excel builder "
- "will show **N/A** for objectives.\n\n"
+ "## Checklist JSON: WSTG markdown — empty or blank Test Objectives\n\n"
+ )
+ lines.append(
+ "These rows have empty or whitespace-only objectives in JSON (from each "
+ "chapter's `## Test Objectives` section). The Excel builder shows **N/A** "
+ "for the objective column.\n\n"
)
lines.append("| Category | ID | Name |\n")
lines.append("| --- | --- | --- |\n")
@@ -415,20 +495,65 @@ def _write_empty_objectives_report(entries: list[tuple[str, str, str, str]]) ->
safe_name = name.replace("|", "\\|")
lines.append(f"| {safe_cat} | `{tid}` | {safe_name} |\n")
- text = "".join(lines)
- summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
- if summary_path:
- try:
- with open(summary_path, "a", encoding="utf-8") as fh:
- fh.write(text)
- except OSError as exc:
- print(
- f"Warning: could not write GITHUB_STEP_SUMMARY: {exc}",
- file=sys.stderr,
- )
- print(text, file=sys.stderr)
- else:
- print(text, file=sys.stderr)
+ emit_markdown_report("".join(lines))
+
+
+def _cre_mapping_success_rows(
+ data: OrderedDict,
+) -> list[tuple[str, str, str, str]]:
+ """(category_label, test_id, test_name, cre_joined) for tests with non-empty cre_ids."""
+ rows: list[tuple[str, str, str, str]] = []
+ categories = data.get("categories", {})
+ if not isinstance(categories, dict):
+ return rows
+ for category_label, category in categories.items():
+ if not isinstance(category, dict):
+ continue
+ tests = category.get("tests", [])
+ if not isinstance(tests, list):
+ continue
+ for test in tests:
+ if not isinstance(test, dict):
+ continue
+ cre_ids = test.get("cre_ids")
+ if not isinstance(cre_ids, list) or not cre_ids:
+ continue
+ parts = [str(x) for x in cre_ids if isinstance(x, str) and x]
+ if not parts:
+ continue
+ joined = ", ".join(parts)
+ if len(joined) > CRE_IDS_CELL_MAX_LEN:
+ joined = joined[: CRE_IDS_CELL_MAX_LEN - 1] + "…"
+ tid = test.get("id", "")
+ name = test.get("name", "")
+ if not isinstance(tid, str):
+ tid = str(tid)
+ if not isinstance(name, str):
+ name = str(name)
+ rows.append((str(category_label), tid, name, joined))
+ rows.sort(key=lambda r: (r[0], r[1]))
+ return rows
+
+
+def _write_cre_mapping_success_report(data: OrderedDict) -> None:
+ rows = _cre_mapping_success_rows(data)
+ lines: list[str] = ["## Checklist JSON: OpenCRE mappings (success)\n\n"]
+ if not rows:
+ lines.append("No tests have non-empty `cre_ids` after this run.\n")
+ emit_markdown_report("".join(lines))
+ return
+ lines.append(
+ f"{OPENCRE_LOOKUP_DESCRIPTION}\n\n"
+ f"**{len(rows)}** checklist row(s) have at least one CRE id from OpenCRE.\n\n"
+ )
+ lines.append("| Category | ID | Name | CRE IDs |\n")
+ lines.append("| --- | --- | --- | --- |\n")
+ for category, tid, name, cre_cell in rows:
+ safe_cat = category.replace("|", "\\|")
+ safe_name = name.replace("|", "\\|")
+ safe_cre = cre_cell.replace("|", "\\|")
+ lines.append(f"| {safe_cat} | `{tid}` | {safe_name} | {safe_cre} |\n")
+ emit_markdown_report("".join(lines))
def build_checklist() -> OrderedDict:
@@ -475,6 +600,7 @@ def build_checklist() -> OrderedDict:
def main() -> None:
data = build_checklist()
data = enrich_with_opencre(data)
+ _write_cre_mapping_success_report(data)
_write_empty_objectives_report(_empty_objective_entries(data))
OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
text = json.dumps(data, indent=2, ensure_ascii=False) + "\n" |
74e688f to
0ccace7
Compare
- Implement Markdown job summary reporting for OpenCRE lookup results - Add success report showing CRE-mapped test rows - Add failure report with categorized OpenCRE errors - Skip deprecated placeholder markdown (removed WSTG content stubs) - Fixes duplicate WSTG-INPV-13 entry
0ccace7 to
b307959
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
kingthorin
left a comment
There was a problem hiding this comment.
@Nik-ui thanks for tackling this!! It wouldn't have moved for ages if you hadn't shown interest!!
I do want to adjust a few other things, like Copilot's outstanding points. I'll merge this and tackle them separately
😁🥳🎉
Please, if you’d like me to work on them, you can assign them to me. Thanks, @kingthorin |
This PR integrates OpenCRE enrichment directly into the existing checklist generation workflow.
What did this PR accomplish?
- Updated the `generate_checklist_json.py` script
- Regenerated `checklists/checklist.json` to maintain formatting consistency

Testing
- Verified the regenerated `checklists/checklist.json`

This aligns with the existing workflow and maintains consistency with current checklist generation processes.
Fixes #623