-
Notifications
You must be signed in to change notification settings - Fork 2.4k
feat(ci): preflight-triage AI review + PR compliance gates #4359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 69 commits
Commits
Show all changes
71 commits
Select commit
Hold shift + click to select a range
2be450c
ci(review): bundled PR review workflow + preflight triage design
yiliang114 bdb3707
feat(review): add preflight foundation files (script, hard rules, pro…
yiliang114 9044ec0
refactor(review): drop path-glob hard rules — preflight LLM judges bl…
yiliang114 01039bd
feat(review): add STANDARD tier review prompt
yiliang114 cbe386f
feat(review): wire preflight triage + 4-tier execution paths into wor…
yiliang114 8e10ff7
refactor(review): align with pr-gate-plan — AI review is advisory, no…
yiliang114 bd47c8a
feat(ci): add pr-gate.yml + pr-gate-plan + unify wording with impleme…
yiliang114 4b54ac5
fix(ci): P1-P3 cleanups — align plan with impl, drop WIP exemption, d…
yiliang114 d71b317
refactor: apply 砍3 / 砍5 / 砍6 cuts — drop size labels, LIGHT-upgrade, …
yiliang114 3923338
refactor: drop pr-title CI check + add rollback/disable section
yiliang114 6f08200
fix(ci): rename parse-review-stream.js → .cjs (repo is type:module)
yiliang114 5c9a62f
fix(ci): use 'printf --' in ULTRA_LIGHT compose to avoid '-' flag par…
yiliang114 a025a59
feat(ci): post PR review comments as qwen-code-bot (not github-action…
yiliang114 a30d176
fix(ci): address PR #4359 review findings (10 items) [skip ci]
yiliang114 823d6d9
fix(review): raw line count in verdicts + DEEP timeout re-run notice …
yiliang114 50460a4
fix(review): guard helper-script exits under set -e; add helper tests
yiliang114 d8203ec
fix(review): DEEP emits only the consolidated review, not narration
yiliang114 15e995e
feat(pr-gate): add oversized-ok escape hatch for the PR size check
yiliang114 da3c315
fix(review): make DEEP parser survive the bundled skill's tree mutation
yiliang114 f9db858
docs(review): note DEEP largest-segment edge case + tree-mutation rat…
yiliang114 789785e
docs(review): expand @latest CLI comment with the stream-debug ration…
yiliang114 18471a7
feat(review): add harden-runner egress audit; document QWEN_CODE_BOT_…
yiliang114 0c1c5a0
docs(review): correct the bot-identity secret note — CI_BOT_PAT is th…
yiliang114 b54579c
fix(review): address PR #4359 review feedback + add routing flowchart
yiliang114 8bc5b11
docs(review): add end-to-end verification record (4 tiers + gate chai…
yiliang114 63881d4
fix(ci): close oversized-ok self-waiver bypass + deliver review on po…
yiliang114 7f53438
fix(ci): stop pipefail aborting review steps on SIGPIPE; gate cache o…
yiliang114 847f77b
docs(review): drop compare.md and roadmap.md from the PR
yiliang114 291d6c5
docs(review): de-draft preflight-triage.md now that the design shippe…
yiliang114 59abfc5
fix(ci): harden preflight stdout/stderr handling and prompt-template …
yiliang114 35a537d
fix(ci): route preflight model var through env; document issue_commen…
yiliang114 31e235a
fix(ci): harden PR review workflow
yiliang114 ee7006c
fix(ci): close the always-emit gap when a tier step succeeds with no …
yiliang114 5551e3b
fix(ci): update pr gate github-script pin
yiliang114 520881e
docs(ci): align preflight review design
yiliang114 45b2769
fix(ci): replace coverage comment node20 action
yiliang114 86b3392
Revert "fix(ci): replace coverage comment node20 action"
yiliang114 c072a58
fix(ci): fence qwen review log streams
yiliang114 61ba57e
fix(ci): close round-2 review gaps in preflight + render template
wenshao b25c21a
fix(ci): tighten preflight tier strip, generated-file anchor, near-em…
wenshao 235f9ce
fix(ci): align deep PR review with bundled rubric
yiliang114 36eaeaa
fix(ci): refine PR review trigger and size warning
yiliang114 d9e8088
fix(ci): address PR review workflow feedback
yiliang114 09f233e
docs(ci): clarify PR size warning gate
yiliang114 b583706
fix(ci): keep review stderr out of json streams
yiliang114 4d1d82c
fix(ci): close PR review workflow follow-ups
yiliang114 db4021e
fix(ci): close round-3 PR review workflow gaps
yiliang114 7f61998
fix(ci): disable nounset around PIPESTATUS capture in pr-review [skip…
yiliang114 4b1fe9f
fix(ci): filter tool_call XML, simplify review footer and size warning
yiliang114 65b25ba
fix(ci): add tool-free constraint to LIGHT/STANDARD review prompts [s…
yiliang114 1e8ea90
test: add tool_call filtering tests for parse-review-stream [skip ci]
yiliang114 a5cb08d
feat(ci): enable Bash for inline PR comments in review workflow [skip…
yiliang114 f033fa5
fix(ci): pass large prompts via stdin to avoid ARG_MAX overflow [skip…
yiliang114 13a4712
fix(ci): discard non-review preamble when model emits fake tool calls
yiliang114 bec5072
fix(ci): strengthen no-tool-use constraint in review prompts
yiliang114 1f2bd9d
fix(ci): strip model preamble/thinking text from review output
yiliang114 7f0e7d0
refactor(review): remove tool restrictions — let model use full tools…
yiliang114 db57c9b
fix(review): filter tool_use preamble and fix inline comment instruct…
yiliang114 26ac19c
fix(ci): resolve empty HEAD_SHA in review tier steps
yiliang114 01743ba
fix(review): filter standalone preamble fragments from review output
yiliang114 4630340
fix(review): add pre-authorization note to unblock inline comment pos…
yiliang114 ff1c2f3
fix(ci): allow shell tool in non-interactive review mode
yiliang114 c53fdcb
fix(ci): enable inline comments for all findings & fix validation evi…
yiliang114 de4ea8e
fix(ci): filter narration via usage signal & allow all shell commands
yiliang114 1d80af3
fix(review): address review output quality issues
yiliang114 9b4ac7e
fix(review): remove noise from inline comment posting
yiliang114 9f3ee49
fix(review): align severity format with bundled skill and improve inl…
yiliang114 04a6f2d
fix(review): reverse inline comment execution order to prevent duplic…
yiliang114 4c54111
fix(review): pre-approve virtual ops so inline comments work in headl…
yiliang114 17f14a1
fix(review): use YOLO approval mode for CI review tool access
yiliang114 235d154
fix(test): update workflow test to expect YOLO approval mode
yiliang114 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,325 @@ | ||
| name: 'PR Gate' | ||
|
|
||
| # Fast deterministic PR gate/status checks. Each job emits an independent | ||
| # named status check. Branch Protection may require the check names, but an | ||
| # individual job can still report a warning-only signal such as PR Size. | ||
| # | ||
| # Sibling workflow `qwen-code-pr-review.yml` provides the (slow, advisory) | ||
| # AI review — it is intentionally NOT required for merge. See | ||
| # docs/design/pr-gate-plan.md for the full design rationale. | ||
| # | ||
| # Triggers: | ||
| # - pull_request (not pull_request_target): both jobs only read PR | ||
| # metadata, no secrets needed. Fork PRs trigger normally and produce | ||
| # status checks the maintainer can see before merge. | ||
| # - 'edited' is critical: when the contributor edits the PR body to fix a | ||
| # failed check, the workflow re-runs and updates the | ||
| # check status. Without 'edited', a failed PR Template check would | ||
| # stay red forever even after correction. | ||
| # - 'labeled'/'unlabeled' are needed so the PR Size `oversized-ok` | ||
| # acknowledgement label takes effect (or is revoked) without an extra | ||
| # push or edit to re-trigger the check. | ||
|
|
||
| on: | ||
| pull_request: | ||
| types: | ||
| - 'opened' | ||
| - 'edited' | ||
| - 'synchronize' | ||
| - 'ready_for_review' | ||
| - 'reopened' | ||
| - 'labeled' | ||
| - 'unlabeled' | ||
| branches: ['main', 'release/**'] | ||
|
|
||
| permissions: | ||
| contents: 'read' | ||
| pull-requests: 'read' # read-only: pr-template uses event payload; pr-size calls listFiles | ||
| issues: 'read' # pr-size resolves the oversized-ok label applier via the issue events API | ||
|
|
||
| # Cancel earlier in-flight runs on the same PR when a new event arrives. | ||
| # 'edited' events can fire frequently while the contributor iterates on | ||
| # the description — no point keeping stale runs around. | ||
| concurrency: | ||
| group: 'pr-gate-${{ github.event.pull_request.number }}' | ||
| cancel-in-progress: true | ||
|
|
||
| jobs: | ||
| # NOTE: a `pr-title` Conventional-Commits check used to live here. It was | ||
| # removed because the repo enforces commit-message format via local | ||
| # commit-msg hooks (so contributors hitting `git commit` are already | ||
| # caught on their own machine), and re-checking the same rule in CI just | ||
| # duplicates the gate without adding signal. PR title still becomes the | ||
| # squash-merge commit message; maintainers spot-check it at merge time. | ||
| # If we later see PRs with non-conforming titles slipping through, this | ||
| # job can be reintroduced (the action used was | ||
| # `amannn/action-semantic-pull-request` pinned to a SHA). | ||
|
|
||
| pr-template: | ||
|
yiliang114 marked this conversation as resolved.
|
||
| name: 'PR Template' | ||
| # `labeled`/`unlabeled` are in the workflow triggers only for the | ||
| # pr-size `oversized-ok` acknowledgement. PR Template does not depend on | ||
| # labels at all, so skip it on label events — otherwise every | ||
| # unrelated label add/remove re-runs (and, with cancel-in-progress, | ||
| # can cancel) this check for no reason. | ||
| if: |- | ||
| github.event.action != 'labeled' && | ||
| github.event.action != 'unlabeled' | ||
| runs-on: 'ubuntu-latest' | ||
| timeout-minutes: 2 | ||
| steps: | ||
| - name: 'Validate PR body has required sections' | ||
| uses: 'actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3' # v9.0.0 | ||
| with: | ||
| script: | | ||
| const body = context.payload.pull_request.body || ''; | ||
| const errors = []; | ||
|
|
||
| // Required sections (must literally appear as headers). | ||
| const requiredSections = [ | ||
| { | ||
| marker: '## Summary', | ||
| name: 'Summary', | ||
| hint: 'Describe what this PR does and why, in 1-3 sentences.', | ||
| }, | ||
| { | ||
| marker: '## Validation', | ||
| name: 'Validation', | ||
| hint: 'Provide reviewer-facing evidence: commands run, observed output, logs, screenshots, GIFs, or test reports. Saying "tested locally" alone is not enough.', | ||
| }, | ||
| { | ||
| marker: '## Linked Issues', | ||
| name: 'Linked Issues', | ||
| hint: 'List linked issues (e.g. `Fixes #123`, `Refs #456`) or write `N/A` if intentionally none.', | ||
| }, | ||
| ]; | ||
| for (const s of requiredSections) { | ||
| if (!body.includes(s.marker)) { | ||
| errors.push(`Missing "${s.name}" section.\n → ${s.hint}`); | ||
| } | ||
| } | ||
|
|
||
| // Validation must have substantive content, not just the | ||
| // template's empty bullet placeholders. Keep real fenced command | ||
| // output/logs, but drop the untouched "# paste ..." template | ||
| // fence so contributors can put all evidence inside code blocks. | ||
| const validationIdx = body.indexOf('## Validation'); | ||
| if (validationIdx !== -1) { | ||
| const after = body.slice(validationIdx + '## Validation'.length); | ||
| const nextHeaderIdx = after.search(/\n## /); | ||
| const validationSection = nextHeaderIdx === -1 | ||
| ? after | ||
| : after.slice(0, nextHeaderIdx); | ||
|
|
||
| const stripped = validationSection | ||
| .replace(/<!--[\s\S]*?-->/g, '') // HTML comments | ||
| .replace(/```[^\n\r]*(?:\r?\n)?([\s\S]*?)```/g, (_match, inner) => { | ||
| const content = inner.trim(); | ||
| if (!content) return ''; | ||
| if (/^#\s*paste\b/i.test(content)) return ''; | ||
| return `\n${content}\n`; | ||
| }) | ||
| .split('\n') | ||
| .filter((line) => { | ||
| const t = line.trim(); | ||
| if (!t) return false; | ||
| // Template-only bullet: "- LABEL:" with nothing after. | ||
| if (/^[-*]\s+[^:]+:\s*$/.test(t)) return false; | ||
| // Bullet with only a placeholder-style value like | ||
| // "- Foo: <something>" or "- Foo: [paste …]" or "- Foo: TBD". | ||
| if (/^[-*]\s+[^:]+:\s*(<[^>]+>|\[[^\]]+\]|TBD|N\/A?|todo|tbd)\s*$/i.test(t)) { | ||
| return false; | ||
| } | ||
| return true; | ||
| }) | ||
| .join('\n') | ||
| .trim(); | ||
|
|
||
| if (stripped.length < 20) { | ||
| errors.push( | ||
| 'Validation section looks like the unfilled template ' + | ||
| '(empty bullets / no real evidence after the colons).\n' + | ||
| ' → Fill in actual commands, output, screenshots, or test reports.' | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| if (errors.length > 0) { | ||
| const formatted = errors.map(e => `• ${e}`).join('\n\n'); | ||
| core.setFailed( | ||
| `PR description is incomplete:\n\n${formatted}\n\n` + | ||
| 'Edit your PR description to match the template at ' + | ||
| '`.github/pull_request_template.md` and the check will re-run automatically.' | ||
| ); | ||
| } | ||
|
|
||
| pr-size: | ||
| name: 'PR Size' | ||
| # Re-run on non-label events as usual; on label events run ONLY when | ||
| # the `oversized-ok` acknowledgement label itself is added or removed — | ||
| # an unrelated label (`bug`, `priority/*`) must not churn this check. | ||
| if: |- | ||
| (github.event.action != 'labeled' && | ||
| github.event.action != 'unlabeled') || | ||
| github.event.label.name == 'oversized-ok' | ||
| runs-on: 'ubuntu-latest' | ||
| timeout-minutes: 3 | ||
| steps: | ||
| - name: 'Compute reviewability size' | ||
| uses: 'actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3' # v9.0.0 | ||
| with: | ||
| script: | | ||
| // Pull the full file list. Extremely high file counts are still | ||
| // covered by the meaningful-line threshold below. | ||
| const files = await github.paginate( | ||
| github.rest.pulls.listFiles, | ||
| { | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| pull_number: context.issue.number, | ||
| per_page: 100, | ||
| }, | ||
| (response) => response.data, | ||
| ); | ||
|
|
||
| // Exclude files whose churn is not a meaningful review burden. | ||
| // Keep this list intentionally tight — every entry should | ||
| // correspond to "auto-generated or pure prose" content. | ||
| const ignored = [ | ||
| /(?:^|\/)package-lock\.json$/, | ||
| /(?:^|\/)pnpm-lock\.yaml$/, | ||
| /(?:^|\/)yarn\.lock$/, | ||
| // Anchor to the filename segment so generated-file exclusion | ||
| // applies only to filenames following the | ||
| // `<name>.generated.<ext>` convention, not to directory names | ||
| // such as `foo.generated/bar.ts`. | ||
| /(?:^|\/)[^/]*\.generated\.[^/]+$/, | ||
| /\.snap$/, // jest snapshot | ||
| /^docs\//, // docs prose | ||
| /^docs-site\//, | ||
| /\.md$/, // markdown anywhere | ||
| /^integration-tests\/fixtures\//, | ||
| /^packages\/.+\/__snapshots__\//, | ||
| ]; | ||
| const meaningful = files.filter((f) => | ||
| !ignored.some((re) => re.test(f.filename)) | ||
| ); | ||
| const totalChanges = meaningful.reduce( | ||
| (sum, f) => sum + (f.additions || 0) + (f.deletions || 0), | ||
| 0, | ||
| ); | ||
| const totalFiles = meaningful.length; | ||
|
|
||
| const REVIEWABILITY_THRESHOLD = 1500; | ||
| const WARN_THRESHOLD = 1000; | ||
|
|
||
| // A maintainer can consciously acknowledge an over-threshold PR | ||
| // (a large but cohesive infra change, a generated-file refactor | ||
| // the exclusion list doesn't catch, etc.) by applying the | ||
| // `oversized-ok` label. Size is a reviewability signal, not a | ||
| // correctness failure, so this job warns but does not block merge | ||
| // solely because a PR is large. | ||
| const OVERSIZE_OK_LABEL = 'oversized-ok'; | ||
| const labels = (context.payload.pull_request?.labels || []) | ||
| .map((l) => l.name); | ||
| const sizeAcknowledged = labels.includes(OVERSIZE_OK_LABEL); | ||
|
|
||
| // Self-acknowledgement guard: a PR author applying `oversized-ok` | ||
| // to their own PR is not a real maintainer acknowledgement, so | ||
| // keep it visible as a warning rather than treating it as an | ||
| // accepted maintainer signal. | ||
| // | ||
| // The applier is resolved from the issue events timeline, NOT | ||
| // from `payload.sender`. `sender` only names the applier on | ||
| // the `labeled` event itself; on a later `synchronize` the | ||
| // label is still on the PR but `sender` is just the pusher. | ||
| // A `sender`-only check can misclassify the label on later runs: | ||
| // self-apply the label, then push a commit — the synchronize run | ||
| // sees `sizeAcknowledged` true while `sender` is just the pusher. | ||
| // Resolving the actual applier makes the warning accurate on | ||
| // every event. | ||
| let acknowledgementApplier = null; | ||
| if (sizeAcknowledged) { | ||
| const events = await github.paginate( | ||
| github.rest.issues.listEvents, | ||
| { | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| issue_number: context.issue.number, | ||
| per_page: 100, | ||
| }, | ||
| (response) => response.data, | ||
| ); | ||
| // The label can be removed and re-added; the most recent | ||
| // `labeled` event for `oversized-ok` is the application | ||
| // that currently holds. | ||
| for (const ev of events) { | ||
| if ( | ||
| ev.event === 'labeled' && | ||
| ev.label && | ||
| ev.label.name === OVERSIZE_OK_LABEL | ||
| ) { | ||
| acknowledgementApplier = ev.actor ? ev.actor.login : null; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| const prAuthor = context.payload.pull_request?.user?.login || null; | ||
| const selfAcknowledged = | ||
| sizeAcknowledged && | ||
| acknowledgementApplier !== null && | ||
| prAuthor !== null && | ||
| acknowledgementApplier === prAuthor; | ||
| if (selfAcknowledged) { | ||
| core.warning( | ||
| `\`${OVERSIZE_OK_LABEL}\` was applied by the PR author ` + | ||
| `(@${acknowledgementApplier}); this is recorded as a self-acknowledgement, ` + | ||
| 'not a maintainer acknowledgement.' | ||
| ); | ||
| } | ||
| // `sizeAcknowledged` true but applier unresolved (events API returned | ||
| // no matching `labeled` event — e.g. the label predates event | ||
| // retention). Honor the label but flag it, so the acknowledgement | ||
| // stays auditable instead of being trusted silently. | ||
| if (sizeAcknowledged && acknowledgementApplier === null) { | ||
| core.warning( | ||
| `Could not resolve who applied the \`${OVERSIZE_OK_LABEL}\` ` + | ||
| 'label from the issue events timeline; honoring the acknowledgement ' + | ||
| 'but it could not be verified as maintainer-applied.' | ||
| ); | ||
| } | ||
|
|
||
| core.info( | ||
| `PR size: ${totalChanges} meaningful lines across ${totalFiles} files ` + | ||
| `(reviewability threshold at ${REVIEWABILITY_THRESHOLD}, warn at ${WARN_THRESHOLD}; lockfile/docs/snapshot/generated excluded).` | ||
| ); | ||
|
|
||
| if (totalChanges > REVIEWABILITY_THRESHOLD && sizeAcknowledged && !selfAcknowledged) { | ||
| // Name the applier (resolved from the events timeline above, | ||
| // so it works on every event, not just `labeled`) to keep an | ||
| // audit trail in the job log instead of relying only on | ||
| // GitHub's separate label event log. | ||
| const applier = acknowledgementApplier ? ` (applied by @${acknowledgementApplier})` : ''; | ||
| const acknowledgementText = acknowledgementApplier | ||
| ? 'a maintainer has consciously acknowledged the size' | ||
| : 'the label records explicit size acknowledgement'; | ||
| core.warning( | ||
| `PR is over the size threshold (${totalChanges} > ${REVIEWABILITY_THRESHOLD} ` + | ||
| `meaningful lines) but carries the \`${OVERSIZE_OK_LABEL}\` label${applier} — ` + | ||
| `${acknowledgementText}. Merge is allowed.` | ||
| ); | ||
| } else if (totalChanges > REVIEWABILITY_THRESHOLD) { | ||
| core.warning( | ||
| `PR is over the size threshold: ${totalChanges} meaningful lines across ${totalFiles} files ` + | ||
| `(threshold: ${REVIEWABILITY_THRESHOLD}). Merge is allowed, but this PR may be hard to review.\n\n` + | ||
| 'Consider splitting future large PRs by module/package or by concern. ' + | ||
| `If this PR is large but cohesive, a maintainer can apply the ` + | ||
| `\`${OVERSIZE_OK_LABEL}\` label to record explicit acknowledgement. ` + | ||
| 'Lockfile, docs, snapshot, and generated-file changes are already excluded from this count.' | ||
| ); | ||
| } else if (totalChanges > WARN_THRESHOLD) { | ||
| core.warning( | ||
| `PR is large (${totalChanges} lines). Merge is allowed, but consider splitting ` + | ||
| 'next time for easier review.' | ||
| ); | ||
| } | ||
|
yiliang114 marked this conversation as resolved.
|
||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P2]
labeled/unlabeledtriggers combined withcancel-in-progress: true(line 32) mean that adding/removing any unrelated label (e.g.,bug) cancels the runningpr-sizecheck and starts a new run wherepr-sizeis immediately skipped (theif:guard filters non-oversized-oklabels). During the window between cancellation and the new run completing, thePR Sizerequired check may show as cancelled/pending, temporarily blocking merge.Fix: Either remove
labeled/unlabeledfrom the workflow-level trigger and use a repository webhook to re-dispatch only onoversized-okchanges, or use a separate concurrency group for thepr-sizejob that is not cancelled by unrelated label events.— automated via Qwen Code /review