ci(review): bundled PR review + incremental cache wiring#4320
ci(review): bundled PR review + incremental cache wiring#4320yiliang114 wants to merge 46 commits into
Conversation
…orkflow The comment claimed the pipeline "also strips quoted blocks" but it only truncates to 2KB. Update the comment to match actual behavior. Co-authored-by: Copilot <copilot-pull-request-reviewer@github.com>
The bundled `/review` skill is invoked from a `pull_request_target` workflow with `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and a `GITHUB_TOKEN` that can write to issues and pull requests. PR diffs, descriptions, trigger comments, and `QWEN_REVIEW_ADDITIONAL_INSTRUCTIONS` are all attacker-controllable, so the only guardrail was the prompt itself asking the agent to behave. Tighten the contract and reduce the gates' ability to false-positive on legitimate contributors. SKILL.md (Step 3.0): - Add an explicit `--ci` safety contract enumerating disallowed binaries (npm/npx/pnpm/yarn/node/python/cargo/make/mvn/gradle/bash -c/sh -c/eval), forbidden git/gh write paths, blocked filesystem regions (~/.ssh, ~/.gnupg, /proc, /var, /etc), banned secret echoing, and disallowed gh api repository-mutating endpoints. - Require any prompt-injection attempt embedded in the PR to be surfaced under a dedicated heading in the final review report. SKILL.md (Step 2.5 gates): - Make the product-direction and validation-evidence gates advisory by default. Only the scope gate blocks. Product-direction can opt back into blocking by adding `product-direction-gate: blocking` to `.qwen/review-rules.md`. - Add a contributor-friendly comment template for blocking gates so a bot-generated process comment is clearly labeled as automated and invites a maintainer reply. SKILL.md (frontmatter): - Document that the workflow's `--core-tools` flag and the `allowedTools` list must stay in sync. `.qwen/review-rules.md`: - Add a Precedence section so project rules override per-agent default heuristics and `QWEN_REVIEW_ADDITIONAL_INSTRUCTIONS` cannot override the safety contract. - Reflect the advisory default for product-direction and validation-evidence gates.
…ew-action # Conflicts: # .github/workflows/qwen-code-pr-review.yml
Workflow: - Move OPENAI_API_KEY and OPENAI_BASE_URL out of job-level env so the npm install / build / bundle step cannot read them via dependency postinstall scripts. They now live only on the `Run bundled Qwen PR review` step. Docs (code-review.md): - Add a "CI Mode (--ci)" section that explains the static-only safety contract, non-interactive behavior, treat-as-data handling of PR content, dry-run vs comment mode, and the OWNER/MEMBER/COLLABORATOR trigger boundary on pull_request_target opened. - Add a "Review-readiness gates (--ci only)" subsection that documents each gate's default behavior and how to opt the product-direction gate into blocking via `.qwen/review-rules.md`.
…runs Three documentation-consistency fixes spotted in the post-merge integration review. SKILL.md Step 3.0: - The "MAY use" allowlist named "gh pr review, gh pr comment" as the Step 9 posting path, but Step 9 actually submits via the Create Review API: `gh api repos/<owner>/<repo>/pulls/<n>/reviews --input <file>`. A `--ci` agent reading the contract literally could refuse the real call. Replace with the actual `gh api` invocation and an explicit "at most one review per run" cap. - Replace the misleading reference to a "cleanup comment defined in Step 11" — Step 11 (worktree cleanup) does not post anything. Move the process comment allowance to Step 2.5 with a "blocking gate only" qualifier. .qwen/review-rules.md Product Direction: - The advisory-default rule I added contradicted the older "should usually produce a process comment" line. An advisory concern goes inside the Step 9 review body (one review), not as a separate `gh pr comment`. Reconcile so the advisory and blocking paths are unambiguous.
…m source Replace the manual npm ci + build + bundle + preflight steps with the published composite action. Revert the bundled review SKILL.md to vanilla (removing all --ci safety-contract additions that belong at the CLI/action level, not in the skill). Drop smoke mode and the preflight script since qwen-code-action handles its own setup validation.
…t SKILL.md Checkout main branch so the bundled /review skill enters normal (not lightweight) mode — it can find the local git remote, load project rules, and run linters. Move review-rules.md from .qwen/ to .github/ so it doesn't affect local /review runs; copy it to .qwen/ in CI only.
…y error message - Add sed '/./,$!d' to remove leading empty lines when @qwen /review is on its own line with follow-up instructions on subsequent lines. - Clarify the QWEN_PR_REVIEW_MODEL error message to note it maps to the OPENAI_MODEL env var.
Add pull_request_target types (reopened, ready_for_review) so draft-to-ready and reopen automatic review. Add edited types for issue_comment and pull_request_review_comment so editing a comment to include @qwen /review triggers re-review. Update the job-level if condition to match the new actions.
When workflow_dispatch runs from the PR branch before merge, checkout main does not have .github/review-rules.md yet. Guard with -f check instead of failing the step.
…ck and awk extraction - Move .github/review-rules.md → .qwen/review-rules.md and un-gitignore it. The /review skill loads it directly; no CI-time copy step needed. - Replace sed extraction with awk match() for correct @qwen /review boundary handling across all comment events. - Add should_run_review flag: skip review entirely when @qwen /review is absent from comment/review body events. - Add cross-repository check: block automated review on fork PRs since the workflow runs with review credentials and may install head-branch deps. - Pin qwen-code-action to a specific commit instead of @main.
…ck and awk extraction - Move .github/review-rules.md → .qwen/review-rules.md and un-gitignore it. The /review skill loads it directly; no CI-time copy step needed. - Replace sed extraction with awk match() for correct @qwen /review boundary handling across all comment events. - Add should_run_review flag: skip review entirely when @qwen /review is absent from comment/review body events. - Add cross-repository check: block automated review on fork PRs since the workflow runs with review credentials and may install head-branch deps. - Pin qwen-code-action to a specific commit instead of @main.
- Document blocking vs advisory gate behavior explicitly: blocking gates stop the review and post a process comment; the PR stays open and the author can re-trigger with @qwen /review. - Change product-direction from advisory (flag only) to blocking (stop and ask for design rationale) since clearly off-target features should be gated before deep code review. - Add a gate-defaults table with override keys for each gate.
Add docs/design/code-review/ covering the PR review automation system this branch is building toward: - code-review-design.md (569 lines): problem statement, design principles, 4-stage workflow pipeline, Design Gate spec with PR shape generation and fail modes, Feature PR Readiness Gate, author/maintainer feedback loop with override, historical PR/issue detection, incremental cache wiring, App integration plan, testing strategy, risks. - roadmap.md (179 lines): 7-phase rollout, each phase scoped to an independent PR with acceptance criteria. - compare.md (86 lines): capability comparison vs claude-code, coderabbit, copilot review, cursor bugbot, greptile. The design treats Direction/Scope/History/Validation as workflow preflight gates separate from bundled /review, so direction issues are decided in the first 30s rather than after a full deep pass. Anchors are existing repo artifacts (roadmap.md, architecture.md, docs/design/*) and historical close comments (#3863, #3627), not new team-policy docs.
Listen to pull_request_target.synchronize so author pushes automatically retrigger review, and persist .qwen/review-cache/ via actions/cache so the bundled /review skill's incremental review path can scope subsequent runs to the new commit range instead of evaluating the full PR every time. Cache key uses PR head SHA from gh pr view --json headRefOid, not github.sha, because in pull_request_target context github.sha points at the base branch. Restore is gated on synchronize only so manual @qwen /review and workflow_dispatch keep their force-rerun semantics. Save runs on any pull_request_target event with a successful review. Implements docs/design/code-review/roadmap.md Phase 2.
…ontamination The previous restore-keys prefix `qwen-review-<pr#>-` could still match an older cache after the PR's base ref changed (Update branch from base, or base retarget). Bundled /review would then read the stale lastCommitSha and compute git diff <oldHead>..<newHead>, which after a base-merging "Update branch" includes upstream commits that the PR did not author. Embed both baseRefOid and headRefOid in the cache key. The exact key becomes qwen-review-<pr#>-<base_sha>-<head_sha> and restore-keys narrows to qwen-review-<pr#>-<base_sha>-. A base change now naturally invalidates prior caches for the same PR and forces a full review on the next synchronize. Update docs/design/code-review/code-review-design.md and roadmap.md Phase 2 spec + acceptance criteria to match. Caught by /codex:review (P2 finding).
…lication Two correctness fixes to the Phase 2 cache wiring, both caught by /codex:review. 1. Cache discriminator must be the PR's merge base, not baseRefOid. When the base ref tip has not moved between two reviews but the PR author clicks "Update branch from base", baseRefOid stays the same, the restore-keys prefix still matches the prior cache, and the bundled /review skill computes git diff <oldHead>..<newHead> across upstream commits the PR did not author. The merge base advances on Update branch, rebase, and base retarget — exactly the boundaries cache must invalidate on. Compute merge_base via gh api compare and use it in the cache key. 2. Save cache only after the review summary comment is published. bundled /review can succeed (model output captured) and yet the subsequent gh pr comment can fail (rate-limit, network, deleted PR). If save advances the cache before the comment lands, the next synchronize either short-circuits on "No new changes" or reviews only the later diff, and the unpublished findings are lost forever. Gate save on steps.post-summary.outcome == 'success' so cache advancement tracks comment delivery, not just model success. Move the save step to after the post-summary step. Update docs/design/code-review/code-review-design.md and roadmap.md Phase 2 spec + acceptance criteria to reflect both invariants.
The compare endpoint can fail to resolve fork head SHAs in the base repo's namespace, which would let `set -euo pipefail` abort the size step before the existing is_cross_repository handler has a chance to post the fork-rejection comment. Guard the merge_base computation on `is_cross_repository != "true"` and let the cross-repo branch carry an empty merge_base_sha through the rest of the step. Forks set should_review=false anyway, so they never reach the cache restore/save steps where merge_base would be needed. Caught by /codex:review (P2 finding).
…atched ref Phase 1-3 scoped branch (split from the full Phase 1-5 work). - Remove the cross-repository / fork rejection gate and all is_cross_repository / head_owner / head_repo handling. fork PRs are no longer auto-blocked. (Security note: pull_request_target still checks out trusted base code; this only removes the up-front rejection.) - merge-base resolution is now best-effort and non-fatal: fork SHAs that the compare endpoint can't resolve no longer abort the job — the run falls back to a full (non-incremental) review. - workflow_dispatch now checks out the dispatched ref instead of hardcoded main, so pre-merge dry-run actually exercises the dispatched branch (pull_request_target still pinned to main).
Trim code-review-design.md from 597→~165 lines: keep problem statement, P1/P5 principles, triggers/permissions (no fork gate), the Phase 1-3 pipeline, the real Phase 2 incremental-cache design, testing, and Phase-1-3 risks. Move Design Gate / Feature Readiness / Override / history awareness / bundled-skill changes / R1-R6 out — they ship with their own Phase 4-7 PRs. roadmap.md: keep Phase 1-3 full, compress Phase 4-7 to stubs, drop the now-removed cross-repo gate from Phase 1 scope. compare.md: add a scope note that the 目标 column is the full Phase 1-7 end state; fix the fork row to reflect base-checkout isolation instead of an explicit fork rejection gate.
…w-rules Phase 1-3 has no Design Gate; the old 'Gate Behavior' section described CI preflight mechanics (process comments, stop/re-trigger) that don't exist in this PR and would leak process-flavored findings into local /review runs. Keep this file to review content + finding severity only.
📋 Review SummaryThis PR implements Phase 1-3 of the automated code review roadmap, switching to the bundled 🔍 General Feedback
🎯 Specific Feedback🔴 Critical
🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
…current safeguard Make explicit that the OWNER/MEMBER/COLLABORATOR trigger gate is a deliberate current-phase choice (pull_request_target + secrets + long LLM deep review = denial-of-wallet surface if opened), not an oversight. External PRs remain reviewable via a maintainer `@qwen /review` comment; broader community auto-trigger with per-author rate limiting is deferred to a later phase.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
The summary comment had no provenance line. Append a footer '_Reviewed by <model> via Qwen Code /review (automated). Reply @qwen /review to re-run._' so every posted review is attributed to the configured model and visibly distinct from other CI bots.
…n testing Dry-run validation showed a full (no-cache) 9-agent deep review on an ~800-line PR running ~30min and getting killed by timeout-minutes: 30 (GitHub reports the timed-out job as 'cancelled'). Incremental reviews are far shorter. Bump to 60min and correct the design doc's Stage 2 cost estimate accordingly.
… config Root cause of the 30-min zero-output hangs in dispatch testing: the Run Qwen Code Review step used an old action SHA (a08dc886), lowercase inputs, no settings_json and no GITHUB_TOKEN, so the bundled /review skill had no shell/write tools and no gh auth and stalled with no TTY until the job timeout. Adopt the enabling config from the deployed working job on main: action @5fd6818d, uppercase OPENAI_* inputs, settings_json enabling run_shell_command/write_file + sandbox:false, and GITHUB_TOKEN env. Keep REVIEW_OPENAI_* credential isolation and the bundled `/review <url>` prompt (this PR's purpose) unchanged.
…review can run Dry-run testing root-caused the 30-min zero-output hang: the settings_json copied from main pinned coreTools to [run_shell_command, write_file]. main's ad-hoc reviewer prompt only used those two so it worked, but the bundled /review skill also needs read_file/glob/grep/task; with that allowlist they require approval, and in CI (non-interactive, no TTY) the agent waits forever for an approval that never comes (log: 'Tool read_file requires user approval but cannot execute in non-interactive mode'). Remove the coreTools restriction (expose the full tool set the skill needs) and add security.folderTrust.enabled:false + tools.approvalMode yolo so the action's --yolo is not silently downgraded to default.
…es >=22) Dry-run logs showed npm EBADENGINE: @qwen-code/qwen-code@0.15.11 requires Node >=22 but the runner had Node v20.20.2, and qwen-code-action does not set up Node. Add actions/setup-node@v4 (node 22) before the review step.
…-action The action wraps qwen in `$(qwen ...)` command substitution, buffering all stdout until exit, so a slow/stuck bundled /review was completely unobservable for the whole job (root of the multi-cycle debugging pain). Install @qwen-code/qwen-code@latest globally and invoke qwen ourselves, piping through tee so progress streams to the live job log in real time; bound it with `timeout 50m` so a stall fails fast with a clear error instead of a silent job-timeout kill. settings.json mirrors the action's (folder trust off + yolo + sandbox off).
…l review to log without posting - Replace QwenLM/qwen-code-action with a direct global npm install + qwen invocation piped through tee, so the real review streams into the job log in real time (action's $(...) capture made it opaque). - Cache ~/.npm so repeat installs are offline-fast. - Dry-run appends an explicit no-post directive: the model produces the full review to stdout (visible in the Actions log) but posts nothing to GitHub. timeout 50m fails a stall fast and visibly.
…doesn't stream) Plain `qwen | tee` never streamed: qwen is a Node CLI and Node full-buffers stdout when it's a pipe, so the live job log stayed empty until exit (same opacity as the action's $(...)). Use qwen's purpose-built --output-format stream-json --include-partial-messages so every event is written as it happens and tee shows real-time progress in the Actions log. Parse the final assistant text out of the stream for a readable summary output/artifact.
…consolidate audit) Decisive dry-run evidence: a heavy PR (#4308, run 26101338786) ran solo (no endpoint contention) yet still hit the 50m timeout — the full bundled /review (9 agents incl. Agent 7 = npm ci + whole-monorepo build/test, redundant with the repo's own CI) cannot finish heavy PRs in CI. Steer the skill via the dry-run prompt to skip Agent 7 / npm ci / build / linter and consolidate the 6a/6b/6c personas into one audit. Prompt-level only; bundled SKILL.md unchanged.
✅ Phase 1-3 dry-run validation — resultsValidated the bundled
Criteria met: npm install + Engineering notes (honest):
Phase 1-3 wiring (triggers, author gate, size gate, fork-gate removal, |
…or all runs [skip ci] A-tier: every PR review must emit a '## Validation Evidence' PRESENT/MISSING section + a verbatim advisory/comment-only line that also tells the author how to re-trigger (the bot never approves; editing the PR description alone does not re-run it in this phase). Generalize the CI-lightweight steering to comment mode too so a real comment-mode run can actually finish and post (skill SKILL.md unchanged; prompt-level only). [skip ci]
…iggers [skip ci] These two triggers only let `@qwen /review` be typed inside an inline code-review comment or a submitted review body — fully redundant with the PR conversation comment path (issue_comment). On a busy repo every inline review comment / submitted review spawned a workflow run that the job if: immediately skipped, flooding the Actions list with skipped runs. Remove the on: entries + their if: branches; tidy the now-unreachable resolve-context case branch. Auto-review (pull_request_target), `@qwen /review` PR comments (issue_comment) and workflow_dispatch are unchanged. [skip ci]
Scope: Phase 1-3 only
Splits the original full-scope work (#4067) into the maintainable first slice per
docs/design/code-review/roadmap.md.This PR delivers:
QwenLM/qwen-code-action(bundled/reviewskill), add.qwen/review-rules.md,--output-format json/--channel=CI/ size gate / fallback comment.workflow_dispatchchecks out the dispatched ref so pre-merge dry-run works.pull_request_target.synchronizetrigger + cross-run incremental review cache keyed on merge-base + head SHA; cache save gated on summary-comment publication.docs/design/code-review/{code-review-design,roadmap,compare}.md, scoped to Phase 1-3.Not in this PR (separate PRs): Design Gate (Phase 4), history awareness (Phase 5), round suppression (Phase 6), GitHub App identity (Phase 7).
No fork gate: cross-repository PRs are reviewed; the
pull_request_targetsecurity boundary is the trusted-base checkout (PR head is never run with secrets on auto-triggers).Validation
should_reviewset on all shell paths; local review pass clean.workflow_dispatchdry-run links + a self-review run added as comments below.