Comprehensive e2e testing for Component Readiness with abstracted data backends #3411

stbenjam wants to merge 16 commits into openshift:main
Conversation
…testing

When refactoring Component Readiness with AI assistance, the lack of tests made it difficult to verify changes didn't break anything. Adding comprehensive e2e tests made the refactor dramatically more reliable -- the AI could detect and fix regressions as it worked.

Component Readiness was tightly coupled to BigQuery. The first step was extracting a DataProvider interface, then adding implementations behind it. A mock provider with static fixtures came first, but was replaced with a PostgreSQL provider because: 1) we'd eventually like to run CR against postgres in production, 2) postgres is already available in e2e infrastructure, and 3) a real database lets us fully test triage, regressions, and other write-path features.

The e2e suite now seeds realistic synthetic data (4800 job runs across 4 releases with deterministic pass/fail rates), launches the full API server against it, and runs 70+ tests covering component reports, test details, variants, regressions, triage workflows, and the regression tracker. This also enables future UI testing with tools like Playwright or headless Chromium against a fully functional API.

Changes:
- Add pgprovider implementing the ComponentReadinessDataProvider interface
- Add seed-data command generating synthetic prow job runs and test results
- Rewrite e2e tests to use the postgres provider exclusively (BigQuery no longer needed for e2e)
- Add e2e tests for test_details, variants, regressions, columnGroupBy/dbGroupBy, and variant cross-compare endpoints
- Fix triage e2e tests to work with real seed-data regressions instead of fragile cache injection (delete E2ECacheManipulator)
- Fix test isolation in regressiontracker tests (scoped cleanup)
- Add optional data sync test (when GCS_SA_JSON_PATH is set) to verify prow/GCS loading still works
- Add server metrics for postgres query performance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
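The extraction described above can be sketched roughly as follows. This is a hypothetical illustration of the pattern, not the actual Sippy interface: report code depends only on a small interface, and the BigQuery, mock, or seeded-Postgres backends are swapped in behind it. All names here are illustrative.

```go
package main

import (
	"context"
	"fmt"
)

// TestStatus is a stand-in for the per-test aggregates a report needs.
type TestStatus struct {
	Total, Success int
}

// DataProvider is the seam: report generation calls this interface,
// never a concrete backend.
type DataProvider interface {
	QueryTestStatus(ctx context.Context, release string) (map[string]TestStatus, error)
}

// fixtureProvider plays the role of the seeded-postgres backend in e2e:
// deterministic data, no external services required.
type fixtureProvider struct{}

func (fixtureProvider) QueryTestStatus(_ context.Context, release string) (map[string]TestStatus, error) {
	return map[string]TestStatus{
		"test-a": {Total: 100, Success: 95},
	}, nil
}

func passRate(s TestStatus) float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Success) / float64(s.Total)
}

func main() {
	var p DataProvider = fixtureProvider{}
	statuses, err := p.QueryTestStatus(context.Background(), "4.18")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.2f\n", passRate(statuses["test-a"]))
}
```

With deterministic fixture rates like these, e2e assertions on report output become exact rather than fuzzy.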
Walkthrough: Adds a dataprovider abstraction for Component Readiness with BigQuery and Postgres implementations, and migrates pipeline types from BigQuery-specific to …
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: stbenjam
/test e2e
- Fix gofmt formatting in interface.go, provider.go, types.go, releasefallback.go, report_test.go
- Fix gosec G204: add #nosec for exec.Command in e2e test
- Fix gosec G306: use 0o600 permissions for WriteFile
- Fix gocritic typeAssertChain: rewrite to type switch
- Fix gocyclo: extract seedProwJobs, seedTestsAndOwnerships, seedJobRunsAndResults, seedRunsForJob from seedSyntheticData

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/api/componentreadiness/query/querygenerators.go (1)
783-793: ⚠️ Potential issue | 🔴 Critical

Handle `time.Time` for `last_failure` too.

`prowjob_start` (lines 1057–1062) safely handles both `civil.DateTime` and `time.Time` with a type assertion, but `last_failure` (lines 783–793) hard-casts to `civil.DateTime` only. This will panic if BigQuery returns `time.Time`, breaking component report generation.

Suggested fix:

     case col == "last_failure":
-        // ignore when we cant parse, its usually null
-        var err error
-        if row[i] != nil {
-            layout := "2006-01-02T15:04:05"
-            lftCivilDT := row[i].(civil.DateTime)
-            cts.LastFailure, err = time.Parse(layout, lftCivilDT.String())
-            if err != nil {
-                log.WithError(err).Error("error parsing last failure time from bigquery")
-            }
-        }
+        if row[i] != nil {
+            switch v := row[i].(type) {
+            case civil.DateTime:
+                cts.LastFailure = time.Date(v.Date.Year, v.Date.Month, v.Date.Day, v.Time.Hour, v.Time.Minute, v.Time.Second, v.Time.Nanosecond, time.UTC)
+            case time.Time:
+                cts.LastFailure = v.UTC()
+            default:
+                return "", crstatus.TestStatus{}, fmt.Errorf("unexpected last_failure type %T", row[i])
+            }
+        }
cmd/sippy/component_readiness.go (1)
202-213: ⚠️ Potential issue | 🟠 Major

Choose the CR provider once instead of hard-wiring BigQuery here.

Line 202 and Line 213 both create a BigQuery provider directly. That means this entrypoint still has no way to run Component Readiness off the new postgres provider, and the server/metrics paths use separate provider instances. Build the backend-specific provider once and pass the same object to `NewServer` and `RefreshMetricsDB`.
🧹 Nitpick comments (3)
pkg/apis/api/componentreport/crstatus/types.go (1)
56-57: Mark synthesized fields as ignored by BigQuery.

`TestKey` and `TestKeyStr` are local bookkeeping fields, not query columns. Adding `bigquery:"-"` keeps this shared DTO explicit and avoids relying on implicit mapper behavior.

Suggested fix:

-	TestKey    crtest.KeyWithVariants `json:"test_key"`
-	TestKeyStr string                 `json:"-"` // transient field so we dont have to keep recalculating
+	TestKey    crtest.KeyWithVariants `json:"test_key" bigquery:"-"`
+	TestKeyStr string                 `json:"-" bigquery:"-"` // transient field so we dont have to keep recalculating

As per coding guidelines, BigQuery struct fields must have `bigquery:"column_name"` tags that exactly match the BigQuery table column names.
cmd/sippy/automatejira.go (1)
173-176: Include the job-variant query errors in the returned failure.

Line 174 already gives you the backend errors, but Line 176 discards them. Returning only `"failed to get job variants"` will make provider rollout failures very hard to diagnose from automation logs.
pkg/componentreadiness/jiraautomator/jiraautomator.go (1)
73-77: Validate `provider` in the constructor since report generation now depends on it.

The constructor guards `bqClient`, but `getComponentReportForView` now relies on `j.dataProvider`. Adding a nil-check here gives earlier, clearer failures.

Suggested fix:

 func NewJiraAutomator(
 	jiraClient *jira.Client,
 	bqClient *bqclient.Client,
 	provider dataprovider.DataProvider,
 @@
 	j := JiraAutomator{
 @@
 	}
+	if provider == nil {
+		return j, fmt.Errorf("we don't have a component readiness data provider for jira integrator")
+	}
 	if bqClient == nil || bqClient.BQ == nil {
 		return j, fmt.Errorf("we don't have a bigquery client for jira integrator")
 	}

Also applies to: 89-106, 155-156
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sippy/load.go`:
- Around line 321-323: The code is hard-coding a BigQuery provider when calling
componentreadiness.NewRegressionTracker; replace
bqprovider.NewBigQueryProvider(bqc) with the actual configured DataProvider
instance used by the API (the variable or factory that constructs the backend
provider) so NewRegressionTracker receives the same DataProvider backend the API
is using (ensure the replacement implements the DataProvider interface and
remove the bqprovider call from the argument list).
In `@cmd/sippy/seed_data.go`:
- Around line 518-565: The bug is that the global testableRuns value reduces
every test's inserted rows by 2, stealing rows when counts.total == runCount;
fix by computing assignment limits per-test instead of using the single
testableRuns variable—introduce reservedInfraRuns (2) and inside the loop over
syntheticTests use runsToAssign := min(counts.total, runCount -
reservedInfraRuns) (ensuring runCount - reservedInfraRuns is non-negative) and
iterate i < runsToAssign when inserting prow_job_run_tests; update references to
testableRuns and remove the global reduction so counts.total is honored
correctly per test.
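The per-test cap suggested in that finding can be sketched as follows. This is an illustrative reconstruction, not the actual seed code; the constant and numbers are hypothetical.

```go
package main

import "fmt"

// reservedInfraRuns models the two runs held back for synthetic
// infrastructure failures.
const reservedInfraRuns = 2

// runsToAssign caps each test's inserted rows individually, instead of
// shrinking a single global "testableRuns" value and silently stealing
// rows from tests whose counts.total equals runCount.
func runsToAssign(countsTotal, runCount int) int {
	limit := runCount - reservedInfraRuns
	if limit < 0 {
		limit = 0
	}
	if countsTotal < limit {
		return countsTotal
	}
	return limit
}

func main() {
	// A test wanting all 30 runs is capped at 28 (2 reserved for infra),
	// while a test wanting only 10 still gets its full 10.
	fmt.Println(runsToAssign(30, 30), runsToAssign(10, 30))
}
```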
In `@cmd/sippy/serve.go`:
- Around line 169-172: The crash is caused by s.variantManager being nil when
BigQuery is not used; update jsonJobRunRiskAnalysis to guard calls to
s.variantManager (e.g., before calling s.variantManager.IdentifyVariants(...)
check if s.variantManager == nil) and either return a clear 400/500 response or
skip variant identification logic when nil. Alternatively, ensure variantManager
is always initialized by setting variantManager =
testidentification.NewNoopVariantManager() (or similar) in the serve setup when
bigQueryClient is nil; reference the variantManager variable, GetVariantManager
function, and the jsonJobRunRiskAnalysis / s.variantManager.IdentifyVariants
call to locate where to add the nil-check or default initialization.
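The noop-default idea from that serve.go finding looks roughly like this. A minimal sketch with hypothetical type names, not the actual Sippy `testidentification` package:

```go
package main

import "fmt"

// VariantManager is a stand-in for the interface handlers call.
type VariantManager interface {
	IdentifyVariants(jobName string) []string
}

// noopVariantManager returns no variants; it exists so handlers such as
// the risk-analysis endpoint never dereference a nil manager when
// BigQuery is not configured.
type noopVariantManager struct{}

func (noopVariantManager) IdentifyVariants(string) []string { return nil }

// newServer installs a safe default when no real manager is available.
func newServer(vm VariantManager) VariantManager {
	if vm == nil {
		vm = noopVariantManager{} // default in Postgres-only mode
	}
	return vm
}

func main() {
	vm := newServer(nil)
	fmt.Println(len(vm.IdentifyVariants("periodic-e2e-aws")))
}
```

The alternative (a nil-check at every call site) works too, but a noop default centralizes the decision in one place.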
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 236-250: The query builds queryString in provider.go and currently
defines successful_runs as COUNTIF which can overcount when multiple rows share
a build id; change successful_runs to dedupe by build id just like total_runs —
e.g., replace the raw COUNTIF with a COUNT(DISTINCT CASE WHEN jobs.prowjob_state
= 'success' THEN jobs.prowjob_build_id END) (or equivalent DISTINCT COUNT over
build id filtered to success) so PassRate cannot exceed 100%; update the SELECT
expression for successful_runs in the queryString construction accordingly.
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go`:
- Around line 343-345: fetchJobVariants currently swallows query errors and
returns an empty map which makes queryTestStatus silently drop rows; change
fetchJobVariants to return (map[int64]Variant, error) (or propagate an error)
and update callers like queryTestStatus to check the error and return it instead
of proceeding with an empty map; ensure error handling paths log context and
return a non-nil error to callers so callers surface the DB failure rather than
producing an empty report (update any other call sites of fetchJobVariants
accordingly, e.g., the similar block at the 397-418 region).
- Around line 136-184: QueryReleases currently computes GADate from time.Now()
using releaseMetadata and drops PreviousRelease for names missing from
releaseMetadata; replace the moving-clock logic by using a deterministic
reference (e.g., a provider-level injected now func or a fixed referenceDate
constant on PostgresProvider instead of time.Now()) or store absolute GA dates
in releaseMetadata so GADate is stable, and ensure PreviousRelease is derived
when releaseMetadata lacks an entry by using the sorted releaseNames order (set
rel.PreviousRelease from the prior element in releaseNames). Update
QueryReleases to use the injected/reference time and fallback previous-release
logic so GADate and PreviousRelease are deterministic across calls.
In `@pkg/sippyserver/server.go`:
- Around line 412-414: The ComponentReadinessCapability is being set too
broadly; update the capability logic so routes that require BigQuery are only
advertised when s.bigQueryClient is non-nil and routes that can work with just
the CR data provider are advertised when s.crDataProvider is non-nil.
Concretely, change the single check around s.bigQueryClient || s.crDataProvider
to two checks (or introduce a separate capability): keep
ComponentReadinessCapability (or a new PostgresOnlyCapability) only when
s.crDataProvider != nil for handlers that work with the CR provider, and ensure
jsonJobRunPayload, jsonPullRequestTestResults,
jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, and
jsonTestLifecyclesFromDB are gated behind s.bigQueryClient != nil so they are
not advertised in Postgres-only mode.
In `@scripts/e2e.sh`:
- Around line 101-105: The cache-priming loop silently ignores failures; update
the VIEWS assignment and loop to fail fast by checking curl and jq exit codes
and aborting on errors: when retrieving VIEWS via VIEWS=$(curl -s
"http://localhost:$SIPPY_API_PORT/api/component_readiness/views") verify curl
succeeded and that jq produced non-empty output (or use pipe with jq -e to cause
non-zero exit), and inside the loop check each curl priming call (the curl to
/api/component_readiness?view=$VIEW) for a non-zero exit and exit the script
immediately (or set -e at the top) if any call fails so the script does not
print "Cache priming complete" on error; reference the VIEWS variable, the for
VIEW in ... loop, and both curl invocations to locate where to add these checks.
- Around line 14-16: Replace the bash-only conditional using [[ in the e2e.sh
script: locate the check that references GCS_SA_JSON_PATH and change the test to
the POSIX-compatible form using [ -z "$GCS_SA_JSON_PATH" ] (ensure proper
spacing around brackets and keep the existing echo and redirection unchanged) so
the script works with /bin/sh.
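Both e2e.sh findings combine into one fail-fast shape. The sketch below stubs out the network call so it runs anywhere; in the real script `fetch_views` would be the `curl -sf ... | jq -r` pipeline, and the loop body would be the priming `curl`. Endpoint paths and variable names are taken from the findings, not verified against the script.

```shell
#!/bin/sh
# Fail fast on any error or unset variable (POSIX sh, no bashisms).
set -eu

fetch_views() {
    # stand-in for: curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness/views" | jq -r '.[].name'
    printf '%s\n' "4.18-main" "4.19-main"
}

# POSIX-compatible emptiness check: [ -z ... ] works under /bin/sh,
# unlike the bash-only [[ ... ]] form flagged in the review.
GCS_SA_JSON_PATH="${GCS_SA_JSON_PATH:-}"
if [ -z "$GCS_SA_JSON_PATH" ]; then
    echo "GCS_SA_JSON_PATH not set; data sync test will be skipped" >&2
fi

VIEWS=$(fetch_views) || { echo "failed to fetch views" >&2; exit 1; }
[ -n "$VIEWS" ] || { echo "no views returned" >&2; exit 1; }

for VIEW in $VIEWS; do
    # real script: curl -sf ".../api/component_readiness?view=$VIEW" >/dev/null || exit 1
    echo "primed $VIEW"
done
echo "Cache priming complete"
```

With `set -e` plus explicit checks on the substitution, the script can no longer print "Cache priming complete" after a failed request.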
In `@test/e2e/datasync/datasync_test.go`:
- Around line 22-23: The DB Count calls (e.g., the expression using
dbc.DB.Table("prow_job_runs").Count(&countBefore) and the later count) ignore
the returned gorm result and any error; change them to capture the result (res
:= dbc.DB.Table(...).Count(&countBefore)) and assert res.Error == nil (use
t.Fatalf or t.Fatalff with the error) so DB failures fail the test. For the
external "sippy load" invocation, wrap the command in a context with a timeout
(ctx, cancel := context.WithTimeout(context.Background(), <sensible duration>);
defer cancel()) and run exec.CommandContext(ctx, "sippy", "load", ...) so the
process is killed on timeout; assert the command returned no error and surface
ctx.Err()/command error via t.Fatalf so the test doesn’t hang or silently ignore
failures.
📒 Files selected for processing (41)
- .gitignore
- cmd/sippy/annotatejobruns.go
- cmd/sippy/automatejira.go
- cmd/sippy/component_readiness.go
- cmd/sippy/load.go
- cmd/sippy/seed_data.go
- cmd/sippy/serve.go
- config/e2e-openshift.yaml
- config/e2e-views.yaml
- pkg/api/componentreadiness/component_report.go
- pkg/api/componentreadiness/component_report_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/provider.go
- pkg/api/componentreadiness/dataprovider/interface.go
- pkg/api/componentreadiness/dataprovider/postgres/provider.go
- pkg/api/componentreadiness/middleware/interface.go
- pkg/api/componentreadiness/middleware/linkinjector/linkinjector.go
- pkg/api/componentreadiness/middleware/list.go
- pkg/api/componentreadiness/middleware/regressionallowances/regressionallowances.go
- pkg/api/componentreadiness/middleware/regressiontracker/regressiontracker.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback_test.go
- pkg/api/componentreadiness/query/querygenerators.go
- pkg/api/componentreadiness/regressiontracker.go
- pkg/api/componentreadiness/test_details.go
- pkg/api/componentreadiness/utils/utils.go
- pkg/apis/api/componentreport/crstatus/types.go
- pkg/apis/api/componentreport/testdetails/types.go
- pkg/bigquery/bqlabel/labels.go
- pkg/componentreadiness/jiraautomator/jiraautomator.go
- pkg/componentreadiness/jobrunannotator/jobrunannotator.go
- pkg/dataloader/crcacheloader/crcacheloader.go
- pkg/sippyserver/metrics/metrics.go
- pkg/sippyserver/server.go
- scripts/e2e.sh
- test/e2e/componentreadiness/componentreadiness_test.go
- test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
- test/e2e/componentreadiness/report/report_test.go
- test/e2e/componentreadiness/triage/triageapi_test.go
- test/e2e/datasync/datasync_test.go
- test/e2e/util/cache_manipulator.go
- test/e2e/util/e2erequest.go
💤 Files with no reviewable changes (1)
- test/e2e/util/cache_manipulator.go
cmd/sippy/annotatejobruns.go
Outdated
 	allVariants, errs := componentreadiness.GetJobVariants(ctx, bqprovider.NewBigQueryProvider(bigQueryClient))
 	if len(errs) > 0 {
-		return fmt.Errorf("failed to get variants from bigquery")
+		return fmt.Errorf("failed to get job variants")
 	}
Preserve provider error details when variant loading fails.
Line 185 currently drops all underlying errors, which makes CLI failures hard to debug.
Suggested fix
allVariants, errs := componentreadiness.GetJobVariants(ctx, bqprovider.NewBigQueryProvider(bigQueryClient))
if len(errs) > 0 {
- return fmt.Errorf("failed to get job variants")
+ msgs := make([]string, 0, len(errs))
+ for _, err := range errs {
+ msgs = append(msgs, err.Error())
+ }
+ return fmt.Errorf("failed to get job variants: %s", strings.Join(msgs, "; "))
[Claude] Will fix — wrapping the errors into the returned error message.
@stbenjam, sounds good! For reference, using errors.Join(errs...) (Go 1.20+) would be the most concise approach if available, otherwise the strings.Join approach in the suggestion works well.
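The `errors.Join` approach mentioned above looks like this in practice. A sketch with hypothetical function names, not the actual Sippy code:

```go
package main

import (
	"errors"
	"fmt"
)

// loadVariants stands in for the backend call that can return
// multiple independent errors.
func loadVariants() (map[string][]string, []error) {
	return nil, []error{
		errors.New("query variants: connection refused"),
		errors.New("query variant registry: timeout"),
	}
}

// getJobVariants aggregates the slice into one returned error instead
// of discarding the details.
func getJobVariants() (map[string][]string, error) {
	variants, errs := loadVariants()
	if len(errs) > 0 {
		// errors.Join (Go 1.20+) preserves every underlying error and
		// keeps errors.Is/errors.As unwrapping working through %w.
		return nil, fmt.Errorf("failed to get job variants: %w", errors.Join(errs...))
	}
	return variants, nil
}

func main() {
	_, err := getJobVariants()
	fmt.Println(err)
}
```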
cmd/sippy/seed_data.go

Outdated

-The command can be re-run as needed to add more runs, or because your old job runs
-rolled off the 1 week window.
+The command can be re-run to refresh data.
seed-data can't currently refresh an existing dataset.
Line 62 says reruns refresh data, but Lines 385-392 return as soon as any prow_jobs row exists, and --init-database only migrates schema. A partially seeded database can't be repaired by rerunning this command.
Also applies to: 385-392
[Claude] Skipping — `--init-database` drops and recreates all tables before seeding, which handles the re-run case. The early return prevents accidentally double-seeding on a running instance.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what UpdateSchema does - does it drop tables?
ast-grep --pattern 'func ($_ *DB) UpdateSchema($_) $_ {
$$$
}'
rg -n "UpdateSchema" --type=go -A 10
Length of output: 2161
🏁 Script executed:
#!/bin/bash
# Get the full UpdateSchema implementation
sed -n '68,150p' pkg/db/db.go
Length of output: 2676
@stbenjam, just to flag a factual discrepancy: UpdateSchema uses GORM's AutoMigrate, which is additive-only — it adds missing tables and columns but never drops existing data or tables. So --init-database won't clear rows; re-running seed-data --init-database on a populated database will still hit the early return at Line 387–390 and do nothing.
If the intent is that users should manually drop the database (e.g., dropdb / docker compose down -v) before re-running, that's fine — but the doc string "The command can be re-run to refresh data" would be misleading. It might be worth either correcting the doc string or adding a --force flag that truncates the seeded tables.
 	tracker.Load()
 	if len(tracker.Errors()) > 0 {
 		for _, err := range tracker.Errors() {
 			log.WithError(err).Warn("regression tracker error")
 		}
 	}
 	return nil
Don't hide regression-tracker failures.
This command now seeds regression-tracker state too. Logging tracker.Errors() and still returning nil leaves test_regressions incomplete while the overall seed reports success.
Suggested fix
tracker.Load()
if len(tracker.Errors()) > 0 {
for _, err := range tracker.Errors() {
log.WithError(err).Warn("regression tracker error")
}
+ return fmt.Errorf("regression tracker failed with %d error(s)", len(tracker.Errors()))
}
return nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tracker.Load() | |
| if len(tracker.Errors()) > 0 { | |
| for _, err := range tracker.Errors() { | |
| log.WithError(err).Warn("regression tracker error") | |
| } | |
| } | |
| return nil | |
| tracker.Load() | |
| if len(tracker.Errors()) > 0 { | |
| for _, err := range tracker.Errors() { | |
| log.WithError(err).Warn("regression tracker error") | |
| } | |
| return fmt.Errorf("regression tracker failed with %d error(s)", len(tracker.Errors())) | |
| } | |
| return nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/sippy/seed_data.go` around lines 682 - 688, The regression-tracker errors
are being logged but ignored after calling tracker.Load(), causing the seeding
to report success even when tracker.Errors() contains failures; change this so
the function returns a non-nil error when tracker.Errors() is non-empty: after
tracker.Load() check tracker.Errors(), log each error as now but then aggregate
them (e.g., wrap or join messages) and return that error instead of nil so
callers see the failure (refer to tracker.Load() and tracker.Errors()).
[Claude] Will fix — returning an error when the regression tracker reports failures.
@stbenjam, sounds good! The fix aligns with the suggestion — returning a non-nil error after logging the individual tracker errors will ensure callers see the failure and the incomplete test_regressions state is surfaced clearly.
func (s *Server) getReleases(ctx context.Context, forceRefresh ...bool) ([]sippyv1.Release, error) {
	if s.bigQueryClient != nil {
		refresh := len(forceRefresh) > 0 && forceRefresh[0]
		return api.GetReleases(ctx, s.bigQueryClient, refresh)
	}
	if s.crDataProvider != nil {
		return s.crDataProvider.QueryReleases(ctx)
	}
	return nil, fmt.Errorf("no data source available for releases")
Don't mix release catalogs between request parsing and execution.
This helper now prefers BigQuery, but componentreadiness.GetComponentReport re-queries releases from the provider in pkg/api/componentreadiness/component_report.go Lines 88-93. When both backends are configured, one request can be parsed against BigQuery releases and executed against Postgres releases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sippyserver/server.go` around lines 182 - 190, The bug is mixing release
catalogs between parse and execution when both BigQuery and CR providers exist;
fix by ensuring componentreadiness.GetComponentReport does not re-query its own
releases but uses the same releases retrieved by Server.getReleases. Concretely:
change componentreadiness.GetComponentReport to accept a releases
[]sippyv1.Release (instead of calling s.crDataProvider.QueryReleases or
api.GetReleases internally), update all call sites to pass the releases returned
from Server.getReleases, and remove the internal re-query (the
QueryReleases/api.GetReleases calls) so parsing and execution always use the
identical release slice.
[Claude] Skipping — threading releases through the call chain is a larger refactor. Follow-up.
@stbenjam, understood — threading releases through the call chain is a non-trivial change. Would you like me to open a follow-up GitHub issue to track this so it doesn't get lost?
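For the follow-up, the refactor amounts to a signature change: query releases once, then pass them through. A minimal sketch with simplified stand-in types (`Release`, `getReleases`, and `getComponentReport` are not the PR's real signatures):

```go
package main

import "fmt"

// Release is a reduced stand-in for sippyv1.Release.
type Release struct{ Name string }

// getReleases stands in for Server.getReleases: one authoritative query.
func getReleases() []Release {
	return []Release{{"4.19"}, {"4.18"}}
}

// getComponentReport receives the release slice instead of re-querying
// its own, so parsing and execution always see the same catalog.
func getComponentReport(releases []Release) string {
	return fmt.Sprintf("report over %d releases", len(releases))
}

func main() {
	releases := getReleases() // queried once, at parse time
	fmt.Println(getComponentReport(releases))
}
```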
if s.bigQueryClient != nil || s.crDataProvider != nil {
	capabilities = append(capabilities, ComponentReadinessCapability)
}
ComponentReadinessCapability is now too broad.
This flag also gates handlers that still require s.bigQueryClient (jsonJobRunPayload, jsonPullRequestTestResults, jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, jsonTestLifecyclesFromDB). In postgres-only mode those routes are advertised and pass capability checks, then fail at request time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sippyserver/server.go` around lines 412 - 414, The
ComponentReadinessCapability is being set too broadly; update the capability
logic so routes that require BigQuery are only advertised when s.bigQueryClient
is non-nil and routes that can work with just the CR data provider are
advertised when s.crDataProvider is non-nil. Concretely, change the single check
around s.bigQueryClient || s.crDataProvider to two checks (or introduce a
separate capability): keep ComponentReadinessCapability (or a new
PostgresOnlyCapability) only when s.crDataProvider != nil for handlers that work
with the CR provider, and ensure jsonJobRunPayload, jsonPullRequestTestResults,
jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, and
jsonTestLifecyclesFromDB are gated behind s.bigQueryClient != nil so they are
not advertised in Postgres-only mode.
[Claude] Skipping — splitting ComponentReadinessCapability into provider-specific capabilities is a larger refactor best done in a follow-up.
@stbenjam, understood — acknowledged that this is a larger refactor best handled separately. Would you like me to open a follow-up GitHub issue to track splitting ComponentReadinessCapability into provider-specific capabilities so the affected endpoints (jsonJobRunPayload, jsonPullRequestTestResults, jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, jsonTestLifecyclesFromDB) are not advertised in CR-data-provider-only mode?
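The split suggested above can be sketched as a pure function over the two backend flags; `BigQueryCapability` is a hypothetical name for the new capability, not an identifier from the PR:

```go
package main

import "fmt"

// Capability names. ComponentReadinessCapability exists in the PR;
// BigQueryCapability is an illustrative addition.
const (
	ComponentReadinessCapability = "component_readiness"
	BigQueryCapability           = "big_query"
)

// computeCapabilities sketches the split: CR routes gate on either
// backend, while BigQuery-only handlers get their own capability and
// are never advertised in postgres-only mode.
func computeCapabilities(hasBigQuery, hasCRProvider bool) []string {
	var caps []string
	if hasBigQuery || hasCRProvider {
		caps = append(caps, ComponentReadinessCapability)
	}
	if hasBigQuery {
		caps = append(caps, BigQueryCapability)
	}
	return caps
}

func main() {
	fmt.Println(computeCapabilities(false, true)) // [component_readiness]
	fmt.Println(computeCapabilities(true, true))  // [component_readiness big_query]
}
```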
scripts/e2e.sh
Outdated
 if [[ -z "$GCS_SA_JSON_PATH" ]]; then
-  echo "Must provide path to GCS credential in GCS_SA_JSON_PATH env var" 1>&2
-  exit 1
+  echo "WARNING: GCS_SA_JSON_PATH not set, data sync test will be skipped" 1>&2
 fi
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
sed -n '1,18p' scripts/e2e.sh
rg -n '\[\[' scripts/e2e.sh
Repository: openshift/sippy
Length of output: 696
Use POSIX [ here instead of [[ to match the declared shebang.
The file declares #!/bin/sh but line 14 uses [[ -z "$GCS_SA_JSON_PATH" ]], a bash-only construct. This will error on systems with dash or other POSIX shells.
Suggested fix
-if [[ -z "$GCS_SA_JSON_PATH" ]]; then
+if [ -z "$GCS_SA_JSON_PATH" ]; then
📝 Committable suggestion
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 14-14: In POSIX sh, [[ ]] is undefined.
(SC3010)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e.sh` around lines 14 - 16, Replace the bash-only conditional using
[[ in the e2e.sh script: locate the check that references GCS_SA_JSON_PATH and
change the test to the POSIX-compatible form using [ -z "$GCS_SA_JSON_PATH" ]
(ensure proper spacing around brackets and keep the existing echo and
redirection unchanged) so the script works with /bin/sh.
[Claude] Will fix — changing to POSIX `[`.
@stbenjam, sounds good! The change from [[ to [ will make the script compatible with /bin/sh (dash, etc.) as declared in the shebang. Thanks!
scripts/e2e.sh
Outdated
VIEWS=$(curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness/views")
for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
  echo " Priming cache for view: $VIEW"
  curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null
done
Fail fast when cache priming breaks.
This new path is now required for the triage e2es, but curl -s/jq failures here are ignored and the script still prints Cache priming complete. That turns a startup or view-loading error into a later flaky test failure.
Suggested fix
-VIEWS=$(curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness/views")
-for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
+VIEWS=$(curl -fsS "http://localhost:$SIPPY_API_PORT/api/component_readiness/views") || exit 1
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || exit 1
+for VIEW in $VIEW_NAMES; do
echo " Priming cache for view: $VIEW"
- curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null
+ curl -fsS "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null || exit 1
 done
📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e.sh` around lines 101 - 105, The cache-priming loop silently
ignores failures; update the VIEWS assignment and loop to fail fast by checking
curl and jq exit codes and aborting on errors: when retrieving VIEWS via
VIEWS=$(curl -s
"http://localhost:$SIPPY_API_PORT/api/component_readiness/views") verify curl
succeeded and that jq produced non-empty output (or use pipe with jq -e to cause
non-zero exit), and inside the loop check each curl priming call (the curl to
/api/component_readiness?view=$VIEW) for a non-zero exit and exit the script
immediately (or set -e at the top) if any call fails so the script does not
print "Cache priming complete" on error; reference the VIEWS variable, the for
VIEW in ... loop, and both curl invocations to locate where to add these checks.
[Claude] Will fix — adding error checks to curl and jq calls.
@stbenjam, sounds good! The suggested fix in the review comment should cover all the necessary error checks. Let me know if you need anything else.
CI e2e tests previously loaded real data from BigQuery/GCS via `sippy load`, which was slow and required GCS credentials. Switch to `sippy seed-data` with the postgres data provider, matching what scripts/e2e.sh already does. This makes CI e2e faster and removes the dependency on GCS credentials for the core test suite.

- Update setup script to run seed-data instead of sippy load
- Update test script to serve with --data-provider postgres
- Remove GCS credential requirements from CI e2e scripts
- Fix lint: gofmt, gosec (G204, G306), gocritic typeAssertChain, gocyclo (extract seedProwJobs, seedTestsAndOwnerships, seedJobRunsAndResults, seedRunsForJob from seedSyntheticData)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/test e2e
- Fix POSIX compatibility: [[ → [ in e2e.sh
- Add error handling to cache priming curl calls in e2e.sh
- Add DB error assertions and context timeout to datasync test
- Fix last_failure type assertion chain → type switch in querygenerators.go
- Return error from regression tracker failures in seed_data.go
- Fix misleading re-run docstring in seed_data.go
- Deduplicate BQ provider creation in component_readiness.go
- Fix successful_runs dedup with COUNT(DISTINCT IF(...)) in BQ provider
- Include error details in GetJobVariants failure message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scheduling required tests:
@coderabbitai Please re-review
✅ Actions performed
Full review triggered.
Build sippy with -cover instrumentation and collect coverage data from the running server during e2e tests. Coverage report is printed at the end of the run, and a coverage.out file is generated for HTML reports.

Also adds graceful shutdown signal handling to the server so coverage data is properly flushed on exit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/hold
…ovider

- Move query package into dataprovider/bigquery, removing the separate query/ directory
- Remove dataSource from DataProvider interface, pushing variant junit table override logic into the BigQuery provider
- Add e2e tests for new test pass rate regression, fallback test details, include variants filtering, explicit API params, and triage validation
- Merge test binary coverage into e2e report so direct-call tests in report_test.go contribute to measured coverage
- Add seed data for new test pass rate regression scenario (70% pass rate below 80% extreme threshold)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add coverage collection to the Prow CI e2e script: build with sippy-cover, set GOCOVERDIR, collect server + test binary coverage profiles and merge them into a single report.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scheduling required tests:
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go (1)
384-404: ⚠️ Potential issue | 🟠 Major
Don't cache partial fallback data as success.
If one of these per-release queries fails, the goroutine only logs it and getTestFallbackReleases still returns nil errors after wg.Wait(). That means incomplete fallback data can be cached for the full fallback TTL and silently skew regression classification. Collect these goroutine errors and return them.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go` around lines 384 - 404, The goroutines inside getTestFallbackReleases call f.getTestFallbackRelease and may log failures but still allow getTestFallbackReleases to return nil error and leave f.CachedFallbackTestStatuses partially populated; instead, collect per-goroutine errors (e.g. via an error channel or a sync-safe slice) while executing the goroutines, wait for all to finish, aggregate any errors, and if any occurred return a non-nil aggregated error and avoid persisting f.CachedFallbackTestStatuses; only commit/update f.CachedFallbackTestStatuses (or call f.updateTestStatuses results into the cache) and return nil error when no goroutine reported an error. Ensure you reference the existing symbols f.getTestFallbackRelease, f.updateTestStatuses, f.CachedFallbackTestStatuses and the wg/context logic when implementing the error collection and aggregation.
♻️ Duplicate comments (4)
scripts/e2e.sh (1)
117-121: ⚠️ Potential issue | 🟠 Major
Still fail fast on view-parsing errors.
curl -sf is checked now, but the jq inside for VIEW in $(...) is not. In /bin/sh, a failing command substitution there does not fail the loop, so malformed or empty /views output can still print Cache priming complete and turn into a later flaky test.
Suggested fix
 VIEWS=$(curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness/views") || { echo "Failed to fetch views"; exit 1; }
-for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || { echo "Failed to parse view list"; exit 1; }
+for VIEW in $VIEW_NAMES; do
   echo " Priming cache for view: $VIEW"
   curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null || { echo "Failed to prime cache for view: $VIEW"; exit 1; }
 done
Run this to verify the shell behavior behind the current pattern:
#!/bin/bash
sed -n '115,121p' scripts/e2e.sh
/bin/sh -c 'for VIEW in $(false); do :; done; printf "loop_status=%s\n" "$?"'
Expected result: loop_status=0, which shows why a failing parser inside for VIEW in $(...) is currently masked.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/e2e.sh` around lines 117 - 121, The loop masks failures from the jq command substitution: ensure parsing the VIEWS JSON with jq ('jq -r ".[].name"') is executed and its exit status is checked before iterating so parsing errors cause an immediate failure; specifically, run the jq parse against the VIEWS content into a checked value (or temp) and verify jq succeeded and produced non-empty output before the for VIEW in ... loop (references: VIEWS variable, the for VIEW loop, and the jq -r '.[].name' invocation), failing fast with an error message if jq fails or yields no views.
cmd/sippy/serve.go (1)
169-172: ⚠️ Potential issue | 🔴 Critical
variantManager can still be nil in postgres mode.
/api/jobs/runs/risk_analysis is still reachable via LocalDBCapability, and pkg/sippyserver/server.go dereferences s.variantManager on the transient-request path. With --data-provider=postgres, this leaves a panic path unless you initialize a no-op manager here or guard that handler.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sippy/serve.go` around lines 169 - 172, variantManager can remain nil when bigQueryClient is nil (postgres mode), causing a panic when s.variantManager is dereferenced by the /api/jobs/runs/risk_analysis handler; replace the nil by creating/assigning a no-op implementation of testidentification.VariantManager here (or call an existing factory like f.ModeFlags.GetVariantManager with a noop client) so variantManager is never nil, or alternatively add a guard in the handler that checks s.variantManager before dereferencing; reference symbols: variantManager, testidentification.VariantManager, f.ModeFlags.GetVariantManager, s.variantManager, LocalDBCapability, and /api/jobs/runs/risk_analysis.
pkg/sippyserver/server.go (1)
414-416: ⚠️ Potential issue | 🟠 Major
ComponentReadinessCapability is still too broad.
In postgres mode this advertises and enables routes like /api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs, /api/tests/capabilities, and /api/tests/lifecycles, but each handler then immediately fails when s.bigQueryClient == nil.
Verify each finding against the current code and only fix it if needed. In `@pkg/sippyserver/server.go` around lines 414 - 416, The current capability assembly appends ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider is set, which incorrectly advertises BigQuery-backed routes when only Postgres/CR is available; change the logic to register separate capabilities (e.g., keep ComponentReadinessCapability for CR only: append it only when s.crDataProvider != nil) and add a distinct BigQueryReadinessCapability (or similar) that is appended only when s.bigQueryClient != nil, and ensure route registration/handler exposure for endpoints like the handlers tied to /api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs, /api/tests/capabilities, and /api/tests/lifecycles is gated on s.bigQueryClient being non-nil (or remove those routes when BigQueryReadinessCapability is not present).
pkg/api/componentreadiness/dataprovider/postgres/provider.go (1)
397-418: ⚠️ Potential issue | 🟠 Major
Don't mask a variant lookup failure as an empty result.
On error this logs and returns {}, and queryTestStatus then drops every aggregated row because no ProwJobID matches. Callers see an empty report instead of the DB failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go` around lines 397 - 418, The fetchJobVariants function currently swallows DB errors by logging and returning an empty map which causes queryTestStatus to treat the failure as “no matches” and drop rows; change fetchJobVariants on PostgresProvider to return (map[uint]map[string]string, error) instead of silently returning {} on failure, propagate the error back to queryTestStatus (and any other callers), and update the call sites to handle the error (log/return the DB error up so the overall report fails instead of producing an empty result); specifically adjust the code around p.dbc.DB.Raw(...) in fetchJobVariants to return nil, err on error, and update queryTestStatus to check the error from fetchJobVariants and handle it appropriately.
🧹 Nitpick comments (6)
pkg/apis/api/componentreport/crstatus/types.go (1)
56-57: Verify these transient fields are excluded from GORM.
TestKey and TestKeyStr are computed values, not columns. If the new PostgreSQL provider scans crstatus.TestJobRunRows with GORM, add gorm:"-" here so this shared DTO is not treated like a mapped model.
As per coding guidelines, "Structs used with GORM that have fields not backed by database columns (e.g., computed fields, custom types, API-only fields) must include the gorm:"-" tag to explicitly exclude them from GORM operations."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apis/api/componentreport/crstatus/types.go` around lines 56 - 57, TestKey and TestKeyStr in the crstatus DTO are transient computed fields and must be excluded from GORM; update the struct that defines TestKey (type crtest.KeyWithVariants) and TestKeyStr (string) by adding the struct tag gorm:"-" to each field so GORM will ignore them during scanning/migrations (ensure tags are added on the fields named TestKey and TestKeyStr in the TestJobRunRows/crstatus type).
test/e2e/componentreadiness/componentreadiness_test.go (1)
220-224: Consider asserting the specific error type for the 404 case.
The test verifies an error is returned but doesn't confirm it's specifically a 404/not-found response. This could mask issues where the endpoint returns a different error type.
// If util.SippyGet returns HTTP status codes in the error, consider:
// assert.Contains(t, err.Error(), "404") or similar
Verify each finding against the current code and only fix it if needed. In `@test/e2e/componentreadiness/componentreadiness_test.go` around lines 220 - 224, The test TestRegressionByIDNotFound currently only checks that an error is returned; update it to assert the error represents a 404/not-found response by checking the error content or type from util.SippyGet (e.g., assert that err.Error() contains "404" or use errors.Is()/errors.As() if util.SippyGet returns a typed HTTP error). Locate TestRegressionByIDNotFound and modify the require/assert after calling util.SippyGet to verify the specific 404 condition while still keeping the existing models.TestRegression variable and error check.
pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go (1)
96-104: Consider using consistent assertion style.
The test mixes assert.Equal (Line 99) with reflect.DeepEqual + t.Errorf (Lines 100-102). Using assert.Equal consistently would provide better error messages and simplify the code.
♻️ Suggested fix for consistency
 for _, tt := range tests {
 	t.Run(tt.name, func(t *testing.T) {
 		result, skipQuery := copyIncludeVariantsAndRemoveOverrides(tt.overrides, tt.currOverride, tt.includeVariants)
 		assert.Equal(t, tt.expectedSkipQuery, skipQuery)
-		if !reflect.DeepEqual(result, tt.expected) {
-			t.Errorf("expected %v, got %v", tt.expected, result)
-		}
+		assert.Equal(t, tt.expected, result)
 	})
 }
This would also allow removing the reflect import.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go` around lines 96 - 104, Replace the manual reflect.DeepEqual + t.Errorf check with the same testify assertion used above: call assert.Equal(t, tt.expected, result) and keep the existing assert for skipQuery; update the test loop inside Test to use assert.Equal for both comparisons (result and skipQuery) referencing the function copyIncludeVariantsAndRemoveOverrides, and then remove the now-unused reflect import from the file.
test/e2e/datasync/datasync_test.go (1)
45-50: Assert that the sync changed the dataset.
Lines 45-50 only prove the command exited successfully and the count query worked. A no-op sippy load still passes here, so this test does not actually verify prow/GCS ingestion. Consider asserting a positive delta, or scoping the count to rows created during this run if re-runs share the same DB.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/datasync/datasync_test.go` around lines 45 - 50, The test currently only checks cmd.Run() succeeded and that Table("prow_job_runs").Count returned, but doesn't assert the dataset changed; update the test around cmd.Run()/countBefore to compute and assert a positive delta on prow_job_runs: after running cmd.Run() call dbc.DB.Table("prow_job_runs").Count(&countAfter) and require.Greater(t, countAfter, countBefore) (or alternatively scope the count to rows created during this run by filtering on a run-specific marker or a created_at timestamp range via dbc.DB.Where(...).Count(&countAfter) and assert the delta > 0) so the test fails if sippy load is a no-op.
test/e2e/componentreadiness/report/report_test.go (1)
20-49: Build these request options from the shared e2e view.
This is a second copy of the release windows, variant grouping, and thresholds already defined in config/e2e-views.yaml and test/e2e/util. If that view changes again, this suite can still pass while exercising a different configuration than the server. Prefer a shared helper or parse the e2e view definition here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/componentreadiness/report/report_test.go` around lines 20 - 49, The test duplicates release windows, variant grouping and thresholds instead of using the shared e2e view; in setupProvider replace the hard-coded reqOptions construction (the reqopts.RequestOptions assigned to reqOptions) with a call to the shared helper in test/e2e/util that builds RequestOptions from the config/e2e-views.yaml (or parse that YAML into the same reqopts.Release, reqopts.Variants and reqopts.Advanced values), so setupProvider uses the common view definition rather than duplicating values.
pkg/api/componentreadiness/test_details.go (1)
199-202: Avoid the unconditional release-date lookup.
These dates are only used when BaseOverrideRelease is set, but every details report pays the query cost here. The multi-test priming path repeats the same provider call once per test. Move it into the override branch, or fetch it once in the caller and thread it through.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/test_details.go` around lines 199 - 202, The unconditional call to c.dataProvider.QueryReleaseDates with c.ReqOptions causes unnecessary queries for every details report; change the logic so QueryReleaseDates is only invoked when BaseOverrideRelease is set (i.e., wrap the call in the branch that checks c.BaseOverrideRelease) or, for the multi-test priming path, move the QueryReleaseDates call up to the caller and thread the resulting timeRanges into the details reporting path (add an extra parameter or field used by the function that builds testdetails.Report). Ensure you reference QueryReleaseDates, BaseOverrideRelease, c.ReqOptions and the function that returns testdetails.Report when making the change so the lookup is performed only when needed.
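The lazy-lookup suggestion can be sketched with a call counter to show the saving; `queryReleaseDates` and `buildReport` are illustrative stand-ins for c.dataProvider.QueryReleaseDates and the details-report builder, not the PR's functions:

```go
package main

import "fmt"

// queries counts provider calls so the saving is visible.
var queries int

// queryReleaseDates stands in for the provider's release-dates query.
func queryReleaseDates() map[string]string {
	queries++
	return map[string]string{"4.19": "2025-01-01"}
}

// buildReport only pays for the lookup when a base override is set,
// which is the branch that actually uses the dates.
func buildReport(baseOverrideRelease string) {
	if baseOverrideRelease != "" {
		_ = queryReleaseDates() // needed only on this branch
	}
	// ... build the rest of the details report ...
}

func main() {
	buildReport("")      // common path: no query
	buildReport("4.18")  // override path: one query
	fmt.Println(queries) // 1
}
```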
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sippy/automatejira.go`:
- Around line 175-177: The return in automatejira.go discards the underlying
errors; update the failure to include the error details from the errs slice
(e.g., change the return in the block checking len(errs) to include errs).
Specifically, modify the code that currently does return fmt.Errorf("failed to
get job variants") to include the errors (for example: return fmt.Errorf("failed
to get job variants: %v", errs) or combine errs with errors.Join and wrap it),
referencing the errs variable in automatejira.go so callers/logs surface the
underlying causes (consistent with annotatejobruns.go's approach).
In `@cmd/sippy/seed_data.go`:
- Around line 64-67: The current RunE check only rejects DSNs containing
"amazonaws.com" and can still write to other remote/prod DBs; update the seed
command to require explicit opt-in and/or a strict allowlist: parse the DSN in
RunE (use f.DBFlags.DSN) to extract the host, reject unless either a new boolean
flag (e.g., --confirm-seed or --force-seed) is passed or the host is in a
white-list of safe hosts (localhost, 127.0.0.1, ::1, UNIX socket, and any e2e
test host you accept); ensure the logic lives inside the existing RunE handler
and uses the parsed host (not substring matching) so non-local endpoints are
refused by default.
- Around line 669-676: The UPDATE that backfills prow_job_runs.test_failures
currently logs a warning on failure, leaving partially-initialized state; make
this failure fatal by propagating the error instead of warning: replace the
log.WithError(err).Warn("failed to update test_failures count") with either
returning the error from the surrounding function (so callers can abort/handle)
or calling log.WithError(err).Fatal(...) to exit immediately. Locate the Exec
call on dbc.DB.Exec that updates prow_job_runs (references: prow_job_runs,
prow_job_run_tests, prowJob.ID) and change the error handling to abort on error
rather than continuing.
In `@cmd/sippy/serve.go`:
- Around line 138-149: The GCS client initialization is incorrectly nested under
the --data-provider=bigquery path causing gcsClient to remain nil for other
providers; move the NewGCSClient call out of the BigQuery-only branch so
gcsClient is initialized whenever GoogleCloudFlags.ServiceAccountCredentialFile
or OAuthClientCredentialFile are present. Concretely, ensure the call to
gcs.NewGCSClient(context.TODO(),
f.GoogleCloudFlags.ServiceAccountCredentialFile,
f.GoogleCloudFlags.OAuthClientCredentialFile) and its error handling (the
log.WithError(...).Warn(...)) run unconditionally (or in a shared initialization
section) prior to selecting the data provider (before the switch/case that sets
crDataProvider and pgprovider.NewPostgresProvider), so gcsClient is available to
pkg/sippyserver/server.go handlers like jsonJobRunIntervals, jsonJobRunEvents,
artifact queries and job summary code paths.
In `@e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh`:
- Around line 246-251: The commands in the sippy seed job block use unquoted
shell variables which can cause word-splitting/globbing; update usages to quote
variables like "${KUBECTL_CMD}", "${SIPPY_LOAD_TIMEOUT}", "$retVal", and
"${ARTIFACT_DIR}" and quote the command substitution for job_pod
(job_pod="$(${KUBECTL_CMD} -n sippy-e2e get pod
--selector=job-name=sippy-seed-job
--output=jsonpath='{.items[0].metadata.name}')") and use "${job_pod}" when
passing to kubectl logs so all variables are safely expanded (refer to
KUBECTL_CMD, SIPPY_LOAD_TIMEOUT, retVal, job_pod, ARTIFACT_DIR, sippy-seed-job).
In `@pkg/api/componentreadiness/component_report.go`:
- Around line 129-145: Currently worstStatus is seeded from
report.Rows[ri].Columns[ci].Status before calling c.middlewares.PostAnalysis,
preventing PostAnalysis from ever raising (improving) the cell status; instead,
after running PostAnalysis for each entry in col.RegressedTests (using
c.middlewares.PostAnalysis with the constructed testKey), initialize worstStatus
from the first regressed test's TestComparison.ReportStatus (or start with a
very permissive max value) and then update it by comparing each regressed test's
TestComparison.ReportStatus (newStatus) to compute the worst (most severe)
status, finally assigning that computed worstStatus back to
report.Rows[ri].Columns[ci].Status.
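The seeding change above can be reduced to a small sketch. This assumes Sippy's convention that lower ReportStatus values are more severe; the function and type names here are illustrative stand-ins, not the real API.

```go
package main

import "fmt"

// ReportStatus follows the (assumed) Sippy convention that lower values
// are more severe: an extreme regression is more negative than a
// not-significant status.
type ReportStatus int

// worstOf seeds from the first post-analysis status instead of the
// pre-analysis cell status, so middleware like triage can improve a
// cell's status as well as worsen it.
func worstOf(statuses []ReportStatus) (ReportStatus, bool) {
	if len(statuses) == 0 {
		return 0, false
	}
	worst := statuses[0]
	for _, s := range statuses[1:] {
		if s < worst {
			worst = s
		}
	}
	return worst, true
}

func main() {
	// If the cell's old status were used as the seed, PostAnalysis could
	// never raise the cell above that value.
	worst, ok := worstOf([]ReportStatus{-2, -1, -3})
	fmt.Println(worst, ok)
}
```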
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 378-417: QueryJobRuns currently builds joinVariants by iterating
allJobVariants.Variants which adds LEFT JOINs for every known variant; restrict
it to only the variants actually used by the query by computing a set containing
"Release" plus the keys of reqOptions.VariantOption.IncludeVariants and then
iterate that set instead of allJobVariants.Variants. Update the joinVariants
construction (the loop that uses param.Cleanse and appends to joinVariants) to
only loop over this neededVariants set, keep using param.Cleanse for param
names, and leave the variantFilters/params logic unchanged so only required
JOINs are emitted.
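The needed-variants computation described above can be sketched like this (a minimal illustration, assuming IncludeVariants is keyed by variant name as the prompt states):

```go
package main

import (
	"fmt"
	"sort"
)

// neededVariants restricts JOIN generation to "Release" plus the keys of
// IncludeVariants, rather than every variant known to allJobVariants.
func neededVariants(includeVariants map[string][]string) []string {
	needed := map[string]bool{"Release": true}
	for name := range includeVariants {
		needed[name] = true
	}
	out := make([]string, 0, len(needed))
	for name := range needed {
		out = append(out, name)
	}
	sort.Strings(out) // deterministic ordering for stable SQL generation
	return out
}

func main() {
	include := map[string][]string{
		"Platform": {"aws", "gcp"},
		"Network":  {"ovn"},
	}
	// Only these variants get LEFT JOINs in the generated query.
	fmt.Println(neededVariants(include))
}
```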
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go`:
- Around line 672-684: The Postgres query in the
p.dbc.DB.Raw(...).Scan(&rows).Error call returns all prow_jobs for a release but
must match the BigQuery backend filter (only jobs with names starting with
"periodic-", "release-", or "aggregator-"). Update the SQL in provider.go (the
Raw(...) invocation that populates rows) to include an additional WHERE clause
restricting pj.name to those prefixes (e.g., pj.name LIKE 'periodic-%' OR
pj.name LIKE 'release-%' OR pj.name LIKE 'aggregator-%') so Postgres results
remain in parity with the BigQuery provider.
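The prefix filter can be sketched in Go for clarity. The prefixes are the ones named above; the helper itself is illustrative, since in the real provider the restriction lives in the SQL WHERE clause.

```go
package main

import (
	"fmt"
	"strings"
)

// jobNamePrefixes mirrors the BigQuery-side filter: only periodic,
// release, and aggregator jobs are included.
var jobNamePrefixes = []string{"periodic-", "release-", "aggregator-"}

// jobNameIncluded reports whether a prow job name would survive the
// suggested clause:
//   pj.name LIKE 'periodic-%' OR pj.name LIKE 'release-%' OR pj.name LIKE 'aggregator-%'
func jobNameIncluded(name string) bool {
	for _, p := range jobNamePrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(jobNameIncluded("periodic-ci-openshift-e2e")) // included
	fmt.Println(jobNameIncluded("pull-ci-openshift-e2e"))     // excluded
}
```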
- Around line 427-470: Both QueryBaseTestStatus and QuerySampleTestStatus are
ignoring the incoming context (declared as _ ) and thus not propagating
cancellation/timeouts to DB work; update the function signatures to use the ctx
parameter (replace _ with ctx) and pass that ctx into the p.queryTestStatus call
so the context flows down, i.e., in QueryBaseTestStatus and
QuerySampleTestStatus use the provided context variable and change the call to
p.queryTestStatus(..., ctx) or the existing queryTestStatus signature variant
that accepts context (or update queryTestStatus to accept context if needed) so
cancellations/timeouts are honored by the PostgresProvider.
In `@pkg/componentreadiness/jiraautomator/jiraautomator.go`:
- Around line 74-75: The constructor for JiraAutomator currently requires a
BigQuery client even though report building uses j.dataProvider; update the
NewJiraAutomator constructor (or the JiraAutomator constructor function) to stop
hard-failing when bqClient or its cache is nil: make the bqClient parameter
optional (allow nil) or remove it from the mandatory parameter list, validate
that provider (dataprovider.DataProvider) is non-nil instead, and set j.bqClient
only if provided. Also move any strict checks for bqClient into the specific
variant-mapping/reporting code paths that actually need BigQuery (check
j.bqClient and return a clear error where used), and remove the global hard
exit/validation that currently references j.bqClient in the constructor and in
the code around lines 87-98 and 151-152.
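The constructor reshuffle can be sketched as below. The types here are simplified stand-ins for dataprovider.DataProvider and the BigQuery client, and variantMapping is a hypothetical name for the path that genuinely needs BigQuery.

```go
package main

import (
	"errors"
	"fmt"
)

// DataProvider and BigQueryClient stand in for the real types.
type DataProvider interface{ Name() string }
type BigQueryClient struct{}

type JiraAutomator struct {
	dataProvider DataProvider
	bqClient     *BigQueryClient // optional: only variant mapping needs it
}

// NewJiraAutomator validates the provider, which report building
// actually uses, and accepts a nil BigQuery client.
func NewJiraAutomator(provider DataProvider, bq *BigQueryClient) (*JiraAutomator, error) {
	if provider == nil {
		return nil, errors.New("data provider is required")
	}
	return &JiraAutomator{dataProvider: provider, bqClient: bq}, nil
}

// variantMapping moves the strict BigQuery check out of the constructor
// and into the only path that needs it, returning a clear error.
func (j *JiraAutomator) variantMapping() error {
	if j.bqClient == nil {
		return errors.New("variant mapping requires a BigQuery client")
	}
	return nil
}

type pgProvider struct{}

func (pgProvider) Name() string { return "postgres" }

func main() {
	j, err := NewJiraAutomator(pgProvider{}, nil) // postgres-only startup now succeeds
	fmt.Println(err == nil, j.variantMapping() != nil)
}
```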
In `@pkg/sippyserver/metrics/metrics.go`:
- Around line 118-122: In RefreshMetricsDB, guard against a nil crProvider
before invoking its methods (e.g., crProvider.QueryReleases and later uses
around the block referenced at lines ~168-170); check if crProvider == nil early
in the function and skip component-readiness-specific calls while still allowing
the rest of RefreshMetricsDB to run, or return a controlled error if that is the
intended behavior, and ensure any later references to crProvider are similarly
protected (use the function name RefreshMetricsDB and the variable crProvider to
locate and update the checks).
In `@test/e2e/componentreadiness/triage/triageapi_test.go`:
- Around line 957-968: getSeedDataRegressions currently returns seedRegressions
in API order which is unstable; make the selection deterministic by sorting the
slice before returning (e.g., sort.SliceStable(seedRegressions, func(i,j int)
bool { return seedRegressions[i].TestID < seedRegressions[j].TestID })) or
alternatively return a map keyed by TestID so downstream subtests can pick
specific known seeded IDs; update getSeedDataRegressions to perform the sort (or
change its return type to map[string]models.TestRegression) and adjust callers
accordingly.
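The sorting option can be sketched as follows, with a minimal stand-in for models.TestRegression:

```go
package main

import (
	"fmt"
	"sort"
)

// TestRegression is a minimal stand-in for models.TestRegression.
type TestRegression struct{ TestID string }

// sortSeedRegressions makes the selection deterministic by ordering on
// TestID, so downstream subtests always see the same regressions in the
// same positions regardless of API response order.
func sortSeedRegressions(regs []TestRegression) {
	sort.SliceStable(regs, func(i, j int) bool {
		return regs[i].TestID < regs[j].TestID
	})
}

func main() {
	regs := []TestRegression{{"test-c"}, {"test-a"}, {"test-b"}}
	sortSeedRegressions(regs)
	fmt.Println(regs[0].TestID, regs[1].TestID, regs[2].TestID)
}
```

The map-keyed alternative mentioned above trades ordering for direct lookup by TestID, which is preferable when subtests need specific seeded IDs rather than "the first N".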
---
Outside diff comments:
In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`:
- Around line 384-404: The goroutines inside getTestFallbackReleases call
f.getTestFallbackRelease and may log failures but still allow
getTestFallbackReleases to return nil error and leave
f.CachedFallbackTestStatuses partially populated; instead, collect per-goroutine
errors (e.g. via an error channel or a sync-safe slice) while executing the
goroutines, wait for all to finish, aggregate any errors, and if any occurred
return a non-nil aggregated error and avoid persisting
f.CachedFallbackTestStatuses; only commit/update f.CachedFallbackTestStatuses
(or call f.updateTestStatuses results into the cache) and return nil error when
no goroutine reported an error. Ensure you reference the existing symbols
f.getTestFallbackRelease, f.updateTestStatuses, f.CachedFallbackTestStatuses and
the wg/context logic when implementing the error collection and aggregation.
---
Duplicate comments:
In `@cmd/sippy/serve.go`:
- Around line 169-172: variantManager can remain nil when bigQueryClient is nil
(postgres mode), causing a panic when s.variantManager is dereferenced by the
/api/jobs/runs/risk_analysis handler; replace the nil by creating/assigning a
no-op implementation of testidentification.VariantManager here (or call an
existing factory like f.ModeFlags.GetVariantManager with a noop client) so
variantManager is never nil, or alternatively add a guard in the handler that
checks s.variantManager before dereferencing; reference symbols: variantManager,
testidentification.VariantManager, f.ModeFlags.GetVariantManager,
s.variantManager, LocalDBCapability, and /api/jobs/runs/risk_analysis.
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go`:
- Around line 397-418: The fetchJobVariants function currently swallows DB
errors by logging and returning an empty map which causes queryTestStatus to
treat the failure as “no matches” and drop rows; change fetchJobVariants on
PostgresProvider to return (map[uint]map[string]string, error) instead of
silently returning {} on failure, propagate the error back to queryTestStatus
(and any other callers), and update the call sites to handle the error
(log/return the DB error up so the overall report fails instead of producing an
empty result); specifically adjust the code around p.dbc.DB.Raw(...) in
fetchJobVariants to return nil, err on error, and update queryTestStatus to
check the error from fetchJobVariants and handle it appropriately.
In `@pkg/sippyserver/server.go`:
- Around line 414-416: The current capability assembly appends
ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider is
set, which incorrectly advertises BigQuery-backed routes when only Postgres/CR
is available; change the logic to register separate capabilities (e.g., keep
ComponentReadinessCapability for CR only: append it only when s.crDataProvider
!= nil) and add a distinct BigQueryReadinessCapability (or similar) that is
appended only when s.bigQueryClient != nil, and ensure route
registration/handler exposure for endpoints like the handlers tied to
/api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs,
/api/tests/capabilities, and /api/tests/lifecycles is gated on s.bigQueryClient
being non-nil (or remove those routes when BigQueryReadinessCapability is not
present).
In `@scripts/e2e.sh`:
- Around line 117-121: The loop masks failures from the jq command substitution:
ensure parsing the VIEWS JSON with jq ('jq -r ".[].name"') is executed and its
exit status is checked before iterating so parsing errors cause an immediate
failure; specifically, run the jq parse against the VIEWS content into a checked
value (or temp) and verify jq succeeded and produced non-empty output before the
for VIEW in ... loop (references: VIEWS variable, the for VIEW loop, and the jq
-r '.[].name' invocation), failing fast with an error message if jq fails or
yields no views.
---
Nitpick comments:
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go`:
- Around line 96-104: Replace the manual reflect.DeepEqual + t.Errorf check with
the same testify assertion used above: call assert.Equal(t, tt.expected, result)
and keep the existing assert for skipQuery; update the test loop inside Test to
use assert.Equal for both comparisons (result and skipQuery) referencing the
function copyIncludeVariantsAndRemoveOverrides, and then remove the now-unused
reflect import from the file.
In `@pkg/api/componentreadiness/test_details.go`:
- Around line 199-202: The unconditional call to
c.dataProvider.QueryReleaseDates with c.ReqOptions causes unnecessary queries
for every details report; change the logic so QueryReleaseDates is only invoked
when BaseOverrideRelease is set (i.e., wrap the call in the branch that checks
c.BaseOverrideRelease) or, for the multi-test priming path, move the
QueryReleaseDates call up to the caller and thread the resulting timeRanges into
the details reporting path (add an extra parameter or field used by the function
that builds testdetails.Report). Ensure you reference QueryReleaseDates,
BaseOverrideRelease, c.ReqOptions and the function that returns
testdetails.Report when making the change so the lookup is performed only when
needed.
In `@pkg/apis/api/componentreport/crstatus/types.go`:
- Around line 56-57: TestKey and TestKeyStr in the crstatus DTO are transient
computed fields and must be excluded from GORM; update the struct that defines
TestKey (type crtest.KeyWithVariants) and TestKeyStr (string) by adding the
struct tag gorm:"-" to each field so GORM will ignore them during
scanning/migrations (ensure tags are added on the fields named TestKey and
TestKeyStr in the TestJobRunRows/crstatus type).
In `@test/e2e/componentreadiness/componentreadiness_test.go`:
- Around line 220-224: The test TestRegressionByIDNotFound currently only checks
that an error is returned; update it to assert the error represents a
404/not-found response by checking the error content or type from util.SippyGet
(e.g., assert that err.Error() contains "404" or use errors.Is()/errors.As() if
util.SippyGet returns a typed HTTP error). Locate TestRegressionByIDNotFound and
modify the require/assert after calling util.SippyGet to verify the specific 404
condition while still keeping the existing models.TestRegression variable and
error check.
In `@test/e2e/componentreadiness/report/report_test.go`:
- Around line 20-49: The test duplicates release windows, variant grouping and
thresholds instead of using the shared e2e view; in setupProvider replace the
hard-coded reqOptions construction (the reqopts.RequestOptions assigned to
reqOptions) with a call to the shared helper in test/e2e/util that builds
RequestOptions from the config/e2e-views.yaml (or parse that YAML into the same
reqopts.Release, reqopts.Variants and reqopts.Advanced values), so setupProvider
uses the common view definition rather than duplicating values.
In `@test/e2e/datasync/datasync_test.go`:
- Around line 45-50: The test currently only checks cmd.Run() succeeded and that
Table("prow_job_runs").Count returned, but doesn't assert the dataset changed;
update the test around cmd.Run()/countBefore to compute and assert a positive
delta on prow_job_runs: after running cmd.Run() call
dbc.DB.Table("prow_job_runs").Count(&countAfter) and require.Greater(t,
countAfter, countBefore) (or alternatively scope the count to rows created
during this run by filtering on a run-specific marker or a created_at timestamp
range via dbc.DB.Where(...).Count(&countAfter) and assert the delta > 0) so the
test fails if sippy load is a no-op.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 047d1cbf-6556-43c7-8877-5f217b0f5940
📒 Files selected for processing (46)
- .gitignore
- cmd/sippy/annotatejobruns.go
- cmd/sippy/automatejira.go
- cmd/sippy/component_readiness.go
- cmd/sippy/load.go
- cmd/sippy/seed_data.go
- cmd/sippy/serve.go
- config/e2e-openshift.yaml
- config/e2e-views.yaml
- e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh
- e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
- pkg/api/componentreadiness/component_report.go
- pkg/api/componentreadiness/component_report_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/provider.go
- pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go
- pkg/api/componentreadiness/dataprovider/bigquery/querygenerators_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/releasedates.go
- pkg/api/componentreadiness/dataprovider/interface.go
- pkg/api/componentreadiness/dataprovider/postgres/provider.go
- pkg/api/componentreadiness/middleware/interface.go
- pkg/api/componentreadiness/middleware/linkinjector/linkinjector.go
- pkg/api/componentreadiness/middleware/list.go
- pkg/api/componentreadiness/middleware/regressionallowances/regressionallowances.go
- pkg/api/componentreadiness/middleware/regressiontracker/regressiontracker.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback_test.go
- pkg/api/componentreadiness/regressiontracker.go
- pkg/api/componentreadiness/test_details.go
- pkg/api/componentreadiness/utils/utils.go
- pkg/apis/api/componentreport/crstatus/types.go
- pkg/apis/api/componentreport/testdetails/types.go
- pkg/bigquery/bqlabel/labels.go
- pkg/componentreadiness/jiraautomator/jiraautomator.go
- pkg/componentreadiness/jobrunannotator/jobrunannotator.go
- pkg/dataloader/crcacheloader/crcacheloader.go
- pkg/sippyserver/metrics/metrics.go
- pkg/sippyserver/server.go
- scripts/e2e.sh
- test/e2e/componentreadiness/componentreadiness_test.go
- test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
- test/e2e/componentreadiness/report/report_test.go
- test/e2e/componentreadiness/triage/triageapi_test.go
- test/e2e/datasync/datasync_test.go
- test/e2e/util/cache_manipulator.go
- test/e2e/util/e2erequest.go
💤 Files with no reviewable changes (1)
- test/e2e/util/cache_manipulator.go
RunE: func(cmd *cobra.Command, args []string) error {
	if strings.Contains(f.DBFlags.DSN, "amazonaws.com") {
		return fmt.Errorf("refusing to seed synthetic data into a production database")
	}
This production-DB guard still fails open.
Line 65 only blocks DSNs containing amazonaws.com. Any other remote/prod Postgres endpoint—or a connection target assembled outside f.DBFlags.DSN—will still accept destructive synthetic writes. For this command, prefer an explicit opt-in flag or a strict allowlist of local/e2e hosts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/sippy/seed_data.go` around lines 64 - 67, The current RunE check only
rejects DSNs containing "amazonaws.com" and can still write to other remote/prod
DBs; update the seed command to require explicit opt-in and/or a strict
allowlist: parse the DSN in RunE (use f.DBFlags.DSN) to extract the host, reject
unless either a new boolean flag (e.g., --confirm-seed or --force-seed) is
passed or the host is in a white-list of safe hosts (localhost, 127.0.0.1, ::1,
UNIX socket, and any e2e test host you accept); ensure the logic lives inside
the existing RunE handler and uses the parsed host (not substring matching) so
non-local endpoints are refused by default.
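A host-based allowlist check might look like the sketch below. Everything here is hypothetical: the --force-seed flag is the opt-in the prompt suggests, and the parsing assumes a URL-style DSN (postgresql://...); key=value DSNs would need their own parsing.

```go
package main

import (
	"fmt"
	"net/url"
)

// localSeedHosts is a strict allowlist of hosts the seed command will
// write to without an explicit override. An empty host covers UNIX
// socket connections.
var localSeedHosts = map[string]bool{
	"": true, "localhost": true, "127.0.0.1": true, "::1": true,
}

// seedAllowed parses the DSN host instead of substring-matching, and
// fails closed unless the host is local or the (hypothetical)
// --force-seed flag was passed.
func seedAllowed(dsn string, forceSeed bool) (bool, error) {
	if forceSeed {
		return true, nil
	}
	u, err := url.Parse(dsn)
	if err != nil {
		return false, err
	}
	return localSeedHosts[u.Hostname()], nil
}

func main() {
	local, _ := seedAllowed("postgresql://sippy:pw@localhost:5432/sippy", false)
	remote, _ := seedAllowed("postgresql://sippy:pw@db.example.com:5432/sippy", false)
	fmt.Println(local, remote)
}
```

Note how this inverts the original guard: instead of rejecting one known-bad substring (amazonaws.com), every non-local host is rejected by default.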
${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout ${SIPPY_LOAD_TIMEOUT}
retVal=$?
set -e

job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-load-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-load.log
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-seed-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-seed.log
Quote variables to prevent word splitting.
Shell variables should be double-quoted to prevent globbing and word splitting issues.
🔧 Suggested fix
-${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout ${SIPPY_LOAD_TIMEOUT}
+${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout "${SIPPY_LOAD_TIMEOUT}"
-${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-seed.log
+${KUBECTL_CMD} -n sippy-e2e logs "${job_pod}" > "${ARTIFACT_DIR}/sippy-seed.log"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout ${SIPPY_LOAD_TIMEOUT}
retVal=$?
set -e
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-load-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-load.log
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-seed-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-seed.log

After:
${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout "${SIPPY_LOAD_TIMEOUT}"
retVal=$?
set -e
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-seed-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs "${job_pod}" > "${ARTIFACT_DIR}/sippy-seed.log"
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 246-246: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 251-251: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 251-251: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh` around lines 246 - 251,
The commands in the sippy seed job block use unquoted shell variables which can
cause word-splitting/globbing; update usages to quote variables like
"${KUBECTL_CMD}", "${SIPPY_LOAD_TIMEOUT}", "$retVal", and "${ARTIFACT_DIR}" and
quote the command substitution for job_pod (job_pod="$(${KUBECTL_CMD} -n
sippy-e2e get pod --selector=job-name=sippy-seed-job
--output=jsonpath='{.items[0].metadata.name}')") and use "${job_pod}" when
passing to kubectl logs so all variables are safely expanded (refer to
KUBECTL_CMD, SIPPY_LOAD_TIMEOUT, retVal, job_pod, ARTIFACT_DIR, sippy-seed-job).
provider dataprovider.DataProvider,
dbc *db.DB,
Don't keep BigQuery as a mandatory constructor dependency here.
Line 151 now builds reports through j.dataProvider, but the constructor still hard-fails when bqClient or its cache is missing. That keeps Postgres-backed Jira automation from starting even though this type no longer uses j.bqClient elsewhere in the file. Validate provider here instead, and make BigQuery optional or move that requirement to the variant-mapping path.
Also applies to: 87-98, 151-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/componentreadiness/jiraautomator/jiraautomator.go` around lines 74 - 75,
The constructor for JiraAutomator currently requires a BigQuery client even
though report building uses j.dataProvider; update the NewJiraAutomator
constructor (or the JiraAutomator constructor function) to stop hard-failing
when bqClient or its cache is nil: make the bqClient parameter optional (allow
nil) or remove it from the mandatory parameter list, validate that provider
(dataprovider.DataProvider) is non-nil instead, and set j.bqClient only if
provided. Also move any strict checks for bqClient into the specific
variant-mapping/reporting code paths that actually need BigQuery (check
j.bqClient and return a clear error where used), and remove the global hard
exit/validation that currently references j.bqClient in the constructor and in
the code around lines 87-98 and 151-152.
pkg/sippyserver/metrics/metrics.go
Outdated
func RefreshMetricsDB(ctx context.Context, dbc *db.DB, bqc *bqclient.Client, crProvider dataprovider.DataProvider, reportEnd time.Time, cacheOptions cache.RequestOptions, views []crview.View) error {
	start := time.Now()
	log.Info("beginning refresh metrics")
-	releases, err := api.GetReleases(context.Background(), bqc, false)
+	releases, err := crProvider.QueryReleases(ctx)
Guard crProvider before using it.
Line 122 dereferences crProvider before the later nil check, so configurations that intentionally skip component-readiness metrics will panic here instead of refreshing the other metric families.
Suggested fix
- releases, err := crProvider.QueryReleases(ctx)
- if err != nil {
-   return err
- }
+ var releases []v1.Release
+ var err error
+ switch {
+ case crProvider != nil:
+   releases, err = crProvider.QueryReleases(ctx)
+ case bqc != nil:
+   releases, err = api.GetReleases(ctx, bqc, false)
+ default:
+   releases = nil
+ }
+ if err != nil {
+   return err
+ }

Also applies to: 168-170
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sippyserver/metrics/metrics.go` around lines 118 - 122, In
RefreshMetricsDB, guard against a nil crProvider before invoking its methods
(e.g., crProvider.QueryReleases and later uses around the block referenced at
lines ~168-170); check if crProvider == nil early in the function and skip
component-readiness-specific calls while still allowing the rest of
RefreshMetricsDB to run, or return a controlled error if that is the intended
behavior, and ensure any later references to crProvider are similarly protected
(use the function name RefreshMetricsDB and the variable crProvider to locate
and update the checks).
Actionable comments posted: 2
♻️ Duplicate comments (2)
pkg/sippyserver/server.go (2)
182-193: ⚠️ Potential issue | 🟠 Major

Keep request parsing and execution on the same release catalog.

Line 1015 parses the request against s.getReleases(req.Context()), but Line 1030 delegates to componentreadiness.GetComponentReport, which still re-fetches releases via provider.QueryReleases(...) in pkg/api/componentreadiness/component_report.go. When both backends are configured, one request can be validated against BigQuery releases and executed against provider releases. Pass the same allReleases slice through report generation instead of re-querying it.

As per coding guidelines, "Avoid calling the same utility function multiple times with identical arguments in the same code path."
Also applies to: 1015-1036
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sippyserver/server.go` around lines 182 - 193, The request is validated against s.getReleases(...) but componentreadiness.GetComponentReport re-queries the provider, causing inconsistent catalogs; change GetComponentReport to accept a pre-fetched []sippyv1.Release (e.g., add a parameter allReleases) and update the call site that currently calls componentreadiness.GetComponentReport(...) to pass the allReleases slice returned by s.getReleases(ctx), then remove or bypass the provider.QueryReleases(...) call inside pkg/api/componentreadiness/component_report.go so the report generation uses the supplied allReleases instead of re-fetching.
414-416: ⚠️ Potential issue | 🟠 Major

Split ComponentReadinessCapability by backend dependency.

Lines 414-416 still enable one shared capability when either backend exists, but these routes no longer share the same dependency. In postgres-only mode, /api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs, /api/tests/capabilities, and /api/tests/lifecycles pass capability checks and then 400 because s.bigQueryClient is nil; in BigQuery-only mode, /api/job_variants and the component-readiness report handlers now fail the same way because s.crDataProvider is nil. Gate routes on the concrete backend they use, or split this capability.

Also applies to: 2304-2308, 2443-2446, 2456-2463, 2534-2561, 2620-2635
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sippyserver/server.go` around lines 414 - 416, The code currently appends a single ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider exists, which wrongly grants access to routes that require a specific backend; change this to gate capabilities by backend: add two distinct capabilities (e.g., ComponentReadinessBigQueryCapability and ComponentReadinessCRDataProviderCapability) or append capabilities conditionally per backend by checking s.bigQueryClient and s.crDataProvider separately, and then register each route/handler to require the concrete capability that matches the backend it uses (routes referencing s.bigQueryClient should require the BigQuery capability; routes referencing s.crDataProvider should require the CR data provider capability).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh`:
- Around line 172-177: The script currently checks for covcounters.* under
${COVDIR} but always runs go tool covdata against ${COVDIR}; update the logic to
detect the actual directory that contains the copied coverage files and use that
directory as the input root for go tool covdata. Specifically, use find to
locate the first directory containing 'covcounters.*' (e.g., assign it to a
variable like COVERAGE_ROOT) after the kubectl cp step, and replace the fixed
${COVDIR} arguments in the go tool covdata percent and textfmt commands with
that COVERAGE_ROOT (fall back to ${COVDIR} if find fails).
- Around line 56-57: The pod command switches to /bin/sippy-cover which is not
built or copied into the image; update the build and image or revert the
command: either add a Makefile build target for sippy-cover and update the
Dockerfile to build and COPY the resulting sippy-cover binary into the image
(alongside sippy, sippy-daemon, fetchdata.sh), or change the pod command back to
/bin/sippy (or another existing binary) in
e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh so the entrypoint references a
binary that actually exists in the image.
---
Duplicate comments:
In `@pkg/sippyserver/server.go`:
- Around line 182-193: The request is validated against s.getReleases(...) but
componentreadiness.GetComponentReport re-queries the provider, causing
inconsistent catalogs; change GetComponentReport to accept a pre-fetched
[]sippyv1.Release (e.g., add a parameter allReleases) and update the call site
that currently calls componentreadiness.GetComponentReport(...) to pass the
allReleases slice returned by s.getReleases(ctx), then remove or bypass the
provider.QueryReleases(...) call inside
pkg/api/componentreadiness/component_report.go so the report generation uses the
supplied allReleases instead of re-fetching.
- Around line 414-416: The code currently appends a single
ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider
exists, which wrongly grants access to routes that require a specific backend;
change this to gate capabilities by backend: add two distinct capabilities
(e.g., ComponentReadinessBigQueryCapability and
ComponentReadinessCRDataProviderCapability) or append capabilities conditionally
per backend by checking s.bigQueryClient and s.crDataProvider separately, and
then register each route/handler to require the concrete capability that matches
the backend it uses (routes referencing s.bigQueryClient should require the
BigQuery capability; routes referencing s.crDataProvider should require the CR
data provider capability).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 36ae543a-4a29-473d-80a4-60fc33ec3b52
📒 Files selected for processing (3)
- e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
- pkg/componentreadiness/jiraautomator/jiraautomator.go
- pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/componentreadiness/jiraautomator/jiraautomator.go
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true

if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
	echo "Generating coverage report..."
	go tool covdata percent -i="${COVDIR}"
	go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
Use the directory that actually contains the copied coverage files.
Line 174 searches recursively, but Lines 176-177 always read ${COVDIR}. If the coverage files land under a child directory, the check passes and go tool covdata still points at the wrong root. Because the script later exits with ${TEST_EXIT}, that failure just drops the coverage artifact.
Possible fix
COVDIR=$(mktemp -d)
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
-if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
+if COVDATA_FILE=$(find "${COVDIR}" -type f \( -name 'covmeta.*' -o -name 'covcounters.*' \) 2>/dev/null | head -n1) \
+ && [ -n "${COVDATA_FILE}" ]; then
+ COVDATA_DIR=$(dirname "${COVDATA_FILE}")
echo "Generating coverage report..."
- go tool covdata percent -i="${COVDIR}"
- go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
+ go tool covdata percent -i="${COVDATA_DIR}"
+ go tool covdata textfmt -i="${COVDATA_DIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
# Merge test binary coverage (from -coverprofile) into server binary coverage
 if [ -f "${ARTIFACT_DIR}/e2e-test-coverage.out" ]; then

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
	echo "Generating coverage report..."
	go tool covdata percent -i="${COVDIR}"
	go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"

After:
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
if COVDATA_FILE=$(find "${COVDIR}" -type f \( -name 'covmeta.*' -o -name 'covcounters.*' \) 2>/dev/null | head -n1) \
  && [ -n "${COVDATA_FILE}" ]; then
	COVDATA_DIR=$(dirname "${COVDATA_FILE}")
	echo "Generating coverage report..."
	go tool covdata percent -i="${COVDATA_DIR}"
	go tool covdata textfmt -i="${COVDATA_DIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` around lines 172 - 177, The
script currently checks for covcounters.* under ${COVDIR} but always runs go
tool covdata against ${COVDIR}; update the logic to detect the actual directory
that contains the copied coverage files and use that directory as the input root
for go tool covdata. Specifically, use find to locate the first directory
containing 'covcounters.*' (e.g., assign it to a variable like COVERAGE_ROOT)
after the kubectl cp step, and replace the fixed ${COVDIR} arguments in the go
tool covdata percent and textfmt commands with that COVERAGE_ROOT (fall back to
${COVDIR} if find fails).
Build a coverage-instrumented sippy-cover binary in the Dockerfile and create a PVC in the CI setup script to persist coverage data across pod restarts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- automatejira: include error details in GetJobVariants failure
- seed_data: make test_failures UPDATE failure fatal
- serve: initialize GCS client independently of data provider
- component_report: recompute cell status from post-analysis results so middleware (triage) can improve cells, not just worsen them
- postgres provider: propagate context into query methods, add job name filter (periodic/release/aggregator) for BQ parity
- metrics: guard against nil crProvider to prevent panic
- triage tests: sort regressions by TestID for deterministic ordering
- setup script: quote shell variables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/test e2e
Actionable comments posted: 1
♻️ Duplicate comments (2)
pkg/api/componentreadiness/dataprovider/postgres/provider.go (1)
343-345: ⚠️ Potential issue | 🟠 Major — Don't turn variant lookup failures into empty reports.
If the follow-up `prow_jobs` query fails, this falls back to an empty `jobVariantMap`, and the `!ok` check later silently drops every aggregated row. Return the DB error instead of synthesizing an empty result set.
Suggested fix
- jobVariantMap := p.fetchJobVariants(rows)
+ jobVariantMap, err := p.fetchJobVariants(rows)
+ if err != nil {
+     return nil, []error{err}
+ }
@@
-func (p *PostgresProvider) fetchJobVariants(rows []testStatusRow) map[uint]map[string]string {
+func (p *PostgresProvider) fetchJobVariants(rows []testStatusRow) (map[uint]map[string]string, error) {
@@
-    if err := p.dbc.DB.Raw(`SELECT id, variants FROM prow_jobs WHERE id IN (?)`, ids).Scan(&jobRows).Error; err != nil {
-        log.WithError(err).Error("error fetching job variants")
-        return map[uint]map[string]string{}
+    if err := p.dbc.DB.Raw(`SELECT id, variants FROM prow_jobs WHERE id IN (?)`, ids).Scan(&jobRows).Error; err != nil {
+        return nil, fmt.Errorf("fetching job variants: %w", err)
     }
@@
-    return result
+    return result, nil
 }
Also applies to: 397-418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go` around lines 343-345: The current code turns failures from the follow-up prow_jobs query into an empty jobVariantMap by calling p.fetchJobVariants(rows) and proceeding, which causes aggregated rows to be silently dropped; change p.fetchJobVariants to return (map[string]YourVariantType, error) (or otherwise surface the DB error), update the call site where jobVariantMap := p.fetchJobVariants(rows) to check and return the error instead of using an empty map (and adjust the subsequent !ok handling accordingly), and make the same change for the second occurrence around the 397-418 block so any DB query error is propagated rather than treated as an empty result.
cmd/sippy/seed_data.go (1)
64-67: ⚠️ Potential issue | 🟠 Major — Refuse non-local DSNs by default.
Blocking only DSNs containing `amazonaws.com` still allows destructive writes to any other remote Postgres host. For a command that seeds synthetic data and refreshes regressions, parse the DSN and require either a strict localhost/e2e allowlist or an explicit confirmation flag.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sippy/seed_data.go` around lines 64 - 67, The RunE handler currently only blocks DSNs containing "amazonaws.com"; instead parse f.DBFlags.DSN (the DB connection string) using url.Parse (or a DSN parser) to extract the host and refuse any non-local hosts by default (allowlist only localhost, 127.0.0.1, ::1 and unix socket/empty host and any e2e-specific host tokens you require), and add an explicit boolean flag (e.g. --force or --confirm) that must be set to permit seeding into remote hosts; update the RunE logic to check parsedHost against the allowlist and the confirm flag before proceeding (use the symbols RunE and f.DBFlags.DSN to locate the change).
🧹 Nitpick comments (1)
Dockerfile (1)
7-13: Avoid shipping the coverage binary in the default runtime image.
Line 7 and Line 13 make coverage build/copy unconditional, which adds build overhead and runtime image bloat even when e2e coverage is not needed. Consider moving `sippy-cover` to a dedicated e2e stage/target so the default image stays lean.
Proposed Dockerfile split (default runtime vs e2e coverage target)
 FROM registry.access.redhat.com/ubi9/ubi:latest AS builder
 WORKDIR /go/src/sippy
 COPY . .
 ENV PATH="/go/bin:${PATH}"
 ENV GOPATH="/go"
 RUN dnf module enable nodejs:18 -y && dnf install -y go make npm && make build
-RUN go build -cover -coverpkg=./cmd/...,./pkg/... -mod=vendor -o ./sippy-cover ./cmd/sippy
+
+FROM builder AS cover-builder
+RUN go build -cover -coverpkg=./cmd/...,./pkg/... -mod=vendor -o ./sippy-cover ./cmd/sippy

 FROM registry.access.redhat.com/ubi9/ubi:latest AS base
 RUN mkdir -p /historical-data
 RUN mkdir -p /config
 COPY --from=builder /go/src/sippy/sippy /bin/sippy
-COPY --from=builder /go/src/sippy/sippy-cover /bin/sippy-cover
 COPY --from=builder /go/src/sippy/sippy-daemon /bin/sippy-daemon
 COPY --from=builder /go/src/sippy/scripts/fetchdata.sh /bin/fetchdata.sh
 COPY --from=builder /go/src/sippy/config/*.yaml /config/
 ENTRYPOINT ["/bin/sippy"]
 EXPOSE 8080
+
+FROM base AS e2e
+COPY --from=cover-builder /go/src/sippy/sippy-cover /bin/sippy-cover
As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 7 - 13, The Dockerfile currently always builds and copies the coverage binary (sippy-cover) — via the go build -cover... producing ./sippy-cover and COPY --from=builder .../sippy-cover /bin/sippy-cover — which bloats the default runtime image; split the Dockerfile into two targets: keep the existing builder stage but move the coverage build step (the RUN go build -cover -coverpkg=... producing ./sippy-cover) into a dedicated e2e coverage stage/target (or conditionally run it via a build-arg/target), and remove the COPY --from=builder .../sippy-cover line from the default runtime/base stage so only the normal sippy binary (/bin/sippy) is copied into the base image; ensure the new e2e stage still copies sippy-cover into its image when that target is built.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0229d4ea-3303-45c4-af8a-6f197ad5b0b1
📒 Files selected for processing (10)
- .gitignore
- Dockerfile
- cmd/sippy/automatejira.go
- cmd/sippy/seed_data.go
- cmd/sippy/serve.go
- e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh
- pkg/api/componentreadiness/component_report.go
- pkg/api/componentreadiness/dataprovider/postgres/provider.go
- pkg/sippyserver/metrics/metrics.go
- test/e2e/componentreadiness/triage/triageapi_test.go
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/sippy/automatejira.go
const testDetailQuery = `
SELECT
    tow.unique_id AS test_id,
    t.name AS test_name,
    pj.name AS prowjob_name,
    CAST(pjr.id AS TEXT) AS prowjob_run_id,
    COALESCE(pjr.url, '') AS prowjob_url,
    pjr.timestamp AS prowjob_start,
    pj.id AS prow_job_id,
    pjrt.status,
    COALESCE(tow.jira_component, '') AS jira_component,
    tow.jira_component_id,
    tow.capabilities
FROM prow_job_run_tests pjrt
JOIN prow_job_runs pjr ON pjr.id = pjrt.prow_job_run_id
JOIN prow_jobs pj ON pj.id = pjr.prow_job_id
JOIN tests t ON t.id = pjrt.test_id
JOIN test_ownerships tow ON tow.test_id = pjrt.test_id
    AND (tow.suite_id = pjrt.suite_id OR (tow.suite_id IS NULL AND pjrt.suite_id IS NULL))
WHERE pj.release = ?
    AND pjr.timestamp >= ? AND pjr.timestamp < ?
    AND pjrt.deleted_at IS NULL AND pjr.deleted_at IS NULL AND pj.deleted_at IS NULL
    AND tow.staff_approved_obsolete = false
    AND (pjr.labels IS NULL OR NOT pjr.labels @> ARRAY['InfraFailure'])
ORDER BY pjr.timestamp
`
Deduplicate test-detail rows the same way as status aggregation.
testStatusQuery already collapses duplicate prow_job_run_tests rows per run/test/suite before counting, but testDetailQuery reads the raw table directly. On real data with duplicate uploads, the test-details endpoints will emit the same run more than once and inflate pass/fail counts. Reuse the same deduping CTE and status precedence here before building crstatus.TestJobRunRows.
Also applies to: 572-637
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go` around lines
491 - 516, testDetailQuery currently reads prow_job_run_tests directly and can
emit duplicate rows; mirror the deduplication used by testStatusQuery by
introducing the same CTE (the deduping of prow_job_run_tests per run/test/suite
with status precedence) and use that CTE in place of direct joins to
prow_job_run_tests when building testDetailQuery and the other detail query
around lines 572-637 so crstatus.TestJobRunRows is populated from the
deduped/resultant rows; update JOINs to reference the deduped CTE name and
ensure the status precedence logic from testStatusQuery is preserved.
Actionable comments posted: 2
♻️ Duplicate comments (2)
scripts/e2e.sh (1)
118-123: ⚠️ Potential issue | 🟠 Major — Still fail fast on `jq` errors during cache priming.
`curl` is checked now, but Line 119 still ignores `jq` failures. If `jq` is missing or the response shape changes, the loop runs zero times and Line 123 still prints `Cache priming complete`, which pushes the failure into later flaky triage tests.
Possible fix
 VIEWS=$(curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness/views") || { echo "Failed to fetch views"; exit 1; }
-for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || { echo "Failed to parse views"; exit 1; }
+for VIEW in $VIEW_NAMES; do
   echo " Priming cache for view: $VIEW"
   curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null || { echo "Failed to prime cache for view: $VIEW"; exit 1; }
 done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/e2e.sh` around lines 118-123: The cache-priming loop currently ignores jq failures because VIEWS assignment doesn't validate jq output; update the VIEWS assignment and loop to abort on jq errors or empty output by using jq's exit status (e.g., jq -e) or explicitly checking that VIEWS is valid JSON and contains at least one name before iterating; ensure the variable VIEWS, the jq invocation that extracts '.[].name', and the loop over VIEW are changed so a jq failure causes the script to echo a clear error and exit non-zero instead of silently printing "Cache priming complete".
e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh (1)
171-177: ⚠️ Potential issue | 🟠 Major — Use the copied coverage directory as the `covdata` input root.
`kubectl cp coverage-helper:/tmp/coverage "${COVDIR}"` creates a `coverage/` subdirectory under `${COVDIR}`. The recursive check on Line 174 can pass while Lines 176-177 still point `go tool covdata` at the parent directory, so artifact generation drops coverage even though the files were copied.
Possible fix
 COVDIR=$(mktemp -d)
 ${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
-if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
+if COVDATA_FILE=$(find "${COVDIR}" -type f \( -name 'covmeta.*' -o -name 'covcounters.*' \) -print -quit 2>/dev/null) \
+   && [ -n "${COVDATA_FILE}" ]; then
+  COVDATA_DIR=$(dirname "${COVDATA_FILE}")
   echo "Generating coverage report..."
-  go tool covdata percent -i="${COVDIR}"
-  go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
+  go tool covdata percent -i="${COVDATA_DIR}"
+  go tool covdata textfmt -i="${COVDATA_DIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` around lines 171 - 177, The copied coverage files end up under a coverage/ subdirectory of COVDIR but the covdata commands use COVDIR itself; update the script so the covdata input root points to that coverage subdirectory (e.g., derive a COVROOT from COVDIR/coverage or locate the coverage dir and pass that to the go tool covdata percent and go tool covdata textfmt commands) so the coverage files found by the earlier check are the same files fed to covdata; update references to COVDIR in the covdata calls (symbols: COVDIR, go tool covdata percent, go tool covdata textfmt, ${ARTIFACT_DIR}/e2e-coverage.out).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 510c67c7-4627-48e6-a103-28542ca6d88a
📒 Files selected for processing (2)
- e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
- scripts/e2e.sh
 # only 1 in parallel, some tests will clash if run at the same time
-gotestsum --junitfile ${ARTIFACT_DIR}/junit_e2e.xml -- ./test/e2e/... -v -p 1
+gotestsum --junitfile ${ARTIFACT_DIR}/junit_e2e.xml -- ./test/e2e/... -v -p 1 -coverprofile=${ARTIFACT_DIR}/e2e-test-coverage.out -coverpkg=./pkg/...,./cmd/...
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Instrumented server binary ==="
rg -n --context 2 'sippy-cover|coverpkg=./cmd/\.\.\.,./pkg/\.\.\.' Dockerfile
echo
echo "=== Test coverage collection ==="
sed -n '132,136p' e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
echo
echo "=== Merge logic in CI script ==="
sed -n '178,183p' e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
echo
echo "=== Same merge logic in local script ==="
sed -n '28,31p' scripts/e2e.sh
Repository: openshift/sippy
Length of output: 1666
Fix coverage merging to use proper Go coverage semantics instead of text concatenation.
Both the server binary (Dockerfile Line 7) and test coverage collection (Line 135) instrument the same packages (./cmd/... and ./pkg/...). The merge logic at Lines 179-183 (and scripts/e2e.sh Lines 29-31) uses naive text append (tail -n +2 ... >>), which duplicates coverage blocks instead of properly merging them. Go coverage profiles require proper merge semantics to avoid inaccurate coverage reports.
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` at line 135, The current
profile merge uses naive text concatenation (tail -n +2 >>) which duplicates
blocks and breaks Go coverage semantics; update the merge logic in the e2e merge
step (scripts sippy-e2e-test-commands.sh and scripts/e2e.sh where
e2e-test-coverage.out and other coverage files are combined) to use a proper Go
coverage merging tool (e.g., gocovmerge or the official go tool covdata/convert
sequence) instead of tail/append: invoke the merger to read the individual
profiles (ensure they share the same -covermode and coverpkg settings like
./pkg/... and ./cmd/...), produce a single merged profile file, and replace the
current tail-based concatenation with that merged output so the header appears
once and statement counts are summed correctly.
Warm the component-readiness cache before starting the suite.
scripts/e2e.sh now primes /api/component_readiness in Lines 116-123 because the triage cases rely on cached reports. This CI path goes straight to Line 135, so it still runs the same ./test/e2e/... suite against a cold cache.
Suggested change
${KUBECTL_CMD} -n sippy-e2e get svc,ep
+# Prime the component readiness cache so triage tests can find cached reports.
+VIEWS=$(curl -sf "http://localhost:${SIPPY_API_PORT}/api/component_readiness/views") || { echo "Failed to fetch views"; exit 1; }
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || { echo "Failed to parse views"; exit 1; }
+for VIEW in $VIEW_NAMES; do
+ curl -sf "http://localhost:${SIPPY_API_PORT}/api/component_readiness?view=${VIEW}" > /dev/null || { echo "Failed to prime cache for view: ${VIEW}"; exit 1; }
+done
+
# only 1 in parallel, some tests will clash if run at the same time
 gotestsum --junitfile ${ARTIFACT_DIR}/junit_e2e.xml -- ./test/e2e/... -v -p 1 -coverprofile=${ARTIFACT_DIR}/e2e-test-coverage.out -coverpkg=./pkg/...,./cmd/...
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` at line 135, The e2e CI
path runs the tests without priming the component-readiness cache; add the same
cache-warming steps used in scripts/e2e.sh (the logic around Lines 116-123 that
primes /api/component_readiness) before the gotestsum invocation so triage cases
see cached reports; locate the CI script block that runs the gotestsum command
and insert the HTTP/kubectl call(s) that hit /api/component_readiness (reusing
the same endpoint, namespace and auth used by scripts/e2e.sh) to ensure the
cache is warmed prior to executing ./test/e2e/....
Scheduling required tests:
Prefer sippy-cover over sippy when available, and propagate GOCOVERDIR to the subprocess so load command coverage is captured. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exercises the regression tracker PostAnalysis middleware path where triages adjust report cell status from SignificantRegression to SignificantTriagedRegression (or Extreme equivalent). This improves regressiontracker middleware coverage from ~20% to ~71% for PostAnalysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5c6e9f2 to 0f47bcd
/test e2e
The coverage instrumentation changes inadvertently removed the GCS credential setup, causing the datasync test to skip in CI. Restore GCS_CRED, GCS_SA_JSON_PATH, and SIPPY_E2E_REPO_ROOT exports so the test runner can access them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/test e2e
@stbenjam: The following test failed, say
Full PR test history. Your PR dashboard. Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
PR needs rebase. Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Note: still a WIP but if I mark it WIP, CodeRabbit won't review. Still thinking about a lot of things in this PR, and how I might make the first iteration smaller
When refactoring Component Readiness with AI assistance, the lack of tests made it difficult to verify changes didn't break anything. Adding comprehensive e2e tests made the refactor dramatically more reliable -- the AI could detect and fix regressions as it worked.
Component Readiness was tightly coupled to BigQuery. The first step was extracting a DataProvider interface, then adding implementations behind it. A mock provider with static fixtures came first, but was replaced with a PostgreSQL provider because: 1) we'd eventually like to run CR against postgres in production, 2) postgres is already available in e2e infrastructure, and 3) a real database lets us fully test triage, regressions, and other write-path features.
The e2e suite now seeds realistic synthetic data (4800 job runs across 4 releases with deterministic pass/fail rates), launches the full API server against it, and runs 70+ tests covering component reports, test details, variants, regressions, triage workflows, and the regression tracker. This also enables future UI testing with tools like Playwright or headless Chromium against a fully functional API.
Changes: