Comprehensive e2e testing for Component Readiness with abstracted data backends #3411

stbenjam wants to merge 16 commits into openshift:main
Conversation
…testing

When refactoring Component Readiness with AI assistance, the lack of tests made it difficult to verify changes didn't break anything. Adding comprehensive e2e tests made the refactor dramatically more reliable -- the AI could detect and fix regressions as it worked.

Component Readiness was tightly coupled to BigQuery. The first step was extracting a DataProvider interface, then adding implementations behind it. A mock provider with static fixtures came first, but was replaced with a PostgreSQL provider because: 1) we'd eventually like to run CR against postgres in production, 2) postgres is already available in e2e infrastructure, and 3) a real database lets us fully test triage, regressions, and other write-path features.

The e2e suite now seeds realistic synthetic data (4800 job runs across 4 releases with deterministic pass/fail rates), launches the full API server against it, and runs 70+ tests covering component reports, test details, variants, regressions, triage workflows, and the regression tracker. This also enables future UI testing with tools like Playwright or headless Chromium against a fully functional API.

Changes:
- Add pgprovider implementing the ComponentReadinessDataProvider interface
- Add seed-data command generating synthetic prow job runs and test results
- Rewrite e2e tests to use the postgres provider exclusively (BigQuery no longer needed for e2e)
- Add e2e tests for test_details, variants, regressions, columnGroupBy/dbGroupBy, and variant cross-compare endpoints
- Fix triage e2e tests to work with real seed-data regressions instead of fragile cache injection (delete E2ECacheManipulator)
- Fix test isolation in regressiontracker tests (scoped cleanup)
- Add optional data sync test (when GCS_SA_JSON_PATH is set) to verify prow/GCS loading still works
- Add server metrics for postgres query performance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
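The extraction described above can be sketched roughly as follows. This is a hypothetical illustration of the pattern, not the actual Sippy interface: report code depends only on a small interface, and the BigQuery, mock, or seeded-Postgres backends are swapped in behind it. All names here are illustrative.

```go
package main

import (
	"context"
	"fmt"
)

// TestStatus is a stand-in for the per-test aggregates a report needs.
type TestStatus struct {
	Total, Success int
}

// DataProvider is the seam: report generation calls this interface,
// never a concrete backend.
type DataProvider interface {
	QueryTestStatus(ctx context.Context, release string) (map[string]TestStatus, error)
}

// fixtureProvider plays the role of the seeded-postgres backend in e2e:
// deterministic data, no external services required.
type fixtureProvider struct{}

func (fixtureProvider) QueryTestStatus(_ context.Context, release string) (map[string]TestStatus, error) {
	return map[string]TestStatus{
		"test-a": {Total: 100, Success: 95},
	}, nil
}

func passRate(s TestStatus) float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Success) / float64(s.Total)
}

func main() {
	var p DataProvider = fixtureProvider{}
	statuses, err := p.QueryTestStatus(context.Background(), "4.18")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.2f\n", passRate(statuses["test-a"]))
}
```

With deterministic fixture rates like these, e2e assertions on report output become exact rather than fuzzy.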
Walkthrough: Adds a dataprovider abstraction for Component Readiness with BigQuery and Postgres implementations, and migrates pipeline types from BigQuery-specific to …
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: stbenjam
/test e2e
- Fix gofmt formatting in interface.go, provider.go, types.go, releasefallback.go, report_test.go
- Fix gosec G204: add #nosec for exec.Command in e2e test
- Fix gosec G306: use 0o600 permissions for WriteFile
- Fix gocritic typeAssertChain: rewrite to type switch
- Fix gocyclo: extract seedProwJobs, seedTestsAndOwnerships, seedJobRunsAndResults, seedRunsForJob from seedSyntheticData

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/api/componentreadiness/query/querygenerators.go (1)
783-793: ⚠️ Potential issue | 🔴 Critical

Handle `time.Time` for `last_failure` too.

`prowjob_start` (lines 1057–1062) safely handles both `civil.DateTime` and `time.Time` with a type assertion, but `last_failure` (lines 783–793) hard-casts to `civil.DateTime` only. This will panic if BigQuery returns `time.Time`, breaking component report generation.

Suggested fix:

     case col == "last_failure":
-        // ignore when we cant parse, its usually null
-        var err error
-        if row[i] != nil {
-            layout := "2006-01-02T15:04:05"
-            lftCivilDT := row[i].(civil.DateTime)
-            cts.LastFailure, err = time.Parse(layout, lftCivilDT.String())
-            if err != nil {
-                log.WithError(err).Error("error parsing last failure time from bigquery")
-            }
-        }
+        if row[i] != nil {
+            switch v := row[i].(type) {
+            case civil.DateTime:
+                cts.LastFailure = time.Date(v.Date.Year, v.Date.Month, v.Date.Day, v.Time.Hour, v.Time.Minute, v.Time.Second, v.Time.Nanosecond, time.UTC)
+            case time.Time:
+                cts.LastFailure = v.UTC()
+            default:
+                return "", crstatus.TestStatus{}, fmt.Errorf("unexpected last_failure type %T", row[i])
+            }
+        }
cmd/sippy/component_readiness.go (1)
202-213: ⚠️ Potential issue | 🟠 Major

Choose the CR provider once instead of hard-wiring BigQuery here.

Line 202 and Line 213 both create a BigQuery provider directly. That means this entrypoint still has no way to run Component Readiness off the new postgres provider, and the server/metrics paths use separate provider instances. Build the backend-specific provider once and pass the same object to `NewServer` and `RefreshMetricsDB`.
🧹 Nitpick comments (3)
pkg/apis/api/componentreport/crstatus/types.go (1)
56-57: Mark synthesized fields as ignored by BigQuery.

`TestKey` and `TestKeyStr` are local bookkeeping fields, not query columns. Adding `bigquery:"-"` keeps this shared DTO explicit and avoids relying on implicit mapper behavior.

Suggested fix:

-	TestKey    crtest.KeyWithVariants `json:"test_key"`
-	TestKeyStr string                 `json:"-"` // transient field so we dont have to keep recalculating
+	TestKey    crtest.KeyWithVariants `json:"test_key" bigquery:"-"`
+	TestKeyStr string                 `json:"-" bigquery:"-"` // transient field so we dont have to keep recalculating

As per coding guidelines, BigQuery struct fields must have `bigquery:"column_name"` tags that exactly match the BigQuery table column names.
cmd/sippy/automatejira.go (1)
173-176: Include the job-variant query errors in the returned failure.

Line 174 already gives you the backend errors, but Line 176 discards them. Returning only `"failed to get job variants"` will make provider rollout failures very hard to diagnose from automation logs.
pkg/componentreadiness/jiraautomator/jiraautomator.go (1)
73-77: Validate `provider` in the constructor since report generation now depends on it.

The constructor guards `bqClient`, but `getComponentReportForView` now relies on `j.dataProvider`. Adding a nil-check here gives earlier, clearer failures.

Suggested fix:

 func NewJiraAutomator(
 	jiraClient *jira.Client,
 	bqClient *bqclient.Client,
 	provider dataprovider.DataProvider,
 @@
 	j := JiraAutomator{
 @@
 	}
+	if provider == nil {
+		return j, fmt.Errorf("we don't have a component readiness data provider for jira integrator")
+	}
 	if bqClient == nil || bqClient.BQ == nil {
 		return j, fmt.Errorf("we don't have a bigquery client for jira integrator")
 	}

Also applies to: 89-106, 155-156
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sippy/load.go`:
- Around line 321-323: The code is hard-coding a BigQuery provider when calling
componentreadiness.NewRegressionTracker; replace
bqprovider.NewBigQueryProvider(bqc) with the actual configured DataProvider
instance used by the API (the variable or factory that constructs the backend
provider) so NewRegressionTracker receives the same DataProvider backend the API
is using (ensure the replacement implements the DataProvider interface and
remove the bqprovider call from the argument list).
In `@cmd/sippy/seed_data.go`:
- Around line 518-565: The bug is that the global testableRuns value reduces
every test's inserted rows by 2, stealing rows when counts.total == runCount;
fix by computing assignment limits per-test instead of using the single
testableRuns variable—introduce reservedInfraRuns (2) and inside the loop over
syntheticTests use runsToAssign := min(counts.total, runCount -
reservedInfraRuns) (ensuring runCount - reservedInfraRuns is non-negative) and
iterate i < runsToAssign when inserting prow_job_run_tests; update references to
testableRuns and remove the global reduction so counts.total is honored
correctly per test.
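The per-test cap suggested in that finding can be sketched as follows. This is an illustrative reconstruction, not the actual seed code; the constant and numbers are hypothetical.

```go
package main

import "fmt"

// reservedInfraRuns models the two runs held back for synthetic
// infrastructure failures.
const reservedInfraRuns = 2

// runsToAssign caps each test's inserted rows individually, instead of
// shrinking a single global "testableRuns" value and silently stealing
// rows from tests whose counts.total equals runCount.
func runsToAssign(countsTotal, runCount int) int {
	limit := runCount - reservedInfraRuns
	if limit < 0 {
		limit = 0
	}
	if countsTotal < limit {
		return countsTotal
	}
	return limit
}

func main() {
	// A test wanting all 30 runs is capped at 28 (2 reserved for infra),
	// while a test wanting only 10 still gets its full 10.
	fmt.Println(runsToAssign(30, 30), runsToAssign(10, 30))
}
```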
In `@cmd/sippy/serve.go`:
- Around line 169-172: The crash is caused by s.variantManager being nil when
BigQuery is not used; update jsonJobRunRiskAnalysis to guard calls to
s.variantManager (e.g., before calling s.variantManager.IdentifyVariants(...)
check if s.variantManager == nil) and either return a clear 400/500 response or
skip variant identification logic when nil. Alternatively, ensure variantManager
is always initialized by setting variantManager =
testidentification.NewNoopVariantManager() (or similar) in the serve setup when
bigQueryClient is nil; reference the variantManager variable, GetVariantManager
function, and the jsonJobRunRiskAnalysis / s.variantManager.IdentifyVariants
call to locate where to add the nil-check or default initialization.
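The noop-default idea from that serve.go finding looks roughly like this. A minimal sketch with hypothetical type names, not the actual Sippy `testidentification` package:

```go
package main

import "fmt"

// VariantManager is a stand-in for the interface handlers call.
type VariantManager interface {
	IdentifyVariants(jobName string) []string
}

// noopVariantManager returns no variants; it exists so handlers such as
// the risk-analysis endpoint never dereference a nil manager when
// BigQuery is not configured.
type noopVariantManager struct{}

func (noopVariantManager) IdentifyVariants(string) []string { return nil }

// newServer installs a safe default when no real manager is available.
func newServer(vm VariantManager) VariantManager {
	if vm == nil {
		vm = noopVariantManager{} // default in Postgres-only mode
	}
	return vm
}

func main() {
	vm := newServer(nil)
	fmt.Println(len(vm.IdentifyVariants("periodic-e2e-aws")))
}
```

The alternative (a nil-check at every call site) works too, but a noop default centralizes the decision in one place.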
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 236-250: The query builds queryString in provider.go and currently
defines successful_runs as COUNTIF which can overcount when multiple rows share
a build id; change successful_runs to dedupe by build id just like total_runs —
e.g., replace the raw COUNTIF with a COUNT(DISTINCT CASE WHEN jobs.prowjob_state
= 'success' THEN jobs.prowjob_build_id END) (or equivalent DISTINCT COUNT over
build id filtered to success) so PassRate cannot exceed 100%; update the SELECT
expression for successful_runs in the queryString construction accordingly.
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go`:
- Around line 343-345: fetchJobVariants currently swallows query errors and
returns an empty map which makes queryTestStatus silently drop rows; change
fetchJobVariants to return (map[int64]Variant, error) (or propagate an error)
and update callers like queryTestStatus to check the error and return it instead
of proceeding with an empty map; ensure error handling paths log context and
return a non-nil error to callers so callers surface the DB failure rather than
producing an empty report (update any other call sites of fetchJobVariants
accordingly, e.g., the similar block at the 397-418 region).
- Around line 136-184: QueryReleases currently computes GADate from time.Now()
using releaseMetadata and drops PreviousRelease for names missing from
releaseMetadata; replace the moving-clock logic by using a deterministic
reference (e.g., a provider-level injected now func or a fixed referenceDate
constant on PostgresProvider instead of time.Now()) or store absolute GA dates
in releaseMetadata so GADate is stable, and ensure PreviousRelease is derived
when releaseMetadata lacks an entry by using the sorted releaseNames order (set
rel.PreviousRelease from the prior element in releaseNames). Update
QueryReleases to use the injected/reference time and fallback previous-release
logic so GADate and PreviousRelease are deterministic across calls.
In `@pkg/sippyserver/server.go`:
- Around line 412-414: The ComponentReadinessCapability is being set too
broadly; update the capability logic so routes that require BigQuery are only
advertised when s.bigQueryClient is non-nil and routes that can work with just
the CR data provider are advertised when s.crDataProvider is non-nil.
Concretely, change the single check around s.bigQueryClient || s.crDataProvider
to two checks (or introduce a separate capability): keep
ComponentReadinessCapability (or a new PostgresOnlyCapability) only when
s.crDataProvider != nil for handlers that work with the CR provider, and ensure
jsonJobRunPayload, jsonPullRequestTestResults,
jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, and
jsonTestLifecyclesFromDB are gated behind s.bigQueryClient != nil so they are
not advertised in Postgres-only mode.
In `@scripts/e2e.sh`:
- Around line 101-105: The cache-priming loop silently ignores failures; update
the VIEWS assignment and loop to fail fast by checking curl and jq exit codes
and aborting on errors: when retrieving VIEWS via VIEWS=$(curl -s
"http://localhost:$SIPPY_API_PORT/api/component_readiness/views") verify curl
succeeded and that jq produced non-empty output (or use pipe with jq -e to cause
non-zero exit), and inside the loop check each curl priming call (the curl to
/api/component_readiness?view=$VIEW) for a non-zero exit and exit the script
immediately (or set -e at the top) if any call fails so the script does not
print "Cache priming complete" on error; reference the VIEWS variable, the for
VIEW in ... loop, and both curl invocations to locate where to add these checks.
- Around line 14-16: Replace the bash-only conditional using [[ in the e2e.sh
script: locate the check that references GCS_SA_JSON_PATH and change the test to
the POSIX-compatible form using [ -z "$GCS_SA_JSON_PATH" ] (ensure proper
spacing around brackets and keep the existing echo and redirection unchanged) so
the script works with /bin/sh.
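Both e2e.sh findings combine into one fail-fast shape. The sketch below stubs out the network call so it runs anywhere; in the real script `fetch_views` would be the `curl -sf ... | jq -r` pipeline, and the loop body would be the priming `curl`. Endpoint paths and variable names are taken from the findings, not verified against the script.

```shell
#!/bin/sh
# Fail fast on any error or unset variable (POSIX sh, no bashisms).
set -eu

fetch_views() {
    # stand-in for: curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness/views" | jq -r '.[].name'
    printf '%s\n' "4.18-main" "4.19-main"
}

# POSIX-compatible emptiness check: [ -z ... ] works under /bin/sh,
# unlike the bash-only [[ ... ]] form flagged in the review.
GCS_SA_JSON_PATH="${GCS_SA_JSON_PATH:-}"
if [ -z "$GCS_SA_JSON_PATH" ]; then
    echo "GCS_SA_JSON_PATH not set; data sync test will be skipped" >&2
fi

VIEWS=$(fetch_views) || { echo "failed to fetch views" >&2; exit 1; }
[ -n "$VIEWS" ] || { echo "no views returned" >&2; exit 1; }

for VIEW in $VIEWS; do
    # real script: curl -sf ".../api/component_readiness?view=$VIEW" >/dev/null || exit 1
    echo "primed $VIEW"
done
echo "Cache priming complete"
```

With `set -e` plus explicit checks on the substitution, the script can no longer print "Cache priming complete" after a failed request.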
In `@test/e2e/datasync/datasync_test.go`:
- Around line 22-23: The DB Count calls (e.g., the expression using
dbc.DB.Table("prow_job_runs").Count(&countBefore) and the later count) ignore
the returned gorm result and any error; change them to capture the result (res
:= dbc.DB.Table(...).Count(&countBefore)) and assert res.Error == nil (use
t.Fatalf or t.Fatalff with the error) so DB failures fail the test. For the
external "sippy load" invocation, wrap the command in a context with a timeout
(ctx, cancel := context.WithTimeout(context.Background(), <sensible duration>);
defer cancel()) and run exec.CommandContext(ctx, "sippy", "load", ...) so the
process is killed on timeout; assert the command returned no error and surface
ctx.Err()/command error via t.Fatalf so the test doesn’t hang or silently ignore
failures.
📒 Files selected for processing (41)
- .gitignore
- cmd/sippy/annotatejobruns.go
- cmd/sippy/automatejira.go
- cmd/sippy/component_readiness.go
- cmd/sippy/load.go
- cmd/sippy/seed_data.go
- cmd/sippy/serve.go
- config/e2e-openshift.yaml
- config/e2e-views.yaml
- pkg/api/componentreadiness/component_report.go
- pkg/api/componentreadiness/component_report_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/provider.go
- pkg/api/componentreadiness/dataprovider/interface.go
- pkg/api/componentreadiness/dataprovider/postgres/provider.go
- pkg/api/componentreadiness/middleware/interface.go
- pkg/api/componentreadiness/middleware/linkinjector/linkinjector.go
- pkg/api/componentreadiness/middleware/list.go
- pkg/api/componentreadiness/middleware/regressionallowances/regressionallowances.go
- pkg/api/componentreadiness/middleware/regressiontracker/regressiontracker.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback_test.go
- pkg/api/componentreadiness/query/querygenerators.go
- pkg/api/componentreadiness/regressiontracker.go
- pkg/api/componentreadiness/test_details.go
- pkg/api/componentreadiness/utils/utils.go
- pkg/apis/api/componentreport/crstatus/types.go
- pkg/apis/api/componentreport/testdetails/types.go
- pkg/bigquery/bqlabel/labels.go
- pkg/componentreadiness/jiraautomator/jiraautomator.go
- pkg/componentreadiness/jobrunannotator/jobrunannotator.go
- pkg/dataloader/crcacheloader/crcacheloader.go
- pkg/sippyserver/metrics/metrics.go
- pkg/sippyserver/server.go
- scripts/e2e.sh
- test/e2e/componentreadiness/componentreadiness_test.go
- test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
- test/e2e/componentreadiness/report/report_test.go
- test/e2e/componentreadiness/triage/triageapi_test.go
- test/e2e/datasync/datasync_test.go
- test/e2e/util/cache_manipulator.go
- test/e2e/util/e2erequest.go
💤 Files with no reviewable changes (1)
- test/e2e/util/cache_manipulator.go
cmd/sippy/annotatejobruns.go
Outdated
 	allVariants, errs := componentreadiness.GetJobVariants(ctx, bqprovider.NewBigQueryProvider(bigQueryClient))
 	if len(errs) > 0 {
-		return fmt.Errorf("failed to get variants from bigquery")
+		return fmt.Errorf("failed to get job variants")
 	}
Preserve provider error details when variant loading fails.
Line 185 currently drops all underlying errors, which makes CLI failures hard to debug.
Suggested fix
allVariants, errs := componentreadiness.GetJobVariants(ctx, bqprovider.NewBigQueryProvider(bigQueryClient))
if len(errs) > 0 {
- return fmt.Errorf("failed to get job variants")
+ msgs := make([]string, 0, len(errs))
+ for _, err := range errs {
+ msgs = append(msgs, err.Error())
+ }
+ return fmt.Errorf("failed to get job variants: %s", strings.Join(msgs, "; "))
[Claude] Will fix — wrapping the errors into the returned error message.
@stbenjam, sounds good! For reference, using errors.Join(errs...) (Go 1.20+) would be the most concise approach if available, otherwise the strings.Join approach in the suggestion works well.
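The `errors.Join` approach mentioned above looks like this in practice. A sketch with hypothetical function names, not the actual Sippy code:

```go
package main

import (
	"errors"
	"fmt"
)

// loadVariants stands in for the backend call that can return
// multiple independent errors.
func loadVariants() (map[string][]string, []error) {
	return nil, []error{
		errors.New("query variants: connection refused"),
		errors.New("query variant registry: timeout"),
	}
}

// getJobVariants aggregates the slice into one returned error instead
// of discarding the details.
func getJobVariants() (map[string][]string, error) {
	variants, errs := loadVariants()
	if len(errs) > 0 {
		// errors.Join (Go 1.20+) preserves every underlying error and
		// keeps errors.Is/errors.As unwrapping working through %w.
		return nil, fmt.Errorf("failed to get job variants: %w", errors.Join(errs...))
	}
	return variants, nil
}

func main() {
	_, err := getJobVariants()
	fmt.Println(err)
}
```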
cmd/sippy/seed_data.go

Outdated

-The command can be re-run as needed to add more runs, or because your old job runs
-rolled off the 1 week window.
+The command can be re-run to refresh data.
seed-data can't currently refresh an existing dataset.
Line 62 says reruns refresh data, but Lines 385-392 return as soon as any prow_jobs row exists, and --init-database only migrates schema. A partially seeded database can't be repaired by rerunning this command.
Also applies to: 385-392
[Claude] Skipping — `--init-database` drops and recreates all tables before seeding, which handles the re-run case. The early return prevents accidentally double-seeding on a running instance.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what UpdateSchema does - does it drop tables?
ast-grep --pattern 'func ($_ *DB) UpdateSchema($_) $_ {
$$$
}'
rg -n "UpdateSchema" --type=go -A 10
Length of output: 2161
🏁 Script executed:
#!/bin/bash
# Get the full UpdateSchema implementation
sed -n '68,150p' pkg/db/db.go
Length of output: 2676
@stbenjam, just to flag a factual discrepancy: UpdateSchema uses GORM's AutoMigrate, which is additive-only — it adds missing tables and columns but never drops existing data or tables. So --init-database won't clear rows; re-running seed-data --init-database on a populated database will still hit the early return at Line 387–390 and do nothing.
If the intent is that users should manually drop the database (e.g., dropdb / docker compose down -v) before re-running, that's fine — but the doc string "The command can be re-run to refresh data" would be misleading. It might be worth either correcting the doc string or adding a --force flag that truncates the seeded tables.
 	tracker.Load()
 	if len(tracker.Errors()) > 0 {
 		for _, err := range tracker.Errors() {
 			log.WithError(err).Warn("regression tracker error")
 		}
 	}
 	return nil
Don't hide regression-tracker failures.
This command now seeds regression-tracker state too. Logging tracker.Errors() and still returning nil leaves test_regressions incomplete while the overall seed reports success.
Suggested fix
tracker.Load()
if len(tracker.Errors()) > 0 {
for _, err := range tracker.Errors() {
log.WithError(err).Warn("regression tracker error")
}
+ return fmt.Errorf("regression tracker failed with %d error(s)", len(tracker.Errors()))
}
return nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tracker.Load() | |
| if len(tracker.Errors()) > 0 { | |
| for _, err := range tracker.Errors() { | |
| log.WithError(err).Warn("regression tracker error") | |
| } | |
| } | |
| return nil | |
| tracker.Load() | |
| if len(tracker.Errors()) > 0 { | |
| for _, err := range tracker.Errors() { | |
| log.WithError(err).Warn("regression tracker error") | |
| } | |
| return fmt.Errorf("regression tracker failed with %d error(s)", len(tracker.Errors())) | |
| } | |
| return nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/sippy/seed_data.go` around lines 682 - 688, The regression-tracker errors
are being logged but ignored after calling tracker.Load(), causing the seeding
to report success even when tracker.Errors() contains failures; change this so
the function returns a non-nil error when tracker.Errors() is non-empty: after
tracker.Load() check tracker.Errors(), log each error as now but then aggregate
them (e.g., wrap or join messages) and return that error instead of nil so
callers see the failure (refer to tracker.Load() and tracker.Errors()).
[Claude] Will fix — returning an error when the regression tracker reports failures.
@stbenjam, sounds good! The fix aligns with the suggestion — returning a non-nil error after logging the individual tracker errors will ensure callers see the failure and the incomplete test_regressions state is surfaced clearly.
func (s *Server) getReleases(ctx context.Context, forceRefresh ...bool) ([]sippyv1.Release, error) {
	if s.bigQueryClient != nil {
		refresh := len(forceRefresh) > 0 && forceRefresh[0]
		return api.GetReleases(ctx, s.bigQueryClient, refresh)
	}
	if s.crDataProvider != nil {
		return s.crDataProvider.QueryReleases(ctx)
	}
	return nil, fmt.Errorf("no data source available for releases")
Don't mix release catalogs between request parsing and execution.
This helper now prefers BigQuery, but componentreadiness.GetComponentReport re-queries releases from the provider in pkg/api/componentreadiness/component_report.go Lines 88-93. When both backends are configured, one request can be parsed against BigQuery releases and executed against Postgres releases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sippyserver/server.go` around lines 182 - 190, The bug is mixing release
catalogs between parse and execution when both BigQuery and CR providers exist;
fix by ensuring componentreadiness.GetComponentReport does not re-query its own
releases but uses the same releases retrieved by Server.getReleases. Concretely:
change componentreadiness.GetComponentReport to accept a releases
[]sippyv1.Release (instead of calling s.crDataProvider.QueryReleases or
api.GetReleases internally), update all call sites to pass the releases returned
from Server.getReleases, and remove the internal re-query (the
QueryReleases/api.GetReleases calls) so parsing and execution always use the
identical release slice.
[Claude] Skipping — threading releases through the call chain is a larger refactor. Follow-up.
@stbenjam, understood — threading releases through the call chain is a non-trivial change. Would you like me to open a follow-up GitHub issue to track this so it doesn't get lost?
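For the follow-up, the refactor amounts to a signature change: query releases once, then pass them through. A minimal sketch with simplified stand-in types (`Release`, `getReleases`, and `getComponentReport` are not the PR's real signatures):

```go
package main

import "fmt"

// Release is a reduced stand-in for sippyv1.Release.
type Release struct{ Name string }

// getReleases stands in for Server.getReleases: one authoritative query.
func getReleases() []Release {
	return []Release{{"4.19"}, {"4.18"}}
}

// getComponentReport receives the release slice instead of re-querying
// its own, so parsing and execution always see the same catalog.
func getComponentReport(releases []Release) string {
	return fmt.Sprintf("report over %d releases", len(releases))
}

func main() {
	releases := getReleases() // queried once, at parse time
	fmt.Println(getComponentReport(releases))
}
```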
if s.bigQueryClient != nil || s.crDataProvider != nil {
	capabilities = append(capabilities, ComponentReadinessCapability)
}
ComponentReadinessCapability is now too broad.
This flag also gates handlers that still require s.bigQueryClient (jsonJobRunPayload, jsonPullRequestTestResults, jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, jsonTestLifecyclesFromDB). In postgres-only mode those routes are advertised and pass capability checks, then fail at request time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sippyserver/server.go` around lines 412 - 414, The
ComponentReadinessCapability is being set too broadly; update the capability
logic so routes that require BigQuery are only advertised when s.bigQueryClient
is non-nil and routes that can work with just the CR data provider are
advertised when s.crDataProvider is non-nil. Concretely, change the single check
around s.bigQueryClient || s.crDataProvider to two checks (or introduce a
separate capability): keep ComponentReadinessCapability (or a new
PostgresOnlyCapability) only when s.crDataProvider != nil for handlers that work
with the CR provider, and ensure jsonJobRunPayload, jsonPullRequestTestResults,
jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, and
jsonTestLifecyclesFromDB are gated behind s.bigQueryClient != nil so they are
not advertised in Postgres-only mode.
[Claude] Skipping — splitting ComponentReadinessCapability into provider-specific capabilities is a larger refactor best done in a follow-up.
@stbenjam, understood — acknowledged that this is a larger refactor best handled separately. Would you like me to open a follow-up GitHub issue to track splitting ComponentReadinessCapability into provider-specific capabilities so the affected endpoints (jsonJobRunPayload, jsonPullRequestTestResults, jsonTestRunsAndOutputsFromBigQuery, jsonTestCapabilitiesFromDB, jsonTestLifecyclesFromDB) are not advertised in CR-data-provider-only mode?
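The split suggested above can be sketched as a pure function over the two backend flags; `BigQueryCapability` is a hypothetical name for the new capability, not an identifier from the PR:

```go
package main

import "fmt"

// Capability names. ComponentReadinessCapability exists in the PR;
// BigQueryCapability is an illustrative addition.
const (
	ComponentReadinessCapability = "component_readiness"
	BigQueryCapability           = "big_query"
)

// computeCapabilities sketches the split: CR routes gate on either
// backend, while BigQuery-only handlers get their own capability and
// are never advertised in postgres-only mode.
func computeCapabilities(hasBigQuery, hasCRProvider bool) []string {
	var caps []string
	if hasBigQuery || hasCRProvider {
		caps = append(caps, ComponentReadinessCapability)
	}
	if hasBigQuery {
		caps = append(caps, BigQueryCapability)
	}
	return caps
}

func main() {
	fmt.Println(computeCapabilities(false, true)) // [component_readiness]
	fmt.Println(computeCapabilities(true, true))  // [component_readiness big_query]
}
```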
scripts/e2e.sh
Outdated
 if [[ -z "$GCS_SA_JSON_PATH" ]]; then
-  echo "Must provide path to GCS credential in GCS_SA_JSON_PATH env var" 1>&2
-  exit 1
+  echo "WARNING: GCS_SA_JSON_PATH not set, data sync test will be skipped" 1>&2
 fi
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
sed -n '1,18p' scripts/e2e.sh
rg -n '\[\[' scripts/e2e.sh
Repository: openshift/sippy
Length of output: 696
Use POSIX [ here instead of [[ to match the declared shebang.
The file declares #!/bin/sh but line 14 uses [[ -z "$GCS_SA_JSON_PATH" ]], a bash-only construct. This will error on systems with dash or other POSIX shells.
Suggested fix
-if [[ -z "$GCS_SA_JSON_PATH" ]]; then
+if [ -z "$GCS_SA_JSON_PATH" ]; then
📝 Committable suggestion
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 14-14: In POSIX sh, [[ ]] is undefined.
(SC3010)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e.sh` around lines 14 - 16, Replace the bash-only conditional using
[[ in the e2e.sh script: locate the check that references GCS_SA_JSON_PATH and
change the test to the POSIX-compatible form using [ -z "$GCS_SA_JSON_PATH" ]
(ensure proper spacing around brackets and keep the existing echo and
redirection unchanged) so the script works with /bin/sh.
[Claude] Will fix — changing to POSIX `[`.
@stbenjam, sounds good! The change from [[ to [ will make the script compatible with /bin/sh (dash, etc.) as declared in the shebang. Thanks!
scripts/e2e.sh
Outdated
VIEWS=$(curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness/views")
for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
  echo " Priming cache for view: $VIEW"
  curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null
done
Fail fast when cache priming breaks.
This new path is now required for the triage e2es, but curl -s/jq failures here are ignored and the script still prints Cache priming complete. That turns a startup or view-loading error into a later flaky test failure.
Suggested fix
-VIEWS=$(curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness/views")
-for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
+VIEWS=$(curl -fsS "http://localhost:$SIPPY_API_PORT/api/component_readiness/views") || exit 1
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || exit 1
+for VIEW in $VIEW_NAMES; do
echo " Priming cache for view: $VIEW"
- curl -s "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null
+ curl -fsS "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null || exit 1
 done
📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e.sh` around lines 101 - 105, The cache-priming loop silently
ignores failures; update the VIEWS assignment and loop to fail fast by checking
curl and jq exit codes and aborting on errors: when retrieving VIEWS via
VIEWS=$(curl -s
"http://localhost:$SIPPY_API_PORT/api/component_readiness/views") verify curl
succeeded and that jq produced non-empty output (or use pipe with jq -e to cause
non-zero exit), and inside the loop check each curl priming call (the curl to
/api/component_readiness?view=$VIEW) for a non-zero exit and exit the script
immediately (or set -e at the top) if any call fails so the script does not
print "Cache priming complete" on error; reference the VIEWS variable, the for
VIEW in ... loop, and both curl invocations to locate where to add these checks.
[Claude] Will fix — adding error checks to curl and jq calls.
@stbenjam, sounds good! The suggested fix in the review comment should cover all the necessary error checks. Let me know if you need anything else.
CI e2e tests previously loaded real data from BigQuery/GCS via `sippy load`, which was slow and required GCS credentials. Switch to `sippy seed-data` with the postgres data provider, matching what scripts/e2e.sh already does. This makes CI e2e faster and removes the dependency on GCS credentials for the core test suite.

- Update setup script to run seed-data instead of sippy load
- Update test script to serve with --data-provider postgres
- Remove GCS credential requirements from CI e2e scripts
- Fix lint: gofmt, gosec (G204, G306), gocritic typeAssertChain, gocyclo (extract seedProwJobs, seedTestsAndOwnerships, seedJobRunsAndResults, seedRunsForJob from seedSyntheticData)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/test e2e
- Fix POSIX compatibility: [[ → [ in e2e.sh
- Add error handling to cache priming curl calls in e2e.sh
- Add DB error assertions and context timeout to datasync test
- Fix last_failure type assertion chain → type switch in querygenerators.go
- Return error from regression tracker failures in seed_data.go
- Fix misleading re-run docstring in seed_data.go
- Deduplicate BQ provider creation in component_readiness.go
- Fix successful_runs dedup with COUNT(DISTINCT IF(...)) in BQ provider
- Include error details in GetJobVariants failure message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scheduling required tests:
@coderabbitai Please re-review
✅ Actions performed
Full review triggered.
Build sippy with -cover instrumentation and collect coverage data from the running server during e2e tests. Coverage report is printed at the end of the run, and a coverage.out file is generated for HTML reports.

Also adds graceful shutdown signal handling to the server so coverage data is properly flushed on exit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/hold
…ovider

- Move query package into dataprovider/bigquery, removing the separate query/ directory
- Remove dataSource from DataProvider interface, pushing variant junit table override logic into the BigQuery provider
- Add e2e tests for new test pass rate regression, fallback test details, include variants filtering, explicit API params, and triage validation
- Merge test binary coverage into e2e report so direct-call tests in report_test.go contribute to measured coverage
- Add seed data for new test pass rate regression scenario (70% pass rate below 80% extreme threshold)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add coverage collection to the Prow CI e2e script: build with sippy-cover, set GOCOVERDIR, collect server + test binary coverage profiles and merge them into a single report.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scheduling required tests:
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go (1)
384-404: ⚠️ Potential issue | 🟠 Major
Don't cache partial fallback data as success.
If one of these per-release queries fails, the goroutine only logs it and getTestFallbackReleases still returns nil errors after wg.Wait(). That means incomplete fallback data can be cached for the full fallback TTL and silently skew regression classification. Collect these goroutine errors and return them.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go` around lines 384 - 404, The goroutines inside getTestFallbackReleases call f.getTestFallbackRelease and may log failures but still allow getTestFallbackReleases to return nil error and leave f.CachedFallbackTestStatuses partially populated; instead, collect per-goroutine errors (e.g. via an error channel or a sync-safe slice) while executing the goroutines, wait for all to finish, aggregate any errors, and if any occurred return a non-nil aggregated error and avoid persisting f.CachedFallbackTestStatuses; only commit/update f.CachedFallbackTestStatuses (or call f.updateTestStatuses results into the cache) and return nil error when no goroutine reported an error. Ensure you reference the existing symbols f.getTestFallbackRelease, f.updateTestStatuses, f.CachedFallbackTestStatuses and the wg/context logic when implementing the error collection and aggregation.
♻️ Duplicate comments (4)
scripts/e2e.sh (1)
117-121: ⚠️ Potential issue | 🟠 Major
Still fail fast on view-parsing errors.
curl -sf is checked now, but the jq inside for VIEW in $(...) is not. In /bin/sh, a failing command substitution there does not fail the loop, so malformed or empty /views output can still print Cache priming complete and turn into a later flaky test.
Suggested fix
 VIEWS=$(curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness/views") || { echo "Failed to fetch views"; exit 1; }
-for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || { echo "Failed to parse view list"; exit 1; }
+for VIEW in $VIEW_NAMES; do
   echo " Priming cache for view: $VIEW"
   curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null || { echo "Failed to prime cache for view: $VIEW"; exit 1; }
 done
Run this to verify the shell behavior behind the current pattern:
#!/bin/bash
sed -n '115,121p' scripts/e2e.sh
/bin/sh -c 'for VIEW in $(false); do :; done; printf "loop_status=%s\n" "$?"'
Expected result: loop_status=0, which shows why a failing parser inside for VIEW in $(...) is currently masked.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/e2e.sh` around lines 117 - 121, The loop masks failures from the jq command substitution: ensure parsing the VIEWS JSON with jq ('jq -r ".[].name"') is executed and its exit status is checked before iterating so parsing errors cause an immediate failure; specifically, run the jq parse against the VIEWS content into a checked value (or temp) and verify jq succeeded and produced non-empty output before the for VIEW in ... loop (references: VIEWS variable, the for VIEW loop, and the jq -r '.[].name' invocation), failing fast with an error message if jq fails or yields no views.
cmd/sippy/serve.go (1)
169-172: ⚠️ Potential issue | 🔴 Critical
variantManager can still be nil in postgres mode.
/api/jobs/runs/risk_analysis is still reachable via LocalDBCapability, and pkg/sippyserver/server.go dereferences s.variantManager on the transient-request path. With --data-provider=postgres, this leaves a panic path unless you initialize a no-op manager here or guard that handler.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sippy/serve.go` around lines 169 - 172, variantManager can remain nil when bigQueryClient is nil (postgres mode), causing a panic when s.variantManager is dereferenced by the /api/jobs/runs/risk_analysis handler; replace the nil by creating/assigning a no-op implementation of testidentification.VariantManager here (or call an existing factory like f.ModeFlags.GetVariantManager with a noop client) so variantManager is never nil, or alternatively add a guard in the handler that checks s.variantManager before dereferencing; reference symbols: variantManager, testidentification.VariantManager, f.ModeFlags.GetVariantManager, s.variantManager, LocalDBCapability, and /api/jobs/runs/risk_analysis.
pkg/sippyserver/server.go (1)
414-416: ⚠️ Potential issue | 🟠 Major
ComponentReadinessCapability is still too broad.
In postgres mode this advertises and enables routes like /api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs, /api/tests/capabilities, and /api/tests/lifecycles, but each handler then immediately fails when s.bigQueryClient == nil.
Verify each finding against the current code and only fix it if needed. In `@pkg/sippyserver/server.go` around lines 414 - 416, The current capability assembly appends ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider is set, which incorrectly advertises BigQuery-backed routes when only Postgres/CR is available; change the logic to register separate capabilities (e.g., keep ComponentReadinessCapability for CR only: append it only when s.crDataProvider != nil) and add a distinct BigQueryReadinessCapability (or similar) that is appended only when s.bigQueryClient != nil, and ensure route registration/handler exposure for endpoints like the handlers tied to /api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs, /api/tests/capabilities, and /api/tests/lifecycles is gated on s.bigQueryClient being non-nil (or remove those routes when BigQueryReadinessCapability is not present).
pkg/api/componentreadiness/dataprovider/postgres/provider.go (1)
397-418: ⚠️ Potential issue | 🟠 Major
Don't mask a variant lookup failure as an empty result.
On error this logs and returns {}, and queryTestStatus then drops every aggregated row because no ProwJobID matches. Callers see an empty report instead of the DB failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go` around lines 397 - 418, The fetchJobVariants function currently swallows DB errors by logging and returning an empty map which causes queryTestStatus to treat the failure as “no matches” and drop rows; change fetchJobVariants on PostgresProvider to return (map[uint]map[string]string, error) instead of silently returning {} on failure, propagate the error back to queryTestStatus (and any other callers), and update the call sites to handle the error (log/return the DB error up so the overall report fails instead of producing an empty result); specifically adjust the code around p.dbc.DB.Raw(...) in fetchJobVariants to return nil, err on error, and update queryTestStatus to check the error from fetchJobVariants and handle it appropriately.
🧹 Nitpick comments (6)
pkg/apis/api/componentreport/crstatus/types.go (1)
56-57: Verify these transient fields are excluded from GORM.
TestKey and TestKeyStr are computed values, not columns. If the new PostgreSQL provider scans crstatus.TestJobRunRows with GORM, add gorm:"-" here so this shared DTO is not treated like a mapped model.
As per coding guidelines, "Structs used with GORM that have fields not backed by database columns (e.g., computed fields, custom types, API-only fields) must include the gorm:"-" tag to explicitly exclude them from GORM operations."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apis/api/componentreport/crstatus/types.go` around lines 56 - 57, TestKey and TestKeyStr in the crstatus DTO are transient computed fields and must be excluded from GORM; update the struct that defines TestKey (type crtest.KeyWithVariants) and TestKeyStr (string) by adding the struct tag gorm:"-" to each field so GORM will ignore them during scanning/migrations (ensure tags are added on the fields named TestKey and TestKeyStr in the TestJobRunRows/crstatus type).
test/e2e/componentreadiness/componentreadiness_test.go (1)
220-224: Consider asserting the specific error type for the 404 case.
The test verifies an error is returned but doesn't confirm it's specifically a 404/not-found response. This could mask issues where the endpoint returns a different error type.
// If util.SippyGet returns HTTP status codes in the error, consider:
// assert.Contains(t, err.Error(), "404") or similar
Verify each finding against the current code and only fix it if needed. In `@test/e2e/componentreadiness/componentreadiness_test.go` around lines 220 - 224, The test TestRegressionByIDNotFound currently only checks that an error is returned; update it to assert the error represents a 404/not-found response by checking the error content or type from util.SippyGet (e.g., assert that err.Error() contains "404" or use errors.Is()/errors.As() if util.SippyGet returns a typed HTTP error). Locate TestRegressionByIDNotFound and modify the require/assert after calling util.SippyGet to verify the specific 404 condition while still keeping the existing models.TestRegression variable and error check.
pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go (1)
96-104: Consider using consistent assertion style.
The test mixes assert.Equal (Line 99) with reflect.DeepEqual + t.Errorf (Lines 100-102). Using assert.Equal consistently would provide better error messages and simplify the code.
♻️ Suggested fix for consistency
 for _, tt := range tests {
 	t.Run(tt.name, func(t *testing.T) {
 		result, skipQuery := copyIncludeVariantsAndRemoveOverrides(tt.overrides, tt.currOverride, tt.includeVariants)
 		assert.Equal(t, tt.expectedSkipQuery, skipQuery)
-		if !reflect.DeepEqual(result, tt.expected) {
-			t.Errorf("expected %v, got %v", tt.expected, result)
-		}
+		assert.Equal(t, tt.expected, result)
 	})
 }
This would also allow removing the reflect import.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go` around lines 96 - 104, Replace the manual reflect.DeepEqual + t.Errorf check with the same testify assertion used above: call assert.Equal(t, tt.expected, result) and keep the existing assert for skipQuery; update the test loop inside Test to use assert.Equal for both comparisons (result and skipQuery) referencing the function copyIncludeVariantsAndRemoveOverrides, and then remove the now-unused reflect import from the file.
test/e2e/datasync/datasync_test.go (1)
45-50: Assert that the sync changed the dataset.
Lines 45-50 only prove the command exited successfully and the count query worked. A no-op sippy load still passes here, so this test does not actually verify prow/GCS ingestion. Consider asserting a positive delta, or scoping the count to rows created during this run if re-runs share the same DB.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/datasync/datasync_test.go` around lines 45 - 50, The test currently only checks cmd.Run() succeeded and that Table("prow_job_runs").Count returned, but doesn't assert the dataset changed; update the test around cmd.Run()/countBefore to compute and assert a positive delta on prow_job_runs: after running cmd.Run() call dbc.DB.Table("prow_job_runs").Count(&countAfter) and require.Greater(t, countAfter, countBefore) (or alternatively scope the count to rows created during this run by filtering on a run-specific marker or a created_at timestamp range via dbc.DB.Where(...).Count(&countAfter) and assert the delta > 0) so the test fails if sippy load is a no-op.
test/e2e/componentreadiness/report/report_test.go (1)
20-49: Build these request options from the shared e2e view.
This is a second copy of the release windows, variant grouping, and thresholds already defined in config/e2e-views.yaml and test/e2e/util. If that view changes again, this suite can still pass while exercising a different configuration than the server. Prefer a shared helper or parse the e2e view definition here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/componentreadiness/report/report_test.go` around lines 20 - 49, The test duplicates release windows, variant grouping and thresholds instead of using the shared e2e view; in setupProvider replace the hard-coded reqOptions construction (the reqopts.RequestOptions assigned to reqOptions) with a call to the shared helper in test/e2e/util that builds RequestOptions from the config/e2e-views.yaml (or parse that YAML into the same reqopts.Release, reqopts.Variants and reqopts.Advanced values), so setupProvider uses the common view definition rather than duplicating values.
pkg/api/componentreadiness/test_details.go (1)
199-202: Avoid the unconditional release-date lookup.
These dates are only used when BaseOverrideRelease is set, but every details report pays the query cost here. The multi-test priming path repeats the same provider call once per test. Move it into the override branch, or fetch it once in the caller and thread it through.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/test_details.go` around lines 199 - 202, The unconditional call to c.dataProvider.QueryReleaseDates with c.ReqOptions causes unnecessary queries for every details report; change the logic so QueryReleaseDates is only invoked when BaseOverrideRelease is set (i.e., wrap the call in the branch that checks c.BaseOverrideRelease) or, for the multi-test priming path, move the QueryReleaseDates call up to the caller and thread the resulting timeRanges into the details reporting path (add an extra parameter or field used by the function that builds testdetails.Report). Ensure you reference QueryReleaseDates, BaseOverrideRelease, c.ReqOptions and the function that returns testdetails.Report when making the change so the lookup is performed only when needed.
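The lazy-lookup suggestion can be sketched with a call counter to show the saving; `queryReleaseDates` and `buildReport` are illustrative stand-ins for c.dataProvider.QueryReleaseDates and the details-report builder, not the PR's functions:

```go
package main

import "fmt"

// queries counts provider calls so the saving is visible.
var queries int

// queryReleaseDates stands in for the provider's release-dates query.
func queryReleaseDates() map[string]string {
	queries++
	return map[string]string{"4.19": "2025-01-01"}
}

// buildReport only pays for the lookup when a base override is set,
// which is the branch that actually uses the dates.
func buildReport(baseOverrideRelease string) {
	if baseOverrideRelease != "" {
		_ = queryReleaseDates() // needed only on this branch
	}
	// ... build the rest of the details report ...
}

func main() {
	buildReport("")      // common path: no query
	buildReport("4.18")  // override path: one query
	fmt.Println(queries) // 1
}
```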
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sippy/automatejira.go`:
- Around line 175-177: The return in automatejira.go discards the underlying
errors; update the failure to include the error details from the errs slice
(e.g., change the return in the block checking len(errs) to include errs).
Specifically, modify the code that currently does return fmt.Errorf("failed to
get job variants") to include the errors (for example: return fmt.Errorf("failed
to get job variants: %v", errs) or combine errs with errors.Join and wrap it),
referencing the errs variable in automatejira.go so callers/logs surface the
underlying causes (consistent with annotatejobruns.go's approach).
In `@cmd/sippy/seed_data.go`:
- Around line 64-67: The current RunE check only rejects DSNs containing
"amazonaws.com" and can still write to other remote/prod DBs; update the seed
command to require explicit opt-in and/or a strict allowlist: parse the DSN in
RunE (use f.DBFlags.DSN) to extract the host, reject unless either a new boolean
flag (e.g., --confirm-seed or --force-seed) is passed or the host is in a
white-list of safe hosts (localhost, 127.0.0.1, ::1, UNIX socket, and any e2e
test host you accept); ensure the logic lives inside the existing RunE handler
and uses the parsed host (not substring matching) so non-local endpoints are
refused by default.
- Around line 669-676: The UPDATE that backfills prow_job_runs.test_failures
currently logs a warning on failure, leaving partially-initialized state; make
this failure fatal by propagating the error instead of warning: replace the
log.WithError(err).Warn("failed to update test_failures count") with either
returning the error from the surrounding function (so callers can abort/handle)
or calling log.WithError(err).Fatal(...) to exit immediately. Locate the Exec
call on dbc.DB.Exec that updates prow_job_runs (references: prow_job_runs,
prow_job_run_tests, prowJob.ID) and change the error handling to abort on error
rather than continuing.
In `@cmd/sippy/serve.go`:
- Around line 138-149: The GCS client initialization is incorrectly nested under
the --data-provider=bigquery path causing gcsClient to remain nil for other
providers; move the NewGCSClient call out of the BigQuery-only branch so
gcsClient is initialized whenever GoogleCloudFlags.ServiceAccountCredentialFile
or OAuthClientCredentialFile are present. Concretely, ensure the call to
gcs.NewGCSClient(context.TODO(),
f.GoogleCloudFlags.ServiceAccountCredentialFile,
f.GoogleCloudFlags.OAuthClientCredentialFile) and its error handling (the
log.WithError(...).Warn(...)) run unconditionally (or in a shared initialization
section) prior to selecting the data provider (before the switch/case that sets
crDataProvider and pgprovider.NewPostgresProvider), so gcsClient is available to
pkg/sippyserver/server.go handlers like jsonJobRunIntervals, jsonJobRunEvents,
artifact queries and job summary code paths.
In `@e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh`:
- Around line 246-251: The commands in the sippy seed job block use unquoted
shell variables which can cause word-splitting/globbing; update usages to quote
variables like "${KUBECTL_CMD}", "${SIPPY_LOAD_TIMEOUT}", "$retVal", and
"${ARTIFACT_DIR}" and quote the command substitution for job_pod
(job_pod="$(${KUBECTL_CMD} -n sippy-e2e get pod
--selector=job-name=sippy-seed-job
--output=jsonpath='{.items[0].metadata.name}')") and use "${job_pod}" when
passing to kubectl logs so all variables are safely expanded (refer to
KUBECTL_CMD, SIPPY_LOAD_TIMEOUT, retVal, job_pod, ARTIFACT_DIR, sippy-seed-job).
In `@pkg/api/componentreadiness/component_report.go`:
- Around line 129-145: Currently worstStatus is seeded from
report.Rows[ri].Columns[ci].Status before calling c.middlewares.PostAnalysis,
preventing PostAnalysis from ever raising (improving) the cell status; instead,
after running PostAnalysis for each entry in col.RegressedTests (using
c.middlewares.PostAnalysis with the constructed testKey), initialize worstStatus
from the first regressed test's TestComparison.ReportStatus (or start with a
very permissive max value) and then update it by comparing each regressed test's
TestComparison.ReportStatus (newStatus) to compute the worst (most severe)
status, finally assigning that computed worstStatus back to
report.Rows[ri].Columns[ci].Status.
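The seeding change above can be reduced to a small sketch. This assumes Sippy's convention that lower ReportStatus values are more severe; the function and type names here are illustrative stand-ins, not the real API.

```go
package main

import "fmt"

// ReportStatus follows the (assumed) Sippy convention that lower values
// are more severe: an extreme regression is more negative than a
// not-significant status.
type ReportStatus int

// worstOf seeds from the first post-analysis status instead of the
// pre-analysis cell status, so middleware like triage can improve a
// cell's status as well as worsen it.
func worstOf(statuses []ReportStatus) (ReportStatus, bool) {
	if len(statuses) == 0 {
		return 0, false
	}
	worst := statuses[0]
	for _, s := range statuses[1:] {
		if s < worst {
			worst = s
		}
	}
	return worst, true
}

func main() {
	// If the cell's old status were used as the seed, PostAnalysis could
	// never raise the cell above that value.
	worst, ok := worstOf([]ReportStatus{-2, -1, -3})
	fmt.Println(worst, ok)
}
```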
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider.go`:
- Around line 378-417: QueryJobRuns currently builds joinVariants by iterating
allJobVariants.Variants which adds LEFT JOINs for every known variant; restrict
it to only the variants actually used by the query by computing a set containing
"Release" plus the keys of reqOptions.VariantOption.IncludeVariants and then
iterate that set instead of allJobVariants.Variants. Update the joinVariants
construction (the loop that uses param.Cleanse and appends to joinVariants) to
only loop over this neededVariants set, keep using param.Cleanse for param
names, and leave the variantFilters/params logic unchanged so only required
JOINs are emitted.
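The needed-variants computation described above can be sketched like this (a minimal illustration, assuming IncludeVariants is keyed by variant name as the prompt states):

```go
package main

import (
	"fmt"
	"sort"
)

// neededVariants restricts JOIN generation to "Release" plus the keys of
// IncludeVariants, rather than every variant known to allJobVariants.
func neededVariants(includeVariants map[string][]string) []string {
	needed := map[string]bool{"Release": true}
	for name := range includeVariants {
		needed[name] = true
	}
	out := make([]string, 0, len(needed))
	for name := range needed {
		out = append(out, name)
	}
	sort.Strings(out) // deterministic ordering for stable SQL generation
	return out
}

func main() {
	include := map[string][]string{
		"Platform": {"aws", "gcp"},
		"Network":  {"ovn"},
	}
	// Only these variants get LEFT JOINs in the generated query.
	fmt.Println(neededVariants(include))
}
```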
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go`:
- Around line 672-684: The Postgres query in the
p.dbc.DB.Raw(...).Scan(&rows).Error call returns all prow_jobs for a release but
must match the BigQuery backend filter (only jobs with names starting with
"periodic-", "release-", or "aggregator-"). Update the SQL in provider.go (the
Raw(...) invocation that populates rows) to include an additional WHERE clause
restricting pj.name to those prefixes (e.g., pj.name LIKE 'periodic-%' OR
pj.name LIKE 'release-%' OR pj.name LIKE 'aggregator-%') so Postgres results
remain in parity with the BigQuery provider.
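The prefix filter can be sketched in Go for clarity. The prefixes are the ones named above; the helper itself is illustrative, since in the real provider the restriction lives in the SQL WHERE clause.

```go
package main

import (
	"fmt"
	"strings"
)

// jobNamePrefixes mirrors the BigQuery-side filter: only periodic,
// release, and aggregator jobs are included.
var jobNamePrefixes = []string{"periodic-", "release-", "aggregator-"}

// jobNameIncluded reports whether a prow job name would survive the
// suggested clause:
//   pj.name LIKE 'periodic-%' OR pj.name LIKE 'release-%' OR pj.name LIKE 'aggregator-%'
func jobNameIncluded(name string) bool {
	for _, p := range jobNamePrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(jobNameIncluded("periodic-ci-openshift-e2e")) // included
	fmt.Println(jobNameIncluded("pull-ci-openshift-e2e"))     // excluded
}
```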
- Around line 427-470: Both QueryBaseTestStatus and QuerySampleTestStatus are
ignoring the incoming context (declared as _ ) and thus not propagating
cancellation/timeouts to DB work; update the function signatures to use the ctx
parameter (replace _ with ctx) and pass that ctx into the p.queryTestStatus call
so the context flows down, i.e., in QueryBaseTestStatus and
QuerySampleTestStatus use the provided context variable and change the call to
p.queryTestStatus(..., ctx) or the existing queryTestStatus signature variant
that accepts context (or update queryTestStatus to accept context if needed) so
cancellations/timeouts are honored by the PostgresProvider.
In `@pkg/componentreadiness/jiraautomator/jiraautomator.go`:
- Around line 74-75: The constructor for JiraAutomator currently requires a
BigQuery client even though report building uses j.dataProvider; update the
NewJiraAutomator constructor (or the JiraAutomator constructor function) to stop
hard-failing when bqClient or its cache is nil: make the bqClient parameter
optional (allow nil) or remove it from the mandatory parameter list, validate
that provider (dataprovider.DataProvider) is non-nil instead, and set j.bqClient
only if provided. Also move any strict checks for bqClient into the specific
variant-mapping/reporting code paths that actually need BigQuery (check
j.bqClient and return a clear error where used), and remove the global hard
exit/validation that currently references j.bqClient in the constructor and in
the code around lines 87-98 and 151-152.
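The constructor reshuffle can be sketched as below. The types here are simplified stand-ins for dataprovider.DataProvider and the BigQuery client, and variantMapping is a hypothetical name for the path that genuinely needs BigQuery.

```go
package main

import (
	"errors"
	"fmt"
)

// DataProvider and BigQueryClient stand in for the real types.
type DataProvider interface{ Name() string }
type BigQueryClient struct{}

type JiraAutomator struct {
	dataProvider DataProvider
	bqClient     *BigQueryClient // optional: only variant mapping needs it
}

// NewJiraAutomator validates the provider, which report building
// actually uses, and accepts a nil BigQuery client.
func NewJiraAutomator(provider DataProvider, bq *BigQueryClient) (*JiraAutomator, error) {
	if provider == nil {
		return nil, errors.New("data provider is required")
	}
	return &JiraAutomator{dataProvider: provider, bqClient: bq}, nil
}

// variantMapping moves the strict BigQuery check out of the constructor
// and into the only path that needs it, returning a clear error.
func (j *JiraAutomator) variantMapping() error {
	if j.bqClient == nil {
		return errors.New("variant mapping requires a BigQuery client")
	}
	return nil
}

type pgProvider struct{}

func (pgProvider) Name() string { return "postgres" }

func main() {
	j, err := NewJiraAutomator(pgProvider{}, nil) // postgres-only startup now succeeds
	fmt.Println(err == nil, j.variantMapping() != nil)
}
```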
In `@pkg/sippyserver/metrics/metrics.go`:
- Around line 118-122: In RefreshMetricsDB, guard against a nil crProvider
before invoking its methods (e.g., crProvider.QueryReleases and later uses
around the block referenced at lines ~168-170); check if crProvider == nil early
in the function and skip component-readiness-specific calls while still allowing
the rest of RefreshMetricsDB to run, or return a controlled error if that is the
intended behavior, and ensure any later references to crProvider are similarly
protected (use the function name RefreshMetricsDB and the variable crProvider to
locate and update the checks).
In `@test/e2e/componentreadiness/triage/triageapi_test.go`:
- Around line 957-968: getSeedDataRegressions currently returns seedRegressions
in API order which is unstable; make the selection deterministic by sorting the
slice before returning (e.g., sort.SliceStable(seedRegressions, func(i,j int)
bool { return seedRegressions[i].TestID < seedRegressions[j].TestID })) or
alternatively return a map keyed by TestID so downstream subtests can pick
specific known seeded IDs; update getSeedDataRegressions to perform the sort (or
change its return type to map[string]models.TestRegression) and adjust callers
accordingly.
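The sorting option can be sketched as follows, with a minimal stand-in for models.TestRegression:

```go
package main

import (
	"fmt"
	"sort"
)

// TestRegression is a minimal stand-in for models.TestRegression.
type TestRegression struct{ TestID string }

// sortSeedRegressions makes the selection deterministic by ordering on
// TestID, so downstream subtests always see the same regressions in the
// same positions regardless of API response order.
func sortSeedRegressions(regs []TestRegression) {
	sort.SliceStable(regs, func(i, j int) bool {
		return regs[i].TestID < regs[j].TestID
	})
}

func main() {
	regs := []TestRegression{{"test-c"}, {"test-a"}, {"test-b"}}
	sortSeedRegressions(regs)
	fmt.Println(regs[0].TestID, regs[1].TestID, regs[2].TestID)
}
```

The map-keyed alternative mentioned above trades ordering for direct lookup by TestID, which is preferable when subtests need specific seeded IDs rather than "the first N".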
---
Outside diff comments:
In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`:
- Around line 384-404: The goroutines inside getTestFallbackReleases call
f.getTestFallbackRelease and may log failures but still allow
getTestFallbackReleases to return nil error and leave
f.CachedFallbackTestStatuses partially populated; instead, collect per-goroutine
errors (e.g. via an error channel or a sync-safe slice) while executing the
goroutines, wait for all to finish, aggregate any errors, and if any occurred
return a non-nil aggregated error and avoid persisting
f.CachedFallbackTestStatuses; only commit/update f.CachedFallbackTestStatuses
(or call f.updateTestStatuses results into the cache) and return nil error when
no goroutine reported an error. Ensure you reference the existing symbols
f.getTestFallbackRelease, f.updateTestStatuses, f.CachedFallbackTestStatuses and
the wg/context logic when implementing the error collection and aggregation.
---
Duplicate comments:
In `@cmd/sippy/serve.go`:
- Around line 169-172: variantManager can remain nil when bigQueryClient is nil
(postgres mode), causing a panic when s.variantManager is dereferenced by the
/api/jobs/runs/risk_analysis handler; replace the nil by creating/assigning a
no-op implementation of testidentification.VariantManager here (or call an
existing factory like f.ModeFlags.GetVariantManager with a noop client) so
variantManager is never nil, or alternatively add a guard in the handler that
checks s.variantManager before dereferencing; reference symbols: variantManager,
testidentification.VariantManager, f.ModeFlags.GetVariantManager,
s.variantManager, LocalDBCapability, and /api/jobs/runs/risk_analysis.
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go`:
- Around line 397-418: The fetchJobVariants function currently swallows DB
errors by logging and returning an empty map which causes queryTestStatus to
treat the failure as “no matches” and drop rows; change fetchJobVariants on
PostgresProvider to return (map[uint]map[string]string, error) instead of
silently returning {} on failure, propagate the error back to queryTestStatus
(and any other callers), and update the call sites to handle the error
(log/return the DB error up so the overall report fails instead of producing an
empty result); specifically adjust the code around p.dbc.DB.Raw(...) in
fetchJobVariants to return nil, err on error, and update queryTestStatus to
check the error from fetchJobVariants and handle it appropriately.
In `@pkg/sippyserver/server.go`:
- Around line 414-416: The current capability assembly appends
ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider is
set, which incorrectly advertises BigQuery-backed routes when only Postgres/CR
is available; change the logic to register separate capabilities (e.g., keep
ComponentReadinessCapability for CR only: append it only when s.crDataProvider
!= nil) and add a distinct BigQueryReadinessCapability (or similar) that is
appended only when s.bigQueryClient != nil, and ensure route
registration/handler exposure for endpoints like the handlers tied to
/api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs,
/api/tests/capabilities, and /api/tests/lifecycles is gated on s.bigQueryClient
being non-nil (or remove those routes when BigQueryReadinessCapability is not
present).
In `@scripts/e2e.sh`:
- Around line 117-121: The loop masks failures from the jq command substitution:
ensure parsing the VIEWS JSON with jq ('jq -r ".[].name"') is executed and its
exit status is checked before iterating so parsing errors cause an immediate
failure; specifically, run the jq parse against the VIEWS content into a checked
value (or temp) and verify jq succeeded and produced non-empty output before the
for VIEW in ... loop (references: VIEWS variable, the for VIEW loop, and the jq
-r '.[].name' invocation), failing fast with an error message if jq fails or
yields no views.
---
Nitpick comments:
In `@pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go`:
- Around line 96-104: Replace the manual reflect.DeepEqual + t.Errorf check with
the same testify assertion used above: call assert.Equal(t, tt.expected, result)
and keep the existing assert for skipQuery; update the test loop inside Test to
use assert.Equal for both comparisons (result and skipQuery) referencing the
function copyIncludeVariantsAndRemoveOverrides, and then remove the now-unused
reflect import from the file.
In `@pkg/api/componentreadiness/test_details.go`:
- Around line 199-202: The unconditional call to
c.dataProvider.QueryReleaseDates with c.ReqOptions causes unnecessary queries
for every details report; change the logic so QueryReleaseDates is only invoked
when BaseOverrideRelease is set (i.e., wrap the call in the branch that checks
c.BaseOverrideRelease) or, for the multi-test priming path, move the
QueryReleaseDates call up to the caller and thread the resulting timeRanges into
the details reporting path (add an extra parameter or field used by the function
that builds testdetails.Report). Ensure you reference QueryReleaseDates,
BaseOverrideRelease, c.ReqOptions and the function that returns
testdetails.Report when making the change so the lookup is performed only when
needed.
In `@pkg/apis/api/componentreport/crstatus/types.go`:
- Around line 56-57: TestKey and TestKeyStr in the crstatus DTO are transient
computed fields and must be excluded from GORM; update the struct that defines
TestKey (type crtest.KeyWithVariants) and TestKeyStr (string) by adding the
struct tag gorm:"-" to each field so GORM will ignore them during
scanning/migrations (ensure tags are added on the fields named TestKey and
TestKeyStr in the TestJobRunRows/crstatus type).
In `@test/e2e/componentreadiness/componentreadiness_test.go`:
- Around line 220-224: The test TestRegressionByIDNotFound currently only checks
that an error is returned; update it to assert the error represents a
404/not-found response by checking the error content or type from util.SippyGet
(e.g., assert that err.Error() contains "404" or use errors.Is()/errors.As() if
util.SippyGet returns a typed HTTP error). Locate TestRegressionByIDNotFound and
modify the require/assert after calling util.SippyGet to verify the specific 404
condition while still keeping the existing models.TestRegression variable and
error check.
In `@test/e2e/componentreadiness/report/report_test.go`:
- Around line 20-49: The test duplicates release windows, variant grouping and
thresholds instead of using the shared e2e view; in setupProvider replace the
hard-coded reqOptions construction (the reqopts.RequestOptions assigned to
reqOptions) with a call to the shared helper in test/e2e/util that builds
RequestOptions from the config/e2e-views.yaml (or parse that YAML into the same
reqopts.Release, reqopts.Variants and reqopts.Advanced values), so setupProvider
uses the common view definition rather than duplicating values.
In `@test/e2e/datasync/datasync_test.go`:
- Around line 45-50: The test currently only checks cmd.Run() succeeded and that
Table("prow_job_runs").Count returned, but doesn't assert the dataset changed;
update the test around cmd.Run()/countBefore to compute and assert a positive
delta on prow_job_runs: after running cmd.Run() call
dbc.DB.Table("prow_job_runs").Count(&countAfter) and require.Greater(t,
countAfter, countBefore) (or alternatively scope the count to rows created
during this run by filtering on a run-specific marker or a created_at timestamp
range via dbc.DB.Where(...).Count(&countAfter) and assert the delta > 0) so the
test fails if sippy load is a no-op.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 047d1cbf-6556-43c7-8877-5f217b0f5940
📒 Files selected for processing (46)
- .gitignore
- cmd/sippy/annotatejobruns.go
- cmd/sippy/automatejira.go
- cmd/sippy/component_readiness.go
- cmd/sippy/load.go
- cmd/sippy/seed_data.go
- cmd/sippy/serve.go
- config/e2e-openshift.yaml
- config/e2e-views.yaml
- e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh
- e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
- pkg/api/componentreadiness/component_report.go
- pkg/api/componentreadiness/component_report_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/provider.go
- pkg/api/componentreadiness/dataprovider/bigquery/provider_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go
- pkg/api/componentreadiness/dataprovider/bigquery/querygenerators_test.go
- pkg/api/componentreadiness/dataprovider/bigquery/releasedates.go
- pkg/api/componentreadiness/dataprovider/interface.go
- pkg/api/componentreadiness/dataprovider/postgres/provider.go
- pkg/api/componentreadiness/middleware/interface.go
- pkg/api/componentreadiness/middleware/linkinjector/linkinjector.go
- pkg/api/componentreadiness/middleware/list.go
- pkg/api/componentreadiness/middleware/regressionallowances/regressionallowances.go
- pkg/api/componentreadiness/middleware/regressiontracker/regressiontracker.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go
- pkg/api/componentreadiness/middleware/releasefallback/releasefallback_test.go
- pkg/api/componentreadiness/regressiontracker.go
- pkg/api/componentreadiness/test_details.go
- pkg/api/componentreadiness/utils/utils.go
- pkg/apis/api/componentreport/crstatus/types.go
- pkg/apis/api/componentreport/testdetails/types.go
- pkg/bigquery/bqlabel/labels.go
- pkg/componentreadiness/jiraautomator/jiraautomator.go
- pkg/componentreadiness/jobrunannotator/jobrunannotator.go
- pkg/dataloader/crcacheloader/crcacheloader.go
- pkg/sippyserver/metrics/metrics.go
- pkg/sippyserver/server.go
- scripts/e2e.sh
- test/e2e/componentreadiness/componentreadiness_test.go
- test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
- test/e2e/componentreadiness/report/report_test.go
- test/e2e/componentreadiness/triage/triageapi_test.go
- test/e2e/datasync/datasync_test.go
- test/e2e/util/cache_manipulator.go
- test/e2e/util/e2erequest.go
💤 Files with no reviewable changes (1)
- test/e2e/util/cache_manipulator.go
RunE: func(cmd *cobra.Command, args []string) error {
	if strings.Contains(f.DBFlags.DSN, "amazonaws.com") {
		return fmt.Errorf("refusing to seed synthetic data into a production database")
	}
This production-DB guard still fails open.
Line 65 only blocks DSNs containing amazonaws.com. Any other remote/prod Postgres endpoint—or a connection target assembled outside f.DBFlags.DSN—will still accept destructive synthetic writes. For this command, prefer an explicit opt-in flag or a strict allowlist of local/e2e hosts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/sippy/seed_data.go` around lines 64 - 67, The current RunE check only
rejects DSNs containing "amazonaws.com" and can still write to other remote/prod
DBs; update the seed command to require explicit opt-in and/or a strict
allowlist: parse the DSN in RunE (use f.DBFlags.DSN) to extract the host, reject
unless either a new boolean flag (e.g., --confirm-seed or --force-seed) is
passed or the host is in a white-list of safe hosts (localhost, 127.0.0.1, ::1,
UNIX socket, and any e2e test host you accept); ensure the logic lives inside
the existing RunE handler and uses the parsed host (not substring matching) so
non-local endpoints are refused by default.
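A host-based allowlist check might look like the sketch below. Everything here is hypothetical: the --force-seed flag is the opt-in the prompt suggests, and the parsing assumes a URL-style DSN (postgresql://...); key=value DSNs would need their own parsing.

```go
package main

import (
	"fmt"
	"net/url"
)

// localSeedHosts is a strict allowlist of hosts the seed command will
// write to without an explicit override. An empty host covers UNIX
// socket connections.
var localSeedHosts = map[string]bool{
	"": true, "localhost": true, "127.0.0.1": true, "::1": true,
}

// seedAllowed parses the DSN host instead of substring-matching, and
// fails closed unless the host is local or the (hypothetical)
// --force-seed flag was passed.
func seedAllowed(dsn string, forceSeed bool) (bool, error) {
	if forceSeed {
		return true, nil
	}
	u, err := url.Parse(dsn)
	if err != nil {
		return false, err
	}
	return localSeedHosts[u.Hostname()], nil
}

func main() {
	local, _ := seedAllowed("postgresql://sippy:pw@localhost:5432/sippy", false)
	remote, _ := seedAllowed("postgresql://sippy:pw@db.example.com:5432/sippy", false)
	fmt.Println(local, remote)
}
```

Note how this inverts the original guard: instead of rejecting one known-bad substring (amazonaws.com), every non-local host is rejected by default.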
${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout ${SIPPY_LOAD_TIMEOUT}
retVal=$?
set -e

job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-load-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-load.log
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-seed-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-seed.log
Quote variables to prevent word splitting.
Shell variables should be double-quoted to prevent globbing and word splitting issues.
🔧 Suggested fix
-${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout ${SIPPY_LOAD_TIMEOUT}
+${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout "${SIPPY_LOAD_TIMEOUT}"
-${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-seed.log
+${KUBECTL_CMD} -n sippy-e2e logs "${job_pod}" > "${ARTIFACT_DIR}/sippy-seed.log"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout ${SIPPY_LOAD_TIMEOUT}
retVal=$?
set -e
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-load-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-load.log
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-seed-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs ${job_pod} > ${ARTIFACT_DIR}/sippy-seed.log

After:
${KUBECTL_CMD} -n sippy-e2e wait --for=condition=complete job/sippy-seed-job --timeout "${SIPPY_LOAD_TIMEOUT}"
retVal=$?
set -e
job_pod=$(${KUBECTL_CMD} -n sippy-e2e get pod --selector=job-name=sippy-seed-job --output=jsonpath='{.items[0].metadata.name}')
${KUBECTL_CMD} -n sippy-e2e logs "${job_pod}" > "${ARTIFACT_DIR}/sippy-seed.log"
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 246-246: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 251-251: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 251-251: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh` around lines 246 - 251,
The commands in the sippy seed job block use unquoted shell variables which can
cause word-splitting/globbing; update usages to quote variables like
"${KUBECTL_CMD}", "${SIPPY_LOAD_TIMEOUT}", "$retVal", and "${ARTIFACT_DIR}" and
quote the command substitution for job_pod (job_pod="$(${KUBECTL_CMD} -n
sippy-e2e get pod --selector=job-name=sippy-seed-job
--output=jsonpath='{.items[0].metadata.name}')") and use "${job_pod}" when
passing to kubectl logs so all variables are safely expanded (refer to
KUBECTL_CMD, SIPPY_LOAD_TIMEOUT, retVal, job_pod, ARTIFACT_DIR, sippy-seed-job).
provider dataprovider.DataProvider,
dbc *db.DB,
Don't keep BigQuery as a mandatory constructor dependency here.
Line 151 now builds reports through j.dataProvider, but the constructor still hard-fails when bqClient or its cache is missing. That keeps Postgres-backed Jira automation from starting even though this type no longer uses j.bqClient elsewhere in the file. Validate provider here instead, and make BigQuery optional or move that requirement to the variant-mapping path.
Also applies to: 87-98, 151-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/componentreadiness/jiraautomator/jiraautomator.go` around lines 74 - 75,
The constructor for JiraAutomator currently requires a BigQuery client even
though report building uses j.dataProvider; update the NewJiraAutomator
constructor (or the JiraAutomator constructor function) to stop hard-failing
when bqClient or its cache is nil: make the bqClient parameter optional (allow
nil) or remove it from the mandatory parameter list, validate that provider
(dataprovider.DataProvider) is non-nil instead, and set j.bqClient only if
provided. Also move any strict checks for bqClient into the specific
variant-mapping/reporting code paths that actually need BigQuery (check
j.bqClient and return a clear error where used), and remove the global hard
exit/validation that currently references j.bqClient in the constructor and in
the code around lines 87-98 and 151-152.
pkg/sippyserver/metrics/metrics.go
Outdated
func RefreshMetricsDB(ctx context.Context, dbc *db.DB, bqc *bqclient.Client, crProvider dataprovider.DataProvider, reportEnd time.Time, cacheOptions cache.RequestOptions, views []crview.View) error {
	start := time.Now()
	log.Info("beginning refresh metrics")
-	releases, err := api.GetReleases(context.Background(), bqc, false)
+	releases, err := crProvider.QueryReleases(ctx)
Guard crProvider before using it.
Line 122 dereferences crProvider before the later nil check, so configurations that intentionally skip component-readiness metrics will panic here instead of refreshing the other metric families.
Suggested fix
- releases, err := crProvider.QueryReleases(ctx)
- if err != nil {
-   return err
- }
+ var releases []v1.Release
+ var err error
+ switch {
+ case crProvider != nil:
+   releases, err = crProvider.QueryReleases(ctx)
+ case bqc != nil:
+   releases, err = api.GetReleases(ctx, bqc, false)
+ default:
+   releases = nil
+ }
+ if err != nil {
+   return err
+ }

Also applies to: 168-170
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sippyserver/metrics/metrics.go` around lines 118 - 122, In
RefreshMetricsDB, guard against a nil crProvider before invoking its methods
(e.g., crProvider.QueryReleases and later uses around the block referenced at
lines ~168-170); check if crProvider == nil early in the function and skip
component-readiness-specific calls while still allowing the rest of
RefreshMetricsDB to run, or return a controlled error if that is the intended
behavior, and ensure any later references to crProvider are similarly protected
(use the function name RefreshMetricsDB and the variable crProvider to locate
and update the checks).
Actionable comments posted: 2
♻️ Duplicate comments (2)
pkg/sippyserver/server.go (2)
182-193: ⚠️ Potential issue | 🟠 Major

Keep request parsing and execution on the same release catalog.

Line 1015 parses the request against s.getReleases(req.Context()), but Line 1030 delegates to componentreadiness.GetComponentReport, which still re-fetches releases via provider.QueryReleases(...) in pkg/api/componentreadiness/component_report.go. When both backends are configured, one request can be validated against BigQuery releases and executed against provider releases. Pass the same allReleases slice through report generation instead of re-querying it.

As per coding guidelines, "Avoid calling the same utility function multiple times with identical arguments in the same code path."
Also applies to: 1015-1036
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sippyserver/server.go` around lines 182 - 193, The request is validated against s.getReleases(...) but componentreadiness.GetComponentReport re-queries the provider, causing inconsistent catalogs; change GetComponentReport to accept a pre-fetched []sippyv1.Release (e.g., add a parameter allReleases) and update the call site that currently calls componentreadiness.GetComponentReport(...) to pass the allReleases slice returned by s.getReleases(ctx), then remove or bypass the provider.QueryReleases(...) call inside pkg/api/componentreadiness/component_report.go so the report generation uses the supplied allReleases instead of re-fetching.
414-416: ⚠️ Potential issue | 🟠 Major

Split ComponentReadinessCapability by backend dependency.

Lines 414-416 still enable one shared capability when either backend exists, but these routes no longer share the same dependency. In postgres-only mode, /api/job/run/payload, /api/pull_requests/test_results, /api/tests/v2/runs, /api/tests/capabilities, and /api/tests/lifecycles pass capability checks and then 400 because s.bigQueryClient is nil; in BigQuery-only mode, /api/job_variants and the component-readiness report handlers now fail the same way because s.crDataProvider is nil. Gate routes on the concrete backend they use, or split this capability.

Also applies to: 2304-2308, 2443-2446, 2456-2463, 2534-2561, 2620-2635
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sippyserver/server.go` around lines 414 - 416, The code currently appends a single ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider exists, which wrongly grants access to routes that require a specific backend; change this to gate capabilities by backend: add two distinct capabilities (e.g., ComponentReadinessBigQueryCapability and ComponentReadinessCRDataProviderCapability) or append capabilities conditionally per backend by checking s.bigQueryClient and s.crDataProvider separately, and then register each route/handler to require the concrete capability that matches the backend it uses (routes referencing s.bigQueryClient should require the BigQuery capability; routes referencing s.crDataProvider should require the CR data provider capability).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh`:
- Around line 172-177: The script currently checks for covcounters.* under
${COVDIR} but always runs go tool covdata against ${COVDIR}; update the logic to
detect the actual directory that contains the copied coverage files and use that
directory as the input root for go tool covdata. Specifically, use find to
locate the first directory containing 'covcounters.*' (e.g., assign it to a
variable like COVERAGE_ROOT) after the kubectl cp step, and replace the fixed
${COVDIR} arguments in the go tool covdata percent and textfmt commands with
that COVERAGE_ROOT (fall back to ${COVDIR} if find fails).
- Around line 56-57: The pod command switches to /bin/sippy-cover which is not
built or copied into the image; update the build and image or revert the
command: either add a Makefile build target for sippy-cover and update the
Dockerfile to build and COPY the resulting sippy-cover binary into the image
(alongside sippy, sippy-daemon, fetchdata.sh), or change the pod command back to
/bin/sippy (or another existing binary) in
e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh so the entrypoint references a
binary that actually exists in the image.
---
Duplicate comments:
In `@pkg/sippyserver/server.go`:
- Around line 182-193: The request is validated against s.getReleases(...) but
componentreadiness.GetComponentReport re-queries the provider, causing
inconsistent catalogs; change GetComponentReport to accept a pre-fetched
[]sippyv1.Release (e.g., add a parameter allReleases) and update the call site
that currently calls componentreadiness.GetComponentReport(...) to pass the
allReleases slice returned by s.getReleases(ctx), then remove or bypass the
provider.QueryReleases(...) call inside
pkg/api/componentreadiness/component_report.go so the report generation uses the
supplied allReleases instead of re-fetching.
- Around line 414-416: The code currently appends a single
ComponentReadinessCapability when either s.bigQueryClient or s.crDataProvider
exists, which wrongly grants access to routes that require a specific backend;
change this to gate capabilities by backend: add two distinct capabilities
(e.g., ComponentReadinessBigQueryCapability and
ComponentReadinessCRDataProviderCapability) or append capabilities conditionally
per backend by checking s.bigQueryClient and s.crDataProvider separately, and
then register each route/handler to require the concrete capability that matches
the backend it uses (routes referencing s.bigQueryClient should require the
BigQuery capability; routes referencing s.crDataProvider should require the CR
data provider capability).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 36ae543a-4a29-473d-80a4-60fc33ec3b52
📒 Files selected for processing (3)
- e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
- pkg/componentreadiness/jiraautomator/jiraautomator.go
- pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/componentreadiness/jiraautomator/jiraautomator.go
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true

if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
	echo "Generating coverage report..."
	go tool covdata percent -i="${COVDIR}"
	go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
Use the directory that actually contains the copied coverage files.
Line 174 searches recursively, but Lines 176-177 always read ${COVDIR}. If the coverage files land under a child directory, the check passes and go tool covdata still points at the wrong root. Because the script later exits with ${TEST_EXIT}, that failure just drops the coverage artifact.
Possible fix
COVDIR=$(mktemp -d)
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
-if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
+if COVDATA_FILE=$(find "${COVDIR}" -type f \( -name 'covmeta.*' -o -name 'covcounters.*' \) 2>/dev/null | head -n1) \
+ && [ -n "${COVDATA_FILE}" ]; then
+ COVDATA_DIR=$(dirname "${COVDATA_FILE}")
echo "Generating coverage report..."
- go tool covdata percent -i="${COVDIR}"
- go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
+ go tool covdata percent -i="${COVDATA_DIR}"
+ go tool covdata textfmt -i="${COVDATA_DIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
# Merge test binary coverage (from -coverprofile) into server binary coverage
 if [ -f "${ARTIFACT_DIR}/e2e-test-coverage.out" ]; then

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
	echo "Generating coverage report..."
	go tool covdata percent -i="${COVDIR}"
	go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"

After:
${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
if COVDATA_FILE=$(find "${COVDIR}" -type f \( -name 'covmeta.*' -o -name 'covcounters.*' \) 2>/dev/null | head -n1) \
  && [ -n "${COVDATA_FILE}" ]; then
	COVDATA_DIR=$(dirname "${COVDATA_FILE}")
	echo "Generating coverage report..."
	go tool covdata percent -i="${COVDATA_DIR}"
	go tool covdata textfmt -i="${COVDATA_DIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` around lines 172 - 177, The
script currently checks for covcounters.* under ${COVDIR} but always runs go
tool covdata against ${COVDIR}; update the logic to detect the actual directory
that contains the copied coverage files and use that directory as the input root
for go tool covdata. Specifically, use find to locate the first directory
containing 'covcounters.*' (e.g., assign it to a variable like COVERAGE_ROOT)
after the kubectl cp step, and replace the fixed ${COVDIR} arguments in the go
tool covdata percent and textfmt commands with that COVERAGE_ROOT (fall back to
${COVDIR} if find fails).
Build a coverage-instrumented sippy-cover binary in the Dockerfile and create a PVC in the CI setup script to persist coverage data across pod restarts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- automatejira: include error details in GetJobVariants failure
- seed_data: make test_failures UPDATE failure fatal
- serve: initialize GCS client independently of data provider
- component_report: recompute cell status from post-analysis results so middleware (triage) can improve cells, not just worsen them
- postgres provider: propagate context into query methods, add job name filter (periodic/release/aggregator) for BQ parity
- metrics: guard against nil crProvider to prevent panic
- triage tests: sort regressions by TestID for deterministic ordering
- setup script: quote shell variables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/test e2e
Actionable comments posted: 1
♻️ Duplicate comments (2)
pkg/api/componentreadiness/dataprovider/postgres/provider.go (1)
343-345: ⚠️ Potential issue | 🟠 Major — Don't turn variant lookup failures into empty reports.
If the follow-up `prow_jobs` query fails, this falls back to an empty `jobVariantMap`, and the `!ok` check later silently drops every aggregated row. Return the DB error instead of synthesizing an empty result set.
Suggested fix
- jobVariantMap := p.fetchJobVariants(rows)
+ jobVariantMap, err := p.fetchJobVariants(rows)
+ if err != nil {
+     return nil, []error{err}
+ }
@@
-func (p *PostgresProvider) fetchJobVariants(rows []testStatusRow) map[uint]map[string]string {
+func (p *PostgresProvider) fetchJobVariants(rows []testStatusRow) (map[uint]map[string]string, error) {
@@
-    if err := p.dbc.DB.Raw(`SELECT id, variants FROM prow_jobs WHERE id IN (?)`, ids).Scan(&jobRows).Error; err != nil {
-        log.WithError(err).Error("error fetching job variants")
-        return map[uint]map[string]string{}
+    if err := p.dbc.DB.Raw(`SELECT id, variants FROM prow_jobs WHERE id IN (?)`, ids).Scan(&jobRows).Error; err != nil {
+        return nil, fmt.Errorf("fetching job variants: %w", err)
     }
@@
-    return result
+    return result, nil
 }
Also applies to: 397-418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go` around lines 343-345: The current code turns failures from the follow-up prow_jobs query into an empty jobVariantMap by calling p.fetchJobVariants(rows) and proceeding, which causes aggregated rows to be silently dropped; change p.fetchJobVariants to return (map[string]YourVariantType, error) (or otherwise surface the DB error), update the call site where jobVariantMap := p.fetchJobVariants(rows) to check and return the error instead of using an empty map (and adjust the subsequent !ok handling accordingly), and make the same change for the second occurrence around the 397-418 block so any DB query error is propagated rather than treated as an empty result.
cmd/sippy/seed_data.go (1)
64-67: ⚠️ Potential issue | 🟠 Major — Refuse non-local DSNs by default.
Blocking only DSNs containing `amazonaws.com` still allows destructive writes to any other remote Postgres host. For a command that seeds synthetic data and refreshes regressions, parse the DSN and require either a strict localhost/e2e allowlist or an explicit confirmation flag.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sippy/seed_data.go` around lines 64 - 67, The RunE handler currently only blocks DSNs containing "amazonaws.com"; instead parse f.DBFlags.DSN (the DB connection string) using url.Parse (or a DSN parser) to extract the host and refuse any non-local hosts by default (allowlist only localhost, 127.0.0.1, ::1 and unix socket/empty host and any e2e-specific host tokens you require), and add an explicit boolean flag (e.g. --force or --confirm) that must be set to permit seeding into remote hosts; update the RunE logic to check parsedHost against the allowlist and the confirm flag before proceeding (use the symbols RunE and f.DBFlags.DSN to locate the change).
🧹 Nitpick comments (1)
Dockerfile (1)
7-13: Avoid shipping the coverage binary in the default runtime image.
Line 7 and Line 13 make coverage build/copy unconditional, which adds build overhead and runtime image bloat even when e2e coverage is not needed. Consider moving `sippy-cover` to a dedicated e2e stage/target so the default image stays lean.
Proposed Dockerfile split (default runtime vs e2e coverage target)
 FROM registry.access.redhat.com/ubi9/ubi:latest AS builder
 WORKDIR /go/src/sippy
 COPY . .
 ENV PATH="/go/bin:${PATH}"
 ENV GOPATH="/go"
 RUN dnf module enable nodejs:18 -y && dnf install -y go make npm && make build
-RUN go build -cover -coverpkg=./cmd/...,./pkg/... -mod=vendor -o ./sippy-cover ./cmd/sippy
+
+FROM builder AS cover-builder
+RUN go build -cover -coverpkg=./cmd/...,./pkg/... -mod=vendor -o ./sippy-cover ./cmd/sippy

 FROM registry.access.redhat.com/ubi9/ubi:latest AS base
 RUN mkdir -p /historical-data
 RUN mkdir -p /config
 COPY --from=builder /go/src/sippy/sippy /bin/sippy
-COPY --from=builder /go/src/sippy/sippy-cover /bin/sippy-cover
 COPY --from=builder /go/src/sippy/sippy-daemon /bin/sippy-daemon
 COPY --from=builder /go/src/sippy/scripts/fetchdata.sh /bin/fetchdata.sh
 COPY --from=builder /go/src/sippy/config/*.yaml /config/
 ENTRYPOINT ["/bin/sippy"]
 EXPOSE 8080
+
+FROM base AS e2e
+COPY --from=cover-builder /go/src/sippy/sippy-cover /bin/sippy-cover
As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 7 - 13, The Dockerfile currently always builds and copies the coverage binary (sippy-cover) — via the go build -cover... producing ./sippy-cover and COPY --from=builder .../sippy-cover /bin/sippy-cover — which bloats the default runtime image; split the Dockerfile into two targets: keep the existing builder stage but move the coverage build step (the RUN go build -cover -coverpkg=... producing ./sippy-cover) into a dedicated e2e coverage stage/target (or conditionally run it via a build-arg/target), and remove the COPY --from=builder .../sippy-cover line from the default runtime/base stage so only the normal sippy binary (/bin/sippy) is copied into the base image; ensure the new e2e stage still copies sippy-cover into its image when that target is built.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0229d4ea-3303-45c4-af8a-6f197ad5b0b1
📒 Files selected for processing (10)
- .gitignore
- Dockerfile
- cmd/sippy/automatejira.go
- cmd/sippy/seed_data.go
- cmd/sippy/serve.go
- e2e-scripts/sippy-e2e-sippy-e2e-setup-commands.sh
- pkg/api/componentreadiness/component_report.go
- pkg/api/componentreadiness/dataprovider/postgres/provider.go
- pkg/sippyserver/metrics/metrics.go
- test/e2e/componentreadiness/triage/triageapi_test.go
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/sippy/automatejira.go
const testDetailQuery = `
SELECT
    tow.unique_id AS test_id,
    t.name AS test_name,
    pj.name AS prowjob_name,
    CAST(pjr.id AS TEXT) AS prowjob_run_id,
    COALESCE(pjr.url, '') AS prowjob_url,
    pjr.timestamp AS prowjob_start,
    pj.id AS prow_job_id,
    pjrt.status,
    COALESCE(tow.jira_component, '') AS jira_component,
    tow.jira_component_id,
    tow.capabilities
FROM prow_job_run_tests pjrt
JOIN prow_job_runs pjr ON pjr.id = pjrt.prow_job_run_id
JOIN prow_jobs pj ON pj.id = pjr.prow_job_id
JOIN tests t ON t.id = pjrt.test_id
JOIN test_ownerships tow ON tow.test_id = pjrt.test_id
    AND (tow.suite_id = pjrt.suite_id OR (tow.suite_id IS NULL AND pjrt.suite_id IS NULL))
WHERE pj.release = ?
    AND pjr.timestamp >= ? AND pjr.timestamp < ?
    AND pjrt.deleted_at IS NULL AND pjr.deleted_at IS NULL AND pj.deleted_at IS NULL
    AND tow.staff_approved_obsolete = false
    AND (pjr.labels IS NULL OR NOT pjr.labels @> ARRAY['InfraFailure'])
ORDER BY pjr.timestamp
`
Deduplicate test-detail rows the same way as status aggregation.
testStatusQuery already collapses duplicate prow_job_run_tests rows per run/test/suite before counting, but testDetailQuery reads the raw table directly. On real data with duplicate uploads, the test-details endpoints will emit the same run more than once and inflate pass/fail counts. Reuse the same deduping CTE and status precedence here before building crstatus.TestJobRunRows.
Also applies to: 572-637
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/api/componentreadiness/dataprovider/postgres/provider.go` around lines
491 - 516, testDetailQuery currently reads prow_job_run_tests directly and can
emit duplicate rows; mirror the deduplication used by testStatusQuery by
introducing the same CTE (the deduping of prow_job_run_tests per run/test/suite
with status precedence) and use that CTE in place of direct joins to
prow_job_run_tests when building testDetailQuery and the other detail query
around lines 572-637 so crstatus.TestJobRunRows is populated from the
deduped/resultant rows; update JOINs to reference the deduped CTE name and
ensure the status precedence logic from testStatusQuery is preserved.
Actionable comments posted: 2
♻️ Duplicate comments (2)
scripts/e2e.sh (1)
118-123: ⚠️ Potential issue | 🟠 Major — Still fail fast on `jq` errors during cache priming.
`curl` is checked now, but Line 119 still ignores `jq` failures. If `jq` is missing or the response shape changes, the loop runs zero times and Line 123 still prints `Cache priming complete`, which pushes the failure into later flaky triage tests.
Possible fix
 VIEWS=$(curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness/views") || { echo "Failed to fetch views"; exit 1; }
-for VIEW in $(echo "$VIEWS" | jq -r '.[].name'); do
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || { echo "Failed to parse views"; exit 1; }
+for VIEW in $VIEW_NAMES; do
   echo " Priming cache for view: $VIEW"
   curl -sf "http://localhost:$SIPPY_API_PORT/api/component_readiness?view=$VIEW" > /dev/null || { echo "Failed to prime cache for view: $VIEW"; exit 1; }
 done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/e2e.sh` around lines 118-123: The cache-priming loop currently ignores jq failures because VIEWS assignment doesn't validate jq output; update the VIEWS assignment and loop to abort on jq errors or empty output by using jq's exit status (e.g., jq -e) or explicitly checking that VIEWS is valid JSON and contains at least one name before iterating; ensure the variable VIEWS, the jq invocation that extracts '.[].name', and the loop over VIEW are changed so a jq failure causes the script to echo a clear error and exit non-zero instead of silently printing "Cache priming complete".
e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh (1)
171-177: ⚠️ Potential issue | 🟠 Major — Use the copied coverage directory as the `covdata` input root.
`kubectl cp coverage-helper:/tmp/coverage "${COVDIR}"` creates a `coverage/` subdirectory under `${COVDIR}`. The recursive check on Line 174 can pass while Lines 176-177 still point `go tool covdata` at the parent directory, so artifact generation drops coverage even though the files were copied.
Possible fix
 COVDIR=$(mktemp -d)
 ${KUBECTL_CMD} -n sippy-e2e cp coverage-helper:/tmp/coverage "${COVDIR}" -c helper || true
-if find "${COVDIR}" -name 'covcounters.*' -print -quit 2>/dev/null | grep -q .; then
+if COVDATA_FILE=$(find "${COVDIR}" -type f \( -name 'covmeta.*' -o -name 'covcounters.*' \) -print -quit 2>/dev/null) \
+   && [ -n "${COVDATA_FILE}" ]; then
+  COVDATA_DIR=$(dirname "${COVDATA_FILE}")
   echo "Generating coverage report..."
-  go tool covdata percent -i="${COVDIR}"
-  go tool covdata textfmt -i="${COVDIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
+  go tool covdata percent -i="${COVDATA_DIR}"
+  go tool covdata textfmt -i="${COVDATA_DIR}" -o="${ARTIFACT_DIR}/e2e-coverage.out"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` around lines 171 - 177, The copied coverage files end up under a coverage/ subdirectory of COVDIR but the covdata commands use COVDIR itself; update the script so the covdata input root points to that coverage subdirectory (e.g., derive a COVROOT from COVDIR/coverage or locate the coverage dir and pass that to the go tool covdata percent and go tool covdata textfmt commands) so the coverage files found by the earlier check are the same files fed to covdata; update references to COVDIR in the covdata calls (symbols: COVDIR, go tool covdata percent, go tool covdata textfmt, ${ARTIFACT_DIR}/e2e-coverage.out).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 510c67c7-4627-48e6-a103-28542ca6d88a
📒 Files selected for processing (2)
- e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
- scripts/e2e.sh
 # only 1 in parallel, some tests will clash if run at the same time
-gotestsum --junitfile ${ARTIFACT_DIR}/junit_e2e.xml -- ./test/e2e/... -v -p 1
+gotestsum --junitfile ${ARTIFACT_DIR}/junit_e2e.xml -- ./test/e2e/... -v -p 1 -coverprofile=${ARTIFACT_DIR}/e2e-test-coverage.out -coverpkg=./pkg/...,./cmd/...
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Instrumented server binary ==="
rg -n --context 2 'sippy-cover|coverpkg=./cmd/\.\.\.,./pkg/\.\.\.' Dockerfile
echo
echo "=== Test coverage collection ==="
sed -n '132,136p' e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
echo
echo "=== Merge logic in CI script ==="
sed -n '178,183p' e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh
echo
echo "=== Same merge logic in local script ==="
sed -n '28,31p' scripts/e2e.sh
Repository: openshift/sippy
Length of output: 1666
Fix coverage merging to use proper Go coverage semantics instead of text concatenation.
Both the server binary (Dockerfile Line 7) and test coverage collection (Line 135) instrument the same packages (./cmd/... and ./pkg/...). The merge logic at Lines 179-183 (and scripts/e2e.sh Lines 29-31) uses naive text append (tail -n +2 ... >>), which duplicates coverage blocks instead of properly merging them. Go coverage profiles require proper merge semantics to avoid inaccurate coverage reports.
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` at line 135, The current
profile merge uses naive text concatenation (tail -n +2 >>) which duplicates
blocks and breaks Go coverage semantics; update the merge logic in the e2e merge
step (scripts sippy-e2e-test-commands.sh and scripts/e2e.sh where
e2e-test-coverage.out and other coverage files are combined) to use a proper Go
coverage merging tool (e.g., gocovmerge or the official go tool covdata/convert
sequence) instead of tail/append: invoke the merger to read the individual
profiles (ensure they share the same -covermode and coverpkg settings like
./pkg/... and ./cmd/...), produce a single merged profile file, and replace the
current tail-based concatenation with that merged output so the header appears
once and statement counts are summed correctly.
Warm the component-readiness cache before starting the suite.
scripts/e2e.sh now primes /api/component_readiness in Lines 116-123 because the triage cases rely on cached reports. This CI path goes straight to Line 135, so it still runs the same ./test/e2e/... suite against a cold cache.
Suggested change
${KUBECTL_CMD} -n sippy-e2e get svc,ep
+# Prime the component readiness cache so triage tests can find cached reports.
+VIEWS=$(curl -sf "http://localhost:${SIPPY_API_PORT}/api/component_readiness/views") || { echo "Failed to fetch views"; exit 1; }
+VIEW_NAMES=$(printf '%s' "$VIEWS" | jq -er '.[].name') || { echo "Failed to parse views"; exit 1; }
+for VIEW in $VIEW_NAMES; do
+ curl -sf "http://localhost:${SIPPY_API_PORT}/api/component_readiness?view=${VIEW}" > /dev/null || { echo "Failed to prime cache for view: ${VIEW}"; exit 1; }
+done
+
# only 1 in parallel, some tests will clash if run at the same time
 gotestsum --junitfile ${ARTIFACT_DIR}/junit_e2e.xml -- ./test/e2e/... -v -p 1 -coverprofile=${ARTIFACT_DIR}/e2e-test-coverage.out -coverpkg=./pkg/...,./cmd/...
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 135-135: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@e2e-scripts/sippy-e2e-sippy-e2e-test-commands.sh` at line 135, The e2e CI
path runs the tests without priming the component-readiness cache; add the same
cache-warming steps used in scripts/e2e.sh (the logic around Lines 116-123 that
primes /api/component_readiness) before the gotestsum invocation so triage cases
see cached reports; locate the CI script block that runs the gotestsum command
and insert the HTTP/kubectl call(s) that hit /api/component_readiness (reusing
the same endpoint, namespace and auth used by scripts/e2e.sh) to ensure the
cache is warmed prior to executing ./test/e2e/....
Scheduling required tests:
Prefer sippy-cover over sippy when available, and propagate GOCOVERDIR to the subprocess so load command coverage is captured. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exercises the regression tracker PostAnalysis middleware path where triages adjust report cell status from SignificantRegression to SignificantTriagedRegression (or Extreme equivalent). This improves regressiontracker middleware coverage from ~20% to ~71% for PostAnalysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5c6e9f2 to 0f47bcd
/test e2e
The coverage instrumentation changes inadvertently removed the GCS credential setup, causing the datasync test to skip in CI. Restore GCS_CRED, GCS_SA_JSON_PATH, and SIPPY_E2E_REPO_ROOT exports so the test runner can access them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/test e2e
@stbenjam: The following test failed, say
Full PR test history. Your PR dashboard. Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
PR needs rebase. Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Note: still a WIP but if I mark it WIP, CodeRabbit won't review. Still thinking about a lot of things in this PR, and how I might make the first iteration smaller
When refactoring Component Readiness with AI assistance, the lack of tests made it difficult to verify changes didn't break anything. Adding comprehensive e2e tests made the refactor dramatically more reliable -- the AI could detect and fix regressions as it worked.
Component Readiness was tightly coupled to BigQuery. The first step was extracting a DataProvider interface, then adding implementations behind it. A mock provider with static fixtures came first, but was replaced with a PostgreSQL provider because: 1) we'd eventually like to run CR against postgres in production, 2) postgres is already available in e2e infrastructure, and 3) a real database lets us fully test triage, regressions, and other write-path features.
The e2e suite now seeds realistic synthetic data (4800 job runs across 4 releases with deterministic pass/fail rates), launches the full API server against it, and runs 70+ tests covering component reports, test details, variants, regressions, triage workflows, and the regression tracker. This also enables future UI testing with tools like Playwright or headless Chromium against a fully functional API.
Changes: