
TRT-2564: Cache Risk Analysis responses#3409

Merged
openshift-merge-bot[bot] merged 16 commits into openshift:main from sosiouxme:20260402-TRT-2564-RA-caching
Apr 22, 2026

Conversation

@sosiouxme
Member

@sosiouxme sosiouxme commented Apr 7, 2026

Risk Analysis queries are fairly heavy and can impact Sippy DB performance. During regression periods we are likely to see an increase in repetitive calls for the same job/test failures, so this PR adds a caching mechanism.

Both RA and Sippy classic test pages use the same underlying query mechanism, so the same cache is introduced for both (replacing the existing request-level cache for the test API). Test queries are cached for an hour by default, but cache entries are invalidated when the underlying matview is refreshed, so new data is surfaced faster.

  1. Sippy's matview refresh code is updated to write a timestamp to redis whenever a matview refresh completes.
  2. Query results are written to the cache with a timestamp.
  3. On a cache hit, if the underlying matview timestamp is newer than the result timestamp, it is treated as a miss and regenerated. (In the case where the matview timestamp is missing, cached data is used until it expires normally.)

Individual commits separate the concepts that built this change, so reviewing them serially may be easiest.

openshift/origin#30965 proposes one simple way to benchmark risk analysis requests.

Summary by CodeRabbit

  • New Features

    • Global caching support added across APIs, CLI commands, background processing and matview refresh tracking.
  • Changed

    • Refresh/seed flows, server endpoints and background workers now accept and use optional cache clients; matview refreshes record cache timestamps; tests and risk/result generation were refactored to be cache-aware and unified.
  • Removals

    • /api/canary endpoint removed.
  • Tests

    • Extensive tests added for cache and matview invalidation/refresh behaviors.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 7, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 7, 2026

@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.

Details

In response to this:

Risk Analysis queries are fairly heavy and can impact Sippy DB performance. During regression periods we are likely to see an increase in repetitive calls for the same job / test failures. So this adds a caching mechanism.

Both RA and Sippy classic test pages use the same underlying query mechanism, so the same cache is introduced for both (replacing the existing request-level cache for the test API). This caches the test queries for an hour by default, but cache entries are invalidated if the underlying matview is refreshed, so new data is surfaced faster.

  1. Sippy's matview refresh code is updated to write a timestamp to redis whenever a matview refresh completes.
  2. Query results are written to the cache with a timestamp.
  3. On a cache hit, if the underlying matview timestamp is newer than the result timestamp, it is treated as a miss and regenerated. (In the case where the matview timestamp is missing, cached data is used until it expires normally.)

Individual commits separate the concepts that built this change, so reviewing them serially may be easiest.

openshift/origin#30965 proposes one simple way to benchmark risk analysis requests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from stbenjam and xueqzhan April 7, 2026 00:05
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 7, 2026
@sosiouxme
Member Author

/retest-required

@openshift-ci-robot

Scheduling required tests:
/test e2e

petr-muller added a commit to petr-muller/sippy that referenced this pull request Apr 9, 2026
@sosiouxme
Member Author

@coderabbitai help

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve merge conflict to automatically resolve merge conflicts.
  • @coderabbitai autofix to automatically fix issues identified in unresolved review comments.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@sosiouxme
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Threads a cache client through CLI commands, server, processors, and API flows; introduces GetDataFromCacheOrMatview and RefreshMatviewKey; records matview refresh timestamps in cache; updates function signatures and flags to accept and propagate a cache client.

Changes

Cohort / File(s) Summary
Command entry points
cmd/sippy-daemon/main.go, cmd/sippy/load.go, cmd/sippy/refresh.go, cmd/sippy/seed_data.go
Introduce and bind CacheFlags, obtain cacheClient (nil-tolerant or fatal depending on command), and pass cacheClient into refresh/load/seed flows and into sippyserver.RefreshData. Remove pinned-time usage in favor of cache-aware refresh.
Cache core & tests
pkg/api/cache.go, pkg/api/cache_test.go
Add GetDataFromCacheOrMatview[T], RefreshMatviewKey, and update CalculateRoundedCacheDuration to honor RequestOptions.Expiry. Add tests covering nil cache, hits/misses, matview refresh semantics, timestamp comparisons, and cache write behavior.
API logic & types
pkg/api/tests.go, pkg/api/job_runs.go
Add TestResultsSpec with backend-specific build methods that use caching wrappers; rename result slice types to exported TestsAPIResult*; change callers to accept cache.Cache; adapt job-run analysis and tests generation to use cacheClient and combined result structs.
Server & processors
pkg/sippyserver/server.go, pkg/sippyserver/pr_commenting_processor.go
Thread cache.Cache into Server, WorkProcessor, and AnalysisWorker; update NewWorkProcessor and RefreshData signatures; record per-matview refresh timestamps after materialized view refreshes; update handlers to pass s.cache.
Cache API types
pkg/apis/cache/cache.go
Add Expiry time.Duration to exported RequestOptions and clarify CR time-rounding behavior in comments.
Scripts / minor docs
scripts/updatesuites.go
Add a comment noting the script’s refresh invocation does not update matview refresh timestamps; no functional change.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Cache
    participant DB
    participant BigQuery

    Client->>Server: Request (e.g., /api/tests)
    Server->>Cache: GET RefreshMatviewKey(matview)
    Cache-->>Server: refresh_time or nil

    alt no refresh_time OR refresh_time <= cache_timestamp
        Server->>Cache: GET cached data key
        Cache-->>Server: cached_value + cache_timestamp or MISS
        alt cache hit and valid
            Server->>Client: Return cached_value
        else
            Server->>DB: Query matview (or BigQuery)
            DB-->>Server: query_result
            Server->>Cache: SET cached key = {Val, Timestamp=now} (cacheDuration)
            Cache-->>Server: OK
            Server->>Client: Return generated data
        end
    else refresh_time > cache_timestamp
        Server->>DB: Query matview (or BigQuery)
        DB-->>Server: query_result
        Server->>Cache: SET RefreshMatviewKey(matview) = now (TTL 24h)
        Cache-->>Server: OK
        Server->>Cache: SET cached key = {Val, Timestamp=now} (cacheDuration)
        Cache-->>Server: OK
        Server->>Client: Return generated data
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 15 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 51.35%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Coverage For New Features — ⚠️ Warning: The pull request adds comprehensive low-level cache tests but lacks coverage for critical higher-level functionality, including TestResultsSpec, RefreshData, and cache client integration. Resolution: create tests for TestResultsSpec, JobRunRiskAnalysis cache client usage, and materialized view refresh orchestration to verify end-to-end cache invalidation behavior.
✅ Passed checks (15 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title clearly identifies the core change: adding caching infrastructure for Risk Analysis responses, matching the PR's primary objective of reducing repetitive queries via a shared caching mechanism.
  • Linked Issues Check — ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check — ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Go Error Handling — ✅ Passed: The PR demonstrates proper Go error handling with appropriate panic/fatalf usage, correct error wrapping with %w, and consistent nil checks.
  • SQL Injection Prevention — ✅ Passed: All database queries use properly parameterized statements with placeholders (GORM's ? for PostgreSQL and BigQuery's @param style). String concatenation only involves hardcoded constants and internal configuration values, never user input.
  • Excessive CSS In React Should Use Styles — ✅ Passed: This PR exclusively modifies Go backend source files; no React components or frontend code changes are present.
  • Single Responsibility And Clear Naming — ✅ Passed: Function and type names are descriptive and action-oriented with clear purposes; struct fields are appropriately organized with focused responsibility.
  • Stable And Deterministic Test Names — ✅ Passed: The repository uses Go's standard testing package with testify, not Ginkgo. All 27 test functions have stable, descriptive names without dynamic values.
  • Test Structure And Quality — ✅ Passed: The PR adds standard Go unit tests using the testing package and testify, not Ginkgo tests, so the custom Ginkgo check is not applicable.
  • MicroShift Test Compatibility — ✅ Passed: The PR adds only standard Go unit tests in pkg/api/cache_test.go, not Ginkgo e2e tests, so the custom check for Ginkgo e2e tests is not applicable.
  • Single Node OpenShift (SNO) Test Compatibility — ✅ Passed: This PR does not add any new Ginkgo e2e tests. The only test file modified is pkg/api/cache_test.go, a unit test using testify assertions rather than Ginkgo. A new utility helper file (test/e2e/util/cache_manipulator.go) supports cache manipulation in e2e testing but contains no test code, so there are no SNO compatibility concerns to evaluate.
  • Topology-Aware Scheduling Compatibility — ✅ Passed: The PR contains only Go source code changes, with no Kubernetes manifests, operators, or scheduling constraints.
  • OTE Binary Stdout Contract — ✅ Passed: The pull request does not introduce stdout writes in process-level code that violate the OTE Binary Stdout Contract.
  • IPv6 And Disconnected Network Test Compatibility — ✅ Passed: This PR is for the Sippy repository and does not add any new Ginkgo e2e tests; changes are limited to unit tests for caching infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot

openshift-ci-robot commented Apr 13, 2026

@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.

Details

In response to this:

Risk Analysis queries are fairly heavy and can impact Sippy DB performance. During regression periods we are likely to see an increase in repetitive calls for the same job / test failures. So this adds a caching mechanism.

Both RA and Sippy classic test pages use the same underlying query mechanism, so the same cache is introduced for both (replacing the existing request-level cache for the test API). This caches the test queries for an hour by default, but cache entries are invalidated if the underlying matview is refreshed, so new data is surfaced faster.

  1. Sippy's matview refresh code is updated to write a timestamp to redis whenever a matview refresh completes.
  2. Query results are written to the cache with a timestamp.
  3. On a cache hit, if the underlying matview timestamp is newer than the result timestamp, it is treated as a miss and regenerated. (In the case where the matview timestamp is missing, cached data is used until it expires normally.)

Individual commits separate the concepts that built this change, so reviewing them serially may be easiest.

openshift/origin#30965 proposes one simple way to benchmark risk analysis requests.

Summary by CodeRabbit

  • New Features

  • Added caching infrastructure for test results and job run risk analysis to improve response times.

  • Implemented automatic cache invalidation tracking for materialized view refreshes to ensure data freshness.

  • Removals

  • Removed /api/canary endpoint.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/api/tests.go (2)

686-690: ⚠️ Potential issue | 🟡 Minor

Fix the copied completion log label.

This is the BigQuery path, but the message still says "buildTestsResultsFromPostgres completed". That makes latency/debug logs ambiguous when you're comparing the BQ and Postgres builders.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/tests.go` around lines 686 - 690, The completion log message is wrong
for the BigQuery path: update the Debug message in the log.WithFields block that
currently says "buildTestsResultsFromPostgres completed" to the correct label
(e.g., "buildTestsResultsFromBigQuery completed") so the elapsed/testReports
metrics are clearly attributed; locate the logging call that uses elapsed and
testReports (the log.WithFields(...) followed by .Debug(...)) and replace the
string to match the BigQuery builder function name.

322-335: ⚠️ Potential issue | 🟠 Major

Normalize equivalent period values before they reach the cache key.

buildTestsResultsFromPostgres only treats "twoDay" specially, so "", "default", and "current" all hit the same 7-day matview. Right now those values are serialized into different TestResultsSpec cache keys, which fragments the cache and cuts the hit rate for equivalent requests.

Suggested fix
 	period := req.URL.Query().Get("period")
 	if period != "" && period != "default" && period != "current" && period != "twoDay" {
 		RespondWithJSON(http.StatusBadRequest, w, map[string]interface{}{"code": http.StatusBadRequest, "message": "Unknown period"})
 		return
 	}
+	if period == "" || period == "default" {
+		period = "current"
+	}
 
 	spec := TestResultsSpec{
 		Release:        release,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/tests.go` around lines 322 - 335, Normalize the incoming period value
to a canonical token before constructing TestResultsSpec so equivalent periods
share the same cache key: map ""/"default"/"current" to the same normalized
value (e.g., ""), leaving "twoDay" untouched, then use that normalized variable
when building the TestResultsSpec and calling
spec.buildTestsResultsFromPostgres(dbc, cacheClient); update the validation
logic that currently checks period to still allow the accepted inputs but
perform the normalization immediately prior to creating the TestResultsSpec.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/sippy/load.go`:
- Around line 148-152: In the cache initialization block around GetCacheClient,
fix the error variable mismatch by wrapping and returning cacheErr (not err)
when cacheErr != nil; update references in the component-readiness-cache path
and any early-return logic so that errors.Wrap or the returned error uses
cacheErr with a message like "couldn't get cache client", and keep cacheClient
nil when cacheErr is present to preserve existing error hygiene.

In `@cmd/sippy/refresh.go`:
- Around line 4-6: The current cache setup turns a best-effort write into a hard
failure; instead of returning the error when initializing the cache client, log
the error (using logrus) and set cacheClient = nil so RefreshData can proceed
(RefreshData already handles a nil cacheClient). Update the cache initialization
code that currently returns the error (around the cacheClient
setup/WriteTimestamp call) to log the failure and continue, ensuring calls to
RefreshData use the possibly-nil cacheClient.

In `@pkg/api/cache_test.go`:
- Around line 637-640: The test is supposed to hit the equality branch but loses
it because seedRefreshTimestamp serializes times with time.RFC3339 (no
sub-second precision) while cachedAt usually has nanoseconds; fix by normalizing
cachedAt to the same precision before seeding so the parsed refresh time equals
cachedAt — e.g., truncate or round cachedAt to seconds (or format+parse with
time.RFC3339) so the value passed to seedMatviewCache/cachedAt matches what
seedRefreshTimestamp writes for testMatview in mc and the equality branch is
exercised.

In `@pkg/api/cache.go`:
- Around line 212-215: The current comparison in the cache invalidation logic
uses lastRefresh.Before(cacheVal.Timestamp) which treats equal timestamps as
stale; change the condition to allow equality (i.e., treat refresh newer than
cache as miss, but equal or older as a hit). Locate the block around the cache
check (where cacheVal.Timestamp and lastRefresh are compared) and replace the
strict Before check with a comparison that returns the cached value when
lastRefresh is equal or before cacheVal.Timestamp (for example, use a
!lastRefresh.After(cacheVal.Timestamp) style check) so equal timestamps reuse
the cache.

In `@pkg/api/tests.go`:
- Around line 444-458: Change buildTestsResultsFromPostgres to accept and use a
request context and pass it into GetDataFromCacheOrMatview instead of
context.TODO(); update the local generator closure to call
spec.buildTestsResults with that ctx (so the closure uses the same context) and
ensure the GetDataFromCacheOrMatview invocation uses the incoming ctx.
Specifically, modify the signature of buildTestsResultsFromPostgres to take ctx
context.Context, change generator := func(ctx context.Context) ... to
capture/forward the outer ctx to spec.buildTestsResults, and replace the
context.TODO() argument in the GetDataFromCacheOrMatview call with the incoming
ctx; update any callers of buildTestsResultsFromPostgres accordingly.

In `@scripts/updatesuites.go`:
- Around line 153-156: The call to sippyserver.RefreshData(&db.DB{DB: dbc}, nil,
false) passes nil for the cache client so the new matview refresh timestamps are
never written; change RefreshData usage so it either receives a real cache
client or makes the script explicitly offline-only: locate
sippyserver.RefreshData and the call site using db.DB and dbc, then either (A)
construct and pass the appropriate cache client (Redis client/adapter used
elsewhere in the app) instead of nil so timestamps are persisted, or (B) add a
guard in this script to refuse to run against a live deployment (e.g., require
an --offline flag or check an env var) and return an error if no offline mode is
set, documenting that RefreshData must not be invoked with a nil cache in
production.

---

Outside diff comments:
In `@pkg/api/tests.go`:
- Around line 686-690: The completion log message is wrong for the BigQuery
path: update the Debug message in the log.WithFields block that currently says
"buildTestsResultsFromPostgres completed" to the correct label (e.g.,
"buildTestsResultsFromBigQuery completed") so the elapsed/testReports metrics
are clearly attributed; locate the logging call that uses elapsed and
testReports (the log.WithFields(...) followed by .Debug(...)) and replace the
string to match the BigQuery builder function name.
- Around line 322-335: Normalize the incoming period value to a canonical token
before constructing TestResultsSpec so equivalent periods share the same cache
key: map ""/"default"/"current" to the same normalized value (e.g., ""), leaving
"twoDay" untouched, then use that normalized variable when building the
TestResultsSpec and calling spec.buildTestsResultsFromPostgres(dbc,
cacheClient); update the validation logic that currently checks period to still
allow the accepted inputs but perform the normalization immediately prior to
creating the TestResultsSpec.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 57ca2b8b-6b2a-4111-a4fa-a6ccd9c7f29a

📥 Commits

Reviewing files that changed from the base of the PR and between c14224a and 55b4ca0.

📒 Files selected for processing (11)
  • cmd/sippy-daemon/main.go
  • cmd/sippy/load.go
  • cmd/sippy/refresh.go
  • cmd/sippy/seed_data.go
  • pkg/api/cache.go
  • pkg/api/cache_test.go
  • pkg/api/job_runs.go
  • pkg/api/tests.go
  • pkg/sippyserver/pr_commenting_processor.go
  • pkg/sippyserver/server.go
  • scripts/updatesuites.go

Comment thread cmd/sippy/load.go
Comment thread cmd/sippy/refresh.go Outdated
Comment thread pkg/api/cache_test.go Outdated
Comment thread pkg/api/cache.go Outdated
Comment thread pkg/api/tests.go Outdated
Comment thread scripts/updatesuites.go
Member

@petr-muller petr-muller left a comment


/hold
/lgtm
/approve
/meow

Some comments inline, nothing serious. Feel free to unhold or address, your call.

Comment thread pkg/api/cache.go
}
logrus.Debugf("matview %q refreshed at %v, will not use earlier cache entry from %v", matview, lastRefresh, cacheVal.Timestamp)
} else if strings.Contains(err.Error(), "connection refused") {
logrus.WithError(err).Fatalf("redis URL specified but got connection refused; exiting due to cost issues in this configuration")
Member


Fatal in a library is brutal 😁

could be a networking blip but I guess we can robustify if we run into this too often

Member Author


i do not know the history of why it was done this way and it did not seem likely to be a common problem that a blip would kill the server (but maybe it's been happening all the time and we just didn't notice because they're deployed redundantly)

Comment thread cmd/sippy/load.go Outdated
Member


Suggested change
return errors.Wrap(cacheErr, "couldn't get cache client")

an LLM pointed me here, it's not that I read all code

Comment thread pkg/api/tests.go Outdated
"elapsed": elapsed,
"reports": len(testReports),
}).Debug("BuildTestsResults completed")
}).Debug("buildTestsResultsFromPostgres completed")
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}).Debug("buildTestsResultsFromPostgres completed")
}).Debug("buildTestsResultsFromBigQuery completed")

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

Comment thread pkg/api/tests.go Outdated
generator := func(ctx context.Context) (testResults, []error) {
return spec.buildTestsResults(dbc, matview)
}
result, errs := GetDataFromCacheOrMatview(context.TODO(), cacheClient,
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a request somewhere upstream, which means there should be a ctx somewhere too, would it be hard to plumb it here to avoid the context.TODO()?

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, i guess i was being lazy

Comment thread pkg/api/tests.go Outdated
"elapsed": elapsed,
"reports": len(testReports),
}).Debug("BuildTestsResults completed")
}).Debug("buildTestsResultsFromPostgres completed")
Member


Suggested change
}).Debug("buildTestsResultsFromPostgres completed")
}).Debug("buildTestsResults completed")

Member Author


yeah... actually it needs a name change anyway.

@openshift-ci
Contributor

openshift-ci Bot commented Apr 13, 2026

@petr-muller: cat image

Details

In response to this:

/hold
/lgtm
/approve
/meow

Some comments inline, nothing serious. Feel free to unhold or address, your call.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 13, 2026
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026
Member Author

@sosiouxme sosiouxme left a comment

edits incoming

Comment thread pkg/api/cache.go Outdated
Comment thread pkg/api/cache.go
}
logrus.Debugf("matview %q refreshed at %v, will not use earlier cache entry from %v", matview, lastRefresh, cacheVal.Timestamp)
} else if strings.Contains(err.Error(), "connection refused") {
logrus.WithError(err).Fatalf("redis URL specified but got connection refused; exiting due to cost issues in this configuration")
Member Author

i do not know the history of why it was done this way, and it did not seem likely to be a common enough problem that a blip would kill the server (but maybe it's been happening all the time and we just didn't notice because they're deployed redundantly)

Comment thread scripts/updatesuites.go
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 15, 2026

@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "4.22" instead.

Details

In response to this:

Risk Analysis queries are fairly heavy and can impact Sippy DB performance. During regression periods we are likely to see an increase in repetitive calls for the same job / test failures. So this adds a caching mechanism.

Both RA and Sippy classic test pages use the same underlying query mechanism, so the same cache is introduced for both (replacing the existing request-level cache for the test API). This caches the test queries for an hour by default, but cache entries are invalidated if the underlying matview is refreshed, so new data is surfaced faster.

  1. Sippy's matview refresh code is updated to write a timestamp to redis whenever a matview refresh completes.
  2. Query results are written to the cache with a timestamp.
  3. On a cache hit, if the underlying matview timestamp is newer than the result timestamp, it is treated as a miss and regenerated. (In the case where the matview timestamp is missing, cached data is used until it expires normally.)

Individual commits separate the concepts that built this change, so reviewing them serially may be easiest.

openshift/origin#30965 proposes one simple way to benchmark risk analysis requests.

Summary by CodeRabbit

  • New Features

  • Added caching across test results, job-run risk analysis, and API responses to improve performance and use cache-aware refresh timestamps for matview freshness.

  • CLI commands for refresh/seed now support optional cache usage.

  • Changed

  • Test and risk endpoints now leverage cache when available.

  • Removals

  • Removed /api/canary endpoint.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/api/cache.go (1)

220-221: ⚠️ Potential issue | 🟠 Major

Don't Fatal from this cache helper.

GetDataFromCacheOrMatview is used on request paths; a transient Redis outage here should degrade to regeneration, not terminate the process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/cache.go` around lines 220 - 221, In GetDataFromCacheOrMatview, do
not call logrus.Fatalf (which exits the process) on a transient Redis
"connection refused" error; instead log non-fatally (e.g.,
logrus.WithError(err).Warnf or Errorf) and treat it as a cache miss by returning
control to the caller so the data can be regenerated (e.g., return a nil cache
result / an error that the caller interprets as "no cache"). Replace the Fatalf
call in the connection-refused branch with a non-fatal log and adjust the
function return path so it does not terminate the process.
🧹 Nitpick comments (1)
cmd/sippy/load.go (1)

148-152: Log the degraded cache mode here.

Swallowing cacheErr is fine for keeping the load moving, but it also means later matview refreshes run with cacheClient == nil, so refresh timestamps are never recorded and cached RA/test responses can stay stale until TTL expiry. Please surface this path with at least a warning/metric so a successful load does not hide disabled invalidation.

Possible tweak
 // likewise get a cache client if possible, though some things operate without it.
 cacheClient, cacheErr := f.CacheFlags.GetCacheClient()
 if cacheErr != nil {
+	log.WithError(cacheErr).Warn("cache unavailable; matview refreshes will skip invalidation timestamp writes")
 	cacheClient = nil // error hygiene, since we pass this down to quite a few functions
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/sippy/load.go` around lines 148 - 152, The code swallows cacheErr after
calling f.CacheFlags.GetCacheClient(), which hides degraded-cache mode; update
the block around f.CacheFlags.GetCacheClient() so that when cacheErr != nil you
log a warning (including cacheErr.Error()) and/or emit a metric indicating cache
invalidation is disabled before setting cacheClient = nil; reference the
cacheClient and cacheErr variables and the GetCacheClient call so the log/metric
is placed right where cacheErr is handled to surface the degraded cache path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/api/cache.go`:
- Around line 201-202: The json.Unmarshal error handling in the cache retrieval
path should treat corrupt cache entries as cache-misses instead of returning an
error; modify the block that unmarshals into cacheVal to log or warn about the
unmarshal failure (include error and cacheKey), optionally evict the bad entry,
and then fall through to call generateFn to produce the value and continue
normal execution rather than returning defaultVal and an error slice; update the
code that currently references json.Unmarshal, cacheVal, cacheKey, defaultVal,
and generateFn so the function returns the generated value (and no error) on
corrupt cache blobs.

In `@pkg/api/tests.go`:
- Around line 533-537: The cache key currently uses a constant
NewCacheSpec(testResultsBQ{}, "BigQueryTestsResults~", nil), causing all
BigQuery /api/tests requests to share one entry; update the
GetDataFromCacheOrGenerate call to build the cache spec from the actual request
spec (include release, period, filter, collapse or the request
struct/stringified key) so each unique request produces a unique NewCacheSpec
(replace the constant testResultsBQ{} key with the request-derived key) while
keeping the same cache, RequestOptions (testResultsCacheDuration) and the call
to GetDataFromCacheOrGenerate.

---

Duplicate comments:
In `@pkg/api/cache.go`:
- Around line 220-221: In GetDataFromCacheOrMatview, do not call logrus.Fatalf
(which exits the process) on a transient Redis "connection refused" error;
instead log non-fatally (e.g., logrus.WithError(err).Warnf or Errorf) and treat
it as a cache miss by returning control to the caller so the data can be
regenerated (e.g., return a nil cache result / an error that the caller
interprets as "no cache"). Replace the Fatalf call in the connection-refused
branch with a non-fatal log and adjust the function return path so it does not
terminate the process.

---

Nitpick comments:
In `@cmd/sippy/load.go`:
- Around line 148-152: The code swallows cacheErr after calling
f.CacheFlags.GetCacheClient(), which hides degraded-cache mode; update the block
around f.CacheFlags.GetCacheClient() so that when cacheErr != nil you log a
warning (including cacheErr.Error()) and/or emit a metric indicating cache
invalidation is disabled before setting cacheClient = nil; reference the
cacheClient and cacheErr variables and the GetCacheClient call so the log/metric
is placed right where cacheErr is handled to surface the degraded cache path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 093e1c8b-5796-4aa8-958a-4556161ef67a

📥 Commits

Reviewing files that changed from the base of the PR and between 55b4ca0 and 49e5677.

📒 Files selected for processing (8)
  • cmd/sippy/load.go
  • cmd/sippy/refresh.go
  • pkg/api/cache.go
  • pkg/api/cache_test.go
  • pkg/api/job_runs.go
  • pkg/api/tests.go
  • pkg/apis/cache/cache.go
  • pkg/sippyserver/server.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • cmd/sippy/refresh.go
  • pkg/api/cache_test.go
  • pkg/api/job_runs.go

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e

@sosiouxme sosiouxme force-pushed the 20260402-TRT-2564-RA-caching branch from 49e5677 to f81c9a4 Compare April 15, 2026 22:33
@openshift-ci-robot

openshift-ci-robot commented Apr 15, 2026

@sosiouxme: This pull request references TRT-2564 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "4.22" instead.

Details

In response to this:

Summary by CodeRabbit

  • New Features

  • Global caching added for test results, job-run risk analysis, materialized-view refresh tracking, and API responses; CLI refresh/seed support optional cache usage.

  • Changed

  • Refresh workflow now records cache-based matview timestamps; endpoints and processors use cache when available; tests and result builders refactored to use cache-aware flows.

  • Removals

  • /api/canary endpoint removed.

  • Tests

  • Comprehensive tests added for cache/matview behavior.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/api/cache.go (1)

140-146: ⚠️ Potential issue | 🟡 Minor

Don't let rounding make Expiry a no-op.

Once Line 143 runs, the duration is recomputed from now and the previously selected cacheOptions.Expiry is discarded. Any caller that sets both Expiry and CRTimeRoundingFactor will get the next rounding boundary from "now", not from now + Expiry.

Suggested fix
 func CalculateRoundedCacheDuration(cacheOptions cache.RequestOptions) time.Duration {
 	// require cacheDuration for persistence logic
 	cacheDuration := defaultCacheDuration
 	if cacheOptions.Expiry > 0 {
 		cacheDuration = cacheOptions.Expiry
 	}
 	if cacheOptions.CRTimeRoundingFactor > 0 {
 		now := time.Now().UTC()
+		expiresAt := now.Add(cacheDuration)
 		// Only cache until the next rounding duration
-		cacheDuration = now.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
+		cacheDuration = expiresAt.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
 	}
 	return cacheDuration
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/cache.go` around lines 140 - 146, The current logic overwrites
cacheDuration set from cacheOptions.Expiry when
cacheOptions.CRTimeRoundingFactor is present, causing Expiry to be ignored;
update the CRTimeRoundingFactor branch to compute the rounding boundary from now
+ cacheDuration (use the existing cacheDuration after applying Expiry) instead
of from now so that the final cacheDuration = roundedBoundary.Sub(now).
Concretely, after setting cacheDuration from cacheOptions.Expiry, when
CRTimeRoundingFactor > 0 compute target := now.Add(cacheDuration), then set
cacheDuration =
target.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
(or equivalent) so both cacheOptions.Expiry and
cacheOptions.CRTimeRoundingFactor are honored for cacheDuration.
pkg/api/tests.go (1)

251-257: ⚠️ Potential issue | 🟠 Major

Guard limit before slicing.

Line 253 and Line 286 only reject zero. A request like ?limit=-1 falls through to tests[:limit] and panics the handler.

Suggested fix
 func (tests TestsAPIResult) limit(req *http.Request) TestsAPIResult {
-	limit, _ := strconv.Atoi(req.URL.Query().Get("limit"))
-	if limit == 0 || len(tests) < limit {
+	limit, err := strconv.Atoi(req.URL.Query().Get("limit"))
+	if err != nil || limit <= 0 || len(tests) < limit {
 		return tests
 	}
 
 	return tests[:limit]
 }
 func (tests TestsAPIResultBQ) limit(req *http.Request) TestsAPIResultBQ {
-	limit, _ := strconv.Atoi(req.URL.Query().Get("limit"))
-	if limit == 0 || len(tests) < limit {
+	limit, err := strconv.Atoi(req.URL.Query().Get("limit"))
+	if err != nil || limit <= 0 || len(tests) < limit {
 		return tests
 	}
 
 	return tests[:limit]
 }

Also applies to: 284-290

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/tests.go` around lines 251 - 257, The limit parsing in
TestsAPIResult.limit allows negative or parse-failed values through and can
cause a panic when slicing (tests[:limit]); update the TestsAPIResult.limit
method to validate the parsed limit from strconv.Atoi by checking the Atoi
error, treating non-positive (<=0) or failed parses as "no limit", and clamp the
positive limit to at most len(tests) before returning tests[:limit]; apply the
same guard/clamping logic to the other identical limit-handling block in this
file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/api/tests.go`:
- Around line 443-445: The fmt.Errorf calls that format the variable errs (a
[]error) use the "%s" verb; change both occurrences to use "%v" so the error
slice prints its elements correctly (update the fmt.Errorf invocations that
reference errs in pkg/api/tests.go to use "%v" instead of "%s").

---

Outside diff comments:
In `@pkg/api/cache.go`:
- Around line 140-146: The current logic overwrites cacheDuration set from
cacheOptions.Expiry when cacheOptions.CRTimeRoundingFactor is present, causing
Expiry to be ignored; update the CRTimeRoundingFactor branch to compute the
rounding boundary from now + cacheDuration (use the existing cacheDuration after
applying Expiry) instead of from now so that the final cacheDuration =
roundedBoundary.Sub(now). Concretely, after setting cacheDuration from
cacheOptions.Expiry, when CRTimeRoundingFactor > 0 compute target :=
now.Add(cacheDuration), then set cacheDuration =
target.Truncate(cacheOptions.CRTimeRoundingFactor).Add(cacheOptions.CRTimeRoundingFactor).Sub(now)
(or equivalent) so both cacheOptions.Expiry and
cacheOptions.CRTimeRoundingFactor are honored for cacheDuration.

In `@pkg/api/tests.go`:
- Around line 251-257: The limit parsing in TestsAPIResult.limit allows negative
or parse-failed values through and can cause a panic when slicing
(tests[:limit]); update the TestsAPIResult.limit method to validate the parsed
limit from strconv.Atoi by checking the Atoi error, treating non-positive (<=0)
or failed parses as "no limit", and clamp the positive limit to at most
len(tests) before returning tests[:limit]; apply the same guard/clamping logic
to the other identical limit-handling block in this file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 80e6632c-b39c-4fe5-8870-31f91c66ec69

📥 Commits

Reviewing files that changed from the base of the PR and between 49e5677 and f81c9a4.

📒 Files selected for processing (4)
  • pkg/api/cache.go
  • pkg/api/tests.go
  • pkg/apis/cache/cache.go
  • pkg/sippyserver/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/apis/cache/cache.go
  • pkg/sippyserver/server.go

Comment thread pkg/api/tests.go
@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e

also minor refactoring to prep for caching

🤖 Assisted by Claude Code
... which replaces caching the tests at the api response
fix silly error found by coderabbit
as coderabbit pointed out, we probably don't really want a caching
problem to prevent running the matview refreshes.
practically speaking, it doesn't matter
…uery

gives a chance to cancel the query if the server request is canceled.
this is not strictly necessary since the results were already cached at
the request level and we don't have a way to find out when the backing
tables in BQ are refreshed. but it seemed like a good idea and not too
much trouble to have the two forks work the same.
both postgres and BQ forks can use the same request processing code.
also the "period" is now standardized for caching purposes.
fix problem coderabbit noted where the cache entry doesn't unmarshal
for some reason... better to treat it like a cache miss.
@sosiouxme sosiouxme force-pushed the 20260402-TRT-2564-RA-caching branch from e06b56a to d2822a5 Compare April 22, 2026 13:52
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 22, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
cmd/sippy/seed_data.go (1)

92-97: Consider aligning error handling with refresh.go for consistency.

The seed-data command returns an error when cache client retrieval fails, while refresh.go logs a warning and proceeds with cacheClient = nil. Since RefreshData handles nil cache gracefully, consider using the same lenient pattern here:

Suggested change for consistency
 			cacheClient, cacheErr := f.CacheFlags.GetCacheClient()
 			if cacheErr != nil {
-				return fmt.Errorf("failed to get cache client: %v", cacheErr)
+				log.WithError(cacheErr).Warn("failed to get cache client; proceeding without cache")
+				cacheClient = nil
 			} else if cacheClient == nil {
 				log.Warn("no cache provided; refresh timestamps will not be cached")
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/sippy/seed_data.go` around lines 92 - 97, seed-data currently returns an
error when f.CacheFlags.GetCacheClient() fails whereas refresh.go treats this
leniently; change the cache handling in seed-data (the cacheClient, cacheErr :=
f.CacheFlags.GetCacheClient() block) to mirror refresh.go by logging a warning
on error, setting cacheClient = nil, and continuing (do not return the error) so
RefreshData can handle a nil cacheClient gracefully.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cmd/sippy/seed_data.go`:
- Around line 92-97: seed-data currently returns an error when
f.CacheFlags.GetCacheClient() fails whereas refresh.go treats this leniently;
change the cache handling in seed-data (the cacheClient, cacheErr :=
f.CacheFlags.GetCacheClient() block) to mirror refresh.go by logging a warning
on error, setting cacheClient = nil, and continuing (do not return the error) so
RefreshData can handle a nil cacheClient gracefully.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: fccb5b8f-aed1-4484-ae37-0287dc87c468

📥 Commits

Reviewing files that changed from the base of the PR and between e06b56a and d2822a5.

📒 Files selected for processing (12)
  • cmd/sippy-daemon/main.go
  • cmd/sippy/load.go
  • cmd/sippy/refresh.go
  • cmd/sippy/seed_data.go
  • pkg/api/cache.go
  • pkg/api/cache_test.go
  • pkg/api/job_runs.go
  • pkg/api/tests.go
  • pkg/apis/cache/cache.go
  • pkg/sippyserver/pr_commenting_processor.go
  • pkg/sippyserver/server.go
  • scripts/updatesuites.go
✅ Files skipped from review due to trivial changes (1)
  • scripts/updatesuites.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/apis/cache/cache.go
  • pkg/sippyserver/pr_commenting_processor.go
  • cmd/sippy/load.go
  • pkg/sippyserver/server.go

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@sosiouxme: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sosiouxme
Member Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 22, 2026
@petr-muller
Member

@CodeRabbit approve

@coderabbitai
Contributor

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Comments resolved and changes approved.

@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Apr 22, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@coderabbitai[bot]: changing LGTM is restricted to collaborators

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Member

@petr-muller petr-muller left a comment

/lgtm

Looks good after the fixes. The logrus.Fatalf on Redis connection refused (pkg/api/cache.go) is still a concern for production resilience, but not a merge blocker — it can be revisited if it causes issues in practice.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 22, 2026
Member

@petr-muller petr-muller left a comment

/lgtm

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coderabbitai[bot], petr-muller, sosiouxme

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:
  • OWNERS [petr-muller,sosiouxme]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot Bot merged commit 56c66c4 into openshift:main Apr 22, 2026
8 checks passed
@sosiouxme sosiouxme deleted the 20260402-TRT-2564-RA-caching branch April 22, 2026 15:26

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review


3 participants