NETOBSERV-2675: Centralized K8s informers to reduce API load #1221
leandroberetta wants to merge 8 commits into
Conversation
Skipping CI for Draft Pull Request.
@leandroberetta: This pull request references NETOBSERV-2675 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough: Adds a k8scache subsystem: a new flp-informers binary pushes Kubernetes resource events to processors via a bidirectional gRPC cache protocol; flowlogs-pipeline can host a k8scache server applying updates into a new KubernetesStore used for enrichment/lookups. Manifests, Make targets, proto, client/server, discovery, handlers, and tests included.
Sequence Diagram(s):
sequenceDiagram
participant Informer as flp-informers
participant KubeAPI as Kubernetes API
participant Client as k8scache Client
participant Processor as flowlogs-pipeline (processor)
participant Server as k8scache Server
participant KStore as KubernetesStore
Informer->>KubeAPI: List/Watch resources
KubeAPI-->>Informer: Resource events
Informer->>Client: OnAdd/OnUpdate/OnDelete (enqueue)
Client->>Processor: Dial & Open StreamUpdates (per-processor)
Client->>Processor: Send CacheUpdate (versioned)
Processor->>Server: StreamUpdates receives CacheUpdate
Server->>KStore: Apply Add/Update/Delete
KStore-->>Server: Applied
Server->>Client: Send SyncAck (success + version)
sequenceDiagram
participant Discovery as Processor Discovery
participant ClientQueue as k8scache Client
participant Broadcaster as Broadcaster Loop
participant ProcConn as Processor Connection
participant Server as k8scache Server
participant DS as Datasource
Discovery->>ClientQueue: Discover pods -> AddProcessorWithTimeout
Informer->>ClientQueue: Enqueue update
ClientQueue->>Broadcaster: Dequeue update (version++)
Broadcaster->>ProcConn: Send update to healthy connections (concurrent)
ProcConn->>Server: StreamUpdates RPC receives CacheUpdate
Server->>DS: ApplyCacheAddOrUpdate/Delete
DS-->>Server: ✓
Server->>ProcConn: SyncAck (version)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 14
🧹 Nitpick comments (7)
proto/k8scache.proto (2)
1-5: Address Buf lint: package directory mismatch. Static analysis reports: Files with package "k8scache" must be within a directory "k8scache" relative to root. Consider moving k8scache.proto to proto/k8scache/k8scache.proto or adjusting the package name.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proto/k8scache.proto` around lines 1 - 5, The proto package declaration "package k8scache" and option go_package = "./k8scache" conflict with Buf lint expecting the proto file to live in a matching directory; either move this proto file into a directory named "k8scache" under proto (e.g., proto/k8scache/k8scache.proto) or change the package/option to match its current directory. Update the package line (package k8scache) and the option go_package accordingly if you choose to rename, or relocate the file into a "k8scache" folder if you want to keep the package and go_package as-is, then re-run buf lint to confirm the mismatch is resolved.
55-69: `operation` field is ambiguous when `is_snapshot = true`. The `operation` field is only relevant for incremental updates, but nothing prevents setting it on snapshots. Consider using a `oneof` to make this mutually exclusive, or document explicitly that `operation` is ignored for snapshots.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proto/k8scache.proto` around lines 55 - 69, CacheUpdate's operation field is ambiguous when is_snapshot = true; change CacheUpdate to make snapshot vs incremental mutually exclusive by introducing a oneof (e.g., oneof update_kind { bool is_snapshot = 2; OperationType operation = 4; }) or a more explicit oneof with message Snapshot {} and message Incremental { OperationType operation = 1; } so that OperationType cannot be set on snapshots; update the CacheUpdate definition and any code that reads CacheUpdate (parsers/handlers) to check the oneof variant instead of reading both is_snapshot and operation.
cmd/flowlogs-pipeline/main.go (2)
269-275: Consider adding server address to startup log before `Serve` blocks. The log at line 271 runs in the goroutine, but if `Serve` fails immediately, the startup message may not appear. Moving the log before the goroutine or adding a startup confirmation would improve observability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flowlogs-pipeline/main.go` around lines 269 - 275, The startup info log currently runs inside the goroutine and may never appear if grpcServer.Serve(listener) fails immediately; move or duplicate the log.WithField("address", address).Info("starting K8s cache sync server") so it runs synchronously before launching the goroutine (or emit an immediate confirmation log right before go func) and keep the existing error log that wraps grpcServer.Serve's returned error; reference grpcServer.Serve, listener, address and the log.WithField call to locate and update the code.
240-278: Unreachable code after `log.Fatal`. Line 266 (`return nil`) is unreachable since `log.Fatal` calls `os.Exit(1)`. While harmless, it's dead code.
Proposed fix
 listener, err := net.Listen("tcp", address)
 if err != nil {
 	log.WithError(err).WithField("address", address).Fatal("failed to start K8s cache server")
-	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flowlogs-pipeline/main.go` around lines 240 - 278, In startK8sCacheServer, the return nil after log.WithError(...).Fatal(...) is unreachable; remove that dead code (the return nil following the Fatal call) from the net.Listen error branch so the function doesn't contain unreachable statements in the error handling for net.Listen in startK8sCacheServer.
pkg/pipeline/transform/kubernetes/k8scache/server_test.go (2)
80-96: Test name mismatch with actual coverage. `TestBackwardCompatibility_DatasourceLookup` tests the FakeInformers path (via `SetupStubs`), not the `KubernetesStore` integration. The test name suggests it validates backward compatibility with the new store, but it only exercises the old informer-backed lookups. Consider renaming to `TestDatasourceLookup_ViaInformers` or adding a parallel test that verifies lookups work via `KubernetesStore` when set.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/server_test.go` around lines 80 - 96, Test name TestBackwardCompatibility_DatasourceLookup is misleading because it exercises the informer-backed path (via SetupStubs/setupTestDatasource) rather than the KubernetesStore integration; either rename the test to TestDatasourceLookup_ViaInformers to reflect it calls setupTestDatasource and exercises ds.IndexLookup and ds.GetNodeByName, or add a new parallel test that creates a KubernetesStore (use the same fixture setup flow you use for KubernetesStore integration), performs the same lookups against that store, and asserts identical results so the KubernetesStore path is validated.
235-253: DELETE operation not verified. The test sends a DELETE for `pod1` but never verifies the entry was removed from the cache. Consider adding assertions similar to `TestKubernetesCacheServer_ReceivesAdd` that confirm cache state after operations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/server_test.go` around lines 235 - 253, The test sends a DELETE CacheUpdate (OperationType_OPERATION_DELETE) for pod1 but never asserts the cache state; update the test (around server.StreamUpdates and after GetCurrentVersion()) to verify the server's cache no longer contains the ResourceEntry for Kind "Pod", Name "pod1", Namespace "default" (similar to assertions in TestKubernetesCacheServer_ReceivesAdd), e.g., use the cache lookup or server-provided accessors to assert the entry is absent and optionally assert the remaining entry count; keep checks for version via server.GetCurrentVersion() and existing ACK/SYNC message assertions.pkg/pipeline/transform/kubernetes/k8scache/client.go (1)
271-283: `SyncRequest.LastVersion` is ignored. The handler logs `LastVersion` but doesn't use it. Per the comment, this is by design (no snapshots). However, the protocol includes `LastVersion`, which suggests catch-up was considered. If intentional, consider removing `LastVersion` from the proto or documenting why it exists but isn't used.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 271 - 283, The handler handleSyncRequest currently logs SyncRequest.LastVersion but doesn't use it; decide whether to remove the unused field from the protocol or explicitly document why it's ignored: either (A) remove LastVersion from the proto and regenerate types (and update any processor code that sets it), or (B) keep the proto and add a clear comment in Client.handleSyncRequest stating that LastVersion is intentionally ignored (no initial snapshots sent), update the log message to reflect "ignored LastVersion", and add a TODO or link to design rationale so future readers know this is by-design; reference symbols: handleSyncRequest, SyncRequest.LastVersion, processorConnection, Client.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/flowlogs-pipeline/main.go`:
- Around line 251-252: The assignment to the Datasource.KubernetesStore pointer
is unsynchronized and can race with goroutines started by mainPipeline.Run; fix
by making pointer access atomic or mutex-protected: add a sync.RWMutex field (or
use atomic.Pointer[*KubernetesStore]) on Datasource, update SetKubernetesStore
to perform a write under the mutex (or atomic.StorePointer), and change all
places that read d.KubernetesStore (the enrichment/lookup code that runs in
pipeline goroutines and any methods that dereference KubernetesStore) to use the
corresponding read path (mutex RLock/RUnlock or atomic.LoadPointer) so
reads/writes are properly synchronized. Ensure SetKubernetesStore, the
Datasource.KubernetesStore field, and the enrichment/lookup code that references
it are all updated consistently.
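
As a rough illustration of the synchronization asked for above, here is a minimal sketch of a mutex-guarded pointer; the Datasource and KubernetesStore shapes below are stand-ins for the PR's actual types, not the real implementation.

package datasource

import "sync"

// KubernetesStore is a stand-in for the PR's store type.
type KubernetesStore struct{}

// Datasource guards the store pointer so a late SetKubernetesStore cannot race
// with lookups running in pipeline goroutines.
type Datasource struct {
	mu    sync.RWMutex
	store *KubernetesStore
}

// SetKubernetesStore swaps the store under the write lock.
func (d *Datasource) SetKubernetesStore(s *KubernetesStore) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.store = s
}

// getStore is the read path that enrichment/lookup code should go through.
func (d *Datasource) getStore() *KubernetesStore {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.store
}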
In `@cmd/flp-informers/main.go`:
- Around line 77-80: The flag/env binding currently calls
viper.BindPFlags(rootCmd.PersistentFlags()) but the rest of the code continues
to read values directly from the opts struct (opts.LogLevel, opts.Kubeconfig,
opts.ProcessorSelector, opts.ProcessorPort, opts.ResyncInterval), so environment
overrides never take effect; fix by configuring key mapping with
viper.SetEnvKeyReplacer(strings.NewReplacer("-", "_")), then after Cobra parses
flags call either viper.Unmarshal(&opts) to populate the opts struct from Viper
or replace direct reads with viper.GetString/viper.GetInt for the named keys
(e.g., "log-level", "kubeconfig", "processor-selector", "processor-port",
"resync-interval") so env vars like FLP_INFORMERS_PROCESSOR_SELECTOR override
flag values.
- Around line 180-185: OnDelete in cacheEventHandler currently does a direct
type assertion to *model.ResourceMetaData and ignores
cache.DeletedFinalStateUnknown tombstones, so missed-delete events leave stale
cache entries; change OnDelete to unwrap tombstones by checking for
cache.DeletedFinalStateUnknown and extracting its Obj into a
*model.ResourceMetaData (falling back to a direct *model.ResourceMetaData
assertion), handle nil after extraction, then proceed with the existing delete
logic so deletions from tombstones are processed.
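
Going back to the flag/environment binding item for this file above, a minimal sketch of the Viper wiring it describes; the opts fields and flag names are illustrative and may not match the PR exactly.

package main

import (
	"strings"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

// opts mirrors the kind of struct the review mentions; fields are illustrative.
var opts struct {
	LogLevel          string
	ProcessorSelector string
}

func bindConfig(rootCmd *cobra.Command) error {
	rootCmd.PersistentFlags().String("log-level", "info", "log level")
	rootCmd.PersistentFlags().String("processor-selector", "", "label selector for processor pods")

	viper.SetEnvPrefix("flp_informers")
	// Map flag names like "log-level" to env suffixes like LOG_LEVEL.
	viper.SetEnvKeyReplacer(strings.NewReplacer("-", "_"))
	viper.AutomaticEnv()
	if err := viper.BindPFlags(rootCmd.PersistentFlags()); err != nil {
		return err
	}

	// Read through Viper rather than the raw flag values, so that e.g.
	// FLP_INFORMERS_PROCESSOR_SELECTOR overrides --processor-selector.
	opts.LogLevel = viper.GetString("log-level")
	opts.ProcessorSelector = viper.GetString("processor-selector")
	return nil
}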
In `@contrib/kubernetes/deployment-flp-informers.yaml`:
- Around line 62-65: The ClusterRoleBinding subjects block currently pins the
ServiceAccount to namespace "default" (subjects -> kind: ServiceAccount, name:
flp-informers, namespace: default), which breaks RBAC when deployed into other
namespaces; update the ClusterRoleBinding so the subject's namespace is not
hardcoded — either remove the namespace field or set it to the release/manifest
namespace (so the ServiceAccount in the target namespace receives the
ClusterRole) ensuring the binding references the same namespace as the
deployment.
In `@contrib/kubernetes/deployment-k8scache.yaml`:
- Around line 94-96: The ClusterRole granting apiGroups: ["*"], resources:
["*"], verbs: ["*"] for flowlogs-pipeline is effectively cluster-admin and
should be narrowed; update the ClusterRole named flowlogs-pipeline to only
include the specific apiGroups, resources and verbs the k8scache processor
actually needs (e.g., namespaced resources like
pods/endpoints/configmaps/secrets or watching CRDs) and limit scope to the
target namespace via a Role or by restricting resourceNames and verbs
(get/list/watch/create/update/patch/exec as required) instead of wildcards;
ensure any ClusterRoleBindings are replaced with RoleBindings if cluster-wide
access is not required.
- Around line 102-105: The ClusterRoleBinding's subjects block hardcodes
namespace: default which breaks when the manifest is applied to a different
namespace; update the subjects entry for the ServiceAccount (name:
flowlogs-pipeline) so the namespace is not hardcoded but uses the manifest's
target namespace (e.g., reference the resource's metadata.namespace or the
project's templating variable like {{ .Release.Namespace }} / ${NAMESPACE}), or
remove the namespace field so the binding targets the ServiceAccount in the
current namespace, ensuring the subjects block dynamically matches the
ServiceAccount's namespace.
In `@pkg/pipeline/transform/kubernetes/informers/cache_export.go`:
- Around line 4-21: The AddEventHandler method on Informers currently registers
handlers for pods, nodes, and services but omits replicaSets and deployments;
update Informers.AddEventHandler to also call
k.replicaSets.AddEventHandler(handler) and
k.deployments.AddEventHandler(handler) (guarded by nil checks like the others)
and return any errors from those calls so replica set and deployment events are
registered and kept in sync with pods/nodes/services.
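
A possible shape for the extended handler registration, assuming a recent client-go (where AddEventHandler returns a registration and an error) and that the informer fields are SharedIndexInformer values as the review text implies; the struct here is a stand-in, not the PR's definition.

package informers

import "k8s.io/client-go/tools/cache"

// Informers is a stand-in; only the fields used below are shown.
type Informers struct {
	pods, nodes, services, replicaSets, deployments cache.SharedIndexInformer
}

// AddEventHandler registers the handler on every informer, including replica
// sets and deployments, skipping any informer that was not configured.
func (k *Informers) AddEventHandler(handler cache.ResourceEventHandler) error {
	for _, inf := range []cache.SharedIndexInformer{
		k.pods, k.nodes, k.services, k.replicaSets, k.deployments,
	} {
		if inf == nil {
			continue
		}
		if _, err := inf.AddEventHandler(handler); err != nil {
			return err
		}
	}
	return nil
}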
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 59-105: Add reconnection/cleanup for processor connections: update
receiveFromProcessor (and/or AddProcessor) to detect stream errors where
pc.healthy is set false and either attempt automatic reconnection with
exponential backoff using the existing address (re-creating the grpc client,
stream and replacing pc.conn/pc.stream/pc.cancel and setting pc.healthy true
upon success) or, if retries exceed a threshold, remove the processorConnection
from the Client.connections map and close resources. Ensure synchronization with
c.mu when modifying c.connections, reuse the processorConnection.address, conn,
stream, cancel, and healthy symbols, and log retry attempts and final removal so
broadcasts no longer skip permanently-failed processors.
- Around line 227-240: The broadcast currently blocks on wg.Wait because a stuck
pc.stream.Send can hang forever; modify the goroutine that calls pc.stream.Send
to enforce a per-send timeout (use a context with timeout if the stream supports
Send with context, or run the Send in a nested goroutine and select on a
time.After timeout for that send). On timeout or send error, log the error with
clog.WithError(...).WithField("address", pc.address).Error(...), mark
pc.healthy.Store(false), and ensure wg.Done() is still called so the loop cannot
block indefinitely; keep references to processorConnection, pc.stream.Send,
pc.healthy, and connections to locate the change.
- Around line 72-75: The gRPC client currently uses insecure.NewCredentials() in
the grpc.NewClient call (using variables address and the MaxCallRecvMsgSize
option), leaving traffic unencrypted; replace this with TLS/mTLS by loading
client certificate and CA (e.g., from Kubernetes Secrets) and passing real
transport credentials to grpc.NewClient (use
grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)) or the mTLS
equivalent) or wire in code paths to use platform-provided mTLS (sidecar/service
mesh) when enabled; ensure certificate validation (RootCAs) and SNI are
configured and preserve existing call options like grpc.MaxCallRecvMsgSize.
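
For the transport-security item directly above, a hedged sketch of replacing the insecure credentials; the certificate paths are hypothetical mount points for a Secret, while grpc.NewClient and MaxCallRecvMsgSize mirror the review text.

package k8scache

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialSecure connects to a processor using TLS/mTLS instead of insecure credentials.
func dialSecure(address string, maxMsgSize int) (*grpc.ClientConn, error) {
	caPEM, err := os.ReadFile("/etc/k8scache/certs/ca.crt") // illustrative path
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("failed to parse CA certificate")
	}
	cert, err := tls.LoadX509KeyPair("/etc/k8scache/certs/tls.crt", "/etc/k8scache/certs/tls.key")
	if err != nil {
		return nil, err
	}
	tlsCfg := &tls.Config{
		Certificates: []tls.Certificate{cert}, // client certificate for mTLS
		RootCAs:      pool,                    // validate the server certificate
		MinVersion:   tls.VersionTLS12,
	}
	return grpc.NewClient(address,
		grpc.WithTransportCredentials(credentials.NewTLS(tlsCfg)),
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)),
	)
}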
In `@pkg/pipeline/transform/kubernetes/k8scache/server_test.go`:
- Around line 75-78: The tests use setupTestDatasource() which only returns a
Datasource with Informers set, but most tests (e.g.,
TestKubernetesCacheServer_ReceivesIncrementalUpdate and
TestKubernetesCacheServer_MultipleUpdates) never call SetKubernetesStore so
server methods ApplyCacheAddOrUpdate/ApplyCacheDelete have nowhere to persist
data; update setupTestDatasource() to attach a KubernetesStore (call
SetKubernetesStore on the Datasource) so every test has a real store, or
alternatively add explicit assertions in those tests that validate cache
contents after operations; modify either setupTestDatasource() to initialize the
store (preferred) or add checks in
TestKubernetesCacheServer_ReceivesIncrementalUpdate and
TestKubernetesCacheServer_MultipleUpdates to ensure the cache actually contains
the expected entries after ApplyCacheAddOrUpdate/ApplyCacheDelete.
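
A sketch of the preferred option above (initializing the store in the shared helper), reusing the fixtures already shown in this PR's integration tests; treat it as a suggestion, not the test file's actual content.

// setupTestDatasource builds a Datasource backed by fake informers plus a real
// KubernetesStore, so the server's Apply* calls have somewhere to persist data.
func setupTestDatasource() *datasource.Datasource {
	_, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
	ds := &datasource.Datasource{Informers: informers}
	ds.SetKubernetesStore(datasource.NewKubernetesStore())
	return ds
}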
In `@pkg/pipeline/transform/kubernetes/k8scache/server.go`:
- Around line 100-112: The ACK path unconditionally stores
s.version.Store(update.Version) and sends a success ACK via
stream.Send(&SyncMessage{... SyncAck{ProcessorId: connectionID, Version:
update.Version, Success: true}}) even when an operation was dropped/unknown;
change the logic so s.version.Store and sending Success=true only happen for
successfully applied updates. Locate the switch/handler that logs
OPERATION_UNSPECIFIED/unknown values and make it return or set a failed result
for that update (or send an ACK with Success=false) instead of falling through
to the success ACK; ensure stream.Send and s.version.Store are executed only
when the apply/dispatch path indicates success (use the update result flag or
error from the handler to gate those calls, referencing s.version.Store,
stream.Send, SyncMessage_Ack, SyncAck, connectionID, and update.Version).
- Around line 45-53: The server currently sends a SyncRequest with lastVersion
from s.version.Load() but does not honor snapshot semantics for fresh processors
(version==0) and the sender drops initial adds; update the sync protocol so when
SyncRequest.LastVersion == 0 the server responds with a full snapshot
SyncMessage (use the existing Replace semantics) instead of/in addition to
incrementals, by wiring KubernetesStore.Replace into the response path and
ensuring stream.Send sends that full snapshot before any incremental events;
alternatively, introduce an per-connection initialSyncComplete flag so the
sender will not drop initial add events and will gate switching to
KubernetesStore until Replace has been applied, referencing SyncMessage,
SyncMessage_Request/SyncRequest, s.version.Load(), KubernetesStore.Replace, and
the stream.Send sender logic to locate and modify the code paths.
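
For the ACK-gating item above (server.go lines 100-112), a minimal sketch of acknowledging only successfully applied updates; the apply helper is hypothetical, while SyncMessage/SyncAck and s.version are the names used in this PR.

// handleUpdate applies a CacheUpdate and only advances the version when the
// apply path actually succeeded; the ACK reports Success accordingly.
func (s *KubernetesCacheServer) handleUpdate(
	stream KubernetesCacheService_StreamUpdatesServer,
	connectionID string,
	update *CacheUpdate,
) error {
	// apply is a hypothetical helper that dispatches ADD/UPDATE/DELETE and
	// returns an error for unknown or unspecified operations.
	applyErr := s.apply(update)

	if applyErr == nil {
		// Only record versions that were actually applied.
		s.version.Store(update.Version)
	}
	return stream.Send(&SyncMessage{
		Message: &SyncMessage_Ack{Ack: &SyncAck{
			ProcessorId: connectionID,
			Version:     update.Version,
			Success:     applyErr == nil,
		}},
	})
}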
In `@proto/k8scache.proto`:
- Around line 71-94: The proto is missing a secondary_net_names mapping needed
to preserve secondary network name resolution: add a map<string, string>
secondary_net_names field to the ResourceEntry message (alongside the existing
repeated secondary_net_keys) and bump the field number accordingly, then update
the conversion logic in converter.go (the code that maps SecondaryNetNames ->
ResourceEntry) and the informer population in informers.go to populate this new
map so FLP processors can resolve NetworkName from secondary network keys.
⛔ Files ignored due to path filters (2)
pkg/pipeline/transform/kubernetes/k8scache/k8scache.pb.go is excluded by !**/*.pb.go
pkg/pipeline/transform/kubernetes/k8scache/k8scache_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (18)
- .mk/development.mk
- README.md
- cmd/flowlogs-pipeline/main.go
- cmd/flp-informers/main.go
- contrib/docker/Dockerfile
- contrib/kubernetes/deployment-flp-informers.yaml
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/config/config.go
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- proto/README.md
- proto/k8scache.proto
func (h *cacheEventHandler) OnDelete(obj interface{}) {
	meta, ok := obj.(*model.ResourceMetaData)
	if !ok {
		log.Warnf("unexpected object type in OnDelete: %T", obj)
		return
	}
🧩 Analysis chain
🌐 Web query:
Can a k8s.io/client-go SharedInformer OnDelete callback receive cache.DeletedFinalStateUnknown, and what is the recommended way to unwrap it before type assertions?
💡 Result:
Yes, a k8s.io/client-go SharedInformer OnDelete callback can receive cache.DeletedFinalStateUnknown. This occurs when the watch deletion event is missed (e.g., due to disconnection from the API server) and the deletion is detected during a subsequent re-list. The OnDelete handler receives the last known non-absent state of the object wrapped in DeletedFinalStateUnknown, or the tombstone directly. The recommended way to unwrap it before type assertions is to check the type and access the .Obj field, which contains the actual object (potentially stale). Use helper functions like DeletionHandlingMetaNamespaceKeyFunc for key extraction, which handles unwrapping internally. Example code pattern from client-go usage:

DeleteFunc: func(obj interface{}) {
	var pod *corev1.Pod
	switch t := obj.(type) {
	case cache.DeletedFinalStateUnknown:
		pod = t.Obj.(*corev1.Pod)
	case *corev1.Pod:
		pod = t
	default:
		runtime.HandleError(fmt.Errorf("unexpected type: %T", obj))
		return
	}
	// Now use pod safely
}

Alternatively, for key-based handling without the full object:

key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
if err != nil {
	runtime.HandleError(err)
	return
}
// Use key for workqueue, etc.
Citations:
- 1: https://pkg.go.dev/k8s.io/client-go/tools/cache
- 2: K8s Informers OnDelete: handle cache.DeletedFinalStateUnknown grafana/beyla#1242
- 3: https://oneuptime.com/blog/post/2026-01-07-go-client-go-kubernetes/view
- 4: panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Service kubernetes/ingress-nginx#10015
🏁 Script executed:
# First, check the full OnDelete function implementation
cat -n cmd/flp-informers/main.go | sed -n '175,195p'
Repository: netobserv/flowlogs-pipeline
Length of output: 954
🏁 Script executed:
# Search for any handling of cache.DeletedFinalStateUnknown in the codebase
rg "DeletedFinalStateUnknown" cmd/flp-informers/Repository: netobserv/flowlogs-pipeline
Length of output: 53
🏁 Script executed:
# Check the imports in main.go to see what packages are available
head -30 cmd/flp-informers/main.go
Repository: netobserv/flowlogs-pipeline
Length of output: 916
🏁 Script executed:
# Check what cache imports are available in main.go
rg "import|cache" cmd/flp-informers/main.go | head -20Repository: netobserv/flowlogs-pipeline
Length of output: 1081
🏁 Script executed:
# Look at the full import section to understand dependencies
sed -n '20,60p' cmd/flp-informers/main.go
Repository: netobserv/flowlogs-pipeline
Length of output: 1260
🏁 Script executed:
# Check what informers package is being used
rg "import.*informers" cmd/flp-informers/main.go -A 1 -B 1Repository: netobserv/flowlogs-pipeline
Length of output: 53
🏁 Script executed:
# Look for where the informer is created and handlers are added
rg "setupInformerHandlers|AddEventHandler" cmd/flp-informers/main.go -B 3 -A 8Repository: netobserv/flowlogs-pipeline
Length of output: 1055
🏁 Script executed:
# Check the informers package to see if it uses client-go
find . -name "informers.go" -o -name "informers" -type d | head -5
Repository: netobserv/flowlogs-pipeline
Length of output: 211
🏁 Script executed:
# Check the local informers package to understand how AddEventHandler works
cat -n ./pkg/pipeline/transform/kubernetes/informers/informers.go | head -100
Repository: netobserv/flowlogs-pipeline
Length of output: 3756
🏁 Script executed:
# Look for where the actual SharedInformer is being used
rg "SharedInformer|cache\.DeletedFinalStateUnknown" ./pkg/pipeline/transform/kubernetes/informers/Repository: netobserv/flowlogs-pipeline
Length of output: 1268
🏁 Script executed:
# Look for the AddEventHandler method in the informers package
rg "AddEventHandler" ./pkg/pipeline/transform/kubernetes/informers/ -B 3 -A 10Repository: netobserv/flowlogs-pipeline
Length of output: 2369
🏁 Script executed:
# Search for where informer.AddEventHandler or similar is called
rg "AddEventHandler|cache.ResourceEventHandlerFuncs" ./pkg/pipeline/transform/kubernetes/informers/ -B 2 -A 5Repository: netobserv/flowlogs-pipeline
Length of output: 1871
🏁 Script executed:
# Get the full informers.go file to understand event handler implementation
wc -l ./pkg/pipeline/transform/kubernetes/informers/informers.go
Repository: netobserv/flowlogs-pipeline
Length of output: 133
🏁 Script executed:
# Read the complete cache_export.go file
cat -n ./pkg/pipeline/transform/kubernetes/informers/cache_export.go
Repository: netobserv/flowlogs-pipeline
Length of output: 1027
Handle cache.DeletedFinalStateUnknown tombstones in OnDelete callback.
When a watch deletion is missed (e.g., due to API server disconnection), k8s.io/client-go sends cache.DeletedFinalStateUnknown containing the last known state. The current code's direct type assertion to *model.ResourceMetaData fails for these tombstones, causing the handler to return without processing the deletion. This leaves stale entries in the remote cache.
Unwrap tombstones before the assertion:
func (h *cacheEventHandler) OnDelete(obj interface{}) {
var meta *model.ResourceMetaData
switch t := obj.(type) {
case cache.DeletedFinalStateUnknown:
meta, _ = t.Obj.(*model.ResourceMetaData)
case *model.ResourceMetaData:
meta = t
default:
log.Warnf("unexpected object type in OnDelete: %T", obj)
return
}
if meta == nil {
return
}
// ... send delete ...
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flp-informers/main.go` around lines 180 - 185, OnDelete in
cacheEventHandler currently does a direct type assertion to
*model.ResourceMetaData and ignores cache.DeletedFinalStateUnknown tombstones,
so missed-delete events leave stale cache entries; change OnDelete to unwrap
tombstones by checking for cache.DeletedFinalStateUnknown and extracting its Obj
into a *model.ResourceMetaData (falling back to a direct *model.ResourceMetaData
assertion), handle nil after extraction, then proceed with the existing delete
logic so deletions from tombstones are processed.
// Send SyncRequest with our current version so the client knows what to send
// (full snapshot if 0, or incrementals from that version)
lastVersion := s.version.Load()
err := stream.Send(&SyncMessage{
	Message: &SyncMessage_Request{
		Request: &SyncRequest{
			ProcessorId: connectionID,
			LastVersion: lastVersion,
		},
Fresh processors never receive a full cache.
Line 46 says version 0 should recover full state, but Lines 132-146 ignore snapshot semantics, and the sender side currently drops initial adds. Once the datasource switches lookups to KubernetesStore, a new/restarted processor misses every pre-existing pod/node/service until those objects change. There is already a Replace path in KubernetesStore; this needs to be wired into the sync protocol or gated behind an initial-sync completion signal.
Also applies to: 128-146
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/k8scache/server.go` around lines 45 - 53,
The server currently sends a SyncRequest with lastVersion from s.version.Load()
but does not honor snapshot semantics for fresh processors (version==0) and the
sender drops initial adds; update the sync protocol so when
SyncRequest.LastVersion == 0 the server responds with a full snapshot
SyncMessage (use the existing Replace semantics) instead of/in addition to
incrementals, by wiring KubernetesStore.Replace into the response path and
ensuring stream.Send sends that full snapshot before any incremental events;
alternatively, introduce an per-connection initialSyncComplete flag so the
sender will not drop initial add events and will gate switching to
KubernetesStore until Replace has been applied, referencing SyncMessage,
SyncMessage_Request/SyncRequest, s.version.Load(), KubernetesStore.Replace, and
the stream.Send sender logic to locate and modify the code paths.
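
One possible shape for the missing snapshot path, sketched on the informer-side client that answers the SyncRequest; the snapshot and currentVersion helpers are hypothetical, while CacheUpdate/IsSnapshot come from this PR's proto, so treat this as a sketch rather than the intended design.

// handleSyncRequest answers a processor's SyncRequest. LastVersion == 0 means
// the processor has no state (fresh start or restart), so the whole cache is
// replayed as a snapshot before incremental updates resume.
func (c *Client) handleSyncRequest(pc *processorConnection, req *SyncRequest) error {
	if req.LastVersion > 0 {
		return nil // processor already has state; incrementals are enough
	}
	update := &CacheUpdate{
		Version:    c.currentVersion(), // hypothetical accessor for the broadcaster's version counter
		IsSnapshot: true,
		Entries:    c.snapshot(), // hypothetical: all ResourceEntry values currently known to the informers
	}
	return pc.stream.Send(update)
}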
// ResourceEntry represents a single Kubernetes resource in the cache
message ResourceEntry {
  string kind = 1; // "Pod", "Node", or "Service"
  string namespace = 2;
  string name = 3;
  string uid = 4;

  // Resource-specific metadata
  string owner_name = 5;
  string owner_kind = 6;
  string host_name = 7;    // For Pods: the node name
  string host_ip = 8;      // For Pods: the node IP
  string network_name = 9; // For multi-network support

  repeated string ips = 10;                // IP addresses associated with this resource
  repeated string secondary_net_keys = 11; // Secondary network keys for CNI plugins

  map<string, string> labels = 12;
  map<string, string> annotations = 13;

  // Kubernetes metadata
  int64 creation_timestamp = 14; // Unix timestamp
  string resource_version = 15;  // K8s resource version
}
Missing secondary_net_names field causes data loss for multi-network support.
Per converter.go:56-80 and informers.go:133-137, SecondaryNetNames (a map[string]string) is actively used to resolve NetworkName from secondary network keys. The proto only has secondary_net_keys (repeated string) but omits the names mapping.
Without this field, FLP processors using the cache can't perform network name resolution for secondary networks.
Proposed fix
repeated string secondary_net_keys = 11; // Secondary network keys for CNI plugins
+ map<string, string> secondary_net_names = 16; // Maps secondary network keys to network names
map<string, string> labels = 12;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:
// ResourceEntry represents a single Kubernetes resource in the cache
message ResourceEntry {
  string kind = 1; // "Pod", "Node", or "Service"
  string namespace = 2;
  string name = 3;
  string uid = 4;
  // Resource-specific metadata
  string owner_name = 5;
  string owner_kind = 6;
  string host_name = 7; // For Pods: the node name
  string host_ip = 8; // For Pods: the node IP
  string network_name = 9; // For multi-network support
  repeated string ips = 10; // IP addresses associated with this resource
  repeated string secondary_net_keys = 11; // Secondary network keys for CNI plugins
  map<string, string> labels = 12;
  map<string, string> annotations = 13;
  // Kubernetes metadata
  int64 creation_timestamp = 14; // Unix timestamp
  string resource_version = 15; // K8s resource version
}

Suggested:
// ResourceEntry represents a single Kubernetes resource in the cache
message ResourceEntry {
  string kind = 1; // "Pod", "Node", or "Service"
  string namespace = 2;
  string name = 3;
  string uid = 4;
  // Resource-specific metadata
  string owner_name = 5;
  string owner_kind = 6;
  string host_name = 7; // For Pods: the node name
  string host_ip = 8; // For Pods: the node IP
  string network_name = 9; // For multi-network support
  repeated string ips = 10; // IP addresses associated with this resource
  repeated string secondary_net_keys = 11; // Secondary network keys for CNI plugins
  map<string, string> secondary_net_names = 16; // Maps secondary network keys to network names
  map<string, string> labels = 12;
  map<string, string> annotations = 13;
  // Kubernetes metadata
  int64 creation_timestamp = 14; // Unix timestamp
  string resource_version = 15; // K8s resource version
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@proto/k8scache.proto` around lines 71 - 94, The proto is missing a
secondary_net_names mapping needed to preserve secondary network name
resolution: add a map<string, string> secondary_net_names field to the
ResourceEntry message (alongside the existing repeated secondary_net_keys) and
bump the field number accordingly, then update the conversion logic in
converter.go (the code that maps SecondaryNetNames -> ResourceEntry) and the
informer population in informers.go to populate this new map so FLP processors
can resolve NetworkName from secondary network keys.
Actionable comments posted: 9
♻️ Duplicate comments (1)
pkg/pipeline/transform/kubernetes/k8scache/client.go (1)
466-478: ⚠️ Potential issue | 🔴 Critical: Fresh processors still never receive the current cache state.
Lines 474-477 explicitly drop snapshot/replay. Combined with pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go, Lines 42-45 skipping initial-list ADDs, and pkg/pipeline/transform/kubernetes/datasource/datasource.go, Lines 11-45 switching lookups entirely to KubernetesStore, a restarted processor has no enrichment until each object changes again. This needs a snapshot/replace path or a gated fallback until initial sync completes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 466 - 478, handleSyncRequest currently never sends an initial snapshot, so restarted processors (processorConnection / SyncRequest) never receive existing cache state; update handleSyncRequest in Client to detect when a processor needs an initial snapshot (e.g., req.LastVersion==0 or explicit snapshot flag) and send a snapshot/replace sequence using the current cache from the KubernetesStore/KubernetesClient (iterate the store and emit ADD/REPLACE events or a single REPLACE message) before switching to incremental updates; alternatively implement a gated fallback where processorConnection is marked syncing until an initial snapshot is sent and lookups from datasource.KubernetesStore honor that sync-complete flag so processors get consistent enrichment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.mk/development.mk:
- Around line 69-73: The kubectl invocations in the Makefile targets (e.g.,
target deploy-flp-informers and the deploy-k8scache block) currently apply
templated manifests but do not pass the intended namespace, so resources are
created/read in the default kubectl namespace; update all kubectl calls (kubectl
apply, kubectl rollout status, kubectl get/delete/patch etc. referenced in the
deploy-k8scache section) to include -n $(NAMESPACE) (or
--namespace=$(NAMESPACE)) so the operations use the same templated %NAMESPACE%
as the manifest generation; ensure every kubectl command string in those targets
is changed consistently.
In `@Makefile`:
- Around line 33-34: The NAMESPACE assignment (variable NAMESPACE) should treat
an empty jsonpath result as "default" in addition to handling kubectl failure;
update the shell expression that runs kubectl config view --minify --output
'jsonpath={..namespace}' so that if the command returns empty string it falls
back to "default" (for example by capturing the output and using a shell
conditional or a piped substitution to replace an empty result with "default").
Ensure you modify the Makefile line that defines NAMESPACE so it covers both
non-zero exit and empty output cases.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 135-168: Currently c.mu is held while dialing and creating the
stream (see c.mu, c.connections, getTransportCredentials, grpc.NewClient,
NewKubernetesCacheServiceClient, StreamUpdates), which can block broadcastUpdate
and back up updateChan; change the flow to: acquire c.mu and only check/record
existence, then release c.mu before calling
getTransportCredentials/grpc.NewClient and creating the stream, then re-acquire
c.mu to verify no other goroutine inserted a connection and either store the new
connection/stream into c.connections or, if one exists, close the newly-created
conn/stream and return; ensure you transfer the ctx/cancel handling correctly
when moving creation outside the lock so cancel/conn.Close are called on the new
resources if you drop them due to a race.
- Around line 336-400: The reconnect path races on pc.conn/pc.stream/pc.cancel
because reconnect sets those fields under pc.mu but broadcastUpdate,
receiveFromProcessor, RemoveProcessor, and Stop access/modify them without
locking; fix by ensuring all reads/writes of pc.conn, pc.stream, and pc.cancel
are performed while holding pc.mu (or via new accessor methods that acquire
pc.mu) — update broadcastUpdate, receiveFromProcessor, RemoveProcessor, and Stop
to lock pc.mu around any access/close/cancel of pc.conn, pc.stream, or pc.cancel
(or replace direct field access with synchronized getters/setters) so the mutex
consistently protects these fields along with reconnect.
- Around line 314-328: The timeout branch must actively tear down the blocked
send so the goroutine in receiveFromProcessor sees a receive error and
reconnects: when sendTimeout fires in the select (the case currently logging
"send operation timed out"), invoke the stream teardown (e.g., call
pc.stream.CloseSend() or cancel the stream/context associated with pc.stream),
handle/ log any CloseSend/cancel error, then mark pc.healthy.Store(false); this
ensures the goroutine blocked in pc.stream.Send(update) is unblocked and the
existing reconnect path will run. Use the same pc.stream reference used in the
goroutine and ensure any returned error from CloseSend/cancel is logged via
clog.WithError(...).WithField("address", pc.address).
In `@pkg/pipeline/transform/kubernetes/k8scache/discovery.go`:
- Around line 67-68: Before calling time.NewTicker in discovery.go, validate
cfg.ResyncInterval and fail fast for non-positive values: check if
cfg.ResyncInterval <= 0 and return a clear configuration error (e.g.,
fmt.Errorf("invalid ResyncInterval: %d", cfg.ResyncInterval)) instead of
creating the ticker; update the code around the ticker := time.NewTicker(...) /
defer ticker.Stop() to perform this validation first so time.NewTicker is never
called with a non-positive duration.
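
The suggested validation is small; a sketch, assuming ResyncInterval is a time.Duration on a Config struct (the struct here is a stand-in):

package k8scache

import (
	"fmt"
	"time"
)

// Config is a stand-in for the discovery configuration.
type Config struct {
	ResyncInterval time.Duration
}

// startDiscovery validates the resync period before creating the ticker,
// because time.NewTicker panics on non-positive durations.
func startDiscovery(cfg Config) error {
	if cfg.ResyncInterval <= 0 {
		return fmt.Errorf("invalid ResyncInterval %v: must be greater than zero", cfg.ResyncInterval)
	}
	ticker := time.NewTicker(cfg.ResyncInterval)
	defer ticker.Stop()
	// ... discovery loop ...
	return nil
}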
In `@pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go`:
- Around line 40-62: The test TestOnDelete_TombstoneEvent is reimplementing the
tombstone unwrapping instead of exercising the real handler; update the test to
call the production OnDelete method (the event handler function named OnDelete)
with a cache.DeletedFinalStateUnknown wrapping the test pod, then
capture/inspect the emitted metadata (e.g., via the same channel, logger, or
mock used by the handler) and assert Name and Namespace on that emitted
model.ResourceMetaData; do the same replacement for the other similar tests
(lines 64-90 and 93-156) so they drive the real OnDelete path rather than
performing the type-assertion inline.
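
A sketch of a test that drives the real OnDelete with a tombstone rather than re-implementing the unwrapping; the constructor, capture callback, and fixture helper here are hypothetical stand-ins for however the handler actually exposes its output.

func TestOnDelete_Tombstone_DrivesRealHandler(t *testing.T) {
	deleted := make(chan *model.ResourceMetaData, 1)
	// newCacheEventHandler with a capture callback is hypothetical; the real
	// handler may instead enqueue onto the k8scache client.
	h := newCacheEventHandler(func(meta *model.ResourceMetaData) { deleted <- meta })

	pod := testPodMeta("default", "test-pod") // hypothetical fixture returning *model.ResourceMetaData

	// Simulate a missed watch deletion: client-go wraps the last known state.
	h.OnDelete(cache.DeletedFinalStateUnknown{Key: "default/test-pod", Obj: pod})

	select {
	case meta := <-deleted:
		assert.Equal(t, "test-pod", meta.Name)
		assert.Equal(t, "default", meta.Namespace)
	case <-time.After(time.Second):
		t.Fatal("OnDelete did not process the tombstone")
	}
}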
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go`:
- Around line 20-109: The tests only verify ACKs and not that the datasource was
mutated—ensure each integration test initializes a real store (call
ds.SetKubernetesStore(datasource.NewKubernetesStore()) where missing, e.g. in
TestIntegration_MultipleUpdatesFlow) and after receiving the server ACKs query
the store to assert the expected state: for the ADD path use the store lookup
API (by Uid, Name/Namespace or IP) to assert the ResourceEntry was persisted and
that Ips contains "10.0.0.100"; for DELETE/UPDATE paths assert the resource is
removed or updated accordingly. Locate the test functions
(TestIntegration_ServerReceivesAdd, TestIntegration_MultipleUpdatesFlow), the
datasource.Datasource instance, NewKubernetesStore, SetKubernetesStore,
KubernetesCacheServer, StreamUpdates, CacheUpdate and ResourceEntry to add these
post-ACK assertions.
- Around line 266-290: The test currently treats any successful stream.Recv() as
a valid ACK; update the ACK receive/validation after sending the ADD CacheUpdate
so that you assert the received message is the expected SyncAck and that its
fields are correct (e.g., type is SyncAck, Version matches the sent update or
expected ack version, and any status field indicates success). Locate the
Receive ACK block (the call to stream.Recv() used to validate the ACK) and
replace the loose nil-check with a typed assertion of the returned message
(ensure it's a SyncAck) and explicit checks on relevant fields to fail the test
on wrong message type or bad payload.
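
For the ACK-validation item, a fragment that could replace the loose nil-check, reusing the SyncMessage_Ack/SyncAck assertions already present in TestIntegration_ServerReceivesAdd; adjust variable names to match the local test.

	// Receive ACK and check it really is a successful SyncAck for our update,
	// rather than accepting any message from stream.Recv().
	ackMsg, err := stream.Recv()
	require.NoError(t, err)
	require.NotNil(t, ackMsg)

	ack, ok := ackMsg.Message.(*SyncMessage_Ack)
	require.True(t, ok, "expected SyncAck from server, got %T", ackMsg.Message)
	assert.True(t, ack.Ack.Success, "server should ACK successfully")
	assert.Equal(t, addUpdate.Version, ack.Ack.Version, "ACK version should match the sent update")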
---
Duplicate comments:
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 466-478: handleSyncRequest currently never sends an initial
snapshot, so restarted processors (processorConnection / SyncRequest) never
receive existing cache state; update handleSyncRequest in Client to detect when
a processor needs an initial snapshot (e.g., req.LastVersion==0 or explicit
snapshot flag) and send a snapshot/replace sequence using the current cache from
the KubernetesStore/KubernetesClient (iterate the store and emit ADD/REPLACE
events or a single REPLACE message) before switching to incremental updates;
alternatively implement a gated fallback where processorConnection is marked
syncing until an initial snapshot is sent and lookups from
datasource.KubernetesStore honor that sync-complete flag so processors get
consistent enrichment.
⛔ Files ignored due to path filters (1)
pkg/pipeline/transform/kubernetes/k8scache/k8scache.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (16)
- .mk/development.mk
- Makefile
- cmd/flp-informers/main.go
- contrib/kubernetes/deployment-flp-informers.yaml
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- proto/k8scache.proto
✅ Files skipped from review due to trivial changes (2)
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- contrib/kubernetes/deployment-flp-informers.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
func TestIntegration_ServerReceivesAdd(t *testing.T) {
	// Setup test datasource
	_, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
	ds := &datasource.Datasource{Informers: informers}
	ds.SetKubernetesStore(datasource.NewKubernetesStore())

	// Create cache server
	cacheServer := NewKubernetesCacheServer(ds)

	// Create gRPC server
	grpcServer := grpc.NewServer()
	RegisterKubernetesCacheServiceServer(grpcServer, cacheServer)

	// Start server on random port
	listener, err := net.Listen("tcp", "127.0.0.1:0")
	require.NoError(t, err)
	defer listener.Close()

	address := listener.Addr().String()

	// Start server in background
	go func() {
		_ = grpcServer.Serve(listener)
	}()
	defer grpcServer.Stop()

	// Give server time to start
	time.Sleep(100 * time.Millisecond)

	// Create client connection
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	conn, err := grpc.NewClient(address,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	require.NoError(t, err, "Failed to connect to server")
	defer conn.Close()

	// Create client
	client := NewKubernetesCacheServiceClient(conn)

	// Open bidirectional stream
	stream, err := client.StreamUpdates(ctx)
	require.NoError(t, err)

	// Client should receive SyncRequest from server first
	syncMsg, err := stream.Recv()
	require.NoError(t, err)
	require.NotNil(t, syncMsg)

	req, ok := syncMsg.Message.(*SyncMessage_Request)
	require.True(t, ok, "Expected SyncRequest from server")
	assert.Equal(t, int64(0), req.Request.LastVersion)

	// Client sends ADD update
	addUpdate := &CacheUpdate{
		Version:    1,
		IsSnapshot: false,
		Operation:  OperationType_OPERATION_ADD,
		Entries: []*ResourceEntry{
			{
				Kind:      "Pod",
				Namespace: "default",
				Name:      "test-pod",
				Uid:       "test-uid",
				Ips:       []string{"10.0.0.100"},
			},
		},
	}

	err = stream.Send(addUpdate)
	require.NoError(t, err)

	// Client should receive ACK from server
	ackMsg, err := stream.Recv()
	require.NoError(t, err)
	require.NotNil(t, ackMsg)

	ack, ok := ackMsg.Message.(*SyncMessage_Ack)
	require.True(t, ok, "Expected SyncAck from server")
	assert.True(t, ack.Ack.Success, "Server should ACK successfully")
	assert.Equal(t, int64(1), ack.Ack.Version)

	// Close stream
	err = stream.CloseSend()
	assert.NoError(t, err)

	fmt.Printf("✓ Integration test passed: server on %s received and acknowledged ADD update\n", address)
}
Make these integration tests assert cache state.
Both tests stop at ACK/version checks, so they can still pass if the gRPC path never mutates the datasource. TestIntegration_MultipleUpdatesFlow also skips KubernetesStore, which means it cannot verify ADD/DELETE effects end-to-end at all. Please back these cases with a store, include queryable fields like IPs, and assert the datasource state after each update.
Also applies to: 112-214
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go` around lines
20 - 109, The tests only verify ACKs and not that the datasource was
mutated—ensure each integration test initializes a real store (call
ds.SetKubernetesStore(datasource.NewKubernetesStore()) where missing, e.g. in
TestIntegration_MultipleUpdatesFlow) and after receiving the server ACKs query
the store to assert the expected state: for the ADD path use the store lookup
API (by Uid, Name/Namespace or IP) to assert the ResourceEntry was persisted and
that Ips contains "10.0.0.100"; for DELETE/UPDATE paths assert the resource is
removed or updated accordingly. Locate the test functions
(TestIntegration_ServerReceivesAdd, TestIntegration_MultipleUpdatesFlow), the
datasource.Datasource instance, NewKubernetesStore, SetKubernetesStore,
KubernetesCacheServer, StreamUpdates, CacheUpdate and ResourceEntry to add these
post-ACK assertions.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/pipeline/transform/kubernetes/k8scache/integration_test.go (1)
114-116: ⚠️ Potential issue | 🟠 Major: Missing `SetKubernetesStore`; store operations are no-ops. Without `ds.SetKubernetesStore(datasource.NewKubernetesStore())`, `ApplyCacheAddOrUpdate` and `ApplyCacheDelete` silently do nothing. The test verifies ACKs but not actual data persistence.
🐛 Proposed fix
 // Setup
 _, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
 ds := &datasource.Datasource{Informers: informers}
+ds.SetKubernetesStore(datasource.NewKubernetesStore())
 cacheServer := NewKubernetesCacheServer(ds)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go` around lines 114 - 116, The test constructs a Datasource with Informers but never initializes its internal store, so calls to ApplyCacheAddOrUpdate and ApplyCacheDelete become no-ops; fix by calling ds.SetKubernetesStore(datasource.NewKubernetesStore()) before creating the KubernetesCacheServer (NewKubernetesCacheServer) so the Datasource has a real store and the ApplyCacheAddOrUpdate / ApplyCacheDelete behavior is exercised and persisted during the test.
🧹 Nitpick comments (3)
pkg/pipeline/transform/kubernetes/k8scache/client.go (2)
166-174: Consider early exit when client is stopping.
AddProcessor can be called by the discovery goroutine after Stop() has cancelled c.ctx. While not a data race, this wastes work establishing connections that will immediately fail.
💡 Proposed improvement
 func (c *Client) AddProcessor(address string) error {
+	// Early exit if client is stopping
+	select {
+	case <-c.ctx.Done():
+		return c.ctx.Err()
+	default:
+	}
+
 	// First check: quick lock to see if already connected
 	c.mu.Lock()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 166 - 174, Add an early-exit in AddProcessor to avoid dialing when the client is stopping: immediately after the initial existence check (or at the very start of AddProcessor) check c.ctx for cancellation (e.g., if c.ctx.Err() != nil or select on c.ctx.Done()) and return early (nil or a contextual error) so you don't proceed to establish a connection that will fail; update the logic around the existing lock/unlock and the code paths that follow to respect this canceled-context check before performing any network work or spawning connection goroutines.
409-497: Long-held lock during reconnection may block Stop().
pc.mu is held for the entire reconnect loop (up to ~2 minutes with backoff). Stop() → closeConnection() will block waiting for this lock. Consider releasing the lock during the time.After backoff or checking for shutdown before each retry.
💡 Suggested approach
Check for client shutdown at the start of each retry iteration before acquiring pc.mu:
 func (c *Client) reconnect(pc *processorConnection) bool {
-	pc.mu.Lock()
-	defer pc.mu.Unlock()
-
 	backoff := initialBackoff
 	for attempt := 1; attempt <= maxReconnectAttempts; attempt++ {
+		// Check if client is shutting down
+		select {
+		case <-c.ctx.Done():
+			clog.WithField("address", pc.address).Info("reconnection cancelled")
+			return false
+		default:
+		}
+
+		pc.mu.Lock()
 		pc.reconnectAttempts = attempt
 		// ... rest of attempt logic ...
+		pc.mu.Unlock()
+
+		// Wait before retry (outside lock)
+		select {
+		case <-time.After(backoff):
+		case <-c.ctx.Done():
+			return false
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 409 - 497, The reconnect loop holds pc.mu for the entire backoff loop, blocking Stop()/closeConnection(); to fix, restructure reconnect so it checks c.ctx.Done() at the start of each retry and does not hold pc.mu while sleeping: move pc.mu.Lock()/Unlock() inside the loop only around sections that mutate pc (e.g., updating pc.reconnectAttempts, closing/setting pc.conn, pc.stream, pc.cancel, and pc.healthy), perform the time.After(backoff) wait without holding pc.mu, and only acquire the lock immediately before touching pc fields (and release it promptly afterward); keep use of c.getTransportCredentials(), grpc.NewClient, client.StreamUpdates, and the final deletion from c.connections unchanged but ensure mutations to pc and the connections map occur under their respective locks.
pkg/pipeline/transform/kubernetes/k8scache/integration_test.go (1)
46-48: Consider replacing time.Sleep with a retry-based connection.
Sleep-based waits can cause flaky tests. Consider using the grpc.WithBlock() option or a retry loop for more reliable server-ready detection.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go` around lines 46 - 48, Replace the brittle time.Sleep(100 * time.Millisecond) used to wait for the test server with a deterministic connection attempt: remove the time.Sleep call and instead perform a blocking/retry dial using grpc.WithBlock() (or grpc.DialContext) with a context timeout (context.WithTimeout) or a short retry loop that repeatedly calls grpc.Dial until success, then proceed once the connection is established; update the test setup where time.Sleep is invoked to use grpc.WithBlock/ctx timeout or a retry loop around grpc.Dial to reliably detect server readiness.
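One dependency-free alternative to grpc.WithBlock is to poll the listener until it accepts connections before opening the stream; a minimal sketch (imports: fmt, net, time; the helper name is chosen here only for illustration):

// waitForServer replaces the fixed time.Sleep with a bounded poll: it retries a
// plain TCP dial against the gRPC listener until it accepts connections or the
// timeout expires.
func waitForServer(address string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		conn, err := net.DialTimeout("tcp", address, 100*time.Millisecond)
		if err == nil {
			_ = conn.Close()
			return nil
		}
		time.Sleep(20 * time.Millisecond)
	}
	return fmt.Errorf("gRPC server at %s not ready within %s", address, timeout)
}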
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 131-133: The tls.Config initialization in the client (the
tlsConfig variable in client.go) omits MinVersion, leaving protocol downgrade
risk; update the tls.Config used in the client (where tlsConfig is created) to
explicitly set MinVersion: tls.VersionTLS12 (or tls.VersionTLS13 for stricter
security) while keeping InsecureSkipVerify as currently sourced from
c.tlsConfig.InsecureSkipVerify so the config enforces a minimum TLS version for
connections.
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go`:
- Around line 219-222: The test is missing initialization of the Kubernetes
store so the cache data path isn't exercised; call SetKubernetesStore on the
Datasource used by NewKubernetesCacheServer after creating ds
(datasource.Datasource) so the informer-backed store is wired before running the
integration test (mirror the fix applied in TestIntegration_MultipleUpdatesFlow
by invoking SetKubernetesStore with the appropriate informer/store stub returned
from inf.SetupStubs).
---
Duplicate comments:
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go`:
- Around line 114-116: The test constructs a Datasource with Informers but never
initializes its internal store, so calls to ApplyCacheAddOrUpdate and
ApplyCacheDelete become no-ops; fix by calling
ds.SetKubernetesStore(datasource.NewKubernetesStore()) before creating the
KubernetesCacheServer (NewKubernetesCacheServer) so the Datasource has a real
store and the ApplyCacheAddOrUpdate / ApplyCacheDelete behavior is exercised and
persisted during the test.
---
Nitpick comments:
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 166-174: Add an early-exit in AddProcessor to avoid dialing when
the client is stopping: immediately after the initial existence check (or at the
very start of AddProcessor) check c.ctx for cancellation (e.g., if c.ctx.Err()
!= nil or select on c.ctx.Done()) and return early (nil or a contextual error)
so you don't proceed to establish a connection that will fail; update the logic
around the existing lock/unlock and the code paths that follow to respect this
canceled-context check before performing any network work or spawning connection
goroutines.
- Around line 409-497: The reconnect loop holds pc.mu for the entire backoff
loop, blocking Stop()/closeConnection(); to fix, restructure reconnect so it
checks c.ctx.Done() at the start of each retry and does not hold pc.mu while
sleeping: move pc.mu.Lock()/Unlock() inside the loop only around sections that
mutate pc (e.g., updating pc.reconnectAttempts, closing/setting pc.conn,
pc.stream, pc.cancel, and pc.healthy), perform the time.After(backoff) wait
without holding pc.mu, and only acquire the lock immediately before touching pc
fields (and release it promptly afterward); keep use of
c.getTransportCredentials(), grpc.NewClient, client.StreamUpdates, and the final
deletion from c.connections unchanged but ensure mutations to pc and the
connections map occur under their respective locks.
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go`:
- Around line 46-48: Replace the brittle time.Sleep(100 * time.Millisecond) used
to wait for the test server with a deterministic connection attempt: remove the
time.Sleep call and instead perform a blocking/retry dial using grpc.WithBlock()
(or grpc.DialContext) with a context timeout (context.WithTimeout) or a short
retry loop that repeatedly calls grpc.Dial until success, then proceed once the
connection is established; update the test setup where time.Sleep is invoked to
use grpc.WithBlock/ctx timeout or a retry loop around grpc.Dial to reliably
detect server readiness.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d5a6fdb2-fc7f-4715-9a77-86e86d2025f3
📒 Files selected for processing (7)
- .mk/development.mk
- Makefile
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- Makefile
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
tlsConfig := &tls.Config{
	InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify,
}
Set TLS MinVersion to prevent downgrade attacks.
The tls.Config lacks MinVersion, defaulting to TLS 1.0 for servers (TLS 1.2 for clients). Explicitly set MinVersion: tls.VersionTLS12 (or tls.VersionTLS13 for stronger security).
🔒 Proposed fix
tlsConfig := &tls.Config{
InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify,
+ MinVersion: tls.VersionTLS12,
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 131 - 133,
The tls.Config initialization in the client (the tlsConfig variable in
client.go) omits MinVersion, leaving protocol downgrade risk; update the
tls.Config used in the client (where tlsConfig is created) to explicitly set
MinVersion: tls.VersionTLS12 (or tls.VersionTLS13 for stricter security) while
keeping InsecureSkipVerify as currently sourced from
c.tlsConfig.InsecureSkipVerify so the config enforces a minimum TLS version for
connections.
// Setup
_, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
ds := &datasource.Datasource{Informers: informers}
cacheServer := NewKubernetesCacheServer(ds)
Missing SetKubernetesStore - same issue as TestIntegration_MultipleUpdatesFlow.
Add store initialization for complete data path coverage.
🐛 Proposed fix
// Setup
_, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
ds := &datasource.Datasource{Informers: informers}
+ ds.SetKubernetesStore(datasource.NewKubernetesStore())
cacheServer := NewKubernetesCacheServer(ds)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Setup
_, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
ds := &datasource.Datasource{Informers: informers}
ds.SetKubernetesStore(datasource.NewKubernetesStore())
cacheServer := NewKubernetesCacheServer(ds)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go` around lines
219 - 222, The test is missing initialization of the Kubernetes store so the
cache data path isn't exercised; call SetKubernetesStore on the Datasource used
by NewKubernetesCacheServer after creating ds (datasource.Datasource) so the
informer-backed store is wired before running the integration test (mirror the
fix applied in TestIntegration_MultipleUpdatesFlow by invoking
SetKubernetesStore with the appropriate informer/store stub returned from
inf.SetupStubs).
Force-pushed from de63e00 to ed2954b
Actionable comments posted: 3
🧹 Nitpick comments (5)
pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go (1)
41-60: Consider adding tests for OnAdd and OnUpdate handlers.
Only OnDelete is covered. OnAdd (with the isInInitialList flag) and OnUpdate have their own branching logic that should be tested.
Want me to generate test cases for OnAdd and OnUpdate handlers?
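A rough shape such tests could take, reusing the mock from the OnDelete test. The constructor and mock fields (NewEventHandler, mockClient.addedMeta, mockClient.updatedMeta) are taken from the prompt below, and corev1/metav1 stand for the usual k8s.io core/meta packages, so treat the exact signatures as assumptions:

// Sketch only: assumes the handler implements cache.ResourceEventHandler and that
// mockClient records the resources forwarded to the gRPC client.
func TestEventHandler_OnAdd(t *testing.T) {
	mock := &mockClient{}
	handler := NewEventHandler(mock) // assumed constructor, mirroring the OnDelete test

	pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "p1", Namespace: "default"}}

	handler.OnAdd(pod, true)  // initial-list sync path
	handler.OnAdd(pod, false) // live event path
	assert.NotEmpty(t, mock.addedMeta, "OnAdd should forward the resource to the client")
}

func TestEventHandler_OnUpdate(t *testing.T) {
	mock := &mockClient{}
	handler := NewEventHandler(mock)

	oldPod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "p1", Namespace: "default"}}
	newPod := oldPod.DeepCopy()
	newPod.Labels = map[string]string{"team": "netobserv"}

	handler.OnUpdate(oldPod, newPod)
	assert.NotEmpty(t, mock.updatedMeta, "a real change should be forwarded as an update")
}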
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go` around lines 41 - 60, Add unit tests for the event handler's OnAdd and OnUpdate paths: create tests that call the handler's OnAdd with isInInitialList both true and false to verify that SendAdd on mockClient is called (or suppressed) appropriately, and add tests for OnUpdate covering both a real-change case (old vs new ResourceMetaData differ) that should call SendUpdate and a no-change case that should not; use the existing mockClient (methods SendAdd, SendUpdate, SendDelete) and the same setup used for the OnDelete test (construct the handler via NewEventHandler or the eventHandler type and assert mockClient.addedMeta/updatedMeta contents and sendError behavior).
.mk/development.mk (1)
142-148: Missing undeploy-k8scache target.
There's deploy-k8scache but no corresponding undeploy-k8scache target. The cleanup chain (local-deployments-cleanup-k8scache) uses the generic undeploy, which may not fully clean up the k8scache-specific deployment.
Proposed fix: Add undeploy-k8scache target
+.PHONY: undeploy-k8scache
+undeploy-k8scache: ## Undeploy FLP with k8scache server
+	sed 's|%IMAGE_TAG_BASE%|$(IMAGE_TAG_BASE)|g;s|%VERSION%|$(VERSION)|g;s|%NAMESPACE%|$(NAMESPACE)|g' contrib/kubernetes/deployment-k8scache.yaml > /tmp/deployment-k8scache.yaml
+	kubectl --ignore-not-found=true delete configmap flowlogs-pipeline-configuration -n $(NAMESPACE) || true
+	kubectl --ignore-not-found=true delete -f /tmp/deployment-k8scache.yaml -n $(NAMESPACE) || true
+
 .PHONY: local-deploy-k8scache
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.mk/development.mk around lines 142 - 148, There is a missing undeploy-k8scache Make target to mirror deploy-k8scache; add an undeploy-k8scache target that removes the k8scache-specific Kubernetes resources (delete the deployment created from /tmp/deployment-k8scache.yaml and the flowlogs-pipeline-configuration configmap) and wait for rollout/termination as needed; ensure the new target name is exactly undeploy-k8scache so local-deployments-cleanup-k8scache can call it, and reuse the same identifiers used in deploy-k8scache (e.g., flowlogs-pipeline configmap name and deploy/flowlogs-pipeline) so cleanup fully removes the k8scache artifacts.
pkg/pipeline/transform/kubernetes/k8scache/server.go (1)
67-81: Recv blocking in select-default may cause tight loop on slow receivers.
The select with a default case means that if ctx.Done() isn't signaled, it immediately calls stream.Recv(). This is fine since Recv() blocks, but the pattern is slightly unusual. Consider using a dedicated select on ctx.Done() inside the Recv goroutine if you need cancellation responsiveness during receives.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/server.go` around lines 67 - 81, The current loop uses select with a default and directly calls stream.Recv(), which can lead to a tight/select-default pattern; change to read from the stream in a dedicated goroutine that sends received updates and errors into channels, then in the main loop select between ctx.Done() and those channels (use symbols stream.Recv, ctx.Done, connectionID) so cancellation is handled promptly without relying on the default case—on ctx.Done() cancel/cleanup and on recv-channel handle io.EOF and errors the same way you currently log using connectionID.
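A generic sketch of that pattern (the stream and message types are reduced to the minimum; the real handler would use the generated StreamUpdates stream, and the imports context, errors, and io are assumed):

// recvStream is the small slice of the generated stream interface needed here.
type recvStream interface {
	Recv() (*CacheUpdate, error)
}

type recvResult struct {
	update *CacheUpdate
	err    error
}

// receiveLoop reads in a dedicated goroutine so the outer loop can select on
// ctx.Done() without a default branch, keeping cancellation responsive.
func receiveLoop(ctx context.Context, stream recvStream, apply func(*CacheUpdate) error) error {
	results := make(chan recvResult)
	go func() {
		for {
			u, err := stream.Recv()
			select {
			case results <- recvResult{update: u, err: err}:
			case <-ctx.Done():
				return
			}
			if err != nil {
				return
			}
		}
	}()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case r := <-results:
			if errors.Is(r.err, io.EOF) {
				return nil // client closed the stream
			}
			if r.err != nil {
				return r.err
			}
			if err := apply(r.update); err != nil {
				return err
			}
		}
	}
}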
pkg/pipeline/transform/kubernetes/k8scache/discovery.go (1)
88-121: No cleanup of connections when pods terminate.
discoverAndConnect adds processors but never removes stale ones. When a pod terminates, its processorConnection remains in the client until the connection fails. This is acceptable since broadcastUpdate skips unhealthy connections, but consider adding explicit cleanup for faster resource reclamation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/discovery.go` around lines 88 - 121, discoverAndConnect currently only adds processors and never removes stale connections; update it to reconcile the client's processor list with the current pod IPs: after listing pods build the set of desired addresses (fmt.Sprintf("%s:%d", pod.Status.PodIP, cfg.ProcessorPort) for running pods with PodIP) and then remove any client-side processors not in that set by calling a remove method (e.g., Client.RemoveProcessor or a new method you add) or by iterating client.processorConnections and closing/removing entries for addresses absent from the desired set; keep the existing AddProcessor calls for new addresses so discoverAndConnect performs full reconciliation.
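A possible reconciliation pass at the end of each discovery cycle; ListProcessors is a hypothetical accessor used only to make the sketch self-contained (the client currently exposes AddProcessor and RemoveProcessor):

// reconcileProcessors drops connections whose address no longer maps to a running
// processor pod. "desired" holds addresses built from the current pod list
// (pod IP plus processor port), exactly as discoverAndConnect already computes them.
func reconcileProcessors(client *Client, desired map[string]struct{}) {
	for _, addr := range client.ListProcessors() { // hypothetical accessor
		if _, ok := desired[addr]; !ok {
			client.RemoveProcessor(addr)
		}
	}
}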
pkg/pipeline/transform/kubernetes/k8scache/client.go (1)
236-252: Lock released and re-acquired in RemoveProcessor creates a window.
Lines 249-251 release c.mu, call pc.closeConnection(), then re-lock. Due to defer c.mu.Unlock() at line 238, the lock is re-acquired only to be immediately released. This is benign but confusing. Restructure to avoid the double unlock pattern.
Cleaner pattern
 func (c *Client) RemoveProcessor(address string) {
 	c.mu.Lock()
-	defer c.mu.Unlock()
-
 	pc, exists := c.connections[address]
 	if !exists {
+		c.mu.Unlock()
 		return
 	}
 
 	clog.WithField("address", address).Info("disconnecting from FLP processor")
 	delete(c.connections, address)
-
-	// Close connection after releasing client lock to avoid holding both locks
 	c.mu.Unlock()
+
+	// Close connection after releasing client lock to avoid holding both locks
 	pc.closeConnection()
-	c.mu.Lock()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 236 - 252, The RemoveProcessor method currently uses defer c.mu.Unlock() and also does an explicit c.mu.Unlock()/c.mu.Lock() around pc.closeConnection(), creating a confusing double-unlock/re-lock window; change the pattern to lock, look up pc, delete the map entry, then explicitly unlock before calling pc.closeConnection() (i.e., remove the defer and the later c.mu.Unlock()/c.mu.Lock()), so the map mutation (c.connections[address] deletion) happens under c.mu and the potentially blocking pc.closeConnection() runs without holding c.mu; reference Client.RemoveProcessor, c.mu, c.connections, and pc.closeConnection when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/flp-informers/main.go`:
- Around line 170-174: StartProcessorDiscovery is launched inside a goroutine so
any immediate errors (e.g., invalid discoveryConfig) are only logged and not
surfaced to startup; change this to detect and handle immediate failures by
calling k8scache.StartProcessorDiscovery synchronously for the initial startup
or by capturing the first error through a channel: invoke
k8scache.StartProcessorDiscovery(ctx, grpcClient, discoveryConfig) and if it
returns a non-nil error treat it as a fatal config/startup error (log and
return/exit), otherwise spawn the background goroutine or continue; reference
the existing goroutine invocation and variables grpcClient, discoveryConfig, and
k8scache.StartProcessorDiscovery when making the change.
- Around line 178-184: The shutdown path currently waits on sigChan but never
closes the Informers' stop channels, so informer watches keep running; after
detecting the shutdown signal (where sigChan is read in main) close the
Informers' stop channels (inf.stopChan and inf.mdStopChan) to signal informer
factories to stop — either defer closing them after InitFromConfig() returns or
explicitly call close(inf.stopChan) and close(inf.mdStopChan) immediately after
the <-sigChan read; reference the Informers instance returned by
InitFromConfig() (inf) so you close those two channels before exiting.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 409-484: reconnect currently holds pc.mu for the entire retry
loop, including the time.After backoff, which blocks other methods like
pc.getStream()/pc.cancelStream(); modify reconnect so it does not hold pc.mu
during the sleep: acquire lock only when reading/updating pc fields (e.g., set
pc.reconnectAttempts, check/close pc.conn, cancel pc.cancel, update
pc.conn/pc.stream/pc.cancel and pc.healthy), then release the lock before the
select time.After(backoff) and re-acquire it (or re-lock) only when you need to
touch pc again; keep the existing error handling and backoff updates but ensure
any access to pc fields is protected by pc.mu while the actual wait is done
without the lock to avoid blocking broadcastUpdate/getStream/cancelStream.
---
Nitpick comments:
In @.mk/development.mk:
- Around line 142-148: There is a missing undeploy-k8scache Make target to
mirror deploy-k8scache; add an undeploy-k8scache target that removes the
k8scache-specific Kubernetes resources (delete the deployment created from
/tmp/deployment-k8scache.yaml and the flowlogs-pipeline-configuration configmap)
and wait for rollout/termination as needed; ensure the new target name is
exactly undeploy-k8scache so local-deployments-cleanup-k8scache can call it, and
reuse the same identifiers used in deploy-k8scache (e.g., flowlogs-pipeline
configmap name and deploy/flowlogs-pipeline) so cleanup fully removes the
k8scache artifacts.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 236-252: The RemoveProcessor method currently uses defer
c.mu.Unlock() and also does an explicit c.mu.Unlock()/c.mu.Lock() around
pc.closeConnection(), creating a confusing double-unlock/re-lock window; change
the pattern to lock, look up pc, delete the map entry, then explicitly unlock
before calling pc.closeConnection() (i.e., remove the defer and the later
c.mu.Unlock()/c.mu.Lock()), so the map mutation (c.connections[address]
deletion) happens under c.mu and the potentially blocking pc.closeConnection()
runs without holding c.mu; reference Client.RemoveProcessor, c.mu,
c.connections, and pc.closeConnection when making the change.
In `@pkg/pipeline/transform/kubernetes/k8scache/discovery.go`:
- Around line 88-121: discoverAndConnect currently only adds processors and
never removes stale connections; update it to reconcile the client's processor
list with the current pod IPs: after listing pods build the set of desired
addresses (fmt.Sprintf("%s:%d", pod.Status.PodIP, cfg.ProcessorPort) for running
pods with PodIP) and then remove any client-side processors not in that set by
calling a remove method (e.g., Client.RemoveProcessor or a new method you add)
or by iterating client.processorConnections and closing/removing entries for
addresses absent from the desired set; keep the existing AddProcessor calls for
new addresses so discoverAndConnect performs full reconciliation.
In `@pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go`:
- Around line 41-60: Add unit tests for the event handler's OnAdd and OnUpdate
paths: create tests that call the handler's OnAdd with isInInitialList both true
and false to verify that SendAdd on mockClient is called (or suppressed)
appropriately, and add tests for OnUpdate covering both a real-change case (old
vs new ResourceMetaData differ) that should call SendUpdate and a no-change case
that should not; use the existing mockClient (methods SendAdd, SendUpdate,
SendDelete) and the same setup used for the OnDelete test (construct the handler
via NewEventHandler or the eventHandler type and assert
mockClient.addedMeta/updatedMeta contents and sendError behavior).
In `@pkg/pipeline/transform/kubernetes/k8scache/server.go`:
- Around line 67-81: The current loop uses select with a default and directly
calls stream.Recv(), which can lead to a tight/select-default pattern; change to
read from the stream in a dedicated goroutine that sends received updates and
errors into channels, then in the main loop select between ctx.Done() and those
channels (use symbols stream.Recv, ctx.Done, connectionID) so cancellation is
handled promptly without relying on the default case—on ctx.Done()
cancel/cleanup and on recv-channel handle io.EOF and errors the same way you
currently log using connectionID.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e49448c4-5beb-4126-83d4-0466d0974109
⛔ Files ignored due to path filters (2)
- pkg/pipeline/transform/kubernetes/k8scache/k8scache.pb.go is excluded by !**/*.pb.go
- pkg/pipeline/transform/kubernetes/k8scache/k8scache_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (24)
- .mk/development.mk
- Makefile
- README.md
- cmd/flowlogs-pipeline/main.go
- cmd/flp-informers/main.go
- contrib/docker/Dockerfile
- contrib/kubernetes/deployment-flp-informers.yaml
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/config/config.go
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery_test.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- proto/README.md
- proto/k8scache.proto
✅ Files skipped from review due to trivial changes (6)
- README.md
- contrib/docker/Dockerfile
- pkg/pipeline/transform/kubernetes/enrich.go
- cmd/flowlogs-pipeline/main.go
- contrib/kubernetes/deployment-flp-informers.yaml
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
🚧 Files skipped from review as they are similar to previous changes (6)
- pkg/config/config.go
- Makefile
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
go func() {
	if err := k8scache.StartProcessorDiscovery(ctx, grpcClient, discoveryConfig); err != nil {
		log.WithError(err).Error("processor discovery stopped")
	}
}()
Discovery failure is logged but not surfaced.
If StartProcessorDiscovery returns an error immediately (e.g., invalid config), it's only logged. Consider checking the error and failing startup for configuration errors vs. transient errors.
Proposed fix
+ // Use error channel to catch immediate failures
+ errCh := make(chan error, 1)
go func() {
- if err := k8scache.StartProcessorDiscovery(ctx, grpcClient, discoveryConfig); err != nil {
- log.WithError(err).Error("processor discovery stopped")
+ errCh <- k8scache.StartProcessorDiscovery(ctx, grpcClient, discoveryConfig)
+ }()
+
+ // Give discovery a moment to fail fast on config errors
+ select {
+ case err := <-errCh:
+ if err != nil && !errors.Is(err, context.Canceled) {
+ log.WithError(err).Fatal("processor discovery failed")
}
- }()
+ case <-time.After(2 * time.Second):
+ // Discovery started successfully, continue
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flp-informers/main.go` around lines 170 - 174, StartProcessorDiscovery is
launched inside a goroutine so any immediate errors (e.g., invalid
discoveryConfig) are only logged and not surfaced to startup; change this to
detect and handle immediate failures by calling k8scache.StartProcessorDiscovery
synchronously for the initial startup or by capturing the first error through a
channel: invoke k8scache.StartProcessorDiscovery(ctx, grpcClient,
discoveryConfig) and if it returns a non-nil error treat it as a fatal
config/startup error (log and return/exit), otherwise spawn the background
goroutine or continue; reference the existing goroutine invocation and variables
grpcClient, discoveryConfig, and k8scache.StartProcessorDiscovery when making
the change.
// Wait for shutdown signal
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
<-sigChan

log.Info("Shutdown signal received, stopping...")
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if Informers has a Stop method
ast-grep --pattern 'func ($_ *Informers) Stop() {
$$$
}'
Repository: netobserv/flowlogs-pipeline
Length of output: 53
🏁 Script executed:
cat -n cmd/flp-informers/main.go | sed -n '130,190p'
Repository: netobserv/flowlogs-pipeline
Length of output: 2332
🏁 Script executed:
# Find where Informers is defined
rg 'type Informers' --type go -A 5
Repository: netobserv/flowlogs-pipeline
Length of output: 635
🏁 Script executed:
# Check if there's a Stop method on any informers-related types
rg 'func.*Stop\(' --type go | grep -i informer
Repository: netobserv/flowlogs-pipeline
Length of output: 53
🏁 Script executed:
# Check for any cleanup/close methods on Informers
rg 'func.*Informers.*\(' --type go | grep -E '(Close|Release|Shutdown|Stop|Clean)'
Repository: netobserv/flowlogs-pipeline
Length of output: 53
🏁 Script executed:
# Look at the full Informers struct and its methods
cat -n pkg/pipeline/transform/kubernetes/informers/informers.go | head -100
Repository: netobserv/flowlogs-pipeline
Length of output: 3756
🏁 Script executed:
# Check what the Interface type is
rg 'type Interface interface' --type go -A 20 pkg/pipeline/transform/kubernetes/informers/
Repository: netobserv/flowlogs-pipeline
Length of output: 2045
🏁 Script executed:
# Check how stopChan and mdStopChan are used
rg 'stopChan|mdStopChan' --type go pkg/pipeline/transform/kubernetes/informers/informers.go
Repository: netobserv/flowlogs-pipeline
Length of output: 386
🏁 Script executed:
# Look for any goroutine starts or stoppage logic
rg 'go func|stopChan|for.*stopChan' --type go -B 2 -A 2 pkg/pipeline/transform/kubernetes/informers/informers.go
Repository: netobserv/flowlogs-pipeline
Length of output: 554
🏁 Script executed:
# Search for where stopChan or mdStopChan are closed
rg 'close\(.*stopChan\)' --type go
Repository: netobserv/flowlogs-pipeline
Length of output: 149
🏁 Script executed:
# Search across entire codebase for any Stop method or cleanup on Informers
rg 'func.*\*Informers.*\(' --type go pkg/pipeline/transform/kubernetes/informers/informers.go | head -20
Repository: netobserv/flowlogs-pipeline
Length of output: 1827
🏁 Script executed:
# Check the entire informers.go file length
wc -l pkg/pipeline/transform/kubernetes/informers/informers.go
Repository: netobserv/flowlogs-pipeline
Length of output: 131
Close the Informers stop channels on shutdown to stop watches and release resources.
The stopChan and mdStopChan created during InitFromConfig() are never closed. Closing these channels signals the underlying informer factories to stop their watches. Add the following before the function returns:
defer func() {
close(inf.stopChan)
close(inf.mdStopChan)
}()
Or close them explicitly when receiving the shutdown signal at line 181.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flp-informers/main.go` around lines 178 - 184, The shutdown path
currently waits on sigChan but never closes the Informers' stop channels, so
informer watches keep running; after detecting the shutdown signal (where
sigChan is read in main) close the Informers' stop channels (inf.stopChan and
inf.mdStopChan) to signal informer factories to stop — either defer closing them
after InitFromConfig() returns or explicitly call close(inf.stopChan) and
close(inf.mdStopChan) immediately after the <-sigChan read; reference the
Informers instance returned by InitFromConfig() (inf) so you close those two
channels before exiting.
Force-pushed from ed2954b to 1117b70
Actionable comments posted: 6
♻️ Duplicate comments (2)
cmd/flp-informers/main.go (1)
192-196: ⚠️ Potential issue | 🟠 Major | Fail fast if processor discovery can't start.
StartProcessorDiscovery returns immediate config errors, but this goroutine only logs them. A bad kubeconfig or non-positive resync interval leaves flp-informers running with no processor connections. Treat non-context.Canceled startup errors as fatal.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flp-informers/main.go` around lines 192 - 196, The anonymous goroutine that calls StartProcessorDiscovery currently only logs errors; change it to treat startup errors (except context.Canceled) as fatal so the process fails fast: after calling k8scache.StartProcessorDiscovery(ctx, grpcClient, discoveryConfig), if err != nil and !errors.Is(err, context.Canceled) then log the error and exit (e.g., log.WithError(err).Fatal or os.Exit(1)); keep the existing non-fatal handling for context.Canceled. Ensure you reference StartProcessorDiscovery, ctx, grpcClient and discoveryConfig and import/use errors/context as needed.
pkg/pipeline/transform/kubernetes/k8scache/client.go (1)
131-133: ⚠️ Potential issue | 🟠 Major | Set TLS MinVersion to prevent protocol downgrade attacks.
The tls.Config lacks MinVersion, defaulting to TLS 1.0 for servers. Explicitly set MinVersion: tls.VersionTLS12 (or tls.VersionTLS13 for stronger security) as flagged by static analysis.
Proposed fix
 tlsConfig := &tls.Config{
 	InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify,
+	MinVersion:         tls.VersionTLS12,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 131 - 133, The tls.Config constructed in the client (variable tlsConfig) currently only sets InsecureSkipVerify from c.tlsConfig; update the config to include a MinVersion to prevent protocol downgrade attacks by setting MinVersion: tls.VersionTLS12 (or tls.VersionTLS13 for stricter security). Locate the tlsConfig creation in the client.go code path that uses c.tlsConfig, add the MinVersion field to the tls.Config literal, and ensure any related tests or lints reflect the new minimum TLS version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/flowlogs-pipeline/main.go`:
- Around line 241-243: The shutdown currently calls grpcServer.GracefulStop()
with no timeout which can hang because StreamUpdates is long-lived; modify
shutdown to perform a bounded graceful stop: start GracefulStop() in a
goroutine, wait for it to finish with a timeout (e.g., time.After or
context.WithTimeout), and if the timeout elapses call grpcServer.Stop() to
force-close active streams; locate the grpcServer usage in main.go and update
the shutdown block around grpcServer.GracefulStop() to implement the
timeout/fallback to Stop(), referencing grpcServer, GracefulStop, Stop, and the
StreamUpdates RPC.
In `@cmd/flp-informers/main.go`:
- Around line 163-164: The informer bootstrap currently constructs
api.NetworkTransformKubeConfig as an empty struct and calls
informers.NewConfig(apiConfig), which forces defaults and diverges from
processor-side settings; instead obtain the real NetworkTransformKubeConfig used
by the pipeline (e.g., load it from the same CLI/config source or shared
initializer) and pass that populated config to informers.NewConfig so managed
CNI, secondary networks and tracked kinds match processor enrichment (replace
the empty api.NetworkTransformKubeConfig{} used when creating infConfig).
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 219-223: The timeout cleanup calls stream.CloseSend() without
handling its error; modify the timeout handling block to capture the return
(e.g. err := stream.CloseSend()) and either log or explicitly ignore it (for
example log using the existing logger/processLogger with context like
"stream.CloseSend failed" if err != nil, or assign to _ to document intentional
ignore). Locate the cleanup where stream.CloseSend() is invoked and replace the
bare call with the error-aware version so the CloseSend error is handled or
explicitly ignored.
- Around line 272-289: The RemoveProcessor method currently defers c.mu.Unlock()
and also manually unlocks and re-locks c.mu, causing a double-unlock panic;
remove the defer and manage the mutex manually in RemoveProcessor: call
c.mu.Lock() at the top, look up pc := c.connections[address], if it doesn't
exist call c.mu.Unlock() and return, otherwise delete(c.connections, address),
then call c.mu.Unlock() before invoking pc.closeConnection(); do not re-lock
after close so the mutex is unlocked exactly once. Reference:
Client.RemoveProcessor, c.mu, c.connections, pc.closeConnection.
In `@pkg/pipeline/transform/kubernetes/k8scache/discovery.go`:
- Around line 114-118: Discovery currently dials
pod.Status.PodIP:cfg.ProcessorPort (address) which causes TLS verification to
fail because getTransportCredentials() (client.go) doesn't set
tls.Config.ServerName and Go will validate the IP against certificate SANs;
update the flow so that either (A) the discovery code (where address is built
and passed to client.AddProcessorWithTimeout) uses a DNS name that matches the
processor certificate (e.g., construct host like "<podHostname>.<namespace>..."
or a configured ProcessorHost) instead of pod.Status.PodIP, or (B) set
tls.Config.ServerName inside getTransportCredentials() when constructing
transport credentials (use a configured expected server name or derive one from
the processor metadata) so the TLS handshake validates against the certificate
CN/SAN; modify address construction in discovery.go and/or set ServerName in
getTransportCredentials() accordingly.
In `@proto/README.md`:
- Around line 15-16: The README's protoc examples use paths=source_relative but
pass ./proto/k8scache.proto which causes generated files to be nested under a
proto/ subdirectory; update the commands to add an import root and use the bare
filename (e.g., add -I ./proto and use k8scache.proto as input) so that the
outputs for the protoc --go_out and --go-grpc_out invocations land directly in
pkg/pipeline/transform/kubernetes/k8scache/ instead of pkg/.../k8scache/proto/.
---
Duplicate comments:
In `@cmd/flp-informers/main.go`:
- Around line 192-196: The anonymous goroutine that calls
StartProcessorDiscovery currently only logs errors; change it to treat startup
errors (except context.Canceled) as fatal so the process fails fast: after
calling k8scache.StartProcessorDiscovery(ctx, grpcClient, discoveryConfig), if
err != nil and !errors.Is(err, context.Canceled) then log the error and exit
(e.g., log.WithError(err).Fatal or os.Exit(1)); keep the existing non-fatal
handling for context.Canceled. Ensure you reference StartProcessorDiscovery,
ctx, grpcClient and discoveryConfig and import/use errors/context as needed.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 131-133: The tls.Config constructed in the client (variable
tlsConfig) currently only sets InsecureSkipVerify from c.tlsConfig; update the
config to include a MinVersion to prevent protocol downgrade attacks by setting
MinVersion: tls.VersionTLS12 (or tls.VersionTLS13 for stricter security). Locate
the tlsConfig creation in the client.go code path that uses c.tlsConfig, add the
MinVersion field to the tls.Config literal, and ensure any related tests or
lints reflect the new minimum TLS version.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5022cd27-6a5c-470b-b91f-2f54ec42500d
⛔ Files ignored due to path filters (2)
- pkg/pipeline/transform/kubernetes/k8scache/k8scache.pb.go is excluded by !**/*.pb.go
- pkg/pipeline/transform/kubernetes/k8scache/k8scache_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (24)
- .mk/development.mk
- Makefile
- README.md
- cmd/flowlogs-pipeline/main.go
- cmd/flp-informers/main.go
- contrib/docker/Dockerfile
- contrib/kubernetes/deployment-flp-informers.yaml
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/config/config.go
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery_test.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- proto/README.md
- proto/k8scache.proto
✅ Files skipped from review due to trivial changes (7)
- README.md
- contrib/docker/Dockerfile
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- contrib/kubernetes/deployment-flp-informers.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
- Makefile
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/config/config.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- .mk/development.mk
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
if grpcServer != nil {
	log.Info("stopping K8s cache sync server")
	grpcServer.GracefulStop()
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the main.go file around lines 241-243
find . -name main.go -path "*/cmd/flowlogs-pipeline/*" | head -1
Repository: netobserv/flowlogs-pipeline
Length of output: 102
🏁 Script executed:
# Get context around the shutdown code
cat -n cmd/flowlogs-pipeline/main.go | sed -n '230,260p'
Repository: netobserv/flowlogs-pipeline
Length of output: 1253
🏁 Script executed:
# Find the gRPC service definition and StreamUpdates implementation
find . -type f -name "*.go" | xargs grep -l "StreamUpdates" | head -5
Repository: netobserv/flowlogs-pipeline
Length of output: 370
🏁 Script executed:
# Check for any timeout configurations related to gRPC server
rg "GracefulStop|time\.After|context" cmd/flowlogs-pipeline/ -A 2 -B 2Repository: netobserv/flowlogs-pipeline
Length of output: 889
🏁 Script executed:
# Examine the gRPC service definition for StreamUpdates
cat -n pkg/pipeline/transform/kubernetes/k8scache/k8scache_grpc.pb.go | grep -A 20 "StreamUpdates"
Repository: netobserv/flowlogs-pipeline
Length of output: 7670
🏁 Script executed:
# Also check the server.go implementation
cat -n pkg/pipeline/transform/kubernetes/k8scache/server.go
Repository: netobserv/flowlogs-pipeline
Length of output: 5814
Guard GracefulStop() with a timeout.
StreamUpdates is a long-lived bidirectional streaming RPC. Waiting indefinitely for active streams to drain can hang shutdown; use a bounded wait and fall back to Stop().
Suggested change
if grpcServer != nil {
log.Info("stopping K8s cache sync server")
- grpcServer.GracefulStop()
+ stopped := make(chan struct{})
+ go func() {
+ grpcServer.GracefulStop()
+ close(stopped)
+ }()
+ select {
+ case <-stopped:
+ case <-time.After(5 * time.Second):
+ log.Warn("timed out waiting for K8s cache sync streams to drain; forcing stop")
+ grpcServer.Stop()
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if grpcServer != nil {
	log.Info("stopping K8s cache sync server")
	stopped := make(chan struct{})
	go func() {
		grpcServer.GracefulStop()
		close(stopped)
	}()
	select {
	case <-stopped:
	case <-time.After(5 * time.Second):
		log.Warn("timed out waiting for K8s cache sync streams to drain; forcing stop")
		grpcServer.Stop()
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flowlogs-pipeline/main.go` around lines 241 - 243, The shutdown currently
calls grpcServer.GracefulStop() with no timeout which can hang because
StreamUpdates is long-lived; modify shutdown to perform a bounded graceful stop:
start GracefulStop() in a goroutine, wait for it to finish with a timeout (e.g.,
time.After or context.WithTimeout), and if the timeout elapses call
grpcServer.Stop() to force-close active streams; locate the grpcServer usage in
main.go and update the shutdown block around grpcServer.GracefulStop() to
implement the timeout/fallback to Stop(), referencing grpcServer, GracefulStop,
Stop, and the StreamUpdates RPC.
apiConfig := &api.NetworkTransformKubeConfig{} // Empty config - will use defaults
infConfig := informers.NewConfig(apiConfig)
Don't bootstrap central informers with an empty Kubernetes config.
informers.NewConfig derives managed CNI, secondary networks, and tracked kinds from api.NetworkTransformKubeConfig. Passing {} here hardcodes defaults, so centralized cache mode diverges from processor-side enrichment whenever the pipeline uses non-default Kubernetes settings.
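One possible shape for that, assuming the kube transform settings can be unmarshalled from the same pipeline configuration the processors consume; the file path, variable names, and JSON format here are illustrative only:

// Illustrative only: load the processors' kube transform settings instead of
// starting from an empty struct, so flp-informers tracks the same kinds and
// secondary networks as processor-side enrichment.
raw, err := os.ReadFile(kubeConfigPath) // hypothetical path to the shared settings
if err != nil {
	log.WithError(err).Fatal("cannot read kube transform settings")
}
var kubeCfg api.NetworkTransformKubeConfig
if err := json.Unmarshal(raw, &kubeCfg); err != nil {
	log.WithError(err).Fatal("cannot parse kube transform settings")
}
infConfig := informers.NewConfig(&kubeCfg)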
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flp-informers/main.go` around lines 163 - 164, The informer bootstrap
currently constructs api.NetworkTransformKubeConfig as an empty struct and calls
informers.NewConfig(apiConfig), which forces defaults and diverges from
processor-side settings; instead obtain the real NetworkTransformKubeConfig used
by the pipeline (e.g., load it from the same CLI/config source or shared
initializer) and pass that populated config to informers.NewConfig so managed
CNI, secondary networks and tracked kinds match processor enrichment (replace
the empty api.NetworkTransformKubeConfig{} used when creating infConfig).
$ protoc --go_out=./pkg/pipeline/transform/kubernetes/k8scache --go_opt=paths=source_relative ./proto/k8scache.proto
$ protoc --go-grpc_out=./pkg/pipeline/transform/kubernetes/k8scache --go-grpc_opt=paths=source_relative ./proto/k8scache.proto
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and show the actual content
cat -n proto/README.md | head -20
Repository: netobserv/flowlogs-pipeline
Length of output: 821
🏁 Script executed:
# Look for the actual .pb.go files to understand the current layout
find . -name "*k8scache*.pb.go" -type f
Repository: netobserv/flowlogs-pipeline
Length of output: 195
🏁 Script executed:
# Check the proto directory structure
ls -la proto/ | grep -i k8s
Repository: netobserv/flowlogs-pipeline
Length of output: 135
🏁 Script executed:
# Check protoc documentation for paths=source_relative behavior
web_search "protoc paths=source_relative option behavior documentation"
Repository: netobserv/flowlogs-pipeline
Length of output: 121
🏁 Script executed:
# Search for any existing protoc invocations in the codebase to see patterns
rg "protoc.*--go_out" --type bash --type md -A 1 -B 1Repository: netobserv/flowlogs-pipeline
Length of output: 99
🏁 Script executed:
# Check if there's a Makefile or build script that might show correct protoc usage
find . -name "Makefile" -o -name "*.mk" -o -name "build.sh" | head -5
Repository: netobserv/flowlogs-pipeline
Length of output: 188
🏁 Script executed:
# Check the Makefile for protoc usage patterns
rg "protoc" Makefile -A 2 -B 2Repository: netobserv/flowlogs-pipeline
Length of output: 53
🏁 Script executed:
# Also check development.mk
rg "protoc" .mk/development.mk -A 2 -B 2Repository: netobserv/flowlogs-pipeline
Length of output: 53
🌐 Web query:
protoc paths=source_relative option behavior go output
💡 Result:
The --go_opt=paths=source_relative option (or --go_out=paths=source_relative,...) in the protoc command for Go code generation controls the directory structure of the output .pb.go files relative to the --go_out directory. By default (or with paths=import), output mirrors the Go package import path from the go_package option in the .proto file. With paths=source_relative, the output file is placed in the same relative directory as the input .proto file (after stripping --proto_path prefixes). For example:
protoc --proto_path=src --go_out=out --go_opt=paths=source_relative src/foo.proto src/bar/baz.proto
This generates:
- out/foo.pb.go
- out/bar/baz.pb.go
The compiler creates nested subdirectories as needed within the --go_out base directory but does not create the base output directory itself. This option is specified via --go_opt for protoc-gen-go (for .pb.go files). Use --go-grpc_opt=paths=source_relative similarly for protoc-gen-go-grpc (for _grpc.pb.go files).
Other modes:
- module=$PREFIX: Strips $PREFIX from the import path before placing in output.
- Without paths=...: Defaults to paths=import. Requires the go_package option in every .proto file for proper import paths.
Citations:
- 1: https://protobuf.dev/reference/go/go-generated-opaque/
- 2: https://protobuf.dev/reference/go/go-generated/
- 3: https://stackoverflow.com/questions/70731053/protoc-go-opt-paths-source-relative-vs-go-grpc-opt-paths-source-relative
- 4: https://stackoverflow.com/questions/72816024/protobuf-with-go-not-able-to-compile-to-the-desired-folder-with-paths-source-rel/72816972
These commands regenerate into the wrong path.
With paths=source_relative, the input path ./proto/k8scache.proto preserves the proto/ prefix in the output directory, placing files under pkg/pipeline/transform/kubernetes/k8scache/proto/ instead of directly in pkg/pipeline/transform/kubernetes/k8scache/. Use -I ./proto with just k8scache.proto as input so contributors regenerate into the current package.
Suggested doc fix
-$ protoc --go_out=./pkg/pipeline/transform/kubernetes/k8scache --go_opt=paths=source_relative ./proto/k8scache.proto
-$ protoc --go-grpc_out=./pkg/pipeline/transform/kubernetes/k8scache --go-grpc_opt=paths=source_relative ./proto/k8scache.proto
+$ protoc -I ./proto --go_out=./pkg/pipeline/transform/kubernetes/k8scache --go_opt=paths=source_relative k8scache.proto
+$ protoc -I ./proto --go-grpc_out=./pkg/pipeline/transform/kubernetes/k8scache --go-grpc_opt=paths=source_relative k8scache.proto
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
$ protoc -I ./proto --go_out=./pkg/pipeline/transform/kubernetes/k8scache --go_opt=paths=source_relative k8scache.proto
$ protoc -I ./proto --go-grpc_out=./pkg/pipeline/transform/kubernetes/k8scache --go-grpc_opt=paths=source_relative k8scache.proto
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 15-15: Dollar signs used before commands without showing output
(MD014, commands-show-output)
[warning] 16-16: Dollar signs used before commands without showing output
(MD014, commands-show-output)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@proto/README.md` around lines 15 - 16, The README's protoc examples use
paths=source_relative but pass ./proto/k8scache.proto which causes generated
files to be nested under a proto/ subdirectory; update the commands to add an
import root and use the bare filename (e.g., add -I ./proto and use
k8scache.proto as input) so that the outputs for the protoc --go_out and
--go-grpc_out invocations land directly in
pkg/pipeline/transform/kubernetes/k8scache/ instead of pkg/.../k8scache/proto/.
@coderabbitai review
✅ Actions performed: Review triggered.

@coderabbitai re-review
✅ Actions performed: Full review triggered.
Actionable comments posted: 5
♻️ Duplicate comments (4)
cmd/flp-informers/main.go (2)
203-207: ⚠️ Potential issue | 🟠 Major | Surface the first discovery error instead of burying it in the goroutine.
StartProcessorDiscovery fails fast for invalid config and kubeconfig setup, but this path only logs the error and keeps the process alive with no processor connections. Treat the first returned error as startup failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flp-informers/main.go` around lines 203 - 207, Start treating StartProcessorDiscovery failures as startup errors instead of silently logging them in the goroutine: remove the fire-and-forget goroutine and invoke or run StartProcessorDiscovery such that its first returned error is propagated to main (e.g., call StartProcessorDiscovery synchronously or start it in a goroutine but send its error on a channel and block on that channel during startup); when StartProcessorDiscovery(ctx, grpcClient, discoveryConfig) returns a non-nil error, log it and return/exit the process to fail fast. Ensure you reference StartProcessorDiscovery, grpcClient and discoveryConfig when changing the call site so the initial discovery error is surfaced.
173-174: ⚠️ Potential issue | 🟠 Major

Don't bootstrap centralized informers from an empty kube transform config.

pkg/pipeline/transform/kubernetes/informers/informers.go:362-431 uses this config for secondary-network extraction and ownership tracking. Hard-coding {} here makes the exported cache diverge from processor-side enrichment whenever the pipeline sets non-default Kubernetes options.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flp-informers/main.go` around lines 173 - 174, Currently the informers are being bootstrapped with an empty api.NetworkTransformKubeConfig (api.NetworkTransformKubeConfig{}) which diverges from the pipeline's transform settings; instead locate and pass the same NetworkTransformKubeConfig instance the pipeline uses into informers.NewConfig (i.e., replace the hard-coded &api.NetworkTransformKubeConfig{} with the pipeline's existing config object or a shared variable), so NewConfig and the informers use identical Kubernetes options as the processor-side transform/ownership logic.
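A rough sketch of one way to do that, assuming a new flag on flp-informers pointing at the same transform kube config the processors consume; the flag name, file format, and yaml import are assumptions, not the PR's code:

// Sketch: load the shared kube transform config instead of hard-coding an empty one.
var kubeCfg api.NetworkTransformKubeConfig
if path := opts.TransformKubeConfigPath; path != "" { // hypothetical --transform-kube-config flag
	raw, err := os.ReadFile(path)
	if err != nil {
		log.WithError(err).Fatal("cannot read kube transform config")
	}
	if err := yaml.Unmarshal(raw, &kubeCfg); err != nil {
		log.WithError(err).Fatal("cannot parse kube transform config")
	}
}
// then pass &kubeCfg wherever &api.NetworkTransformKubeConfig{} is used today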
cmd/flowlogs-pipeline/main.go (1)

241-243: ⚠️ Potential issue | 🟠 Major

Bound GracefulStop() with a timeout.

StreamUpdates is long-lived, so shutdown can block forever here if a client never drains. Fall back to Stop() after a short deadline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flowlogs-pipeline/main.go` around lines 241 - 243, The current shutdown calls grpcServer.GracefulStop() which can block indefinitely because StreamUpdates may never return; modify the shutdown to attempt a graceful stop with a timeout and fall back to grpcServer.Stop() if the timeout elapses. Implement this by launching a goroutine that calls grpcServer.GracefulStop() and closes a done channel when finished, then in the main goroutine select on that done channel vs time.After(a short deadline, e.g., 5s) and call grpcServer.Stop() if the deadline fires; keep the existing nil check on grpcServer and reference grpcServer, GracefulStop, Stop, and StreamUpdates in your changes.
pkg/pipeline/transform/kubernetes/k8scache/client.go (1)
135-137: ⚠️ Potential issue | 🟡 Minor

Set TLS minimum protocol version explicitly.

tls.Config does not set MinVersion. Set a floor (tls.VersionTLS12 or tls.VersionTLS13) to avoid protocol downgrade risk.

Suggested fix

 tlsConfig := &tls.Config{
     InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify,
+    MinVersion:         tls.VersionTLS12,
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 135 - 137, The TLS config currently sets only InsecureSkipVerify in the tls.Config initialization; explicitly set a minimum TLS version to prevent protocol downgrades by adding MinVersion: tls.VersionTLS12 (or tls.VersionTLS13) to the tlsConfig struct initialization (i.e., update the tlsConfig := &tls.Config{ InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify, MinVersion: tls.VersionTLS12 }).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/flp-informers/main.go`:
- Around line 184-189: The informer currently registers NewEventHandler but
never feeds it the informer’s existing objects because OnAdd in
pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go ignores
isInInitialList=true; fix by, immediately after calling
inf.AddEventHandler(handler) (and before logging registration success),
enumerate the informer’s current store (via the informer’s GetStore()/List() or
Indexer) and invoke the handler’s OnAdd for each existing object with
isInInitialList=false (or call the handler’s public add method) so the
KubernetesStore is seeded with the initial snapshot; reference NewEventHandler,
AddEventHandler and OnAdd to locate the code to change.
In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go`:
- Around line 77-90: The loops in KubernetesStore.Replace (and the similar loops
that call storeKey at other update paths) may dereference nil entries and panic;
before calling storeKey or otherwise using meta, add a nil check (skip the
iteration if meta == nil) so malformed cache update slices are ignored, and
still call addToIndexes/removeFromIndexes only for non-nil metas; update the
loops in Replace and the other update/remove loops that reference storeKey,
addToIndexes, or removeFromIndexes to guard against nil entries.
- Around line 23-24: The byIP map currently violates the "first match wins"
contract because writers unconditionally overwrite entries and deletions remove
keys regardless of owner; update the code that writes to byIP so it does not
overwrite an existing entry (if byIP[ip] != nil, skip storing the new
ResourceMetaData), and change the deletion logic so it only removes a key when
the stored ResourceMetaData matches the resource being deleted (compare a stable
identifier on model.ResourceMetaData such as UID/ID or another unique field
rather than pointer equality). Ensure all places that assign to or delete from
byIP follow these two rules so the first-matched owner is preserved and deletes
only remove the intended owner.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 364-372: The sendUpdate function can succeed after shutdown
because the send case may win the select even if c.ctx.Done() is already closed;
before attempting to enqueue, check the context and fail fast (e.g., if
c.ctx.Err() != nil return c.ctx.Err()), or perform a preliminary select on
c.ctx.Done() to return its error, then proceed to the existing select for
sending to c.updateChan (keeping the timeout case); reference sendUpdate,
c.updateChan, and c.ctx.Done()/c.ctx.Err() when applying the change so
Stop()-initiated cancellation is always honored.
- Around line 443-455: The send error branch in the select (case err := <-done)
currently only logs the error and sets pc.healthy false, which can leave the
stream open; modify that branch to also force teardown by calling
pc.cancelStream() so blocked Recv() calls unblock and trigger reconnection;
update the block handling err != nil in the same select where sendCtx and done
are used (reference: the select handling case err := <-done,
pc.healthy.Store(false), sendCtx.Done(), and the pc.cancelStream() method) to
call pc.cancelStream() after setting healthy to false and logging.
---
Duplicate comments:
In `@cmd/flowlogs-pipeline/main.go`:
- Around line 241-243: The current shutdown calls grpcServer.GracefulStop()
which can block indefinitely because StreamUpdates may never return; modify the
shutdown to attempt a graceful stop with a timeout and fall back to
grpcServer.Stop() if the timeout elapses. Implement this by launching a
goroutine that calls grpcServer.GracefulStop() and closes a done channel when
finished, then in the main goroutine select on that done channel vs time.After(a
short deadline, e.g., 5s) and call grpcServer.Stop() if the deadline fires; keep
the existing nil check on grpcServer and reference grpcServer, GracefulStop,
Stop, and StreamUpdates in your changes.
In `@cmd/flp-informers/main.go`:
- Around line 203-207: Start treating StartProcessorDiscovery failures as
startup errors instead of silently logging them in the goroutine: remove the
fire-and-forget goroutine and invoke or run StartProcessorDiscovery such that
its first returned error is propagated to main (e.g., call
StartProcessorDiscovery synchronously or start it in a goroutine but send its
error on a channel and block on that channel during startup); when
StartProcessorDiscovery(ctx, grpcClient, discoveryConfig) returns a non-nil
error, log it and return/exit the process to fail fast. Ensure you reference
StartProcessorDiscovery, grpcClient and discoveryConfig when changing the call
site so the initial discovery error is surfaced.
- Around line 173-174: Currently the informers are being bootstrapped with an
empty api.NetworkTransformKubeConfig (api.NetworkTransformKubeConfig{}) which
diverges from the pipeline's transform settings; instead locate and pass the
same NetworkTransformKubeConfig instance the pipeline uses into
informers.NewConfig (i.e., replace the hard-coded
&api.NetworkTransformKubeConfig{} with the pipeline's existing config object or
a shared variable), so NewConfig and the informers use identical Kubernetes
options as the processor-side transform/ownership logic.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 135-137: The TLS config currently sets only InsecureSkipVerify in
the tls.Config initialization; explicitly set a minimum TLS version to prevent
protocol downgrades by adding MinVersion: tls.VersionTLS12 (or tls.VersionTLS13)
to the tlsConfig struct initialization (i.e., update the tlsConfig :=
&tls.Config{ InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify, MinVersion:
tls.VersionTLS12 }).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: db98a847-8104-4140-be80-e8c51387672b
⛔ Files ignored due to path filters (2)
- pkg/pipeline/transform/kubernetes/k8scache/k8scache.pb.go is excluded by !**/*.pb.go
- pkg/pipeline/transform/kubernetes/k8scache/k8scache_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (24)
- .mk/development.mk
- Makefile
- README.md
- cmd/flowlogs-pipeline/main.go
- cmd/flp-informers/main.go
- contrib/docker/Dockerfile
- contrib/kubernetes/deployment-flp-informers.yaml
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/config/config.go
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery_test.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- proto/README.md
- proto/k8scache.proto
✅ Files skipped from review due to trivial changes (7)
- README.md
- contrib/docker/Dockerfile
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- contrib/kubernetes/deployment-flp-informers.yaml
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
🚧 Files skipped from review as they are similar to previous changes (8)
- Makefile
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery_test.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- .mk/development.mk
// Setup informer event handlers to push updates via gRPC
handler := k8scache.NewEventHandler(grpcClient)
if err := inf.AddEventHandler(handler); err != nil {
	log.WithError(err).Fatal("failed to add event handlers")
}
log.Info("Informer event handlers registered for cache sync")
Processors never get the initial cache contents.
This wires pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go:46-64, whose OnAdd drops isInInitialList=true, and the PR explicitly avoids snapshots. A fresh processor/informer pair therefore starts with an empty KubernetesStore until each existing object changes, which breaks enrichment for steady-state workloads.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flp-informers/main.go` around lines 184 - 189, The informer currently
registers NewEventHandler but never feeds it the informer’s existing objects
because OnAdd in pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
ignores isInInitialList=true; fix by, immediately after calling
inf.AddEventHandler(handler) (and before logging registration success),
enumerate the informer’s current store (via the informer’s GetStore()/List() or
Indexer) and invoke the handler’s OnAdd for each existing object with
isInInitialList=false (or call the handler’s public add method) so the
KubernetesStore is seeded with the initial snapshot; reference NewEventHandler,
AddEventHandler and OnAdd to locate the code to change.
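A minimal sketch of the seeding step this comment asks for, assuming the informers expose their underlying cache.SharedIndexInformer instances (the accessor used below is hypothetical):

// Sketch: replay the informer's current contents through the same handler so the
// exported cache starts from a full snapshot instead of waiting for new events.
for _, informer := range inf.SharedInformers() { // hypothetical accessor
	for _, obj := range informer.GetStore().List() {
		handler.OnAdd(obj, false) // false: deliver like a live add, not an initial-list add
	}
}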
// index: IP -> meta (first match wins; pods may have multiple IPs)
byIP map[string]*model.ResourceMetaData
byIP does not implement the documented "first match wins" contract.
The comment on Line 23 says first match wins, but Line 65 overwrites any existing owner and Line 48 deletes the key outright. If two resources ever share an IP, updates/deletes can return the wrong object or drop the surviving mapping.
Also applies to: 47-49, 64-66
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go` around lines
23 - 24, The byIP map currently violates the "first match wins" contract because
writers unconditionally overwrite entries and deletions remove keys regardless
of owner; update the code that writes to byIP so it does not overwrite an
existing entry (if byIP[ip] != nil, skip storing the new ResourceMetaData), and
change the deletion logic so it only removes a key when the stored
ResourceMetaData matches the resource being deleted (compare a stable identifier
on model.ResourceMetaData such as UID/ID or another unique field rather than
pointer equality). Ensure all places that assign to or delete from byIP follow
these two rules so the first-matched owner is preserved and deletes only remove
the intended owner.
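A small sketch of index writes that honor the documented contract, assuming model.ResourceMetaData carries a stable UID (that field is a suggestion from the prompt above, not confirmed in the PR):

// Sketch: keep the first owner of an IP and only let that owner delete the mapping.
func (s *KubernetesStore) indexIP(ip string, meta *model.ResourceMetaData) {
	if _, taken := s.byIP[ip]; taken {
		return // first match wins
	}
	s.byIP[ip] = meta
}

func (s *KubernetesStore) unindexIP(ip string, meta *model.ResourceMetaData) {
	if existing, ok := s.byIP[ip]; ok && existing.UID == meta.UID {
		delete(s.byIP, ip) // only the recorded owner removes the entry
	}
}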
func (s *KubernetesStore) Replace(entries []*model.ResourceMetaData) {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.byKey = make(map[string]*model.ResourceMetaData)
	s.byIP = make(map[string]*model.ResourceMetaData)
	s.byNodeName = make(map[string]*model.ResourceMetaData)
	s.bySecondaryKey = make(map[string]*model.ResourceMetaData)

	for _, meta := range entries {
		key := storeKey(meta)
		s.byKey[key] = meta
		s.addToIndexes(meta)
	}
Guard nil entries before calling storeKey.
addToIndexes/removeFromIndexes already tolerate nil, but these loops dereference meta first. A malformed cache update containing a nil entry will panic the processor instead of being skipped or rejected.
Also applies to: 94-105, 110-120
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go` around lines
77 - 90, The loops in KubernetesStore.Replace (and the similar loops that call
storeKey at other update paths) may dereference nil entries and panic; before
calling storeKey or otherwise using meta, add a nil check (skip the iteration if
meta == nil) so malformed cache update slices are ignored, and still call
addToIndexes/removeFromIndexes only for non-nil metas; update the loops in
Replace and the other update/remove loops that reference storeKey, addToIndexes,
or removeFromIndexes to guard against nil entries.
case err := <-done:
	if err != nil {
		clog.WithError(err).WithField("address", pc.address).Error("failed to send update")
		pc.healthy.Store(false)
	}
case <-sendCtx.Done():
	clog.WithField("address", pc.address).Error("send operation timed out")
	pc.healthy.Store(false)

	// Cancel the stream context to ensure any blocked operations are unblocked
	// This will cause receiveFromProcessor to see an error and trigger reconnection
	pc.cancelStream()
}
Force stream teardown on send error path too.
At Line 443, send errors only set healthy=false. If Recv() doesn’t fail promptly, this connection can stay permanently skipped. Call pc.cancelStream() in the error branch as well.
Suggested fix
case err := <-done:
if err != nil {
clog.WithError(err).WithField("address", pc.address).Error("failed to send update")
pc.healthy.Store(false)
+ // Force Recv to unblock so reconnect path can run
+ pc.cancelStream()
}
case <-sendCtx.Done():
clog.WithField("address", pc.address).Error("send operation timed out")
pc.healthy.Store(false)
// Cancel the stream context to ensure any blocked operations are unblocked
// This will cause receiveFromProcessor to see an error and trigger reconnection
pc.cancelStream()
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 443 - 455,
The send error branch in the select (case err := <-done) currently only logs the
error and sets pc.healthy false, which can leave the stream open; modify that
branch to also force teardown by calling pc.cancelStream() so blocked Recv()
calls unblock and trigger reconnection; update the block handling err != nil in
the same select where sendCtx and done are used (reference: the select handling
case err := <-done, pc.healthy.Store(false), sendCtx.Done(), and the
pc.cancelStream() method) to call pc.cancelStream() after setting healthy to
false and logging.
Actionable comments posted: 5
♻️ Duplicate comments (11)
pkg/pipeline/transform/kubernetes/k8scache/server.go (1)
136-138: ⚠️ Potential issue | 🟠 Major

Snapshot semantics explicitly ignored: fresh processors start empty.

The comment documents this design choice. On restart, processors rebuild cache from incoming events. This means a processor restart during low-activity periods may have stale/missing data until resources change.

Consider implementing initial sync (snapshot) when SyncRequest.LastVersion == 0 so restarted processors can catch up immediately. The is_snapshot field and KubernetesStore.Replace exist but aren't wired in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/server.go` around lines 136 - 138, The comment notes snapshots are ignored causing restarts to miss state; implement initial sync so when SyncRequest.LastVersion == 0 the server treats the request as a snapshot: honor the is_snapshot flag and call KubernetesStore.Replace with the full resource set instead of only applying ADD/UPDATE events. Update the sync handling in server.go (the SyncRequest processing code path) to detect LastVersion == 0 (or is_snapshot) and perform a Replace on the store, populating version metadata and returning the appropriate response so restarted processors get an immediate, consistent cache state.
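For illustration, the snapshot branch could be as small as the sketch below; CacheUpdate, IsSnapshot, and resourceEntryToMeta follow names used elsewhere in this review, but the handler shape is assumed:

// Sketch: a snapshot replaces the whole store; everything else stays incremental.
func (s *KubernetesCacheServer) applyUpdate(update *CacheUpdate) {
	metas := make([]*model.ResourceMetaData, 0, len(update.Entries))
	for _, e := range update.Entries {
		metas = append(metas, resourceEntryToMeta(e))
	}
	if update.IsSnapshot {
		s.store.Replace(metas) // full resync for freshly (re)started processors
		return
	}
	s.store.AddOrUpdate(metas)
}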
cmd/flowlogs-pipeline/main.go (1)

241-244: ⚠️ Potential issue | 🟠 Major

Guard GracefulStop() with a timeout.

StreamUpdates is a long-lived bidirectional streaming RPC. Indefinite wait for active streams to drain can hang shutdown.

Suggested fix

 if grpcServer != nil {
 	log.Info("stopping K8s cache sync server")
-	grpcServer.GracefulStop()
+	stopped := make(chan struct{})
+	go func() {
+		grpcServer.GracefulStop()
+		close(stopped)
+	}()
+	select {
+	case <-stopped:
+	case <-time.After(5 * time.Second):
+		log.Warn("graceful stop timed out; forcing stop")
+		grpcServer.Stop()
+	}
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flowlogs-pipeline/main.go` around lines 241 - 244, The shutdown currently calls grpcServer.GracefulStop() with no timeout which can hang waiting on long-lived StreamUpdates RPCs; fix by starting a goroutine that sleeps for a bounded timeout (e.g., a configurable shutdownTimeout duration) and calls grpcServer.Stop() to force-close if GracefulStop() doesn't finish in time, then call grpcServer.GracefulStop() in the main shutdown path and log both the graceful completion and the forced stop; reference grpcServer.GracefulStop(), grpcServer.Stop(), and the StreamUpdates RPC so the timeout protects long-lived streams.
pkg/pipeline/transform/kubernetes/k8scache/client.go (1)
134-137: ⚠️ Potential issue | 🟡 Minor

Set TLS MinVersion to prevent protocol downgrade.

Static analysis and past review both flag this. Without MinVersion, the default allows older TLS versions.

 tlsConfig := &tls.Config{
     InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify,
+    MinVersion:         tls.VersionTLS12,
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/client.go` around lines 134 - 137, The tls.Config created in client.go (variable tlsConfig with InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify) doesn't set MinVersion, allowing protocol downgrade; update the tls.Config initialization in the code that builds tlsConfig to set MinVersion to a safe minimum (e.g., tls.VersionTLS12 or use a configured minimum from c.tlsConfig.MinVersion) so the client enforces a modern TLS version and prevents downgrade attacks.
pkg/pipeline/transform/kubernetes/k8scache/integration_test.go (2)
113-116: ⚠️ Potential issue | 🟡 Minor

Missing SetKubernetesStore - cache mutations aren't tested.

Without store initialization, the gRPC path processes updates but doesn't verify they actually mutate the datasource. Add store setup for complete data path coverage.

 // Setup
 _, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
 ds := &datasource.Datasource{Informers: informers}
+ds.SetKubernetesStore(datasource.NewKubernetesStore())
 cacheServer := NewKubernetesCacheServer(ds)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go` around lines 113 - 116, The test never initializes the datasource store so mutations via the gRPC path aren't validated; after creating ds := &datasource.Datasource{Informers: informers} call SetKubernetesStore on ds (or use the existing helper that wires the store for kube objects) before creating cacheServer with NewKubernetesCacheServer(ds) so updates applied through the server mutate ds.Store and the test can assert expected changes; ensure you reference and initialize the same store type used elsewhere in tests (e.g., the Kubernetes store) so subsequent gRPC update calls actually persist to datasource.Datasource.Store.
219-222: ⚠️ Potential issue | 🟡 Minor

Same issue: Missing SetKubernetesStore in multi-client test.

 // Setup
 _, informers := inf.SetupStubs(testIPInfo, nil, testNodes)
 ds := &datasource.Datasource{Informers: informers}
+ds.SetKubernetesStore(datasource.NewKubernetesStore())
 cacheServer := NewKubernetesCacheServer(ds)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go` around lines 219 - 222, The multi-client integration test is missing a call to set the shared store on the cache server; after creating the Datasource and cacheServer (Datasource.Informers and NewKubernetesCacheServer), call SetKubernetesStore on the cacheServer with the appropriate store instance (the informer store returned by inf.SetupStubs or the Datasource's store) so the multi-client path uses the same underlying store; update the test to obtain the store from inf.SetupStubs (or Datasource) and pass it into cacheServer.SetKubernetesStore(...) before running multi-client assertions.
cmd/flp-informers/main.go (3)
172-180: ⚠️ Potential issue | 🟠 Major

Empty NetworkTransformKubeConfig may cause configuration drift.

The informer is initialized with an empty config {}, using defaults. If processors use non-default Kubernetes settings (managed CNI, secondary networks, tracked kinds), the centralized informers won't track the same resources.

Consider loading the config from a shared source or making it configurable via CLI flags.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flp-informers/main.go` around lines 172 - 180, The code creates an empty NetworkTransformKubeConfig (NetworkTransformKubeConfig{}) and passes it to informers.NewConfig, which can cause informer/processor config drift; instead load the real transformer kube config from the shared configuration or CLI before creating the informers: parse the existing shared config or add CLI flags to populate NetworkTransformKubeConfig, build infConfig via informers.NewConfig(populatedConfig) and then call inf.InitFromConfig(opts.Kubeconfig, &infConfig, opMetrics) so informers track the same managed CNI/secondary networks/tracked kinds as processors (refer to NetworkTransformKubeConfig, informers.NewConfig, informers.Informers, and inf.InitFromConfig).
203-207: ⚠️ Potential issue | 🟡 Minor

Discovery startup errors are silently logged.

If StartProcessorDiscovery returns an error immediately (e.g., invalid config), it's only logged. Configuration errors should fail startup rather than silently continue with no discovery.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flp-informers/main.go` around lines 203 - 207, StartProcessorDiscovery errors are only logged inside a goroutine which can let the process continue without discovery; change the startup to surface failures: call StartProcessorDiscovery in a way that captures its immediate return value (e.g., run it synchronously or send its error to a startup channel) and if err != nil (from StartProcessorDiscovery(ctx, grpcClient, discoveryConfig)) then log the error with log.WithError and stop startup by returning the error from main or exiting non‑zero (or cancelling ctx and failing initialization). Ensure you reference the existing symbols StartProcessorDiscovery, grpcClient, discoveryConfig, ctx and log.WithError when implementing this propagation.
211-217: ⚠️ Potential issue | 🟠 Major

Informer stop channels are not closed on shutdown.

The Informers type has internal stopChan and mdStopChan that should be closed to stop watches and release resources. The current shutdown only cancels the context and stops the gRPC client.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/flp-informers/main.go` around lines 211 - 217, On shutdown, the code only cancels context and stops the gRPC client but never closes the Informers internal channels; update the shutdown handling (after receiving from sigChan and before exiting) to close the Informers' stop channels (e.g., close(informers.stopChan) and close(informers.mdStopChan)) or call a safe stop method on the Informers instance; ensure you guard against double-close (check nil or use a Stop() on Informers that uses sync.Once) so watches are terminated and resources released.
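A minimal sketch of such a guarded stop, assuming Informers grows an exported Stop() method (the field and method names are illustrative):

// Sketch: close the informer stop channels exactly once on shutdown.
func (i *Informers) Stop() {
	i.stopOnce.Do(func() { // stopOnce is a sync.Once added for this purpose
		close(i.stopChan)
		close(i.mdStopChan)
	})
}

// in main: defer informers.Stop() next to cancel() and grpcClient.Stop()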
pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go (3)

114-120: ⚠️ Potential issue | 🟡 Minor

Same nil guard needed in Delete.

 for _, meta := range entries {
+	if meta == nil {
+		continue
+	}
 	key := storeKey(meta)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go` around lines 114 - 120, In the Delete method loop over entries, add the same nil guard used elsewhere to avoid panics when the store map is nil: before iterating check that s != nil and s.byKey != nil (or return early if s.byKey == nil); then proceed to compute key via storeKey(meta), check s.byKey[key], call s.removeFromIndexes(existing) and delete(s.byKey, key) as before. Ensure you reference the Delete method, s.byKey, storeKey, and removeFromIndexes when making the change.
86-90: ⚠️ Potential issue | 🟡 Minor

Guard nil entries in Replace loop.

storeKey(meta) dereferences meta before addToIndexes can return early on nil. A malformed update slice with nil entries will panic.

 for _, meta := range entries {
+	if meta == nil {
+		continue
+	}
 	key := storeKey(meta)
 	s.byKey[key] = meta
 	s.addToIndexes(meta)
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go` around lines 86 - 90, The Replace loop can panic when entries contain nil because storeKey(meta) dereferences meta before addToIndexes can guard; update the loop in the Replace implementation to check for nil first (e.g., if meta == nil { continue }) and skip calling storeKey(meta), s.byKey[...] and s.addToIndexes(meta) for nil items so malformed update slices do not cause a panic; ensure the nil check is the first operation inside the loop covering calls to storeKey and addToIndexes.
98-105: ⚠️ Potential issue | 🟡 Minor

Same nil guard needed in AddOrUpdate.

 for _, meta := range entries {
+	if meta == nil {
+		continue
+	}
 	key := storeKey(meta)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go` around lines 98 - 105, Add a nil guard in AddOrUpdate: ensure the receiver and the byKey map are non-nil and return early if entries is nil; specifically, at the start of AddOrUpdate check if s == nil or entries == nil then return, and if s.byKey == nil initialize it (s.byKey = make(map[string]*Meta)) before the loop so storeKey, s.byKey[key] assignment, removeFromIndexes(existing) and addToIndexes(meta) are safe.
🧹 Nitpick comments (5)
proto/k8scache.proto (1)
1-5: Buf lint: package directory mismatch.

Static analysis reports the file should be in a k8scache/ directory to match the package name. This is a lint warning, not a functional issue.

If you want to satisfy Buf lint, either:

- Move proto to proto/k8scache/k8scache.proto
- Or add a Buf lint exception
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proto/k8scache.proto` around lines 1 - 5, The proto file declares package k8scache but currently lives at proto/k8scache.proto which triggers Buf lint for package-directory mismatch; fix by either moving the file into a k8scache subdirectory (e.g., relocate proto/k8scache.proto -> proto/k8scache/k8scache.proto) so the file path matches package k8scache, or add a Buf lint exception/config entry to ignore this rule for proto/k8scache.proto; ensure the package k8scache and option go_package remain unchanged when you move or update the lint config.
pkg/pipeline/transform/kubernetes/k8scache/converter.go (1)
14-32: Maps are shared, not copied.

Labels, Annotations, and SecondaryNetNames are assigned by reference. If the receiver modifies these maps, the source is also modified. Consider defensive copies if mutation is possible.

Optional: defensive map copy

+func copyMap(m map[string]string) map[string]string {
+	if m == nil {
+		return nil
+	}
+	out := make(map[string]string, len(m))
+	for k, v := range m {
+		out[k] = v
+	}
+	return out
+}
+
 // In resourceEntryToMeta:
-	Labels:      entry.Labels,
-	Annotations: entry.Annotations,
+	Labels:      copyMap(entry.Labels),
+	Annotations: copyMap(entry.Annotations),
 	// ...
-	SecondaryNetNames: entry.SecondaryNetNames,
+	SecondaryNetNames: copyMap(entry.SecondaryNetNames),

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/converter.go` around lines 14 - 32, The ResourceMetaData construction in converter.go assigns Labels, Annotations, and SecondaryNetNames by reference (from entry.Labels, entry.Annotations, entry.SecondaryNetNames), which can cause shared-mutability bugs; update the assignment in the block that creates meta (model.ResourceMetaData) to use defensive copies for maps and slices (e.g., create new map instances and copy each key/value for Labels and Annotations, and copy SecondaryNetNames into a new slice) or call small helper functions (e.g., copyStringMap, copyStringSlice) so mutations to ResourceMetaData do not affect the original entry.
pkg/pipeline/transform/kubernetes/k8scache/discovery.go (1)
93-105: Consider structured logging for discovery errors.

The error logging swallows context about which selector/namespace failed. This is acceptable for resilience but could complicate debugging.

 pods, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
 	LabelSelector: cfg.ProcessorSelector,
 })
 if err != nil {
-	log.WithError(err).Error("failed to list processor pods")
+	log.WithError(err).WithFields(log.Fields{
+		"namespace": namespace,
+		"selector":  cfg.ProcessorSelector,
+	}).Error("failed to list processor pods")
 	return
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/k8scache/discovery.go` around lines 93 - 105, The discoverAndConnect function logs discovery errors without context; update the error logging in the error branch after clientset.CoreV1().Pods(...).List to include namespace and cfg.ProcessorSelector so failures show which selector/namespace failed; locate the log.WithError(err).Error call in discoverAndConnect and replace it (or augment it) with a structured log entry that attaches fields like "namespace" and "selector" alongside the error for clearer diagnostics.
pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go (1)
142-151: GetNodeByName error semantics are inconsistent with caller expectations.

Returns errors.New("notFound") which is a sentinel string, not a typed error. The caller in enrich.go (line 143-146) treats any non-nil error as a cache miss and logs at Trace level, so this works but is unconventional.

Consider using a typed error or returning (nil, nil) for not-found since callers already handle nil nodeInfo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go` around lines 142 - 151, The GetNodeByName method on KubernetesStore currently returns errors.New("notFound") for missing entries which is a plain string error and inconsistent with callers; update GetNodeByName (and its callers if needed) to either return (nil, nil) for a cache miss or use a package-level sentinel error (e.g., var ErrNotFound = errors.New("node not found")) so callers can distinguish "not found" from real errors; locate KubernetesStore.GetNodeByName and the byNodeName lookup and change the return for the not-found branch to use the chosen pattern, and if you add ErrNotFound update the caller in enrich.go to check for ErrNotFound explicitly instead of treating any non-nil error as a cache miss.
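A compact sketch of the sentinel-error option, reusing the store fields visible in this review; the exact lookup and locking shape is assumed:

// Sketch: a typed sentinel lets enrich.go tell a cache miss from a real failure.
var ErrNotFound = errors.New("not found in kubernetes store")

func (s *KubernetesStore) GetNodeByName(name string) (*model.ResourceMetaData, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if meta, ok := s.byNodeName[name]; ok {
		return meta, nil
	}
	return nil, ErrNotFound
}

// caller side: if errors.Is(err, datasource.ErrNotFound) { /* treat as cache miss */ }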
contrib/kubernetes/deployment-k8scache.yaml (1)

17-36: Consider adding resource requests/limits.

The container lacks resource constraints. Without them, the pod may be evicted under memory pressure or consume unbounded resources.

Example resource specification

 - name: flowlogs-pipeline
   image: %IMAGE_TAG_BASE%:%VERSION%
   args:
     - "--config=/etc/flowlogs-pipeline/flowlogs-pipeline.conf.yaml"
     - "--k8scache.port=9090"
     - "--k8scache.address=0.0.0.0"
+  resources:
+    requests:
+      memory: "256Mi"
+      cpu: "100m"
+    limits:
+      memory: "1Gi"
+      cpu: "1000m"
   ports:

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/kubernetes/deployment-k8scache.yaml` around lines 17 - 36, The flowlogs-pipeline container currently has no resource constraints; add a resources block under the container spec for the container named "flowlogs-pipeline" that defines both requests and limits (e.g., cpu and memory) to prevent eviction and unbounded usage; update the container spec near where image, args, ports, imagePullPolicy, and volumeMounts are defined to include resources.requests (minimum cpu/memory) and resources.limits (max cpu/memory) values appropriate for this workload.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/flowlogs-pipeline/main.go`:
- Around line 288-291: The call to log.WithError(err).WithField("address",
address).Fatal(...) makes the subsequent return nil unreachable; change the flow
so the error is returned instead of exiting: replace Fatal with a non-fatal
logger (e.g., Error or Warn) and return the actual error (return err) from the
surrounding function, or if you intend to terminate the process keep Fatal but
remove the unreachable return; update the log/return around
log.WithError(...).WithField(...) accordingly.
- Around line 316-319: The tls.Config used to build the server TLS settings
(tlsConfig := &tls.Config{...}) lacks a minimum protocol version, allowing
insecure TLS versions; update the tls.Config in main.go (tlsConfig) to include
MinVersion: tls.VersionTLS12 (or tls.VersionTLS13) so the server refuses TLS
<1.2, e.g., add MinVersion: tls.VersionTLS12 alongside Certificates and
ClientAuth in the tls.Config literal that uses cert.
- Around line 274-276: The return nil after log.WithError(err).Fatal("failed to
configure TLS for K8s cache server") is unreachable because Fatal exits the
process; remove the redundant return nil OR change the call to a non-fatal
logger (e.g., log.WithError(err).Error(...)) if you intend to return nil from
the function; update the code path around the TLS configuration (the
log.WithError(err).Fatal call and the subsequent return nil) accordingly.
In `@contrib/kubernetes/deployment-k8scache.yaml`:
- Around line 78-81: Fix the misspelled container port name in the Kubernetes
Service/Deployment spec by changing the name field value from "prometheous" to
"prometheus" (the entries involving port: 9102 and targetPort: 9102 and the
name: field). Update only the name string so it matches the correct "prometheus"
spelling wherever that service port is defined.
In `@README.md`:
- Around line 57-58: The README's flags list is missing the TLS-related k8scache
options; update the help output block to include --k8scache.tls-enabled,
--k8scache.tls-cert-path, --k8scache.tls-key-path, and --k8scache.tls-ca-path
alongside the existing --k8scache.address and --k8scache.port entries, showing
their default values and short descriptions (e.g. whether TLS is enabled, paths
for cert/key/CA), and regenerate the help output so the README accurately
documents all k8scache options.
---
Duplicate comments:
In `@cmd/flowlogs-pipeline/main.go`:
- Around line 241-244: The shutdown currently calls grpcServer.GracefulStop()
with no timeout which can hang waiting on long-lived StreamUpdates RPCs; fix by
starting a goroutine that sleeps for a bounded timeout (e.g., a configurable
shutdownTimeout duration) and calls grpcServer.Stop() to force-close if
GracefulStop() doesn't finish in time, then call grpcServer.GracefulStop() in
the main shutdown path and log both the graceful completion and the forced stop;
reference grpcServer.GracefulStop(), grpcServer.Stop(), and the StreamUpdates
RPC so the timeout protects long-lived streams.
In `@cmd/flp-informers/main.go`:
- Around line 172-180: The code creates an empty NetworkTransformKubeConfig
(NetworkTransformKubeConfig{}) and passes it to informers.NewConfig, which can
cause informer/processor config drift; instead load the real transformer kube
config from the shared configuration or CLI before creating the informers: parse
the existing shared config or add CLI flags to populate
NetworkTransformKubeConfig, build infConfig via
informers.NewConfig(populatedConfig) and then call
inf.InitFromConfig(opts.Kubeconfig, &infConfig, opMetrics) so informers track
the same managed CNI/secondary networks/tracked kinds as processors (refer to
NetworkTransformKubeConfig, informers.NewConfig, informers.Informers, and
inf.InitFromConfig).
- Around line 203-207: StartProcessorDiscovery errors are only logged inside a
goroutine which can let the process continue without discovery; change the
startup to surface failures: call StartProcessorDiscovery in a way that captures
its immediate return value (e.g., run it synchronously or send its error to a
startup channel) and if err != nil (from StartProcessorDiscovery(ctx,
grpcClient, discoveryConfig)) then log the error with log.WithError and stop
startup by returning the error from main or exiting non‑zero (or cancelling ctx
and failing initialization). Ensure you reference the existing symbols
StartProcessorDiscovery, grpcClient, discoveryConfig, ctx and log.WithError when
implementing this propagation.
- Around line 211-217: On shutdown, the code only cancels context and stops the
gRPC client but never closes the Informers internal channels; update the
shutdown handling (after receiving from sigChan and before exiting) to close the
Informers' stop channels (e.g., close(informers.stopChan) and
close(informers.mdStopChan)) or call a safe stop method on the Informers
instance; ensure you guard against double-close (check nil or use a Stop() on
Informers that uses sync.Once) so watches are terminated and resources released.
In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go`:
- Around line 114-120: In the Delete method loop over entries, add the same nil
guard used elsewhere to avoid panics when the store map is nil: before iterating
check that s != nil and s.byKey != nil (or return early if s.byKey == nil); then
proceed to compute key via storeKey(meta), check s.byKey[key], call
s.removeFromIndexes(existing) and delete(s.byKey, key) as before. Ensure you
reference the Delete method, s.byKey, storeKey, and removeFromIndexes when
making the change.
- Around line 86-90: The Replace loop can panic when entries contain nil because
storeKey(meta) dereferences meta before addToIndexes can guard; update the loop
in the Replace implementation to check for nil first (e.g., if meta == nil {
continue }) and skip calling storeKey(meta), s.byKey[...] and
s.addToIndexes(meta) for nil items so malformed update slices do not cause a
panic; ensure the nil check is the first operation inside the loop covering
calls to storeKey and addToIndexes.
- Around line 98-105: Add a nil guard in AddOrUpdate: ensure the receiver and
the byKey map are non-nil and return early if entries is nil; specifically, at
the start of AddOrUpdate check if s == nil or entries == nil then return, and if
s.byKey == nil initialize it (s.byKey = make(map[string]*Meta)) before the loop
so storeKey, s.byKey[key] assignment, removeFromIndexes(existing) and
addToIndexes(meta) are safe.
In `@pkg/pipeline/transform/kubernetes/k8scache/client.go`:
- Around line 134-137: The tls.Config created in client.go (variable tlsConfig
with InsecureSkipVerify: c.tlsConfig.InsecureSkipVerify) doesn't set MinVersion,
allowing protocol downgrade; update the tls.Config initialization in the code
that builds tlsConfig to set MinVersion to a safe minimum (e.g.,
tls.VersionTLS12 or use a configured minimum from c.tlsConfig.MinVersion) so the
client enforces a modern TLS version and prevents downgrade attacks.
In `@pkg/pipeline/transform/kubernetes/k8scache/integration_test.go`:
- Around line 113-116: The test never initializes the datasource store so
mutations via the gRPC path aren't validated; after creating ds :=
&datasource.Datasource{Informers: informers} call SetKubernetesStore on ds (or
use the existing helper that wires the store for kube objects) before creating
cacheServer with NewKubernetesCacheServer(ds) so updates applied through the
server mutate ds.Store and the test can assert expected changes; ensure you
reference and initialize the same store type used elsewhere in tests (e.g., the
Kubernetes store) so subsequent gRPC update calls actually persist to
datasource.Datasource.Store.
- Around line 219-222: The multi-client integration test is missing a call to
set the shared store on the cache server; after creating the Datasource and
cacheServer (Datasource.Informers and NewKubernetesCacheServer), call
SetKubernetesStore on the cacheServer with the appropriate store instance (the
informer store returned by inf.SetupStubs or the Datasource's store) so the
multi-client path uses the same underlying store; update the test to obtain the
store from inf.SetupStubs (or Datasource) and pass it into
cacheServer.SetKubernetesStore(...) before running multi-client assertions.
In `@pkg/pipeline/transform/kubernetes/k8scache/server.go`:
- Around line 136-138: The comment notes snapshots are ignored causing restarts
to miss state; implement initial sync so when SyncRequest.LastVersion == 0 the
server treats the request as a snapshot: honor the is_snapshot flag and call
KubernetesStore.Replace with the full resource set instead of only applying
ADD/UPDATE events. Update the sync handling in server.go (the SyncRequest
processing code path) to detect LastVersion == 0 (or is_snapshot) and perform a
Replace on the store, populating version metadata and returning the appropriate
response so restarted processors get an immediate, consistent cache state.
---
Nitpick comments:
In `@contrib/kubernetes/deployment-k8scache.yaml`:
- Around line 17-36: The flowlogs-pipeline container currently has no resource
constraints; add a resources block under the container spec for the container
named "flowlogs-pipeline" that defines both requests and limits (e.g., cpu and
memory) to prevent eviction and unbounded usage; update the container spec near
where image, args, ports, imagePullPolicy, and volumeMounts are defined to
include resources.requests (minimum cpu/memory) and resources.limits (max
cpu/memory) values appropriate for this workload.
In `@pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go`:
- Around line 142-151: The GetNodeByName method on KubernetesStore currently
returns errors.New("notFound") for missing entries which is a plain string error
and inconsistent with callers; update GetNodeByName (and its callers if needed)
to either return (nil, nil) for a cache miss or use a package-level sentinel
error (e.g., var ErrNotFound = errors.New("node not found")) so callers can
distinguish "not found" from real errors — locate KubernetesStore.GetNodeByName
and the byNodeName lookup and change the return for the not-found branch to use
the chosen pattern, and if you add ErrNotFound update the caller in enrich.go to
check for ErrNotFound explicitly instead of treating any non-nil error as a
cache miss.
In `@pkg/pipeline/transform/kubernetes/k8scache/converter.go`:
- Around line 14-32: The ResourceMetaData construction in converter.go assigns
Labels, Annotations, and SecondaryNetNames by reference (from entry.Labels,
entry.Annotations, entry.SecondaryNetNames), which can cause shared-mutability
bugs; update the assignment in the block that creates meta
(model.ResourceMetaData) to use defensive copies for maps and slices (e.g.,
create new map instances and copy each key/value for Labels and Annotations, and
copy SecondaryNetNames into a new slice) or call small helper functions (e.g.,
copyStringMap, copyStringSlice) so mutations to ResourceMetaData do not affect
the original entry.
In `@pkg/pipeline/transform/kubernetes/k8scache/discovery.go`:
- Around line 93-105: The discoverAndConnect function logs discovery errors
without context; update the error logging in the error branch after
clientset.CoreV1().Pods(...).List to include namespace and cfg.ProcessorSelector
so failures show which selector/namespace failed; locate the
log.WithError(err).Error call in discoverAndConnect and replace it (or augment
it) with a structured log entry that attaches fields like "namespace" and
"selector" alongside the error for clearer diagnostics.
In `@proto/k8scache.proto`:
- Around line 1-5: The proto file declares package k8scache but currently lives
at proto/k8scache.proto which triggers Buf lint for package-directory mismatch;
fix by either moving the file into a k8scache subdirectory (e.g., relocate
proto/k8scache.proto -> proto/k8scache/k8scache.proto) so the file path matches
package k8scache, or add a Buf lint exception/config entry to ignore this rule
for proto/k8scache.proto; ensure the package k8scache and option go_package
remain unchanged when you move or update the lint config.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 57586583-f7a8-4dc5-8f57-e388b8d716ea
⛔ Files ignored due to path filters (2)
- pkg/pipeline/transform/kubernetes/k8scache/k8scache.pb.go is excluded by !**/*.pb.go
- pkg/pipeline/transform/kubernetes/k8scache/k8scache_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (24)
- .mk/development.mk
- Makefile
- README.md
- cmd/flowlogs-pipeline/main.go
- cmd/flp-informers/main.go
- contrib/docker/Dockerfile
- contrib/kubernetes/deployment-flp-informers.yaml
- contrib/kubernetes/deployment-k8scache.yaml
- pkg/config/config.go
- pkg/pipeline/transform/kubernetes/datasource/datasource.go
- pkg/pipeline/transform/kubernetes/datasource/kubernetesstore.go
- pkg/pipeline/transform/kubernetes/enrich.go
- pkg/pipeline/transform/kubernetes/informers/cache_export.go
- pkg/pipeline/transform/kubernetes/k8scache/client.go
- pkg/pipeline/transform/kubernetes/k8scache/converter.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery.go
- pkg/pipeline/transform/kubernetes/k8scache/discovery_test.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler.go
- pkg/pipeline/transform/kubernetes/k8scache/eventhandler_test.go
- pkg/pipeline/transform/kubernetes/k8scache/integration_test.go
- pkg/pipeline/transform/kubernetes/k8scache/server.go
- pkg/pipeline/transform/kubernetes/k8scache/server_test.go
- proto/README.md
- proto/k8scache.proto
log.WithError(err).Fatal("failed to configure TLS for K8s cache server")
return nil
}
Unreachable code after log.Fatal.
log.Fatal calls os.Exit(1), so return nil is never reached. Either use log.Error + return nil or remove the return.
Suggested fix
if cfg.TLSEnabled {
tlsConfig, err := createServerTLSConfig(cfg)
if err != nil {
- log.WithError(err).Fatal("failed to configure TLS for K8s cache server")
- return nil
+ log.WithError(err).Error("failed to configure TLS for K8s cache server")
+ os.Exit(1)
 }

Or simply remove return nil since Fatal terminates.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-log.WithError(err).Fatal("failed to configure TLS for K8s cache server")
-return nil
-}
+log.WithError(err).Error("failed to configure TLS for K8s cache server")
+os.Exit(1)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flowlogs-pipeline/main.go` around lines 274 - 276, The return nil after
log.WithError(err).Fatal("failed to configure TLS for K8s cache server") is
unreachable because Fatal exits the process; remove the redundant return nil OR
change the call to a non-fatal logger (e.g., log.WithError(err).Error(...)) if
you intend to return nil from the function; update the code path around the TLS
configuration (the log.WithError(err).Fatal call and the subsequent return nil)
accordingly.
if err != nil {
	log.WithError(err).WithField("address", address).Fatal("failed to start K8s cache server")
	return nil
}
Same issue: unreachable return nil after log.Fatal.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flowlogs-pipeline/main.go` around lines 288 - 291, The call to
log.WithError(err).WithField("address", address).Fatal(...) makes the subsequent
return nil unreachable; change the flow so the error is returned instead of
exiting: replace Fatal with a non-fatal logger (e.g., Error or Warn) and return
the actual error (return err) from the surrounding function, or if you intend to
terminate the process keep Fatal but remove the unreachable return; update the
log/return around log.WithError(...).WithField(...) accordingly.
tlsConfig := &tls.Config{
	Certificates: []tls.Certificate{cert},
	ClientAuth:   tls.NoClientCert, // Default: no client cert required
}
Set MinVersion: tls.VersionTLS12 (or TLS 1.3) to prevent weak protocol negotiation.
Without MinVersion, Go defaults to TLS 1.0 for servers, which is insecure.
Suggested fix
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{cert},
ClientAuth: tls.NoClientCert, // Default: no client cert required
+ MinVersion: tls.VersionTLS12,
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 tlsConfig := &tls.Config{
 	Certificates: []tls.Certificate{cert},
 	ClientAuth:   tls.NoClientCert, // Default: no client cert required
+	MinVersion:   tls.VersionTLS12,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/flowlogs-pipeline/main.go` around lines 316 - 319, The tls.Config used to
build the server TLS settings (tlsConfig := &tls.Config{...}) lacks a minimum
protocol version, allowing insecure TLS versions; update the tls.Config in
main.go (tlsConfig) to include MinVersion: tls.VersionTLS12 (or
tls.VersionTLS13) so the server refuses TLS <1.2, e.g., add MinVersion:
tls.VersionTLS12 alongside Certificates and ClientAuth in the tls.Config literal
that uses cert.
--k8scache.address string   K8s cache sync server address (default "0.0.0.0")
--k8scache.port int         K8s cache sync server port (default: disabled)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check all k8scache flags defined in main.go
rg "k8scache\." cmd/flowlogs-pipeline/main.go -n

Repository: netobserv/flowlogs-pipeline
Length of output: 1048
Include TLS flags in the help output.
The code defines --k8scache.tls-enabled, --k8scache.tls-cert-path, --k8scache.tls-key-path, and --k8scache.tls-ca-path, but the README only documents the address and port flags. Regenerate the help output to include all k8scache options.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 57 - 58, The README's flags list is missing the
TLS-related k8scache options; update the help output block to include
--k8scache.tls-enabled, --k8scache.tls-cert-path, --k8scache.tls-key-path, and
--k8scache.tls-ca-path alongside the existing --k8scache.address and
--k8scache.port entries, showing their default values and short descriptions
(e.g. whether TLS is enabled, paths for cert/key/CA), and regenerate the help
output so the README accurately documents all k8scache options.
jpinsonneau
left a comment
Let's make the cache configurable (buffer / timeout / batch size) and ensure snapshots are sent when FLP starts / restarts.
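If that configurability is added, it could be a small options struct threaded into the client; the field names and defaults below are only a suggestion:

// Sketch: tunables for the k8scache client, settable via flags or config file.
type CacheClientConfig struct {
	UpdateBufferSize int           // capacity of the internal update queue
	SendTimeout      time.Duration // per-processor send deadline
	BatchSize        int           // max entries bundled into one CacheUpdate
	BatchInterval    time.Duration // max wait before flushing a partial batch
}

func DefaultCacheClientConfig() CacheClientConfig {
	return CacheClientConfig{
		UpdateBufferSize: 1000,
		SendTimeout:      10 * time.Second,
		BatchSize:        100,
		BatchInterval:    500 * time.Millisecond,
	}
}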
		// AddProcessorWithTimeout is idempotent (won't duplicate if already connected)
		// Use a 10-second timeout to avoid blocking the discovery loop for too long
		if err := client.AddProcessorWithTimeout(address, 10*time.Second); err != nil {
			log.WithError(err).WithField("pod", pod.Name).Error("failed to connect to processor")
		}
	}
}
We should also handle RemoveProcessor? FLPs may restart and change address.
Created a mechanism to remove stale processors.
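For reference, a rough sketch of what such a cleanup could look like, assuming the discovery loop can hand the client the set of currently valid addresses; the method name, the `mu` mutex, and the discovery wiring are hypothetical, only `connections`, `cancelStream`, and `clog` come from the code above:

```go
// RemoveStaleProcessors drops connections whose address is no longer reported
// by pod discovery, cancelling their streams so resources are released.
func (c *Client) RemoveStaleProcessors(current map[string]struct{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for addr, pc := range c.connections {
		if _, ok := current[addr]; ok {
			continue // still discovered, keep it
		}
		clog.WithField("address", addr).Info("removing stale processor")
		pc.cancelStream() // unblocks any in-flight Send/Recv on this stream
		delete(c.connections, addr)
	}
}
```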
```go
for _, pc := range connections {
	wg.Add(1)
	go func(pc *processorConnection) {
		defer wg.Done()

		// Get stream reference under lock
		stream := pc.getStream()
		if stream == nil {
			clog.WithField("address", pc.address).Warn("stream is nil, skipping send")
			pc.healthy.Store(false)
			return
		}

		// Create a context with timeout for this send operation
		sendCtx, sendCancel := context.WithTimeout(context.Background(), sendTimeout)
		defer sendCancel()

		// Use a channel to signal completion or timeout
		done := make(chan error, 1)
		go func() {
			select {
			case <-sendCtx.Done():
				// Context cancelled, exit goroutine to prevent leak
				return
			case done <- stream.Send(update):
				// Send completed (success or error)
			}
		}()

		select {
		case err := <-done:
			if err != nil {
				clog.WithError(err).WithField("address", pc.address).Error("failed to send update")
				pc.healthy.Store(false)
			}
		case <-sendCtx.Done():
			clog.WithField("address", pc.address).Error("send operation timed out")
			pc.healthy.Store(false)

			// Cancel the stream context to ensure any blocked operations are unblocked
			// This will cause receiveFromProcessor to see an error and trigger reconnection
			pc.cancelStream()
		}
	}(pc)
}
```
Suggested change:

```go
for _, pc := range connections {
	wg.Add(1)
	go func(pc *processorConnection) {
		defer wg.Done()
		// Get stream reference under lock
		stream := pc.getStream()
		if stream == nil {
			clog.WithField("address", pc.address).Warn("stream is nil, skipping send")
			pc.healthy.Store(false)
			return
		}
		// Race stream.Send against a timeout. stream.Send doesn't accept
		// a context, so we run it in a separate goroutine and cancel the
		// underlying stream on timeout to unblock it.
		done := make(chan error, 1)
		go func() {
			done <- stream.Send(update)
		}()
		select {
		case err := <-done:
			if err != nil {
				clog.WithError(err).WithField("address", pc.address).Error("failed to send update")
				pc.healthy.Store(false)
			}
		case <-time.After(sendTimeout):
			clog.WithField("address", pc.address).Error("send operation timed out")
			pc.healthy.Store(false)
			// Cancel the stream context so the blocked Send returns an error.
			// The goroutine will then write to `done` and exit (buffered channel).
			// This also causes receiveFromProcessor to see an error and trigger reconnection.
			pc.cancelStream()
		}
	}(pc)
}
```
```go
// handleSyncRequest handles a SyncRequest from a processor
func (c *Client) handleSyncRequest(pc *processorConnection, req *SyncRequest) {
	clog.WithFields(log.Fields{
		"address":      pc.address,
		"processor_id": req.ProcessorId,
		"last_version": req.LastVersion,
	}).Info("received SyncRequest from processor")

	// Note: We only send incremental updates (ADD/UPDATE/DELETE).
	// No initial snapshot is sent. Processors start with empty cache and build up
	// from incoming events. If a processor restarts, it loses enrichment until
	// resources are updated again (acceptable trade-off per design).
}
```
Does that mean every pod running before startup / restart will not be enriched ❓
I feel we should send a snapshot on SyncRequest when lastVersion is 0 to recover.
Initial snapshot re-added.
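A sketch of what sending a snapshot on an initial SyncRequest could look like; `listSnapshot` is a hypothetical helper returning the informers' current state, while the other names come from the excerpts above — this is not necessarily how the PR implements it:

```go
func (c *Client) handleSyncRequest(pc *processorConnection, req *SyncRequest) {
	clog.WithFields(log.Fields{
		"address":      pc.address,
		"processor_id": req.ProcessorId,
		"last_version": req.LastVersion,
	}).Info("received SyncRequest from processor")

	// A processor that just started (or restarted) reports LastVersion == 0,
	// so replay the full current state instead of waiting for new events.
	if req.LastVersion == 0 {
		snapshot := &CacheUpdate{
			Version:    c.version.Load(),
			IsSnapshot: true,
			Operation:  OperationType_OPERATION_ADD,
			Entries:    metaToResourceEntries(c.listSnapshot()), // hypothetical helper
		}
		if stream := pc.getStream(); stream != nil {
			if err := stream.Send(snapshot); err != nil {
				clog.WithError(err).WithField("address", pc.address).Error("failed to send snapshot")
				pc.healthy.Store(false)
			}
		}
	}
}
```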
```go
	processorID: config.ProcessorID,
	tlsConfig:   *config,
	connections: make(map[string]*processorConnection),
	updateChan:  make(chan *CacheUpdate, 100), // Buffer for updates
```
That may not be enough for big clusters, especially with the 5s timeout 🤔
We should make those configurable and test some values.
```go
	select {
	case c.updateChan <- update:
		return nil
	case <-time.After(5 * time.Second):
```
Let's make this configurable
```go
// SendAdd sends an ADD operation to all connected processors
func (c *Client) SendAdd(entries []*model.ResourceMetaData) error {
	version := c.version.Add(1)
	update := &CacheUpdate{
		Version:    version,
		IsSnapshot: false,
		Operation:  OperationType_OPERATION_ADD,
		Entries:    metaToResourceEntries(entries),
	}

	return c.sendUpdate(update)
}

// SendUpdate sends an UPDATE operation to all connected processors
func (c *Client) SendUpdate(entries []*model.ResourceMetaData) error {
	version := c.version.Add(1)
	update := &CacheUpdate{
		Version:    version,
		IsSnapshot: false,
		Operation:  OperationType_OPERATION_UPDATE,
		Entries:    metaToResourceEntries(entries),
	}

	return c.sendUpdate(update)
}

// SendDelete sends a DELETE operation to all connected processors
func (c *Client) SendDelete(entries []*model.ResourceMetaData) error {
	version := c.version.Add(1)
	update := &CacheUpdate{
		Version:    version,
		IsSnapshot: false,
		Operation:  OperationType_OPERATION_DELETE,
		Entries:    metaToResourceEntries(entries),
	}

	return c.sendUpdate(update)
}
```
We should batch messages instead of sending them one by one.
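One possible shape for that, assuming the broadcaster drains the queue and coalesces entries of the same operation; the function name, the `max` limit, and the flush interval are made-up, only `updateChan` and the `CacheUpdate` fields come from the excerpts above:

```go
// batchSameOp coalesces consecutive updates of the same operation type into one
// CacheUpdate, up to max entries or until flushAfter elapses. If it dequeues an
// update of a different operation, it returns it as leftover for the caller to
// send in the next round. Sketch only.
func (c *Client) batchSameOp(first *CacheUpdate, max int, flushAfter time.Duration) (batch, leftover *CacheUpdate) {
	batch = first
	timer := time.NewTimer(flushAfter)
	defer timer.Stop()
	for len(batch.Entries) < max {
		select {
		case next := <-c.updateChan:
			if next.Operation != batch.Operation {
				return batch, next
			}
			batch.Entries = append(batch.Entries, next.Entries...)
		case <-timer.C:
			return batch, nil
		}
	}
	return batch, nil
}
```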
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
jpinsonneau
left a comment
LGTM in terms of code! Time to test and get some metrics 😸
```go
	DiscoveryErrors: prometheus.NewCounter(prometheus.CounterOpts{
		Name: "flp_informers_discovery_errors_total",
		Help: "Total number of processor discovery errors",
	}),
```
It's nice to have this observability layer.
For errors, I think it's better to have a single error metric with a code label, so that if we already have a dashboard surfacing those errors, we don't have to update it every time we add more kinds of errors.
Currently that would be an error counter with an error code (or provenance) of "discovery" and another for UDN disambiguation.
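For example, something along these lines; the metric name, label, and label values are a suggestion, not necessarily what the PR ended up with:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// A single error counter partitioned by a "code" label: new error kinds only
// add a new label value, so existing dashboards keep working.
var informerErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "flp_informers_errors_total",
	Help: "Total number of flp-informers errors, by error code",
}, []string{"code"})

func init() {
	prometheus.MustRegister(informerErrors)
}

// Usage, e.g.:
//   informerErrors.WithLabelValues("discovery").Inc()
//   informerErrors.WithLabelValues("udn_disambiguation").Inc()
```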
Good idea, done.
```go
	// It is populated by the k8s cache sync gRPC server when receiving updates from flp-informers.
	// When k8scache is enabled, Informers is nil and only kubernetesStore is used.
	// Access is synchronized via atomic operations to prevent race conditions.
	kubernetesStore atomic.Pointer[KubernetesStore]
```
Does it really need to be synchronized? The KubernetesStore struct itself has a mutex; isn't that sufficient? (If I'm reading correctly, the kubernetesStore var is only set once, in main.go, plus in tests.)
That is true, it's not required. Removed.
```diff
 func (d *Datasource) IndexLookup(potentialKeys []string, ip string) *model.ResourceMetaData {
-	return d.Informers.IndexLookup(potentialKeys, ip)
+	if store := d.kubernetesStore.Load(); store != nil {
```
It would be cleaner to make Datasource an interface with two implementations (k8s-store and informers); that would avoid this pattern of "1. check store; 2. fall back to informers".
We can defer that to later.
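A rough sketch of that split; the interface and struct names are illustrative, and the assumption that KubernetesStore exposes an IndexLookup method mirrors the Informers API shown above:

```go
// Lookup abstracts where enrichment metadata comes from, so the processor can
// be wired with either local informers or the gRPC-fed store.
type Lookup interface {
	IndexLookup(potentialKeys []string, ip string) *model.ResourceMetaData
}

// storeDatasource serves lookups from the KubernetesStore fed by flp-informers.
type storeDatasource struct{ store *KubernetesStore }

func (s *storeDatasource) IndexLookup(keys []string, ip string) *model.ResourceMetaData {
	return s.store.IndexLookup(keys, ip)
}

// informerDatasource serves lookups straight from local informers.
type informerDatasource struct{ informers *Informers }

func (i *informerDatasource) IndexLookup(keys []string, ip string) *model.ResourceMetaData {
	return i.informers.IndexLookup(keys, ip)
}
```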
Yes, I will evaluate the effort, but I agree we can do it in a follow-up PR to focus on performance now.
@leandroberetta: The following test failed, say
Full PR test history. Your PR dashboard.
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Description
Add k8scache: centralized K8s informers to reduce API load
This PR implements centralized Kubernetes metadata caching to reduce
K8s API load when running multiple FLP processor pods.
Architecture:
Deployment modes:
Benefits:
Dependencies
n/a
Checklist
To run a perfscale test, comment with:
/test flp-node-density-heavy-25nodes
Summary by CodeRabbit
New Features
Documentation
Tests