Add anomaly detection transform stage to flowlogs-pipeline#1143

Open
vatankh wants to merge 13 commits into
netobserv:mainfrom
vatankh:feature/anomaly-transform

Conversation

@vatankh

@vatankh vatankh commented Nov 26, 2025

Description

This PR introduces a new anomaly transform stage to flowlogs-pipeline as a first step toward anomaly detection for Kubernetes network flows (see issue #).

Key points:

  • Adds a new type: anomaly transform that computes streaming anomaly scores per key.
  • Supports two algorithms:
    • zscore: rolling z-score over a sliding window.
    • ewma: exponentially weighted moving average baseline.
  • Configuration options:
    • algorithm (ewma | zscore)
    • valueField (numeric field, e.g. Bytes)
    • keyFields (used to group flows per entity, e.g. [SrcAddr, DstAddr, Proto])
    • windowSize, baselineWindow, sensitivity, ewmaAlpha
  • Emits additional fields on each record:
    • anomaly_score
    • anomaly_type (e.g. warming_up, normal, zscore_high, zscore_low, ewma_high, ewma_low)
    • baseline_window (current number of samples in the baseline window)
  • Adds API docs and an example pipeline (hack/examples/pipeline-anomaly.yaml).

This is intentionally a local, per-instance anomaly stage that works on the existing pipeline input only; it does not consume Loki/Kafka yet, as discussed in the issue conversation.

Dependencies

n/a

Testing

  • go test ./pkg/pipeline/transform -run TestTransformAnomaly
  • go test ./...
  • Manual run:
    • go build ./cmd/flowlogs-pipeline
    • ./flowlogs-pipeline --log-level debug --config hack/examples/pipeline-anomaly.yaml

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
    No, this change only adds a new optional transform in flowlogs-pipeline and is not yet wired into the operator.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).

  • Does this PR require product documentation?

    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?

    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.

    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):

    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

To run a perfscale test, comment with: /test flp-node-density-heavy-25nodes

@openshift-ci

openshift-ci Bot commented Nov 26, 2025

Hi @vatankh. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/pipeline/transform/transform_anomaly.go Outdated
Comment thread pkg/pipeline/transform/transform_anomaly.go Outdated
@jotak
Member

jotak commented Dec 9, 2025

Thanks @vatankh ! This is looking pretty good already.

I have a few more comments, let's start with the nitpicking one :-) : could you remove .idea from the PR, and add it to .gitignore ?

Then a comment on the API design: as it is, it doesn't allow running several anomaly detections (e.g. on several valueFields, or with different keys). A single Anomaly stage runs for a single value field, and if several stages are defined, they would conflict when writing to the same output fields. A simple way to fix this would be to add a Prefix field to the API config, which would prefix the "anomaly_score", "anomaly_type" and "baseline_window" outputs. So each stage would define a different prefix, allowing for disambiguation.

Another approach would be to allow multiple value fields in a single stage.

Comment thread pkg/pipeline/transform/transform_anomaly.go Outdated
Comment thread pkg/pipeline/transform/transform_anomaly.go Outdated
Comment thread hack/examples/pipeline-anomaly.yaml Outdated
@openshift-ci

openshift-ci Bot commented Dec 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign stleerh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vatankh
Author

vatankh commented Dec 22, 2025

> Thanks @vatankh ! This is looking pretty good already.
>
> I have a few more comments, let's start with the nitpicking one :-) : could you remove .idea from the PR, and add it to .gitignore ?
>
> Then a comment on the API design: as it is, it doesn't allow to run several anomaly detections (e.g. on several valueFields, or with different keys). A single Anomaly stage runs for a single value field, and if several stages are defined, they would conflict when writing on the same output fields. A simple way to fix this would be to add a Prefix field to the API config, which would prefix the "anomaly_score", "anomaly_type" and "baseline_window" outputs. So each stage would define a different prefix, allowing for disambiguation.
>
> Another approach would be to allow multiple value fields in a single stage.

@jotak
Added the Prefix field to the API config (TransformAnomaly) and implemented it in the transformer. The outputField method now prepends the prefix to anomaly fields, allowing multiple anomaly stages to run without conflicts.
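
To illustrate how such a prefix lets two stages annotate the same record without overwriting each other, here is a minimal sketch. It is not the code from this PR: the `anomalyStage` type, the `annotate` helper, and the prefix values are hypothetical, and prepending the prefix with no separator is an assumption.

```go
package main

import "fmt"

// anomalyStage is a hypothetical stand-in for one configured anomaly stage.
type anomalyStage struct {
	prefix string // assumed to come from the Prefix field of the stage config
}

// outputField prepends the stage prefix to an anomaly output field name.
func (s anomalyStage) outputField(name string) string {
	return s.prefix + name
}

// annotate writes the three anomaly outputs onto the record under
// prefixed names, leaving the other fields untouched.
func (s anomalyStage) annotate(record map[string]interface{}, score float64, kind string, baseline int) {
	record[s.outputField("anomaly_score")] = score
	record[s.outputField("anomaly_type")] = kind
	record[s.outputField("baseline_window")] = baseline
}

func main() {
	record := map[string]interface{}{"Bytes": 1200, "Packets": 3}

	// Two stages watching different value fields, each with its own prefix.
	bytesStage := anomalyStage{prefix: "bytes_"}
	packetsStage := anomalyStage{prefix: "packets_"}

	bytesStage.annotate(record, 4.2, "zscore_high", 20)
	packetsStage.annotate(record, 0.1, "normal", 20)

	fmt.Println(record["bytes_anomaly_type"], record["packets_anomaly_type"])
}
```

With an empty prefix the field names stay as `anomaly_score`, `anomaly_type` and `baseline_window`, so a single-stage pipeline would keep its current output.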

@vatankh
Author

vatankh commented Feb 3, 2026

Is there anything that needs fixing? @jotak

@jotak
Member

jotak commented Feb 10, 2026

Sorry for the delay, I'm taking another look at it

Comment thread pkg/api/transform_anomaly.go Outdated
Comment thread pkg/pipeline/transform/transform_anomaly.go Outdated
}

// Reset clears the internal state; useful for tests.
func (a *Anomaly) Reset() {
Member

As it's only used in the test file that sits in the same package, let's not export it (use reset without capital R)

Comment thread pkg/pipeline/transform/transform_anomaly_test.go Outdated
Removing IBM copyright & license header from new files
@jotak
Member

jotak commented Feb 10, 2026

/ok-to-test

@openshift-ci openshift-ci Bot added ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. and removed needs-ok-to-test labels Feb 10, 2026
@openshift-ci

openshift-ci Bot commented Feb 10, 2026

@vatankh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/qe-e2e-tests | 627b824 | link | false | /test qe-e2e-tests |
| ci/prow/images | 627b824 | link | true | /test images |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jotak
Member

jotak commented Feb 10, 2026

We need to rebase the PR: image build jobs are failing because they check out the PR against the updated main branch, and an issue pops up in that case. There's a new function in the Transform interface that needs to be implemented, even if it only returns an error or logs a warning. Let's just make it log a warning, like this:

func (a *Anomaly) Update(_ config.StageParam) {
	log.Warn("Transform Anomaly, update not supported")
}

@jotak
Member

jotak commented Feb 10, 2026

I also see the linter complaining about formatting issues. You can run go fmt ./... to fix them. I'm also adding it to our Makefile ( #1189 ). I think most people have go fmt baked into their IDE as an on-save hook, so they don't run into this issue.

Labels

ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.

3 participants