---
title: enable-network-observability-on-day-0
authors:
- "@stleerh"
reviewers:
- "@jotak"
- "@jpinsonneau"
- "@memodi"
- "Mike Fiedler"
- "@pavolloffay"
- "@jan--f"
- "@abhat"
- "@simonpasquier"
- "@everettraven"
approvers:
- "@jotak"
- "@dave-tucker"
api-approvers:
- "@jotak"
- "@dave-tucker"
- "@everettraven"
creation-date: 2025-09-30
last-updated: 2026-02-25
tracking-link:
- https://issues.redhat.com/browse/OCPSTRAT-2469
see-also:
- N/A
replaces:
- N/A
superseded-by:
- N/A
---

# Enable Network Observability on Day 0

## Summary

This feature enhancement makes Network Observability available on day 0 by default; that is, Network Observability is up and running after you create an OpenShift cluster using `openshift-install`. It installs the Network Observability Operator and creates a basic FlowCollector instance, with an option to turn this off. It also makes it easy to have Network Observability available on day 1.

## Motivation

Network Observability is an optional OLM operator that collects and stores traffic flow information and provides insights into your network traffic, including troubleshooting features like packet drops, latencies, DNS tracking, and more.

### User Stories

* As a cluster admin or developer, I expect to be able to observe and manage my network traffic without having to install other components. It should just be there.
* As a cluster admin, I should be able to see the networking health of my cluster after creating it.
* As a customer support engineer, I want the customer to be aware that Network Observability exists and can provide insights into their network traffic, including the ability to troubleshoot a number of networking issues.

These are the related issues for this feature enhancement.

* (Feature) [OCPSTRAT-2469](https://issues.redhat.com/browse/OCPSTRAT-2469) Provide a default OpenShift install experience for Network Observability
* (Epic) [NETOBSERV-2454](https://issues.redhat.com/browse/NETOBSERV-2454) Install Network Observability operator by default on OpenShift clusters
* (Spike) [NETOBSERV-2236](https://issues.redhat.com/browse/NETOBSERV-2236) What it would take to enable Network Observability by default in the console
* (PoC) [NETOBSERV-2247](https://issues.redhat.com/browse/NETOBSERV-2247) Have network observability be available and enabled on day 0

### Goals

Being able to manage and observe the network in an OpenShift cluster is critical in maintaining the health and integrity of the network. Without it, there’s no easy way to verify whether your changes are working as expected or whether your network is experiencing issues.

Currently, Network Observability is an optional operator and a majority of customers do not have Network Observability installed. Customers are missing out on features that they should have and have already paid for.

Network observability should be an integral part of networking and not thought of as a separate component. You shouldn't have to ask, "Do I need observability?" any more than you would ask "Do I need security?" Because it requires resources, basic observability should exist and additional features can be enabled as needed. There are a few scenarios where you might not want Network Observability, so there is an easy way to opt out.

There is no one-size-fits-all configuration for Network Observability, but the goal is to keep this part simple, while still providing as much value as possible given the constraints, and to make it easy to change parameters on day 2.

### Non-Goals

There are other proposals to make Network Observability more visible and prominent, such as displaying a panel that would describe the features of Network Observability and provide a button to install it. However, this feature enhancement addresses [OCPSTRAT-2469](https://issues.redhat.com/browse/OCPSTRAT-2469) that explicitly calls for Network Observability to be up and running after install.

There is a separate effort to add Network Observability to OpenShift Assisted Installer ([NETOBSERV-2486](https://issues.redhat.com/browse/NETOBSERV-2486)). That addresses some installation cases but not all.

The Network Observability Operator manages its own components, such as the flow log pipeline. Therefore, lifecycle management does not need to be considered here, since that will not change.

## Proposal

There are three OpenShift repositories that this proposal changes: [openshift/api](https://github.com/openshift/api), [openshift/cluster-network-operator](https://github.com/openshift/cluster-network-operator), and [openshift/installer](https://github.com/openshift/installer).

### Repository: openshift/api

The openshift/api repository is a shared repository for defining the API. This adds the `networkObservability` field and a nested `installationPolicy` field in the Network Custom Resource Definition (CRD) under the spec section.

```yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  networkObservability:
    installationPolicy: InstallAndEnable | DoNotInstall
```

Listing 1: Network manifest

Using an enumerated value rather than a simple true/false field allows flexibility for future growth.

If the value is `InstallAndEnable` or the field is absent, Network Observability is enabled. That is, Network Observability will be installed and a FlowCollector custom resource will be created (more details below). If it is set to `DoNotInstall`, Network Observability is not enabled or, to be precise, *nothing is done*: `DoNotInstall` does not remove an existing Network Observability deployment, which is why the value is named `DoNotInstall` rather than `Uninstall`.
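
The policy semantics above can be sketched as a small decision helper. This is illustrative Go (CNO's implementation language); the type and function names are this sketch's assumptions, not CNO's actual code:

```go
package main

import "fmt"

// InstallationPolicy mirrors the proposed
// spec.networkObservability.installationPolicy values.
type InstallationPolicy string

const (
	PolicyInstallAndEnable InstallationPolicy = "InstallAndEnable"
	PolicyDoNotInstall     InstallationPolicy = "DoNotInstall"
)

// shouldInstall captures the semantics described above: an absent value or
// InstallAndEnable triggers installation, while DoNotInstall means the
// controller does nothing -- it never removes an existing deployment.
func shouldInstall(policy InstallationPolicy) bool {
	return policy == "" || policy == PolicyInstallAndEnable
}

func main() {
	fmt.Println(shouldInstall(""))                     // true: absent field defaults to enabled
	fmt.Println(shouldInstall(PolicyInstallAndEnable)) // true
	fmt.Println(shouldInstall(PolicyDoNotInstall))     // false: nothing is done
}
```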

### Repository: openshift/cluster-network-operator

The actual enabling of Network Observability is done in the Cluster Network Operator (CNO). The rationale is that network observability should be part of networking, rather than part of general observability or a standalone entity. There is still a separation at the lower level so that the two components can be developed independently and released at different times, particularly for bug fixes.

This proposal adds a new observability controller to CNO and registers it with the manager. The controller is a single Go file whose Reconciler reads the state of the `installationPolicy` field. If it is set to `InstallAndEnable`, the controller does the following:
1. Check if Network Observability Operator (NOO) is installed, by checking whether the FlowCollector CRD is present. If yes, exit.
2. Install NOO using OLM v1 objects (such as `ClusterExtension`).
3. Wait for NOO to be ready.
4. Create the "netobserv" namespace if it doesn't exist.
5. Check if a FlowCollector instance exists. If yes, exit.
6. Create a FlowCollector instance.
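
The six steps above can be sketched as a pure planning function. This is an illustrative Go sketch of the control flow only; `clusterState` and `planReconcile` are names invented for this sketch, not CNO's actual types:

```go
package main

import "fmt"

// clusterState is a simplified view of what the controller observes.
type clusterState struct {
	nooInstalled        bool // step 1: e.g. the FlowCollector CRD is present
	namespaceExists     bool
	flowCollectorExists bool
}

// planReconcile returns the ordered actions taken when installationPolicy
// is InstallAndEnable, following steps 1-6 above. A nil plan means the
// controller exits without doing anything.
func planReconcile(s clusterState) []string {
	if s.nooInstalled {
		return nil // step 1: NOO already present, leave it to the user
	}
	plan := []string{"install NOO via OLM", "wait for NOO ready"} // steps 2-3
	if !s.namespaceExists {
		plan = append(plan, "create namespace") // step 4
	}
	if !s.flowCollectorExists {
		plan = append(plan, "create FlowCollector") // steps 5-6
	}
	return plan
}

func main() {
	fmt.Println(planReconcile(clusterState{nooInstalled: true})) // []
	fmt.Println(planReconcile(clusterState{}))
}
```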
The Reconciler leverages the existing framework and reuses the concepts of client, scheme, and manager. Having a separate controller provides clear ownership. If the Network CR changes, the Reconciler repeats the above steps. Note that it doesn't monitor NOO or any of NOO's components for changes, and it doesn't perform upgrades; that remains the responsibility of NOO.

### Repository: openshift/installer

The openshift/installer repository contains the source code for the **openshift-install** binary. This adds the same fields as in the Network CRD, but under the existing `networking` section in the **install-config.yaml** file.
```yaml
apiVersion: v1
baseDomain: devcluster.openshift.com
networking:
  networkObservability:
    installationPolicy: InstallAndEnable | DoNotInstall
```

Listing 2: install-config.yaml

The `installationPolicy` value is passed on to CNO to set the field of the same name in the Network CR. If the field is set to `InstallAndEnable` or is omitted, the Network CR's `installationPolicy` field is set to `InstallAndEnable`. To *not* enable Network Observability, set it to `DoNotInstall`, which sets the Network CR's `installationPolicy` field to `DoNotInstall`.

### FlowCollector Custom Resource (CR)

Here is the FlowCollector Custom Resource (CR) that is instantiated.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      features:
      - DNSTracking
      sampling: 400
    type: eBPF
  deploymentModel: Service
  loki:
    enable: false
  namespace: netobserv
```

Listing 3: FlowCollector configuration

Other eBPF features were considered, but the criterion was to avoid features that require privileged mode or that consume significant resources.

Summary:

* Sampling at 400
* No Loki
* No Kafka
* DNSTracking feature enabled

### Workflow Description

Network Observability is enabled by default on day 0 (the planning stage). You don't have to configure anything when using `openshift-install`: the Network Observability Operator will be installed, and a FlowCollector custom resource (CR) will be created (Listing 3 above).

If you don’t want Network Observability enabled, first create the **install-config.yaml** file using the command below.

`$ openshift-install create install-config`

Then add the following as shown in Listing 4.

```yaml
networking:
  networkObservability:
    installationPolicy: DoNotInstall
```

Listing 4: Don't enable Network Observability in install-config.yaml
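For context, the proposed field sits under the existing `networking` section of **install-config.yaml**. An abbreviated excerpt might look like the following, where the cluster name and CIDRs are placeholder values:

```yaml
apiVersion: v1
metadata:
  name: mycluster            # placeholder cluster name
networking:
  networkType: OVNKubernetes
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
  serviceNetwork:
    - 172.30.0.0/16
  networkObservability:      # proposed by this enhancement
    installationPolicy: DoNotInstall
```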

Alternatively, you can create manifests from your **install-config.yaml** file and add the change there instead. To create the manifests, enter:

`$ openshift-install create manifests`

This creates a **manifests** directory. Of particular relevance in this directory is a file named **cluster-network-02-config.yml**, which is the Network CR. Under the `spec` section, add the following as shown in Listing 5.

```yaml
spec:
  networkObservability:
    installationPolicy: DoNotInstall
```

Listing 5: Don't enable Network Observability in Network CR
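Assuming the proposed field lands in the cluster Network configuration resource as described, the edited **cluster-network-02-config.yml** would look roughly like this (other `spec` fields generated by the installer are omitted):

```yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  networkType: OVNKubernetes
  networkObservability:      # proposed by this enhancement
    installationPolicy: DoNotInstall
```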

Finally, to create the cluster, enter:

`$ openshift-install create cluster`

When you bring up the OpenShift web console, you should see that NOO is installed just as it would be if you had gone to **Ecosystem > Software Catalog** to install **Network Observability** from Red Hat (not the Community version). In **Installed Operators**, there should be a row for **Network Observability**. In the **Observe** menu, there should be a panel named **Network Traffic**.

The Technology Preview (TP) release will have a feature gate named `NetworkObservabilityInstall` that needs to be enabled. To enable this on day 0, enter:

```shell
$ openshift-install create install-config
$ openshift-install create manifests
```

Now create a file named **99-feature-gate.yml** in the **manifests** directory with the following:

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
      - NetworkObservabilityInstall
```

Then enter:

`$ openshift-install create cluster`

If you have a running cluster, you can update the feature gate by entering `oc edit featuregate cluster` and making the changes shown above.

At General Availability (GA), the feature gate for this feature will be enabled by default, so you no longer need to modify the FeatureGate resource.

### API Extensions

See Listing 2 above for the changes to Network CRD.

### Topology Considerations

#### Hypershift / Hosted Control Planes

This proposal doesn't change how Network Observability works in a Hosted Control Plane (HCP) environment. Network Observability is supported on hosted clusters and the management cluster; therefore, it will be enabled by default.

> **Review discussion (HyperShift):**
>
> **Reviewer:** How does the cluster-network-operator equivalent in HyperShift work? For example, the HyperShift Control Plane Operator has different mechanics than the cluster-authentication-operator for authentication/authorization-related things. Will there need to be any changes to HyperShift's controller behaviors to support this new functionality?
>
> **Author (@stleerh):** Network Observability is already supported on HCP. Enabling NOO by default does nothing to change that.
>
> **Reviewer:** This doesn't answer my question. Are there any changes needed on the HyperShift side to support enabling this by default?
>
> **Author (@stleerh):** No.


#### Standalone Clusters

This proposal applies to standalone clusters.

#### Single-node Deployments or MicroShift

Due to resource constraints, Single Node OpenShift (SNO) is an exception; Network Observability will not be enabled by default there.

MicroShift is not supported since Network Observability and CNO are not supported on that platform.

#### OpenShift Kubernetes Engine

OpenShift Kubernetes Engine is supported.

### Implementation Details/Notes/Constraints

### Risks and Mitigations

* Network Observability requires CPU, memory, and storage that the customer might not be aware of. See the Test Plan section for the target goals.

  **Mitigation:** The default configuration stores only metrics, at a high sampling interval, to minimize resource usage. If this isn't sufficient, further fine-tuning and filtering can be done in the provided default configuration (e.g., filtering on specific interfaces only).

* Some of the Network Observability features aren’t enabled in order to use minimal resources. Therefore, users might not know about these features.

**Mitigation:** Determine what features, particularly related to troubleshooting, can be enabled with minimal CPU and memory impact. Mention other features in the panels.
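As an illustration of the fine-tuning mentioned above, a FlowCollector can restrict the eBPF agent to specific interfaces and raise the sampling interval. The field names follow the FlowCollector `v1beta2` API; the specific values here are hypothetical examples, not recommended defaults.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      sampling: 400          # sample 1 of every 400 packets (hypothetical value)
      interfaces:
        - eth0               # observe only this interface (hypothetical)
```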

### Drawbacks

Rather than actually installing NOO and creating the FlowCollector instance, it is less risky and simpler to just display a panel or a button to let the user install and enable Network Observability. This resolves the awareness issue. However, it goes against the principle that networking and network observability should always go hand-in-hand and be there from the start.

## Alternatives (Not Implemented)

> **Review discussion (assisted installer):**
>
> **Reviewer:** Was installation via the assisted-installer considered? See https://github.com/openshift/assisted-service/blob/master/docs/dev/olm-operator-plugins.md. This seems like a viable option to mitigate the drawbacks around topologies and resource constraints.
>
> **Reply:** That's parallel work: https://issues.redhat.com/browse/NETOBSERV-2486 and openshift/assisted-service#8729. However, the assisted installer doesn't cover all the installation cases; that's why we consider having an alternative.

### Alternative #1: Make NOO a core component of OpenShift

Rather than have CNO enable Network Observability, take the existing Network Observability Operator (NOO) and have it be installed by default in the cluster. There needs to be some logic to accept the values in openshift-install to decide whether NOO should be enabled or not.

The core components of OpenShift are operators like Cluster Network Operator (CNO) and Cluster Storage Operator (CSO). NOO is a much smaller component and should not reside at the top level.

> **Review discussion (shipping NOO as core):**
>
> **Reviewer:** I'm not sure the size of a component really dictates how it should be deployed.
>
> **Author (@stleerh):** Regardless, I don't believe it belongs at this level.
>
> **Reviewer (@tssurya, Mar 27, 2026):** I agree that size shouldn't dictate deployment. Note that we have something called CloudNetworkConfigController, which is a core operator on cloud platforms and does very small things scope-wise. My preference is to ship NOO as core. Is my understanding correct that "there needs to be some logic to accept the values in openshift-install to decide whether NOO should be enabled or not" is the only reason this alternative was not pursued, or am I missing something? Avoiding installer code changes doesn't seem like a valid reason. Also, is NetObserv completely tied to OVN-Kubernetes, and is it never going to expand beyond being an observability solution for the components owned by CNO?
>
> **Author (@stleerh):** Here are the two main points. (1) If you have CNO (basically, you are running OpenShift), then you should have Network Observability. That's the basis for this feature and where it's linked to OVN-Kubernetes. In addition, my push is that as features get added to OVN-Kubernetes, they should be immediately supported by Network Observability in the same release. (2) Network Observability can also run with other container network interfaces (CNIs), so it's not exclusively tied to OVN-Kubernetes, but it does support unique features of OVN-Kubernetes.
>
> **Reviewer (@tssurya, Mar 31, 2026):** Then why not make the same claim — "if you are running OpenShift, then you are running NOO"? Supporting new OVN-Kubernetes features in the same release is still possible via NOO; can't NOO adopt the OCP release cycle if it's part of OCP? Also, does Network Observability work for all the third-party CNIs allowed on OCP? What happens when those CNIs, such as Cilium, are used instead of OVN-Kubernetes: will NetObserv still run by default? And what about Layer 7 — do we plan to expand there, or is netobserv strictly restricted to L2/L3/L4?
>
> **Reviewer:** I'm having a hard time wrapping my head around why we want this to essentially be a core feature of OpenShift without coupling it to the OCP version by including it in the payload. Maybe a sync call with the appropriate stakeholders would help. My biggest concern: if the intention is to tie NOO versions to specific OCP versions due to feature dependence, continuing to use OLM as the deployment mechanism makes things more confusing for end users. They become responsible for keeping NOO up to date and for not accidentally installing a version incompatible with their OpenShift version. OLM has some safeguards to help mitigate this, but it still puts the onus on the user to track platform requirements, rather than on us to automatically install, and keep, a compatible version of NOO without user intervention.
>
> **Author (@stleerh, Apr 7, 2026):** Surya Seetharaman, Dave Tucker, Ben Bennett, and I had a discussion on April 1 to hash this out. The conclusion is that we agreed to have CNO enable NOO, so the current proposal remains unchanged. A few comments on the questions above: the upstream Network Observability is not guaranteed to work with all CNIs, meaning we don't formally test against other CNIs; and each new version of Network Observability is backwards-compatible with all supported OCP versions and adapts accordingly.
>
> **Reviewer (@tssurya, Apr 9, 2026):** Note that (1) I still prefer for NOO to be its own thing, versus nested operators where CNO just installs NOO and NOO does the rest — CNO owns NOO but has little say and won't fix it if something goes wrong; CNO is just being used as a "click here" mechanism in this proposal. (2) The reason we can't do that is that OCP as a whole is against adding new payload operators, as told by @knobunc and @stleerh. I just want to note that I am not happy with what is being done; rather, the choice I would prefer doesn't exist. @stleerh, I think this specific alternative should call out the limitation that we are not allowed to add new operators to the payload.


### Alternative #2: Have COO enable Network Observability

Instead of CNO enabling Network Observability, the Cluster Observability Operator (COO) could do it. COO is becoming the central place for core observability components to be installed. In addition, it provides services such as metrics, Perses for dashboards, and troubleshooting via Korrel8r (observability signal correlation).

A critical issue is that COO is itself an optional operator, so it can’t enable Network Observability on day 0, because it has to be installed first. The central question is, "Is Network Observability part of OpenShift Networking or part of Cluster Observability?" The answer is the former. Component-based observability, such as Network Observability, is a layer on top of COO rather than a part of COO.

### Alternative #3: Have CVO enable Network Observability

Similar to alternative #1, this explicitly suggests having the Cluster Version Operator (CVO) enable Network Observability. CVO currently manages larger scope operators that represent core cluster functions, such as CNO and CSO, rather than specific operators like Network Observability.

## Test Plan

The test plan will consider the following:

- Different architectures
- Different size clusters
- Hosted Control Plane (HCP) environment
- e2e tests in [OpenShift Release Tooling](https://github.com/openshift/release)

Performance testing will be done to optimize resource usage and to determine the specific FlowCollector settings, with the goal of consuming less than 5% of cluster resources (CPU and memory), and an ideal target of less than 3%, including external components that are affected.

## Graduation Criteria

### Dev Preview -> Tech Preview

Network Observability reached GA in January 2023. Because this feature simply enables Network Observability, which has existed for over three years, the plan is to forgo Dev Preview and go directly to Tech Preview.

### Tech Preview -> GA

There are many different customer scenarios and cluster profiles. The Tech Preview will allow us to gauge the customer responses and make optimizations to the FlowCollector configuration or even the Network CRD if necessary. To enable the feature gate for this feature, see the **Workflow Description** above.

Here are the GA requirements.

* [NETOBSERV-2533](https://issues.redhat.com/browse/NETOBSERV-2533) Performance testing in Loki-less mode with default settings
- Provide guidance on CPU, memory, and storage resources
- Measure the impact on Prometheus in the In-Cluster Monitoring
- Optimize the default FlowCollector configuration
* [NETOBSERV-2534](https://issues.redhat.com/browse/NETOBSERV-2534) Have a way to pause Network Observability functions
* [NETOBSERV-2535](https://issues.redhat.com/browse/NETOBSERV-2535) Security audit on Network Observability code
* [NETOBSERV-2428](https://issues.redhat.com/browse/NETOBSERV-2428) New Service deployment model
* User facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)

### Removing a deprecated feature

N/A

## Upgrade / Downgrade Strategy

The upgrade strategy is treated like any other feature. At Tech Preview, you will need to enable the feature gate for this feature. At GA, Network Observability will be enabled by default without additional user intervention.

On a downgrade, Network Observability will no longer be enabled automatically. If the Network Observability Operator and/or a FlowCollector instance already exist, they will remain in place and will not be removed.

## Version Skew Strategy

There are no issues with version skew, since the logic to enable Network Observability only resides in CNO.

## Operational Aspects of API Extensions

N/A

## Support Procedures

Check the CNO logs and search for `observability_controller.go` to determine whether Network Observability was or was not enabled. This will also be reported in the status conditions.