feat(chart): image normaliser config and egress NetworkPolicy#74
feat(chart): image normaliser config and egress NetworkPolicy#74pjb157 wants to merge 5 commits into
Conversation
Adds an opt-in egress NetworkPolicy template that restricts the control-layer pod's outbound traffic, and documents the existing ServiceAccount annotation hook for GKE Workload Identity. The NetworkPolicy allows broad public-internet egress (provider APIs, managed Postgres, GCS) but denies private / loopback / link-local / CGNAT / IPv6 ULA + LL ranges and operator-configured cluster CIDRs. It's the application-layer backstop for the in-process image fetcher's IP allow-list: even if a bug let a request through the deny-list inside the process, the CNI refuses the packet. `networkPolicy.allowKubeDns: true` (the default) emits an explicit allow rule for kube-dns, because the cluster-CIDR deny would otherwise break in-pod name resolution. `networkPolicy.enabled` defaults to `false` so the chart change is a no-op until an operator opts in. Operators MUST set `networkPolicy.clusterCidrs` to match their cluster's Pod and Service CIDRs. ServiceAccount annotation documentation now spells out the GKE Workload Identity binding shape for operators wiring the image normaliser to GCS. No template change for that; the existing `serviceAccount.annotations` passthrough already covers it.
There was a problem hiding this comment.
Summary
This PR introduces an egress NetworkPolicy template to restrict outbound traffic from the control-layer pod, and documents the GKE Workload Identity annotation pattern for ServiceAccount configuration. The NetworkPolicy allows public internet egress while blocking private ranges, loopback, link-local, CGNAT, and operator-configured cluster CIDRs — acting as a kernel-level backstop to the application's IP allow-list.
Verdict: Needs changes before merge — there is a significant security gap in the IPv6 handling that undermines the stated security goal.
Research notes
- Kubernetes NetworkPolicy documentation: Confirms that
ipBlockwithexceptworks as expected for excluding subnets. EachipBlockis evaluated independently — there's no cross-referencing between IPv4 and IPv6 rules. - GKE Workload Identity docs: The documented annotation approach (
iam.gke.io/gcp-service-account) remains valid, though Google now recommends federated identity via IAM principals as best practice. The documented roles (roles/storage.objectAdminandroles/iam.serviceAccountTokenCreator) are correct for GCS write access with V4 signed URLs. - Kubernetes 1.21+: The namespace selector using
kubernetes.io/metadata.namerequires Kubernetes 1.21 or later (this label was added in 1.21).
Suggested next steps
- Blocking: Add IPv6 cluster CIDR exclusion to match the IPv4 behavior (see inline comment on line 39)
- Non-blocking: Reconsider the default
clusterCidrsvalue —10.0.0.0/8is overly broad and duplicates an already-excluded RFC1918 range; operators should be required to explicitly set their actual Pod/Service CIDRs - Non-blocking: Document the Kubernetes version requirement (1.21+) for the kube-dns namespace selector, or use a more compatible selector pattern
General findings
- No test coverage: There are no unit tests in
tests/for the new NetworkPolicy template. Consider adding a test that validates the rendered policy structure whennetworkPolicy.enabled: true. - CNI compatibility note is good: The documentation correctly warns that a NetworkPolicy-aware CNI is required (Cilium, Calico, GKE Dataplane V2). This is helpful for operators.
- Pod selector alignment: The NetworkPolicy's
podSelectorcorrectly matches the deployment's selector labels (control-layer.selectorLabels+app.kubernetes.io/component: control-layer). This is correct and will target only control-layer pods.
| {{- range .Values.networkPolicy.clusterCidrs }} | ||
| - {{ . }} | ||
| {{- end }} | ||
| - ipBlock: |
There was a problem hiding this comment.
Blocking: IPv6 cluster CIDRs are not excluded, defeating the security purpose for IPv6 traffic.
Why it matters: The IPv4 rule (lines 22-38) excludes both standard private ranges AND operator-configured clusterCidrs to prevent in-cluster egress. However, the IPv6 rule (lines 39-47) only excludes loopback (::1/128), ULA (fc00::/7), and link-local (fe80::/10). If your cluster uses IPv6 Pod or Service CIDRs (e.g., fd00::/108 or similar), those would be allowed by the ::/0 rule because they're not in the except block. This creates an asymmetric security posture where IPv4 in-cluster egress is blocked but IPv6 in-cluster egress is permitted.
Suggested fix: Add a mechanism to exclude IPv6 cluster CIDRs. Options:
- Add a separate
networkPolicy.clusterCidrsIPv6value and range over it in the IPv6 except block - Support mixed IPv4/IPv6 entries in
clusterCidrsand use Helm logic to split them into the appropriate except blocks - Document that IPv6 clusters require manual modification of this template
| - to: | ||
| - namespaceSelector: | ||
| matchLabels: | ||
| kubernetes.io/metadata.name: kube-system |
There was a problem hiding this comment.
Non-blocking: Namespace metadata label requires Kubernetes 1.21+.
Why it matters: The kubernetes.io/metadata.name label on namespaces was added in Kubernetes 1.21. Clusters running older versions won't have this label, causing the DNS allow rule to fail to match and breaking name resolution inside the pod. Additionally, some distributions may use different labels for their DNS service (not k8s-app: kube-dns).
Suggested fix: Either:
- Add a note in the documentation specifying Kubernetes 1.21+ requirement for this feature
- Use a more compatible namespace selector (e.g., match on namespace name directly if supported by CNI)
- Add a configurable label override for clusters with non-standard DNS deployments
| kubernetes.io/metadata.name: kube-system | ||
| podSelector: | ||
| matchLabels: | ||
| k8s-app: kube-dns |
There was a problem hiding this comment.
Non-blocking: DNS pod label assumption may not match all clusters.
Why it matters: While k8s-app: kube-dns is common, some Kubernetes distributions use different labels for CoreDNS. For example, some use k8s-app: coredns or other variations. If the label doesn't match, DNS queries will be blocked even with allowKubeDns: true.
Suggested fix: Consider making the DNS pod selector configurable via values, or document that operators may need to adjust this label based on their cluster's DNS deployment.
| # Cluster pod/service CIDRs to deny egress to. Override per cluster. | ||
| # Typical GKE defaults are 10.0.0.0/8 covering both. | ||
| clusterCidrs: | ||
| - 10.0.0.0/8 |
There was a problem hiding this comment.
Non-blocking: Default clusterCidrs value is misleading and partially redundant.
Why it matters: The comment states "Typical GKE defaults are 10.0.0.0/8 covering both" (Pod and Service CIDRs). However:
10.0.0.0/8is already excluded by the explicit RFC1918 rule at line 26, making this default redundant for IPv4- More critically, it gives operators a false sense of correctness — if their actual Pod CIDR is
172.16.0.0/12or192.168.0.0/16, in-cluster egress would still be allowed because those ranges are also RFC1918 and already excluded - For IPv6 clusters, this default provides zero protection since it's an IPv4 CIDR
Suggested fix: Change the default to an empty array [] with a comment requiring operators to explicitly set their cluster's Pod and Service CIDRs. This makes the configuration intent clearer and prevents accidental misconfiguration.
| # even if a bug in the in-process IP allow-list let a request through, | ||
| # the kernel-level network policy refuses the packet. | ||
| # | ||
| # Operators MUST set `clusterCidrs` for their cluster — typically the |
There was a problem hiding this comment.
Nit: Character encoding artifact in comment.
Why it matters: The em-dash character appears to have an encoding issue (shows as M-bM-^@M-^T in raw diff output). While this renders correctly in most editors, it's worth ensuring consistent UTF-8 encoding.
Suggested fix: Replace with a standard ASCII dash or verify file encoding is UTF-8.
Adds a brief note alongside the `serviceAccount.annotations` block explaining that dwctl does not garbage-collect normaliser objects — operators configure a GCS Object Lifecycle Management rule on the bucket itself (Terraform / gcloud) to delete objects after enough days to outlive the longest possible batch dispatch window. Suggested default: 7 days (24h batch completion window + 6d investigation buffer). Operators tighten or extend per their own retention policies. No behavioural change.
There was a problem hiding this comment.
Summary
This PR introduces an opt-in egress NetworkPolicy for the control-layer pod and documents the GKE Workload Identity wiring for the image normaliser service account. The NetworkPolicy acts as a CNI-level backstop to the application's IP allow-list, blocking egress to private ranges (RFC1918, loopback, link-local, CGNAT), IPv6 ULA/LL, and operator-configured cluster CIDRs while permitting public internet egress.
Verdict: Approved with one non-blocking clarification recommended.
The changes are well-documented, follow Helm best practices, and align with Kubernetes NetworkPolicy semantics. The default-disabled behaviour ensures backward compatibility, and the allowKubeDns rule correctly preserves DNS resolution.
Research notes
- Kubernetes NetworkPolicy documentation: Confirmed that
ipBlockwithexceptentries is the correct mechanism for excluding specific CIDR ranges from allowed egress. The policy only affects pods with matching labels; other pods are unaffected. - CoreDNS/kube-dns labels: The kube-dns selector (
kubernetes.io/metadata.name: kube-system+k8s-app: kube-dns) matches the standard CoreDNS deployment in most clusters. However, some distributions (e.g. EKS withkube-dns, RKE2 withcoredns) may use different labels likek8s-app: corednsoreks.amazonaws.com/component: kube-dns. - GKE Workload Identity: The documented annotations (
iam.gke.io/gcp-service-account) and required IAM roles (roles/storage.objectAdmin,roles/iam.serviceAccountTokenCreator) are correct per Google Cloud documentation.
Suggested next steps
- (Non-blocking) Consider documenting alternative DNS selector labels in comments or README for clusters not using the
k8s-app: kube-dnslabel pattern (e.g., EKS, RKE2, or clusters using NodeLocal DNSCache). - Operators enabling this policy should validate their cluster's actual Pod and Service CIDRs before setting
networkPolicy.clusterCidrs— the default10.0.0.0/8covers typical GKE clusters but may be too broad or narrow for other environments.
General findings
No blocking issues identified. The implementation is sound:
- The NetworkPolicy correctly targets only control-layer pods via the combined selector labels.
- IPv4 and IPv6 private ranges are comprehensively covered.
- The
allowKubeDnsconditional defaults totrue, preventing accidental DNS breakage. - Documentation in both
values.yamlcommits clearly explains the security model and operational requirements.
General findings (auto-demoted from inline due to pre-validation)
- Non-blocking
values.yaml:369— The comment says "Typical GKE defaults are 10.0.0.0/8 covering both" (Pod and Service CIDRs). This is accurate for many GKE clusters, but operators should verify their actual CIDRs before enabling.- (demoted: code self-check failed at values.yaml:369: diff has
# Typical GKE defaults are 10.0.0.0/8 covering both., model claimed# Cluster pod/service CIDRs to deny egress to. Override per cluster.)
- (demoted: code self-check failed at values.yaml:369: diff has
| matchLabels: | ||
| kubernetes.io/metadata.name: kube-system | ||
| podSelector: | ||
| matchLabels: |
There was a problem hiding this comment.
Non-blocking: The kube-dns selector uses k8s-app: kube-dns, which matches the standard CoreDNS deployment in many clusters. However, some managed Kubernetes providers use different labels:
- EKS:
k8s-app: kube-dns(correct) but also addseks.amazonaws.com/component: kube-dns - RKE2: Uses
k8s-app: coredns(lowercase) - NodeLocal DNSCache: Creates a DaemonSet with different selectors
Why it matters: If an operator enables this NetworkPolicy on a cluster where the DNS pods don't match k8s-app: kube-dns, all outbound DNS queries will fail, breaking name resolution for the control-layer pod.
Suggested fix: Consider either:
- Adding a note in the values.yaml documentation to check
kubectl get pods -n kube-system --show-labelsand adjust the template if needed - Or making the DNS selector configurable via
.Values.networkPolicy.dnsSelectorfor clusters with non-standard DNS deployments
Alternatively, document this as a known requirement: "Operators must verify their cluster's DNS pod labels match k8s-app: kube-dns or modify the template accordingly."
The previous default `clusterCidrs: [10.0.0.0/8]` was redundant with the hardcoded RFC1918 exception already in the NetworkPolicy template, producing a duplicate `- 10.0.0.0/8` line in the rendered manifest and (more importantly) suggesting to operators that they MUST set this even on standard GKE clusters. They do not — the hardcoded exception covers RFC1918 ranges (Pod CIDR + Service CIDR for any typical GKE cluster). `clusterCidrs` is only meaningful for clusters whose Pod or Service CIDRs fall OUTSIDE RFC1918 / CGNAT / IPv6 ULA+LL (e.g. dual-stack with public IPv6, custom CIDR plans). Defaulted to an empty list and the comment now explains exactly when to set it. `helm lint .` clean; the rendered policy contains no duplicate CIDR entries with the default.
There was a problem hiding this comment.
Summary
This PR adds an egress NetworkPolicy template for the control-layer pod and enhances documentation around ServiceAccount annotations (GKE Workload Identity) and GCS bucket lifecycle management. The NetworkPolicy is a security hardening feature that blocks egress to private/loopback/link-local ranges while allowing public internet access, serving as a kernel-level backstop to the application's IP allow-list.
The changes are well-documented with clear comments explaining when and how to use each feature. The follow-up commit fixing the clusterCidrs default from [10.0.0.0/8] to [] correctly addresses potential duplicate CIDR entries in rendered manifests.
Verdict: Approved with one non-blocking finding regarding DNS label assumptions that operators should be aware of.
Research notes
- Kubernetes NetworkPolicy documentation confirms that when only
Egressis specified inpolicyTypes, ingress remains unrestricted (which is the intended behavior here). TheipBlock.cidrwithexceptarrays work as expected for blocking specific ranges. - kube-dns labeling: The default CoreDNS deployment in vanilla Kubernetes uses
k8s-app: kube-dnsin thekube-systemnamespace. However, some distributions differ:- OpenShift uses
dns.operator.openshift.io/daemonset-dns=defaultin theopenshift-dnsnamespace - Some managed clusters may use different labels or namespaces
- OpenShift uses
- GKE Workload Identity: The documented annotation shape (
iam.gke.io/gcp-service-account) and required IAM roles (roles/storage.objectAdminon bucket,roles/iam.serviceAccountTokenCreatoron the GCP service account itself) align with Google's official documentation for V4 signed URL workflows.
Suggested next steps
- Non-blocking: Consider documenting alternative DNS selector patterns for non-standard Kubernetes distributions (OpenShift, EKS with custom DNS, etc.) in the values.yaml comments or chart README.
- Operators enabling this NetworkPolicy should verify their cluster's DNS pod labels match
k8s-app: kube-dnsin namespacekube-system, or adjust the template accordingly.
General findings
- The NetworkPolicy selector correctly matches the control-layer deployment pods (uses
control-layer.selectorLabels+app.kubernetes.io/component: control-layer, which aligns with the pod template labels in deployment.yaml). - The hardcoded RFC1918 exceptions (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) plus loopback, link-local, and CGNAT ranges provide comprehensive coverage for standard cluster configurations.
- The IPv6 rules appropriately block ULA (fc00::/7) and link-local (fe80::/10) while allowing global unicast.
- Default
enabled: falseensures this is opt-in and doesn't break existing deployments. - The comment about requiring a NetworkPolicy-aware CNI (Cilium, Calico, GKE Dataplane V2) is accurate — vanilla kubenet doesn't enforce NetworkPolicy.
| kubernetes.io/metadata.name: kube-system | ||
| podSelector: | ||
| matchLabels: | ||
| k8s-app: kube-dns |
There was a problem hiding this comment.
Non-blocking: This NetworkPolicy assumes your cluster's DNS pods use the label k8s-app: kube-dns in the kube-system namespace, which is correct for vanilla Kubernetes and GKE with default CoreDNS. However, some distributions differ:
Why it matters: If your cluster uses a different DNS deployment (e.g., OpenShift uses dns.operator.openshift.io/daemonset-dns=default in openshift-dns namespace), enabling this NetworkPolicy without adjustment will break in-pod DNS resolution, preventing all outbound traffic that requires name lookup.
Suggested fix: Add a note in the values.yaml comment block (near line 380) alerting operators to verify their DNS pod labels before enabling, or consider making the DNS selector configurable:
# In values.yaml
dnsSelector:
namespace: kube-system
podLabels:
k8s-app: kube-dnsThen template it dynamically. For now, at minimum document this assumption for operators.
The egress NetworkPolicy's kube-dns allow rule selects on `kube-system / k8s-app=kube-dns`, which matches vanilla Kubernetes and CoreDNS on most distributions but NOT OpenShift's `openshift-dns / dns.operator.openshift.io/daemonset-dns=default` or some managed offerings. Adds an inline note so operators enabling NetworkPolicy on a non-standard distribution don't quietly break in-pod DNS resolution. Suggests two ways out: disable the rule and add an equivalent NetworkPolicy out-of-band, or fork with the right selector. No behavioural change; documentation only.
There was a problem hiding this comment.
Summary
This PR introduces an opt-in egress NetworkPolicy for the control-layer pod, along with documentation updates for the image normaliser configuration and DNS selector limitations. The core security feature (egress restriction as a backstop for the application-layer IP allow-list) is well-designed and follows defense-in-depth principles.
Verdict: Needs changes before merge — there is a critical issue that would break Workload Identity on GKE Dataplane V2 clusters.
Research notes
-
Fetched Kubernetes Network Policies documentation — confirms
ipBlockwithexceptsemantics and CNI-dependent behavior. -
Fetched GKE Network Policy documentation — specifically the section "Network policy and Workload Identity Federation for GKE" which states:
For clusters running GKE Dataplane V2, allow egress to
169.254.169.254/32on ports80and8080.The current template blocks the entire
169.254.0.0/16link-local range, which includes this metadata server IP. Without access to the metadata server, Workload Identity cannot exchange tokens for GCP service accounts, breaking GCS writes from the image normaliser.
Suggested next steps
- Blocking: Add an explicit allow rule for the GKE metadata server (
169.254.169.254/32on ports 80/8080) when Workload Identity is in use, or narrow the link-local exception to exclude this IP. This is required for any GKE cluster using Workload Identity with NetworkPolicy enabled. - Non-blocking: Add helm-unittest test cases for the new
networkpolicy.yamltemplate covering enabled/disabled scenarios,allowKubeDnstoggle, andclusterCidrsrendering.
General findings
- The DNS selector limitation for non-standard distributions (OpenShift, managed offerings) is well-documented in
values.yaml:383-391. Operators are appropriately warned. - The
clusterCidrsdefault change to empty list is correct — RFC1918 ranges are already covered by the hardcoded exceptions, avoiding duplicate CIDR entries. - The ServiceAccount annotation documentation for Workload Identity (
values.yaml:18-34) correctly describes the required IAM roles (storage.objectAdminandiam.serviceAccountTokenCreator).
| - 192.168.0.0/16 | ||
| # Loopback | ||
| - 127.0.0.0/8 | ||
| # Link-local — covers GCE / AWS metadata at 169.254.169.254 |
There was a problem hiding this comment.
Blocking: This comment is misleading and the rule breaks Workload Identity on GKE Dataplane V2.
Why it matters: The 169.254.0.0/16 exception denies egress to this range (it's in the except list under 0.0.0.0/0). However, GKE Dataplane V2 requires pods using Workload Identity to reach the metadata server at 169.254.169.254 on ports 80/8080 for token exchange. The GKE documentation explicitly states:
For clusters running GKE Dataplane V2, allow egress to
169.254.169.254/32on ports80and8080.
Blocking this IP prevents the image normaliser from obtaining GCP service account credentials, breaking GCS writes entirely.
Suggested fix: Either (a) add a separate allow rule for the metadata server before the broad deny, or (b) carve out 169.254.169.254/32 from the link-local exception:
# Allow GKE metadata server for Workload Identity
- to:
- ipBlock:
cidr: 169.254.169.254/32
ports:
- protocol: TCP
port: 80
- protocol: TCP
port: 8080Alternatively, make this rule conditional on a new allowGcpMetadata value that defaults to true for GKE compatibility.
| # Loopback | ||
| - 127.0.0.0/8 | ||
| # Link-local — covers GCE / AWS metadata at 169.254.169.254 | ||
| - 169.254.0.0/16 |
There was a problem hiding this comment.
Blocking: This CIDR exception denies egress to the GKE metadata server required for Workload Identity.
Why it matters: Combined with line 31's comment, this rule blocks 169.254.169.254/32 which is the GKE metadata server endpoint. Per GKE documentation, Workload Identity on Dataplane V2 clusters requires egress to this IP on ports 80/8080. Without it, the pod cannot obtain GCP service account tokens, breaking GCS object writes from the image normaliser.
Suggested fix: Narrow this exception to exclude the metadata server IP, or add a separate allow rule for 169.254.169.254/32 on ports 80/8080 before this deny rule takes effect.
…cluster Postgres Enabling networkPolicy while `postgresql.enabled: true` blocks the control-layer pod from reaching the in-cluster Postgres StatefulSet (its Pod IP / ClusterIP are RFC1918, which the policy denies), causing immediate startup connection failures. The feature is opt-in and aimed at deployments using external managed Postgres (Neon) + public-internet provider APIs, but the conflict wasn't documented. Added a prominent warning so operators don't enable it on an internal-Postgres deployment and get a confusing startup failure. No behavioural change.
There was a problem hiding this comment.
Summary
This PR introduces an opt-in egress NetworkPolicy for the control-layer pod, along with comprehensive documentation for GKE Workload Identity wiring and important operational warnings. The changes are well-structured and security-focused:
- New NetworkPolicy template (
templates/networkpolicy.yaml) - Provides kernel-level egress restrictions as a backstop to the application-layer IP allow-list - Documentation updates in
values.yaml- Covers Workload Identity annotation format, bucket lifecycle management, and critical warnings about in-cluster Postgres incompatibility - Default configuration -
clusterCidrsdefaulted to empty list with clear guidance on when to set it
Verdict: Ready to approve with one recommended addition (test coverage for the new NetworkPolicy template).
Research notes
- Kubernetes NetworkPolicy documentation: Confirmed the
ipBlockstructure withcidrandexceptarrays is valid fornetworking.k8s.io/v1. The policy structure in this PR follows the standard pattern. - GKE Workload Identity documentation: The documented annotation
iam.gke.io/gcp-service-accountand required IAM roles (roles/storage.objectAdminon the bucket,roles/iam.serviceAccountTokenCreatoron the service account) align with Google Cloud's official guidance for federated identity access to GCS. - RFC1918 ranges: The hardcoded exceptions (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) correctly cover all private IPv4 space. The warning about in-cluster Postgres incompatibility is accurate — ClusterIP services receive RFC1918 addresses that would be denied by this policy.
Suggested next steps
- Add NetworkPolicy test coverage (Recommended) — Create a test file in
tests/networkpolicy_test.yamlto verify the template renders correctly when enabled/disabled and validates the kube-dns rule behavior. - Consider adding a helm-test or integration test that explicitly validates the incompatibility scenario (networkPolicy.enabled + postgresql.enabled) fails fast with a clear error message.
General findings
Test coverage gap: The new templates/networkpolicy.yaml has no corresponding test file in the tests/ directory. Given this is a security-critical control (the "backstop" for the image fetcher's IP allow-list), having assertions that verify:
- The policy is not rendered when
networkPolicy.enabled: false(default) - The policy renders with correct CIDR exceptions when enabled
- The kube-dns rule is present/absent based on
allowKubeDns
would provide confidence that future changes don't accidentally break the security boundary.
Overall, this is a solid security enhancement with excellent operational documentation. The warnings about in-cluster Postgres incompatibility and non-standard DNS selectors are particularly valuable — they prevent confusing startup failures that would otherwise require debugging.
| - to: | ||
| - namespaceSelector: | ||
| matchLabels: | ||
| kubernetes.io/metadata.name: kube-system |
There was a problem hiding this comment.
Non-blocking: The kube-dns selector uses kubernetes.io/metadata.name: kube-system which requires Kubernetes v1.16+ (when this label was auto-added to all namespaces). For clusters running older versions, this rule won't match and DNS will break.
Why it matters: The comment mentions OpenShift and managed offerings using different labels, but doesn't note the minimum K8s version requirement. Most production clusters are well past v1.16, but it's worth noting for users on legacy distributions.
Suggested fix: Add a brief note in the comment above line 48 mentioning the v1.16+ requirement, or consider using the older namespaceSelector with matchNames: ["kube-system"] if backward compatibility is needed.
| # even if a bug in the in-process IP allow-list let a request through, | ||
| # the kernel-level network policy refuses the packet. | ||
| # | ||
| # ⚠️ INCOMPATIBLE WITH IN-CLUSTER POSTGRES. Because the policy denies |
There was a problem hiding this comment.
Non-blocking: The warning is prominent and accurate, but operators who miss this inline comment will hit a confusing startup failure. Consider adding a Helm template validation that fails fast with a clear error message when both are enabled.
Why it matters: A user enabling networkPolicy.enabled: true without reading the full comment block will get connection timeouts to Postgres rather than a clear "this configuration is invalid" error. Helm chart best practices often include validation logic in _helpers.tpl for incompatible value combinations.
Suggested fix: Add a fail condition in a template (e.g., in _helpers.tpl or as a ConfigMap that always renders) like:
{{- if and .Values.networkPolicy.enabled .Values.postgresql.enabled }}
{{- fail "networkPolicy.enabled and postgresql.enabled are mutually exclusive. Use external Postgres (e.g. Neon) with networkPolicy, or disable networkPolicy for in-cluster Postgres." }}
{{- end }}This turns a runtime debugging session into an immediate helm install failure.
| # Carrier-grade NAT (RFC 6598) | ||
| - 100.64.0.0/10 | ||
| # In-cluster Pod / Service CIDRs (operator-configured) | ||
| {{- range .Values.networkPolicy.clusterCidrs }} |
There was a problem hiding this comment.
Nit: Minor YAML indentation inconsistency — the ranged CIDR entries (line 37) are indented with 14 spaces, matching the static RFC1918 entries above, which is correct. However, the closing {{- end }} on line 38 has no trailing content after it, so the next line (39) starts a new - ipBlock: entry. This is fine, but worth double-checking that the rendered YAML has proper structure.
The template looks correct as-is; this is just a note to verify the output once with helm template . before merging.
Summary
Adds an opt-in egress NetworkPolicy for the control-layer pod and documents the existing ServiceAccount annotation hook for GKE Workload Identity.
Changes
templates/networkpolicy.yaml(new) — egress deny-list policy. Allows broad public-internet egress (provider APIs, managed Postgres, GCS) but denies private / loopback / link-local / CGNAT / IPv6 ULA + LL ranges and operator-configured cluster CIDRs. Explicitkube-dnsallow so the cluster-CIDR deny doesn't break in-pod name resolution.values.yaml— new top-levelnetworkPolicyblock (defaulted off). New documentation inserviceAccount.annotationsshowing the WI binding shape for operators wiring the image normaliser to GCS.Why
This is the application-layer backstop for the in-process image fetcher's IP allow-list. Even if a bug let a request through the deny-list inside the process, the CNI refuses the packet.
Requires a NetworkPolicy-aware CNI (Cilium, Calico, GKE Dataplane V2).
Compatibility
networkPolicy.enabled: falseby default — no behavioural change for existing deploymentsnetworkPolicy.clusterCidrswhen enabling, to match their cluster's Pod and Service CIDRsnetworkPolicy.allowKubeDns: trueby default; only disable if the operator knows what they're doingTest plan