Skip to content

feat(chart): image normaliser config and egress NetworkPolicy#74

Draft
pjb157 wants to merge 5 commits into
mainfrom
feat/image-normalizer-and-egress-policy
Draft

feat(chart): image normaliser config and egress NetworkPolicy#74
pjb157 wants to merge 5 commits into
mainfrom
feat/image-normalizer-and-egress-policy

Conversation

@pjb157

@pjb157 pjb157 commented May 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds an opt-in egress NetworkPolicy for the control-layer pod and documents the existing ServiceAccount annotation hook for GKE Workload Identity.

Changes

  • templates/networkpolicy.yaml (new) — egress deny-list policy. Allows broad public-internet egress (provider APIs, managed Postgres, GCS) but denies private / loopback / link-local / CGNAT / IPv6 ULA + LL ranges and operator-configured cluster CIDRs. Explicit kube-dns allow so the cluster-CIDR deny doesn't break in-pod name resolution.
  • values.yaml — new top-level networkPolicy block (defaulted off). New documentation in serviceAccount.annotations showing the WI binding shape for operators wiring the image normaliser to GCS.

Why

This is the application-layer backstop for the in-process image fetcher's IP allow-list. Even if a bug let a request through the deny-list inside the process, the CNI refuses the packet.

Requires a NetworkPolicy-aware CNI (Cilium, Calico, GKE Dataplane V2).

Compatibility

  • networkPolicy.enabled: false by default — no behavioural change for existing deployments
  • Operators MUST set networkPolicy.clusterCidrs when enabling, to match their cluster's Pod and Service CIDRs
  • networkPolicy.allowKubeDns: true by default; only disable if the operator knows what they're doing

Test plan

  • `helm lint .` clean
  • `helm template . --set networkPolicy.enabled=true --set 'networkPolicy.clusterCidrs={10.0.0.0/8,10.96.0.0/16}'` renders with expected CIDRs
  • `helm template . --set networkPolicy.enabled=false` omits the policy
  • CI lint

Adds an opt-in egress NetworkPolicy template that restricts the
control-layer pod's outbound traffic, and documents the existing
ServiceAccount annotation hook for GKE Workload Identity.

The NetworkPolicy allows broad public-internet egress (provider APIs,
managed Postgres, GCS) but denies private / loopback / link-local /
CGNAT / IPv6 ULA + LL ranges and operator-configured cluster CIDRs.
It's the application-layer backstop for the in-process image fetcher's
IP allow-list: even if a bug let a request through the deny-list
inside the process, the CNI refuses the packet.

`networkPolicy.allowKubeDns: true` (the default) emits an explicit
allow rule for kube-dns, because the cluster-CIDR deny would
otherwise break in-pod name resolution.

`networkPolicy.enabled` defaults to `false` so the chart change is a
no-op until an operator opts in. Operators MUST set
`networkPolicy.clusterCidrs` to match their cluster's Pod and
Service CIDRs.

ServiceAccount annotation documentation now spells out the GKE
Workload Identity binding shape for operators wiring the image
normaliser to GCS. No template change for that; the existing
`serviceAccount.annotations` passthrough already covers it.

@doubleword-code doubleword-code Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces an egress NetworkPolicy template to restrict outbound traffic from the control-layer pod, and documents the GKE Workload Identity annotation pattern for ServiceAccount configuration. The NetworkPolicy allows public internet egress while blocking private ranges, loopback, link-local, CGNAT, and operator-configured cluster CIDRs — acting as a kernel-level backstop to the application's IP allow-list.

Verdict: Needs changes before merge — there is a significant security gap in the IPv6 handling that undermines the stated security goal.

Research notes

  • Kubernetes NetworkPolicy documentation: Confirms that ipBlock with except works as expected for excluding subnets. Each ipBlock is evaluated independently — there's no cross-referencing between IPv4 and IPv6 rules.
  • GKE Workload Identity docs: The documented annotation approach (iam.gke.io/gcp-service-account) remains valid, though Google now recommends federated identity via IAM principals as best practice. The documented roles (roles/storage.objectAdmin and roles/iam.serviceAccountTokenCreator) are correct for GCS write access with V4 signed URLs.
  • Kubernetes 1.21+: The namespace selector using kubernetes.io/metadata.name requires Kubernetes 1.21 or later (this label was added in 1.21).

Suggested next steps

  1. Blocking: Add IPv6 cluster CIDR exclusion to match the IPv4 behavior (see inline comment on line 39)
  2. Non-blocking: Reconsider the default clusterCidrs value — 10.0.0.0/8 is overly broad and duplicates an already-excluded RFC1918 range; operators should be required to explicitly set their actual Pod/Service CIDRs
  3. Non-blocking: Document the Kubernetes version requirement (1.21+) for the kube-dns namespace selector, or use a more compatible selector pattern

General findings

  • No test coverage: There are no unit tests in tests/ for the new NetworkPolicy template. Consider adding a test that validates the rendered policy structure when networkPolicy.enabled: true.
  • CNI compatibility note is good: The documentation correctly warns that a NetworkPolicy-aware CNI is required (Cilium, Calico, GKE Dataplane V2). This is helpful for operators.
  • Pod selector alignment: The NetworkPolicy's podSelector correctly matches the deployment's selector labels (control-layer.selectorLabels + app.kubernetes.io/component: control-layer). This is correct and will target only control-layer pods.

{{- range .Values.networkPolicy.clusterCidrs }}
- {{ . }}
{{- end }}
- ipBlock:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: IPv6 cluster CIDRs are not excluded, defeating the security purpose for IPv6 traffic.

Why it matters: The IPv4 rule (lines 22-38) excludes both standard private ranges AND operator-configured clusterCidrs to prevent in-cluster egress. However, the IPv6 rule (lines 39-47) only excludes loopback (::1/128), ULA (fc00::/7), and link-local (fe80::/10). If your cluster uses IPv6 Pod or Service CIDRs (e.g., fd00::/108 or similar), those would be allowed by the ::/0 rule because they're not in the except block. This creates an asymmetric security posture where IPv4 in-cluster egress is blocked but IPv6 in-cluster egress is permitted.

Suggested fix: Add a mechanism to exclude IPv6 cluster CIDRs. Options:

  1. Add a separate networkPolicy.clusterCidrsIPv6 value and range over it in the IPv6 except block
  2. Support mixed IPv4/IPv6 entries in clusterCidrs and use Helm logic to split them into the appropriate except blocks
  3. Document that IPv6 clusters require manual modification of this template

- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: Namespace metadata label requires Kubernetes 1.21+.

Why it matters: The kubernetes.io/metadata.name label on namespaces was added in Kubernetes 1.21. Clusters running older versions won't have this label, causing the DNS allow rule to fail to match and breaking name resolution inside the pod. Additionally, some distributions may use different labels for their DNS service (not k8s-app: kube-dns).

Suggested fix: Either:

  1. Add a note in the documentation specifying Kubernetes 1.21+ requirement for this feature
  2. Use a more compatible namespace selector (e.g., match on namespace name directly if supported by CNI)
  3. Add a configurable label override for clusters with non-standard DNS deployments

kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: DNS pod label assumption may not match all clusters.

Why it matters: While k8s-app: kube-dns is common, some Kubernetes distributions use different labels for CoreDNS. For example, some use k8s-app: coredns or other variations. If the label doesn't match, DNS queries will be blocked even with allowKubeDns: true.

Suggested fix: Consider making the DNS pod selector configurable via values, or document that operators may need to adjust this label based on their cluster's DNS deployment.

Comment thread values.yaml Outdated
# Cluster pod/service CIDRs to deny egress to. Override per cluster.
# Typical GKE defaults are 10.0.0.0/8 covering both.
clusterCidrs:
- 10.0.0.0/8

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: Default clusterCidrs value is misleading and partially redundant.

Why it matters: The comment states "Typical GKE defaults are 10.0.0.0/8 covering both" (Pod and Service CIDRs). However:

  1. 10.0.0.0/8 is already excluded by the explicit RFC1918 rule at line 26, making this default redundant for IPv4
  2. More critically, it gives operators a false sense of correctness — if their actual Pod CIDR is 172.16.0.0/12 or 192.168.0.0/16, in-cluster egress would still be allowed because those ranges are also RFC1918 and already excluded
  3. For IPv6 clusters, this default provides zero protection since it's an IPv4 CIDR

Suggested fix: Change the default to an empty array [] with a comment requiring operators to explicitly set their cluster's Pod and Service CIDRs. This makes the configuration intent clearer and prevents accidental misconfiguration.

Comment thread values.yaml Outdated
# even if a bug in the in-process IP allow-list let a request through,
# the kernel-level network policy refuses the packet.
#
# Operators MUST set `clusterCidrs` for their cluster — typically the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Character encoding artifact in comment.

Why it matters: The em-dash character appears to have an encoding issue (shows as M-bM-^@M-^T in raw diff output). While this renders correctly in most editors, it's worth ensuring consistent UTF-8 encoding.

Suggested fix: Replace with a standard ASCII dash or verify file encoding is UTF-8.

Adds a brief note alongside the `serviceAccount.annotations` block
explaining that dwctl does not garbage-collect normaliser objects —
operators configure a GCS Object Lifecycle Management rule on the
bucket itself (Terraform / gcloud) to delete objects after enough
days to outlive the longest possible batch dispatch window.

Suggested default: 7 days (24h batch completion window + 6d
investigation buffer). Operators tighten or extend per their own
retention policies.

No behavioural change.

@doubleword-code doubleword-code Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces an opt-in egress NetworkPolicy for the control-layer pod and documents the GKE Workload Identity wiring for the image normaliser service account. The NetworkPolicy acts as a CNI-level backstop to the application's IP allow-list, blocking egress to private ranges (RFC1918, loopback, link-local, CGNAT), IPv6 ULA/LL, and operator-configured cluster CIDRs while permitting public internet egress.

Verdict: Approved with one non-blocking clarification recommended.

The changes are well-documented, follow Helm best practices, and align with Kubernetes NetworkPolicy semantics. The default-disabled behaviour ensures backward compatibility, and the allowKubeDns rule correctly preserves DNS resolution.

Research notes

  • Kubernetes NetworkPolicy documentation: Confirmed that ipBlock with except entries is the correct mechanism for excluding specific CIDR ranges from allowed egress. The policy only affects pods with matching labels; other pods are unaffected.
  • CoreDNS/kube-dns labels: The kube-dns selector (kubernetes.io/metadata.name: kube-system + k8s-app: kube-dns) matches the standard CoreDNS deployment in most clusters. However, some distributions (e.g. EKS with kube-dns, RKE2 with coredns) may use different labels like k8s-app: coredns or eks.amazonaws.com/component: kube-dns.
  • GKE Workload Identity: The documented annotations (iam.gke.io/gcp-service-account) and required IAM roles (roles/storage.objectAdmin, roles/iam.serviceAccountTokenCreator) are correct per Google Cloud documentation.

Suggested next steps

  1. (Non-blocking) Consider documenting alternative DNS selector labels in comments or README for clusters not using the k8s-app: kube-dns label pattern (e.g., EKS, RKE2, or clusters using NodeLocal DNSCache).
  2. Operators enabling this policy should validate their cluster's actual Pod and Service CIDRs before setting networkPolicy.clusterCidrs — the default 10.0.0.0/8 covers typical GKE clusters but may be too broad or narrow for other environments.

General findings

No blocking issues identified. The implementation is sound:

  • The NetworkPolicy correctly targets only control-layer pods via the combined selector labels.
  • IPv4 and IPv6 private ranges are comprehensively covered.
  • The allowKubeDns conditional defaults to true, preventing accidental DNS breakage.
  • Documentation in both values.yaml commits clearly explains the security model and operational requirements.

General findings (auto-demoted from inline due to pre-validation)

  • Non-blocking values.yaml:369 — The comment says "Typical GKE defaults are 10.0.0.0/8 covering both" (Pod and Service CIDRs). This is accurate for many GKE clusters, but operators should verify their actual CIDRs before enabling.
    • (demoted: code self-check failed at values.yaml:369: diff has # Typical GKE defaults are 10.0.0.0/8 covering both., model claimed # Cluster pod/service CIDRs to deny egress to. Override per cluster.)

matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: The kube-dns selector uses k8s-app: kube-dns, which matches the standard CoreDNS deployment in many clusters. However, some managed Kubernetes providers use different labels:

  • EKS: k8s-app: kube-dns (correct) but also adds eks.amazonaws.com/component: kube-dns
  • RKE2: Uses k8s-app: coredns (lowercase)
  • NodeLocal DNSCache: Creates a DaemonSet with different selectors

Why it matters: If an operator enables this NetworkPolicy on a cluster where the DNS pods don't match k8s-app: kube-dns, all outbound DNS queries will fail, breaking name resolution for the control-layer pod.

Suggested fix: Consider either:

  1. Adding a note in the values.yaml documentation to check kubectl get pods -n kube-system --show-labels and adjust the template if needed
  2. Or making the DNS selector configurable via .Values.networkPolicy.dnsSelector for clusters with non-standard DNS deployments

Alternatively, document this as a known requirement: "Operators must verify their cluster's DNS pod labels match k8s-app: kube-dns or modify the template accordingly."

The previous default `clusterCidrs: [10.0.0.0/8]` was redundant with
the hardcoded RFC1918 exception already in the NetworkPolicy template,
producing a duplicate `- 10.0.0.0/8` line in the rendered manifest
and (more importantly) suggesting to operators that they MUST set
this even on standard GKE clusters. They do not — the hardcoded
exception covers RFC1918 ranges (Pod CIDR + Service CIDR for any
typical GKE cluster).

`clusterCidrs` is only meaningful for clusters whose Pod or Service
CIDRs fall OUTSIDE RFC1918 / CGNAT / IPv6 ULA+LL (e.g. dual-stack
with public IPv6, custom CIDR plans). Defaulted to an empty list and
the comment now explains exactly when to set it.

`helm lint .` clean; the rendered policy contains no duplicate CIDR
entries with the default.

@doubleword-code doubleword-code Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR adds an egress NetworkPolicy template for the control-layer pod and enhances documentation around ServiceAccount annotations (GKE Workload Identity) and GCS bucket lifecycle management. The NetworkPolicy is a security hardening feature that blocks egress to private/loopback/link-local ranges while allowing public internet access, serving as a kernel-level backstop to the application's IP allow-list.

The changes are well-documented with clear comments explaining when and how to use each feature. The follow-up commit fixing the clusterCidrs default from [10.0.0.0/8] to [] correctly addresses potential duplicate CIDR entries in rendered manifests.

Verdict: Approved with one non-blocking finding regarding DNS label assumptions that operators should be aware of.

Research notes

  • Kubernetes NetworkPolicy documentation confirms that when only Egress is specified in policyTypes, ingress remains unrestricted (which is the intended behavior here). The ipBlock.cidr with except arrays work as expected for blocking specific ranges.
  • kube-dns labeling: The default CoreDNS deployment in vanilla Kubernetes uses k8s-app: kube-dns in the kube-system namespace. However, some distributions differ:
    • OpenShift uses dns.operator.openshift.io/daemonset-dns=default in the openshift-dns namespace
    • Some managed clusters may use different labels or namespaces
  • GKE Workload Identity: The documented annotation shape (iam.gke.io/gcp-service-account) and required IAM roles (roles/storage.objectAdmin on bucket, roles/iam.serviceAccountTokenCreator on the GCP service account itself) align with Google's official documentation for V4 signed URL workflows.

Suggested next steps

  1. Non-blocking: Consider documenting alternative DNS selector patterns for non-standard Kubernetes distributions (OpenShift, EKS with custom DNS, etc.) in the values.yaml comments or chart README.
  2. Operators enabling this NetworkPolicy should verify their cluster's DNS pod labels match k8s-app: kube-dns in namespace kube-system, or adjust the template accordingly.

General findings

  • The NetworkPolicy selector correctly matches the control-layer deployment pods (uses control-layer.selectorLabels + app.kubernetes.io/component: control-layer, which aligns with the pod template labels in deployment.yaml).
  • The hardcoded RFC1918 exceptions (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) plus loopback, link-local, and CGNAT ranges provide comprehensive coverage for standard cluster configurations.
  • The IPv6 rules appropriately block ULA (fc00::/7) and link-local (fe80::/10) while allowing global unicast.
  • Default enabled: false ensures this is opt-in and doesn't break existing deployments.
  • The comment about requiring a NetworkPolicy-aware CNI (Cilium, Calico, GKE Dataplane V2) is accurate — vanilla kubenet doesn't enforce NetworkPolicy.

kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: This NetworkPolicy assumes your cluster's DNS pods use the label k8s-app: kube-dns in the kube-system namespace, which is correct for vanilla Kubernetes and GKE with default CoreDNS. However, some distributions differ:

Why it matters: If your cluster uses a different DNS deployment (e.g., OpenShift uses dns.operator.openshift.io/daemonset-dns=default in openshift-dns namespace), enabling this NetworkPolicy without adjustment will break in-pod DNS resolution, preventing all outbound traffic that requires name lookup.

Suggested fix: Add a note in the values.yaml comment block (near line 380) alerting operators to verify their DNS pod labels before enabling, or consider making the DNS selector configurable:

# In values.yaml
dnsSelector:
  namespace: kube-system
  podLabels:
    k8s-app: kube-dns

Then template it dynamically. For now, at minimum document this assumption for operators.

The egress NetworkPolicy's kube-dns allow rule selects on
`kube-system / k8s-app=kube-dns`, which matches vanilla Kubernetes
and CoreDNS on most distributions but NOT OpenShift's
`openshift-dns / dns.operator.openshift.io/daemonset-dns=default` or
some managed offerings.

Adds an inline note so operators enabling NetworkPolicy on a
non-standard distribution don't quietly break in-pod DNS resolution.
Suggests two ways out: disable the rule and add an equivalent
NetworkPolicy out-of-band, or fork with the right selector.

No behavioural change; documentation only.

@doubleword-code doubleword-code Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces an opt-in egress NetworkPolicy for the control-layer pod, along with documentation updates for the image normaliser configuration and DNS selector limitations. The core security feature (egress restriction as a backstop for the application-layer IP allow-list) is well-designed and follows defense-in-depth principles.

Verdict: Needs changes before merge — there is a critical issue that would break Workload Identity on GKE Dataplane V2 clusters.

Research notes

  • Fetched Kubernetes Network Policies documentation — confirms ipBlock with except semantics and CNI-dependent behavior.

  • Fetched GKE Network Policy documentation — specifically the section "Network policy and Workload Identity Federation for GKE" which states:

    For clusters running GKE Dataplane V2, allow egress to 169.254.169.254/32 on ports 80 and 8080.

    The current template blocks the entire 169.254.0.0/16 link-local range, which includes this metadata server IP. Without access to the metadata server, Workload Identity cannot exchange tokens for GCP service accounts, breaking GCS writes from the image normaliser.

Suggested next steps

  1. Blocking: Add an explicit allow rule for the GKE metadata server (169.254.169.254/32 on ports 80/8080) when Workload Identity is in use, or narrow the link-local exception to exclude this IP. This is required for any GKE cluster using Workload Identity with NetworkPolicy enabled.
  2. Non-blocking: Add helm-unittest test cases for the new networkpolicy.yaml template covering enabled/disabled scenarios, allowKubeDns toggle, and clusterCidrs rendering.

General findings

  • The DNS selector limitation for non-standard distributions (OpenShift, managed offerings) is well-documented in values.yaml:383-391. Operators are appropriately warned.
  • The clusterCidrs default change to empty list is correct — RFC1918 ranges are already covered by the hardcoded exceptions, avoiding duplicate CIDR entries.
  • The ServiceAccount annotation documentation for Workload Identity (values.yaml:18-34) correctly describes the required IAM roles (storage.objectAdmin and iam.serviceAccountTokenCreator).

- 192.168.0.0/16
# Loopback
- 127.0.0.0/8
# Link-local — covers GCE / AWS metadata at 169.254.169.254

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: This comment is misleading and the rule breaks Workload Identity on GKE Dataplane V2.

Why it matters: The 169.254.0.0/16 exception denies egress to this range (it's in the except list under 0.0.0.0/0). However, GKE Dataplane V2 requires pods using Workload Identity to reach the metadata server at 169.254.169.254 on ports 80/8080 for token exchange. The GKE documentation explicitly states:

For clusters running GKE Dataplane V2, allow egress to 169.254.169.254/32 on ports 80 and 8080.

Blocking this IP prevents the image normaliser from obtaining GCP service account credentials, breaking GCS writes entirely.

Suggested fix: Either (a) add a separate allow rule for the metadata server before the broad deny, or (b) carve out 169.254.169.254/32 from the link-local exception:

# Allow GKE metadata server for Workload Identity
- to:
    - ipBlock:
        cidr: 169.254.169.254/32
  ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 8080

Alternatively, make this rule conditional on a new allowGcpMetadata value that defaults to true for GKE compatibility.

# Loopback
- 127.0.0.0/8
# Link-local — covers GCE / AWS metadata at 169.254.169.254
- 169.254.0.0/16

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: This CIDR exception denies egress to the GKE metadata server required for Workload Identity.

Why it matters: Combined with line 31's comment, this rule blocks 169.254.169.254/32 which is the GKE metadata server endpoint. Per GKE documentation, Workload Identity on Dataplane V2 clusters requires egress to this IP on ports 80/8080. Without it, the pod cannot obtain GCP service account tokens, breaking GCS object writes from the image normaliser.

Suggested fix: Narrow this exception to exclude the metadata server IP, or add a separate allow rule for 169.254.169.254/32 on ports 80/8080 before this deny rule takes effect.

…cluster Postgres

Enabling networkPolicy while `postgresql.enabled: true` blocks the
control-layer pod from reaching the in-cluster Postgres StatefulSet
(its Pod IP / ClusterIP are RFC1918, which the policy denies),
causing immediate startup connection failures.

The feature is opt-in and aimed at deployments using external managed
Postgres (Neon) + public-internet provider APIs, but the conflict
wasn't documented. Added a prominent warning so operators don't
enable it on an internal-Postgres deployment and get a confusing
startup failure. No behavioural change.

@doubleword-code doubleword-code Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces an opt-in egress NetworkPolicy for the control-layer pod, along with comprehensive documentation for GKE Workload Identity wiring and important operational warnings. The changes are well-structured and security-focused:

  1. New NetworkPolicy template (templates/networkpolicy.yaml) - Provides kernel-level egress restrictions as a backstop to the application-layer IP allow-list
  2. Documentation updates in values.yaml - Covers Workload Identity annotation format, bucket lifecycle management, and critical warnings about in-cluster Postgres incompatibility
  3. Default configuration - clusterCidrs defaulted to empty list with clear guidance on when to set it

Verdict: Ready to approve with one recommended addition (test coverage for the new NetworkPolicy template).

Research notes

  • Kubernetes NetworkPolicy documentation: Confirmed the ipBlock structure with cidr and except arrays is valid for networking.k8s.io/v1. The policy structure in this PR follows the standard pattern.
  • GKE Workload Identity documentation: The documented annotation iam.gke.io/gcp-service-account and required IAM roles (roles/storage.objectAdmin on the bucket, roles/iam.serviceAccountTokenCreator on the service account) align with Google Cloud's official guidance for federated identity access to GCS.
  • RFC1918 ranges: The hardcoded exceptions (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) correctly cover all private IPv4 space. The warning about in-cluster Postgres incompatibility is accurate — ClusterIP services receive RFC1918 addresses that would be denied by this policy.

Suggested next steps

  1. Add NetworkPolicy test coverage (Recommended) — Create a test file in tests/networkpolicy_test.yaml to verify the template renders correctly when enabled/disabled and validates the kube-dns rule behavior.
  2. Consider adding a helm-test or integration test that explicitly validates the incompatibility scenario (networkPolicy.enabled + postgresql.enabled) fails fast with a clear error message.

General findings

Test coverage gap: The new templates/networkpolicy.yaml has no corresponding test file in the tests/ directory. Given this is a security-critical control (the "backstop" for the image fetcher's IP allow-list), having assertions that verify:

  • The policy is not rendered when networkPolicy.enabled: false (default)
  • The policy renders with correct CIDR exceptions when enabled
  • The kube-dns rule is present/absent based on allowKubeDns

would provide confidence that future changes don't accidentally break the security boundary.


Overall, this is a solid security enhancement with excellent operational documentation. The warnings about in-cluster Postgres incompatibility and non-standard DNS selectors are particularly valuable — they prevent confusing startup failures that would otherwise require debugging.

- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: The kube-dns selector uses kubernetes.io/metadata.name: kube-system which requires Kubernetes v1.16+ (when this label was auto-added to all namespaces). For clusters running older versions, this rule won't match and DNS will break.

Why it matters: The comment mentions OpenShift and managed offerings using different labels, but doesn't note the minimum K8s version requirement. Most production clusters are well past v1.16, but it's worth noting for users on legacy distributions.

Suggested fix: Add a brief note in the comment above line 48 mentioning the v1.16+ requirement, or consider using the older namespaceSelector with matchNames: ["kube-system"] if backward compatibility is needed.

Comment thread values.yaml
# even if a bug in the in-process IP allow-list let a request through,
# the kernel-level network policy refuses the packet.
#
# ⚠️ INCOMPATIBLE WITH IN-CLUSTER POSTGRES. Because the policy denies

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: The warning is prominent and accurate, but operators who miss this inline comment will hit a confusing startup failure. Consider adding a Helm template validation that fails fast with a clear error message when both are enabled.

Why it matters: A user enabling networkPolicy.enabled: true without reading the full comment block will get connection timeouts to Postgres rather than a clear "this configuration is invalid" error. Helm chart best practices often include validation logic in _helpers.tpl for incompatible value combinations.

Suggested fix: Add a fail condition in a template (e.g., in _helpers.tpl or as a ConfigMap that always renders) like:

{{- if and .Values.networkPolicy.enabled .Values.postgresql.enabled }}
{{- fail "networkPolicy.enabled and postgresql.enabled are mutually exclusive. Use external Postgres (e.g. Neon) with networkPolicy, or disable networkPolicy for in-cluster Postgres." }}
{{- end }}

This turns a runtime debugging session into an immediate helm install failure.

# Carrier-grade NAT (RFC 6598)
- 100.64.0.0/10
# In-cluster Pod / Service CIDRs (operator-configured)
{{- range .Values.networkPolicy.clusterCidrs }}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Minor YAML indentation inconsistency — the ranged CIDR entries (line 37) are indented with 14 spaces, matching the static RFC1918 entries above, which is correct. However, the closing {{- end }} on line 38 has no trailing content after it, so the next line (39) starts a new - ipBlock: entry. This is fine, but worth double-checking that the rendered YAML has proper structure.

The template looks correct as-is; this is just a note to verify the output once with helm template . before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant