-
Notifications
You must be signed in to change notification settings - Fork 0
feat(chart): image normaliser config and egress NetworkPolicy #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
3b64e6e
3695623
20ab664
241289e
9f8d7fd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| {{- if .Values.networkPolicy.enabled -}} | ||
| apiVersion: networking.k8s.io/v1 | ||
| kind: NetworkPolicy | ||
| metadata: | ||
| name: {{ include "control-layer.fullname" . }}-egress | ||
| labels: | ||
| {{- include "control-layer.labels" . | nindent 4 }} | ||
| spec: | ||
| podSelector: | ||
| matchLabels: | ||
| {{- include "control-layer.selectorLabels" . | nindent 6 }} | ||
| app.kubernetes.io/component: control-layer | ||
| policyTypes: | ||
| - Egress | ||
| egress: | ||
| # Egress to the public internet, with private / loopback / link-local | ||
| # / CGNAT / IPv6 ULA + LL ranges excluded. This is the application-layer | ||
| # backstop for the in-process image fetcher's IP allow-list: even if a | ||
| # bug let a request through the deny-list inside the process, the CNI | ||
| # refuses the packet. | ||
| - to: | ||
| - ipBlock: | ||
| cidr: 0.0.0.0/0 | ||
| except: | ||
| # RFC1918 private ranges | ||
| - 10.0.0.0/8 | ||
| - 172.16.0.0/12 | ||
| - 192.168.0.0/16 | ||
| # Loopback | ||
| - 127.0.0.0/8 | ||
| # Link-local — covers GCE / AWS metadata at 169.254.169.254 | ||
| - 169.254.0.0/16 | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Blocking: This CIDR exception denies egress to the GKE metadata server required for Workload Identity. Why it matters: Combined with line 31's comment, this rule blocks Suggested fix: Narrow this exception to exclude the metadata server IP, or add a separate allow rule for |
||
| # Carrier-grade NAT (RFC 6598) | ||
| - 100.64.0.0/10 | ||
| # In-cluster Pod / Service CIDRs (operator-configured) | ||
| {{- range .Values.networkPolicy.clusterCidrs }} | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Minor YAML indentation inconsistency — the ranged CIDR entries (line 37) are indented with 14 spaces, matching the static RFC1918 entries above, which is correct. However, the closing The template looks correct as-is; this is just a note to verify the output once with |
||
| - {{ . }} | ||
| {{- end }} | ||
| - ipBlock: | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Blocking: IPv6 cluster CIDRs are not excluded, defeating the security purpose for IPv6 traffic. Why it matters: The IPv4 rule (lines 22-38) excludes both standard private ranges AND operator-configured Suggested fix: Add a mechanism to exclude IPv6 cluster CIDRs. Options:
|
||
| cidr: ::/0 | ||
| except: | ||
| # IPv6 loopback | ||
| - ::1/128 | ||
| # IPv6 unique-local (RFC 4193) | ||
| - fc00::/7 | ||
| # IPv6 link-local | ||
| - fe80::/10 | ||
| {{- if .Values.networkPolicy.allowKubeDns }} | ||
| # DNS is required for any outbound resolution. Without this rule, the | ||
| # cluster-CIDR deny above blocks queries to kube-dns and breaks all | ||
| # name lookups inside the pod. | ||
| - to: | ||
| - namespaceSelector: | ||
| matchLabels: | ||
| kubernetes.io/metadata.name: kube-system | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: Namespace metadata label requires Kubernetes 1.21+. Why it matters: The Suggested fix: Either:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: The kube-dns selector uses Why it matters: The comment mentions OpenShift and managed offerings using different labels, but doesn't note the minimum K8s version requirement. Most production clusters are well past v1.16, but it's worth noting for users on legacy distributions. Suggested fix: Add a brief note in the comment above line 48 mentioning the v1.16+ requirement, or consider using the older |
||
| podSelector: | ||
| matchLabels: | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: The kube-dns selector uses
Why it matters: If an operator enables this NetworkPolicy on a cluster where the DNS pods don't match Suggested fix: Consider either:
Alternatively, document this as a known requirement: "Operators must verify their cluster's DNS pod labels match |
||
| k8s-app: kube-dns | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: DNS pod label assumption may not match all clusters. Why it matters: While Suggested fix: Consider making the DNS pod selector configurable via values, or document that operators may need to adjust this label based on their cluster's DNS deployment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: This NetworkPolicy assumes your cluster's DNS pods use the label Why it matters: If your cluster uses a different DNS deployment (e.g., OpenShift uses Suggested fix: Add a note in the values.yaml comment block (near line 380) alerting operators to verify their DNS pod labels before enabling, or consider making the DNS selector configurable: # In values.yaml
dnsSelector:
namespace: kube-system
podLabels:
k8s-app: kube-dnsThen template it dynamically. For now, at minimum document this assumption for operators. |
||
| ports: | ||
| - protocol: UDP | ||
| port: 53 | ||
| - protocol: TCP | ||
| port: 53 | ||
| {{- end }} | ||
| {{- end }} | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,7 +15,23 @@ serviceAccount: | |
| create: true | ||
| # Automatically mount a ServiceAccount's API credentials? | ||
| automount: true | ||
| # Annotations to add to the service account | ||
| # Annotations to add to the service account. | ||
| # | ||
| # For GKE Workload Identity (used when the image normaliser writes | ||
| # objects to a GCS bucket), bind a GCP service account here: | ||
| # | ||
| # annotations: | ||
| # iam.gke.io/gcp-service-account: <name>@<project>.iam.gserviceaccount.com | ||
| # | ||
| # The bound GCP service account needs `roles/storage.objectAdmin` on the | ||
| # image-normaliser bucket and `roles/iam.serviceAccountTokenCreator` on | ||
| # itself (to call signBlob for V4 signed URLs). | ||
| # | ||
| # Bucket lifetime: dwctl does NOT garbage-collect normaliser objects. | ||
| # Configure an Object Lifecycle Management rule on the bucket itself | ||
| # (Terraform / gcloud) to delete objects after enough days to outlive | ||
| # the longest possible batch dispatch window. Example: 7d (covers a | ||
| # 24h batch completion window + a 6d investigation buffer). | ||
| annotations: {} | ||
| # The name of the service account to use. | ||
| # If not set and create is true, a name is generated using the fullname template | ||
|
|
@@ -328,3 +344,59 @@ postgresql: | |
| runAsGroup: 999 | ||
| # Security best practice - prevent running as root | ||
| runAsNonRoot: true | ||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Egress NetworkPolicy for the control-layer pod. | ||
| # | ||
| # Disabled by default. When enabled, restricts the control-layer pod's | ||
| # outbound traffic so it cannot reach private / loopback / link-local | ||
| # ranges or in-cluster service CIDRs. Public-internet egress (provider | ||
| # APIs, external Postgres, object storage) is preserved. | ||
| # | ||
| # This is the application-layer backstop for the hardened image fetcher: | ||
| # even if a bug in the in-process IP allow-list let a request through, | ||
| # the kernel-level network policy refuses the packet. | ||
| # | ||
| # ⚠️ INCOMPATIBLE WITH IN-CLUSTER POSTGRES. Because the policy denies | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: The warning is prominent and accurate, but operators who miss this inline comment will hit a confusing startup failure. Consider adding a Helm template validation that fails fast with a clear error message when both are enabled. Why it matters: A user enabling Suggested fix: Add a fail condition in a template (e.g., in {{- if and .Values.networkPolicy.enabled .Values.postgresql.enabled }}
{{- fail "networkPolicy.enabled and postgresql.enabled are mutually exclusive. Use external Postgres (e.g. Neon) with networkPolicy, or disable networkPolicy for in-cluster Postgres." }}
{{- end }}This turns a runtime debugging session into an immediate |
||
| # RFC1918 egress, enabling it while `postgresql.enabled: true` will | ||
| # block the control-layer pod from reaching the in-cluster Postgres | ||
| # StatefulSet (its Pod IP and ClusterIP are RFC1918) — startup will | ||
| # fail immediately with connection errors. This NetworkPolicy is | ||
| # intended for deployments using EXTERNAL managed Postgres (e.g. Neon) | ||
| # and public-internet provider APIs. If you must run both, add an | ||
| # explicit allow rule for your Postgres CIDR out-of-band. | ||
| # | ||
| # The template ALREADY denies RFC1918 (10.0.0.0/8, 172.16/12, 192.168/16), | ||
| # loopback, link-local, CGNAT, and IPv6 ULA/LL ranges. `clusterCidrs` | ||
| # is only needed if your cluster uses Pod or Service CIDRs OUTSIDE | ||
| # those ranges (e.g. dual-stack with public IPv6, custom CIDR plans). | ||
| # | ||
| # For standard GKE clusters (default Pod CIDR 10.x and Service CIDR | ||
| # 10.x), the hardcoded RFC1918 block already covers them — leave the | ||
| # default empty list and nothing additional is denied. | ||
| # | ||
| # Misconfiguring this can break in-pod DNS resolution if `kube-dns` | ||
| # isn't allowed; the template emits an explicit `kube-dns` allow rule | ||
| # regardless of which CIDRs you set. | ||
| # | ||
| # Requires a NetworkPolicy-aware CNI (Cilium, Calico, GKE Dataplane V2). | ||
| networkPolicy: | ||
| enabled: false | ||
| # Additional CIDRs to deny egress to, ON TOP OF the hardcoded RFC1918 | ||
| # / loopback / link-local / CGNAT / IPv6-ULA-LL exceptions. Empty by | ||
| # default — standard clusters using RFC1918 Pod/Service ranges need | ||
| # nothing here. Set explicitly only for non-RFC1918 cluster CIDRs. | ||
| clusterCidrs: [] | ||
| # Allow egress to kube-dns. Disable only if you know what you're doing. | ||
| # | ||
| # NOTE on non-standard DNS deployments: the template selects DNS pods | ||
| # using `namespace=kube-system, podSelector k8s-app=kube-dns`, which | ||
| # matches vanilla Kubernetes (and CoreDNS on most distros). If your | ||
| # cluster runs DNS under different labels — e.g. OpenShift uses | ||
| # `dns.operator.openshift.io/daemonset-dns=default` in the | ||
| # `openshift-dns` namespace, and some managed offerings use other | ||
| # combinations — the rule will not match and in-pod DNS resolution | ||
| # will break when networkPolicy is enabled. In that case either | ||
| # disable this rule (and add an equivalent NetworkPolicy out-of-band) | ||
| # or fork the chart with the right selector for your distribution. | ||
| allowKubeDns: true | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking: This comment is misleading and the rule breaks Workload Identity on GKE Dataplane V2.
Why it matters: The
169.254.0.0/16exception denies egress to this range (it's in theexceptlist under0.0.0.0/0). However, GKE Dataplane V2 requires pods using Workload Identity to reach the metadata server at169.254.169.254on ports 80/8080 for token exchange. The GKE documentation explicitly states:Blocking this IP prevents the image normaliser from obtaining GCP service account credentials, breaking GCS writes entirely.
Suggested fix: Either (a) add a separate allow rule for the metadata server before the broad deny, or (b) carve out
169.254.169.254/32from the link-local exception:Alternatively, make this rule conditional on a new
allowGcpMetadatavalue that defaults totruefor GKE compatibility.