metis: add glibc floor qualification test target to Makefile #1036
arvindbr8 wants to merge 8 commits into kubernetes:master
Adds a GitHub Actions workflow to qualify glibc floor compatibility on ubuntu-22.04 runners for the metis CNI. Adds a new test-glibc-floor make target to run the verification locally inside a container, ensuring safety for the glibc 2.35 floor.

This issue is currently awaiting triage.

Hi @arvindbr8. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.

PTAL: @YifeiZhuang @gnossen

/ok-to-test

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: arvindbr8, gnossen. Needs approval from an approver in each of these files.
YifeiZhuang left a comment:
Thanks for adding the sanity checks for the glibc version skew issue! I don't see why we couldn't use both GitHub Actions and Prow, but it's easier to maintain if we keep it consistent with Prow in this repo: https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubernetes/cloud-provider-gcp
```makefile
#
# WARNING: Do not link this binary against newer GLIBC symbols. Doing so
# will cause immediate runtime panics when scheduled on baseline fleet nodes.
GLIBC_FLOOR_IMAGE := ubuntu:22.04
```
Since we already use 22.04 in the Dockerfile, doesn't this become a redundant check? Especially given that we are developing e2e tests that use the metis image.
Separate but related: in the internal GKE release process, we should probably pin the base image to a specific SHA, per the recommendations and the list of approved base images.
https://g3doc.corp.google.com/cloud/kubernetes/g3doc/subgroup/security/ssci/guidance/container_base_image.md?cl=head#oss-distroless (Interestingly, it also explains why not to use scratch/alpine.)
The currently approved image is debian12, which uses glibc 2.36, higher than the oldest version you found on GKE 1.30. But this is fine because the feature will be guarded by a version trait to a future version, and nodes cannot be two minor versions lower.
Great points, Ivy.
Regarding the redundancy: The test actually catches a different failure mode. Because we compile with CGO, the glibc version requirement is permanently baked into the binary during the builder stage, not the runtime stage. If someone accidentally upgrades the builder image to a newer OS, the compilation succeeds, but the resulting binary will fail to load when the Kubelet executes it natively on an older host. The extraction test executes the built binary on the floor OS, so the dynamic loader checks the GLIBC version requirements recorded in its ELF headers against the floor and catches that specific builder regression.
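A minimal Go sketch of the builder-stage check described above, roughly what `readelf -V` reports: it uses the standard library's `debug/elf` package to list the versioned symbols a dynamically linked binary imports and picks the highest `GLIBC_x.y` among them. The program name (`glibc-needs`) and all helper names are hypothetical; this illustrates the idea and is not the Makefile target in this PR.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// glibcParts parses a symbol version such as "GLIBC_2.34" into numeric
// components. ok is false for other libraries ("GCC_3.0") and for
// non-numeric tags such as "GLIBC_PRIVATE".
func glibcParts(v string) (parts []int, ok bool) {
	rest, found := strings.CutPrefix(v, "GLIBC_")
	if !found {
		return nil, false
	}
	for _, field := range strings.Split(rest, ".") {
		n, err := strconv.Atoi(field)
		if err != nil {
			return nil, false
		}
		parts = append(parts, n)
	}
	return parts, true
}

// older reports whether version a precedes version b, comparing numerically
// component by component, so "2.9" < "2.10" and "2.2.5" < "2.3".
func older(a, b []int) bool {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return len(a) < len(b)
}

// highestGlibc returns the newest GLIBC_x.y entry in versions, or "" if
// there is none.
func highestGlibc(versions []string) string {
	best, bestParts := "", []int(nil)
	for _, v := range versions {
		parts, ok := glibcParts(v)
		if !ok {
			continue
		}
		if best == "" || older(bestParts, parts) {
			best, bestParts = v, parts
		}
	}
	return best
}

// importedVersions lists the distinct symbol versions that a dynamically
// linked ELF binary requires from its shared libraries.
func importedVersions(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	syms, err := f.ImportedSymbols()
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var out []string
	for _, s := range syms {
		if s.Version != "" && !seen[s.Version] {
			seen[s.Version] = true
			out = append(out, s.Version)
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: glibc-needs <path-to-binary>")
		return
	}
	versions, err := importedVersions(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("highest GLIBC version required:", highestGlibc(versions))
}
```

Pointed at a CGO build that stays within the floor, the reported version should be no higher than `GLIBC_2.35`. A statically linked binary has no dynamic symbol table, so the sketch reports an error instead of a version.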
Regarding the internal debian12 base image policy:
Because glibc is strictly backwards compatible, I think sticking with the 2.35 floor (compiling via Debian Bullseye) is the safest, most conservative approach for right now. A binary compiled against 2.31 will run flawlessly on both our oldest Ubuntu 22.04 nodes (2.35) and the newer approved debian12 nodes (2.36).
By holding the floor at 2.35 today, we maintain an absolute, fleet-wide safety net just in case of unexpected rollouts or backports. Since it's fully forwards-safe for newer nodes, we can easily upgrade the builder image and this CI test to debian12 next year once the GKE 1.30/Ubuntu 22.04 nodes are fully deprecated and physically out of the fleet.
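The floor argument above reduces to two comparisons: the fleet floor is the oldest glibc among the node images, and a binary is safe fleet-wide when its highest required symbol version is at or below that floor. A small Go sketch, using only versions quoted in this PR (Ubuntu 22.04 at 2.35 and COS Milestone 117 at 2.37 from the verification output, plus requirement bounds of 2.31 for a Bullseye build and 2.34 for the Bookworm build); the helper names and the 2.36 requirement are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse turns a dotted glibc release such as "2.35" into numeric parts.
func parse(v string) []int {
	var parts []int
	for _, field := range strings.Split(v, ".") {
		n, err := strconv.Atoi(field)
		if err != nil {
			panic(fmt.Sprintf("bad glibc version %q", v))
		}
		parts = append(parts, n)
	}
	return parts
}

// compare returns -1, 0 or +1 when a is older than, equal to, or newer
// than b, comparing numerically so that "2.9" < "2.10".
func compare(a, b string) int {
	pa, pb := parse(a), parse(b)
	for i := 0; i < len(pa) && i < len(pb); i++ {
		if pa[i] < pb[i] {
			return -1
		}
		if pa[i] > pb[i] {
			return 1
		}
	}
	switch {
	case len(pa) < len(pb):
		return -1
	case len(pa) > len(pb):
		return 1
	}
	return 0
}

// fleetFloor returns the oldest glibc release among the node images.
func fleetFloor(hosts map[string]string) string {
	floor := ""
	for _, v := range hosts {
		if floor == "" || compare(v, floor) < 0 {
			floor = v
		}
	}
	return floor
}

// runsOn reports whether a binary whose highest required symbol version is
// required can load against a host glibc of version host. glibc keeps old
// symbol versions, so any host at or above the requirement works.
func runsOn(required, host string) bool {
	return compare(required, host) <= 0
}

func main() {
	// glibc releases quoted in this PR thread.
	hosts := map[string]string{
		"Ubuntu 22.04 (UBUNTU_CONTAINERD)":   "2.35",
		"COS Milestone 117 (COS_CONTAINERD)": "2.37",
	}
	floor := fleetFloor(hosts)
	fmt.Println("fleet floor:", floor)
	// 2.31: upper bound for a Bullseye build; 2.34: the Bookworm build;
	// 2.36: a hypothetical newer requirement.
	for _, required := range []string{"2.31", "2.34", "2.36"} {
		fmt.Printf("needs GLIBC_%s -> safe fleet-wide: %v\n", required, runsOn(required, floor))
	}
}
```

With these inputs the program reports a floor of 2.35, marks the 2.31 and 2.34 requirements as safe, and marks the 2.36 requirement as unsafe. This is why a builder upgrade that pulls in 2.36-only symbols has to wait until the 2.35 nodes leave the fleet.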
Does that sound reasonable?
Enforce a qualification check for the `metis` CNI binary to ensure it remains compatible with the GKE fleet's `glibc 2.35` floor (Ubuntu 22.04 / COS Milestone 117).

## Context: Why `glibc 2.35`?

Because the Metis CNI is executed natively on the host OS by the Kubernetes Kubelet (rather than inside a container namespace), it is strictly bound by the host's C standard library.
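To see which glibc a particular host provides, the verification later in this description executes the host's `libc.so.6` directly and reads its first line. A hedged Go sketch of parsing that banner follows; it is checked only against the COS line captured in this PR, other distributions' banners are assumed (not verified here) to end the same way, and `hostGlibc` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// releaseRE captures the "release version X.Y" suffix of the banner that
// glibc's libc.so.6 prints on its first line when executed directly.
var releaseRE = regexp.MustCompile(`release version (\d+\.\d+(?:\.\d+)?)`)

// hostGlibc extracts the glibc release from such a banner line.
func hostGlibc(banner string) (string, error) {
	m := releaseRE.FindStringSubmatch(banner)
	if m == nil {
		return "", fmt.Errorf("no glibc release found in %q", banner)
	}
	return m[1], nil
}

func main() {
	// First line captured from the COS Milestone 117 node in this PR.
	cos := "GNU C Library (Gentoo 2.37-r15 p12) stable release version 2.37."
	v, err := hostGlibc(cos)
	if err != nil {
		panic(err)
	}
	fmt.Println("COS node glibc:", v)
}
```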
Our oldest supported GKE node pools currently run Ubuntu 22.04 LTS and COS Milestone 117. Ubuntu 22.04 provides `glibc 2.35`, and COS Milestone 117 provides a newer release (2.37 in the verification below), which makes `2.35` the lowest common denominator across our fleet. If the CGO binary links against a `glibc` version higher than 2.35, it will fail to load on these nodes with a glibc "version not found" error from the dynamic loader. See the Container-Optimized OS Release Notes and GKE Release Notes for concrete historical proof of the milestone baselines (Ubuntu 22.04 / COS Milestone 117).

## Fleet floor verification
To definitively prove that `glibc 2.35` is the correct floor, we provisioned an ephemeral GKE cluster (`1.30.14-gke.2250000`) with two node pools reflecting our oldest supported fleet OS images:

- `COS_CONTAINERD` (COS Milestone 117)
- `UBUNTU_CONTAINERD` (Ubuntu 22.04 LTS)

Using debug pods, I queried the host OS's C standard library. The results empirically prove that `2.35` is our hard floor, dictated by the Ubuntu nodes:

1. Ubuntu 22.04 Node Pool (`UBUNTU_CONTAINERD`):

2. COS Node Pool (`COS_CONTAINERD`):

```
$ kubectl debug node/gke-glibc-test-clust-cos-verification-82afa5a0-sfv3 -it --image=ubuntu --profile=sysadmin
root@gke-glibc-test-clust-cos-verification-82afa5a0-sfv3:/# chroot /host /lib64/libc.so.6 | head -n 1
GNU C Library (Gentoo 2.37-r15 p12) stable release version 2.37.
```

## Changes
### Component: `metis/Makefile`

- `test-glibc-floor` target: Builds the image, extracts the binary, and runs `--help` natively inside a vanilla `ubuntu:22.04` container to guarantee runtime compatibility regardless of host OS.

### Component: GitHub Actions

- [NEW] `.github/workflows/metis-glibc-floor-test.yml`: Pre-submit guardrail that runs the extraction test on an OS representing the fleet floor (`ubuntu-22.04`).

> [!NOTE]
> The GitHub Actions workflow file (`metis-glibc-floor-test.yml`) was removed from this PR. The test will be run as a Prow presubmit job instead (to be submitted to `kubernetes/test-infra`): kubernetes/test-infra#36769

## Verification Results
### 1. Symbol Analysis Proof

`readelf -V` analysis of the binary built on the standard `golang:1.25.8` (Bookworm) image confirms that the highest required version is `GLIBC_2.34` (safe for 2.35):

### 2. Local Extraction Test
Running `make test-glibc-floor` succeeded without linkage errors on `ubuntu:22.04`:
The GitHub Actions workflow will be run as part of this PR's presubmit checks (I think!).