diff --git a/.gitignore b/.gitignore
index 1528e9f93..14d8ef5a4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,4 @@
 _rundir/
 _tmp/
 /bin/
 __pycache__/
+/clusters/
diff --git a/README.md b/README.md
index 1ce32dc39..988684d5e 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,9 @@
 This repository implements the [cloud provider](https://github.com/kubernetes/cloud-provider) interface for [Google Cloud Platform (GCP)](https://cloud.google.com/). It provides components for Kubernetes clusters running on GCP and is maintained primarily by the Kubernetes team at Google.
 
-To see all available commands in this repository, run `make help`.
+To get started with the GCP CCM, see the **[kOps Quickstart](docs/kops-quickstart.md)** (automated setup) or the **[Manual CCM Setup Guide](docs/ccm-manual.md)**.
+
+For local development, use `make help` to see all available commands.
 
 ## Components
diff --git a/docs/ccm-manual.md b/docs/ccm-manual.md
new file mode 100644
index 000000000..aa6b086da
--- /dev/null
+++ b/docs/ccm-manual.md
@@ -0,0 +1,155 @@
# GCP Cloud Controller Manager (CCM) Manual Setup Guide

This guide provides instructions for building and deploying the GCP Cloud Controller Manager (CCM) to a self-managed Kubernetes cluster.

## Prerequisites

1. **Kubernetes Cluster**: A Kubernetes cluster running on Google Cloud Platform.
   * The cluster's components (`kube-apiserver`, `kube-controller-manager`, and `kubelet`) must have the `--cloud-provider=external` flag.
   * For an example of how to create GCE instances and initialize such a cluster manually using `kubeadm`, see **[Manual Kubernetes Cluster on GCE](manual-cluster-gce.md)**.
2. **GCP Service Account**: The nodes (or the CCM pod itself) must have access to a GCP IAM Service Account with sufficient permissions to manage compute resources (e.g. instances, load balancers, and routes).
3. **Docker & gcloud CLI**: Authorized and configured for pushing images to GCP Artifact Registry.
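
Prerequisite 2 above leaves the exact IAM roles open. As a rough starting point (an assumption, not a vetted least-privilege set), the sketch below prints `gcloud` binding commands for a hypothetical service account so you can review them before running anything:

```sh
# Hypothetical service account and project; substitute your own values.
GCP_PROJECT=my-project
CCM_SA="ccm-sa@${GCP_PROJECT}.iam.gserviceaccount.com"

# Roles that plausibly cover instance, load-balancer, and route management.
# Print the binding commands for review rather than executing them directly.
for ROLE in roles/compute.instanceAdmin.v1 \
            roles/compute.loadBalancerAdmin \
            roles/compute.networkAdmin; do
  echo "gcloud projects add-iam-policy-binding ${GCP_PROJECT}" \
       "--member=serviceAccount:${CCM_SA} --role=${ROLE}"
done
```

Trim this list down for production; the CCM only needs the permissions for the controllers you actually enable.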
+ + +## Step 1: Build and Push the CCM Image (Manual Clusters) + +If you are using a manually provisioned cluster (e.g. `kubeadm`), build the `cloud-controller-manager` Docker image and push it to your registry: + +```sh +# Google Cloud Project ID, registry location, and repository name. +GCP_PROJECT=$(gcloud config get-value project) +GCP_LOCATION=us-central1 +REPO=my-repo + +# Create an Artifact Registry repository (if it doesn't already exist) +gcloud artifacts repositories create ${REPO} \ + --project=${GCP_PROJECT} \ + --repository-format=docker \ + --location=${GCP_LOCATION} \ + --description="Docker repository for CCM" + +# Grant the cluster nodes permission to read from the newly created Artifact Registry. +# This automatically extracts your GCE node's service account using kubectl and gcloud. +NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') +NODE_ZONE=$(kubectl get node $NODE_NAME -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}') +NODE_SA=$(gcloud compute instances describe $NODE_NAME \ + --zone=$NODE_ZONE --project=${GCP_PROJECT} \ + --format="value(serviceAccounts[0].email)") + +gcloud artifacts repositories add-iam-policy-binding ${REPO} \ + --project=${GCP_PROJECT} \ + --location=${GCP_LOCATION} \ + --member="serviceAccount:${NODE_SA}" \ + --role="roles/artifactregistry.reader" +# Configure docker to authenticate with Artifact Registry +gcloud auth configure-docker ${GCP_LOCATION}-docker.pkg.dev + +# Build and Push +IMAGE_REPO=${GCP_LOCATION}-docker.pkg.dev/${GCP_PROJECT}/${REPO} IMAGE_TAG=v0 make publish +``` + +*Note: If `IMAGE_TAG` is omitted, the Makefile will use a combination of the current Git commit SHA and the build date.* + +## Step 2: Deploy the CCM to your Cluster (Manual Clusters) + +Once the image is pushed, you must deploy the necessary RBAC permissions and the CCM pod itself to the Kubernetes cluster. 
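
The kustomize step below reuses `$IMAGE_REPO` and `$IMAGE_TAG` from Step 1. If you are deploying from a fresh shell, those variables will be unset, so reconstruct them first (the values shown are hypothetical placeholders):

```sh
# Rebuild the image reference used during `make publish`; substitute your
# own project, location, repository, and tag.
GCP_PROJECT=my-project
GCP_LOCATION=us-central1
REPO=my-repo
IMAGE_REPO=${GCP_LOCATION}-docker.pkg.dev/${GCP_PROJECT}/${REPO}
IMAGE_TAG=v0

echo "${IMAGE_REPO}:${IMAGE_TAG}"
```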

For native Kubernetes clusters, avoid the legacy `deploy/cloud-controller-manager.manifest` (a SaltStack template used by the old `kube-up` scripts). Instead, use the kustomize-ready DaemonSet, which correctly includes the RBAC roles and deployment:

1. Update the image to your newly pushed tag:
```sh
(cd deploy/packages/default && kustomize edit set image k8scloudprovidergcp/cloud-controller-manager=$IMAGE_REPO:$IMAGE_TAG)
```
2. The `manifest.yaml` DaemonSet intentionally ships with no execution flags (`args: []`). You **must** provide the necessary command-line arguments to the `cloud-controller-manager` container. For a typical kOps or GCE cluster, you can supply these arguments by creating a Kustomize patch.

> [!NOTE]
> If you skipped building your own image in Step 1 and chose to deploy the public upstream image (`k8scloudprovidergcp/cloud-controller-manager:latest`), you **must** also include `command: ["/cloud-controller-manager"]` in your patch's `containers` block. Locally built Dockerfile images set the correct `ENTRYPOINT`, so they do not require this override.

> [!IMPORTANT]
> Be sure to update the `--cluster-cidr` and `--cluster-name` arguments below to match your specific cluster's configuration. Note that GCP resource names cannot contain dots (`.`), so if your cluster name is `my.cluster.net`, you **must** use a sanitized format like `my-cluster-net` here.
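One way to derive a GCE-safe name from a dotted cluster name is a simple `tr` substitution (the cluster name shown is hypothetical):

```sh
# Hypothetical dotted cluster name; GCE resource names cannot contain dots.
CLUSTER_NAME="my.cluster.net"
SAFE_CLUSTER_NAME=$(printf '%s' "$CLUSTER_NAME" | tr '.' '-')
echo "$SAFE_CLUSTER_NAME"   # my-cluster-net
```

Use the sanitized value for `--cluster-name` in the patch below.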
+ +```sh +cat << EOF > deploy/packages/default/args-patch.yaml +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: cloud-controller-manager + namespace: kube-system +spec: + template: + spec: + volumes: + - name: host-kubeconfig + hostPath: + path: /etc/kubernetes/admin.conf + containers: + - name: cloud-controller-manager + command: ["/usr/local/bin/cloud-controller-manager"] + volumeMounts: + - name: host-kubeconfig + mountPath: /etc/kubernetes/admin.conf + readOnly: true + args: + - --kubeconfig=/etc/kubernetes/admin.conf + - --authentication-kubeconfig=/etc/kubernetes/admin.conf + - --authorization-kubeconfig=/etc/kubernetes/admin.conf + - --cloud-provider=gce + - --allocate-node-cidrs=true + - --cluster-cidr=10.4.0.0/14 + - --cluster-name=kops-k8s-local + - --configure-cloud-routes=true + - --leader-elect=true + - --use-service-account-credentials=true + - --v=2 +EOF +(cd deploy/packages/default && kustomize edit add patch --path args-patch.yaml) + +# Deploy the configured package (this applies the DaemonSet and its required roles): +kubectl apply -k deploy/packages/default +``` + +### Alternative: Apply Standalone RBAC Roles + +If you prefer to deploy the RBAC rules independently from the base daemonset package, you can apply them directly: + +```sh +kubectl apply -f deploy/cloud-node-controller-role.yaml +kubectl apply -f deploy/cloud-node-controller-binding.yaml +kubectl apply -f deploy/pvl-controller-role.yaml +``` + +## Step 3: Verification + +To verify that the Cloud Controller Manager is running successfully: + +1. **Check the Pod Status**: Verify the pod is `Running` in the `kube-system` namespace. +```sh +kubectl get pods -n kube-system -l component=cloud-controller-manager +``` + +2. **Check Pod Logs**: Look for any errors or access and authentication issues with the GCP API. +```sh +kubectl describe pod -n kube-system -l component=cloud-controller-manager +kubectl logs -n kube-system -l component=cloud-controller-manager +``` + +3. 
**Check Node Initialization**: The `kubelet` initially applies a `node.cloudprovider.kubernetes.io/uninitialized` taint when bound to an external cloud provider. The CCM should remove this taint once it successfully fetches the node's properties from the GCP API.
```sh
# Ensure no nodes have the uninitialized taint; output should be empty.
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints | grep uninitialized
```

4. **Verify External IPs and ProviderID**: Check if your nodes are correctly populated with GCP-specific data (e.g., `ProviderID` in the format `gce://...`).
```sh
kubectl describe nodes | grep "ProviderID:"
```

## Teardown

If you used the default CCM package, you can clean up the local patch file and reset all changes to kustomization.yaml:
```sh
rm deploy/packages/default/args-patch.yaml
git checkout deploy/packages/default/kustomization.yaml
```

If you followed the [manual cluster setup guide](manual-cluster-gce.md), you may follow the [teardown steps](manual-cluster-gce.md#teardown) to clean up your GCP resources.
\ No newline at end of file
diff --git a/docs/kops-quickstart.md b/docs/kops-quickstart.md
new file mode 100644
index 000000000..81b0e307d
--- /dev/null
+++ b/docs/kops-quickstart.md
@@ -0,0 +1,69 @@
# GCP CCM with kOps Quickstart

This guide provides a quickstart for building and deploying the GCP Cloud Controller Manager (CCM) to a self-managed Kubernetes cluster provisioned with kOps.

## Prerequisites

A Google Cloud Platform project with billing enabled.

## Deployment

The `make kops-up` target is an end-to-end workflow that automatically:
- Provisions a Kubernetes cluster using kOps.
- Builds the CCM image locally.
- Pushes the image to your Artifact Registry.
- Deploys the CCM (along with required RBAC) to the cluster.
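
A note on naming (an assumption based on common kOps usage, not something this repository's Makefile enforces): on GCE without a managed DNS zone, kOps typically expects a gossip-based cluster name ending in `.k8s.local`, which is why the commands below use `kops.k8s.local`. A quick check:

```sh
# Hypothetical name; gossip-based kOps clusters use the .k8s.local suffix.
KOPS_CLUSTER_NAME=kops.k8s.local
case "$KOPS_CLUSTER_NAME" in
  *.k8s.local) NAME_MODE=gossip ;;
  *)           NAME_MODE=dns ;;   # a real DNS name needs a hosted zone
esac
echo "$NAME_MODE"
```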
+ +Run the following commands to get started: + +```sh +# Enable required GCP APIs +gcloud services enable compute.googleapis.com +gcloud services enable artifactregistry.googleapis.com + +# Set environment variables +export GCP_PROJECT=$(gcloud config get-value project) +export GCP_LOCATION=us-central1 +export GCP_ZONES=${GCP_LOCATION}-a +export KOPS_CLUSTER_NAME=kops.k8s.local +export KOPS_STATE_STORE=gs://${GCP_PROJECT}-kops-state + +# Create the state store bucket if it doesn't already exist +gcloud storage buckets create ${KOPS_STATE_STORE} --location=${GCP_LOCATION} || true + +# Run the cluster creation target, may take several minutes +make kops-up +``` + +## Verification + +To verify that the Cloud Controller Manager is running successfully: + +1. **Check the Pod Status**: Verify the pod is `Running` in the `kube-system` namespace. +```sh +kubectl get pods -n kube-system -l component=cloud-controller-manager +``` + +2. **Check Pod Logs**: Look for any errors or access and authentication issues with the GCP API. +```sh +kubectl logs -n kube-system -l component=cloud-controller-manager +``` + +3. **Check Node Initialization**: The CCM should remove the `node.cloudprovider.kubernetes.io/uninitialized` taint once it successfully fetches the node's properties from the GCP API. +```sh +# Ensure no nodes have the uninitialized taint, output should be empty. +kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints | grep uninitialized +``` + +4. **Verify ProviderID**: Check if your nodes are correctly populated with GCP-specific data (e.g., `ProviderID` in the format `gce://...`). 
```sh
kubectl describe nodes | grep "ProviderID:"
```

## Teardown

To tear down the cluster and clean up resources:

```sh
make kops-down
```
diff --git a/docs/manual-cluster-gce.md b/docs/manual-cluster-gce.md
new file mode 100644
index 000000000..7f0db8e13
--- /dev/null
+++ b/docs/manual-cluster-gce.md
@@ -0,0 +1,198 @@
# Manual Kubernetes Cluster on GCE

This guide provides an example of how to create GCE instances and initialize a Kubernetes cluster manually using `kubeadm`, configured to use an external Cloud Controller Manager.

## Step 1: Create GCE Instances

```sh
ZONE=us-central1-a
MACHINE_TYPE=e2-medium # Minimum recommended for K8s control plane
IMAGE_FAMILY=ubuntu-2204-lts
IMAGE_PROJECT=ubuntu-os-cloud

# Control plane instance
gcloud compute instances create k8s-master \
  --zone=$ZONE \
  --machine-type=$MACHINE_TYPE \
  --image-family=$IMAGE_FAMILY \
  --image-project=$IMAGE_PROJECT \
  --can-ip-forward \
  --scopes=cloud-platform \
  --tags=k8s-master

# Worker instances
gcloud compute instances create k8s-worker-1 k8s-worker-2 \
  --zone=$ZONE \
  --machine-type=$MACHINE_TYPE \
  --image-family=$IMAGE_FAMILY \
  --image-project=$IMAGE_PROJECT \
  --can-ip-forward \
  --scopes=cloud-platform \
  --tags=k8s-worker
```

## Step 2: Access and Configure Master Node

SSH into the master node:
```sh
gcloud compute ssh k8s-master --zone=us-central1-a
```

### 2.1 Install Container Runtime
```sh
# Update package list
sudo apt-get update

# Install containerd
sudo apt-get install -y containerd

# Configure containerd (generate default config)
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Set SystemdCgroup to true (recommended for systemd integration)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml

# Restart containerd
sudo systemctl restart containerd
```

### 2.2 Install Kubeadm, Kubelet, and Kubectl
```sh
# 
Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Download the public signing key
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Add the Kubernetes apt repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install tools and lock versions
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
```

### 2.3 Configure Kubelet for External Cloud Provider
```sh
# Set --cloud-provider=external via KUBELET_EXTRA_ARGS
echo 'KUBELET_EXTRA_ARGS="--cloud-provider=external"' | sudo tee /etc/default/kubelet

# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

### 2.4 Enable Kernel Modules and IP Forwarding
```sh
# Load required kernel modules
sudo modprobe overlay
sudo modprobe br_netfilter

# Persist modules across boots
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

# Enable IP forwarding for pod networking and persist it
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
```

## Step 3: Initialize the Cluster with Kubeadm
```sh
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0 # Match the version you installed
apiServer:
  extraArgs:
    cloud-provider: external
controllerManager:
  extraArgs:
    cloud-provider: external
EOF

sudo kubeadm init --config=kubeadm-config.yaml
```

## Step 4: Configure Kubectl for Admin Access
```sh
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

## Step 5: Verify Node Initialization
```sh
kubectl get nodes
kubectl describe node k8s-master | grep Taints
# Output should show: node.cloudprovider.kubernetes.io/uninitialized:NoSchedule
```

## Step 6: Configure Kubectl for External Access

If following the CCM manual setup guide, you should configure 
kubectl for external access.
Run `exit` to leave the master node, then prepare the kubeconfig on your local machine using an **SSH Tunnel** (recommended to bypass firewall restrictions):

1. **Extract Kubeconfig**:
   ```sh
   gcloud compute scp k8s-master:~/.kube/config /tmp/manual-kubeconfig --zone=us-central1-a
   ```

2. **Patch config for external use (Skip TLS Verify)**:
   ```sh
   # Delete the CA data and insert the skip flag. The inserted line must be
   # indented to match the kubeconfig's cluster entry (four spaces), or the
   # YAML becomes invalid.
   sed -i "/certificate-authority-data:/d" /tmp/manual-kubeconfig
   sed -i '/server:/a \    insecure-skip-tls-verify: true' /tmp/manual-kubeconfig
   ```

3. **Start SSH Tunnel** (Run this in a **separate background terminal**):
   ```sh
   # Forward local port 6443 to Master port 6443
   gcloud compute ssh k8s-master --zone=us-central1-a -- -L 6443:localhost:6443 -N
   ```

4. **Update config to target the tunnel** (Run this in your working terminal):
   ```sh
   sed -i "s|server: https://.*|server: https://localhost:6443|g" /tmp/manual-kubeconfig

   export KUBECONFIG=/tmp/manual-kubeconfig
   kubectl get nodes
   ```

You may now proceed with the [CCM manual setup guide](ccm-manual.md).

## Teardown

To tear down the manual Kubernetes cluster and release all GCE resources:

1. **Delete GCE Instances**:
   ```sh
   gcloud compute instances delete k8s-master k8s-worker-1 k8s-worker-2 --zone=us-central1-a --quiet
   ```

2. **Clean up Local Kubeconfig**:
   ```sh
   rm /tmp/manual-kubeconfig
   ```

3. **Stop SSH Tunnel**:
   Terminate the background `gcloud compute ssh` tunnel command running in your secondary terminal window (e.g. via `Ctrl + C`).

4. **Unset local KUBECONFIG**:
   Restore default `kubectl` context behavior:
   ```sh
   unset KUBECONFIG
   ```