GCP Horizontal Pod Autoscaling with Pub/Sub

Google Just Why?

GCP Horizontal Pod Autoscaling with Pub/Sub shouldn’t be as complicated as it is. I’m not sure why, but when I followed this GCP article, workload identity didn’t appear to work with the Stackdriver custom metrics adapter.

Instead, I did it the “old” way of using Google Service Accounts.

Assumptions

  • You already have a k8s cluster running.
  • You have kubectl installed and you are authenticated into your cluster.
  • You have admin permissions in GCP to do the following:
    • Create Pub/Sub topics & subscriptions
    • Create service accounts
    • Admin permissions inside of your k8s cluster
  • You already have workload identity turned on for BOTH your cluster and node pool.
[Screenshot: Cluster with workload identity]
[Screenshot: Node Page with GKE Metadata Server enabled]

If all the assumptions are true, then you’re ready to run the script below. If not, follow this GCP guide up until “Deploying the Custom Metrics Adapter.”

Let’s Get Down to HPA

First, create a manifest file for an application and call the file test-app.yaml.

This manifest will be applied by the script below, so make sure it’s in the working directory when you execute the script.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: pubsub-sa
---
# [START gke_deployment_pubsub_with_workflow_identity_deployment_pubsub]
# [START container_pubsub_workload_identity_deployment]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
spec:
  selector:
    matchLabels:
      app: pubsub
  template:
    metadata:
      labels:
        app: pubsub
    spec:
      serviceAccountName: pubsub-sa
      containers:
        - name: subscriber
          image: us-docker.pkg.dev/google-samples/containers/gke/pubsub-sample:v2
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: echo-read
        target:
          type: AverageValue
          averageValue: 2
      type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub
# [END container_pubsub_workload_identity_deployment]
# [END gke_deployment_pubsub_with_workflow_identity_deployment_pubsub]
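
To make the averageValue: 2 target concrete: for an External metric with an AverageValue target, the HPA roughly divides the total metric value (here, the number of undelivered messages) by the target, rounds up, and clamps the result between minReplicas and maxReplicas. The sketch below is a simplification under that assumption. It ignores the controller's tolerance band and stabilization windows, and the function name is made up for illustration.

```python
import math


def desired_replicas(total_undelivered: int, average_value: float,
                     min_replicas: int = 1, max_replicas: int = 4) -> int:
    """Approximate replica count for an External metric with an AverageValue target.

    Simplified: no tolerance band, no scale-down stabilization.
    Defaults mirror minReplicas/maxReplicas in test-app.yaml.
    """
    if average_value <= 0:
        raise ValueError("average_value must be positive")
    # One replica per `average_value` undelivered messages, rounded up.
    raw = math.ceil(total_undelivered / average_value)
    # The HPA never goes below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, raw))
```

With the manifest's values, a backlog of 200 messages would ask for 100 replicas and gets capped at 4, while an empty subscription stays at the minimum of 1.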

You can find the container code here
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/main/databases/cloud-pubsub/main.py


import datetime
import time

# [START gke_pubsub_pull]
# [START container_pubsub_pull]
from google import auth
from google.cloud import pubsub_v1


def main():
    """Continuously pull messages from subscription"""

    # read default project ID
    _, project_id = auth.default()
    subscription_id = 'echo-read'

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        """Process received message"""
        print(f"Received message: ID={message.message_id} Data={message.data}")
        print(f"[{datetime.datetime.now()}] Processing: {message.message_id}")
        time.sleep(3)
        print(f"[{datetime.datetime.now()}] Processed: {message.message_id}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f"Pulling messages from {subscription_path}...")

    with subscriber:
        try:
            streaming_pull_future.result()
        except Exception as e:
            print(e)
# [END container_pubsub_pull]
# [END gke_pubsub_pull]


if __name__ == '__main__':
    main()
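
The subscription_path call in the sample only formats a resource name; it does not contact Pub/Sub. A minimal stand-in, assuming the usual projects/<project>/subscriptions/<subscription> layout (the helper name is hypothetical and not part of the client library):

```python
def build_subscription_path(project_id: str, subscription_id: str) -> str:
    """Return the fully qualified Pub/Sub subscription resource name."""
    return f"projects/{project_id}/subscriptions/{subscription_id}"
```

This is the string that shows up in the "Pulling messages from ..." log line, which makes it a quick way to confirm the pod resolved the project you expected.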

Next, create a bash script called run-example.sh

PROJECT_ID=$(gcloud projects list --filter="$(gcloud config get-value project)" --format="value(PROJECT_ID)")
SERVICE_ACCOUNT_NAME=custom-metrics-stackdriver
PROJECT_NUMBER=$(gcloud projects list --filter="$(gcloud config get-value project)" --format="value(PROJECT_NUMBER)")
EXAMPLE_NAMESPACE=default
PUBSUB_TOPIC=echo
PUBSUB_SUBSCRIPTION=echo-read

create (){

  kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
  sleep 5
  kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
  # running twice to make sure it's being created
  echo "Created custom-metrics namespace and additional resources"

  gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME \
    --description="custom metrics stackdriver" \
    --display-name="custom-metrics-stackdriver"
  echo "Created google service account(GSA) $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"
  
  sleep 5 # IAM policy binding sometimes fails if it's used too soon after service account creation

  gcloud projects add-iam-policy-binding $PROJECT_ID \
   --role roles/monitoring.viewer \
   --member serviceAccount:$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com
  echo "added role monitoring.viewer to GSA $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"

  gcloud iam service-accounts add-iam-policy-binding  \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
    $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com
  echo "added iam policy for KSA custom-metrics-stackdriver-adapter"

  kubectl annotate serviceaccount --namespace custom-metrics \
    custom-metrics-stackdriver-adapter \
    iam.gke.io/gcp-service-account=$SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com
  echo "annotated KSA custom-metrics-stackdriver-adapter with GSA $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com"

  gcloud pubsub topics create $PUBSUB_TOPIC
  sleep 5
  echo "Created Topic"

  gcloud pubsub subscriptions create $PUBSUB_SUBSCRIPTION --topic=$PUBSUB_TOPIC
  echo "Created Subscription to Topic"


  kubectl apply -f test-app.yaml -n $EXAMPLE_NAMESPACE
  echo "Deployed test application"

  gcloud projects add-iam-policy-binding projects/$PROJECT_ID \
    --role=roles/pubsub.subscriber \
    --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$EXAMPLE_NAMESPACE/sa/pubsub-sa
  echo "Added workload identity to pubsub-sa"
}

delete() {
  kubectl delete -f test-app.yaml -n $EXAMPLE_NAMESPACE
  kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

  echo  $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com
  gcloud iam service-accounts delete $SERVICE_ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com --quiet

  gcloud projects remove-iam-policy-binding projects/$PROJECT_ID \
      --role=roles/pubsub.subscriber \
      --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/$EXAMPLE_NAMESPACE/sa/pubsub-sa

  gcloud pubsub topics delete $PUBSUB_TOPIC
  gcloud pubsub subscriptions delete $PUBSUB_SUBSCRIPTION
}

create

If you are prompted to enter a condition, choose “None”.
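
The script uses two different IAM member formats, and they are easy to mix up. The sketch below only builds the strings from their parts so the layout is visible. The function names are invented for illustration, and any project ID or project number you pass in is your own.

```python
def gsa_email(name: str, project_id: str) -> str:
    """Email address of a Google service account (GSA)."""
    return f"{name}@{project_id}.iam.gserviceaccount.com"


def ksa_member(project_id: str, namespace: str, ksa: str) -> str:
    """Member used when a Kubernetes service account (KSA) impersonates a GSA.

    Matches the --member passed with roles/iam.workloadIdentityUser.
    """
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def ksa_principal(project_id: str, project_number: str,
                  namespace: str, ksa: str) -> str:
    """Principal used when a role is granted directly to a KSA.

    Matches the --member passed with roles/pubsub.subscriber.
    Note that it needs the project NUMBER as well as the project ID.
    """
    pool = f"{project_id}.svc.id.goog"
    return (
        f"principal://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool}"
        f"/subject/ns/{namespace}/sa/{ksa}"
    )
```

The adapter gets its access through the first format (KSA bound to a GSA), while the test application's pubsub-sa is granted its role through the second.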

Confirm Application is Working

Make sure the application pod is running

$ kubectl get pods

NAME                      READY   STATUS    RESTARTS   AGE
pubsub-7f44cf5977-rbztk   1/1     Running   0          16h

Make sure the HPA is running

$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
pubsub   Deployment/pubsub   0/2 (avg)   1         4         1          1m

Let’s trigger an auto-scale event by sending messages to the echo topic.

for i in {1..200}; do gcloud pubsub topics publish echo --message="Autoscaling #${i}"; done

It’ll take 2-5 minutes for the scaling event to occur. Yes, this is slow.

After a while you should see that the number of pods has increased and that this is reflected in the HPA status as well

$ kubectl get hpa

NAME     REFERENCE           TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
pubsub   Deployment/pubsub   25/2 (avg)   1         4         4          74m



$ kubectl get pods

NAME                      READY   STATUS        RESTARTS         AGE
pubsub-7f44cf5977-f54hc   1/1     Running       0                25s
pubsub-7f44cf5977-gjbsh   1/1     Running       0                25s
pubsub-7f44cf5977-n7ttr   1/1     Running       0                25s
pubsub-7f44cf5977-xglct   1/1     Running       0                26s
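
The TARGETS column shows the current per-pod average against the target. A rough way to turn it back into a backlog estimate is to multiply that average by the replica count. The helper below is an illustrative sketch (its name is made up), and the result is only approximate because the displayed average is rounded. Values with Kubernetes quantity suffixes such as 500m are not handled and are treated as unreadable.

```python
from typing import Optional


def estimated_backlog(targets: str, replicas: int) -> Optional[float]:
    """Estimate the total metric value from a TARGETS cell like '25/2 (avg)'.

    Returns None when the current value is not a plain number,
    for example 'unknown/2 (avg)'.
    """
    # Everything before the first '/' is the current per-pod average.
    current = targets.split("/", 1)[0].strip()
    try:
        average = float(current)
    except ValueError:
        return None
    return average * replicas
```

For the output above, 25/2 (avg) with 4 replicas works out to roughly 100 messages still undelivered.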

Troubleshooting

Always check the output of run-example.sh first. Odds are you didn’t have permissions to do something. You can always run the delete function (change the last line of the script from create to delete) and start all over.

***NOTE: you’ll need to change the name of the service account because GCP does soft deletes on service accounts.
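
The fixed sleep 5 pauses in the script exist because an IAM binding can fail when it runs too soon after the service account is created. A retry loop is one alternative to guessing a delay. This is a generic sketch rather than something the script does: run_with_retry, command_succeeds and their parameters are invented for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(action: Callable[[], bool], attempts: int = 5,
                   delay_seconds: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `action` until it returns True or `attempts` is exhausted.

    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return True
        # Only pause if another attempt is still coming.
        if attempt < attempts:
            sleep(delay_seconds)
    return False


def command_succeeds(argv: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv, check=False).returncode == 0
```

To use it, wrap a step in a lambda that calls command_succeeds with the same arguments the script passes to gcloud, and treat a False result as a reason to stop and read the output.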

Problems

The HPA shows unknown under TARGETS.

$ kubectl get hpa

NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
pubsub   Deployment/pubsub   unknown/2 (avg)   1         4         4          64m
  • This usually means some part of the configuration went wrong. Check that every command in the script executed correctly.
  • You can also check the logs from the custom-metrics-stackdriver-adapter pod to make sure nothing is wrong.
$ kubectl get pods -n custom-metrics
NAME                                                 READY   STATUS    RESTARTS   AGE
custom-metrics-stackdriver-adapter-89fdf8645-bbn4l   1/1     Running   0          5h11m
$ kubectl logs custom-metrics-stackdriver-adapter-89fdf8645-bbn4l -n custom-metrics
I1127 13:52:25.333064       1 adapter.go:217] serverOptions: {true true true true false   false false}
I1127 13:52:25.336266       1 adapter.go:227] ListFullCustomMetrics is disabled, which would only list 1 metric resource to reduce memory usage. Add --list-full-custom-metrics to list full metric resources for debugging.
I1127 13:52:29.127164       1 serving.go:374] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
....
  • Make sure that the external metrics APIService exists by querying the api-server.
$ kubectl proxy --port 8080 &

Starting to serve on 127.0.0.1:8080


$ curl http://localhost:8080/apis/external.metrics.k8s.io/v1beta1

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "externalmetrics",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

If the external metrics APIService is missing, then re-run

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
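
The same check can be scripted. The sketch below parses a response body like the one shown above and reports whether the externalmetrics resource is listed. Fetching the body (for example through kubectl proxy) is left out so the function can be tried offline; the function name is illustrative.

```python
import json


def has_external_metrics(body: str) -> bool:
    """Return True if an APIResourceList body advertises 'externalmetrics'."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        # An HTML error page or empty reply means the API is not being served.
        return False
    if not isinstance(doc, dict):
        return False
    if doc.get("groupVersion") != "external.metrics.k8s.io/v1beta1":
        return False
    resources = doc.get("resources") or []
    return any(
        isinstance(item, dict) and item.get("name") == "externalmetrics"
        for item in resources
    )
```

A False result points at the adapter deployment rather than at the HPA or the Pub/Sub setup, which narrows down where to look next.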

Thanks for taking the time to read about GCP Horizontal Pod Autoscaling with Pub/Sub.

Cheers!