
Updating Knative Configuration and Annotations to Support 0 Initial Scale #537

Merged
openshift-merge-bot[bot] merged 13 commits into
opendatahub-io:master from
brettmthompson:feature/RHOAIENG-18965-changing-replicas-to-0-causes-new-pod-creation
Apr 21, 2025

Conversation

@brettmthompson

@brettmthompson brettmthompson commented Mar 19, 2025

What this PR does / why we need it:
This PR is required to resolve RHOAIENG-18965.

The cause of "Changing replicas to 0 causes a new pod creation" is the default KnativeServing configuration, which does not allow for this functionality. By default, KnativeServing is configured to not allow zero initial scale and has a default initial-scale value of 1. This means that for any Knative revision to be considered ready, at least one replica must first be created. When an inference service's replicas are changed to 0, a new revision is created with the default initial scale of 1, which causes a single pod to be spun up. Once the initial scale is reached, this pod is destroyed to achieve the desired state of 0 replicas.
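For reference, the autoscaler defaults that produce this behavior look like the following (a sketch of Knative's config-autoscaler ConfigMap; the two values shown match the documented Knative defaults):

```yaml
# Knative autoscaler defaults (config-autoscaler ConfigMap in the
# knative-serving namespace): every new revision starts with one pod,
# and zero initial scale is not permitted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  initial-scale: "1"
  allow-zero-initial-scale: "false"
```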

Per the Knative Configuring Scale Bounds documentation, to prevent pods from being created when an inference service is configured to have 0 replicas, the following is required:

  1. KnativeServing must be configured to allow zero initial scale. Covered in this PR.
  2. The autoscaling.knative.dev/initial-scale: "0" annotation must be added to created Knative revisions. Covered in this PR.
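Concretely, the two requirements above correspond to a KnativeServing spec change and a revision-template annotation along these lines (resource and service names are assumptions based on a typical install):

```yaml
# 1. KnativeServing CR: allow zero initial scale globally.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    autoscaler:
      allow-zero-initial-scale: "true"
---
# 2. Annotation on the generated Knative Service's revision template.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/initial-scale: "0"
```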

This PR does not change any existing logic regarding how the min-scale and max-scale annotations are added to ISVCs and IGs. If we want to change this logic, which I think is warranted, it should be done in a separate PR. I have a fork with those changes already in place as well.

The changes in this PR encompass the following:

  1. We now check the KnativeServing autoscaler configuration while creating a new Knative service for ISVCs or IGs. This configuration is defined by the KnativeServing custom resource. Since there is no guarantee about which namespace this resource exists in or what its name is, we retrieve the configuration by listing all KnativeServing custom resources and taking the first element returned in the list. By doing so, we are operating under the assumption that only one KnativeServing resource exists in the cluster.
  2. Once the KnativeServing autoscaler configuration is retrieved, there are two main differences in the Knative service creation flow. First, a warning is logged if the autoscaler's globally configured initial-scale value exceeds the requested min replicas for an ISVC or IG. Second, if the autoscaler's globally configured allow-zero-initial-scale value is set to true, we set the initial-scale annotation to 0 in the created Knative service when an ISVC is requested with 0 min replicas or when an IG has the min-scale: "0" annotation.

Type of changes

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing:

  • Unit Tests
  • Manual Integration Tests

Test Strategy:

Inference Service:

  1. Deploy the ODH model serving component in a cluster
  2. Deploy a sample ISVC
  3. Update the KnativeServing operator and apply the following value under the config key:
autoscaler:
      allow-zero-initial-scale: "true"
  4. Scale the ODH operator to 0; this prevents the change in step 5 from being reverted by the operator
  5. Edit the kserve-controller-manager deployment and update the image to quay.io/bretthom/kserve-dev:autoscaling-odh
  6. Edit the ISVC to change minReplicas to 0
  7. Observe that no new pods are created for the new Knative revision
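The manifest edits in these steps amount to the following (the ISVC name and model format are illustrative, not from the PR):

```yaml
# KnativeServing CR: enable zero initial scale globally.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    autoscaler:
      allow-zero-initial-scale: "true"
---
# InferenceService: request zero minimum replicas.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sample-isvc
spec:
  predictor:
    minReplicas: 0
    model:
      modelFormat:
        name: sklearn
```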

Inference Graph:

  1. Deploy the ODH model serving component in a cluster
  2. Deploy a sample IG
  3. Update the KnativeServing operator and apply the following value under the config key:
autoscaler:
      allow-zero-initial-scale: "true"
  4. Scale the ODH operator to 0; this prevents the change in step 5 from being reverted by the operator
  5. Edit the kserve-controller-manager deployment and update the image to quay.io/bretthom/kserve-dev:autoscaling-odh
  6. Edit the IG to add the autoscaling.knative.dev/min-scale: "0" annotation
  7. Observe that no new pods are created for the new Knative revision
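The IG edit in step 6 amounts to adding the min-scale annotation on the InferenceGraph resource (the name and node spec below are illustrative):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: sample-ig
  annotations:
    autoscaling.knative.dev/min-scale: "0"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: sample-isvc
```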

Special notes for your reviewer:

Please confirm if this is functionality we think is worth supporting. If it is, please direct me to where documentation should be updated.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?

Release note:

NONE

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

…ources created for inference graphs

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
… for inference services

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated

@Jooho Jooho left a comment


My bad, I tested something wrong. It works now.

And Brett will update the description of the test steps.

Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/controller_test.go Outdated
@spolti
Member

spolti commented Mar 21, 2025

Adding autoscaling.knative.dev/max-scale annotation with a corresponding value based on the Inference Graph's maxReplicas

is this valid only for inferenceGraph?
What about if in a usual ISVC, the user sets the maxReplicas field?

Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/knative/ksvc_reconciler.go Outdated
Member

@spolti spolti left a comment


Very nice @brettmthompson
just a few comments, minors

@brettmthompson
Author

brettmthompson commented Mar 21, 2025

Adding autoscaling.knative.dev/max-scale annotation with a corresponding value based on the Inference Graph's maxReplicas

is this valid only for inferenceGraph? What about if in a usual ISVC, the user sets the maxReplicas field?

@spolti this functionality is already enabled for ISVC here. What I am doing in the IG related commit is applying the same maxReplicas logic used for ISVCs to IGs.

Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/knative/ksvc_reconciler.go Outdated

@israel-hdez israel-hdez left a comment


FWIW, I think this should go to upstream first to check if they would accept this.

Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
@spolti
Member

spolti commented Apr 2, 2025

@brettmthompson can you please fix the conflict?

…e creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…-changing-replicas-to-0-causes-new-pod-creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@andresllh
Member

/retest

Member

@andresllh andresllh left a comment


/lgtm

…iguration

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@openshift-ci openshift-ci Bot removed the lgtm label Apr 4, 2025
verbs:
- get
- list
- watch


do we need watch?

Author

@brettmthompson brettmthompson Apr 9, 2025


@Jooho tested without the watch permission. Everything works, but there are a lot of error logs in the kserve-controller-manager pod related to Failed to watch *v1beta1.KnativeServing. For this reason I think it is best to add the watch permission to avoid those logs being emitted.
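The resulting kserve-manager-role rule for this discussion would look along these lines (apiGroup and resource names as used by the Knative operator; the verbs match the diff above):

```yaml
- apiGroups:
  - operator.knative.dev
  resources:
  - knativeservings
  verbs:
  - get
  - list
  - watch
```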

Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/suite_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/suite_test.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/controller_test.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/controller_test.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/knative/ksvc_reconciler.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/suite_test.go Outdated
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…serve-manager-role rules

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@brettmthompson brettmthompson force-pushed the feature/RHOAIENG-18965-changing-replicas-to-0-causes-new-pod-creation branch from 06b1a36 to a7187de on April 9, 2025 23:41
@andresllh
Member

Will this not be going upstream first like @israel-hdez recommended?

@brettmthompson
Author

brettmthompson commented Apr 10, 2025

Will this not be going upstream first like @israel-hdez recommended?

From the conversation the other day I believe the resolution was to merge this downstream and then recommend this feature + fixes for the other vulnerabilities identified with kserve's handling of the knative autoscaler by opening an issue upstream.

@pierDipi
Member

can you share why?

@brettmthompson
Author

brettmthompson commented Apr 10, 2025

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…-changing-replicas-to-0-causes-new-pod-creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…cale and config-autoscaler keys in the KnativeServing CR

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@andresllh
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Apr 21, 2025
@openshift-merge-bot openshift-merge-bot Bot merged commit 279fb1e into opendatahub-io:master Apr 21, 2025
37 of 40 checks passed
@github-project-automation github-project-automation Bot moved this from New/Backlog to Done in ODH Model Serving Planning Apr 21, 2025
@israel-hdez

Can we consider removing the dependency on the Knative Operator CRD? I don't think we should be reading those resources, given this PR has also added some configs in our owned ConfigMap.

@brettmthompson
Author

brettmthompson commented Apr 21, 2025

Can we consider removing the dependency on the Knative Operator CRD? I don't think we should be reading those resources, given this PR has also added some configs in our owned ConfigMap.

@israel-hdez the KnativeServing CRD dependency is removed in the upstream PR: kserve#4394

It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of config maps. If the changes are accepted upstream, we can adopt them in ODH once the serverless operator issue is resolved.

@israel-hdez

It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of config maps.

I think the problem I see here is that Knative's source of truth is its `ConfigMaps`. By using the operator's CRDs in KServe as the source of configs, we run the risk of using a config that is not the active one... unless the serverless operator has a more active role?

@brettmthompson
Author

brettmthompson commented Apr 22, 2025

It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of config maps.

I think the problem I see here is that Knative's source of truth is its `ConfigMaps`. By using the operator's CRDs in KServe as the source of configs, we run the risk of using a config that is not the active one... unless the serverless operator has a more active role?

The expected behavior of the serverless operator is to actively keep the ConfigMaps it owns in parity with the configuration defined in the operator CRDs. Currently this functionality is not working properly, and manual changes made to these ConfigMaps are not rolled back as they should be. This is why the decision was made to read the configuration directly from the CRD for the time being.

brettmthompson added a commit to brettmthompson/kserve that referenced this pull request Apr 29, 2025
…cale (opendatahub-io#537)

* adding initial-scale and max-scale annotations to knative service resources created for inference graphs

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* adding initial-scale annotation for knative service resources created for inference services

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* now checking knative autoscaler configuration prior to knative service creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* fixing golang linter errors

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* now relying only on the knativeserving cr to read the autoscaler configuration

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* addressing comments

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* updating comments

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* adding knative operator APIs to scheme and using kubebuilder to set kserve-manager-role rules

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* reformatting

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* updating GetAutoscalerConfiguration method to look for both the autoscale and config-autoscaler keys in the KnativeServing CR

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* refactoring GetAutoscalerConfiguration method

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

---------

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
spolti pushed a commit that referenced this pull request Apr 29, 2025
…cale (#537) (#579)

* adding initial-scale and max-scale annotations to knative service resources created for inference graphs
* adding initial-scale annotation for knative service resources created for inference services
* now checking knative autoscaler configuration prior to knative service creation
* fixing golang linter errors
* now relying only on the knativeserving cr to read the autoscaler configuration
* addressing comments
* updating comments
* adding knative operator APIs to scheme and using kubebuilder to set kserve-manager-role rules
* reformatting
* updating GetAutoscalerConfiguration method to look for both the autoscale and config-autoscaler keys in the KnativeServing CR
* refactoring GetAutoscalerConfiguration method

---------

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

Projects

Status: Done


8 participants