
Updating Knative Configuration and Annotations to Support 0 Initial Scale #537

Merged
openshift-merge-bot[bot] merged 13 commits into
opendatahub-io:master from
brettmthompson:feature/RHOAIENG-18965-changing-replicas-to-0-causes-new-pod-creation
Apr 21, 2025

Conversation

@brettmthompson

@brettmthompson brettmthompson commented Mar 19, 2025

What this PR does / why we need it:
This PR is required to resolve RHOAIENG-18965.

The cause of "Changing replicas to 0 causes a new pod creation" is the default KnativeServing configuration, which does not allow for this functionality. By default, KnativeServing is configured to not allow zero initial scale and has a default initial-scale value of 1. This means that for any Knative revision to be considered ready, at least one replica must first be created. When an inference service's replicas are changed to 0, a new revision is created with the default initial scale of 1, which causes a single pod to be spun up. Once the initial scale is reached, this pod is destroyed to achieve the desired state of 0 replicas.
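For reference, the autoscaler defaults that produce this behavior look like the following (a sketch of Knative's config-autoscaler ConfigMap; the two values shown match the documented Knative defaults):

```yaml
# Knative autoscaler defaults (config-autoscaler ConfigMap in the
# knative-serving namespace): every new revision starts with one pod,
# and zero initial scale is not permitted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  initial-scale: "1"
  allow-zero-initial-scale: "false"
```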

Per the Knative Configuring Scale Bounds documentation, to prevent pods from being created when an inference service is configured to have 0 replicas, the following is required:

  1. KnativeServing must be configured to allow zero initial scale. Covered in this PR.
  2. The autoscaling.knative.dev/initial-scale: "0" annotation must be added to created Knative revisions. Covered in this PR.
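Concretely, the two requirements above correspond to a KnativeServing spec change and a revision-template annotation along these lines (resource and service names are assumptions based on a typical install):

```yaml
# 1. KnativeServing CR: allow zero initial scale globally.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    autoscaler:
      allow-zero-initial-scale: "true"
---
# 2. Annotation on the generated Knative Service's revision template.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/initial-scale: "0"
```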

This PR does not change any existing logic regarding how the min-scale and max-scale annotations are added to ISVCs and IGs. If we want to change this logic, which I think is warranted, it should be done in a separate PR. I have a fork with those changes already in place as well.

The changes in this PR encompass the following:

  1. We now check the KnativeServing autoscaler configuration while creating a new Knative service for ISVCs or IGs. This configuration is defined by the KnativeServing custom resource. Since there is no guarantee about which namespace this resource exists in or what its name is, we retrieve the configuration by listing all KnativeServing custom resources and taking the first element returned in the list. By doing so, we are operating under the assumption that only one KnativeServing resource exists in the cluster.
  2. Once the KnativeServing autoscaler configuration is retrieved, there are two main differences in the Knative service creation flow. First, a warning is logged if the autoscaler's globally configured initial-scale value exceeds the requested min replicas for an ISVC or IG. Second, if the autoscaler's globally configured allow-zero-initial-scale value is set to true, we set the initial-scale annotation to 0 in the created Knative service when an ISVC is requested with 0 min replicas or when an IG has the min-scale: "0" annotation.

Type of changes

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing:

  • Unit Tests
  • Manual Integration Tests

Test Strategy:

Inference Service:

  1. Deploy the ODH model serving component in a cluster
  2. Deploy a sample ISVC
  3. Update the KnativeServing operator and apply the following value under the config key:
autoscaler:
      allow-zero-initial-scale: "true"
  4. Scale the ODH operator to 0; this prevents the change in step 5 from being reverted by the operator
  5. Edit the kserve-controller-manager deployment and update the image to quay.io/bretthom/kserve-dev:autoscaling-odh
  6. Edit the ISVC to change minReplicas to 0
  7. Observe that no new pods are created for the new Knative revision
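The manifest edits in these steps amount to the following (the ISVC name and model format are illustrative, not from the PR):

```yaml
# KnativeServing CR: enable zero initial scale globally.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    autoscaler:
      allow-zero-initial-scale: "true"
---
# InferenceService: request zero minimum replicas.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sample-isvc
spec:
  predictor:
    minReplicas: 0
    model:
      modelFormat:
        name: sklearn
```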

Inference Graph:

  1. Deploy the ODH model serving component in a cluster
  2. Deploy a sample IG
  3. Update the KnativeServing operator and apply the following value under the config key:
autoscaler:
      allow-zero-initial-scale: "true"
  4. Scale the ODH operator to 0; this prevents the change in step 5 from being reverted by the operator
  5. Edit the kserve-controller-manager deployment and update the image to quay.io/bretthom/kserve-dev:autoscaling-odh
  6. Edit the IG to add the autoscaling.knative.dev/min-scale: "0" annotation
  7. Observe that no new pods are created for the new Knative revision
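The IG edit in step 6 amounts to adding the min-scale annotation on the InferenceGraph resource (the name and node spec below are illustrative):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: sample-ig
  annotations:
    autoscaling.knative.dev/min-scale: "0"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - serviceName: sample-isvc
```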

Special notes for your reviewer:

Please confirm if this is functionality we think is worth supporting. If it is, please direct me to where documentation should be updated.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?

Release note:

NONE

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

…ources created for inference graphs

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
… for inference services

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated

@Jooho Jooho left a comment


My bad, I tested something wrong. It works now.

And Brett will update the description of the test steps.

Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/controller_test.go Outdated
@spolti
Member

spolti commented Mar 21, 2025

Adding autoscaling.knative.dev/max-scale annotation with a corresponding value based on the Inference Graph's maxReplicas

is this valid only for inferenceGraph?
What about if in a usual ISVC, the user sets the maxReplicas field?

Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/knative/ksvc_reconciler.go Outdated
Member

@spolti spolti left a comment


Very nice @brettmthompson
just a few comments, minors

@brettmthompson
Author

brettmthompson commented Mar 21, 2025

Adding autoscaling.knative.dev/max-scale annotation with a corresponding value based on the Inference Graph's maxReplicas

is this valid only for inferenceGraph? What about if in a usual ISVC, the user sets the maxReplicas field?

@spolti this functionality is already enabled for ISVC here. What I am doing in the IG related commit is applying the same maxReplicas logic used for ISVCs to IGs.

Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/knative/ksvc_reconciler.go Outdated

@israel-hdez israel-hdez left a comment


FWIW, I think this should go to upstream first to check if they would accept this.

Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
@spolti
Member

spolti commented Apr 2, 2025

@brettmthompson can you please fix the conflict?

…e creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…-changing-replicas-to-0-causes-new-pod-creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@andresllh
Member

/retest

Member

@andresllh andresllh left a comment


/lgtm

…iguration

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@openshift-ci openshift-ci Bot removed the lgtm label Apr 4, 2025
verbs:
- get
- list
- watch


do we need watch?

Author

@brettmthompson brettmthompson Apr 9, 2025


@Jooho tested without the watch permission. Everything works, but there are a lot of error logs in the kserve-controller-manager pod related to Failed to watch *v1beta1.KnativeServing. For this reason I think it is best to add the watch permission to avoid those logs being emitted.
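The resulting kserve-manager-role rule for this discussion would look along these lines (apiGroup and resource names as used by the Knative operator; the verbs match the diff above):

```yaml
- apiGroups:
  - operator.knative.dev
  resources:
  - knativeservings
  verbs:
  - get
  - list
  - watch
```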

Comment thread pkg/controller/v1alpha1/inferencegraph/controller_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/suite_test.go Outdated
Comment thread pkg/controller/v1alpha1/inferencegraph/suite_test.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/controller_test.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/controller_test.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/reconcilers/knative/ksvc_reconciler.go Outdated
Comment thread pkg/controller/v1beta1/inferenceservice/suite_test.go Outdated
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…serve-manager-role rules

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@brettmthompson brettmthompson force-pushed the feature/RHOAIENG-18965-changing-replicas-to-0-causes-new-pod-creation branch from 06b1a36 to a7187de on April 9, 2025 23:41
@andresllh
Member

Will this not be going upstream first like @israel-hdez recommended?

@brettmthompson
Author

brettmthompson commented Apr 10, 2025

Will this not be going upstream first like @israel-hdez recommended?

From the conversation the other day I believe the resolution was to merge this downstream and then recommend this feature + fixes for the other vulnerabilities identified with kserve's handling of the knative autoscaler by opening an issue upstream.

@pierDipi
Member

can you share why?

@brettmthompson
Author

brettmthompson commented Apr 10, 2025

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…-changing-replicas-to-0-causes-new-pod-creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…cale and config-autoscaler keys in the KnativeServing CR

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
@andresllh
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Apr 21, 2025
@openshift-merge-bot openshift-merge-bot Bot merged commit 279fb1e into opendatahub-io:master Apr 21, 2025
37 of 40 checks passed
@github-project-automation github-project-automation Bot moved this from New/Backlog to Done in ODH Model Serving Planning Apr 21, 2025
@israel-hdez

Can we consider removing the dependency on the Knative Operator CRD? I don't think we should be reading those resources, given this PR has also added some configs in our owned ConfigMap.

@brettmthompson
Author

brettmthompson commented Apr 21, 2025

Can we consider removing the dependency on the Knative Operator CRD? I don't think we should be reading those resources, given this PR has also added some configs in our owned ConfigMap.

@israel-hdez the KnativeServing CRD dependency is removed in the upstream PR: kserve#4394

It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of config maps. If the changes are accepted upstream, we can adopt them in ODH once the serverless operator issue is resolved.

@israel-hdez

It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of config maps.

I think the problem I see here is that Knative's source of truth is its `ConfigMaps`. By using the operator's CRDs in KServe as the source of configs, we run the risk of using a config that is not the active one... unless the serverless operator has a more active role?

@brettmthompson
Author

brettmthompson commented Apr 22, 2025

It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of config maps.

I think the problem I see here is that Knative's source of truth is its `ConfigMaps`. By using the operator's CRDs in KServe as the source of configs, we run the risk of using a config that is not the active one... unless the serverless operator has a more active role?

The expected behavior of the serverless operator is to actively keep the ConfigMaps it owns in parity with the configuration defined in the operator CRDs. Currently this functionality is not working properly, and manual changes made to these ConfigMaps are not rolled back as they should be. This is why the decision was made to read the configuration directly from the CRD for the time being.

brettmthompson added a commit to brettmthompson/kserve that referenced this pull request Apr 29, 2025
…cale (opendatahub-io#537)

* adding initial-scale and max-scale annotations to knative service resources created for inference graphs

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* adding initial-scale annotation for knative service resources created for inference services

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* now checking knative autoscaler configuration prior to knative service creation

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* fixing golang linter errors

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* now relying only on the knativeserving cr to read the autoscaler configuration

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* addressing comments

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* updating comments

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* adding knative operator APIs to scheme and using kubebuilder to set kserve-manager-role rules

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* reformatting

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* updating GetAutoscalerConfiguration method to look for both the autoscale and config-autoscaler keys in the KnativeServing CR

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

* refactoring GetAutoscalerConfiguration method

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

---------

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
spolti pushed a commit that referenced this pull request Apr 29, 2025
…cale (#537) (#579)

* adding initial-scale and max-scale annotations to knative service resources created for inference graphs
* adding initial-scale annotation for knative service resources created for inference services
* now checking knative autoscaler configuration prior to knative service creation
* fixing golang linter errors
* now relying only on the knativeserving cr to read the autoscaler configuration
* addressing comments
* updating comments
* adding knative operator APIs to scheme and using kubebuilder to set kserve-manager-role rules
* reformatting
* updating GetAutoscalerConfiguration method to look for both the autoscale and config-autoscaler keys in the KnativeServing CR
* refactoring GetAutoscalerConfiguration method

---------

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>

Projects

Status: Done


8 participants