Updating Knative Configuration and Annotations to Support 0 Initial Scale #537
Conversation
…ources created for inference graphs Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
… for inference services Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
is this valid only for inferenceGraph?
spolti
left a comment
Very nice @brettmthompson
just a few comments, all minor
@spolti this functionality is already enabled for ISVCs here. What I am doing in the IG-related commit is applying the same maxReplicas logic used for ISVCs to IGs.
israel-hdez
left a comment
FWIW, I think this should go upstream first to check whether they would accept it.
@brettmthompson can you please fix the conflict?
…e creation Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…-changing-replicas-to-0-causes-new-pod-creation Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
/retest
…iguration Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
```yaml
verbs:
- get
- list
- watch
```
@Jooho I tested without the watch permission. Everything works, but the kserve-controller-manager pod emits many error logs of the form `Failed to watch *v1beta1.KnativeServing`. For this reason I think it is best to add the watch permission to prevent those errors from being emitted.
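For reference, a minimal sketch of how this rule could be declared with a kubebuilder RBAC marker, assuming the Knative operator's `operator.knative.dev` API group (the exact marker in the PR may differ):

```go
// Hypothetical kubebuilder marker; running `make manifests` regenerates the
// kserve-manager-role rules (including the verbs above) from markers like this.
//
// +kubebuilder:rbac:groups=operator.knative.dev,resources=knativeservings,verbs=get;list;watch
```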
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…serve-manager-role rules Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Force-pushed from 06b1a36 to a7187de
Will this not be going upstream first, as @israel-hdez recommended?
From the conversation the other day, I believe the resolution was to merge this downstream, and then recommend this feature, plus fixes for the other vulnerabilities identified in KServe's handling of the Knative autoscaler, by opening an issue upstream.
Can you share why?
The issue was raised by Red Hat's BU team:
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…-changing-replicas-to-0-causes-new-pod-creation Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…cale and config-autoscaler keys in the KnativeServing CR Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
/lgtm
Merged 279fb1e into opendatahub-io:master
Can we consider removing the dependency on the Knative Operator CRD? I don't think we should be reading those resources, given this PR has also added some configs to the ConfigMap we own.
@israel-hdez the KnativeServing CRD dependency is removed in the upstream PR: kserve#4394. It was decided to keep that dependency in this downstream PR due to a known issue with the serverless operator's reconciliation of ConfigMaps. If the changes are accepted upstream, we can adopt them in ODH once the serverless operator issue is resolved.
I think the problem I see here is that Knative's source of truth is its `ConfigMaps`. By using the operator's CRDs as the source of configuration in KServe, we risk reading a config that is not the active one... unless the serverless operator has a more active role?
The expected behavior of the serverless operator is to actively keep the ConfigMaps it owns in parity with the configuration defined in the operator CRDs. Currently this functionality is not working properly: manual changes made to these ConfigMaps are not rolled back as they should be. This is why the decision was made to read the configuration directly from the CRD for the time being.
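For illustration, a minimal sketch of what reading the autoscaler configuration from the KnativeServing CR could look like, assuming the `knative.dev/operator` v1beta1 API and a controller-runtime client. The namespace/name and the key handling below are assumptions, not the exact PR code:

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	operatorv1beta1 "knative.dev/operator/pkg/apis/operator/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// getAutoscalerConfiguration reads the autoscaler settings directly from the
// KnativeServing CR rather than the config-autoscaler ConfigMap, since the
// operator is expected to keep the ConfigMap in parity with the CR.
func getAutoscalerConfiguration(ctx context.Context, c client.Client) (map[string]string, error) {
	ks := &operatorv1beta1.KnativeServing{}
	// The namespace/name are illustrative; the actual lookup location
	// depends on where the serverless operator deploys the CR.
	key := types.NamespacedName{Namespace: "knative-serving", Name: "knative-serving"}
	if err := c.Get(ctx, key, ks); err != nil {
		return nil, err
	}
	// The CR accepts config overrides under either key form, so check both
	// (mirroring the dual-key lookup described in the commit messages).
	if cfg, ok := ks.Spec.Config["autoscaler"]; ok {
		return cfg, nil
	}
	return ks.Spec.Config["config-autoscaler"], nil
}
```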
…cale (opendatahub-io#537)

* adding initial-scale and max-scale annotations to knative service resources created for inference graphs
* adding initial-scale annotation for knative service resources created for inference services
* now checking knative autoscaler configuration prior to knative service creation
* fixing golang linter errors
* now relying only on the knativeserving cr to read the autoscaler configuration
* addressing comments
* updating comments
* adding knative operator APIs to scheme and using kubebuilder to set kserve-manager-role rules
* reformatting
* updating GetAutoscalerConfiguration method to look for both the autoscale and config-autoscaler keys in the KnativeServing CR
* refactoring GetAutoscalerConfiguration method

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
…cale (#537) (#579)

* adding initial-scale and max-scale annotations to knative service resources created for inference graphs
* adding initial-scale annotation for knative service resources created for inference services
* now checking knative autoscaler configuration prior to knative service creation
* fixing golang linter errors
* now relying only on the knativeserving cr to read the autoscaler configuration
* addressing comments
* updating comments
* adding knative operator APIs to scheme and using kubebuilder to set kserve-manager-role rules
* reformatting
* updating GetAutoscalerConfiguration method to look for both the autoscale and config-autoscaler keys in the KnativeServing CR
* refactoring GetAutoscalerConfiguration method

Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
What this PR does / why we need it:
This PR is required to resolve RHOAIENG-18965.
The cause of "Changing replicas to 0 causes a new pod creation" is the default KnativeServing configuration, which does not allow this behavior. By default, KnativeServing is configured to disallow zero initial scale and uses a default initial-scale value of 1. This means that for any Knative revision to be considered ready, at least one replica must first be created. When an InferenceService's replicas are changed to 0, a new revision is created with the default initial scale of 1, which causes a single pod to be spun up. Once the initial scale is reached, that pod is destroyed to achieve the desired state of 0 replicas.
Per the Knative Configuring Scale Bounds documentation, preventing pods from being created when an inference service is configured to have 0 replicas requires the following (sketched below):

- `allow-zero-initial-scale` must be set to `true` in the Knative autoscaler configuration, and
- the generated Knative Service's revision template must carry the `autoscaling.knative.dev/initial-scale: "0"` annotation.
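As a sketch of the first setting, assuming the serverless operator's default CR name and namespace (both may differ per deployment):

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    autoscaler:
      # Permit revisions annotated with initial-scale "0" to become ready
      # without creating any pods.
      allow-zero-initial-scale: "true"
```

The second setting is the `autoscaling.knative.dev/initial-scale: "0"` annotation that this PR adds to the Knative Services generated for ISVCs and IGs.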
This PR does not change any of the existing logic for how the min-scale and max-scale annotations are added to ISVCs and IGs. If we want to change that logic, which I think is warranted, it should be done in a separate PR; I already have a fork with those changes in place.
The changes in this PR encompass the following:

- Adding the initial-scale annotation to the Knative Services created for InferenceServices, and the initial-scale and max-scale annotations to those created for InferenceGraphs (see the sketch after this list).
- Checking the Knative autoscaler configuration prior to Knative Service creation, reading it from the KnativeServing CR via the GetAutoscalerConfiguration method, which looks for both the autoscale and config-autoscaler keys.
- Adding the Knative operator APIs to the scheme and extending the kserve-manager-role with get/list/watch permissions on KnativeServing resources.
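As a rough sketch of the annotation side (a hypothetical helper, not the PR's exact code): when the component is scaled to zero and the autoscaler allows zero initial scale, the generated Knative Service's revision template is annotated accordingly.

```go
package example

// The annotation key is Knative's well-known autoscaling.knative.dev/initial-scale;
// the helper name and parameters are illustrative only.
const initialScaleAnnotationKey = "autoscaling.knative.dev/initial-scale"

// setInitialScale annotates a revision template so that a new revision can
// become ready with zero replicas instead of briefly spinning up one pod.
func setInitialScale(annotations map[string]string, minReplicas int, allowZeroInitialScale bool) {
	if minReplicas == 0 && allowZeroInitialScale {
		annotations[initialScaleAnnotationKey] = "0"
	}
}
```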
Type of changes
Feature/Issue validation/testing:
Test Strategy:
Inference Service:
Special notes for your reviewer:
Please confirm if this is functionality we think is worth supporting. If it is, please direct me to where documentation should be updated.
Checklist:
Release note:
Re-running failed tests
- `/rerun-all` - rerun all failed workflows.
- `/rerun-workflow <workflow name>` - rerun a specific failed workflow. Only one workflow name can be specified. Multiple `/rerun-workflow` commands are allowed per comment.