250731 sync kserve/master into odh/master batch3 #820
Conversation
…ve#4601) Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
…r different namespaces (kserve#4593) Signed-off-by: Vincent Hou <shou73@bloomberg.net>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Hannah DeFazio <h2defazio@gmail.com>
Signed-off-by: HutakiHare <sunniessheep@gmail.com>
…r_cipn (kserve#4607) Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com> Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com> Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com> Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com> Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com> Co-authored-by: Sivanantham <90966311+sivanantha321@users.noreply.github.com>
…io#784) (kserve#4619) Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Walkthrough

This change introduces new and updated Kubernetes CRD schemas, resource templates, and supporting Go logic for large language model (LLM) inference services, including new parallelism fields and expert flags. It adds several new Go utility, lifecycle, and defaulting methods, refactors and expands Python model postprocessing for raw logits, and enhances test coverage for the new output modes and controller behaviors.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant K8s API Server
    participant Controller
    participant LLMInferenceServiceConfig
    participant LLMInferenceService
    User->>K8s API Server: Apply LLMInferenceService manifest
    K8s API Server->>Controller: Watch/notify new LLMInferenceService
    Controller->>LLMInferenceServiceConfig: Merge base refs, overlays, presets
    Controller->>LLMInferenceService: Set defaults, update conditions
    Controller->>K8s API Server: Create/patch workloads, router, scheduler
    Controller->>LLMInferenceService: Update status/conditions (Ready/Stopped)
```

```mermaid
sequenceDiagram
    participant User
    participant PythonServer
    participant Model
    participant Client
    User->>PythonServer: Run with --return_raw_logits or --return_probabilities
    Client->>PythonServer: Send inference request
    PythonServer->>Model: Run model forward pass
    Model-->>PythonServer: Return outputs
    PythonServer->>PythonServer: Postprocess (raw logits or probabilities)
    PythonServer-->>Client: Return formatted result
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~75+ minutes
Warning: there were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.38.6): pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go

🔧 golangci-lint (2.2.2): Error: can't load config: can't unmarshal config by viper (flags, file): 1 error(s) decoding:
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: mholder6. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Actionable comments posted: 28
🧹 Nitpick comments (30)
test/scripts/gh-actions/run-e2e-tests.sh (1)

34-38: Quote shell variables when passing to pytest

`$PARALLELISM` and `$NETWORK_LAYER` may contain whitespace or other shell-special characters coming from CI parameterization. Quoting avoids accidental word-splitting and protects against edge cases.

```diff
-pytest -m "$MARKER" --ignore=qpext --log-cli-level=INFO -n $PARALLELISM --dist worksteal --network-layer $NETWORK_LAYER --ignore=explainer/
+pytest -m "$MARKER" --ignore=qpext --log-cli-level=INFO -n "$PARALLELISM" --dist worksteal --network-layer "$NETWORK_LAYER" --ignore=explainer/
```

The same applies to the `else` branch.

python/huggingfaceserver/huggingfaceserver/encoder_model.py (2)
354-385: Well-structured sequence classification processing.

The method correctly handles three output modes (raw logits, probabilities, indices) with proper CPU conversion and type handling. Consider extracting the CPU conversion pattern into a small helper to reduce duplication:

```python
def _to_cpu(self, tensor: Tensor) -> Tensor:
    return tensor.cpu() if tensor.is_cuda else tensor
```
433-474: Consider including decoded tokens for better interpretability.

The methods correctly process logits and probabilities, but the output format uses token IDs as keys. Consider including decoded tokens for better interpretability. Example enhancement:

```python
token = self._tokenizer.decode([token_id])
token_logits.append({"token": token, "token_id": token_id, "logit": logit.item()})
```

pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go (1)
95-99: Consider using the existing utility function for consistency.

While the implementation is correct, there's an existing utility function `utils.GetForceStopRuntime` that performs the same logic. However, since this checks the annotation on the Knative service template rather than the InferenceGraph resource itself, the current implementation may be intentional. If the intention is to check the InferenceGraph's annotation rather than the Knative service template's annotation, consider using the utility:

```diff
-forceStopRuntime := false
-if val, exist := desired.Spec.Template.Annotations[constants.StopAnnotationKey]; exist {
-	forceStopRuntime = strings.EqualFold(val, "true")
-}
+// If checking the IG's annotation was intended
+// forceStopRuntime := utils.GetForceStopRuntime(ig)
```

pkg/controller/v1alpha1/inferencegraph/controller_test.go (1)
1542-1765: Well-structured test implementation with comprehensive coverage!

The test implementation is thorough and follows best practices. The helper functions are reusable and the test scenarios comprehensively cover the stop annotation behavior. Consider using the defined `timeout` constant instead of hardcoding the duration:

```diff
-}, time.Second*10, interval).Should(BeTrue(), "%T %s should not be created", obj, objKey.Name)
+}, timeout, interval).Should(BeTrue(), "%T %s should not be created", obj, objKey.Name)
```

config/llmisvc/kustomization.yaml (1)
14-14: Add missing newline at end of file.

YAML files should end with a newline character for proper formatting.

```diff
-- config-llm-worker-data-parallel.yaml
+- config-llm-worker-data-parallel.yaml
+
```

charts/llmisvc-crd/templates/serving.kserve.io_llminferenceservices.yaml (1)
129-146: Duplicate schema block — extract with `$ref` to avoid drift

The `prefill.parallelism` block duplicates the exact structure just defined for `spec.parallelism`. Duplicated schemas easily diverge over time. Consider hoisting this definition to `components.schemas.<Parallelism>` and referencing it twice:

```yaml
components:
  schemas:
    Parallelism:
      type: object
      properties:
        ...
      required: []  # if any
...
spec:
  properties:
    parallelism:
      $ref: '#/components/schemas/Parallelism'
    prefill:
      properties:
        parallelism:
          $ref: '#/components/schemas/Parallelism'
```

This reduces the maintenance burden and guarantees both locations stay identical.

charts/llmisvc-resources/templates/config-llm-template.yaml (1)
1-82: Add missing newline at end of file.

The YAML file is missing a newline character at the end, which violates standard formatting conventions.

```diff
-      secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+      secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+
```

config/llmisvc/config-llm-router-route.yaml (1)

1-38: Add missing newline at end of file.

The YAML file is missing a newline character at the end, which violates standard formatting conventions.

```diff
-          request: 0s
+          request: 0s
+
```

charts/llmisvc-resources/templates/config-llm-router-route.yaml (1)

1-38: Add missing newline at end of file.

The YAML file is missing a newline character at the end, which violates standard formatting conventions.

```diff
-          request: 0s
+          request: 0s
+
```

config/llmisvc/config-llm-template.yaml (2)
35-40: Harden the securityContext

Good start with `allowPrivilegeEscalation: false` and dropping `MKNOD`, but best practice for public clusters is also to:

- run as non-root
- set `readOnlyRootFilesystem: true` (you already mount `/home` for writes)
- drop all caps and add back only what is required (none for vLLM)

```diff
 securityContext:
   allowPrivilegeEscalation: false
-  capabilities:
-    drop:
-      - MKNOD
+  runAsNonRoot: true
+  readOnlyRootFilesystem: true
+  capabilities:
+    drop: ["ALL"]
```

config/llmisvc/config-llm-prefill-template.yaml (2)
16-29: Potentially duplicated model specification

`vllm serve` already receives the model name as a positional argument (line 18) and via `--served-model-name` (lines 20-22). If both are supplied, the binary currently ignores one, but future versions could error. Consider keeping only one form:

```diff
-command:
-  - vllm
-  - serve
-  - "{{ .Spec.Model.Name }}"
+command: ["vllm", "serve"]
```

30-40: Align terminationMessagePolicy with other presets

Other LLM templates use `FallbackToLogsOnError`; this one sets `File`. Unless there is a specific reason, use the same policy everywhere to keep operator behaviour predictable.

config/crd/full/serving.kserve.io_llminferenceservices.yaml (1)
129-139: Duplicate schema fragment – consider `$ref` or a helper definition

The `prefill.parallelism` block repeats the exact structure of `spec.model.parallelism`. While valid, duplication increases maintenance overhead and the risk of drift. You can DRY this by defining a reusable `Parallelism` schema in `components.schemas` and `$ref`-ing it in both places.

config/llmisvc/config-llm-worker-data-parallel.yaml (1)
122-123: Add missing trailing newline

`yamllint` flags a missing EOL. Some tooling (e.g. `git diff --stat`) warns on this.

```diff
 secret:
-  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+
```

config/llmisvc/config-llm-decode-worker-data-parallel.yaml (1)

168-170: Missing trailing newline

Add a final EOL to satisfy `yamllint`.

config/llmisvc/config-llm-prefill-worker-data-parallel.yaml (1)

122-124: Add newline at EOF

Insert a trailing newline for lint cleanliness.
config/llmisvc/config-llm-scheduler.yaml (2)
64-68: Only requests, no limits defined

While the scheduler container declares `resources.requests`, it omits `limits`. Defining limits prevents potential resource ballooning in failure scenarios.

83-89: EOF newline missing

Add a terminating newline to comply with `yamllint`.

charts/llmisvc-resources/templates/config-llm-prefill-template.yaml (3)
37-40: terminationMessagePolicy `File` may hide crash logs

`File` captures only the last 4 KB of the container's STDIO. Using `FallbackToLogsOnError` (as in other presets) provides richer debugging information.

```diff
-terminationMessagePolicy: File
+terminationMessagePolicy: FallbackToLogsOnError
```

38-39: Security hardening

Consider adding:

```yaml
securityContext:
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL
```

to align with the stricter posture in the scheduler preset.

79-81: Missing trailing newline

Insert EOL to satisfy lint.
charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml (1)
123-123: Add newline at end of file

YAML files should end with a newline character per best practices.

```diff
-  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+
```

charts/llmisvc-resources/templates/config-llm-scheduler.yaml (1)

89-89: Add newline at end of file

YAML files should end with a newline character per best practices.

```diff
-  terminationGracePeriodSeconds: 30
+  terminationGracePeriodSeconds: 30
+
```

config/llmisvc/config-llm-decode-template.yaml (2)
131-131: Add newline at end of file

YAML files should end with a newline character per best practices.

```diff
-  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+
```

79-79: Consider adding a security context for the init container

The init container has an empty security context. Consider adding security restrictions similar to the main container for defense in depth.

```diff
-securityContext: { }
+securityContext:
+  allowPrivilegeEscalation: false
+  runAsNonRoot: true
+  capabilities:
+    drop:
+      - ALL
```

charts/llmisvc-resources/templates/config-llm-decode-worker-data-parallel.yaml (2)
170-170: Add newline at end of file

YAML files should end with a newline character per best practices.

```diff
-  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+
```

7-54: Consider adding a security context for the init container

The routing sidecar init container lacks an explicit security context. For consistency and defense in depth, consider adding security restrictions. After line 16, add:

```yaml
securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL
```

pkg/controller/v1alpha1/llmisvc/config_merge.go (1)
214-216: Address TODO: improve error messages with available configs.

The TODO comment indicates that error messages should list the available LLMInferenceServiceConfig resources when the requested config is not found. This would significantly improve the debugging experience.
Would you like me to implement this enhancement or create an issue to track it?
charts/llmisvc-resources/templates/config-llm-decode-template.yaml (1)
131-131: Add newline at end of file.

YAMLlint reports a missing newline character at the end of the file.

```diff
-  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+  secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
+
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (53)
- Makefile (1 hunks)
- charts/llmisvc-crd/templates/serving.kserve.io_llminferenceserviceconfigs.yaml (1 hunks)
- charts/llmisvc-crd/templates/serving.kserve.io_llminferenceservices.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-decode-template.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-decode-worker-data-parallel.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-prefill-template.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-prefill-worker-data-parallel.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-router-route.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-scheduler.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-template.yaml (1 hunks)
- charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml (1 hunks)
- config/crd/full/serving.kserve.io_llminferenceserviceconfigs.yaml (1 hunks)
- config/crd/full/serving.kserve.io_llminferenceservices.yaml (1 hunks)
- config/llmisvc/config-llm-decode-template.yaml (1 hunks)
- config/llmisvc/config-llm-decode-worker-data-parallel.yaml (1 hunks)
- config/llmisvc/config-llm-prefill-template.yaml (1 hunks)
- config/llmisvc/config-llm-prefill-worker-data-parallel.yaml (1 hunks)
- config/llmisvc/config-llm-router-route.yaml (1 hunks)
- config/llmisvc/config-llm-scheduler.yaml (1 hunks)
- config/llmisvc/config-llm-template.yaml (1 hunks)
- config/llmisvc/config-llm-worker-data-parallel.yaml (1 hunks)
- config/llmisvc/kustomization.yaml (1 hunks)
- pkg/agent/watcher_test.go (1 hunks)
- pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go (1 hunks)
- pkg/apis/serving/v1alpha1/llm_inference_service_lifecycle.go (1 hunks)
- pkg/apis/serving/v1alpha1/llm_inference_service_types.go (2 hunks)
- pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1 hunks)
- pkg/apis/serving/v1alpha1/v1alpha1.go (1 hunks)
- pkg/apis/serving/v1alpha1/zz_generated.deepcopy.go (1 hunks)
- pkg/apis/serving/v1beta1/inference_service_validation.go (2 hunks)
- pkg/apis/serving/v1beta1/inference_service_validation_test.go (2 hunks)
- pkg/controller/v1alpha1/inferencegraph/controller.go (2 hunks)
- pkg/controller/v1alpha1/inferencegraph/controller_test.go (4 hunks)
- pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go (2 hunks)
- pkg/controller/v1alpha1/llmisvc/config_merge.go (2 hunks)
- pkg/controller/v1alpha1/llmisvc/config_merge_test.go (13 hunks)
- pkg/controller/v1alpha1/llmisvc/config_presets_test.go (1 hunks)
- pkg/controller/v1alpha1/llmisvc/sample.go (1 hunks)
- pkg/controller/v1alpha1/llmisvc/utils.go (1 hunks)
- pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go (2 hunks)
- pkg/controller/v1beta1/inferenceservice/reconcilers/keda/keda_reconciler.go (3 hunks)
- pkg/controller/v1beta1/inferenceservice/reconcilers/keda/keda_reconciler_test.go (16 hunks)
- pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler.go (4 hunks)
- pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler_test.go (7 hunks)
- python/huggingfaceserver/huggingfaceserver/__main__.py (2 hunks)
- python/huggingfaceserver/huggingfaceserver/encoder_model.py (4 hunks)
- python/huggingfaceserver/tests/test_model.py (4 hunks)
- python/huggingfaceserver/tests/test_output.py (1 hunks)
- test/e2e/logger/test_raw_logger.py (1 hunks)
- test/e2e/predictor/test_autoscaling.py (1 hunks)
- test/e2e/predictor/test_huggingface.py (4 hunks)
- test/e2e/predictor/test_output.py (1 hunks)
- test/scripts/gh-actions/run-e2e-tests.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
📚 Learning: in the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered ...
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
Applied to files:
Makefile, config/llmisvc/kustomization.yaml, charts/llmisvc-resources/templates/config-llm-template.yaml, config/llmisvc/config-llm-template.yaml, charts/llmisvc-crd/templates/serving.kserve.io_llminferenceservices.yaml, config/llmisvc/config-llm-prefill-template.yaml, charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml, config/llmisvc/config-llm-prefill-worker-data-parallel.yaml, charts/llmisvc-resources/templates/config-llm-prefill-template.yaml, config/llmisvc/config-llm-decode-template.yaml, config/llmisvc/config-llm-scheduler.yaml, config/llmisvc/config-llm-decode-worker-data-parallel.yaml, pkg/controller/v1alpha1/llmisvc/config_presets_test.go, charts/llmisvc-resources/templates/config-llm-scheduler.yaml, config/llmisvc/config-llm-worker-data-parallel.yaml, charts/llmisvc-resources/templates/config-llm-prefill-worker-data-parallel.yaml, charts/llmisvc-resources/templates/config-llm-decode-worker-data-parallel.yaml
📚 Learning: the file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant fo...
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:46-65
Timestamp: 2025-07-10T14:58:54.997Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
Applied to files:
config/llmisvc/kustomization.yaml, charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml, pkg/controller/v1alpha1/llmisvc/config_presets_test.go
📚 Learning: the file `test/overlays/llm-istio-experimental/istio_base.yaml` is considered external and irrelevan...
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istio_base.yaml:6854-6869
Timestamp: 2025-07-10T14:58:14.161Z
Learning: The file `test/overlays/llm-istio-experimental/istio_base.yaml` is considered external and irrelevant for review in this repository.
Applied to files:
config/llmisvc/kustomization.yaml
📚 Learning: in pkg/apis/serving/v1beta1/configmap.go, the getstorageinitializerconfigs function was moved from a...
Learnt from: israel-hdez
PR: opendatahub-io/kserve#738
File: pkg/apis/serving/v1beta1/configmap.go:0-0
Timestamp: 2025-07-17T16:19:36.771Z
Learning: In pkg/apis/serving/v1beta1/configmap.go, the GetStorageInitializerConfigs function was moved from another package to prevent circular imports, which explains why it may contain existing patterns rather than being newly written code.
Applied to files:
config/llmisvc/kustomization.yaml, pkg/apis/serving/v1alpha1/v1alpha1.go, pkg/controller/v1alpha1/llmisvc/utils.go, pkg/apis/serving/v1beta1/inference_service_validation_test.go, pkg/controller/v1beta1/inferenceservice/reconcilers/keda/keda_reconciler.go, pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler.go, pkg/agent/watcher_test.go, pkg/controller/v1alpha1/llmisvc/sample.go, pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler_test.go, pkg/controller/v1beta1/inferenceservice/reconcilers/keda/keda_reconciler_test.go, pkg/controller/v1alpha1/llmisvc/config_presets_test.go, pkg/controller/v1alpha1/inferencegraph/controller_test.go, pkg/controller/v1alpha1/llmisvc/config_merge.go, pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go
📚 Learning: in pkg/controller/v1alpha1/utils/utils.go, the condition `minreplicas == nil && constants.defaultmin...
Learnt from: brettmthompson
PR: opendatahub-io/kserve#709
File: pkg/controller/v1alpha1/utils/utils.go:118-128
Timestamp: 2025-06-30T22:30:38.045Z
Learning: In pkg/controller/v1alpha1/utils/utils.go, the condition `minReplicas == nil && constants.DefaultMinReplicas == 0` in ValidateInitialScaleAnnotation function is intentionally kept even though currently unreachable (DefaultMinReplicas = 1) to handle potential future changes to DefaultMinReplicas value, implementing defensive programming to limit future code changes.
Applied to files:
pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go, pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go, pkg/controller/v1alpha1/inferencegraph/controller.go, pkg/controller/v1alpha1/inferencegraph/controller_test.go
📚 Learning: in the kserve llminferenceservice controller, the discoverurls function in router_discovery.go inten...
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#757
File: pkg/controller/llmisvc/router_discovery.go:45-49
Timestamp: 2025-07-21T17:40:11.003Z
Learning: In the KServe LLMInferenceService controller, the DiscoverURLs function in router_discovery.go intentionally returns nil, nil for missing Gateways rather than errors. This allows the reconciler to continue processing and update status conditions through the separate validation logic in ref_validation.go, preventing reconciliation loops that would occur if errors were returned.
Applied to files:
charts/llmisvc-resources/templates/config-llm-router-route.yaml, config/llmisvc/config-llm-router-route.yaml, pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go, config/llmisvc/config-llm-scheduler.yaml, charts/llmisvc-resources/templates/config-llm-scheduler.yaml, pkg/controller/v1alpha1/inferencegraph/controller.go, pkg/controller/v1alpha1/inferencegraph/controller_test.go, pkg/controller/v1alpha1/llmisvc/config_merge.go, pkg/apis/serving/v1alpha1/llm_inference_service_lifecycle.go, pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go
🧬 Code Graph Analysis (9)
pkg/apis/serving/v1alpha1/v1alpha1.go (1)
pkg/apis/serving/v1beta1/v1beta1.go (1)
SchemeGroupVersion (38-38)
pkg/controller/v1alpha1/llmisvc/config_merge_test.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (2)
WorkloadSpec (92-115), ParallelismSpec (247-266)
pkg/controller/v1alpha1/llmisvc/utils.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (1)
LLMInferenceServiceSpec (60-89)
pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (1)
LLMInferenceService (40-46)
pkg/apis/serving/v1beta1/inference_service_validation_test.go (3)
pkg/apis/serving/v1beta1/inference_service.go (2)
InferenceService (137-145), InferenceServiceSpec (24-36)

pkg/apis/serving/v1beta1/predictor.go (2)

PredictorSpec (32-69), PredictorExtensionSpec (88-105)

pkg/apis/serving/v1beta1/predictor_sklearn.go (1)

SKLearnSpec (27-30)
pkg/controller/v1alpha1/llmisvc/sample.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (13)
LLMInferenceService (40-46), LLMInferenceServiceSpec (60-89), LLMModelSpec (118-141), LLMStorageSpec (270-281), WorkloadSpec (92-115), ParallelismSpec (247-266), RouterSpec (153-174), GatewayRoutesSpec (177-181), HTTPRouteSpec (185-195), GatewaySpec (198-203), UntypedObjectReference (286-291), SchedulerSpec (223-232), InferencePoolSpec (236-244)
pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler_test.go (1)
pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler.go (16)
AnnotationPrometheusPort (47-47), KeyJobName (77-77), JobNameOtelCollector (42-42), KeyScrapeInterval (78-78), KeyStaticConfigs (79-79), KeyTargets (80-80), PrometheusReceiver (43-43), KeyConfig (75-75), KeyScrapeConfigs (76-76), ProcessorFilterMetrics (41-41), KeyMetrics (71-71), KeyInclude (72-72), KeyMetricNames (74-74), ProcessorResourcedetectionEnv (39-39), ProcessorTransform (40-40), PipelineMetrics (59-59)
pkg/controller/v1alpha1/inferencegraph/controller.go (2)
pkg/utils/utils.go (1)
GetForceStopRuntime (220-239)

pkg/apis/serving/v1beta1/inference_service_status.go (2)

Stopped (132-132), StoppedISVCReason (205-205)
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (5)
GatewayRoutesSpec (177-181), GatewaySpec (198-203), HTTPRouteSpec (185-195), InferencePoolSpec (236-244), ParallelismSpec (247-266)
🪛 YAMLlint (1.37.1)
config/llmisvc/kustomization.yaml
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-template.yaml
[error] 82-82: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-router-route.yaml
[error] 38-38: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-template.yaml
[error] 82-82: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-router-route.yaml
[error] 38-38: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-prefill-template.yaml
[error] 81-81: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-prefill-worker-data-parallel.yaml
[error] 124-124: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-prefill-template.yaml
[error] 81-81: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-decode-template.yaml
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-scheduler.yaml
[error] 89-89: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-decode-worker-data-parallel.yaml
[error] 170-170: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-decode-template.yaml
[error] 131-131: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-scheduler.yaml
[error] 89-89: no new line character at the end of file
(new-line-at-end-of-file)
config/llmisvc/config-llm-worker-data-parallel.yaml
[error] 123-123: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-prefill-worker-data-parallel.yaml
[error] 124-124: no new line character at the end of file
(new-line-at-end-of-file)
charts/llmisvc-resources/templates/config-llm-decode-worker-data-parallel.yaml
[error] 170-170: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (82)
test/e2e/logger/test_raw_logger.py (1)
70-71: Decorator alignment looks good

Adding the `@pytest.mark.asyncio(scope="session")` marker brings this test in line with `test_kserve_logger`, ensuring both use the same session-scoped event loop. No further issues spotted in this segment.

test/scripts/gh-actions/run-e2e-tests.sh (1)

29-29: LGTM – unconditional log line simplifies flow

Always printing the chosen parallelism is clearer than the previous conditional branch. No further action needed.

test/e2e/predictor/test_autoscaling.py (1)

683-683: LGTM! Test assertion updated to match the new metric query format.

The test correctly updates the expected `metricQuery` to use the new fully qualified Prometheus query format with namespace and deployment label selectors. This change aligns with the corresponding updates in the KEDA reconciler logic.

pkg/controller/v1beta1/inferenceservice/reconcilers/keda/keda_reconciler.go (3)
70-71: LGTM! Function signature enhancement for metric isolation.

Adding the `componentMeta` parameter enables the function to construct namespace- and deployment-specific metric queries, which improves metric isolation in multi-tenant environments.

159-164: Excellent improvement for metric isolation and security.

The new metric query construction properly isolates metrics by wrapping the base query with `sum()` and adding namespace and deployment label selectors. This prevents cross-tenant metric leakage and ensures each InferenceService only scales based on its own metrics. The Prometheus query syntax is correct and follows best practices for metric aggregation.
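To make the described wrapping concrete, here is a minimal, hypothetical sketch of the query construction; the function name and the assumption that the configured query is a bare metric name are mine, not the PR's actual code, which lives in `getKedaMetrics`:

```go
package main

import "fmt"

// scopedQuery sketches the sum()-plus-label-selector wrapping described
// above. It assumes baseQuery is a bare metric name such as
// "http_requests_total".
func scopedQuery(baseQuery, namespace, deployment string) string {
	return fmt.Sprintf(`sum(%s{namespace=%q, deployment=%q})`,
		baseQuery, namespace, deployment)
}

func main() {
	fmt.Println(scopedQuery("http_requests_total", "test-namespace", "test-component"))
	// sum(http_requests_total{namespace="test-namespace", deployment="test-component"})
}
```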
196-196: LGTM! Function call updated to pass the required parameter.

The call to `getKedaMetrics` correctly passes the `componentMeta` parameter, enabling the enhanced metric query construction.

pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go (2)
10358-10372: LGTM! Well-structured OpenTelemetry processor configurations.

The added processor configurations follow OpenTelemetry best practices:

- `resourcedetection/env` properly configures environment-based resource detection with a reasonable timeout
- the `transform` processor correctly maps Kubernetes resource attributes to metric attributes using standard attribute names

10387-10387: LGTM! Proper OpenTelemetry pipeline processor ordering.

The processors are correctly ordered in the metrics pipeline:

1. `resourcedetection/env` – resource detection first
2. `transform` – attribute transformation using detected resources
3. `filter/metrics` – filtering last, after enrichment

This follows OpenTelemetry best practices for data processing flow.
pkg/controller/v1beta1/inferenceservice/reconcilers/keda/keda_reconciler_test.go (3)
58-61: LGTM: Consistent test setup for the new function signature

The addition of `componentMeta` initialization across all test cases properly supports the updated `getKedaMetrics` function signature. The consistent use of "test-component" and "test-namespace" ensures uniform test data.

Also applies to: 73-76, 90-93, 258-261, 288-291, 318-321, 409-412, 424-427, 453-456, 481-484, 510-513, 539-542, 585-588, 621-624

65-65: LGTM: Function calls updated for the new signature

All calls to `getKedaMetrics` are correctly updated to include the new `componentMeta` parameter as the first argument, maintaining consistency across the test suite.

Also applies to: 80-80, 97-97, 280-280, 310-310, 339-339, 418-418, 445-445, 473-473, 502-502, 531-531, 570-570, 609-609, 646-646

102-103: LGTM: Enhanced metric query validation

The updated expected metric queries correctly validate the new enhanced format that includes namespace and deployment selectors wrapped in a `sum()` function. This improvement provides better metric targeting and aggregation capabilities.

Also applies to: 651-651
pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler_test.go (3)
50-50: LGTM: Improved maintainability with constants

The replacement of hardcoded string literals with predefined constants (e.g., `AnnotationPrometheusPort`, `KeyJobName`, `KeyScrapeInterval`) enhances code maintainability and reduces the risk of typos in test assertions.

Also applies to: 59-64, 79-84, 100-112

117-141: LGTM: Comprehensive processor configuration validation

The enhanced test logic properly validates the new processor configuration behavior:

- always includes the `ProcessorResourcedetectionEnv` and `ProcessorTransform` processors
- conditionally includes `ProcessorFilterMetrics` when metric names are specified
- correctly validates pipeline processor ordering

This provides thorough coverage of the processor configuration logic.

211-225: LGTM: Consistent constant usage throughout tests

The consistent use of constants in existing collector configurations maintains the refactoring pattern and improves overall code quality.

Also applies to: 258-264
pkg/controller/v1beta1/inferenceservice/reconcilers/otel/otel_reconciler.go (3)
38-85: LGTM: Excellent refactoring with comprehensive constants

The introduction of well-organized constants significantly improves code maintainability by:

- eliminating hardcoded strings throughout the codebase
- following Go naming conventions consistently
- providing clear semantic meaning for configuration values
- reducing the risk of typos and inconsistencies

This is a high-quality refactoring that enhances code readability and maintainability.
118-151: LGTM: Well-architected processor configuration

The enhanced processor configuration logic provides a solid foundation for metric processing (see the sketch after this list):

- Always includes essential processors: `resourcedetection/env` for Kubernetes environment detection and `transform` for adding namespace/deployment/pod attributes
- Conditional filtering: adds the `filter/metrics` processor only when specific metric names are specified
- Proper pipeline ordering: ensures processors are applied in the correct sequence

This architecture ensures consistent metric labeling while providing flexibility for metric filtering when needed.
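A minimal sketch of that conditional assembly, assuming the constant values shown (only the constant names are taken from the review; the function itself is illustrative, not the reconciler's code):

```go
package main

import "fmt"

// Constant names mirror those cited in the review; the string values
// are assumptions for illustration.
const (
	ProcessorResourcedetectionEnv = "resourcedetection/env"
	ProcessorTransform            = "transform"
	ProcessorFilterMetrics        = "filter/metrics"
)

// pipelineProcessors returns the ordered processor list: detection and
// transform always run; the filter is appended only when metric names are set.
func pipelineProcessors(metricNames []string) []string {
	processors := []string{ProcessorResourcedetectionEnv, ProcessorTransform}
	if len(metricNames) > 0 {
		processors = append(processors, ProcessorFilterMetrics)
	}
	return processors
}

func main() {
	fmt.Println(pipelineProcessors(nil))
	fmt.Println(pipelineProcessors([]string{"request_latency_seconds"}))
}
```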
163-200: LGTM: Clean and maintainable collector configuration

The OpenTelemetry Collector configuration is now much more maintainable with consistent use of constants. The configuration properly includes:

- a Prometheus receiver with configurable scrape settings
- an OTLP exporter with appropriate TLS configuration
- dynamic processor configuration based on the enhanced logic
- a well-structured service pipeline connecting all components
The use of constants makes the configuration more readable and less prone to configuration errors.
test/e2e/predictor/test_output.py (1)
404-409: LGTM! Variable renaming aligns with the new raw logits functionality.

The renaming from `huggingface_sequence_classification_with_probabilities_expected_output` to `huggingface_sequence_classification_with_raw_logits_expected_output` correctly reflects the change in output format. The values are appropriate for raw logits (pre-softmax outputs).

python/huggingfaceserver/tests/test_output.py (1)

18-179: LGTM! Variable renaming and structure simplification are appropriate.

The renaming to `bert_token_classification_return_raw_logits_expected_output` correctly reflects the new functionality. Removing the extra list level around the logits dictionaries makes the data structure cleaner and more straightforward while maintaining the correct logit values for token classification.

python/huggingfaceserver/huggingfaceserver/__main__.py (2)

142-154: LGTM! Well-implemented mutually exclusive CLI arguments.

The implementation correctly uses a mutually exclusive group to prevent conflicts between `--return_probabilities` and `--return_raw_logits`. The help text clearly specifies supported tasks and the argument handling follows established patterns. (A Go analogue of this guard is sketched below.)
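For comparison, Go's standard flag package has no mutually exclusive groups, so the equivalent guard must be explicit; this hypothetical sketch enforces the same invariant that argparse's `add_mutually_exclusive_group()` provides for free:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	returnProbabilities := flag.Bool("return_probabilities", false, "emit softmax probabilities")
	returnRawLogits := flag.Bool("return_raw_logits", false, "emit pre-softmax logits")
	flag.Parse()

	// Reject the combination the mutually exclusive group forbids.
	if *returnProbabilities && *returnRawLogits {
		fmt.Fprintln(os.Stderr, "error: --return_probabilities and --return_raw_logits are mutually exclusive")
		os.Exit(2)
	}
}
```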
303-303: LGTM! Proper argument forwarding to the model constructor.

The `return_raw_logits` argument is correctly forwarded from the CLI arguments to the `HuggingfaceEncoderModel` constructor, maintaining consistency with other boolean flags.

test/e2e/predictor/test_huggingface.py (3)
43-43: LGTM! Import updated to match the new expected output variable.

The import correctly references the renamed expected output variable for raw logits.

577-640: LGTM! Well-structured test for raw logits functionality.

The test function is properly renamed and configured to test the `--return_raw_logits` flag. The output parsing and assertion logic correctly handles the raw logits format, and proper cleanup is maintained.

642-697: LGTM! Comprehensive test coverage for both output modes.

The addition of a separate test function for probabilities ensures both output modes are properly validated. The test correctly uses the `--return_probabilities` flag and validates the expected probability output format.

python/huggingfaceserver/tests/test_model.py (6)
31-31: LGTM! Import updated for raw logits expected output.

The import correctly references the renamed expected output variable for token classification raw logits.

100-111: LGTM! Well-configured fixture for sequence classification raw logits.

The new fixture properly sets up the model with `return_raw_logits=True` and follows the established pattern for model fixtures.

127-139: LGTM! Well-configured fixture for token classification raw logits.

The new fixture properly sets up the token classification model with `return_raw_logits=True` and maintains consistency with other fixtures.

284-304: LGTM! Comprehensive test for sequence classification raw logits.

The test function properly validates raw logits output with appropriate floating-point approximations. The expected values appear to be valid logit values for binary classification.

308-316: LGTM! Proper test for token classification raw logits.

The test function correctly validates token classification raw logits output using the imported expected output, maintaining consistency with the test pattern.

281-281: LGTM! Updated assertion for probability output format.

The assertion correctly validates the probability output format as a dictionary with class probabilities.
python/huggingfaceserver/huggingfaceserver/encoder_model.py (6)
16-16: LGTM!

The additional type imports (`Tuple`, `Union`) are properly used in the new helper methods' type annotations.

294-329: Excellent refactoring for improved modularity!

The delegation of task-specific processing to dedicated helper methods significantly improves code organization and maintainability. The clear docstring and consistent pattern across all ML tasks make the code much easier to understand and extend.

330-352: LGTM! Clean input normalization.

The method properly handles both `Tensor` and `InferResponse` types, ensuring a consistent output format for downstream processing.

476-514: LGTM! Consistent token classification processing.

The method follows the same pattern as sequence classification, properly handling three output modes with appropriate CPU conversion and type handling.

516-539: LGTM! Standard text embedding processing.

The method correctly implements mean pooling with an attention mask and L2 normalization, which are standard practices for text embeddings.

105-105: Mutual exclusivity enforced at the CLI level

The `--return_probabilities` and `--return_raw_logits` flags are both added to the same `output_format_group` (created via `parser.add_mutually_exclusive_group()`) in `__main__.py`. This guarantees you cannot set both flags at once when invoking the server via the CLI, so no additional runtime validation is required.

pkg/controller/v1alpha1/inferencegraph/knative_reconciler.go (2)
106-116: LGTM!

The deletion logic correctly handles the force stop scenario by checking for existing deletion timestamps to avoid duplicate deletion attempts, and it properly exits the retry loop.

141-146: LGTM!

The code correctly prevents Knative service creation when the stop annotation is set, while maintaining consistent status reporting.

pkg/controller/v1alpha1/inferencegraph/controller.go (1)

157-158: LGTM!

Correctly uses the shared utility function to retrieve the stop annotation value early in the reconciliation process.
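The shape of that helper is roughly as follows. This is a hedged sketch: the annotation key string is an assumption (the real key is `constants.StopAnnotationKey`, and the real helper is `utils.GetForceStopRuntime`):

```go
package main

import (
	"fmt"
	"strings"
)

// forceStopRequested sketches the annotation check performed by
// utils.GetForceStopRuntime; the literal key below is an assumption.
func forceStopRequested(annotations map[string]string) bool {
	val, ok := annotations["serving.kserve.io/stop"]
	return ok && strings.EqualFold(val, "true")
}

func main() {
	fmt.Println(forceStopRequested(map[string]string{"serving.kserve.io/stop": "True"})) // true
	fmt.Println(forceStopRequested(nil))                                                 // false
}
```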
pkg/agent/watcher_test.go (2)
706-710: LGTM! Good test refactoring.

Moving the test server and HTTPSProvider client initialization earlier in the scenario iteration is a good improvement. It ensures the client is properly associated with the test server before being used for model operations.

719-719: Excellent improvement using dynamic test server URLs.

Replacing hardcoded example URLs with dynamic test server URLs (`ts.URL + "/test.tar"` and `ts.URL + "/test.zip"`) makes the tests more robust and self-contained. This ensures the tests use the actual mock server instead of potentially failing external endpoints.

Also applies to: 726-726
Makefile (1)
99-101: LGTM! Proper integration of LLM inference service templates.

The added commands correctly integrate the new LLM inference service configurations into the Helm chart packaging process. The pattern follows the existing approach used for other components (rbac, localmodel, etc.), ensuring consistency in the build process.

pkg/controller/v1alpha1/llmisvc/config_merge_test.go (1)

290-290: LGTM! Proper type alignment with the updated API schema.

The changes from `ptr.To[int64]` to `ptr.To[int32]` for the parallelism fields (Tensor and Pipeline) correctly align with the updated `ParallelismSpec` struct definition, where these fields are now `*int32`. This ensures type consistency with the schema changes introduced in the CRD definitions.

Also applies to: 298-298, 307-308, 382-383, 428-429, 475-476, 535-536, 567-568, 617-618, 648-649, 710-711, 967-967, 995-995, 1028-1029
pkg/apis/serving/v1beta1/inference_service_validation_test.go (1)
21-21: LGTM!

The `strings` import is correctly added to support the new test function that uses `strings.Contains()`.

config/llmisvc/kustomization.yaml (1)

1-14: LGTM! Well-structured Kustomization file.

The Kustomization configuration is properly structured with:

- the correct API version and kind
- an appropriate namespace (`kserve`)
- a comprehensive list of LLM inference service resource configurations

This aligns well with the broader PR changes introducing LLM inference service components.

pkg/apis/serving/v1alpha1/v1alpha1.go (1)

43-45: LGTM! Proper GVK variable definitions.

The new `LLMInferenceServiceGVK` and `LLMInferenceServiceConfigGVK` variables are correctly defined following the established pattern in the codebase. They provide standardized GroupVersionKind identifiers for the new LLM inference service resources, which will be useful for controller logic and type identification throughout the system.

charts/llmisvc-crd/templates/serving.kserve.io_llminferenceservices.yaml (1)
119-123: Downgrade from `int64` → `int32` may break existing objects

`pipeline` and `tensor` were previously `int64`. Narrowing to `int32` drops the valid range from ≈9.2 quintillion (2⁶³−1) to ≈2.1 billion (2³¹−1) and will reject any existing CRs using larger values.

Double-check that:

- no production manifests exceed 2 147 483 647
- the API Go types are updated (`*int32`) everywhere to stay consistent

If there is any doubt, keep `int64` or provide a conversion webhook.
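For reference, the exact bounds come straight from the standard library (illustrative, not PR code):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	fmt.Println(math.MaxInt32) // 2147483647 (~2.1e9)
	fmt.Println(math.MaxInt64) // 9223372036854775807 (~9.2e18)
}
```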
pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go (1)
27-33: LGTM! Clean defaulting implementation.

The defaulting logic follows Kubernetes best practices by implementing the `apis.Defaultable` interface correctly. The nil and empty-string checks ensure robust handling, and using `ptr.To` for pointer creation is the recommended approach.
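The guard pattern being praised looks roughly like this; the struct and field names here are hypothetical stand-ins for the real CRD types:

```go
package main

import (
	"fmt"

	"k8s.io/utils/ptr"
)

// spec is a hypothetical stand-in for the real LLMInferenceService spec.
type spec struct {
	ModelName *string
}

// setDefaults checks both nil and empty string before defaulting,
// mirroring the pattern the review highlights.
func setDefaults(s *spec, fallback string) {
	if s.ModelName == nil || *s.ModelName == "" {
		s.ModelName = ptr.To(fallback)
	}
}

func main() {
	s := &spec{}
	setDefaults(s, "facebook/opt-125m")
	fmt.Println(*s.ModelName)
}
```

pkg/apis/serving/v1beta1/inference_service_validation.go (2)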
128-131: LGTM! Proper integration with the existing validation flow.

The new validation is correctly integrated into the main validation workflow with proper error handling and an early return.

153-185: LGTM! Comprehensive validation for standard predictors.

The validation function correctly:

- logs the predictor for debugging purposes
- checks all standard predictor types for the disallowed `name` field
- uses consistent error messaging
- follows the existing validation patterns in the codebase
The explicit switch statement approach is preferable to reflection for this use case as it's more readable and type-safe.
pkg/controller/v1alpha1/llmisvc/utils.go (1)
25-32: LGTM! Well-structured label selector generation.

The function follows Kubernetes labeling best practices, using the standard labels `app.kubernetes.io/part-of` and `app.kubernetes.io/name` plus the custom label `kserve.io/component`. The unused `LLMInferenceServiceSpec` parameter appears intentional, for future extensibility or interface consistency.
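An illustrative sketch of such a selector map; only the label keys are taken from the review, the values and function shape are assumptions:

```go
package main

import "fmt"

// selectorLabels builds the well-known label set described above;
// the values shown are illustrative, not the function's actual output.
func selectorLabels(serviceName, component string) map[string]string {
	return map[string]string{
		"app.kubernetes.io/part-of": "llminferenceservice",
		"app.kubernetes.io/name":    serviceName,
		"kserve.io/component":       component,
	}
}

func main() {
	fmt.Println(selectorLabels("sample-llm", "workload"))
}
```

pkg/apis/serving/v1alpha1/zz_generated.deepcopy.go (1)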
1163-1187: LGTM! Auto-generated deep copy methods are correct.

The generated deep copy logic correctly handles:

- the type changes from `*int64` to `*int32` for the Tensor and Pipeline fields
- the new `*int32` fields: Data, DataLocal, and DataRPCPort
- proper nil checking and pointer allocation for all fields

These changes align with the API schema updates for enhanced parallelism configuration.
config/crd/full/serving.kserve.io_llminferenceserviceconfigs.yaml (3)
100-105: Changing `pipeline`/`tensor` from `int64` → `int32` may break existing CRs

Previous releases allowed values up to 2⁶³−1; truncating to 32-bit will reject any CRs that relied on larger counts. Confirm no user workloads depend on values above 2³¹−1, or bump the CRD `version` and document the breaking change.

122-127: Double-check the backward-compat risk for `prefill.parallelism.pipeline|tensor`

As with lines 100-105, confirm that reducing the integer width will not invalidate pre-existing `LLMInferenceServiceConfig` objects that specify large values.

88-97: Add schema validation and descriptions for the new parallelism fields

Please update the CRD to forbid zero/negative values and out-of-range ports, and mirror these changes in the Helm chart templates.

• File: config/crd/full/serving.kserve.io_llminferenceserviceconfigs.yaml (lines 88–97)

```diff
 data:
   format: int32
   type: integer
+  description: Number of data-parallel workers (must be ≥ 1)
+  minimum: 1
 dataLocal:
   format: int32
   type: integer
+  description: Number of local data-parallel shards on this node (must be ≥ 1)
+  minimum: 1
 dataRPCPort:
   format: int32
   type: integer
+  description: RPC port used by data-parallel workers (1–65535)
+  minimum: 1
+  maximum: 65535
 expert:
   type: boolean
+  description: Enables expert parallelism optimization
```

• Ensure the same `description` and `minimum`/`maximum` entries are added to the corresponding Helm chart templates so the CRD and chart stay in sync.

charts/llmisvc-crd/templates/serving.kserve.io_llminferenceserviceconfigs.yaml (1)
100-105: Potentially breaking change: `pipeline`/`tensor` downgraded from int64 → int32

Changing the `format` from `int64` to `int32` tightens the allowed numeric range to ±2 147 483 647. Any existing CRs or automation that used larger counts (common in large-scale GPU clusters) will now fail validation on upgrade. Please double-check backward-compatibility requirements; if larger values must remain supported, keep `format: int64` or introduce dual-field migration logic.

charts/llmisvc-resources/templates/config-llm-template.yaml (3)
8-27: LGTM! Well-configured container with security best practices.

The container configuration properly enables SSL/TLS with certificate mounting and uses appropriate security settings. The vllm serve configuration with SSL refresh and proper certificate paths is well structured.

35-39: Good security practices implemented.

The security context properly restricts privilege escalation and drops the MKNOD capability, following container security best practices.

42-59: Health probe configuration is appropriate.

The liveness and readiness probes are well configured with reasonable timeouts and use HTTPS for security. The initial delay and failure thresholds are sensible for LLM inference services.
config/llmisvc/config-llm-router-route.yaml (2)
36-38: Consider the implications of zero timeouts.

Both the `backendRequest` and `request` timeouts are set to `0s`, which typically means no timeout. For LLM inference services that can have long processing times this may be intentional, but ensure it aligns with your infrastructure's timeout policies. Verify that zero timeouts are appropriate for your LLM inference workloads and won't conflict with upstream proxy or load balancer timeout settings.

10-35: Routing configuration looks correct.

The Gateway API HTTP route configuration properly references the parent Gateway, sets up path-prefix matching with URL rewrite, and routes to the correct InferencePool backend. The templating for dynamic names is well structured.
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (3)
62-64: Good clarification for the optional model field.

Making the Model field optional for the `LLMInferenceServiceConfig` kind makes sense for template configurations. The comment and annotation clearly document this behavior.

254-266: Well-designed new parallelism fields.

The new data parallelism fields (`Data`, `DataLocal`, `DataRPCPort`) are properly typed as `*int32` for consistency and include appropriate documentation. The `Expert` field as a boolean is appropriate for a feature flag.

250-253: ParallelismSpec Tensor/Pipeline fields are safe as int32

A full-code search confirms that `Tensor` and `Pipeline` are only used within the `ParallelismSpec` struct, its autogenerated deep-copy methods, and helper functions in `llm_inference_service_types_func.go`. No other code treats them as `int64`. All environment-variable handling, arithmetic for GPU/Ray node counts, and defaults operate on `int32` (or small literal values), so overflow beyond 2³¹−1 is effectively impossible in practice. No further changes required.

charts/llmisvc-resources/templates/config-llm-router-route.yaml (2)
36-38: Consider the implications of zero timeouts.

Both the `backendRequest` and `request` timeouts are set to `0s`, which typically means no timeout. For LLM inference services that can have long processing times this may be intentional, but ensure it aligns with your infrastructure's timeout policies. This appears to be the same configuration as `config/llmisvc/config-llm-router-route.yaml`; verify that zero timeouts are appropriate for your LLM inference workloads.

1-35: Router configuration is consistent with the config version.

This Helm template version of the router configuration maintains the same structure and templating as the config version, which is good for consistency.
pkg/controller/v1alpha1/llmisvc/sample.go (4)
29-36: Well-documented sample function.

The function provides a comprehensive sample that covers all major components of an LLMInferenceService. The purpose and usage are clearly documented.

63-68: Good use of the new parallelism fields.

The sample properly demonstrates the new parallelism fields, including `Data`, `DataLocal`, `Tensor`, and `Expert`, which aligns with the API type changes.

81-90: Reasonable resource specifications.

The CPU and memory resource requests and limits are appropriate for different workload types, with prefill pods getting more resources than main containers, which aligns with LLM inference patterns.

169-223: Comprehensive router configuration sample.

The router specification includes all major components (HTTP routes, gateway references, scheduler with inference pool) and demonstrates the full capabilities of the routing system.
config/llmisvc/config-llm-template.yaml (1)
80-82: Verify that the TLS secret is created before this template is rendered

The template expects {{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }} to exist; if the controller does not create it beforehand, the Pod will stay in `Pending`. Please double-check the controller logic or Helm hooks.

charts/llmisvc-resources/templates/config-llm-prefill-worker-data-parallel.yaml (2)
23-24: Arithmetic expression may break when the template renders spaces
```bash
START_RANK=$(( ${LWS_WORKER_INDEX:-0} * {{ or .Spec.Prefill.Parallelism.DataLocal 1 }} ))
```

If `.Spec.Prefill.Parallelism.DataLocal` is missing, the rendered line becomes `$(( ${LWS_WORKER_INDEX:-0} * 1 ))` ✅ …but if the field is present, the Go template inserts a space-prefixed integer (due to the YAML indent), yielding `$(( ${LWS_WORKER_INDEX:-0} *  2 ))` (double space). Bash still parses it, yet it is safer to trim:

```diff
-START_RANK=$(( ${LWS_WORKER_INDEX:-0} * {{ or .Spec.Prefill.Parallelism.DataLocal 1 }} ))
+START_RANK=$(( ${LWS_WORKER_INDEX:-0} * {{ trimPrefix " " (printf "%v" (or .Spec.Prefill.Parallelism.DataLocal 1)) }} ))
```
76-81: Review added capabilities
`IPC_LOCK` and `SYS_RAWIO` are powerful. Confirm they are strictly necessary; otherwise drop them to keep the attack surface minimal.

config/llmisvc/config-llm-worker-data-parallel.yaml (1)
75-81: Questionable need for SYS_RAWIO capability
`SYS_RAWIO` is a privileged capability that allows raw I/O port access and is rarely required by user-space applications. Retaining it widens the pod's attack surface. If the container does not actually perform low-level I/O, drop this capability (and ideally set `readOnlyRootFilesystem: true` and `runAsNonRoot`) to align with the "least privilege" principle.

config/llmisvc/config-llm-decode-worker-data-parallel.yaml (1)

123-128: Same privileged capability concern as the worker config

`SYS_RAWIO` should be dropped unless strictly necessary. Consider mirroring the more restrictive policy used in the scheduler preset (`drop: [ALL]`, `readOnlyRootFilesystem: true`, `runAsNonRoot: true`).

config/llmisvc/config-llm-prefill-worker-data-parallel.yaml (1)

76-82: Privileged capabilities should be reviewed

Unless `vllm` requires `SYS_RAWIO`, drop it and enable `readOnlyRootFilesystem` plus `runAsNonRoot`.

charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml (1)
1-123: LGTM! Well-structured worker data-parallel configuration

The configuration properly handles leader/worker differentiation using LWS_WORKER_INDEX and includes appropriate security controls, health probes, and TLS support.

pkg/controller/v1alpha1/llmisvc/config_presets_test.go (1)

41-354: Comprehensive test coverage for configuration presets

The test properly validates all preset configurations, checks for well-known configs, and verifies variable substitution. Good use of table-driven tests, with a detailed expected configuration for the decode-worker-data-parallel preset.

charts/llmisvc-resources/templates/config-llm-scheduler.yaml (1)

70-78: Excellent security configuration

The security context follows best practices with a read-only filesystem, a non-root user, dropped capabilities, and a seccomp profile. This provides strong security isolation for the scheduler component.

config/llmisvc/config-llm-decode-template.yaml (1)

105-108: TLS configuration uses self-signed certificates

The configuration uses `--decoder-tls-insecure-skip-verify=true` and `--prefiller-tls-insecure-skip-verify=true`, which is appropriate for the self-signed certificates referenced in the volume mount. This aligns with the secret name pattern.

charts/llmisvc-resources/templates/config-llm-decode-worker-data-parallel.yaml (1)

54-170: Well-structured decode worker configuration with proper parallelism support

The configuration correctly implements leader/worker differentiation, includes comprehensive parallelism options (expert, tensor, data), and has appropriate health probes and TLS support.
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1)
64-75: Verify the parallelism size calculation logic.

The calculation on line 69 divides `Data` by `DataLocal` to compute the effective parallelism size. Please verify:

- this performs integer division, which truncates decimals – is this intended?
- the formula assumes `DataLocal` represents the local group size within data parallelism – please confirm this interpretation.

Consider adding a comment to document the calculation logic:

```diff
 func (p *ParallelismSpec) GetSize() *int32 {
 	if p == nil {
 		return nil
 	}
 	if p.IsDataParallel() {
+		// Calculate number of data parallel groups: total data parallel / local group size
 		return ptr.To(max(ptr.Deref(p.Data, 1), 1) / max(ptr.Deref(p.DataLocal, 1), 1))
 	}
```
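A quick worked example of the truncation concern (plain Go arithmetic, not the PR's code):

```go
package main

import "fmt"

func main() {
	// With Data=8 and DataLocal=2 the division is exact: 4 groups.
	fmt.Println(int32(8) / int32(2)) // 4

	// With Data=7 and DataLocal=2 integer division truncates to 3,
	// silently dropping the remainder if that is not intended.
	fmt.Println(int32(7) / int32(2)) // 3
}
```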
/lgtm

7b04aea merged into opendatahub-io:master
What this PR does / why we need it:
Synced the kserve/master branch into odh/master branch [batch3]
[RHOAIENG-29333]
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests
Chores