feat(validation): ensures references are valid#757
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughIntroduces pre-validation of router references in reconciliation, adds a validation subsystem with custom error typing and condition setting, updates reconcile flow to gate on validation, and adjusts readiness evaluations to respect validation failures. Tests are refactored to a builder/functional-options pattern and expanded for reference validation scenarios. Adds an InferencePool option. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant R as Reconciler
participant K as K8s API
participant S as LLMInferenceService (CR)
participant G as Gateway(s)
participant H as HTTPRoute(s)
R->>S: fetch LLMInferenceService
R->>R: validateRouterReferences(S)
alt Validation fails (RefsInvalid)
R->>S: set GatewaysReady/HTTPRoutesReady=false (RefsInvalidReason)
R-->>R: abort reconcile with validation error
else Valid
opt Managed HTTPRoute Spec present
R->>K: ensure managed HTTPRoute
K-->>R: managed route reconciled
end
R->>K: collect referenced HTTPRoutes
R->>R: EvaluateGatewayConditions (guard if RefsInvalid)
R->>R: EvaluateHTTPRouteConditions (guard if RefsInvalid)
R-->>S: update status conditions
end
sequenceDiagram
autonumber
participant R as validateRouterReferences
participant K as K8s API
participant G as Gateway(s)
participant H as HTTPRoute(s)
par Gateway refs
R->>K: Get each Gateway
alt any NotFound
R-->>R: aggregate missing gateway names
end
and HTTPRoute refs
R->>K: Get each HTTPRoute
alt any NotFound
R-->>R: aggregate missing route names
end
and Managed HTTPRoute spec
R->>K: Get referenced Gateways in spec
alt any NotFound
R-->>R: aggregate missing gateways
end
end
opt HTTP refs present
R->>K: Get routes -> validate ParentRefs to Gateways
alt any invalid parent
R-->>R: aggregate invalid parents
end
end
alt any aggregates
R-->>R: return ValidationError (RefsInvalidReason)
else
R-->>R: nil
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/controller/llmisvc/controller_int_test.go (1)
380-381: Minor: Remove trailing space in error message- }).WithContext(ctx).Should(Succeed(), "Should have no managed HTTPRoutes ") + }).WithContext(ctx).Should(Succeed(), "Should have no managed HTTPRoutes")pkg/controller/llmisvc/ref_validation.go (1)
195-203: Consider logging non-NotFound errors for debugging.While ignoring transient errors is appropriate, logging them at debug level would help with troubleshooting.
func (r *LLMInferenceServiceReconciler) validateGatewayRef(ctx context.Context, namespacedName types.NamespacedName) string { refGateway := &gatewayapi.Gateway{} errGet := r.Client.Get(ctx, namespacedName, refGateway) if apierrors.IsNotFound(errGet) { return fmt.Sprintf("Gateway %s/%s does not exist", namespacedName.Namespace, namespacedName.Name) } + if errGet != nil { + log.FromContext(ctx).V(1).Info("Ignoring transient error while validating gateway", "gateway", namespacedName, "error", errGet) + } // We ignore other errors as they might be transient and should be handled by retry logic return "" }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go(1 hunks)pkg/controller/llmisvc/controller_int_test.go(4 hunks)pkg/controller/llmisvc/ref_validation.go(1 hunks)pkg/controller/llmisvc/ref_validation_test.go(1 hunks)pkg/controller/llmisvc/router.go(4 hunks)pkg/controller/llmisvc/router_discovery.go(2 hunks)pkg/controller/llmisvc/scheduler.go(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
pkg/controller/llmisvc/scheduler.go (1)
Learnt from: israel-hdez
PR: #738
File: pkg/apis/serving/v1beta1/configmap.go:0-0
Timestamp: 2025-07-17T16:19:36.771Z
Learning: In pkg/apis/serving/v1beta1/configmap.go, the GetStorageInitializerConfigs function was moved from another package to prevent circular imports, which explains why it may contain existing patterns rather than being newly written code.
pkg/controller/llmisvc/controller_int_test.go (2)
Learnt from: israel-hdez
PR: #738
File: pkg/apis/serving/v1beta1/configmap.go:0-0
Timestamp: 2025-07-17T16:19:36.771Z
Learning: In pkg/apis/serving/v1beta1/configmap.go, the GetStorageInitializerConfigs function was moved from another package to prevent circular imports, which explains why it may contain existing patterns rather than being newly written code.
Learnt from: brettmthompson
PR: #709
File: pkg/controller/v1alpha1/utils/utils.go:118-128
Timestamp: 2025-06-30T22:30:38.045Z
Learning: In pkg/controller/v1alpha1/utils/utils.go, the condition minReplicas == nil && constants.DefaultMinReplicas == 0 in ValidateInitialScaleAnnotation function is intentionally kept even though currently unreachable (DefaultMinReplicas = 1) to handle potential future changes to DefaultMinReplicas value, implementing defensive programming to limit future code changes.
🧬 Code Graph Analysis (3)
pkg/controller/llmisvc/ref_validation_test.go (1)
pkg/controller/llmisvc/ref_validation.go (1)
RefValidationResult(48-50)
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (2)
SchedulerSpec(223-232)LLMInferenceService(40-46)
pkg/controller/llmisvc/ref_validation.go (2)
pkg/controller/llmisvc/controller.go (1)
LLMInferenceServiceReconciler(95-99)pkg/controller/llmisvc/fixture/gwapi_builders.go (2)
HTTPRoute(98-117)Gateway(30-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build (3.9)
- GitHub Check: test
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: Build
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: precommit-check
- GitHub Check: test
🔇 Additional comments (11)
pkg/controller/llmisvc/scheduler.go (1)
211-217: LGTM! Good refactoring to centralize naming logicThe replacement of manual name construction with
llmSvc.InferencePoolName()improves maintainability by centralizing the naming logic. Storing the result in a local variable for reuse is also a good practice.Also applies to: 234-252
pkg/controller/llmisvc/ref_validation_test.go (1)
1-128: Well-structured and comprehensive test coverageThe tests are well-organized with:
- Clear table-driven test structure
- Good coverage of edge cases (no issues, single issue, multiple issues)
- Proper testing of different issue types
- Clear test names that describe the behavior being tested
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1)
24-38: Default inference pool suffix verificationThe suffix
"-inference-pool"is used consistently across the codebase:
- pkg/controller/llmisvc/sample.go sets
POOL_NAMEwithsvcName + "-inference-pool"- pkg/controller/llmisvc/controller_int_test.go asserts backend refs with
svcName + "-inference-pool"- pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go uses
kmeta.ChildName(..., "-inference-pool")LGTM—nil checks and fallback logic are solid, and the comment about matching the well-known presets remains helpful.
pkg/controller/llmisvc/controller_int_test.go (2)
413-670: Excellent test coverage for reference validationThe new test cases comprehensively cover all the validation scenarios mentioned in the PR objectives:
- Missing Gateway references
- Missing parent Gateways in HTTPRoutes
- Missing HTTPRoute references
- Multiple missing references
- Misconfigured HTTPRoutes
The tests properly verify status conditions, reasons, and error messages. Good use of test helpers and consistent test structure.
279-291: Good improvement to use actual HTTPRoute instead of commentThe change from a comment to creating an actual custom HTTPRoute makes the test more realistic and properly exercises the reference validation logic.
pkg/controller/llmisvc/router.go (3)
65-106: LGTM! Well-structured validation integration.The validation flow is logically organized and the status update correctly prioritizes "not found" issues over "misconfigured" issues. The separation of validation logic into dedicated methods improves maintainability.
130-134: Good fix for preserving existing ParentRefs.The change to append rather than replace ParentRefs prevents potential loss of existing gateway references, which is important for maintaining proper routing configuration.
140-140: Improved function naming.The rename from
updateRoutingStatustoupdateLLMInferenceServiceURLsbetter describes the function's purpose.pkg/controller/llmisvc/ref_validation.go (3)
33-119: Well-designed validation result types.The validation types and methods are cleanly structured with good separation of concerns. The
CombinedMessagemethod's formatting logic provides clear user-facing messages.
120-141: Correct validation logic for HTTPRoute references.The method properly handles not-found errors as validation issues while allowing the reconciler's retry logic to handle transient errors.
143-173: Thorough validation of HTTPRoute configurations.The method correctly validates that routes target the expected backend service and includes parent gateway validation. The nested loop with break statements efficiently checks for the correct backend.
18fa6a0 to
97c4616
Compare
|
UPDATE It's been fixed now. |
97c4616 to
dc3dae7
Compare
af45e5f to
14953b1
Compare
14953b1 to
416f5d0
Compare
416f5d0 to
a0afecd
Compare
a0afecd to
4846732
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andresllh, bartoszmajsak The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
pkg/controller/llmisvc/fixture/llmisvc_builders.go (1)
245-259: Nit: guard against empty pool name in test fixturesA small defensive check avoids silent misconfigurations in tests.
Apply this diff:
func WithInferencePoolRef(poolName string) LLMInferenceServiceOption { return func(llmSvc *v1alpha1.LLMInferenceService) { + if poolName == "" { + // For test fixtures, panic is acceptable to surface misconfiguration early + panic("WithInferencePoolRef: poolName must be non-empty") + } if llmSvc.Spec.Router == nil { llmSvc.Spec.Router = &v1alpha1.RouterSpec{} } if llmSvc.Spec.Router.Scheduler == nil { llmSvc.Spec.Router.Scheduler = &v1alpha1.SchedulerSpec{} } llmSvc.Spec.Router.Scheduler.Pool = &v1alpha1.InferencePoolSpec{ Ref: &corev1.LocalObjectReference{ Name: poolName, }, } } }pkg/controller/llmisvc/router.go (2)
52-54: Treat ValidationError as non-fatal to avoid error-based requeue loopsConsider swallowing typed validation errors here (conditions are already marked) to avoid tight error requeues. This aligns with the existing pattern of not erroring on missing infra (see DiscoverURLs behavior) and still gates readiness later.
Apply this diff:
- if err := r.validateRouterReferences(ctx, llmSvc); err != nil { - return err - } + if err := r.validateRouterReferences(ctx, llmSvc); err != nil { + if IsValidationError(err) { + // Conditions already set; avoid error-based requeue loop. + return nil + } + return err + }If you prefer requeues, consider returning a controller-runtime Result with backoff at the outer reconcile, but avoid bubbling errors for pure validation failures.
129-136: Fix HTTPRoute error message: wrong namespace/name order and wrong resource usedThe error logs a wrong pair (uses service name and swaps parts). Use namespace from the service and the referenced route name.
Apply this diff:
- return referencedRoutes, fmt.Errorf("failed to get HTTPRoute %s/%s: %w", routeRef.Name, llmSvc.GetName(), err) + return referencedRoutes, fmt.Errorf("failed to get HTTPRoute %s/%s: %w", llmSvc.GetNamespace(), routeRef.Name, err)pkg/controller/llmisvc/router_validation.go (2)
62-109: Optionally aggregate all validation issues before returningCurrently the first failing phase short-circuits. Consider aggregating all validation failures (gateways + routes + targets) into a single ValidationError, marking the respective subconditions, then return once. This surfaces all issues in one reconcile and matches the “single condition with detailed message” approach discussed in the PR.
If interested, I can draft an aggregator that collects messages from each validator and marks GatewaysReady/HTTPRoutesReady accordingly before returning a combined error.
152-177: Fix HTTPRoute fetch error message (namespace/name order)Minor but useful for debugging: the error swaps namespace and name.
Apply this diff:
- return fmt.Errorf("failed to get HTTPRoute %s/%s: %w", routeRef.Name, llmSvc.GetNamespace(), err) + return fmt.Errorf("failed to get HTTPRoute %s/%s: %w", llmSvc.GetNamespace(), routeRef.Name, err)pkg/controller/llmisvc/controller_int_test.go (2)
485-488: Minor: prefer direct spec mutation over invoking options in-placeInside CreateOrUpdate, calling WithHTTPRouteRefs(...)(llmSvc) and WithGatewayRefs(...)(llmSvc) works, but mutating llmSvc.Spec directly avoids any side effects from option functions.
Example inline change (illustrative):
llmSvc.Spec.Router.Route.HTTP.Refs = []corev1.LocalObjectReference{{Name: "my-custom-route"}} llmSvc.Spec.Router.Gateway.Refs = []v1alpha1.UntypedObjectReference{ LLMGatewayRef(gateway.Name, gateway.Namespace) }
672-814: Reference-validation test block is valuable and preciseCovers:
- missing Gateway ref,
- managed HTTPRoute with non-existent parent Gateway, and
- missing HTTPRoute ref.
Assertions on RouterReady=False with RefsInvalid reason and message substrings look solid.
Two optional additions to strengthen coverage:
- Add a case where a referenced HTTPRoute exists but its ParentRef points to a non-existent Gateway (exercises validateHTTPRouteTargets for refs).
- Add a negative InferencePool reference case (missing pool) to assert InferencePoolReady=False path.
If you want, I can contribute a small test that creates an HTTPRoute with an invalid parent and references it from LLMInferenceService to cover the referenced-route target validation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
pkg/controller/llmisvc/controller_int_test.go(9 hunks)pkg/controller/llmisvc/fixture/llmisvc_builders.go(1 hunks)pkg/controller/llmisvc/router.go(4 hunks)pkg/controller/llmisvc/router_validation.go(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-21T17:40:11.003Z
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#757
File: pkg/controller/llmisvc/router_discovery.go:45-49
Timestamp: 2025-07-21T17:40:11.003Z
Learning: In the KServe LLMInferenceService controller, the DiscoverURLs function in router_discovery.go intentionally returns nil, nil for missing Gateways rather than errors. This allows the reconciler to continue processing and update status conditions through the separate validation logic in ref_validation.go, preventing reconciliation loops that would occur if errors were returned.
Applied to files:
pkg/controller/llmisvc/router_validation.gopkg/controller/llmisvc/router.gopkg/controller/llmisvc/controller_int_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build
- GitHub Check: precommit-check
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: test
🔇 Additional comments (15)
pkg/controller/llmisvc/fixture/llmisvc_builders.go (1)
245-259: LGTM: helper correctly lazy-initializes and sets InferencePool refThe option properly guards nils on Router and Scheduler and only mutates the Pool reference. Good addition for the test builder API.
pkg/controller/llmisvc/router.go (3)
245-251: Gating readiness on RefsInvalid reason looks goodEarly exit when RefsInvalid is already set prevents redundant API calls and noisy logs.
339-345: HTTPRoute readiness gating is consistentThe check mirrors Gateway gating and avoids double-reporting after validation failures.
121-123: Remove unnecessary nil guard for Route.HTTP.HasRefsThe
HasRefsmethod onHTTPRouteSpecalready guards against a nil receiver—it returnsfalseifr == niland only then checkslen(r.Refs). CallingllmSvc.Spec.Router.Route.HTTP.HasRefs()is therefore safe even when
Route.HTTPisnil. The extra|| llmSvc.Spec.Router.Route.HTTP == nilcheck is redundant and can be removed.• No change needed in
pkg/controller/llmisvc/router.go(lines 121–123).
• Discard the previously proposed diff.Example from pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go:
func (r *HTTPRouteSpec) HasRefs() bool { return r != nil && len(r.Refs) > 0 }Likely an incorrect or invalid review comment.
pkg/controller/llmisvc/router_validation.go (5)
33-35: Reason constant is clear and reusableExporting RefsInvalidReason simplifies status assertions in tests and callers.
37-56: ValidationError type is appropriate for condition-driven flowsSimple and sufficient for non-fatal validation paths. No changes needed.
111-149: Gateway refs validation LGTMCovers namespace defaulting and cleanly aggregates missing refs. Logging on unexpected errors is good.
179-222: Route target validation is thorough and non-fatalPer-route aggregation with continued validation is the right call. Logs unexpected errors and reports all missing parents in one message.
223-265: Managed HTTPRoute spec validation is correctNamespace defaulting and error handling are consistent with referenced-route validation. Good separation of concerns.
pkg/controller/llmisvc/controller_int_test.go (6)
24-25: Using constants for gateway naming improves test stabilityGood import of constants.GatewayName/KServeNamespace to reduce hard-coded strings.
187-193: Builder-based setup reads well and reduces duplicationClean switch to functional options. This makes scenarios easier to compose.
247-253: Covers external InferencePool reference pathNice addition validating that backend refs point to the referenced pool when provided.
311-316: No-scheduler scenario well coveredAsserts that backend points to workload service when scheduler is disabled.
381-386: Managed HTTPRoute spec wiring is exercised properlyCreating a custom Gateway and ensuring route readiness makes this test deterministic.
471-479: Transition test: great use of custom HTTPRoute and readiness helperNicely validates deletion of managed route when refs are introduced.
4846732 to
8a9f5c9
Compare
8a9f5c9 to
5a9242c
Compare
|
/lgtm |
5a9242c to
f7ae49c
Compare
This PR enhances route reconcilation logic to ensure that referenced resources such as Gateways and HTTPRoutes are valid. This includes: ## Validation of referenced resource * Referenced `Gateway`s (`.spec.router.gateway.ref`) is not found * Referenced `HTTPRoute`s (`spec.router.route.http.refs`) is not found ### Ensures proper targets are defined When existing route is referenced (e.g. `router.route.http.refs`) it must be validated that resolved parent `Gateway`s exist. The errors are consolidated into one condition. Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com> ; Conflicts: ; pkg/controller/llmisvc/controller_int_test.go Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
f7ae49c to
1e0e90c
Compare
|
/lgtm |
ca1e902
into
opendatahub-io:release-v0.15
Important
Validation of referencing "target the inference service" will come as a follow-up.
This PR enhances route reconcilation logic to ensure that referenced resources such as Gateways and HTTPRoutes are valid.
This includes:
Validation of referenced resource
Gateways (.spec.router.gateway.ref) is not foundHTTPRoutes (spec.router.route.http.refs) is not foundEnsures proper targets are defined
When an existing route is referenced (e.g.
router.route.http.refs), it must be validated that the resolved parentGateways exist.The errors are consolidated into one condition.
Fixes RHOAIENG-28538
Re-running failed tests
/rerun-all- rerun all failed workflows./rerun-workflow <workflow name>- rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.Summary by CodeRabbit
New Features
Refactor
Tests