Skip to content

feat(validation): ensures references are valid#757

Merged
openshift-merge-bot[bot] merged 1 commit into
opendatahub-io:release-v0.15from
bartoszmajsak:llm-d/feature/validation/RHOAIENG-28538
Aug 22, 2025
Merged

feat(validation): ensures references are valid#757
openshift-merge-bot[bot] merged 1 commit into
opendatahub-io:release-v0.15from
bartoszmajsak:llm-d/feature/validation/RHOAIENG-28538

Conversation

@bartoszmajsak
Copy link
Copy Markdown

@bartoszmajsak bartoszmajsak commented Jul 21, 2025

Important

Validation of referencing "target the inference service" will come as a follow-up.

This PR enhances route reconcilation logic to ensure that referenced resources such as Gateways and HTTPRoutes are valid.

This includes:

Validation of referenced resource

  • Referenced Gateways (.spec.router.gateway.ref) is not found
  • Referenced HTTPRoutes (spec.router.route.http.refs) is not found

Ensures proper targets are defined

When an existing route is referenced (e.g. router.route.http.refs), it must be validated that the resolved parent Gateways exist.

The errors are consolidated into one condition.

Fixes RHOAIENG-28538

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Summary by CodeRabbit

  • New Features

    • Early validation of gateway and route references with clear “RefsInvalid” status reasons and messages.
    • Readiness checks now respect prior validation failures to avoid misleading Ready states.
  • Refactor

    • Streamlined router reconciliation flow with dedicated reference validation and guarded readiness evaluation.
  • Tests

    • Expanded coverage for missing/invalid gateway and route references.
    • Transitioned test setup to a builder-style pattern for consistent configuration.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 21, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces pre-validation of router references in reconciliation, adds a validation subsystem with custom error typing and condition setting, updates reconcile flow to gate on validation, and adjusts readiness evaluations to respect validation failures. Tests are refactored to a builder/functional-options pattern and expanded for reference validation scenarios. Adds an InferencePool option.

Changes

Cohort / File(s) Summary
Validation subsystem (new/gated checks)
pkg/controller/llmisvc/router_validation.go
Adds ValidationError type, RefsInvalidReason, and functions to validate Gateway refs, HTTPRoute refs, managed HTTPRoute specs, and HTTPRoute targets. Integrates condition updates when refs are invalid.
Reconcile flow updates
pkg/controller/llmisvc/router.go
Calls validateRouterReferences early in reconcile; adjusts referenced route collection; adds guards in readiness evaluators to short-circuit when RefsInvalidReason is set.
Test refactor and coverage
pkg/controller/llmisvc/controller_int_test.go
Refactors tests to builder-style LLMInferenceService(opts...); adds options usage for model URI, managed route/gateway/scheduler, HTTPRoute spec/refs, and gateway refs; introduces scenarios validating missing refs and RouterReady conditions.
Fixture/builders options
pkg/controller/llmisvc/fixture/llmisvc_builders.go
Adds WithInferencePoolRef to set Router.Scheduler.Pool reference; integrates with existing builder/option pattern.
Public API usage/constants in tests
pkg/controller/llmisvc/...
Tests import kserve constants (GatewayName, KServeNamespace) for gateway references and namespaces.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant R as Reconciler
  participant K as K8s API
  participant S as LLMInferenceService (CR)
  participant G as Gateway(s)
  participant H as HTTPRoute(s)

  R->>S: fetch LLMInferenceService
  R->>R: validateRouterReferences(S)
  alt Validation fails (RefsInvalid)
    R->>S: set GatewaysReady/HTTPRoutesReady=false (RefsInvalidReason)
    R-->>R: abort reconcile with validation error
  else Valid
    opt Managed HTTPRoute Spec present
      R->>K: ensure managed HTTPRoute
      K-->>R: managed route reconciled
    end
    R->>K: collect referenced HTTPRoutes
    R->>R: EvaluateGatewayConditions (guard if RefsInvalid)
    R->>R: EvaluateHTTPRouteConditions (guard if RefsInvalid)
    R-->>S: update status conditions
  end
Loading
sequenceDiagram
  autonumber
  participant R as validateRouterReferences
  participant K as K8s API
  participant G as Gateway(s)
  participant H as HTTPRoute(s)

  par Gateway refs
    R->>K: Get each Gateway
    alt any NotFound
      R-->>R: aggregate missing gateway names
    end
  and HTTPRoute refs
    R->>K: Get each HTTPRoute
    alt any NotFound
      R-->>R: aggregate missing route names
    end
  and Managed HTTPRoute spec
    R->>K: Get referenced Gateways in spec
    alt any NotFound
      R-->>R: aggregate missing gateways
    end
  end
  opt HTTP refs present
    R->>K: Get routes -> validate ParentRefs to Gateways
    alt any invalid parent
      R-->>R: aggregate invalid parents
    end
  end
  alt any aggregates
    R-->>R: return ValidationError (RefsInvalidReason)
  else
    R-->>R: nil
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paws at refs that hide,
Gateways peek, routes stand aside.
With options neat, we build the warren,
Validate paths, no broken foreign.
Now status sings, conditions true—
A tidy burrow, fresh and new.
🐇✨

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@bartoszmajsak
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/controller/llmisvc/controller_int_test.go (1)

380-381: Minor: Remove trailing space in error message

-					}).WithContext(ctx).Should(Succeed(), "Should have no managed HTTPRoutes ")
+					}).WithContext(ctx).Should(Succeed(), "Should have no managed HTTPRoutes")
pkg/controller/llmisvc/ref_validation.go (1)

195-203: Consider logging non-NotFound errors for debugging.

While ignoring transient errors is appropriate, logging them at debug level would help with troubleshooting.

 func (r *LLMInferenceServiceReconciler) validateGatewayRef(ctx context.Context, namespacedName types.NamespacedName) string {
 	refGateway := &gatewayapi.Gateway{}
 	errGet := r.Client.Get(ctx, namespacedName, refGateway)
 	if apierrors.IsNotFound(errGet) {
 		return fmt.Sprintf("Gateway %s/%s does not exist", namespacedName.Namespace, namespacedName.Name)
 	}
+	if errGet != nil {
+		log.FromContext(ctx).V(1).Info("Ignoring transient error while validating gateway", "gateway", namespacedName, "error", errGet)
+	}
 	// We ignore other errors as they might be transient and should be handled by retry logic
 	return ""
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cbf220 and 8e6532a.

📒 Files selected for processing (7)
  • pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1 hunks)
  • pkg/controller/llmisvc/controller_int_test.go (4 hunks)
  • pkg/controller/llmisvc/ref_validation.go (1 hunks)
  • pkg/controller/llmisvc/ref_validation_test.go (1 hunks)
  • pkg/controller/llmisvc/router.go (4 hunks)
  • pkg/controller/llmisvc/router_discovery.go (2 hunks)
  • pkg/controller/llmisvc/scheduler.go (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
pkg/controller/llmisvc/scheduler.go (1)

Learnt from: israel-hdez
PR: #738
File: pkg/apis/serving/v1beta1/configmap.go:0-0
Timestamp: 2025-07-17T16:19:36.771Z
Learning: In pkg/apis/serving/v1beta1/configmap.go, the GetStorageInitializerConfigs function was moved from another package to prevent circular imports, which explains why it may contain existing patterns rather than being newly written code.

pkg/controller/llmisvc/controller_int_test.go (2)

Learnt from: israel-hdez
PR: #738
File: pkg/apis/serving/v1beta1/configmap.go:0-0
Timestamp: 2025-07-17T16:19:36.771Z
Learning: In pkg/apis/serving/v1beta1/configmap.go, the GetStorageInitializerConfigs function was moved from another package to prevent circular imports, which explains why it may contain existing patterns rather than being newly written code.

Learnt from: brettmthompson
PR: #709
File: pkg/controller/v1alpha1/utils/utils.go:118-128
Timestamp: 2025-06-30T22:30:38.045Z
Learning: In pkg/controller/v1alpha1/utils/utils.go, the condition minReplicas == nil && constants.DefaultMinReplicas == 0 in ValidateInitialScaleAnnotation function is intentionally kept even though currently unreachable (DefaultMinReplicas = 1) to handle potential future changes to DefaultMinReplicas value, implementing defensive programming to limit future code changes.

🧬 Code Graph Analysis (3)
pkg/controller/llmisvc/ref_validation_test.go (1)
pkg/controller/llmisvc/ref_validation.go (1)
  • RefValidationResult (48-50)
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (2)
  • SchedulerSpec (223-232)
  • LLMInferenceService (40-46)
pkg/controller/llmisvc/ref_validation.go (2)
pkg/controller/llmisvc/controller.go (1)
  • LLMInferenceServiceReconciler (95-99)
pkg/controller/llmisvc/fixture/gwapi_builders.go (2)
  • HTTPRoute (98-117)
  • Gateway (30-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build (3.9)
  • GitHub Check: test
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: Build
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: precommit-check
  • GitHub Check: test
🔇 Additional comments (11)
pkg/controller/llmisvc/scheduler.go (1)

211-217: LGTM! Good refactoring to centralize naming logic

The replacement of manual name construction with llmSvc.InferencePoolName() improves maintainability by centralizing the naming logic. Storing the result in a local variable for reuse is also a good practice.

Also applies to: 234-252

pkg/controller/llmisvc/ref_validation_test.go (1)

1-128: Well-structured and comprehensive test coverage

The tests are well-organized with:

  • Clear table-driven test structure
  • Good coverage of edge cases (no issues, single issue, multiple issues)
  • Proper testing of different issue types
  • Clear test names that describe the behavior being tested
pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go (1)

24-38: Default inference pool suffix verification

The suffix "-inference-pool" is used consistently across the codebase:

  • pkg/controller/llmisvc/sample.go sets POOL_NAME with svcName + "-inference-pool"
  • pkg/controller/llmisvc/controller_int_test.go asserts backend refs with svcName + "-inference-pool"
  • pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go uses kmeta.ChildName(..., "-inference-pool")

LGTM—nil checks and fallback logic are solid, and the comment about matching the well-known presets remains helpful.

pkg/controller/llmisvc/controller_int_test.go (2)

413-670: Excellent test coverage for reference validation

The new test cases comprehensively cover all the validation scenarios mentioned in the PR objectives:

  • Missing Gateway references
  • Missing parent Gateways in HTTPRoutes
  • Missing HTTPRoute references
  • Multiple missing references
  • Misconfigured HTTPRoutes

The tests properly verify status conditions, reasons, and error messages. Good use of test helpers and consistent test structure.


279-291: Good improvement to use actual HTTPRoute instead of comment

The change from a comment to creating an actual custom HTTPRoute makes the test more realistic and properly exercises the reference validation logic.

pkg/controller/llmisvc/router.go (3)

65-106: LGTM! Well-structured validation integration.

The validation flow is logically organized and the status update correctly prioritizes "not found" issues over "misconfigured" issues. The separation of validation logic into dedicated methods improves maintainability.


130-134: Good fix for preserving existing ParentRefs.

The change to append rather than replace ParentRefs prevents potential loss of existing gateway references, which is important for maintaining proper routing configuration.


140-140: Improved function naming.

The rename from updateRoutingStatus to updateLLMInferenceServiceURLs better describes the function's purpose.

pkg/controller/llmisvc/ref_validation.go (3)

33-119: Well-designed validation result types.

The validation types and methods are cleanly structured with good separation of concerns. The CombinedMessage method's formatting logic provides clear user-facing messages.


120-141: Correct validation logic for HTTPRoute references.

The method properly handles not-found errors as validation issues while allowing the reconciler's retry logic to handle transient errors.


143-173: Thorough validation of HTTPRoute configurations.

The method correctly validates that routes target the expected backend service and includes parent gateway validation. The nested loop with break statements efficiently checks for the correct backend.

Comment thread pkg/controller/llmisvc/router_discovery.go Outdated
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch 2 times, most recently from 18fa6a0 to 97c4616 Compare July 22, 2025 07:36
@bartoszmajsak
Copy link
Copy Markdown
Author

bartoszmajsak commented Jul 22, 2025

I think TestLLMInferenceServiceController is flaky - I will investigate.

UPDATE It's been fixed now.

Comment thread pkg/controller/llmisvc/router.go Outdated
Comment thread pkg/controller/llmisvc/router.go Outdated
Comment thread pkg/controller/llmisvc/router.go Outdated
Comment thread pkg/controller/llmisvc/router.go Outdated
@bartoszmajsak bartoszmajsak changed the title feat(validation): ensures references are valid [WIP] feat(validation): ensures references are valid Jul 23, 2025
@pierDipi pierDipi changed the base branch from feature-llmd-llm-inference-service to release-v0.15 July 25, 2025 13:48
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from af45e5f to 14953b1 Compare August 21, 2025 21:16
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from 14953b1 to 416f5d0 Compare August 21, 2025 21:21
@bartoszmajsak bartoszmajsak changed the title [WIP] feat(validation): ensures references are valid feat(validation): ensures references are valid Aug 21, 2025
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from 416f5d0 to a0afecd Compare August 21, 2025 21:31
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from a0afecd to 4846732 Compare August 21, 2025 21:34
@bartoszmajsak
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Aug 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Member

@andresllh andresllh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Aug 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andresllh, bartoszmajsak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [andresllh,bartoszmajsak]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
pkg/controller/llmisvc/fixture/llmisvc_builders.go (1)

245-259: Nit: guard against empty pool name in test fixtures

A small defensive check avoids silent misconfigurations in tests.

Apply this diff:

 func WithInferencePoolRef(poolName string) LLMInferenceServiceOption {
   return func(llmSvc *v1alpha1.LLMInferenceService) {
+    if poolName == "" {
+      // For test fixtures, panic is acceptable to surface misconfiguration early
+      panic("WithInferencePoolRef: poolName must be non-empty")
+    }
     if llmSvc.Spec.Router == nil {
       llmSvc.Spec.Router = &v1alpha1.RouterSpec{}
     }
     if llmSvc.Spec.Router.Scheduler == nil {
       llmSvc.Spec.Router.Scheduler = &v1alpha1.SchedulerSpec{}
     }
     llmSvc.Spec.Router.Scheduler.Pool = &v1alpha1.InferencePoolSpec{
       Ref: &corev1.LocalObjectReference{
         Name: poolName,
       },
     }
   }
 }
pkg/controller/llmisvc/router.go (2)

52-54: Treat ValidationError as non-fatal to avoid error-based requeue loops

Consider swallowing typed validation errors here (conditions are already marked) to avoid tight error requeues. This aligns with the existing pattern of not erroring on missing infra (see DiscoverURLs behavior) and still gates readiness later.

Apply this diff:

- if err := r.validateRouterReferences(ctx, llmSvc); err != nil {
-   return err
- }
+ if err := r.validateRouterReferences(ctx, llmSvc); err != nil {
+   if IsValidationError(err) {
+     // Conditions already set; avoid error-based requeue loop.
+     return nil
+   }
+   return err
+ }

If you prefer requeues, consider returning a controller-runtime Result with backoff at the outer reconcile, but avoid bubbling errors for pure validation failures.


129-136: Fix HTTPRoute error message: wrong namespace/name order and wrong resource used

The error logs a wrong pair (uses service name and swaps parts). Use namespace from the service and the referenced route name.

Apply this diff:

-      return referencedRoutes, fmt.Errorf("failed to get HTTPRoute %s/%s: %w", routeRef.Name, llmSvc.GetName(), err)
+      return referencedRoutes, fmt.Errorf("failed to get HTTPRoute %s/%s: %w", llmSvc.GetNamespace(), routeRef.Name, err)
pkg/controller/llmisvc/router_validation.go (2)

62-109: Optionally aggregate all validation issues before returning

Currently the first failing phase short-circuits. Consider aggregating all validation failures (gateways + routes + targets) into a single ValidationError, marking the respective subconditions, then return once. This surfaces all issues in one reconcile and matches the “single condition with detailed message” approach discussed in the PR.

If interested, I can draft an aggregator that collects messages from each validator and marks GatewaysReady/HTTPRoutesReady accordingly before returning a combined error.


152-177: Fix HTTPRoute fetch error message (namespace/name order)

Minor but useful for debugging: the error swaps namespace and name.

Apply this diff:

-    return fmt.Errorf("failed to get HTTPRoute %s/%s: %w", routeRef.Name, llmSvc.GetNamespace(), err)
+    return fmt.Errorf("failed to get HTTPRoute %s/%s: %w", llmSvc.GetNamespace(), routeRef.Name, err)
pkg/controller/llmisvc/controller_int_test.go (2)

485-488: Minor: prefer direct spec mutation over invoking options in-place

Inside CreateOrUpdate, calling WithHTTPRouteRefs(...)(llmSvc) and WithGatewayRefs(...)(llmSvc) works, but mutating llmSvc.Spec directly avoids any side effects from option functions.

Example inline change (illustrative):

llmSvc.Spec.Router.Route.HTTP.Refs = []corev1.LocalObjectReference{{Name: "my-custom-route"}}
llmSvc.Spec.Router.Gateway.Refs = []v1alpha1.UntypedObjectReference{ LLMGatewayRef(gateway.Name, gateway.Namespace) }

672-814: Reference-validation test block is valuable and precise

Covers:

  • missing Gateway ref,
  • managed HTTPRoute with non-existent parent Gateway, and
  • missing HTTPRoute ref.

Assertions on RouterReady=False with RefsInvalid reason and message substrings look solid.

Two optional additions to strengthen coverage:

  • Add a case where a referenced HTTPRoute exists but its ParentRef points to a non-existent Gateway (exercises validateHTTPRouteTargets for refs).
  • Add a negative InferencePool reference case (missing pool) to assert InferencePoolReady=False path.

If you want, I can contribute a small test that creates an HTTPRoute with an invalid parent and references it from LLMInferenceService to cover the referenced-route target validation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8e6532a and 4846732.

📒 Files selected for processing (4)
  • pkg/controller/llmisvc/controller_int_test.go (9 hunks)
  • pkg/controller/llmisvc/fixture/llmisvc_builders.go (1 hunks)
  • pkg/controller/llmisvc/router.go (4 hunks)
  • pkg/controller/llmisvc/router_validation.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-21T17:40:11.003Z
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#757
File: pkg/controller/llmisvc/router_discovery.go:45-49
Timestamp: 2025-07-21T17:40:11.003Z
Learning: In the KServe LLMInferenceService controller, the DiscoverURLs function in router_discovery.go intentionally returns nil, nil for missing Gateways rather than errors. This allows the reconciler to continue processing and update status conditions through the separate validation logic in ref_validation.go, preventing reconciliation loops that would occur if errors were returned.

Applied to files:

  • pkg/controller/llmisvc/router_validation.go
  • pkg/controller/llmisvc/router.go
  • pkg/controller/llmisvc/controller_int_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build
  • GitHub Check: precommit-check
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.10)
  • GitHub Check: test
🔇 Additional comments (15)
pkg/controller/llmisvc/fixture/llmisvc_builders.go (1)

245-259: LGTM: helper correctly lazy-initializes and sets InferencePool ref

The option properly guards nils on Router and Scheduler and only mutates the Pool reference. Good addition for the test builder API.

pkg/controller/llmisvc/router.go (3)

245-251: Gating readiness on RefsInvalid reason looks good

Early exit when RefsInvalid is already set prevents redundant API calls and noisy logs.


339-345: HTTPRoute readiness gating is consistent

The check mirrors Gateway gating and avoids double-reporting after validation failures.


121-123: Remove unnecessary nil guard for Route.HTTP.HasRefs

The HasRefs method on HTTPRouteSpec already guards against a nil receiver—it returns false if r == nil and only then checks len(r.Refs). Calling

llmSvc.Spec.Router.Route.HTTP.HasRefs()

is therefore safe even when Route.HTTP is nil. The extra || llmSvc.Spec.Router.Route.HTTP == nil check is redundant and can be removed.

• No change needed in pkg/controller/llmisvc/router.go (lines 121–123).
• Discard the previously proposed diff.

Example from pkg/apis/serving/v1alpha1/llm_inference_service_types_func.go:

func (r *HTTPRouteSpec) HasRefs() bool {
    return r != nil && len(r.Refs) > 0
}

Likely an incorrect or invalid review comment.

pkg/controller/llmisvc/router_validation.go (5)

33-35: Reason constant is clear and reusable

Exporting RefsInvalidReason simplifies status assertions in tests and callers.


37-56: ValidationError type is appropriate for condition-driven flows

Simple and sufficient for non-fatal validation paths. No changes needed.


111-149: Gateway refs validation LGTM

Covers namespace defaulting and cleanly aggregates missing refs. Logging on unexpected errors is good.


179-222: Route target validation is thorough and non-fatal

Per-route aggregation with continued validation is the right call. Logs unexpected errors and reports all missing parents in one message.


223-265: Managed HTTPRoute spec validation is correct

Namespace defaulting and error handling are consistent with referenced-route validation. Good separation of concerns.

pkg/controller/llmisvc/controller_int_test.go (6)

24-25: Using constants for gateway naming improves test stability

Good import of constants.GatewayName/KServeNamespace to reduce hard-coded strings.


187-193: Builder-based setup reads well and reduces duplication

Clean switch to functional options. This makes scenarios easier to compose.


247-253: Covers external InferencePool reference path

Nice addition validating that backend refs point to the referenced pool when provided.


311-316: No-scheduler scenario well covered

Asserts that backend points to workload service when scheduler is disabled.


381-386: Managed HTTPRoute spec wiring is exercised properly

Creating a custom Gateway and ensuring route readiness makes this test deterministic.


471-479: Transition test: great use of custom HTTPRoute and readiness helper

Nicely validates deletion of managed route when refs are introduced.

@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from 4846732 to 8a9f5c9 Compare August 21, 2025 21:50
@openshift-ci openshift-ci Bot removed the lgtm label Aug 21, 2025
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from 8a9f5c9 to 5a9242c Compare August 21, 2025 21:51
@andresllh
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Aug 22, 2025
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from 5a9242c to f7ae49c Compare August 22, 2025 03:53
@openshift-ci openshift-ci Bot removed the lgtm label Aug 22, 2025
This PR enhances route reconcilation logic to ensure that referenced resources such as Gateways and HTTPRoutes are valid.

This includes:

## Validation of referenced resource

* Referenced `Gateway`s (`.spec.router.gateway.ref`) is not found
* Referenced `HTTPRoute`s (`spec.router.route.http.refs`) is not found

### Ensures proper targets are defined

When existing route is referenced (e.g. `router.route.http.refs`) it must be validated that resolved parent `Gateway`s exist.

The errors are consolidated into one condition.

Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>

; Conflicts:
;	pkg/controller/llmisvc/controller_int_test.go

Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
@bartoszmajsak bartoszmajsak force-pushed the llm-d/feature/validation/RHOAIENG-28538 branch from f7ae49c to 1e0e90c Compare August 22, 2025 03:59
@israel-hdez
Copy link
Copy Markdown

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Aug 22, 2025
@openshift-merge-bot openshift-merge-bot Bot merged commit ca1e902 into opendatahub-io:release-v0.15 Aug 22, 2025
33 of 35 checks passed
@github-project-automation github-project-automation Bot moved this from New/Backlog to Done in ODH Model Serving Planning Aug 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants