Skip to content

[WIP] test(e2e): introduces basic e2e test for a sample single node model#736

Closed
bartoszmajsak wants to merge 2 commits into
opendatahub-io:feature-llmd-llm-inference-servicefrom
bartoszmajsak:llm/basic-e2e-test
Closed

[WIP] test(e2e): introduces basic e2e test for a sample single node model#736
bartoszmajsak wants to merge 2 commits into
opendatahub-io:feature-llmd-llm-inference-servicefrom
bartoszmajsak:llm/basic-e2e-test

Conversation

@bartoszmajsak
Copy link
Copy Markdown

@bartoszmajsak bartoszmajsak commented Jul 11, 2025

Important

First commit, fixing Python packages can be upstreamed separately.

How to run

Follow DEV.md to set up KinD, create Gateway and Destination Rule
resources.

Set up python venv

python -m venv venv
source venv/bin/activate
cd python/kserve
pip install -e .
pip install pytest pytest-asyncio requests
cd -

Execute the test

kind get kubeconfig --name kserve-llm-d > kubeconfig.e2e

KUBECONFIG=kubeconfig.e2e pytest test/e2e/llmisvc/test_llm_inference_service.py::test_llm_inference_service_facebook_opt -v

TODO

  • fix wait/watch
  • improve assertions (it's now checking 200 OK only)
  • automate setup (gateway and destination rule)
  • include as github action
  • expand test scenarios

Summary by CodeRabbit

  • New Features

    • Added support for new LLM Inference Service model classes, making them available in the public API.
    • Introduced an end-to-end test for LLM Inference Service, verifying deployment, readiness, and inference response using the Facebook OPT-125M model.
  • Tests

    • Implemented comprehensive integration tests for LLM Inference Service, including resource creation, status monitoring, endpoint invocation, and cleanup.
    • Added a new pytest marker for LLM Inference Service end-to-end tests.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 11, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

"""

Walkthrough

The changes introduce new LLM inference service model imports to the kserve package's public API, add an end-to-end test for KServe's LLMInferenceService using the Facebook OPT-125M model, and register a new pytest marker for these tests. No existing logic or control flow is altered.

Changes

File(s) Change Summary
python/kserve/kserve/init.py Added imports for four v1alpha1 LLMInferenceService-related model classes to the package namespace.
test/e2e/llmisvc/test_llm_inference_service.py Added comprehensive async end-to-end test and helper functions for LLMInferenceService deployment, readiness, and inference validation.
test/e2e/pytest.ini Added new pytest marker llminferenceservice for LLM inference service controller e2e tests.

Sequence Diagram(s)

sequenceDiagram
    participant Tester
    participant KServeClient
    participant KubernetesAPI
    participant LLMInferenceService

    Tester->>KServeClient: Create LLMInferenceService CR
    KServeClient->>KubernetesAPI: POST CR to cluster
    Tester->>KServeClient: Wait for readiness (watch status)
    KServeClient->>KubernetesAPI: GET/Watch CR status
    KubernetesAPI-->>KServeClient: Status updates
    KServeClient-->>Tester: Ready status
    Tester->>LLMInferenceService: Send inference HTTP request
    LLMInferenceService-->>Tester: Return inference result
    Tester->>KServeClient: Delete LLMInferenceService CR
    KServeClient->>KubernetesAPI: DELETE CR
Loading

Poem

In the garden of code, a new seed’s been sown,
LLM services now stand, fully grown.
With tests that await the models to rise,
And markers to help them organize.
🐇 Hopping with joy, I watch them deploy—
Inferences flourish, oh what a ploy!

"""


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Jul 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bartoszmajsak
Once this PR has been reviewed and has the lgtm label, please assign terrytangyuan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bartoszmajsak
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 11, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
python/kserve/kserve/__init__.py (1)

52-55: LGTM! New LLMInferenceService model imports follow the existing pattern.

The addition of these v1alpha1 LLMInferenceService model classes to the package's public API is consistent with how other models are exposed.

Consider adding an __all__ list to this file in a future PR to explicitly define the public API and improve documentation/autocomplete support.

test/e2e/llmisvc/test_llm_inference_service.py (3)

90-106: Consider enhancing test assertions.

The test currently only validates HTTP 200 status and presence of "choices" field. As noted in the PR objectives TODO, more thorough assertions would improve test coverage.

Consider validating additional response fields such as:

  • Response structure matches expected completion format
  • Model name in response matches requested model
  • Generated text content is not empty
  • Response time is within acceptable limits

Would you like me to generate enhanced assertions for the test?


115-154: Consider moving LLMInferenceService operations to KServeClient.

The TODO comment correctly identifies that these helper functions would be better placed in the KServeClient class for reusability across tests and applications.

Moving these functions to KServeClient would:

  • Provide a consistent API for all KServe resource types
  • Enable reuse in other tests and client applications
  • Follow the existing pattern for InferenceService operations

Would you like me to create an issue to track this architectural improvement?


36-101: Acknowledge known limitations and future improvements.

Based on the PR objectives, this test has known limitations including:

  • Flaky wait conditions (currently being addressed)
  • Manual gateway/destination rule setup required
  • Basic assertions (HTTP 200 only)

The implementation provides a solid foundation for e2e testing of LLMInferenceService.

For the TODO items mentioned in the PR description (automating gateway setup, enhancing assertions, GitHub action integration), would you like me to help create tracking issues or provide implementation suggestions?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 947e449 and 8694f08.

📒 Files selected for processing (3)
  • python/kserve/kserve/__init__.py (1 hunks)
  • test/e2e/llmisvc/test_llm_inference_service.py (1 hunks)
  • test/e2e/pytest.ini (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:46-65
Timestamp: 2025-07-10T14:58:54.997Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2025-2049
Timestamp: 2025-07-10T14:58:43.762Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istio_base.yaml:6854-6869
Timestamp: 2025-07-10T14:58:14.161Z
Learning: The file `test/overlays/llm-istio-experimental/istio_base.yaml` is considered external and irrelevant for review in this repository.
Learnt from: hdefazio
PR: opendatahub-io/kserve#658
File: pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go:1873-1886
Timestamp: 2025-06-12T22:10:15.375Z
Learning: If a review comment is raised on code that was not introduced or changed by the current PR, and the user points this out, acknowledge it and clarify that no action is required from the PR author.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
test/e2e/pytest.ini (3)
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2025-2049
Timestamp: 2025-07-10T14:58:43.762Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:46-65
Timestamp: 2025-07-10T14:58:54.997Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
test/e2e/llmisvc/test_llm_inference_service.py (1)
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
🧬 Code Graph Analysis (1)
test/e2e/llmisvc/test_llm_inference_service.py (4)
python/kserve/kserve/api/kserve_client.py (2)
  • KServeClient (31-834)
  • get (166-223)
python/kserve/kserve/models/v1alpha1_llm_inference_service.py (1)
  • V1alpha1LLMInferenceService (35-242)
python/kserve/kserve/utils/utils.py (1)
  • get_isvc_namespace (49-50)
python/kserve/kserve/exceptions.py (1)
  • ApiException (98-123)
🪛 Ruff (0.11.9)
python/kserve/kserve/__init__.py

52-52: .models.v1alpha1_llm_inference_service.V1alpha1LLMInferenceService imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


53-53: .models.v1alpha1_llm_inference_service_config.V1alpha1LLMInferenceServiceConfig imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


54-54: .models.v1alpha1_llm_inference_service_config_list.V1alpha1LLMInferenceServiceConfigList imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


55-55: .models.v1alpha1_llm_inference_service_list.V1alpha1LLMInferenceServiceList imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

test/e2e/llmisvc/test_llm_inference_service.py

15-15: asyncio imported but unused

Remove unused import: asyncio

(F401)


16-16: json imported but unused

Remove unused import: json

(F401)


21-21: kserve.constants imported but unused

Remove unused import: kserve.constants

(F401)


143-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


175-177: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


197-199: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


245-245: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 Flake8 (7.2.0)
test/e2e/llmisvc/test_llm_inference_service.py

[error] 15-15: 'asyncio' imported but unused

(F401)


[error] 16-16: 'json' imported but unused

(F401)


[error] 21-21: 'kserve.constants' imported but unused

(F401)


[error] 312-312: visually indented line with same indent as next logical line

(E129)

🪛 GitHub Actions: Precommit Check
test/e2e/llmisvc/test_llm_inference_service.py

[error] 15-15: flake8: 'asyncio' imported but unused (F401)


[error] 16-16: flake8: 'json' imported but unused (F401)


[error] 21-21: flake8: 'kserve.constants' imported but unused (F401)


[error] 108-108: flake8: too many leading '#' for block comment (E266)

🔇 Additional comments (1)
test/e2e/pytest.ini (1)

20-20: LGTM! New pytest marker for LLM inference service tests.

The marker follows the existing naming convention and will enable selective execution of LLM inference service e2e tests.

Comment on lines +143 to +115
raise RuntimeError(
f"Exception when calling CustomObjectsApi->create_namespaced_custom_object for LLMInferenceService: {e}"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve exception handling by preserving the exception chain.

Use raise ... from e to maintain the exception chain for better debugging and error tracing.

Apply these diffs to fix exception handling:

In create_llmisvc:

     except client.rest.ApiException as e:
         raise RuntimeError(
             f"Exception when calling CustomObjectsApi->create_namespaced_custom_object for LLMInferenceService: {e}"
-        )
+        ) from e

In delete_llmisvc:

     except client.rest.ApiException as e:
         raise RuntimeError(
             f"Exception when calling CustomObjectsApi->delete_namespaced_custom_object for LLMInferenceService: {e}"
-        )
+        ) from e

In get_llmisvc:

     except client.rest.ApiException as e:
         raise RuntimeError(
             f"Exception when calling CustomObjectsApi->get_namespaced_custom_object for LLMInferenceService: {e}"
-        )
+        ) from e

In get_llm_service_url:

     except Exception as e:
-        raise ValueError(f"Failed to get URL for LLM inference service {service_name}: {e}")
+        raise ValueError(f"Failed to get URL for LLM inference service {service_name}: {e}") from e

Also applies to: 175-177, 197-199, 245-245

🧰 Tools
🪛 Ruff (0.11.9)

143-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In test/e2e/llmisvc/test_llm_inference_service.py at lines 143-145 and similarly
at lines 175-177, 197-199, and 245, improve exception handling by changing the
raise statements to use "raise ... from e" syntax. This preserves the original
exception context and stack trace. Locate each raise RuntimeError statement that
includes the exception variable e and modify it to "raise RuntimeError(...) from
e" to maintain the exception chain for better debugging.

Comment thread test/e2e/llmisvc/test_llm_inference_service.py Outdated
Comment thread test/e2e/llmisvc/test_llm_inference_service.py Outdated
Comment on lines +202 to +230
async def wait_llm_isvc_ready(
kserve_client,
name,
namespace,
timeout_seconds=600,
version=KSERVE_V1ALPHA1_VERSION
):
"""
Wait for LLM inference service to be ready by checking all conditions
:param kserve_client: KServe client instance
:param name: LLM inference service name
:param namespace: namespace
:param timeout_seconds: timeout seconds for waiting, default to 600s
:param version: api group version
:return: ready LLM inference service object
"""
llm_isvc_watch(name=name, namespace=namespace, timeout_seconds=timeout_seconds)

llm_isvc = get_llmisvc(kserve_client, name, namespace, version)

return llm_isvc

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unnecessary async declaration or implement true async waiting.

The wait_llm_isvc_ready function is declared as async but doesn't perform any asynchronous operations. It calls the synchronous llm_isvc_watch function.

Either remove the async declaration:

-async def wait_llm_isvc_ready(
+def wait_llm_isvc_ready(
     kserve_client, 
     name, 
     namespace, 
     timeout_seconds=600,
     version=KSERVE_V1ALPHA1_VERSION
 ):

And update the test to not await it:

-        await wait_llm_isvc_ready(kserve_client, service_name, KSERVE_TEST_NAMESPACE)
+        wait_llm_isvc_ready(kserve_client, service_name, KSERVE_TEST_NAMESPACE)

Or implement true async waiting using asyncio if needed for concurrent operations.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In test/e2e/llmisvc/test_llm_inference_service.py around lines 202 to 223, the
function wait_llm_isvc_ready is declared async but only calls synchronous
functions without awaiting any async operations. To fix this, either remove the
async keyword from the function definition and update all calls to it to not use
await, or refactor the function to perform actual asynchronous waiting using
asyncio constructs if concurrent async behavior is required.

Comment on lines +108 to +81
finally:
try:
print(f"{service_url}/v1/completions")
delete_llmisvc(kserve_client, service_name, KSERVE_TEST_NAMESPACE)
except Exception as e:
print(f"Warning: Failed to cleanup service {service_name}: {e}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Move debug print statement before cleanup.

The print statement is currently in the finally block after the service might be deleted, which could be confusing for debugging.

Apply this diff to improve the code structure:

         assert response.status_code == 200, f"Expected 200 but got {response.status_code}: {response.text}"
         
         response_data = response.json()
         assert "choices" in response_data, "Response should contain 'choices' field"
         
+        print(f"Successfully tested endpoint: {completion_url}")
+        
     finally:
         try:
-            print(f"{service_url}/v1/completions")
             delete_llmisvc(kserve_client, service_name, KSERVE_TEST_NAMESPACE)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
finally:
try:
print(f"{service_url}/v1/completions")
delete_llmisvc(kserve_client, service_name, KSERVE_TEST_NAMESPACE)
except Exception as e:
print(f"Warning: Failed to cleanup service {service_name}: {e}")
assert response.status_code == 200, f"Expected 200 but got {response.status_code}: {response.text}"
response_data = response.json()
assert "choices" in response_data, "Response should contain 'choices' field"
print(f"Successfully tested endpoint: {completion_url}")
finally:
try:
delete_llmisvc(kserve_client, service_name, KSERVE_TEST_NAMESPACE)
except Exception as e:
print(f"Warning: Failed to cleanup service {service_name}: {e}")
🧰 Tools
🪛 GitHub Actions: Precommit Check

[error] 108-108: flake8: too many leading '#' for block comment (E266)

🤖 Prompt for AI Agents
In test/e2e/llmisvc/test_llm_inference_service.py around lines 108 to 113, move
the debug print statement that outputs the service URL to before the call to
delete_llmisvc. This ensures the URL is printed before the service deletion
attempt, improving clarity during debugging. Keep the try-except structure to
catch and log any exceptions during cleanup.

Comment on lines +309 to +332
if (ready == "True" and
workloads_ready == "True" and
router_ready == "True" and
main_workload_ready == "True"):
print("All conditions ready")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix indentation issue in multi-line condition.

The condition continuation has incorrect indentation causing a flake8 E129 error.

Apply this diff to fix the indentation:

                 if (ready == "True" and 
                     workloads_ready == "True" and 
-                    router_ready == "True" and
-                    main_workload_ready == "True"):
+                     router_ready == "True" and
+                     main_workload_ready == "True"):
                     print("All conditions ready")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (ready == "True" and
workloads_ready == "True" and
router_ready == "True" and
main_workload_ready == "True"):
print("All conditions ready")
if (ready == "True" and
workloads_ready == "True" and
router_ready == "True" and
main_workload_ready == "True"):
print("All conditions ready")
🧰 Tools
🪛 Flake8 (7.2.0)

[error] 312-312: visually indented line with same indent as next logical line

(E129)

🤖 Prompt for AI Agents
In test/e2e/llmisvc/test_llm_inference_service.py around lines 309 to 313, the
multi-line if condition has incorrect indentation causing a flake8 E129 error.
Fix this by aligning the continued lines properly under the opening parenthesis,
ensuring consistent indentation for each condition line to comply with style
guidelines.

Copy link
Copy Markdown

@Jooho Jooho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow DEV.md to set up KinD, create Gateway and Destination Rule

Shouldn't the e2e test include creating this Gateway and Destination Rule?

"template": {
"containers": [{
"name": "main",
"image": "quay.io/pierdipi/vllm-cpu:latest",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be okay to use images from a personal repository for testing purposes?

Copy link
Copy Markdown
Author

@bartoszmajsak bartoszmajsak Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's wip. We should use official one for the real thing, but I think that very image was not published lately and the latest version was buggy, that's why we published newer one. FWIW that's not any different to what we have in sample and what we used in the demo.

But I agree - we should address this problem.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @pierDipi

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a quick improvement for now is that we could push that same image to the opendatahub Quay org to avoid referencing mine but it's ok to stay on that personal one for the purpose of this PR (perhaps extract a constant for quick future edits)

Comment thread test/e2e/llmisvc/test_llm_inference_service.py Outdated
Comment thread test/e2e/llmisvc/test_llm_inference_service.py Outdated
@bartoszmajsak
Copy link
Copy Markdown
Author

bartoszmajsak commented Jul 11, 2025

Follow DEV.md to set up KinD, create Gateway and Destination Rule

Shouldn't the e2e test include creating this Gateway and Destination Rule?

It's part of PRs TODOs :)

automate setup (gateway and destination rule)

@bartoszmajsak bartoszmajsak force-pushed the llm/basic-e2e-test branch 2 times, most recently from 1ed9a72 to d5d7902 Compare July 13, 2025 21:49
@bartoszmajsak
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@openshift-ci openshift-ci Bot requested a review from pierDipi July 13, 2025 21:54
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
test/e2e/llmisvc/test_llm_inference_service.py (4)

113-115: Fix exception handling by preserving the exception chain.

Use raise ... from e to maintain the exception chain for better debugging and error tracing.


138-140: Fix exception handling by preserving the exception chain.

Use raise ... from e to maintain the exception chain for better debugging and error tracing.


163-165: Fix exception handling by preserving the exception chain.

Use raise ... from e to maintain the exception chain for better debugging and error tracing.


258-260: Fix exception handling by preserving the exception chain.

Use raise ... from e to maintain the exception chain for better debugging and error tracing.

🧹 Nitpick comments (2)
test/e2e/llmisvc/test_llm_inference_service.py (2)

50-50: Consider using official container images for production tests.

The test uses a personal repository image (quay.io/pierdipi/vllm-cpu:latest). For production-ready tests, consider using official images to ensure consistency and reliability.

Do you want me to help identify official alternatives or create an issue to track this improvement?


213-213: Improve polling efficiency with exponential backoff.

The current polling uses a fixed 10-second interval which may be inefficient. Consider implementing exponential backoff for faster initial checks while avoiding overwhelming the service.

Apply this diff to improve the polling strategy:

-    wait_for(assert_model_responds, timeout=timeout_seconds, interval=10.0)
+    wait_for(assert_model_responds, timeout=timeout_seconds, interval=2.0)

And consider enhancing the wait_for function with exponential backoff:

def wait_for(assertion_fn, timeout: float = 5.0, initial_interval: float = 0.1, max_interval: float = 10.0):
    """
    Repeatedly calls assertion_fn() with exponential backoff until success or timeout.
    """
    deadline = time.time() + timeout
    interval = initial_interval
    
    while True:
        try:
            return assertion_fn()
        except AssertionError:
            if time.time() >= deadline:
                raise
            time.sleep(interval)
            interval = min(interval * 1.5, max_interval)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8694f08 and d5d7902.

📒 Files selected for processing (3)
  • python/kserve/kserve/__init__.py (1 hunks)
  • test/e2e/llmisvc/test_llm_inference_service.py (1 hunks)
  • test/e2e/pytest.ini (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/pytest.ini
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2025-2049
Timestamp: 2025-07-10T14:58:43.762Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:46-65
Timestamp: 2025-07-10T14:58:54.997Z
Learning: The file `test/overlays/llm-istio-experimental/istiod.yaml` is considered external and irrelevant for review in this repository.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istio_base.yaml:6854-6869
Timestamp: 2025-07-10T14:58:14.161Z
Learning: The file `test/overlays/llm-istio-experimental/istio_base.yaml` is considered external and irrelevant for review in this repository.
Learnt from: hdefazio
PR: opendatahub-io/kserve#658
File: pkg/controller/v1beta1/inferenceservice/rawkube_controller_test.go:1873-1886
Timestamp: 2025-06-12T22:10:15.375Z
Learning: If a review comment is raised on code that was not introduced or changed by the current PR, and the user points this out, acknowledge it and clarify that no action is required from the PR author.
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
test/e2e/llmisvc/test_llm_inference_service.py (1)
Learnt from: bartoszmajsak
PR: opendatahub-io/kserve#729
File: test/overlays/llm-istio-experimental/istiod.yaml:2559-2566
Timestamp: 2025-07-10T14:58:38.489Z
Learning: In the kserve repository, files in `test/overlays/llm-istio-experimental/` directory are considered external files by the maintainers and should not be flagged for YAML formatting issues or other code quality concerns.
🪛 Ruff (0.11.9)
test/e2e/llmisvc/test_llm_inference_service.py

113-115: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


138-140: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


163-165: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


191-191: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


204-204: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


258-260: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

python/kserve/kserve/__init__.py

52-52: .models.v1alpha1_llm_inference_service.V1alpha1LLMInferenceService imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


54-54: .models.v1alpha1_llm_inference_service_config.V1alpha1LLMInferenceServiceConfig imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


57-57: .models.v1alpha1_llm_inference_service_config_list.V1alpha1LLMInferenceServiceConfigList imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


59-59: .models.v1alpha1_llm_inference_service_list.V1alpha1LLMInferenceServiceList imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: precommit-check
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: test
  • GitHub Check: build (3.10)
  • GitHub Check: Build
🔇 Additional comments (1)
python/kserve/kserve/__init__.py (1)

52-59: LGTM! New LLM inference service models properly exposed in public API.

The new imports correctly expose the LLM inference service model classes in the package's public API, following the established pattern. The static analysis warnings about unused imports are false positives - these imports are intended for external consumption.

Comment on lines +31 to +33
@pytest.mark.llminferenceservice
@pytest.mark.asyncio(scope="session")
async def test_llm_inference_service_facebook_opt():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unnecessary async marker or add async operations.

The test function is marked with @pytest.mark.asyncio but doesn't contain any await statements or async operations. This creates unnecessary overhead.

Apply this diff to remove the async marker:

 @pytest.mark.llminferenceservice
-@pytest.mark.asyncio(scope="session")
-async def test_llm_inference_service_facebook_opt():
+def test_llm_inference_service_facebook_opt():
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.mark.llminferenceservice
@pytest.mark.asyncio(scope="session")
async def test_llm_inference_service_facebook_opt():
@pytest.mark.llminferenceservice
def test_llm_inference_service_facebook_opt():
🤖 Prompt for AI Agents
In test/e2e/llmisvc/test_llm_inference_service.py around lines 31 to 33, the
test function is marked as async with @pytest.mark.asyncio but contains no await
or async operations. Remove the @pytest.mark.asyncio decorator and the async
keyword from the function definition to eliminate unnecessary async overhead.

…odel

> [!IMPORTANT]
> First commit, fixing Python packages can be upstreamed separately.

### How to run

Follow `DEV.md` to set up KinD, create Gateway and Destination Rule
resources.

#### Set up python venv

```shell
python -m venv venv
source venv/bin/activate
cd python/kserve
pip install -e .
pip install pytest pytest-asyncio tabulate requests
cd -
```

#### Execute the test

```shell
kind get kubeconfig --name kserve-llm-d > kubeconfig.e2e

KUBECONFIG=kubeconfig.e2e pytest test/e2e/llmisvc/test_llm_inference_service.py::test_llm_inference_service_facebook_opt -v
```

> [!IMPORTANT]
> It seems that wait conditions are a bit flaky. If you set everything
up before starting the test (and comment out create/delete LLMISVC in
the test code), the test passes.

### TODO

- make watch funcs a bit less verbose :)
- improve assertions (it's now checking `200 OK` only)
- automate setup (gateway and destination rule)
- include as github action
- expand test scenarios

Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Copy link
Copy Markdown
Member

@pierDipi pierDipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I'd love to see the test instructions in the PR body added to DEV doc for future quick reference.

Generally LGTM!

@bartoszmajsak
Copy link
Copy Markdown
Author

Closes in favor of #774

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants