feat(helm): add LLMGateway helm chart#1998

Merged
steebchen merged 4 commits into theopenco:main from yezz123:helm-integration
Apr 19, 2026

Conversation

Contributor

@yezz123 yezz123 commented Apr 9, 2026

This pull request introduces a Helm chart for deploying LLM Gateway on Kubernetes, along with comprehensive documentation and templating for multiple components and configurations.

The changes add all necessary Helm chart files, Kubernetes manifests for various services, and documentation updates to guide users in deploying LLM Gateway via Helm.

Helm Chart Introduction and Documentation:

  • Added a new Helm chart for LLM Gateway, including a Chart.yaml with metadata and configuration for Kubernetes deployments.
  • Added a README.md in the Helm chart directory with installation instructions and usage details.
  • Updated the self-hosting documentation to mention Helm deployment and link to further Helm chart documentation.

cc @steebchen @smakosh 🫡

Summary by CodeRabbit

  • New Features

    • Added a Helm chart to deploy and configure all application components (API, gateway, UI, worker, docs, admin, playground, code)
    • Built-in PostgreSQL and Redis with optional external DB/Redis support, ingress support, and extensive runtime/config overrides (images, replicas, resources, env, node selection, secrets, LLM provider keys)
    • Chart is published to a container registry on releases for easy helm install
  • Documentation

    • Updated self-hosting docs with Helm installation, version pinning, and links to chart resources

Contributor

coderabbitai Bot commented Apr 9, 2026

Walkthrough

Adds a new Helm chart for LLM Gateway (chart metadata, helpers, templates, values), CI job to publish the chart to GHCR, and documentation updates with Helm install instructions for deploying to Kubernetes.

Changes

Cohort / File(s) Summary
Documentation
apps/docs/content/self-host.mdx, infra/helm/README.md
Added Helm installation docs and links to chart README and version listing; shows OCI install command and --version pinning guidance.
Chart Metadata & Helpers
infra/helm/llmgateway/Chart.yaml, infra/helm/llmgateway/templates/_helpers.tpl
New Chart.yaml and helper templates providing naming, labels, image resolution, secret/config names, DB/Redis host/port helpers, and service URL builders.
Component Deployments
infra/helm/llmgateway/templates/*-deployment.yaml (admin, api, code, docs, gateway, playground, ui, worker)
Added conditional Deployment templates for 8 components with replicas, probes, env/config wiring, resources, scheduling, and init containers (API/Gateway/Worker) for DB/Redis readiness.
Component Services
infra/helm/llmgateway/templates/*-service.yaml (admin, api, code, docs, gateway, playground, ui)
Added Service templates for each component with configurable service type (default ClusterIP) and selector/labels.
Stateful Data Plane
infra/helm/llmgateway/templates/postgresql-statefulset.yaml, .../postgresql-service.yaml, .../redis-statefulset.yaml, .../redis-service.yaml
Added StatefulSet and Service templates for PostgreSQL and Redis with persistence, credentials, probes, and optional password/auth handling.
Config & Secrets
infra/helm/llmgateway/templates/configmap.yaml, infra/helm/llmgateway/templates/secret.yaml
Added ConfigMap with many runtime keys (URLs, auth, gateway/API/database/worker/billing/observability/content-filtering) and a Secret template emitting DB/Redis credentials, OAuth keys, LLM provider keys, Stripe, email, and webhook secrets.
Ingress & Top-level Values
infra/helm/llmgateway/templates/ingress.yaml, infra/helm/llmgateway/values.yaml
Ingress template with per-component host rules and optional TLS; comprehensive values.yaml (≈450 lines) defining defaults for all components, external/internal DB/Redis, providers, billing, observability, and feature toggles.
CI: Publish Chart
.github/workflows/images.yml
New publish-helm workflow job: on release, installs Helm, updates Chart.yaml version/appVersion, packages chart and pushes OCI chart to GHCR under charts/<OWNER>.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Release as "GitHub Release"
  participant Actions as "GitHub Actions (publish-helm)"
  participant GHCR as "GitHub Container Registry (OCI)"
  participant User as "Operator (helm CLI)"
  participant K8s as "Kubernetes API"

  Release->>Actions: trigger publish-helm
  Actions->>Actions: update Chart.yaml (version/appVersion)
  Actions->>GHCR: push OCI chart package
  User->>GHCR: helm install oci://ghcr.io/.../llmgateway
  GHCR->>User: chart blob
  User->>K8s: Helm renders templates / apply manifests
  K8s->>K8s: create ConfigMap, Secret, Services, Deployments, StatefulSets, Ingress
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title 'feat(helm): add LLMGateway helm chart' accurately describes the main change—introducing a new Helm chart for LLM Gateway deployment.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@github-actions github-actions Bot changed the title from "Add Helm chart and deployment instructions for LLMGateway" to "feat(helm): add LLMGateway helm chart" on Apr 9, 2026
@yezz123 yezz123 force-pushed the helm-integration branch from 81d9cd1 to db4e2a5 on April 9, 2026 at 22:36

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (5)
infra/helm/llmgateway/Chart.yaml (1)

7-7: Use a concrete appVersion instead of latest for proper release tracking.

Line 7 uses appVersion: "latest", which makes chart metadata non-deterministic. According to Helm best practices, appVersion should reflect the actual application version included in the chart (e.g., matching the default image tag deployed by the chart) to enable proper release tracking and automation.

🔧 Suggested change
-appVersion: "latest"
+appVersion: "0.0.0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/Chart.yaml` at line 7, Replace the non-deterministic
appVersion value in Chart.yaml (appVersion: "latest") with a concrete
application version string that matches the chart’s default image tag; update
the appVersion field to the actual semantic version (e.g., "v1.2.3") and, if
necessary, align values.yaml image.tag (or any default image tag referenced by
the chart) and CI/release automation to keep them in sync so chart metadata is
deterministic and traceable.
infra/helm/llmgateway/templates/ingress.yaml (1)

27-110: Consider generating rules via a loop to reduce drift risk.

The repeated component blocks are easy to desync (host key, service suffix, enable flag). A small range-driven structure would make future additions safer.
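A range-driven version of that suggestion could look roughly like this; the component keys and the `llmgateway.fullname`/`ingress.hosts` value paths are taken from the review comment, and the path/port details are assumed rather than verified against the chart:

```yaml
{{- range $key := list "ui" "api" "gateway" "playground" "code" "docs" "admin" }}
{{- $comp := index $.Values $key }}
{{- $host := index $.Values.ingress.hosts $key }}
{{- if and $comp.enabled $host }}
    - host: {{ $host | quote }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # service name follows the <fullname>-<component> convention
                name: {{ printf "%s-%s" (include "llmgateway.fullname" $) $key }}
                port:
                  name: http
{{- end }}
{{- end }}
```

Adding a component then becomes a one-word change to the list instead of a new hand-copied block.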

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/ingress.yaml` around lines 27 - 110, Replace
the repeated per-component ingress blocks with a single range that iterates over
a map/list of component keys and generates host, enabled check and service name
dynamically; e.g., range over
["ui","api","gateway","playground","code","docs","admin"] and for each key,
check .Values.<key>.enabled and .Values.ingress.hosts.<key>, use the host value
and build the backend service name with include "llmgateway.fullname" . | printf
"%s-%s" (include "llmgateway.fullname" .) key and port name http; this keeps the
logic centralized (references: .Values.ui.enabled, .Values.ingress.hosts.<key>,
include "llmgateway.fullname") and removes the duplicated blocks.
infra/helm/llmgateway/templates/gateway-deployment.yaml (2)

22-42: Init containers can wait indefinitely without timeout.

The wait-for-postgresql and wait-for-redis init containers loop forever until the services are reachable. If the dependent services are misconfigured or fail to start, the pod will remain in Init state indefinitely.

Consider adding a timeout or max retry count to fail fast and surface the issue.

♻️ Proposed fix with timeout
        - name: wait-for-postgresql
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
+             TIMEOUT=300
+             ELAPSED=0
              until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
                echo "Waiting for PostgreSQL..."
                sleep 2
+               ELAPSED=$((ELAPSED + 2))
+               if [ $ELAPSED -ge $TIMEOUT ]; then
+                 echo "Timeout waiting for PostgreSQL"
+                 exit 1
+               fi
              done
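The same bounded-wait idea, factored into a standalone POSIX-shell function. The helper name, default timeout, and 2-second poll interval are illustrative, not taken from the chart:

```shell
#!/bin/sh
# wait_for HOST PORT [TIMEOUT_SECONDS]: poll with nc until the port is
# reachable, or return 1 once the timeout is exceeded so the init container
# fails fast instead of hanging in Init state forever.
wait_for() {
  host=$1; port=$2; timeout=${3:-300}
  elapsed=0
  until nc -z "$host" "$port" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timeout waiting for $host:$port" >&2
      return 1
    fi
    echo "Waiting for $host:$port..."
    sleep 2
    elapsed=$((elapsed + 2))
  done
}
```

Exposing the timeout as a Helm value (e.g. a hypothetical `.Values.initWaitTimeoutSeconds`) would make it tunable per environment.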
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/gateway-deployment.yaml` around lines 22 -
42, The init containers wait-for-postgresql and wait-for-redis can hang forever;
modify their command scripts to implement a timeout or max-retry and exit
non-zero when exceeded so the pod fails fast. Update the command blocks for the
initContainers (names: wait-for-postgresql, wait-for-redis) to track either a
retry counter or elapsed time (e.g., MAX_RETRIES or TIMEOUT_SECONDS), loop until
nc -z succeeds or the limit is reached, print a clear error message and exit 1
on timeout; optionally expose the timeout/retry via a Helm value so it’s
configurable.

43-77: Consider adding securityContext for defense-in-depth.

Trivy flags missing securityContext.readOnlyRootFilesystem (KSV-0014) and default security context (KSV-0118). While not strictly required, adding a restrictive security context improves the security posture.

🛡️ Suggested securityContext addition
      containers:
        - name: gateway
          image: {{ include "llmgateway.image" (dict "context" . "image" .Values.gateway.image) }}
          imagePullPolicy: {{ .Values.gateway.image.pullPolicy | default "IfNotPresent" }}
+         securityContext:
+           readOnlyRootFilesystem: true
+           runAsNonRoot: true
+           allowPrivilegeEscalation: false
          ports:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/gateway-deployment.yaml` around lines 43 -
77, Add a restrictive security context to the gateway container and a default
pod securityContext for defense-in-depth: under the containers block for the
container named "gateway" add a securityContext with readOnlyRootFilesystem:
true, runAsNonRoot: true, runAsUser (e.g. 1000), allowPrivilegeEscalation:
false, capabilities: drop: ["ALL"], and seccompProfile: { type: "RuntimeDefault"
}; additionally add a podSecurityContext on the Pod template spec
(spec.template.spec) to set fsGroup (e.g. 1000) and runAsNonRoot:true to provide
a default baseline for all containers. Ensure you add these keys adjacent to
existing fields (e.g., alongside ports/env/readinessProbe) so Helm templating
(templates/gateway-deployment.yaml) and .Values overrides remain intact.
infra/helm/llmgateway/templates/postgresql-statefulset.yaml (1)

21-23: Image construction differs from other deployments.

The PostgreSQL container image is constructed directly ({{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }}) rather than using the llmgateway.image helper used by other deployments. This is acceptable since PostgreSQL uses a different image structure (no registry prefix needed for official images), but worth noting for consistency.
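If consistency is preferred, a dedicated helper along the lines suggested might look like this (the helper name and values paths are illustrative):

```yaml
{{/* Hypothetical helper in templates/_helpers.tpl mirroring llmgateway.image
     for the PostgreSQL image, which needs no registry prefix. */}}
{{- define "llmgateway.postgresqlImage" -}}
{{- printf "%s:%s" .Values.postgresql.image.repository (.Values.postgresql.image.tag | toString) -}}
{{- end }}
```

The StatefulSet would then use `image: {{ include "llmgateway.postgresqlImage" . }}` instead of inline concatenation.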

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/postgresql-statefulset.yaml` around lines 21
- 23, The PostgreSQL container image is being built inline using {{
.Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }} which
diverges from other deployments that use the llmgateway image helper; update the
postgresql container spec to use the shared helper (e.g., call the
"llmgateway.image" template with the postgresql image values) or add a dedicated
helper (like "llmgateway.postgresqlImage") and use that in the StatefulSet so
image construction is consistent with other templates (change the image
attribute referenced in the postgresql container to use the helper instead of
direct concatenation).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/docs/content/self-host.mdx`:
- Around line 187-195: Update the prerequisites section to include
Kubernetes/Helm requirements so the new "Helm Chart" instructions are
executable: state that a Kubernetes cluster (minimum version, e.g., v1.20+ or
your supported minimum), kubectl configured to the target cluster, Helm 3
installed, a default StorageClass or PVC capability, access to a container
registry (or imagePullSecrets) for the llmgateway image, and any required
RBAC/load‑balancer considerations; reference the "Helm Chart" heading in the
docs to ensure users see these prerequisites before the helm repo/add/install
commands.

In `@infra/helm/llmgateway/templates/api-deployment.yaml`:
- Around line 29-42: The initContainer loops that wait with nc for PostgreSQL
and Redis can hang indefinitely; modify the shell in both wait loops (the nc
loops that reference include "llmgateway.postgresql.host"/port and the
initContainer named wait-for-redis that references include
"llmgateway.redis.host"/port) to implement bounded retries or a timeout: add a
counter (e.g., max_retries=30), increment on each try, sleep between attempts,
and if the counter exceeds max_retries exit with non-zero (exit 1) so Kubernetes
can surface the failure and trigger restart policy; alternatively use a timeout
wrapper (timeout command) around nc and exit non-zero on timeout—apply the same
change to both the PostgreSQL and Redis wait blocks.
- Around line 22-77: Add Pod- and container-level security contexts: set a
podSecurityContext (e.g., runAsNonRoot: true, runAsUser: 1000, runAsGroup: 1000,
fsGroup: 1000) and add securityContext blocks for the initContainers
"wait-for-postgresql" and "wait-for-redis" and the main container "api" with
hardened settings (readOnlyRootFilesystem: true, allowPrivilegeEscalation:
false, capabilities: drop all, seccompProfile: runtime/default, runAsNonRoot:
true). Apply these fields to the Pod spec and to each container's
securityContext so both init and main containers inherit the hardened runtime
constraints.

In `@infra/helm/llmgateway/templates/code-deployment.yaml`:
- Around line 20-54: Add a hardened securityContext to the code container and
pod: under the Deployment spec add a pod-level securityContext and inside the
container block for the container named "code" (the block that contains image:
{{ include "llmgateway.image" (dict "context" . "image" .Values.code.image) }}
and readinessProbe/livenessProbe) add a container securityContext with
runAsNonRoot: true, runAsUser: 1000, seccompProfile type RuntimeDefault,
allowPrivilegeEscalation: false, and capabilities.drop: [ALL]; mirror the same
pod and container securityContext changes for the other deployments that use
containers named "api", "gateway", "ui", "playground", "docs", "admin", and
"worker" so .Values.*.image blocks receive the same baseline hardening.

In `@infra/helm/llmgateway/templates/configmap.yaml`:
- Around line 69-71: The ConfigMap currently defines KEEP_ALIVE_TIMEOUT_S twice
(under gateway.config and api.config) causing the latter to overwrite the
former; change the keys to component-specific names (e.g. replace
KEEP_ALIVE_TIMEOUT_S in the gateway block with GATEWAY_KEEP_ALIVE_TIMEOUT_S and
the one in the api block with API_KEEP_ALIVE_TIMEOUT_S) and update the
corresponding deployment/manifest templates that read these values so
gateway.config consumers read GATEWAY_KEEP_ALIVE_TIMEOUT_S and api.config
consumers read API_KEEP_ALIVE_TIMEOUT_S (ensure any references in templates,
envFrom/env entries or valueFrom lookups are updated to the new key names).
- Around line 118-120: The ConfigMap currently uses truthiness checks (e.g. '{{-
if .enableDataRetentionCleanup }}') which omit env entries when users
intentionally set false/""; change those guards to key-presence checks using
hasKey against .Values (e.g. replace the if with '{{- if hasKey .Values
"enableDataRetentionCleanup" }}' and render ENABLE_DATA_RETENTION_CLEANUP with
the quoted value), and apply the same pattern to the other boolean/numeric
guards in this template so the env var is emitted whenever the key exists
regardless of its falsy value.
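The hasKey guard described above would look roughly like this; the value name comes from the comment, while indentation and quoting are assumed:

```yaml
{{- if hasKey .Values "enableDataRetentionCleanup" }}
  # emitted whenever the key exists, even when explicitly set to false
  ENABLE_DATA_RETENTION_CLEANUP: {{ .Values.enableDataRetentionCleanup | quote }}
{{- end }}
```

Unlike a plain truthiness `if`, this still renders the env var when the user explicitly sets the value to false.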

In `@infra/helm/llmgateway/templates/ingress.yaml`:
- Around line 26-110: The template unconditionally emits the Ingress "rules:"
key which can produce an empty rules array when none of the components are
enabled; update the template to only render "rules:" when at least one of the
component conditions is true (check the same conditions used for
ui/api/gateway/playground/code/docs/admin: .Values.ui.enabled and
.Values.ingress.hosts.ui, .Values.api.enabled and .Values.ingress.hosts.api,
etc.), or alternatively provide a safe fallback by rendering a defaultBackend
when none of those conditions are met; locate the block around the "rules:" line
and wrap it with a single conditional that ORs all seven checks (or add a
defaultBackend section under spec) so Kubernetes never receives an empty rules
array.
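One way to sketch that guard is to accumulate a flag over the component keys before emitting `rules:` (keys and value paths taken from the comment; the rule bodies themselves stay as they are):

```yaml
{{- $any := false }}
{{- range $key := list "ui" "api" "gateway" "playground" "code" "docs" "admin" }}
{{- if and (index $.Values $key).enabled (index $.Values.ingress.hosts $key) }}
{{- $any = true }}
{{- end }}
{{- end }}
{{- if $any }}
  rules:
    # existing per-component rule blocks go here, unchanged
{{- end }}
```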

In `@infra/helm/llmgateway/templates/redis-statefulset.yaml`:
- Around line 18-65: Add a Pod and container security hardening baseline to the
Redis StatefulSet: in the StatefulSet spec add podSecurityContext with
runAsUser: 999, runAsGroup: 999 and fsGroup: 999 and add an emptyDir volume
mounted at /tmp; then on the container named "redis" set securityContext to
disallow privilege escalation (allowPrivilegeEscalation: false), drop all
capabilities, set readOnlyRootFilesystem: true, and set seccompProfile to
runtime/default so the readOnlyRootFilesystem works; ensure the container still
mounts the /tmp emptyDir and that the names (podSecurityContext,
securityContext, container "redis") match existing fields in the template.

In `@infra/helm/llmgateway/templates/secret.yaml`:
- Around line 15-20: The DATABASE_URL construction uses raw passwords and can
break with special chars; update the template to url-encode the password using
the Helm/Go template urlquery function for both .Values.postgresql.password and
.Values.externalPostgresql.password when building DATABASE_URL (refer to the
DATABASE_URL line and $pgUser/$pgHost/$pgPort/$pgDB symbols), and ensure
POSTGRES_PASSWORD is set consistently for runtime consumers (move or add
POSTGRES_PASSWORD outside the if/else or set it from the externalPostgresql
branch as well so containers always receive POSTGRES_PASSWORD).
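A sketch of the url-encoding fix, using the `$pg*` symbols named in the comment and assuming they are already defined as strings earlier in the template (Secret data fields are base64-encoded, hence `b64enc`):

```yaml
{{- $pgPass := .Values.postgresql.password | urlquery }}
DATABASE_URL: {{ printf "postgresql://%s:%s@%s:%s/%s" $pgUser $pgPass $pgHost $pgPort $pgDB | b64enc | quote }}
# kept outside the internal/external branch so containers always receive it
POSTGRES_PASSWORD: {{ .Values.postgresql.password | b64enc | quote }}
```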

In `@infra/helm/llmgateway/templates/ui-deployment.yaml`:
- Line 9: The replicas line treats 0 as empty because default considers zero
value empty; change the template to check whether the replicas key is present
and only fall back to 1 when it's absent. Replace the single-line "replicas: {{
.Values.ui.replicas | default 1 }}" with a conditional using hasKey on
.Values.ui (e.g., if hasKey .Values.ui "replicas" then render "replicas: {{
.Values.ui.replicas }}" else render "replicas: 1" end) so an explicit
ui.replicas: 0 is preserved.
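A minimal sketch of that change (indentation assumed to match the surrounding Deployment spec):

```yaml
  {{- if hasKey .Values.ui "replicas" }}
  replicas: {{ .Values.ui.replicas }}
  {{- else }}
  replicas: 1
  {{- end }}
```

Because `default` treats the zero value as empty, only a presence check preserves an explicit `ui.replicas: 0`.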

In `@infra/helm/llmgateway/values.yaml`:
- Around line 43-47: The values.yaml currently contains placeholder secrets
(auth.authSecret and auth.gatewayApiKeyHashSecret) that must not be used in
production; update Helm templates (e.g., secret.yaml or a new validation
template) to validate these keys at render time by checking
.Values.auth.authSecret and .Values.auth.gatewayApiKeyHashSecret and failing the
install if they still equal the default placeholders (or use the required
function to enforce non-empty, plus explicit equality checks to fail when the
exact default strings are present), emitting clear messages like
"auth.authSecret must be changed from the default value" and
"auth.gatewayApiKeyHashSecret must be changed from the default value".
- Around line 292-299: The default insecure password is set at
postgresql.password ("changeme"); change this to an empty value or require a
user-supplied secret and add validation to ensure postgresql.password is not the
default. Update the helm values template to treat postgresql.password as
optional (empty by default) and/or add the same validation logic used for auth
secrets to fail deployment if postgresql.password == "changeme" or left
unchanged, referencing the postgresql block (postgresql.enabled,
postgresql.user, postgresql.password) so the chart forces explicit credential
configuration.
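A render-time guard of the kind described could be sketched with Helm's `required` and `fail` functions; the placeholder string shown here is illustrative, since the chart's actual default is not reproduced in this review:

```yaml
{{- $auth := required "auth.authSecret is required" .Values.auth.authSecret }}
{{- if eq $auth "change-me-placeholder" }}
{{- fail "auth.authSecret must be changed from the default value" }}
{{- end }}
```

The same pattern would apply to `auth.gatewayApiKeyHashSecret` and `postgresql.password`.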

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8bc06efd-b953-4858-8ade-ce75457edce6

📥 Commits

Reviewing files that changed from the base of the PR and between 28d9ada and db4e2a5.

📒 Files selected for processing (27)
  • apps/docs/content/self-host.mdx
  • infra/helm/README.md
  • infra/helm/llmgateway/Chart.yaml
  • infra/helm/llmgateway/templates/_helpers.tpl
  • infra/helm/llmgateway/templates/admin-deployment.yaml
  • infra/helm/llmgateway/templates/admin-service.yaml
  • infra/helm/llmgateway/templates/api-deployment.yaml
  • infra/helm/llmgateway/templates/api-service.yaml
  • infra/helm/llmgateway/templates/code-deployment.yaml
  • infra/helm/llmgateway/templates/code-service.yaml
  • infra/helm/llmgateway/templates/configmap.yaml
  • infra/helm/llmgateway/templates/docs-deployment.yaml
  • infra/helm/llmgateway/templates/docs-service.yaml
  • infra/helm/llmgateway/templates/gateway-deployment.yaml
  • infra/helm/llmgateway/templates/gateway-service.yaml
  • infra/helm/llmgateway/templates/ingress.yaml
  • infra/helm/llmgateway/templates/playground-deployment.yaml
  • infra/helm/llmgateway/templates/playground-service.yaml
  • infra/helm/llmgateway/templates/postgresql-service.yaml
  • infra/helm/llmgateway/templates/postgresql-statefulset.yaml
  • infra/helm/llmgateway/templates/redis-service.yaml
  • infra/helm/llmgateway/templates/redis-statefulset.yaml
  • infra/helm/llmgateway/templates/secret.yaml
  • infra/helm/llmgateway/templates/ui-deployment.yaml
  • infra/helm/llmgateway/templates/ui-service.yaml
  • infra/helm/llmgateway/templates/worker-deployment.yaml
  • infra/helm/llmgateway/values.yaml

Comment on lines +187 to +195
## Helm Chart

You can also deploy LLMGateway using the Helm chart.

```bash
helm repo add llmgateway https://llmgateway.io/helm
helm repo update
helm install llmgateway llmgateway/llmgateway
```

⚠️ Potential issue | 🟠 Major

Add Helm-specific prerequisites to avoid failed onboarding.

Line [187] adds a Helm deployment path, but prerequisites are still Docker-only. Please add Kubernetes cluster + Helm requirements in the prerequisites section so this option is executable end-to-end.

📘 Proposed doc patch
 ## Prerequisites
 
 - Latest Docker
+- A Kubernetes cluster (local or managed)
+- Helm v3
 - API keys for the LLM providers you want to use (OpenAI, Anthropic, etc.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/docs/content/self-host.mdx` around lines 187 - 195, Update the
prerequisites section to include Kubernetes/Helm requirements so the new "Helm
Chart" instructions are executable: state that a Kubernetes cluster (minimum
version, e.g., v1.20+ or your supported minimum), kubectl configured to the
target cluster, Helm 3 installed, a default StorageClass or PVC capability,
access to a container registry (or imagePullSecrets) for the llmgateway image,
and any required RBAC/load‑balancer considerations; reference the "Helm Chart"
heading in the docs to ensure users see these prerequisites before the helm
repo/add/install commands.

Comment on lines +22 to +77
```yaml
      initContainers:
        - name: wait-for-postgresql
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
                echo "Waiting for PostgreSQL..."
                sleep 2
              done
        - name: wait-for-redis
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "llmgateway.redis.host" . }} {{ include "llmgateway.redis.port" . }}; do
                echo "Waiting for Redis..."
                sleep 2
              done
      containers:
        - name: api
          image: {{ include "llmgateway.image" (dict "context" . "image" .Values.api.image) }}
          imagePullPolicy: {{ .Values.api.image.pullPolicy | default "IfNotPresent" }}
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "llmgateway.configMapName" . }}
            - secretRef:
                name: {{ include "llmgateway.secretName" . }}
          {{- with .Values.api.extraEnv }}
          env:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          readinessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          resources:
            {{- toYaml .Values.api.resources | nindent 12 }}
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "api-deployment.yaml" -type f

Repository: theopenco/llmgateway

Length of output: 117


🏁 Script executed:

find . -name "values.yaml" -path "*/llmgateway*" -type f

Repository: theopenco/llmgateway

Length of output: 99


🏁 Script executed:

cat -n ./infra/helm/llmgateway/templates/api-deployment.yaml

Repository: theopenco/llmgateway

Length of output: 3634


🏁 Script executed:

cat -n ./infra/helm/llmgateway/values.yaml

Repository: theopenco/llmgateway

Length of output: 14853


🏁 Script executed:

find ./infra/helm/llmgateway/templates -name "*deployment.yaml" -type f

Repository: theopenco/llmgateway

Length of output: 512


🏁 Script executed:

rg "securityContext" ./infra/helm/llmgateway/templates/ || echo "No securityContext found in templates"

Repository: theopenco/llmgateway

Length of output: 101


Add pod and container security contexts to harden the deployment.

The api-deployment.yaml (lines 20-77) runs containers with unrestricted security defaults. Both init and main containers lack security context settings, allowing unnecessary privilege surface.

Add the following hardening to reduce attack surface:

🔒 Security hardening changes
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       {{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
       initContainers:
         - name: wait-for-postgresql
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: busybox:1.37
@@
         - name: wait-for-redis
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: busybox:1.37
@@
         - name: api
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: {{ include "llmgateway.image" (dict "context" . "image" .Values.api.image) }}
🧰 Tools
🪛 Trivy (0.69.3)

[error] 32-37: Root file system is not read-only

Container 'wait-for-postgresql' of Deployment 'llmgateway-api' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 42-47: Root file system is not read-only

Container 'wait-for-redis' of Deployment 'llmgateway-api' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 32-37: Default security context configured

container llmgateway-api in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 42-47: Default security context configured

container llmgateway-api in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/api-deployment.yaml` around lines 22 - 77,
Add Pod- and container-level security contexts: set a podSecurityContext (e.g.,
runAsNonRoot: true, runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000) and add
securityContext blocks for the initContainers "wait-for-postgresql" and
"wait-for-redis" and the main container "api" with hardened settings
(readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities:
drop all, seccompProfile: runtime/default, runAsNonRoot: true). Apply these
fields to the Pod spec and to each container's securityContext so both init and
main containers inherit the hardened runtime constraints.

Comment on lines +29 to +42
until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
echo "Waiting for PostgreSQL..."
sleep 2
done
- name: wait-for-redis
image: busybox:1.37
command:
- sh
- -c
- |
until nc -z {{ include "llmgateway.redis.host" . }} {{ include "llmgateway.redis.port" . }}; do
echo "Waiting for Redis..."
sleep 2
done
Contributor

⚠️ Potential issue | 🟠 Major

Avoid infinite dependency wait loops in initContainers.

Lines 29-42 can block forever if PostgreSQL/Redis are unreachable, leaving pods stuck in Init without a terminal signal. Add bounded retries (or timeout + non-zero exit) so failures surface and recover via restart policy.
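
A bounded variant of the PostgreSQL wait loop might look like this (a sketch only — the retry cap of 30 is an assumed value, and the same change applies to the Redis block):

```yaml
initContainers:
  - name: wait-for-postgresql
    image: busybox:1.37
    command:
      - sh
      - -c
      - |
        i=0
        until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
          i=$((i + 1))
          # Fail the init container after ~60s so the pod's restart policy surfaces the problem
          if [ "$i" -ge 30 ]; then
            echo "PostgreSQL not reachable after $i attempts" >&2
            exit 1
          fi
          echo "Waiting for PostgreSQL..."
          sleep 2
        done
```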

🧰 Tools
🪛 Trivy (0.69.3)

[error] 32-37: Root file system is not read-only

Container 'wait-for-postgresql' of Deployment 'llmgateway-api' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 32-37: Default security context configured

container llmgateway-api in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 29-87: Default security context configured

deployment llmgateway-api in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/api-deployment.yaml` around lines 29 - 42,
The initContainer loops that wait with nc for PostgreSQL and Redis can hang
indefinitely; modify the shell in both wait loops (the nc loops that reference
include "llmgateway.postgresql.host"/port and the initContainer named
wait-for-redis that references include "llmgateway.redis.host"/port) to
implement bounded retries or a timeout: add a counter (e.g., max_retries=30),
increment on each try, sleep between attempts, and if the counter exceeds
max_retries exit with non-zero (exit 1) so Kubernetes can surface the failure
and trigger restart policy; alternatively use a timeout wrapper (timeout
command) around nc and exit non-zero on timeout—apply the same change to both
the PostgreSQL and Redis wait blocks.

Comment on lines +20 to +54
spec:
{{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
containers:
- name: code
image: {{ include "llmgateway.image" (dict "context" . "image" .Values.code.image) }}
imagePullPolicy: {{ .Values.code.image.pullPolicy | default "IfNotPresent" }}
ports:
- name: http
containerPort: 80
protocol: TCP
envFrom:
- configMapRef:
name: {{ include "llmgateway.configMapName" . }}
- secretRef:
name: {{ include "llmgateway.secretName" . }}
{{- with .Values.code.extraEnv }}
env:
{{- toYaml . | nindent 12 }}
{{- end }}
readinessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
livenessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 20
periodSeconds: 30
timeoutSeconds: 5
resources:
{{- toYaml .Values.code.resources | nindent 12 }}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the deployment file
find . -name "code-deployment.yaml" -type f

Repository: theopenco/llmgateway

Length of output: 118


🏁 Script executed:

#!/bin/bash
# Read the full deployment file to see the complete context
if [ -f "infra/helm/llmgateway/templates/code-deployment.yaml" ]; then
  cat -n "infra/helm/llmgateway/templates/code-deployment.yaml"
fi

Repository: theopenco/llmgateway

Length of output: 2745


🏁 Script executed:

#!/bin/bash
# Check the values file to see if security contexts are defined there
find infra/helm/llmgateway -name "values*.yaml" -type f -exec cat -n {} +

Repository: theopenco/llmgateway

Length of output: 14853


🏁 Script executed:

#!/bin/bash
# Check other deployment templates for security context examples
find infra/helm/llmgateway/templates -name "*deployment.yaml" -type f | while read file; do
  echo "=== File: $file ==="
  grep -n "securityContext\|runAsNonRoot\|seccomp\|capabilities" "$file"
  if [ $? -ne 0 ]; then
    echo "No security context found"
  fi
done

Repository: theopenco/llmgateway

Length of output: 816


🏁 Script executed:

#!/bin/bash
# Check if there are any helper templates that define security contexts
find infra/helm/llmgateway/templates -type f -name "*" | head -20

Repository: theopenco/llmgateway

Length of output: 1102


🏁 Script executed:

#!/bin/bash
# Check the helpers template
cat -n infra/helm/llmgateway/templates/_helpers.tpl

Repository: theopenco/llmgateway

Length of output: 5950


Add securityContext to the code container.

The code container lacks hardened security settings. Add pod and container security contexts:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL

Also apply these settings to other deployments (api, gateway, ui, playground, docs, admin, worker) for consistent baseline security hardening.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/code-deployment.yaml` around lines 20 - 54,
Add a hardened securityContext to the code container and pod: under the
Deployment spec add a pod-level securityContext and inside the container block
for the container named "code" (the block that contains image: {{ include
"llmgateway.image" (dict "context" . "image" .Values.code.image) }} and
readinessProbe/livenessProbe) add a container securityContext with runAsNonRoot:
true, runAsUser: 1000, seccompProfile type RuntimeDefault,
allowPrivilegeEscalation: false, and capabilities.drop: [ALL]; mirror the same
pod and container securityContext changes for the other deployments that use
containers named "api", "gateway", "ui", "playground", "docs", "admin", and
"worker" so .Values.*.image blocks receive the same baseline hardening.

Comment on lines +69 to +71
{{- if .keepAliveTimeoutS }}
KEEP_ALIVE_TIMEOUT_S: {{ .keepAliveTimeoutS | quote }}
{{- end }}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the configmap.yaml file
fd -type f -name "configmap.yaml" | grep -i llmgateway

Repository: theopenco/llmgateway

Length of output: 234


🏁 Script executed:

# Read the entire configmap.yaml to understand structure
cat -n infra/helm/llmgateway/templates/configmap.yaml

Repository: theopenco/llmgateway

Length of output: 6290


KEEP_ALIVE_TIMEOUT_S is defined twice in the same ConfigMap data map.

Lines 70 and 90 both render this key under different contexts (gateway.config and api.config). When both conditions are true, the second occurrence (line 90) overwrites the first, breaking per-component timeout tuning.

Rename to component-specific keys (GATEWAY_KEEP_ALIVE_TIMEOUT_S and API_KEEP_ALIVE_TIMEOUT_S) and wire each component deployment to its corresponding key.
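
Under that renaming, the gateway half of the change might look like this (sketch; it assumes `KEEP_ALIVE_TIMEOUT_S` is the env var name the application itself reads, so the deployment maps the component-specific ConfigMap key back onto it):

```yaml
# configmap.yaml (gateway block, sketch)
{{- if .keepAliveTimeoutS }}
GATEWAY_KEEP_ALIVE_TIMEOUT_S: {{ .keepAliveTimeoutS | quote }}
{{- end }}

# gateway-deployment.yaml (sketch): map the renamed key back to the app's env var
env:
  - name: KEEP_ALIVE_TIMEOUT_S
    valueFrom:
      configMapKeyRef:
        name: {{ include "llmgateway.configMapName" . }}
        key: GATEWAY_KEEP_ALIVE_TIMEOUT_S
        optional: true
```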

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/configmap.yaml` around lines 69 - 71, The
ConfigMap currently defines KEEP_ALIVE_TIMEOUT_S twice (under gateway.config and
api.config) causing the latter to overwrite the former; change the keys to
component-specific names (e.g. replace KEEP_ALIVE_TIMEOUT_S in the gateway block
with GATEWAY_KEEP_ALIVE_TIMEOUT_S and the one in the api block with
API_KEEP_ALIVE_TIMEOUT_S) and update the corresponding deployment/manifest
templates that read these values so gateway.config consumers read
GATEWAY_KEEP_ALIVE_TIMEOUT_S and api.config consumers read
API_KEEP_ALIVE_TIMEOUT_S (ensure any references in templates, envFrom/env
entries or valueFrom lookups are updated to the new key names).

Comment on lines +18 to +65
spec:
{{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
containers:
- name: redis
image: {{ .Values.redis.image.repository }}:{{ .Values.redis.image.tag }}
imagePullPolicy: {{ .Values.redis.image.pullPolicy | default "IfNotPresent" }}
ports:
- name: redis
containerPort: 6379
protocol: TCP
{{- if .Values.redis.password }}
command:
- redis-server
- --requirepass
- $(REDIS_PASSWORD)
env:
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "llmgateway.secretName" . }}
key: REDIS_PASSWORD
{{- end }}
readinessProbe:
exec:
command:
- redis-cli
{{- if .Values.redis.password }}
- -a
- $(REDIS_PASSWORD)
{{- end }}
- ping
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
livenessProbe:
exec:
command:
- redis-cli
{{- if .Values.redis.password }}
- -a
- $(REDIS_PASSWORD)
{{- end }}
- ping
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
resources:
{{- toYaml .Values.redis.resources | nindent 12 }}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -type f -name "redis-statefulset.yaml" 2>/dev/null | head -20

Repository: theopenco/llmgateway

Length of output: 46


🏁 Script executed:

find . -type f -name "*redis*statefulset*" -o -name "*redis-statefulset*" 2>/dev/null

Repository: theopenco/llmgateway

Length of output: 120


🏁 Script executed:

fd -i redis infra/helm 2>/dev/null

Repository: theopenco/llmgateway

Length of output: 169


🏁 Script executed:

ls -la infra/helm/llmgateway/templates/ 2>/dev/null | head -20

Repository: theopenco/llmgateway

Length of output: 1354


🏁 Script executed:

cat infra/helm/llmgateway/templates/redis-statefulset.yaml

Repository: theopenco/llmgateway

Length of output: 2695


🏁 Script executed:

grep -r "securityContext" infra/helm/llmgateway/templates/ --include="*.yaml"

Repository: theopenco/llmgateway

Length of output: 46


🏁 Script executed:

cat infra/helm/llmgateway/templates/api-deployment.yaml

Repository: theopenco/llmgateway

Length of output: 3032


🏁 Script executed:

cat infra/helm/llmgateway/templates/gateway-deployment.yaml

Repository: theopenco/llmgateway

Length of output: 3125


🏁 Script executed:

cat infra/helm/llmgateway/values.yaml | head -100

Repository: theopenco/llmgateway

Length of output: 2925


🏁 Script executed:

rg -A 10 "redis:" infra/helm/llmgateway/values.yaml

Repository: theopenco/llmgateway

Length of output: 281


🏁 Script executed:

cat infra/helm/llmgateway/templates/postgresql-statefulset.yaml

Repository: theopenco/llmgateway

Length of output: 3032


🌐 Web query:

Redis readOnlyRootFilesystem Kubernetes security context

💡 Result:

To run Redis on Kubernetes with readOnlyRootFilesystem: true, you must (a) keep the image root FS read-only and (b) provide explicit writable mounts for every path Redis (or the OS) needs to write to—typically /data and often /tmp.

Why it breaks without extra mounts

  • The official Redis image expects persistence files under /data (RDB/AOF) and commonly runs with /data as the writable location you mount a volume to. [1]
  • With a read-only root filesystem, writes to /tmp can also fail (many images/tools assume /tmp is writable), leading to CrashLoopBackOff unless you mount a writable volume there. [2]

Example (StatefulSet/Deployment container spec)

spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 999
    runAsGroup: 999
    fsGroup: 999   # helps make mounted volumes writable for the process
  containers:
  - name: redis
    image: redis:7
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: redis-data
      mountPath: /data
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}
  # For StatefulSet, use volumeClaimTemplates for redis-data; for Deployment, use a PVC volume.

Notes

  • fsGroup is commonly used so Kubernetes mounts can be writable by a non-root process (Kubernetes adjusts volume ownership/permissions to match the fsGroup when mounting). [3]
  • If you also set mounted volumes readOnly: true, be aware of historical Linux/Kubernetes edge-cases around “truly” recursive read-only mounts (mostly relevant when there are submounts). [4]

Sources: Redis official image docs [1], K8s /tmp workaround example [2], fsGroup behavior explanation [3], Kubernetes recursive read-only mount details [4].


Add pod/container security contexts for Redis.

Lines 18-65 currently rely on default security settings. For a database/cache workload, this should be hardened by default (non-root, no privilege escalation, dropped capabilities, runtime default seccomp).

The suggested hardening requires additional configuration for readOnlyRootFilesystem: true to work properly:

🔒 Suggested hardening baseline
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 999
+        runAsGroup: 999
+        fsGroup: 999
+        seccompProfile:
+          type: RuntimeDefault
       {{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
       containers:
         - name: redis
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: {{ .Values.redis.image.repository }}:{{ .Values.redis.image.tag }}
           imagePullPolicy: {{ .Values.redis.image.pullPolicy | default "IfNotPresent" }}
           ports:
             - name: redis
               containerPort: 6379
               protocol: TCP
           {{- if .Values.redis.password }}
           command:
             - redis-server
             - --requirepass
             - $(REDIS_PASSWORD)
           env:
             - name: REDIS_PASSWORD
               valueFrom:
                 secretKeyRef:
                   name: {{ include "llmgateway.secretName" . }}
                   key: REDIS_PASSWORD
           {{- end }}
           readinessProbe:
             exec:
               command:
                 - redis-cli
                 {{- if .Values.redis.password }}
                 - -a
                 - $(REDIS_PASSWORD)
                 {{- end }}
                 - ping
             initialDelaySeconds: 5
             periodSeconds: 10
             timeoutSeconds: 5
           livenessProbe:
             exec:
               command:
                 - redis-cli
                 {{- if .Values.redis.password }}
                 - -a
                 - $(REDIS_PASSWORD)
                 {{- end }}
                 - ping
             initialDelaySeconds: 15
             periodSeconds: 30
             timeoutSeconds: 5
           resources:
             {{- toYaml .Values.redis.resources | nindent 12 }}
           volumeMounts:
             - name: data
               mountPath: /data
+            - name: tmp
+              mountPath: /tmp
       volumes:
+        - name: tmp
+          emptyDir: {}

Note: fsGroup: 999 and /tmp emptyDir are required for the read-only filesystem to work; the runAsUser and runAsGroup should match the Redis image's unprivileged user UID/GID (typically 999 for redis:8-alpine).

🧰 Tools
🪛 Trivy (0.69.3)

[error] 30-62: Root file system is not read-only

Container 'redis' of StatefulSet 'llmgateway-redis' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 30-62: Default security context configured

container llmgateway-redis in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 27-62: Default security context configured

statefulset llmgateway-redis in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/redis-statefulset.yaml` around lines 18 - 65,
Add a Pod and container security hardening baseline to the Redis StatefulSet: in
the StatefulSet spec add podSecurityContext with runAsUser: 999, runAsGroup: 999
and fsGroup: 999 and add an emptyDir volume mounted at /tmp; then on the
container named "redis" set securityContext to disallow privilege escalation
(allowPrivilegeEscalation: false), drop all capabilities, set
readOnlyRootFilesystem: true, and set seccompProfile to runtime/default so the
readOnlyRootFilesystem works; ensure the container still mounts the /tmp
emptyDir and that the names (podSecurityContext, securityContext, container
"redis") match existing fields in the template.

Comment on lines +15 to +20
{{- if .Values.postgresql.enabled }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.postgresql.password $pgHost $pgPort $pgDB | quote }}
POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
{{- else }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.externalPostgresql.password $pgHost $pgPort $pgDB | quote }}
{{- end }}
Contributor

⚠️ Potential issue | 🟡 Minor

Password in DATABASE_URL should be URL-encoded.

If the PostgreSQL password contains special characters (e.g., @, :, /, %), the connection string will be malformed. Use urlquery to encode the password.

Also note: POSTGRES_PASSWORD is only set when using the built-in PostgreSQL (line 17), not for external PostgreSQL. If the worker/api containers expect this env var regardless, it may cause issues.
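
To illustrate the failure mode (a standalone sketch, not chart code — the user, host, and database names are made up): a password containing `@` or `:` splits the URL's authority section unless percent-encoded, which the suggested `urlquery` function performs at render time. A minimal shell equivalent:

```shell
# Percent-encode the reserved characters mentioned above ('%' first, so the
# escape sequences themselves are not re-encoded), then build the connection URL.
pass='p@ss:word'
encoded=$(printf '%s' "$pass" | sed -e 's/%/%25/g' -e 's/@/%40/g' -e 's/:/%3A/g')
echo "postgres://user:${encoded}@db:5432/llmgateway"
# → postgres://user:p%40ss%3Aword@db:5432/llmgateway
```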

🔧 Proposed fix for URL encoding
  {{- if .Values.postgresql.enabled }}
- DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.postgresql.password $pgHost $pgPort $pgDB | quote }}
+ DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.postgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
  POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
  {{- else }}
- DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.externalPostgresql.password $pgHost $pgPort $pgDB | quote }}
+ DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.externalPostgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
  {{- end }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{{- if .Values.postgresql.enabled }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.postgresql.password $pgHost $pgPort $pgDB | quote }}
POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
{{- else }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.externalPostgresql.password $pgHost $pgPort $pgDB | quote }}
{{- end }}
{{- if .Values.postgresql.enabled }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.postgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
{{- else }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.externalPostgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
{{- end }}
🧰 Tools
🪛 Checkov (3.2.513)

[medium] 16-17: Basic Auth Credentials

(CKV_SECRET_4)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/secret.yaml` around lines 15 - 20, The
DATABASE_URL construction uses raw passwords and can break with special chars;
update the template to url-encode the password using the Helm/Go template
urlquery function for both .Values.postgresql.password and
.Values.externalPostgresql.password when building DATABASE_URL (refer to the
DATABASE_URL line and $pgUser/$pgHost/$pgPort/$pgDB symbols), and ensure
POSTGRES_PASSWORD is set consistently for runtime consumers (move or add
POSTGRES_PASSWORD outside the if/else or set it from the externalPostgresql
branch as well so containers always receive POSTGRES_PASSWORD).
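
The `POSTGRES_PASSWORD` consistency issue noted above could be addressed by emitting the key in both branches (sketch):

```yaml
{{- if .Values.postgresql.enabled }}
POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
{{- else }}
POSTGRES_PASSWORD: {{ .Values.externalPostgresql.password | quote }}
{{- end }}
```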

labels:
{{- include "llmgateway.componentLabels" (dict "context" . "component" "ui") | nindent 4 }}
spec:
replicas: {{ .Values.ui.replicas | default 1 }}
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Helm Sprig default filter zero numeric value behavior Go template

💡 Result:

In Helm templates, default comes from Sprig and returns the fallback when the given value is considered “empty”. For numerics, 0 is defined as empty, so default will replace an explicitly-set 0 with the default value. Example: {{ .Values.replicas | default 1 }} renders 1 when .Values.replicas is 0. [1]

If you need to distinguish “unset” vs “set to 0”, don’t use default on the number directly—check key existence instead:

{{- if hasKey .Values "replicas" -}}
{{ .Values.replicas }}
{{- else -}}
1
{{- end -}}

hasKey returns true when the dict contains the key. [2]

Sources: Sprig “Default Functions” docs [1], Helm template function list (hasKey) [2].


🏁 Script executed:

# Verify the file exists and check the exact content at line 9
fd -t f "ui-deployment.yaml" infra/

Repository: theopenco/llmgateway

Length of output: 114


🏁 Script executed:

cat -n infra/helm/llmgateway/templates/ui-deployment.yaml | head -20

Repository: theopenco/llmgateway

Length of output: 997


Preserve explicit replicas: 0 instead of forcing 1.

On Line 9, default treats 0 as empty, so ui.replicas: 0 still renders as 1. This prevents scale-to-zero deployments. Use hasKey to distinguish between unset and explicitly-set zero values.

Proposed fix
-  replicas: {{ .Values.ui.replicas | default 1 }}
+  {{- if hasKey .Values.ui "replicas" }}
+  replicas: {{ .Values.ui.replicas }}
+  {{- else }}
+  replicas: 1
+  {{- end }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
replicas: {{ .Values.ui.replicas | default 1 }}
{{- if hasKey .Values.ui "replicas" }}
replicas: {{ .Values.ui.replicas }}
{{- else }}
replicas: 1
{{- end }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/templates/ui-deployment.yaml` at line 9, The replicas
line treats 0 as empty because default considers zero value empty; change the
template to check whether the replicas key is present and only fall back to 1
when it's absent. Replace the single-line "replicas: {{ .Values.ui.replicas |
default 1 }}" with a conditional using hasKey on .Values.ui (e.g., if hasKey
.Values.ui "replicas" then render "replicas: {{ .Values.ui.replicas }}" else
render "replicas: 1" end) so an explicit ui.replicas: 0 is preserved.

Comment on lines +43 to +47
auth:
# -- (required) Secret used for signing auth tokens (32+ characters)
authSecret: "change-me-to-a-random-32-char-secret"
# -- (required) HMAC secret for hashing API keys
gatewayApiKeyHashSecret: "change-me-to-a-random-secret"
Contributor

⚠️ Potential issue | 🟠 Major

Default secrets must be changed before production deployment.

The default values for authSecret and gatewayApiKeyHashSecret are placeholder strings. While comments indicate they need to be changed, consider:

  1. Adding validation in the templates to fail if defaults are unchanged
  2. Using required function in templates to enforce values
🛡️ Example template validation

Add to secret.yaml or a dedicated validation template:

{{- if eq .Values.auth.authSecret "change-me-to-a-random-32-char-secret" }}
{{- fail "auth.authSecret must be changed from the default value" }}
{{- end }}
{{- if eq .Values.auth.gatewayApiKeyHashSecret "change-me-to-a-random-secret" }}
{{- fail "auth.gatewayApiKeyHashSecret must be changed from the default value" }}
{{- end }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/values.yaml` around lines 43 - 47, The values.yaml
currently contains placeholder secrets (auth.authSecret and
auth.gatewayApiKeyHashSecret) that must not be used in production; update Helm
templates (e.g., secret.yaml or a new validation template) to validate these
keys at render time by checking .Values.auth.authSecret and
.Values.auth.gatewayApiKeyHashSecret and failing the install if they still equal
the default placeholders (or use the required function to enforce non-empty,
plus explicit equality checks to fail when the exact default strings are
present), emitting clear messages like "auth.authSecret must be changed from the
default value" and "auth.gatewayApiKeyHashSecret must be changed from the
default value".
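
The `required`-function approach from point 2 could be sketched as follows (the key names and `stringData` placement are assumptions, not the chart's actual secret.yaml; note that `required` only rejects empty values, so the placeholder defaults would also need to be removed from values.yaml or paired with the equality checks shown above):

```yaml
# secret.yaml (sketch): fail rendering when either value is missing
stringData:
  AUTH_SECRET: {{ required "auth.authSecret must be set" .Values.auth.authSecret | quote }}
  GATEWAY_API_KEY_HASH_SECRET: {{ required "auth.gatewayApiKeyHashSecret must be set" .Values.auth.gatewayApiKeyHashSecret | quote }}
```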

Comment on lines +292 to +299
postgresql:
enabled: true
image:
repository: postgres
tag: "17-alpine"
pullPolicy: IfNotPresent
user: postgres
password: "changeme"
Contributor

⚠️ Potential issue | 🟠 Major

Default PostgreSQL password "changeme" is insecure.

The default password should either be empty (forcing users to set it) or the template should validate that it has been changed from the default. Same validation pattern as suggested for auth secrets could apply here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infra/helm/llmgateway/values.yaml` around lines 292 - 299, The default
insecure password is set at postgresql.password ("changeme"); change this to an
empty value or require a user-supplied secret and add validation to ensure
postgresql.password is not the default. Update the helm values template to treat
postgresql.password as optional (empty by default) and/or add the same
validation logic used for auth secrets to fail deployment if postgresql.password
== "changeme" or left unchanged, referencing the postgresql block
(postgresql.enabled, postgresql.user, postgresql.password) so the chart forces
explicit credential configuration.
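
Applying the same validation pattern here might look like this (sketch; placed wherever the chart's render-time checks live):

```yaml
{{- if and .Values.postgresql.enabled (eq .Values.postgresql.password "changeme") }}
{{- fail "postgresql.password must be changed from the default value" }}
{{- end }}
```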

Comment thread apps/docs/content/self-host.mdx Outdated
You can also deploy LLMGateway using the Helm chart.

```bash
helm repo add llmgateway https://llmgateway.io/helm
```
Member

@yezz123 this link needs to be fixed

Contributor Author

Yes, we should deploy it either to GitHub Pages, or you can host it under the llmgateway domain

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/images.yml:
- Around line 409-412: The publish-helm job is not waiting for image tests and
may publish a chart before images pass; update the publish-helm job to depend on
the test-split job by adding "test-split" to its needs list and extend the if
condition to require needs.test-split.result == 'success' (i.e., change needs:
to include test-split and change if: to github.event_name == 'release' &&
needs.merge-split.result == 'success' && needs.test-split.result == 'success')
so chart publishing is gated on both merge-split and image-test success.
- Around line 431-438: Add a strict SemVer validation step that verifies
IMAGE_TAG (and derived CHART_VERSION) conforms to Helm-compatible SemVer before
mutating Chart.yaml: check IMAGE_TAG/CHART_VERSION against a regex that
disallows underscores and matches full SemVer (e.g. allow optional
prerelease/build with segments of [0-9A-Za-z-] separated by dots), fail the job
and do not run the sed edits if it doesn't match; also replace the existing
release-tag validation pattern `(-[a-zA-Z0-9_-]+)*` with the stricter pattern
`(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?` so IMAGE_TAGs like `v1.2.3-rc_1` are
rejected before images are pushed.
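
That stricter check could be sketched as an early workflow step (the step name and placement are illustrative; the regex is the stricter pattern proposed above, which rejects `v1.2.3-rc_1` because `_` is not allowed in prerelease segments):

```yaml
- name: Validate release tag is Helm-compatible SemVer
  run: |
    IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
    if ! printf '%s' "${IMAGE_TAG}" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$'; then
      echo "IMAGE_TAG '${IMAGE_TAG}' is not valid SemVer for a chart version" >&2
      exit 1
    fi
```
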

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 50bd52b5-09f9-4a3b-b47d-f8eba3e96347

📥 Commits

Reviewing files that changed from the base of the PR and between db4e2a5 and 96fa4dd.

📒 Files selected for processing (3)
  • .github/workflows/images.yml
  • apps/docs/content/self-host.mdx
  • infra/helm/README.md
✅ Files skipped from review due to trivial changes (2)
  • infra/helm/README.md
  • apps/docs/content/self-host.mdx

Comment on lines +409 to +412
needs:
- setup
- merge-split
if: github.event_name == 'release' && needs.merge-split.result == 'success'
Contributor

⚠️ Potential issue | 🟠 Major

Wait for image tests before publishing the Helm chart.

publish-helm runs after merge-split, in parallel with test-split, so a release can publish a chart pointing at images that later fail the split-image tests. Gate chart publishing on the image test job too.

🚦 Proposed workflow gate
     needs:
       - setup
       - merge-split
-    if: github.event_name == 'release' && needs.merge-split.result == 'success'
+      - test-split
+    if: github.event_name == 'release' && needs.merge-split.result == 'success' && needs.test-split.result == 'success'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     needs:
       - setup
       - merge-split
-    if: github.event_name == 'release' && needs.merge-split.result == 'success'
+      - test-split
+    if: github.event_name == 'release' && needs.merge-split.result == 'success' && needs.test-split.result == 'success'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/images.yml around lines 409 - 412, The publish-helm job is
not waiting for image tests and may publish a chart before images pass; update
the publish-helm job to depend on the test-split job by adding "test-split" to
its needs list and extend the if condition to require needs.test-split.result ==
'success' (i.e., change needs: to include test-split and change if: to
github.event_name == 'release' && needs.merge-split.result == 'success' &&
needs.test-split.result == 'success') so chart publishing is gated on both
merge-split and image-test success.
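
The effect of the extra dependency can be sketched with a tiny Python model of the two `if:` expressions (an illustration only, not how GitHub Actions evaluates them):

```python
# Model a release run where the images merged fine but the split-image tests failed.
results = {"merge-split": "success", "test-split": "failure"}
event_name = "release"

# Current gate: publish-helm only checks merge-split, so it would still run.
old_gate = event_name == "release" and results["merge-split"] == "success"

# Proposed gate: publishing additionally requires test-split to have succeeded.
new_gate = old_gate and results["test-split"] == "success"

print(old_gate, new_gate)  # → True False
```

Under the current condition the chart publishes despite the failed tests; the proposed condition blocks it.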

Comment on lines +431 to +438
      - name: Set chart version
        working-directory: infra/helm/llmgateway
        run: |
          IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
          CHART_VERSION="${IMAGE_TAG#v}"
          sed -i "s/^version:.*/version: ${CHART_VERSION}/" Chart.yaml
          sed -i "s/^appVersion:.*/appVersion: \"${IMAGE_TAG}\"/" Chart.yaml
          cat Chart.yaml

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verifies that the existing release-tag pattern accepts values rejected by Helm/SemVer-style chart versions.

python - <<'PY'
import re

workflow_tag_re = re.compile(r'^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9_-]+)*$')
helm_chart_version_re = re.compile(
    r'^(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)'
    r'(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$'
)

for tag in ["v1.2.3", "v1.2.3-rc1", "v1.2.3-rc_1", "v1.2.3-rc.1"]:
    chart_version = tag[1:]
    print(
        tag,
        "workflow_accepts=", bool(workflow_tag_re.fullmatch(tag)),
        "helm_semver_accepts=", bool(helm_chart_version_re.fullmatch(chart_version)),
    )
PY

Repository: theopenco/llmgateway

Length of output: 303


🏁 Script executed:

# First, check the workflow file exists and read the relevant sections
if [ -f .github/workflows/images.yml ]; then
  echo "=== Lines 40-50 (checking release-tag validation at line 44) ==="
  sed -n '40,50p' .github/workflows/images.yml
  echo ""
  echo "=== Lines 425-445 (checking chart version code at lines 431-438) ==="
  sed -n '425,445p' .github/workflows/images.yml
else
  echo "File not found at .github/workflows/images.yml"
  find . -name "images.yml" -o -name "*images*.yml" 2>/dev/null | head -20
fi

Repository: theopenco/llmgateway

Length of output: 1502


🏁 Script executed:

# Create a minimal test to verify Helm behavior with underscores in version
# This tests the actual Helm package behavior
python3 <<'PYEOF'
import re

# Test the actual Helm SemVer spec
# Per https://semver.org and Helm documentation
helm_semver = re.compile(
    r'^(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)'
    r'(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$'
)

workflow_pattern = re.compile(r'^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9_-]+)*$')

test_cases = [
    ("v1.2.3", "basic version"),
    ("v1.2.3-rc1", "prerelease no separator"),
    ("v1.2.3-rc_1", "prerelease with underscore"),
    ("v1.2.3-rc.1", "prerelease with dot"),
    ("v1.2.3-rc-1", "prerelease with dash"),
    ("v1.2.3-alpha_beta", "multiple underscore"),
]

print("Testing workflow tag validation vs Helm SemVer compatibility:\n")
print(f"{'Tag':<20} {'Workflow':<12} {'Helm SemVer':<15} {'Status':<40}")
print("-" * 87)

for tag, desc in test_cases:
    chart_version = tag[1:] if tag.startswith('v') else tag
    workflow_ok = bool(workflow_pattern.fullmatch(tag))
    helm_ok = bool(helm_semver.fullmatch(chart_version))
    
    if workflow_ok and not helm_ok:
        status = "❌ MISMATCH: Passes workflow, fails Helm"
    elif not workflow_ok and helm_ok:
        status = "⚠️  Inverse: Fails workflow, passes Helm"
    elif workflow_ok and helm_ok:
        status = "✓ Both pass"
    else:
        status = "✗ Both fail"
    
    print(f"{tag:<20} {str(workflow_ok):<12} {str(helm_ok):<15} {status:<40}")
PYEOF

Repository: theopenco/llmgateway

Length of output: 851


Validate chart versions with Helm-compatible SemVer.

CHART_VERSION inherits the Docker tag validator at line 44, which allows underscores in prerelease suffixes (e.g., v1.2.3-rc_1). Such tags pass workflow validation and the images are pushed to the registry, but the release then fails at the helm package step because Helm requires strict SemVer. Add SemVer validation before mutating Chart.yaml, and update the release-tag pattern at line 44 to enforce strict SemVer compliance.

🛡️ Proposed validation before mutating Chart.yaml
       - name: Set chart version
         working-directory: infra/helm/llmgateway
         run: |
           IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
           CHART_VERSION="${IMAGE_TAG#v}"
+          if [[ ! "${CHART_VERSION}" =~ ^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$ ]]; then
+            echo "Invalid Helm chart version: ${CHART_VERSION}"
+            exit 1
+          fi
           sed -i "s/^version:.*/version: ${CHART_VERSION}/" Chart.yaml
           sed -i "s/^appVersion:.*/appVersion: \"${IMAGE_TAG}\"/" Chart.yaml
           cat Chart.yaml

Also update line 44 validation pattern from (-[a-zA-Z0-9_-]+)* to (-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)? to reject underscores and enforce SemVer before pushing images.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
       - name: Set chart version
         working-directory: infra/helm/llmgateway
         run: |
           IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
           CHART_VERSION="${IMAGE_TAG#v}"
+          if [[ ! "${CHART_VERSION}" =~ ^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$ ]]; then
+            echo "Invalid Helm chart version: ${CHART_VERSION}"
+            exit 1
+          fi
           sed -i "s/^version:.*/version: ${CHART_VERSION}/" Chart.yaml
           sed -i "s/^appVersion:.*/appVersion: \"${IMAGE_TAG}\"/" Chart.yaml
           cat Chart.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/images.yml around lines 431 - 438, Add a strict SemVer
validation step that verifies IMAGE_TAG (and derived CHART_VERSION) conforms to
Helm-compatible SemVer before mutating Chart.yaml: check IMAGE_TAG/CHART_VERSION
against a regex that disallows underscores and matches full SemVer (e.g. allow
optional prerelease/build with segments of [0-9A-Za-z-] separated by dots), fail
the job and do not run the sed edits if it doesn't match; also replace the
existing release-tag validation pattern `(-[a-zA-Z0-9_-]+)*` with the stricter
pattern `(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?` so IMAGE_TAGs like `v1.2.3-rc_1`
are rejected before images are pushed.
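
As a quick sanity check on the stricter pattern, here is a standalone Python sketch mirroring the proposed regex (illustrative only; the workflow itself would use a bash `=~` test):

```python
import re

# Helm-compatible SemVer tag: no underscores, dot-separated prerelease segments.
strict_tag = re.compile(
    r"^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$"
)

tags = ["v1.2.3", "v1.2.3-rc1", "v1.2.3-rc.1", "v1.2.3-rc_1"]
accepted = [t for t in tags if strict_tag.fullmatch(t)]
print(accepted)  # the underscore variant is the only one rejected
```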

@steebchen steebchen enabled auto-merge April 19, 2026 11:24
@steebchen steebchen added this pull request to the merge queue Apr 19, 2026
Merged via the queue into theopenco:main with commit 9a46a1b Apr 19, 2026
25 checks passed
