feat(helm): add LLMGateway helm chart #1998
Conversation
Walkthrough

Adds a new Helm chart for LLM Gateway (chart metadata, helpers, templates, values), a CI job to publish the chart to GHCR, and documentation updates with Helm install instructions for deploying to Kubernetes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Release as "GitHub Release"
    participant Actions as "GitHub Actions (publish-helm)"
    participant GHCR as "GitHub Container Registry (OCI)"
    participant User as "Operator (helm CLI)"
    participant K8s as "Kubernetes API"
    Release->>Actions: trigger publish-helm
    Actions->>Actions: update Chart.yaml (version/appVersion)
    Actions->>GHCR: push OCI chart package
    User->>GHCR: helm install oci://ghcr.io/.../llmgateway
    GHCR->>User: chart blob
    User->>K8s: Helm renders templates / apply manifests
    K8s->>K8s: create ConfigMap, Secret, Services, Deployments, StatefulSets, Ingress
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Force-pushed from 81d9cd1 to db4e2a5
Actionable comments posted: 12
🧹 Nitpick comments (5)
infra/helm/llmgateway/Chart.yaml (1)
7-7: Use a concrete `appVersion` instead of `latest` for proper release tracking.

Line 7 uses `appVersion: "latest"`, which makes chart metadata non-deterministic. According to Helm best practices, `appVersion` should reflect the actual application version included in the chart (e.g., matching the default image tag deployed by the chart) to enable proper release tracking and automation.

🔧 Suggested change

```diff
-appVersion: "latest"
+appVersion: "0.0.0"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@infra/helm/llmgateway/Chart.yaml` at line 7, Replace the non-deterministic appVersion value in Chart.yaml (appVersion: "latest") with a concrete application version string that matches the chart's default image tag; update the appVersion field to the actual semantic version (e.g., "v1.2.3") and, if necessary, align values.yaml image.tag (or any default image tag referenced by the chart) and CI/release automation to keep them in sync so chart metadata is deterministic and traceable.

infra/helm/llmgateway/templates/ingress.yaml (1)
27-110: Consider generating rules via a loop to reduce drift risk.

The repeated component blocks are easy to desync (host key, service suffix, enable flag). A small `range`-driven structure would make future additions safer.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@infra/helm/llmgateway/templates/ingress.yaml` around lines 27 - 110, Replace the repeated per-component ingress blocks with a single range that iterates over a map/list of component keys and generates host, enabled check and service name dynamically; e.g., range over ["ui","api","gateway","playground","code","docs","admin"] and for each key, check .Values.<key>.enabled and .Values.ingress.hosts.<key>, use the host value and build the backend service name with printf "%s-%s" (include "llmgateway.fullname" .) key and port name http; this keeps the logic centralized (references: .Values.ui.enabled, .Values.ingress.hosts.<key>, include "llmgateway.fullname") and removes the duplicated blocks.
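A minimal sketch of what that loop could look like. It assumes every component key exists in values.yaml, services are named `<fullname>-<component>` with an `http` port, and a `Prefix` path on `/` matches the existing blocks; all of these are assumptions, since the full template isn't quoted here:

```yaml
{{- $components := list "ui" "api" "gateway" "playground" "code" "docs" "admin" }}
rules:
  {{- range $component := $components }}
  {{- $config := index $.Values $component }}
  {{- $host := index $.Values.ingress.hosts $component }}
  {{- if and $config.enabled $host }}
  - host: {{ $host }}
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: {{ printf "%s-%s" (include "llmgateway.fullname" $) $component }}
              port:
                name: http
  {{- end }}
  {{- end }}
```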
infra/helm/llmgateway/templates/gateway-deployment.yaml (2)

22-42: Init containers can wait indefinitely without timeout.

The `wait-for-postgresql` and `wait-for-redis` init containers loop forever until the services are reachable. If the dependent services are misconfigured or fail to start, the pod will remain in `Init` state indefinitely. Consider adding a timeout or max retry count to fail fast and surface the issue.
♻️ Proposed fix with timeout
```diff
       - name: wait-for-postgresql
         image: busybox:1.37
         command:
           - sh
           - -c
           - |
+            TIMEOUT=300
+            ELAPSED=0
             until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
               echo "Waiting for PostgreSQL..."
               sleep 2
+              ELAPSED=$((ELAPSED + 2))
+              if [ $ELAPSED -ge $TIMEOUT ]; then
+                echo "Timeout waiting for PostgreSQL"
+                exit 1
+              fi
             done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/helm/llmgateway/templates/gateway-deployment.yaml` around lines 22 - 42, The init containers wait-for-postgresql and wait-for-redis can hang forever; modify their command scripts to implement a timeout or max-retry and exit non-zero when exceeded so the pod fails fast. Update the command blocks for the initContainers (names: wait-for-postgresql, wait-for-redis) to track either a retry counter or elapsed time (e.g., MAX_RETRIES or TIMEOUT_SECONDS), loop until nc -z succeeds or the limit is reached, print a clear error message and exit 1 on timeout; optionally expose the timeout/retry via a Helm value so it’s configurable.
43-77: Consider adding securityContext for defense-in-depth.

Trivy flags missing `securityContext.readOnlyRootFilesystem` (KSV-0014) and a default security context (KSV-0118). While not strictly required, adding a restrictive security context improves the security posture.

🛡️ Suggested securityContext addition
```diff
       containers:
         - name: gateway
           image: {{ include "llmgateway.image" (dict "context" . "image" .Values.gateway.image) }}
           imagePullPolicy: {{ .Values.gateway.image.pullPolicy | default "IfNotPresent" }}
+          securityContext:
+            readOnlyRootFilesystem: true
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
           ports:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@infra/helm/llmgateway/templates/gateway-deployment.yaml` around lines 43 - 77, Add a restrictive security context to the gateway container and a default pod securityContext for defense-in-depth: under the containers block for the container named "gateway" add a securityContext with readOnlyRootFilesystem: true, runAsNonRoot: true, runAsUser (e.g. 1000), allowPrivilegeEscalation: false, capabilities: drop: ["ALL"], and seccompProfile: { type: "RuntimeDefault" }; additionally add a podSecurityContext on the Pod template spec (spec.template.spec) to set fsGroup (e.g. 1000) and runAsNonRoot: true to provide a default baseline for all containers. Ensure you add these keys adjacent to existing fields (e.g., alongside ports/env/readinessProbe) so Helm templating (templates/gateway-deployment.yaml) and .Values overrides remain intact.

infra/helm/llmgateway/templates/postgresql-statefulset.yaml (1)
21-23: Image construction differs from other deployments.

The PostgreSQL container image is constructed directly (`{{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }}`) rather than using the `llmgateway.image` helper used by other deployments. This is acceptable since PostgreSQL uses a different image structure (no registry prefix needed for official images), but worth noting for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infra/helm/llmgateway/templates/postgresql-statefulset.yaml` around lines 21 - 23, The PostgreSQL container image is being built inline using {{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }} which diverges from other deployments that use the llmgateway image helper; update the postgresql container spec to use the shared helper (e.g., call the "llmgateway.image" template with the postgresql image values) or add a dedicated helper (like "llmgateway.postgresqlImage") and use that in the StatefulSet so image construction is consistent with other templates (change the image attribute referenced in the postgresql container to use the helper instead of direct concatenation).
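If consistency is preferred, a dedicated helper would be a small addition. This is a sketch only; `llmgateway.postgresqlImage` is a hypothetical name, not an existing define in `_helpers.tpl`:

```yaml
{{/* Render the PostgreSQL image reference from .Values.postgresql.image (hypothetical helper) */}}
{{- define "llmgateway.postgresqlImage" -}}
{{- printf "%s:%s" .Values.postgresql.image.repository .Values.postgresql.image.tag -}}
{{- end -}}
```

The StatefulSet would then use `image: {{ include "llmgateway.postgresqlImage" . }}` in place of the inline concatenation.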
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/docs/content/self-host.mdx`:
- Around line 187-195: Update the prerequisites section to include
Kubernetes/Helm requirements so the new "Helm Chart" instructions are
executable: state that a Kubernetes cluster (minimum version, e.g., v1.20+ or
your supported minimum), kubectl configured to the target cluster, Helm 3
installed, a default StorageClass or PVC capability, access to a container
registry (or imagePullSecrets) for the llmgateway image, and any required
RBAC/load‑balancer considerations; reference the "Helm Chart" heading in the
docs to ensure users see these prerequisites before the helm repo/add/install
commands.
In `@infra/helm/llmgateway/templates/api-deployment.yaml`:
- Around line 29-42: The initContainer loops that wait with nc for PostgreSQL
and Redis can hang indefinitely; modify the shell in both wait loops (the nc
loops that reference include "llmgateway.postgresql.host"/port and the
initContainer named wait-for-redis that references include
"llmgateway.redis.host"/port) to implement bounded retries or a timeout: add a
counter (e.g., max_retries=30), increment on each try, sleep between attempts,
and if the counter exceeds max_retries exit with non-zero (exit 1) so Kubernetes
can surface the failure and trigger restart policy; alternatively use a timeout
wrapper (timeout command) around nc and exit non-zero on timeout—apply the same
change to both the PostgreSQL and Redis wait blocks.
- Around line 22-77: Add Pod- and container-level security contexts: set a
podSecurityContext (e.g., runAsNonRoot: true, runAsUser: 1000, runAsGroup: 1000,
fsGroup: 1000) and add securityContext blocks for the initContainers
"wait-for-postgresql" and "wait-for-redis" and the main container "api" with
hardened settings (readOnlyRootFilesystem: true, allowPrivilegeEscalation:
false, capabilities: drop all, seccompProfile: runtime/default, runAsNonRoot:
true). Apply these fields to the Pod spec and to each container's
securityContext so both init and main containers inherit the hardened runtime
constraints.
In `@infra/helm/llmgateway/templates/code-deployment.yaml`:
- Around line 20-54: Add a hardened securityContext to the code container and
pod: under the Deployment spec add a pod-level securityContext and inside the
container block for the container named "code" (the block that contains image:
{{ include "llmgateway.image" (dict "context" . "image" .Values.code.image) }}
and readinessProbe/livenessProbe) add a container securityContext with
runAsNonRoot: true, runAsUser: 1000, seccompProfile type RuntimeDefault,
allowPrivilegeEscalation: false, and capabilities.drop: [ALL]; mirror the same
pod and container securityContext changes for the other deployments that use
containers named "api", "gateway", "ui", "playground", "docs", "admin", and
"worker" so .Values.*.image blocks receive the same baseline hardening.
In `@infra/helm/llmgateway/templates/configmap.yaml`:
- Around line 69-71: The ConfigMap currently defines KEEP_ALIVE_TIMEOUT_S twice
(under gateway.config and api.config) causing the latter to overwrite the
former; change the keys to component-specific names (e.g. replace
KEEP_ALIVE_TIMEOUT_S in the gateway block with GATEWAY_KEEP_ALIVE_TIMEOUT_S and
the one in the api block with API_KEEP_ALIVE_TIMEOUT_S) and update the
corresponding deployment/manifest templates that read these values so
gateway.config consumers read GATEWAY_KEEP_ALIVE_TIMEOUT_S and api.config
consumers read API_KEEP_ALIVE_TIMEOUT_S (ensure any references in templates,
envFrom/env entries or valueFrom lookups are updated to the new key names).
- Around line 118-120: The ConfigMap currently uses truthiness checks (e.g. '{{-
if .enableDataRetentionCleanup }}') which omit env entries when users
intentionally set false/""; change those guards to key-presence checks using
hasKey against .Values (e.g. replace the if with '{{- if hasKey .Values
"enableDataRetentionCleanup" }}' and render ENABLE_DATA_RETENTION_CLEANUP with
the quoted value), and apply the same pattern to the other boolean/numeric
guards in this template so the env var is emitted whenever the key exists
regardless of its falsy value.
In `@infra/helm/llmgateway/templates/ingress.yaml`:
- Around line 26-110: The template unconditionally emits the Ingress "rules:"
key which can produce an empty rules array when none of the components are
enabled; update the template to only render "rules:" when at least one of the
component conditions is true (check the same conditions used for
ui/api/gateway/playground/code/docs/admin: .Values.ui.enabled and
.Values.ingress.hosts.ui, .Values.api.enabled and .Values.ingress.hosts.api,
etc.), or alternatively provide a safe fallback by rendering a defaultBackend
when none of those conditions are met; locate the block around the "rules:" line
and wrap it with a single conditional that ORs all seven checks (or add a
defaultBackend section under spec) so Kubernetes never receives an empty rules
array.
In `@infra/helm/llmgateway/templates/redis-statefulset.yaml`:
- Around line 18-65: Add a Pod and container security hardening baseline to the
Redis StatefulSet: in the StatefulSet spec add podSecurityContext with
runAsUser: 999, runAsGroup: 999 and fsGroup: 999 and add an emptyDir volume
mounted at /tmp; then on the container named "redis" set securityContext to
disallow privilege escalation (allowPrivilegeEscalation: false), drop all
capabilities, set readOnlyRootFilesystem: true, and set seccompProfile to
runtime/default so the readOnlyRootFilesystem works; ensure the container still
mounts the /tmp emptyDir and that the names (podSecurityContext,
securityContext, container "redis") match existing fields in the template.
In `@infra/helm/llmgateway/templates/secret.yaml`:
- Around line 15-20: The DATABASE_URL construction uses raw passwords and can
break with special chars; update the template to url-encode the password using
the Helm/Go template urlquery function for both .Values.postgresql.password and
.Values.externalPostgresql.password when building DATABASE_URL (refer to the
DATABASE_URL line and $pgUser/$pgHost/$pgPort/$pgDB symbols), and ensure
POSTGRES_PASSWORD is set consistently for runtime consumers (move or add
POSTGRES_PASSWORD outside the if/else or set it from the externalPostgresql
branch as well so containers always receive POSTGRES_PASSWORD).
In `@infra/helm/llmgateway/templates/ui-deployment.yaml`:
- Line 9: The replicas line treats 0 as empty because default considers zero
value empty; change the template to check whether the replicas key is present
and only fall back to 1 when it's absent. Replace the single-line "replicas: {{
.Values.ui.replicas | default 1 }}" with a conditional using hasKey on
.Values.ui (e.g., if hasKey .Values.ui "replicas" then render "replicas: {{
.Values.ui.replicas }}" else render "replicas: 1" end) so an explicit
ui.replicas: 0 is preserved.
In `@infra/helm/llmgateway/values.yaml`:
- Around line 43-47: The values.yaml currently contains placeholder secrets
(auth.authSecret and auth.gatewayApiKeyHashSecret) that must not be used in
production; update Helm templates (e.g., secret.yaml or a new validation
template) to validate these keys at render time by checking
.Values.auth.authSecret and .Values.auth.gatewayApiKeyHashSecret and failing the
install if they still equal the default placeholders (or use the required
function to enforce non-empty, plus explicit equality checks to fail when the
exact default strings are present), emitting clear messages like
"auth.authSecret must be changed from the default value" and
"auth.gatewayApiKeyHashSecret must be changed from the default value".
- Around line 292-299: The default insecure password is set at
postgresql.password ("changeme"); change this to an empty value or require a
user-supplied secret and add validation to ensure postgresql.password is not the
default. Update the helm values template to treat postgresql.password as
optional (empty by default) and/or add the same validation logic used for auth
secrets to fail deployment if postgresql.password == "changeme" or left
unchanged, referencing the postgresql block (postgresql.enabled,
postgresql.user, postgresql.password) so the chart forces explicit credential
configuration.
---
Nitpick comments:
In `@infra/helm/llmgateway/Chart.yaml`:
- Line 7: Replace the non-deterministic appVersion value in Chart.yaml
(appVersion: "latest") with a concrete application version string that matches
the chart’s default image tag; update the appVersion field to the actual
semantic version (e.g., "v1.2.3") and, if necessary, align values.yaml image.tag
(or any default image tag referenced by the chart) and CI/release automation to
keep them in sync so chart metadata is deterministic and traceable.
In `@infra/helm/llmgateway/templates/gateway-deployment.yaml`:
- Around line 22-42: The init containers wait-for-postgresql and wait-for-redis
can hang forever; modify their command scripts to implement a timeout or
max-retry and exit non-zero when exceeded so the pod fails fast. Update the
command blocks for the initContainers (names: wait-for-postgresql,
wait-for-redis) to track either a retry counter or elapsed time (e.g.,
MAX_RETRIES or TIMEOUT_SECONDS), loop until nc -z succeeds or the limit is
reached, print a clear error message and exit 1 on timeout; optionally expose
the timeout/retry via a Helm value so it’s configurable.
- Around line 43-77: Add a restrictive security context to the gateway container
and a default pod securityContext for defense-in-depth: under the containers
block for the container named "gateway" add a securityContext with
readOnlyRootFilesystem: true, runAsNonRoot: true, runAsUser (e.g. 1000),
allowPrivilegeEscalation: false, capabilities: drop: ["ALL"], and
seccompProfile: { type: "RuntimeDefault" }; additionally add a
podSecurityContext on the Pod template spec (spec.template.spec) to set fsGroup
(e.g. 1000) and runAsNonRoot:true to provide a default baseline for all
containers. Ensure you add these keys adjacent to existing fields (e.g.,
alongside ports/env/readinessProbe) so Helm templating
(templates/gateway-deployment.yaml) and .Values overrides remain intact.
In `@infra/helm/llmgateway/templates/ingress.yaml`:
- Around line 27-110: Replace the repeated per-component ingress blocks with a
single range that iterates over a map/list of component keys and generates host,
enabled check and service name dynamically; e.g., range over
["ui","api","gateway","playground","code","docs","admin"] and for each key,
check .Values.<key>.enabled and .Values.ingress.hosts.<key>, use the host value
and build the backend service name with printf "%s-%s" (include
"llmgateway.fullname" .) key and port name http; this keeps the
logic centralized (references: .Values.ui.enabled, .Values.ingress.hosts.<key>,
include "llmgateway.fullname") and removes the duplicated blocks.
In `@infra/helm/llmgateway/templates/postgresql-statefulset.yaml`:
- Around line 21-23: The PostgreSQL container image is being built inline using
{{ .Values.postgresql.image.repository }}:{{ .Values.postgresql.image.tag }}
which diverges from other deployments that use the llmgateway image helper;
update the postgresql container spec to use the shared helper (e.g., call the
"llmgateway.image" template with the postgresql image values) or add a dedicated
helper (like "llmgateway.postgresqlImage") and use that in the StatefulSet so
image construction is consistent with other templates (change the image
attribute referenced in the postgresql container to use the helper instead of
direct concatenation).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8bc06efd-b953-4858-8ade-ce75457edce6
📒 Files selected for processing (27)
- apps/docs/content/self-host.mdx
- infra/helm/README.md
- infra/helm/llmgateway/Chart.yaml
- infra/helm/llmgateway/templates/_helpers.tpl
- infra/helm/llmgateway/templates/admin-deployment.yaml
- infra/helm/llmgateway/templates/admin-service.yaml
- infra/helm/llmgateway/templates/api-deployment.yaml
- infra/helm/llmgateway/templates/api-service.yaml
- infra/helm/llmgateway/templates/code-deployment.yaml
- infra/helm/llmgateway/templates/code-service.yaml
- infra/helm/llmgateway/templates/configmap.yaml
- infra/helm/llmgateway/templates/docs-deployment.yaml
- infra/helm/llmgateway/templates/docs-service.yaml
- infra/helm/llmgateway/templates/gateway-deployment.yaml
- infra/helm/llmgateway/templates/gateway-service.yaml
- infra/helm/llmgateway/templates/ingress.yaml
- infra/helm/llmgateway/templates/playground-deployment.yaml
- infra/helm/llmgateway/templates/playground-service.yaml
- infra/helm/llmgateway/templates/postgresql-service.yaml
- infra/helm/llmgateway/templates/postgresql-statefulset.yaml
- infra/helm/llmgateway/templates/redis-service.yaml
- infra/helm/llmgateway/templates/redis-statefulset.yaml
- infra/helm/llmgateway/templates/secret.yaml
- infra/helm/llmgateway/templates/ui-deployment.yaml
- infra/helm/llmgateway/templates/ui-service.yaml
- infra/helm/llmgateway/templates/worker-deployment.yaml
- infra/helm/llmgateway/values.yaml
````mdx
## Helm Chart

You can also deploy LLMGateway using the Helm chart.

```bash
helm repo add llmgateway https://llmgateway.io/helm
helm repo update
helm install llmgateway llmgateway/llmgateway
```
````
Add Helm-specific prerequisites to avoid failed onboarding.
Line 187 adds a Helm deployment path, but the prerequisites are still Docker-only. Please add Kubernetes cluster + Helm requirements in the prerequisites section so this option is executable end-to-end.
📘 Proposed doc patch

```diff
 ## Prerequisites

 - Latest Docker
+- A Kubernetes cluster (local or managed)
+- Helm v3
 - API keys for the LLM providers you want to use (OpenAI, Anthropic, etc.)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/docs/content/self-host.mdx` around lines 187 - 195, Update the
prerequisites section to include Kubernetes/Helm requirements so the new "Helm
Chart" instructions are executable: state that a Kubernetes cluster (minimum
version, e.g., v1.20+ or your supported minimum), kubectl configured to the
target cluster, Helm 3 installed, a default StorageClass or PVC capability,
access to a container registry (or imagePullSecrets) for the llmgateway image,
and any required RBAC/load‑balancer considerations; reference the "Helm Chart"
heading in the docs to ensure users see these prerequisites before the helm
repo/add/install commands.
```yaml
      initContainers:
        - name: wait-for-postgresql
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
                echo "Waiting for PostgreSQL..."
                sleep 2
              done
        - name: wait-for-redis
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "llmgateway.redis.host" . }} {{ include "llmgateway.redis.port" . }}; do
                echo "Waiting for Redis..."
                sleep 2
              done
      containers:
        - name: api
          image: {{ include "llmgateway.image" (dict "context" . "image" .Values.api.image) }}
          imagePullPolicy: {{ .Values.api.image.pullPolicy | default "IfNotPresent" }}
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "llmgateway.configMapName" . }}
            - secretRef:
                name: {{ include "llmgateway.secretName" . }}
          {{- with .Values.api.extraEnv }}
          env:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          readinessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 3
          resources:
            {{- toYaml .Values.api.resources | nindent 12 }}
```
🧩 Analysis chain
🏁 Script executed:

```bash
find . -name "api-deployment.yaml" -type f
```

Repository: theopenco/llmgateway
Length of output: 117

🏁 Script executed:

```bash
find . -name "values.yaml" -path "*/llmgateway*" -type f
```

Repository: theopenco/llmgateway
Length of output: 99

🏁 Script executed:

```bash
cat -n ./infra/helm/llmgateway/templates/api-deployment.yaml
```

Repository: theopenco/llmgateway
Length of output: 3634

🏁 Script executed:

```bash
cat -n ./infra/helm/llmgateway/values.yaml
```

Repository: theopenco/llmgateway
Length of output: 14853

🏁 Script executed:

```bash
find ./infra/helm/llmgateway/templates -name "*deployment.yaml" -type f
```

Repository: theopenco/llmgateway
Length of output: 512

🏁 Script executed:

```bash
rg "securityContext" ./infra/helm/llmgateway/templates/ || echo "No securityContext found in templates"
```

Repository: theopenco/llmgateway
Length of output: 101
Add pod and container security contexts to harden the deployment.
The api-deployment.yaml (lines 20-77) runs containers with unrestricted security defaults. Both init and main containers lack security context settings, allowing unnecessary privilege surface.
Add the following hardening to reduce attack surface:
🔒 Security hardening changes
```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       {{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
       initContainers:
         - name: wait-for-postgresql
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: busybox:1.37
@@
         - name: wait-for-redis
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: busybox:1.37
@@
         - name: api
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: {{ include "llmgateway.image" (dict "context" . "image" .Values.api.image) }}
```

🧰 Tools
🪛 Trivy (0.69.3)
[error] 32-37: Root file system is not read-only
Container 'wait-for-postgresql' of Deployment 'llmgateway-api' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 42-47: Root file system is not read-only
Container 'wait-for-redis' of Deployment 'llmgateway-api' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 32-37: Default security context configured
container llmgateway-api in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 42-47: Default security context configured
container llmgateway-api in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/api-deployment.yaml` around lines 22 - 77,
Add Pod- and container-level security contexts: set a podSecurityContext (e.g.,
runAsNonRoot: true, runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000) and add
securityContext blocks for the initContainers "wait-for-postgresql" and
"wait-for-redis" and the main container "api" with hardened settings
(readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities:
drop all, seccompProfile: runtime/default, runAsNonRoot: true). Apply these
fields to the Pod spec and to each container's securityContext so both init and
main containers inherit the hardened runtime constraints.
```yaml
              until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
                echo "Waiting for PostgreSQL..."
                sleep 2
              done
        - name: wait-for-redis
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "llmgateway.redis.host" . }} {{ include "llmgateway.redis.port" . }}; do
                echo "Waiting for Redis..."
                sleep 2
              done
```
Avoid infinite dependency wait loops in initContainers.
Lines 29-42 can block forever if PostgreSQL/Redis are unreachable, leaving pods stuck in Init without a terminal signal. Add bounded retries (or timeout + non-zero exit) so failures surface and recover via restart policy.
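A bounded-retry variant of the PostgreSQL block might look like this (a sketch only; `MAX_RETRIES=30` is an illustrative choice, and the Redis block would get the same shape):

```yaml
        - name: wait-for-postgresql
          image: busybox:1.37
          command:
            - sh
            - -c
            - |
              MAX_RETRIES=30
              i=0
              until nc -z {{ include "llmgateway.postgresql.host" . }} {{ include "llmgateway.postgresql.port" . }}; do
                i=$((i + 1))
                # Fail the init container so Kubernetes surfaces the problem and restarts per policy.
                if [ "$i" -ge "$MAX_RETRIES" ]; then
                  echo "PostgreSQL not reachable after $MAX_RETRIES attempts" >&2
                  exit 1
                fi
                echo "Waiting for PostgreSQL... (attempt $i/$MAX_RETRIES)"
                sleep 2
              done
```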
🧰 Tools
🪛 Trivy (0.69.3)
[error] 32-37: Root file system is not read-only
Container 'wait-for-postgresql' of Deployment 'llmgateway-api' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 32-37: Default security context configured
container llmgateway-api in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 29-87: Default security context configured
deployment llmgateway-api in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/api-deployment.yaml` around lines 29 - 42,
The initContainer loops that wait with nc for PostgreSQL and Redis can hang
indefinitely; modify the shell in both wait loops (the nc loops that reference
include "llmgateway.postgresql.host"/port and the initContainer named
wait-for-redis that references include "llmgateway.redis.host"/port) to
implement bounded retries or a timeout: add a counter (e.g., max_retries=30),
increment on each try, sleep between attempts, and if the counter exceeds
max_retries exit with non-zero (exit 1) so Kubernetes can surface the failure
and trigger restart policy; alternatively use a timeout wrapper (timeout
command) around nc and exit non-zero on timeout—apply the same change to both
the PostgreSQL and Redis wait blocks.
```yaml
    spec:
      {{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
      containers:
        - name: code
          image: {{ include "llmgateway.image" (dict "context" . "image" .Values.code.image) }}
          imagePullPolicy: {{ .Values.code.image.pullPolicy | default "IfNotPresent" }}
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "llmgateway.configMapName" . }}
            - secretRef:
                name: {{ include "llmgateway.secretName" . }}
          {{- with .Values.code.extraEnv }}
          env:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          readinessProbe:
            httpGet:
              path: /api/health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            httpGet:
              path: /api/health
              port: http
            initialDelaySeconds: 20
            periodSeconds: 30
            timeoutSeconds: 5
          resources:
            {{- toYaml .Values.code.resources | nindent 12 }}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Find the deployment file
find . -name "code-deployment.yaml" -type f
```

Repository: theopenco/llmgateway
Length of output: 118

🏁 Script executed:

```bash
#!/bin/bash
# Read the full deployment file to see the complete context
if [ -f "infra/helm/llmgateway/templates/code-deployment.yaml" ]; then
  cat -n "infra/helm/llmgateway/templates/code-deployment.yaml"
fi
```

Repository: theopenco/llmgateway
Length of output: 2745

🏁 Script executed:

```bash
#!/bin/bash
# Check the values file to see if security contexts are defined there
find infra/helm/llmgateway -name "values*.yaml" -type f -exec cat -n {} +
```

Repository: theopenco/llmgateway
Length of output: 14853

🏁 Script executed:

```bash
#!/bin/bash
# Check other deployment templates for security context examples
find infra/helm/llmgateway/templates -name "*deployment.yaml" -type f | while read file; do
  echo "=== File: $file ==="
  grep -n "securityContext\|runAsNonRoot\|seccomp\|capabilities" "$file"
  if [ $? -ne 0 ]; then
    echo "No security context found"
  fi
done
```

Repository: theopenco/llmgateway
Length of output: 816

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any helper templates that define security contexts
find infra/helm/llmgateway/templates -type f -name "*" | head -20
```

Repository: theopenco/llmgateway
Length of output: 1102

🏁 Script executed:

```bash
#!/bin/bash
# Check the helpers template
cat -n infra/helm/llmgateway/templates/_helpers.tpl
```

Repository: theopenco/llmgateway
Length of output: 5950
Add securityContext to the code container.

The code container lacks hardened security settings. Add pod and container security contexts:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
```

Also apply these settings to other deployments (api, gateway, ui, playground, docs, admin, worker) for consistent baseline security hardening.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/code-deployment.yaml` around lines 20 - 54,
Add a hardened securityContext to the code container and pod: under the
Deployment spec add a pod-level securityContext and inside the container block
for the container named "code" (the block that contains image: {{ include
"llmgateway.image" (dict "context" . "image" .Values.code.image) }} and
readinessProbe/livenessProbe) add a container securityContext with runAsNonRoot:
true, runAsUser: 1000, seccompProfile type RuntimeDefault,
allowPrivilegeEscalation: false, and capabilities.drop: [ALL]; mirror the same
pod and container securityContext changes for the other deployments that use
containers named "api", "gateway", "ui", "playground", "docs", "admin", and
"worker" so .Values.*.image blocks receive the same baseline hardening.
```yaml
{{- if .keepAliveTimeoutS }}
KEEP_ALIVE_TIMEOUT_S: {{ .keepAliveTimeoutS | quote }}
{{- end }}
```
🧩 Analysis chain
🏁 Script executed:

```bash
# First, locate and examine the configmap.yaml file
fd -type f -name "configmap.yaml" | grep -i llmgateway
```

Repository: theopenco/llmgateway
Length of output: 234

🏁 Script executed:

```bash
# Read the entire configmap.yaml to understand structure
cat -n infra/helm/llmgateway/templates/configmap.yaml
```

Repository: theopenco/llmgateway
Length of output: 6290
KEEP_ALIVE_TIMEOUT_S is defined twice in the same ConfigMap data map.
Lines 70 and 90 both render this key under different contexts (gateway.config and api.config). When both conditions are true, the second occurrence (line 90) overwrites the first, breaking per-component timeout tuning.
Rename to component-specific keys (GATEWAY_KEEP_ALIVE_TIMEOUT_S and API_KEEP_ALIVE_TIMEOUT_S) and wire each component deployment to its corresponding key.
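A sketch of the rename, assuming the surrounding gateway/api config contexts and the `keepAliveTimeoutS` value names stay as they are (only the emitted keys change; the deployments that consume these env vars would need to read the new names):

```yaml
# Under the gateway config block
{{- if .keepAliveTimeoutS }}
GATEWAY_KEEP_ALIVE_TIMEOUT_S: {{ .keepAliveTimeoutS | quote }}
{{- end }}

# Under the api config block
{{- if .keepAliveTimeoutS }}
API_KEEP_ALIVE_TIMEOUT_S: {{ .keepAliveTimeoutS | quote }}
{{- end }}
```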
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/configmap.yaml` around lines 69 - 71, The
ConfigMap currently defines KEEP_ALIVE_TIMEOUT_S twice (under gateway.config and
api.config) causing the latter to overwrite the former; change the keys to
component-specific names (e.g. replace KEEP_ALIVE_TIMEOUT_S in the gateway block
with GATEWAY_KEEP_ALIVE_TIMEOUT_S and the one in the api block with
API_KEEP_ALIVE_TIMEOUT_S) and update the corresponding deployment/manifest
templates that read these values so gateway.config consumers read
GATEWAY_KEEP_ALIVE_TIMEOUT_S and api.config consumers read
API_KEEP_ALIVE_TIMEOUT_S (ensure any references in templates, envFrom/env
entries or valueFrom lookups are updated to the new key names).
```yaml
    spec:
      {{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
      containers:
        - name: redis
          image: {{ .Values.redis.image.repository }}:{{ .Values.redis.image.tag }}
          imagePullPolicy: {{ .Values.redis.image.pullPolicy | default "IfNotPresent" }}
          ports:
            - name: redis
              containerPort: 6379
              protocol: TCP
          {{- if .Values.redis.password }}
          command:
            - redis-server
            - --requirepass
            - $(REDIS_PASSWORD)
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "llmgateway.secretName" . }}
                  key: REDIS_PASSWORD
          {{- end }}
          readinessProbe:
            exec:
              command:
                - redis-cli
                {{- if .Values.redis.password }}
                - -a
                - $(REDIS_PASSWORD)
                {{- end }}
                - ping
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            exec:
              command:
                - redis-cli
                {{- if .Values.redis.password }}
                - -a
                - $(REDIS_PASSWORD)
                {{- end }}
                - ping
            initialDelaySeconds: 15
            periodSeconds: 30
            timeoutSeconds: 5
          resources:
            {{- toYaml .Values.redis.resources | nindent 12 }}
```
🧩 Analysis chain
🏁 Script executed:

```bash
cd /tmp && find . -type f -name "redis-statefulset.yaml" 2>/dev/null | head -20
```

Repository: theopenco/llmgateway
Length of output: 46

🏁 Script executed:

```bash
find . -type f -name "*redis*statefulset*" -o -name "*redis-statefulset*" 2>/dev/null
```

Repository: theopenco/llmgateway
Length of output: 120

🏁 Script executed:

```bash
fd -i redis infra/helm 2>/dev/null
```

Repository: theopenco/llmgateway
Length of output: 169

🏁 Script executed:

```bash
ls -la infra/helm/llmgateway/templates/ 2>/dev/null | head -20
```

Repository: theopenco/llmgateway
Length of output: 1354

🏁 Script executed:

```bash
cat infra/helm/llmgateway/templates/redis-statefulset.yaml
```

Repository: theopenco/llmgateway
Length of output: 2695

🏁 Script executed:

```bash
grep -r "securityContext" infra/helm/llmgateway/templates/ --include="*.yaml"
```

Repository: theopenco/llmgateway
Length of output: 46

🏁 Script executed:

```bash
cat infra/helm/llmgateway/templates/api-deployment.yaml
```

Repository: theopenco/llmgateway
Length of output: 3032

🏁 Script executed:

```bash
cat infra/helm/llmgateway/templates/gateway-deployment.yaml
```

Repository: theopenco/llmgateway
Length of output: 3125

🏁 Script executed:

```bash
cat infra/helm/llmgateway/values.yaml | head -100
```

Repository: theopenco/llmgateway
Length of output: 2925

🏁 Script executed:

```bash
rg -A 10 "redis:" infra/helm/llmgateway/values.yaml
```

Repository: theopenco/llmgateway
Length of output: 281

🏁 Script executed:

```bash
cat infra/helm/llmgateway/templates/postgresql-statefulset.yaml
```

Repository: theopenco/llmgateway
Length of output: 3032
🌐 Web query:
Redis readOnlyRootFilesystem Kubernetes security context
💡 Result:
To run Redis on Kubernetes with `readOnlyRootFilesystem: true`, you must (a) keep the image root FS read-only and (b) provide explicit writable mounts for every path Redis (or the OS) needs to write to, typically `/data` and often `/tmp`.

Why it breaks without extra mounts:

- The official Redis image expects persistence files under `/data` (RDB/AOF) and commonly runs with `/data` as the writable location you mount a volume to. [1]
- With a read-only root filesystem, writes to `/tmp` can also fail (many images/tools assume `/tmp` is writable), leading to CrashLoopBackOff unless you mount a writable volume there. [2]

Example (StatefulSet/Deployment container spec):

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 999
    runAsGroup: 999
    fsGroup: 999  # helps make mounted volumes writable for the process
  containers:
    - name: redis
      image: redis:7
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
# For StatefulSet, use volumeClaimTemplates for redis-data; for Deployment, use a PVC volume.
```

Notes:

- `fsGroup` is commonly used so Kubernetes mounts can be writable by a non-root process (Kubernetes adjusts volume ownership/permissions to match the `fsGroup` when mounting). [3]
- If you also set mounted volumes `readOnly: true`, be aware of historical Linux/Kubernetes edge-cases around "truly" recursive read-only mounts (mostly relevant when there are submounts). [4]

Sources: Redis official image docs [1], K8s /tmp workaround example [2], fsGroup behavior explanation [3], Kubernetes recursive read-only mount details [4].
Add pod/container security contexts for Redis.
Lines 18-65 currently rely on default security settings. For a database/cache workload, this should be hardened by default (non-root, no privilege escalation, dropped capabilities, runtime default seccomp).
The suggested hardening requires additional configuration for `readOnlyRootFilesystem: true` to work properly:
🔒 Suggested hardening baseline

```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 999
+        runAsGroup: 999
+        fsGroup: 999
+        seccompProfile:
+          type: RuntimeDefault
       {{- include "llmgateway.imagePullSecrets" . | nindent 6 }}
       containers:
         - name: redis
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           image: {{ .Values.redis.image.repository }}:{{ .Values.redis.image.tag }}
           imagePullPolicy: {{ .Values.redis.image.pullPolicy | default "IfNotPresent" }}
           ports:
             - name: redis
               containerPort: 6379
               protocol: TCP
           {{- if .Values.redis.password }}
           command:
             - redis-server
             - --requirepass
             - $(REDIS_PASSWORD)
           env:
             - name: REDIS_PASSWORD
               valueFrom:
                 secretKeyRef:
                   name: {{ include "llmgateway.secretName" . }}
                   key: REDIS_PASSWORD
           {{- end }}
           readinessProbe:
             exec:
               command:
                 - redis-cli
                 {{- if .Values.redis.password }}
                 - -a
                 - $(REDIS_PASSWORD)
                 {{- end }}
                 - ping
             initialDelaySeconds: 5
             periodSeconds: 10
             timeoutSeconds: 5
           livenessProbe:
             exec:
               command:
                 - redis-cli
                 {{- if .Values.redis.password }}
                 - -a
                 - $(REDIS_PASSWORD)
                 {{- end }}
                 - ping
             initialDelaySeconds: 15
             periodSeconds: 30
             timeoutSeconds: 5
           resources:
             {{- toYaml .Values.redis.resources | nindent 12 }}
           volumeMounts:
             - name: data
               mountPath: /data
+            - name: tmp
+              mountPath: /tmp
       volumes:
+        - name: tmp
+          emptyDir: {}
```

Note: `fsGroup: 999` and the `/tmp` emptyDir are required for the read-only filesystem to work; the `runAsUser` and `runAsGroup` should match the Redis image's unprivileged user UID/GID (typically 999 for redis:8-alpine).
🧰 Tools
🪛 Trivy (0.69.3)
[error] 30-62: Root file system is not read-only
Container 'redis' of StatefulSet 'llmgateway-redis' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 30-62: Default security context configured
container llmgateway-redis in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 27-62: Default security context configured
statefulset llmgateway-redis in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/redis-statefulset.yaml` around lines 18 - 65,
Add a Pod and container security hardening baseline to the Redis StatefulSet: in
the StatefulSet spec add podSecurityContext with runAsUser: 999, runAsGroup: 999
and fsGroup: 999 and add an emptyDir volume mounted at /tmp; then on the
container named "redis" set securityContext to disallow privilege escalation
(allowPrivilegeEscalation: false), drop all capabilities, set
readOnlyRootFilesystem: true, and set seccompProfile to runtime/default so the
readOnlyRootFilesystem works; ensure the container still mounts the /tmp
emptyDir and that the names (podSecurityContext, securityContext, container
"redis") match existing fields in the template.
```yaml
{{- if .Values.postgresql.enabled }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.postgresql.password $pgHost $pgPort $pgDB | quote }}
POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
{{- else }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.externalPostgresql.password $pgHost $pgPort $pgDB | quote }}
{{- end }}
```
Password in DATABASE_URL should be URL-encoded.
If the PostgreSQL password contains special characters (e.g., `@`, `:`, `/`, `%`), the connection string will be malformed. Use `urlquery` to encode the password.
Also note: POSTGRES_PASSWORD is only set when using the built-in PostgreSQL (line 17), not for external PostgreSQL. If the worker/api containers expect this env var regardless, it may cause issues.
🔧 Proposed fix for URL encoding

```diff
 {{- if .Values.postgresql.enabled }}
-DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.postgresql.password $pgHost $pgPort $pgDB | quote }}
+DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.postgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
 POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
 {{- else }}
-DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser .Values.externalPostgresql.password $pgHost $pgPort $pgDB | quote }}
+DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.externalPostgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
 {{- end }}
```
{{- end }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
{{- if .Values.postgresql.enabled }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.postgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
POSTGRES_PASSWORD: {{ .Values.postgresql.password | quote }}
{{- else }}
DATABASE_URL: {{ printf "postgres://%s:%s@%s:%s/%s" $pgUser (.Values.externalPostgresql.password | urlquery) $pgHost $pgPort $pgDB | quote }}
{{- end }}
```
🧰 Tools
🪛 Checkov (3.2.513)
[medium] 16-17: Basic Auth Credentials
(CKV_SECRET_4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/secret.yaml` around lines 15 - 20, The
DATABASE_URL construction uses raw passwords and can break with special chars;
update the template to url-encode the password using the Helm/Go template
urlquery function for both .Values.postgresql.password and
.Values.externalPostgresql.password when building DATABASE_URL (refer to the
DATABASE_URL line and $pgUser/$pgHost/$pgPort/$pgDB symbols), and ensure
POSTGRES_PASSWORD is set consistently for runtime consumers (move or add
POSTGRES_PASSWORD outside the if/else or set it from the externalPostgresql
branch as well so containers always receive POSTGRES_PASSWORD).
```yaml
  labels:
    {{- include "llmgateway.componentLabels" (dict "context" . "component" "ui") | nindent 4 }}
spec:
  replicas: {{ .Values.ui.replicas | default 1 }}
```
🧩 Analysis chain
🌐 Web query:
Helm Sprig default filter zero numeric value behavior Go template
💡 Result:
In Helm templates, `default` comes from Sprig and returns the fallback when the given value is considered "empty". For numerics, 0 is defined as empty, so `default` will replace an explicitly-set 0 with the default value. Example: `{{ .Values.replicas | default 1 }}` renders 1 when `.Values.replicas` is 0. [1]

If you need to distinguish "unset" vs "set to 0", don't use `default` on the number directly; check key existence instead:

```yaml
{{- if hasKey .Values "replicas" -}}
{{ .Values.replicas }}
{{- else -}}
1
{{- end -}}
```

`hasKey` returns true when the dict contains the key. [2]

Sources: Sprig "Default Functions" docs [1], Helm template function list (hasKey) [2].
🏁 Script executed:
```bash
# Verify the file exists and check the exact content at line 9
fd -t f "ui-deployment.yaml" infra/
```

Repository: theopenco/llmgateway
Length of output: 114

🏁 Script executed:

```bash
cat -n infra/helm/llmgateway/templates/ui-deployment.yaml | head -20
```

Repository: theopenco/llmgateway
Length of output: 997
Preserve explicit replicas: 0 instead of forcing 1.
On Line 9, `default` treats 0 as empty, so `ui.replicas: 0` still renders as 1. This prevents scale-to-zero deployments. Use `hasKey` to distinguish between unset and explicitly-set zero values.
Proposed fix

```diff
-  replicas: {{ .Values.ui.replicas | default 1 }}
+  {{- if hasKey .Values.ui "replicas" }}
+  replicas: {{ .Values.ui.replicas }}
+  {{- else }}
+  replicas: 1
+  {{- end }}
```
+ {{- end }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
  {{- if hasKey .Values.ui "replicas" }}
  replicas: {{ .Values.ui.replicas }}
  {{- else }}
  replicas: 1
  {{- end }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/templates/ui-deployment.yaml` at line 9, The replicas
line treats 0 as empty because default considers zero value empty; change the
template to check whether the replicas key is present and only fall back to 1
when it's absent. Replace the single-line "replicas: {{ .Values.ui.replicas |
default 1 }}" with a conditional using hasKey on .Values.ui (e.g., if hasKey
.Values.ui "replicas" then render "replicas: {{ .Values.ui.replicas }}" else
render "replicas: 1" end) so an explicit ui.replicas: 0 is preserved.
```yaml
auth:
  # -- (required) Secret used for signing auth tokens (32+ characters)
  authSecret: "change-me-to-a-random-32-char-secret"
  # -- (required) HMAC secret for hashing API keys
  gatewayApiKeyHashSecret: "change-me-to-a-random-secret"
```
Default secrets must be changed before production deployment.

The default values for `authSecret` and `gatewayApiKeyHashSecret` are placeholder strings. While comments indicate they need to be changed, consider:

- Adding validation in the templates to fail if defaults are unchanged
- Using the `required` function in templates to enforce values
🛡️ Example template validation

Add to secret.yaml or a dedicated validation template:

```yaml
{{- if eq .Values.auth.authSecret "change-me-to-a-random-32-char-secret" }}
{{- fail "auth.authSecret must be changed from the default value" }}
{{- end }}
{{- if eq .Values.auth.gatewayApiKeyHashSecret "change-me-to-a-random-secret" }}
{{- fail "auth.gatewayApiKeyHashSecret must be changed from the default value" }}
{{- end }}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/values.yaml` around lines 43 - 47, The values.yaml
currently contains placeholder secrets (auth.authSecret and
auth.gatewayApiKeyHashSecret) that must not be used in production; update Helm
templates (e.g., secret.yaml or a new validation template) to validate these
keys at render time by checking .Values.auth.authSecret and
.Values.auth.gatewayApiKeyHashSecret and failing the install if they still equal
the default placeholders (or use the required function to enforce non-empty,
plus explicit equality checks to fail when the exact default strings are
present), emitting clear messages like "auth.authSecret must be changed from the
default value" and "auth.gatewayApiKeyHashSecret must be changed from the
default value".
```yaml
postgresql:
  enabled: true
  image:
    repository: postgres
    tag: "17-alpine"
    pullPolicy: IfNotPresent
  user: postgres
  password: "changeme"
```
Default PostgreSQL password "changeme" is insecure.
The default password should either be empty (forcing users to set it) or the template should validate that it has been changed from the default. Same validation pattern as suggested for auth secrets could apply here.
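Mirroring the auth-secret validation above, a render-time guard could be a short sketch in secret.yaml or a dedicated validation template:

```yaml
{{- /* Fail the render if the built-in PostgreSQL still uses the shipped default password */}}
{{- if and .Values.postgresql.enabled (eq .Values.postgresql.password "changeme") }}
{{- fail "postgresql.password must be changed from the default value" }}
{{- end }}
```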
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/helm/llmgateway/values.yaml` around lines 292 - 299, The default
insecure password is set at postgresql.password ("changeme"); change this to an
empty value or require a user-supplied secret and add validation to ensure
postgresql.password is not the default. Update the helm values template to treat
postgresql.password as optional (empty by default) and/or add the same
validation logic used for auth secrets to fail deployment if postgresql.password
== "changeme" or left unchanged, referencing the postgresql block
(postgresql.enabled, postgresql.user, postgresql.password) so the chart forces
explicit credential configuration.
````mdx
You can also deploy LLMGateway using the Helm chart.

```bash
helm repo add llmgateway https://llmgateway.io/helm
```
````
Yes, we should deploy it either on GitHub Pages, or you can do it with the llmgateway domain.
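For reference, the CI job in this PR already publishes the chart to GHCR as an OCI artifact, so an install without any hosted repo index would look roughly like this (`<org>` is a placeholder for the actual registry path, which isn't spelled out in this thread):

```bash
helm install llmgateway oci://ghcr.io/<org>/llmgateway
```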
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/images.yml:
- Around line 409-412: The publish-helm job is not waiting for image tests and
may publish a chart before images pass; update the publish-helm job to depend on
the test-split job by adding "test-split" to its needs list and extend the if
condition to require needs.test-split.result == 'success' (i.e., change needs:
to include test-split and change if: to github.event_name == 'release' &&
needs.merge-split.result == 'success' && needs.test-split.result == 'success')
so chart publishing is gated on both merge-split and image-test success.
- Around line 431-438: Add a strict SemVer validation step that verifies
IMAGE_TAG (and derived CHART_VERSION) conforms to Helm-compatible SemVer before
mutating Chart.yaml: check IMAGE_TAG/CHART_VERSION against a regex that
disallows underscores and matches full SemVer (e.g. allow optional
prerelease/build with segments of [0-9A-Za-z-] separated by dots), fail the job
and do not run the sed edits if it doesn't match; also replace the existing
release-tag validation pattern `(-[a-zA-Z0-9_-]+)*` with the stricter pattern
`(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?` so IMAGE_TAGs like `v1.2.3-rc_1` are
rejected before images are pushed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 50bd52b5-09f9-4a3b-b47d-f8eba3e96347
📒 Files selected for processing (3)
- .github/workflows/images.yml
- apps/docs/content/self-host.mdx
- infra/helm/README.md
✅ Files skipped from review due to trivial changes (2)
- infra/helm/README.md
- apps/docs/content/self-host.mdx
```yaml
    needs:
      - setup
      - merge-split
    if: github.event_name == 'release' && needs.merge-split.result == 'success'
```
Wait for image tests before publishing the Helm chart.

`publish-helm` runs after `merge-split`, in parallel with `test-split`, so a release can publish a chart pointing at images that later fail the split-image tests. Gate chart publishing on the image test job too.
🚦 Proposed workflow gate

```diff
     needs:
       - setup
       - merge-split
+      - test-split
-    if: github.event_name == 'release' && needs.merge-split.result == 'success'
+    if: github.event_name == 'release' && needs.merge-split.result == 'success' && needs.test-split.result == 'success'
```
+ if: github.event_name == 'release' && needs.merge-split.result == 'success' && needs.test-split.result == 'success'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
    needs:
      - setup
      - merge-split
      - test-split
    if: github.event_name == 'release' && needs.merge-split.result == 'success' && needs.test-split.result == 'success'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/images.yml around lines 409 - 412, The publish-helm job is
not waiting for image tests and may publish a chart before images pass; update
the publish-helm job to depend on the test-split job by adding "test-split" to
its needs list and extend the if condition to require needs.test-split.result ==
'success' (i.e., change needs: to include test-split and change if: to
github.event_name == 'release' && needs.merge-split.result == 'success' &&
needs.test-split.result == 'success') so chart publishing is gated on both
merge-split and image-test success.
- name: Set chart version
  working-directory: infra/helm/llmgateway
  run: |
    IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
    CHART_VERSION="${IMAGE_TAG#v}"
    sed -i "s/^version:.*/version: ${CHART_VERSION}/" Chart.yaml
    sed -i "s/^appVersion:.*/appVersion: \"${IMAGE_TAG}\"/" Chart.yaml
    cat Chart.yaml
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verifies that the existing release-tag pattern accepts values rejected by Helm/SemVer-style chart versions.
python - <<'PY'
import re
workflow_tag_re = re.compile(r'^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9_-]+)*$')
helm_chart_version_re = re.compile(
    r'^(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)'
    r'(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$'
)
for tag in ["v1.2.3", "v1.2.3-rc1", "v1.2.3-rc_1", "v1.2.3-rc.1"]:
    chart_version = tag[1:]
    print(
        tag,
        "workflow_accepts=", bool(workflow_tag_re.fullmatch(tag)),
        "helm_semver_accepts=", bool(helm_chart_version_re.fullmatch(chart_version)),
    )
PY

Repository: theopenco/llmgateway
Length of output: 303
🏁 Script executed:
# First, check the workflow file exists and read the relevant sections
if [ -f .github/workflows/images.yml ]; then
  echo "=== Lines 40-50 (checking release-tag validation at line 44) ==="
  sed -n '40,50p' .github/workflows/images.yml
  echo ""
  echo "=== Lines 425-445 (checking chart version code at lines 431-438) ==="
  sed -n '425,445p' .github/workflows/images.yml
else
  echo "File not found at .github/workflows/images.yml"
  find . -name "images.yml" -o -name "*images*.yml" 2>/dev/null | head -20
fi

Repository: theopenco/llmgateway
Length of output: 1502
🏁 Script executed:
# Create a minimal test to verify Helm behavior with underscores in version
# This tests the actual Helm package behavior
python3 <<'PYEOF'
import re
# Test the actual Helm SemVer spec
# Per https://semver.org and Helm documentation
helm_semver = re.compile(
    r'^(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)\.'
    r'(0|[1-9][0-9]*)'
    r'(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$'
)
workflow_pattern = re.compile(r'^v[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9_-]+)*$')
test_cases = [
    ("v1.2.3", "basic version"),
    ("v1.2.3-rc1", "prerelease no separator"),
    ("v1.2.3-rc_1", "prerelease with underscore"),
    ("v1.2.3-rc.1", "prerelease with dot"),
    ("v1.2.3-rc-1", "prerelease with dash"),
    ("v1.2.3-alpha_beta", "multiple underscore"),
]
print("Testing workflow tag validation vs Helm SemVer compatibility:\n")
print(f"{'Tag':<20} {'Workflow':<12} {'Helm SemVer':<15} {'Status':<40}")
print("-" * 87)
for tag, desc in test_cases:
    chart_version = tag[1:] if tag.startswith('v') else tag
    workflow_ok = bool(workflow_pattern.fullmatch(tag))
    helm_ok = bool(helm_semver.fullmatch(chart_version))
    if workflow_ok and not helm_ok:
        status = "❌ MISMATCH: Passes workflow, fails Helm"
    elif not workflow_ok and helm_ok:
        status = "⚠️ Inverse: Fails workflow, passes Helm"
    elif workflow_ok and helm_ok:
        status = "✓ Both pass"
    else:
        status = "✗ Both fail"
    print(f"{tag:<20} {str(workflow_ok):<12} {str(helm_ok):<15} {status:<40}")
PYEOF

Repository: theopenco/llmgateway
Length of output: 851
Validate chart versions with Helm-compatible SemVer.
CHART_VERSION inherits the Docker tag validator at line 44, which allows underscores in prerelease suffixes (e.g., v1.2.3-rc_1). Such a tag passes workflow validation and its images are pushed to the registry, but the release then fails at helm package because Helm requires strict SemVer. Add SemVer validation before mutating Chart.yaml, and tighten the release-tag pattern at line 44 to enforce strict SemVer compliance.
🛡️ Proposed validation before mutating Chart.yaml
  - name: Set chart version
    working-directory: infra/helm/llmgateway
    run: |
      IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
      CHART_VERSION="${IMAGE_TAG#v}"
+     if [[ ! "${CHART_VERSION}" =~ ^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$ ]]; then
+       echo "Invalid Helm chart version: ${CHART_VERSION}"
+       exit 1
+     fi
      sed -i "s/^version:.*/version: ${CHART_VERSION}/" Chart.yaml
      sed -i "s/^appVersion:.*/appVersion: \"${IMAGE_TAG}\"/" Chart.yaml
      cat Chart.yaml

Also update the line 44 validation pattern from (-[a-zA-Z0-9_-]+)* to (-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)? to reject underscores and enforce SemVer before pushing images.
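For illustration, a minimal sketch of what the stricter tag gate could look like as its own workflow step; the step name and the use of GITHUB_REF_NAME are assumptions, since the existing line 44 check is not reproduced in this review:

- name: Validate release tag
  run: |
    TAG="${GITHUB_REF_NAME}"  # assumed source of the release tag
    # Strict SemVer with an optional dot-separated prerelease; underscores are rejected
    if [[ ! "${TAG}" =~ ^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$ ]]; then
      echo "Release tag ${TAG} is not strict SemVer"
      exit 1
    fi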
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- name: Set chart version
  working-directory: infra/helm/llmgateway
  run: |
    IMAGE_TAG="${{ needs.setup.outputs.image_tag }}"
    CHART_VERSION="${IMAGE_TAG#v}"
    if [[ ! "${CHART_VERSION}" =~ ^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$ ]]; then
      echo "Invalid Helm chart version: ${CHART_VERSION}"
      exit 1
    fi
    sed -i "s/^version:.*/version: ${CHART_VERSION}/" Chart.yaml
    sed -i "s/^appVersion:.*/appVersion: \"${IMAGE_TAG}\"/" Chart.yaml
    cat Chart.yaml
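To reproduce the failure mode locally, a quick sketch (chart path taken from this PR; the exact error text varies by Helm version):

# Set an underscore prerelease version, then try to package the chart
sed -i 's/^version:.*/version: 1.2.3-rc_1/' infra/helm/llmgateway/Chart.yaml
helm package infra/helm/llmgateway  # fails validation: 1.2.3-rc_1 is not valid SemVer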
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/images.yml around lines 431 - 438, Add a strict SemVer
validation step that verifies IMAGE_TAG (and derived CHART_VERSION) conforms to
Helm-compatible SemVer before mutating Chart.yaml: check IMAGE_TAG/CHART_VERSION
against a regex that disallows underscores and matches full SemVer (e.g. allow
optional prerelease/build with segments of [0-9A-Za-z-] separated by dots), fail
the job and do not run the sed edits if it doesn't match; also replace the
existing release-tag validation pattern `(-[a-zA-Z0-9_-]+)*` with the stricter
pattern `(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?` so IMAGE_TAGs like `v1.2.3-rc_1`
are rejected before images are pushed.
This pull request introduces a Helm chart for deploying LLM Gateway on Kubernetes, along with comprehensive documentation and templating for multiple components and configurations.
The changes add all necessary Helm chart files, Kubernetes manifests for various services, and documentation updates to guide users in deploying LLM Gateway via Helm.
Helm Chart Introduction and Documentation:
- Chart.yaml with metadata and configuration for Kubernetes deployments.
- README.md in the Helm chart directory with installation instructions and usage details.

cc @steebchen @smakosh 🫡
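For operators, installing from GHCR should look roughly like the following; the OCI path, chart version, and namespace are illustrative and should be checked against the published chart:

# Illustrative only; confirm the exact OCI path once the chart is published
helm install llmgateway oci://ghcr.io/theopenco/llmgateway \
  --version 1.2.3 \
  --namespace llmgateway \
  --create-namespace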
Summary by CodeRabbit
New Features
Documentation