Rl sandbox intro by arbrown · Pull Request #1387 · GoogleCloudPlatform/devrel-demos

arbrown · 2026-06-24T20:10:55Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces a High-Performance Distributed RL Sandbox environment on GKE, adding Dockerfiles, Kubernetes configurations, and a training script using TRL and Ray. Key feedback includes adding a timeout to the HTTP request, using --no-install-recommends to reduce Docker image size, enabling vLLM in the GRPO configuration for faster generation, making the bash block regex more robust, and moving ray.init() to the main function to prevent initialization issues on remote workers.

gemini-code-assist · 2026-06-24T20:14:39Z

+            try:
+                github_repo = example["repo"]
+                url = f"https://raw.githubusercontent.com/{github_repo}/{example['base_commit']}/{target_file}"
+                with urllib.request.urlopen(url) as response:


The urllib.request.urlopen call does not specify a timeout. If the GitHub raw server is slow or unresponsive, this call can block indefinitely, hanging the dataset mapping process. It is recommended to set a reasonable timeout.

Suggested change

with urllib.request.urlopen(url) as response:

with urllib.request.urlopen(url, timeout=10) as response:

gemini-code-assist · 2026-06-24T20:14:39Z

+RUN apt-get update && apt-get install -y \
+    git \
+    build-essential \
+    libsqlite3-dev \
+    && rm -rf /var/lib/apt/lists/*


To keep the Docker image size as small as possible, it is recommended to use the --no-install-recommends flag with apt-get install.

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    build-essential \
    libsqlite3-dev \
    && rm -rf /var/lib/apt/lists/*

gemini-code-assist · 2026-06-24T20:14:39Z

+    training_args = GRPOConfig(
+        output_dir="outputs",
+        learning_rate=5e-6,
+        max_steps=10,
+        per_device_train_batch_size=1,
+        gradient_accumulation_steps=4,
+        num_generations=8,
+        generation_batch_size=8,
+    )


Since the GPU worker base image is vllm/vllm-openai, you can significantly accelerate the generation phase of GRPO by enabling vLLM integration in GRPOConfig using use_vllm=True.

Suggested change

training_args = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    max_steps=10,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_generations=8,
    generation_batch_size=8,
)

training_args = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    max_steps=10,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_generations=8,
    generation_batch_size=8,
    use_vllm=True,
)

gemini-code-assist · 2026-06-24T20:14:39Z

+
+    try:
+        # Check if the code is correctly formatted
+        bash_match = re.search(r"```bash\n(.*?)\n```", code, re.DOTALL)


LLMs frequently output sh instead of bash, or include trailing whitespace after the language identifier. The current regex is strict and will fail to match these variations, resulting in a 0.0 reward for otherwise valid completions. A more permissive regex such as r"```(?:bash|sh)\s*\n(.*?)\n```" is much more robust.

Suggested change

bash_match = re.search(r"```bash\n(.*?)\n```", code, re.DOTALL)

bash_match = re.search(r"```(?:bash|sh)\s*\n(.*?)\n```", code, re.DOTALL)
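To illustrate what the more permissive pattern accepts, here is a small sketch that runs it against a few completions. The helper name `extract_bash` and the sample strings are made up for this example; the fence is assembled at runtime (`FENCE`) only to keep literal triple backticks out of the listing.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built at runtime

# Accept either language tag, plus optional whitespace before the newline.
BASH_BLOCK = re.compile(FENCE + r"(?:bash|sh)\s*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_bash(completion: str) -> Optional[str]:
    """Return the body of the first bash/sh fenced block, or None if there is none."""
    match = BASH_BLOCK.search(completion)
    return match.group(1) if match else None


strict = FENCE + "bash\necho hi\n" + FENCE
loose = "Here you go:\n" + FENCE + "sh  \nls -l\n" + FENCE

print(extract_bash(strict))                # echo hi
print(extract_bash(loose))                 # ls -l
print(extract_bash("no code block here"))  # None
```

Note that in a raw string the escapes are written with a single backslash (`\s`, `\n`); doubling them would make the pattern look for literal backslash characters.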

gemini-code-assist · 2026-06-24T20:14:39Z

+import urllib.request
+import re
+
+ray.init(ignore_reinit_error=True)


Calling ray.init() at the module level is a Ray anti-pattern. When Ray workers import this module to execute tasks, they will run ray.init() again. Although ignore_reinit_error=True suppresses the error, it can still cause unexpected behavior or warnings. It is best practice to initialize Ray inside the main() function or under the if __name__ == "__main__": block.

gemini-code-assist · 2026-06-24T20:14:39Z

+def main():
+    print("Submitting training job to GPU worker...")
+    ray.get(train.remote())


Initialize Ray inside the main() function to ensure it only runs on the driver process and not on the Ray workers when they import this module.

Suggested change

def main():
    print("Submitting training job to GPU worker...")
    ray.get(train.remote())

def main():
    ray.init(ignore_reinit_error=True)
    print("Submitting training job to GPU worker...")
    ray.get(train.remote())
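The general Python behaviour behind this advice can be shown without Ray: statements at module level run on every import, while code reached only through `main()` or the `if __name__ == "__main__":` guard does not. The sketch below is a stand-in, not the PR's script; `connect()` plays the role of `ray.init()`, and the module name `train_job_demo` and helper `import_from_source` are hypothetical.

```python
import importlib.util
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for the training script; connect() represents ray.init().
MODULE_SOURCE = textwrap.dedent('''
    calls = []

    def connect():
        calls.append("connect")

    def main():
        connect()  # runs only when main() is called

    if __name__ == "__main__":
        main()
''')


def import_from_source(source, name="train_job_demo"):
    """Write `source` to a temporary file and import it as a module named `name`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module


worker_view = import_from_source(MODULE_SOURCE)
print(worker_view.calls)  # []  (importing alone does not call connect())

worker_view.main()
print(worker_view.calls)  # ['connect']
```

Had `connect()` been called at the top of the module instead, the first `print` would already show one call, which is the situation the review comment warns about.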

arbrown added 2 commits June 24, 2026 14:07

Add code samples for RL Sandbox codelab

b3bd008

Add README for rl-sandbox-intro

895333d

arbrown merged commit a5e8634 into main Jun 24, 2026
6 checks passed

arbrown deleted the rl-sandbox-intro branch June 24, 2026 20:11

gemini-code-assist Bot reviewed Jun 24, 2026
