🚀 Reinforcement Learning with PPO on LunarLander-v3

📝 Project Overview

This project trains an agent on the LunarLander-v3 environment with the Proximal Policy Optimization (PPO) implementation from Stable-Baselines3. Training adjusts hyperparameters on the fly, runs many environments in parallel, and periodically checkpoints and evaluates the model.

🎯 Features

- Custom adaptive hyperparameter tuning
  - Adjusts the entropy coefficient, learning rate, and number of training epochs in response to training progress.
  - Applies these changes on the fly through a custom callback, `AdjustHyperparamsCallback`.
- Efficient parallel training
  - Runs 64 environments in parallel to collect experience faster.
  - Uses CUDA acceleration through PyTorch.
- Periodic model checkpointing and evaluation
  - Saves the model at regular intervals during training.
  - Evaluates performance in a separate process.
- Multiprocessing support in Jupyter notebooks
  - Uses the "spawn" start method, because CUDA cannot be re-initialized in subprocesses created with "fork".
  - Works both in Jupyter notebooks and in plain Python scripts.
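The README does not describe the exact rule inside `AdjustHyperparamsCallback`. Purely as a hypothetical illustration, the sketch below implements one plausible plateau rule in plain Python: when the best recent mean reward fails to beat the earlier best by `min_delta`, it raises `ent_coef`, halves the learning rate, and adds one epoch, each within a cap. The `Hyperparams` class, the function name, and every threshold are assumptions. In a Stable-Baselines3 `BaseCallback`, the result could be written back to the PPO model, which reads `ent_coef` and `n_epochs` at each update and takes its learning rate from `lr_schedule`.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class Hyperparams:
    ent_coef: float
    learning_rate: float
    n_epochs: int


def adjust_hyperparams(
    current: Hyperparams,
    mean_rewards: Sequence[float],
    patience: int = 3,        # hypothetical: evaluations to wait before reacting
    min_delta: float = 1.0,   # hypothetical: smallest improvement that counts as progress
    max_ent_coef: float = 0.05,
    min_lr: float = 1e-5,
    max_epochs: int = 16,
) -> Hyperparams:
    """Hypothetical plateau rule, not the project's actual callback logic."""
    if len(mean_rewards) <= patience:
        return current  # not enough history to judge a plateau yet
    best_before = max(mean_rewards[:-patience])
    best_recent = max(mean_rewards[-patience:])
    if best_recent - best_before >= min_delta:
        return current  # still improving: leave everything unchanged
    # Plateau: explore more, take smaller steps, and reuse each rollout a bit more.
    return replace(
        current,
        ent_coef=min(current.ent_coef * 1.5, max_ent_coef),
        learning_rate=max(current.learning_rate * 0.5, min_lr),
        n_epochs=min(current.n_epochs + 1, max_epochs),
    )
```

For example, starting from `Hyperparams(ent_coef=0.01, learning_rate=3e-4, n_epochs=8)` (3e-4 is the Stable-Baselines3 default PPO learning rate; the README does not state the project's value), the reward history `[100, 150, 180, 180, 179, 178]` counts as a plateau and yields `ent_coef` of about 0.015, `learning_rate=1.5e-4`, and `n_epochs=9`.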

🛠️ Installation

First, clone the repository:

```bash
git clone https://github.com/your-username/your-repo.git
cd your-repo
```

📌 Install Dependencies

To install all required dependencies, run:

```bash
pip install -r requirements.txt
```

🚀 Running the Training

📌 Training from Scratch:

```python
from train import train_with_ppo

train_with_ppo(total_timesteps=20_000_000)
```
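With 64 parallel environments and `n_steps=512` (both stated in this README), each PPO update consumes 64 × 512 = 32,768 transitions, so a 20M-timestep run is a few hundred updates. The helper below is a hypothetical sketch, not part of the repository, that computes this. It also converts a checkpoint interval given in timesteps into the `save_freq` expected by Stable-Baselines3's `CheckpointCallback`, which counts vectorized `env.step()` calls rather than individual timesteps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSchedule:
    rollout_size: int       # transitions collected per PPO update (n_envs * n_steps)
    num_updates: int        # rollout/update cycles needed to reach total_timesteps
    timesteps_trained: int  # can overshoot total_timesteps by less than one rollout
    save_freq: int          # value for CheckpointCallback(save_freq=...)


def plan_training(total_timesteps: int, n_envs: int, n_steps: int,
                  checkpoint_every: int) -> TrainingSchedule:
    """Hypothetical helper (not in the repository) for sizing a PPO run."""
    rollout_size = n_envs * n_steps
    # SB3 keeps collecting full rollouts while num_timesteps < total_timesteps,
    # so round up with integer ceiling division.
    num_updates = -(-total_timesteps // rollout_size)
    # CheckpointCallback counts env.step() calls; each call advances n_envs timesteps.
    save_freq = max(checkpoint_every // n_envs, 1)
    return TrainingSchedule(
        rollout_size=rollout_size,
        num_updates=num_updates,
        timesteps_trained=num_updates * rollout_size,
        save_freq=save_freq,
    )
```

For the run above, `plan_training(20_000_000, n_envs=64, n_steps=512, checkpoint_every=1_000_000)` gives 611 updates (20,021,248 timesteps actually collected) and `save_freq=15_625`; the 1M-timestep checkpoint interval is an assumed example value.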

📌 Running Evaluation:

```python
from evaluate import evaluate

evaluate("trained_model.zip")
```
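Gymnasium's LunarLander documentation considers an episode solved when it scores at least 200 points. As a minimal, hypothetical sketch (the function name and returned fields are assumptions, not what `evaluate` prints), per-episode returns, such as those from Stable-Baselines3's `evaluate_policy(..., return_episode_rewards=True)`, could be summarized against that threshold like this:

```python
import statistics
from typing import Sequence

SOLVED_THRESHOLD = 200.0  # LunarLander "solved" score per the Gymnasium docs


def summarize_returns(returns: Sequence[float], threshold: float = SOLVED_THRESHOLD) -> dict:
    """Hypothetical summary of per-episode evaluation returns."""
    if not returns:
        raise ValueError("need at least one episode return")
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population std, matching numpy.std's default
    solved = sum(r >= threshold for r in returns)
    return {
        "mean": mean,
        "std": std,
        "solved_fraction": solved / len(returns),
        "mean_above_threshold": mean >= threshold,
    }
```

Reporting both the mean and the fraction of solved episodes helps distinguish a consistently good policy from one whose average is lifted by a few high-scoring landings.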

🖥️ Running in Jupyter Notebook

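When running from Jupyter Notebook, restart the kernel first, then set the multiprocessing start method in the first cell, before any training code runs:

```python
import multiprocessing

# "spawn" starts each subprocess in a fresh interpreter, so it does not inherit CUDA state.
multiprocessing.set_start_method("spawn", force=True)
```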
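An alternative that leaves the global start method untouched is to request a "spawn" context only where subprocesses are created. Stable-Baselines3's `SubprocVecEnv` already accepts a `start_method` argument for its environment workers (for example `make_vec_env(..., vec_env_cls=SubprocVecEnv, vec_env_kwargs={"start_method": "spawn"})`). For a separate evaluation process, a minimal standard-library sketch follows; `run_evaluation` and `start_evaluation` are hypothetical names, and the worker body is a placeholder for loading a checkpoint and running episodes.

```python
import multiprocessing as mp


def run_evaluation(checkpoint_path: str, results) -> None:
    """Hypothetical worker: replace with code that loads the checkpoint and runs episodes."""
    results.put({"checkpoint": checkpoint_path, "status": "done"})


def start_evaluation(checkpoint_path: str):
    """Launch run_evaluation in a fresh 'spawn' process without changing the global default."""
    ctx = mp.get_context("spawn")  # the child starts a new interpreter, free of parent CUDA state
    results = ctx.Queue()
    proc = ctx.Process(target=run_evaluation, args=(checkpoint_path, results))
    proc.start()
    return proc, results
```

With "spawn", the child re-imports the worker function, so `run_evaluation` must live in an importable `.py` module rather than a notebook cell, and in a script `start_evaluation(...)` should only be called under `if __name__ == "__main__":`.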

📂 Project Structure

```text
📦 lunar_lander/
 ┣ 📂 __pycache__/         # Compiled Python files
 ┣ 📂 final_tensorboard/   # TensorBoard logs
 ┣ 📜 .gitattributes       # Git configuration file
 ┣ 📜 eval_only.py         # Script for evaluation-only purposes
 ┣ 📜 final_10m_timesteps_292.zip  # Model checkpoint file
 ┣ 📜 final_285.zip        # Model checkpoint file
 ┣ 📜 final_285.py         # Script associated with final_285.zip
 ┣ 📜 final_291.zip        # Model checkpoint file
 ┣ 📜 final_293.zip        # Model checkpoint file
 ┣ 📜 final_296.zip        # Model checkpoint file
 ┣ 📜 final.py             # Main training script
 ┣ 📜 final.zip            # Final model checkpoint
 ┣ 📜 hyperparams.json     # JSON file for hyperparameters
 ┣ 📜 lunar_lander.gif     # Animated GIF
 ┣ 📜 pipeline.ipynb       # Jupyter Notebook for pipeline
 ┣ 📜 rapids-23.10.yml     # Environment configuration for RAPIDS
 ┣ 📜 README.md            # Project documentation
 ┣ 📜 RL_LunarLander_Ruben_Avanesov.docx  # Word document
 ┗ 📜 requirements.txt     # Required dependencies for the project
```

📌 Hyperparameters Used

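| Parameter | Value |
|---|---|
| `policy` | `MlpPolicy` |
| `n_steps` | 512 |
| `batch_size` | 4096 |
| `n_epochs` | 8 |
| `gamma` | 0.999 |
| `gae_lambda` | 0.98 |
| `clip_range` | 0.2 |
| `ent_coef` | 0.01 |
| `vf_coef` | 0.5 |

`n_steps` is counted per environment. `ent_coef` and `n_epochs` are starting values: `AdjustHyperparamsCallback` changes them, along with the learning rate, during training.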
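The table values interact with the 64 parallel environments: each rollout holds 512 × 64 = 32,768 transitions, which splits into 32,768 / 4,096 = 8 minibatches, so each update starts at 8 minibatches × 8 epochs = 64 gradient steps. Stable-Baselines3 warns when `batch_size` does not divide the rollout size evenly, because the last minibatch is then truncated. The sketch below collects the table as keyword arguments for `stable_baselines3.PPO` and checks that arithmetic; the `PPO_KWARGS` dict and the `update_budget` helper are hypothetical names, and `n_envs=64` is taken from the Features section.

```python
from typing import Any, Dict

# Values from the "Hyperparameters Used" table; use as PPO("MlpPolicy", env, **PPO_KWARGS).
PPO_KWARGS: Dict[str, Any] = {
    "n_steps": 512,
    "batch_size": 4096,
    "n_epochs": 8,
    "gamma": 0.999,
    "gae_lambda": 0.98,
    "clip_range": 0.2,
    "ent_coef": 0.01,
    "vf_coef": 0.5,
}


def update_budget(n_envs: int, n_steps: int, batch_size: int, n_epochs: int) -> Dict[str, int]:
    """Hypothetical helper: rollout size, minibatches per epoch, gradient steps per update."""
    rollout_size = n_envs * n_steps
    if rollout_size % batch_size != 0:
        # SB3 would still train (with a warning); this sketch is deliberately stricter.
        raise ValueError(f"batch_size={batch_size} does not divide rollout size {rollout_size}")
    minibatches = rollout_size // batch_size
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * n_epochs,
    }
```

Because the callback raises `n_epochs` over time in this project, the gradient-step count per update grows accordingly, while the rollout size stays fixed by `n_envs` and `n_steps`.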

📜 License

This project is open-source and available under the MIT License.

🔥 Feel free to contribute, report issues, or suggest improvements! 🔥
