Due on GitHub at 23:59, Wednesday 20 September
- Expand our Deep Reinforcement Learning toolkit to enable continuous control for robotics.
- Improve our understanding of Actor/Critic methods.
- Get some experience using the new Isaac Gym DRL simulator from NVIDIA, if available.
Option #1 (for those without a GPU)
If your computer doesn’t have a GPU, find an example of Actor/Critic learning in PyTorch / Gymnasium that you can run on your computer. As we’ve done before with Taxi, once you have the example running, make sure that you have a training / testing phase; or, even better, a training script that saves the trained network, and a testing script that loads and runs it. You can submit your program as a Python script or scripts that I should be able to run on my computer.
If you want to try an example that already works, here is my fork of a repository that uses the PPO algorithm to learn Pendulum-v1. Running pendulum_train.py will save the actor and critic network weights. Can you write a pendulum_test.py script to play the game using the actor network?
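The train/test split described above might be sketched as follows. This is only an illustration under stated assumptions, not the code from the forked repository: the `Actor` class, its layer sizes, and the helper names `save_actor`, `load_actor`, `select_action`, and `run_episode` are all hypothetical, and the actor in a real PPO implementation will likely output a distribution rather than a single action. The only facts relied on are that Pendulum-v1 has a 3-dimensional observation and a 1-dimensional torque in [-2, 2], and that `torch.save` / `load_state_dict` store and restore network weights.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Hypothetical actor: maps an observation to one bounded continuous action."""

    def __init__(self, obs_dim=3, act_dim=1, max_action=2.0):
        super().__init__()
        self.max_action = max_action
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.Tanh(),
            nn.Linear(64, act_dim),
            nn.Tanh(),  # squashes the output to [-1, 1] before scaling
        )

    def forward(self, obs):
        return self.max_action * self.net(obs)


def save_actor(actor, path):
    """Training side: store only the weights, not the whole Python object."""
    torch.save(actor.state_dict(), path)


def load_actor(path, **actor_kwargs):
    """Testing side: rebuild the same architecture, then load the weights."""
    actor = Actor(**actor_kwargs)
    actor.load_state_dict(torch.load(path, map_location="cpu"))
    actor.eval()
    return actor


def select_action(actor, obs):
    """Return the actor's action for one observation as a plain list of floats."""
    with torch.no_grad():
        obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        return actor(obs_t).squeeze(0).tolist()


def run_episode(env, actor, max_steps=200):
    """Play one episode in a Gymnasium-style env and return the total reward."""
    obs, _info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = select_action(actor, obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total += float(reward)
        if terminated or truncated:
            break
    return total
```

In a test script, `run_episode` would presumably be called with an environment created by `gymnasium.make("Pendulum-v1", render_mode="human")` and an actor restored by `load_actor` from whatever file the training script wrote. The architecture passed to `load_actor` has to match the one used in training, otherwise `load_state_dict` will raise an error.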
Option #2 (for those with a GPU)
First, make sure that Isaac Gym is installed on your computer. I wasn’t able to do the installation on my computer at work, so if you have trouble with this option, I would just switch to Option #1, because I won’t be able to help you with Isaac issues.
Next, install the Isaac Gym environments as follows:
git clone https://github.com/NVIDIA-Omniverse/IsaacGymEnvs
cd IsaacGymEnvs
pip3 install -e .
python3 train.py
That last command will train your ants in real time with rendering, which, as we know, can slow things down dramatically. Fortunately, Isaac Gym also allows “headless” (no-render) training, as described in the instructions for the repository you just cloned. Use headless training to train your ants this time; then follow the directions for running the train.py program again to show your training results. Because headless training is supposed to stop automatically once performance is decent, none of the ants should stumble or tip over now.
Being able to train up a swarm of ants in under a minute is pretty cool, but just how much of an improvement are we getting with Isaac Gym over PyTorch by itself (i.e., the td3-learn program from earlier)? To address this question scientifically, we’d like to compare performance on the same environment. Looking over the list of environments (tasks) in Isaac Gym showed that, although Isaac Gym currently has far fewer built-in environments than the table from OpenAI Gym lists, both platforms appear to have the familiar Cartpole. Looking at the code for Isaac Gym’s Cartpole implementation, however, I began to doubt that it was the same as either of the Cartpole environments (v0 or v1) from OpenAI. Can you spot the difference(s)? As usual, try to start with the obvious.
What to turn in to GitHub
For option #1 (no GPU), just turn in a script (or two) that I can run.
For option #2 (Isaac Gym), turn in the following: (a) your Ant.pth file; (b) a README.md or short PDF write-up describing how the Isaac Gym Cartpole reward and action space differ from the CartPole-v1 that we used in our previous assignment.