CSCI 316 Problem Set #1

Assignment #4

Due on Github 23:59 Wednesday 6 September

What I cannot create, I do not understand   – Richard Feynman (1918 – 1988)

And me, I’m flying in my taxi …  – Harry Chapin  (1942 – 1981)

Objectives

    1. Understand Q-Learning by coding it in Python.
    2. Become familiar with Gymnasium (formerly known as OpenAI Gym), the most popular test platform for Reinforcement Learning.
    3. Be able to use matplotlib to display your results.

Packages required

You will need the gymnasium, numpy, and matplotlib packages installed (all available via pip).

Getting started

Read through this blog post, which will explain both the Taxi environment for Q-Learning and the Q-Learning algorithm itself, as well as providing you with the code you’ll need to complete this assignment.  You can ignore the code that uses IPython, which we won’t need.    Note also that instead of import gym you will want to write import gymnasium as gym

Unfortunately, it appears that Gymnasium does not do everything in the same simple way that Gym did.  For example, in Gym you could rely on env.step() setting the done value to True after a certain number of iterations, whereas that doesn’t appear to be the case in Gymnasium.  So I recommend that your QLAgent.play() method have a max_iters parameter with a sensible default like 100, to keep it from looping any more times than that if it fails to win the game.
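
For illustration, here is one way that loop bound might look.  This is a minimal sketch only; it assumes numpy is imported as np, that the agent keeps its environment in self.env and its Q-table in self.qtable (both described in the next section), and that the environment was created with render_mode='ansi' so that render() returns a printable text frame.

    # Sketch of a bounded play loop; max_iters keeps an untrained agent
    # from wandering forever.
    def play(self, max_iters=100):
        state, info = self.env.reset()
        for _ in range(max_iters):
            print(self.env.render())                    # text frame (render_mode='ansi')
            action = np.argmax(self.qtable[state])      # best known action for this state
            state, reward, terminated, truncated, info = self.env.step(action)
            if terminated or truncated:                 # Gymnasium's two end-of-episode flags
                break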

Creating the QL class and Taxi unit-test

As usual in my assignments, we’re going to create a general Python class to support an algorithm, and a script with a main to test the class.

In a file ql.py, create a class QLAgent that provides the following methods.  As usual in my assignments, I encourage you to write stubbed versions of the methods first; i.e., simply a return statement and nothing else.  Then you can switch back and forth with the test program described below, adding a little functionality at a time, based on the blog:

    • __init__(self, env): accepts a Gymnasium environment, stores the environment as an instance variable, and uses numpy.zeros to create a Q-table sized by the environment’s observation-space size and action-space size.
    • train(self, neps, alpha, gamma, epsilon): trains using the Q-Learning algorithm for neps episodes, with learning rate alpha, discount rate gamma, and exploration rate epsilon.  Reports the reward every 100 episodes, and returns an array of per-episode rewards suitable for plotting (as shown below; a rough sketch of this method appears after this list).
    • play(self, max_iters=100):  Using the Gymnasium environment stored by the constructor, performs the following actions (see the note about max_iters under Getting started):
      1. Gets the initial state via env.reset()
      2. Loops, at most max_iters times, as follows:
        1. Renders the environment
        2. Gets the best action from the Q-table and the current state
        3. Calls the environment’s step() method, getting the new state and the end-of-episode information (terminated/truncated in Gymnasium)
        4. Breaks out of the loop when the episode has ended, or when max_iters is reached
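
Here is a minimal sketch of how __init__ and train might fit together, using the standard Q-Learning update.  It is not a complete solution: the reporting format is only a suggestion, the hyper-parameter values should come from the blog, and play() was sketched under Getting started above.

    import random
    import numpy as np

    class QLAgent:

        def __init__(self, env):
            self.env = env
            # Q-table: one row per state, one column per action
            self.qtable = np.zeros((env.observation_space.n, env.action_space.n))

        def train(self, neps, alpha, gamma, epsilon):
            rewards = []
            for ep in range(neps):
                state, info = self.env.reset()
                total = 0
                done = False
                while not done:
                    # epsilon-greedy: explore with probability epsilon
                    if random.random() < epsilon:
                        action = self.env.action_space.sample()
                    else:
                        action = np.argmax(self.qtable[state])
                    new_state, reward, terminated, truncated, info = self.env.step(action)
                    # Q-Learning update of the current state/action cell
                    self.qtable[state, action] = (1 - alpha) * self.qtable[state, action] \
                        + alpha * (reward + gamma * np.max(self.qtable[new_state]))
                    total += reward
                    state = new_state
                    done = terminated or truncated
                rewards.append(total)
                if (ep + 1) % 100 == 0:
                    print('Episode', ep + 1, 'reward:', total)
            return rewards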

In a file taxi.py, create a main that does the following (a sketch follows the list), allowing me to run your code by hitting F5 in IDLE:

    1. Creates a Gymnasium environment for Taxi-v3.
    2. Creates a QLAgent object with this environment.
    3. Runs the agent’s play() method, so we can see how poorly it performs before training.
    4. Runs the agent’s train() method, using the hyper-parameters (constants) from the blog, and assigning the resulting array of rewards to a variable.
    5. Uses matplotlib to plot the rewards from the previous step.
    6. Runs the agent’s play() method again, so we can see how well it performs after training.
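
Put together, taxi.py might look roughly like the sketch below.  The hyper-parameter values are placeholders (use the ones from the blog), and render_mode='ansi' is just an assumption so that play() can print text frames.

    import gymnasium as gym
    import matplotlib.pyplot as plt
    from ql import QLAgent

    def main():
        env = gym.make('Taxi-v3', render_mode='ansi')             # step 1
        agent = QLAgent(env)                                      # step 2
        agent.play()                                              # step 3: untrained agent
        rewards = agent.train(neps=10000, alpha=0.1,
                              gamma=0.6, epsilon=0.1)             # step 4: placeholder values
        plt.plot(rewards)                                         # step 5: reward per episode
        plt.xlabel('Episode')
        plt.ylabel('Total reward')
        plt.show()
        agent.play()                                              # step 6: trained agent
        env.close()

    if __name__ == '__main__':
        main()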

Validating your results

Before training, I always saw the taxi stay in the same place without moving.  After training, it always successfully picked up at one location (R, G, B, Y) and dropped off at another.

Here is an example plot of rewards during training:

What to submit to GitHub: ql.py and taxi.py.  Your taxi.py should do the following, in order:

  1. Run the taxi game with the untrained agent
  2. Train using Q-Learning, displaying the rewards plot at the end
  3. Run the game again using the trained agent

I can haz teh extra creditz?

In addition to Taxi-v3, Gymnasium has other “toy text” environments suitable for ordinary Q-Learning.  If you’re feeling ambitious, write a unit-test script for one of them.  For credit, I’ll want to see you solve it!
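
For example, FrozenLake-v1 is another toy-text environment that the same QLAgent class can be pointed at.  A possible unit-test script is sketched below; the is_slippery flag and hyper-parameter values are illustrative assumptions, not known-good settings.

    # frozenlake.py: extra-credit sketch
    import gymnasium as gym
    import matplotlib.pyplot as plt
    from ql import QLAgent

    env = gym.make('FrozenLake-v1', is_slippery=False, render_mode='ansi')
    agent = QLAgent(env)
    agent.play()                                   # before training
    rewards = agent.train(neps=5000, alpha=0.1, gamma=0.9, epsilon=0.1)
    plt.plot(rewards)
    plt.show()
    agent.play()                                   # after training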