CMSC818B Mini-Project 2, UMD, Fall ’20

Ophir Gal


This mini-project was originally meant to be based on the paper "Using Soft Actor-Critic for Low-Level UAV Control" (available on arXiv at https://arxiv.org/abs/2010.02293). My plan was to use the code available at https://github.com/larocs/SAC_uav to reproduce some of the paper's results as a first step, and then to run the existing implementation under new scenarios not covered in the paper.

Unfortunately, after several days of long debugging sessions chasing errors left and right, I could not get the authors' code to run on my machine, even when using Docker and even after discussing it with the first author himself via email (see the full correspondence here). The correspondence includes other errors I encountered along the way, but the last error I got when trying to run the project natively (on my machine) is shown below:

Traceback (most recent call last):
File "main.py", line 13, in <module>
from sim_framework.envs.drone_env import DroneEnv
File "C:\Users\ophir\SAC_uav-master\sim_framework\envs\drone_env.py",
line 11 in <module>
from pyrep import PyRep
ImportError: cannot import name 'PyRep' from 'pyrep' (C:\Users\ophir\AppData\Roa ming\Python\Python38\site-packages\pyrep\__init__.py)

The issue here (I later realized) was that I had installed a package called "pyrep" (available on PyPI) when I should have had "PyRep" (available here). But trying to install "PyRep" on my machine also produced errors I could not resolve:

ERROR: Command errored out with exit status 1:
 command: 'C:\Program Files\Python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ophir\\AppData\\Local\\Temp\\pip-req-build-z_qq1rh_\\setup.py'"'"'; __file__='"'"'C:\\Users\\ophir\\AppData\\Local\\Temp\\pip-req-build-z_qq1rh_\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ophir\AppData\Local\Temp\pip-pip-egg-info-8dljwqbr'
     cwd: C:\Users\ophir\AppData\Local\Temp\pip-req-build-z_qq1rh_\
Complete output (8 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ophir\AppData\Local\Temp\pip-req-build-z_qq1rh_\setup.py", line 2, in <module>
    import cffi_build.cffi_build as cffi_build
  File "C:\Users\ophir\AppData\Local\Temp\pip-req-build-z_qq1rh_\cffi_build\cffi_build.py", line 747, in <module>
    os.symlink(path, path + '.1')
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Program Files\\CoppeliaRobotics\\CoppeliaSimEdu\\libcoppeliaSim.so' -> 'C:\\Program Files\\CoppeliaRobotics\\CoppeliaSimEdu\\libcoppeliaSim.so.1'
creating symlink: C:\Program Files\CoppeliaRobotics\CoppeliaSimEdu\libcoppeliaSim.so.1 -> C:\Program Files\CoppeliaRobotics\CoppeliaSimEdu\libcoppeliaSim.so
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
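
In hindsight, a quick check from a Python shell could have revealed the mix-up between the two packages earlier. The snippet below is only a hypothetical diagnostic (not part of the SAC_uav or PyRep code): the CoppeliaSim PyRep bindings expose a PyRep class, while the unrelated "pyrep" package from PyPI does not.

# Hypothetical sanity check: which "pyrep" is actually installed?
import pyrep
print(pyrep.__file__)           # the install path shows which distribution this module came from
print(hasattr(pyrep, "PyRep"))  # True only for the CoppeliaSim PyRep bindings, not the PyPI "pyrep" package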

Unfortunately, using Docker was not successful either. I was able to instantiate the image and create the container, but I got the following error when trying to run the training code:

xvfb-run ./training.sh
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    from sim_framework.envs.drone_env import DroneEnv
  File "/home/Drone_RL/sim_framework/envs/drone_env.py", line 11, in <module>
    from pyrep import PyRep
  File "/usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/__init__.py", line 6, in <module>
    from .pyrep import PyRep
  File "/usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/pyrep.py", line 3, in <module>
    from pyrep.backend import sim, utils
  File "/usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/backend/sim.py", line 2, in <module>
    from ._sim_cffi import ffi, lib
ImportError: /usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/backend/_sim_cffi.cpython-36m-x86_64-linux-gnu.so: undefined symbol: simAddLog
Makefile:42: recipe for target 'training' failed

I've presented the errors above because I discussed this with Professor Tokekar on Piazza on Sunday 11/15, and he said that if I show I have done my due diligence in getting things to work, and there isn't an obvious workaround I have not tried, then I won't be penalized, as long as I report the process and attach the errors I got.

Additionally, I tried but struggled to find a similar yet simple enough paper whose code I could run and experiment with in the time left for the mini-project. In light of that, and to still get some educational value out of the project, I share below a humble experiment I performed with gym-pybullet-drones, a simple OpenAI Gym environment based on PyBullet for multi-agent reinforcement learning with quadrotors (available here).

I ran their provided experiment code to train a single quadcopter agent to fly through a gate using the five available reinforcement learning algorithms: advantage actor-critic (A2C), proximal policy optimization (PPO), soft actor-critic (SAC), deep deterministic policy gradient (DDPG), and twin delayed DDPG (TD3).

For each algorithm, I ran the singleagent.py file (under the experiments/learning/ folder) with "rpm" as the action type, "flythrugate" as the environment (the task), "kin" as the observation type (kinematic information, i.e., the drone's pose and its linear and angular velocities), and 40,000 time steps; a rough sketch of what such a run amounts to is shown below.
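
The sketch below assumes gym-pybullet-drones registers its single-agent environments with Gym on import and that stable-baselines3 (which the repo's script uses) is installed; the Gym id 'flythrugate-aviary-v0', the default observation/action settings, and the file names are my best guesses and may differ from what singleagent.py actually does.

# Rough, hypothetical sketch of one training run (not the repo's actual script).
import gym
import gym_pybullet_drones                 # assumed to register the aviary environments with Gym
from stable_baselines3 import SAC          # the script also supports A2C, PPO, DDPG, and TD3

env = gym.make('flythrugate-aviary-v0')    # assumed Gym id for the "flythrugate" task
model = SAC('MlpPolicy', env, verbose=1)   # one of the five algorithms I compared
model.learn(total_timesteps=40000)         # 40,000 time steps, as in the runs reported here
model.save('sac_flythrugate')              # hypothetical filename for the trained policy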

Results

Running the test_singleagent.py file, it was evident that the algorithms performed poorly, since 40,000 timesteps appeared to be insufficient: all of the agents simply took off, flew for a bit, and then either crashed into the ground, crashed into the gate, or, in the case of A2C, hovered and drifted upward aimlessly. Surprisingly, the SAC algorithm performed well in some of the tests. Below are quick videos of these poor performances (and the successful SAC test) along with graphs of each agent's position and velocity over time.
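
For context, the evaluation amounts to reloading a saved policy and rolling it out in the same environment while logging the drone's state. The snippet below is only a hedged sketch of that loop, reusing the assumed names from the training sketch above; the actual test_singleagent.py also replays the run in the PyBullet GUI and produces the plots shown below.

# Rough, hypothetical evaluation rollout (not the repo's actual test script).
import gym
import gym_pybullet_drones                  # assumed to register the aviary environments with Gym
from stable_baselines3 import SAC

env = gym.make('flythrugate-aviary-v0')     # same assumed Gym id as in the training sketch
model = SAC.load('sac_flythrugate')         # hypothetical filename from the training sketch

obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)   # obs holds the kinematic state used for the position/velocity plots
env.close()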

A2C


DDPG


TD3


SAC


I had a limited amount of time and computational power, but I managed to run the PPO algorithm for 250,000 timesteps, which thankfully resulted in decent performance; the video and graphs are shown below.

PPO


Conclusion

Reproducibility is as important in computer science as in any other science, and it can become a real issue when code bases have many dependencies (operating systems, software versions, processors, random seeds, etc.). However, since computer science is among the few sciences in which experiments and their conditions can be reproduced exactly (everything is virtual and determined by things like software versions and random seeds), I think computer scientists ought to hold themselves to a higher standard: keep track of all the relevant information and make reproducibility as easy as possible.

My experiment showed that using reinforcement learning for a relatively simple optimal control task can require a large number of training iterations if we want the resulting policies to be robust to initialization. It seems that, with some initializations, the agent has an easier time collecting rewards and completing the task successfully. Despite the surprising relative success of SAC over DDPG, A2C, and TD3, the small scope of these results makes it hard to say whether SAC has any real empirical advantage over the others.