This mini-project was originally meant to be based on the paper
"Using Soft Actor-Critic for Low-Level
UAV Control" (available on arXiv at
). My plan was to use the code available at
to reproduce some of the paper's results as a first step, and then to run the
existing implementation under new scenarios not covered in the paper.
Unfortunately, after a few days of long hours trying to debug the errors I was getting left and right, I could not get the authors' code to run on my machine, even when using Docker and even after corresponding with the first author himself via email (see the full correspondence here). The correspondence includes other errors I encountered along the way, but the last error I got when trying to run the project natively (on my machine) is shown below:
Traceback (most recent call last):
File "main.py", line 13, in <module>
from sim_framework.envs.drone_env import DroneEnv
line 11 in <module>
from pyrep import PyRep
ImportError: cannot import name 'PyRep' from 'pyrep' (C:\Users\ophir\AppData\Roaming\Python\Python38\site-packages\pyrep\__init__.py)
ERROR: Command errored out with exit status 1:
 command: 'C:\Program Files\Python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv = '"'"'C:\\Users\\ophir\\AppData\\Local\\Temp\\pip-req-build-z_qq1rh_\\setup.py'"'"'; __file__='"'"'C:\\Users\\ophir\\AppData\\Local\\Temp\\pip-req-build-z_qq1rh_\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ophir\AppData\Local\Temp\pip-pip-egg-info-8dljwqbr'
     cwd: C:\Users\ophir\AppData\Local\Temp\pip-req-build-z_qq1rh_\
Complete output (8 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ophir\AppData\Local\Temp\pip-req-build-z_qq1rh_\setup.py", line 2, in <module>
    import cffi_build.cffi_build as cffi_build
  File "C:\Users\ophir\AppData\Local\Temp\pip-req-build-z_qq1rh_\cffi_build\cffi_build.py", line 747, in <module>
    os.symlink(path, path + '.1')
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Program Files\\CoppeliaRobotics\\CoppeliaSimEdu\\libcoppeliaSim.so' -> 'C:\\Program Files\\CoppeliaRobotics\\CoppeliaSimEdu\\libcoppeliaSim.so.1'
creating symlink: C:\Program Files\CoppeliaRobotics\CoppeliaSimEdu\libcoppeliaSim.so.1 -> C:\Program Files\CoppeliaRobotics\CoppeliaSimEdu\libcoppeliaSim.so
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
xvfb-run ./training.sh
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    from sim_framework.envs.drone_env import DroneEnv
  File "/home/Drone_RL/sim_framework/envs/drone_env.py", line 11, in <module>
    from pyrep import PyRep
  File "/usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/__init__.py", line 6, in <module>
    from .pyrep import PyRep
  File "/usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/pyrep.py", line 3, in <module>
    from pyrep.backend import sim, utils
  File "/usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/backend/sim.py", line 2, in <module>
    from ._sim_cffi import ffi, lib
ImportError: /usr/local/lib/python3.6/dist-packages/PyRep-4.1.0-py3.6-linux-x86_64.egg/pyrep/backend/_sim_cffi.cpython-36m-x86_64-linux-gnu.so: undefined symbol: simAddLog
Makefile:42: recipe for target 'training' failed
I've presented the errors above because I discussed this with Professor
Tokekar on Sunday 11/15 on Piazza, and he said that if I show that I have done my due
diligence in getting things to work, and there isn't an obvious workaround that I
have not tried, then I won't be penalized, so long as I report the
process and attach the errors I got.
Additionally, I tried but struggled to find a similar yet simple enough paper whose code I could run and experiment with in the time left for the mini-project. In light of that, I decided to do the following to still get some educational value here: below I share a humble experiment I performed with
a simple OpenAI Gym environment based on PyBullet for multi-agent
reinforcement learning with quadrotors (available
I ran their provided experiment code to train a single quadcopter agent to fly through a gate using the five available reinforcement learning algorithms: advantage actor-critic (A2C), proximal policy optimization (PPO), soft actor-critic (SAC), deep deterministic policy gradient (DDPG), and twin delayed DDPG (TD3).
For each algorithm, I ran the
singleagent.py file inside
(under the experiments/learning/ folder) with "rpm" as the action type,
"flythrugate" as the environment (the task), "kin" as the
observation type (which I believe stands for kinematic), and
40,000 time steps.
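For reference, the runs looked roughly like the following. The flag names (--env, --algo, --obs, --act) are my reconstruction of the script's command-line interface from memory, so treat this as a sketch rather than an exact command:

```shell
# Sketch of the five training runs; flag names are assumptions,
# not verified verbatim against the script's argparse setup.
for algo in a2c ppo sac ddpg td3; do
    python singleagent.py --env flythrugate --obs kin --act rpm --algo "$algo"
done
```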
When I evaluated the trained agents using the
test_singleagent.py file, it was evident that
the algorithms performed poorly: 40,000 timesteps appeared to be
not enough, and all of the agents simply took off and flew for a bit
before crashing into the ground, crashing into the gate, or,
in the case of A2C, hovering and drifting upward aimlessly. Surprisingly, the
SAC algorithm performed well for some of the tests. Below are quick
videos of these poor performances (and the successful SAC test) and
graphs describing the drones' position and velocity over time.
I had a limited amount of time and computational power, but I managed to run the PPO algorithm for 250,000 timesteps, which thankfully resulted in decent performance: the video and graphs are shown below.
Reproducibility is as important in computer science as in any other science, and it can be a real issue when code bases have many dependencies (operating systems, software versions, processors, random seeds, etc.). However, since computer science is among the few sciences in which experiments and their conditions can be reproduced exactly (everything is virtual and depends on things like versions and random seeds), I think computer scientists ought to hold themselves to a higher standard: keep track of all the relevant information and make reproducibility as easy as possible.
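On that note, the cheapest reproducibility win is simply to seed every random number generator an experiment touches and record the seed alongside the results. A minimal stdlib-only sketch (the function name is mine, for illustration):

```python
import random

def seeded_rollout(seed, n=5):
    """Return n pseudo-random 'actions'; the same seed always
    reproduces the exact same sequence."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [round(rng.uniform(-1.0, 1.0), 6) for _ in range(n)]

# Two runs with the same seed are identical; a different seed diverges.
run_a = seeded_rollout(42)
run_b = seeded_rollout(42)
run_c = seeded_rollout(7)
```

In a real RL experiment the same idea applies to NumPy, the deep learning framework, and the simulator, each of which keeps its own generator that must be seeded separately.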
My experiment showed that using reinforcement learning for a relatively simple optimal control task can take a large number of iterations to train if we want our models to be robust to initialization. It seems that with some initializations, the model has an easier time collecting rewards and completing the task successfully. Despite the surprising relative success of SAC over DDPG, A2C, and TD3, given the small scope of these results it's hard to say whether SAC has any real empirical advantage over the others.
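For context on why SAC behaving differently is at least plausible: unlike DDPG and TD3, SAC maximizes reward plus an entropy bonus, which keeps the policy stochastic and exploring for longer. A toy sketch of how the critic targets differ, with made-up numbers (the variable values below are illustrative, not from my runs):

```python
# Toy comparison of SAC vs. TD3 critic targets; all numbers are made up.
gamma, alpha = 0.99, 0.2          # discount factor, entropy temperature
reward = 1.0
q1_next, q2_next = 5.0, 4.6       # twin critic estimates at the next state
log_prob_next = -1.3              # log-probability of the sampled next action

# SAC target: clipped double-Q value plus the entropy term -alpha * log_prob.
sac_target = reward + gamma * (min(q1_next, q2_next) - alpha * log_prob_next)

# TD3 target: the same clipped double-Q trick, but with no entropy term.
td3_target = reward + gamma * min(q1_next, q2_next)
```

The entropy term rewards the agent for keeping its action distribution spread out, which is one plausible reason SAC stumbled onto the gate in some tests while the deterministic-policy methods did not.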