This code is a reimplementation of the guided policy search algorithm and LQG-based trajectory optimization, meant to help others understand, reuse, and build upon existing work. It includes a complete robot controller and sensor interface for the PR2 robot via ROS, and an interface for simulated agents in Box2D and MuJoCo. Source code is available on GitHub.
While the core functionality is fully implemented and tested, the codebase is a work in progress. See the FAQ for information on planned future additions to the code.
Relevant papers which have used guided policy search include:
If the codebase is helpful for your research, please cite any relevant paper(s) above and the following:
For bibtex, see this page.
The following are required:
One or more of the following agent interfaces is required. Setup instructions for each are below.
One of the following neural network libraries is required for the full guided policy search algorithm:
Follow these steps to get set up:
Install necessary dependencies above. To install protobuf and boost:
sudo apt-get install libprotobuf-dev protobuf-compiler libboost-all-dev
sudo pip install protobuf
Clone the repo:
git clone https://github.com/cbfinn/gps.git
Compile protobuffer:
cd gps
./compile_proto.sh
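After compiling, you can optionally sanity-check that the protobuf Python bindings installed by pip work. The snippet below is not part of the setup; it round-trips a built-in descriptor message through serialization rather than using the project's own generated protos:

```python
# Optional check: confirm the protobuf Python bindings can serialize
# and parse a message. FileDescriptorProto ships with the library.
from google.protobuf import descriptor_pb2

msg = descriptor_pb2.FileDescriptorProto(name="example.proto")
data = msg.SerializeToString()
parsed = descriptor_pb2.FileDescriptorProto.FromString(data)
print(parsed.name)  # example.proto
```

If this fails with an import error, the `sudo pip install protobuf` step above did not complete.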
Set up one or more agents below.
Box2D Setup (optional)
Here are the instructions for setting up Pybox2D.
Install Swig and Pygame:
sudo apt-get install build-essential python-dev swig python-pygame git
Check out the pybox2d code from GitHub:
git clone https://github.com/pybox2d/pybox2d
Build and install the library:
python setup.py build
sudo python setup.py install
MuJoCo Setup (optional)
In addition to the dependencies listed above, OpenSceneGraph (v3.0.1+) is also needed. It can be installed by running:
sudo apt-get install openscenegraph libopenscenegraph-dev
Install MuJoCo (v1.22+) and place the downloaded mjpro directory into gps/src/3rdparty.
MuJoCo is a high-quality physics engine and requires a license. Obtain a key, which should be named mjkey.txt, and place it in the mjpro directory.
Build gps/src/3rdparty by running:
cd gps/build
cmake ../src/3rdparty
make -j
Set up paths by adding the following to your ~/.bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/gps/build/lib
export PYTHONPATH=$PYTHONPATH:/path/to/gps/build/lib
Don’t forget to run source ~/.bashrc afterward.
ROS Setup (optional)
Install ROS, including the standard PR2 packages.
Set up paths by adding the following to your ~/.bashrc file:
export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:/path/to/gps:/path/to/gps/src/gps_agent_pkg
Don’t forget to run source ~/.bashrc afterward.
Compilation:
cd src/gps_agent_pkg/
cmake .
make -j
ROS Setup with Caffe (optional)
This is required if you intend to run neural network policies with the ROS agent.
Run steps 1 and 2 of the above section.
Check out and build Caffe, including running make -j && make distribute within the caffe directory.
Compilation:
cd src/gps_agent_pkg/
cmake . -DUSE_CAFFE=1 -DCAFFE_INCLUDE_PATH=/path/to/caffe/distribute/include -DCAFFE_LIBRARY_PATH=/path/to/caffe/build/lib
make -j
To compile with GPU support, also include the option -DUSE_CAFFE_GPU=1.
There are two examples of running trajectory optimization using a simple 2D agent in Box2D. Before proceeding, be sure to set up Box2D as described above.
Each example starts from a random controller and learns through experience to minimize cost.
The first is a point mass learning to move to a goal position.
To try it out, run the following from the gps directory:
python python/gps/gps_main.py box2d_pointmass_example
The progress of the algorithm is displayed on the GUI. The point mass should start reaching the visualized goal by around the 4th iteration.
The second example is a two-link arm learning to move to a goal state.
To try it out, run this:
python python/gps/gps_main.py box2d_arm_example
The arm should start reaching the visualized goal after around 6 iterations.
All settings for these examples are located in experiments/box2d_[name]_example/hyperparams.py, which can be modified to input different target positions and change various hyperparameters of the algorithm.
To run the MuJoCo examples, be sure to first set up MuJoCo as described above.
The first example uses trajectory optimization for peg insertion. To try it, run the following from the gps directory:
python python/gps/gps_main.py mjc_example
Here the robot starts with a random initial controller and learns to insert the peg into the hole. The progress of the algorithm is displayed on the GUI.
Now let’s learn to generalize to different positions of the hole. For this, run the guided policy search algorithm:
python python/gps/gps_main.py mjc_badmm_example
The robot learns a neural network policy for inserting the peg under varying initial conditions.
To tinker with the hyperparameters and input, take a look at experiments/mjc_badmm_example/hyperparams.py
.
Additionally, the neural network library can be changed through the ALGORITHM_NN_LIBRARY variable, which can be set to caffe or tf.
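For instance, selecting TensorFlow would look like the line below. This is a sketch based only on the variable name and values given above; the surrounding contents of hyperparams.py are not shown here:

```python
# Select the neural network backend for guided policy search.
# Valid values per the docs above: 'caffe' or 'tf'.
ALGORITHM_NN_LIBRARY = 'tf'
```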
To run the code on a real or simulated PR2, be sure to first follow the instructions above for ROS setup.
Real-world PR2
On the PR2 computer, run:
roslaunch gps_agent_pkg pr2_real.launch
This will stop the default arm controllers and spawn the GPSPR2Plugin.
Simulated PR2
Note: If you are running ROS hydro or later, open the launch file pr2_gazebo_no_controller.launch and change the include line as specified.
Launch gazebo and the GPSPR2Plugin:
roslaunch gps_agent_pkg pr2_gazebo.launch
Now you’re ready to run the examples via gps_main. This can be done on any machine as long as the ROS environment variables are set appropriately.
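The environment variables in question are the standard ROS networking variables that point a machine at the robot's ROS master. A minimal stdlib sketch for checking them (the variable names are standard ROS, not specific to this codebase):

```python
import os

# Standard ROS networking/package variables to verify before running
# gps_main on a machine other than the robot itself.
ROS_VARS = ("ROS_MASTER_URI", "ROS_IP", "ROS_PACKAGE_PATH")

def ros_env_report(env):
    """Map each standard ROS variable to its value, or '(not set)'."""
    return {var: env.get(var, "(not set)") for var in ROS_VARS}

for var, value in ros_env_report(os.environ).items():
    print(var, "=", value)
```

ROS_MASTER_URI must point at the master started on the PR2 (or the gazebo machine) for the examples below to reach the robot.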
The first example starts from a random initial controller and learns to move the gripper to a specified location.
Run the following from the gps directory:
python python/gps/gps_main.py pr2_example
The PR2 should reach the target position and achieve a cost of around -600 before the end of 10 iterations.
The second example trains a neural network policy to reach a goal pose from different starting positions, using guided policy search:
python python/gps/gps_main.py pr2_badmm_example
To learn how to make your own experiment and/or set your own initial and target positions, see the next section.
Set up a new experiment directory by running:
python python/gps/gps_main.py my_experiment -n
This will create a new directory called my_experiment/ in the experiments directory, with a blank hyperparams.py file.
Fill in a hyperparams.py file in your experiment. See pr2_example and mjc_example for examples.
If you wish to set the initial and/or target positions for the pr2 robot agent, run target setup:
python python/gps/gps_main.py my_experiment -t
See the GUI documentation for details on using the GUI.
Finally, run your experiment:
python python/gps/gps_main.py my_experiment
All of the output logs and data will be routed to your experiment directory. For more details, see intended usage.
In addition to the inline docstrings and comments, see the following pages for more detailed documentation:
The code was written to be modular, to make it easy to hook up your own robot. To do so, either use one of the existing agent interfaces (e.g. AgentROS), or write your own.
You can post questions on gps-help. If you want to contribute, please post on gps-dev. When your contribution is ready, make a pull request on GitHub.
This codebase is released under the BSD 2-clause license.
If you plan to use this code for commercial purposes, we ask that you send us a quick email at gps-dev-private@googlegroups.com to let us know that you’re using it.