Computation Graph Toolkit (CGT) is a library for evaluation and differentiation of functions of multidimensional arrays. Source code is available on GitHub.
The core features are as follows:
CGT is motivated by large-scale machine learning and AI problems; however, the core library will focus on the more abstract problems of evaluating and differentiating mathematical expressions. This will ensure that CGT is flexible enough to handle use cases that are completely unanticipated by the software authors. Libraries for numerical optimization and convenient construction of neural networks will be built on top of CGT’s core functionality.
With regard to previous work, CGT is most similar to Theano. In fact, CGT aims to mostly replicate Theano’s API. However, CGT makes some core architectural changes that necessitated a new codebase:
See Why not Build on Theano? for a more thorough explanation of why we decided to reimplement Theano’s functionality from scratch.
CGT aims to make it easy to construct large and complicated models, while ensuring that the resulting code is concise and closely resembles the underlying mathematical expressions.
Dependencies:
Python backend requires:
Native (C++) backend requires:
GPU on native backend requires:
No compilation is required for this option, and NumPy is the only dependency. Just update your PYTHONPATH as follows:
export PYTHONPATH=/path/to/cgt
If you want to use the native (C++) backend, which has better performance and enables multithreading, then Cython is required, and the installation procedure is as follows. First, cd into the source directory. Then, type:
mkdir build
cd build
cmake ..
make -j
Then add two directories to your PYTHONPATH:
export PYTHONPATH=/path/to/cgt:/path/to/cgt/build/lib:$PYTHONPATH
Note: if you don’t want to put the build directory inside your source directory, you can put it in some other location; just make sure to alter your PYTHONPATH accordingly.
Follow the instructions for Option 2, but alter the cmake command to instead read:
cmake .. -DCGT_ENABLE_CUDA=ON
TODO
You can run our suite of unit tests to verify your installation. In the source directory:
nosetests -v
Note that you’ll have to install the nose Python module.
The basic workflow for using CGT is as follows.
1.) Define symbolic variables
import cgt
a = cgt.scalar(name='a') # float-valued scalar, with optional name provided
b = cgt.scalar(name='b')
n = cgt.scalar(name='n', dtype='int64') # integer scalar
2.) Construct an expression out of these variables. Operators like +, <, ** are overloaded.
c = (a**n + b**n)**(1.0/n)
3.) Construct a function, which maps numerical inputs to numerical outputs.
f = cgt.function([a,b,n], c)
print f(8,15,2)
In step 2, when you constructed the expression c, CGT was building up a data structure that stores the set of operations needed to compute c in terms of the input arguments. To be precise, CGT builds up a directed acyclic graph describing the dependencies.
cgt.as_dot(c)
Note that the labels 0 and 1 indicate whether the edge corresponds to the zeroth or first argument to the function.
With this simple example out of the way, let's move on to vectors, matrices and gradients. We'll first consider linear regression, where we have a linear model, $y_{pred} = Xw + b$, and a quadratic loss: $L(X,y,w,b) = \|y - y_{pred}\|^2$.
X_nk = cgt.matrix("X")
y_n = cgt.vector("y")
w_k = cgt.vector("w")
b = cgt.scalar("b")
ypred_n = X_nk.dot(w_k) + b
L = cgt.sum(cgt.square(ypred_n - y_n))
print "L = ",
cgt.print_expr(L);
In the first four lines, we created the symbolic variables X_nk, y_n, w_k, b. Then, we constructed more complex expressions by applying functions to these symbolic variables, finally giving us the loss function L.
You can find the available functions by inspecting the cgt namespace (type cgt.<tab>), and you'll find lots of familiar NumPy functions. Note that these symbolic variables have some of the same attributes as NumPy arrays.
print X_nk.ndim, str(X_nk.shape), X_nk.dtype
Next we can compute the gradient of the loss function with respect to the parameters $w, b$. (Note, however, that there's nothing special about parameters: we could just as well compute the gradient with respect to $X$ and $y$.)
grads = dLdw, dLdb = cgt.grad(L, [w_k, b])
Here, dLdw and dLdb are just expressions. When we called cgt.grad, CGT walked through the graph that represents the expression L and constructed expressions for the gradients. This procedure is a variant of reverse-mode automatic differentiation, also known as the backpropagation algorithm.
print "Loss and gradient objects", dLdw, dLdb
print "Pretty-printed gradient: ",
cgt.print_expr(cgt.simplify([dLdw])[0]);
cgt.as_dot(cgt.simplify([dLdw]))
Finally, we can compile functions to compute the loss and gradients.
f_loss = cgt.function([X_nk, y_n, w_k, b], L)
f_grads = cgt.function([X_nk, y_n, w_k, b], grads)
Let's generate some random numerical data and parameter values so we can test out these functions.
import numpy as np, numpy.random as nr
nr.seed(0)
# Generate some random data
Xval = nr.randn(100,3)
yval = nr.randn(100)
# Initialize parameters
wval = nr.randn(3)
bval = nr.randn()
np.set_printoptions(precision=3)
print "loss:", f_loss(Xval, yval, wval, bval)
print "grad:", f_grads(Xval, yval, wval, bval)
Now that we can compute the gradient, we're ready to do some optimization, where we minimize the loss with respect to $w, b$. There are many different styles for doing numerical optimization with CGT.
We'll start with a style that is convenient but not the most efficient for large-scale machine learning. Namely, we'll flatten and concatenate the parameters into one vector $\theta$, and we'll return a flat gradient vector. Then we'll hand off our functions to SciPy's BFGS optimizer.
def f(theta):
    return f_loss(Xval, yval, theta[0:3], theta[3])

def gradf(theta):
    gw, gb = f_grads(Xval, yval, theta[0:3], theta[3])
    return np.concatenate([gw, gb.reshape(1)])
import scipy.optimize as opt
theta_opt = opt.fmin_bfgs(f, np.zeros(4), gradf, disp=False)
print "Optimal theta:", theta_opt
Now let's just make sure we got the right answer:
print "Least squares solution:",np.linalg.lstsq(np.concatenate([Xval, np.ones((len(Xval),1))],axis=1), yval)[0];
You can also do the flattening and unflattening using CGT, which will usually be faster:
theta = cgt.vector('theta')
w_k_1 = theta[0:3]
b_1 = theta[3]
ypred_n_1 = X_nk.dot(w_k_1) + b_1
L_1 = cgt.sum(cgt.square(ypred_n_1 - y_n))
dLdtheta, = cgt.grad(L_1, [theta])
f = cgt.function([theta], L_1, givens = [(X_nk,Xval), (y_n,yval)])
gradf = cgt.function([theta], dLdtheta, givens = [(X_nk,Xval), (y_n,yval)])
theta_opt = opt.fmin_bfgs(f, np.zeros(4), gradf, disp=False)
print "Optimal theta:", theta_opt
Note that we provided one additional argument to cgt.function above: givens. The value provided should be a list of pairs (input variable, replacement), where the replacement can be a numerical value or some other symbolic variable.
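For instance, here is a minimal sketch of a symbolic replacement (the name Xwhite is hypothetical): at compile time, X_nk is swapped for a new input variable Xwhite, while y_n is still replaced by the fixed array yval.
Xwhite = cgt.matrix("Xwhite")
# X_nk is replaced by the symbolic variable Xwhite; y_n by the numerical array yval
f_loss_w = cgt.function([Xwhite, w_k, b], L, givens=[(X_nk, Xwhite), (y_n, yval)])
print f_loss_w(Xval, wval, bval)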
Now we'll introduce another way of doing optimization, but here we will be able to do in-place updates of our parameters, which will be useful for large-scale problems. This style is especially useful when using the GPU, to avoid transferring data to and from the device.
stepsize=0.001
w_k_2 = cgt.shared(wval.copy(), name='w')
b_2 = cgt.shared(bval, name='b')
ypred_n_2 = X_nk.dot(w_k_2) + b_2
L_2 = cgt.sum(cgt.square(ypred_n_2 - y_n))
dLdw_2,dLdb_2 = cgt.grad(L_2, [w_k_2, b_2])
updates = [(w_k_2, w_k_2 - stepsize*dLdw_2), (b_2, b_2 - stepsize*dLdb_2)]
givens = [(X_nk,Xval), (y_n,yval)]
f_update = cgt.function([], L_2, givens = givens, updates = updates)
for i in xrange(100):
    f_update()
print w_k_2.op.get_value(), b_2.op.get_value()
CGT uses some global configuration options that you should be aware of. You can set these options in the file ~/.cgtrc; see cgtrc.example in the source directory for a template. You can also set these values on the command line, e.g. CGT_FLAGS=backend=native,precision=double. The file cgtrc_spec.ini, included below, provides a listing of the configuration variables.
# At the cost of some overhead,
# store information in the computation graph that helps with debugging
debug = boolean(default=False)
# single or double precision:
precision = string(default=single)
# backend=python means using a pure python module to execute the graph, and using python implementations of ops whenever they exist
# backend=native means using the compiled execution graph interpreter, and using the native (c++) implementation of ops
backend = option("python","native",default="python")
# Where to put generated files
cache_dir = string(default="~/.cgt_cache")
# Enable in-place optimizations.
enable_inplace_opt = boolean(default=True)
# Enable simplifications of the graph, e.g. arithmetic simplifications like x*1=x
enable_simplification = boolean(default=True)
# Use parallel execution graph interpreter
parallel = boolean(default=False)
# Number of threads used by the parallel interpreter
num_threads = integer(default=4)
# Developer Options
# -----------------
# Force native backend to use python
force_python_impl = boolean(default=False)
# Compile C++ files with debug flags
debug_cpp = boolean(default=False) # use debug flags when compiling c++
# Print lots of diagnostic information
# (we'll break this down at some point)
verbose = boolean(default=False)
For best performance, set backend=native, but for development you should set backend=python because the error handling is better.
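If you prefer to toggle these flags from inside a Python script, a minimal sketch looks like this (it assumes CGT reads CGT_FLAGS from the environment when it is first imported, so the variable must be set beforehand):
import os
# Assumption: CGT_FLAGS must be set before cgt is imported in this process
os.environ["CGT_FLAGS"] = "backend=python,debug=True"
import cgt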
CGT mostly replicates Theano’s API, but there are a few gotchas.
CGT does not allow automatic broadcasting of singleton dimensions (for elementwise binary operations like addition and multiplication). Use the broadcast function:
def broadcast(opname, a, b, bcpat):
    """
    Perform elementwise binary operation such as addition or multiplication, and expand
    singleton dimensions when appropriate.

    opname: string name of operation: *,+,-,/,<,>,<=,>=,**,==,!=
    a, b: variables
    bcpat: a string of x,1 specifying which dimensions are singletons in both a and b.
    Here are some examples:
        "x1,1x":   a.shape[1] == 1 and b.shape[0] == 1
        "xx1,xxx": a.shape[2] == 1, but we should have a.shape[0]==b.shape[0] and a.shape[1]==b.shape[1]

    E.g., here's an example of using this function:
        a = np.zeros((2,3))
        b = np.zeros((2,1))
        z = cgt.broadcast("+", a, b, "xx,x1")
    """
    ...
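As a runnable sketch of the pattern above (the variable names are just for illustration), here is how you might add a column vector to each column of a matrix:
import cgt
import numpy as np

a = cgt.matrix("a")  # shape (2, 3)
b = cgt.matrix("b")  # shape (2, 1): the second dimension is a singleton
z = cgt.broadcast("+", a, b, "xx,x1")  # expand b along its singleton axis
f = cgt.function([a, b], z)
print f(np.arange(6, dtype='float64').reshape(2, 3), np.array([[10.0], [100.0]]))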
The nn module (import cgt.nn) provides a light wrapper around CGT’s core API that allows the user to concisely build up complicated neural network models. Below we will show how to build up a convolutional neural network, and then how to parallelize it using the multi-threaded interpreter. A complete code listing can be found in examples/cgt_theano_feedforward_comparison.py.
The following code constructs a simple convolutional neural network (sized for MNIST images, what else?), where the loss function is the negative log-likelihood of the labels. (A full listing is provided in the source directory; see examples/cgt_theano_feedforward_comparison.py.)
import cgt
from cgt import nn

# X: a symbolic variable representing a batch of input images,
# with shape (batchsize, nchannels, nrows, ncols)
X = cgt.tensor4("X", fixed_shape=(None,1,28,28))
# We provide the fixed_shape argument so
# CGT can infer the shape of downstream variables
# y: a symbolic variable representing the labels, which are integers
y = cgt.vector("y", dtype='i8')
# SpatialConvolution(...) is a constructor call, which builds the weights
# (filter) and the biases for the convolutional layer
# rectify(...) is just a function call that maps x -> x*(x>0)
conv1 = nn.rectify(
    nn.SpatialConvolution(1, 32, kernelshape=(3,3), pad=(0,0),
        weight_init=nn.IIDGaussian(std=.1))(X))
# Apply max pooling function
pool1 = nn.max_pool_2d(conv1, kernelshape=(3,3), stride=(2,2))
# Another convolutional layer
conv2 = nn.rectify(
    nn.SpatialConvolution(32, 32, kernelshape=(3,3), pad=(0,0),
        weight_init=nn.IIDGaussian(std=.1))(pool1))
pool2 = nn.max_pool_2d(conv2, kernelshape=(3,3), stride=(2,2))
# Now we flatten the last output image
d0,d1,d2,d3 = pool2.shape
flatlayer = pool2.reshape([d0,d1*d2*d3])
# CGT can infer the shape of variables
nfeats = cgt.infer_shape(flatlayer)[1]
# One final fully-connected layer, and then a log-softmax
logprobs = nn.logsoftmax(nn.Affine(nfeats, 10)(flatlayer))
neglogliks = -logprobs[cgt.arange(X.shape[0]), y]
loss = neglogliks.mean()
Now that we’ve built up an expression for the loss, we can build an expression for the gradient:
# walk through the graph and find all parameters
# i.e., variables constructed with cgt.shared(...)
params = nn.get_parameters(loss)
gparams = cgt.grad(loss, params)
Finally, we can build up a function that updates the parameters:
updates = [(p, p-stepsize*gp) for (p, gp) in zip(params, gparams)]
updater = cgt.function([X, y, stepsize], loss, updates=updates)
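As a usage sketch: assuming stepsize was declared earlier as a symbolic scalar (e.g. stepsize = cgt.scalar('stepsize')), and that Xtrain and ytrain are hypothetical NumPy arrays of training images and labels, the updater would be called once per minibatch:
# Hypothetical training loop; Xtrain and ytrain are assumed to be loaded elsewhere
batchsize = 128
for epoch in xrange(10):
    for start in xrange(0, Xtrain.shape[0], batchsize):
        sli = slice(start, start + batchsize)
        loss_val = updater(Xtrain[sli], ytrain[sli], 1e-3)
    print "epoch", epoch, "last minibatch loss", loss_val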
CGT is capable of executing computations in parallel. A feedforward network does not offer much opportunity for parallelization, but we can easily transform it to use data parallelism.
First let’s build a Module, which is a parameterized function.
m = nn.Module([X,y], [loss])
The two arguments to the Module constructor are a list of inputs and a list of outputs, respectively.
Now, we can split the data along the zeroth axis and apply the module m separately to each piece.
split_loss = 0
for start in xrange(0, batch_size, batch_size//4):
    sli = slice(start, start+batch_size//4)
    split_loss += m([X[sli], y[sli]])[0]
split_loss /= 4
params = nn.get_parameters(split_loss)
gparams = cgt.grad(split_loss, params)
updates2 = [(p, p-stepsize*gp) for (p, gp) in zip(params, gparams)]
updater = cgt.function([X,y, stepsize], split_loss, updates=updates2)
Let’s suppose you have compiled a function, but it is failing at runtime or returning invalid results, and you’re trying to figure out what might be wrong. Here are some steps you can take:
Use the Python backend. That is, modify .cgtrc or set CGT_FLAGS=backend=python at the command line. The Python implementations of ops use NumPy, which may catch certain errors (e.g. shape mismatches) that our C++ implementations miss.
Set the configuration variable debug=True (with .cgtrc or CGT_FLAGS.) That will enable several pieces of functionality that store information in the graph when you are building it, so useful information can be printed when an exception is reached.
If there is a shape error, you can sometimes find it by using the infer_shape function. When constructing your variables, specify the known components of their shapes; then you can add assertions with infer_shape. (We plan to add more functionality soon that will catch shape errors at graph construction time.) Here’s an example:
x = cgt.matrix(fixed_shape=(None,5))
y = cgt.matrix(fixed_shape=(5,4))
z = x.dot(y)
assert cgt.infer_shape(z)[1] == 4
You can also take this a step further and compute numerical values associated with the nodes in the graph. Just replace x and y above with constants, and then use cgt.simplify to compute the value of z.
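A minimal sketch of that idea, assuming cgt.constant wraps a NumPy array as a graph constant:
import numpy as np
import cgt

x = cgt.constant(np.ones((3, 5)))
y = cgt.constant(np.ones((5, 4)))
z = x.dot(y)
# simplify should fold the constant expression; print the result to inspect it
cgt.print_expr(cgt.simplify([z])[0])
# alternatively, a zero-argument function evaluates z directly
print cgt.function([], z)()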
See the examples directory:
More examples are coming soon!
You can post your queries on the cgt-users discussion group. If you want to participate in the development of CGT, post on the cgt-devel discussion group.
CGT is heavily based on Theano, and we (the authors of CGT) think that Theano is a beautiful and highly innovative piece of software. However, several limitations of Theano (in its current state) motivated us to consider creating a new library:
Some of these issues could be addressed within Theano’s existing codebase; however, we believe that by breaking compatibility and starting afresh, it will be possible to resolve them more cleanly.