20 ann

Introduction

Neural network functionality is split across several packages: require("aprilann.ann"), require("aprilann.ann.loss"), require("aprilann.ann.optimizer"), and require("aprilann.trainable").

This page describes the utilities to build and train ANNs. It has four main sections: a description of ANN concepts in APRIL-ANN, the easy building procedure for MLPs, the training helpers, and finally the full description of the aprilann.ann package.

ANN components

Inspired by other toolkits (such as Torch 7 or PyBrain), APRIL-ANN describes ANNs as a composition of blocks called ANN components, so a single component is itself a neural network. A list of all available components is shown by executing:

april_help(ann.components)

The composition procedure is explained later. An ANN component is identified by a name string (automatically generated if not given), which must be unique. Some components contain weights, which are estimated by a gradient descent algorithm (backpropagation). Connection weights objects are identified by a weights name parameter, which can be reused: if two components have the same weights name, they share the same connections object.

All components have an input and an output size, which define the number of weights (if any) and the fan-in/fan-out of the component. Components must be built (build method) after they are constructed. The build procedure allocates memory for connections and checks the input/output sizes of the components.

Don't be afraid: the next section presents an abstraction for training MLPs which does much of this work automatically. A more detailed description is available through april_help:

april_help(ann.components.base)
april_help(ann.components.base.build)

The easy way: all-all MLP

The simplest kind of ANN is the Multilayer Perceptron (MLP), where each layer is fully connected to the next one (feed-forward, all-all connections).

Building the MLP: ann.mlp.all_all.generate

The generate method returns a special component object, which cannot be modified. Actually, it is a Lua table formed by an ann.components.stack instance plus other information useful to load and save MLPs, and it implements Lua wrapper functions for the ANN component methods.

-- creates an ANN component for a MLP with the given description
thenet = ann.mlp.all_all.generate("256 inputs 128 tanh 10 log_softmax")

-- creates an instance of a trainer object for previous ANN component,
-- using the multi-class cross-entropy loss function (for 10 output units),
-- and using a bunch_size of 32. Loss function and bunch_size are optional.
trainer = trainable.supervised_trainer(thenet,
                       ann.loss.multi_class_cross_entropy(10),
                       32,
                       -- this last parameter is optional, by default it is
                       -- SGD => Stochastic Gradient Descent
                       ann.optimizer.sgd())

-- builds the component contained into trainer object
trainer:build()

-- initializes the weights randomly, using fan-in and fan-out
trainer:randomize_weights{
  random      = random(1234),
  inf         = -0.1,
  sup         =  0.1,
  use_fanin   = true,
  use_fanout  = true,
}

As said before, each component has a unique name and, if needed, a weights name. The following code iterates over all components:

> for name,c in trainer:iterate_components() do print(name,c) end
actf1   instance 0x7fc3e94850a0 of ann.components.base
actf2   instance 0x7fc3e9485550 of ann.components.base
b1  instance 0x7fc3e9484f80 of ann.components.base
b2  instance 0x7fc3e9485410 of ann.components.base
c1  instance 0x7fc3e9484a10 of ann.components.base
layer1  instance 0x7fc3e9484e80 of ann.components.base
layer2  instance 0x7fc3e9485310 of ann.components.base
w1  instance 0x7fc3e9484ee0 of ann.components.base
w2  instance 0x7fc3e9485370 of ann.components.base

The MLP is composed of 9 components: two activation functions (actf1 and actf2), two bias components (b1 and b2), one stack component which works as a container (c1), two hyperplane components, each containing one bias and one dot_product (layer1 and layer2), and finally two dot_product components (w1 and w2), which contain the weight matrices.

It is also possible to iterate over all weights names:

> for name,connections in trainer:iterate_weights() do print(name,type(connections)) end
b1  matrix
b2  matrix
w1  matrix
w2  matrix

So, our MLP contains two bias vectors (b1 and b2, corresponding to the b1 and b2 components) and two weight matrices (w1 and w2, corresponding to the w1 and w2 components). All automatically generated MLPs assign these names to their components and weights.

Once the component is built using a trainer instance, the trainer exposes two useful methods: trainer:component(COMPONENT_NAME_STRING), which returns a component given its name, and trainer:weights(WEIGHTS_NAME_STRING), which returns the connection weights object given its weights_name attribute.

More information about trainable.supervised_trainer is available by executing:

april_help(trainable.supervised_trainer)

Loss functions: ann.loss

The loss function is used to train ANNs via the gradient descent algorithm. Trainer objects need an instance of a loss function to perform training, which makes them a very useful abstraction of standard training procedures.

Detailed information about loss functions is in:

april_help(ann.loss)

The loss function can be given to the trainer constructor, or set later using the set_loss_function method:

trainer:set_loss_function(ann.loss.mse())

The following loss functions are implemented: mean squared error (MSE), mean absolute error (MAE), two-class cross-entropy, and multi-class cross-entropy. Note that cross-entropy-like functions are specialized for log_logistic or log_softmax output activation functions. Almost all constructors accept SIZE=0, which means that the layer has a dynamic size:

ann.loss.mse(SIZE) returns an instance of the Mean Squared Error function for SIZE neurons. It is a quadratic loss function.
ann.loss.mae(SIZE) returns an instance of the Mean Absolute Error function for SIZE neurons. It is not a quadratic loss function.
ann.loss.cross_entropy(SIZE) returns an instance of the two-class cross-entropy. It only works with the log_logistic output activation function. It is based on the Kullback-Leibler divergence.
ann.loss.multi_class_cross_entropy(SIZE) returns an instance of the multi-class cross-entropy. It requires SIZE>2; for two-class problems, a single output unit with cross-entropy is enough. It only works with log_logistic or log_softmax output activation functions (log_softmax is the better choice). It is based on the Kullback-Leibler divergence.
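To make the definitions above concrete, here is a minimal plain-Lua sketch of the textbook MSE and multi-class cross-entropy for a single pattern, together with a numerically stable log-softmax. It does not use ann.loss; the helper names (log_softmax, mse, multi_class_cross_entropy) are hypothetical, and the library's exact normalization (for instance a 1/2 factor or averaging over the bunch) may differ. It illustrates why cross-entropy pairs with log_softmax: the loss consumes log-probabilities directly, so it never takes the logarithm of a probability that has underflowed to zero.

```lua
-- Hypothetical helpers, plain Lua, not the ann.loss implementation.

-- Numerically stable log-softmax: z_i - log(sum_j exp(z_j))
local function log_softmax(z)
  local m = -math.huge
  for i = 1, #z do m = math.max(m, z[i]) end
  local s = 0
  for i = 1, #z do s = s + math.exp(z[i] - m) end
  local log_norm = m + math.log(s)
  local out = {}
  for i = 1, #z do out[i] = z[i] - log_norm end
  return out
end

-- Mean squared error between outputs y and targets t (mean over units)
local function mse(y, t)
  local sum = 0
  for i = 1, #y do sum = sum + (y[i] - t[i])^2 end
  return sum / #y
end

-- Multi-class cross-entropy on log-probabilities: -sum_i t_i * log_p_i
local function multi_class_cross_entropy(log_p, t)
  local sum = 0
  for i = 1, #log_p do sum = sum - t[i] * log_p[i] end
  return sum
end

local log_p = log_softmax({2.0, 1.0, 0.1})  -- e.g. the output of a log_softmax layer
local target = {1, 0, 0}                     -- one-hot target for class 1
print(mse({0.5, 1.0}, {1.0, 1.0}))           -- 0.125
print(multi_class_cross_entropy(log_p, target))
```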

ann.optimizer

An optimizer is an object which implements a learning algorithm; every class in ann.optimizer is an optimizer. Several learning hyperparameters are available, depending on the selected optimizer. These hyperparameters are known as options, and they can be set globally (for all the connection weight layers of the ANN) or layerwise (for a concrete connection weights object, identified by its name). Optimizers implement the following API:

other = optimizer:clone(): returns a deep copy of the caller object.
value = optimizer:get_option(name): returns the global value of the given learning option.
optimizer:set_option(name, value): sets the global value of the given learning option.
optimizer:set_layerwise_option(layer_name, option_name, value): sets a layerwise option.
value = optimizer:get_layerwise_option(layer_name, option_name): returns the value of the layerwise option for the given layer.
value = optimizer:get_option_of(layer_name, option_name): returns the option value applicable to the given layer_name. If a layerwise option was previously defined, the method returns its value; otherwise, the value of the global option is returned.
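The lookup rule of get_option_of can be sketched in plain Lua with a hypothetical OptionTable class. It is not part of ann.optimizer; only the method names mirror the API above. A layerwise value, when present, takes precedence over the global one.

```lua
-- Hypothetical OptionTable mimicking the global/layerwise lookup rule;
-- it is NOT the ann.optimizer implementation.
local OptionTable = {}
OptionTable.__index = OptionTable

function OptionTable.new()
  return setmetatable({ global = {}, layerwise = {} }, OptionTable)
end

function OptionTable:set_option(name, value)
  self.global[name] = value
end

function OptionTable:get_option(name)
  return self.global[name]
end

function OptionTable:set_layerwise_option(layer_name, option_name, value)
  self.layerwise[layer_name] = self.layerwise[layer_name] or {}
  self.layerwise[layer_name][option_name] = value
end

function OptionTable:get_layerwise_option(layer_name, option_name)
  local t = self.layerwise[layer_name]
  return t and t[option_name]
end

-- A layerwise value wins; otherwise fall back to the global value.
function OptionTable:get_option_of(layer_name, option_name)
  local v = self:get_layerwise_option(layer_name, option_name)
  if v ~= nil then return v end
  return self:get_option(option_name)
end

local opts = OptionTable.new()
opts:set_option("weight_decay", 1e-4)
opts:set_layerwise_option("b1", "weight_decay", 0.0)
print(opts:get_option_of("w1", "weight_decay"))  -- the global value
print(opts:get_option_of("b1", "weight_decay"))  -- the layerwise value
```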

ann.optimizer.sgd

Different optimizer objects are implemented. They train the neural network following different algorithms, all of which rely on the gradients computed by ANN components. They incorporate regularization and momentum hyperparameters, and their options are algorithm dependent. For Stochastic Gradient Descent, the options are:

learning_rate: controls the portion of the gradient used to update the weights. This value is smoothed depending on the bunch_size and on the number K of times that a weight connections object is shared between different components. The smoothed value is learning_rate/sqrt(bunch_size+K).
momentum: an inertial hyperparameter which applies a portion of the weight update of the previous iteration.
weight_decay: an L2 regularization term.
L1_norm: an L1 regularization term.
max_norm_penalty: a constraint penalty based on the two-norm of the weights.

The algorithm uses the following learning rule:

w = (1 - weight_decay)*w' + momentum*(w' - w'') - lr'*grad(L)/grad(w')

where w, w' and w'' are the weight values at the next, current, and previous iterations; lr' is the learning_rate smoothed by the sqrt; and grad(L)/grad(w') is the loss function gradient at the given weight, which is subtracted because the algorithm descends the loss.
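A single update following this rule can be sketched in plain Lua over arrays of weights. This is an illustration, not the ann.optimizer.sgd implementation: sgd_step and its option table are hypothetical, L1_norm and max_norm_penalty are ignored because the rule above does not include them, and the gradient is subtracted, as gradient descent requires.

```lua
-- Hypothetical sketch of one SGD step, element-wise over plain Lua arrays.
-- w_cur = w', w_prev = w'', grad = dL/dw'; K = times the weights are shared.
local function sgd_step(w_cur, w_prev, grad, opt)
  -- smoothed learning rate lr' = learning_rate / sqrt(bunch_size + K)
  local lr = opt.learning_rate / math.sqrt(opt.bunch_size + opt.K)
  local w_next = {}
  for i = 1, #w_cur do
    w_next[i] = (1 - opt.weight_decay) * w_cur[i]       -- L2 decay
              + opt.momentum * (w_cur[i] - w_prev[i])   -- previous update
              - lr * grad[i]                            -- descent step
  end
  return w_next
end

local opt = { learning_rate = 0.1, momentum = 0.9, weight_decay = 0.0,
              bunch_size = 3, K = 1 }
local w_next = sgd_step({1.0, 2.0}, {1.0, 2.0}, {0.5, -0.5}, opt)
print(w_next[1], w_next[2])  -- approximately 0.975 and 2.025
```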

Setting and getting hyperparameters through the trainer

The hyperparameters of optimizer objects can be modified through the trainer object:

trainer:set_option(name,value): sets a global learning option value.
value=trainer:get_option(name): gets a global learning option value.
trainer:set_layerwise_option(layer_name_match,option_name,value): sets a layerwise learning option value of all the connection weight objects whose name matches the given layer_name_match Lua pattern string.
value=trainer:get_option_of(layer_name,option_name): gets the option value applicable to the given layer.

trainer:build()
trainer:set_option("learning_rate", 0.01)  -- example values, tune them for your task
trainer:set_option("momentum", 0.9)
-- regularization should not be applied to bias connections, so the "w.*"
-- pattern restricts it to the weight matrices (w1, w2, ...)
trainer:set_layerwise_option("w.*", "weight_decay", 1e-05)
trainer:set_layerwise_option("w.*", "max_norm_penalty", 4)

-- for dropout, see http://www.cs.toronto.edu/~nitish/msc_thesis.pdf

-- dropout is a very special option: it modifies training, but it also modifies
-- the validation (or test) phase. It must be applied carefully, so that dropout
-- is never applied at the output of your model. Dropout is applied as another
-- component which acts as a stochastic filter.
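Since layer_name_match is a Lua pattern, it is easy to check which weights names a pattern selects. The sketch below assumes the trainer tests each name with string.match (not confirmed by this page) and uses the weights names of the generated MLP plus a hypothetical hidden_w3. Lua patterns are unanchored unless they start with ^, so "w.*" selects any name containing a w, while "^w" only selects names that start with one.

```lua
-- Weights names of the generated MLP; hidden_w3 is a hypothetical extra name.
local names = { "b1", "b2", "w1", "w2", "hidden_w3" }

-- Returns the names selected by a Lua pattern (assumed matching rule).
local function matching(pattern)
  local selected = {}
  for _, name in ipairs(names) do
    if string.match(name, pattern) then selected[#selected + 1] = name end
  end
  return selected
end

print(table.concat(matching("w.*"), " "))  -- w1 w2 hidden_w3
print(table.concat(matching("^w"), " "))   -- w1 w2
```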

Supervised trainer description

See the documentation of the trainable package.

Stopping criteria

See the documentation of the trainable package.

ann package reference

ANNs are implemented as a composition of components which define the three main operations of an ANN: the forward step (output computation), the backprop step (neuron gradient computation), and the gradient computation step (weight gradients). All components are child classes of ann.components.base. See april_help(ann.components.base) for on-line documentation.

Two main remarks before continuing with the following sections. Components have two special properties:

name: a string which uniquely identifies the component; two components are not allowed to share the same name.
weights_name: a string which identifies the connections (weights or biases) of the component. This name can be shared by different components, which means that they share the same connections object.

Tokens and matrices

The components are integrated in Lua via the abstract class token, which has two specializations for ANNs:

tokens.matrix is a token which contains a matrix instance.
tokens.sparse_matrix is a token which contains a matrix.sparse instance.

In any case, ANN components wrap the given matrix objects into a token, and unwrap matrix objects when returning a token. So, in practice, you can ignore the token/matrix association.

Note that ANN components work with dense matrices or with CSR sparse matrices.

Components basis

All components define the following basic properties, which are tokens: input, output, error_input, and error_output. The basic methods to train the components are:

component,table,table = build(): reserves memory for the weights and prepares the component for use.
reset(iteration): releases all the tokens internally allocated (or given by Lua), and receives the current iteration number. This iteration is not related to the training loop or epoch; it is related to optimizer objects which implement line search or similar (Conjugate Gradient or RProp).
token=forward(token[, boolean]): receives an input token and returns the output token. For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap it. In any case, the returned value is a token.
token=backprop(token): receives an error input token (gradient), and returns the output error token (gradient). For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap it. In any case, the returned value is a token.
gradients=compute_gradients( [gradients] ): computes the weight gradients using the data stored at the components (input/output tokens, input/output error tokens), given and produced during the forward and backprop methods. Additionally, it receives a table of matrices with previously computed gradients, which will be used to store the data, avoiding the allocation of new memory. The method returns a table of matrices with the gradients computed for each connection weights object.

Combining these methods with loss functions, a component can be trained as in the following basic example, where a linear component is trained to follow the OR function for input=[0,1] and target output=[1]. By default the weights are not initialized, so they contain garbage values.

> o = ann.optimizer.sgd() -- the optimizer
> l = ann.loss.mse(1) -- MSE loss function
> -- a hyperplane component (explained later)
> c = ann.components.hyperplane{ input=2, output=1 }
> c:build() -- allocates memory for weights, and checks components integrity
> l:reset() -- resets the internal state of the loss
> c:reset() -- resets the internal state of the component
> o:execute(function()
              -- the true indicates training
              output_token=c:forward(matrix(1,2,{0,1}), true)
              -- gradient with desired output 1
              output_error=c:backprop(l:gradient(output_token,
                                                 matrix(1,1,{1})))
              grad = c:compute_gradients(grad)
              -- returns the loss and the weight gradients
              return l:compute_loss(output_token, matrix(1,1,{1})), grad
            end, c:copy_weights())
> output_token=c:forward(matrix(1,2,{0,1}))
> print(output_token) -- the output is closer to 1
0.2
# Matrix of size [1,1] [0xb01ce0 data= 0xad97d0]
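The same forward/backprop/compute_gradients cycle can be written out by hand for a 2-input linear unit, which may help to see what each method computes. This plain-Lua sketch assumes MSE with a 1/2 factor and plain gradient descent with a learning rate of 0.1; unlike the example above it starts from zero weights, and all names are illustrative rather than APRIL-ANN API.

```lua
-- Hypothetical hand-written training of y = w1*x1 + w2*x2 + b on the
-- pattern input=[0,1], target=1, using MSE and plain gradient descent.
local w, b = {0.0, 0.0}, 0.0     -- zero initialization instead of garbage
local x, target, lr = {0, 1}, 1, 0.1

local function forward(input)
  return w[1] * input[1] + w[2] * input[2] + b
end

for _ = 1, 50 do
  local y = forward(x)               -- forward step
  local err = y - target             -- dL/dy for L = 0.5*(y - t)^2
  -- backprop would return err * w (the gradient w.r.t. the inputs);
  -- it is not needed to update a single layer
  local gw1, gw2, gb = err * x[1], err * x[2], err   -- weight gradients
  w[1], w[2], b = w[1] - lr * gw1, w[2] - lr * gw2, b - lr * gb
end

print(forward(x))  -- close to 1
```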

Methods common to all the components

Note that all matrices should have at least two dimensions. All computations are done in bunch mode (using mini-batches), and the first dimension size is the number of patterns contained in the bunch. The remaining dimensions must satisfy the input constraints of the component. Many components work with linear inputs, so the input matrix will be two-dimensional, but some components work with multidimensional matrices. It is possible to use one-dimensional matrices, which are reinterpreted as two-dimensional matrices with only one row, but it is better to always work with two-dimensional matrices.

Building procedure

Before use, components can be composed together to build larger components. This procedure requires calling the build method at the end, to check the input/output sizes and to reserve memory for weights and biases.

The c:build() call recursively executes the build method of all the components in the composition. This method returns the caller component and two tables:

> caller_component, weights_dict, components_table = c:build()

The caller_component is the component c in this case.

The weights_dict is a table of matrices, which maps weights names to weight matrices.

The components_table is a Lua table indexed by component name, containing a reference to each component instance, which is useful to initialize hyperparameters and other settings in a component-wise manner.

Input/output sizes

number = c:get_input_size(): returns the size of the input for the caller component. In case of unknown input size, a zero will be returned.
number = c:get_output_size(): returns the size of the output for the caller component. In case of unknown output size, a zero will be returned.
table = c:precompute_output_size( [table] ): computes the output shape for a given input shape. It is useful combined with convolutional ANNs, in order to ask for the output shape of the convolution. The given table must match the expected input shape of the component (normally one dimension, but with CNNs it can be multi-dimensional). The returned table contains as many dimensions as the output produced by the caller component.

Back-propagation computation methods

token = c:forward( token [, boolean] ) receives a token and an optional boolean (false by default). The boolean indicates whether this forward step is done during training, because some components have a special behavior during training. It returns a token with the output computed by the caller component. For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap it. In any case, the returned value is a token.
token = c:backprop( token ) receives a token with the input error (gradient of each output neuron), and returns another token with the output error (gradient of each input neuron). For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap it. In any case, the returned value is a token.
gradients = c:compute_gradients( gradients ) returns the weight gradients computed using the tokens given at forward and backprop methods.
c:reset() releases the retained tokens in forward and backprop steps.

Getters of produced and retained tokens

During the forward and backprop steps, components compute outputs and error outputs (gradients), and retain the input and error input (gradient) tokens. Before calling the reset method, you can ask the component for its retained tokens:

token = c:get_input() returns the token given as input at forward method.
token = c:get_output() returns the token computed as output by forward method.
token = c:get_error_input() returns the token given as error input at backprop method.
token = c:get_error_output() returns the token computed as error output by backprop method.

Weights matrices and bias vectors

Components which require weights internally hold a matrix instance. This object is allocated by calling the build method of the components (or the build method of a trainer), and is identified by the weights_name property, so components with the same weights_name share the same connections object.

These matrices have OUTPUTxINPUT size (output rows, input columns), so:

Bias vectors: have INPUT=1 and OUTPUT=number of neurons, i.e. they are column vectors.
Weight matrices: contain OUTPUTSxINPUTS weights.

The weight matrices have this format:

w(i1,o1)  w(i2,o1)  w(i3,o1)  ...
w(i1,o2)  w(i2,o2)  w(i3,o2)  ...
...       ...       ...

where w(a,b) is the weight which connects input a with output b. Make sure that your matrices have this format.
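The following plain-Lua sketch computes a hyperplane (dot_product plus optional bias) output for a bunch of patterns, assuming the OUTPUTSxINPUTS layout described above, so that row o of W holds the weights of output neuron o. The function name is hypothetical; the weights and inputs reproduce the linspace values of the dot_product example, so its output rows should match.

```lua
-- Hypothetical sketch: output = input * W^T + b^T, W stored OUTPUTSxINPUTS.
local function hyperplane_forward(input, W, bias)
  local out = {}
  for n = 1, #input do                -- each pattern in the bunch
    out[n] = {}
    for o = 1, #W do                  -- each output neuron (row of W)
      local sum = bias and bias[o] or 0
      for i = 1, #input[n] do
        sum = sum + W[o][i] * input[n][i]
      end
      out[n][o] = sum
    end
  end
  return out
end

-- Same values as the 5x4 weights and 3x4 input of the dot_product example
local W = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16}, {17,18,19,20} }
local input = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12} }
local out = hyperplane_forward(input, W)
print(table.concat(out[1], " "))  -- 30 70 110 150 190
```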

Components list

ANN models are built from modular components which can be combined in several ways to produce different topologies.

Basic components

base

ann.components.base{ size=0, [name=STRING] }

The class ann.components.base is the base of all ANN components. It is possible to instantiate an object of this class, which performs the identity function. The constructor receives two optional arguments: size=0, which by default allows any input size, and the name of the component.

> c1 = ann.components.base{ name="base1" }
> c2 = ann.components.base()
> input = matrix(10,10):uniformf(0,1,random(237))
> output = c2:forward(input)
> = output:equals(input)
true

bias

ann.components.bias{ size=NUMBER, [name=STRING], [weights=STRING] }

The class ann.components.bias implements an additive bias of a given size. The bias is added to every pattern in the bunch (mini-batch). The constructor receives the following fields:

name of the component, an optional field.
weights name of the component, an optional field.
size the size of the bias vector.

This component contains a vector of SIZEx1, which is added, transposed, to every input pattern (along the first dimension of the bunch).

> b1 = ann.components.bias{ name='b1', weights='b1', size=5 }
> _,weights = b1:build()
> weights('b1'):linspace()
> = weights('b1')
 1
 2
 3
 4
 5
# Matrix of size [5,1] [0x162eb00 data= 0x16b0260]
> input = matrix(4,5):linspace()
> = input
 1           2           3           4           5
 6           7           8           9           10
 11          12          13          14          15
 16          17          18          19          20
# Matrix of size [4,5] [0x185a3d0 data= 0x17e18d0]
> output = b1:forward(input)
> = output
 2           4           6           8           10
 7           9           11          13          15
 12          14          16          18          20
 17          19          21          23          25
# Matrix of size [4,5] [0x185b370 data= 0x1718450]
> -- the bias component executes the following operation
> for i=1,input:dim(1) do input(i,':'):axpy(1.0, weights('b1'):transpose()) end
> = input
 2           4           6           8           10
 7           9           11          13          15
 12          14          16          18          20
 17          19          21          23          25
# Matrix of size [4,5] [0x185a3d0 data= 0x17e18d0]

dot_product

ann.components.dot_product{ ... }

The class ann.components.dot_product implements the dot product between a weights vector of every neuron and the given input vector, which is a vector-matrix product. If the input is a matrix with a bunch of patterns, the component executes a matrix-matrix product. The component contains a weights matrix with size OxI, where O is the number of neurons (output size), and I is the number of inputs (input size). The constructor receives:

name is a string with the component name, optional.
weights is a string with the weights name, optional.
input is a number with the input size.
output is the number of neurons.
transpose=false is a boolean indicating if the weights matrix is transposed. It is optional, by default it is transpose=false.

> c = ann.components.dot_product{ weights='w1', input=4, output=5 }
> _,weights = c:build()
> weights('w1'):linspace()
> = weights('w1')
 1           2           3           4
 5           6           7           8
 9           10          11          12
 13          14          15          16
 17          18          19          20
# Matrix of size [5,4] [0x186e620 data= 0x182b050]
> input = matrix(3,4):linspace()
> = input
 1           2           3           4
 5           6           7           8
 9           10          11          12
# Matrix of size [3,4] [0x168f420 data= 0x1835190]
> output = c:forward(input)
> = output
 30          70          110         150         190
 70          174         278         382         486
 110         278         446         614         782
# Matrix of size [3,5] [0x185ee70 data= 0x18655c0]
> -- the performed operation is
> = input * weights('w1'):transpose()
 30          70          110         150         190
 70          174         278         382         486
 110         278         446         614         782
# Matrix of size [3,5] [0x1869f50 data= 0x1645e60]

For very sparse inputs, the input matrix can be replaced by a tokens.sparse_matrix, which can improve the efficiency of the operation. The transformation of matrices into tokens and of tokens into matrices is performed automatically.

> -- a matrix with two rows:
> -- first row: column 3 is active with value 1, and column 2 with 0.5
> -- second row: column 1 is active with value 0.3
> dense_input = matrix(2,4):zeros():set(1,3,1):set(1,2,0.5):set(2,1,0.3)
> sparse_input = matrix.sparse( dense_input )
> = sparse_input
 0           0.5         1           0
 0.3         0           0           0
# SparseMatrix of size [2,4] in csr [0x17deaa0 data= 0x17864b0 0x17c9540 0x167cdb0], 3 non-zeros
> output = c:forward(sparse_input)
> = output
 4           10          16          22          28
 0.3         1.5         2.7         3.9         5.1
# Matrix of size [2,5] [0x18612d0 data= 0x17fb8a0]
> -- which is equivalent to the following
> output = c:forward(dense_input)
> = output
 4           10          16          22          28
 0.3         1.5         2.7         3.9         5.1
# Matrix of size [2,5] [0x185ee70 data= 0x1636e60]
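To illustrate why sparse inputs are cheaper, here is a plain-Lua sketch of the same product with the input stored in CSR (compressed sparse row) form, visiting only the non-zero entries. A CSR matrix is described by three arrays (values, column indices and row offsets), which is consistent with the three data pointers printed above; however, the field names, the 1-based indices and the sparse_forward function are assumptions for illustration, and matrix.sparse may lay out its data differently.

```lua
-- Hypothetical CSR layout of the 2x4 dense_input used above (1-based).
local csr = {
  values      = {0.5, 1.0, 0.3},  -- non-zero values, row by row
  indices     = {2, 3, 1},        -- column of each value
  first_index = {1, 3, 4},        -- row r spans first_index[r] .. first_index[r+1]-1
}

-- Computes input * W^T touching only the stored non-zeros.
local function sparse_forward(m, W)
  local out = {}
  for r = 1, #m.first_index - 1 do
    out[r] = {}
    for o = 1, #W do
      local sum = 0
      for k = m.first_index[r], m.first_index[r + 1] - 1 do
        sum = sum + m.values[k] * W[o][m.indices[k]]
      end
      out[r][o] = sum
    end
  end
  return out
end

local W = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16}, {17,18,19,20} }
local out = sparse_forward(csr, W)
print(out[1][1], out[2][5])  -- approximately 4 and 5.1
```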

hyperplane

ann.components.hyperplane{ ... }

The class ann.components.hyperplane is a wrapper around a bias and a dot_product component, implementing a hyperplane separator. The constructor receives:

name an optional string with the component name.
dot_product an optional string with the dot_product component name.
bias an optional string with the bias component name.
dot_product_weights an optional string with the dot_product component weights name.
bias_weights an optional string with the bias component weights name.
input a number with the input size.
output a number with the output size.
transpose=false a boolean indicating if the dot_product weights will be transposed in the operation.

> c = ann.components.hyperplane{ dot_product_weights='w1', bias_weights='b1',
                                 input=128, output=256 }
> _,weights = c:build()
> for name,w in pairs(weights) do print(name) print(w) end
w1
Large matrix, not printed to display
# Matrix of size [256,128] [0x185ee70 data= 0x16ae840]
b1
Large matrix, not printed to display
# Matrix of size [256,1] [0x1869120 data= 0x165d540]

Activation function components

logistic

ann.components.actf.logistic()

log_logistic

ann.components.actf.log_logistic()

softmax

ann.components.actf.softmax()

log_softmax

ann.components.actf.log_softmax()

tanh

ann.components.actf.tanh()

hardtanh

ann.components.actf.hardtanh()

relu

ann.components.actf.relu()

softplus

ann.components.actf.softplus()

sin

ann.components.actf.sin()

Container components

stack

ann.components.stack()

> ann.components.reset_id_counters() -- reset ID name generator
> mlp = ann.components.stack()
> mlp:push( ann.components.hyperplane{ input=100, output=200 } )
> mlp:push( ann.components.actf.logistic() )
> mlp:push( ann.components.hyperplane{ input=200, output=40 } )
> mlp:push( ann.components.actf.log_softmax() )
> _,weights = mlp:build()
> for name,w in pairs(weights) do print(name) print(w) end
w0
Large matrix, not printed to display
# Matrix of size [200,100] [0x1863df0 data= 0x1668030]
w2
Large matrix, not printed to display
# Matrix of size [40,200] [0x186bfd0 data= 0x17c71b0]
b1
Large matrix, not printed to display
# Matrix of size [200,1] [0x186aee0 data= 0x18159f0]
b3
Large matrix, not printed to display
# Matrix of size [40,1] [0x186d6d0 data= 0x175c910]

join

ann.components.join()

Filter components

dropout

ann.components.dropout()

> c = ann.components.dropout{ random=random(3284), prob=0.5, value=0.0 }

select

ann.components.select()

slice

ann.components.slice()

gaussian_noise

ann.components.gaussian_noise{ random, prob, var, mean }

salt_and_pepper

ann.components.salt_and_pepper{ random, prob, zero, one }

Convolutional components

These components are used to build Convolutional Neural Networks. If you use dataset.matrix, your patterns will be flattened and converted into one-dimensional matrices, which forces you to add a rewrap component at the beginning of your ANN. The following is an example of a full CNN for the MNIST task (28x28 pixel images of digits):

-- tables for the CNN configuration
ishape  = {1, 28, 28} -- for input matrix rewrapping
conv1   = {1, 5, 5} nconv1=20
maxp1   = {1, 2, 2}
conv2   = {nconv1, 5, 5,} nconv2=50
maxp2   = {1, 2, 2}
hidden  = 500

thenet = ann.components.stack():
push( ann.components.rewrap{ size=ishape } ):
push( ann.components.convolution{ kernel=conv1, n=nconv1 } ):
push( ann.components.convolution_bias{ n=nconv1, ndims=#conv1 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp1,} ):
push( ann.components.convolution{ kernel=conv2, n=nconv2 } ):
push( ann.components.convolution_bias{ n=nconv2, ndims=#conv2 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp2 } ):
push( ann.components.flatten() )

-- using the method precompute_output_size, it is possible to know
-- the size of the convolution after the flatten operation
local conv_size = thenet:precompute_output_size()[1]

thenet:
push( ann.components.hyperplane{ input=conv_size, output=hidden } ):
push( ann.components.actf.tanh() ):
push( ann.components.hyperplane{ input=hidden, output= 10 } ):
push( ann.components.actf.log_softmax() )

convolution

ann.components.convolution{ kernel, step, n, name, weights, ... }

A convolutional component could be created as:

> c = ann.components.convolution{ kernel={3, 5, 5}, step={1, 1, 1}, n=10,
                                  name="conv-W1", weights="W1",
                                  input_planes_dim=1 }

This component executes a convolution using the given kernel sizes, moving the convolution window following the step table, and using n different kernels. This component has a dynamic input/output size: the convolution is performed over the whole input following the indicated parameters.

input_planes_dim is a number (optional, 1 by default) which indicates the dimension K of the input matrix where the input planes are located.
kernel is a table which describes the size of each kernel. Its K-th element is always the number of PLANES of the input matrix. Therefore, a kernel over a 1D signal will be like kernel={1, 5}, with K=1. For a 2D image it will be kernel={1, 5, 5}; for a 2D RGB color image it will be kernel={3, 5, 5} if K=1, or otherwise kernel={5, 3, 5} if K=2 or kernel={5, 5, 3} if K=3. For an RGB video sequence the kernel will be kernel={3, 5, 5, 5} for K=1, and so on.
step is a table which indicates how to move the kernel. The number of steps at each dimension will be (input_dim[i] - kernel[i])/step[i] + 1. The K-th element of this table is forced to be 1, since that dimension holds the planes of the input matrix. The step is optional; by default all its elements are 1.
n is the number of kernels to be applied. It is the number of output planes produced by this component (number of neurons).
name and weights are the strings used to look up components and connection objects.

The output produced by this component will be of:

output_size[1]=n
output_size[i+1]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=1,...,input_planes_dim-1
output_size[i]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=input_planes_dim+1,...,#kernel

By default, input_planes_dim=1, so the output size will be simplified as:

output_size[1]=n
output_size[i]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=2,...,#kernel
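The shape rules above can be checked with a small plain-Lua function. It is a sketch of the documented formulas, not the library code: the function name is hypothetical, and flooring non-integer divisions is an assumption (the formulas above implicitly assume the kernel fits an integer number of steps).

```lua
-- Hypothetical helper applying the documented output-size formulas.
-- K is input_planes_dim (default 1); step defaults to all ones.
local function conv_output_size(input_size, kernel, n, step, K)
  K = K or 1
  step = step or {}
  local function steps(i)
    return math.floor((input_size[i] - kernel[i]) / (step[i] or 1)) + 1
  end
  local out = { n }                                  -- output_size[1] = n
  for i = 1, K - 1 do out[i + 1] = steps(i) end      -- dims before the planes
  for i = K + 1, #kernel do out[i] = steps(i) end    -- dims after the planes
  return out
end

-- First convolution of the MNIST example: 1x28x28 input, 5x5 kernels, n=20
print(table.concat(conv_output_size({1, 28, 28}, {1, 5, 5}, 20), "x"))  -- 20x24x24
```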

convolution_bias

ann.components.convolution_bias{ n, ndims, name, weights }

> c = ann.components.convolution_bias{ n=10, ndims=3,
                                       name="conv-B1", weights="B1" }

n is the number of planes at the input (the first dimension size of the input matrix).
ndims is the number of dimensions expected at the input matrix.
name and weights as usual

max_pooling

ann.components.max_pooling{ kernel, name }

> c = ann.components.max_pooling{ kernel={1, 2, 2}, name="pool-2" }

kernel is a table with the sizes of the kernel applied to the input matrix. Depending on it, max-pooling can down-sample the input matrix (as in the example), or convert the input into a fixed-size feature vector (kernel = {1, 0, 0}). A 0 value at one position means that the kernel spans the whole size of that input dimension. So, the kernel {1, 0, 0} computes a max-pooling over all positions of each input plane, producing as output a feature vector of INPUT PLANES size.
name as usual.
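For the kernel={1, 0, 0} case, the result is one maximum per input plane. A minimal plain-Lua sketch, assuming each plane is given as a 2D Lua array; the function name is hypothetical and this is not the max_pooling implementation.

```lua
-- Hypothetical global max-pooling: one maximum per input plane.
local function global_max_pooling(planes)
  local features = {}
  for p, plane in ipairs(planes) do
    local m = -math.huge
    for _, row in ipairs(plane) do
      for _, v in ipairs(row) do
        if v > m then m = v end
      end
    end
    features[p] = m
  end
  return features
end

-- Two 2x3 input planes produce a feature vector of size 2 (number of planes)
local features = global_max_pooling({
  { {1, 5, 2}, {0, 3, 4} },
  { {-1, -7, -2}, {-3, -0.5, -4} },
})
print(features[1], features[2])  -- prints 5 and -0.5
```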

flatten

ann.components.flatten{ [name] }

This component converts an input matrix formed by N patterns of any dimensionality into a two-dimensional output matrix with N rows and M columns, where M is the product of all input matrix dimensions except the first one (which is the number of patterns).

> c = ann.components.flatten{ name="flatten" }

Other components

copy

ann.components.copy
