20 ann
Several packages contain neural network utilities: require("aprilann.ann"), require("aprilann.ann.loss"), require("aprilann.ann.optimizer"), and require("aprilann.trainable").
This page describes the utilities to build and train ANNs. It has four main sections: a description of ANN concepts in APRIL-ANN, the easy building procedure for MLPs, the training helpers, and finally the full description of the aprilann.ann package.
Inspired by other toolkits (such as Torch 7 or pyBrain), ANNs are described as a composition of blocks called ANN components, so a single component is itself a neural network. A list of all available components is shown by executing:
april_help(ann.components)
The composition procedure will be explained later. An ANN component is identified by a name string (which is generated automatically if not given). The name must be unique. Some components contain weights, which are estimated by gradient descent (backpropagation). Connection weights objects are identified by a weights name parameter, which can be reused. If two components have the same weights name, they share the same connections object.
All components have an input and output size, which define the number of weights (if needed) and the fan-in/fan-out of the component. Components need to be built (build method) once they are constructed. The build procedure allocates memory for connections and checks the input/output sizes of components.
Don't be afraid: the next section presents an abstraction for training MLPs which does a lot of this work automatically. A more detailed description is available through april_help:
april_help(ann.components.base)
april_help(ann.components.base.build)
The simplest kind of ANN is a Multilayer Perceptron (MLP), where each layer is fully connected with the next layer (feed-forward, all-all connections).
The method generate returns a special component object, which cannot be modified. Actually, it is a Lua table formed by an ann.components.stack instance and other information useful to load and save the MLP, and it implements wrapper Lua functions to ANN component methods.
-- creates an ANN component for a MLP with the given description
thenet = ann.mlp.all_all.generate("256 inputs 128 tanh 10 log_softmax")
-- creates an instance of a trainer object for previous ANN component,
-- using the multi-class cross-entropy loss function (for 10 output units),
-- and using a bunch_size of 32. Loss function and bunch_size are optional.
trainer = trainable.supervised_trainer(thenet,
                                       ann.loss.multi_class_cross_entropy(10),
                                       32,
                                       -- this last parameter is optional, by default it is
                                       -- SGD => Stochastic Gradient Descent
                                       ann.optimizer.sgd())
-- builds the component contained into trainer object
trainer:build()
-- initializes the weights randomly, using fan-in and fan-out
trainer:randomize_weights{
  random     = random(1234),
  inf        = -0.1,
  sup        =  0.1,
  use_fanin  = true,
  use_fanout = true,
}
As said before, each component has a unique name, and if needed a weights name. The next code iterates over all components:
> for name,c in trainer:iterate_components() do print(name,c) end
actf1 instance 0x7fc3e94850a0 of ann.components.base
actf2 instance 0x7fc3e9485550 of ann.components.base
b1 instance 0x7fc3e9484f80 of ann.components.base
b2 instance 0x7fc3e9485410 of ann.components.base
c1 instance 0x7fc3e9484a10 of ann.components.base
layer1 instance 0x7fc3e9484e80 of ann.components.base
layer2 instance 0x7fc3e9485310 of ann.components.base
w1 instance 0x7fc3e9484ee0 of ann.components.base
w2 instance 0x7fc3e9485370 of ann.components.base
The MLP is composed of 9 components: two activation functions (actf1 and actf2), two bias components (b1 and b2), one stack component which works as a container (c1), two hyperplane components each containing one bias and one dot_product (layer1 and layer2), and finally two dot_product components (w1 and w2) which contain the weight matrices.
It is also possible to iterate over all weights names:
> for name,connections in trainer:iterate_weights() do print(name,type(connections)) end
b1 matrix
b2 matrix
w1 matrix
w2 matrix
So, our MLP contains two bias vectors (b1 and b2, corresponding to the b1 and b2 components) and two weight matrices (w1 and w2, corresponding to the w1 and w2 components). All automatically generated MLPs assign these names to their components and weights.
Once the component is built by using a trainer instance, the trainer exposes two interesting methods: trainer:component(COMPONENT_NAME_STRING), which returns the component given its name, and trainer:weights(WEIGHTS_NAME_STRING), which returns the connection weights object given its weights_name attribute.
More information about trainable.supervised_trainer is available by doing:
april_help(trainable.supervised_trainer)
The loss function is used to train the ANNs via the gradient descent algorithm. Trainer objects need an instance of a loss function to perform training, which makes it a very useful abstraction of standard training procedures.
Detailed information about loss functions is in:
april_help(ann.loss)
The loss function could be set at trainer constructor, or using the method set_loss_function:
trainer:set_loss_function(ann.loss.mse())
Three main error functions are implemented: mean squared error (MSE), two-class cross-entropy, and multi-class cross-entropy. Note that cross-entropy-like functions are specialized for log_logistic or log_softmax output activation functions. Almost all the constructors accept a SIZE=0 parameter, which means that the layer has a dynamic size:
- ann.loss.mse(SIZE) returns an instance of the Mean Squared Error function for SIZE neurons. It is a quadratic loss function.
- ann.loss.mae(SIZE) returns an instance of the Mean Absolute Error function for SIZE neurons. It is not a quadratic loss function.
- ann.loss.cross_entropy(SIZE) returns an instance of the two-class cross-entropy. It only works with the log_logistic output activation function. It is based on Kullback-Leibler divergence.
- ann.loss.multi_class_cross_entropy(SIZE) returns an instance of the multi-class cross-entropy. The parameter must be SIZE>2, so for two-class problems only one output unit with cross-entropy is needed. It only works with log_logistic or log_softmax output activation functions (it is better to use log_softmax). It is based on Kullback-Leibler divergence.
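As a rough illustration of what these losses measure, the following plain Lua sketch computes the textbook mean squared error, mean absolute error, and multi-class cross-entropy for a single pattern. It does not use APRIL-ANN: the function names are hypothetical, and the toolkit's own implementations may scale or average the values differently (for example, with a 1/2 factor in MSE).

```lua
-- Plain Lua sketch; these helpers are hypothetical and not part of APRIL-ANN.

-- mean squared error over the output units of one pattern
local function mse(output, target)
  local sum = 0
  for i = 1, #output do
    local d = output[i] - target[i]
    sum = sum + d * d
  end
  return sum / #output
end

-- mean absolute error over the output units of one pattern
local function mae(output, target)
  local sum = 0
  for i = 1, #output do
    sum = sum + math.abs(output[i] - target[i])
  end
  return sum / #output
end

-- multi-class cross-entropy; log_output holds log-probabilities,
-- as an output layer with log_softmax would produce
local function multi_class_cross_entropy(log_output, target)
  local sum = 0
  for i = 1, #log_output do
    sum = sum - target[i] * log_output[i]
  end
  return sum
end

local out, tgt = {0.5, 1.0}, {1.0, 1.0}
print(mse(out, tgt)) -- 0.125
print(mae(out, tgt)) -- 0.25

local logp = { math.log(0.25), math.log(0.75) }
local ce = multi_class_cross_entropy(logp, {0, 1})
print(ce) -- equals -log(0.75)
```

The cross-entropy helper takes log-probabilities directly, which mirrors the remark above that these losses are meant to be paired with log_logistic or log_softmax outputs.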
The optimizer is an object which implements the learning algorithm. Every class in ann.optimizer is an optimizer. Several learning hyperparameters are available, depending on the selected optimizer. These learning hyperparameters are known as options, and can be set globally (for all the connection weight layers of the ANN) or layerwise (for a specific connection weights object, identified by its name). Optimizers implement the following API:
- other = optimizer:clone(): returns a deep copy of the caller object.
- value = optimizer:get_option(name): returns the global value of a given learning option name.
- optimizer:set_option(name, value): sets the global value of a given learning option name.
- optimizer:set_layerwise_option(layer_name, option_name, value): sets a layerwise option.
- value = optimizer:get_layerwise_option(layer_name, option_name): returns the layerwise option of the given layer_name.
- value = optimizer:get_option_of(layer_name, option_name): returns the option which is applicable to the given layer_name. If a layerwise option was previously defined, the method returns its value. Otherwise, the value of the global option is returned.
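The lookup order described for get_option_of can be sketched in plain Lua as follows. This is only an illustration of the fall-back behaviour, not the APRIL-ANN implementation; the table layout and the helper functions are assumptions made for the example.

```lua
-- Plain Lua sketch of the global/layerwise lookup order; hypothetical helpers.
local function new_options()
  return { global = {}, layerwise = {} }
end

local function set_option(opts, name, value)
  opts.global[name] = value
end

local function set_layerwise_option(opts, layer_name, name, value)
  opts.layerwise[layer_name] = opts.layerwise[layer_name] or {}
  opts.layerwise[layer_name][name] = value
end

-- returns the layerwise value if one was defined, otherwise the global value
local function get_option_of(opts, layer_name, name)
  local layer = opts.layerwise[layer_name]
  if layer and layer[name] ~= nil then
    return layer[name]
  end
  return opts.global[name]
end

local opts = new_options()
set_option(opts, "learning_rate", 0.01)
set_layerwise_option(opts, "w1", "learning_rate", 0.1)

print(get_option_of(opts, "w1", "learning_rate")) -- 0.1  (layerwise wins)
print(get_option_of(opts, "w2", "learning_rate")) -- 0.01 (falls back to global)
```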
Different optimizer objects are implemented. They train the neural network following different algorithms which rely on the computation of gradients done by ANN components. They incorporate regularization and momentum hyperparameters. Their options are algorithm dependent. In the case of Stochastic Gradient Descent, the options are:
- learning_rate: the learning rate controls the portion of the gradient used to update the weights. This value is smoothed depending on the bunch_size and on the number K of times that a connection weights object is shared between different components. The smoothed value is learning_rate/sqrt(bunch_size+K).
- momentum: an inertial hyperparameter which applies a portion of the weight update from the previous iteration.
- weight_decay: an L2 regularization term.
- L1_norm: an L1 regularization term.
- max_norm_penalty: a constraint penalty based on the two-norm of the weights.
The algorithm uses the following learning rule:
w = (1 - weight_decay)*w' + momentum*(w' - w'') + lr'*grad(L)/grad(w')
where w, w' and w'' are the weight values at the next, current, and previous iterations; lr' is the learning_rate smoothed by the sqrt term; and grad(L)/grad(w') is the loss function gradient at the given weight.
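The rule can be traced for a single scalar weight with a few lines of plain Lua. This is a minimal sketch that transcribes the formula as written, including the sign of the gradient term; whether the toolkit negates the gradient internally is not stated here. The helper names and the fields of the opt table are hypothetical.

```lua
-- Plain Lua sketch of the update rule as written above; hypothetical helpers.

-- learning_rate smoothed by bunch_size and the sharing count K
local function smoothed_learning_rate(learning_rate, bunch_size, K)
  return learning_rate / math.sqrt(bunch_size + K)
end

-- w_cur is w', w_prev is w'', grad is the loss gradient at w'
local function sgd_update(w_cur, w_prev, grad, opt)
  local lr = smoothed_learning_rate(opt.learning_rate, opt.bunch_size, opt.K)
  return (1 - opt.weight_decay) * w_cur
       + opt.momentum * (w_cur - w_prev)
       + lr * grad
end

local opt = {
  learning_rate = 0.5, bunch_size = 3, K = 1,
  momentum = 0.5, weight_decay = 0,
}
-- lr' = 0.5 / sqrt(3 + 1) = 0.25
-- w   = 1.0 + 0.5*(1.0 - 0.5) + 0.25*(-0.25) = 1.1875
local w_next = sgd_update(1.0, 0.5, -0.25, opt)
print(w_next) -- 1.1875
```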
The hyperparameters of optimizer objects can be modified through the trainer object:
- trainer:set_option(name,value): sets a global learning option value.
- value=trainer:get_option(name): gets a global learning option value.
- trainer:set_layerwise_option(layer_name_match,option_name,value): sets a layerwise learning option value for all the connection weight objects whose name matches the given layer_name_match Lua pattern string.
- value=trainer:get_option_of(layer_name,option_name): gets the option value applicable to the given layer.
trainer:build()
trainer:set_option("learning_rate", number)
trainer:set_option("momentum", number)
-- it is recommended not to apply regularization to bias connections
trainer:set_layerwise_option("w.*", "weight_decay", number)
trainer:set_layerwise_option("w.*", "max_norm_penalty", number)
-- for dropout (see dropout http://www.cs.toronto.edu/~nitish/msc_thesis.pdf)
-- dropout is a very special option: it modifies training, but also modifies
-- validation (or test) phase. Also it must be applied carefully to not apply
-- dropout at the output of your model. Dropout is applied as another component
-- which acts as a stochastic filter.
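To see which connection objects a layer_name_match such as "w.*" would select, the pattern can be tried with the standard string.match function on the weight names listed earlier on this page. This is a plain Lua sketch; it assumes the trainer applies the pattern the way string.match does (unanchored), which is not confirmed here.

```lua
-- Plain Lua sketch: which weight names does the pattern "w.*" select?
-- The names are the ones printed by trainer:iterate_weights() for the example MLP.
local names = { "b1", "b2", "w1", "w2" }
local pattern = "w.*"

local selected = {}
for _, name in ipairs(names) do
  -- string.match returns nil when the pattern does not match
  if name:match(pattern) then
    selected[#selected + 1] = name
  end
end

print(table.concat(selected, " ")) -- w1 w2
```

With this pattern the bias vectors b1 and b2 are left out, which is the intent of the regularization settings in the example above.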
See the documentation for the trainable package.
ANNs are implemented as a composition of components which define the three main operations of an ANN: forward step (compute outputs), backprop step (neuron gradient computation), and gradient computation step (weight gradients). All components are child classes of ann.components.base. See april_help(ann.components.base) for on-line documentation.
Two remarks before continuing with the following sections. The components have two special properties:
- name: a string which identifies the component uniquely; two components are forbidden to share the same name.
- weights_name: a string which identifies the connections (weights or biases) of the component. This name can be shared by different components, which means that they share the same connections object.
The components are integrated in Lua via the abstract class token, which has two specializations for ANNs:
- tokens.matrix is a token which contains a matrix instance.
- tokens.sparse_matrix is a token which contains a matrix.sparse instance.
In any case, ANN components wrap the given matrix objects into a token, and unwrap matrix objects when returning a token. So, in practice, you can ignore the token/matrix association.
NOTE that ANN components work with dense matrices or with csr sparse matrices.
All components define the following basic properties, which are tokens: input, output, error_input, and error_output. The basic methods to train the components are:
- component,table,table = build(): reserves memory for weights and prepares the component to work.
- reset(iteration): releases all the tokens internally allocated (or given by Lua), and receives the current iteration number. This iteration is not related to the training loop or epoch; it is related to optimizer objects which implement line search or similar (Conjugate Gradient or RProp).
- token=forward(token[, boolean]): receives an input token and returns the output token. For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap the given matrix. In any case, the returned value is a token.
- token=backprop(token): receives an error input token (gradient), and returns the output error token (gradient). For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap the given matrix. In any case, the returned value is a token.
- gradients=compute_gradients( [gradients] ): computes the weight gradients, using the data stored at the components (input/output tokens, input/output error tokens), given and produced during the forward and backprop methods. Optionally, it receives a table of matrix objects with previously computed gradients, which will be used to store the data, avoiding the allocation of new memory. The method returns a table of matrix objects with the gradients computed for each connection weights object.
Combining these methods with loss functions, a component can be trained following this basic example. A linear component is trained to follow the OR function, for input=[0,1] and target output=[1]. By default the weights are not initialized, so they contain memory trash.
> o = ann.optimizer.sgd() -- the optimizer
> l = ann.loss.mse(1) -- MSE loss function
> -- a hyperplane component (explained later)
> c = ann.components.hyperplane{ input=2, output=1 }
> c:build() -- allocates memory for weights, and checks components integrity
> l:reset() -- set to zero all the things
> c:reset() -- set to zero all the things
> o:execute(function()
              -- the true indicates training
              output_token=c:forward(matrix(1,2,{0,1}), true)
              -- gradient with desired output 1
              output_error=c:backprop(l:gradient(output_token,
                                                 matrix(1,1,{1})))
              grad = c:compute_gradients(grad)
              -- returns the loss and the gradients
              return l:compute_loss(output_token,
                                    matrix(1,1,{1})),
                     grad
            end, c:copy_weights())
> output_token=c:forward(matrix(1,2,{0,1}))
> print(output_token) -- the output is closer to 1
0.2
# Matrix of size [1,1] [0xb01ce0 data= 0xad97d0]
Note that all matrices must have at least two dimensions. All computations are done in bunch mode (using mini-batches), and the first dimension size is the number of patterns contained in the bunch. The rest of the dimensions must comply with the input constraints of the component. A lot of components work with linear inputs, so the input matrix will be bi-dimensional, but some components work with multidimensional matrices. It is possible to use matrices of only one dimension, and they will be reinterpreted as two-dimensional matrices with only one row, but it is better to always work with two-dimensional matrices.
Before doing anything, components can be composed together to build larger components. This procedure needs to call the build method at the end, to check the input/output sizes and reserve memory for weights and biases.
The c:build() call recursively executes the build method of all the components in the composition. This method returns the caller component and two tables:
> caller_component, weights_dict, components_table = c:build()
The caller_component is the component c in this case.
The weights_dict is a table which maps name (weights name) strings to weight matrices.
The components_table is a Lua table indexed by name (component name) which contains a reference to each component instance; it is useful to initialize hyper-parameters and other settings in a component-wise manner.
- number = c:get_input_size(): returns the size of the input for the caller component. In case of unknown input size, a zero will be returned.
- number = c:get_output_size(): returns the size of the output for the caller component. In case of unknown output size, a zero will be returned.
- table = c:precompute_output_size( [table] ): computes the output shape, given an input shape. It is useful in combination with convolutional ANNs, in order to ask for the output shape of the convolution. The given table must comply with the expected input shape of the component (normally one dimension, but with CNNs it could be multi-dimensional). The returned table will contain as many dimensions as are produced by the caller component (same as for the input).
- token = c:forward( token [, boolean] ) receives a token and an optional boolean (by default false). The boolean indicates whether this forward is done during training or not, because some components have a special behavior during training. It returns a token with the output computed by the caller component. For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap the given matrix. In any case, the returned value is a token.
- token = c:backprop( token ) receives a token with the input error (gradient of each output neuron), and returns another token with the output error (gradient of each input neuron). For simplicity, it is possible to give a matrix instead of a token, and the method will automatically wrap the given matrix. In any case, the returned value is a token.
- gradients = c:compute_gradients( gradients ) returns the weight gradients computed using the tokens given at the forward and backprop methods.
- c:reset() releases the tokens retained in the forward and backprop steps.
During forward and backprop steps the components compute outputs and error outputs (gradients), and retain the input and error input (gradients) tokens. Before calling the reset method, you can ask the component for its retained tokens:
- token = c:get_input() returns the token given as input at the forward method.
- token = c:get_output() returns the token computed as output by the forward method.
- token = c:get_error_input() returns the token given as error input at the backprop method.
- token = c:get_error_output() returns the token computed as error output by the backprop method.
Components which require weights have internally a matrix instance. This object is allocated by calling the build method of the components (or using the build method of a trainer), and is identified by the weights_name property, so components with the same weights_name share the same connections object.
These matrices are defined with OUTPUTxINPUT size (output rows, input columns), so:
- Bias vectors: have INPUT=1 and OUTPUT=number of neurons, so they are column vectors.
- Weight matrices: contain OUTPUTSxINPUTS weights.
The weight matrices have this format:
w(i1,o1) w(i2,o1) w(i3,o1) ...
w(i1,o2) w(i2,o2) w(i3,o2) ...
... ... ...
where w(a,b) is the weight which connects input a with output b. Be sure that your matrices have this format.
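The OUTPUTxINPUT layout can be illustrated with nested Lua tables: each row of W holds the incoming weights of one output neuron, and the bias has one entry per output. This is a plain Lua sketch, not APRIL-ANN matrix code; hyperplane_forward is a hypothetical helper written only for this example.

```lua
-- Plain Lua sketch with nested tables; not APRIL-ANN matrix objects.
-- W has OUTPUT rows and INPUT columns; b has one entry per output neuron.
local function hyperplane_forward(W, b, x)
  local y = {}
  for o = 1, #W do
    local sum = b[o]
    for i = 1, #x do
      -- W[o][i] is the weight connecting input i with output o
      sum = sum + W[o][i] * x[i]
    end
    y[o] = sum
  end
  return y
end

local W = { {1, 2, 3, 4}, {5, 6, 7, 8} } -- 2 outputs, 4 inputs
local b = { 0.5, -0.5 }
local y = hyperplane_forward(W, b, {1, 2, 3, 4})
print(y[1], y[2]) -- 30.5  69.5
```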
The ANN models are modular components which can be combined in several ways to produce different topologies.
ann.components.base{ size=0, [name=STRING] }
The class ann.components.base is the base of all ANN components. It is possible to instantiate an object of this class, which performs the identity function. The constructor receives two optional arguments: size=0, which by default allows any input size, and the name of the component.
> c1 = ann.components.base{ name="base1" }
> c2 = ann.components.base()
> input = matrix(10,10):uniformf(0,1,random(237))
> output = c2:forward(input)
> = output:equals(input)
true
ann.components.bias{ size=NUMBER, [name=STRING], [weights=STRING] }
The class ann.components.bias implements an additive bias of a given size. The bias is added to every pattern in the bunch (mini-batch). The constructor receives three fields:
- name: the name of the component, an optional field.
- weights: the weights name of the component, an optional field.
- size: the size of the bias vector.
This component contains a vector of size SIZEx1, which is added transposed to all the input patterns (first dimension of the bunch).
> b1 = ann.components.bias{ name='b1', weights='b1', size=5 }
> _,weights = b1:build()
> weights('b1'):linspace()
> = weights('b1')
1
2
3
4
5
# Matrix of size [5,1] [0x162eb00 data= 0x16b0260]
> input = matrix(4,5):linspace()
> = input
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
# Matrix of size [4,5] [0x185a3d0 data= 0x17e18d0]
> output = b1:forward(input)
> = output
2 4 6 8 10
7 9 11 13 15
12 14 16 18 20
17 19 21 23 25
# Matrix of size [4,5] [0x185b370 data= 0x1718450]
> -- the bias component executes the following operation
> for i=1,input:dim(1) do input(i,':'):axpy(1.0, weights('b1'):transpose()) end
> = input
2 4 6 8 10
7 9 11 13 15
12 14 16 18 20
17 19 21 23 25
# Matrix of size [4,5] [0x185a3d0 data= 0x17e18d0]
ann.components.dot_product{ ... }
The class ann.components.dot_product implements the dot product between the weights vector of every neuron and the given input vector, which is a vector-matrix product. If the input is a matrix with a bunch of patterns, the component executes a matrix-matrix product. The component contains a weights matrix with size OxI, where O is the number of neurons (output size) and I is the number of inputs (input size). The constructor receives:
- name: a string with the component name, optional.
- weights: a string with the weights name, optional.
- input: a number with the input size.
- output: the number of neurons.
- transpose=false: a boolean indicating if the weights matrix is transposed. It is optional; by default transpose=false.
> c = ann.components.dot_product{ weights='w1', input=4, output=5 }
> _,weights = c:build()
> weights('w1'):linspace()
> = weights('w1')
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
17 18 19 20
# Matrix of size [5,4] [0x186e620 data= 0x182b050]
> input = matrix(3,4):linspace()
> = input
1 2 3 4
5 6 7 8
9 10 11 12
# Matrix of size [3,4] [0x168f420 data= 0x1835190]
> output = c:forward(input)
> = output
30 70 110 150 190
70 174 278 382 486
110 278 446 614 782
# Matrix of size [3,5] [0x185ee70 data= 0x18655c0]
> -- the performed operation is
> = input * weights('w1'):transpose()
30 70 110 150 190
70 174 278 382 486
110 278 446 614 782
# Matrix of size [3,5] [0x1869f50 data= 0x1645e60]
In case of very sparse inputs, it is possible to replace the input matrix by a tokens.sparse_matrix, which improves the efficiency of the operation. Conversion of matrices into tokens and of tokens into matrices is performed automatically.
> -- a matrix with two rows:
> -- first row: active components are the 3 with 1, and the 2 with 0.5
> -- second row: active components are the 1 with 0.3
> dense_input = matrix(2,4):zeros():set(1,3,1):set(1,2,0.5):set(2,1,0.3)
> sparse_input = matrix.sparse( dense_input )
> = sparse_input
0 0.5 1 0
0.3 0 0 0
# SparseMatrix of size [2,4] in csr [0x17deaa0 data= 0x17864b0 0x17c9540 0x167cdb0], 3 non-zeros
> output = c:forward(sparse_input)
> = output
4 10 16 22 28
0.3 1.5 2.7 3.9 5.1
# Matrix of size [2,5] [0x18612d0 data= 0x17fb8a0]
> -- which is equivalent to the following
> output = c:forward(dense_input)
> = output
4 10 16 22 28
0.3 1.5 2.7 3.9 5.1
# Matrix of size [2,5] [0x185ee70 data= 0x1636e60]
ann.components.hyperplane{ ... }
The class ann.components.hyperplane is a wrapper around a bias and a dot_product component, implementing a hyperplane separator. The constructor receives:
- name: an optional string with the component name.
- dot_product: an optional string with the dot_product component name.
- bias: an optional string with the bias component name.
- dot_product_weights: an optional string with the dot_product component weights name.
- bias_weights: an optional string with the bias component weights name.
- input: a number with the input size.
- output: a number with the output size.
- transpose=false: a boolean indicating if the dot_product weights will be transposed in the operation.
> c = ann.components.hyperplane{ dot_product_weights='w1', bias_weights='b1',
input=128, output=256 }
> _,weights = c:build()
> for name,w in pairs(weights) do print(name) print(w) end
w1
Large matrix, not printed to display
# Matrix of size [256,128] [0x185ee70 data= 0x16ae840]
b1
Large matrix, not printed to display
# Matrix of size [256,1] [0x1869120 data= 0x165d540]
ann.components.actf.logistic()
ann.components.actf.log_logistic()
ann.components.actf.softmax()
ann.components.actf.log_softmax()
ann.components.actf.tanh()
ann.components.actf.hardtanh()
ann.components.actf.relu()
ann.components.actf.softplus()
ann.components.actf.sin()
ann.components.stack()
> ann.components.reset_id_counters() -- reset ID name generator
> mlp = ann.components.stack()
> mlp:push( ann.components.hyperplane{ input=100, output=200 } )
> mlp:push( ann.components.actf.logistic() )
> mlp:push( ann.components.hyperplane{ input=200, output=40 } )
> mlp:push( ann.components.actf.log_softmax() )
> _,weights = mlp:build()
> for name,w in pairs(weights) do print(name) print(w) end
w0
Large matrix, not printed to display
# Matrix of size [200,100] [0x1863df0 data= 0x1668030]
w2
Large matrix, not printed to display
# Matrix of size [40,200] [0x186bfd0 data= 0x17c71b0]
b1
Large matrix, not printed to display
# Matrix of size [200,1] [0x186aee0 data= 0x18159f0]
b3
Large matrix, not printed to display
# Matrix of size [40,1] [0x186d6d0 data= 0x175c910]
ann.components.join()
ann.components.dropout()
> c = ann.components.dropout{ random=random(3284), prob=0.5, value=0.0 }
ann.components.select()
ann.components.slice()
ann.components.gaussian_noise{ random, prob, var, mean }
ann.components.salt_and_pepper{ random, prob, zero, one }
These components are used to build Convolutional Neural Networks. If you use dataset.matrix, your patterns will be flattened and converted into a one-dimensional matrix. This forces you to add a rewrap component at the beginning of your ANN. An example of a full CNN for the MNIST task (28x28 pixels, images of digits) follows:
-- tables for the CNN configuration
ishape = {1, 28, 28} -- for input matrix rewrapping
conv1 = {1, 5, 5} nconv1=20
maxp1 = {1, 2, 2}
conv2 = {nconv1, 5, 5,} nconv2=50
maxp2 = {1, 2, 2}
hidden = 500
thenet = ann.components.stack():
push( ann.components.rewrap{ size=ishape } ):
push( ann.components.convolution{ kernel=conv1, n=nconv1 } ):
push( ann.components.convolution_bias{ n=nconv1, ndims=#conv1 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp1,} ):
push( ann.components.convolution{ kernel=conv2, n=nconv2 } ):
push( ann.components.convolution_bias{ n=nconv2, ndims=#conv2 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp2 } ):
push( ann.components.flatten() )
-- using the method precompute_output_size, it is possible to know
-- the size of the convolution after the flatten operation
local conv_size = thenet:precompute_output_size()[1]
thenet:
push( ann.components.hyperplane{ input=conv_size, output=hidden } ):
push( ann.components.actf.tanh() ):
push( ann.components.hyperplane{ input=hidden, output= 10 } ):
push( ann.components.actf.log_softmax() )
ann.components.convolution{ kernel, step, n, name, weights, ... }
A convolutional component could be created as:
> c = ann.components.convolution{ kernel={3, 5, 5}, step={1, 1, 1}, n=10,
name="conv-W1", weights="W1",
input_planes_dim=1 }
This component executes a convolution using the given kernel sizes, moving the convolution window following the step table, and using n different kernels. This module has a dynamic input/output size; the convolution is performed over all the input following the indicated parameters.
- input_planes_dim: a number (optional, by default 1) which indicates the dimension K of the input matrix where the input planes are located.
- kernel: a table which describes the size of each kernel. The K-th element of this table is always the number of PLANES of the input matrix. Therefore, a kernel over a 1-dim signal will be like kernel={1, 5} with K=1. For a 2D image it will be kernel={1, 5, 5}; for a 2D image with RGB color it will be kernel={3, 5, 5} if K=1, otherwise it could be kernel={5, 3, 5} if K=2 or kernel={5, 5, 3} if K=3. For an RGB video sequence the kernel will be kernel={3, 5, 5, 5} for K=1, and so on.
- step: a table which indicates how to move the kernel. The number of steps at each dimension will be (input_dim[i] - kernel[i])/step[i] + 1. The K-th element of this table is forced to be 1, since that is the dimension of the planes in the input matrix. The step is optional; by default all its elements are 1.
- n: the number of kernels to be applied. It is the number of output planes produced by this component (number of neurons).
- name and weights: the strings used to look up components and connection objects.
The output produced by this component will be of size:
- output_size[1]=n
- output_size[i+1]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=1,...,input_planes_dim-1
- output_size[i]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=input_planes_dim+1,...,#kernel
By default, input_planes_dim=1, so the output size simplifies to:
- output_size[1]=n
- output_size[i]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=2,...,#kernel
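For the default case (input_planes_dim=1), the output size formula can be evaluated with a small plain Lua helper. This is a sketch under stated assumptions: convolution_output_size is a hypothetical function, not part of APRIL-ANN, and rounding the division down with math.floor is an assumption for steps that do not divide evenly.

```lua
-- Plain Lua sketch of the default case (input_planes_dim = 1); hypothetical helper.
local function convolution_output_size(input_size, kernel, n, step)
  local out = { n } -- first dimension: number of kernels (output planes)
  for i = 2, #kernel do
    local s = step and step[i] or 1 -- the step defaults to 1 in every dimension
    -- assumption: a non-integer quotient is rounded down
    out[i] = math.floor((input_size[i] - kernel[i]) / s) + 1
  end
  return out
end

-- first convolution of the MNIST example: 1x28x28 input, 20 kernels of 1x5x5
local out = convolution_output_size({1, 28, 28}, {1, 5, 5}, 20)
print(table.concat(out, "x")) -- 20x24x24
```

In the toolkit itself, the precompute_output_size method described earlier plays this role for a whole stack of components.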
ann.components.convolution_bias{ n, ndims, name, weights }
> c = ann.components.convolution_bias{ n=10, ndims=3,
name="conv-B1", weights="B1" }
- n: the number of planes at the input (the first dimension size of the input matrix).
- ndims: the number of dimensions expected at the input matrix.
- name and weights: as usual.
ann.components.max_pooling{ kernel, name }
> c = ann.components.max_pooling{ kernel={1, 2, 2}, name="pool-2" }
- kernel: a table with the sizes of the kernel applied to the input matrix. Depending on it, the max-pooling can either down-sample the input matrix (as in the example) or convert the input into a fixed-size feature vector (kernel = {1, 0, 0}). A 0 value at one position means that this dimension is fitted to the same dimension of the input matrix. So, the last example {1, 0, 0} will be a max-pooling computed over all positions of each input plane, producing as output a feature vector of INPUT PLANES size.
- name: as usual.
ann.components.flatten{ [name] }
This component converts an input matrix formed by N patterns of any dimensionality into a two-dimensional output matrix with N rows and M columns, where M is the product of all the input matrix dimensions (except the first one, which is the number of patterns).
> c = ann.components.flatten{ name="flatten" }
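The shape produced by flattening follows directly from that description. A minimal plain Lua sketch (flatten_shape is a hypothetical helper, and the example shape is made up for illustration):

```lua
-- Plain Lua sketch: shape produced by flattening, given the input shape.
local function flatten_shape(shape)
  local m = 1
  -- multiply every dimension except the first one (the number of patterns)
  for i = 2, #shape do
    m = m * shape[i]
  end
  return { shape[1], m }
end

local s = flatten_shape({32, 50, 4, 4}) -- 32 patterns, each of shape 50x4x4
print(s[1], s[2]) -- 32  800
```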
ann.components.copy