Paco Zamora Martinez edited this page Oct 15, 2013 · 23 revisions

Several packages contain neural network utilities: require("aprilann.ann"), require("aprilann.ann.loss"), require("aprilann.ann.optimizer"), require("aprilann.trainable").

This page describes the utilities to build and train ANNs. It has four main sections: a description of ANN concepts in April-ANN, the easy building procedure for MLPs, the training helpers, and finally the full description of the aprilann.ann package.

ANN components

Inspired by other toolkits (such as Torch 7 or pyBrain), ANNs are described as a composition of blocks called ANN components, so each component is a neural network itself. A list of all available components is shown by executing:

april_help("ann.components")

The composition procedure will be explained later. An ANN component is identified by a name string (which is generated automatically if not given). The name must be unique. Some components contain weights in their core, which are estimated by the gradient descent algorithm (backpropagation). Connection weights objects are identified by a weights name parameter, which can be reused. If two components have the same weights name, then they share the same connections object.

All components have an input and output size, which define the number of weights (if needed) and the fan-in/fan-out of the component. Components need to be built (build method) once they are constructed. The build procedure allocates memory for connections and checks the input/output sizes of components.

A more detailed description is available at april_help, but don't be afraid: the next section presents an abstraction for training MLPs which automatically does a lot of this work:

april_help("ann.components.base")
april_help("ann.components.base.build")

The easy way: all-all MLP

The simplest kind of ANN is a Multilayer Perceptron (MLP) where each layer is fully connected to the next layer (feed-forward, all-all connections).

Building the MLP: ann.mlp.all_all.generate

The generate method returns a special component object, which cannot be modified. Actually, it is a Lua table formed by an ann.components.stack instance and other information useful to load and save the MLPs, and it implements Lua wrapper functions for the ANN component methods.

-- creates an ANN component for a MLP with the given description
thenet = ann.mlp.all_all.generate("256 inputs 128 tanh 10 log_softmax")

-- creates an instance of a trainer object for previous ANN component,
-- using the multi-class cross-entropy loss function (for 10 output units),
-- and using a bunch_size of 32. Loss function and bunch_size are optional.
trainer = trainable.supervised_trainer(thenet,
				       ann.loss.multi_class_cross_entropy(10),
				       32,
					   -- this last parameter is optional, by default it is
					   -- SGD => Stochastic Gradient Descent
					   ann.optimizer.sgd())

-- builds the component contained into trainer object
trainer:build()

-- initializes the weights randomly, using fan-in and fan-out
trainer:randomize_weights{
  random      = random(1234),
  inf         = -0.1,
  sup         =  0.1,
  use_fanin   = true,
  use_fanout  = true,
}
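
To make the relation between the description string and the number of weights concrete, the following is a minimal pure Lua sketch. The helpers parse_description and count_parameters are hypothetical names introduced only for illustration (they are not part of April-ANN), and the sketch assumes that every layer is fully connected to the next one and has one bias per neuron:

```lua
-- Hypothetical helper (NOT part of April-ANN): splits a description such as
-- "256 inputs 128 tanh 10 log_softmax" into a list of { size, kind } layers.
local function parse_description(description)
  local layers = {}
  for size, kind in description:gmatch("(%d+)%s+([%w_]+)") do
    table.insert(layers, { size = tonumber(size), kind = kind })
  end
  return layers
end

-- Counts the weights and biases of an all-all network, assuming one weight
-- per (input, output) pair and one bias per non-input neuron.
local function count_parameters(layers)
  local weights, biases = 0, 0
  for i = 2, #layers do
    weights = weights + layers[i - 1].size * layers[i].size
    biases  = biases + layers[i].size
  end
  return weights, biases
end

local layers = parse_description("256 inputs 128 tanh 10 log_softmax")
local weights, biases = count_parameters(layers)
print(#layers, weights, biases) -- 3 layers, 34048 weights, 138 biases
```

Under these assumptions, the two weight matrices hold 256*128 + 128*10 values and the two bias vectors hold 128 + 10 values.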

As said before, each component has a unique name, and if needed a weights name. The next code iterates over all components:

> for name,c in trainer:iterate_components() do print(name,c) end
actf1	instance 0x7fc3e94850a0 of ann.components.base
actf2	instance 0x7fc3e9485550 of ann.components.base
b1	instance 0x7fc3e9484f80 of ann.components.base
b2	instance 0x7fc3e9485410 of ann.components.base
c1	instance 0x7fc3e9484a10 of ann.components.base
layer1	instance 0x7fc3e9484e80 of ann.components.base
layer2	instance 0x7fc3e9485310 of ann.components.base
w1	instance 0x7fc3e9484ee0 of ann.components.base
w2	instance 0x7fc3e9485370 of ann.components.base

The MLP is composed of 9 components: two activation functions (actf1 and actf2), two bias components (b1 and b2), one stack component which works as a container (c1), two hyperplane components, each containing one bias and one dot_product (layer1 and layer2), and finally two dot_product components (w1 and w2) which contain the weight matrices.

It is also possible to iterate over all weights names:

> for name,connections in trainer:iterate_weights() do print(name,connections) end
b1	instance 0x7f8563c11630 of ann.connections
b2	instance 0x7f8563c120c0 of ann.connections
w1	instance 0x7f8563c11500 of ann.connections
w2	instance 0x7f8563c111a0 of ann.connections

So, our MLP contains two bias vectors (b1 and b2, corresponding to the b1 and b2 components), and two weight matrices (w1 and w2, corresponding to the w1 and w2 components). All MLPs generated automatically assign these names to their components and weights.

Once the component is built by using a trainer instance, the trainer exposes two interesting methods: trainer:component(COMPONENT_NAME_STRING), which returns the component given its name, and trainer:weights(WEIGHTS_NAME_STRING), which returns the connection weights object given its weights_name attribute.

More information about trainable.supervised_trainer is available by executing:

april_help("trainable.supervised_trainer")

Load and save

Two save/load schemes are implemented for all-all MLPs. The first is related to the all-all component (generated through the function ann.mlp.all_all.generate). The second is related to the trainable.supervised_trainer object, and will be detailed in the following sections.

All-All component save and load: ann.mlp.all_all.load and ann.mlp.all_all.save

These two functions store to a file and load from a file the component generated via the ann.mlp.all_all.generate function. They only work with this kind of object. The save function has the precondition of a built component. The load function loads the weights and returns a built component.

-- saves weights using binary option and also keep weights
-- of previous iteration (for momentum term)
ann.mlp.all_all.save(thenet, "net_filename.net", "binary")
-- saves weights using ascii option
ann.mlp.all_all.save(thenet, "net_filename.net", "ascii")

-- loads weights from a filename, and returns a built component
thenet = ann.mlp.all_all.load("net_filename.net")

-- in any case, it is possible to instantiate a trainer, with MSE loss function
-- asking the component for the number of output units, and with 32 bunch_size
-- parameter
trainer = trainable.supervised_trainer(thenet,
                                       ann.loss.mse(thenet:get_output_size()),
				       32)

Save and load via trainable.supervised_trainer

Save and load via trainable write to disk the model, weights, loss function, and bunch size (note that this list could grow in the future). The object must be built before saving, and load returns a built trainable object:

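-- thenet can be any ANN component (even one generated by ann.mlp.all_all);
-- loss_function and bunch_size are placeholders for your own values
thenet  = ann.mlp.all_all.generate("256 inputs 128 tanh 10 log_softmax")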
trainer = trainable.supervised_trainer(thenet, loss_function, bunch_size)
trainer:build()

-- save method
trainer:save("net_filename.net", "binary")

-- load method; the loss function, bunch_size and optimizer can optionally be
-- overwritten. If not given, the load method uses the objects saved in the
-- file.
trainer = trainable.supervised_trainer.load("net_filename.net")

Loss functions: ann.loss

The loss function is used to train the ANNs via the gradient descent algorithm. Trainer objects need an instance of a loss function to perform training, which is a very useful abstraction of standard training procedures.

Detailed information about loss functions is in:

april_help("ann.loss")

The loss function can be set in the trainer constructor, or using the method set_loss_function:

trainer:set_loss_function(ann.loss.mse())

Four error functions are implemented: mean squared error (MSE), mean absolute error (MAE), two-class cross-entropy, and multi-class cross-entropy. Note that the cross-entropy functions are specialized for log_logistic or log_softmax output activation functions. Almost all the constructors accept a SIZE=0 parameter, which means that the layer has a dynamic size:

  • ann.loss.mse(SIZE) returns an instance of the Mean Squared Error function for SIZE neurons. It is a quadratic loss function.

  • ann.loss.mae(SIZE) returns an instance of the Mean Absolute Error function, for SIZE neurons. It is not a quadratic loss function.

  • ann.loss.cross_entropy(SIZE) returns an instance of the two-class cross-entropy. It only works with log_logistic output activation function. It is based on Kullback-Leibler divergence.

  • ann.loss.multi_class_cross_entropy(SIZE) returns an instance of the multi-class cross-entropy. The parameter must be SIZE>2, so for two-class problems only one output unit with cross-entropy is needed. It only works with log_logistic or log_softmax output activation function (it is better to use log_softmax). It is based on Kullback-Leibler divergence.
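
As an illustration of what these loss functions measure, here is a minimal pure Lua sketch over a single pattern. It does not use April-ANN: the function names are hypothetical, and the scaling (a plain mean for MSE and MAE, a plain sum for the cross-entropy) is an assumption which may differ from the library implementation. The cross-entropy sketch assumes that the network outputs are log-probabilities, as produced by a log_softmax activation:

```lua
-- Mean squared error of one pattern (plain mean; scaling is an assumption)
local function mse(output, target)
  local sum = 0
  for i = 1, #output do
    local diff = output[i] - target[i]
    sum = sum + diff * diff
  end
  return sum / #output
end

-- Mean absolute error of one pattern
local function mae(output, target)
  local sum = 0
  for i = 1, #output do sum = sum + math.abs(output[i] - target[i]) end
  return sum / #output
end

-- Numerically stable log-softmax of a vector of activations
local function log_softmax(activations)
  local max = -math.huge
  for _, a in ipairs(activations) do if a > max then max = a end end
  local sum = 0
  for _, a in ipairs(activations) do sum = sum + math.exp(a - max) end
  local log_sum = max + math.log(sum)
  local out = {}
  for i, a in ipairs(activations) do out[i] = a - log_sum end
  return out
end

-- Multi-class cross-entropy given log-probabilities and a target distribution
local function multi_class_cross_entropy(log_probs, target)
  local loss = 0
  for i = 1, #log_probs do loss = loss - target[i] * log_probs[i] end
  return loss
end

local log_probs = log_softmax({ 0, 0, 0 })
print(mse({ 1, 2 }, { 1, 4 }), mae({ 1, 2 }, { 1, 4 }),
      multi_class_cross_entropy(log_probs, { 0, 1, 0 }))
```

With three equal activations every class gets probability 1/3, so the cross-entropy of the example is log(3).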

ann.optimizer

The optimizer is an object which implements the learning algorithm. Every class in ann.optimizer is an optimizer. Several learning hyperparameters are available, depending on the selected optimizer. These learning hyperparameters are known as options, and can be set globally (for all the connection weight layers of the ANN) or layerwise (for a particular connection weights object, identified by its name). Optimizers implement the following API:

  • other = optimizer:clone(): returns a deep copy of the caller object.

  • value = optimizer:get_option(name): returns the global value of a given learning option name.

  • optimizer:set_option(name, value): sets the global value of a given learning option name.

  • optimizer:set_layerwise_option(layer_name, option_name, value): sets a layerwise option.

  • value = optimizer:get_layerwise_option(layer_name, option_name): returns the layerwise option of the given layer.

  • value = optimizer:get_option_of(layer_name, option_name): returns the option which is applicable to the given layer_name. If a layerwise option was previously defined, the method returns its value. Otherwise, the value of the global option will be returned.
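
The fallback from layerwise to global options described for get_option_of can be sketched in pure Lua as follows. This is only an illustration of the lookup rule under the assumption that options are stored in plain tables; the functions below are hypothetical stand-ins and not the April-ANN implementation:

```lua
-- Hypothetical options store: one table of global values and one table of
-- per-layer overrides (NOT the April-ANN implementation).
local function new_options()
  return { global = {}, layerwise = {} }
end

local function set_option(opts, name, value)
  opts.global[name] = value
end

local function set_layerwise_option(opts, layer_name, name, value)
  opts.layerwise[layer_name] = opts.layerwise[layer_name] or {}
  opts.layerwise[layer_name][name] = value
end

-- Returns the layerwise value if it was defined, otherwise the global value
local function get_option_of(opts, layer_name, name)
  local layer = opts.layerwise[layer_name]
  if layer and layer[name] ~= nil then return layer[name] end
  return opts.global[name]
end

local opts = new_options()
set_option(opts, "weight_decay", 1e-4)
set_layerwise_option(opts, "b1", "weight_decay", 0.0)
print(get_option_of(opts, "w1", "weight_decay")) -- global value applies
print(get_option_of(opts, "b1", "weight_decay")) -- layerwise value applies
```

The explicit comparison with nil matters because a layerwise value such as false or 0.0 must still override the global one.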

ann.optimizer.sgd

Currently only one optimizer is implemented. It trains the neural network following the Stochastic Gradient Descent algorithm. It incorporates regularization and momentum hyperparameters. Its options are:

  • learning_rate: the learning rate controls the portion of the gradient used to update the weights. This value is smoothed depending on the bunch_size and on the number K of times that a weight connections object is shared between different components. The smoothed value is learning_rate/sqrt(bunch_size+K).

  • momentum: an inertial hyperparameter which applies a portion of the weight update of the previous iteration.

  • weight_decay: an L2 regularization term.

  • max_norm_penalty: a constraint penalty based on the two-norm of the weights.

The algorithm uses the following learning rule:

w = (1 - weight_decay)*w' + momentum*(w' - w'') - lr'*grad(L)/grad(w')

where w, w' and w'' are the weight values at the next, current, and previous iterations; lr' is the learning_rate smoothed by the square root described above, and grad(L)/grad(w') is the gradient of the loss function with respect to the given weight.
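
The learning rule can be sketched for a single weight in pure Lua. This is a minimal illustration, not the April-ANN code: the function names and the hyperparameter values are hypothetical, max_norm_penalty is left out, and the sketch subtracts the gradient term, which is the descent direction and is an assumption of this sketch:

```lua
-- Smoothed learning rate: learning_rate/sqrt(bunch_size+K)
local function smoothed_learning_rate(learning_rate, bunch_size, K)
  return learning_rate / math.sqrt(bunch_size + K)
end

-- One update of a single weight.
--   w_cur  is w'  (current iteration)
--   w_prev is w'' (previous iteration)
--   grad   is the loss gradient with respect to w'
local function sgd_update(w_cur, w_prev, grad, opts)
  local lr = smoothed_learning_rate(opts.learning_rate, opts.bunch_size, opts.K)
  return (1 - opts.weight_decay) * w_cur
       + opts.momentum * (w_cur - w_prev)
       - lr * grad -- ASSUMPTION: descent step, the gradient is subtracted
end

-- Hypothetical values chosen so that the smoothed learning rate is 0.1
local opts = {
  learning_rate = 0.4, bunch_size = 15, K = 1,
  momentum = 0.1, weight_decay = 0.0,
}
local w_next = sgd_update(1.0, 0.5, 2.0, opts)
print(w_next) -- 1.0 + 0.1*(1.0 - 0.5) - 0.1*2.0, approximately 0.85
```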

Trainer set and get of hyperparameters

The hyperparameters of optimizer objects can be modified through the trainer object:

  • trainer:set_option(name,value): sets a global learning option value.

  • value=trainer:get_option(name): gets a global learning option value.

  • trainer:set_layerwise_option(layer_name_match,option_name,value): sets a layerwise learning option value of all the connection weight objects whose name matches the given layer_name_match Lua pattern string.

  • value=trainer:get_option_of(layer_name,option_name): gets the option value applicable to the given layer.

Additionally, some ANN components have internal parameters which are configurable via trainer objects:

  • trainer:set_component_option(component_name_match,option_name,value): sets the option of the components whose name matches the given component_name_match Lua pattern string.

trainer:build()
trainer:set_option("learning_rate", number)
trainer:set_option("momentum", number)
trainer:set_option("weight_decay", number)
trainer:set_option("max_norm_penalty", number)
-- it is recommended not to apply regularization to bias connections
trainer:set_layerwise_option("b.*", "weight_decay", 0.0)
trainer:set_layerwise_option("b.*", "max_norm_penalty", -1.0)

-- for dropout (see dropout http://www.cs.toronto.edu/~nitish/msc_thesis.pdf)

-- dropout is a very special option: it modifies training, but it also modifies
-- the validation (or test) phase. It must also be applied carefully, so that
-- dropout is not applied at the output of your model. Dropout is applied to
-- activation_function_components. Code like this will help to avoid applying
-- it to the output activation function:
trainer:set_component_option("actf.*", "dropout_seed",  number)
trainer:set_component_option("actf.*", "dropout_factor", 0.5)
trainer:set_component_option(last_actf_name, "dropout_factor", 0.0)

Supervised trainer description

Training facilities and algorithms

NOTE that these functions receive a trainer prepared to train, so you must set it up properly using the set_option functions.

The trainable.supervised_trainer object implements a lot of methods to train ANNs automatically. See april_help("trainable.supervised_trainer") for more details.

Two training methods are implemented:

  • train_wo_validation: Trains an ANN without validation, for a minimum number of epochs and until the improvement in training is less than a given value. It receives a table and returns the BEST ann found during training:
best = trainer:train_wo_validation{
  min_epochs =   10,
  max_epochs = 1000,
  training_table = { input_dataset  = train_input_dataset,
                     output_dataset = train_output_dataset },
  percentage_stopping_criterion = 0.01, -- 1%
  update_function = function(t)
     -- table t = { current_epoch, train_error, train_improvement, train_params }
     printf("%d %f (%f) max epochs: %d\n",
            t.current_epoch, t.train_error,
            t.train_improvement, t.train_params.max_epochs)
     -- t.train_params is the table that you use to execute the
     -- train_wo_validation function
  end,
}
-- best is an instance of trainable.supervised_trainer
  • train_holdout_validation: Trains an ANN object using a training partition and a validation partition. Training is performed for a minimum number of epochs and until a certain stopping criterion is met over the validation partition. It receives a table and returns another table:
training_data = {
  input_dataset  = training_input_dataset,
  output_dataset = training_output_dataset,
  shuffle        = random(SEED), -- SEED is a number
  replacement    = nil,          -- if needed
}
validation_data = {
  input_dataset  = validation_input_dataset,
  output_dataset = validation_output_dataset
}
result = trainer:train_holdout_validation{
  training_table     = training_data,
  validation_table   = validation_data,
  min_epochs         = 4,
  max_epochs         = 1000,
  stopping_criterion = trainable.stopping_criteria.make_max_epochs_wo_imp_relative(2), -- explained below
  update_function    =
  function(t) printf("%4d %.6f %.6f (%4d %.6f) max epochs: %d\n",
                     t.current_epoch,
                     t.train_error,
                     t.validation_error,
                     t.best_epoch,
                     t.best_val_error,
                     t.train_params.max_epochs)
     -- t.train_params is the table that you use to execute the
     -- train_holdout_validation function
  end,
  validation_function = function(thenet, val_table)
                          -- by default is this. IT IS AN OPTIONAL FUNCTION
                          return thenet:validate_dataset(val_table)
                        end
}
print(result.best, result.best_val_error, result.best_epoch,
      result.last_train_error, result.last_val_error, result.last_epoch)
-- result.best is an instance of trainable.supervised_trainer

Custom training and validation functions

The previous methods allow the definition of custom training and validation functions. The parameters table accepts the following fields:

  • training_function(trainer,training_table): it is a Lua function which receives as parameters the trainer and the field training_table. Nevertheless, you could simply ignore the function parameters by implementing a closure which uses your own training data (it is recommended not to ignore the trainer parameter).
  • validation_function(trainer, validation_table): it is a Lua function which receives as parameters the trainer and the field validation_table. As before, you could implement a closure and use your own validation data, but it is better not to ignore the trainer parameter.

By default, the training and validation functions are trainable.supervised_trainer.train_dataset and trainable.supervised_trainer.validate_dataset respectively. The next example shows how to develop sequential training and validation functions over datasets.

training_function = function(trainer, tr_table)
  -- ANNs work over dataset.token, we need a wrapper to convert dataset.matrix
  -- into dataset.token
  local input_dataset  = dataset.token.wrapper(tr_table.input_dataset)
  local output_dataset = dataset.token.wrapper(tr_table.output_dataset)
  local bunch_size     = tr_table.bunch_size or trainer.bunch_size or 32
  local nump           = input_dataset:numPatterns()
  trainer.loss_function:reset()
  for i=1,input_dataset:numPatterns(),bunch_size do
    local last = math.min(i+bunch_size-1, nump)
    local bunch_indexes = {}
    for j=i,last do table.insert(bunch_indexes, j) end
    -- two bunches of patterns
    local input_bunch  = input_dataset:getPatternBunch(bunch_indexes)
    local output_bunch = output_dataset:getPatternBunch(bunch_indexes)
    -- we use the trainer method train_step
    trainer:train_step(input_bunch, output_bunch)
    -- It is better to collect garbage every K patterns (i starts at 1 and
    -- advances by bunch_size, so the remainder is compared with 1)
    if i%100 == 1 then collectgarbage("collect") end
  end
  collectgarbage("collect")
  -- it is important to return the LOSS of the epoch
  return trainer.loss_function:get_accum_loss()
end

validation_function = function(trainer, va_table)
  -- ANNs work over dataset.token, we need a wrapper to convert dataset.matrix
  -- into dataset.token
  local input_dataset  = dataset.token.wrapper(va_table.input_dataset)
  local output_dataset = dataset.token.wrapper(va_table.output_dataset)
  local bunch_size     = va_table.bunch_size or trainer.bunch_size or 32
  local nump           = input_dataset:numPatterns()
  trainer.loss_function:reset()
  for i=1,input_dataset:numPatterns(),bunch_size do
    local last = math.min(i+bunch_size-1, nump)
    local bunch_indexes = {}
    for j=i,last do table.insert(bunch_indexes, j) end
    -- two bunches of patterns
    local input_bunch  = input_dataset:getPatternBunch(bunch_indexes)
    local output_bunch = output_dataset:getPatternBunch(bunch_indexes)
    -- we use the trainer method validate_step
    trainer:validate_step(input_bunch, output_bunch)
    -- It is better to collect garbage every K patterns
    if i%100 == 1 then collectgarbage("collect") end
  end
  collectgarbage("collect")
  -- it is important to return the LOSS of the epoch
  return trainer.loss_function:get_accum_loss()
end

result = trainer:train_holdout_validation{
  ...
  training_function = training_function,
  validation_function = validation_function,
  ...
}
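
The way the loops above split a dataset into bunches of pattern indexes can be checked in isolation with plain Lua. The helper make_bunches is a hypothetical name used only for this sketch; it reproduces the index arithmetic of the loops and nothing else:

```lua
-- Splits the pattern indexes 1..nump into consecutive bunches of at most
-- bunch_size indexes; the last bunch may be smaller.
local function make_bunches(nump, bunch_size)
  local bunches = {}
  for i = 1, nump, bunch_size do
    local last = math.min(i + bunch_size - 1, nump)
    local bunch_indexes = {}
    for j = i, last do table.insert(bunch_indexes, j) end
    table.insert(bunches, bunch_indexes)
  end
  return bunches
end

local bunches = make_bunches(10, 4)
for _, bunch in ipairs(bunches) do
  print(table.concat(bunch, " "))
end
-- prints three lines: "1 2 3 4", "5 6 7 8" and "9 10"
```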

More sophisticated functions can be developed if you replace train_step and validate_step with your own functions. If you want to do custom development, please first read carefully the ANNs from scratch documentation and the packages/ann/trainable/trainable.lua script.

One easy possibility is to use a different loss function in validation, for example computing the classification error by using the method use_dataset:

validation_function = function(trainer, va_table)
  local hyp_dataset = trainer:use_dataset{ input_dataset = va_table.input_dataset }
  local num_errors = 0
  for ipat,pat in hyp_dataset:patterns() do
    local _,hyp = table.max(pat)
    local _,tgt = table.max( va_table.output_dataset:getPattern(ipat) )
    if hyp ~= tgt then num_errors = num_errors + 1 end
  end
  return num_errors / va_table.input_dataset:numPatterns()
end
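
The same classification error can be sketched without April-ANN, using plain Lua tables in place of datasets. The helpers argmax and classification_error are hypothetical names for this illustration, and the sample data is made up:

```lua
-- Index of the largest value of a non-empty array
local function argmax(values)
  local best_index, best_value = 1, values[1]
  for i = 2, #values do
    if values[i] > best_value then
      best_index, best_value = i, values[i]
    end
  end
  return best_index
end

-- Fraction of patterns whose most active output differs from the target class
local function classification_error(hypotheses, targets)
  local num_errors = 0
  for i = 1, #hypotheses do
    if argmax(hypotheses[i]) ~= argmax(targets[i]) then
      num_errors = num_errors + 1
    end
  end
  return num_errors / #hypotheses
end

-- Made-up network outputs (log-probabilities) and one-hot targets
local hypotheses = { { -0.1, -2.3 }, { -1.2, -0.4 }, { -0.2, -1.8 } }
local targets    = { { 1, 0 },       { 1, 0 },       { 1, 0 } }
local err = classification_error(hypotheses, targets)
print(err) -- only the second pattern is misclassified, so err is 1/3
```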

Stopping criteria

For the holdout-validation scheme, there are two predefined stopping criteria, which are function builders (they return the function used as criterion):

  • trainable.stopping_criteria.make_max_epochs_wo_imp_absolute: receives a constant indicating the maximum number of epochs without improvement in validation. A typical value is between 10 and 20, depending on the task.

  • trainable.stopping_criteria.make_max_epochs_wo_imp_relative: receives a constant indicating the maximum value for current_epoch/best_epoch. A typical value for this is 2.

These two criteria can be used like this:

result = trainer:train_holdout_validation{
  ...
  stopping_criterion = trainable.stopping_criteria.make_max_epochs_wo_imp_relative(2),
  ...
}

You can also create your own stopping criterion, which is a function that receives a table:

result = trainer:train_holdout_validation{
  ann = thenet,
  ...
  stopping_criterion = function(t)
    -- t contains these fields:
    --   * current_epoch
    --   * best_epoch
    --   * best_val_error
    --   * train_error
    --   * validation_error
    --   * train_params
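    -- return true to stop training, using any criterion over the fields of
    -- the t table; for instance (the threshold is only illustrative):
    return t.current_epoch - t.best_epoch >= 10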
  end,
  ...
}
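
For reference, function builders that behave as the two predefined criteria are described could be sketched in pure Lua as follows. These are hypothetical re-implementations written for illustration, not the code of trainable.stopping_criteria; in particular, the exact comparison operators and the guard for best_epoch equal to zero are assumptions:

```lua
-- Stops when the number of epochs without improvement reaches max_epochs
local function make_max_epochs_wo_imp_absolute(max_epochs)
  return function(t)
    return (t.current_epoch - t.best_epoch) >= max_epochs
  end
end

-- Stops when current_epoch/best_epoch reaches the given ratio
local function make_max_epochs_wo_imp_relative(ratio)
  return function(t)
    -- ASSUMPTION: never stop while no best epoch has been recorded
    if t.best_epoch <= 0 then return false end
    return (t.current_epoch / t.best_epoch) >= ratio
  end
end

local absolute = make_max_epochs_wo_imp_absolute(10)
local relative = make_max_epochs_wo_imp_relative(2)
print(absolute{ current_epoch = 25, best_epoch = 20 }) -- false
print(absolute{ current_epoch = 30, best_epoch = 20 }) -- true
print(relative{ current_epoch = 30, best_epoch = 20 }) -- false
print(relative{ current_epoch = 40, best_epoch = 20 }) -- true
```

The relative criterion lets training run longer when the best epoch arrives late, whereas the absolute criterion always waits the same number of epochs.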

aprilann.ann package

ANNs are implemented as a composition of components which define the three main operations of an ANN: the forward step (compute outputs), the backprop step (gradient computation), and the update step (update the weights). All components are child classes of ann.components.base. See april_help("ann.components.base") for on-line documentation.

Two main remarks before continuing with the following sections. The components have two special properties:

  • name: is a string which identifies the component in a unique manner; two components are forbidden to share the same name.
  • weights_name: is a string which identifies the connections (weights or biases) of the component. This name can be shared by different components, which means that they share the same connections object.

Tokens and matrices

The components are integrated in Lua via the abstract class token, which has two specializations for ANNs:

  • tokens.matrix is a token which contains a matrix instance.

  • tokens.vector.sparse is a token which represents a sparse array.

Here we present the tokens.matrix abstraction, which could be constructed as follows:

> m = matrix.col_major(2,2,{1,2,3,4})
> t = tokens.matrix(m)
> print(t)
instance 0xc218b0 of tokens.matrix
> print(t:get_matrix())
1 2
3 4
# Matrix of size [2,2] in col_major [0x1450440 data= 0x13ebdb0]

For simplicity, any token instance has the method get_matrix() defined, which returns the underlying matrix, or nil if the given token is not a tokens.matrix instance.

NOTE that ANN components work with col_major matrices.

Components basis

All components define the following basic properties, which are tokens: input, output, error_input, and error_output. The basic methods to train the components are:

  • table,table,component = build(): this method reserves memory for weights and prepares the component to work with.
  • reset(): it releases all the tokens internally allocated (or given by Lua).
  • token=forward(token[, boolean]): it receives an input token and returns the output token.
  • token=backprop(token): it receives an error input token (gradient), and returns the output error token (gradient).
  • update(): updates internal weights and parameters of the component using the tokens given and produced at forward and backprop methods.

Combining these methods with loss functions, a component can be trained following this basic example. A linear component is trained to follow the OR function, for input=[0,1] and target output=[1]. By default the weights are not initialized, so they contain memory trash.

> o = ann.optimizer.sgd() -- the optimizer
> l = ann.loss.mse(1) -- MSE loss function
> -- a hyperplane component (explained later)
> c = ann.components.hyperplane{ input=2, output=1 }
> c:build() -- allocates memory for weights, and checks components integrity
> l:reset() -- set to zero all the things
> c:reset() -- set to zero all the things
> -- the true indicates training
> output_token=c:forward(tokens.matrix( matrix.col_major(1,2,{0,1})), true)
> print(output_token:get_matrix())
-6.61649e-31
# Matrix of size [1,1] in col_major [0xb01050 data= 0xad4a80]
> -- gradient with desired output 1
> output_error=c:backprop(l:gradient(output_token,
>>                        tokens.matrix(matrix.col_major(1,1,{1}))))
> print(output_error:get_matrix())
6.61649e-31 -4.5566e-41
# Matrix of size [1,2] in col_major [0xb01630 data= 0xad7bc0]
> grad = c:compute_gradients() -- computes the weight gradients
> o:execute(function() return grad,1,output_error end, c:copy_weights()) -- updates the weights
> output_token=c:forward(tokens.matrix( matrix.col_major(1,2,{0,1})))
> print(output_token:get_matrix()) -- the output is closer to 1
0.2
# Matrix of size [1,1] in col_major [0xb01ce0 data= 0xad97d0]
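
The forward, backprop and update sequence of the session above can be followed by hand with a pure Lua sketch of a single linear unit. It does not use April-ANN and makes several assumptions that are not taken from the library: the weights start at zero instead of memory trash, the loss is 0.5*(y - target)^2, the learning rate is 0.1, and no momentum or regularization is applied:

```lua
-- A linear unit with two inputs: y = w[1]*x[1] + w[2]*x[2] + b
local w, b = { 0.0, 0.0 }, 0.0 -- ASSUMPTION: zero initialization
local x, target = { 0, 1 }, 1  -- input and desired output of the example
local learning_rate = 0.1      -- hypothetical value

local function forward(input)
  return w[1] * input[1] + w[2] * input[2] + b
end

-- forward step
local before = forward(x)

-- gradient of the loss 0.5*(y - target)^2 with respect to the output y
local grad_output = before - target

-- backprop step: gradients with respect to the weights and the bias
local grad_w = { grad_output * x[1], grad_output * x[2] }
local grad_b = grad_output

-- update step: plain gradient descent
for i = 1, #w do w[i] = w[i] - learning_rate * grad_w[i] end
b = b - learning_rate * grad_b

-- after one update the output is closer to the target
local after = forward(x)
print(before, after)
```

Only the weight attached to the active input and the bias change, because the gradient of a weight is the output gradient multiplied by its input.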

Methods common to all the components

Note that all matrices must be in col_major and have at least two dimensions. All computations are done in bunch mode (using mini-batches), and the size of the first dimension is the number of patterns contained in the bunch. The remaining dimensions must comply with the input constraints of the component. A lot of components work with linear inputs, so the input matrix will be two-dimensional, but some components work with multidimensional matrices. It is possible to use matrices with only one dimension, which will be reinterpreted as two-dimensional matrices with only one row, but it is better to always work with two-dimensional matrices.

Building procedure

Before doing anything, components can be composed together to build larger components. This procedure needs to call the build method at the end, to check the input/output sizes and reserve memory for weights and biases.

The c:build() call recursively executes the build method of all the components in the composition. This method returns two tables and the caller component:

> weights_table, components_table, caller_component = c:build()

The weights_table is indexed by each weights_name and contains a connections object (explained later), which is useful to initialize the value of the weights.

The components_table is indexed by each name (component name) and contains a reference to the component instance, which is useful to initialize hyperparameters and other settings in a component-wise manner.

The caller_component is the component c in this case, so this returned value can be ignored.

Back-propagation computation methods

  • token = c:forward( token [, boolean] ) receives a token and an optional boolean (by default false). The boolean indicates if this forward is during training or not, because some components have a special behavior during training. It returns a token with the output computation of the caller component.
  • token = c:backprop( token ) receives a token with the input error (gradient of each output neuron), and returns another token with the output error (gradient of each input neuron).
  • gradients = c:compute_gradients( gradients ) returns the weight gradients computed using the tokens given at forward and backprop methods.
  • c:reset() releases the retained tokens in forward and backprop steps.

Parameters get and set

  • c:set_option( name, value ) sets the option given its name string to the given value. Different components have different options, but the most important are: dropout_factor, dropout_seed. Not all components implement all of these options.
  • value = c:get_option( name ) returns the value assigned to the given option name.
  • boolean = c:has_option( name ) asks a component whether it implements the given option name.

Getters of produced and retained tokens

During forward and backprop steps the components compute outputs and error outputs (gradients), and retain the input and error input (gradients) tokens. Before calling the reset method, you can ask the component for its retained tokens:

  • token = c:get_input() returns the token given as input at forward method.
  • token = c:get_output() returns the token computed as output by forward method.
  • token = c:get_error_input() returns the token given as error input at backprop method.
  • token = c:get_error_output() returns the token computed as error output by backprop method.

Connection weights object: weights matrices and bias vectors

Components which require weights internally have an ann.connections instance. These objects are allocated by calling the build method of the components (or by using the build method of a trainer), and are identified by the weights_name property, so components with the same weights_name share the same connections object.

These objects are basically pure data (with minimal logic), and are defined by an OUTPUTxINPUT size (output rows, input columns), so:

  • Bias vectors: have INPUT=1 and OUTPUT=number of neurons.

  • Weight matrices: contain OUTPUTxINPUT weights.

Each of these objects complies with the following interface:

-- previous linear component example
c = ann.components.hyperplane{ input=2, output=1 }
weights_table = c:build()
rnd = random(1234) -- for weights random initialization
for _,cnn in pairs(weights_table) do
  -- randomize_weights initializes the weights following a uniform
  -- distribution in the range [inf, sup]
  cnn:randomize_weights{
    random = rnd,
    inf = -0.1,
    sup =  0.1,
  }
end

-- OTHER METHODS
-- cnn is a connection object in Lua
local cnn_clone = cnn:clone() -- returns a deep copy of cnn object
cnn:load{
  w = weights_matrix,            -- a weights matrix (in row major)
  oldw = another_weights_matrix, -- another weights matrix (in row major)
  first_pos = first_pos,         -- where the first weight is in the given matrix
  column_size = column_size,     -- the size of a column in the cnn object
                                 -- (internally, matrices are stored in column major)
}
local w,oldw,size = cnn:copy_to() -- copies the weights to matrices and return them
                                  -- and the number of weights
local size = cnn:size() -- number of weights in the object
local input_size = cnn:get_input_size()
local output_size = cnn:get_output_size()

local w,oldw = cnn:matrix() -- returns a reference to the internal matrices (in col_major)
-- of the connections object. BE CAREFUL, any change in these matrices directly modifies the
-- weights of your ANN components

-- method to_lua_string() returns a string which contains the Lua instructions necessary to
-- construct the caller connections object
print(cnn:to_lua_string())

Connections are stored internally in column major, but externally they are viewed as row major. Therefore, the loaded and returned weights matrices have this format:

w(i1,o1)  w(i2,o1)  w(i3,o1)  ...
w(i1,o2)  w(i2,o2)  w(i3,o2)  ...
...       ...       ...

where w(a,b) is the weight which connects input a with output b. Be sure that your matrices have this format.
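
A minimal pure Lua sketch of this layout, assuming that the matrix is flattened row by row into a 1-based Lua array (the helper names and the 1-based positions are assumptions made for illustration, not the April-ANN storage code):

```lua
-- Position of w(input_index, output_index) in a row-major OUTPUTxINPUT
-- matrix flattened into a 1-based array: each row holds all the inputs
-- of one output.
local function weight_position(input_index, output_index, num_inputs)
  return (output_index - 1) * num_inputs + input_index
end

-- Output of a dot product layer for one input pattern, using that layout
local function dot_product(flat_weights, num_inputs, num_outputs, input)
  local output = {}
  for o = 1, num_outputs do
    local sum = 0
    for i = 1, num_inputs do
      sum = sum + flat_weights[weight_position(i, o, num_inputs)] * input[i]
    end
    output[o] = sum
  end
  return output
end

-- 3 inputs and 2 outputs:
--   w(i1,o1) w(i2,o1) w(i3,o1)  =  1 2 3
--   w(i1,o2) w(i2,o2) w(i3,o2)  =  4 5 6
local flat_weights = { 1, 2, 3, 4, 5, 6 }
local output = dot_product(flat_weights, 3, 2, { 1, 0, 1 })
print(flat_weights[weight_position(2, 2, 3)], output[1], output[2])
```

In the example, w(i2,o2) is the fifth stored value, and the input { 1, 0, 1 } produces 1+3 for the first output and 4+6 for the second.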

Save and load of components

The best way to save a component is by using an instance of trainable.supervised_trainer:

> trainer = trainable.supervised_trainer(c):save("ann.net", "binary")
> c = trainable.supervised_trainer.load("ann.net"):get_component()

However, it is possible to save the components on their own using the method to_lua_string(), which returns a Lua string with the composition necessary to construct the objects, and the method c:copy_weights(), which returns the same weights_table as the build method. The Lua string and the weights can be stored in a file and loaded afterwards.

The following functions implement this functionality:

  • ann.save(component, filename)
  • component = ann.load(filename)

Basically, these two functions are like the following code:

function save(c, filename)
  local f = io.open(filename, "w")
  f:write(string.format("return %s:build{ weights={\n %s\n}\n}\n",
                        c:to_lua_string(),
                        table.concat(
                          table.linearize(
                            table.map2(c:copy_weights(),
                                       function(k,v)
                                         return string.format("[%q] = %s",
                                                              k, v:to_lua_string())
                                       end)), ",\n")))
  f:close()
end

c = ann.components.hyperplane{ input=10, output=10 }
c:build()
save(c, "jaja.net")

-- The load is simple using dofile function. Note that the jaja.net file returns
-- build method outputs, which are three things: a table with connections, a
-- table with components, the caller component
_,_,c = dofile("jaja.net")
print(c)
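
The mechanism behind this save/load pair (write a Lua chunk that starts with return, then evaluate it to get the objects back) can be tried without April-ANN. The sketch below uses plain tables of numbers instead of connection objects; serialize and the sample weight names are hypothetical, and the chunk is evaluated from a string rather than from a file.

```lua
-- Plain-Lua sketch of the "chunk that returns its contents" idea.
-- serialize is a hypothetical helper, not an April-ANN function.
function serialize(weights)
  local names = {}
  for name in pairs(weights) do names[#names + 1] = name end
  table.sort(names) -- deterministic order of the entries
  local parts = {}
  for _, name in ipairs(names) do
    local values = {}
    for j, v in ipairs(weights[name]) do
      values[j] = string.format("%.17g", v)
    end
    parts[#parts + 1] = string.format("[%q] = { %s }", name,
                                      table.concat(values, ", "))
  end
  return "return {\n  " .. table.concat(parts, ",\n  ") .. "\n}\n"
end

chunk_text = serialize{ w1 = { 0.5, -1.25 }, b1 = { 0.0 } }
-- loadstring exists in Lua 5.1/LuaJIT, load accepts strings in Lua 5.2+
restored = assert((loadstring or load)(chunk_text))()
print(restored.w1[2])
```

Writing chunk_text to a file and reading it back with dofile would follow the same pattern as the save function above.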

Components list

Basic components

ann.components.base

ann.components.bias

ann.components.dot_product

ann.components.hyperplane

Container components

ann.components.join

ann.components.stack

Convolutional components

These components are used to build Convolutional Neural Networks. They work with input matrices in col_major order. If you use dataset.matrix, your patterns will be flattened and converted into a one-dimensional matrix. This forces you to add a rewrap component at the beginning of your ANN. Besides, the dimension ordering is reversed, so if your dataset.matrix is working with images of 20x30 pixels, you need to rewrap the images to 1x30x20 pixels (the first dimension is the number of planes). If you have an RGB color image, be sure that your row_major matrix is of 20x30x3, so your ANN rewraps it to 3x30x20 (having 3 input planes). The following is an example of a full CNN for the MNIST task (28x28 pixel images of digits):

-- tables for the CNN configuration
ishape  = {1, 28, 28} -- for input matrix rewrapping
conv1   = {1, 5, 5} nconv1=20
maxp1   = {1, 2, 2}
conv2   = {nconv1, 5, 5,} nconv2=50
maxp2   = {1, 2, 2}
hidden  = 500

-- sizes of each convolution component output
sz1 = { ishape[2] - conv1[2] + 1,      ishape[3] - conv1[3] + 1 }
sz2 = { math.floor(sz1[1]/maxp1[2]),   math.floor(sz1[2]/maxp1[3]) }
sz3 = { sz2[1] - conv2[2] + 1,         sz2[2] - conv2[3] + 1 }
sz4 = { math.floor(sz3[1]/maxp2[2]),   math.floor(sz3[2]/maxp2[3]) }

thenet = ann.components.stack():
push( ann.components.rewrap{ size=ishape } ):
push( ann.components.convolution{ kernel=conv1, n=nconv1 } ):
push( ann.components.convolution_bias{ n=nconv1, ndims=#conv1 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp1,} ):
push( ann.components.convolution{ kernel=conv2, n=nconv2 } ):
push( ann.components.convolution_bias{ n=nconv2, ndims=#conv2 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp2 } ):
push( ann.components.flatten() ):
push( ann.components.hyperplane{ input=sz4[1]*sz4[2]*nconv2, output=hidden } ):
push( ann.components.actf.tanh() ):
push( ann.components.hyperplane{ input=hidden, output= 10 } ):
push( ann.components.actf.log_softmax() )
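
As a sanity check of the size computations in this configuration, the following plain-Lua sketch traces the side length of one plane through the two convolution and max-pooling stages. conv_size and pool_size are hypothetical helpers that only restate the arithmetic of sz1 to sz4 (step 1 for convolutions, non-overlapping pooling windows); they are not April-ANN functions.

```lua
-- Hypothetical helpers restating the size arithmetic of the example
function conv_size(side, kernel) return side - kernel + 1 end        -- step 1
function pool_size(side, kernel) return math.floor(side / kernel) end

side = 28                  -- MNIST images are 28x28
side = conv_size(side, 5)  -- first convolution:  24
side = pool_size(side, 2)  -- first max-pooling:  12
side = conv_size(side, 5)  -- second convolution: 8
side = pool_size(side, 2)  -- second max-pooling: 4

-- the first hyperplane receives one value per position and per output plane
flat_inputs = side * side * 50
print(side, flat_inputs)
```

With nconv2=50 this gives 4*4*50 = 800 inputs for the first hyperplane component.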

ann.components.convolution

A convolutional component could be created as:

> c = ann.components.convolution{ kernel={3, 5, 5}, step={1, 1, 1}, n=10,
                                  name="conv-W1", weights="W1",
                                  input_planes_dim=1 }

This component executes a convolution using the given kernel sizes, moving the convolution window according to the step table, and using n different kernels. This module has a dynamic input/output size: the convolution is performed over the whole input following the indicated parameters.

  • input_planes_dim is a number (optional, 1 by default) which indicates the dimension K of the input matrix where the input planes are located.

  • kernel is a table which describes the size of each kernel. The K-th element of this table is always the number of PLANES at the input matrix. Therefore, a kernel over a 1-dim signal will be like kernel={1, 5} being K=1. For a 2D image it will be kernel={1, 5, 5}; for a 2D image with RGB color it will be kernel={3, 5, 5} if K=1, otherwise it could be kernel={5, 3, 5} if K=2 or kernel={5, 5, 3} if K=3. For an RGB video sequence the kernel will be kernel={3, 5, 5, 5} for K=1, and so on.

  • step is a table which indicates how to move the kernel. The number of steps at each dimension will be (input_dim[i] - kernel[i])/step[i] + 1. The K-th element of this table is forced to be 1, since that dimension indexes the planes of the input matrix. The step is optional; by default all its elements are 1.

  • n is the number of kernels to be applied. It is the number of output planes produced by this component (number of neurons).

  • name and weights are the strings used to look up components and connection objects.

The output size produced by this component will be:

  • output_size[1]=n

  • output_size[i+1]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=1,...,input_planes_dim-1

  • output_size[i]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=input_planes_dim+1,...,#kernel

By default, input_planes_dim=1, so the output size will be simplified as:

  • output_size[1]=n

  • output_size[i]=(input_size[i] - kernel[i])/step[i] + 1, FOR i=2,...,#kernel
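
The output size rules above can be written as a short plain-Lua function. This is only a sketch of the formulas: convolution_output_size is a hypothetical helper, not part of the ann package, and rounding down with math.floor when the division is not exact is an assumption.

```lua
-- Hypothetical helper mirroring the output size formulas; not an April-ANN function.
-- input_size and kernel are tables of the same length; step and
-- input_planes_dim are optional, as in the component constructor.
function convolution_output_size(input_size, kernel, n, step, input_planes_dim)
  local k = input_planes_dim or 1
  local out = { n } -- output_size[1] is always the number of kernels
  for i = 1, #kernel do
    if i ~= k then  -- the planes dimension is consumed by the kernel
      local s = step and step[i] or 1
      -- assumption: round down when the division is not exact
      out[#out + 1] = math.floor((input_size[i] - kernel[i]) / s) + 1
    end
  end
  return out
end

-- RGB image of 32x32 with planes in the first dimension (K=1)
a = convolution_output_size({ 3, 32, 32 }, { 3, 5, 5 }, 10)
-- same image with planes in the last dimension (K=3)
b = convolution_output_size({ 32, 32, 3 }, { 5, 5, 3 }, 10, nil, 3)
print(table.concat(a, "x"), table.concat(b, "x"))
```

Both calls describe the same convolution, so both produce 10 output planes of 28x28 positions.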

ann.components.convolution_bias

> c = ann.components.convolution_bias{ n=10, ndims=3,
                                       name="conv-B1", weights="B1" }
  • n is the number of planes at the input (the first dimension size of the input matrix).

  • ndims is the number of dimensions expected at the input matrix.

  • name and weights as usual

ann.components.max_pooling

> c = ann.components.max_pooling{ kernel={1, 2, 2}, name="pool-2" }
  • kernel is a table with the sizes of the kernel applied to the input matrix. Depending on its values, max-pooling either down-samples the input matrix (as in the example) or converts the input into a fixed-size feature vector (kernel = {1, 0, 0}). A 0 value in one position means that the kernel spans the whole corresponding dimension of the input matrix. So, the last example {1, 0, 0} is a max-pooling computed over all positions of each input plane, producing as output a feature vector whose size is the number of input planes.

  • name as usual.
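
The two behaviors of the kernel can be sketched for a single input plane in plain Lua. max_pool_plane is a hypothetical helper, not the April-ANN implementation; it assumes non-overlapping windows (the window moves by its own size), which matches the math.floor size computation used in the MNIST example, and it treats 0 as "span the whole dimension".

```lua
-- Hypothetical sketch of max-pooling over one plane (a table of rows).
-- kh and kw are the kernel sizes; 0 means the whole dimension.
-- Assumption: windows do not overlap.
function max_pool_plane(plane, kh, kw)
  local rows, cols = #plane, #plane[1]
  if kh == 0 then kh = rows end
  if kw == 0 then kw = cols end
  local out = {}
  for r = 1, math.floor(rows / kh) do
    out[r] = {}
    for c = 1, math.floor(cols / kw) do
      local best = -math.huge
      for dr = 1, kh do
        for dc = 1, kw do
          best = math.max(best, plane[(r - 1) * kh + dr][(c - 1) * kw + dc])
        end
      end
      out[r][c] = best
    end
  end
  return out
end

plane = { { 1, 5, 2, 0 },
          { 3, 4, 8, 1 },
          { 7, 0, 6, 2 },
          { 1, 9, 3, 3 } }
down = max_pool_plane(plane, 2, 2)       -- down-sampling: { {5, 8}, {9, 6} }
global_max = max_pool_plane(plane, 0, 0) -- one value per plane: { {9} }
```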

ann.components.flatten

This component converts an input matrix formed by N patterns of any dimensionality into a bidimensional output matrix with N rows and M columns, where M is the product of all the input matrix dimensions except the first one (which is the number of patterns).

> c = ann.components.flatten{ name="flatten" }
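
The resulting shape can be computed with a few lines of plain Lua. flattened_shape is a hypothetical helper used only to illustrate the N x M rule; it is not part of the ann package.

```lua
-- Hypothetical helper: shape of the matrix produced by flattening
function flattened_shape(dims)
  local m = 1
  for i = 2, #dims do m = m * dims[i] end -- skip the first dimension (patterns)
  return { dims[1], m }
end

-- 32 patterns, each with 50 planes of 4x4 positions
shape = flattened_shape{ 32, 50, 4, 4 }
print(shape[1], shape[2])
```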

Other components

ann.components.copy

ann.components.gaussian_noise

ann.components.salt_and_pepper
