Several packages contain the neural network functionality: require("aprilann.ann"), require("aprilann.ann.loss"), require("aprilann.ann.optimizer"), and require("aprilann.trainable").
This page describes the utilities to build and train ANNs. It has four main sections: a description of ANN concepts in April-ANN, the easy building procedure for MLPs, the training helpers, and finally the full description of the aprilann.ann package.
Inspired by other toolkits (such as Torch 7 or pyBrain), ANNs are described as a composition of blocks called ANN components, so one component is a neural network itself. A list of all available components is printed by executing:
april_help("ann.components")
Nevertheless, the composition procedure will be explained later. An ANN component is identified by a name string (which will be automatically generated if not given). The name must be unique. Some components contain weights in their core, which are estimated by a gradient descent algorithm (backpropagation). Connection weights objects are identified by a weights name parameter, which can be reused. If two components have the same weights name, they share the same connections object.
All components have an input and an output size, which define the number of weights (if needed) and the fan-in/fan-out of the component. Components need to be built (build method) once they are constructed. The build procedure allocates memory for connections and checks the input/output sizes of components.
A more detailed description is available through april_help, but don't be afraid: the next section presents an abstraction for training MLPs which automatically does a lot of this work:
april_help("ann.components.base")
april_help("ann.components.base.build")
The simplest kind of ANN is a Multilayer Perceptron (MLP), where each layer is fully connected with the next layer (feed-forward, all-all connections).
The method generate returns a special component object which cannot be modified. Actually, it is a Lua table formed by an ann.components.stack instance and other information useful to load and save the MLP, and it implements wrapper Lua functions to ANN component methods.
-- creates an ANN component for a MLP with the given description
thenet = ann.mlp.all_all.generate("256 inputs 128 tanh 10 log_softmax")
-- creates an instance of a trainer object for previous ANN component,
-- using the multi-class cross-entropy loss function (for 10 output units),
-- and using a bunch_size of 32. Loss function and bunch_size are optional.
trainer = trainable.supervised_trainer(thenet,
ann.loss.multi_class_cross_entropy(10),
32,
-- this last parameter is optional, by default is
-- SGD => Stochastic Gradient Descent
ann.optimizer.sgd())
-- builds the component contained into trainer object
trainer:build()
-- initializes the weights randomly, using fan-in and fan-out
trainer:randomize_weights{
random = random(1234),
inf = -0.1,
sup = 0.1,
use_fanin = true,
use_fanout = true,
}
As said before, each component has a unique name and, if needed, a weights name. The next code iterates over all components:
> for name,c in trainer:iterate_components() do print(name,c) end
actf1 instance 0x7fc3e94850a0 of ann.components.base
actf2 instance 0x7fc3e9485550 of ann.components.base
b1 instance 0x7fc3e9484f80 of ann.components.base
b2 instance 0x7fc3e9485410 of ann.components.base
c1 instance 0x7fc3e9484a10 of ann.components.base
layer1 instance 0x7fc3e9484e80 of ann.components.base
layer2 instance 0x7fc3e9485310 of ann.components.base
w1 instance 0x7fc3e9484ee0 of ann.components.base
w2 instance 0x7fc3e9485370 of ann.components.base
The MLP is composed of 9 components: two activation functions (actf1 and actf2), two bias components (b1 and b2), one stack component which works as a container (c1), two hyperplane components containing one bias and one dot_product each (layer1 and layer2), and finally two dot_product components (w1 and w2) which contain the weight matrices.
It is also possible to iterate over all weights names:
> for name,connections in trainer:iterate_weights() do print(name,connections) end
b1 instance 0x7f8563c11630 of ann.connections
b2 instance 0x7f8563c120c0 of ann.connections
w1 instance 0x7f8563c11500 of ann.connections
w2 instance 0x7f8563c111a0 of ann.connections
So, our MLP contains two bias vectors (b1 and b2, corresponding to the b1 and b2 components) and two weight matrices (w1 and w2, corresponding to the w1 and w2 components). All automatically generated MLPs assign these names to their components and weights.
Once the component is built by using a trainer instance, the trainer exposes two interesting methods: trainer:component(COMPONENT_NAME_STRING), which returns the component given its name, and trainer:weights(WEIGHTS_NAME_STRING), which returns the connection weights object given its weights_name attribute.
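As a minimal sketch (assuming the all-all MLP built above, whose component and weights names were listed by the iterators), these methods can be used like this:
layer1 = trainer:component("layer1") -- the first hyperplane component
w1     = trainer:weights("w1")       -- its connection weights object
print(w1:get_input_size(), w1:get_output_size(), w1:size())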
More info about trainable.supervised_trainer can be obtained doing:
april_help("trainable.supervised_trainer")
Two save/load schemes are implemented for all-all MLPs. The first is related to the all-all component (generated through the function ann.mlp.all_all.generate). The second is related to the trainable.supervised_trainer object and will be detailed in the following sections.
These two functions can store and load from a file the component generated via the ann.mlp.all_all.generate function. They only work with this kind of object. The save function has the precondition of a built component. The load function loads the weights and returns a built component.
-- saves weights using the binary option and also keeps the weights
-- of the previous iteration (for the momentum term)
ann.mlp.all_all.save(thenet, "net_filename.net", "binary")
-- saves weights using ascii option
ann.mlp.all_all.save(thenet, "net_filename.net", "ascii")
-- loads weights from a filename, and returns a built component
thenet = ann.mlp.all_all.load("net_filename.net")
-- in any case, it is possible to instantiate a trainer with the MSE loss function,
-- asking the component for its number of output units, and with a bunch_size
-- of 32
trainer = trainable.supervised_trainer(thenet,
ann.loss.mse(thenet:get_output_size()),
32)
Save and load via trainable write to disk the model, weights, loss function, and bunch size (note that this list could be larger in the future). The object must be in the built state before save, and load returns a built trainable object:
-- thenet is any ANN component (even an instance of ann.mlp.all_all)
trainer = trainable.supervised_trainer(thenet, loss_function, bunch_size)
trainer:build()
-- save method
trainer:save("net_filename.net", "binary")
-- load method; the loss function, bunch_size and optimizer can optionally be
-- overwritten. If not given, the load method uses the objects saved in the
-- file.
trainer = trainable.supervised_trainer.load("net_filename.net")
The loss function is used to train the ANNs via the gradient descent algorithm. Trainer objects need an instance of a loss function to perform training, being a very useful abstraction of standard training procedures.
Detailed information about loss functions is in:
april_help("ann.loss")
The loss function can be set in the trainer constructor, or using the method set_loss_function:
trainer:set_loss_function(ann.loss.mse())
Three main error functions are implemented: mean squared error (MSE), two-class cross-entropy, and multi-class cross-entropy. Note that cross-entropy-like functions are specialized for log_logistic or log_softmax output activation functions. Almost all the constructors accept a SIZE=0 parameter, which means that the layer has a dynamic size:
- ann.loss.mse(SIZE) returns an instance of the Mean Squared Error loss function for SIZE neurons. It is a quadratic loss function.
- ann.loss.mae(SIZE) returns an instance of the Mean Absolute Error loss function for SIZE neurons. It is not a quadratic loss function.
- ann.loss.cross_entropy(SIZE) returns an instance of the two-class cross-entropy. It only works with the log_logistic output activation function. It is based on the Kullback-Leibler divergence.
- ann.loss.multi_class_cross_entropy(SIZE) returns an instance of the multi-class cross-entropy. The parameter must be SIZE>2, so for two-class problems only one output unit with cross-entropy is needed. It only works with log_logistic or log_softmax output activation functions (it is better to use log_softmax). It is based on the Kullback-Leibler divergence.
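As a small sketch of the dynamic-size option mentioned above (assuming this constructor accepts the SIZE=0 argument, and reusing the trainer from the previous sections):
-- the loss takes its size from the network output at run time
loss = ann.loss.multi_class_cross_entropy(0)
trainer:set_loss_function(loss)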
The optimizer is an object which implements the learning algorithm. Every class in ann.optimizer is an optimizer. Several learning hyperparameters are available, depending on the selected optimizer. These learning hyperparameters are known as options, and can be set globally (to all the connection weight layers of the ANN) or layerwise (to a concrete connection weights object, identified by its name). Optimizers implement the following API:
- other = optimizer:clone(): returns a deep copy of the caller object.
- value = optimizer:get_option(name): returns the global value of a given learning option name.
- optimizer:set_option(name, value): sets the global value of a given learning option name.
- optimizer:set_layerwise_option(layer_name, option_name, value): sets a layerwise option.
- value = optimizer:get_layerwise_option(layer_name, option_name): returns the layerwise option of the given layer_name.
- value = optimizer:get_option_of(layer_name, option_name): returns the option which is applicable to the given layer_name. If a layerwise option was previously defined, the method returns its value. Otherwise, the value of the global option will be returned.
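The following sketch illustrates this API on an SGD optimizer (the option values are only illustrative, and the weight names "b1" and "w1" come from the all-all MLP shown before):
opt = ann.optimizer.sgd()
opt:set_option("learning_rate", 0.01)
opt:set_option("weight_decay", 0.001)
-- disable regularization for the bias connections named "b1"
opt:set_layerwise_option("b1", "weight_decay", 0.0)
print(opt:get_option("learning_rate"))                -- global value: 0.01
print(opt:get_layerwise_option("b1", "weight_decay")) -- layerwise value: 0.0
print(opt:get_option_of("b1", "weight_decay"))        -- 0.0, the layerwise value wins
print(opt:get_option_of("w1", "weight_decay"))        -- 0.001, falls back to the global value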
Currently only one optimizer is implemented. It trains the neural network following the Stochastic Gradient Descent algorithm. It incorporates regularization and momentum hyperparameters. Its options are:
- learning_rate: controls the portion of the gradient used to update the weights. This value is smoothed depending on the bunch_size and on the number K of times that a weight connections object is shared between different components; the smoothed value is learning_rate/sqrt(bunch_size+K) (a worked example follows this list).
- momentum: an inertial hyperparameter which applies a portion of the weight update of the previous iteration.
- weight_decay: an L2 regularization term.
- max_norm_penalty: a constraint penalty based on the two-norm of the weights.
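As a worked example of the smoothing (the numbers are only illustrative, not toolkit defaults): with learning_rate=0.01, bunch_size=32 and K=0, the effective rate is 0.01/sqrt(32+0) ≈ 0.0018.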
The algorithm uses the following learning rule:
w = (1 - weight_decay)*w' + momentum*(w' - w'') + lr'*grad(L)/grad(w')
where w, w' and w'' are the weight values at the next, current, and previous iterations; lr' is the learning_rate smoothed by the sqrt factor described above; and grad(L)/grad(w') is the gradient of the loss function at the given weight.
The hyperparameters of optimizer objects can be modified through the trainer object:
- trainer:set_option(name, value): sets a global learning option value.
- value = trainer:get_option(name): gets a global learning option value.
- trainer:set_layerwise_option(layer_name_match, option_name, value): sets a layerwise learning option value for all the connection weight objects whose name matches the given layer_name_match Lua pattern string.
- value = trainer:get_option_of(layer_name, option_name): gets the option value applicable to the given layer.
Additionally, some ANN components have internal parameters which are configurable via trainer objects:
- trainer:set_component_option(component_name_match, option_name, value): sets the option of all the components whose name matches the given component_name_match Lua pattern string.
trainer:build()
trainer:set_option("learning_rate", number)
trainer:set_option("momentum", number)
trainer:set_option("weight_decay", number)
trainer:set_option("max_norm_penalty", number)
-- it is recommended not to apply regularization to the bias connections
trainer:set_layerwise_option("b.*", "weight_decay", 0.0)
trainer:set_layerwise_option("b.*", "max_norm_penalty", -1.0)
-- for dropout (see http://www.cs.toronto.edu/~nitish/msc_thesis.pdf)
-- dropout is a very special option: it modifies training, but it also modifies
-- the validation (or test) phase. It must be applied carefully so as not to
-- apply dropout at the output of your model. Dropout is applied to activation
-- function components. Setting it by a name pattern, and resetting it for the
-- last activation function, helps to keep it away from the output:
trainer:set_component_option("actf.*", "dropout_seed", number)
trainer:set_component_option("actf.*", "dropout_factor", 0.5)
trainer:set_component_option(last_actf_name, "dropout_factor", 0.0)
NOTE that these functions receive a trainer prepared to train, so you must set it up properly using the set_option functions.
The trainable.supervised_trainer
object implements a lot of methods to
train ANNs automatically. See april_help("trainable.supervised_trainer")
for more details.
Two training methods are implemented:
- train_wo_validation: trains an ANN without validation, for a minimum number of epochs and until the improvement in training is less than a given value. It receives a table and returns the BEST ANN found during training:
best = trainer:train_wo_validation{
min_epochs = 10,
max_epochs = 1000,
training_table = { input_dataset = train_input_dataset,
output_dataset = train_output_dataset },
percentage_stopping_criterion = 0.01, -- 1%
update_function = function(t)
-- table t = { current_epoch, train_error, train_improvement, train_params }
printf("%d %f (%f) max epochs: %d\n",
t.current_epoch, t.train_error,
t.train_improvement, t.train_params.max_epochs)
-- t.train_params is the table that you use to execute train_wo_validation
-- function
end,
}
-- best is an instance of trainable.supervised_trainer
- train_holdout_validation: trains an ANN object using a training partition and a validation partition. Training is performed for a minimum number of epochs and until a certain stopping criterion is met over the validation partition. It receives a table and returns another table:
training_data = {
input_dataset = training_input_dataset,
output_dataset = training_output_dataset,
shuffle = random(SEED), -- SEED is a number
replacement = nil, -- if needed
}
validation_data = {
input_dataset = validation_input_dataset,
output_dataset = validation_output_dataset
}
result = trainer:train_holdout_validation{
training_table = training_data,
validation_table = validation_data,
min_epochs = 4,
max_epochs = 1000,
stopping_criterion = FUNCTION EXPLAINED BELOW,
update_function =
function(t) printf("%4d %.6f %.6f (%4d %.6f) max epochs: %d\n",
t.current_epoch,
t.train_error,
t.validation_error,
t.best_epoch,
t.best_val_error,
t.train_params.max_epochs)
-- t.train_params is the table that you use to execute train_holdout_validation
-- function
end,
validation_function = function(thenet, val_table)
-- by default is this. IT IS AN OPTIONAL FUNCTION
return thenet:validate_dataset(val_table)
end
}
print(result.best, result.best_val_error, result.best_epoch,
result.last_train_error, result.last_val_error, result.last_epoch)
-- result.best is an instance of trainable.supervised_trainer
The previous methods allow the definition of custom training and validation functions. The parameters table allows the definition of these fields:
- training_function(trainer, training_table): a Lua function which receives as parameters the trainer and the field training_table. Nevertheless, you could simply ignore the function parameters by implementing a closure which uses your own training data (it is recommended not to ignore the trainer parameter).
- validation_function(trainer, validation_table): a Lua function which receives as parameters the trainer and the field validation_table. As before, you could implement a closure and use your own validation data, but it is better not to ignore the trainer parameter.
By default, the training and validation functions are trainable.supervised_trainer.train_dataset and trainable.supervised_trainer.validate_dataset respectively. The next example shows how to develop sequential training and validation functions over datasets.
training_function = function(trainer, tr_table)
-- ANNs work over dataset.token, we need a wrapper to convert dataset.matrix
-- into dataset.token
local input_dataset = dataset.token.wrapper(tr_table.input_dataset)
local output_dataset = dataset.token.wrapper(tr_table.output_dataset)
local bunch_size = tr_table.bunch_size or trainer.bunch_size or 32
local nump = input_dataset:numPatterns()
trainer.loss_function:reset()
for i=1,input_dataset:numPatterns(),bunch_size do
local last = math.min(i+bunch_size-1, nump)
local bunch_indexes = {}
for j=i,last do table.insert(bunch_indexes, j) end
-- two bunches of patterns
local input_bunch = input_dataset:getPatternBunch(bunch_indexes)
local output_bunch = output_dataset:getPatternBunch(bunch_indexes)
-- we use the trainer method train_step
trainer:train_step(input_bunch, output_bunch)
-- It is better to collectgarbage every K patterns
if i%100 == 0 then collectgarbage("collect") end
end
collectgarbage("collect")
-- it is important to return the LOSS of the epoch
return trainer.loss_function:get_accum_loss()
end
validation_function = function(trainer, va_table)
-- ANNs work over dataset.token, we need a wrapper to convert dataset.matrix
-- into dataset.token
local input_dataset = dataset.token.wrapper(va_table.input_dataset)
local output_dataset = dataset.token.wrapper(va_table.output_dataset)
local bunch_size = va_table.bunch_size or trainer.bunch_size or 32
local nump = input_dataset:numPatterns()
trainer.loss_function:reset()
for i=1,input_dataset:numPatterns(),bunch_size do
local last = math.min(i+bunch_size-1, nump)
local bunch_indexes = {}
for j=i,last do table.insert(bunch_indexes, j) end
-- two bunches of patterns
local input_bunch = input_dataset:getPatternBunch(bunch_indexes)
local output_bunch = output_dataset:getPatternBunch(bunch_indexes)
-- we use the trainer method validate_step
trainer:validate_step(input_bunch, output_bunch)
-- It is better to collectgarbage every K patterns
if i%100 == 1 then collectgarbage("collect") end
end
collectgarbage("collect")
-- it is important to return the LOSS of the epoch
return trainer.loss_function:get_accum_loss()
end
result = trainer:train_holdout_validation{
...
training_function = training_function,
validation_function = validation_function,
...
}
More sophisticated functions could be developed if you replace train_step and validate_step with your own functions. Please, if you want to do custom development, first read carefully the ANNs from scratch documentation and the packages/ann/trainable/trainable.lua script.
One easy possibility is to use a different loss function in validation, for example computing the classification error, using the method use_dataset:
validation_function = function(trainer, va_table)
local hyp_dataset = trainer:use_dataset{ input_dataset = va_table.input_dataset }
local num_errors = 0
for ipat,pat in hyp_dataset:patterns() do
local _,hyp = table.max(pat)
local _,tgt = table.max( va_table.output_dataset:getPattern(ipat) )
if hyp ~= tgt then num_errors = num_errors + 1 end
end
return num_errors / va_table.input_dataset:numPatterns()
end
For the holdout-validation scheme, two predefined stopping criteria exist, which are function builders (they return the function used as criterion):
- trainable.stopping_criteria.make_max_epochs_wo_imp_absolute: receives a constant indicating the maximum number of epochs without improvement in validation. A typical value is between 10 and 20, depending on the task.
- trainable.stopping_criteria.make_max_epochs_wo_imp_relative: receives a constant indicating the maximum value for current_epoch/best_epoch. A typical value for this is 2.
These two criteria can be used like this:
result = trainer:train_holdout_validation{
...
stopping_criterion = trainable.stopping_criteria.make_max_epochs_wo_imp_relative(2),
...
}
You can also create your own stopping criterion, which is a function that receives a table:
result = trainer:train_holdout_validation{
ann = thenet,
...
stopping_criterion = function(t)
-- t contains these fields:
-- * current_epoch
-- * best_epoch
-- * best_val_error
-- * train_error
-- * validation_error
-- * train_params
return true IF ANY CRITERIA USING t TABLE FIELDS
end,
...
}
ANNs are implemented as a composition of components which define the three main operations of an ANN: the forward step (compute outputs), the backprop step (gradient computation), and the update step (update the weights).
All components are child classes of ann.components.base
.
See april_help("ann.components.base")
for on-line documentation.
Two main remarks before continuing with the following sections. The components have two special properties:
- name: a string which identifies the component in a unique manner; it is forbidden for two components to share the same name.
- weights_name: a string which identifies the connections (weights or biases) of the component. This name can be shared by different components, which means that they share the same connections object.
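As a hypothetical sketch of weight sharing (the names "conv-A", "conv-B" and "W_shared" are made up for illustration; the convolution constructor is described in a later section), two components constructed with the same weights string will, once composed and built, use the same connections object:
-- both components declare weights="W_shared", so they share one kernel
c1 = ann.components.convolution{ kernel={1,5,5}, n=10, name="conv-A", weights="W_shared" }
c2 = ann.components.convolution{ kernel={1,5,5}, n=10, name="conv-B", weights="W_shared" }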
The components are integrated in Lua via the abstract class token, which has two specializations for ANNs:
- tokens.matrix is a token which contains a matrix instance.
- tokens.vector.sparse is a token which represents a sparse array.
Here we present the tokens.matrix
abstraction, which could be constructed as follows:
> m = matrix.col_major(2,2,{1,2,3,4})
> t = tokens.matrix(m)
> print(t)
instance 0xc218b0 of tokens.matrix
> print(t:get_matrix())
1 2
3 4
# Matrix of size [2,2] in col_major [0x1450440 data= 0x13ebdb0]
For simplicity, any token instance has the method get_matrix() defined, which returns the underlying matrix, or nil in case the given token is not a tokens.matrix instance.
NOTE that ANN components work with col_major matrices.
All components have the following basic properties defined, which are tokens: input, output, error_input, and error_output. There are four basic methods to train the components:
- table,table,component = build(): this method reserves memory for weights and prepares the component to work.
- reset(): releases all the tokens internally allocated (or given by Lua).
- token = forward(token[, boolean]): receives an input token and returns the output token.
- token = backprop(token): receives an error input token (gradient) and returns the output error token (gradient).
- update(): updates the internal weights and parameters of the component using the tokens given and produced in the forward and backprop methods.
Combining these methods with loss functions, a component can be trained following this basic example. A linear component is trained to follow the OR function, for input=[0,1] and target output=[1]. By default the weights are not initialized, so they contain memory trash.
> o = ann.optimizer.sgd() -- the optimizer
> l = ann.loss.mse(1) -- MSE loss function
> -- a hyperplane component (explained later)
> c = ann.components.hyperplane{ input=2, output=1 }
> c:build() -- allocates memory for weights, and checks components integrity
> l:reset() -- set to zero all the things
> c:reset() -- set to zero all the things
> -- the true indicates training
> output_token=c:forward(tokens.matrix( matrix.col_major(1,2,{0,1})), true)
> print(output_token:get_matrix())
-6.61649e-31
# Matrix of size [1,1] in col_major [0xb01050 data= 0xad4a80]
> -- gradient with desired output 1
> output_error=c:backprop(l:gradient(output_token,
>> tokens.matrix(matrix.col_major(1,1,{1}))))
> print(output_error:get_matrix())
6.61649e-31 -4.5566e-41
# Matrix of size [1,2] in col_major [0xb01630 data= 0xad7bc0]
> grad = c:compute_gradients() -- compute the weight gradients
> o:execute(function() return grad,1,output_error end, c:copy_weights())
> output_token=c:forward(tokens.matrix( matrix.col_major(1,2,{0,1})))
> print(output_token:get_matrix()) -- the output is closer to 1
0.2
# Matrix of size [1,1] in col_major [0xb01ce0 data= 0xad97d0]
Note that all matrices must be in col_major and have at least two dimensions.
All computations are done in bunch mode (using mini-batches), and the first dimension size is the number of patterns contained in the bunch.
The rest of the dimensions must comply with the input constraints of the component. A lot of components work with linear inputs, so the input matrix will be bi-dimensional, but some components work with multidimensional matrices.
It is possible to use matrices with only one dimension, and they will be reinterpreted as two-dimensional matrices with only one row, but it is better to always work with two-dimensional matrices.
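As a minimal sketch of the bunch layout (reusing the 2-input hyperplane component c from the OR example above, already built): a bunch of 4 patterns with 2 features each is a 4x2 col_major matrix.
input  = tokens.matrix( matrix.col_major(4, 2, {0,0,
                                                0,1,
                                                1,0,
                                                1,1}) )
output = c:forward(input)
print(output:get_matrix()) -- a 4x1 matrix, one output row per pattern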
Before doing anything else, components can be composed together to build larger components.
This procedure needs a call to the build method at the end, to check the input/output sizes and reserve memory for weights and biases.
The c:build() call executes recursively the build method of all the components in the composition.
This method returns two tables and the caller component:
> weights_table, components_table, caller_component = c:build()
The weights_table is indexed by each weights_name and contains a connections object (explained later), which is useful to initialize the value of the weights.
The components_table is indexed by each component name and contains a reference to the component instance, which is useful to initialize hyperparameters and other things in a component-wise manner.
The caller_component is the component c in this case, but this argument could be ignored.
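A small sketch of how these return values can be inspected (assuming c is a hyperplane component like the one in the earlier example):
weights_table, components_table, caller = c:build()
for wname, cnn in pairs(weights_table) do
  print(wname, cnn:size(), cnn:get_input_size(), cnn:get_output_size())
end
for cname, comp in pairs(components_table) do print(cname, comp) end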
- token = c:forward(token[, boolean]): receives a token and an optional boolean (by default false). The boolean indicates whether this forward is during training or not, because some components have a special behavior during training. It returns a token with the output computation of the caller component.
- token = c:backprop(token): receives a token with the input error (gradient of each output neuron), and returns another token with the output error (gradient of each input neuron).
- gradients = c:compute_gradients(gradients): returns the weight gradients computed using the tokens given in the forward and backprop methods.
- c:reset(): releases the tokens retained in the forward and backprop steps.
- c:set_option(name, value): sets the option given its name string to the given value. Different components have different options; the most important are dropout_factor and dropout_seed. Not all components implement all of these options.
- value = c:get_option(name): returns the value assigned to the given option name.
- boolean = c:has_option(name): asks a component whether it implements the given option name.
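A short sketch of this option API on an activation function component (the 0.5 dropout factor is just an illustrative value):
actf = ann.components.actf.tanh()
if actf:has_option("dropout_factor") then
  actf:set_option("dropout_factor", 0.5)
  print(actf:get_option("dropout_factor")) -- 0.5
end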
During the forward and backprop steps the components compute outputs and error outputs (gradients), and retain the input and error input (gradient) tokens. Before calling the reset method, you can ask the component for its retained tokens:
- token = c:get_input() returns the token given as input to the forward method.
- token = c:get_output() returns the token computed as output by the forward method.
- token = c:get_error_input() returns the token given as error input to the backprop method.
- token = c:get_error_output() returns the token computed as error output by the backprop method.
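For example, right after the forward/backprop pass of the OR example above (and before calling reset), the retained tokens can be inspected like this:
print(c:get_input():get_matrix())        -- the input bunch given to forward
print(c:get_output():get_matrix())       -- the output computed by forward
print(c:get_error_input():get_matrix())  -- the gradient given to backprop
print(c:get_error_output():get_matrix()) -- the gradient returned by backprop
c:reset() -- releases the retained tokens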
Components which require weights have internally an ann.connections instance.
These objects are allocated by calling the build method of the components (or using the build method of a trainer), and are identified by the weights_name property, so components with the same weights_name share the same connections object.
These objects are basically pure data (with minimal logic), and are defined by an OUTPUTxINPUT size (output rows, input columns), so:
- Bias vectors have INPUT=1 and OUTPUT=number of neurons.
- Weight matrices contain OUTPUTSxINPUTS weights.
Each of these objects complies with the following interface:
-- previous linear component example
c = ann.components.hyperplane{ input=2, output=1 }
weights_table = c:build()
rnd = random(1234) -- for weights random initialization
for _,cnn in pairs(weights_table) do
-- randomize_weights initializes the weights following a uniform distribution
-- in the range [inf, sup]
cnn:randomize_weights{
random = rnd,
inf = -0.1,
sup = 0.1,
}
end
-- OTHER METHODS
-- cnn is a connection object in Lua
local cnn_clone = cnn:clone() -- returns a deep copy of cnn object
cnn:load{
w = weights_matrix (in row major),
oldw = another_weights_matrix (in row major),
first_pos = where is the first weight at the given matrix,
column_size = the size of a column in the cnn object (internally, matrixes are stored in column major)
}
local w,oldw,size = cnn:copy_to() -- copies the weights to matrices and return them
-- and the number of weights
local size = cnn:size() -- number of weights in the object
local input_size = cnn:get_input_size()
local output_size = cnn:get_output_size()
local w,oldw = cnn:matrix() -- returns a reference to the internal matrices (in col_major)
-- of the connections object. BE CAREFUL, any change in these matrices directly modifies the
-- weights of your ANN components
-- the method to_lua_string() returns a string which contains the Lua instructions necessary to
-- construct the caller connections object
print(cnn:to_lua_string())
Connections are stored internally in column major, but externally they are viewed as row major. Therefore, the loaded and returned weight matrices have this format:
w(i1,o1) w(i2,o1) w(i3,o1) ...
w(i1,o2) w(i2,o2) w(i3,o2) ...
... ... ...
where w(a,b) is the weight which connects input a with output b. Be sure that your matrices have this format.
The best way to save a component is by using an instance of trainable.supervised_trainer:
> trainer = trainable.supervised_trainer(c):save("ann.net", "binary")
> c = trainable.supervised_trainer.load("ann.net"):get_component()
However, it is also possible to save the components on their own using the method to_lua_string(), which returns a Lua string with the code necessary to construct the object, and the method c:copy_weights(), which returns the same weights_table as the build method. The Lua string and the weights can be stored in a file and loaded afterwards.
The following functions implement this functionality:
ann.save(component, filename)
component = ann.load(filename)
Basically, these two functions work like the following code:
function save(c, filename)
local f = io.open(filename, "w")
f:write(string.format("return %s:build{ weights={\n %s\n}\n}\n",
c:to_lua_string(),
table.concat(
table.linearize(
table.map2(c:copy_weights(),
function(k,v)
return string.format("[%q] = %s",
k,v:to_lua_string())
end)), ",\n")))
f:close()
end
c = ann.components.hyperplane{ input=10, output=10 }
c:build()
save(c, "jaja.net")
-- The load is simple using dofile function. Note that the jaja.net file returns
-- build method outputs, which are three things: a table with connections, a
-- table with components, the caller component
_,_,c = dofile("jaja.net")
print(c)
These components are used to build Convolutional Neural Networks. They work with input matrices in col_major order. If you use dataset.matrix, your patterns will be flattened and converted into a one-dimensional matrix. This forces you to add a rewrap component at the beginning of your ANN. Besides, the dimension ordering is backwards, so if your dataset.matrix is working with images of 20x30 pixels, you need to rewrap the images to 1x30x20 pixels (the first dimension is the number of planes). If you have an RGB color image, be sure that your row_major matrix is of 20x30x3, so your ANN rewraps it to 3x30x20 (having 3 input planes). An example of a full CNN for the MNIST task (28x28 pixel images of digits) follows:
-- tables for the CNN configuration
ishape = {1, 28, 28} -- for input matrix rewrapping
conv1 = {1, 5, 5} nconv1=20
maxp1 = {1, 2, 2}
conv2 = {nconv1, 5, 5,} nconv2=50
maxp2 = {1, 2, 2}
hidden = 500
-- sizes of each convolution/pooling stage output
sz1 = { ishape[2] - conv1[2] + 1, ishape[3] - conv1[3] + 1 }        -- 24x24 after conv1
sz2 = { math.floor(sz1[1]/maxp1[2]), math.floor(sz1[2]/maxp1[3]) }  -- 12x12 after maxp1
sz3 = { sz2[1] - conv2[2] + 1, sz2[2] - conv2[3] + 1 }              -- 8x8 after conv2
sz4 = { math.floor(sz3[1]/maxp2[2]), math.floor(sz3[2]/maxp2[3]) }  -- 4x4 after maxp2
thenet = ann.components.stack():
push( ann.components.rewrap{ size=ishape } ):
push( ann.components.convolution{ kernel=conv1, n=nconv1 } ):
push( ann.components.convolution_bias{ n=nconv1, ndims=#conv1 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp1,} ):
push( ann.components.convolution{ kernel=conv2, n=nconv2 } ):
push( ann.components.convolution_bias{ n=nconv2, ndims=#conv2 } ):
push( ann.components.actf.tanh() ):
push( ann.components.max_pooling{ kernel=maxp2 } ):
push( ann.components.flatten() ):
push( ann.components.hyperplane{ input=sz4[1]*sz4[2]*nconv2, output=hidden } ):
push( ann.components.actf.tanh() ):
push( ann.components.hyperplane{ input=hidden, output= 10 } ):
push( ann.components.actf.log_softmax() )
A convolutional component could be created as:
> c = ann.components.convolution{ kernel={3, 5, 5}, step={1, 1, 1}, n=10,
name="conv-W1", weights="W1",
input_planes_dim=1 }
This component executes a convolution using the given kernel sizes, moving the convolution window following the step table, and using n different kernels. This module has a dynamic input/output size; the convolution is performed over all the input following the indicated parameters.
- input_planes_dim is a number (optional, by default 1) which indicates the dimension K of the input matrix where the input planes are located.
- kernel is a table which describes the size of each kernel. The K-th element of this table is always the number of PLANES in the input matrix. Therefore, a kernel over a 1-dim signal will be like kernel={1, 5} with K=1. For a 2D image it will be kernel={1, 5, 5}; for a 2D image with RGB color it will be kernel={3, 5, 5} if K=1, otherwise it could be kernel={5, 3, 5} if K=2 or kernel={5, 5, 3} if K=3. For an RGB video sequence the kernel will be kernel={3, 5, 5, 5} for K=1, and so on.
- step is a table which indicates how to move the kernel. The number of steps at each dimension will be (input_dim[i] - kernel[i])/step[i] + 1. The K-th element of this table is forced to be 1, since that is the number of planes of the input matrix. The step is optional; by default all its elements are set to 1.
- n is the number of kernels to be applied. It is the number of output planes produced by this component (number of neurons).
- name and weights are the strings used to search for components and connection objects.
The output produced by this component will be:
- output_size[1] = n
- output_size[i+1] = (input_size[i] - kernel[i])/step[i] + 1, FOR i=1,...,input_planes_dim-1
- output_size[i] = (input_size[i] - kernel[i])/step[i] + 1, FOR i=input_planes_dim+1,...,#kernel
By default, input_planes_dim=1, so the output size is simplified as:
- output_size[1] = n
- output_size[i] = (input_size[i] - kernel[i])/step[i] + 1, FOR i=2,...,#kernel
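For example, in the MNIST stack above, the first convolution with kernel={1, 5, 5}, step={1, 1, 1} and n=20 applied to a 1x28x28 input produces an output of size 20x24x24, since (28 - 5)/1 + 1 = 24.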
> c = ann.components.convolution_bias{ n=10, ndims=3,
name="conv-B1", weights="B1" }
- n is the number of planes of the input (the first dimension size of the input matrix).
- ndims is the number of dimensions expected in the input matrix.
- name and weights, as usual.
> c = ann.components.max_pooling{ kernel={1, 2, 2}, name="pool-2" }
- kernel is a table with the sizes of the kernel applied to the input matrix. Depending on this, the behavior of the max-pooling could be a down-sampling of the input matrix (as in the example), or the conversion of the input into a fixed-size feature vector (kernel = {1, 0, 0}). A 0 value in one component means to fit this dimension to the same dimension of the input matrix. So, the last example {1, 0, 0} will be a max-pooling computed over all positions for each input plane, producing as output a feature vector of INPUT PLANES size.
- name, as usual.
This component converts an input matrix formed by N patterns of any dimensionality into an output bidimensional matrix with N rows and M columns, where M is the product of all input matrix dimensions (except the first one, which is the number of patterns).
> c = ann.components.flatten{ name="flatten" }
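For example, in the MNIST CNN above, the flatten component converts the Nx50x4x4 output of the second max-pooling into an Nx800 matrix, matching the first hyperplane input size sz4[1]*sz4[2]*nconv2 = 4*4*50 = 800.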