# 01 Intro
APRIL-ANN (A Pattern Recognizer In Lua with Artificial Neural Networks) is more than an ANN toolkit: it is a pattern recognition project. Simple Lua scripts can be written to run ANN experiments. Some examples are shown below.
Take note that APRIL-ANN offers inline help with three basic commands:

```lua
april_help(...)
april_dir(...)
april_list(...)
```
The `april_help(object)` function takes an object (a Lua table, function, userdata, ...) as a parameter and shows the corresponding help via standard output.

The `april_dir(object)` function also takes an object as a parameter and shows the corresponding help via standard output. It is the same as `april_help`, but a lot less verbose.

The `april_list(table)` function takes a table and shows its content using the `pairs` function. It has nothing to do with inline help, but it is useful in a lot of circumstances when developing scripts.
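For instance, `april_list` can be tried on any plain Lua table (the table below is made up for illustration):

```
> t = { learning_rate = 0.01, momentum = 0.5 }
> april_list(t)
```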
Play a little with it: execute `april_help(ann.components)`, after that `april_help(ann.components.base)`, and see what happens ;)
If you want to access the documentation of instance methods, you have two ways:
- Use the `..` operator (that is, the `__concat` metamethod) with a class table plus a method name string:

```
> april_help(ann.components.base .. "forward")
method    Computes forward step with the given token
description: Computes forward step with the given token
parameters:
  1 An input token (usually a matrix)
  2 A boolean indicating if the forward is
    during_training or not. This information is used by
    ann.components.actf objects to apply dropout
    during training, and to halve the activation during
    validation (or test). It is [optional], by default
    is false.
outputs:
  1 An output token (usually a matrix)
```
- Declare an instance of the class and execute `april_help(obj)`:

```
> c = ann.components.base()
> april_help(c.forward)
method    Computes forward step with the given token
description: Computes forward step with the given token
parameters:
  1 An input token (usually a matrix)
  2 A boolean indicating if the forward is
    during_training or not. This information is used by
    ann.components.actf objects to apply dropout
    during training, and to halve the activation during
    validation (or test). It is [optional], by default
    is false.
outputs:
  1 An output token (usually a matrix)
```
APRIL-ANN incorporates an adaptation of the [Lua rlcompleter module](https://github.com/rrthomas/lua-rlcompleter) for Lua 5.2 and for the APRIL-ANN object-oriented implementation. It auto-completes pathnames, global names, table fields, and object methods when the `<tab>` key is pressed:
```
> matrix.<tab><tab>
_NAME                dict           fromString       ...
_VERSION             fromFilename   fromTabFilename  ...
__sliding_window__   fromHEX        join             ...
as                   fromMMap       loadImage        ...
...                  ...            ...
> matrix.fromTab<tab>
> matrix.fromTabFilename<tab><tab>
fromTabFilename
> matrix.fromTabFilename
```
Almost any object can be serialized to a disk file, a string, or a stream using the `util.serialize()` function. Similarly, it can be deserialized using the `util.deserialize()` function.
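A minimal sketch, assuming `util.serialize` takes the object plus a destination filename and `util.deserialize` takes that filename (the matrix constructor arguments here are only illustrative):

```lua
m = matrix(2,2,{1,2,3,4})      -- any serializable object
util.serialize(m, "m.lua")     -- write it to a disk file
m2 = util.deserialize("m.lua") -- read it back
```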
## XOR problem

The code described here is at the repo path `EXAMPLES/xor.lua`. First, we need to create an ANN component object which will be trained:

```lua
thenet = ann.mlp.all_all.generate("2 inputs 2 logistic 1 logistic")
```
The object `thenet` is a Multilayer Perceptron (MLP) with 2 inputs, a hidden layer of 2 neurons with the logistic activation function, and 1 output neuron with the logistic activation function. Several activation functions are available: logistic, tanh, linear, softmax, log_logistic, sin, softsign, softplus, ... (see `april_help(ann.components.actf)`).
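Deeper topologies are described the same way; this sketch just combines activation names from the list above (the layer sizes are made up):

```lua
-- 10 inputs, two hidden layers of 32 tanh units, 5 softmax outputs
another_net = ann.mlp.all_all.generate("10 inputs 32 tanh 32 tanh 5 softmax")
```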
Now, in order to develop scripts easily and quickly, a trainer helper wrapper can be used:

```lua
bunch_size = 4
trainer = trainable.supervised_trainer(thenet, ann.loss.mse(1), bunch_size)
```

The trainer needs the ANN component, the loss function, and the bunch size. The bunch size is the same as the mini-batch size: it is used to train several patterns at the same time, increasing the speed of the experiment. Values between 32 and 64 are typically used, but in this example only 4 is possible, since the XOR problem consists of just 4 patterns.
The next step is to build the component and randomize its weights:

```lua
trainer:build()
trainer:randomize_weights{
  random = random(1234),
  inf    = -0.1,
  sup    =  0.1 }
```

The weights will be initialized uniformly in the range [inf, sup], using the given `random` object with 1234 as random seed. It is also possible to scale the initialization of each layer by its fan-in and/or fan-out (see the `use_fanin` and `use_fanout` flags in the DIGITS example below).
The component has several learning parameters which need to be configured:

```lua
trainer:set_option("learning_rate", 1.0)
trainer:set_option("momentum", 0.5)
trainer:set_layerwise_option("w.*", "weight_decay", 1e-05)
```
Data to train the ANN is defined using `matrix` and `dataset` objects. It is possible to build the XOR problem on a `matrix` and use it as training `dataset`s:

```lua
m_xor = matrix.fromString[[
4 3
ascii
0 0 0
0 1 1
1 0 1
1 1 0
]]
ds_input  = dataset.matrix(m_xor, {patternSize={1,2}})
ds_output = dataset.matrix(m_xor, {offset={0,2}, patternSize={1,1}})
```
The variable `m_xor` is a matrix object, loaded from the given string. `ds_input` is a `dataset.matrix` object, which traverses the matrix by rows, computing a sliding window of `patternSize={1,2}`. The desired output of the ANN is another `dataset.matrix`, but in this case the sliding window has size (1,1) and skips the first two columns (`offset={0,2}`).
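A quick sanity check of the datasets might look like this (a sketch assuming the standard `numPatterns`/`getPattern` dataset methods, which return the number of patterns and a pattern as a Lua table):

```lua
print(ds_input:numPatterns())                     -- 4
print(table.concat(ds_input:getPattern(2), " "))  -- 0 1
print(table.concat(ds_output:getPattern(2), " ")) -- 1
```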
Finally, we need to train the ANN:

```lua
-- train_error is used instead of error, to avoid shadowing Lua's
-- global error function
for i=1,10000 do
  local train_error = trainer:train_dataset{ input_dataset  = ds_input,
                                             output_dataset = ds_output }
  print(i, train_error)
end
```
This code trains the ANN for 10,000 epochs, feeding it with `input_dataset` and using the given `output_dataset` as desired output. Patterns are grouped into mini-batches of size 4 (`bunch_size`), and each training epoch is a pass over the full dataset.
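After training, the final loss over the same patterns can be measured with `trainer:validate_dataset`, which computes the loss without updating the weights (the same method is used for validation in the DIGITS example below):

```lua
local final_loss = trainer:validate_dataset{ input_dataset  = ds_input,
                                             output_dataset = ds_output }
print("final loss:", final_loss)
```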
This simple example gives you some insight into how to use the APRIL-ANN toolkit, but it is not enough for more complicated problems. The next section explains the DIGITS problem, which trains an ANN to classify handwritten digits.
## DIGITS task

The task addressed in this section is the classification of handwritten digits. The code is at `EXAMPLES/digits.lua`, and can be executed with the command `april-ann digits.lua`. This task uses as data a large PNG image with handwritten digits arranged in rows and columns. Each column corresponds to a digit class (from 0 to 9), and each row contains 10 examples (one for each class). There are 1000 patterns (100 for each class). So, first the image is loaded using the following code, and converted to a matrix where 0 represents white and 1 represents black:

```lua
digits_image = ImageIO.read(string.get_path(arg[0]).."digits.png")
m1 = digits_image:to_grayscale():invert_colors():matrix()
```
This code uses the `ImageIO.read` function to load the PNG image (you need to compile the libpng package), and the `string.get_path` function to find where the file is located. The image is converted to grayscale, its colors are inverted so that 0=white and 1=black, and finally the corresponding matrix of the image is generated.
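With 16x16 digits arranged in 100 rows by 10 columns, the loaded matrix should be 1600x160; a quick check (assuming the matrix `dim()` method, which returns the dimensions as a Lua table):

```lua
print(table.concat(m1:dim(), "x"))  -- expected: 1600x160
```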
Second, the training input and output dataset are generated following this code:
-- TRAINING --
train_input = dataset.matrix(m1,
{
patternSize = {16,16},
offset = {0,0},
numSteps = {80,10},
stepSize = {16,16},
orderStep = {1,0}
})
-- a simple matrix for the desired output
m2 = matrix(10,{1,0,0,0,0,0,0,0,0,0})
-- a circular dataset which advances with step -1
train_output = dataset.matrix(m2,
{
patternSize = {10},
offset = {0},
numSteps = {800},
stepSize = {-1},
circular = {true}
})
This is a more complicated example of how to create datasets from matrices. The variable `train_input` is a `dataset.matrix` generated by a sliding window of size 16x16 (the size of one digit), which moves in steps of 16x16 (first 16 over columns, and when it arrives at the end, it moves 16 over rows and returns to column 0). The number of steps (`numSteps`) is 80 over rows and 10 over columns. The output dataset needs a special matrix which contains only one 1 and nine 0s, so that the position of the 1 in each pattern corresponds to its class. The `dataset.matrix` in this case slides backwards (`stepSize={-1}`), so the 1 moves forward, and it is circular (window positions out of the matrix take the values of the opposite matrix positions). It has 800 patterns (80x10).
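To see the effect of the backward circular window, one can print the first desired-output patterns (again assuming the standard `getPattern` method): the 1 advances one position per pattern because the window slides backwards over a circular matrix.

```lua
print(table.concat(train_output:getPattern(1), " ")) -- 1 0 0 0 0 0 0 0 0 0
print(table.concat(train_output:getPattern(2), " ")) -- 0 1 0 0 0 0 0 0 0 0
print(table.concat(train_output:getPattern(3), " ")) -- 0 0 1 0 0 0 0 0 0 0
```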
For the validation datasets the script is similar:

```lua
-- VALIDATION --
val_input = dataset.matrix(m1,
                           {
                             patternSize = {16,16},
                             offset      = {1280,0},
                             numSteps    = {20,10},
                             stepSize    = {16,16},
                             orderStep   = {1,0}
                           })
val_output = dataset.matrix(m2,
                            {
                              patternSize = {10},
                              offset      = {0},
                              numSteps    = {200},
                              stepSize    = {-1},
                              circular    = {true}
                            })
```
However, in this case the `val_input` dataset needs a non-zero `offset` parameter, because the validation patterns are the last 200 (the window begins at image row 1280). The first 800 digits are used for training.
The MLP is generated following the same steps as for XOR, but in this case the topology description string uses tanh activation in the hidden layer and log_softmax activation in the output layer. Here the `use_fanin` and `use_fanout` flags are set to true, and the loss function is `multi_class_cross_entropy`, a version of the cross-entropy loss which is mathematically simplified for log_softmax output activation functions (if you use another output activation you must use `mse`). The two-class version of cross-entropy (`ann.loss.cross_entropy`) is simplified to be used with log_logistic outputs:
```lua
bunch_size = 64
thenet  = ann.mlp.all_all.generate("256 inputs 128 tanh 10 log_softmax")
trainer = trainable.supervised_trainer(thenet,
                                       ann.loss.multi_class_cross_entropy(),
                                       bunch_size)
trainer:build()
trainer:randomize_weights{
  random     = random(52324),
  use_fanin  = true,
  use_fanout = true,
  inf        = -1,
  sup        =  1,
}

trainer:set_option("learning_rate", 0.01)
trainer:set_option("momentum",      0.01)
trainer:set_layerwise_option("w.*", "weight_decay", 1e-05)
```
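For reference, a binary task would pair a log_logistic output with the two-class loss mentioned above; a minimal sketch (the layer sizes are illustrative assumptions):

```lua
-- one log_logistic output unit paired with ann.loss.cross_entropy
binary_net     = ann.mlp.all_all.generate("256 inputs 128 tanh 1 log_logistic")
binary_trainer = trainable.supervised_trainer(binary_net,
                                              ann.loss.cross_entropy(),
                                              bunch_size)
```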
For training, we need to declare a table which contains the input/output dataset pair and some specific parameters (e.g. a shuffle `random` object to train each epoch with a different permutation of the patterns):

```lua
training_data = {
  input_dataset  = train_input,
  output_dataset = train_output,
  shuffle        = random(25234),
}

validation_data = {
  input_dataset  = val_input,
  output_dataset = val_output,
}
```
The final snippet trains the MLP using holdout validation, following a stopping criterion which depends on the ratio current_epoch/best_validation_epoch: when this ratio is greater than 2, training is stopped (that is, training will stop at epoch 200 if the last best validation epoch is 100, and at epoch 400 if the last best validation epoch is 200). The stopping criterion is built with the helper function `trainable.stopping_criteria.make_max_epochs_wo_imp_relative`, and the MLP is trained using the class `trainable.train_holdout_validation`. This class receives a table whose fields are self-explanatory, follows a holdout-validation algorithm in its `execute` method, and after each epoch its `get_state_string` method is used for printing.
print("# Epoch Training Validation BestEpoch BestValidation")
stopping_criterion =
trainable.stopping_criteria.make_max_epochs_wo_imp_relative(2)
train_func = trainable.train_holdout_validation{
min_epochs = 4,
max_epochs = 1000,
stopping_criterion = stopping_criterion,
}
clock = util.stopwatch()
clock:go()
epoch_function = function()
local tr_loss = trainer:train_dataset(training_data)
local va_loss = trainer:validate_dataset(validation_data)
return trainer,tr_loss,va_loss
end
while train_func:execute(epoch_function) do
print(train_func:get_state_string())
end
clock:stop()
cpu,wall = clock:read()
num_epochs = result.last_epoch
printf("# Wall total time: %.3f per epoch: %.3f\n", wall, wall/num_epochs)
printf("# CPU total time: %.3f per epoch: %.3f\n", cpu, cpu/num_epochs)
printf("# Validation error: %f", result.best_val_error)
This introduction has shown the basic steps to write and execute scripts for pattern recognition using ANNs and the APRIL-ANN toolkit. Please, feel free to use these scripts as initial templates for your own ;)
## Features

APRIL-ANN has a lot of interesting features. The following list shows the most important ones, which are detailed in the following sections of this documentation:
- Multidimensional `matrix` library. It allows efficient mathematical operations in Lua.
- Abstract token definition. A token represents anything, and is used in several parts of the toolkit for information interchange: `matrix` instances can be wrapped into a `tokens.matrix` instance, and they are interchangeable in ANN components.
- Dataset abstraction. It has the ability to build powerful sliding windows over matrices. At the same time, it is possible to filter datasets, producing new datasets on-the-fly. Two abstractions exist: `dataset` and `dataset.token`.
- Artificial neural networks. Different packages are implemented to perform efficient training of ANNs, built around three main concepts: ANN component, loss function, and optimization algorithm.
- Trainable package. This package ties together all the ANN machinery, and is a good starting point to work with ANNs. It implements a lot of useful code for introspection, training, and testing.
- Random package. The generation of pseudo-random numbers lives in this package.
- Automatic differentiation. For more advanced machine learning, an experimental library for automatic differentiation has been added. It allows specifying much more general models than the ANN abstraction, but with an important loss in efficiency. However, it is useful for trying out research ideas with little implementation effort, before implementing them as ANNs.
- Matlab package. It allows loading (not saving) matrices and data in MAT format. It is still in an experimental phase, but the most important features are available.
- Statistics package. Look here for standard statistical techniques: PCA, running mean and variance computation, Pearson correlation, ...
- Complex numbers. In an experimental phase, APRIL-ANN allows working with complex numbers and complex matrices.
- Util package. It contains a lot of utilities for Lua script development.
- GZIO package. This is a binding of zlib for loading/saving compressed files.
- Image and ImageIO packages. The Image class allows working with color or grayscale images. The ImageIO package implements useful functions for generic read/write of images, depending on their file extension.