ann.autoencoders
Package autoencoders could be loaded via the standalone binary, or in Lua with require("aprilann.autoencoders").
Stacked Denoising Auto-Encoders (SDAE) are a kind of deep neural network which is pre-trained following a greedy layerwise algorithm, introducing noise at the input of each layerwise auto-encoder. Some helper functions are implemented to ease the training of SDAE.
Greedy layerwise pre-training consists of training each pair of layers, from input to output, in a greedy way (see the SDAE paper: Vincent et al., 2010). Pre-training receives as input a table with the parameters of the training algorithm. For example, a table like this:
layers = {
{ size= 256, actf="logistic"}, -- INPUT
{ size= 256, actf="logistic"}, -- FIRST HIDDEN LAYER
{ size= 128, actf="logistic"}, -- SECOND HIDDEN LAYER
{ size= 32, actf="logistic"}, -- THIRD HIDDEN LAYER
}
perturbation_random = random(824283)
params_pretrain = {
input_dataset = train_input, -- a dataset which is the input of the autoencoders
replacement = nil, -- a number (or nil) indicating replacement
on_the_fly = false, -- a boolean (or nil) for on-the-fly
shuffle_random = random(1234), -- for shuffling during backpropagation
weights_random = random(7890), -- for weights random initialization
layers = layers, -- layers description
supervised_layer = { size = 10, actf = "log_softmax" }, -- it is possible to pre-train the supervised layer
output_datasets = { train_output }, -- the output dataset
bunch_size = bunch_size, -- the size of the mini-batch
optimizer = function() return ann.optimizer.sgd() end, -- optimizer function
training_options = { -- this table contains learning options and dataset noise filters
-- global options
global = {
-- pure ANN learning hyperparameters
ann_options = { learning_rate = 0.01,
momentum = 0.02,
weight_decay = 1e-05 },
-- noise filters (a pipeline of filters applied to the input, in order); each one is a function that returns a dataset
noise_pipeline = { function(ds) return dataset.perturbation{ -- gaussian noise
dataset = ds,
mean = 0, -- gaussian mean
variance = 0.01, -- gaussian variance
random = perturbation_random } end,
function(ds) return dataset.salt_noise{ -- salt noise (or mask noise)
dataset = ds,
vd = 0.10, -- percentage of values masked
zero = 0.0, -- mask value
random = perturbation_random } end },
min_epochs = 4,
max_epochs = 200,
pretraining_percentage_stopping_criterion = 0.01,
},
-- it is possible to overwrite global values with layerwise dependent values (also noise_pipeline)
layerwise = { { min_epochs=50 }, -- first autoencoder pretraining
{ min_epochs=20 }, -- second autoencoder pretraining
{ ann_options = { learning_rate = 0.04,
momentum = 0.02,
weight_decay = 4e-05 },
min_epochs=20 }, -- third autoencoder pretraining
{ min_epochs=10 }, }, -- supervised pretraining
}
}
Fields supervised_layer and output_datasets are optional. If they are given, the last layer will be pre-trained in a supervised manner; the rest of the layers are pre-trained in an unsupervised manner.
If the field input_dataset is supplied, then the distribution field is forbidden and, when pre-training the supervised layer, the output_datasets table must contain only one element.
If the field distribution is supplied, then input_dataset is forbidden and, when pre-training the supervised layer, the output_datasets table has the same number of items as the distribution table. In this last case, each item output_datasets[i] is the corresponding supervised output dataset for each item of distribution[i].input_dataset.
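As an illustration of the second case, a hypothetical parameters table could replace input_dataset with a distribution table, as in the following sketch. The names train_input_a, train_input_b, train_output_a and train_output_b are placeholders, and the prob weighting field name is an assumption; check the package documentation for the exact field names.
-- hypothetical sketch: two sampling distributions instead of a single input_dataset
params_pretrain.input_dataset = nil
params_pretrain.distribution  = {
  { input_dataset = train_input_a, prob = 0.7 }, -- prob is an assumed field name
  { input_dataset = train_input_b, prob = 0.3 },
}
-- one supervised output dataset per distribution item
params_pretrain.output_datasets = { train_output_a, train_output_b }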
This table is passed as argument to the algorithm:
sdae_table,deep_net = ann.autoencoders.greedy_layerwise_pretraining(params_pretrain)
This function returns one or two tables:
- sdae_table = { bias={ ... }, weights={ ... } }: contains the bias and weights of each unsupervised pre-trained layer.
- deep_net: an ANN component which could be used for fine-tuning training. If you did not pre-train the supervised layer, you need to manually push the supervised layer onto this component.
### Building codifier from SDAE table ###
codifier_net = ann.autoencoders.build_codifier_from_sdae_table(sdae_table,
bunch_size,
layers)
The codifier is the SDAE without the supervised layer at the output. It needs the same layers definition as the greedy pre-training function. It returns an ANN object which receives a pattern as input and produces its encoding.
### Fine-tuning supervised deep ANN ###
The supervised deep ANN could be fine-tuned using a cross-validation training algorithm. If you pre-trained the supervised layer, the deep_net object is directly the whole ANN. Otherwise, you will need to add a new layer to the codifier_net, as in this example:
-- if you want, you could clone the deep_net to keep it as it is
local codifier_net = deep_net:clone()
codifier_net:build{ weights = deep_net:copy_weights() }
-- We add an output layer with 10 neurons and softmax activation function
local last_layer = ann.components.hyperplane{
dot_product_weights="lastw",
bias_weights="lastb",
output=10
}
deep_net:push( last_layer )
deep_net:push( ann.components.actf.log_softmax() )
trainer = trainable.supervised_trainer(deep_net, loss_function or nil, bunch_size or nil)
-- The output size needs to be overwritten, so it needs to be given at the build method
trainer:build{ output = 10 }
weights_random = random(SEED)
-- Now, THERE EXIST TWO WAYS to randomize the weights of last_layer
-- FIRST using the trainer
trainer:randomize_weights{
name_match="^last[bw]$", -- only randomize connection objects whose name matches
inf=-0.1,
sup=0.1,
random=weights_random
}
-- SECOND using the component
-- (BE CAREFUL AND USE ONLY ONE OF THESE WAYS)
for _,cnn in pairs(last_layer:copy_weights()) do
  cnn:randomize_weights{
    inf=-0.1,
    sup=0.1,
    random=weights_random
  }
end
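After the weights of the new layer are initialized, fine-tuning proceeds as ordinary supervised training of deep_net. The following is a minimal sketch, assuming training and validation datasets named train_input, train_output, val_input and val_output, and assuming the set_option, train_dataset and validate_dataset methods of trainable.supervised_trainer; the hyperparameter and epoch values are illustrative only.
-- minimal fine-tuning sketch (illustrative hyperparameters, assumed datasets)
trainer:set_option("learning_rate", 0.01)
trainer:set_option("momentum",      0.02)
trainer:set_option("weight_decay",  1e-05)
local shuffle_random = random(9876)
for epoch=1,100 do
  -- one epoch over the training data
  local train_loss = trainer:train_dataset{ input_dataset  = train_input,
                                            output_dataset = train_output,
                                            shuffle        = shuffle_random }
  -- loss over the held-out validation data
  local val_loss   = trainer:validate_dataset{ input_dataset  = val_input,
                                               output_dataset = val_output }
  print(string.format("%4d train=%.6f val=%.6f", epoch, train_loss, val_loss))
end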
### Compute encoding ###
With a trained SDAE (without supervised layer), it is possible to compute encodings of input patterns using this function:
trainer = trainable.supervised_trainer(codifier_net)
encoded_dataset = trainer:use_dataset(input_dataset)
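The returned encoded_dataset is a regular dataset object, so the encodings can be inspected pattern by pattern or used as the input of another model. A small sketch, assuming the usual numPatterns, patternSize and getPattern dataset methods:
-- number of encoded patterns and the size of each encoding
print(encoded_dataset:numPatterns(), encoded_dataset:patternSize())
-- the encoding of the first pattern, as a plain Lua table of numbers
local first_code = encoded_dataset:getPattern(1)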