
25 ann.autoencoders


Introduction

The package autoencoders can be loaded via the standalone binary, or in Lua with require("aprilann.autoencoders").
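
For example (a minimal sketch of the load step; once loaded, the functions used throughout this page live in the ann.autoencoders table):

-- when not running the standalone binary, load the package explicitly
require("aprilann.autoencoders")
-- the functions used below are then available under ann.autoencoders
print(type(ann.autoencoders.greedy_layerwise_pretraining)) -- "function"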

Stacked Denoising Auto-Encoders (SDAE) are a kind of deep neural network which is pre-trained following a greedy layerwise algorithm, introducing noise at the input of each layerwise auto-encoder. Some helper functions are implemented to ease the training of SDAE.

Greedy layerwise pre-training of SDAE

Greedy layerwise pre-training consists of training each pair of layers, from input to output, in a greedy way (see Vincent et al., 2010, the SDAE paper). Pre-training receives as input a table with the parameters of the training algorithm. For example, a table like this:

layers = {
  { size= 256, actf="logistic"}, -- INPUT
  { size= 256, actf="logistic"}, -- FIRST HIDDEN LAYER
  { size= 128, actf="logistic"}, -- SECOND HIDDEN LAYER
  { size=  32, actf="logistic"}, -- THIRD HIDDEN LAYER
}
perturbation_random = random(824283)
params_pretrain = {
  input_dataset         = train_input,  -- a dataset which is the input of the autoencoders
  replacement           = nil,          -- a number (or nil) indicating replacement
  on_the_fly            = false,        -- a boolean (or nil) for on-the-fly
  shuffle_random        = random(1234), -- for shuffling during backpropagation
  weights_random        = random(7890), -- for weights random initialization
  layers                = layers,       -- layers description
  supervised_layer      = { size = 10, actf = "log_softmax" }, -- it is possible to pre-train supervised layer
  output_datasets       = { train_output }, -- the output dataset
  bunch_size            = bunch_size,       -- the size of the mini-batch
  optimizer             = function() return ann.optimizer.sgd() end, -- optimizer function
  training_options      = { -- this table contains learning options and dataset noise filters
    -- global options
    global = {
      -- pure ANN learning hyperparameters
      ann_options = { learning_rate = 0.01,
                      momentum      = 0.02,
                      weight_decay  = 1e-05 },
      -- noise filters (a pipeline of filters applied to the input in order). Each one is a function which takes a dataset and returns a (noisy) dataset
      noise_pipeline = { function(ds) return dataset.perturbation{ -- gaussian noise
                             dataset  = ds,
                             mean     = 0,    -- gaussian mean
                             variance = 0.01, -- gaussian variance
                             random   = perturbation_random } end,
                         function(ds) return dataset.salt_noise{ -- salt noise (or mask noise)
                             dataset  = ds,
                             vd       = 0.10, -- percentage of values masked
                             zero     = 0.0,  -- mask value
                             random   = perturbation_random } end },
      min_epochs            = 4,
      max_epochs            = 200,
      pretraining_percentage_stopping_criterion = 0.01,
    },
    -- it is possible to overwrite global values with layerwise dependent values (also noise_pipeline)
    layerwise = { { min_epochs=50 },  -- first autoencoder pretraining
                  { min_epochs=20 },  -- second autoencoder pretraining
                  { ann_options = { learning_rate = 0.04,
                                    momentum      = 0.02,
                                    weight_decay  = 4e-05 },
                    min_epochs=20 },    -- third autoencoder pretraining
                  { min_epochs=10 }, }, -- supervised pretraining
  }
}

Fields supervised_layer and output_datasets are optional. If they are given, the last layer will be pre-trained in a supervised manner. In any case, the rest of the layers are pre-trained in an unsupervised manner.

If field input_dataset is supplied, then the distribution field is forbidden and, in case of pre-training the supervised layer, the output_datasets table must contain only one element.

If field distribution is supplied, then input_dataset is forbidden and, in case of pre-training the supervised layer, the output_datasets table must have the same number of items as the distribution table. In this last case, each item output_datasets[i] is the supervised output dataset corresponding to distribution[i].input_dataset.
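
As an illustrative sketch only (the field relations are the ones described above; ds1, ds2, out1 and out2 are hypothetical datasets, and the real implementation may require additional per-item fields), a distribution-based configuration would pair each input dataset with its output dataset:

-- hypothetical sketch: two sources of input patterns, each one paired with
-- its own supervised output dataset (out1 with ds1, out2 with ds2)
params_pretrain_dist = {
  distribution     = { { input_dataset = ds1 },
                       { input_dataset = ds2 } },
  supervised_layer = { size = 10, actf = "log_softmax" },
  output_datasets  = { out1, out2 },
  -- the remaining fields (layers, shuffle_random, weights_random, bunch_size,
  -- optimizer, training_options, ...) are the same as in params_pretrain above
}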

The params_pretrain table is passed as argument to the algorithm:

sdae_table,deep_net = ann.autoencoders.greedy_layerwise_pretraining(params_pretrain)

This function returns one or two values:

  • sdae_table = { bias={ ... }, weights={ ... } }: contains the bias and weights of each unsupervised pre-trained layer (see the sketch after this list).

  • deep_net: an ANN component. It can be used for fine-tuning training. If you did not pre-train the supervised layer, you need to manually push the supervised layer onto this component.
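
A small usage sketch of the returned values (only the structure described above is assumed):

sdae_table,deep_net = ann.autoencoders.greedy_layerwise_pretraining(params_pretrain)
-- (assumption) weights and bias hold one entry per pre-trained auto-encoder,
-- following the order of the layers table
for i = 1, #sdae_table.weights do
  print("auto-encoder", i, sdae_table.weights[i], sdae_table.bias[i])
end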

Building codifier from SDAE table

codifier_net = ann.autoencoders.build_codifier_from_sdae_table(sdae_table,
                                                               bunch_size,
                                                               layers)

The codifier is the SDAE without the supervised layer at the output. It needs the same layers definition as the greedy pre-training function. It returns an ANN component which receives a pattern as input and produces its encoding.

Fine-tuning the supervised deep ANN

The supervised deep ANN can be fine-tuned using the cross-validation training algorithm. If you pre-trained the supervised layer, the deep_net object is directly the whole ANN. Otherwise, you will need to add a new layer to the codifier_net, as in this example:

-- if you want, you could clone the deep_net to keep it as it is
local codifier_net = deep_net:clone()
codifier_net:build{ weights = deep_net:copy_weights() }
-- We add an output layer with 10 neurons and softmax activation function
local last_layer = ann.components.hyperplane{
  dot_product_weights="lastw",
  bias_weights="lastb",
  output=10
}
deep_net:push( last_layer )
deep_net:push( ann.components.actf.log_softmax() )
trainer = trainable.supervised_trainer(deep_net, loss_function or nil, bunch_size or nil)
-- The output size needs to be overwritten, so it needs to be given at the build method
trainer:build{ output = 10 }
weights_random = random(SEED)

-- Now, THERE ARE TWO WAYS to randomize the weights of last_layer
-- FIRST using the trainer
trainer:randomize_weights{
  name_match="^last[bw]$", -- the name_match is to only randomize connections which name matches
  inf=-0.1,
  sup=0.1,
  random=weights_random
}
-- SECOND using the component
-- (BE CAREFUL AND USE ONLY ONE OF THESE WAYS)
for _,cnn in pairs(last_layer:copy_weights()) do
  cnn:randomize_weights{
    inf=-0.1,
    sup=0.1,
    random=weights_random
  }
end

Compute encoding

With a trained SDAE (without the supervised layer), it is possible to compute encodings of input patterns using this function:

trainer = trainable.supervised_trainer(codifier_net)
encoded_dataset = trainer:use_dataset(input_dataset)
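
As a usage sketch only (assuming encoded_dataset provides the usual dataset methods numPatterns and getPattern; check the dataset documentation), the encodings can then be visited pattern by pattern:

for i = 1, encoded_dataset:numPatterns() do
  local code = encoded_dataset:getPattern(i) -- (assumption) a Lua table with the code values
  print(i, #code)
end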