Tutorial
PEER can be used either as a standalone tool or through one of its interfaces (currently R and Python). Below you can find a brief introduction to each of these ways to get started with PEER.
As a minimum, PEER requires an expression matrix in csv (comma-separated) format, specified with the -f option. The matrix is assumed to have N rows and G columns, where N is the number of samples and G is the number of genes. The basic command to apply PEER to such a matrix given in expression.csv, learning K=10 hidden confounders, is
> peer -f expression.csv -n 10
PEER can read comma-separated (.csv) or tab-separated (.tab) files. If the expression data file has a header row, you have to pass the --has_header switch on the command line.
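For illustration, a tiny expression file for three samples and four genes could look like the following (the gene names and values are invented); since it has a header row, the --has_header switch is passed:
geneA,geneB,geneC,geneD
1.20,0.44,-0.31,2.10
0.75,1.15,0.02,-0.53
-0.90,0.31,1.80,0.66
> peer -f expression.csv -n 10 --has_header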
The output is written to the directory peer_out by default, as csv files: the residuals after accounting for the factors (residuals.csv, an NxG matrix), the inferred factors (X.csv, NxK), the weights of each factor for every gene (W.csv, GxK), and the inverse variance of the weights (Alpha.csv, Kx1).
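The resulting files can be inspected from Python, for example; this is a minimal sketch assuming the default peer_out directory and that the output files contain plain comma-separated values without header rows:
import numpy as NP
#load PEER's output files from the default output directory
residuals = NP.loadtxt('peer_out/residuals.csv', delimiter=',')  #N x G
X = NP.loadtxt('peer_out/X.csv', delimiter=',')                  #N x K
W = NP.loadtxt('peer_out/W.csv', delimiter=',')                  #G x K
Alpha = NP.loadtxt('peer_out/Alpha.csv', delimiter=',')          #K x 1
print(residuals.shape, X.shape, W.shape, Alpha.shape)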
You can change the output directory with the -o option, e.g.
> peer -f expression.csv -n 5 -o peer_k-5
If you are not interested in the posterior estimates of all the variables, you can suppress their output with switches. For example, to only output the residuals, you can use
> peer -f expression.csv --no_a_out --no_w_out --no_x_out
The --no_res_out switch suppresses the output of residuals.
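Conversely, to keep only the inferred factors, you could combine the switches the other way around (the combination shown is illustrative):
> peer -f expression.csv --no_a_out --no_w_out --no_res_out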
If there are measured experimental variables that may contribute to variability in the data, they can be included in the inference and specified with the -c flag.
> peer -f expression.csv -c covariates.csv
The covariates file should be in csv or tab format, and have N rows and C columns, where N matches the number of samples in the expression file.
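As an illustration, a simple covariates file could be assembled in Python from known sample annotations; the batch labels below are invented, and each batch is dummy-coded as one indicator column:
import numpy as NP
#hypothetical batch labels for N = 6 samples
batch = NP.array([0, 0, 1, 1, 2, 2])
#dummy-code the batches as C = 3 indicator columns
covariates = NP.zeros((len(batch), 3))
covariates[NP.arange(len(batch)), batch] = 1
#write an N x C comma-separated covariates file for use with -c
NP.savetxt('covariates.csv', covariates, delimiter=',')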
By default, PEER includes a covariate for the mean (a vector of ones). If you do not want this behaviour, you can switch it off with the --no_mean_covariate flag.
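For example, to include measured covariates but not the automatic mean covariate, you could run:
> peer -f expression.csv -c covariates.csv --no_mean_covariate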
By default, PEER iterates through updates of every variable 100 times. To set this to, say, 1000 iterations, use
> peer -f expression.csv -i 1000
PEER stops early if the increase in the lower bound on the model evidence becomes negligible, or if the variance of the residuals has stabilised. The limiting values (tolerances) can be specified as
> peer -f expression.csv --bound_tolerance=0.1 --var_tolerance=0.00000001
In general you can keep the bound tolerance fairly high, but you should keep the variance tolerance quite low compared to the variance of the expression matrix. If unsure, use the default values.
Finally, the prior parameters on the noise and weight precision distributions can also be changed. As these are both gamma distributed, you can specify the a and b parameters of both:
> peer -f expression.csv --e_pa=1.0 --e_pb=0.01 --a_pa=10.0 --a_pb=100
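Putting the options above together, a complete run might look like the following; the file names, the number of factors, and the parameter values are purely illustrative:
> peer -f expression.csv -c covariates.csv -n 10 -i 1000 -o peer_out_k10 --bound_tolerance=0.001 --var_tolerance=0.00000001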
PEER offers an easy API to its core functions from Python. Below, the same functionality as the standalone tool is mirrored from Python:
import sys
import scipy as SP
import peer
#1. load expression dataset
Y = SP.loadtxt('expression.csv', delimiter=',')
#2. run PEER
#use up to 20 factors
Kinf = 20
#maximum number of iterations
Nmax_iterations = 100
vb = peer.VBFA()
#set data and parameters
#number of factors to learn
vb.setNk(Kinf)
#fit a mean effect? (not used here)
vb.setAdd_mean(False)
vb.setPhenoMean(Y)
#set prior settings
#(these are the default settings of PEER)
vb.setPriorAlpha(0.001, 0.1)
vb.setPriorEps(0.1, 10)
vb.setNmax_iterations(Nmax_iterations)
vb.update()
#investigate inference results
#factors:
X = vb.getX()
#weights:
W = vb.getW()
#ARD parameters
Alpha = vb.getAlpha()
#get corrected dataset:
Yc = vb.getResiduals()
#3. plotting (requires matplotlib)
if 1:
    import pylab as PL
    #plot relevance of factors (inverse of the ARD precision):
    PL.figure(1)
    PL.plot(1/Alpha)
    PL.show()
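Since Alpha is the inverse variance of the factor weights, the plot of 1/Alpha indicates how relevant each factor is. The corrected expression matrix can then be written out for downstream analysis; a minimal sketch, assuming the same session (the output file name is arbitrary):
import numpy as NP
#save the corrected (residual) expression matrix as a comma-separated file
NP.savetxt('residuals_python.csv', Yc, delimiter=',')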