Skip to content

Celluloid: clustering single cell sequencing data around centroids

License

Notifications You must be signed in to change notification settings

AlgoLab/celluloid

Repository files navigation

celluloid

PyPI version

Celluloid: clustering single cell sequencing data around centroids

How to install

pip3 install celluloid-clust

Input formats

Single Cell file

The input file (specified by the -i parameter) is expected to be a ternary matrix file where the rows represent the cells and the columns the mutations. Each cell must be separated by a space. Each cell of the matrix can be:

Value of cell Meaning
I[i,j] = 0 Mutation j is not observed in cell i
I[i,j] = 1 Mutation j is observed in cell i
I[i,j] = 2 There is no information for mutation j in cell i, i.e. low coverage

Mutations file

This optional file specifies the name of the mutations (parameter -l). Each mutation's name must be on a different line (separated by \n), and the names are assigned to columns from left to right in the input file. If this file is not provided, mutations are progressively named from 1 to the total number of mutations.

Output formats

Celluloid will output one Single Cell file and one Mutations file in the same format as the input but with a number of mutations reduced by the clustering. Labels of mutation are concatenated in the out label file.

How to run

usage: celluloid.py [-h] -i INPUT [-m {huang,random}] [-d {conflict,matching}]
                    [-n N] -k K [-l LABELS] -o OUTDIR [-v]

Celluloid: clustering single cell sequencing data around centroids

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input file
  -m {huang,random}, --method {huang,random}
                        initialization method
  -d {conflict,matching}, --dissim {conflict,matching}
                        dissimilarity measure
  -n N, --n_inits N     number of iterations
  -k K, --kmodes K      number of modes
  -l LABELS, --labels LABELS
                        label file
  -o OUTDIR, --outdir OUTDIR
                        output directory.
  -v, --verbose         verbose

Replicating the experiments

The preprint of an article outlining a study on Celluloid can be found at https://doi.org/10.1101/586545

A workflow for reproducing the experiments of this study is described as a "Snakefile", which needs to be run with snakemake.

Overview

The workflow is designed to be run from this directory. In principle, you only need to run snakemake here and then the rest is done automatically, but all dependencies need to be installed first. To install dependencies, see How to install, and then install sasc:

git clone https://github.com/sciccolella/sasc
cd sasc && gcc -o sasc *.c -std=gnu99 -g -lm

Run the workflow

Start the workflow with

nice snakemake -p -j 16

Adjust the 16, which is the number of cores you want to use.

Results

The resulting files for each dataset, e.g., data/exp1/sim_1_scs.txt will be reported in a directory named data/exp1/sim_1.{parameters}.celluloid (resp., data/exp1/sim_1.{parameters}.kmeans) for the clustering -- and the downstream run of sasc -- with celluloid (resp., kmeans), where {parameters} are the parameters, e.g., initialization method, dissimilarity measure, k, associated with corresponding clustering method

About

Celluloid: clustering single cell sequencing data around centroids

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages