Home

Notes from first meeting on October 29, 2014.

#ClusterFux

https://github.com/TheCodingCollective/clusterFux http://www.ncbi.nlm.nih.gov/pubmed?term=21179090

FWIW, I really liked Steven's density-based clustering. I'd be totally up for using that (perhaps with the newer dimensionality reduction algorithm he told me about) and splitting the result into clusters with the OPTICS algorithm, which Steven said is implemented in Python. I'd like to keep up with what goes on today re. clustering, but I'm stuck in BKLY because Rosie's sick.

#Project

Use the same dataset for many different clustering methods

The Maloof lab also has a bunch of tomato data
Sundar lab: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50777
Open Count Data: http://bowtie-bio.sourceforge.net/recount/

##Clustering Methods

kohonen (self-organizing maps) - R; need to pick the # of clusters
kmeans() - R; also need to pick the # of clusters up front
hclust() - cut the tree with cutree(); consistent between high- and low-level analysis, since smaller clusters are nested within larger ones
HTSCluster - for high-throughput sequencing data; CPU intensive, so run it on a server
WGCNA - hclust() wrapper; easy to use, with "dynamic tree cut"
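
To illustrate the hclust() note above (smaller clusters are part of larger ones), here is a minimal Python sketch of agglomerative clustering with a cut-tree step. It is not hclust() itself: the average linkage, the toy `profiles` values, and the helper names `agglomerate`, `cut_tree` and `is_nested` are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns a dict mapping k (number of clusters) to the partition at that
    level, stored as a list of frozensets of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(points, *pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels[len(clusters)] = list(clusters)
    return levels


def cut_tree(levels, k):
    """Return the partition with exactly k clusters."""
    return levels[k]


def is_nested(fine, coarse):
    """True if every fine cluster lies entirely inside one coarse cluster."""
    return all(any(f <= c for c in coarse) for f in fine)


# Hypothetical two-dimensional "expression profiles", one tuple per gene.
profiles = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (1.2, 0.9),
            (5.0, 5.1), (5.2, 4.9), (6.0, 6.1), (6.1, 5.9)]

levels = agglomerate(profiles)
coarse = cut_tree(levels, 2)
fine = cut_tree(levels, 4)
print(sorted(sorted(c) for c in coarse))
print(sorted(sorted(c) for c in fine))
print(is_nested(fine, coarse))
```

Because every level is built by merging clusters from the level below, cutting at a larger k always gives clusters that sit inside the clusters from a smaller k. kmeans() offers no such guarantee when rerun with a different number of clusters.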

##Ordinations

PCA - Euclidean
MDS - used to check that RNA-seq samples cluster together in lower-dimensional space
PCoA - any distance measure, not just Euclidean
NMDS - helps avoid the horseshoe effect; you choose the # of dimensions and it finds a good projection onto exactly that many dimensions, vs. PCA, where you would be visualizing the first 2 or 3 dimensions of a projection onto (N-1) dimensions.
CoCA

http://cran.r-project.org/web/views/Cluster.html CRAN task view listing the R packages related to clustering

http://cran.r-project.org/web/packages/kohonen/kohonen.pdf kohonen package manual (self-organizing maps)

##Which genes to include in the analysis:

Top 25% co-variance
differentially expressed only
log fold change cutoff
consider genes with expression above that of a gene known to be expressed (?)
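
A rough Python sketch of two of the filters above. It reads "Top 25% co-variance" as keeping the most variable quarter of genes, ranked by coefficient of variation, which is an assumption about what was meant. The `expression` table, the pseudocount, and the cutoff of 1 (a two-fold change) are also made up for illustration.

```python
from math import log2
from statistics import mean, stdev

# Hypothetical table: gene -> (control replicates, treated replicates).
expression = {
    "geneA": ([10, 12, 11], [40, 44, 42]),
    "geneB": ([100, 98, 102], [101, 99, 100]),
    "geneC": ([5, 6, 5], [1, 2, 1]),
    "geneD": ([50, 55, 52], [60, 58, 61]),
}


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def top_fraction_by_cv(table, fraction=0.25):
    """Keep the given fraction of genes with the highest coefficient of variation."""
    cv = {gene: coefficient_of_variation(ctrl + trt)
          for gene, (ctrl, trt) in table.items()}
    n_keep = max(1, int(len(cv) * fraction))
    ranked = sorted(cv, key=cv.get, reverse=True)
    return set(ranked[:n_keep])


def passes_log_fold_change(table, cutoff=1.0, pseudocount=1.0):
    """Keep genes whose absolute log2 fold change (treated vs control) meets the cutoff."""
    keep = set()
    for gene, (ctrl, trt) in table.items():
        lfc = log2((mean(trt) + pseudocount) / (mean(ctrl) + pseudocount))
        if abs(lfc) >= cutoff:
            keep.add(gene)
    return keep


variable_genes = top_fraction_by_cv(expression)
changed_genes = passes_log_fold_change(expression)
print(sorted(variable_genes))
print(sorted(changed_genes))
```

The two filters can disagree: a gene can pass the fold change cutoff without being in the top quarter by variability, so it matters which one (or which combination) is applied before clustering.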

##Normalization

DESeq
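
For reference, the idea behind DESeq-style normalization is a per-sample size factor computed as the median ratio of each gene's count to that gene's geometric mean across samples. Below is a minimal Python sketch of that median-of-ratios idea only, not the DESeq package itself. The function names and the toy `counts` matrix are assumptions.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene), each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero count in any sample have no usable geometric mean; skip them.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],
]

factors = size_factors(counts)
normalized = normalize(counts, factors)
print(factors)
print(normalized[0])
```

After dividing by the size factors, the first three genes have the same value in both samples, so the depth difference no longer drives the distances that clustering sees.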

Cameron - upload your PowerPoint slides.

Clustering Analysis

Data type: time series

Discrete vs Analog data
Replicates - to pool or not to pool
Clustering leading to network construction

Distributions differ between microarray and RNA-seq data. Best to do intersections in order to find similarities between experiments.
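
One simple way to do such an intersection is to compare gene sets (for example, the members of a cluster or a list of hits) from each platform instead of comparing raw values. A small Python sketch follows. The gene names and the use of the Jaccard index as a summary are assumptions.

```python
def overlap(set_a, set_b):
    """Return the shared genes and the Jaccard index (shared / union)."""
    shared = set_a & set_b
    union = set_a | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical gene sets from two experiments.
microarray_hits = {"geneA", "geneB", "geneC", "geneD"}
rnaseq_hits = {"geneB", "geneC", "geneE"}

shared, jaccard = overlap(microarray_hits, rnaseq_hits)
print(sorted(shared))
print(jaccard)
```

Working with set membership sidesteps the different distributions of the two data types, at the cost of depending on how each set was thresholded.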

##Picking the cluster numbers

In model-based clustering this is not needed
If not how? Resources? -Political

##After cluster analysis

motif enrichment
GO enrichment
promoter enrichment
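
Enrichment of a GO term (or a motif) in a cluster is commonly scored with a hypergeometric test: given N genes in the background, K of them annotated with the term, and a cluster of n genes, how likely is it to see k or more annotated genes by chance? A minimal Python sketch follows. The numbers are made up, and this is one common approach, not necessarily the one the group will use.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N, K of which are annotated."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return favorable / total


# Hypothetical numbers: 20 background genes, 5 annotated with the term,
# a cluster of 4 genes, 2 of which carry the annotation.
p_value = hypergeom_upper_tail(N=20, K=5, n=4, k=2)
print(p_value)
```

When many terms are tested across many clusters, the resulting p-values need a multiple-testing correction before they are interpreted.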

##Questions

Nested designs for clustering?

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50777
