activ

Clustering

Input

Before running the clustering pipeline, generate an input file. For example:

import numpy as np
from activ.clustering import write_clustering_input

# Destination of the HDF5 input file
path = 'path_to_save_input.h5'

# Two 10x10 arrays of random example data
X = np.random.random((10, 10))
Y = np.random.random((10, 10))

write_clustering_input(path, X, Y)

The following command runs the clustering pipeline, which uses activ.clustering, activ.readfile, and activ.utils. The pipeline clusters response data (i.e. "outcomes") to create a label for each sample. The quality of the labels is assessed by predicting them from the predictor data (i.e. "biomarkers").

$ python bin/run_subsample_umap_clustering.py

It can be run in parallel with MPI.

$ mpirun -n <N_RANKS> python bin/run_subsample_umap_clustering.py
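How run_subsample_umap_clustering.py divides its work among MPI ranks is not stated here. As a rough illustration only, one common pattern is to hand out independent iterations round-robin, so that each rank processes every N-th task. The function name tasks_for_rank and the task count below are hypothetical, and the sketch runs serially without MPI:

```python
def tasks_for_rank(n_tasks, rank, size):
    """Return the task indices one rank handles when `size` ranks share
    `n_tasks` independent tasks round-robin (hypothetical helper)."""
    return list(range(rank, n_tasks, size))


# Under mpi4py, rank and size would come from MPI.COMM_WORLD.Get_rank() and
# MPI.COMM_WORLD.Get_size(); here every rank is simulated in one process.
n_tasks, size = 10, 4
assignment = {rank: tasks_for_rank(n_tasks, rank, size) for rank in range(size)}

for rank, tasks in assignment.items():
    print(f"rank {rank}: tasks {tasks}")
```

Every task is assigned to exactly one rank, and the ranks need no communication until results are gathered.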

The results of this workflow are visualized with the notebook notebooks/plot_clustering_results.ipynb, which pulls functions from activ.clustering.summarize.
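The idea behind this workflow (cluster the outcomes to get labels, then check how well the biomarkers predict those labels) can be sketched with NumPy alone. This is not the pipeline's code: it swaps in plain k-means and a nearest-centroid classifier for the UMAP-based clustering and the predictor used by the script, and the data, function names, and parameters are all made up for illustration.

```python
import numpy as np


def kmeans(Y, k, n_iter=20, seed=0):
    """Plain k-means with farthest-point initialization; returns labels."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def cv_accuracy(X, labels, n_folds=5, seed=0):
    """Cross-validated accuracy of a nearest-centroid classifier on X."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        classes = np.unique(labels[train])
        centroids = np.stack(
            [X[train & (labels == c)].mean(axis=0) for c in classes]
        )
        d = ((X[test_idx][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += (classes[d.argmin(axis=1)] == labels[test_idx]).sum()
    return correct / len(X)


# Synthetic example: two groups that differ in both outcomes and biomarkers
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 50)
outcomes = 6.0 * group[:, None] + rng.standard_normal((100, 3))
biomarkers = 3.0 * group[:, None] + rng.standard_normal((100, 5))

labels = kmeans(outcomes, k=2)               # labels come from outcomes only
accuracy = cv_accuracy(biomarkers, labels)   # predicted from biomarkers only
print(f"cross-validated accuracy: {accuracy:.2f}")
```

High accuracy indicates that the cluster labels reflect structure that is also present in the biomarkers; accuracy near chance suggests the labels are not predictable from them.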

Latent features (i.e. NMF)

This package contains a subpackage for visualizing NMF results, activ.nmf.viz. Example uses can be found in notebooks/NMF_viz.ipynb.
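For orientation, the sketch below shows the kind of result an NMF produces: a non-negative data matrix V is approximated by the product of non-negative weights W and latent features H. It uses the standard multiplicative-update rule on synthetic data and is not the code used by activ; the function name nmf and all parameters are illustrative assumptions.

```python
import numpy as np


def nmf(V, n_components, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative V (samples x features) as W @ H using
    multiplicative updates for the squared Frobenius error."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_components)) + eps
    H = rng.random((n_components, V.shape[1])) + eps
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase the error
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic non-negative data with exactly three latent features
rng = np.random.default_rng(0)
V = rng.random((20, 3)) @ rng.random((3, 8))

W, H = nmf(V, n_components=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.3f}")
```

Rows of H are the latent features, and each row of W gives one sample's weights on those features.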

CCA - Canonical correlation analysis

Within this package, there are a few different implementations of sparse canonical correlation analysis. The most mature is truly-alternating least squares CCA [1]. In [1], the least-squares steps are regularized using an L2-norm. The implementation here instead regularizes using an L1-norm to identify sparse weights. It is implemented in the class TALSCCA in the subpackage activ.cca.alscca.

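from activ.cca.alscca import TALSCCA
from activ.readfile import load_data

# Load the dataset, which exposes biomarkers and outcomes
tbifile = load_data()

# Fit the sparse CCA model, then transform both views with the fitted model
talscca = TALSCCA(scale=True)
talscca.fit(tbifile.biomarkers, tbifile.outcomes)
bm_cv, oc_cv = talscca.transform(tbifile.biomarkers, tbifile.outcomes)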

Additional example usage can be found in notebooks/TALS_CCA_CV.ipynb.
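To make the idea of L1-regularized alternating least squares concrete, here is a minimal single-component sketch in NumPy. It is not the TALSCCA implementation and does not follow [1] in detail: it simply alternates two lasso regressions (solved with ISTA), each fitting one view's variate to the other's, and rescales the weights after each step. All function names, the initialization, and the value of lam are assumptions made for illustration.

```python
import numpy as np


def standardize(A):
    """Center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)


def soft_threshold(z, t):
    """Proximal operator of the L1-norm: shrink entries toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(A, b, lam, w, n_iter=200):
    """Approximately minimize ||A w - b||^2 / (2 n) + lam * ||w||_1."""
    n = A.shape[0]
    step = n / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


def unit_variance(A, w):
    """Rescale w so the variate A @ w has unit variance (no-op if w is zero)."""
    scale = np.linalg.norm(A @ w) / np.sqrt(A.shape[0])
    return w / scale if scale > 0 else w


def sparse_als_cca(X, Y, lam=0.1, n_outer=30):
    """One pair of sparse canonical weights for standardized X and Y."""
    # Start v at the leading right singular vector of the cross-covariance
    _, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    v = unit_variance(Y, Vt[0])
    u = np.zeros(X.shape[1])
    for _ in range(n_outer):
        u = unit_variance(X, lasso_ista(X, Y @ v, lam, u))  # fit X u to Y v
        v = unit_variance(Y, lasso_ista(Y, X @ u, lam, v))  # fit Y v to X u
    return u, v


# Synthetic example: only the first column of each view shares a signal
rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)
X = rng.standard_normal((n, 10))
Y = rng.standard_normal((n, 8))
X[:, 0] = z + 0.3 * rng.standard_normal(n)
Y[:, 0] = z + 0.3 * rng.standard_normal(n)

Xs, Ys = standardize(X), standardize(Y)
u, v = sparse_als_cca(Xs, Ys)
corr = np.corrcoef(Xs @ u, Ys @ v)[0, 1]
print("nonzero weights:", np.count_nonzero(u), np.count_nonzero(v))
print(f"canonical correlation: {corr:.2f}")
```

The L1 penalty sets many weights exactly to zero, so the surviving weights point at the variables that carry the shared signal. Larger values of lam give sparser weights.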

CT Measures

CT analysis is done using executable submodules, rather than executable Python scripts.

1. Reformat data using python -m activ.ct.convert
2. Summarize data using python -m activ.ct.summarize

Test data for running step 1 can be found in data/ct/115Label_fake.

References

[1] Zhiqiang Xu and Ping Li, Towards Practical Alternating Least-Squares for CCA
