BouchardLab/ML_4_prec_prognosis

activ

Clustering

Input

Before you can run the clustering pipeline, you will need to generate an input file. Here is an example code snippet that demonstrates how to do that:

import numpy as np
from activ.clustering import write_clustering_input

path = 'path_to_save_input.h5'
X = np.random.random(100).reshape(10,10)
Y = np.random.random(100).reshape(10,10)

write_clustering_input(path, X, Y)

The following script uses activ.clustering, activ.readfile, and activ.utils. It runs a clustering pipeline that clusters response data (i.e. "outcomes") to create a label for each sample. The quality of the labels is assessed by predicting those labels from the predictor data (i.e. "biomarkers").

$ python bin/run_subsample_umap_clustering.py

It can be run in parallel with MPI, where <N_RANKS> is the number of MPI processes to launch.

$ mpirun -n <N_RANKS> python bin/run_subsample_umap_clustering.py
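As a rough illustration of what running on several ranks can mean, the sketch below divides a set of independent iterations among ranks in round-robin order. It is an assumption, not taken from the script itself, that the pipeline splits its subsampling iterations this way; the function name iterations_for_rank and the numbers used are hypothetical. Under MPI, the rank and the number of ranks would come from the communicator (for example through mpi4py) rather than from a loop.

```python
def iterations_for_rank(n_iterations, n_ranks, rank):
    """Return the iteration indices one rank would process (round-robin split).

    Hypothetical helper for illustration only; it is not part of activ.
    """
    return list(range(rank, n_iterations, n_ranks))


# Simulate 4 ranks sharing 10 independent iterations without starting MPI.
n_iterations, n_ranks = 10, 4
assignment = {
    rank: iterations_for_rank(n_iterations, n_ranks, rank)
    for rank in range(n_ranks)
}
for rank, its in assignment.items():
    print(f"rank {rank}: iterations {its}")
```

Every iteration is assigned to exactly one rank, so the ranks can work without coordinating until results are gathered.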

The results of this workflow are visualized with the notebook notebooks/plot_clustering_results.ipynb, which uses functions from activ.clustering.summarize.
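The idea behind the clustering pipeline (cluster the outcomes to obtain labels, then check how well the biomarkers predict those labels) can be sketched on synthetic data using only the standard library. This is a toy illustration, not the pipeline's implementation: plain k-means stands in for the clustering step, a nearest-centroid classifier stands in for the predictor, and all function names and data below are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means on a list of tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit one centroid per label on the training data; score on the test data."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    preds = [min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
             for x in X_test]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)


# Synthetic data (hypothetical): two hidden groups drive both data sets.
rng = random.Random(0)
group = [i % 2 for i in range(40)]
outcomes = [(5 * g + rng.gauss(0, 0.5), 5 * g + rng.gauss(0, 0.5))
            for g in group]
biomarkers = [(5 * g + rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for g in group]

# Step 1: cluster the outcomes to obtain a label for each sample.
labels = kmeans(outcomes, k=2)

# Step 2: predict held-out labels from the biomarkers.
train, test = slice(0, 30), slice(30, 40)
accuracy = nearest_centroid_accuracy(biomarkers[train], labels[train],
                                     biomarkers[test], labels[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

A high held-out accuracy indicates that the labels derived from the outcomes reflect structure that is also present in the biomarkers, which is the sense in which label quality is assessed here.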

Latent features (i.e. NMF)

This package contains a subpackage for visualizing NMF results, activ.nmf.viz. Example uses can be found in notebooks/NMF_viz.ipynb.

CCA - Canonical correlation analysis

This package contains a few different implementations of sparse canonical correlation analysis. The most mature is truly alternating least-squares (TALS) CCA [1]. In [1], the least-squares steps are regularized using an L2-norm. The implementation here instead regularizes using an L1-norm to identify sparse weights. It is implemented in the class TALSCCA in the subpackage activ.cca.alscca.

from activ.cca.alscca import TALSCCA
from activ.readfile import load_data
tbifile = load_data()
talscca = TALSCCA(scale=True)
talscca.fit(tbifile.biomarkers, tbifile.outcomes)
bm_cv, oc_cv = talscca.transform(tbifile.biomarkers, tbifile.outcomes)

Additional example usage can be found in notebooks/TALS_CCA_CV.ipynb.
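To illustrate why an L1 penalty yields sparse weights while an L2 penalty does not, the sketch below compares the two in the simplest setting, a least-squares problem with orthonormal predictors. In that setting the L1-penalized solution soft-thresholds each unpenalized coefficient, whereas the L2-penalized solution only rescales it. This is a generic illustration under that assumption, not the code used by TALSCCA; the function names and coefficient values are hypothetical.

```python
def soft_threshold(w, lam):
    """L1 case: shrink w toward zero by lam, setting small values exactly to zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def ridge_shrink(w, lam):
    """L2 case: rescale w by 1 / (1 + lam); nonzero values stay nonzero."""
    return w / (1.0 + lam)


# Hypothetical unpenalized least-squares coefficients.
ols = [3.0, -0.5, 0.2, -2.0]
lam = 1.0

sparse = [soft_threshold(w, lam) for w in ols]
dense = [ridge_shrink(w, lam) for w in ols]

print("L1:", sparse)
print("L2:", dense)
```

Coefficients whose magnitude is below the penalty are removed entirely under L1, which is what makes the resulting weights sparse.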

CT Measures

CT analysis is done using executable submodules (run with python -m), rather than executable Python scripts.

  1. Reformat data using python -m activ.ct.convert
  2. Summarize data using python -m activ.ct.summarize

Test data for running step 1 can be found in data/ct/115Label_fake.

References

  1. Zhiqiang Xu and Ping Li, "Towards Practical Alternating Least-Squares for CCA"
