speeding-up-sci-correlation

Summary

Visualization code from the 2nd "Speeding Up Science" workshop. This repository contains code to plot and calculate the correlation (by linear regression) between metagenomic and metatranscriptomic sequencing results acquired from the same sample.

The Binder includes an example that compares gene abundances (DNA) with their expression levels (RNA). The data are from one Mediterranean site of the TARA Oceans project (https://science.sciencemag.org/content/348/6237/1261359.long). The metagenomic data come from sample accession SAMEA2619782, and the metatranscriptomic data come from SAMEA2619784.

Quick Start

  • Click the Jupyter notebook file (Correlation_dna_rna.ipynb) to open the interactive user interface.
  • You can either run the notebook with the included example or upload new data files by clicking the Upload button in the upper right corner of the Binder homepage. See below for examples of input files.
  • Code chunks can be executed by pressing Ctrl + Enter or by clicking the Run button at the top of the notebook.

Example Input

1. A count table containing genes found in both DNA and RNA sequencing results.

|   | Gene | DNA | RNA |
|---|------|-----|-----|
| 0 | TOBG-MED-1076_1101 | 3.57863 | 12.9926 |
| 1 | TOBG-MED-1076_1116 | 0.71486 | 4.03726 |
| 2 | TOBG-MED-1076_1131 | 7.72704 | 5.45492 |
| 3 | TOBG-MED-1076_1151 | 2.85944 | 15.5723 |
| 4 | TOBG-MED-1076_1195 | 8.81305 | 12.3797 |

2. An annotation table. A "nan" value for KO ID means that the gene has no match in the KEGG database. Rows with "nan" values are removed.

|   | Gene | KO_ID |
|---|------|-------|
| 0 | TOBG-MED-1076_1019 | nan |
| 1 | TOBG-MED-1076_1027 | nan |
| 2 | TOBG-MED-1076_1028 | K04084 |
| 3 | TOBG-MED-1076_1032 | nan |
| 4 | TOBG-MED-1076_1038 | K05540 |
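
The removal of unannotated genes might look like the sketch below. It is an assumption-laden illustration: the comma-separated layout, the `ANNOTATION_CSV` constant, and the `load_annotated_genes` helper are all hypothetical names introduced here, and the notebook may well do this differently (for example with pandas).

```python
import csv
import io

# The five example annotation rows, written as comma-separated text.
# The file layout is an assumption made for this sketch.
ANNOTATION_CSV = """Gene,KO_ID
TOBG-MED-1076_1019,nan
TOBG-MED-1076_1027,nan
TOBG-MED-1076_1028,K04084
TOBG-MED-1076_1032,nan
TOBG-MED-1076_1038,K05540
"""


def load_annotated_genes(text):
    """Return {gene: KO_ID}, skipping genes whose KO_ID is "nan" or empty."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Gene"]: row["KO_ID"]
        for row in reader
        if row["KO_ID"].strip().lower() not in ("", "nan")
    }


annotated = load_annotated_genes(ANNOTATION_CSV)
print(annotated)
```

Only the two genes with a KEGG match remain in the result.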

3. A KEGG Orthology (KO) table. A KO ID may belong to multiple pathways, so the user needs to curate this table manually.

|   | KO_ID | Category1 |
|---|-------|-----------|
| 0 | K00360 | Nitrogen metabolism |
| 1 | K00362 | Nitrogen metabolism |
| 2 | K00363 | Nitrogen metabolism |
| 3 | K00366 | Nitrogen metabolism |
| 4 | K00367 | Nitrogen metabolism |
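
One way to support the manual curation is to list KO IDs that appear under more than one category, so they can be reviewed by hand. The sketch below is only a suggestion: `find_ambiguous_kos` is a hypothetical helper, and the last input row is invented purely to trigger the check (it does not come from KEGG or from the example data).

```python
from collections import defaultdict

# (KO_ID, category) pairs. The first three rows come from the example table;
# the last row is a made-up duplicate used only to demonstrate the check.
ko_rows = [
    ("K00360", "Nitrogen metabolism"),
    ("K00362", "Nitrogen metabolism"),
    ("K00363", "Nitrogen metabolism"),
    ("K00360", "Hypothetical second pathway"),
]


def find_ambiguous_kos(rows):
    """Return {KO_ID: sorted categories} for KO IDs listed under several categories."""
    categories = defaultdict(set)
    for ko_id, category in rows:
        categories[ko_id].add(category)
    return {ko: sorted(cats) for ko, cats in categories.items() if len(cats) > 1}


ambiguous = find_ambiguous_kos(ko_rows)
print(ambiguous)
```

An empty result would suggest that every KO ID in the table maps to a single category.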

Example Output

The figure below is a static example of the output. The figure generated in Binder is an interactive plot: users can hover over each dot and line to see its annotation.

Authors

Links
