speeding-up-sci-correlation

Summary

Visualization code from the 2nd "speeding up science workshop". This repository contains code to plot metagenomic against metatranscriptomic sequencing results acquired from the same sample and to calculate the correlation (linear regression) between them.

The binder includes an example that compares gene abundances (DNA) with their expression levels (RNA). The data come from one Mediterranean site of the TARA Oceans project (https://science.sciencemag.org/content/348/6237/1261359.long). The metagenomic data come from sample accession SAMEA2619782, and the metatranscriptomic data come from SAMEA2619784.

Quick Start

Click the Jupyter notebook file (Correlation_dna_rna.ipynb) to open the interactive interface.
You can either run the notebook with the included example or upload your own data files by clicking the Upload button in the upper right corner of the Binder homepage. See Example Input below for the expected input files.
Run a code cell by pressing Ctrl + Enter or by clicking the Run button at the top of the notebook.

Example Input

1. A count table containing genes found in both DNA and RNA sequencing results.

	Gene	DNA	RNA
0	TOBG-MED-1076_1101	3.57863	12.9926
1	TOBG-MED-1076_1116	0.71486	4.03726
2	TOBG-MED-1076_1131	7.72704	5.45492
3	TOBG-MED-1076_1151	2.85944	15.5723
4	TOBG-MED-1076_1195	8.81305	12.3797
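The correlation between the DNA and RNA columns can be sketched as an ordinary least-squares fit plus a Pearson coefficient. The block below is a minimal standard-library illustration on the five example rows above, not the notebook's own code. It assumes tab-separated input with `Gene`, `DNA` and `RNA` columns and that RNA is regressed on DNA; the notebook may transform the values first (for example, log-scaling) or use a different library, and the helper names `read_counts` and `fit_line` are hypothetical.

```python
import csv
import io
import math

# The five example rows from the count table above, as tab-separated text
# (the pandas index column is dropped).
COUNTS_TSV = """Gene\tDNA\tRNA
TOBG-MED-1076_1101\t3.57863\t12.9926
TOBG-MED-1076_1116\t0.71486\t4.03726
TOBG-MED-1076_1131\t7.72704\t5.45492
TOBG-MED-1076_1151\t2.85944\t15.5723
TOBG-MED-1076_1195\t8.81305\t12.3797
"""


def read_counts(handle):
    """Return parallel lists of DNA and RNA values from a Gene/DNA/RNA table."""
    dna, rna = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        dna.append(float(row["DNA"]))
        rna.append(float(row["RNA"]))
    return dna, rna


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


dna, rna = read_counts(io.StringIO(COUNTS_TSV))
slope, intercept, r = fit_line(dna, rna)
print(f"RNA = {slope:.3f} * DNA + {intercept:.3f}  (Pearson r = {r:.3f}, R^2 = {r * r:.3f})")
```

For a simple linear regression with an intercept, R^2 equals the square of Pearson's r, so one helper yields both the fitted line and the correlation strength.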

2. An annotation table. A "nan" value in the KO_ID column means the gene has no match in the KEGG database; these genes will be removed.

	Gene	KO_ID
0	TOBG-MED-1076_1019	nan
1	TOBG-MED-1076_1027	nan
2	TOBG-MED-1076_1028	K04084
3	TOBG-MED-1076_1032	nan
4	TOBG-MED-1076_1038	K05540
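Dropping unannotated genes and attaching KO IDs to the count table can be sketched as follows. This is a hedged standard-library illustration, not the notebook's code: it assumes both tables are tab-separated with a shared `Gene` column and that a missing annotation appears as the literal string `nan` (or an empty cell). The helpers `read_annotations` and `attach_ko` are hypothetical names.

```python
import csv
import io

# The five example rows from the annotation table above (index column dropped).
ANNOTATION_TSV = """Gene\tKO_ID
TOBG-MED-1076_1019\tnan
TOBG-MED-1076_1027\tnan
TOBG-MED-1076_1028\tK04084
TOBG-MED-1076_1032\tnan
TOBG-MED-1076_1038\tK05540
"""

# Cell values treated as "no KEGG match".
MISSING = {"", "nan", "NaN"}


def read_annotations(handle):
    """Map each gene to its KO ID, skipping genes with no KEGG match."""
    gene_to_ko = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        ko = (row["KO_ID"] or "").strip()
        if ko not in MISSING:
            gene_to_ko[row["Gene"]] = ko
    return gene_to_ko


def attach_ko(counts, gene_to_ko):
    """Inner join: keep only count rows whose gene has a KO ID.

    `counts` maps gene -> (dna, rna); the result maps gene -> (ko, dna, rna).
    """
    return {
        gene: (gene_to_ko[gene], dna, rna)
        for gene, (dna, rna) in counts.items()
        if gene in gene_to_ko
    }


gene_to_ko = read_annotations(io.StringIO(ANNOTATION_TSV))
print(gene_to_ko)
# {'TOBG-MED-1076_1028': 'K04084', 'TOBG-MED-1076_1038': 'K05540'}
```

With pandas, the same steps would typically be `read_csv(..., sep="\t")` (which reads the string `nan` as a missing value by default), `dropna(subset=["KO_ID"])`, and `merge(..., on="Gene", how="inner")`.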

3. A KEGG Orthology table that assigns each KO ID to a category. Because a KO ID may belong to multiple pathways, the user needs to curate this table manually.

	KO_ID	Category1
0	K00360	Nitrogen metabolism
1	K00362	Nitrogen metabolism
2	K00363	Nitrogen metabolism
3	K00366	Nitrogen metabolism
4	K00367	Nitrogen metabolism
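The curation step matters because a KO ID listed under several categories would place the same genes in more than one group. The sketch below, a standard-library illustration rather than the notebook's code, builds a KO-to-category map from a table shaped like the example above and flags KO IDs listed under more than one category as candidates for manual curation. The helper names `read_categories` and `ambiguous_kos` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# The five example rows from the KEGG Orthology table above (index column dropped).
KO_TABLE_TSV = """KO_ID\tCategory1
K00360\tNitrogen metabolism
K00362\tNitrogen metabolism
K00363\tNitrogen metabolism
K00366\tNitrogen metabolism
K00367\tNitrogen metabolism
"""


def read_categories(handle):
    """Map each KO ID to the set of categories it is listed under."""
    ko_to_categories = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        ko_to_categories[row["KO_ID"]].add(row["Category1"])
    return ko_to_categories


def ambiguous_kos(ko_to_categories):
    """Return KO IDs assigned to more than one category (candidates for curation)."""
    return sorted(ko for ko, cats in ko_to_categories.items() if len(cats) > 1)


ko_to_categories = read_categories(io.StringIO(KO_TABLE_TSV))
print(ambiguous_kos(ko_to_categories))  # [] for the five example rows
```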

Example Output

The figure below is a static example of the output. The figure generated by the notebook is an interactive plot: users can hover over each dot and line to see its annotation.

Authors

Zhengyao "Zeya" Xue, Github ID @zeyaxue and ORCID
Michael D. Lee, Github ID @AstrobioMike and ORCID

Links

Zenodo Binder:
Github Binder:
Github Repository: https://github.com/speeding-up-science-workshops/speeding-up-sci-correlation

