any plans to generate covariates for hg38? #30

Open
pwaltman opened this issue May 22, 2019 · 12 comments

Comments

@pwaltman

I realize that this isn't really an issue, but I'm curious if you have given thought to generating a comparable set of covariates for hg38?

@im3sanger
Owner

Hello,

Thanks for your suggestion. The covariates for hg19 were generated using files from the Roadmap Epigenomics project. Unfortunately, as far as I am aware, their files are only available for hg19, although there are a few lifted over to GRCh38 that I could use. If there is enough demand from users, I could try generating some covariates (even with the limitations above).

Users can also generate their own covariates, for example using expression data, chromatin data or even coverage metrics from their own experiments. As described in the dNdScv tutorial, covariates can be fed as a numeric matrix with one covariate per column and genes as rownames.

load("RefCDS_human_GRCh38.p12.rda")
gene_list = sapply(RefCDS, function(x) x$gene_name) # List of gene names from the GRCh38 object
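
A minimal sketch of the matrix format described above, assuming a per-gene table of your own measurements. The `expr` data frame, its column names, and the toy `gene_list` are hypothetical stand-ins; in practice `gene_list` would come from the RefCDS object as in the snippet above. How genes without covariate values should be handled is not covered in this thread, so this sketch simply drops them.

```r
# Toy stand-in for the gene names extracted from RefCDS (hypothetical values)
gene_list <- c("TP53", "KRAS", "PIK3CA", "BRAF")

# Hypothetical per-gene measurements from your own experiments
expr <- data.frame(
  gene = c("KRAS", "TP53", "BRAF", "EGFR"),
  log_expression = c(5.2, 7.1, 4.3, 6.0),
  mean_coverage = c(98, 120, 87, 110)
)

# Keep only genes present in the reference, in reference order
idx <- match(gene_list, expr$gene)
keep <- !is.na(idx)
covs_custom <- as.matrix(expr[idx[keep], c("log_expression", "mean_coverage")])
rownames(covs_custom) <- gene_list[keep]

# Standardise each column so covariates are on comparable scales
# (an optional choice made for this sketch)
covs_custom <- scale(covs_custom)

print(dim(covs_custom))
```

The result is a numeric matrix with one covariate per column and gene names as row names, which is the shape the tutorial asks for.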

Also, please note that dndscv can be run on GRCh38 without covariates.

Inigo

@carladosanjos

Hi Iñigo,
Just letting you know that I am one of those users interested in the covariates for the GRCh38 versions.
Cheers

@JMarzec

JMarzec commented Jul 10, 2020

Hey Inigo,

Same as above, we find your method very useful and use it quite frequently in our research and so we'd be interested in the covariates for the GRCh38 versions as well.
Many thanks

@skanwal

skanwal commented Jul 10, 2020

Hi Inigo @im3sanger

First, thanks for the great method. It's super useful, and we use it for driver analysis of clinical samples in our research group.
We use genome build 38, so at the moment we are stuck on which covariates to use for the GRCh38 version.

It would be a great help if you could provide covariates for GRCh38.

Best,
Sehrish

@im3sanger
Owner

Hi all,

Thank you for your interest. I will try to create covariates for GRCh38 in the near future as more users are requesting them. In the meantime, please remember that you can still use dndscv on GRCh38 without covariates, using "cv=NULL" as an argument to dndscv and using the reference object from the link below:
https://github.com/im3sanger/dndscv_data/tree/master/data
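
A hedged sketch of a run without covariates, wrapped in a small helper. The helper name `run_without_covariates` is hypothetical, and the five-column check reflects the mutation table layout as commonly described for dndscv (sampleID, chr, pos, ref, mut), which is an assumption here rather than something stated in this thread.

```r
# Hypothetical helper: run dndscv on GRCh38 with covariates switched off
run_without_covariates <- function(mutations, refdb_path) {
  # Assumed input layout: a data frame with at least five columns
  stopifnot(is.data.frame(mutations), ncol(mutations) >= 5)
  if (!requireNamespace("dndscv", quietly = TRUE)) {
    stop("The dndscv package is not installed")
  }
  if (!file.exists(refdb_path)) {
    stop("Reference file not found: ", refdb_path)
  }
  # cv = NULL disables the covariates, as described above
  dndscv::dndscv(mutations, refdb = refdb_path, cv = NULL)
}

# Usage, assuming the GRCh38 reference object has been downloaded locally:
# dndsout <- run_without_covariates(mutations, "RefCDS_human_GRCh38.p12.rda")
```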

You can also feed dndscv your own covariates, such as expression level or coverage per gene, as described in the tutorial below:
http://htmlpreview.github.io/?http://github.com/im3sanger/dndscv/blob/master/vignettes/buildref.html

Please feel free to continue expressing your interest and I will do my best to generate covariates in the near future. I will post a note here as soon as new covariates are available.

Best wishes,
Inigo

@alhafidzhamdan

Hi Inigo, another one here requesting covariates for hg38 please!
The tutorial listed at http://htmlpreview.github.io/?http://github.com/im3sanger/dndscv/blob/master/vignettes/buildref.html does not explain how to convert expression/epigenomic data to principal components; perhaps you could help us with this instead? Either way, much appreciated!
A
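
One generic way to reduce per-gene expression or epigenomic tracks to principal components is base R's `prcomp`. This is only a sketch on simulated data, not necessarily the procedure used for the official covariates; the gene names, track names, and the number of components kept are all arbitrary assumptions.

```r
set.seed(1)

# Simulated per-gene feature matrix: 200 hypothetical genes x 30 tracks
genes <- paste0("GENE", seq_len(200))
features <- matrix(
  rnorm(200 * 30),
  nrow = 200,
  dimnames = list(genes, paste0("track", seq_len(30)))
)

# Centre and scale each track, then run PCA
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Keep the first n_pcs components as covariates, one per column
n_pcs <- 20
covs_pca <- pca$x[, seq_len(n_pcs)]

# Proportion of total variance captured by the retained components
var_explained <- sum(pca$sdev[seq_len(n_pcs)]^2) / sum(pca$sdev^2)

print(dim(covs_pca))
print(var_explained)
```

`covs_pca` keeps the gene names as row names, so it already has the one-covariate-per-column layout that dndscv expects for custom covariates.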

@skanwal

skanwal commented Aug 14, 2020

Hi @im3sanger,

I was wondering whether you have had a chance to look into generating the covariates for GRCh38?

Best,
Sehrish.

@xtmgah

xtmgah commented Jan 21, 2021

Hi All:

Any progress on developing the covariates for hg38? Thanks.

@vivekruhela

I am also requesting covariates for hg38. Please let us know when they become available. Thanks.

@joonan30

Hi Inigo, I am also looking forward to the hg38 covariates. This tool is amazing - please end the era of MutSigCV (which is no longer maintained and takes many hours to run).

@im3sanger
Owner

Thank you everyone for the nudges. We are putting together a set of GRCh38 covariates, which we are hoping to release in the next few weeks. I will update this thread when they are available.

@im3sanger
Owner

Hi everyone,

Thank you for your patience. I have uploaded a new RefCDS and new covariates for GRCh38/hg38 here:
https://github.com/im3sanger/dndscv_data/tree/master/data

The new covariate file is called: covariates_hg19_hg38_epigenome_pcawg.rda.

And they can be used with the following new RefCDS files for GRCh37/hg19 and GRCh38/hg38:
RefCDS_human_GRCh38_GencodeV18_recommended.rda
RefCDS_human_hg19_GencodeV18_newcovariates.rda

So you can run dndscv on hg38 using:

load("covariates_hg19_hg38_epigenome_pcawg.rda") # Loads the covs object
dndsout = dndscv(mutations, refdb = "RefCDS_human_GRCh38_GencodeV18_recommended.rda", cv = covs)
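
Before a run, it may be worth checking how many reference genes have a row in the covariate matrix. The sketch below assumes `covs` is a matrix with gene names as row names (the layout described for custom covariates in the tutorial); the helper name `covariate_coverage` and the toy inputs are hypothetical.

```r
# Report which reference genes lack a row in the covariate matrix
covariate_coverage <- function(covs, gene_list) {
  stopifnot(is.matrix(covs), !is.null(rownames(covs)))
  gene_list <- unique(gene_list)
  missing_genes <- setdiff(gene_list, rownames(covs))
  list(
    fraction_covered = 1 - length(missing_genes) / length(gene_list),
    missing_genes = missing_genes
  )
}

# Toy example with hypothetical values
toy_covs <- matrix(
  rnorm(6),
  nrow = 3,
  dimnames = list(c("TP53", "KRAS", "BRAF"), c("PC1", "PC2"))
)
toy_genes <- c("TP53", "KRAS", "BRAF", "PIK3CA")

res <- covariate_coverage(toy_covs, toy_genes)
print(res$fraction_covered)  # 0.75
print(res$missing_genes)     # "PIK3CA"
```

With the real files, `gene_list` could be taken from the loaded RefCDS object via `sapply(RefCDS, function(x) x$gene_name)`.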

These covariates were developed by Federico Abascal. Big thanks to him! They were generated by combining epigenomic data (from the Roadmap Epigenomics project) and whole-genome mutation density vectors (from the PCAWG consortium), collapsed into 20 principal components. They were tested on TCGA data, at a pan-cancer level and on individual cancer types, and appear to perform generally well.

We may further refine them in the next few weeks and I will try to integrate them in the package by default, so that you can use dndscv on hg38 without downloading additional files. But I wanted to share them with you without further delay.

Please do share any feedback, positive or negative, as we are still testing them and any feedback helps.

Best,
Inigo


9 participants