Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcfR Subsetting to the first chromosome #211

Open
charbel2998 opened this issue Apr 19, 2023 · 1 comment
Open

vcfR Subsetting to the first chromosome #211

charbel2998 opened this issue Apr 19, 2023 · 1 comment

Comments

@charbel2998
Copy link

Hello,
I am writing this code:

library(vcfR)
library(ape)
#input vcf Data S17 RES SNPs
vcf = read.vcfR("/Desktop/Final_Table/RES_S17_readyAnalysis_SNP.vcf")
genome = ape::read.dna("
/Desktop/Resource/hg38.fa", format = "fasta")
ref = read.table("~/Desktop/GRCh38_latest_genomic.gff", sep = '\t', quote = "")
#creating a ChromeR object
chrom <- create.chromR(name="S17SNP", vcf=vcf, seq=genome, ann=ref, verbose=TRUE)
plot(chrom)

I'm getting this error that I can't explain:
vcfR object includes more than one chromosome (CHROM)
Subsetting to the first chromosome
Warning message:
In create.chromR(name = "S17SNP", vcf = vcf, seq = genome, ann = ref, :
Names in variant data and annotation data do not match perfectly.
If you choose to proceed, we'll do our best to match the data.
But prepare yourself for unexpected results.

the fasta file is the same one I used to do the alignment and the annotation.

Does anyone have any input?

@knausb
Copy link
Owner

knausb commented Apr 19, 2023

Hi @charbel2998 ,

For the following

vcfR object includes more than one chromosome (CHROM)
Subsetting to the first chromosome

please see the following link.

https://knausb.github.io/vcfR_documentation/subset_data_to_1chrom.html

The chromR object is intended for a single chromosome (sequence).

For the WARNING

In create.chromR(name = "S17SNP", vcf = vcf, seq = genome, ann = ref, :
Names in variant data and annotation data do not match perfectly.
If you choose to proceed, we'll do our best to match the data.
But prepare yourself for unexpected results.

Please see below.

https://knausb.github.io/vcfR_documentation/chromR_object.html

It has been my experience that the names do not always match. Sometimes we download them like this. Sometime a processing step we use creates the issue. For example, I believe bwa (and other aligners) tends to truncate FASTA sequence names on whitespace, resulting in sequence names you may not have expected. You can validate this by comparing the names in your FASTA (e.g., grep "^>" my_data.FASTA) with the [BS]AM header (e.g., samtools view -H my_data.BAM). This is a WARNING, meaning we're trying to make you aware of this issue. If you are sure you understand this you should be able to proceed. If you are incorrect and proceed anyway, you may be creating problems for yourself downstream. I suggest you take a minute to validate that the input order is identical and if so, proceed with caution.

Does that provide you with a solution?
Brian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants