vcfR Subsetting to the first chromosome #211

charbel2998 · 2023-04-19T12:55:14Z

Hello,
I am writing this code:

library(vcfR)
library(ape)
#input vcf Data S17 RES SNPs
vcf = read.vcfR("/Desktop/Final_Table/RES_S17_readyAnalysis_SNP.vcf")
genome = ape::read.dna("/Desktop/Resource/hg38.fa", format = "fasta")
ref = read.table("~/Desktop/GRCh38_latest_genomic.gff", sep = '\t', quote = "")
#creating a ChromeR object
chrom <- create.chromR(name="S17SNP", vcf=vcf, seq=genome, ann=ref, verbose=TRUE)
plot(chrom)

I'm getting this error that I can't explain:
vcfR object includes more than one chromosome (CHROM)
Subsetting to the first chromosome
Warning message:
In create.chromR(name = "S17SNP", vcf = vcf, seq = genome, ann = ref, :
Names in variant data and annotation data do not match perfectly.
If you choose to proceed, we'll do our best to match the data.
But prepare yourself for unexpected results.

the fasta file is the same one I used to do the alignment and the annotation.

Does anyone have any input?

knausb · 2023-04-19T16:39:08Z

Hi @charbel2998 ,

For the following

vcfR object includes more than one chromosome (CHROM)
Subsetting to the first chromosome

please see the following link.

https://knausb.github.io/vcfR_documentation/subset_data_to_1chrom.html

The chromR object is intended for a single chromosome (sequence).

For the WARNING

In create.chromR(name = "S17SNP", vcf = vcf, seq = genome, ann = ref, :
Names in variant data and annotation data do not match perfectly.
If you choose to proceed, we'll do our best to match the data.
But prepare yourself for unexpected results.

Please see below.

https://knausb.github.io/vcfR_documentation/chromR_object.html

It has been my experience that the names do not always match. Sometimes we download them like this. Sometime a processing step we use creates the issue. For example, I believe bwa (and other aligners) tends to truncate FASTA sequence names on whitespace, resulting in sequence names you may not have expected. You can validate this by comparing the names in your FASTA (e.g., grep "^>" my_data.FASTA) with the [BS]AM header (e.g., samtools view -H my_data.BAM). This is a WARNING, meaning we're trying to make you aware of this issue. If you are sure you understand this you should be able to proceed. If you are incorrect and proceed anyway, you may be creating problems for yourself downstream. I suggest you take a minute to validate that the input order is identical and if so, proceed with caution.

Does that provide you with a solution?
Brian

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vcfR Subsetting to the first chromosome #211

vcfR Subsetting to the first chromosome #211

charbel2998 commented Apr 19, 2023

knausb commented Apr 19, 2023

vcfR Subsetting to the first chromosome #211

vcfR Subsetting to the first chromosome #211

Comments

charbel2998 commented Apr 19, 2023

knausb commented Apr 19, 2023