-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sex determination #182
Comments
Hi Anjali,
I'm afraid I do not know how to respond to your question about sex
determination. In mammals we discuss X/Y sex chromosomes. In fungi we
discuss mating type loci, and in oomycetes the genetic determination loci
are uncharacterized. So I do not know what you are looking for. Do you work
in human?
I also do not understand you second question. VCF data varies from the
software that creates it. I would think that "PL" would be "phred scaled
likelihood" but I don't know because I do not have access to your file. If
I'm correct then I think that a numeric matrix would be correct.
Thanks!
Brian
…On Sat, Apr 10, 2021 at 3:43 PM Anjali Chawla ***@***.***> wrote:
Hi,
Thanks for this very cool tool!
Is there a way to determine sex of every sample in vcfR? For example, how
can I calculate heterozygosity rate on chrX?
In addition, could you please tell me the what extract.gt(element="PL",
as.numeric="TRUE") would go from PL values to numeric.
Thanks a lot!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#182>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABLQWXJFA4NHXGHHNBCXCUTTIDICLANCNFSM42W67ORA>
.
|
Hi Brian, I am working with humans. I am trying to figure out a way to determine chrX heterozygosity using "PL" which as you guessed, is absolutely correct, is the "Phred scaled likelihood." Yes, it does produce a numeric matrix, however, PL values are usually in for exmaple 0:20:40 format, corresponding to 0/0:0/1:1/1 genotypes. I am unsure how "extract.gt(element="PL", as.numeric="TRUE")" converts 0:20:40 to a number? Is it the lowest number that gets extracted as that represents the genotype likelihood? Thank you |
Hi AnjaliC4, I think you want something like the following. library(vcfR)
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.12.0
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
data("vcfR_example")
vcf
#> ***** Object of Class vcfR *****
#> 18 samples
#> 1 CHROMs
#> 2,533 variants
#> Object size: 3.2 Mb
#> 8.497 percent missing data
#> ***** ***** *****
vcf@gt[1:3, 1:4]
#> FORMAT BL2009P4_us23 DDR7602
#> [1,] "GT:AD:DP:GQ:PL" "0|0:62,0:62:99:0,190,2835" "0|0:12,0:12:39:0,39,585"
#> [2,] "GT:AD:DP:GQ:PL" "1|0:5,5:10:99:111,0,114" NA
#> [3,] "GT:AD:DP:GQ:PL" NA NA
#> IN2009T1_us22
#> [1,] "0|0:37,0:37:99:0,114,1709"
#> [2,] "0|1:2,1:3:16:16,0,48"
#> [3,] "0|0:2,0:2:6:0,6,51"
pl <- extract.gt(vcf, element = "PL")
pl[1:4, 1:6]
#> BL2009P4_us23 DDR7602 IN2009T1_us22 LBUS5 NL07434
#> Supercontig_1.50_2 "0,190,2835" "0,39,585" "0,114,1709" "0,39,585" NA
#> Supercontig_1.50_246 "111,0,114" NA "16,0,48" NA NA
#> Supercontig_1.50_549 NA NA "0,6,51" NA NA
#> Supercontig_1.50_668 "0,3,44" NA "25,3,0" NA "0,3,28"
#> P10127
#> Supercontig_1.50_2 "0,24,360"
#> Supercontig_1.50_246 "0,3,25"
#> Supercontig_1.50_549 "99,12,0"
#> Supercontig_1.50_668 "0,6,54"
pl1 <- masplit(pl, record = 1, sort = 0)
pl1[1:3, 1:4]
#> BL2009P4_us23 DDR7602 IN2009T1_us22 LBUS5
#> Supercontig_1.50_2 0 0 0 0
#> Supercontig_1.50_246 111 NA 16 NA
#> Supercontig_1.50_549 NA NA 0 NA Created on 2021-04-13 by the reprex package (v1.0.0) This should give you the first PL element as a numeric. But is the question you're trying to address is if chromX is heterozygous? Or are you trying to assess the quality of the genotypes? Because these are scaled, "0" is always the lowest. Just curious. |
Hi Brian, Thanks for the example! I am trying to address if chrX is heterozygous. I believe in that case, I should extract the second element using mapsplit? Since, PL values are usually in 0/0:0/1:1/1 format. Please let me know what you think. |
Hi AnjaliC4, PL is a comma delimited list of integers, one for each potential genotype. You could use this information to explore the confidence of the genotype call. Only one genotype is called per sample and per variant. So I feel your last statement was incorrect. If you just want the genotypes you could use the following.
|
Hi Brian, |
Hi AnjaliC4, It appears to me that you have two questions here. First, is X heterozygous. Second, What is the quality of the genotypes. If you ask your variant caller to call diploid variants on X you would expect some level of heterozygosity in a female because there are two copies of X. In males X is haploid so we would expect it to be homozygous. However, in genomic data we deal with lots of data so we would expect false positives. One way to manage this is to count the number of heterozygous positions on each chromosome and divide by it's length, giving us the rate of heterozygosity. I would expect that the autosomes would have a similar rate of heterozygosity. A female would have a rate of heterozygosity that is similar to the autosomes while a male X should have a decreased level of heterozygosity. The "quality" of a genotype is much more difficult to estimate. I like to use depth and you can see how I use this at the below link. https://knausb.github.io/vcfR_documentation/quick_intro.html You seem to be interested in PL. One way of doing this would be to compare the best likelihood to the second best. A large difference would indicate confidence while a small difference would suggest ambiguity. With "phred scaled likelihoods" the "best' likelihood is always zero. So you can just use the value of the second lowest value. It has been my experience that repetitive regions that are poorly assembled have high coverage, which can lead to good likelihoods. My interpretation is that maximum likelihood does not understand copy number variation. So you'll have to make a judgement call here. Good luck! |
Thank you very much, Brian for your opinions and advice. |
Hi,
Thanks for this very cool tool!
Could you please suggest a way to determine sex of every sample in vcfR?
One approach I know is heterozygosity rate on chrX. However, not sure how to calclaute that in vcfR- would appreciate your advise.
In addition, could you please tell me the behavior of extract.gt(element="PL", as.numeric="TRUE"). I get numeric values but not sure how they go from e.g. 0:20:30 to one number.
Thanks a lot!
The text was updated successfully, but these errors were encountered: