Publish Version III of the SNP+TR reference haplotype panel #27
Open
gymreklab wants to merge 32 commits into main from fix-ref
32 commits
54d9a15
adding draft script to fix reference
386fb7c
add convert to bref3
nicholema 08d52a2
add script to download hg38 ref panel
nicholema f963e56
fix bug
nicholema 2e93e64
add INFO field VT=OTHER/TR
nicholema f93bab6
updating how we get locus IDs to accommodate duplicates
4aa269f
fix merge conflict
fd3daa6
checks to remove loci with too many or too few alleles
017c688
update readme with description of fixes
4b37611
updating print statements in fix ref script
9923347
overhaul of fixref script to remove alleles with count=0
dbba8ab
remove convert to bref script
ee145ec
update to write to stdout so we can pipe to bgzip
403c1d7
fix readme format issue
e76b3cd
fix readme format issue
f1bcc57
update the main README.md and make some changes on the fix_ensembletr…
yli091230 c3a0d99
Merge pull request #26 from yli091230/fix-ref
gymreklab 70d2b66
adding links to v3 panel
510820f
adding links to v3 panel
3503a58
adding links to v3 panel
b8fbefd
adding links to v3 panel
19cec77
adding links to v3 panel
507e4c9
adding links to v3 panel
24e1e61
use bref3
4a46920
update fixref script to remove duplicates
705af29
document rm dups for fix-ref
842b119
fix file name, remove duplicate loci and update README
yli091230 9cd4842
Merge pull request #28 from yli091230/fix-ref
gymreklab 94904c9
fix issue with REF count 0 in fix ref script
9c8aa4e
Merge pull request #29 from gymrek-lab/fix-noref-issue
yli091230 ab3044b
fix the missing reference allele
yli091230 0b6d7c3
Merge pull request #30 from yli091230/fix-ref
yli091230
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -79,6 +79,10 @@ statSTR --vcf EnsembleTR_file.vcf.gz | |
--out EnsembleTR_per_locus_allele_frequency | ||
``` | ||
|
||
# EnsembleTR data releases | ||
|
||
Archived datasets, including the Version II calls and other versions of haplotype panel files can be found [here](archive_ensembletr_datasets.md). | ||
|
||
## Version II of EnsembleTR calls on samples from 1000 Genomes Project and H3Africa | ||
|
||
Chromosome 1 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr1_filtered.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr1_filtered.vcf.gz.tbi) | ||
|
@@ -125,76 +129,86 @@ Chromosome 21 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntr | |
|
||
Chromosome 22 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr22_filtered.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr22_filtered.vcf.gz.tbi) | ||
|
||
## Version II of reference SNP+TR haplotype panel for imputation of TR variants | ||
## Version IV of reference SNP+TR haplotype panel for imputation of TR variants | ||
|
||
### Dataset description | ||
These files contain: | ||
* [Phased SNP and indel variants](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/) of 3,202 samples from the 1000 Genomes Project (1kGP). | ||
* TRs phased/imputed from 3,202 1kGP samples based on EnsembleTR calls. | ||
|
||
[Phased variants](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/) of 3,202 samples from the 1000 Genomes Project (1kGP). | ||
There are in total 1,070,762 TRs and 70,692,015 SNPs/indels. | ||
|
||
TRs imputed from 3,202 1kGP samples. | ||
All the coordinates are based on **hg38** human reference genome. | ||
|
||
Total 70,692,015 variants + 1,091,550 TR markers. | ||
These files contain the same data as [Version II](archive_ensembletr_datasets.md), with the following updates to facilitate use in downstream imputation pipelines: | ||
|
||
All the coordinates are based on **hg38** human reference genome. | ||
1. Remove TRs for which the REF allele does not match the expected sequence based on CHR:POS. | ||
2. For each TR, remove alleles with a count of 0. | ||
* If the reference allele has a count of 0, keep it anyway. | ||
3. Remove TRs which have more than 100 alleles. | ||
4. Remove TRs which have fewer than 2 alleles. | ||
Review comment: Does this mean at least one alternative allele is required? Reply: Yes. |
||
5. Remove the DS/GP fields, which are large and not used by downstream steps. | ||
6. Add a unique ID for each TR in the format EnsTR:CHROM:POS. For TRs that share the same CHR:POS, append a duplicate number, giving the format EnsTR:CHROM:POS:Duplicate_num. Duplicated loci with identical alleles are removed. | ||
7. Add a VT field, set to VT=TR for TRs and VT=OTHER for other variant types. | ||
8. Add bref3-format files, which contain the same information as the VCFs but can improve Beagle imputation performance. | ||
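The allele filtering (steps 2 to 4) and the ID scheme (step 6) listed above can be illustrated with a minimal Python sketch. This is not the fix-ref script from this PR: the function names `filter_alleles` and `assign_ids` are hypothetical, and several details are assumptions, namely that the allele-count limits are checked after zero-count alleles are dropped, that the limits count the REF allele, and that duplicate numbering starts at 1 and is applied to every record sharing a CHR:POS.

```python
from collections import Counter

MAX_ALLELES = 100  # step 3: drop loci with more than this many alleles
MIN_ALLELES = 2    # step 4: drop loci with fewer than this many alleles


def filter_alleles(alleles, counts):
    """Return the alleles to keep for one TR, or None if the TR is dropped.

    alleles[0] is the REF allele; counts[i] is the allele count of alleles[i].
    """
    # Step 2: REF is always kept, even when its count is 0.
    kept = [alleles[0]]
    kept += [a for a, c in zip(alleles[1:], counts[1:]) if c > 0]
    # Steps 3 and 4 (assumed to run after zero-count alleles are removed).
    if len(kept) > MAX_ALLELES or len(kept) < MIN_ALLELES:
        return None
    return kept


def assign_ids(loci):
    """Build IDs for a list of (chrom, pos) tuples given in file order."""
    totals = Counter(loci)
    seen = Counter()
    ids = []
    for chrom, pos in loci:
        base = f"EnsTR:{chrom}:{pos}"
        if totals[(chrom, pos)] == 1:
            ids.append(base)
        else:
            # Assumed numbering: 1, 2, ... in order of appearance.
            seen[(chrom, pos)] += 1
            ids.append(f"{base}:{seen[(chrom, pos)]}")
    return ids
```

For example, under these assumptions a TR with alleles `["A", "AT", "ATT"]` and counts `[0, 5, 0]` keeps `["A", "AT"]`, while a TR whose only nonzero count belongs to REF is dropped because no alternate allele remains.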
|
||
### Availability | ||
All file descriptions and download links can be found [here](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_4_readme.txt). Data and links for each chromosome for the Version IV panel are also provided below. | ||
|
||
Chromosome 1 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr1_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr1_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 1 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr1.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr1.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr1.bref3)] SNPs/indels=5,759,060 TRs=92,378 | ||
|
||
Chromosome 2 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr2_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr2_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 2 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr2.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr2.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr2.bref3)] SNPs/indels=6,088,598 TRs=91,137 | ||
|
||
Chromosome 3 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr3_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr3_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 3 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr3.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr3.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr3.bref3)] SNPs/indels=4,983,185 TRs=75,243 | ||
|
||
Chromosome 4 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr4_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr4_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 4 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr4.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr4.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr4.bref3)] SNPs/indels=4,875,465 TRs=69,327 | ||
|
||
Chromosome 5 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr5_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr5_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 5 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr5.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr5.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr5.bref3)] SNPs/indels=4,536,819 TRs=66,492 | ||
|
||
Chromosome 6 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr6_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr6_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 6 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr6.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr6.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr6.bref3)] SNPs/indels=4,315,217 TRs=65,940 | ||
|
||
Chromosome 7 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr7_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr7_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 7 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr7.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr7.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr7.bref3)] SNPs/indels=4,137,254 TRs=59,422 | ||
|
||
Chromosome 8 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr8_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr8_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 8 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr8.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr8.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr8.bref3)] SNPs/indels=3,886,222 TRs=55,144 | ||
|
||
Chromosome 9 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr9_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr9_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 9 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr9.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr9.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr9.bref3)] SNPs/indels=3,165,513 TRs=44,189 | ||
|
||
Chromosome 10 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr10_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr10_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 10 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr10.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr10.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr10.bref3)] SNPs/indels=3,495,473 TRs=51,640 | ||
|
||
Chromosome 11 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr11_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr11_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 11 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr11.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr11.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr11.bref3)] SNPs/indels=3,423,341 TRs=49,603 | ||
|
||
Chromosome 12 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr12_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr12_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 12 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr12.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr12.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr12.bref3)] SNPs/indels=3,332,788 TRs=55,887 | ||
|
||
Chromosome 13 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr13_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr13_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 13 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr13.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr13.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr13.bref3)] SNPs/indels=2,509,179 TRs=35,720 | ||
|
||
Chromosome 14 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr14_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr14_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 14 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr14.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr14.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr14.bref3)] SNPs/indels=2,290,400 TRs=36,203 | ||
|
||
Chromosome 15 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr15_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr15_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 15 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr15.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr15.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr15.bref3)] SNPs/indels=2,109,285 TRs=32,338 | ||
|
||
Chromosome 16 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr16_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr16_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 16 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr16.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr16.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr16.bref3)] SNPs/indels=2,362,361 TRs=35,452 | ||
|
||
Chromosome 17 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr17_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr17_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 17 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr17.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr17.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr17.bref3)] SNPs/indels=2,073,624 TRs=38,382 | ||
|
||
Chromosome 18 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr18_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr18_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 18 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr18.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr18.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr18.bref3)] SNPs/indels=1,963,845 TRs=28,446 | ||
|
||
Chromosome 19 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr19_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr19_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 19 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr19.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr19.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr19.bref3)] SNPs/indels=1,670,692 TRs=33,536 | ||
|
||
Chromosome 20 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr20_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr20_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 20 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr20.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr20.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr20.bref3)] SNPs/indels=1,644,384 TRs=25,745 | ||
|
||
Chromosome 21 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr21_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr21_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 21 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr21.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr21.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr21.bref3)] SNPs/indels=1,002,753 TRs=12,894 | ||
|
||
Chromosome 22 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr22_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr22_final_SNP_merged_additional_TRs.vcf.gz.tbi) | ||
Chromosome 22 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr22.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr22.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr22.bref3)] SNPs/indels=1,066,557 TRs=15,644 | ||
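As a quick consistency check, the per-chromosome counts listed for the Version IV panel can be summed and compared with the totals given in the dataset description (70,692,015 SNPs/indels and 1,070,762 TRs). The sketch below only re-adds the numbers copied from the list above; the name `COUNTS` is introduced here for illustration.

```python
# (SNPs/indels, TRs) per chromosome, copied from the Version IV download list.
COUNTS = {
    1: (5_759_060, 92_378),
    2: (6_088_598, 91_137),
    3: (4_983_185, 75_243),
    4: (4_875_465, 69_327),
    5: (4_536_819, 66_492),
    6: (4_315_217, 65_940),
    7: (4_137_254, 59_422),
    8: (3_886_222, 55_144),
    9: (3_165_513, 44_189),
    10: (3_495_473, 51_640),
    11: (3_423_341, 49_603),
    12: (3_332_788, 55_887),
    13: (2_509_179, 35_720),
    14: (2_290_400, 36_203),
    15: (2_109_285, 32_338),
    16: (2_362_361, 35_452),
    17: (2_073_624, 38_382),
    18: (1_963_845, 28_446),
    19: (1_670_692, 33_536),
    20: (1_644_384, 25_745),
    21: (1_002_753, 12_894),
    22: (1_066_557, 15_644),
}

total_snps = sum(snps for snps, _ in COUNTS.values())
total_trs = sum(trs for _, trs in COUNTS.values())

print(total_snps, total_trs)  # prints "70692015 1070762"
```

Both sums agree with the totals stated in the dataset description.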
|
||
### Usage | ||
|
||
Use [Beagle](https://faculty.washington.edu/browning/beagle/beagle.html) to impute TRs into SNP data: | ||
|
||
``` | ||
java -Xmx4g -jar beagle.version.jar \ | ||
gt=SNPs.vcf.gz \ | ||
ref=${chrom}_final_SNP_merged.vcf.gz \ | ||
out=imputed_TR_SNPs | ||
gt=SNPs_chr${chrom}.vcf.gz \ | ||
ref=ensembletr_refpanel_v4_chr${chrom}.bref3 \ | ||
out=imputed_TR_SNPs_chr${chrom} | ||
``` | ||
|
||
Please use the [version 5.4](https://github.com/gymreklab/1000Genomes-TR-Analysis/raw/main/phasing/validation/beagle.19Apr22.7c0.jar) for this analysis as we had issues with the newer versions of Beagle and we are right now communicating it with Beagle developers. | ||
We have tested this with Beagle jar file [beagle.27May24.118.jar](https://faculty.washington.edu/browning/beagle/beagle.27May24.118.jar). Earlier releases of Beagle 5.4 had problems imputing from this panel due to a file decompression issue. | ||
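To run the imputation for every autosome, the command above can be generated per chromosome. The following is a minimal sketch, assuming the input SNP files follow the `SNPs_chr${chrom}.vcf.gz` naming from the example and that the bref3 files have been downloaded to the working directory; the helper names `bref3_url` and `beagle_command` are hypothetical. Only the `gt`, `ref`, and `out` arguments shown in the README are used.

```python
BASE_URL = "https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4"


def bref3_url(chrom):
    """Download URL of the Version IV bref3 panel for one chromosome."""
    return f"{BASE_URL}/ensembletr_refpanel_v4_chr{chrom}.bref3"


def beagle_command(chrom, jar="beagle.27May24.118.jar", mem="4g"):
    """Argument list for one Beagle run, mirroring the README example."""
    return [
        "java", f"-Xmx{mem}", "-jar", jar,
        f"gt=SNPs_chr{chrom}.vcf.gz",                    # assumed input naming
        f"ref=ensembletr_refpanel_v4_chr{chrom}.bref3",  # local copy of the panel
        f"out=imputed_TR_SNPs_chr{chrom}",
    ]


if __name__ == "__main__":
    for chrom in range(1, 23):
        print(bref3_url(chrom))
        print(" ".join(beagle_command(chrom)))
```

The sketch only prints the URLs and commands; passing each argument list to `subprocess.run` would execute them, and the memory setting may need to be raised for larger chromosomes.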
|
||
## Additional resources | ||
|
||
|
@@ -203,14 +217,6 @@ Per locus summary statistics can be downloaded from [here](https://ensemble-tr.s | |
Population-specific per locus statistics on allele frequency, heterozygosity, and the number of called samples can be found [here](https://ensemble-tr.s3.us-east-2.amazonaws.com/tables/afreq_tables.zip). Statistics are computed using statSTR from the TRTools package. | ||
|
||
|
||
## Version I | ||
|
||
For version I of EnsembleTR calls, please use | ||
https://ensemble-tr.s3.us-east-2.amazonaws.com/split/ensemble_chr"$chr"_filtered.vcf.gz for VCF file and https://ensemble-tr.s3.us-east-2.amazonaws.com/split/ensemble_chr"$chr"_filtered.vcf.gz.tbi for tbi file. | ||
|
||
For version I of phased panels, please use | ||
https://ensemble-tr.s3.us-east-2.amazonaws.com/phased-split/chr"$chr"_final_SNP_merged.vcf.gz for VCF file and https://ensemble-tr.s3.us-east-2.amazonaws.com/phased-split/chr"$chr"_final_SNP_merged.vcf.gz.csi for tbi file. | ||
|
||
## Notes on HipSTR input | ||
|
||
HipSTR might expand the coordinates of the repeat if there is a nearby SNP. If you have multiple HipSTR outputs from different individuals and want to use mergeSTR to merge them, please use our python script, *Hipstr_correction.py*, to correct the merged HipSTR VCF file ensuring that multiple records from the same repeat culminate in a single unified record. | ||
|
Are we publishing version 3 or 4?
Ah, I saw that we archived version 3. Can you remind me what changed from version 3 to 4?
I'm not sure. Version 3 has some issues with missing reference alleles. @gymreklab, which version number should we use?
In version 3, some REF alleles are missing because no REF allele was detected. This causes errors in downstream analysis. To fix it, version 4 always keeps the REF alleles.