Publish Version III of the SNP+TR reference haplotype panel #27

Open
wants to merge 32 commits into
base: main
54d9a15
adding draft script to fix reference
Aug 5, 2024
386fb7c
add convert to bref3
nicholema Aug 6, 2024
08d52a2
add script to download hg38 ref panel
nicholema Aug 6, 2024
f963e56
fix bug
nicholema Aug 6, 2024
2e93e64
add INFO field VT=OTHER/TR
nicholema Aug 8, 2024
f93bab6
updating how we get locus IDs to accommodate duplicates
Aug 14, 2024
4aa269f
fix merge conflict
Aug 14, 2024
fd3daa6
checks to remove loci with too many or too few alleles
Aug 31, 2024
017c688
update readme with description of fixes
Aug 31, 2024
4b37611
updating print statements in fix ref script
Sep 6, 2024
9923347
overhaul of fixref script to remove alleles with count=0
Sep 6, 2024
dbba8ab
remove convert to bref script
Sep 6, 2024
ee145ec
update to write to stdout so we can pipe to bgzip
Sep 6, 2024
403c1d7
fix readme format issue
Sep 6, 2024
e76b3cd
fix readme format issue
Sep 6, 2024
f1bcc57
update the main README.md and make some changes on the fix_ensembletr…
yli091230 Sep 9, 2024
c3a0d99
Merge pull request #26 from yli091230/fix-ref
gymreklab Sep 9, 2024
70d2b66
adding links to v3 panel
Sep 10, 2024
510820f
adding links to v3 panel
Sep 10, 2024
3503a58
adding links to v3 panel
Sep 10, 2024
b8fbefd
adding links to v3 panel
Sep 10, 2024
19cec77
adding links to v3 panel
Sep 10, 2024
507e4c9
adding links to v3 panel
Sep 10, 2024
24e1e61
use bref3
Sep 23, 2024
4a46920
update fixref script to remove duplicates
Sep 24, 2024
705af29
document rm dups for fix-ref
Sep 24, 2024
842b119
fix file name, remove duplicate loci and update README
yli091230 Sep 27, 2024
9cd4842
Merge pull request #28 from yli091230/fix-ref
gymreklab Sep 27, 2024
94904c9
fix issue with REF count 0 in fix ref script
Nov 3, 2024
9c8aa4e
Merge pull request #29 from gymrek-lab/fix-noref-issue
yli091230 Nov 5, 2024
ab3044b
fix the missing reference allele
yli091230 Nov 6, 2024
0b6d7c3
Merge pull request #30 from yli091230/fix-ref
yli091230 Nov 6, 2024
88 changes: 47 additions & 41 deletions README.md
@@ -79,6 +79,10 @@ statSTR --vcf EnsembleTR_file.vcf.gz
--out EnsembleTR_per_locus_allele_frequency
```

# EnsembleTR data releases

Archived datasets, including the Version II calls and other versions of the haplotype panel files, can be found [here](archive_ensembletr_datasets.md).

## Version II of EnsembleTR calls on samples from 1000 Genomes Project and H3Africa

Chromosome 1 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr1_filtered.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr1_filtered.vcf.gz.tbi)
@@ -125,76 +129,86 @@ Chromosome 21 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntr

Chromosome 22 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr22_filtered.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr22_filtered.vcf.gz.tbi)

## Version II of reference SNP+TR haplotype panel for imputation of TR variants
## Version IV of reference SNP+TR haplotype panel for imputation of TR variants
Review comment (Collaborator): Are we publishing version 3 or 4?

Review comment (Collaborator): Ah, I saw that we archived version 3. Can you remind me what changed from version 3 to 4?

Reply: I'm not sure. Version 3 has some issues with missing reference alleles. @gymreklab, which version number should we use?

Reply: In version 3, some REF alleles are missing because no REF allele was detected. This causes errors in downstream analysis. To fix it, version 4 always keeps the REF alleles.


### Dataset description
These files contain:
* [Phased SNP and indel variants](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/) of 3,202 samples from the 1000 Genomes Project (1kGP).
* TRs phased/imputed from 3,202 1kGP samples based on EnsembleTR calls.

[Phased variants](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/) of 3,202 samples from the 1000 Genomes Project (1kGP).
There are in total 1,070,762 TRs and 70,692,015 SNPs/indels.
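
As a quick consistency check, the totals above can be recomputed from the per-chromosome counts listed under Availability. The sketch below only copies those numbers into a dictionary and sums them; the names `COUNTS` and `panel_totals` are illustrative and not part of the repository.

```python
# Per-chromosome (SNPs/indels, TRs) counts copied from the Availability list.
# COUNTS and panel_totals are hypothetical names used only for this check.
COUNTS = {
    1: (5_759_060, 92_378),
    2: (6_088_598, 91_137),
    3: (4_983_185, 75_243),
    4: (4_875_465, 69_327),
    5: (4_536_819, 66_492),
    6: (4_315_217, 65_940),
    7: (4_137_254, 59_422),
    8: (3_886_222, 55_144),
    9: (3_165_513, 44_189),
    10: (3_495_473, 51_640),
    11: (3_423_341, 49_603),
    12: (3_332_788, 55_887),
    13: (2_509_179, 35_720),
    14: (2_290_400, 36_203),
    15: (2_109_285, 32_338),
    16: (2_362_361, 35_452),
    17: (2_073_624, 38_382),
    18: (1_963_845, 28_446),
    19: (1_670_692, 33_536),
    20: (1_644_384, 25_745),
    21: (1_002_753, 12_894),
    22: (1_066_557, 15_644),
}


def panel_totals(counts):
    """Return (total SNPs/indels, total TRs) summed over all chromosomes."""
    snps = sum(s for s, _ in counts.values())
    trs = sum(t for _, t in counts.values())
    return snps, trs


if __name__ == "__main__":
    snps, trs = panel_totals(COUNTS)
    print(f"SNPs/indels={snps:,} TRs={trs:,}")
```

The sums agree with the stated totals of 70,692,015 SNPs/indels and 1,070,762 TRs.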

TRs imputed from 3,202 1kGP samples.
All the coordinates are based on **hg38** human reference genome.

Total 70,692,015 variants + 1,091,550 TR markers.
These files contain the same data as [Version II](archive_ensembletr_datasets.md), with the following updates to facilitate use in downstream imputation pipelines:

All the coordinates are based on **hg38** human reference genome.
1. Remove TRs for which the REF allele does not match the expected sequence based on CHR:POS.
2. For each TR, remove alleles with a count of 0.
* If the reference allele has a count of 0, keep it anyway.
3. Remove TRs which have more than 100 alleles.
4. Remove TRs which have fewer than 2 alleles.
Review comment (Collaborator): Does this mean at least one alternative allele?

Reply: Yes.
5. Remove the DS/GP fields, which are large and not used by downstream steps.
6. Add a unique ID for each TR in the format EnsTR:CHROM:POS. For TRs that share the same CHR:POS, append a duplicate number, giving the format EnsTR:CHROM:POS:Duplicate_num. Duplicated loci with identical alleles are removed.
7. Add a VT field, set to VT=TR for TRs and VT=OTHER for other variant types.
8. Add bref3-format files, which contain the same information as the VCFs but can improve Beagle imputation performance.
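
The allele-pruning and ID rules in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fix-ref script from this PR: it assumes allele counts are given as a list with REF at index 0, that the allele-number thresholds are applied after zero-count alleles are pruned (following the order of the list), and that duplicate numbers start at 1 and are given to every record at a shared CHROM:POS. All function names are hypothetical.

```python
from collections import Counter

MAX_ALLELES = 100  # step 3: drop TRs with more than 100 alleles
MIN_ALLELES = 2    # step 4: drop TRs with fewer than 2 alleles


def prune_alleles(counts):
    """Step 2: return indices of alleles to keep.

    counts[0] is the REF allele count (assumption). REF is always kept,
    even with a count of 0; ALT alleles are kept only if observed.
    """
    return [i for i, c in enumerate(counts) if i == 0 or c > 0]


def keep_locus(counts):
    """Steps 3 and 4, applied to the pruned allele list (assumed order)."""
    n_alleles = len(prune_alleles(counts))
    return MIN_ALLELES <= n_alleles <= MAX_ALLELES


def assign_ids(loci):
    """Step 6: build EnsTR IDs for a list of (chrom, pos) tuples in file order.

    Numbering duplicates from 1 is an assumption made for this sketch.
    """
    totals = Counter(loci)
    seen = Counter()
    ids = []
    for chrom, pos in loci:
        base = f"EnsTR:{chrom}:{pos}"
        if totals[(chrom, pos)] == 1:
            ids.append(base)
        else:
            seen[(chrom, pos)] += 1
            ids.append(f"{base}:{seen[(chrom, pos)]}")
    return ids
```

For example, a locus with counts `[0, 4]` is kept because REF is retained alongside one observed ALT allele, while `[10, 0, 0]` is dropped because only REF remains after pruning.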

### Availability
All file descriptions and download links can be found [here](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_4_readme.txt). Data and links for each chromosome for the Version IV panel are also provided below.

Chromosome 1 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr1_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr1_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 1 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr1.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr1.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr1.bref3)] SNPs/indels=5,759,060 TRs=92,378

Chromosome 2 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr2_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr2_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 2 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr2.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr2.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr2.bref3)] SNPs/indels=6,088,598 TRs=91,137

Chromosome 3 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr3_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr3_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 3 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr3.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr3.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr3.bref3)] SNPs/indels=4,983,185 TRs=75,243

Chromosome 4 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr4_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr4_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 4 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr4.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr4.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr4.bref3)] SNPs/indels=4,875,465 TRs=69,327

Chromosome 5 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr5_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr5_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 5 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr5.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr5.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr5.bref3)] SNPs/indels=4,536,819 TRs=66,492

Chromosome 6 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr6_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr6_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 6 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr6.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr6.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr6.bref3)] SNPs/indels=4,315,217 TRs=65,940

Chromosome 7 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr7_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr7_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 7 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr7.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr7.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr7.bref3)] SNPs/indels=4,137,254 TRs=59,422

Chromosome 8 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr8_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr8_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 8 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr8.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr8.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr8.bref3)] SNPs/indels=3,886,222 TRs=55,144

Chromosome 9 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr9_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr9_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 9 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr9.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr9.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr9.bref3)] SNPs/indels=3,165,513 TRs=44,189

Chromosome 10 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr10_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr10_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 10 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr10.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr10.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr10.bref3)] SNPs/indels=3,495,473 TRs=51,640

Chromosome 11 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr11_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr11_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 11 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr11.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr11.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr11.bref3)] SNPs/indels=3,423,341 TRs=49,603

Chromosome 12 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr12_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr12_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 12 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr12.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr12.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr12.bref3)] SNPs/indels=3,332,788 TRs=55,887

Chromosome 13 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr13_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr13_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 13 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr13.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr13.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr13.bref3)] SNPs/indels=2,509,179 TRs=35,720

Chromosome 14 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr14_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr14_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 14 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr14.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr14.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr14.bref3)] SNPs/indels=2,290,400 TRs=36,203

Chromosome 15 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr15_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr15_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 15 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr15.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr15.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr15.bref3)] SNPs/indels=2,109,285 TRs=32,338

Chromosome 16 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr16_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr16_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 16 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr16.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr16.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr16.bref3)] SNPs/indels=2,362,361 TRs=35,452

Chromosome 17 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr17_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr17_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 17 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr17.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr17.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr17.bref3)] SNPs/indels=2,073,624 TRs=38,382

Chromosome 18 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr18_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr18_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 18 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr18.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr18.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr18.bref3)] SNPs/indels=1,963,845 TRs=28,446

Chromosome 19 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr19_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr19_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 19 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr19.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr19.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr19.bref3)] SNPs/indels=1,670,692 TRs=33,536

Chromosome 20 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr20_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr20_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 20 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr20.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr20.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr20.bref3)] SNPs/indels=1,644,384 TRs=25,745

Chromosome 21 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr21_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr21_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 21 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr21.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr21.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr21.bref3)] SNPs/indels=1,002,753 TRs=12,894

Chromosome 22 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr22_final_SNP_merged_additional_TRs.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/additional-phased-trs/chr22_final_SNP_merged_additional_TRs.vcf.gz.tbi)
Chromosome 22 [[VCF](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr22.vcf.gz)] [[tbi](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr22.vcf.gz.tbi)] [[bref](https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4/ensembletr_refpanel_v4_chr22.bref3)] SNPs/indels=1,066,557 TRs=15,644
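
Because the Version IV links above follow a single naming pattern, the per-chromosome URLs can be generated rather than copied by hand. The sketch below simply mirrors the pattern visible in the links; `BASE_URL`, `panel_urls`, and `all_panel_urls` are illustrative names, and nothing is downloaded.

```python
# URL prefix taken from the Version IV links listed above.
BASE_URL = "https://ensemble-tr.s3.us-east-2.amazonaws.com/ensembletr-refpanel-v4"


def panel_urls(chrom):
    """Return the VCF, tbi, and bref3 URLs for one autosome (1 to 22)."""
    stem = f"{BASE_URL}/ensembletr_refpanel_v4_chr{chrom}"
    return {
        "vcf": f"{stem}.vcf.gz",
        "tbi": f"{stem}.vcf.gz.tbi",
        "bref3": f"{stem}.bref3",
    }


def all_panel_urls():
    """Return a dict mapping chromosome number to its three URLs."""
    return {chrom: panel_urls(chrom) for chrom in range(1, 23)}


if __name__ == "__main__":
    for chrom, urls in all_panel_urls().items():
        print(chrom, urls["bref3"])
```

The printed URLs can be passed to any download tool.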

### Usage

Use [Beagle](https://faculty.washington.edu/browning/beagle/beagle.html) to impute TRs into SNP data:

```
java -Xmx4g -jar beagle.version.jar \
gt=SNPs.vcf.gz \
ref=${chrom}_final_SNP_merged.vcf.gz \
out=imputed_TR_SNPs
gt=SNPs_chr${chrom}.vcf.gz \
ref=ensembletr_refpanel_v4_chr${chrom}.bref3 \
out=imputed_TR_SNPs_chr${chrom}
```

Please use the [version 5.4](https://github.com/gymreklab/1000Genomes-TR-Analysis/raw/main/phasing/validation/beagle.19Apr22.7c0.jar) for this analysis as we had issues with the newer versions of Beagle and we are right now communicating it with Beagle developers.
We have tested this with Beagle jar file [beagle.27May24.118.jar](https://faculty.washington.edu/browning/beagle/beagle.27May24.118.jar). Earlier releases of Beagle 5.4 had problems imputing from this panel due to a file decompression issue.
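
To run the updated command over all autosomes, a small wrapper along the following lines may help. It is a sketch that assumes the input and reference files sit in the working directory under the names used in the command above and that `java` is on the PATH; `beagle_command` and `impute_all` are hypothetical helper names, and the default `dry_run=True` only builds the commands without executing them.

```python
import subprocess


def beagle_command(chrom, jar="beagle.27May24.118.jar", memory="4g"):
    """Build the Beagle argument list for one chromosome.

    File names follow the usage example above; adjust them to local paths.
    """
    return [
        "java", f"-Xmx{memory}", "-jar", jar,
        f"gt=SNPs_chr{chrom}.vcf.gz",
        f"ref=ensembletr_refpanel_v4_chr{chrom}.bref3",
        f"out=imputed_TR_SNPs_chr{chrom}",
    ]


def impute_all(chroms=range(1, 23), dry_run=True):
    """Return the commands for all chromosomes; run them unless dry_run is set."""
    commands = [beagle_command(chrom) for chrom in chroms]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first chromosome that fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in impute_all():
        print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues, and the dry run makes it easy to inspect the commands before starting a long imputation job.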

## Additional resources

@@ -203,14 +217,6 @@ Per locus summary statistics can be downloaded from [here](https://ensemble-tr.s
Population-specific per locus statistics on allele frequency, heterozygosity, and the number of called samples can be found [here](https://ensemble-tr.s3.us-east-2.amazonaws.com/tables/afreq_tables.zip). Statistics are computed using statSTR from the TRTools package.


## Version I

For version I of EnsembleTR calls, please use
https://ensemble-tr.s3.us-east-2.amazonaws.com/split/ensemble_chr"$chr"_filtered.vcf.gz for VCF file and https://ensemble-tr.s3.us-east-2.amazonaws.com/split/ensemble_chr"$chr"_filtered.vcf.gz.tbi for tbi file.

For version I of phased panels, please use
https://ensemble-tr.s3.us-east-2.amazonaws.com/phased-split/chr"$chr"_final_SNP_merged.vcf.gz for VCF file and https://ensemble-tr.s3.us-east-2.amazonaws.com/phased-split/chr"$chr"_final_SNP_merged.vcf.gz.csi for tbi file.

## Notes on HipSTR input

HipSTR might expand the coordinates of the repeat if there is a nearby SNP. If you have multiple HipSTR outputs from different individuals and want to use mergeSTR to merge them, please use our python script, *Hipstr_correction.py*, to correct the merged HipSTR VCF file ensuring that multiple records from the same repeat culminate in a single unified record.