Publish Version III of the SNP+TR reference haplotype panel #27
base: main
update the main README.md and make some changes on the fix_ensembletr…
…_snpstr_reference.py
fix file name, remove duplicate loci and update README
fix issue with REF count 0 in fix ref script
@heliziii you can review, but let's hold off on merging until @yli091230 updates the links to the new ref files.
Fix missing reference
@heliziii I just updated the links and the README file. It should be good to go.
@@ -125,76 +129,86 @@ Chromosome 21 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntr

Chromosome 22 [VCF file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr22_filtered.vcf.gz) and [tbi file](https://ensemble-tr.s3.us-east-2.amazonaws.com/add-vntrs/ensemble_chr22_filtered.vcf.gz.tbi)
## Version II of reference SNP+TR haplotype panel for imputation of TR variants
## Version IV of reference SNP+TR haplotype panel for imputation of TR variants
Are we publishing version 3 or 4?
Ah, I see that we archived version 3. Can you remind me what changed from version 3 to 4?
I'm not sure. Version 3 has some issues with missing reference alleles. @gymreklab, which version number should we use?
In version 3, some REF alleles are missing because the REF allele was not detected (count 0). This causes errors in downstream analysis. To fix this, version 4 always keeps the REF allele.
2. For each TR, remove alleles with 0 count.
    * If the reference allele has 0 count, keep it anyway.
3. Remove TRs that have more than 100 alleles.
4. Remove TRs that have fewer than 2 alleles.
Does this mean at least one alternative allele?
Yes.
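The per-TR rules in the README diff (drop 0-count alleles but always keep REF, then drop TRs with more than 100 or fewer than 2 alleles) can be illustrated with a minimal Python sketch. It does not reproduce `scripts/fix-ref/fix_ensembletr_snpstr_reference.py`: the function names, the constants, and the input shape (REF first, then ALTs, with one haplotype count per allele) are hypothetical, and it assumes the allele-count limits are checked after the 0-count alleles are removed. Reading and writing the VCF itself is left out.

```python
from typing import Dict, List, Optional, Tuple

# Thresholds from the README rules; the constant names are hypothetical.
MAX_ALLELES = 100  # drop TRs with more than 100 alleles
MIN_ALLELES = 2    # drop TRs with fewer than 2 alleles (REF + at least one ALT)


def filter_tr_alleles(
    alleles: List[str], counts: List[int]
) -> Optional[Tuple[List[str], Dict[int, int]]]:
    """Apply the per-TR allele filters to one record (hypothetical helper).

    alleles[0] is REF and alleles[1:] are the ALT alleles; counts[i] is the
    number of panel haplotypes carrying alleles[i].
    Returns (kept_alleles, old_to_new_index), or None if the TR is removed.
    """
    if len(alleles) != len(counts):
        raise ValueError("alleles and counts must have the same length")

    # Rule 2: drop 0-count alleles, but always keep REF (index 0) so the
    # record never loses its reference allele.
    keep = [i for i, c in enumerate(counts) if i == 0 or c > 0]

    # Rules 3 and 4, assumed here to apply to the alleles left after rule 2.
    if len(keep) > MAX_ALLELES or len(keep) < MIN_ALLELES:
        return None

    # Dropping ALT alleles shifts the allele indices used in GT fields,
    # so also return a mapping from old index to new index.
    old_to_new = {old: new for new, old in enumerate(keep)}
    return [alleles[i] for i in keep], old_to_new


def remap_allele_index(old_index: int, old_to_new: Dict[int, int]) -> int:
    """Translate one GT allele index from the original record to the filtered one."""
    if old_index not in old_to_new:
        # A haplotype carrying a dropped allele contradicts that allele's 0 count.
        raise ValueError(f"allele index {old_index} was removed; counts are inconsistent")
    return old_to_new[old_index]
```

For example, with counts `[0, 12, 0, 3]` for REF and three ALT alleles, REF is kept despite its 0 count, the second ALT is dropped, and a haplotype coded `3` becomes `2`. A TR whose only observed allele is REF (for example counts `[5, 0]`) ends up with a single allele and is removed, matching the "at least one alternative allele" reading of rule 4.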
The Version III files are the same as Version II, with the following updates to facilitate use in downstream imputation pipelines:

The script `scripts/fix-ref/fix_ensembletr_snpstr_reference.py` makes these changes. Version II files (and V1 genotype files) have been moved to `archive_ensembletr_datasets.md` so the main README doesn't get too cluttered.