Add G groups, frequency data, and update protein sequence to #90
Draft
apmody wants to merge 42 commits into master from G_group
…and 3) or class II (exon 2)
…worked index update code
…for protein match between current MRO and allele
For item #3, do you also update the accession and the resource name to
match the new sequence?
…On Tue, Mar 30, 2021 at 3:55 PM Apurva Mody ***@***.***> wrote:
1. Add three- and four-field HLA alleles ("MHC gene allele") and
corresponding nucleotide sequences.
2. Add terms for G groups (http://hla.alleles.org/alleles/g_groups.html)
3. Choose the "best" protein sequence for chain-sequences.tsv: among all
alleles that share the same first two fields, choose the longest protein
sequence (e.g. choose the longer of HLA-A*01:140:01 and HLA-A*01:140:02).
4. Add frequency data for different HLA alleles, based on CIWD 3.0 (data
<https://www.ihiw18.org/component-immunogenetics/download-common-and-well-documented-alleles-3-0/>,
paper <https://onlinelibrary.wiley.com/doi/full/10.1111/tan.13811>)
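Item 3 above can be sketched roughly as follows. This is only an illustration of the selection rule, not the code in src/update_gene_allele_seq.py: the function names, the input dictionary, and the placeholder sequences are assumptions, and expression suffixes such as a trailing "N" on a two-field name are not handled.

```python
from typing import Dict, Tuple


def two_field_name(allele: str) -> str:
    """Return the first two fields of an HLA allele name.

    Example: "HLA-A*01:140:01" -> "HLA-A*01:140".
    """
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def longest_protein_per_two_field(
    proteins: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Map each two-field name to the (allele, sequence) pair with the
    longest protein sequence. On a tie, the first allele seen is kept."""
    best: Dict[str, Tuple[str, str]] = {}
    for allele, seq in proteins.items():
        key = two_field_name(allele)
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (allele, seq)
    return best


# Hypothetical input: the sequences are short placeholders, not real HLA proteins.
example = {
    "HLA-A*01:140:01": "MAVMAPRT",
    "HLA-A*01:140:02": "MAVMAPRTLLL",
}
chosen = longest_protein_per_two_field(example)
```

With this input, the entry for HLA-A*01:140 would come from HLA-A*01:140:02, because its placeholder sequence is the longer of the two.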
------------------------------
You can view, comment on, or merge this pull request online at:
#90
Commit Summary
- First pass: Get G groups, get full CDS sequences, import population
group terms
- Reworked parsing of hla.dat file, add template strings
- Get appropriate exons for G group when MHC allele is class I (exon 2
and 3) or class II (exon 2)
- Merge branch 'master' into G_group
- Added code to add terms to index.tsv and error report
- Merge branch 'master' into G_group
- Changed template strings
- Added alleles which have partial sequence
- Organized gen_allele_update_seq.py into functions, Modified
import.txt
- First pass at extracting population frequency data
- Merge branch 'master' into G_group
- Added code to verify accession numbers from frequency data and
build/hla.dat file
- Second pass adding population frequency data for chains, G groups,
reworked index update code
- Add frequency of gene_alleles
- Added helper function for getting G group exons, removed requirement
for protein match between current MRO and allele
- Fix protein sequence to get best protein sequence
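The exon selection mentioned in the commit summary (exons 2 and 3 for class I, exon 2 for class II) could look something like the sketch below. The names G_GROUP_EXONS, g_group_exons, and g_group_sequence are hypothetical, and the input is assumed to be a dictionary from exon number to nucleotide sequence; the actual helper in this pull request may be structured differently.

```python
from typing import Dict, Tuple

# Exons that define a G group, keyed by MHC class (as stated in the commit summary).
G_GROUP_EXONS: Dict[str, Tuple[int, ...]] = {
    "I": (2, 3),
    "II": (2,),
}


def g_group_exons(mhc_class: str) -> Tuple[int, ...]:
    """Return the exon numbers relevant to a G group for the given MHC class."""
    try:
        return G_GROUP_EXONS[mhc_class]
    except KeyError:
        raise ValueError(f"Unknown MHC class: {mhc_class!r}") from None


def g_group_sequence(exons: Dict[int, str], mhc_class: str) -> str:
    """Concatenate the nucleotide sequences of the G-group exons, in order.

    `exons` is an assumed mapping of exon number to nucleotide sequence.
    """
    return "".join(exons[number] for number in g_group_exons(mhc_class))
```

Keeping the class-to-exon mapping in one table means a single place to change if another class or exon set needs to be supported.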
File Changes
- *M* Makefile
<https://github.com/IEDB/MRO/pull/90/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52>
(38)
- *M* index.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-03713472c9d385b498690e3c0f507cd67f378e9db1f2ef63711897c392c2ed7e>
(27251)
- *A* ontology/G-group-frequencies.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-701bac6402654020951e3c390d75587dd4756261438e663cdda7832394e7690e>
(277)
- *A* ontology/G-group.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-176f1865135d177fb0e7f02ac311066b2b935963d6d3d259b98f40f55e7cf750>
(514)
- *A* ontology/chain-frequencies.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-9ed7c8aa1d1ccac9b602d7c96c8643687e2485db3738cf4268127567d03f02dc>
(784)
- *M* ontology/chain-sequence.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-93686f7a9062e877738eb4ef9e8bac75995fd7c448012a9f57dfcddbf79b29c7>
(727)
- *M* ontology/core.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-cce693e5e496c2640c02d0f726ef8c1b40657ba6389aee1b1040d246ad55494d>
(2)
- *A* ontology/frequency-properties.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-3ccc40ddae15df9428e8a3ad74ccddb26dd834f2e18b3585011dbedbdf2b527c>
(10)
- *A* ontology/gene-allele-frequencies.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-150e1823d5eff9a49e96fe52034affca6e0ae7019c325cd9d35e99ed5804d9a0>
(8427)
- *A* ontology/gene-alleles.tsv
<https://github.com/IEDB/MRO/pull/90/files#diff-91369a4d23b970756c572cca4d3e92f1e572e04fc5a373f82cd5ef9b5314e732>
(26731)
- *M* ontology/import.txt
<https://github.com/IEDB/MRO/pull/90/files#diff-5a8021ed1bc10adb45b4669b0a9b18150676f1e4c5d637eda5219bb499a66dc4>
(9)
- *M* requirements.txt
<https://github.com/IEDB/MRO/pull/90/files#diff-4d7c51b1efe9043e44439a949dfd92e5827321b34082903477fd04876edb7552>
(1)
- *M* src/scripts/prefixes.sql
<https://github.com/IEDB/MRO/pull/90/files#diff-fab310065ad989bdb99a832e186f9952e28139b74e172b566ca8944b2f5746f0>
(5)
- *A* src/update_gene_allele_seq.py
<https://github.com/IEDB/MRO/pull/90/files#diff-494ef043e3467db44f01ea1e46731661c064f5f6e52369884cb31b7012957d8d>
(499)
Patch Links:
- https://github.com/IEDB/MRO/pull/90.patch
- https://github.com/IEDB/MRO/pull/90.diff
--
Randi Vita, M.D.
Lead Ontology and Quality Manager
Immune Epitope Database and Analysis Project
La Jolla Institute for Allergy & Immunology
9420 Athena Circle
La Jolla, Ca 92037
***@***.***
www.immuneepitope.org
858-752-6912
Yes, I have added the accession and the resource name as they appear in IMGT.
…te alleles for new sequences
This
…n. Use rdfs:isDefinedBy for allele definitions.