Add G groups, frequency data, and update protein sequence to #90

apmody · 2021-03-30T22:54:59Z

1. Add three- and four-field HLA alleles ("MHC gene allele") and their corresponding nucleotide sequences.
2. Add terms for G groups (http://hla.alleles.org/alleles/g_groups.html).
3. Choose the "best" protein sequence for chain-sequence.tsv: among all alleles that share the same first two fields, choose the longest protein sequence (e.g. the longer of the sequences for HLA-A*01:140:01 and HLA-A*01:140:02).
4. Add frequency data for HLA alleles based on CIWD 3.0 (data: https://www.ihiw18.org/component-immunogenetics/download-common-and-well-documented-alleles-3-0/, paper: https://onlinelibrary.wiley.com/doi/full/10.1111/tan.13811).
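The "best" protein sequence rule can be illustrated with a minimal sketch. This is not the code in src/update_gene_allele_seq.py: the function names, the input dictionary, and the tie-breaking rule (first allele in sorted order wins) are assumptions made for illustration, and the sequences are made-up placeholders rather than real IMGT/HLA data.

```python
from typing import Dict, Tuple


def two_field_name(allele: str) -> str:
    """Return the gene plus its first two fields.

    Hypothetical helper: 'HLA-A*01:140:01' -> 'HLA-A*01:140'.
    """
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def choose_best_sequences(proteins: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each two-field name to (source allele, longest protein sequence).

    Assumption: when two alleles have equally long sequences, the one that
    comes first in sorted order is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for allele in sorted(proteins):
        seq = proteins[allele]
        key = two_field_name(allele)
        # Replace the current choice only when this sequence is strictly longer.
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (allele, seq)
    return best


# Placeholder sequences, not real protein data.
example = {
    "HLA-A*01:140:01": "MAVMAPRTLL",
    "HLA-A*01:140:02": "MAVMAPRTLLLLLSGALALTQTWA",
    "HLA-A*02:01:01:01": "MAVMAPRTLVLLLSGALALTQTWA",
}
best = choose_best_sequences(example)
```

Keeping the source allele next to the chosen sequence makes it possible to record the matching accession and resource name for the two-field entry.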


rvita · 2021-03-30T23:55:24Z

For item #3, do you also update the accession and the resource name to match the new sequence?

On Tue, Mar 30, 2021 at 3:55 PM Apurva Mody ***@***.***> wrote:

1. Add three and four field HLA alleles ("MHC gene allele") and corresponding nucleotide sequences.
2. Add terms for G groups (http://hla.alleles.org/alleles/g_groups.html)
3. Choose "best" protein sequence for chain-sequences.tsv so that the longest protein sequence is chosen from all alleles that have the same first two-fields (e.g. choose the longest protein sequence between HLA-A*01:140:01 and HLA-A*01:140:02)
4. Add frequency data based on CIWD 3.0 data for different HLA alleles (data <https://www.ihiw18.org/component-immunogenetics/download-common-and-well-documented-alleles-3-0/>, paper <https://onlinelibrary.wiley.com/doi/full/10.1111/tan.13811>)

You can view, comment on, or merge this pull request online at: #90

Commit Summary

- First pass: Get G groups, get full CDS sequences, import population group terms
- Reworked parsing of hla.dat file, add template strings
- Get appropriate exons for G group when MHC allele is class I (exon 2 and 3) or class II (exon 2)
- Merge branch 'master' into G_group
- Added code to add terms to index.tsv and error report
- Merge branch 'master' into G_group
- Changed template strings
- Added alleles which have partial sequence
- Organized gen_allele_update_seq.py into functions, Modified import.txt
- First pass at extracting population frequency data
- Merge branch 'master' into G_group
- Added code to verify accession numbers from frequency data and build/hla.dat file
- Second pass adding population frequency data for chains, G groups, reworked index update code
- Add frequency of gene_alleles
- Added helper function for getting G group exons, removed requirement for protein match between current MRO and allele
- Fix protein sequence to get best protein sequence

File Changes

- *M* Makefile <https://github.com/IEDB/MRO/pull/90/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52> (38)
- *M* index.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-03713472c9d385b498690e3c0f507cd67f378e9db1f2ef63711897c392c2ed7e> (27251)
- *A* ontology/G-group-frequencies.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-701bac6402654020951e3c390d75587dd4756261438e663cdda7832394e7690e> (277)
- *A* ontology/G-group.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-176f1865135d177fb0e7f02ac311066b2b935963d6d3d259b98f40f55e7cf750> (514)
- *A* ontology/chain-frequencies.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-9ed7c8aa1d1ccac9b602d7c96c8643687e2485db3738cf4268127567d03f02dc> (784)
- *M* ontology/chain-sequence.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-93686f7a9062e877738eb4ef9e8bac75995fd7c448012a9f57dfcddbf79b29c7> (727)
- *M* ontology/core.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-cce693e5e496c2640c02d0f726ef8c1b40657ba6389aee1b1040d246ad55494d> (2)
- *A* ontology/frequency-properties.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-3ccc40ddae15df9428e8a3ad74ccddb26dd834f2e18b3585011dbedbdf2b527c> (10)
- *A* ontology/gene-allele-frequencies.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-150e1823d5eff9a49e96fe52034affca6e0ae7019c325cd9d35e99ed5804d9a0> (8427)
- *A* ontology/gene-alleles.tsv <https://github.com/IEDB/MRO/pull/90/files#diff-91369a4d23b970756c572cca4d3e92f1e572e04fc5a373f82cd5ef9b5314e732> (26731)
- *M* ontology/import.txt <https://github.com/IEDB/MRO/pull/90/files#diff-5a8021ed1bc10adb45b4669b0a9b18150676f1e4c5d637eda5219bb499a66dc4> (9)
- *M* requirements.txt <https://github.com/IEDB/MRO/pull/90/files#diff-4d7c51b1efe9043e44439a949dfd92e5827321b34082903477fd04876edb7552> (1)
- *M* src/scripts/prefixes.sql <https://github.com/IEDB/MRO/pull/90/files#diff-fab310065ad989bdb99a832e186f9952e28139b74e172b566ca8944b2f5746f0> (5)
- *A* src/update_gene_allele_seq.py <https://github.com/IEDB/MRO/pull/90/files#diff-494ef043e3467db44f01ea1e46731661c064f5f6e52369884cb31b7012957d8d> (499)

Patch Links:

- https://github.com/IEDB/MRO/pull/90.patch
- https://github.com/IEDB/MRO/pull/90.diff

-- Randi Vita, M.D. Lead Ontology and Quality Manager Immune Epitope Database and Analysis Project La Jolla Institute for Allergy & Immunology 9420 Athena Circle La Jolla, Ca 92037 ***@***.*** www.immuneepitope.org 858-752-6912

apmody · 2021-04-01T15:33:33Z

For item #3, do you also update the accession and the resource name to match the new sequence?

Yes, I have added the accession and the resource name as it appears in IMGT.


jamesaoverton · 2021-07-02T19:22:47Z

The make mro.owl task must pass before this can be merged. It is failing for me locally and here in GitHub Actions.


apmody added 18 commits March 12, 2021 12:00

First pass: Get G groups, get full CDS sequences, import population group terms

55b2e52

Reworked parsing of hla.dat file, add template strings

3cba443

Get appropriate exons for G group when MHC allele is class I (exon 2 and 3) or class II (exon 2)

64d9499

Merge branch 'master' into G_group

9ad56b0

Added code to add terms to index.tsv and error report

7ad3293

Merge branch 'master' into G_group

fa40809

Changed template strings

f6811b8

Added alleles which have partial sequence

a9188d2

Organized gen_allele_update_seq.py into functions, Modified import.txt

6b868b1

First pass at extracting population frequency data

5b3e511

Merge branch 'master' into G_group

6be8051

Added code to verify accession numbers from frequency data and build/hla.dat file

e5c5f30

Second pass adding population frequency data for chains, G groups, reworked index update code

5f5e776

Add frequency of gene_alleles

fb31e00

Added helper function for getting G group exons, removed requirement for protein match between current MRO and allele

103ccb7

Fix protein sequence to get best protein sequence

919ae1c

Fixed bug in update_gene_allele_seq to add IMGT accession

87e938a

Merge branch 'master' into G_group

7f7f8d2

apmody added 10 commits April 7, 2021 07:33

Reset chain-sequence.tsv, chain.tsv, molecule.tsv, index.tsv and update alleles for new sequences

928988e

Fix bugs in updating G groups, population frequency, and gene sequences

de416dd

Fix bug to add more frequency data for chains

98a88d7

Merge branch 'master' into G_group

7354cd2

Added NCIT terms for MHC genes.

ce7ea5f

Fixed makefile, header on ontology templates, added definition source

9ed1dab

Moved external_ncit.tsv to external-ncit.tsv

42b6f99

Update chain-frequencies.tsv and gene-alleles.tsv ontology tables

0b5af01

Change table name in Makefile, added allele information

f3c590a

Merge branch 'master' into G_group

f43c6f6

apmody added 14 commits July 8, 2021 12:19

Drop repeated two-field entries. Add World instead of Total population. Use rdfs:isDefinedBy for allele definitions.

e01eb6f

Update HLA chains and subclass locus for G group

49c7dd5

Merge branch 'master' into G_group

1845f64

Add updated index.tsv

d8be297

Merge branch 'master' into G_group

8cb045f

Update Makefile for update_gene_alleles_seq.py

18b5d37

Remove external-obi.tsv from assign-ids.py. Fixed locus column for gene-alleles.tsv

40340b5

Fixed prefixes.sql

a8c9bc3

Delete external-obi.tsv

12c72a2

Rename ontology/allele-information.tsv to gene-allele.tsv

6b0dfcb

Removed allele information stuff and put under gene-allele

af968b2

Fixed index.tsv true value in obsolete column

ae10487

Moved frequency-properties.tsv to properties.tsv

3953b42

Add tables for mro.xlsx target

43ddaa1
