Re-define compound table #35
Comments
For ChEBI (2018-12-03):
So, we don't have an InChI for all of them and we have compounds with the same InChI! Apart from the name and the ID these compounds are however identical:
      compound_id compound_name
8564  CHEBI:17775 7,9-dihydro-1H-purine-2,6,8(3H)-trione
18506 CHEBI:46811 2,6-dihydroxy-7,9-dihydro-8H-purin-8-one
18507 CHEBI:46814 9H-purine-2,6,8-triol
18509 CHEBI:46817 7H-purine-2,6,8-triol
18513 CHEBI:46823 1H-purine-2,6,8-triol
27249 CHEBI:62589 6-hydroxy-1H-purine-2,8(7H,9H)-dione
      inchi
8564  InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18506 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18507 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18509 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
18513 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
27249 InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3/h(H4,6,7,8,9,10,11,12)
      inchi_key                   formula  mass
8564  LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18506 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18507 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18509 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
18513 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
27249 LEHOTFFKMJEONL-UHFFFAOYSA-N C5H4N4O3 168.028
Question is whether these compounds would have different MS2 spectra? If so it would not make sense to combine them! Some of the compounds without an InChI are listed below:
    compound_id  compound_name             inchi inchi_key
3   CHEBI:10003  ribostamycin sulfate      <NA>  <NA>
15  CHEBI:10036  wax ester                 <NA>  <NA>
91  CHEBI:10283  2-hydroxy fatty acid      <NA>  <NA>
140 CHEBI:10545  electron                  <NA>  <NA>
148 CHEBI:10583  kappa-carrageenan         <NA>  <NA>
154 CHEBI:106304 sphingomyelin d18:1/16:0  <NA>  <NA>
    formula                   mass
3   C17H34N4O10.(H2O4S)n      NA
15  CO2R2                     43.990
91  C2H3O3R __ C2H3O3R(CH2)n  75.008
140 <NA>                      0.000
148 (C12H17O12S)n             NA
154 C39H79N2O6P               702.568
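To make the two findings above concrete, here is a minimal Python sketch (not the package's actual code) that splits rows into groups sharing an InChIKey and entries lacking one. The rows are copied from the tables above; the function name `split_by_inchi_key` and the use of `None` for `<NA>` are assumptions made for illustration.

```python
from collections import defaultdict

URIC_ACID_KEY = "LEHOTFFKMJEONL-UHFFFAOYSA-N"

# (compound_id, inchi_key) pairs taken from the tables above.
# None stands in for <NA> (assumption made for this sketch).
rows = [
    ("CHEBI:17775", URIC_ACID_KEY),
    ("CHEBI:46811", URIC_ACID_KEY),
    ("CHEBI:46814", URIC_ACID_KEY),
    ("CHEBI:46817", URIC_ACID_KEY),
    ("CHEBI:46823", URIC_ACID_KEY),
    ("CHEBI:62589", URIC_ACID_KEY),
    ("CHEBI:10003", None),
    ("CHEBI:10545", None),
]


def split_by_inchi_key(rows):
    """Return (groups sharing an InChIKey, IDs without an InChIKey)."""
    groups = defaultdict(list)
    missing = []
    for compound_id, inchi_key in rows:
        if inchi_key is None:
            missing.append(compound_id)
        else:
            groups[inchi_key].append(compound_id)
    # Keep only keys that are shared by more than one entry.
    duplicated = {key: ids for key, ids in groups.items() if len(ids) > 1}
    return duplicated, missing


duplicated, missing = split_by_inchi_key(rows)
```

Whether the entries in `duplicated` should actually be merged is exactly the open question here; the sketch only identifies the candidates.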
In the case of CHEBI:46814 and CHEBI:46817, for instance (and I suspect the rest of them), they are not the same chemical at first glance (see below, different locations of a hydrogen), but they are in fact tautomers of each other. This is also indicated in the ChEBI entries of some of them if you look them up. That means they readily convert from one to the other without any external input (energy or otherwise) and thus should really be thought of as a mixture of all of them. The MS2 spectrum "should" be similar if not identical, but the actual ionization conditions (pH, buffer ions, etc.) might also have a big effect, leading to different MS2 spectra. Here I would suggest getting input from people who actually work with tautomers to hear what they have to say about it.
Thanks for your input @SiggiSmara! I'll try to get some input from people actually working with MS2 spectra and identification.
I have no experience with tautomers, but one option could be to use the SMILES, where this is explicit. You can also generate a non-standard InChI with the fixed-H layer from the SMILES.
Also had feedback from Steffen. They use the same approach as PubChem: a compound table with unique InChI and a substance table with additional annotations (possibly multiple entries per compound).
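A rough sketch of what such a compound/substance split could look like, using Python's built-in `sqlite3`. The table and column names are assumptions chosen for illustration, not PubChem's schema and not the layout of this package; the uric acid rows are taken from the ChEBI listing earlier in this thread.

```python
import sqlite3

KEY = "LEHOTFFKMJEONL-UHFFFAOYSA-N"
INCHI = (
    "InChI=1S/C5H4N4O3/c10-3-1-2(7-4(11)6-1)8-5(12)9-3"
    "/h(H4,6,7,8,9,10,11,12)"
)

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- One row per unique structure (hypothetical layout).
    CREATE TABLE compound (
        inchi_key TEXT PRIMARY KEY,
        inchi     TEXT NOT NULL UNIQUE,
        formula   TEXT,
        mass      REAL
    );
    -- One row per source entry; several may point to one compound.
    -- inchi_key stays NULL for entries without an InChI (assumption).
    CREATE TABLE substance (
        source_id TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        inchi_key TEXT REFERENCES compound (inchi_key)
    );
    """
)

con.execute(
    "INSERT INTO compound VALUES (?, ?, ?, ?)",
    (KEY, INCHI, "C5H4N4O3", 168.028),
)
con.executemany(
    "INSERT INTO substance VALUES (?, ?, ?)",
    [
        ("CHEBI:17775", "7,9-dihydro-1H-purine-2,6,8(3H)-trione", KEY),
        ("CHEBI:46811", "2,6-dihydroxy-7,9-dihydro-8H-purin-8-one", KEY),
        ("CHEBI:46814", "9H-purine-2,6,8-triol", KEY),
        ("CHEBI:46817", "7H-purine-2,6,8-triol", KEY),
        ("CHEBI:46823", "1H-purine-2,6,8-triol", KEY),
        ("CHEBI:62589", "6-hydroxy-1H-purine-2,8(7H,9H)-dione", KEY),
    ],
)

# Number of substance entries attached to each compound.
per_compound = con.execute(
    "SELECT inchi_key, COUNT(*) FROM substance GROUP BY inchi_key"
).fetchall()
```

With this layout the six tautomer entries collapse into one compound row while their names and source IDs remain available in the substance table.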
The purpose of the compound table:
The question, however, is how to define a compound. What is a compound? An entity with its own unique InChI? Structure == compound?
For the HMDB database it was pretty straightforward, as HMDB provides compound identifiers. MoNa (issue #23) and MassBank (issue #34), however, are more complicated, as they don't allow unifying the data.
What we should do: