NCI60 - Several incorrect values in chem_name #236
Comments
Do we know which drugs are causing this?
Out of this small list (the 6 errors above are all of the errors that the validation script found), only two could be found through this simple search. Both mapped to the same drug: SMI_54937, PubChem CID 581, which can be found on PubChem. The values are not listed as identifiers here.
Here is the REST call for that compound (according to the pubchem_retrieval.py script): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/581/synonyms/JSON
I think these values are somehow missed by the PubChem call and instead get added as NSC identifiers, but without the 'NSC' prefix.
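To check that hypothesis by hand, a minimal sketch along these lines could pull the synonym list from the REST call above and flag any synonym that is only digits, which is what an NSC identifier would look like with its prefix stripped. The helper names (`extract_synonyms`, `bare_numeric`, `fetch_synonyms`) are made up for illustration and are not taken from pubchem_retrieval.py; the response layout (`InformationList` / `Information` / `Synonym`) is the one PUG REST returns for synonym requests.

```python
import json
import urllib.request

# URL pattern taken from the REST call quoted in this thread.
PUG_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/CID/{cid}/synonyms/JSON"


def extract_synonyms(payload):
    """Collect every synonym string from a PUG REST synonyms response."""
    entries = payload.get("InformationList", {}).get("Information", [])
    synonyms = []
    for entry in entries:
        synonyms.extend(entry.get("Synonym", []))
    return synonyms


def bare_numeric(synonyms):
    """Return synonyms that consist only of digits (assumed to be prefix-less IDs)."""
    return [s for s in synonyms if s.strip().isdigit()]


def fetch_synonyms(cid):
    """Hypothetical helper: download and parse the synonym list for one CID."""
    with urllib.request.urlopen(PUG_URL.format(cid=cid), timeout=30) as resp:
        return extract_synonyms(json.load(resp))
```

Calling `bare_numeric(fetch_synonyms(581))` would then show whether PubChem itself returns number-only synonyms for this compound, or whether they are introduced later in the build.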
I have been working on the schema checker, and the previous errors were truncated. This issue also appears in the other datasets that use these drugs, which may help with tracking it down.
Was this resolved in #237?
The schema was changed to allow any value in the chem_name column, but these specific 5 or so chemicals are still present.
I will look into where this is occurring during the PubChem / build process.
The validation script is finding several errors in the NCI60 chem_name column in the drugs file.
(This currently translates to broad_sanger_drugs.tsv.)
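As a rough way to locate the affected rows, the sketch below scans a tab-separated drugs file and reports every row whose chem_name is made up only of digits. It assumes the bad values are bare numbers, which this thread only hypothesizes, and the function name `numeric_chem_names`, the `drug_id` column, and the sample rows are invented for illustration; only the chem_name column name comes from the issue.

```python
import csv
import io


def numeric_chem_names(tsv_file, column="chem_name"):
    """Return (line_number, value) pairs where the column holds only digits."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    hits = []
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        if value.isdigit():
            hits.append((line_no, value))
    return hits


if __name__ == "__main__":
    # Made-up sample data, not real rows from the drugs file.
    sample = io.StringIO(
        "drug_id\tchem_name\n"
        "SMI_1\texample-name\n"
        "SMI_1\t12345\n"
        "SMI_2\tNSC 678\n"
    )
    print(numeric_chem_names(sample))
```

For the real file, the same function can be given an open handle, for example `open("broad_sanger_drugs.tsv", newline="")`.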