rcdk::matches() function bugs #136

Open
YANGJJ93MS opened this issue Oct 28, 2022 · 3 comments

Comments

@YANGJJ93MS

There is an issue with the substructure match function: I get TRUE even though the substructure is not in the molecule being searched.


library(rcdk)
# Target molecule (parse.smiles returns a list of molecules)
mol <- parse.smiles('CN(C)c1cccc2c(S(=O)(=O)Oc3ccc4c5c3OC3C(=O)CC(O)C6(O)C(C4)N(CC4CC4)CCC536)cccc12')
# Substructure query
query <- 'CN(C)CCc1ccccc1'
rcdk::matches(query, mol)

Screenshots

"rcdk::matches(query2,mol1)
CN(C)c1cccc2c(S(=O)(=O)Oc3ccc4c5c3OC3C(=O)CC(O)C6(O)C(C4)N(CC4CC4)CCC536)cccc12.match
TRUE"

System (please complete the following information):

  • OS: Windows 11
  • R: 4.1.1
@zachcp
Contributor

zachcp commented Feb 12, 2023

Hi, thanks for your report. I am not very familiar with SMARTS, but here's what I've found:

  • rcdk::matches uses SMARTSQueryTool under the hood.
  • SMARTSQueryTool is deprecated.
  • SmartsPattern is preferred.
  • I am not familiar enough with SMARTS to know if your example is truly expected to be negative. You should probably confirm on the CDK users mailing list with your specific pattern.
  • rcdk should either fix or deprecate this function (@rajarshi would need to weigh in there).
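
One way to investigate a surprising result is to ask for the matched atoms rather than only the TRUE/FALSE flag. The following is a minimal sketch, assuming the installed rcdk version supports the `return.matches` argument of `matches()` and returns one list per target with `match` and `mapping` fields; the variable names `res` and `hit` are illustrative only.

```r
library(rcdk)

# Target and query taken from the report above
mol <- parse.smiles('CN(C)c1cccc2c(S(=O)(=O)Oc3ccc4c5c3OC3C(=O)CC(O)C6(O)C(C4)N(CC4CC4)CCC536)cccc12')
query <- 'CN(C)CCc1ccccc1'

# Assumption: return.matches = TRUE returns the matched atom indices
# of the target in addition to the match flag
res <- rcdk::matches(query, mol, return.matches = TRUE)

# One result per target molecule; inspect the first (and only) one
hit <- res[[1]]
print(hit$match)    # whether the query was found in the target
print(hit$mapping)  # target atom indices for each reported match
```

Comparing the reported atom indices against the target structure shows which part of the molecule the pattern is matching.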

@rajarshi
Collaborator

rajarshi commented Feb 12, 2023 via email

@YANGJJ93MS
Author

Dear Rajarshi,

Thank you for your reply!

I found that the RDKit substructure matching function makes the same mistake. Please see the picture below:
[Figure: figure-cdkit]

As a matter of fact, the substructure that I am looking for is a benzene ring with an amine side chain, which is totally different from the naphthalene structure. I believe the reason is that the algorithm treated the naphthalene structure as an alkyl structure.

Best regards,
Junjie
