Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only "0" returned for AromaticAtomsCountDescriptor and AromaticBondsCountDescriptor #147

Open
bsedio opened this issue Nov 8, 2023 · 4 comments

Comments

@bsedio
Copy link

bsedio commented Nov 8, 2023

Hello rcdk,

I am querying the CDK chemical properties using smiles generated using the CSI:FingerID module of Sirius. For some reason, I get "0" for number of aromatic atoms and bonds for every metabolite in my dataset, which includes many aromatic compounds (flavonoids, alkaloids from plant extracts).

Here is a small example:

dn.arom = c("org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor")
mol.aromtest = parse.smiles("C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4")
mol.aromtest = parse.smiles(c("C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4","CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC"))
desc.arom.test = eval.desc(mol.aromtest, dn.arom)
head(desc.arom.test)
                                                                    nAromBond naAromAtom
C=CC1C2CC3C4=C(CCN3C(=O)C2=COC1OC5C(C(C(C(O5)CO)O)O)O)C6=CC=CC=C6N4         0          0
CC(=NO)C1=CC(=C(C=C1CC2=NC=CC3=CC(=C(C=C32)OC)OC)OC)OC                      0          0

I am using R version 4.3.1 "Beagle Scouts" on Mac OS 12.6 Monterey
and
java version "1.8.0_391"
Java(TM) SE Runtime Environment (build 1.8.0_391-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rcdk_3.8.1 rcdklibs_2.8 rJava_1.0-6

loaded via a namespace (and not attached):
[1] compiler_4.3.1 parallel_4.3.1 fingerprint_3.5.7 iterators_1.0.14 itertools_0.1-3 png_0.1-8

Thank you very much for any assistance,
Brian

@zachcp
Copy link
Contributor

zachcp commented Mar 8, 2024

Hi Brian,

Sorry for the slow reply. You have to set/find aromaticity explicitly. I followed the CDK docs to set this explicitly and then Aromatic features are calculated as you would expect.

library(rcdk)

mols <- parse.smiles("C1=CC=CC=C1")
descriptor1 <- .jnew('org.openscience.cdk.qsar.descriptors.molecular.AromaticBondsCountDescriptor')
descriptor1$initialise(get.chem.object.builder())
val <- descriptor1$calculate(mols[[1]])
val$getNames()
val$getValue()
# zero

# set aromaticity explicitly
# follow the CDK text
# http://cdk.github.io/cdk/2.9/docs/api/org/openscience/cdk/aromaticity/Aromaticity.html

electron_donation <- J('org.openscience.cdk.aromaticity.ElectronDonation')
aromaticity <- J('org.openscience.cdk.aromaticity.Aromaticity')
cycles <- J('org.openscience.cdk.graph.Cycles')
model  <- electron_donation$daylight()
cyc    <- cycles$or(cycles$all(), cycles$all(as.integer(6)))

aroma  <- new(aromaticity, model, cyc)
aroma$apply(mols)


descriptor3 <- .jnew('org.openscience.cdk.qsar.descriptors.molecular.AromaticAtomsCountDescriptor')
descriptor3$initialise(get.chem.object.builder())
val3 <- descriptor2$calculate(mols)
val3$getNames()
val3$getValue()
# 6

@zachcp
Copy link
Contributor

zachcp commented Mar 8, 2024

So something like this could work:

set_aromatic <- function(molecules) {
  electron_donation <- J('org.openscience.cdk.aromaticity.ElectronDonation')
  aromaticity <- J('org.openscience.cdk.aromaticity.Aromaticity')
  cycles <- J('org.openscience.cdk.graph.Cycles')
  model  <- electron_donation$daylight()
  cyc    <- cycles$or(cycles$all(), cycles$all(as.integer(6)))
  
  aroma  <- new(aromaticity, model, cyc)
  
  for (mol in mols) {
    aroma$apply(mols)  
  }

  molecules  
  
}

@zachcp
Copy link
Contributor

zachcp commented Mar 8, 2024

another workaround is using lowercase in SMILES to denote aromaticity:

c1=cc=cc=c1 vs: C1=CC=CC=1

@zachcp
Copy link
Contributor

zachcp commented Mar 8, 2024

related: #111

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants