"Calculate Properties" for macromolecules #2552

Open
ljubica-milovic opened this issue Oct 14, 2024 · 0 comments

ljubica-milovic commented Oct 14, 2024

Issue on Ketcher side: #5727

Below is a list of properties that should be included:

| Name | Symbol | Unit | Explanation | Type of biomolecule | Note |
|---|---|---|---|---|---|
| Molecular mass | M | kDa (Da and MDa rarely used; 1 Da = 1 g/mol) | The mass of one mole (6.022 × 10^23 molecules) of the substance | Peptide/RNA/DNA | |
| Isoelectric point | pI | Dimensionless | pH at which the protein has no net charge | Peptide | Median of all pKa values for that peptide |
| Melting temperature | Tm (m in subscript) | °C | Temperature at which half of the nucleic acid (NA) is denatured | RNA/DNA | |
| Extinction coefficient | ε (lowercase epsilon) | 1/(M·cm); M = mol/L | Measure of how much light of a specific wavelength the substance absorbs; constant for one substance and one λ | Peptide | We will have ε only for λ = 280 nm |
| Hydrophobicity | | | "Fear of water": how much certain residues dislike a water environment; it can influence many things, such as solubility and the ability to pass through the cell membrane | Peptide | The output should not be a number, but a graph with the x-axis representing residue numbers and the y-axis representing the hydrophobicity of every amino acid residue |
| Monomer count | | | How many monomers of each type there are in the biomolecule, for example: 3 alanines, 5 cysteines, etc. | Peptide/RNA/DNA | |

Molecular mass

Any one structure in macromolecules mode

  1. The mass of one monomer is the mass of the structure minus the mass of leaving group atom(s) if an attachment point is occupied.

M(Cys) = 121.16 g/mol
M(Cys, R1 occupied) = M(Cys) - M(H) = 120.15 g/mol
M(Cys, R1∧R2 occupied) = M(Cys) - M(H) - M(OH) = 103.14 g/mol

  2. The molecular mass of the whole polymer is the sum of the molecular masses of its monomers plus the molecular mass of any small molecules.
  3. Indigo should return one numerical value in kDa (1 kDa = 1000 g/mol).
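
A minimal sketch of the molecular mass rules above, not Indigo's actual implementation. The function names and the rounded atomic masses are assumptions made for illustration; the cysteine mass and the leaving groups (H on R1, OH on R2) come from the example in this issue.

```python
# Rounded average atomic masses in g/mol (illustrative values).
MASS_H = 1.008
MASS_OH = 15.999 + 1.008  # hydroxyl leaving group


def monomer_mass(free_mass, leaving_group_masses):
    """Mass of a monomer minus the leaving groups of its occupied attachment points."""
    return free_mass - sum(leaving_group_masses)


def polymer_mass_kda(monomer_masses, small_molecule_masses=()):
    """Sum monomer and small-molecule masses (g/mol) and convert to kDa."""
    total_g_per_mol = sum(monomer_masses) + sum(small_molecule_masses)
    return total_g_per_mol / 1000.0


cys_free = 121.16                                        # M(Cys) from the issue
cys_r1 = monomer_mass(cys_free, [MASS_H])                # R1 occupied
cys_r1_r2 = monomer_mass(cys_free, [MASS_H, MASS_OH])    # R1 and R2 occupied

# Hypothetical two-monomer chain, just to show the kDa conversion.
chain_mass = polymer_mass_kda([cys_r1, cys_r1_r2])
print(f"{cys_r1:.2f} g/mol, {chain_mass:.6f} kDa")
```

Keeping the leaving-group subtraction per monomer means the polymer total is a plain sum, which matches how step 2 is worded.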

Isoelectric point

Only peptides

A peptide is any chain that has one or more amino acids in the backbone.

  1. pKa values for all ionizable groups of all monomers should be determined, ignoring the leaving group atoms if an attachment point is occupied.
  2. pI should be the median (not the mean!!!) of all pKa values for all groups of that polymer.
  3. Indigo should return one numerical value.
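
A small sketch of step 2, assuming the pKa values of all ionizable groups have already been collected. The function name and the pKa numbers are placeholders for illustration, not real monomer data.

```python
from statistics import mean, median


def isoelectric_point(pka_values):
    """Return pI as the median of all pKa values, as this issue specifies."""
    if not pka_values:
        raise ValueError("no ionizable groups: pI is undefined")
    return median(pka_values)


# Placeholder pKa values (hypothetical, chosen only to show median vs. mean).
example_pkas = [2.0, 4.0, 9.0, 10.5]
pi_value = isoelectric_point(example_pkas)
print(pi_value, mean(example_pkas))  # the median and the mean differ here
```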

Melting temperature

Only for two chains of RNA/DNA where every base is connected via a hydrogen bond to a base from the other chain

  1. Variables for the equation are:
    SP (strength parameter per base),
    L (length of nucleotide sequence),
    UPC (molar concentration, in mol/L, of unipositive cations; the user can enter the value in mM; the default is the average physiological value, 140 mM),
    NAC (molar concentration, in M = mol/L, of the nucleotide strands; the user should enter the value in μM or nM)

  2. Bases C, T and U are pyrimidines (Y); bases A and G are purines (R). Indigo should read only one chain from the 5' direction, observing pairs of adjacent nucleotides, and assign each pair a strength parameter (see below). Dividing the sum of the strength parameters by the number of bases gives the strength parameter per base (SP).

Let's say we have a double-stranded DNA with the sequence of one strand being: 5'-GACGAATGCT-3'
First we observe the pair GA; in the table below we get 8.
For the remaining pairs: AC=10; CG=10; GA=8; AA=5; AT=7; TG=7; GC=13; CT=8.
For a ten-nucleotide sequence we have 9 nucleotide pairs whose sum of strength parameters is 76. So the strength parameter per base (SP) is 76/10 = 7.6.

  3. The equation for the melting temperature is as follows:
    Tm [°C] = 7.35 * SP + 17.34 * ln(L) + 4.96 * ln(UPC) + 0.89 * ln(NAC) - 25.42

  4. Inspecto should return one numerical value.

| RY | YY | RR | YR |
|---|---|---|---|
| GC=13 | CC=11 | GG=11 | CG=10 |
| AC=10 | TC/UC=8 | AG=8 | TG/UG=7 |
| GT/GU=10 | CT/CU=8 | GA=8 | CA=7 |
| AT/AU=7 | TT/UU/TU/UT=5 | AA=5 | TA/UA=4 |

A, C, G, T, and U are to be considered natural analogues.
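
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the Indigo API: the function names are made up, U is folded onto T because the table gives both the same parameters, GC=13 and CG=10 follow the worked example, and user input is taken in mM and μM and converted to mol/L before the logarithms.

```python
import math

# Strength parameters per nucleotide pair, read 5' to 3'.
STRENGTH = {
    "GC": 13, "AC": 10, "GT": 10, "AT": 7,   # RY
    "CC": 11, "TC": 8,  "CT": 8,  "TT": 5,   # YY
    "GG": 11, "AG": 8,  "GA": 8,  "AA": 5,   # RR
    "CG": 10, "TG": 7,  "CA": 7,  "TA": 4,   # YR
}


def strength_per_base(sequence):
    """Sum the pair parameters along one strand and divide by the number of bases."""
    seq = sequence.upper().replace("U", "T")  # U and T share parameters in the table
    pair_sum = sum(STRENGTH[seq[i:i + 2]] for i in range(len(seq) - 1))
    return pair_sum / len(seq)


def melting_temperature(sequence, nac_uM, upc_mM=140.0):
    """Tm in degrees Celsius; concentrations are converted to mol/L first."""
    sp = strength_per_base(sequence)
    length = len(sequence)
    upc = upc_mM / 1000.0        # mM -> mol/L
    nac = nac_uM / 1_000_000.0   # uM -> mol/L
    return (7.35 * sp + 17.34 * math.log(length)
            + 4.96 * math.log(upc) + 0.89 * math.log(nac) - 25.42)


sp = strength_per_base("GACGAATGCT")   # 76 / 10, as in the worked example
tm = melting_temperature("GACGAATGCT", nac_uM=1.0)
print(sp, round(tm, 2))
```

NAC has no default in this issue, so the sketch makes it a required argument; only UPC falls back to 140 mM.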

Extinction coefficient

Only peptides

  1. For peptides, the extinction coefficient (at λ = 280 nm) is ε = N(W)*5500 + N(Y)*1490 + N(C)*125, where N(W), N(Y), and N(C) are the numbers of tryptophans, tyrosines, and cysteines.
  2. Inspecto should return one numerical value.
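
A short sketch of the formula above. It assumes the peptide is given as a string of one-letter natural-analogue symbols and that residues are counted by natural analogue; the function name is hypothetical.

```python
def extinction_coefficient_280(natural_analogues):
    """Extinction coefficient at 280 nm, in 1/(M*cm), from W, Y and C counts."""
    n_w = natural_analogues.count("W")  # tryptophans
    n_y = natural_analogues.count("Y")  # tyrosines
    n_c = natural_analogues.count("C")  # cysteines
    return n_w * 5500 + n_y * 1490 + n_c * 125


# Hypothetical peptide with 2 W, 1 Y and 2 C.
epsilon = extinction_coefficient_280("ACWYWC")
print(epsilon)
```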

Hydrophobicity

Only peptides

  1. Indigo should return a list with the x-axis values representing amino acid number (skipping non-amino acids, ambiguous amino acids and amino acids with natural analogue X), and the y-axis values representing the hydrophobicity coefficient of that amino acid (see below).

| Natural analogue of amino acid | Coefficient | Natural analogue of amino acid | Coefficient | Natural analogue of amino acid | Coefficient | Natural analogue of amino acid | Coefficient |
|---|---|---|---|---|---|---|---|
| A | 0.616 | G | 0.501 | M | 0.738 | S | 0.359 |
| C | 0.680 | H | 0.165 | N | 0.236 | T | 0.450 |
| D | 0.028 | I | 0.943 | P | 0.711 | V | 0.825 |
| E | 0.043 | K | 0.283 | Q | 0.251 | W | 0.878 |
| F | 1.000 | L | 0.943 | R | 0.000 | Y | 0.880 |
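
One possible shape for the returned list, sketched under assumptions: the chain is a sequence of natural-analogue symbols, anything without a coefficient in the table is skipped, and the x value is the 1-based position in the chain (so skipped monomers leave a gap). The issue does not say whether positions should instead be renumbered, so treat that choice as open.

```python
# Hydrophobicity coefficients by natural analogue, from the table above.
HYDROPHOBICITY = {
    "A": 0.616, "G": 0.501, "M": 0.738, "S": 0.359,
    "C": 0.680, "H": 0.165, "N": 0.236, "T": 0.450,
    "D": 0.028, "I": 0.943, "P": 0.711, "V": 0.825,
    "E": 0.043, "K": 0.283, "Q": 0.251, "W": 0.878,
    "F": 1.000, "L": 0.943, "R": 0.000, "Y": 0.880,
}


def hydrophobicity_profile(natural_analogues):
    """Return (position, coefficient) pairs, skipping symbols not in the table."""
    return [
        (position, HYDROPHOBICITY[symbol])
        for position, symbol in enumerate(natural_analogues, start=1)
        if symbol in HYDROPHOBICITY
    ]


# Hypothetical chain: X has no coefficient, so position 2 is skipped.
profile = hydrophobicity_profile("AXF")
print(profile)
```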

Monomer count

Peptides, RNA, DNA

  1. For peptides, every monomer should be sorted into one of 21 categories (see below) and counted.

A peptide is any chain that has one or more amino acids in the backbone.

  2. For RNA/DNA, only bases (that are part of a nucleotide/nucleoside) should be sorted into one of 6 categories (see below) and counted.

RNA/DNA is any chain that has one sugar in the backbone and a base connected to it via R3 (sugar) - R1 (base).

  3. Indigo should return a list containing the number of monomers in each category.
  • For peptides:
| Symbol | Monomers | Symbol | Monomers | Symbol | Monomers |
|---|---|---|---|---|---|
| A | Alanine, and all other amino acids with natural analogue A | I | Isoleucine, and all other amino acids with natural analogue I | R | Arginine, and all other amino acids with natural analogue R |
| C | Cysteine, and all other amino acids with natural analogue C | K | Lysine, and all other amino acids with natural analogue K | S | Serine, and all other amino acids with natural analogue S |
| D | Aspartic acid, and all other amino acids with natural analogue D | L | Leucine, and all other amino acids with natural analogue L | T | Threonine, and all other amino acids with natural analogue T |
| E | Glutamic acid, and all other amino acids with natural analogue E | M | Methionine, and all other amino acids with natural analogue M | V | Valine, and all other amino acids with natural analogue V |
| F | Phenylalanine, and all other amino acids with natural analogue F | N | Asparagine, and all other amino acids with natural analogue N | W | Tryptophan, and all other amino acids with natural analogue W |
| G | Glycine, and all other amino acids with natural analogue G | P | Proline, and all other amino acids with natural analogue P | Y | Tyrosine, and all other amino acids with natural analogue Y |
| H | Histidine, and all other amino acids with natural analogue H | Q | Glutamine, and all other amino acids with natural analogue Q | Other | Amino acids with natural analogues O, U, and X; ambiguous amino acids; all non-amino acid monomers |
  • For RNA/DNA:
| Symbol | Bases |
|---|---|
| A | Adenine, and all other bases with natural analogue A |
| C | Cytosine, and all other bases with natural analogue C |
| G | Guanine, and all other bases with natural analogue G |
| T | Thymine, and all other bases with natural analogue T |
| U | Uracil, and all other bases with natural analogue U |
| Other | Bases with natural analogue X; ambiguous bases |
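
A sketch of the counting rule, assuming each monomer (or base) has already been reduced to its natural-analogue symbol. Anything outside the listed symbols lands in "Other", which covers O, U and X for peptides and X for bases. The names below are illustrative, not part of Indigo.

```python
PEPTIDE_CATEGORIES = list("ACDEFGHIKLMNPQRSTVWY") + ["Other"]  # 21 categories
BASE_CATEGORIES = list("ACGTU") + ["Other"]                    # 6 categories


def count_monomers(natural_analogues, categories):
    """Count symbols per category; unlisted symbols are counted as 'Other'."""
    counts = {category: 0 for category in categories}
    for symbol in natural_analogues:
        key = symbol if symbol in counts else "Other"
        counts[key] += 1
    return counts


# Hypothetical inputs: X and O are not peptide categories, X is not a base category.
peptide_counts = count_monomers("AACCCXO", PEPTIDE_CATEGORIES)
base_counts = count_monomers("ACGUUX", BASE_CATEGORIES)
print(peptide_counts["A"], peptide_counts["C"], peptide_counts["Other"])
```

Initialising every category to zero keeps the returned list the same length for every input, which should make it easier to render on the Ketcher side.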
ljubica-milovic changed the title from [DRAFT] "Calculate Properties" for macromolecules to "Calculate Properties" for macromolecules on Oct 15, 2024