
StructuralClusters

Ian Sillitoe edited this page Sep 18, 2018 · 12 revisions

Using Structure to Merge FunFam Alignments

Overview

  • A CATH Superfamily contains many Functional Families (FunFams)
  • If a FunFam contains at least one CATH domain structure, it is included in the Structural Clusters (SCs).

Structural Clusters (SC):

  • use structural similarity (at 5 Å and 9 Å cut-offs) to group FunFam representatives into structural clusters.
  • use CORA to create a multiple structure alignment from each cluster.
  • use this reference alignment to "glue" the FunFam alignments together.
  • add consensus information such as scorecons and GroupSim.
  • output the merged alignment in Stockholm format.
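The "glue" step in the list above can be illustrated with a minimal sketch. This is not the CATH code: `merge_funfams` and `residue_columns` are hypothetical names, and the sketch assumes that each representative's residues are identical in the reference and FunFam alignments (ignoring the ATOM/SEQRES issue), that FunFam rows contain only uppercase residues and '-', and that a residue with no reference equivalent is placed just before the representative's next aligned residue. It follows the conventions in the notes at the end of this page: such residues become lowercase, and '.' pads the gaps opened up for them.

```python
from collections import defaultdict


def residue_columns(row):
    """Column index of every residue (non-gap character) in a gapped row."""
    return [c for c, ch in enumerate(row) if ch not in "-."]


def merge_funfams(reference, funfams):
    """Hypothetical sketch: glue FunFam alignments via a reference alignment.

    reference: {rep_name: gapped row}, one row per FunFam representative.
    funfams:   {rep_name: {seq_name: gapped row}}; each FunFam includes its rep.
    Assumes rep residues are identical in both alignments and FunFam rows
    contain only uppercase residues and '-'.
    """
    length = len(next(iter(reference.values())))
    cols, inserts = {}, {}
    for rep, members in funfams.items():
        ref_cols = residue_columns(reference[rep])
        # Per FunFam column: (True, reference column) if the rep has a residue
        # there, else (False, slot), where slot is the reference column the
        # insert is placed before (`length` means after the last column).
        targets, k = [], 0
        for ch in members[rep]:
            if ch != "-":
                targets.append((True, ref_cols[k]))
                k += 1
            else:
                targets.append((False, ref_cols[k] if k < len(ref_cols) else length))
        for name, row in members.items():
            cols[name] = ["-"] * length
            inserts[name] = defaultdict(str)
            for ch, (is_ref, pos) in zip(row, targets):
                if is_ref:
                    cols[name][pos] = ch
                elif ch != "-":
                    # No equivalent in the reference: keep as a lowercase insert.
                    inserts[name][pos] += ch.lower()
    # Open up each insert slot to its widest insert, padding with '.'.
    width = defaultdict(int)
    for per_seq in inserts.values():
        for pos, s in per_seq.items():
            width[pos] = max(width[pos], len(s))
    merged = {}
    for name in cols:
        parts = []
        for pos in range(length + 1):
            if width[pos]:
                parts.append(inserts[name][pos].ljust(width[pos], "."))
            if pos < length:
                parts.append(cols[name][pos])
        merged[name] = "".join(parts)
    return merged


reference = {"r1": "AB-C", "r2": "A-DC"}
funfams = {
    "r1": {"r1": "A-BC", "s1": "AXBC"},
    "r2": {"r2": "ADC", "s2": "ADC"},
}
for name, row in merge_funfams(reference, funfams).items():
    print(name, row)
# r1 A.B-C
# s1 AxB-C
# r2 A.-DC
# s2 A.-DC
```

Here "X" in s1 has no counterpart in the reference, so it becomes a lowercase insert and every other row gets a '.' in that column.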

Working Example

Structural Cluster

The following is an example multiple structure alignment of one structural cluster. It contains 4 CATH domains, each representing a different FunFam alignment.

>domain|2damA00|4_2_0
---GSSGSSGAPEERDLTQEQTEKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGSGPSSG
-----
>domain|1wj7A01|4_2_0
QATAEQIRLAQMISDHNDADFEEKVKQLIDITGK-NQDECVIALHDCNGDVNRAINVLLEG---------
-----
>domain|2jp7A00|4_2_0
---------------RLNPVQLELLNKLHLETKL-NAEYTFMLAEQSNWNYEVAIKGFQSSMNGIPREAF
VQF--
>domain|2jujA00|4_2_0
--------------ATASPQLSSEIENLMSQG--YSYQDIQKALVIAQNNIEMAKNILREFVS---ISSP
AHVAT
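The alignment above is FASTA with sequence lines wrapped at 70 characters, so a record can span several lines. A minimal parser sketch follows; the function names are hypothetical, and treating the second '|'-separated header field as the CATH domain ID is an assumption based on the headers shown (the meaning of the third field, e.g. `4_2_0`, is not covered here).

```python
def parse_fasta_alignment(text):
    """Parse a (possibly line-wrapped) FASTA alignment into {header: row}."""
    rows, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            rows[header] = ""
        elif header is not None:
            rows[header] += line  # wrapped lines are concatenated


    return rows


def domain_id(header):
    """Domain ID from a header such as 'domain|2damA00|4_2_0'.

    Assumes the second '|'-separated field is the CATH domain ID.
    """
    return header.split("|")[1]


example = """>domain|dom1A00|4_2_0
AC-
GT
>domain|dom2A00|4_2_0
A--GT
"""
rows = {domain_id(h): r for h, r in parse_fasta_alignment(example).items()}
# Every row of an alignment should have the same width.
assert len({len(r) for r in rows.values()}) == 1
print(rows)  # {'dom1A00': 'AC-GT', 'dom2A00': 'A--GT'}
```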

FunFams

Summary of the FunFams represented by the domains in this structural cluster:

Representative   FunFam               Sequences   Alignment
2damA00/1-67     1.10.8.10-ff-14534   51          .sto .fa
1wj7A01/19-78    1.10.8.10-ff-15516   429         .sto .fa
2jp7A00/1-57     1.10.8.10-ff-5069    14          .sto .fa
2jujA00/1-56     1.10.8.10-ff-15593   203         .sto .fa
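The representative names use the Stockholm "name/start-end" convention, and the FunFam IDs have the form "superfamily-ff-number". In this example the FunFam number appears to match the CLUSTER_ID field in the #=GS headers further down (e.g. 2damA00 and 14534). A small sketch for splitting both identifiers; the regular expressions and function names are hypothetical and only cover plain integer residue numbers.

```python
import re

# Representative names: "<domain>/<start>-<end>" (integer residue numbers only).
NAME_RE = re.compile(r"^(?P<domain>[^/]+)/(?P<start>\d+)-(?P<end>\d+)$")
# FunFam IDs: "<CATH superfamily>-ff-<FunFam number>".
FUNFAM_RE = re.compile(r"^(?P<superfamily>\d+(?:\.\d+){3})-ff-(?P<number>\d+)$")


def parse_rep(name):
    """Split e.g. '2damA00/1-67' into (domain, start, end)."""
    m = NAME_RE.match(name)
    return m["domain"], int(m["start"]), int(m["end"])


def parse_funfam(funfam_id):
    """Split e.g. '1.10.8.10-ff-14534' into (superfamily, FunFam number)."""
    m = FUNFAM_RE.match(funfam_id)
    return m["superfamily"], int(m["number"])


print(parse_rep("2damA00/1-67"))           # ('2damA00', 1, 67)
print(parse_funfam("1.10.8.10-ff-14534"))  # ('1.10.8.10', 14534)
```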

Mapping sequence to structure

The structure-based alignments use the residues observed in the ATOM records of PDB files, whereas the sequence-based (FunFam) alignments use the native protein residues (effectively the SEQRES records of the PDB). The correspondence between structure and sequence residues should be handled automatically, but bear in mind that the two can differ, for example when residues listed in SEQRES have no coordinates in the ATOM records.
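One way to recover that correspondence is a global (Needleman-Wunsch) alignment of the ATOM-derived sequence against the SEQRES sequence. The sketch below is illustrative only: `map_atom_to_seqres` is a hypothetical name, the scoring values are arbitrary, and this is not necessarily how the CATH pipeline performs the mapping.

```python
def map_atom_to_seqres(seqres, atom, match=2, mismatch=-1, gap=-1):
    """Globally align an ATOM-derived sequence to its SEQRES sequence
    (Needleman-Wunsch) and return {atom_index: seqres_index} for every
    aligned residue pair. Scoring values are illustrative only."""
    n, m = len(seqres), len(atom)
    # score[i][j]: best score aligning seqres[:i] with atom[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seqres[i - 1] == atom[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    mapping = {}
    i, j = n, m
    while i > 0 and j > 0:
        s = match if seqres[i - 1] == atom[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            mapping[j - 1] = i - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # SEQRES residue with no coordinates in ATOM records
        else:
            j -= 1  # ATOM residue with no SEQRES partner (unexpected)
    return mapping


# "A" and "Y" (0-based SEQRES indices 3 and 4) were not observed in ATOM.
print(map_atom_to_seqres("MKTAYIAKQR", "MKTIAKQR"))
# {7: 9, 6: 8, 5: 7, 4: 6, 3: 5, 2: 2, 1: 1, 0: 0}
```

Real structures can also contain modified or mutated residues, which this sketch simply reports as mismatched pairs.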

Merged Alignment

The merged alignment is available as merge.sto.

Headers

# STOCKHOLM 1.0
#=GF ID 1.10.8.10-FF_SSG9-6
#=GF DE 1.10.8.10, Structural Cluster (FF_SSG9) 6
#=GF AC 1.10.8.10-FF_SSG9-6
#=GF TP FF_SSG9
#=GF DR DOPS: 97.416
#=GS 2damA00/1-67          AC Q96CS3
#=GS 2damA00/1-67          OS Homo sapiens
#=GS 2damA00/1-67          DE FAS-associated factor 2
#=GS 2damA00/1-67          DR CATH; 2dam; A:1-67;
#=GS 2damA00/1-67          DR ORG; Eukaryota; Metazoa; Chordata; Craniata; Mammalia; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo; Homo sapiens;
#=GS 2damA00/1-67          DR GO; GO:0005515; GO:0005576; GO:0005783; GO:0005811; GO:0030433; GO:0030970; GO:0031625; GO:0034098; GO:0034389; GO:0035473; GO:0035578; GO:0043130; GO:0043312; GO:0055102;
#=GS 2damA00/1-67          CLUSTER_ID 14534
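In Stockholm, #=GF lines carry per-file features and #=GS lines carry per-sequence features ("#=GS <seqname> <feature> <text>"). A minimal sketch that collects them into dictionaries; `parse_stockholm_headers` is a hypothetical name, and repeated features such as the several DR lines are kept as lists.

```python
from collections import defaultdict


def parse_stockholm_headers(lines):
    """Collect #=GF (per-file) and #=GS (per-sequence) annotations.

    Repeated features (e.g. several 'DR' lines) are kept as lists.
    """
    gf = defaultdict(list)
    gs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if line.startswith("#=GF "):
            _, feature, value = line.split(None, 2)
            gf[feature].append(value)
        elif line.startswith("#=GS "):
            _, seq_name, feature, value = line.split(None, 3)
            gs[seq_name][feature].append(value)
    return gf, gs


header = [
    "# STOCKHOLM 1.0",
    "#=GF ID 1.10.8.10-FF_SSG9-6",
    "#=GF TP FF_SSG9",
    "#=GS 2damA00/1-67          AC Q96CS3",
    "#=GS 2damA00/1-67          DR CATH; 2dam; A:1-67;",
    "#=GS 2damA00/1-67          CLUSTER_ID 14534",
]
gf, gs = parse_stockholm_headers(header)
print(gf["ID"])                          # ['1.10.8.10-FF_SSG9-6']
print(gs["2damA00/1-67"]["CLUSTER_ID"])  # ['14534']
```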

Sequences

2damA00/1-67                  ---GSSGSSGAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGSGPSSG-----
Q96CS3/1-60                   --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGV-PSV------
Q3TDN2/1-60                   --------MAAP..EE....Q..D....LTQ..E.QT..EKLLQFQDLTGIESMEQCRLALEQHNWNMEAAVQDRLNEQEGV-PSV------
B4E2M8/1-60                   --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGV-PSV------
Q2HJD0/1-60                   --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGV-PSV------
T0NRL3/1-46                   --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAL---------------------
Q28BP9/1-60                   --------MAAP..EE....R..E....LSQ..E.QT..EKLLQFQDLTGIESMDQCRQTLQQHNWNIEAAVQDRLNEQEGV-PSV------
L7N2T6/1-60                   --------MAAP..EE....R..E....LSQ..E.QT..EKLLQFQDLTGIESMDQCRQTLQQHNWNIEAAVQDRLNEQEGV-PSV------

Consensus info

#=GC groupsim                 -----------------------2----454--5-56--4666556664---56565656557746464725676667--------------
#=GC scorecons                112222122322--21-0002--2-000343--4-44--45645744352115354323673344354437343646211000000000000
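The #=GC lines give one character per alignment column, so they can be lined up directly with the sequence rows. The sketch below assumes each scorecons character is a digit 0-9 (higher meaning more conserved) or '-' where no score was assigned; the function names and the threshold of 7 are hypothetical choices for illustration.

```python
def conserved_columns(scorecons, threshold=7):
    """Return column indices whose scorecons digit is >= threshold.

    Assumes one character per alignment column: a digit 0-9 (higher means
    more conserved) or '-' where no score was assigned.
    """
    return [i for i, ch in enumerate(scorecons)
            if ch.isdigit() and int(ch) >= threshold]


def residues_at(row, columns):
    """Characters of one aligned sequence at the given columns."""
    return "".join(row[i] for i in columns)


row = "MK-AC"
scorecons = "79-28"
cols = conserved_columns(scorecons)
print(cols)                    # [0, 1, 4]
print(residues_at(row, cols))  # MKC
```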

Notes:

  • a dot (.) signifies a gap that has been opened up in the reference alignment
  • a lowercase character signifies a residue in the FunFam alignment that has no equivalent residue in the reference structure
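Given these conventions, a merged row can be projected back onto the reference alignment columns by dropping dots and lowercase residues. This is a sketch assuming the conventions behave like the A2M insert convention, i.e. that every insert column holds either '.' or a lowercase residue in every row; `reference_columns` is a hypothetical name.

```python
def reference_columns(row):
    """Drop insert columns: '.' gaps and lowercase residues.

    Assumes every insert column holds '.' or a lowercase residue in every
    row, so each row keeps one character per reference-alignment column.
    """
    return "".join(ch for ch in row if ch != "." and not ch.islower())


print(reference_columns("--MAAP..EE....R"))  # --MAAPEER
print(reference_columns("--MAAPkrEE.s..R"))  # --MAAPEER
```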