StructuralClusters
Ian Sillitoe edited this page Sep 18, 2018
- A CATH Superfamily contains many Functional Families (FunFams)
- If a FunFam contains at least one CATH domain structure then it is included in the Structural Clusters (SCs).
Structural Clusters (SC):
- use structural similarity (5Å and 9Å) to group FunFam representatives into structural clusters.
- use CORA to create a multiple structure alignment from these clusters.
- use this reference alignment to "glue" the FunFam alignments together.
- add consensus information such as `scorecons` and `GroupSim`.
- output alignment in `STOCKHOLM` format.
The following is an example of a multiple structure alignment of a structural cluster (view). There are 4 CATH domains, each representing a different FunFam alignment.
>domain|2damA00|4_2_0
---GSSGSSGAPEERDLTQEQTEKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGSGPSSG
-----
>domain|1wj7A01|4_2_0
QATAEQIRLAQMISDHNDADFEEKVKQLIDITGK-NQDECVIALHDCNGDVNRAINVLLEG---------
-----
>domain|2jp7A00|4_2_0
---------------RLNPVQLELLNKLHLETKL-NAEYTFMLAEQSNWNYEVAIKGFQSSMNGIPREAF
VQF--
>domain|2jujA00|4_2_0
--------------ATASPQLSSEIENLMSQG--YSYQDIQKALVIAQNNIEMAKNILREFVS---ISSP
AHVAT
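The alignment above is FASTA-style, with each row wrapped over two lines. A minimal Python sketch for reading it back into one string per domain follows. The function names are hypothetical. Taking the second `|`-separated header field as the CATH domain ID is an assumption based only on the headers shown here.

```python
def parse_fasta_alignment(text):
    """Return a list of (header, aligned_sequence) tuples.

    Wrapped sequence lines are joined so that each record is one string.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def domain_id(header):
    """Assumed header layout: 'domain|<CATH domain ID>|<version>'."""
    parts = header.split("|")
    return parts[1] if len(parts) > 1 else header


# Two of the four records from the example above.
example = """>domain|2damA00|4_2_0
---GSSGSSGAPEERDLTQEQTEKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGSGPSSG
-----
>domain|2jujA00|4_2_0
--------------ATASPQLSSEIENLMSQG--YSYQDIQKALVIAQNNIEMAKNILREFVS---ISSP
AHVAT
"""

records = parse_fasta_alignment(example)
for header, aligned in records:
    residues = aligned.replace("-", "")
    print(domain_id(header), len(aligned), len(residues))
```

Every row should have the same aligned length, and the ungapped length should match the segment given for the representative in the summary table (for example `1-67` for `2damA00`).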
Summary of information on the FunFams that these domains in the structural cluster represent:
Rep | FunFam | Sequences | Alignment |
---|---|---|---|
2damA00/1-67 | 1.10.8.10-ff-14534 | 51 | .sto .fa |
1wj7A01/19-78 | 1.10.8.10-ff-15516 | 429 | .sto .fa |
2jp7A00/1-57 | 1.10.8.10-ff-5069 | 14 | .sto .fa |
2jujA00/1-56 | 1.10.8.10-ff-15593 | 203 | .sto .fa |
The structure-based alignments use the residues observed in the ATOM records of PDB files. The sequence-based alignments (FunFam) use the native protein residues (effectively the SEQRES records of the PDB). The correspondence between structure and sequence residues should be handled automatically, but it's worth bearing this in mind.
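To illustrate what that correspondence looks like, here is a rough Python sketch that maps each observed (ATOM) residue to its position in the full (SEQRES) sequence. It is not the method used to build the structural clusters. It assumes the ATOM sequence is the SEQRES sequence with some residues missing, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in for a proper sequence alignment. The function name and the two sequences are made up for the example.

```python
from difflib import SequenceMatcher


def map_atom_to_seqres(atom_seq, seqres_seq):
    """Map 0-based ATOM positions to 0-based SEQRES positions.

    Heuristic: matching blocks found by difflib are treated as equivalent
    residues. SEQRES residues that never appear as a value in the result
    are taken to be unobserved in the structure.
    """
    matcher = SequenceMatcher(None, seqres_seq, atom_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.b + offset] = block.a + offset
    return mapping


# Hypothetical sequences: two N-terminal residues and a three-residue loop
# are present in SEQRES but not observed in the ATOM records.
seqres = "MKTAYIAKQRQISFVK"
atom = "TAYIAQISFVK"

mapping = map_atom_to_seqres(atom, seqres)
unobserved = sorted(set(range(len(seqres))) - set(mapping.values()))
print(mapping)
print(unobserved)
```

A real pipeline would more likely take this mapping from the structure file or a curated resource than infer it from the sequences alone.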
The merged alignment can be found here: merge.sto
# STOCKHOLM 1.0
#=GF ID 1.10.8.10-FF_SSG9-6
#=GF DE 1.10.8.10, Structural Cluster (FF_SSG9) 6
#=GF AC 1.10.8.10-FF_SSG9-6
#=GF TP FF_SSG9
#=GF DR DOPS: 97.416
#=GS 2damA00/1-67 AC Q96CS3
#=GS 2damA00/1-67 OS Homo sapiens
#=GS 2damA00/1-67 DE FAS-associated factor 2
#=GS 2damA00/1-67 DR CATH; 2dam; A:1-67;
#=GS 2damA00/1-67 DR ORG; Eukaryota; Metazoa; Chordata; Craniata; Mammalia; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo; Homo sapiens;
#=GS 2damA00/1-67 DR GO; GO:0005515; GO:0005576; GO:0005783; GO:0005811; GO:0030433; GO:0030970; GO:0031625; GO:0034098; GO:0034389; GO:0035473; GO:0035578; GO:0043130; GO:0043312; GO:0055102;
#=GS 2damA00/1-67 CLUSTER_ID 14534
2damA00/1-67 ---GSSGSSGAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGSGPSSG-----
Q96CS3/1-60 --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGV-PSV------
Q3TDN2/1-60 --------MAAP..EE....Q..D....LTQ..E.QT..EKLLQFQDLTGIESMEQCRLALEQHNWNMEAAVQDRLNEQEGV-PSV------
B4E2M8/1-60 --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGV-PSV------
Q2HJD0/1-60 --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAAVQDRLNEQEGV-PSV------
T0NRL3/1-46 --------MAAP..EE....R..D....LTQ..E.QT..EKLLQFQDLTGIESMDQCRHTLEQHNWNIEAL---------------------
Q28BP9/1-60 --------MAAP..EE....R..E....LSQ..E.QT..EKLLQFQDLTGIESMDQCRQTLQQHNWNIEAAVQDRLNEQEGV-PSV------
L7N2T6/1-60 --------MAAP..EE....R..E....LSQ..E.QT..EKLLQFQDLTGIESMDQCRQTLQQHNWNIEAAVQDRLNEQEGV-PSV------
#=GC groupsim -----------------------2----454--5-56--4666556664---56565656557746464725676667--------------
#=GC scorecons 112222122322--21-0002--2-000343--4-44--45645744352115354323673344354437343646211000000000000
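The excerpt above uses three kinds of Stockholm markup: `#=GF` (per file), `#=GS` (per sequence) and `#=GC` (per column). A minimal Python sketch for pulling these apart is shown here. It is not a complete Stockholm parser. The function name is hypothetical, and `CLUSTER_ID` is handled as an ordinary `#=GS` feature only because that is how it appears in the excerpt. The rows in the example are shortened for illustration.

```python
from collections import defaultdict


def parse_stockholm(text):
    """Parse a single Stockholm alignment into plain dictionaries."""
    gf = defaultdict(list)                       # feature -> values
    gs = defaultdict(lambda: defaultdict(list))  # sequence -> feature -> values
    gc = {}                                      # feature -> column string
    seqs = {}                                    # sequence -> aligned row
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "//" or line.startswith("# STOCKHOLM"):
            continue
        if line.startswith("#=GF"):
            _, feature, *rest = line.split(None, 2)
            gf[feature].append(rest[0] if rest else "")
        elif line.startswith("#=GS"):
            _, name, feature, *rest = line.split(None, 3)
            gs[name][feature].append(rest[0] if rest else "")
        elif line.startswith("#=GC"):
            _, feature, *rest = line.split(None, 2)
            gc[feature] = gc.get(feature, "") + (rest[0] if rest else "")
        elif line.startswith("#"):
            continue  # other markup (for example #=GR) is ignored in this sketch
        else:
            name, aligned = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + aligned
    return {"gf": gf, "gs": gs, "gc": gc, "seqs": seqs}


# Shortened rows, for illustration only.
example = """# STOCKHOLM 1.0
#=GF ID 1.10.8.10-FF_SSG9-6
#=GF DR DOPS: 97.416
#=GS 2damA00/1-67 AC Q96CS3
#=GS 2damA00/1-67 CLUSTER_ID 14534
2damA00/1-67 -GAP..EE
Q96CS3/1-60  MAAP..EE
#=GC scorecons 1223--21
//
"""

alignment = parse_stockholm(example)
print(alignment["gf"]["ID"])
print(alignment["gs"]["2damA00/1-67"]["CLUSTER_ID"])
print(alignment["seqs"])
```

Note that each `#=GS` annotation has to sit on a single line for this to work, which is also what the Stockholm format expects.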
- a dot `.` signifies a gap that has been opened up in the reference alignment
- a lower case character signifies a residue in the FunFam alignment that has no equivalent residue in the reference structure
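These two conventions make it possible to get back from the merged alignment to the columns of the reference structure alignment. The Python sketch below assumes that the opened-up columns are exactly those where the structural representative's row has a `.`. This is my reading of the conventions above, not a documented guarantee. The function names are hypothetical.

```python
def opened_columns(reference_row):
    """Indices of columns that were opened up in the reference alignment."""
    return [i for i, ch in enumerate(reference_row) if ch == "."]


def project_onto_reference(rows, reference_id):
    """Drop every column in which the reference row has a '.'.

    `rows` maps sequence names to aligned rows of equal length.
    """
    reference_row = rows[reference_id]
    keep = [i for i, ch in enumerate(reference_row) if ch != "."]
    return {name: "".join(row[i] for i in keep) for name, row in rows.items()}


def unmatched_residues(row):
    """(column, residue) pairs for lower-case residues, i.e. residues with
    no equivalent residue in the reference structure."""
    return [(i, ch) for i, ch in enumerate(row) if ch.islower()]


# Toy rows: 'rep' is the structural representative, 'seq1' has a
# two-residue insertion relative to it.
rows = {
    "rep":  "-GAP..EER",
    "seq1": "MAAPgsEEQ",
}

print(opened_columns(rows["rep"]))
print(project_onto_reference(rows, "rep"))
print(unmatched_residues(rows["seq1"]))
```

Applied to the `2damA00/1-67` row of the merged alignment above, removing the opened-up columns gives back the `2damA00` row of the structure alignment shown earlier on this page.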