Challenge data file structure
This page contains the file structure of the CELPP weekly pose prediction challenge candidates.
These candidates have not been prepared; they are extracted as-is from the blastnfilter filtering stage and stored in a gzipped tar file for easy download.
The challenge data package gzipped tar files are available from this website:
NOTE: The contents of the latest.txt file denote the current challenge data package. The latest.txt file is removed when the weekly challenge ends (Tuesday 3 PM PDT) and reappears at the start of the new challenge (Saturday).
The data is made available in a gzipped tar file named with the convention celpp_weekXX_YYYY.tar.gz, where XX denotes the week of the year (e.g. 14) and YYYY denotes the year (e.g. 2016). For example:
celpp_week14_2016.tar.gz
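The week and year can be recovered from a package file name with a small parser. The following is a minimal sketch in Python; the function name `parse_package_name` is hypothetical, and the pattern is inferred from the single example name, so it assumes nothing about whether single-digit weeks are zero-padded.

```python
import re

# Pattern inferred from the example name celpp_week14_2016.tar.gz.
# Assumption: the week is one or two digits and the year is four digits.
PACKAGE_NAME = re.compile(r"^celpp_week(\d{1,2})_(\d{4})\.tar\.gz$")


def parse_package_name(filename):
    """Return (week, year) as integers, or None if the name does not match."""
    match = PACKAGE_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_package_name("celpp_week14_2016.tar.gz"))  # prints (14, 2016)
```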
The celpp_week14_2016.tar.gz file uncompresses to a directory of the same name:
$ tar -zxf celpp_week14_2016.tar.gz
$ ls -la
drwxrwxr-x. 3 bob bob 73 Mar 31 15:48 .
drwxr-xr-x. 8 bob bob 4096 Mar 31 15:48 ..
drwxrwxr-x. 5 bob bob 55 Mar 31 15:10 celpp_week14_2016
-rw-rw-r--. 1 bob bob 1055052 Mar 31 15:48 celpp_week14_2016.tar.gz
$
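For scripted workflows, the same extraction can be done from Python with the standard library `tarfile` module. This is a sketch rather than part of the CELPP tooling: the function name `extract_package` is hypothetical, and it assumes (as in the listing above) that the top-level directory is the archive name without the `.tar.gz` suffix.

```python
import tarfile
from pathlib import Path


def extract_package(archive_path, dest_dir="."):
    """Extract a challenge package and return the path of its top-level directory.

    Only extract archives obtained from a trusted source.
    """
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    # Assumption: the top-level directory is the archive name minus ".tar.gz".
    return Path(dest_dir) / archive_path.name[: -len(".tar.gz")]


# Example call (requires the archive to exist in the current directory):
# top_dir = extract_package("celpp_week14_2016.tar.gz")
```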
The tar file has the following structure:
readme.txt
center.txt
new_release_crystallization_pH.tsv
new_release_structure_nonpolymer.tsv
new_release_structure_sequence.tsv
<target id>/
<target id>.txt
hiResApo-<target id>_<candidate id>.pdb
hiResHolo-<target id>_<candidate id>-<candidate ligand id>.pdb
LMCSS-<target id>_<candidate id>-<candidate ligand id>-lig.pdb
LMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb
SMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb
lig_<target ligand id>.smi
lig_<target ligand id>.inchi
lig_<target ligand id>.mol
.
.
.
Definitions of target, candidate, ligand, hiResApo, hiResHolo, LMCSS, & SMCSS can be found here
.
|-- 5hib
| |-- 5hib.txt
| |-- hiResApo-5hib_2eb2.pdb
| |-- hiResHolo-5hib_5hg8-634.pdb
| |-- LMCSS-5hib_5em6-5Q3-lig.pdb
| |-- LMCSS-5hib_5em6-5Q3.pdb
| |-- SMCSS-5hib_4wrg-CSX.pdb
| |-- lig_63M.inchi
| |-- lig_63M.mol
| `-- lig_63M.smi
|-- 5hic
| |-- 5hic.txt
| |-- hiResApo-5hic_2eb2.pdb
| |-- hiResHolo-5hic_5hg8-634.pdb
| |-- LMCSS-5hic_5can-4ZB-lig.pdb
| |-- LMCSS-5hic_5can-4ZB.pdb
| |-- SMCSS-5hic_4wrg-CSX.pdb
| |-- lig_63N.inchi
| |-- lig_63N.mol
| `-- lig_63N.smi
|-- 5j1z
| |-- 5j1z.txt
| |-- hiResApo-5j1z_3t13.pdb
| |-- hiResHolo-5j1z_4eqp-THP.pdb
| |-- LMCSS-5j1z_3dhq-THP-lig.pdb
| |-- LMCSS-5j1z_3dhq-THP.pdb
| |-- SMCSS-5j1z_4eqp-THP.pdb
| |-- lig_THP.inchi
| |-- lig_THP.mol
| `-- lig_THP.smi
`-- readme.txt
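The candidate file names in the listing above encode the candidate category, target id, candidate id, and (for holo candidates) the candidate ligand id. The sketch below splits a file name into those parts. The `CandidateFile` type and `parse_candidate_filename` function are hypothetical names, and the regular expression is inferred only from the example names shown on this page.

```python
import re
from typing import NamedTuple, Optional


class CandidateFile(NamedTuple):
    category: str             # hiResApo, hiResHolo, LMCSS or SMCSS
    target_id: str
    candidate_id: str
    ligand_id: Optional[str]  # None for apo candidates
    ligand_only: bool         # True for the "-lig.pdb" ligand coordinate file


# Assumption: ids contain no "_", "-" or "." characters, as in the examples.
CANDIDATE_NAME = re.compile(
    r"^(hiResApo|hiResHolo|LMCSS|SMCSS)"
    r"-([^_]+)_([^-.]+)(?:-([^-.]+))?(-lig)?\.pdb$"
)


def parse_candidate_filename(filename):
    """Return a CandidateFile, or None if the name is not a candidate file."""
    match = CANDIDATE_NAME.match(filename)
    if match is None:
        return None
    category, target_id, candidate_id, ligand_id, lig_suffix = match.groups()
    return CandidateFile(
        category, target_id, candidate_id, ligand_id, lig_suffix is not None
    )


for name in ("hiResApo-5hib_2eb2.pdb", "LMCSS-5hib_5em6-5Q3-lig.pdb"):
    print(parse_candidate_filename(name))
```

Files that are not candidates, such as `5hib.txt` or `lig_63M.smi`, yield `None`, so the function can be applied to every entry of a target directory.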
The <target id>.txt file in each target directory has the following format:
query, <target id>
ph, <ph value>
ligand, <target ligand id>
inchi, <inchi string of target ligand>
LMCSS, <candidate id>, <candidate ligand id>
SMCSS, <candidate id>, <candidate ligand id>
hiResHolo, <candidate id>, <candidate ligand id>
hiResApo, <candidate id>
For example:
query, 5fz3
ph, 7.5
ligand, 7SI
inchi, InChI=1S/C11H12O2/c12-8-3-4-9(13)11-7-2-1-6(5-7)10(8)11/h3-4,6-7,12-13H,1-2,5H2/t6-,7+
LMCSS, 5fz3, 7SI
SMCSS, 5a1f, OGA
hiResHolo, 5fyz, NYK
hiResApo, 5a3p
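The comma-separated lines above can be read with a few lines of Python. The sketch below is an illustration, not the parser used by the CELPP tooling, and `parse_target_summary` is a hypothetical name. One detail matters: InChI strings contain commas themselves, so the `inchi` line is split only at its first comma. Values are collected in lists in case a key appears on more than one line, which the example does not show.

```python
def parse_target_summary(text):
    """Parse '<key>, <value>[, <value>]' lines into a dict of lists of tuples."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, rest = line.partition(",")
        key = key.strip()
        rest = rest.strip()
        if key == "inchi":
            # InChI strings contain commas, so keep the remainder whole.
            values = (rest,)
        else:
            values = tuple(part.strip() for part in rest.split(","))
        records.setdefault(key, []).append(values)
    return records


EXAMPLE = """\
query, 5fz3
ph, 7.5
ligand, 7SI
inchi, InChI=1S/C11H12O2/c12-8-3-4-9(13)11-7-2-1-6(5-7)10(8)11/h3-4,6-7,12-13H,1-2,5H2/t6-,7+
LMCSS, 5fz3, 7SI
SMCSS, 5a1f, OGA
hiResHolo, 5fyz, NYK
hiResApo, 5a3p
"""

summary = parse_target_summary(EXAMPLE)
print(summary["SMCSS"])  # prints [('5a1f', 'OGA')]
```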
An example readme.txt file:
CELPP Weekly Pose Prediction Challenge
======================================
celpprunner version: 0.11.3
Week: 14
Year: 2016
This tar file contains the CELPP weekly pose prediction challenge
dataset.
Within this readme.txt is a description of the data in this tar file as well as
a summary of the Blastnfilter run which generated these Candidates.
Tsv files downloaded from
=========================
http://www.wwpdb.org/files/new_release_structure_sequence.tsv
http://www.wwpdb.org/files/new_release_structure_nonpolymer.tsv
http://www.wwpdb.org/files/new_release_crystallization_pH.tsv
Structure of data overview
==========================
This tar file contains a set of directories, each named after a Target.
Targets are proteins whose primary sequence has been released,
but whose 3D coordinates have not.
Within each directory is a set of Candidates. Candidates are proteins
with similar structure to the Target that also have known 3D coordinates
which can be used for pose prediction.
For more information visit:
https://github.com/drugdata/D3R
or
https://drugdesigndata.org/about/celpp
Structure of data
=================
Below is a definition of the files and directories within this tar file:
[file or directory <text within angle brackets denotes values that change>]
-- Definition
[readme.txt]
-- Description of data and output from celpp blastnfilter stage of
processing.
[new_release_crystallization_pH.tsv]
[new_release_structure_nonpolymer.tsv]
[new_release_structure_sequence.tsv]
-- Tsv files downloaded from: http://www.wwpdb.org/files
[<target id>]/
[<target id>.txt]
-- Summary of Blastnfilter results for target protein
with PDBID.
[LMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb]
-- Candidate protein for docking which:
1) Passes the Blastnfilter criteria
2) Contains the Ligand with the largest maximum common
substructure (MCSS) to the Target Ligand.
Note: If multiple proteins are found, the protein
with the highest resolution is picked.
[SMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb]
-- Candidate protein for docking which:
1) Passes the Blastnfilter criteria.
2) Contains the Ligand with the smallest maximum common
substructure (MCSS) to the Target Ligand.
Note: If multiple proteins are found, the protein
with the highest resolution is picked.
[hiResHolo-<target id>_<candidate id>-<candidate ligand id>.pdb]
-- Candidate protein for docking which:
1) Passes the Blastnfilter criteria.
2) Has the highest resolution among all holo proteins.
[hiResApo-<target id>_<candidate id>.pdb]
-- Candidate protein for docking which:
1) Passes the Blastnfilter criteria.
2) Has the highest resolution among all apo proteins.
[LMCSS-<target id>_<candidate id>-<candidate ligand id>-lig.pdb]
-- Contains the 3D coordinates of the ligand atoms in the LMCSS (largest MCSS) candidate protein.
[lig_<target ligand id>.smi]
-- Canonical SMILES string of the Target Ligand which will be
used in later docking.
[lig_<target ligand id>.inchi]
-- InChI string of the Target Ligand.
[lig_<target ligand id>.mol]
-- 2D structure of the Target Ligand.
Blastnfilter summary
====================
INPUT SUMMARY
entries: 214
complexes: 160
dockable complexes: 86
monomers: 142
dockable monomers: 62
multimers: 72
dockable multimers: 24
FILTERING CRITERIA
No. of query sequences <= 1
No. of dockable ligands = 1
Percent identity >= 0.95
Percent Coverage >= 0.9
No. of hit sequences <= 4
Structure determination method: x-ray diffraction
OUTPUT SUMMARY
Targets found: 48
Target: 5hw0|Sequences: 1|Hits: 46|Candidates: 8|Elected:5|PDBids: 1o7j,1jsr,1jsl,1hg1,1hfw
Target: 4xf1|Sequences: 1|Hits: 44|Candidates: 40|Elected:4|PDBids: 4xff,4xez,4jto,4ieu
Target: 5f52|Sequences: 1|Hits: 46|Candidates: 8|Elected:4|PDBids: 1o7j,1jsr,1jsl,1hg1
Target: 5i67|Sequences: 1|Hits: 50|Candidates: 9|Elected:4|PDBids: 4r43,4wiu,4wl8,4rcg
Target: 5hi0|Sequences: 1|Hits: 27|Candidates: 10|Elected:0|PDBids:
Target: 5hi2|Sequences: 1|Hits: 294|Candidates: 12|Elected:4|PDBids: 4xv9,4r5y,4jvg,4cqe
Target: 5edt|Sequences: 1|Hits: 326|Candidates: 28|Elected:3|PDBids: 1n40,4ktl,4ktk
Target: 4xf3|Sequences: 1|Hits: 44|Candidates: 40|Elected:4|PDBids: 4xff,4xez,4jto,4ieu
Target: 5f8y|Sequences: 1|Hits: 3|Candidates: 1|Elected:1|PDBids: 5duy
Target: 5fzl|Sequences: 1|Hits: 98|Candidates: 21|Elected:5|PDBids: 5fz6,5fyz,5a3p,5fpu,5fzb
Target: 5fzk|Sequences: 1|Hits: 98|Candidates: 21|Elected:4|PDBids: 5fyz,5fup,5a3p,5fpu
Target: 5ibd|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktk
Target: 5ibe|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktk
Target: 5ibf|Sequences: 1|Hits: 326|Candidates: 28|Elected:3|PDBids: 1n40,4ktl,4ktk
Target: 5ibg|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktk
Target: 4zvl|Sequences: 1|Hits: 99|Candidates: 60|Elected:5|PDBids: 4fgl,4qod,4zvn,4zvk,4u7h
Target: 5ibh|Sequences: 1|Hits: 326|Candidates: 28|Elected:6|PDBids: 1n40,4ktl,4ktk,4ktj,4ktf,4ipw
Target: 5ibi|Sequences: 1|Hits: 326|Candidates: 28|Elected:3|PDBids: 1n40,4ktl,4ktf
Target: 5ibj|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktl
Target: 5j22|Sequences: 1|Hits: 287|Candidates: 151|Elected:9|PDBids: 3t13,3dhq,3erq,4hmj,4eqp,3d4d,5ekl,5ekk,1tr5
Target: 5j1z|Sequences: 1|Hits: 287|Candidates: 151|Elected:9|PDBids: 3dhq,3t13,3erq,3d4d,4eqp,4hmj,5ekl,5ekk,1tr5
Target: 5bmg|Sequences: 1|Hits: 90|Candidates: 7|Elected:0|PDBids:
Target: 5hmi|Sequences: 1|Hits: 94|Candidates: 31|Elected:6|PDBids: 3g03,3tu1,3lbl,4oba,1t4e,4oq3
Target: 5hib|Sequences: 1|Hits: 347|Candidates: 99|Elected:5|PDBids: 5em6,4rj6,5hg8,4wrg,2eb2
Target: 5hic|Sequences: 1|Hits: 347|Candidates: 99|Elected:6|PDBids: 5can,5c8m,5c8k,5hg8,4wrg,2eb2
Target: 5hid|Sequences: 1|Hits: 294|Candidates: 12|Elected:2|PDBids: 4xv9,3og7
Target: 5hie|Sequences: 1|Hits: 302|Candidates: 33|Elected:4|PDBids: 4e26,3skc,5ct7,1uwh
Target: 5fz0|Sequences: 1|Hits: 98|Candidates: 21|Elected:4|PDBids: 5fyz,5fyt,5a3p,5a1f
Target: 1fcz|Sequences: 1|Hits: 271|Candidates: 9|Elected:3|PDBids: 1fcy,2lbd,1fcz
Target: 5hzn|Sequences: 1|Hits: 311|Candidates: 13|Elected:4|PDBids: 3lw0,1p4o,3nw7,3lvp
Target: 5bmh|Sequences: 1|Hits: 90|Candidates: 7|Elected:0|PDBids:
Target: 5bmi|Sequences: 1|Hits: 90|Candidates: 7|Elected:0|PDBids:
Target: 4zz4|Sequences: 1|Hits: 347|Candidates: 214|Elected:3|PDBids: 2w5i,1o0o,1o0h
Target: 4z51|Sequences: 1|Hits: 11|Candidates: 1|Elected:1|PDBids: 3sop
Target: 5d75|Sequences: 1|Hits: 166|Candidates: 2|Elected:2|PDBids: 1pbk,3kz7
Target: 4z54|Sequences: 1|Hits: 11|Candidates: 1|Elected:1|PDBids: 3sop
Target: 5ftq|Sequences: 1|Hits: 312|Candidates: 37|Elected:5|PDBids: 2xba,4z55,4anl,4tt7,4fnx
Target: 5fkm|Sequences: 1|Hits: 55|Candidates: 23|Elected:5|PDBids: 2xpw,2xpu,2x6o,1bjz,4d7n
Target: 5fkn|Sequences: 1|Hits: 55|Candidates: 23|Elected:5|PDBids: 2xpw,2xpu,2x6o,1bjz,4d7n
Target: 5fko|Sequences: 1|Hits: 55|Candidates: 23|Elected:5|PDBids: 2xpw,2xpv,2x6o,1bjz,4d7n
Target: 5fkk|Sequences: 1|Hits: 55|Candidates: 23|Elected:6|PDBids: 4abz,2xpw,2xpv,2x6o,1bjz,4d7n
Target: 5fkl|Sequences: 1|Hits: 56|Candidates: 23|Elected:4|PDBids: 2xpw,2x6o,1bjz,4d7n
Target: 4rfr|Sequences: 1|Hits: 33|Candidates: 8|Elected:2|PDBids: 3i3q,4jht
Target: 5hmh|Sequences: 1|Hits: 94|Candidates: 30|Elected:6|PDBids: 3tu1,3g03,3lbl,4oba,1t4e,4oq3
Target: 5icr|Sequences: 1|Hits: 53|Candidates: 2|Elected:2|PDBids: 5ey8,5d6j
Target: 5fto|Sequences: 1|Hits: 312|Candidates: 37|Elected:12|PDBids: 2xba,4cnh,4cmo,4ccu,4ccb,4z55,3lct,3lcs,4fob,4anl,4tt7,4fnx
Target: 5hmk|Sequences: 1|Hits: 94|Candidates: 7|Elected:3|PDBids: 4hbm,3g03,3tu1
Target: 5frw|Sequences: 1|Hits: 5|Candidates: 5|Elected:3|PDBids: 5fs0,5frv,5fru
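The per-target lines of the OUTPUT SUMMARY are pipe-delimited `key: value` fields, which makes them straightforward to parse. The following sketch assumes every line has the six fields seen above; `parse_summary_line` is a hypothetical name and not part of the CELPP tooling. Splitting each field at its first colon handles `Elected:5` (no space after the colon) and an empty `PDBids:` field.

```python
def parse_summary_line(line):
    """Parse one 'Target: ...|Sequences: ...|...' line into a dict."""
    fields = {}
    for part in line.strip().split("|"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    # An empty PDBids field (no elected candidates) yields an empty list.
    pdb_ids = [p.strip() for p in fields.get("PDBids", "").split(",") if p.strip()]
    return {
        "target": fields["Target"],
        "sequences": int(fields["Sequences"]),
        "hits": int(fields["Hits"]),
        "candidates": int(fields["Candidates"]),
        "elected": int(fields["Elected"]),
        "pdb_ids": pdb_ids,
    }


lines = [
    "Target: 5f8y|Sequences: 1|Hits: 3|Candidates: 1|Elected:1|PDBids: 5duy",
    "Target: 5hi0|Sequences: 1|Hits: 27|Candidates: 10|Elected:0|PDBids:",
]
parsed = [parse_summary_line(line) for line in lines]

# Targets for which no candidate was elected.
print([p["target"] for p in parsed if p["elected"] == 0])  # prints ['5hi0']
```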