
Challenge data file structure

shuail edited this page Apr 3, 2017 · 10 revisions

This page describes the file structure of the candidate data for the CELPP weekly pose prediction challenge.

These candidates have not been prepared; they are taken directly from the output of the blastnfilter filtering stage and stored in a gzipped tar file for easy download.

Challenge data download link

The challenge data package gzipped tar files are available from this website:

NOTE: The contents of the latest.txt file denote the current challenge data package. The latest.txt file is removed when the weekly challenge ends (Tuesday 3 PM PDT) and reappears at the start of the new challenge (Saturday).

Data file name and type

The data will be made available in a gzipped tar file. The file will be named with the following convention:

celpp_weekXX_YYYY.tar.gz

XX denotes the week of the year (e.g., 14) and YYYY denotes the year (e.g., 2016).

Example

celpp_week14_2016.tar.gz
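The naming convention can be checked programmatically. The following is a minimal sketch, not part of the CELPP tooling: the function name `parse_package_name` and the `ChallengePackage` type are hypothetical, and the pattern assumes the week is written with one or two digits, since the page does not state whether single-digit weeks are zero-padded.

```python
import re
from typing import NamedTuple


class ChallengePackage(NamedTuple):
    """Week and year encoded in a challenge package file name (hypothetical type)."""
    week: int
    year: int


# Assumption: one or two digits for the week, four digits for the year.
_PACKAGE_RE = re.compile(r"^celpp_week(\d{1,2})_(\d{4})\.tar\.gz$")


def parse_package_name(filename: str) -> ChallengePackage:
    """Extract the week and year from a name such as celpp_week14_2016.tar.gz."""
    match = _PACKAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a CELPP challenge package name: {filename!r}")
    return ChallengePackage(week=int(match.group(1)), year=int(match.group(2)))


if __name__ == "__main__":
    print(parse_package_name("celpp_week14_2016.tar.gz"))
```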

The celpp_week14_2016.tar.gz archive extracts to a directory named celpp_week14_2016:

$ tar -zxf celpp_week14_2016.tar.gz
$ ls -la
drwxrwxr-x. 3 bob bob      73 Mar 31 15:48 .
drwxr-xr-x. 8 bob bob    4096 Mar 31 15:48 ..
drwxrwxr-x. 5 bob bob      55 Mar 31 15:10 celpp_week14_2016
-rw-rw-r--. 1 bob bob 1055052 Mar 31 15:48 celpp_week14_2016.tar.gz
$
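The same extraction can be done from Python with the standard library. This is a sketch under the assumption, taken from the listing above, that the archive contains a single top-level directory named after the archive without its .tar.gz suffix; the function name `extract_package` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_package(archive, dest="."):
    """Extract a challenge package and return the path of its top-level directory."""
    archive = Path(archive)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Assumption: the top-level directory is the archive name minus ".tar.gz".
    top = dest / archive.name[: -len(".tar.gz")]
    if not top.is_dir():
        raise FileNotFoundError(f"expected directory {top} after extraction")
    return top
```

Calling `extract_package("celpp_week14_2016.tar.gz")` would, under that assumption, return the path `celpp_week14_2016`.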

Data file structure

The tar file has the following structure:

readme.txt
center.txt
new_release_crystallization_pH.tsv
new_release_structure_nonpolymer.tsv
new_release_structure_sequence.tsv
<target id>/
        <target id>.txt
        hiResApo-<target id>_<candidate id>.pdb
        hiResHolo-<target id>_<candidate id>-<candidate ligand id>.pdb
        LMCSS-<target id>_<candidate id>-<candidate ligand id>-lig.pdb
        LMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb
        SMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb
        lig_<target ligand id>.smi 
        lig_<target ligand id>.inchi
        lig_<target ligand id>.mol
.
.
.

Definitions of target, candidate, ligand, hiResApo, hiResHolo, LMCSS, & SMCSS can be found here

Example

.
|-- 5hib
|   |-- 5hib.txt
|   |-- hiResApo-5hib_2eb2.pdb
|   |-- hiResHolo-5hib_5hg8-634.pdb
|   |-- LMCSS-5hib_5em6-5Q3-lig.pdb
|   |-- LMCSS-5hib_5em6-5Q3.pdb
|   |-- SMCSS-5hib_4wrg-CSX.pdb
|   |-- lig_63M.inchi
|   |-- lig_63M.mol
|   `-- lig_63M.smi
|-- 5hic
|   |-- 5hic.txt
|   |-- hiResApo-5hic_2eb2.pdb
|   |-- hiResHolo-5hic_5hg8-634.pdb
|   |-- LMCSS-5hic_5can-4ZB-lig.pdb
|   |-- LMCSS-5hic_5can-4ZB.pdb
|   |-- SMCSS-5hic_4wrg-CSX.pdb
|   |-- lig_63N.inchi
|   |-- lig_63N.mol
|   `-- lig_63N.smi
|-- 5j1z
|   |-- 5j1z.txt
|   |-- hiResApo-5j1z_3t13.pdb
|   |-- hiResHolo-5j1z_4eqp-THP.pdb
|   |-- LMCSS-5j1z_3dhq-THP-lig.pdb
|   |-- LMCSS-5j1z_3dhq-THP.pdb
|   |-- SMCSS-5j1z_4eqp-THP.pdb
|   |-- lig_THP.inchi
|   |-- lig_THP.mol
|   `-- lig_THP.smi
`-- readme.txt
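The candidate file names in the example above encode the category, target, candidate, and (for holo structures) candidate ligand. A minimal sketch of splitting a name into those parts follows. The names `CandidateFile` and `parse_candidate_filename` are hypothetical, and the pattern assumes four-character alphanumeric PDB IDs and alphanumeric ligand IDs, as seen in the example tree.

```python
import re
from typing import NamedTuple, Optional


class CandidateFile(NamedTuple):
    """Components of a candidate .pdb file name (hypothetical type)."""
    category: str             # hiResApo, hiResHolo, LMCSS, or SMCSS
    target_id: str
    candidate_id: str
    ligand_id: Optional[str]  # None for apo candidates
    ligand_only: bool         # True for the "-lig.pdb" ligand coordinate file


# Assumptions: PDB IDs are 4 alphanumeric characters; ligand IDs are alphanumeric.
_CANDIDATE_RE = re.compile(
    r"^(hiResApo|hiResHolo|LMCSS|SMCSS)"
    r"-([0-9A-Za-z]{4})_([0-9A-Za-z]{4})"
    r"(?:-([0-9A-Za-z]+))?(-lig)?\.pdb$"
)


def parse_candidate_filename(filename: str) -> Optional[CandidateFile]:
    """Return the parsed components, or None if the name is not a candidate file."""
    match = _CANDIDATE_RE.match(filename)
    if match is None:
        return None
    category, target_id, candidate_id, ligand_id, lig_suffix = match.groups()
    return CandidateFile(category, target_id, candidate_id, ligand_id,
                         lig_suffix is not None)


if __name__ == "__main__":
    for name in ("hiResApo-5hib_2eb2.pdb", "LMCSS-5hib_5em6-5Q3-lig.pdb",
                 "lig_63M.smi"):
        print(name, "->", parse_candidate_filename(name))
```

Files that do not follow the candidate pattern, such as the lig_* files and <target id>.txt, yield None so that a directory listing can be filtered in one pass.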

Structure of <target id>.txt file

query, <target id>
ph, <ph value>
ligand, <target ligand id>
inchi, <inchi string of target ligand>
LMCSS, <candidate id>, <candidate ligand id>
SMCSS, <candidate id>, <candidate ligand id>
hiResHolo, <candidate id>, <candidate ligand id>
hiResApo, <candidate id>

Example

query, 5fz3
ph, 7.5
ligand, 7SI
inchi, InChI=1S/C11H12O2/c12-8-3-4-9(13)11-7-2-1-6(5-7)10(8)11/h3-4,6-7,12-13H,1-2,5H2/t6-,7+
LMCSS, 5fz3, 7SI
SMCSS, 5a1f, OGA
hiResHolo, 5fyz, NYK
hiResApo, 5a3p
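A <target id>.txt file can be read as a sequence of "key, value" lines. The sketch below is illustrative only: `parse_target_summary` is a hypothetical name, and it assumes each key appears once per file (a repeated key would overwrite the earlier value). Only the first comma on a line is treated as the key separator, because InChI strings contain commas themselves.

```python
CANDIDATE_KEYS = {"LMCSS", "SMCSS", "hiResHolo", "hiResApo"}


def parse_target_summary(text):
    """Parse the contents of a <target id>.txt file into a dictionary.

    Candidate rows (LMCSS, SMCSS, hiResHolo, hiResApo) become lists such as
    [candidate id, candidate ligand id]; all other rows stay single strings.
    """
    summary = {}
    for raw_line in text.splitlines():
        # Split on the first comma only; the InChI value has commas of its own.
        key, sep, rest = raw_line.partition(",")
        key = key.strip()
        if not sep or not key:
            continue  # skip blank or malformed lines
        rest = rest.strip()
        if key in CANDIDATE_KEYS:
            summary[key] = [part.strip() for part in rest.split(",")]
        else:
            summary[key] = rest
    return summary


if __name__ == "__main__":
    example = """query, 5fz3
ph, 7.5
ligand, 7SI
inchi, InChI=1S/C11H12O2/c12-8-3-4-9(13)11-7-2-1-6(5-7)10(8)11/h3-4,6-7,12-13H,1-2,5H2/t6-,7+
LMCSS, 5fz3, 7SI
SMCSS, 5a1f, OGA
hiResHolo, 5fyz, NYK
hiResApo, 5a3p
"""
    for key, value in parse_target_summary(example).items():
        print(key, "=", value)
```

The pH is left as a string here rather than converted to a number, since the page does not say whether the value is always present or numeric.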

Example readme.txt

CELPP Weekly Pose Prediction Challenge
======================================

celpprunner version: 0.11.3
Week: 14
Year: 2016

This tar file contains the CELPP weekly pose prediction challenge 
dataset. 

Within this readme.txt is a description of the data in this tar file as well as 
a summary of the Blastnfilter run which generated these Candidates. 

Tsv files downloaded from
=========================

http://www.wwpdb.org/files/new_release_structure_sequence.tsv
http://www.wwpdb.org/files/new_release_structure_nonpolymer.tsv
http://www.wwpdb.org/files/new_release_crystallization_pH.tsv

Structure of data overview
==========================

This tar file contains a set of directories set to the name of Targets.  
Targets are proteins which have primary sequence released, 
but not 3D coordinates.  

Within each directory is a set of Candidates.  Candidates are proteins
with similar structure to the Target that also have known 3D coordinates
which can be used for pose prediction.

For more information visit:

https://github.com/drugdata/D3R

or

https://drugdesigndata.org/about/celpp


Structure of data
=================

Below is a definition of the files and directories within this tar file:

[file or directory <text within angle brackets denotes values that change>]

  -- Definition
 

[readme.txt]

  -- Description of data and output from celpp blastnfilter stage of 
     processing.

[new_release_crystallization_pH.tsv]
[new_release_structure_nonpolymer.tsv]
[new_release_structure_sequence.tsv]

  -- Tsv files downloaded from: http://www.wwpdb.org/files

[<target id>]/
        [<target id>.txt]

           --  Summary of Blastnfilter results for target protein 
               with PDBID.

        [LMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb]

           -- Candidate protein for docking which: 
                1) Passes the Blastnfilter criteria 

                2) Contains the Ligand with the largest maximum common
                   substructure (MCSS) to the Target Ligand. 

                   Note:  If multiple proteins are found, the protein
                          with the highest resolution is picked.

        [SMCSS-<target id>_<candidate id>-<candidate ligand id>.pdb]

           -- Candidate protein for docking which:
                1) Passes the Blastnfilter criteria.
                
                2) Contains the Ligand with the smallest maximum common
                   substructure (MCSS) to the Target Ligand. 

                   Note:  If multiple proteins are found, the protein
                          with the highest resolution is picked.

        [hiResHolo-<target id>_<candidate id>-<candidate ligand id>.pdb]
 
          -- Candidate protein for docking which:    
                1) Passes the Blastnfilter criteria.
                
                2) Has the highest resolution among all holo proteins.

        [hiResApo-<target id>_<candidate id>.pdb]

          -- Candidate protein for docking which:
                1) Passes the Blastnfilter criteria.

                2) Has the highest resolution among all apo proteins.

        [LMCSS-<target id>_<candidate id>-<candidate ligand id>-lig.pdb]
           
          -- Contains the 3D coordinates of the ligand atoms in the LMCSS (largest MCSS) Candidate protein.

        [lig_<target ligand id>.smi]

          -- Canonical SMILES string of the Target Ligand which will be
             used in later docking.

        [lig_<target ligand id>.inchi]

          -- InChI string of the Target Ligand.

        [lig_<target ligand id>.mol]

          -- 2D structure of the Target Ligand.


Blastnfilter summary
====================
 
INPUT SUMMARY
  entries:                             214
  complexes:                           160
  dockable complexes:                   86
  monomers:                            142
  dockable monomers:                    62
  multimers:                            72
  dockable multimers:                   24

FILTERING CRITERIA
  No. of query sequences           <=    1
  No. of dockable ligands           =    1
  Percent identity                 >=    0.95
  Percent Coverage                 >=    0.9
  No. of hit sequences             <=    4
  Structure determination method:        x-ray diffraction

OUTPUT SUMMARY
  Targets found:                        48
  Target: 5hw0|Sequences: 1|Hits: 46|Candidates: 8|Elected:5|PDBids: 1o7j,1jsr,1jsl,1hg1,1hfw
  Target: 4xf1|Sequences: 1|Hits: 44|Candidates: 40|Elected:4|PDBids: 4xff,4xez,4jto,4ieu
  Target: 5f52|Sequences: 1|Hits: 46|Candidates: 8|Elected:4|PDBids: 1o7j,1jsr,1jsl,1hg1
  Target: 5i67|Sequences: 1|Hits: 50|Candidates: 9|Elected:4|PDBids: 4r43,4wiu,4wl8,4rcg
  Target: 5hi0|Sequences: 1|Hits: 27|Candidates: 10|Elected:0|PDBids: 
  Target: 5hi2|Sequences: 1|Hits: 294|Candidates: 12|Elected:4|PDBids: 4xv9,4r5y,4jvg,4cqe
  Target: 5edt|Sequences: 1|Hits: 326|Candidates: 28|Elected:3|PDBids: 1n40,4ktl,4ktk
  Target: 4xf3|Sequences: 1|Hits: 44|Candidates: 40|Elected:4|PDBids: 4xff,4xez,4jto,4ieu
  Target: 5f8y|Sequences: 1|Hits: 3|Candidates: 1|Elected:1|PDBids: 5duy
  Target: 5fzl|Sequences: 1|Hits: 98|Candidates: 21|Elected:5|PDBids: 5fz6,5fyz,5a3p,5fpu,5fzb
  Target: 5fzk|Sequences: 1|Hits: 98|Candidates: 21|Elected:4|PDBids: 5fyz,5fup,5a3p,5fpu
  Target: 5ibd|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktk
  Target: 5ibe|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktk
  Target: 5ibf|Sequences: 1|Hits: 326|Candidates: 28|Elected:3|PDBids: 1n40,4ktl,4ktk
  Target: 5ibg|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktk
  Target: 4zvl|Sequences: 1|Hits: 99|Candidates: 60|Elected:5|PDBids: 4fgl,4qod,4zvn,4zvk,4u7h
  Target: 5ibh|Sequences: 1|Hits: 326|Candidates: 28|Elected:6|PDBids: 1n40,4ktl,4ktk,4ktj,4ktf,4ipw
  Target: 5ibi|Sequences: 1|Hits: 326|Candidates: 28|Elected:3|PDBids: 1n40,4ktl,4ktf
  Target: 5ibj|Sequences: 1|Hits: 326|Candidates: 28|Elected:2|PDBids: 1n40,4ktl
  Target: 5j22|Sequences: 1|Hits: 287|Candidates: 151|Elected:9|PDBids: 3t13,3dhq,3erq,4hmj,4eqp,3d4d,5ekl,5ekk,1tr5
  Target: 5j1z|Sequences: 1|Hits: 287|Candidates: 151|Elected:9|PDBids: 3dhq,3t13,3erq,3d4d,4eqp,4hmj,5ekl,5ekk,1tr5
  Target: 5bmg|Sequences: 1|Hits: 90|Candidates: 7|Elected:0|PDBids: 
  Target: 5hmi|Sequences: 1|Hits: 94|Candidates: 31|Elected:6|PDBids: 3g03,3tu1,3lbl,4oba,1t4e,4oq3
  Target: 5hib|Sequences: 1|Hits: 347|Candidates: 99|Elected:5|PDBids: 5em6,4rj6,5hg8,4wrg,2eb2
  Target: 5hic|Sequences: 1|Hits: 347|Candidates: 99|Elected:6|PDBids: 5can,5c8m,5c8k,5hg8,4wrg,2eb2
  Target: 5hid|Sequences: 1|Hits: 294|Candidates: 12|Elected:2|PDBids: 4xv9,3og7
  Target: 5hie|Sequences: 1|Hits: 302|Candidates: 33|Elected:4|PDBids: 4e26,3skc,5ct7,1uwh
  Target: 5fz0|Sequences: 1|Hits: 98|Candidates: 21|Elected:4|PDBids: 5fyz,5fyt,5a3p,5a1f
  Target: 1fcz|Sequences: 1|Hits: 271|Candidates: 9|Elected:3|PDBids: 1fcy,2lbd,1fcz
  Target: 5hzn|Sequences: 1|Hits: 311|Candidates: 13|Elected:4|PDBids: 3lw0,1p4o,3nw7,3lvp
  Target: 5bmh|Sequences: 1|Hits: 90|Candidates: 7|Elected:0|PDBids: 
  Target: 5bmi|Sequences: 1|Hits: 90|Candidates: 7|Elected:0|PDBids: 
  Target: 4zz4|Sequences: 1|Hits: 347|Candidates: 214|Elected:3|PDBids: 2w5i,1o0o,1o0h
  Target: 4z51|Sequences: 1|Hits: 11|Candidates: 1|Elected:1|PDBids: 3sop
  Target: 5d75|Sequences: 1|Hits: 166|Candidates: 2|Elected:2|PDBids: 1pbk,3kz7
  Target: 4z54|Sequences: 1|Hits: 11|Candidates: 1|Elected:1|PDBids: 3sop
  Target: 5ftq|Sequences: 1|Hits: 312|Candidates: 37|Elected:5|PDBids: 2xba,4z55,4anl,4tt7,4fnx
  Target: 5fkm|Sequences: 1|Hits: 55|Candidates: 23|Elected:5|PDBids: 2xpw,2xpu,2x6o,1bjz,4d7n
  Target: 5fkn|Sequences: 1|Hits: 55|Candidates: 23|Elected:5|PDBids: 2xpw,2xpu,2x6o,1bjz,4d7n
  Target: 5fko|Sequences: 1|Hits: 55|Candidates: 23|Elected:5|PDBids: 2xpw,2xpv,2x6o,1bjz,4d7n
  Target: 5fkk|Sequences: 1|Hits: 55|Candidates: 23|Elected:6|PDBids: 4abz,2xpw,2xpv,2x6o,1bjz,4d7n
  Target: 5fkl|Sequences: 1|Hits: 56|Candidates: 23|Elected:4|PDBids: 2xpw,2x6o,1bjz,4d7n
  Target: 4rfr|Sequences: 1|Hits: 33|Candidates: 8|Elected:2|PDBids: 3i3q,4jht
  Target: 5hmh|Sequences: 1|Hits: 94|Candidates: 30|Elected:6|PDBids: 3tu1,3g03,3lbl,4oba,1t4e,4oq3
  Target: 5icr|Sequences: 1|Hits: 53|Candidates: 2|Elected:2|PDBids: 5ey8,5d6j
  Target: 5fto|Sequences: 1|Hits: 312|Candidates: 37|Elected:12|PDBids: 2xba,4cnh,4cmo,4ccu,4ccb,4z55,3lct,3lcs,4fob,4anl,4tt7,4fnx
  Target: 5hmk|Sequences: 1|Hits: 94|Candidates: 7|Elected:3|PDBids: 4hbm,3g03,3tu1
  Target: 5frw|Sequences: 1|Hits: 5|Candidates: 5|Elected:3|PDBids: 5fs0,5frv,5fru
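The per-target lines in the OUTPUT SUMMARY of the example readme.txt are pipe-delimited "name: value" pairs, so they can be turned into records with a few lines of code. This is a sketch based only on the lines shown above; `parse_target_line` is a hypothetical name, and the field layout is assumed to match the example (including the missing space in "Elected:5" and the empty PDBids list for targets with no elected candidates).

```python
def parse_target_line(line):
    """Parse one 'Target: ...|Sequences: ...|...' line from the Blastnfilter summary."""
    fields = {}
    for part in line.strip().split("|"):
        # Each part looks like "Name: value"; the space after ':' is optional.
        name, _, value = part.partition(":")
        fields[name.strip()] = value.strip()
    return {
        "target": fields["Target"],
        "sequences": int(fields["Sequences"]),
        "hits": int(fields["Hits"]),
        "candidates": int(fields["Candidates"]),
        "elected": int(fields["Elected"]),
        # An empty "PDBids:" field yields an empty list.
        "pdbids": [p for p in fields["PDBids"].split(",") if p],
    }


if __name__ == "__main__":
    lines = [
        "  Target: 5f8y|Sequences: 1|Hits: 3|Candidates: 1|Elected:1|PDBids: 5duy",
        "  Target: 5hi0|Sequences: 1|Hits: 27|Candidates: 10|Elected:0|PDBids: ",
    ]
    for line in lines:
        print(parse_target_line(line))
```

A record whose "pdbids" list is empty corresponds to a target with no elected candidates in the summary.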

Example file

https://ucsd-cddi.box.com/celppexampleone
