Tutorial

javild edited this page Sep 19, 2014 · 17 revisions


Preliminaries

CellBase comes with a command line interface (CLI) written in Java; you will need at least Java 7 to run it. After installation you should have a cellbase/cellbase-build/installation-dir/ directory with the following structure:

/tmp/cellbase/cellbase-build/installation-dir/
├── bin
│   ├── cosmic
│   ├── ensembl-scripts
│   ├── genome-fetcher
│   ├── obsolete
│   └── protein
├── example
├── libs
└── mongodb-scripts

To run the CLI, execute:

cd /tmp/cellbase/cellbase-build/installation-dir 
java -jar libs/cellbase-build-3.1.0.jar --help

Download data sources

A Python script is provided to download all the data that may populate the CellBase database. It is located at:

/tmp/cellbase/cellbase-build/installation-dir/bin/genome-fetcher/genome-fetcher.py

To run the script, change into /tmp/cellbase/cellbase-build/installation-dir/bin/genome-fetcher/ and launch it:

cd /tmp/cellbase/cellbase-build/installation-dir/bin/genome-fetcher/
./genome-fetcher.py --help

For example, to download the data sources for the human gene, genome sequence and variation collections, execute:

./genome-fetcher.py -s "Homo sapiens" --sequence 1 --gene 1 \
  --variation 1 -o /tmp

This will download the data files into the /tmp/homo_sapiens/ folder, with the following directory structure:

/tmp/homo_sapiens/
├── gene
│   ├── description.txt
│   ├── homo_sapiens.gtf.gz
│   ├── homo_sapiens.gtf.gz.log
│   └── xrefs.txt
├── sequence
│   ├── genome_info.json
│   ├── Homo_sapiens.GRCh38.fa.gz
│   └── Homo_sapiens.GRCh38.fa.gz.log
└── variation
    ├── attrib.txt.gz
    ├── attrib.txt.gz.log
    ├── attrib_type.txt.gz
    ├── attrib_type.txt.gz.log
    ├── motif_feature_variation.txt.gz
    ├── motif_feature_variation.txt.gz.log
    ├── phenotype_feature_attrib.txt.gz
    ├── phenotype_feature_attrib.txt.gz.log
    ├── phenotype_feature.txt.gz
    ├── phenotype_feature.txt.gz.log
    ├── phenotype.txt.gz
    ├── phenotype.txt.gz.log
    ├── seq_region.txt.gz
    ├── seq_region.txt.gz.log
    ├── source.txt.gz
    ├── source.txt.gz.log
    ├── structural_variation_feature.txt.gz
    ├── structural_variation_feature.txt.gz.log
    ├── study.txt.gz
    ├── study.txt.gz.log
    ├── transcript_variation.txt.gz
    ├── transcript_variation.txt.gz.log
    ├── variation_feature.txt.gz
    ├── variation_feature.txt.gz.log
    ├── variation_synonym.txt.gz
    ├── variation_synonym.txt.gz.log
    ├── variation.txt.gz
    └── variation.txt.gz.log
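
In the listing above, every downloaded .gz file sits next to a .log file of the same name. A quick sanity check after a download is to look for data files that lack that companion. The following is a minimal sketch under that assumption only; the helper name missing_logs is hypothetical and is not part of genome-fetcher.py or CellBase.

```python
from pathlib import Path


def missing_logs(species_dir):
    """Return the .gz files under species_dir that have no companion .log file.

    Assumption: each downloaded <name>.gz should be accompanied by
    <name>.gz.log, as in the directory listing of this tutorial.
    This helper is hypothetical and is not shipped with CellBase.
    """
    root = Path(species_dir)
    missing = []
    # rglob("*.gz") matches only names ending in .gz, so the .gz.log files
    # themselves are not visited.
    for data_file in sorted(root.rglob("*.gz")):
        log_file = data_file.with_name(data_file.name + ".log")
        if not log_file.exists():
            missing.append(str(data_file.relative_to(root)))
    return missing
```

Calling missing_logs("/tmp/homo_sapiens") returns an empty list when every data file has its log; any path it returns is worth inspecting or downloading again.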

Building CellBase

Once the data has been downloaded, we can build the data models for MongoDB by running the CLI. For example, to build the genome sequence collection, execute:

cd /tmp/cellbase/cellbase-build/installation-dir/
java -jar libs/cellbase-build-3.1.0.jar --build genome-sequence \
  --fasta-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh38.fa.gz -o /tmp/

To build the gene collection:

java -jar libs/cellbase-build-3.1.0.jar --build gene \
  --indir /tmp/homo_sapiens/gene \
  --fasta-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh38.fa.gz -o /tmp/

To build the variation collections:

java -jar libs/cellbase-build-3.1.0.jar --build variation \
  --indir /tmp/homo_sapiens/variation -o /tmp/

Each of these commands creates JSON files in /tmp, e.g.:

/tmp/genome_sequence.json
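
Before loading a generated file it can help to look at its first few records. The sketch below assumes the file contains one JSON document per line, which is the layout mongoimport reads by default when --jsonArray is not given; the tutorial does not state the layout explicitly, so treat this as an assumption. The function name peek_records is hypothetical, and nothing is assumed about the field names inside each record.

```python
import json


def peek_records(json_path, limit=3):
    """Return up to `limit` records from a line-delimited JSON file.

    Assumption: one JSON document per line. If the builder writes a
    single JSON array instead, use json.load on the whole file.
    """
    records = []
    with open(json_path, encoding="utf-8") as handle:
        for line in handle:
            if len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            records.append(json.loads(line))
    return records
```

For example, peek_records("/tmp/genome_sequence.json", limit=1) returns a list holding the first document as a Python dict, so its keys can be inspected before the import.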

Installing the database

MongoDB 2.6 or later is required to load the JSON files created in the previous step. Databases and collections can be loaded with the mongoimport command, e.g.:

mongoimport --file /tmp/genome_sequence.json -d hsapiens_cb_v3 \
  -c genome_sequence
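
When several JSON files have to be loaded, the same mongoimport call can be scripted. The following is a minimal sketch that uses only the options shown in this tutorial (--file, -d, -c); the function names are hypothetical. Only the genome_sequence collection name and the hsapiens_cb_v3 database name come from this page, so any other names you pass in are your own choice.

```python
import subprocess


def mongoimport_command(json_path, database, collection):
    """Build the mongoimport argument list used in this tutorial."""
    return ["mongoimport", "--file", json_path, "-d", database, "-c", collection]


def load_collection(json_path, database, collection):
    """Run mongoimport; raises CalledProcessError on a non-zero exit code.

    Requires mongoimport on the PATH and a reachable MongoDB server.
    """
    subprocess.run(mongoimport_command(json_path, database, collection), check=True)
```

For example, load_collection("/tmp/genome_sequence.json", "hsapiens_cb_v3", "genome_sequence") runs the same import as the command above.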

Using REST web services

Installing the CellBase web services requires Tomcat 7 on the server machine. After building the CellBase code, the cellbase.war file should be located at:

/tmp/cellbase/cellbase-server/target/cellbase.war

Copy cellbase.war into the Tomcat 7 webapps directory, e.g.:

cp /tmp/cellbase/cellbase-server/target/cellbase.war /var/lib/tomcat7/webapps/

The general structure of a CellBase web service call is:

servername/cellbase/webservices/rest/{version}/{species}/{category}/{subcategory}/{id}/{resource}?{filters}
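
The URL pattern above can be assembled programmatically. The following is a minimal sketch, not an official CellBase client: the function name cellbase_url and its parameter names are hypothetical. The resource segment is treated as optional because some calls on this page end at the identifier, and the available filter names are not listed here, so filters are passed through as given.

```python
from urllib.parse import quote, urlencode


def cellbase_url(server, version, species, category, subcategory,
                 identifier, resource=None, filters=None):
    """Assemble a CellBase REST URL following the pattern in this tutorial.

    `filters` is an optional dict of query parameters; their names are
    not documented on this page, so none are assumed here.
    """
    parts = [
        server.rstrip("/"), "cellbase", "webservices", "rest",
        version, species, category, subcategory,
        # Keep ':' and ',' readable so region ids such as 3:55-100000 stay intact.
        quote(identifier, safe=":,-"),
    ]
    if resource:
        parts.append(resource)
    url = "/".join(parts)
    if filters:
        url += "?" + urlencode(filters)
    return url
```

For instance, cellbase_url("http://www.ebi.ac.uk", "v3", "hsapiens", "genomic", "region", "3:55-100000", "gene") builds the region query URL used in the examples of this section; the result can then be passed to curl or urllib.request.urlopen.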

Detailed documentation on CellBase web services can be found at:

http://wiki.opencb.org/projects/cloud/doku.php?id=cellbase:user-manual

For example:

  • Get information on human genes located on chromosome 3 between coordinates 55 and 100000:

curl http://www.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/genomic/region/3:55-100000/gene

  • Get data for Mus musculus BRCA2:

curl http://www.ebi.ac.uk/cellbase/webservices/rest/v3/mmusculus/feature/gene/BRCA2/info

  • Get all Human variants associated with beta-Thalassemia:

curl http://www.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/genomic/variant/beta_Thalassemia/phenotype

  • Get information for all Drosophila melanogaster chromosomes:

curl http://wwwdev.ebi.ac.uk/cellbase/webservices/rest/v3/dmelanogaster/genomic/chromosome/all