Tutorial
CellBase comes with a command line interface (CLI) written in Java; you will need at least Java 7 to run it. After installation you should have a cellbase/cellbase-build/installation-dir/ directory with the following structure:
/tmp/cellbase/cellbase-build/installation-dir/
├── bin
│ ├── cosmic
│ ├── ensembl-scripts
│ ├── genome-fetcher
│ ├── obsolete
│ └── protein
├── example
├── libs
└── mongodb-scripts
To run the CLI, execute:
cd /tmp/cellbase/cellbase-build/installation-dir
java -jar libs/cellbase-build-3.1.0.jar --help
A Python script is provided to download all the data that can populate the CellBase database. This script is located at:
/tmp/cellbase/cellbase-build/installation-dir/bin/genome-fetcher/genome-fetcher.py
The script may be run by moving into /tmp/cellbase/cellbase-build/installation-dir/bin/genome-fetcher/ and launching it:
cd /tmp/cellbase/cellbase-build/installation-dir/bin/genome-fetcher/
./genome-fetcher.py --help
For example, in order to download data sources for the Human Gene, Genome sequence and Variation collections, execute:
./genome-fetcher.py -s "Homo sapiens" --sequence 1 --gene 1 --variation 1 -o /tmp
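When several species or collection combinations need to be fetched, it can help to assemble the argument list in code rather than by hand. The following is a minimal sketch, not part of CellBase: `build_fetch_command` is a hypothetical helper, it uses only the flags shown above (-s, --sequence, --gene, --variation, -o), and it assumes the script is launched from its own directory. Collections that are not requested are simply left off the command line; how the script treats them by default is not covered here.

```python
def build_fetch_command(species, out_dir, collections=("sequence", "gene", "variation")):
    """Return the argument list for genome-fetcher.py (hypothetical helper)."""
    cmd = ["./genome-fetcher.py", "-s", species]
    for name in collections:
        # Each requested collection is switched on with the value 1,
        # mirroring the command shown in the tutorial.
        cmd += ["--" + name, "1"]
    cmd += ["-o", out_dir]
    return cmd


cmd = build_fetch_command("Homo sapiens", "/tmp")
print(cmd)
# To actually start the download from the genome-fetcher directory:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with species names that contain spaces.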
This will download the data files into the /tmp/homo_sapiens/ folder, with the following directory structure:
/tmp/homo_sapiens/
├── gene
│ ├── description.txt
│ ├── homo_sapiens.gtf.gz
│ ├── homo_sapiens.gtf.gz.log
│ └── xrefs.txt
├── sequence
│ ├── genome_info.json
│ ├── Homo_sapiens.GRCh38.fa.gz
│ └── Homo_sapiens.GRCh38.fa.gz.log
└── variation
  ├── attrib.txt.gz
  ├── attrib.txt.gz.log
  ├── attrib_type.txt.gz
  ├── attrib_type.txt.gz.log
  ├── motif_feature_variation.txt.gz
  ├── motif_feature_variation.txt.gz.log
  ├── phenotype_feature_attrib.txt.gz
  ├── phenotype_feature_attrib.txt.gz.log
  ├── phenotype_feature.txt.gz
  ├── phenotype_feature.txt.gz.log
  ├── phenotype.txt.gz
  ├── phenotype.txt.gz.log
  ├── seq_region.txt.gz
  ├── seq_region.txt.gz.log
  ├── source.txt.gz
  ├── source.txt.gz.log
  ├── structural_variation_feature.txt.gz
  ├── structural_variation_feature.txt.gz.log
  ├── study.txt.gz
  ├── study.txt.gz.log
  ├── transcript_variation.txt.gz
  ├── transcript_variation.txt.gz.log
  ├── variation_feature.txt.gz
  ├── variation_feature.txt.gz.log
  ├── variation_synonym.txt.gz
  ├── variation_synonym.txt.gz.log
  ├── variation.txt.gz
  └── variation.txt.gz.log
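In the listing above, every downloaded .gz file sits next to a .log file with the same name. A quick sanity check before building is to look for .gz files that have no such companion. This is only a sketch: the pairing rule is inferred from the listing rather than from documented behaviour, and `gz_files_without_log` is a hypothetical helper name.

```python
from pathlib import Path


def gz_files_without_log(root):
    """Return sorted relative paths of .gz files that lack a sibling .log file."""
    root = Path(root)
    missing = []
    for gz in root.rglob("*.gz"):
        if not gz.is_file():
            continue
        # The expected companion is "<name>.gz.log" in the same directory.
        log = gz.with_name(gz.name + ".log")
        if not log.exists():
            missing.append(gz.relative_to(root).as_posix())
    return sorted(missing)
```

For example, `gz_files_without_log("/tmp/homo_sapiens")` would return an empty list for the directory tree shown above.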
Once the data has been downloaded, we can build the data models for MongoDB by running the CLI. For example, to build the genome sequence collection, execute:
cd /tmp/cellbase/cellbase-build/installation-dir/
java -jar libs/cellbase-build-3.1.0.jar --build genome-sequence \
  --fasta-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh38.fa.gz -o /tmp/
To build the gene collection:
java -jar libs/cellbase-build-3.1.0.jar --build gene \
  --indir /tmp/homo_sapiens/gene \
  --fasta-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh38.fa.gz -o /tmp/
To build the variation collections:
java -jar libs/cellbase-build-3.1.0.jar --build variation \
  --indir /tmp/homo_sapiens/variation -o /tmp/
Each of these commands writes a JSON file to /tmp, e.g.:
/tmp/genome_sequence.json
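The three build steps differ only in the collection name and in which inputs they take, so they can be driven from one small script. The sketch below is an illustration under assumptions: `build_command` is a hypothetical helper, the jar name and paths are the ones used in this tutorial, and only the options shown above (--build, --indir, --fasta-file, -o) appear.

```python
JAR = "libs/cellbase-build-3.1.0.jar"
FASTA = "/tmp/homo_sapiens/sequence/Homo_sapiens.GRCh38.fa.gz"


def build_command(collection, out_dir, indir=None, fasta_file=None):
    """Return the argument list for one CellBase build step (hypothetical helper)."""
    cmd = ["java", "-jar", JAR, "--build", collection]
    if indir is not None:
        cmd += ["--indir", indir]
    if fasta_file is not None:
        cmd += ["--fasta-file", fasta_file]
    cmd += ["-o", out_dir]
    return cmd


# The same three builds as in the tutorial, in the same order.
commands = [
    build_command("genome-sequence", "/tmp/", fasta_file=FASTA),
    build_command("gene", "/tmp/", indir="/tmp/homo_sapiens/gene", fasta_file=FASTA),
    build_command("variation", "/tmp/", indir="/tmp/homo_sapiens/variation"),
]

for cmd in commands:
    print(" ".join(cmd))
    # To run each step from the installation directory:
    # import subprocess
    # subprocess.run(cmd, check=True, cwd="/tmp/cellbase/cellbase-build/installation-dir")
```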
MongoDB 2.6 or later is required to load the JSON files created in the previous step. Databases and collections can be loaded with the mongoimport command, e.g.:
mongoimport --file /tmp/genome_sequence.json -d hsapiens_cb_v3 -c genome_sequence
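If several JSON files have to be loaded, the mongoimport call can be generated per file. The following sketch assumes that each collection is named after its JSON file, as in the genome_sequence example above; that naming rule is an inference from this one example, and `mongoimport_command` is a hypothetical helper.

```python
from pathlib import Path


def mongoimport_command(json_file, database):
    """Return the mongoimport argument list for one JSON file (hypothetical helper)."""
    # Assumption: the collection name equals the file name without ".json".
    collection = Path(json_file).stem
    return ["mongoimport", "--file", str(json_file), "-d", database, "-c", collection]


cmd = mongoimport_command("/tmp/genome_sequence.json", "hsapiens_cb_v3")
print(" ".join(cmd))
# To run the import: import subprocess; subprocess.run(cmd, check=True)
```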
Installing the CellBase web services requires Tomcat 7 on the server machine. After building the CellBase code, the cellbase.war file should be located at:
/tmp/cellbase/cellbase-server/target/cellbase.war
Copy cellbase.war into the Tomcat 7 webapps directory, e.g.:
cp /tmp/cellbase/cellbase-server/target/cellbase.war /var/lib/tomcat7/webapps/
The general structure of a CellBase web service call is:
servername/cellbase/webservices/rest/{version}/{species}/{category}/{subcategory}/{id}/{resource}?{filters}
Detailed documentation on CellBase web services can be found at:
http://wiki.opencb.org/projects/cloud/doku.php?id=cellbase:user-manual
For example:
- Get information on Human genes located on chromosome 3 between coordinates 55 and 100000:
curl http://www.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/genomic/region/3:55-100000/gene
- Get data for Mus musculus BRCA2:
curl http://www.ebi.ac.uk/cellbase/webservices/rest/v3/mmusculus/feature/gene/BRCA2/info
- Get all Human variants associated with beta-Thalassemia:
curl http://www.ebi.ac.uk/cellbase/webservices/rest/v3/hsapiens/genomic/variant/beta_Thalassemia/phenotype
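The three calls above follow the same pattern: version, species, category, subcategory, an identifier and a resource. A small helper can assemble such URLs from their parts. This is a minimal sketch, assuming the identifier sits between subcategory and resource as in the examples; `cellbase_url` is a hypothetical name, and no specific filter names are assumed since none appear in this tutorial.

```python
from urllib.parse import quote, urlencode


def cellbase_url(server, version, species, category, subcategory,
                 query_id, resource, filters=None):
    """Assemble a CellBase REST URL from its components (hypothetical helper)."""
    parts = [
        server.rstrip("/"),
        "cellbase/webservices/rest",
        version,
        species,
        category,
        subcategory,
        # Keep ':' and ',' readable so regions such as 3:55-100000 stay intact.
        quote(query_id, safe=":,"),
        resource,
    ]
    url = "/".join(parts)
    if filters:
        # Optional query-string filters, passed through as given.
        url += "?" + urlencode(filters)
    return url


url = cellbase_url("http://www.ebi.ac.uk", "v3", "hsapiens",
                   "genomic", "region", "3:55-100000", "gene")
print(url)
```

The resulting string can be passed to curl as above or fetched with `urllib.request.urlopen`.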