This repository is home to WebSTR-API, the RESTful API and backend for WebSTR, a portal of human genome-wide variation in Short Tandem Repeats (STRs). Our goal is to make large STR genotype datasets usable by the broader genomics community by facilitating open access to this data.
WebSTR is the result of a collaboration between two scientific groups: Maria Anisimova’s Lab and Melissa Gymrek’s Lab.
Source code for the WebSTR web portal can be found here: https://github.com/gymrek-lab/webstr
All available endpoints are described in the automatically generated documentation, which includes Python code examples and can be accessed here: http://webstr-api.ucsd.edu/docs
For some example queries to get you started, check out our Getting Started Guide.
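As an illustration, a query against the public API might look like the sketch below; the endpoint path and query parameters here are assumptions for the example, so check http://webstr-api.ucsd.edu/docs for the actual names.

```python
import requests

API_URL = "http://webstr-api.ucsd.edu"

# Hypothetical endpoint and parameter -- consult /docs for the real ones.
response = requests.get(f"{API_URL}/repeats/", params={"gene_names": "CACNA1C"})
response.raise_for_status()

# The API returns JSON records describing STRs.
for record in response.json():
    print(record)
```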
Yes, please use the provided Dockerfile for that; WebSTR-API can be deployed on any container-based service.
Yes! It is possible, and we encourage it if you would like to add your own data to WebSTR or perform advanced analysis on it.
Install and configure PostgreSQL on your machine and create an empty database called strdb. We provide an sql_dump backup of the current version of the database on request. Restore the database from this backup.
Alternatively, you can use docker compose with the provided docker-compose.yml file to set up the PostgreSQL database.
- rename .env.example to .env
- copy the backup to ./db/docker_data/pgdata/webstr_backup.dump
docker compose up -d db
docker exec -it webstr-api-db-1 pg_restore -d strdb /var/lib/postgresql/data/pgdata/webstr_backup.dump
a) Set up python3 and virtualenv on your machine. For Mac, follow the instructions here. You can also use conda; in that case, follow these instructions to create a conda env, which is preferred for newer M1/M2 Macs and for infrastructures that already use conda. Activate your environment.
b) Create a new virtual env and install all the requirements with the following command:
pip install -r requirements.txt
Step 2: (only for the non-Docker way) Set the environment variable DATABASE_URL on your machine (or in your IDE) to
export DATABASE_URL="postgres://postgres:YOURPASSWORD@localhost:5432/strdb"
Note that this uses the default user postgres; if you created your database under a different user, adjust this variable accordingly.
Optional: add this line to ~/.bashrc and restart your terminal.
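To quickly verify that DATABASE_URL points to a reachable database, a minimal sketch using SQLAlchemy (installed via requirements.txt) is shown below; note that newer SQLAlchemy releases only accept the postgresql:// scheme, so the snippet rewrites postgres:// if needed.

```python
import os

from sqlalchemy import create_engine, text

# Read the same variable the API uses; newer SQLAlchemy versions reject the
# "postgres://" scheme, so rewrite it to "postgresql://" for this check.
url = os.environ["DATABASE_URL"].replace("postgres://", "postgresql://", 1)

engine = create_engine(url)
with engine.connect() as conn:
    # A trivial query just to confirm the database is reachable.
    print(conn.execute(text("SELECT 1")).scalar())
```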
Run the following command from the root folder of this repo:
uvicorn strAPI.main:app --host=0.0.0.0 --port=${PORT:-5000} --reload
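Once the server is running, you can confirm it responds; FastAPI serves its interactive docs at /docs and the OpenAPI schema at /openapi.json by default.

```python
import requests

# Assumes the server was started locally on port 5000 as above.
schema = requests.get("http://localhost:5000/openapi.json").json()

# List the routes exposed by the running API.
for path in schema["paths"]:
    print(path)
```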
See the Docker part of Step 1 in the previous instructions.
Change the database URL in the .env file to DATABASE_URL=sqlite:///db/debug.sqlite. Set WEBSTR_DATABASE_DATA_UPGRADE=True and WEBSTR_DEVELOPMENT=True in the .env file.
Run docker compose up
Now you can access the API on localhost:5000 and the frontend on localhost:5001.
If you want to debug the code and you use VSCode, you can run the code in a container using VSCode and the tasks defined in launch.json in the .vscode folder. Alternatively, you can use the devcontainer plugin to do the same. This is a bit more advanced, so study how debugging in containers works in VSCode before trying it. You will also have to stop the webstr-api-api-1 container first if you have it running through docker compose.
We recommend starting by making it work locally on your machine from a ready sql_dump that we provide upon request. See the instructions above. We also provide Python scripts for working with the ORM (an abstraction layer on top of the database) to import new data into the database. Explore the "database_setup" directory for the different utilities to import data into the database.
- If you would like to add a new genome assembly, see the add_genomes utility. Example usage: `python add_genomes.py -d PATH_TO_DB`. Modify the script according to your data. You will also need to import a GTF file corresponding to this assembly using gtf_to_sql.py. Genes, transcripts and exons currently available for the hg38 (GRCh38.p2) assembly have been imported from Encode.
- To add a new reference panel description and study cohort, use add_panels_and_cohorts.py.
- If you would like to import a new reference panel, we recommend making a CSV corresponding to the repeats table structure and importing it directly into SQL to save time (see the sketch after this list). Alternatively, see the insert_repeats.py and import_data_ensembltrs.py utilities that we made for repeats data coming in different formats. Feel free to contact us for more details if you would like to make your own reference STR panel.
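As a rough illustration of the CSV route, the sketch below loads a CSV into the repeats table with pandas; the file name and the assumption that your CSV columns already match the repeats table schema are hypothetical, so check the ORM models before importing anything.

```python
import os

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical CSV whose columns are assumed to match the repeats table;
# verify the real column names against the ORM models first.
repeats = pd.read_csv("my_reference_panel.csv")

# Reuse DATABASE_URL, rewriting the scheme for newer SQLAlchemy versions.
url = os.environ["DATABASE_URL"].replace("postgres://", "postgresql://", 1)
engine = create_engine(url)

# Append rows to the existing repeats table rather than replacing it.
repeats.to_sql("repeats", engine, if_exists="append", index=False)
```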
@slmjy added Alembic migrations to the codebase as a proof of concept. The following changes were introduced:
- Database migrations in /database_setup/migrations
- alembic.ini and env.py files
- entrypoint.sh can run Alembic migrations if the environment variable WEBSTR_DATABASE_MIGRATE is set to True
- database.py contains a (currently disabled) check of whether the database is on the latest version
If someone wants to start using Alembic migrations, they can enable the version check, generate new migrations with Alembic, and use them in production.
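For reference, Alembic can also be driven from Python instead of its CLI; a minimal sketch, assuming it is run from the directory containing alembic.ini and that env.py picks up the database URL, might look like this:

```python
from alembic import command
from alembic.config import Config

# Load the repository's Alembic configuration (assumes alembic.ini is in the
# current working directory).
cfg = Config("alembic.ini")

# Apply all pending migrations -- roughly what entrypoint.sh does when
# WEBSTR_DATABASE_MIGRATE is set to True.
command.upgrade(cfg, "head")

# Autogenerate a new migration from model changes (hypothetical message).
command.revision(cfg, message="describe your change", autogenerate=True)
```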