Latest Release | |
Package Status | |
License | |
Usage Stats |
inspectomop is a lightweight python 3 package that assists in the extraction of electronic health record(EHR) data from relational databases following the OHDSI OMOP Common Data Model(CDM) standard v>=5.
- OHDSI: Observational Health Data Sciences and Informatics
- OMOP: Observation Medical Outcomes Partnership
A large portion of data science research is spent on ETL (Extraction, Transformation, and Loading). If the data are stored in a relational database, the first step includes deciphering the database schema and figuring out how to write SQL queries that will properly gather the information of interest. This can be both laborious and time consuming. inspectomop attempts to simplify extracting data from the OMOP CDM with an API that is easy to use, extensible, and SQL dialect agnostic.
One of the main benefits of adopting a CDM such as OMOP is that it promotes the sharing of ideas and methodology. Queries in inspectomop are simple python functions so using sqlAlchemy any user can create custom queries that can be shared across institutions and database management systems.
def my_query(inputs, inspector):
# create SQL agnostic query usually of the form
statement = select([columns]).where(inputs == criteria)
return inspector.execute(statement)
inspectomop is for any python 3 programmer with an interest in interfacing with an EHR relational database formatted to follow the OMOP CDM standard.
The OHDSI group has developed an excellent library of tools and methods written in R, but there are few, if any tools, for the python community.
- SQL dialect agnostic thanks to SQLAlchemy allowing for a variety of compatible database back ends
- automatic reflection of DB tables to dot accessible python objects for easy traversal and inspection
- preloaded with standard queries from the OHDSI group
- results returnable as pandas dataframes or dataframe chunks for queries with a large number of rows
- extensibility with custom queries built from simple python functions
Below is a table comparing SQL dialect support for inspectomop versus the R SQLRender package written and maintained by the OHDSI group.
dialect | inspectomop (python) | SQLRender (R) |
---|---|---|
BigQuery | No * | Yes |
Impala | Yes * | Yes |
Netezza | No * | Yes |
Oracle | Yes | Yes |
PostgreSQL | Yes | Yes |
Redshift | Yes * | Yes |
SQL Server | Yes | Yes |
SQLite | Yes | Unknown |
Note: Compatibility is primarily based on the availability of dialects written for SQLAlchemy. Most have not bee explicitly tested by the author with the exception of SQLite v2.6.0 and SQL Server 2016 Service Pack 1 (13.0.4001.0). However, success stories and troubleshooting questions are welcome!
* BigQuery : python DB-API, but no sqlalchemy dialect as of 8/17/2018 (googleapis/google-cloud-python#3603)
* Impala : external dialect available via impyla package
* Redshift : external dialect available via sqlalchemy-redshift package
- install from PyPI using pip with
pip install inspectomop
* Developed using SQLAlchemy 1.2.1 and Pandas 0.22.0
Read the official documentation hosted on readthedocs for more information on usage and examples.
Feel free to fork, copy, share and contribute. This software released under GNU Affero GPL v3.0