Gene-drug relation-type extraction in full biomedical texts

This is the source code used in:

L.A. Bugnon, C. Yones, J. Bertinetti, D. Ramírez, D.H. Milone, G. Stegmayer, Gene-drug relation-type extraction in full-text biomedical publications, 2023 (under review)

We propose a framework for extracting gene-drug relationship types from full biomedical texts. Unlike many approaches designed for in-sentence classification, ours is based on the premise that interacting entities may appear far apart in the text. Using only the raw text and the identified biomedical entities of interest as inputs, we combine word embeddings with a convolutional neural network to cope with text length.
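To illustrate why a convolutional network can cope with variable text length, here is a minimal pure-Python sketch. It is not the architecture from the paper: a 1D convolution filter slides over a sequence of word vectors, and global max pooling keeps one value per filter, so texts of any length map to a vector of the same size. The function names, toy vectors and filter weights are all illustrative assumptions.

```python
from typing import List

Vector = List[float]


def conv1d_feature_map(embeddings: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide one filter over the token sequence (valid convolution, stride 1)."""
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("text is shorter than the filter width")
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of word vectors.
        activation = sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        feature_map.append(max(0.0, activation))  # ReLU non-linearity
    return feature_map


def encode(embeddings: List[Vector], kernels: List[List[Vector]]) -> List[float]:
    """Global max pooling: keep the strongest response of each filter."""
    return [max(conv1d_feature_map(embeddings, kernel)) for kernel in kernels]


# Toy 2-dimensional "embeddings" and two filters of width 2 (made-up values).
toy_kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [1.0, 0.0]],
]
short_text = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
long_text = short_text * 4  # four times as many tokens

# Both texts are encoded into vectors with one entry per filter.
print(len(encode(short_text, toy_kernels)), len(encode(long_text, toy_kernels)))
```

Because the pooled value does not depend on where in the text a filter responds most strongly, a pattern involving a gene or drug mention can be detected anywhere in the document.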

This repository contains the scripts and dataset needed to reproduce the results of the paper. Python >= 3.9 is recommended. Install the required packages with:

pip install -r requirements.txt

The dataset with texts and labels is in DGIdb_sinc/

Word embedding preparation

Word2Vec, FastText and GloVe have a low computational cost, so their embeddings are computed within the training script.

Flair embeddings need to be precomputed with:

python embed_flair.py conf_flair.json paths.json

A similar procedure is required for BioBERT:

python embed_biobert.py conf_biobert.json paths.json

Hyperparameter optimization

The complete hyperparameter evaluation of the network for each embedding model is done with:

python hp_exploration.py conf_<embedding_name>.json paths.json

This could take several hours.
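One reason an exhaustive exploration can be slow is that the number of configurations is the product of the number of values tried for each hyperparameter. The sketch below enumerates such a grid with `itertools.product`. The hyperparameter names and values are hypothetical placeholders, not the ones used by hp_exploration.py, and the actual search strategy of that script may differ.

```python
from itertools import product
from typing import Dict, Iterator, List


def iter_grid(search_space: Dict[str, List]) -> Iterator[Dict]:
    """Yield one configuration dict per point of the Cartesian grid."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space, for illustration only.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "n_filters": [32, 64, 128],
    "kernel_size": [3, 5],
}

configs = list(iter_grid(search_space))
# 3 * 3 * 2 values give 18 configurations, each needing a full training run.
print(f"{len(configs)} configurations to train and evaluate")
```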

Run cross-validation

To run a complete cross-validation scheme, pass the configuration file of the chosen embedding as follows:

python cross_validation.py conf_<embedding_name>.json paths.json
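To run the cross-validation for every embedding in turn, the command above can be wrapped in a small loop. This is a sketch rather than part of the repository: the `build_command` and `run_all` helpers are hypothetical, and the embedding names are taken from the conf_*.json files shipped with the code. Flair and BioBERT embeddings must already have been precomputed as described earlier.

```python
import subprocess
import sys
from typing import List, Sequence

# Embedding names matching the conf_<embedding_name>.json files in the repository.
EMBEDDINGS = ["word2vec", "fasttext", "glove", "flair", "biobert"]


def build_command(embedding: str, script: str = "cross_validation.py",
                  paths_file: str = "paths.json") -> List[str]:
    """Build the argument list for one run with the current Python interpreter."""
    return [sys.executable, script, f"conf_{embedding}.json", paths_file]


def run_all(embeddings: Sequence[str] = EMBEDDINGS, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for name in embeddings:
        command = build_command(name)
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(command, check=True)


# By default only print the commands; call run_all(dry_run=False) to execute them.
run_all(dry_run=True)
```

Passing the arguments as a list avoids shell quoting issues, and using `sys.executable` keeps the runs in the same environment where the requirements were installed.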

A summary of the cross-validation results can be viewed using the notebook "summary.ipynb".

