Clinical Semantic Textual Similarity (English & Japanese)

These scripts calculate semantic similarity between texts in the clinical/biomedical domain. The model input is sentence pairs annotated with semantic similarity scores between 0 (low semantic similarity) and 5 (high semantic similarity).
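As a minimal sketch of the input described above, a sentence pair and its score could be represented as follows. The `SentencePair` class, its field names, the `parse_line` helper, and the tab-separated line layout are assumptions made for illustration; they are not taken from the scripts in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One annotated example: two sentences and a similarity score in [0, 5]."""
    sentence1: str
    sentence2: str
    score: float

    def __post_init__(self):
        # A score outside the annotation range points to a corrupt or mislabeled row.
        if not 0.0 <= self.score <= 5.0:
            raise ValueError(f"score must be between 0 and 5, got {self.score}")


def parse_line(line: str) -> SentencePair:
    """Parse a hypothetical 'sentence1<TAB>sentence2<TAB>score' line."""
    sentence1, sentence2, score = line.rstrip("\n").split("\t")
    return SentencePair(sentence1.strip(), sentence2.strip(), float(score))


pair = parse_line("The patient has a fever.\tThe patient is febrile.\t4.5\n")
print(pair.score)
```

Validating the range at construction time means a malformed row fails early, before it reaches the model.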

Depending on the dataset, the sentence pairs are annotated with discrete semantic similarity scores (0, 1, 2, 3, 4, or 5) or continuous scores in the range 0 to 5. For the discrete scores, we use a BERT model for sequence classification. For the continuous scores, we use the standard BERT model and add a regression layer on top.
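The choice between the two setups can be sketched as below. This is a minimal illustration, not the repository's code: `HeadConfig` and `choose_head` are hypothetical names, and treating a dataset as discrete only when every score is a whole number is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class HeadConfig:
    task: str                         # "classification" or "regression"
    num_outputs: int                  # size of the output layer on top of BERT
    labels: List[Union[int, float]]   # targets converted to the matching type


def choose_head(scores: Sequence[float]) -> HeadConfig:
    """Pick an output head from the annotation style of the scores."""
    if any(not 0 <= s <= 5 for s in scores):
        raise ValueError("scores must lie between 0 and 5")
    if all(float(s).is_integer() for s in scores):
        # Discrete scores 0..5: six classes, integer class ids as targets.
        return HeadConfig("classification", 6, [int(s) for s in scores])
    # Continuous scores: a single real-valued output, float targets.
    return HeadConfig("regression", 1, [float(s) for s in scores])


print(choose_head([0, 3, 5]).task)
print(choose_head([0.5, 3.25, 5.0]).task)
```

With a library such as Hugging Face Transformers, `num_outputs` would typically correspond to the `num_labels` setting of a sequence classification model; whether the scripts here are configured that way is not stated in this README.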

Dataset

The English clinical domain data is not publicly available for privacy reasons. However, you can use general domain English data from the SemEval STS shared task, which is saved in the STS_data folder.

Japanese clinical domain STS data can be downloaded freely from this GitHub repository.

BERT Models

Japanese models

General domain BERT: https://github.com/cl-tohoku/bert-japanese

Clinical domain BERT: https://ai-health.m.u-tokyo.ac.jp/uth-bert

English models

Clinical BERT: https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT

SciBERT: https://github.com/allenai/scibert

BioBERT: https://github.com/dmis-lab/biobert

Reference

For more information about this task, check the project website and read the paper below.

If you use the Japanese dataset, please cite our paper:

@article{mutinda2021semantic,
  title={Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT},
  author={Mutinda, Faith Wavinya and Yada, Shuntaro and Wakamiya, Shoko and Aramaki, Eiji},
  journal={Methods of Information in Medicine},
  year={2021},
  publisher={Georg Thieme Verlag KG}
}

sociocom/Clinical-Semantic-Textual-Similarity
