PPR corpus for biomedical relationships between plants and phenotypes

DMCB-GIST/PPRcorpus

Abstract

Medicinal plants have been used in clinical treatment for thousands of years and have shown therapeutic potential for a wide range of observable characteristics of the human body, called "phenotypes". As interest in plants has grown, many researchers have tried to extract meaningful information by identifying relationships between plants and phenotypes in the accumulated literature. Although natural language processing (NLP) techniques aim to extract useful information from unstructured text, no appropriate corpus existed for training and evaluating NLP models on plants and phenotypes. We therefore present the Plant-Phenotype Relationship (PPR) corpus, a high-quality resource to support the development of various NLP applications. It consists of 600 PubMed abstracts containing 5,668 plant entities, 11,282 phenotype entities, and a total of 9,709 relationships. We also report benchmark results from named entity recognition and relation extraction systems to verify the quality of the data and to show the performance these NLP tasks achieve on the PPR test set.

Architecture

[Figure: fig1_pipeline]

Statistics

[Figure: corpus statistics table (image file: Template_for_Data_Descriptor_submissions_to_Scientific_Data 11)]
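
As a rough sanity check on the counts quoted in the abstract, the sketch below computes the average number of entities and relationships per abstract. It uses only the totals stated there; the names `NUM_ABSTRACTS`, `COUNTS`, and `per_abstract_averages` are illustrative and are not part of the corpus distribution.

```python
# Totals quoted in the abstract of this README.
NUM_ABSTRACTS = 600
COUNTS = {
    "plant entities": 5668,
    "phenotype entities": 11282,
    "relationships": 9709,
}


def per_abstract_averages(counts, num_abstracts):
    """Return the mean count per abstract for each annotation type."""
    return {name: total / num_abstracts for name, total in counts.items()}


averages = per_abstract_averages(COUNTS, NUM_ABSTRACTS)
for name, value in averages.items():
    # Two decimal places are enough for a quick overview.
    print(f"{name}: {value:.2f} per abstract")
```

By these totals, an abstract holds on average about 9 plant mentions, 19 phenotype mentions, and 16 annotated relationships. The actual distribution across abstracts may be uneven.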

Example

[Figure: fig2_example]

Contact

Corresponding author: Hyunju Lee ([email protected])
