PPR corpus for biomedical relationships between plants and phenotypes

Abstract

Over thousands of years of clinical treatment, medicinal plants have demonstrated therapeutic potential for a wide range of observable characteristics of the human body, called "phenotypes". As interest in plants has increased, many researchers have tried to extract meaningful information from the accumulated literature by identifying relationships between plants and phenotypes. Although natural language processing (NLP) techniques aim to extract useful information from unstructured text, there has been no appropriate corpus for training and evaluating NLP models for plants and phenotypes. We therefore present the Plant-Phenotype Relationship (PPR) corpus, a high-quality resource to support development in various NLP fields. It consists of 600 PubMed abstracts containing 5,668 plant entities, 11,282 phenotype entities, and a total of 9,709 relationships. We also describe benchmark results from named entity recognition and relation extraction systems to verify the quality of our data and to show that these NLP tasks achieve significant performance on the PPR test set.

DMCB-GIST/PPRcorpus
