EmEL-V

EmEL-V is a geometric approach to generating embeddings for the description logic EL++. The implementation uses Python and the PyTorch library.

The code is organized as follows:

  • Experiments: contains a separate folder for each ontology on which the experiments are carried out.
  • The Experiments folder contains the models, data and results folders (create an empty results folder, and any others required to store the model).
  • The models folder contains the code, which takes the dataset name as an argument.
  • The corresponding dataset folders must be present in the data folder.
  • The corresponding results folder stores the trained model parameters (make sure to change the path for out_file in the code).
  • The evaluation metrics are implemented in Evaluating_HITS.py.

experiments/data/{dataset_name}: This folder contains four processed files: the normalized form of the ontology, which is used for training, and the training, validation and testing sets obtained from the subclass relations in the ontology.
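Before training, it can help to confirm that the folders described above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the root folder is passed in explicitly because this README writes it both as Experiments and as experiments.

```python
from pathlib import Path


def missing_dirs(root, dataset_name):
    """Return the expected sub-folders that do not exist under `root`.

    Hypothetical helper: the folder names (models, data/<dataset>, results)
    are taken from the layout described in this README.
    """
    root = Path(root)
    expected = [
        root / "models",
        root / "data" / dataset_name,
        root / "results",
    ]
    # Report paths relative to the root, with forward slashes on every OS.
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


if __name__ == "__main__":
    # Example: check the layout for the GO dataset under ./experiments
    print(missing_dirs("experiments", "GO"))
```

An empty list means all three folders are present; any entries name the folders that still need to be created.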

The implementation of the classification task is organised in three parts:

  • First: Given an ontology OWL file, we normalize it with the Normalizer.groovy script using the jcel jar. The Normalizer file can be found here

    Command to Normalize: groovy -cp jcel.jar Normalizer.groovy -i -o

  • Second: Using the normalized ontology, we identify the subclass relations and generate training, testing and validation sets with a 70%-20%-10% split.

  • Third: We train on the normalized ontology file after removing from it the 30% of subclass relation axioms held out for validation and testing. The validation data is used for hyper-parameter tuning, and the testing data is used to evaluate the tuned models.
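The second and third steps can be sketched as follows. This is an illustration under stated assumptions, not the code used in the repository: the function names, the representation of axioms as tuples, and the fixed random seed are all hypothetical, and the 70/20/10 order follows the sentence above (training, testing, validation).

```python
import random


def split_subclass_axioms(axioms, percents=(70, 20, 10), seed=0):
    """Shuffle subclass axioms and cut them into three consecutive parts.

    `percents` are whole-number percentages; the last part receives
    whatever remains after the first two are cut.
    """
    shuffled = list(axioms)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    first_end = n * percents[0] // 100
    second_end = first_end + n * percents[1] // 100
    return shuffled[:first_end], shuffled[first_end:second_end], shuffled[second_end:]


def remove_held_out(normalized_axioms, held_out):
    """Drop the held-out subclass axioms from the training ontology."""
    held = set(held_out)
    return [axiom for axiom in normalized_axioms if axiom not in held]


if __name__ == "__main__":
    # Toy subclass axioms (sub, super); real ones come from the normalized ontology.
    subclass = [(f"C{i}", f"D{i}") for i in range(10)]
    train, test, valid = split_subclass_axioms(subclass)
    training_ontology = remove_held_out(subclass, test + valid)
    print(len(train), len(test), len(valid), len(training_ontology))
```

Using integer percentages avoids floating-point rounding when the cut points are computed.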

Associated model files:

  • Experiments/models/EMEL_trans_m.py : EmEL model implementation with translation operation and variance.
  • Experiments/models/EMEL_trans_bayes.py : EmEL model implementation with translation operation and Bayesian inference.
  • Experiments/models/EMEL_sparse.py : EmEL model implementation with relations as matrices.
  • Experiments/models/EMEL_sparse_m.py : EmEL model implementation with relations as matrices and variance.

Executing the code:

  • Before executing the code, install CUDA (needed to use a GPU) and the Python libraries listed in requirements.txt.
  • Keep the directory structure as it is when executing the code. The steps below demonstrate this with the GO dataset.
  • Go to the experiments/models/ folder and run python EMEL_trans_m.py --data GO (provide other arguments if needed).
  • This starts the training. To change the dimension size, modify it in the code.
  • The embeddings for classes and relations are written as pkl files to the results directory.
  • To evaluate the embeddings, run the Python script Evaluating_HITS.py and provide the path to the pkl files.
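For readers unfamiliar with the metric, the following minimal sketch shows how a Hits@k score is typically computed from ranks. It is not the code in Evaluating_HITS.py: the function names are hypothetical, and it assumes that a lower score (for example a distance) means a better candidate.

```python
def rank_of(scores, correct_index):
    """1-based rank of the correct candidate, assuming lower scores are better."""
    correct = scores[correct_index]
    return 1 + sum(1 for s in scores if s < correct)


def hits_at_k(ranks, k):
    """Fraction of test queries whose correct answer is ranked in the top k."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    # Toy ranks for four test axioms; real ranks come from the trained embeddings.
    ranks = [1, 3, 12, 50]
    print(hits_at_k(ranks, 1), hits_at_k(ranks, 10))
```

Each test subclass axiom contributes one rank: the position of its true superclass among all candidate classes, ordered by the model's score.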

Data

The ontologies used in our evaluation (SNOMED CT, GALEN, GO) are publicly available.

Create an Experiments/Data folder to hold all the data.