Eukaryotic protein sequences classification

The aim of this project is to develop a method to classify eukaryotic protein sequences into four categories: Cytosolic, Secreted, Nuclear, and Mitochondrial.

Getting Started

Prerequisites

numpy==1.14.1
pandas==0.22.0
Keras==2.0.8
matplotlib==2.2.0
scikit_learn==0.19.1
biopython
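
A quick way to confirm that the environment matches the versions listed above is sketched below. This is not part of the project: the helper names `installed_version` and `check_pins` are hypothetical, and the lookup relies on the standard-library `importlib.metadata` (Python 3.8+). Distribution-name matching (for example `scikit_learn` versus `scikit-learn`) may behave differently across Python versions, so treat a "not installed" report for that entry with some caution.

```python
from importlib import metadata

# Pins copied from the prerequisites list; biopython is listed without a version.
PINNED = {
    "numpy": "1.14.1",
    "pandas": "0.22.0",
    "Keras": "2.0.8",
    "matplotlib": "2.2.0",
    "scikit_learn": "0.19.1",
    "biopython": None,
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of readable problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["all prerequisites match"]:
        print(line)
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment.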

Running

Demo.py is the main demo script. Running it saves the confusion matrices for the Uniform, Logistic Regression, and Random Forest models, and writes the predictions for the blind test set to 'result.txt'.
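
To make the confusion-matrix output concrete, here is a minimal standard-library sketch. It is not the code in Demo.py. It assumes that the "Uniform Model" is a baseline that draws each label uniformly at random from the four classes, and the names `uniform_predict` and `confusion_matrix`, as well as the example labels, are hypothetical.

```python
import random

CLASSES = ["Cytosolic", "Secreted", "Nuclear", "Mitochondrial"]


def uniform_predict(n, classes=CLASSES, seed=0):
    """Assumed baseline: draw each of the n labels uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(classes) for _ in range(n)]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical labels for illustration only, not project data.
y_true = ["Cytosolic", "Secreted", "Nuclear", "Mitochondrial", "Nuclear", "Secreted"]
y_pred = uniform_predict(len(y_true))

for name, row in zip(CLASSES, confusion_matrix(y_true, y_pred)):
    print(f"{name:>13}: {row}")
```

A random baseline like this gives a reference point: a trained model is only useful if its matrix is clearly more concentrated on the diagonal.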

Author

Acknowledgments

  • BioPython