kbrajwani/resume_parser

Resume parser

A resume parser used for extracting information from resumes

Built with ❤︎ and ☕ by Kumar Rajwani and Brian Njoroge


Features

  • Extract name
  • Extract email
  • Extract mobile numbers
  • Extract skills
  • Extract total experience
  • Extract college name
  • Extract degree
  • Extract designation
  • Extract company names

Installation

  • Install the package with pip:
pip install resume-parser
  • Install the spaCy English model (en_core_web_sm 2.3.1) and the pinned importlib-metadata version:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
pip install importlib-metadata==3.2.0
  • NLP operations rely on spaCy and NLTK. Download their models and data with the commands below:
# spaCy
python -m spacy download en_core_web_sm

# nltk
python -m nltk.downloader stopwords
python -m nltk.downloader punkt
python -m nltk.downloader averaged_perceptron_tagger
python -m nltk.downloader universal_tagset
python -m nltk.downloader wordnet
python -m nltk.downloader brown
python -m nltk.downloader maxent_ne_chunker

Supported File Formats

  • PDF, DOCX, and TXT files are supported on all operating systems

Usage

  • Import it in your Python project
from resume_parser import resumeparse

data = resumeparse.read_file('/path/to/resume/file')

The first run takes around a minute, so please be patient.
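To parse a whole folder of resumes, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of the package: `parse_folder` and `SUPPORTED` are hypothetical names, and the extension filter simply mirrors the formats listed under Supported File Formats. The parser is passed in as an argument so the helper can be tried with a stand-in function.

```python
from pathlib import Path

# Extensions taken from the "Supported File Formats" section (assumed filter).
SUPPORTED = {".pdf", ".docx", ".txt"}


def parse_folder(folder, parse=None):
    """Parse every supported resume in `folder` and return {filename: result}.

    `parse` is any callable that takes a file path; it defaults to
    resumeparse.read_file from this package.
    """
    if parse is None:
        # Imported lazily so the helper also works with a stand-in parser.
        from resume_parser import resumeparse
        parse = resumeparse.read_file

    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and unsupported file types.
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            results[path.name] = parse(str(path))
    return results
```

Calling `parse_folder('/path/to/resumes')` would then return one result dictionary per file, keyed by file name.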

Result

The function returns a dictionary like the following:

{'degree': ['BSc','MSc'],
     'designition': [
         'content writer',
         'data scientist',
         'systems administrator',
     ],
     'email': '[email protected]',
     'name': 'Brian Njoroge',
     'phone': '+918511593595',
     'skills': [
         'Python',
         ' C++',
         'Power BI',
         'Tensorflow',
         'Keras',
         'Pytorch',
         'Scikit-Learn',
         'Pandas',
         'NLTK',
         'OpenCv',
         'Numpy',
         'Matplotlib',
         'Seaborn',
         'Django',
         'Linux',
         'Docker'],
     'total_exp': 3,
     'university': ['gujarat university', 'wuhan university', 'egerton university']}
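The returned dictionary can be consumed like any other Python dict. Below is a minimal sketch that assumes the keys shown above, including the `designition` spelling from the sample output. `summarize_resume` is a hypothetical helper, not part of the package, and it falls back to defaults in case a field is missing or empty.

```python
def summarize_resume(data):
    """Build a one-line summary from a parsed-resume dictionary."""
    name = data.get("name") or "Unknown"
    years = data.get("total_exp") or 0
    # The key is spelled 'designition' in the sample output.
    titles = data.get("designition") or []
    # Sample skills can carry stray whitespace (' C++'), so strip each entry.
    skills = [s.strip() for s in (data.get("skills") or [])]
    return " | ".join([
        name,
        f"{years} years",
        "titles: " + (", ".join(titles) or "n/a"),
        "skills: " + (", ".join(skills) or "n/a"),
    ])


# A trimmed-down version of the sample result, for illustration only.
sample = {
    "name": "Brian Njoroge",
    "designition": ["content writer", "data scientist"],
    "skills": ["Python", " C++"],
    "total_exp": 3,
}
print(summarize_resume(sample))
# Brian Njoroge | 3 years | titles: content writer, data scientist | skills: Python, C++
```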

Custom training

You can use the following notebook to train the spaCy model on your own data. The notebook trains the model to identify degrees in a given text: https://colab.research.google.com/drive/1aSn5tMWU2Lbo4eEPi0GvkBC_003mXxqi?usp=sharing