A resume parser used for extracting information from resumes
Built with ❤︎ and ☕ by Kumar Rajwani and Brian Njoroge
- Extract name
- Extract email
- Extract mobile numbers
- Extract skills
- Extract total experience
- Extract college name
- Extract degree
- Extract designation
- Extract company names
- Install the package with pip:
pip install resume-parser
- Install the spaCy English model (en_core_web_sm 2.3.1) and the pinned importlib-metadata release:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
pip install importlib-metadata==3.2.0
- NLP operations rely on spaCy and NLTK. Download their models and data with the commands below:
# spaCy
python -m spacy download en_core_web_sm
# nltk
python -m nltk.downloader stopwords
python -m nltk.downloader punkt
python -m nltk.downloader averaged_perceptron_tagger
python -m nltk.downloader universal_tagset
python -m nltk.downloader wordnet
python -m nltk.downloader brown
python -m nltk.downloader maxent_ne_chunker
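The NLTK resources listed above can also be fetched from Python, which may be convenient in a setup script. This is a minimal sketch, assuming `nltk` is already installed; the helper name `download_nltk_resources` is hypothetical and not part of resume-parser.

```python
# Resource names taken from the nltk.downloader commands in this README.
NLTK_RESOURCES = [
    "stopwords",
    "punkt",
    "averaged_perceptron_tagger",
    "universal_tagset",
    "wordnet",
    "brown",
    "maxent_ne_chunker",
]


def download_nltk_resources(resources=NLTK_RESOURCES):
    """Download each resource and return the names that failed (hypothetical helper)."""
    import nltk  # imported lazily so the list can be reused without nltk

    failed = []
    for name in resources:
        # nltk.download returns True on success and False otherwise.
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `download_nltk_resources()` once after installation should have the same effect as running the commands one by one; an empty return value means every download succeeded.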
- PDF, DOCX, and TXT files are supported on all operating systems
- Import it in your Python project
from resume_parser import resumeparse
data = resumeparse.read_file('/path/to/resume/file')
The first call takes around a minute, so please be patient.
The module returns a dictionary like the following:
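To parse several files at once, it can help to skip unsupported file types before calling the parser. The sketch below assumes the supported extensions are exactly `.pdf`, `.docx`, and `.txt` as listed in this README; `is_supported` and `parse_resumes` are hypothetical helpers, not part of resume-parser, and the parser is passed in as a callable so the helpers work without the package installed.

```python
from pathlib import Path

# Extensions listed as supported in this README (assumption: no others).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def is_supported(path):
    """Return True if the file extension is one the parser accepts."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def parse_resumes(paths, read_file):
    """Parse every supported file and map each path to its result.

    `read_file` is any callable that takes a path string,
    for example resumeparse.read_file.
    """
    results = {}
    for path in paths:
        if is_supported(path):
            results[str(path)] = read_file(str(path))
    return results
```

With the package installed, a call might look like `parse_resumes(['cv.pdf', 'notes.md'], resumeparse.read_file)`, which would parse `cv.pdf` and skip `notes.md`.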
{'degree': ['BSc','MSc'],
'designition': [
'content writer',
'data scientist',
'systems administrator',
],
'email': '[email protected]',
'name': 'Brian Njoroge',
'phone': '+918511593595',
'skills': [
'Python',
' C++',
'Power BI',
'Tensorflow',
'Keras',
'Pytorch',
'Scikit-Learn',
'Pandas',
'NLTK',
'OpenCv',
'Numpy',
'Matplotlib',
'Seaborn',
'Django',
'Linux',
'Docker'],
'total_exp': 3,
'university': ['gujarat university', 'wuhan university', 'egerton university']}
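The returned dictionary can be used like any other Python dict. Here is a small sketch of turning it into a one-line summary; `summarize` is a hypothetical helper, and the `sample` dictionary is a shortened copy of the output shown above. The helper strips whitespace from skills because entries such as `' C++'` can carry a leading space, and it falls back to defaults on the assumption that some fields may be missing or empty.

```python
def summarize(data):
    """Build a one-line summary from a parsed-resume dictionary."""
    # Strip stray whitespace such as the leading space in ' C++'.
    skills = [s.strip() for s in data.get("skills") or []]
    name = data.get("name") or "Unknown"
    years = data.get("total_exp") or 0
    top_skills = ", ".join(skills[:3])
    return f"{name}: {years} yr exp, {len(skills)} skills ({top_skills})"


# Shortened copy of the example output in this README.
sample = {
    "name": "Brian Njoroge",
    "skills": ["Python", " C++", "Power BI", "Tensorflow"],
    "total_exp": 3,
}
print(summarize(sample))
# prints: Brian Njoroge: 3 yr exp, 4 skills (Python, C++, Power BI)
```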
You can use the following notebook to train the spaCy model on your own data. The notebook trains the model to identify degrees in a given text: https://colab.research.google.com/drive/1aSn5tMWU2Lbo4eEPi0GvkBC_003mXxqi?usp=sharing