This repository is a collection of algorithms for multi-class classification of short texts using Python. Modules are backward compatible unless otherwise specified. Suggestions are welcome.
To install it, use pip in a console:
>>> pip install -U shorttext
or, if you want the latest code that has not yet been released on PyPI, type
>>> pip install -U git+https://github.com/stephenhky/PyShortTextCategorization@master
Developers are advised to make sure that Keras >=2 is installed. Users are advised to install a backend, TensorFlow (preferred) or Theano, in advance.
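As a quick sanity check before using the package, the following minimal sketch reports which of the dependencies mentioned above can be imported in the current environment. It assumes Python 3.4 or later and uses only the standard library; the helper name `check_dependencies` is hypothetical and not part of shorttext. It checks importability only, not version numbers.

```python
from importlib import util


def check_dependencies(packages=("keras", "tensorflow", "theano", "spacy")):
    """Return a dict mapping each package name to True if it can be found.

    Hypothetical helper for illustration; it only looks the package up
    and does not import it or verify its version.
    """
    return {name: util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = check_dependencies()
    for name, found in status.items():
        print("{}: {}".format(name, "found" if found else "missing"))
    # Keras needs at least one backend to be present.
    if not (status["tensorflow"] or status["theano"]):
        print("No Keras backend found; install TensorFlow (preferred) or Theano.")
```

If a package is reported as missing, install it with pip before importing shorttext.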
Before using the package, make sure that the spaCy language model has been installed or updated by running:
>>> spacy download en
See the tutorial for how to use the package.
To report any issues, go to the Issues tab of the GitHub page and start a thread. Developers are welcome to submit pull requests to fix any errors.
- Documentation: http://shorttext.readthedocs.io
- Github: https://github.com/stephenhky/PyShortTextCategorization
- PyPI: https://pypi.org/project/shorttext/
- "Short Text Mining using Advanced Keras Layers and Maxent: shorttext 0.4.1," WordPress
- "Python Package for Short Text Mining", WordPress
- "Document-Term Matrix: Text Mining in R and Python," WordPress
- An earlier version of this repository is a demonstration of the following blog post: Short Text Categorization using Deep Neural Networks and Word-Embedding Models
- 02/27/2018: shorttext 0.6.0 released.
- 01/19/2018: shorttext 0.5.11 released.
- 01/15/2018: shorttext 0.5.10 released.
- 12/14/2017: shorttext 0.5.9 released.
- 11/08/2017: shorttext 0.5.8 released.
- 10/27/2017: shorttext 0.5.7 released.
- 10/17/2017: shorttext 0.5.6 released.
- 09/28/2017: shorttext 0.5.5 released.
- 09/08/2017: shorttext 0.5.4 released.
- 09/02/2017: end of GSoC project. (Report)
- 08/22/2017: shorttext 0.5.1 released.
- 07/28/2017: shorttext 0.4.1 released.
- 07/26/2017: shorttext 0.4.0 released.
- 06/16/2017: shorttext 0.3.8 released.
- 06/12/2017: shorttext 0.3.7 released.
- 06/02/2017: shorttext 0.3.6 released.
- 05/30/2017: GSoC project (Chinmaya Pancholi, with gensim)
- 05/16/2017: shorttext 0.3.5 released.
- 04/27/2017: shorttext 0.3.4 released.
- 04/19/2017: shorttext 0.3.3 released.
- 03/28/2017: shorttext 0.3.2 released.
- 03/14/2017: shorttext 0.3.1 released.
- 02/23/2017: shorttext 0.2.1 released.
- 12/21/2016: shorttext 0.2.0 released.
- 11/25/2016: shorttext 0.1.2 released.
- 11/21/2016: shorttext 0.1.1 released.
- Spelling corrections and fuzzy logic;
- Gradually replacing keras with direct TensorFlow or the keras package within TensorFlow;
- Jupyter notebooks as tutorials;
- Python 3 compatibility;
- More neural networks;
- More available corpora.