NLPSig (`nlpsig`) is a Python package for constructing streams/paths of
embeddings obtained from transformers. The key contributions are:
- A simple API for taking streams of textual data and constructing streams of
  embeddings from transformers
  - The `nlpsig.SentenceEncoder` and `nlpsig.TextEncoder` classes allow you to pass in a corpus of text data (in a variety of formats) and obtain corresponding embeddings using the `sentence-transformers` and HuggingFace `transformers` libraries, respectively.
  - The `nlpsig.PrepareData` class allows you to easily construct paths/streams of embeddings which can be used for several downstream tasks.
- A simple API for performing dimensionality reduction on the embeddings obtained from transformers with `nlpsig.DimReduce`, which provides simple wrappers over popular dimensionality reduction algorithms such as PCA, UMAP and t-SNE.
  - This is particularly useful if we wish to use path signatures in any downstream model, since the dimensionality of the embeddings obtained from transformers is usually very high.
- We present some Signature Network models for longitudinal NLP tasks in the
  `sig-networks` library, which uses the paths constructed by this library as inputs to neural networks that utilise path signature methodology.
- We also have simple classes for constructing train/test splits of the data
  and for K-fold cross-validation. These are general and are applied to
  examples in the Signature Networks in the `sig-networks` library.
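
The idea behind the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the `nlpsig` API: `build_paths` and `signature_size` are hypothetical helpers, the zero-padding scheme is one possible choice, and the embedding size of 768 is used purely as a typical transformer dimension. The sketch groups timestamped embeddings into fixed-length paths, then counts how many terms a truncated path signature would have, which shows why reducing the embedding dimension first matters.

```python
from collections import defaultdict


def build_paths(records, k, dim):
    """Group (stream_id, timestamp, embedding) records into fixed-length paths.

    Hypothetical helper for illustration; this is not the nlpsig API.
    Each path holds the last `k` embeddings of a stream in time order,
    left-padded with zero vectors when the stream is shorter than `k`.
    """
    streams = defaultdict(list)
    for stream_id, timestamp, embedding in records:
        streams[stream_id].append((timestamp, embedding))

    paths = {}
    for stream_id, items in streams.items():
        items.sort(key=lambda pair: pair[0])  # order by timestamp
        recent = [list(emb) for _, emb in items[-k:]]
        padding = [[0.0] * dim for _ in range(k - len(recent))]
        paths[stream_id] = padding + recent
    return paths


def signature_size(dim, depth):
    """Number of terms in a path signature truncated at `depth` (levels 1..depth)."""
    return sum(dim**level for level in range(1, depth + 1))


# Toy data: two streams with 2-dimensional "embeddings" (made-up values).
records = [
    ("user_a", 2, [0.3, 0.4]),
    ("user_a", 1, [0.1, 0.2]),
    ("user_b", 1, [0.9, 0.8]),
]
paths = build_paths(records, k=2, dim=2)
print(paths["user_a"])          # [[0.1, 0.2], [0.3, 0.4]]
print(paths["user_b"])          # [[0.0, 0.0], [0.9, 0.8]]

print(signature_size(768, 3))   # 453575424 terms without reduction
print(signature_size(10, 3))    # 1110 terms after reducing to 10 dimensions
```

A depth-3 signature of a 768-dimensional path has hundreds of millions of terms, whereas the same signature of a 10-dimensional path has 1110, so reducing the dimension before computing signatures keeps downstream models tractable.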
NLPSig is used by the `sig-networks` library, as detailed in our EACL demo
paper, Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling.
NLPSig is available on PyPI and can be installed with `pip`:
pip install nlpsig
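
One possible way to confirm that the install succeeded is to query the package metadata from Python. The sketch below uses only the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and the distribution name `nlpsig` is taken from the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if nlpsig is installed, otherwise None.
print(installed_version("nlpsig"))
```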
To take advantage of `pre-commit`, which will automatically format your code and
run some basic checks before you commit:
pip install pre-commit # or brew install pre-commit on macOS
pre-commit install # will install a pre-commit hook into the git repo
After doing this, each time you commit, some linters will be applied to format
the codebase. You can also, or alternatively, run `pre-commit run --all-files`
to run the checks.
See CONTRIBUTING.md for more information on running the test
suite using `nox`.