Skip to content

Trained PyTorch models for polish language sentiment prediction based on allegro/herbert and CLARIN-PL datasets

License

Notifications You must be signed in to change notification settings

philvec/sentimentPL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sentimentPL

PyTorch models for Polish language sentiment regression based on allegro/herbert and CLARIN-PL dataset

PyPI - License PyPI GitHub Repo stars GitHub last commit

Installation

sentimentPL is available on PyPI, so You can just run:

$ pip3 install sentimentpl

Basic Usage

For a given sentence, the model produces output value from (-1;1) range (from most negative to most positive).

from sentimentpl.models import SentimentPLModel

model = SentimentPLModel(from_pretrained='latest')
print(model('Jestem wesoły Romek').item())

Note: The model uses transformers API to load pretrained embedding models from their repository. They should be downloaded and cached on Your machine.

Note: The model loads pretrained state dicts for final regression layers from a file included in the package files (as its size does not exceed 1MB). This will be changed in the future, so the model would be loaded from external repository.

Training

For training You would probably want to download the source code by cloning the repository:

$ git clone https://github.com/philvec/sentimentPL.git

Download training data from
https://clarin-pl.eu/dspace/bitstream/handle/11321/710/dataset_conll.zip
and unzip it to sentimentpl/data.

In the main repository dir, run

$ python3 ./sentimentpl/train.py

Version history

v.0.0.7 latest

fix bugs with incorrect scoring of longer texts: ReLU replaced with LeakyReLU, re-trained to MSE ~0.303

v.0.0.6 latest

model better trained to MSE ~0.307, added HerBERT finetuning option

v.0.0.5

Basic 3-layer MLP with ReLU and input Dropout.

References:

  • Kocoń, Jan; Zaśko-Zielińska, Monika and Miłkowski, Piotr, 2019, PolEmo 2.0 Sentiment Analysis Dataset for CoNLL, CLARIN-PL digital repository, http://hdl.handle.net/11321/710.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi,P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer,P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao,S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “Transformers:State-of-the-art natural language processing,” inProceedings of the2020 Conference on Empirical Methods in Natural LanguageProcessing: System Demonstrations, (Online), pp. 38–45, Associationfor Computational Linguistics, Oct. 2020.
  • P. Rybak, R. Mroczkowski, J. Tracz, and I. Gawlik, “Klej:Comprehensive benchmark for polish language understanding,” 2020

About

Trained PyTorch models for polish language sentiment prediction based on allegro/herbert and CLARIN-PL datasets

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages