Release 0.3.0
Breaking Changes
New Label class with confidence score (#38)
A tag prediction is no longer a simple string but a Label, which holds a value and a confidence score. To obtain the tag name, call tag.value. To get the score, call tag.score. This helps you build applications that only use predictions above a specific confidence threshold.
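As a rough illustration of the thresholding use case, the sketch below filters predictions by score. The Label class here is a hypothetical stand-in that only mirrors the two attributes described above (value and score); it is not flair's own class, and the helper name confident_labels and the 0.8 threshold are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical stand-in for a prediction: a tag value plus a confidence score."""
    value: str
    score: float


def confident_labels(labels, threshold=0.8):
    """Keep only the labels whose score is at or above the threshold."""
    return [label for label in labels if label.score >= threshold]


# toy predictions; in practice these would come from a tagger or classifier
predictions = [Label("PER", 0.97), Label("LOC", 0.55), Label("ORG", 0.81)]
kept = confident_labels(predictions, threshold=0.8)
print([label.value for label in kept])
```

With real predictions, the same comparison would be applied to the score of each predicted tag.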
LockedDropout moved to the new flair.nn module (#48)
New Features
Multi-token spans (#54, #97)
Entities can now be wrapped into multi-token spans (type: Span). This is helpful for entities that span multiple words, such as "George Washington". A Span contains the position of the entity in the original text, the tag, a confidence score, and its text. You can get spans from a sentence by using the get_spans() method, like so:
from flair.data import Sentence
from flair.models import SequenceTagger
# make a sentence
sentence = Sentence('George Washington went to Washington .')
# load and run NER
tagger = SequenceTagger.load('ner')
tagger.predict(sentence)
# get span entities, together with tag and confidence score
for entity in sentence.get_spans('ner'):
    print('{} {} {}'.format(entity.text, entity.tag, entity.score))
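To make the idea of a multi-token span more concrete, here is a minimal sketch of how token-level tags could be merged into spans. It assumes a simple BIO tagging scheme and averages the token scores to get a span score. Both choices, along with the names SimpleSpan and group_spans, are assumptions for illustration and do not describe flair's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimpleSpan:
    """Hypothetical stand-in for a multi-token span."""
    text: str
    tag: str
    score: float


def group_spans(tagged: List[Tuple[str, str, float]]) -> List[SimpleSpan]:
    """Merge consecutive B-/I- tagged tokens of the same type into spans."""
    groups = []      # each entry: (entity type, [(token, score), ...])
    inside = False   # True while the previous token belonged to a span
    for token, tag, score in tagged:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and inside and groups[-1][0] == label:
            # continue the span that is currently open
            groups[-1][1].append((token, score))
        elif prefix in ("B", "I"):
            # start a new span (a stray I- tag is treated like B-)
            groups.append((label, [(token, score)]))
            inside = True
        else:
            # an O tag closes any open span
            inside = False
    return [
        SimpleSpan(
            text=" ".join(tok for tok, _ in items),
            tag=label,
            score=sum(s for _, s in items) / len(items),  # assumed: mean of token scores
        )
        for label, items in groups
    ]


tagged_tokens = [
    ("George", "B-PER", 0.99), ("Washington", "I-PER", 0.97),
    ("went", "O", 0.99), ("to", "O", 0.99),
    ("Washington", "B-LOC", 0.95), (".", "O", 0.99),
]
spans = group_spans(tagged_tokens)
for span in spans:
    print(span.text, span.tag, round(span.score, 2))
```

The two occurrences of "Washington" end up in different spans because the first continues a PER entity while the second starts a LOC entity.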
Predictions with confidence score (#38)
Predicted tags are no longer simple strings, but objects of type Label that contain a value and a confidence score. These scores are extracted during prediction from the sequence tagger or text classifier and indicate how confident the model is of a prediction. Print confidence scores of tags like this:
from flair.data import Sentence
from flair.models import SequenceTagger
# make a sentence
sentence = Sentence('George Washington went to Washington .')
# load the POS tagger
tagger = SequenceTagger.load('pos')
# run POS over sentence
tagger.predict(sentence)
# print token, predicted POS tag and confidence score
for token in sentence:
    print('{} {} {}'.format(token.text, token.get_tag('pos').value, token.get_tag('pos').score))
Visualization routines (#61)
flair now includes visualizations for plotting training curves and weights when training a sequence tagger or text classifier. We also added visualization routines for plotting embeddings and highlighting tags in a sentence. For instance, to visualize contextual string embeddings, do this:
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
from flair.embeddings import CharLMEmbeddings
from flair.visual import Visualizer
# get a list of Sentence objects
corpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03).downsample(0.1)
sentences = corpus.train + corpus.test + corpus.dev
# init embeddings (can also be a StackedEmbedding)
embeddings = CharLMEmbeddings('news-forward-fast')
# embed corpus batch-wise
batches = [sentences[x:x + 8] for x in range(0, len(sentences), 8)]
for batch in batches:
    embeddings.embed(batch)
# visualize
visualizer = Visualizer()
visualizer.visualize_word_emeddings(embeddings, sentences, 'data/visual/embeddings.html')
Implementation of different dropouts (#48)
Different dropout possibilities (Locked Dropout and Word Dropout) were added and can be used during training.
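The difference between the two variants can be sketched in plain Python on a sequence of token vectors. This is only an illustration of the general techniques, assuming the usual definitions: word dropout zeroes out whole token vectors, while locked dropout samples one mask over the feature dimensions and reuses it for every token in the sequence. The function names and the list-of-lists representation are assumptions; flair's own modules operate on tensors.

```python
import random


def word_dropout(sequence, rate, rng):
    """Zero out entire token vectors, each with probability `rate`."""
    return [
        [0.0] * len(vec) if rng.random() < rate else list(vec)
        for vec in sequence
    ]


def locked_dropout(sequence, rate, rng):
    """Sample one feature mask and apply the same mask to every token.

    Kept features are rescaled by 1 / (1 - rate), as in standard dropout.
    Expects 0 <= rate < 1.
    """
    if not sequence:
        return []
    keep = 1.0 - rate
    dim = len(sequence[0])
    mask = [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(dim)]
    return [[v * m for v, m in zip(vec, mask)] for vec in sequence]


rng = random.Random(0)  # fixed seed so runs are repeatable
sequence = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
print(word_dropout(sequence, 0.5, rng))
print(locked_dropout(sequence, 0.5, rng))
```

In the locked variant, a feature dimension is either dropped for all tokens or kept for all tokens, which is what distinguishes it from applying independent dropout at every position.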
Memory management for training on large data sets (#137)
flair now stores contextual string embeddings on disk to speed up training and allow for training on larger datasets.
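The general idea of a disk-backed embedding cache can be sketched with the standard library. Everything in this example is an assumption made for illustration: the class name DiskEmbeddingCache, the SQLite table layout, JSON serialization, and the toy_embed function that stands in for an expensive embedding model. It does not reflect how flair implements its cache.

```python
import json
import os
import sqlite3
import tempfile


class DiskEmbeddingCache:
    """Hypothetical on-disk cache mapping a sentence string to its vectors."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (text TEXT PRIMARY KEY, vectors TEXT)"
        )

    def get_or_compute(self, text, compute):
        """Return cached vectors for `text`, computing and storing them on a miss."""
        row = self.conn.execute(
            "SELECT vectors FROM cache WHERE text = ?", (text,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        vectors = compute(text)
        self.conn.execute(
            "INSERT INTO cache (text, vectors) VALUES (?, ?)",
            (text, json.dumps(vectors)),
        )
        self.conn.commit()
        return vectors

    def close(self):
        self.conn.close()


calls = []  # records how often the expensive function actually runs


def toy_embed(text):
    """Stand-in for an expensive embedding model: one 1-d vector per token."""
    calls.append(text)
    return [[float(len(token))] for token in text.split()]


cache_dir = tempfile.mkdtemp()
cache = DiskEmbeddingCache(os.path.join(cache_dir, "embeddings.sqlite"))
first = cache.get_or_compute("George Washington", toy_embed)
second = cache.get_or_compute("George Washington", toy_embed)  # served from disk
print(first == second, len(calls))
cache.close()
```

The second lookup does not call the embedding function again, which is where the training speed-up over repeated epochs would come from.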
Pre-trained language models for Polish
Added pre-trained language models for Polish, donated by Borchmann et al. (2018). Load the Polish embeddings like this:
flm_embeddings = CharLMEmbeddings('polish-forward')
blm_embeddings = CharLMEmbeddings('polish-backward')
Bug Fixes
Fix evaluation of sequence tagger (#79, #75)
The eval.pl script used to evaluate the sequence tagger contained bugs. flair now uses its own evaluation methods.
Fix bugs in text classifier (#108)
Fixed bugs in single-label training and out-of-memory errors during evaluation.
Others
Standardize logging output (#16)
Logging output for the sequence tagger and text classifier is improved and standardized.
Update torch version (#34, #106)
flair now uses torch version 0.4.1.
Updated documentation (#138, #89)
Expanded documentation and tutorials.