Breaking Changes

New `Label` class with confidence score (#38)

A tag prediction is no longer a plain string but a Label object that holds a value and a confidence score.
To obtain the tag name, call tag.value; to get the score, call tag.score. This lets you build
applications that only use predictions above a specific confidence threshold.
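
As a minimal sketch of such threshold filtering, the snippet below uses a stand-in Label dataclass rather than flair's own class. Only the value and score attributes come from the notes above; the helper name confident_labels and the 0.8 cutoff are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Label:
    """Stand-in for flair's Label: a tag value plus a confidence score."""
    value: str
    score: float


def confident_labels(labels: Iterable[Label], threshold: float) -> List[Label]:
    """Keep only the labels whose score is at least `threshold`."""
    return [label for label in labels if label.score >= threshold]


# Hypothetical predictions; with flair these would come from token.get_tag(...)
predictions = [Label("PER", 0.97), Label("LOC", 0.55), Label("ORG", 0.81)]

for label in confident_labels(predictions, threshold=0.8):
    print(label.value, label.score)
```

The right cutoff depends on the model and the application, so it is best treated as a tunable parameter.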

`LockedDropout` moved to the new `flair.nn` module (#48)

New Features

Multi-token spans (#54, #97)

Entities can now be wrapped into multi-token spans (type: Span). This is helpful for entities that span multiple words, such as "George Washington". A Span contains the position of the entity in the original text, the tag, a confidence score, and its text. You can get spans from a sentence by using the get_spans() method, like so:

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load and run NER
tagger = SequenceTagger.load('ner')
tagger.predict(sentence)

# get span entities, together with tag and confidence score
for entity in sentence.get_spans('ner'):
    print('{} {} {}'.format(entity.text, entity.tag, entity.score))

Predictions with confidence score (#38)

Predicted tags are no longer simple strings, but objects of type Label that contain a value and a confidence score. These scores are extracted during prediction from the sequence tagger or text classifier and indicate how confident the model is of a prediction. Print confidence scores of tags like this:

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load the POS tagger
tagger = SequenceTagger.load('pos')

# run POS over sentence
tagger.predict(sentence)

# print token, predicted POS tag and confidence score
for token in sentence:
    print('{} {} {}'.format(token.text, token.get_tag('pos').value, token.get_tag('pos').score))

Visualization routines (#61)

flair now includes visualizations for plotting training curves and weights when training a sequence tagger or text classifier. We also added visualization routines for plotting embeddings and highlighting tags in a sentence. For instance, to visualize contextual string embeddings, do this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
from flair.embeddings import CharLMEmbeddings
from flair.visual import Visualizer

# get a list of Sentence objects
corpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03).downsample(0.1)
sentences = corpus.train + corpus.test + corpus.dev

# init embeddings (can also be a StackedEmbedding)
embeddings = CharLMEmbeddings('news-forward-fast')

# embed corpus batch-wise
batches = [sentences[x:x + 8] for x in range(0, len(sentences), 8)]
for batch in batches:
    embeddings.embed(batch)

# visualize
visualizer = Visualizer()
visualizer.visualize_word_emeddings(embeddings, sentences, 'data/visual/embeddings.html')

Implementation of different dropouts (#48)

Two dropout variants, Locked Dropout and Word Dropout, were added and can be used during training.
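
To illustrate the difference between the two variants, here is a conceptual sketch in plain Python. It is not flair's implementation: it assumes that locked dropout samples one mask over embedding dimensions and reuses it for every token, that word dropout zeroes whole token embeddings, and that kept dimensions are rescaled as in standard dropout. The function names are made up for this example.

```python
import random
from typing import List

Matrix = List[List[float]]  # one row per token, one column per embedding dimension


def locked_dropout(sequence: Matrix, p: float, rng: random.Random) -> Matrix:
    """Sample one mask over dimensions and reuse it for every token (0 <= p < 1)."""
    if not sequence:
        return []
    dim = len(sequence[0])
    # Assumption: kept dimensions are rescaled by 1 / (1 - p), as in standard dropout.
    mask = [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(dim)]
    return [[x * m for x, m in zip(row, mask)] for row in sequence]


def word_dropout(sequence: Matrix, p: float, rng: random.Random) -> Matrix:
    """Zero out whole token embeddings, each independently with probability p."""
    return [[0.0] * len(row) if rng.random() < p else list(row) for row in sequence]


rng = random.Random(0)
sentence = [[1.0, 2.0, 3.0, 4.0] for _ in range(5)]  # 5 tokens, 4 dimensions
print(locked_dropout(sentence, p=0.5, rng=rng))
print(word_dropout(sentence, p=0.5, rng=rng))
```

In the first output the same columns are zeroed in every row; in the second, each row is either untouched or entirely zero.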

Memory management for training on large data sets (#137)

flair now stores contextual string embeddings on disk to speed up training and allow for training on larger datasets.
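
The general idea of caching embeddings on disk can be sketched as follows. This is a simplified illustration, not flair's storage format or API: the DiskEmbeddingCache class, the pickle-per-text layout, and the toy_embed placeholder are all assumptions.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Callable, List


class DiskEmbeddingCache:
    """Hypothetical helper: compute an embedding once, then reload it from disk."""

    def __init__(self, directory: Path, embed_fn: Callable[[str], List[float]]):
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)
        self.embed_fn = embed_fn

    def _path(self, text: str) -> Path:
        # Hash the text so that any string maps to a safe file name.
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        return self.directory / (digest + ".pkl")

    def get(self, text: str) -> List[float]:
        path = self._path(text)
        if path.exists():
            with path.open("rb") as handle:
                return pickle.load(handle)
        vector = self.embed_fn(text)  # the expensive step, e.g. a language model pass
        with path.open("wb") as handle:
            pickle.dump(vector, handle)
        return vector


def toy_embed(text: str) -> List[float]:
    """Placeholder for an expensive embedding computation."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = DiskEmbeddingCache(Path(tempfile.mkdtemp()), toy_embed)
first = cache.get("George Washington")   # computed and written to disk
second = cache.get("George Washington")  # read back from disk
print(first == second)
```

Keeping the vectors on disk rather than in memory trades some read time for a much smaller memory footprint, which is what makes larger training sets feasible.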

Pre-trained language models for Polish

Added pre-trained language models for Polish, donated by Borchmann et al. (2018). Load the Polish embeddings like this:

flm_embeddings = CharLMEmbeddings('polish-forward')
blm_embeddings = CharLMEmbeddings('polish-backward')

Bug Fixes

Fix evaluation of sequence tagger (#79, #75)

The eval.pl script used to evaluate the sequence tagger contained bugs. flair now uses its own evaluation methods.
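
These notes do not show flair's evaluation code. As a rough sketch of what span-level scoring for a sequence tagger commonly looks like, the snippet below computes exact-match precision, recall, and F1 over sets of spans. The span_f1 function and the (start, end, tag) tuple layout are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index, tag)


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match scoring: a predicted span counts only if position and tag agree."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold and predicted spans for 'George Washington went to Washington .'
gold = {(0, 1, "PER"), (4, 4, "LOC")}
predicted = {(0, 1, "PER"), (4, 4, "ORG")}
print(span_f1(gold, predicted))
```

Here one of the two predicted spans matches a gold span exactly, so precision, recall, and F1 are all 0.5.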

Fix bugs in text classifier (#108)

Fixed bugs in single label training and out-of-memory errors during evaluation.

Others

Standardize logging output (#16)

Logging output for the sequence tagger and text classifier is improved and standardized.

Update torch version (#34, #106)

flair now uses torch version 0.4.1.

Updated documentation (#138, #89)

Expanded documentation and tutorials.

Release 0.3.0
