
Release 0.3.0

alanakbik released this on 16 Oct 14:41
d7c0f17

Breaking Changes

New Label class with confidence score (#38)

A tag prediction is no longer a simple string but a Label, which holds a value and a confidence score.
To obtain the tag name, access tag.value; to get the score, access tag.score. This lets you build
applications that only use predictions above a specific confidence threshold.
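The thresholding use case can be sketched without loading a model. In this minimal example, `Label` is a hypothetical stand-in that only mirrors the `value` and `score` attributes described above (it is not flair's class). The tag names, scores, and the 0.8 threshold are made-up illustrative values.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for flair's Label: only the `value` and `score`
# attributes described in these release notes are mirrored here.
@dataclass
class Label:
    value: str
    score: float


def confident_labels(labels: List[Label], threshold: float = 0.8) -> List[Label]:
    """Keep only the labels whose confidence score reaches the threshold."""
    return [label for label in labels if label.score >= threshold]


# illustrative predictions, not real model output
predictions = [Label('B-PER', 0.97), Label('O', 0.55), Label('S-LOC', 0.81)]
for label in confident_labels(predictions):
    print(label.value, label.score)
```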

LockedDropout moved to the new flair.nn module (#48)

New Features

Multi-token spans (#54, #97)

Entities can now be wrapped into multi-token spans (type: Span). This is helpful for entities that span multiple words, such as "George Washington". A Span contains the position of the entity in the original text, its tag, a confidence score, and its text. You can get spans from a sentence by using the get_spans() method, like so:

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load and run NER
tagger = SequenceTagger.load('ner')
tagger.predict(sentence)

# get span entities, together with tag and confidence score
for entity in sentence.get_spans('ner'):
    print('{} {} {}'.format(entity.text, entity.tag, entity.score))
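To show how per-token tags can be grouped into multi-token spans, here is a minimal pure-Python sketch. It is not flair's implementation: it assumes a BIOES tagging scheme (B- begins, I- continues, E- ends an entity, S- marks a single-token entity, O marks no entity), and it assigns each span the mean of its token scores, which is an illustrative choice rather than flair's documented behaviour. The token scores below are invented.

```python
from typing import List, Optional, Tuple

# (text, tag, score) per token; tags are assumed to follow BIOES.
Token = Tuple[str, str, float]
# (span text, entity type, mean score, start index, end index exclusive)
SpanTuple = Tuple[str, str, float, int, int]


def group_spans(tokens: List[Token]) -> List[SpanTuple]:
    spans: List[SpanTuple] = []
    current: List[int] = []
    current_type: Optional[str] = None

    def close() -> None:
        # emit the open span (if any) and reset the running state
        nonlocal current, current_type
        if current:
            words = [tokens[i][0] for i in current]
            scores = [tokens[i][2] for i in current]
            spans.append((' '.join(words), current_type, sum(scores) / len(scores),
                          current[0], current[-1] + 1))
        current, current_type = [], None

    for i, (_text, tag, _score) in enumerate(tokens):
        if tag == 'O':
            close()
            continue
        prefix, _, entity_type = tag.partition('-')
        # B- and S- always open a new span; so does a change of entity type
        if prefix in ('B', 'S') or entity_type != current_type:
            close()
        current.append(i)
        current_type = entity_type
        # E- and S- end the span they belong to
        if prefix in ('E', 'S'):
            close()
    close()  # flush a span left open at the end of the sentence
    return spans


tokens = [('George', 'B-PER', 0.9), ('Washington', 'E-PER', 0.8),
          ('went', 'O', 0.99), ('to', 'O', 0.99),
          ('Washington', 'S-LOC', 0.7), ('.', 'O', 0.99)]
for text, entity_type, score, start, end in group_spans(tokens):
    print('{} {} {:.2f} [{}:{}]'.format(text, entity_type, score, start, end))
```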

Predictions with confidence score (#38)

Predicted tags are no longer simple strings, but objects of type Label that contain a value and a confidence score. These scores are produced during prediction by the sequence tagger or text classifier and indicate how confident the model is in each prediction. Print confidence scores of tags like this:

from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('George Washington went to Washington .')

# load the POS tagger
tagger = SequenceTagger.load('pos')

# run POS over sentence
tagger.predict(sentence)

# print token, predicted POS tag and confidence score
for token in sentence:
    print('{} {} {}'.format(token.text, token.get_tag('pos').value, token.get_tag('pos').score))
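One common way to turn a model's raw per-tag scores into a confidence value is a softmax over the tag scores, reporting the probability of the highest-scoring tag. The release notes do not say exactly how flair computes its scores, so treat this as an assumed, generic sketch; the function name, tag names and logits are hypothetical.

```python
import math
from typing import Dict, Tuple


def predict_with_confidence(logits: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring tag and its softmax probability."""
    # subtract the largest logit before exponentiating for numerical stability
    max_logit = max(logits.values())
    exp_scores = {tag: math.exp(value - max_logit) for tag, value in logits.items()}
    total = sum(exp_scores.values())
    best_tag = max(exp_scores, key=exp_scores.get)
    return best_tag, exp_scores[best_tag] / total


# hypothetical raw scores for one token
tag, score = predict_with_confidence({'NNP': 4.1, 'NN': 2.3, 'VB': -0.5})
print(tag, round(score, 3))
```

With a softmax, the confidence always lies between 0 and 1 and shrinks as competing tags score closer to the winner.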

Visualization routines (#61)

flair now includes routines for plotting training curves and weights when training a sequence tagger or text classifier. We also added visualization routines for plotting embeddings and highlighting tags in a sentence. For instance, to visualize contextual string embeddings, do this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
from flair.embeddings import CharLMEmbeddings
from flair.visual import Visualizer

# get a list of Sentence objects
corpus = NLPTaskDataFetcher.fetch_data(NLPTask.CONLL_03).downsample(0.1)
sentences = corpus.train + corpus.test + corpus.dev

# init embeddings (can also be a StackedEmbedding)
embeddings = CharLMEmbeddings('news-forward-fast')

# embed corpus batch-wise
batches = [sentences[x:x + 8] for x in range(0, len(sentences), 8)]
for batch in batches:
    embeddings.embed(batch)

# visualize
visualizer = Visualizer()
visualizer.visualize_word_emeddings(embeddings, sentences, 'data/visual/embeddings.html')
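The list comprehension in the visualization example builds every batch up front. A hypothetical `batched` helper (not part of flair; Python 3.12 ships a similar `itertools.batched` that yields tuples) produces the same batches lazily, which avoids holding a second list of slices in memory:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar('T')


def batched(items: Sequence[T], batch_size: int = 8) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# strings stand in for Sentence objects here
sentences = ['sentence {}'.format(i) for i in range(20)]
print([len(batch) for batch in batched(sentences)])
```

In the example above, `for batch in batched(sentences): embeddings.embed(batch)` would then replace the precomputed `batches` list.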

Implementation of different dropouts (#48)

Two dropout variants, locked dropout and word dropout, were added and can be used during training.
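The difference between the two variants is in how the dropout mask is shared. Locked dropout samples one mask per sequence and reuses it at every time step, while word dropout makes one keep/drop decision per token and zeroes the token's entire vector. The pure-Python sketch below illustrates both ideas under stated assumptions: batches are nested lists indexed as `[sequence][time_step][feature]`, locked dropout rescales kept values by `1 / (1 - p)` as in standard inverted dropout, and word dropout does not rescale. It is not flair's `flair.nn` code.

```python
import random
from typing import List

# Assumed layout: batch[sequence][time_step][feature]
Batch = List[List[List[float]]]


def _check_rate(p: float) -> None:
    if not 0.0 <= p < 1.0:
        raise ValueError('dropout rate must be in [0, 1)')


def locked_dropout(batch: Batch, p: float, rng: random.Random) -> Batch:
    """Sample one feature mask per sequence and reuse it at every time step."""
    _check_rate(p)
    keep = 1.0 - p
    out = []
    for sequence in batch:
        if not sequence:
            out.append([])
            continue
        # kept features are rescaled so the expected value is unchanged
        mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in sequence[0]]
        out.append([[x * m for x, m in zip(step, mask)] for step in sequence])
    return out


def word_dropout(batch: Batch, p: float, rng: random.Random) -> Batch:
    """Zero entire word vectors: one keep/drop decision per token."""
    _check_rate(p)
    out = []
    for sequence in batch:
        out.append([list(step) if rng.random() >= p else [0.0] * len(step)
                    for step in sequence])
    return out


rng = random.Random(42)
batch = [[[1.0, 1.0, 1.0] for _ in range(4)]]  # one sentence, 4 tokens, 3 features
print(locked_dropout(batch, 0.5, rng))
print(word_dropout(batch, 0.5, rng))
```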

Memory management for training on large data sets (#137)

flair now stores contextual string embeddings on disk to speed up training and allow training on larger datasets.
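The release notes do not describe the on-disk format, so the following is only a generic sketch of the idea: compute each sentence's embedding once, store it on disk, and read it back on later epochs instead of recomputing it. It assumes Python's standard `shelve` module keyed by sentence text; `DiskEmbeddingCache` and `fake_embed` are hypothetical names, and `fake_embed` stands in for an expensive language model.

```python
import os
import shelve
import tempfile
from typing import Callable, List

Vector = List[float]


class DiskEmbeddingCache:
    """Hypothetical disk-backed cache: each text is embedded at most once."""

    def __init__(self, path: str, embed_fn: Callable[[str], Vector]) -> None:
        self._db = shelve.open(path)
        self._embed_fn = embed_fn

    def get(self, text: str) -> Vector:
        if text not in self._db:
            # cache miss: compute once and persist to disk
            self._db[text] = self._embed_fn(text)
        return self._db[text]

    def close(self) -> None:
        self._db.close()


calls: List[str] = []


def fake_embed(text: str) -> Vector:
    """Stand-in for an expensive language model; records every call."""
    calls.append(text)
    return [float(len(text)), 0.0]


with tempfile.TemporaryDirectory() as tmp:
    cache = DiskEmbeddingCache(os.path.join(tmp, 'embeddings'), fake_embed)
    first = cache.get('George Washington went to Washington .')
    second = cache.get('George Washington went to Washington .')  # served from disk
    cache.close()

print(len(calls), first == second)
```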

Pre-trained language models for Polish

Added pre-trained language models for Polish, donated by Borchmann et al. (2018). Load the Polish embeddings like this:

from flair.embeddings import CharLMEmbeddings

flm_embeddings = CharLMEmbeddings('polish-forward')
blm_embeddings = CharLMEmbeddings('polish-backward')

Bug Fixes

Fix evaluation of sequence tagger (#79, #75)

The eval.pl script used to evaluate the sequence tagger contained bugs. flair now uses its own evaluation methods.
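For sequence tagging tasks such as CoNLL-03 NER, results are conventionally reported as span-level precision, recall and F1, where a predicted entity only counts if its boundaries and type exactly match a gold entity. The sketch below computes that metric; it illustrates the general technique and is not flair's evaluation code. The `(sentence_id, start, end, tag)` span encoding is an assumption.

```python
from typing import Set, Tuple

# Assumed span encoding: (sentence_id, start, end, tag), end exclusive.
Span = Tuple[int, int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 where only exact span matches count."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 0, 2, 'PER'), (0, 4, 5, 'LOC')}
# the second span has the right boundaries but the wrong type, so it is an error
predicted = {(0, 0, 2, 'PER'), (0, 4, 5, 'PER')}
print(span_f1(gold, predicted))
```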

Fix bugs in text classifier (#108)

Fixed bugs in single-label training and out-of-memory errors during evaluation.

Others

Standardize logging output (#16)

Logging output for the sequence tagger and text classifier is improved and standardized.

Update torch version (#34, #106)

flair now uses torch version 0.4.1.

Updated documentation (#138, #89)

Expanded documentation and tutorials.