recreate `to_dict` and add relations #3271

helpmefindaname · 2023-06-19T07:18:50Z

partial revert of decd3a6

This cleans up the `to_dict()` interface, adding options to get all labels, entities, and relations of a sentence.

turian · 2023-07-13T16:46:04Z

There are a few things I'd love to see in this PR, which would make me run it instead of main.

You seem to be dropping labels from the pos tagger by default. Why?
I'd really like the token `start_position` and `end_position` in the dict, so I can recover the original text from a list of tokens. (`token.text` would be nice but unnecessary.)

Try this code:

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

print(sentence.to_dict())

with your branch I get:

{'text': 'I love Berlin.', 'labels': [], 'entities': [], 'relations': []}

with main I get:

{'text': 'I love Berlin.', 'all labels': [{'value': 'PRP', 'confidence': 0.9999862909317017}, {'value': 'VBP', 'confidence': 0.999794065952301}, {'value': 'NNP', 'confidence': 0.9999107122421265}, {'value': '.', 'confidence': 0.9998101592063904}]}

Here is a little workaround that works with main:

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

# collect the (start, end) character offsets of every token
token_list = [(token.start_position, token.end_position) for token in sentence]

sentence_dict = sentence.to_dict()
sentence_dict["token positions"] = token_list
assert len(sentence_dict["token positions"]) == len(sentence_dict["all labels"])
print(sentence_dict)

Which gives:

{'text': 'I love Berlin.', 'all labels': [{'value': 'PRP', 'confidence': 0.9999862909317017}, {'value': 'VBP', 'confidence': 0.999794065952301}, {'value': 'NNP', 'confidence': 0.9999107122421265}, {'value': '.', 'confidence': 0.9998101592063904}], 'token positions': [(0, 1), (2, 6), (7, 13), (13, 14)]}

That's what I really want!
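To illustrate why the positions are useful, here is a minimal sketch that slices each token's text back out of the sentence and pairs it with its label. It assumes the dict shape produced by the workaround above (the `"token positions"` key is the one added by hand there, not something `to_dict()` returns), and the helper name `tokens_with_labels` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def tokens_with_labels(sentence_dict: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each (start, end) span with its label and slice its text out of the sentence.

    Hypothetical helper: expects the hand-built "token positions" key from the
    workaround, alongside the "all labels" key that main currently emits.
    """
    text = sentence_dict["text"]
    positions = sentence_dict["token positions"]
    labels = sentence_dict["all labels"]
    if len(positions) != len(labels):
        raise ValueError("expected exactly one label per token")
    return [
        {"text": text[start:end], "start": start, "end": end, "value": label["value"]}
        for (start, end), label in zip(positions, labels)
    ]


# Dict copied from the workaround output above.
example = {
    "text": "I love Berlin.",
    "all labels": [
        {"value": "PRP", "confidence": 0.9999862909317017},
        {"value": "VBP", "confidence": 0.999794065952301},
        {"value": "NNP", "confidence": 0.9999107122421265},
        {"value": ".", "confidence": 0.9998101592063904},
    ],
    "token positions": [(0, 1), (2, 6), (7, 13), (13, 14)],
}

tokens = tokens_with_labels(example)
print([(t["text"], t["value"]) for t in tokens])
```

Because the offsets index into `text`, the token strings do not need to be stored separately, which is why `token.text` is only a nice-to-have.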

alanakbik · 2023-08-08T14:49:19Z

@helpmefindaname thanks for adding this!

helpmefindaname linked an issue Jun 19, 2023 that may be closed by this pull request

[Question]: sentence.to_dict(tag_type='ner') no longer have the 'entities' key #3265

Closed

helpmefindaname mentioned this pull request Jun 19, 2023

[Question]: sentence.to_dict(tag_type='ner') no longer have the 'entities' key #3265

Closed

helpmefindaname force-pushed the 3265-question-sentenceto_dicttag_type=ner-no-longer-have-the-entities-key branch from 142a5f0 to e0fcd6b Compare July 12, 2023 21:51

Benedikt Fuchs added 3 commits August 7, 2023 17:29

recreate to_dict and add relations

72da117

add tokens

0e6c0b6

black formatting and ruff fixes

14d5a07

helpmefindaname force-pushed the 3265-question-sentenceto_dicttag_type=ner-no-longer-have-the-entities-key branch from 7f79e92 to 14d5a07 Compare August 7, 2023 15:29

alanakbik merged commit 856e072 into master Aug 8, 2023
1 check passed

alanakbik deleted the 3265-question-sentenceto_dicttag_type=ner-no-longer-have-the-entities-key branch August 8, 2023 14:49
