
recreate to_dict and add relations #3271


helpmefindaname
Collaborator

Closes #3265

partial revert of decd3a6

This cleans up the to_dict() interface, giving options to get all labels, entities and relations of a sentence.

@turian

turian commented Jul 13, 2023

There are a few things I'd love to see in this PR, which would make me run it instead of main.

  1. You seem to be dropping labels from the POS tagger by default. Why?
  2. I'd really like the token start_position and end_position in the dict, so I can recover the original text from a list of tokens. (token.text would be nice but unnecessary.)
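The second request can be illustrated without flair: given each token's text and its character offsets (start_position, end_position), the original string is rebuilt by filling every gap between consecutive tokens with spaces. This is a minimal sketch, not flair's API; it assumes tokens arrive in order as hypothetical (text, start, end) tuples and that every gap in the source was plain spaces (tabs or newlines cannot be recovered from offsets alone).

```python
from typing import List, Tuple

# Hypothetical token record (not a flair type): (token text, start_position, end_position)
Token = Tuple[str, int, int]


def reconstruct_text(tokens: List[Token]) -> str:
    """Rebuild the original string from ordered tokens and their character offsets.

    Assumes tokens are sorted by start offset, do not overlap, and that every
    gap between two tokens consisted of space characters in the source text.
    """
    parts: List[str] = []
    cursor = 0  # end offset of the previous token
    for text, start, end in tokens:
        if start < cursor:
            raise ValueError(f"token {text!r} overlaps the previous token")
        if end - start != len(text):
            raise ValueError(f"offsets of token {text!r} do not match its length")
        parts.append(" " * (start - cursor))  # fill the gap before this token
        parts.append(text)
        cursor = end
    return "".join(parts)


# Offsets taken from the workaround output for "I love Berlin."
tokens = [("I", 0, 1), ("love", 2, 6), ("Berlin", 7, 13), (".", 13, 14)]
print(reconstruct_text(tokens))
```

The whitespace assumption is the main design limitation: offsets say where a gap is, not what it contained, so keeping the original text alongside the offsets remains the safer option.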

Try this code:

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

print(sentence.to_dict())

With your branch I get:

{'text': 'I love Berlin.', 'labels': [], 'entities': [], 'relations': []}

With main I get:

{'text': 'I love Berlin.', 'all labels': [{'value': 'PRP', 'confidence': 0.9999862909317017}, {'value': 'VBP', 'confidence': 0.999794065952301}, {'value': 'NNP', 'confidence': 0.9999107122421265}, {'value': '.', 'confidence': 0.9998101592063904}]}
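The two outputs differ in shape as well as content: main puts the per-token POS tags under an 'all labels' key, while the branch exposes 'labels', 'entities' and 'relations' and returns an empty 'labels' list for this sentence. A minimal sketch of a consumer that tolerates both shapes, using only the two dicts printed above; the helper name label_values is hypothetical and not part of flair.

```python
from typing import Any, Dict, List


def label_values(sentence_dict: Dict[str, Any]) -> List[str]:
    """Return the label values from either dict shape printed above.

    'all labels' is the key printed by main; 'labels' is the key printed by the branch.
    """
    labels = sentence_dict.get("all labels", sentence_dict.get("labels", []))
    return [label["value"] for label in labels]


# The two outputs, copied from the comment above
branch_output = {"text": "I love Berlin.", "labels": [], "entities": [], "relations": []}
main_output = {
    "text": "I love Berlin.",
    "all labels": [
        {"value": "PRP", "confidence": 0.9999862909317017},
        {"value": "VBP", "confidence": 0.999794065952301},
        {"value": "NNP", "confidence": 0.9999107122421265},
        {"value": ".", "confidence": 0.9998101592063904},
    ],
}

print(label_values(main_output))
print(label_values(branch_output))
```

Run against both outputs, this makes the regression visible: main yields the four POS tags, the branch yields nothing.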

Here is a little workaround that works with main:

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/pos-english-fast")

# make example sentence
sentence = Sentence("I love Berlin.")

# predict POS tags
tagger.predict(sentence)

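# collect the (start, end) character offsets of every token
token_list = []
for token in sentence:
    token_list.append((token.start_position, token.end_position))

# attach the offsets to the dict produced by to_dict()
sentence_dict = sentence.to_dict()
sentence_dict["token positions"] = token_list
# the POS tagger assigns one label per token, so both lists must line up
assert len(sentence_dict["token positions"]) == len(sentence_dict["all labels"])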
print(sentence_dict)

Which gives:

{'text': 'I love Berlin.', 'all labels': [{'value': 'PRP', 'confidence': 0.9999862909317017}, {'value': 'VBP', 'confidence': 0.999794065952301}, {'value': 'NNP', 'confidence': 0.9999107122421265}, {'value': '.', 'confidence': 0.9998101592063904}], 'token positions': [(0, 1), (2, 6), (7, 13), (13, 14)]}

That's what I really want!
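With the workaround's output, each token can be paired with its tag and its surface form, because the offsets slice directly into 'text'. A minimal sketch that consumes the dict printed above (copied verbatim); it relies on the same assumption as the workaround's assert, namely that 'all labels' holds exactly one label per token in token order. The helper name tagged_tokens is hypothetical.

```python
from typing import Any, Dict, List

# Output of the workaround above, copied verbatim
sentence_dict: Dict[str, Any] = {
    "text": "I love Berlin.",
    "all labels": [
        {"value": "PRP", "confidence": 0.9999862909317017},
        {"value": "VBP", "confidence": 0.999794065952301},
        {"value": "NNP", "confidence": 0.9999107122421265},
        {"value": ".", "confidence": 0.9998101592063904},
    ],
    "token positions": [(0, 1), (2, 6), (7, 13), (13, 14)],
}


def tagged_tokens(d: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each token's offsets with its label and recover its text by slicing."""
    text = d["text"]
    labels = d["all labels"]
    positions = d["token positions"]
    if len(labels) != len(positions):
        raise ValueError("expected exactly one label per token")
    return [
        {"text": text[start:end], "start": start, "end": end, "tag": label["value"]}
        for (start, end), label in zip(positions, labels)
    ]


for token in tagged_tokens(sentence_dict):
    print(token["text"], token["tag"], token["start"], token["end"])
```

Slicing by offsets, rather than storing token.text separately, keeps the dict compact while still letting the token strings be recovered exactly.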

helpmefindaname force-pushed the 3265-question-sentenceto_dicttag_type=ner-no-longer-have-the-entities-key branch from 7f79e92 to 14d5a07 on August 7, 2023 15:29
@alanakbik
Collaborator

@helpmefindaname thanks for adding this!

alanakbik merged commit 856e072 into master on Aug 8, 2023
1 check passed
alanakbik deleted the 3265-question-sentenceto_dicttag_type=ner-no-longer-have-the-entities-key branch on August 8, 2023 14:49
Development

Successfully merging this pull request may close these issues.

[Question]: sentence.to_dict(tag_type='ner') no longer have the 'entities' key
3 participants