Feature/emoticon handler [resolves #264] #289

Merged 3 commits on Nov 2, 2021


ertugrul-dmr
Contributor

  • Added emoticon support for tokenizers.
  • Added a new tokenizer argument (emoticon=True/False) that works much like the existing emoji, hashtag, and mention handlers.
  • The current regex covers the most common emoticons found on social media and can be updated easily.
  • Added and verified test cases for the emoticon handlers.

Basic Usage:

from sadedegel.bblock.tokenizers import ICUTokenizer
text = "komik:))"
tokenizer = ICUTokenizer(emoticon=False)
tokenizer(text)
>>output ['komik', ':', ')',')']

### if emoticon is set to True:

tokenizer = ICUTokenizer(emoticon=True)
tokenizer(text)
>>output ['komik', ':))']
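
To illustrate the behaviour shown above, here is a minimal standalone sketch of keeping an emoticon as a single token. It does not use sadedegel: the `EMOTICON` and `TOKEN` patterns and the `tokenize` function are hypothetical names invented for this illustration. The regex is deliberately naive and is not the one added in this PR.

```python
import re

# Hypothetical pattern (NOT the regex from this PR): eyes, optional nose, one or more mouths.
EMOTICON = re.compile(r"[:;=]-?[)(DPp]+")
# Naive fallback tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text, emoticon=False):
    """Split text into tokens; optionally keep emoticons as single tokens."""
    if not emoticon:
        return TOKEN.findall(text)
    tokens = []
    pos = 0
    for match in EMOTICON.finditer(text):
        # Tokenize the text before the emoticon with the fallback pattern.
        tokens.extend(TOKEN.findall(text[pos:match.start()]))
        # Keep the emoticon itself intact.
        tokens.append(match.group())
        pos = match.end()
    # Tokenize whatever remains after the last emoticon.
    tokens.extend(TOKEN.findall(text[pos:]))
    return tokens


print(tokenize("komik:))", emoticon=False))
print(tokenize("komik:))", emoticon=True))
```

A pattern this simple will also fire on non-emoticon text such as `key:Purple`, so a real handler would likely need word-boundary checks or a curated emoticon list.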

@ertugrul-dmr added the enhancement and ready labels on Aug 20, 2021
@ertugrul-dmr requested a review from dafajon on August 20, 2021, 13:31
@ertugrul-dmr self-assigned this on Aug 20, 2021
@ertugrul-dmr linked an issue on Aug 20, 2021 that may be closed by this pull request
@dafajon
Contributor

dafajon commented Aug 20, 2021

Thanks for the contribution. Additionally, could you report in this PR the new results of the social media/comment-based prebuilt models optimized with this feature?

@ertugrul-dmr
Contributor Author

> Thanks for the contribution. Additionally, could you report in this PR the new results of the social media/comment-based prebuilt models optimized with this feature?

Done. The gains are really small F1-wise, around ~0.005, but the feature might be more useful for text data that has not been preprocessed.

@husnusensoy husnusensoy merged commit 86f147b into develop Nov 2, 2021
Labels
enhancement New feature or request ready
Development

Successfully merging this pull request may close these issues.

Emoticons :)
3 participants