Add normalizer type C to text cleaners #85
Thank you for the PR and tests, it looks good! I suggest renaming the function to make the name more intuitive. I'd call it at the start of every cleaner except no_cleaners(). You can run make clean && make lint to make sure your code passes the style check.
TTS/tts/utils/text/cleaners.py
def normalize_nfc(text: str) -> str:
    """Canonical decomposition followed by canonical composition"""
- def normalize_nfc(text: str) -> str:
-     """Canonical decomposition followed by canonical composition"""
+ def normalize_unicode(text: str) -> str:
+     """Normalize Unicode characters."""
5412923 to 636ea59
1576006 to 8ec5d15
Thanks a lot, this is a very useful contribution!
* Add normalizer type C to text cleaners
* Linter recommendations
* Add unicode normalize to every cleaner
* Format test_text_cleaners.py
There are duplications in the cleaners. Should the normalizer be added inside each of the other cleaners, or be applied once to all text?
https://github.com/idiap/coqui-ai-TTS/blob/dev/TTS/tts/utils/text/tokenizer.py#L110
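The second option in the question, applying the normalizer to all text, could be sketched roughly as follows. This is not the code at the linked tokenizer line; `clean_text` and its `cleaner` parameter are hypothetical names chosen for illustration, and NFC via `unicodedata.normalize` is assumed as the normalization form.

```python
import unicodedata
from typing import Callable, Optional


def clean_text(text: str, cleaner: Optional[Callable[[str], str]] = None) -> str:
    """Hypothetical wrapper: normalize once, then run the chosen cleaner."""
    # Normalization happens in one place instead of inside every cleaner.
    text = unicodedata.normalize("NFC", text)
    if cleaner is not None:
        text = cleaner(text)
    return text
```

One trade-off of this variant is that text would be normalized even when no cleaner is configured, whereas calling the normalizer inside each cleaner lets a pass-through cleaner leave the input untouched.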
/Closes #63