Have a normalization pipeline #3

Open
2 of 3 tasks
PonteIneptique opened this issue Jun 27, 2018 · 0 comments
Labels
enhancement New feature or request

Comments

Member

PonteIneptique commented Jun 27, 2018

It would be cool to be able to add some kind of normalization:

  • Remove punctuation
  • Remove numbers from lemmas (might be interesting for reconstructing the correct form without having to deal with noise from word embeddings)
  • Lowercase everything

Of course, this should be optional :)
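A rough sketch of what such an opt-in pipeline could look like, to make the request concrete. Everything here is hypothetical and not part of the project's existing API: the `(form, lemma)` token shape, the step functions `remove_punctuation`, `strip_lemma_digits` and `lowercase`, and the `normalize` helper are all names made up for illustration. Each step is a plain function, and nothing runs unless it is passed in explicitly.

```python
import re
import string
from typing import Callable, Iterable, List, Optional, Tuple

# Assumed token shape for this sketch: (form, lemma)
Token = Tuple[str, str]
Step = Callable[[Token], Optional[Token]]


def remove_punctuation(token: Token) -> Optional[Token]:
    """Drop tokens whose form consists only of ASCII punctuation."""
    form, _ = token
    if form and all(ch in string.punctuation for ch in form):
        return None  # None signals "remove this token"
    return token


def strip_lemma_digits(token: Token) -> Optional[Token]:
    """Remove digits from the lemma only (e.g. 'arma1' becomes 'arma')."""
    form, lemma = token
    return form, re.sub(r"\d+", "", lemma)


def lowercase(token: Token) -> Optional[Token]:
    """Lowercase both the form and the lemma."""
    form, lemma = token
    return form.lower(), lemma.lower()


def normalize(tokens: Iterable[Token], steps: Iterable[Step] = ()) -> List[Token]:
    """Apply the chosen steps in order; with no steps, tokens pass through unchanged."""
    steps = list(steps)
    result: List[Token] = []
    for token in tokens:
        current: Optional[Token] = token
        for step in steps:
            current = step(current)
            if current is None:  # a step removed the token
                break
        if current is not None:
            result.append(current)
    return result


if __name__ == "__main__":
    sample = [("Arma", "arma1"), (",", ","), ("Virum", "uir2")]
    print(normalize(sample, steps=[remove_punctuation, strip_lemma_digits, lowercase]))
```

Keeping each normalization as a separate function means any subset can be enabled from a configuration flag, and the default (no steps) leaves the data untouched. Note that `string.punctuation` only covers ASCII punctuation, so a real implementation would likely need a broader check.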

PonteIneptique added the enhancement label Jun 27, 2018