Thanks for your helpful codebase!
I am a bit confused about the stop words filtering. The released code removes a document if its stop words ratio is below a certain cutoff:
data-preparation/preprocessing/training/01b_oscar_cleaning_and_filtering/filtering.py
Line 590 in 9d05884
But in the notebook, section 2.5 states:
"If the stop words ratio for a document is higher than a certain cutoff, it is removed."
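To make the two readings concrete, here is a minimal sketch of how I understand them. This is not the repository's actual code: the function names, the whitespace tokenization, the tiny stop word set, and the cutoff of 0.3 are all my own assumptions for illustration.

```python
def stop_words_ratio(document, stop_words):
    """Fraction of whitespace-separated words that are stop words (assumed definition)."""
    words = document.lower().split()
    if not words:
        return 0.0
    return sum(word in stop_words for word in words) / len(words)


def keep_if_ratio_at_least(document, stop_words, cutoff):
    # Reading 1 (my understanding of the released code):
    # the document is removed when its ratio is below the cutoff.
    return stop_words_ratio(document, stop_words) >= cutoff


def keep_if_ratio_at_most(document, stop_words, cutoff):
    # Reading 2 (notebook, section 2.5):
    # the document is removed when its ratio is higher than the cutoff.
    return stop_words_ratio(document, stop_words) <= cutoff


# Hypothetical inputs, chosen only to show that the two readings disagree.
stop_words = {"the", "a", "is", "of", "and"}
doc = "the cat is on the mat"  # 3 of 6 words are stop words
cutoff = 0.3

kept_by_reading_1 = keep_if_ratio_at_least(doc, stop_words, cutoff)
kept_by_reading_2 = keep_if_ratio_at_most(doc, stop_words, cutoff)
```

With these example values the same document is kept under the first reading and removed under the second, so the two descriptions are not interchangeable.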
I am wondering which one is more useful in your practice.
Thanks in advance!