A language ID module using TextCat algorithm #16

avitalp · 2015-01-26T12:30:34Z

A language ID module that implements the TextCat algorithm, using language n-grams from the "An Crubadan" project.
Written in response to nltk/nltk#107; it uses nltk/nltk#845.
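
For reviewers unfamiliar with TextCat, here is a minimal sketch of the rank-order ("out-of-place") n-gram comparison it is based on. This is not the code in this PR and does not use the An Crubadan data: the function names (`ngram_ranks`, `out_of_place`, `guess_language`), the tiny training strings, and the defaults `n=3` and `top_k=300` are all assumptions made for illustration.

```python
from collections import Counter


def ngram_ranks(text, n=3, top_k=300):
    """Map the top_k most frequent character n-grams to their rank (0 = most frequent)."""
    # Normalise whitespace and pad so word boundaries appear inside n-grams.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ordered[:top_k])}


def out_of_place(doc_ranks, lang_ranks, max_penalty):
    """Sum of rank differences; n-grams missing from the language profile cost max_penalty."""
    total = 0
    for gram, rank in doc_ranks.items():
        if gram in lang_ranks:
            total += abs(rank - lang_ranks[gram])
        else:
            total += max_penalty
    return total


def guess_language(text, profiles, n=3, top_k=300):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc_ranks = ngram_ranks(text, n, top_k)
    distances = {
        lang: out_of_place(doc_ranks, ranks, top_k)
        for lang, ranks in profiles.items()
    }
    return min(distances, key=distances.get)


# Hypothetical toy training text; a real profile would come from a large corpus.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat with the other animals",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato se sentó en la alfombra con los otros animales",
}
PROFILES = {lang: ngram_ranks(text) for lang, text in TRAINING.items()}

print(guess_language("the dog and the cat", PROFILES))
print(guess_language("el gato y el perro", PROFILES))
```

With profiles this small the result is only meaningful for inputs that closely resemble the training strings; the point is the shape of the comparison, not its accuracy.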

The method "demo" refers to several sample files which I didn't include, as I was not sure where they should be placed.

@alexrudnick: would you be able to provide sample texts for some of the less well-represented languages?

stevenbird · 2015-01-28T10:31:28Z

Thanks @avitalp. I am considering putting this in nltk/classify.

avitalp · 2015-01-28T17:25:14Z

Thanks @stevenbird, that'd be great. Is there anything you'd like me to modify or add for that?

A language ID module using TextCat algorithm, using language n-grams from "An Crubadan" project. (c6cb055)

stevenbird self-assigned this Jan 28, 2015

avitalp mentioned this pull request Feb 21, 2015

Added corpus reader for n-gram frequencies #845 #884 nltk/nltk#890

Merged