
Translate category scheme to english #3

Open
soodoku opened this issue Oct 26, 2016 · 4 comments

Comments

@soodoku
Member

soodoku commented Oct 26, 2016

Many category labels are in a language other than English, for example:

www.delphipraxis.net,Top/World/Deutsch/Computer/Programmieren/Sprachen/Delphi
iwamizawach.org,Top/World/Japanese/社会/宗教・精神世界/キリスト教/教団・教派/ペンテコステ・カリスマ派/アッセンブリーズ・オブ・ゴッド/日本アッセンブリーズ・オブ・ゴッド教団/教会/北海道

For non-English labels, one apparent pattern is that the language name appears in the path: Deutsch, Japanese, etc.
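A minimal sketch of that pattern, assuming the language is always the third segment of a path that starts with `Top/World/`. The function name `category_language` is hypothetical, and paths outside `Top/World` are simply treated as having no language marker:

```python
def category_language(path):
    """Return the language segment of a 'Top/World/<language>/...' path.

    Returns None when the path does not follow that pattern
    (assumption: such paths are already in English).
    """
    parts = path.split("/")
    if len(parts) >= 3 and parts[0] == "Top" and parts[1] == "World":
        return parts[2]
    return None


label = "Top/World/Deutsch/Computer/Programmieren/Sprachen/Delphi"
language = category_language(label)
```

For the first example label above, this picks out `Deutsch` as the language segment.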

Perhaps use Google Translate to translate them? One package we could use:
https://pypi.python.org/pypi/translate

The final output will have an additional column: cat_labels_english
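One way the extra column could be produced, as a rough sketch only. It assumes the translations have already been collected into a plain dict from original segment to English segment; the function name `add_english_column` and the small lookup table below are hypothetical, and segments without a translation are left unchanged:

```python
def add_english_column(rows, translations):
    """Append a cat_labels_english value to each (domain, cat_labels) row.

    translations: dict mapping an original path segment to its English
    form (hypothetical lookup, e.g. built from a translation service).
    """
    result = []
    for domain, cat_labels in rows:
        english = "/".join(
            translations.get(segment, segment)  # keep untranslated segments
            for segment in cat_labels.split("/")
        )
        result.append((domain, cat_labels, english))
    return result


rows = [
    ("www.delphipraxis.net",
     "Top/World/Deutsch/Computer/Programmieren/Sprachen/Delphi"),
]
# Illustrative lookup only; a real run would fill this from the translator.
translations = {
    "Deutsch": "German",
    "Programmieren": "Programming",
    "Sprachen": "Languages",
}
output = add_english_column(rows, translations)
```

Translating segment by segment keeps the path structure intact, so the English column has the same number of levels as the original.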

@suriyan
Member

suriyan commented Oct 27, 2016

Okay, I will do it. A quick check shows over 200k unique labels under "Top/World/..." that will be non-English.

But it seems Google Translate is limited to just 1,000 words/day?

@soodoku
Member Author

soodoku commented Oct 27, 2016

Not sure if we have good alternatives. And it seems that Google's pricing is reasonable:
https://cloud.google.com/translate/v2/pricing

We can run through it one time.

@suriyan
Member

suriyan commented Oct 27, 2016

Actually, the Google Translate API has the following limits (it's not 1,000 words/day):

  • Queries (characters per day) ==> 2,000,000
  • Queries (characters per 100 seconds per user) ==> 100,000
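A sketch of how requests could be grouped to stay under the per-100-seconds figure. The helper name `batch_by_chars` is hypothetical, and it assumes `len()` of a Python string is close enough to how the API counts characters, which has not been checked here. The caller would still need to wait between batches and track the daily total:

```python
def batch_by_chars(texts, max_chars=100_000):
    """Group texts into batches whose total length stays within max_chars.

    A single text longer than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for text in texts:
        # Start a new batch when adding this text would exceed the limit.
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


# Tiny limit used only to show the grouping behaviour.
example = batch_by_chars(["ab", "cd", "e"], max_chars=4)
```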

By splitting each category path into its levels and grouping the levels by language, we get a smaller list of unique words for each language. That comes to about 1.5M characters, so the free quota will probably be enough to translate it all.
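The splitting and grouping step could look roughly like this. The function names are hypothetical, and the sketch assumes the language is the third segment of a `Top/World/...` path and that only the levels after it need translating:

```python
from collections import defaultdict


def unique_segments_by_language(paths):
    """Map each language to the set of unique path levels that follow it."""
    segments = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Skip anything that is not 'Top/World/<language>/<level>...'.
        if len(parts) < 4 or parts[:2] != ["Top", "World"]:
            continue
        segments[parts[2]].update(parts[3:])
    return segments


def total_characters(segments):
    """Count the characters across all unique levels, for quota estimates."""
    return sum(len(level) for group in segments.values() for level in group)


paths = [
    "Top/World/Deutsch/Computer/Programmieren",
    "Top/World/Deutsch/Computer/Spiele",  # 'Computer' is only counted once
    "Top/Arts/Music",                     # not under Top/World, ignored
]
grouped = unique_segments_by_language(paths)
characters = total_characters(grouped)
```

Because repeated levels such as `Computer` are stored once per language, the character count to translate is much smaller than the sum over all full labels.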

suriyan added a commit that referenced this issue Oct 28, 2016
@suriyan
Member

suriyan commented Oct 28, 2016

Sorry for the confusion: the Google Translate API is actually not free. The numbers above are the usage quota per day per account.

Fortunately, Google gives $300 in credits for a 60-day free trial of its Cloud services, so we can use those credits.

