Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add translation tests for argparse #124295

Open
savannahostrowski opened this issue Sep 20, 2024 · 14 comments · Fixed by #124803
Open

Add translation tests for argparse #124295

savannahostrowski opened this issue Sep 20, 2024 · 14 comments · Fixed by #124803
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir tests Tests in the Lib/test dir

Comments

@savannahostrowski
Copy link
Member

savannahostrowski commented Sep 20, 2024

While #12711 was merged, it seems like there's a separate issue worth tracking around adding translation tests for argparse (see #12711 (comment))

Linked PRs

@savannahostrowski savannahostrowski added tests Tests in the Lib/test dir stdlib Python modules in the Lib dir labels Sep 20, 2024
@savannahostrowski savannahostrowski changed the title Add translation tests to argparse Add translation tests for argparse Sep 20, 2024
@tomasr8
Copy link
Member

tomasr8 commented Sep 24, 2024

Continuing the discussion here since the other issue was closed.

Quoting Serhiy:

I thought that we should run the parser with different options to produce different error messages with installed special translator that transforms the original text in simply recognizable way. But there may be better ways.

I was originally thinking the same, but to me that seems like a lot of effort compared to the benefit we'd get. Instead, I think that using a message extractor (e.g. pygettext) we can assert both the presence of a specific message and the line number it appears at. This should give us pretty much the same confidence as directly checking the argparse output but with an order of magnitude less work. What do you think?

@serhiy-storchaka
Copy link
Member

and the line number it appears at

Would not this make the tests extremely fragile?

@tomasr8
Copy link
Member

tomasr8 commented Sep 25, 2024

Would not this make the tests extremely fragile?

Does it matter that much if you don't need to fix the tests manually? You'd have a snapshot of the PO file and a simple command to regenerate it so all you'd need to do is double-check the diff

@serhiy-storchaka
Copy link
Member

You would need to regenerate the snapshot after every change in argparse (and it was changed a lot in last few days), and the diff would be so large that it would be hard to notice a real regression in the noise.

But this is a good first step. Before #27667 was merged, Tools/i18n/pygettext.py produced a warning for _() used with non-literals. They should be treated as errors in the test. Also, line numbers should be ignored in comparison and only used to report errors. We should just collect a set of translated strings, compare it with the expected result and report about missing or additional strings.

@serhiy-storchaka serhiy-storchaka added 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Sep 25, 2024
@tomasr8
Copy link
Member

tomasr8 commented Sep 25, 2024

True, if the module is changing a lot, so would the snapshots. I'll try to come up with a PR based on your suggestions :)

@serhiy-storchaka
Copy link
Member

We should also add translation tests for getopt and optparse, and maybe for other modules (this is separate issues). I am not a fan of converting many tests into packages just to add a data file. This may break history, add conflicts with existing PRs and make future work with tests less convenient. Maybe add a common directory for translation data?

@tomasr8
Copy link
Member

tomasr8 commented Oct 26, 2024

We should also add translation tests for getopt and optparse, and maybe for other modules (this is separate issues)

Agreed, I can take care of that once we're done with argparse.

I am not a fan of converting many tests into packages just to add a data file. This may break history, add conflicts with existing PRs and make future work with tests less convenient. Maybe add a common directory for translation data?

A common folder for all translation data sounds like a good idea!

@tomasr8
Copy link
Member

tomasr8 commented Oct 27, 2024

@serhiy-storchaka The PR also separates the translation tests into its own files which necessitates turning argparse tests into a package. Would you prefer that I merge the translation tests with test_argparse to avoid that?

@serhiy-storchaka
Copy link
Member

Yes, of course.

@serhiy-storchaka
Copy link
Member

Thank you for your contribution @tomasr8. If you wish, you can create new issues and PRs for getopt and optparse, or just PRs.

@tomasr8
Copy link
Member

tomasr8 commented Oct 27, 2024

I'll look into that :)

serhiy-storchaka added a commit that referenced this issue Oct 27, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 27, 2024
…thonGH-124803) (pythonGH-126046)

(cherry picked from commit 0922a4a)

(cherry picked from commit ff044ed)

Co-authored-by: Serhiy Storchaka <[email protected]>
Co-authored-by: Tomas R. <[email protected]>
@serhiy-storchaka
Copy link
Member

It does not catch ngettext strings.

@tomasr8
Copy link
Member

tomasr8 commented Oct 27, 2024

IIRC pygettext currently does not know how to handle *gettext() functions which take more than one argument.
There is a way to specify additional keywords via --keyword=XXX but I don't see any way to tell it how many arguments
it takes.

For example invoking

./Tools/i18n/pygettext.py -o messages.pot --keyword=ngettext Lib/argparse.py

Produces several warnings (and does not extract any ngettext strings):

*** Lib/argparse.py:1639: Seen unexpected token ","
*** Lib/argparse.py:2326: Seen unexpected token ","

We could fix pygettext to allow functions like ngettext. We could probably use the same syntax that GNU's xgettext and pybabel use to specify keywords and their arguments: https://www.gnu.org/software/gettext/manual/html_node/xgettext-Invocation.html#index-_002dk_002c-xgettext-option

@serhiy-storchaka
Copy link
Member

For beginning, it should support all gettext functions by default. Later we can make this customizable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir tests Tests in the Lib/test dir
Projects
Status: Doc issues
Development

Successfully merging a pull request may close this issue.

3 participants