Adjust line wrapping algorithm to be closer to the GNU gettext tooling #51
Comments
Thank you for your extensive comparison! Always a pleasure to see issues reported this way! I'm all for matching gettext tools in every possible way. That said, based on your HTML example, this might go deep. Maybe we can wrap the C library? I don't know what the recommended approach is these days, so any input would be appreciated.
Cool, yeah I tried searching around to see if I can figure out how this logic is implemented in the gettext tools but was unsuccessful. Perhaps someone else has experience there and can chime in.

I don't think wrapping the C library is an option if you want browser support (and I think it takes away some of the appeal of this library ^^).

One approach to this would be to brute force it. The examples at the start of this post could be added as test cases. Some testing tool could then run the same texts through Alternatively a framework like One thing that could expose is that maybe changing from 76 to 77 as the default wrapping length already makes a good difference.

EDIT: I see this library already uses chai and mocha (I'm a Jest fanboy, sorry), which has this package for snapshotting: https://www.npmjs.com/package/mocha-chai-snapshot
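To make the brute-force idea a bit more concrete, here is a minimal sketch of a harness that scores candidate wrapping lengths against known-good output. The name `scoreWidths` and the `fold(text, width)` signature are hypothetical; they are not part of gettext-parser or the GNU tools, and the expected lines would have to be copied from real GNU gettext output.

```javascript
// Hypothetical harness, not part of gettext-parser.
// `cases` is an array of { text, expected }, where `expected` is the list of
// lines taken from GNU gettext output for `text`.
// `fold` is the folding function under test, with the ASSUMED signature
// fold(text, width) -> array of lines.
function scoreWidths(cases, fold, widths) {
  return widths.map((width) => {
    // Count how many cases this width reproduces exactly.
    const matches = cases.filter(
      ({ text, expected }) =>
        JSON.stringify(fold(text, width)) === JSON.stringify(expected)
    ).length;
    return { width, matches };
  });
}
```

Running this over, say, the widths 76 and 77 would show whether changing the default alone reproduces more of the examples.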
If you see low-hanging fruit (any fruit, really ;)), please send in a PR. In general:
I'm not sure, but browsing through the GNU gettext git, it seems the In the
Maybe this helps shed a little light on what's happening. :)
In our project we use both tools for different parts of our translation pipeline. Unfortunately the inconsistency creates very noisy commits. Bringing them closer together will make changes made with tools relying on gettext-parser a lot easier to review.
Examples from a diff
Below are examples of a diff. On the left is a .po file created by msginit and manipulated by GNU gettext tools. On the right side is the output of a call to gettext-parser's `po.compile` function.

The string `<b><em class=\"placeholder\">@count</em> Members</b> are selected` is 65 characters long, so it's not wrapped. However, the entire line `msgid_plural "<b><em class=\"placeholder\">@count</em> Members</b> are selected"` is 80 characters long, which seems to cause the GNU tools to wrap the line.

The same happens for

`<em>Books</em> have a built-in hierarchical navigation. Use for handbooks or ` is 77 characters long. The GNU tools seem to allow this, with the space being on the first line as the 77th character. gettext-parser will wrap one space earlier.

This seems to occur more often than expected. For example in the following lines as well.
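The keyword observation above can be checked with a small sketch. It assumes the GNU tools measure the whole output line (keyword, space, and both quotes) against a page width of 79 columns. That width is an assumption inferred from the examples in this issue (77 characters of text plus two quote characters), and `fitsOnKeywordLine` is a hypothetical helper, not a gettext-parser function.

```javascript
// ASSUMPTION: the limit applies to the full output line, and is 79 columns.
const PAGE_WIDTH = 79;

// Hypothetical helper: would `keyword "text"` fit on a single line?
// `escapedText` is the string as it appears in the .po file (with \" escapes).
function fitsOnKeywordLine(keyword, escapedText, width = PAGE_WIDTH) {
  const line = `${keyword} "${escapedText}"`;
  return line.length <= width;
}

// String.raw keeps the backslashes, so the length matches the .po file.
const text = String.raw`<b><em class=\"placeholder\">@count</em> Members</b> are selected`;

console.log(text.length);                             // 65
console.log(fitsOnKeywordLine('msgid_plural', text)); // false (the line is 80 long)
console.log(fitsOnKeywordLine('msgid', text));        // true (the line is 73 long)
```

Under this assumption the same 65-character string is wrapped after `msgid_plural` but not after `msgid`, which is consistent with the diff described above.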
It seems the GNU tools have a bit more knowledge of HTML. While gettext-parser treats `href=\"\">[social_mentions:mentioned_user]</a>` as an unbreakable string, the GNU break makes more sense because keeping `<a href=\"\">` together is more important.

A similar case can be found below, where the space is found to be a better breaking point than within a tag.
I also saw the following which actually suggests that the GNU tools allow 77 characters on a line so it may be a better default than 76. I couldn't find the GNU tools' line-break algorithm so I'm not sure whether they special case spaces and dots or just count to 77.
`Open Social Branding will be replaced by site name (and slogan if available).` is exactly 77 characters.
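To illustrate the difference between 76 and 77, here is a minimal greedy folding sketch that breaks after the last space that still fits, keeping that space at the end of the line. This is not the GNU tools' algorithm (which I couldn't find) and not gettext-parser's actual implementation; `foldText` is a hypothetical function that only breaks at spaces and ignores HTML entirely.

```javascript
// Hypothetical sketch: greedy folding at spaces, with the space kept on the
// line it ends. Each returned line has at most `maxLen` characters, unless a
// single run without spaces is longer than `maxLen`.
function foldText(text, maxLen = 77) {
  const lines = [];
  let rest = text;
  while (rest.length > maxLen) {
    // Last space at an index below maxLen, so the space itself still fits.
    const breakAt = rest.lastIndexOf(' ', maxLen - 1);
    if (breakAt === -1) break; // nowhere to break; leave the remainder intact
    lines.push(rest.slice(0, breakAt + 1));
    rest = rest.slice(breakAt + 1);
  }
  if (rest.length > 0) lines.push(rest);
  return lines;
}

const sentence =
  'Open Social Branding will be replaced by site name (and slogan if available).';

console.log(foldText(sentence, 77).length); // 1 (fits exactly)
console.log(foldText(sentence, 76).length); // 2 (breaks before "available).")
```

With a limit of 77 the sentence stays on one line, and with 76 it is split, which is consistent with 77 being the closer default.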