-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Util::normalizeLineEndings breaks UTF-8 #15
Comments
Oh you are absolutely right the |
Awesome, thanks! I think by the side effects I mean how it will modify the behaviour of the code, and whether it should be treated as a breaking change or not. |
And as always, it seems simple but it isn't especially with encoding. Of course adding the I'll investigate and see what I can find ;) |
This fixes issue #15 in a very hacky way. The pure cyrillic detection must be replaced with a more generic approach. But for now I could not figure it out. So this hack must do for now.
Just in case — I had the same problem not only with Cyrillic, but with binary data as well. The first time I noticed the problem was when I run the FullDiffList over a vendored captainhook — the diff went crazy over this file: https://github.com/captainhookphp/captainhook/blob/main/tools/phive. But that time I didn't really paid attention, because, well, it was binary data :) |
The unicode/utf8 stuff proves to be a pain in the *** again and again. I added a dirty hack to detect if I have to use the modifier or not. For now that's fine but I have to investigate that further to come up with a more sustainable solution. Never the less I released a new version If you need me to compile another |
That's enough for me right now, thanks! I hope you will be able to find a better solution later. |
Hi!
First of all, thank you for all the libraries you've built, they're magnificent! :)
I've encountered an error when working with
\SebastianFeldmann\Git\Command\Diff\Compare\FullDiffList
: it goes crazy when the diff has binary/UTF-8 data. After some digging I found out the reason: it's all because of\SebastianFeldmann\Cli\Output\Util::normalizeLineEndings
. For example, the following text will be processed incorrectly:text: хо
— the last two symbols are Cyrillic letters, but here's the result of the method:It could be enough to replace the regex pattern with
~(BSR_ANYCRLF)*\R~u
— at least it fixes my case, but I'm not sure about the possible side effects.The text was updated successfully, but these errors were encountered: