gulp-concat doesn't seem to support UTF-16 #101

Open
billrawlinson opened this issue Jul 9, 2015 · 16 comments

@billrawlinson

When concatenating files that are UTF-16 Little Endian ("Unicode"), every other file gets munged a bit.

The same thing happens when concatenating files that are UTF-16 Big Endian.

If you alternate files so that the first is UTF-16LE and the second is UTF-16BE, then only the very end of the second file gets munged.

I have set up a demo project that illustrates this and has a bunch of notes explaining why I even tried these things. I don't know for certain that the problem is in gulp-concat (it could be in gulp itself, in gulp.src()).

https://github.com/finalcut/gulp-concat-bug

@yocontra
Member

yocontra commented Jul 9, 2015

Is this with concat or gulp itself?

@billrawlinson
Author

It seems like it is concat to me, considering that the munged characters are interleaved (every other file). I figured I'd post the problem here first to see whether you guys could reproduce it and, possibly, confirm or rule out gulp-concat as the cause.

@yocontra
Member

@billrawlinson Can you try just piping src to dest a bunch of times and see if that causes the issue as well?

@billrawlinson
Author

Sure, I'll try Monday. I'm on the road now. If anyone else wants to know sooner, they can pull the demo project and try.

I figure the problem is either in the file read or in concat, since it manifests in the middle of the concatenated result, which should rule out the write operation.


@billrawlinson
Author

So I ran the tests where I just pipe in the files to dest and nothing funky happens to the files in the process.

I've updated the demo project so that it runs both kinds of test.

If you want to run the tests to see the results, just pull the project and give it a run. Each test now puts its results in a folder titled "results#", where # is the number of the test being run.

https://github.com/finalcut/gulp-concat-bug

@yocontra
Member

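I'm guessing it has something to do with buffer conversions in concat-with-sourcemaps:

https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L109
https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L43-L46
https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L15-L18
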
Probably mixing a bunch of encodings together using node's Buffer module is causing unexpected results.

@billrawlinson
Author

In test examples 2 (UTF-16LE) and 3 (UTF-16BE) the encodings are all the same. Tests 1 and 4, with mixed encodings, end up with better results (though still broken). Test 5, UTF-8, is the only one that has the correct results.

On Mon, Jul 13, 2015, 18:13 contra wrote:

I'm guessing it has something to do with buffer conversions in
concat-with-sourcemaps:

https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L109

https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L43-L46

https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L15-L18

Probably mixing a bunch of encodings together using node's Buffer module
is causing unexpected results.



@yocontra
Member

@billrawlinson I mean that the separator is treated as UTF-8, so combining it with UTF-16 buffers might be yielding weird results.
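
To illustrate the suspected mechanism, here is a minimal standalone sketch. It does not use gulp-concat or concat-with-sourcemaps; `files`, `utf8Separator`, and `joinBuffers` are hypothetical names made up for the illustration, and the inputs are assumed to be UTF-16LE without BOMs. It joins three UTF-16LE buffers with a newline encoded as UTF-8 (a single byte) and decodes the result as UTF-16LE.

```javascript
// Hypothetical stand-ins for three UTF-16LE files (no BOMs, for simplicity).
const files = ['aa', 'bb', 'cc'].map((text) => Buffer.from(text, 'utf16le'));

// A separator encoded as UTF-8: '\n' is the single byte 0x0A.
const utf8Separator = Buffer.from('\n', 'utf8');

// Join the buffers, placing the separator between consecutive entries.
function joinBuffers(buffers, separator) {
  const parts = [];
  buffers.forEach((buf, index) => {
    if (index > 0) parts.push(separator);
    parts.push(buf);
  });
  return Buffer.concat(parts);
}

const joined = joinBuffers(files, utf8Separator);

// Decode the whole result as UTF-16LE, the way an editor would.
const decoded = joined.toString('utf16le');

console.log('first file intact: ', decoded.startsWith('aa'));
console.log('second file intact:', decoded.includes('bb'));
console.log('third file intact: ', decoded.endsWith('cc'));
```

Each one-byte separator shifts the two-byte alignment by one. The first separator knocks the second file out of alignment and the second separator shifts the third file back into alignment, which would be consistent with every other file being munged.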

@billrawlinson
Author

Ah, that makes perfect sense.

@billrawlinson
Author

I assume that, due to the nature of gulp pipes, concat has no way of knowing the encoding of the various buffers coming into it from src?

@billrawlinson
Author

You are correct; it is the separator character that is causing the problem. I set up the test as follows:

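// gulp and concat (gulp-concat) are required at the top of the gulpfile;
// printToConsole is assumed to be a logging helper defined elsewhere in it.
function runConcatTest(d) {
  // An empty newLine means no UTF-8 separator is inserted between files.
  var testResults = gulp.src(d.sources)
    .pipe(concat(d.outfile, { newLine: '' }))
    .pipe(gulp.dest(d.outpath));
  testResults.on('data', printToConsole);
}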

Here I basically blanked out the newLine character, and tests 2 and 3 both work perfectly, while tests 1 and 4 are still all mucked up. If I don't override the newLine, it is broken as before.

Maybe, as a temporary solution, the readme could just be updated to let people know that if they are joining UTF-16 files they should put their own newline at the end of each file and then override the join character to be nothing.

UPDATE: I updated the demo project to show the working scenario, with test 2 using an empty string as the separator.
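
Another direction, shown here only as a hedged standalone sketch, would be to encode the separator in the same encoding as the files before joining. `joinWithEncodedSeparator` is a hypothetical helper, not part of gulp-concat, and the sketch assumes every input really is UTF-16LE and ignores BOMs. Node's Buffer has a built-in 'utf16le' encoding but no built-in 'utf16be', so big-endian files would need extra handling.

```javascript
// Hypothetical helper: join buffers with a separator encoded in the
// same encoding as the inputs, so the byte alignment is preserved.
function joinWithEncodedSeparator(buffers, separator, encoding) {
  const encodedSeparator = Buffer.from(separator, encoding);
  const parts = [];
  buffers.forEach((buf, index) => {
    if (index > 0) parts.push(encodedSeparator);
    parts.push(buf);
  });
  return Buffer.concat(parts);
}

// Stand-ins for three UTF-16LE files (no BOMs, for simplicity).
const utf16Files = ['aa', 'bb', 'cc'].map((text) => Buffer.from(text, 'utf16le'));

// In UTF-16LE the newline is two bytes (0x0A 0x00), so every file
// still starts on an even byte offset.
const result = joinWithEncodedSeparator(utf16Files, '\n', 'utf16le');

console.log(JSON.stringify(result.toString('utf16le')));
```

This only helps when the caller knows the encoding up front; the buffers themselves do not carry that information.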

@yocontra
Member

@billrawlinson Hmm trying to think up a solution here, going to dig into the buffer docs and see if I can figure something out

@yocontra
Member

https://nodejs.org/api/buffer.html#buffer_class_method_buffer_isencoding_encoding

We could emit a warning if the user mixes encodings (assuming we can't figure out a way to make it work).
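
One caveat: `Buffer.isEncoding` only reports whether an encoding name is supported by Node; it does not inspect a buffer's contents. A rough sketch of how a warning might be driven instead is to sniff the byte order mark (BOM) of each incoming buffer. `sniffBom` and `hasMixedEncodings` are hypothetical helpers, and the approach assumes the files actually start with a BOM, which is not guaranteed.

```javascript
// Hypothetical helper: guess an encoding from the byte order mark.
// Returns null when no known BOM is present. Note that a UTF-32LE BOM
// (FF FE 00 00) would be misreported as UTF-16LE by this simple check.
function sniffBom(buf) {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return 'utf8';
  }
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return 'utf16le';
  }
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    return 'utf16be';
  }
  return null;
}

// Hypothetical helper: true when the buffers do not all share one guess.
function hasMixedEncodings(buffers) {
  return new Set(buffers.map(sniffBom)).size > 1;
}

const littleEndian = Buffer.from([0xff, 0xfe, 0x61, 0x00]); // BOM + 'a'
const bigEndian = Buffer.from([0xfe, 0xff, 0x00, 0x61]); // BOM + 'a'

if (hasMixedEncodings([littleEndian, bigEndian])) {
  console.warn('Warning: input files appear to use different encodings.');
}

// isEncoding validates names only; 'utf16be' is not a built-in encoding.
console.log(Buffer.isEncoding('utf16le'), Buffer.isEncoding('utf16be'));
```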

@yocontra
Member

I played around with this for a bit and it stumped me. @billrawlinson, did you figure anything out?

@billrawlinson
Author

I did not. I just resorted to not using UTF 16 👎

@troydemonbreun

I have run into the same issue, and it turns out the files that end up munged are UTF-16.
