Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add two backends: node & web
* Convert core lib files to use the backends (and not use Buffer)
* Convert utf16 codec as an example
* Add testing for both node side and webpack
* Bump Node.js minimal supported version to 4.5.0 and modernize some existing code.
  This will allow us to get rid of safer-buffer, our only dependency.
1 parent 9627ecf, commit e567849
Showing 24 changed files with 525 additions and 307 deletions.
@@ -0,0 +1,73 @@
# iconv-lite backends

To accommodate different environments (most notably Node.js and the browser) efficiently, iconv-lite has a concept of 'backends'.
Backends convert internal data representations to what users expect.

Here's an overview of the data types used in iconv-lite codecs:

|                                | encoder           | decoder                                    |
| ------------------------------ | ----------------- | ------------------------------------------ |
| input type                     | js string         | Uint8Array, Buffer or any array with bytes |
| input internal representation  | js string         | same as input                              |
| input data access              | str.charCodeAt(i) | bytes[i]                                   |
| output type                    | Backend Bytes     | js string                                  |
| output internal representation | Uint8Array        | Uint16Array                                |
| output data writing            | bytes[i]          | rawChars[i]                                |

The reasoning behind this choice is the following:
* For inputs, we try to use passed-in objects directly and not convert them,
  to avoid a performance hit. For decoder inputs, that means all codecs need to
  be able to work with both Uint8Array-s and Buffer-s.
* For outputs, we standardize the internal representation (what codecs work with)
  to Uint8Array and Uint16Array, because that seems to be the lowest common denominator between the
  backends (Buffer can be interchanged with Uint8Array) that does not sacrifice performance.

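As a minimal sketch of the first point, the hypothetical helper below (`decodeAscii` is not part of iconv-lite) reads its input using only index access and `.length`, so the same loop accepts a Uint8Array, a Buffer, or a plain array. It assumes a Node.js environment, since it constructs a `Buffer` for the comparison.

```js
// Hypothetical helper, for illustration only: decodes 7-bit ASCII bytes.
// It touches the input only through bytes[i] and bytes.length.
function decodeAscii(bytes) {
    const chars = new Uint16Array(bytes.length);
    for (let i = 0; i < bytes.length; i++) {
        chars[i] = bytes[i] & 0x7f;
    }
    // Fine for short inputs; very long inputs would need to be converted in chunks.
    return String.fromCharCode.apply(null, chars);
}

const fromTypedArray = decodeAscii(new Uint8Array([104, 105]));
const fromBuffer = decodeAscii(Buffer.from([104, 105]));
const fromPlainArray = decodeAscii([104, 105]);

console.log(fromTypedArray, fromBuffer, fromPlainArray); // each value is "hi"
```

Because no Buffer-specific or typed-array-specific method is called on the input, no conversion step is needed before decoding.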
## Backend interface
```typescript
// Integer-valued number; TypeScript has no separate int type.
type int = number;

// Depends on the backend: the node backend returns Buffer, the web backend returns Uint8Array.
type BackendBytes = Buffer | Uint8Array;

interface IconvLiteBackend {
    // Encoder output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes: int, fill?: int): Uint8Array;
    bytesToResult(bytes: Uint8Array, finalLen: int): BackendBytes;
    concatByteResults(bufs: BackendBytes[]): BackendBytes;

    // Decoder output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars: int): Uint16Array;
    rawCharsToResult(rawChars: Uint16Array, finalLen: int): string;

    // TODO: We'll likely add some more methods here for additional performance
}
```

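The encoder half of the interface can be sketched as follows. `sketchBackend` and `encodeUtf16leChunk` are hypothetical names chosen for this illustration; the backend is a simplified typed-array implementation written for the example, not one of the backends shipped in this commit. The sketch follows the allocate, write, trim, concatenate flow that the interface comments describe.

```js
// Hypothetical typed-array backend covering only the encoder-side methods.
const sketchBackend = {
    allocBytes(numBytes, fill) {
        const arr = new Uint8Array(numBytes);
        if (fill != null) {
            arr.fill(fill);
        }
        return arr;
    },
    bytesToResult(bytes, finalLen) {
        return bytes.subarray(0, finalLen);
    },
    concatByteResults(bufs) {
        const totalLen = bufs.reduce((sum, b) => sum + b.length, 0);
        const res = new Uint8Array(totalLen);
        let pos = 0;
        for (const b of bufs) {
            res.set(b, pos);
            pos += b.length;
        }
        return res;
    },
};

// Encodes one chunk as UTF-16LE: two bytes per UTF-16 code unit.
function encodeUtf16leChunk(backend, str) {
    const bytes = backend.allocBytes(str.length * 2);
    let pos = 0;
    for (let i = 0; i < str.length; i++) {
        const code = str.charCodeAt(i);
        bytes[pos++] = code & 0xff; // low byte first
        bytes[pos++] = code >>> 8;  // then high byte
    }
    return backend.bytesToResult(bytes, pos);
}

const encodedChunks = ["h", "i"].map((s) => encodeUtf16leChunk(sketchBackend, s));
const encodedAll = sketchBackend.concatByteResults(encodedChunks);
console.log(Array.from(encodedAll)); // [104, 0, 105, 0]
```

The codec only ever sees a Uint8Array; what the caller receives is decided entirely by `bytesToResult` and `concatByteResults`.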
## Codec pseudocode
```js
class Encoder {
    write(str) {
        const bytes = this.backend.allocBytes(str.length * max_bytes_per_char);
        let bytesPos = 0;
        for (let i = 0; i < str.length; i++) {
            const char = str.charCodeAt(i);  // TODO: handle surrogates.
            // convert char to bytes
            bytes[bytesPos++] = byte1;
            ...
        }
        return this.backend.bytesToResult(bytes, bytesPos);
    }
}

class Decoder {
    write(buf) {  // NOTE: buf here can be Uint8Array, Buffer or regular array.
        const chars = this.backend.allocRawChars(buf.length * max_chars_per_byte);
        let charsPos = 0;
        for (let i = 0; i < buf.length; i++) {
            let byte1 = buf[i];
            // convert byte(s) to char
            chars[charsPos++] = char;  // TODO: handle surrogates.
            ...
        }
        return this.backend.rawCharsToResult(chars, charsPos);
    }
}
```
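
To make the pseudocode concrete, here is a runnable sketch for ISO-8859-1 (Latin-1), where one byte maps to one UTF-16 code unit. `Latin1Encoder`, `Latin1Decoder`, and `demoBackend` are hypothetical names for this illustration and do not correspond to codecs in this commit. The encoder substitutes `?` for anything outside Latin-1, which is an assumption of the sketch rather than iconv-lite's documented behavior.

```js
// Hypothetical typed-array backend with just the methods the two classes need.
const demoBackend = {
    allocBytes(numBytes) { return new Uint8Array(numBytes); },
    bytesToResult(bytes, finalLen) { return bytes.subarray(0, finalLen); },
    allocRawChars(numChars) { return new Uint16Array(numChars); },
    rawCharsToResult(rawChars, finalLen) {
        // Fine for short inputs; very long inputs would need chunking.
        return String.fromCharCode.apply(null, rawChars.subarray(0, finalLen));
    },
};

class Latin1Encoder {
    constructor(backend) { this.backend = backend; }

    write(str) {
        // Latin-1 needs at most one byte per UTF-16 code unit.
        const bytes = this.backend.allocBytes(str.length);
        let bytesPos = 0;
        for (let i = 0; i < str.length; i++) {
            const char = str.charCodeAt(i);
            // Code units above 0xFF (including surrogates) become '?'.
            bytes[bytesPos++] = char <= 0xff ? char : 0x3f;
        }
        return this.backend.bytesToResult(bytes, bytesPos);
    }
}

class Latin1Decoder {
    constructor(backend) { this.backend = backend; }

    write(buf) { // buf can be a Uint8Array, a Buffer or a regular array.
        const chars = this.backend.allocRawChars(buf.length);
        let charsPos = 0;
        for (let i = 0; i < buf.length; i++) {
            chars[charsPos++] = buf[i]; // Latin-1 bytes equal their code points.
        }
        return this.backend.rawCharsToResult(chars, charsPos);
    }
}

const latin1Bytes = new Latin1Encoder(demoBackend).write("caf\u00e9");
const latin1Text = new Latin1Decoder(demoBackend).write(latin1Bytes);
console.log(Array.from(latin1Bytes), latin1Text); // [99, 97, 102, 233] and "café"
```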
@@ -0,0 +1,45 @@
"use strict";
// NOTE: This backend uses Buffer APIs that are only available in Node v4.5+ and v5.10+.

module.exports = {
    // Encoder string input: use str directly, .length, .charCodeAt(i).
    // Encoder bytes output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes, fill) {
        // NOTE: We could do a 'new ArrayBuffer' here, but Buffer.alloc gives us pooling, which makes small chunks faster.
        const buf = Buffer.alloc(numBytes, fill);
        return new Uint8Array(buf.buffer, buf.byteOffset, numBytes);
    },
    bytesToResult(bytes, finalLen) {
        // In Node 5.10.0-6.3.0, Buffer.from() raises error if fed with zero-length buffer, so we check for it explicitly.
        if (finalLen === 0) {
            return Buffer.alloc(0);
        }

        // In Node 4.5.0-5.10.0, Buffer.from() does not support (arrayBuffer, byteOffset, length) signature, only (arrayBuffer),
        // so we emulate it with .slice().
        return Buffer.from(bytes.buffer).slice(bytes.byteOffset, bytes.byteOffset+finalLen);
    },
    concatByteResults(bufs) {
        return Buffer.concat(bufs);
    },

    // Decoder bytes input: use only array access + .length, so both Buffer-s and Uint8Array-s work.
    // Decoder string output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars) {
        // NOTE: We could do a 'new ArrayBuffer' here, but Buffer.alloc gives us pooling, which makes small chunks faster.
        const buf = Buffer.alloc(numChars * Uint16Array.BYTES_PER_ELEMENT);
        return new Uint16Array(buf.buffer, buf.byteOffset, numChars);
    },
    rawCharsToResult(rawChars, finalLen) {
        // See comments in bytesToResult about old Node versions support.
        if (finalLen === 0) {
            return '';
        }
        return Buffer.from(rawChars.buffer)
            .slice(rawChars.byteOffset, rawChars.byteOffset + finalLen * Uint16Array.BYTES_PER_ELEMENT)
            .toString('ucs2');
    },

    // Optimizations
    // maybe buf.swap16()?
};
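
The view-and-slice pattern used by this backend can be tried in isolation. The sketch below assumes a current Node.js release on a little-endian platform (the `'ucs2'` encoding reads little-endian code units, while Uint16Array uses platform byte order). All variable names are illustrative. It compares the `.slice()` emulation from the diff with the three-argument `Buffer.from(arrayBuffer, byteOffset, length)` form that newer Node versions support.

```js
// Allocate a Buffer and expose the same memory as a Uint16Array.
const charCapacity = 4;
const backing = Buffer.alloc(charCapacity * Uint16Array.BYTES_PER_ELEMENT);
const charView = new Uint16Array(backing.buffer, backing.byteOffset, charCapacity);

// A codec writes UTF-16 code units through the typed-array view.
charView[0] = 0x68; // 'h'
charView[1] = 0x69; // 'i'
const usedBytes = 2 * Uint16Array.BYTES_PER_ELEMENT;

// Variant from the diff: wrap the whole ArrayBuffer, then slice to the used range.
const viaSlice = Buffer.from(charView.buffer)
    .slice(charView.byteOffset, charView.byteOffset + usedBytes)
    .toString("ucs2");

// Variant for newer Node versions: pass offset and length directly.
const viaOffsets = Buffer.from(charView.buffer, charView.byteOffset, usedBytes)
    .toString("ucs2");

console.log(viaSlice, viaOffsets); // both are "hi" on little-endian platforms
```

Neither variant copies the character data before `toString()`; both wrap the ArrayBuffer that the codec already wrote into.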
@@ -0,0 +1,46 @@
"use strict";
// NOTE: This backend uses TextDecoder interface.

module.exports = {
    // Encoder string input: use str directly, .length, .charCodeAt(i).
    // Encoder bytes output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes, fill) {
        const arr = new Uint8Array(new ArrayBuffer(numBytes));
        if (fill != null) {
            arr.fill(fill);
        }
        return arr;
    },
    bytesToResult(bytes, finalLen) {
        return bytes.subarray(0, finalLen);
    },
    concatByteResults(bufs) {
        bufs = bufs.filter((b) => b.length > 0);
        if (bufs.length === 0) {
            return new Uint8Array();
        } else if (bufs.length === 1) {
            return bufs[0];
        }

        const totalLen = bufs.reduce((a, b) => a + b.length, 0);
        const res = new Uint8Array(new ArrayBuffer(totalLen));
        let curPos = 0;
        for (let i = 0; i < bufs.length; i++) {
            res.set(bufs[i], curPos);
            curPos += bufs[i].length;
        }
        return res;
    },

    // Decoder bytes input: use only array access + .length, so both Buffer-s and Uint8Array-s work.
    // Decoder string output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars) {
        return new Uint16Array(new ArrayBuffer(numChars * Uint16Array.BYTES_PER_ELEMENT));
    },
    rawCharsToResult(rawChars, finalLen) {
        return new TextDecoder("utf-16").decode(rawChars.subarray(0, finalLen));
    },

    // Optimizations
    // maybe buf.swap16()?
};
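
Two details of this backend are easy to overlook, and the sketch below illustrates them with made-up variable names. First, `subarray()` returns a view rather than a copy, so the result of `bytesToResult` shares memory with the allocated array. Second, `TextDecoder` honors the view's offset and length, so only the used code units are decoded. The sketch assumes a runtime with a global `TextDecoder` (browsers, or a recent Node.js) on a little-endian platform.

```js
// subarray() is a view: later writes to the parent are visible through it.
const pendingBytes = new Uint8Array(8);
pendingBytes[0] = 1;
pendingBytes[1] = 2;
const trimmedView = pendingBytes.subarray(0, 2);
const trimmedCopy = pendingBytes.slice(0, 2); // slice() copies instead
pendingBytes[0] = 42;
console.log(trimmedView[0], trimmedCopy[0]); // 42 and 1

// TextDecoder decodes only the portion covered by the view it is given.
const utf16Units = new Uint16Array(8);
utf16Units[0] = 0x6f; // 'o'
utf16Units[1] = 0x6b; // 'k'
const webDecoded = new TextDecoder("utf-16le").decode(utf16Units.subarray(0, 2));
console.log(webDecoded); // "ok" on little-endian platforms
```

Returning a view avoids a copy per chunk, at the cost that callers should not keep writing to the original array after handing the result out.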