Introduce the concept of backends
 * Add two backends: node & web
 * Convert core lib files to use the backends (and not use Buffer)
 * Convert utf16 codec as an example
 * Add testing for both node side and webpack
 * Bump the minimum supported Node.js version to 4.5.0 and modernize some
   existing code. This will allow us to get rid of safer-buffer, our only
   dependency.
ashtuchkin committed Jul 14, 2020
1 parent 9627ecf commit e567849
Showing 24 changed files with 525 additions and 307 deletions.
11 changes: 6 additions & 5 deletions .travis.yml
@@ -1,9 +1,7 @@
 language: node_js
 node_js:
-  - "0.10"
-  - "0.11"
-  - "0.12"
-  - "iojs"
+  - "4.5.0" # Oldest supported version
+  - "5.10.0" # Oldest supported version from version 5.x
   - "4"
   - "6"
   - "8"
@@ -16,4 +14,7 @@ jobs:
     - name: webpack
       node_js: "12"
       install: cd test/webpack; npm install
-      script: npm test
+      script: npm test
+    - name: node-web-backend
+      node_js: "12"
+      script: npm run-script test-node-web
73 changes: 73 additions & 0 deletions backends/README.md
@@ -0,0 +1,73 @@
# iconv-lite backends

To accommodate different environments (most notably Node.js and the browser) in an efficient manner, iconv-lite has a concept of 'backends'.
Backends convert the library's internal data representations to the types its users expect.
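As a rough sketch of what this means for library users (illustrative only; it assumes the existing public `encode`/`decode` API and a backend chosen by the build, which this document does not specify):

```js
// Illustrative only: the same calls, backend-specific byte types.
const iconv = require("iconv-lite");

const bytes = iconv.encode("Hello", "win1251");
// node backend: bytes is a Buffer
// web backend:  bytes is a Uint8Array
const text = iconv.decode(bytes, "win1251");
// both backends: text is a plain js string
```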

Here's an overview of the data types used in iconv-lite codecs:

                               |      encoder      |  decoder
------------------------------ | ----------------- | ------------------------------------------
input type                     | js string         | Uint8Array, Buffer or any array of bytes
input internal representation  | js string         | same as input
input data access              | str.charCodeAt(i) | bytes[i]
output type                    | Backend Bytes     | js string
output internal representation | Uint8Array        | Uint16Array
output data writing            | bytes[i]          | rawChars[i]

The reasoning behind this choice is the following:
* For inputs, we try to use the passed-in objects directly and avoid converting them,
  to sidestep a performance hit. For decoder inputs this means that all codecs need to
  be able to work with both Uint8Array-s and Buffer-s at the same time (see the sketch
  after this list).
* For outputs, we standardize the internal representation (what codecs work with)
  on Uint8Array and Uint16Array, because these seem to be the lowest common denominator
  between the backends (a Buffer can be used interchangeably with a Uint8Array) that
  does not sacrifice performance.
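A minimal sketch of the input side of this reasoning: a Buffer is itself a Uint8Array, so plain indexing and `.length` serve both input types without any conversion.

```js
// A Buffer is a Uint8Array subclass (Node >= 4), so the same decoder loop body
// works on both input types with no copying or wrapping.
const buf = Buffer.from([0x68, 0x69]);
console.log(buf instanceof Uint8Array); // true
console.log(buf[0], buf.length);        // 104 2 -- plain indexing and .length
```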

## Backend interface
```typescript

type BackendBytes = unknown; // The concrete type depends on the backend.

interface IconvLiteBackend {
    // Encoder output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes: number, fill: number): Uint8Array;
    bytesToResult(bytes: Uint8Array, finalLen: number): BackendBytes;
    concatByteResults(bufs: BackendBytes[]): BackendBytes;

    // Decoder output: allocRawChars() -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars: number): Uint16Array;
    rawCharsToResult(rawChars: Uint16Array, finalLen: number): string;

    // TODO: We'll likely add some more methods here for additional performance.
}
```
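The pseudocode below exercises the alloc/result pairs; `concatByteResults()` is the remaining method, used to join the results of several `write()` calls into one output. A hedged sketch (the helper name is illustrative, not part of the interface):

```js
// Hypothetical helper: encode several strings with an encoder shaped like the
// one in the next section and join the chunks into a single result of the
// backend's native byte type (a Buffer with the node backend, a Uint8Array
// with the web backend).
function encodeAll(backend, encoder, strings) {
    const chunks = strings.map((s) => encoder.write(s));
    return backend.concatByteResults(chunks);
}
```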

## Codec pseudocode
```js
class Encoder {
    write(str) {
        const bytes = this.backend.allocBytes(str.length * max_bytes_per_char);
        let bytesPos = 0;
        for (let i = 0; i < str.length; i++) {
            const char = str.charCodeAt(i); // todo: handle surrogates.
            // convert char to bytes
            bytes[bytesPos++] = byte1;
            ...
        }
        return this.backend.bytesToResult(bytes, bytesPos);
    }
}

class Decoder {
    write(buf) { // NOTE: buf here can be Uint8Array, Buffer or regular array.
        const chars = this.backend.allocRawChars(buf.length * max_chars_per_byte);
        let charsPos = 0;
        for (let i = 0; i < buf.length; i++) {
            let byte1 = buf[i];
            // convert byte(s) to char
            chars[charsPos++] = char; // todo: handle surrogates.
            ...
        }
        return this.backend.rawCharsToResult(chars, charsPos);
    }
}
```
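As a concrete (purely hypothetical, not one of the codecs converted in this commit) instance of the skeleton above, a latin1-style single-byte codec could look like this:

```js
// Hypothetical single-byte codec written against the backend interface:
// char codes 0-255 map directly to bytes and back.
class SingleByteEncoder {
    constructor(backend) { this.backend = backend; }
    write(str) {
        const bytes = this.backend.allocBytes(str.length, 0); // 1 byte per char
        for (let i = 0; i < str.length; i++) {
            const char = str.charCodeAt(i);
            bytes[i] = char < 0x100 ? char : 0x3f; // '?' for unmappable chars
        }
        return this.backend.bytesToResult(bytes, str.length);
    }
}

class SingleByteDecoder {
    constructor(backend) { this.backend = backend; }
    write(buf) { // buf may be a Uint8Array, a Buffer or a plain array of bytes
        const chars = this.backend.allocRawChars(buf.length); // 1 char per byte
        for (let i = 0; i < buf.length; i++) {
            chars[i] = buf[i];
        }
        return this.backend.rawCharsToResult(chars, buf.length);
    }
}
```

With the node backend the encoder returns a Buffer; with the web backend it returns a Uint8Array. The decoder returns a js string in both cases.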
45 changes: 45 additions & 0 deletions backends/node.js
@@ -0,0 +1,45 @@
"use strict";
// NOTE: This backend uses Buffer APIs that are only available in Node v4.5+ and v5.10+.

module.exports = {
    // Encoder string input: use str directly, .length, .charCodeAt(i).
    // Encoder bytes output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes, fill) {
        // NOTE: We could do a 'new ArrayBuffer' here, but Buffer.alloc gives us pooling, which makes small chunks faster.
        const buf = Buffer.alloc(numBytes, fill);
        return new Uint8Array(buf.buffer, buf.byteOffset, numBytes);
    },
    bytesToResult(bytes, finalLen) {
        // In Node 5.10.0-6.3.0, Buffer.from() raises an error when fed a zero-length buffer, so we check for that explicitly.
        if (finalLen === 0) {
            return Buffer.alloc(0);
        }

        // In Node 4.5.0-5.10.0, Buffer.from() does not support the (arrayBuffer, byteOffset, length) signature, only (arrayBuffer),
        // so we emulate it with .slice().
        return Buffer.from(bytes.buffer).slice(bytes.byteOffset, bytes.byteOffset + finalLen);
    },
    concatByteResults(bufs) {
        return Buffer.concat(bufs);
    },

    // Decoder bytes input: use only array access + .length, so both Buffer-s and Uint8Array-s work.
    // Decoder string output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars) {
        // NOTE: We could do a 'new ArrayBuffer' here, but Buffer.alloc gives us pooling, which makes small chunks faster.
        const buf = Buffer.alloc(numChars * Uint16Array.BYTES_PER_ELEMENT);
        return new Uint16Array(buf.buffer, buf.byteOffset, numChars);
    },
    rawCharsToResult(rawChars, finalLen) {
        // See the comments in bytesToResult about support for old Node versions.
        if (finalLen === 0) {
            return '';
        }
        return Buffer.from(rawChars.buffer)
            .slice(rawChars.byteOffset, rawChars.byteOffset + finalLen * Uint16Array.BYTES_PER_ELEMENT)
            .toString('ucs2');
    },

    // Optimizations
    // maybe buf.swap16()?
};
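An illustrative round trip through this backend (require path assumed; not part of the tests added in this commit):

```js
// Node backend sanity check: UTF-16 code units in, a js string out;
// encoder bytes come back as a Buffer viewing the same memory (no copy).
const nodeBackend = require("./node"); // path assumed relative to backends/

const chars = nodeBackend.allocRawChars(2);
chars[0] = 0x0068; chars[1] = 0x0069;                // "hi"
console.log(nodeBackend.rawCharsToResult(chars, 2)); // -> "hi"

const bytes = nodeBackend.allocBytes(2, 0x2a);       // two bytes filled with '*'
console.log(nodeBackend.bytesToResult(bytes, 2));    // -> <Buffer 2a 2a>
```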
46 changes: 46 additions & 0 deletions backends/web.js
@@ -0,0 +1,46 @@
"use strict";
// NOTE: This backend uses TextDecoder interface.

module.exports = {
    // Encoder string input: use str directly, .length, .charCodeAt(i).
    // Encoder bytes output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes, fill) {
        const arr = new Uint8Array(new ArrayBuffer(numBytes));
        if (fill != null) {
            arr.fill(fill);
        }
        return arr;
    },
    bytesToResult(bytes, finalLen) {
        return bytes.subarray(0, finalLen);
    },
    concatByteResults(bufs) {
        bufs = bufs.filter((b) => b.length > 0);
        if (bufs.length === 0) {
            return new Uint8Array();
        } else if (bufs.length === 1) {
            return bufs[0];
        }

        const totalLen = bufs.reduce((a, b) => a + b.length, 0);
        const res = new Uint8Array(new ArrayBuffer(totalLen));
        let curPos = 0;
        for (let i = 0; i < bufs.length; i++) {
            res.set(bufs[i], curPos);
            curPos += bufs[i].length;
        }
        return res;
    },

    // Decoder bytes input: use only array access + .length, so both Buffer-s and Uint8Array-s work.
    // Decoder string output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars) {
        return new Uint16Array(new ArrayBuffer(numChars * Uint16Array.BYTES_PER_ELEMENT));
    },
    rawCharsToResult(rawChars, finalLen) {
        return new TextDecoder("utf-16").decode(rawChars.subarray(0, finalLen));
    },

    // Optimizations
    // maybe buf.swap16()?
};
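And an equivalent illustrative check for this backend; it needs an environment where `TextDecoder` is available as a global (browsers, or recent Node versions), and the require path is assumed:

```js
// Web backend sanity check: decoding goes through TextDecoder("utf-16"),
// encoder results stay as zero-copy Uint8Array subarrays.
const webBackend = require("./web"); // path assumed relative to backends/

const chars = webBackend.allocRawChars(2);
chars[0] = 0x0068; chars[1] = 0x0069;               // "hi"
console.log(webBackend.rawCharsToResult(chars, 2)); // -> "hi"

const bytes = webBackend.allocBytes(4, 0);
bytes[0] = 0x68; bytes[1] = 0x69;
console.log(webBackend.bytesToResult(bytes, 2));    // -> Uint8Array [ 104, 105 ]
```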