Introduce the concept of backends
 * Add two backends: node & web
 * Convert core lib files to use the backends (and not use Buffer)
 * Convert utf16 codec as an example
 * Add testing for both node side and webpack
 * Bump the minimum supported Node.js version to 4.5.0 and modernize some
   existing code. This will allow us to get rid of safer-buffer, our only
   dependency.
ashtuchkin committed Jul 14, 2020
1 parent 9627ecf commit e567849
Showing 24 changed files with 525 additions and 307 deletions.
11 changes: 6 additions & 5 deletions .travis.yml
@@ -1,9 +1,7 @@
 language: node_js
 node_js:
-  - "0.10"
-  - "0.11"
-  - "0.12"
-  - "iojs"
+  - "4.5.0" # Oldest supported version
+  - "5.10.0" # Oldest supported version from version 5.x
   - "4"
   - "6"
   - "8"
@@ -16,4 +14,7 @@ jobs:
     - name: webpack
       node_js: "12"
       install: cd test/webpack; npm install
-      script: npm test
+      script: npm test
+    - name: node-web-backend
+      node_js: "12"
+      script: npm run-script test-node-web
73 changes: 73 additions & 0 deletions backends/README.md
@@ -0,0 +1,73 @@
# iconv-lite backends

To accommodate different environments (most notably Node.js and the browser) in an efficient manner, iconv-lite has a concept of 'backends'.
Backends convert the library's internal data representations to the types its users expect.
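As a rough sketch of what this means for library users (illustrative only; it assumes the existing public `encode`/`decode` API and a backend chosen by the build, which this document does not specify):

```js
// Illustrative only: the same calls, backend-specific byte types.
const iconv = require("iconv-lite");

const bytes = iconv.encode("Hello", "win1251");
// node backend: bytes is a Buffer
// web backend:  bytes is a Uint8Array
const text = iconv.decode(bytes, "win1251");
// both backends: text is a plain js string
```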

Here's an overview of the data types used in iconv-lite codecs:

                               |      encoder      |  decoder
------------------------------ | ----------------- | ------------------------------------------
input type                     | js string         | Uint8Array, Buffer or any array of bytes
input internal representation  | js string         | same as input
input data access              | str.charCodeAt(i) | bytes[i]
output type                    | Backend Bytes     | js string
output internal representation | Uint8Array        | Uint16Array
output data writing            | bytes[i]          | rawChars[i]

The reasoning behind this choice is the following:
* For inputs, we try to use the passed-in objects directly and avoid converting them,
  to sidestep a performance hit. For decoder inputs this means that all codecs need to
  be able to work with both Uint8Array-s and Buffer-s at the same time (see the sketch
  after this list).
* For outputs, we standardize the internal representation (what codecs work with)
  on Uint8Array and Uint16Array, because these seem to be the lowest common denominator
  between the backends (a Buffer can be used interchangeably with a Uint8Array) that
  does not sacrifice performance.
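A minimal sketch of the input side of this reasoning: a Buffer is itself a Uint8Array, so plain indexing and `.length` serve both input types without any conversion.

```js
// A Buffer is a Uint8Array subclass (Node >= 4), so the same decoder loop body
// works on both input types with no copying or wrapping.
const buf = Buffer.from([0x68, 0x69]);
console.log(buf instanceof Uint8Array); // true
console.log(buf[0], buf.length);        // 104 2 -- plain indexing and .length
```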

## Backend interface
```typescript

type BackendBytes = unknown; // The concrete type depends on the backend.

interface IconvLiteBackend {
    // Encoder output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes: number, fill: number): Uint8Array;
    bytesToResult(bytes: Uint8Array, finalLen: number): BackendBytes;
    concatByteResults(bufs: BackendBytes[]): BackendBytes;

    // Decoder output: allocRawChars() -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars: number): Uint16Array;
    rawCharsToResult(rawChars: Uint16Array, finalLen: number): string;

    // TODO: We'll likely add some more methods here for additional performance.
}
```
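The pseudocode below exercises the alloc/result pairs; `concatByteResults()` is the remaining method, used to join the results of several `write()` calls into one output. A hedged sketch (the helper name is illustrative, not part of the interface):

```js
// Hypothetical helper: encode several strings with an encoder shaped like the
// one in the next section and join the chunks into a single result of the
// backend's native byte type (a Buffer with the node backend, a Uint8Array
// with the web backend).
function encodeAll(backend, encoder, strings) {
    const chunks = strings.map((s) => encoder.write(s));
    return backend.concatByteResults(chunks);
}
```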

## Codec pseudocode
```js
class Encoder {
    write(str) {
        const bytes = this.backend.allocBytes(str.length * max_bytes_per_char);
        let bytesPos = 0;
        for (let i = 0; i < str.length; i++) {
            const char = str.charCodeAt(i); // todo: handle surrogates.
            // convert char to bytes
            bytes[bytesPos++] = byte1;
            ...
        }
        return this.backend.bytesToResult(bytes, bytesPos);
    }
}

class Decoder {
    write(buf) { // NOTE: buf here can be Uint8Array, Buffer or regular array.
        const chars = this.backend.allocRawChars(buf.length * max_chars_per_byte);
        let charsPos = 0;
        for (let i = 0; i < buf.length; i++) {
            let byte1 = buf[i];
            // convert byte(s) to char
            chars[charsPos++] = char; // todo: handle surrogates.
            ...
        }
        return this.backend.rawCharsToResult(chars, charsPos);
    }
}
```
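As a concrete (purely hypothetical, not one of the codecs converted in this commit) instance of the skeleton above, a latin1-style single-byte codec could look like this:

```js
// Hypothetical single-byte codec written against the backend interface:
// char codes 0-255 map directly to bytes and back.
class SingleByteEncoder {
    constructor(backend) { this.backend = backend; }
    write(str) {
        const bytes = this.backend.allocBytes(str.length, 0); // 1 byte per char
        for (let i = 0; i < str.length; i++) {
            const char = str.charCodeAt(i);
            bytes[i] = char < 0x100 ? char : 0x3f; // '?' for unmappable chars
        }
        return this.backend.bytesToResult(bytes, str.length);
    }
}

class SingleByteDecoder {
    constructor(backend) { this.backend = backend; }
    write(buf) { // buf may be a Uint8Array, a Buffer or a plain array of bytes
        const chars = this.backend.allocRawChars(buf.length); // 1 char per byte
        for (let i = 0; i < buf.length; i++) {
            chars[i] = buf[i];
        }
        return this.backend.rawCharsToResult(chars, buf.length);
    }
}
```

With the node backend the encoder returns a Buffer; with the web backend it returns a Uint8Array. The decoder returns a js string in both cases.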
45 changes: 45 additions & 0 deletions backends/node.js
@@ -0,0 +1,45 @@
"use strict";
// NOTE: This backend uses Buffer APIs that are only available in Node v4.5+ and v5.10+.

module.exports = {
    // Encoder string input: use str directly, .length, .charCodeAt(i).
    // Encoder bytes output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes, fill) {
        // NOTE: We could do a 'new ArrayBuffer' here, but Buffer.alloc gives us pooling, which makes small chunks faster.
        const buf = Buffer.alloc(numBytes, fill);
        return new Uint8Array(buf.buffer, buf.byteOffset, numBytes);
    },
    bytesToResult(bytes, finalLen) {
        // In Node 5.10.0-6.3.0, Buffer.from() raises an error when fed a zero-length buffer, so we check for that explicitly.
        if (finalLen === 0) {
            return Buffer.alloc(0);
        }

        // In Node 4.5.0-5.10.0, Buffer.from() does not support the (arrayBuffer, byteOffset, length) signature, only (arrayBuffer),
        // so we emulate it with .slice().
        return Buffer.from(bytes.buffer).slice(bytes.byteOffset, bytes.byteOffset + finalLen);
    },
    concatByteResults(bufs) {
        return Buffer.concat(bufs);
    },

    // Decoder bytes input: use only array access + .length, so both Buffer-s and Uint8Array-s work.
    // Decoder string output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars) {
        // NOTE: We could do a 'new ArrayBuffer' here, but Buffer.alloc gives us pooling, which makes small chunks faster.
        const buf = Buffer.alloc(numChars * Uint16Array.BYTES_PER_ELEMENT);
        return new Uint16Array(buf.buffer, buf.byteOffset, numChars);
    },
    rawCharsToResult(rawChars, finalLen) {
        // See the comments in bytesToResult about support for old Node versions.
        if (finalLen === 0) {
            return '';
        }
        return Buffer.from(rawChars.buffer)
            .slice(rawChars.byteOffset, rawChars.byteOffset + finalLen * Uint16Array.BYTES_PER_ELEMENT)
            .toString('ucs2');
    },

    // Optimizations
    // maybe buf.swap16()?
};
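An illustrative round trip through this backend (require path assumed; not part of the tests added in this commit):

```js
// Node backend sanity check: UTF-16 code units in, a js string out;
// encoder bytes come back as a Buffer viewing the same memory (no copy).
const nodeBackend = require("./node"); // path assumed relative to backends/

const chars = nodeBackend.allocRawChars(2);
chars[0] = 0x0068; chars[1] = 0x0069;                // "hi"
console.log(nodeBackend.rawCharsToResult(chars, 2)); // -> "hi"

const bytes = nodeBackend.allocBytes(2, 0x2a);       // two bytes filled with '*'
console.log(nodeBackend.bytesToResult(bytes, 2));    // -> <Buffer 2a 2a>
```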
46 changes: 46 additions & 0 deletions backends/web.js
@@ -0,0 +1,46 @@
"use strict";
// NOTE: This backend uses TextDecoder interface.

module.exports = {
    // Encoder string input: use str directly, .length, .charCodeAt(i).
    // Encoder bytes output: allocBytes() -> use Uint8Array -> bytesToResult().
    allocBytes(numBytes, fill) {
        const arr = new Uint8Array(new ArrayBuffer(numBytes));
        if (fill != null) {
            arr.fill(fill);
        }
        return arr;
    },
    bytesToResult(bytes, finalLen) {
        return bytes.subarray(0, finalLen);
    },
    concatByteResults(bufs) {
        bufs = bufs.filter((b) => b.length > 0);
        if (bufs.length === 0) {
            return new Uint8Array();
        } else if (bufs.length === 1) {
            return bufs[0];
        }

        const totalLen = bufs.reduce((a, b) => a + b.length, 0);
        const res = new Uint8Array(new ArrayBuffer(totalLen));
        let curPos = 0;
        for (let i = 0; i < bufs.length; i++) {
            res.set(bufs[i], curPos);
            curPos += bufs[i].length;
        }
        return res;
    },

    // Decoder bytes input: use only array access + .length, so both Buffer-s and Uint8Array-s work.
    // Decoder string output: allocRawChars -> use Uint16Array -> rawCharsToResult().
    allocRawChars(numChars) {
        return new Uint16Array(new ArrayBuffer(numChars * Uint16Array.BYTES_PER_ELEMENT));
    },
    rawCharsToResult(rawChars, finalLen) {
        return new TextDecoder("utf-16").decode(rawChars.subarray(0, finalLen));
    },

    // Optimizations
    // maybe buf.swap16()?
};
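And an equivalent illustrative check for this backend; it needs an environment where `TextDecoder` is available as a global (browsers, or recent Node versions), and the require path is assumed:

```js
// Web backend sanity check: decoding goes through TextDecoder("utf-16"),
// encoder results stay as zero-copy Uint8Array subarrays.
const webBackend = require("./web"); // path assumed relative to backends/

const chars = webBackend.allocRawChars(2);
chars[0] = 0x0068; chars[1] = 0x0069;               // "hi"
console.log(webBackend.rawCharsToResult(chars, 2)); // -> "hi"

const bytes = webBackend.allocBytes(4, 0);
bytes[0] = 0x68; bytes[1] = 0x69;
console.log(webBackend.bytesToResult(bytes, 2));    // -> Uint8Array [ 104, 105 ]
```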