-
-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Could this use the SQLite range header trick? #28
Comments
Pretty sure this would consist of implementing https://github.com/phiresky/sql.js-httpvfs in Python using |
I wonder if I could instead implement this by teaching Datasette to use sql.js-httpvfs as the database backend? I've been wanting to add a plugin hook for custom DB backends for a while. This would be a deeply fascinating use case for such a hook. |
I haven't really verified this but I assume that Pyodide is just Python compiled to WASM under the hood. Thus, we should be able to leverage the virtual file system interface of emscripten. We just have to mount a '/data/database.db' that will be visible to the Python executable running inside, and emscripten runtime will call our custom XMLHttpRequest code whenever something makes a read() system call on that file. I believe this is also how sql.js-httpvfs is implemented. https://emscripten.org/docs/porting/files/file_systems_overview.html If I have time I'll be happy to try this over the weekend. |
Here's what it looks like to me. Because Lite relies entirely on the environment provided by Pyodide (instead of translating Datasette to JS or WASM), that's where the patching needs to happen, either:
Providing a shim for Pyodide to replace an existing library is something they already do, e.g. for A native HTTP VFS, like sqlite_web_vfs, is going to use a native HTTP library to make its network calls, and that won't work in the Emscripten environment (this particular one uses A new filesystem really just means adding range request support to an existing Emscripten remote filesystem. There's the original BrowserFS, now unmaintained, and the original issue discussing it. There's its maintained fork, browser-fs, but it doesn't look like it's brought forward a remote HTTP FS module yet, despite the open issue about range requests. Pulling Adding range request support to BrowserFS is probably going to be the most transparent option, since neither Pyodide nor Datasette need to be touched; it also is a pure JavaScript solution. Writing a shim for Pyodide to replace native Making a native HTTP VFS work under Emscripten is probably pretty straightforward for C programmers who understand Emscripten and JS. The Datasette integration is independent, since it's a native VFS module, so it can be written and tested using regular Datasette. Lite will just magically get the remote range ability once the C/Emscripten work is done. |
Hmm, just wanted to drop this POC I did a long time ago that had some interesting conclusions/observations for preparation/hosting datasette-lite on Cloudflare R2 of a 30GB database: |
@nelsonjchen the DB "manifest" you feel like could have been useful sounds a bit like the "helper DBI files" that sqlite_web_vfs can generate and use. |
That DBI thing sounds very very cool and probably could have made my experiment a 2 file upload to CF R2. |
Hi @vitorio, I'm the current BrowserFS maintainer. See message below
I'd like to point to the ZenFS fetch package (npm: @zenfs/fetch). While it does not currently support range requests, I can look into implementing that ahead of some other features. I think the best way to implement it would be to add a configuration option to the import { configure, fs } from '@zenfs/core';
import { Fetch } from '@zenfs/fetch';
await configure({
'/mnt/remote': {
backend: Fetch,
// A potential (optional) option that specifies whether to use range requests
useRanges: true,
}
});
const veryLargeFile = await fs.promises.open('/mnt/remote/large-file');
// 20kb buffer to store the contents in
let partOfFile = new Uint8Array(20_000);
// read the first 20kb from the file, starting at 10kb, into partOfFile at offset 0
await veryLargeFile.read(partOfFile, 0, 20_000, 10_000); What would you like to see with range request support? |
An update with BrowserFS: I made the decision to create a fork, for a number of reasons. Please read the notice in the readme of jvilk/BrowserFS. All my future work will be done on the updated fork, ZenFS. |
https://twitter.com/BardakciBaba/status/1521927918603145216
Right now Datasette Lite downloads the whole SQLite database. But could it work with the range trick?
This would be difficult to pull off, because it would involve compiling a custom build of Python's
sqlite3
module. Could probably use https://github.com/coleifer/pysqlite3 as a starting point for that.I'm going to leave this open as a "research" and "help wanted" issue, in case anyone else is inspired to try and figure it out!
The text was updated successfully, but these errors were encountered: