v0.8.0 release (#335)

* Delta Lake docs (#313)
* Create delta.mdx
* Update delta.mdx
* Update index.mdx
* Update delta.mdx
* Update delta.mdx
* Fixes
---------
Co-authored-by: Prashanth Rao <[email protected]>
Co-authored-by: prrao87 <[email protected]>
* Add Iceberg Extension Documentation (#314)
* add ice_berg docu
* Update src/content/docs/extensions/iceberg.mdx
Co-authored-by: Guodong Jin <[email protected]>
* Update src/content/docs/extensions/iceberg.mdx
Co-authored-by: Guodong Jin <[email protected]>
* restructure
* restructure
* restructure
* update table
* update table
* Apply suggestions from code review
* update table
* Fixes
---------
Co-authored-by: Guodong Jin <[email protected]>
Co-authored-by: Prashanth Rao <[email protected]>
Co-authored-by: prrao87 <[email protected]>
* Fix file extension
* Minor fixes
* Create wasm.mdx
Update wasm.mdx
Update docs (#331)
Fix demo script
Starting merge for 0.8.0
* remove progress_bar_time from docs (#337)
* Fix ignore errors in DataFrame section (#338)
* Add doc for `show_indexes`, `show_official_extensions` (#339)
* Add doc for `show_indexes`, `show_official_extensions` and `show_loaded_extensions`
* Apply suggestions from code review
* Update src/content/docs/cypher/query-clauses/call.md
---------
Co-authored-by: Prashanth Rao <[email protected]>
* FTS index (#332)
* Create full-text-search.md
* Update full-text-search.md
* Update FTS docs
---------
Co-authored-by: prrao87 <[email protected]>
* Document the behaviour of import/export database with indexes (#340)
* Add doc for file-format option (#342) (#343)
* Add doc for file-format
* Update index.mdx
* Apply suggestions from code review
---------
Co-authored-by: ziyi chen <[email protected]>
* Fix typos and improve formatting
* Add doc for yield clause (#347)
* Add doc for yield clause
* Apply suggestions from code review
---------
Co-authored-by: Prashanth Rao <[email protected]>
* skip/limit doc (#341)
* skip/limit doc
* Update limit.md
* Update limit.md
* Update skip.md
* Add documentation on special behaviour for query result getNext() (#351)
* Add docs on query result getNext() behaviour
* Add manual frees in C API example
* Apply suggestions from code review
---------
Co-authored-by: Prashanth Rao <[email protected]>
* Update rdbms.mdx (#352)
* Add doc for duckdb/sqlite/postgres's type conversion (#348)
* Add doc for duckdb's type conversion
* Update rdbms.mdx
* Update rdbms.mdx
* Update rdbms.mdx
* Update rdbms.mdx
* Update src/content/docs/extensions/attach/rdbms.mdx
Co-authored-by: Guodong Jin <[email protected]>
* Update src/content/docs/extensions/attach/rdbms.mdx
Co-authored-by: Guodong Jin <[email protected]>
* Update src/content/docs/extensions/attach/rdbms.mdx
Co-authored-by: Guodong Jin <[email protected]>
* Update src/content/docs/extensions/attach/rdbms.mdx
Co-authored-by: Guodong Jin <[email protected]>
* Update rdbms.mdx
---------
Co-authored-by: Guodong Jin <[email protected]>
* Rename output of fts (#354)
* Add doc for rel_table_group (#349)
* rel-table-group
* Polish rel group
* Update src/content/docs/cypher/data-definition/create-table.md
Co-authored-by: Guodong Jin <[email protected]>
* Update src/content/docs/cypher/data-definition/create-table.md
Co-authored-by: Guodong Jin <[email protected]>
* Update src/content/docs/cypher/data-definition/create-table.md
---------
Co-authored-by: xiyang <[email protected]>
Co-authored-by: Prashanth Rao <[email protected]>
Co-authored-by: Guodong Jin <[email protected]>
* Fix formatting
* Update src/content/docs/migrate/index.md
Co-authored-by: Guodong Jin <[email protected]>
* Fix export-db with index doc
* Add reference to the bm25 match algo (#357)
* edits
* Update wasm.mdx
* Update wasm.mdx
* Update full-text-search.md (#358)
Co-authored-by: Prashanth Rao <[email protected]>
* Fts minor fix (#362)
* fts minor fix
* Update full-text-search.md
* Update index.mdx
* Update installation.mdx
* Update installation.mdx
* Update installation.mdx
* Improve load/scan docs
* Update config
---------
Co-authored-by: ziyi chen <[email protected]>
Co-authored-by: Sterling Shi <[email protected]>
Co-authored-by: Guodong Jin <[email protected]>
Co-authored-by: 囧囧 <[email protected]>
Co-authored-by: Howe Wang <[email protected]>
Co-authored-by: Royi Luo <[email protected]>
Co-authored-by: xiyang <[email protected]>
Co-authored-by: semihsalihoglu-uw <[email protected]>
kuzudb · Feb 5, 2025 · bc5e5c0 · bc5e5c0
1 parent 830169d
commit bc5e5c0
Showing 20 changed files with 1,190 additions and 181 deletions.
diff --git a/astro.config.mjs b/astro.config.mjs
@@ -65,6 +65,7 @@ export default defineConfig({
                         { label: 'Create your first graph', link: '/get-started' },
                         { label: 'Query & visualize your graph', link: '/get-started/cypher-intro' },
                         { label: 'Run prepared Cypher statements', link: '/get-started/prepared-statements' },
+                        { label: 'Scan data from various sources', link: '/get-started/scan'},
                         { label: 'Run graph algorithms', link: '/get-started/graph-algorithms' },
                     ]
                 },
@@ -148,6 +149,7 @@ export default defineConfig({
                         { label: 'Go', link: '/client-apis/go' },
                         { label: 'C++', link: '/client-apis/cpp' },
                         { label: 'C', link: '/client-apis/c' },
+                        { label: 'WebAssembly', link: '/client-apis/wasm' },
                         { label: '.NET', link: '/client-apis/net', badge: { text: 'Community', variant: 'caution'}},
                         { label: 'Elixir', link: '/client-apis/elixir', badge: { text: 'Community', variant: 'caution'}}
                     ],
@@ -197,8 +199,9 @@ export default defineConfig({
                             ]
                         },
                         { label: 'JSON', link: '/extensions/json' },
-                        { label: 'Iceberg', link: '/extensions/iceberg', badge: { text: 'New' }},
-                        { label: 'Delta Lake', link: '/extensions/delta', badge: { text: 'New' }},
+                        { label: 'Iceberg', link: '/extensions/iceberg' },
+                        { label: 'Delta Lake', link: '/extensions/delta' },
+                        { label: 'Full-text search', link: '/extensions/full-text-search', badge: { text: 'New' }},
                     ],
                     autogenerate: { directory: 'reference' },
                 },

diff --git a/src/content/docs/client-apis/c.mdx b/src/content/docs/client-apis/c.mdx
@@ -73,4 +73,62 @@ And then link against `<install-dest>/libkuzu.so` (or `libkuzu.dylib`/`libkuzu.l
 
 
 The static library is more complicated (as noted above, it's recommended that you use CMake to handle the details) and is not installed by default, but all static libraries will be available in the build directory.
-You need to define `KUZU_STATIC_DEFINE`, and link against the static kuzu library in `build/src`, as well as `antlr4_cypher`, `antlr4_runtime`, `brotlidec`, `brotlicommon`, `utf8proc`, `re2`, `serd`, `fastpfor`, `miniparquet`, `zstd`, `miniz`, `mbedtls`, `lz4` (all of which can be found in the third_party subdirectory of the CMake build directory. E.g. `build/third_party/zstd/libzstd.a`) and whichever standard library you're using.
+You need to define `KUZU_STATIC_DEFINE`, and link against the static Kùzu library in `build/src`, as well as `antlr4_cypher`, `antlr4_runtime`, `brotlidec`, `brotlicommon`, `utf8proc`, `re2`, `serd`, `fastpfor`, `miniparquet`, `zstd`, `miniz`, `mbedtls`, `lz4` (all of which can be found in the third_party subdirectory of the CMake build directory. E.g. `build/third_party/zstd/libzstd.a`) and whichever standard library you're using.
+
+## Handling Kùzu output using `kuzu_query_result_get_next()`
+
+For the examples in this section we will be using the following schema:
+```cypher
+CREATE NODE TABLE person(id INT64 PRIMARY KEY);
+```
+
+The `kuzu_query_result_get_next()` function returns a reference to the resulting flat tuple. To reduce resource allocation, all calls to `kuzu_query_result_get_next()` reuse the same
+flat tuple object. This means that, for a given query result, each call to `kuzu_query_result_get_next()` overwrites the flat tuple returned by the previous call.
+
+Thus, we recommend processing each tuple immediately, before making the next call to `kuzu_query_result_get_next()`:
+
+```c
+kuzu_query_result result;
+kuzu_connection_query(conn, "MATCH (p:person) RETURN p.*", &result);
+while (kuzu_query_result_has_next(&result)) {
+  kuzu_flat_tuple tuple;
+  kuzu_query_result_get_next(&result, &tuple);
+  do_something(tuple);
+  kuzu_flat_tuple_destroy(&tuple);
+}
+kuzu_query_result_destroy(&result);
+```
+
+If you wish to process the tuples later, you must explicitly make a copy of each tuple:
+```c
+static kuzu_value* copy_flat_tuple(kuzu_flat_tuple* tuple, uint32_t tupleLen) {
+  kuzu_value* ret = malloc(sizeof(kuzu_value) * tupleLen);
+  for (uint32_t i = 0; i < tupleLen; i++) {
+      kuzu_flat_tuple_get_value(tuple, i, &ret[i]);
+  }
+  return ret;
+}
+
+void mainFunction() {
+  kuzu_query_result result;
+  kuzu_connection_query(conn, "MATCH (p:person) RETURN p.*", &result);
+
+  uint64_t num_tuples = kuzu_query_result_get_num_tuples(&result);
+  kuzu_value** tuples = (kuzu_value**)malloc(sizeof(kuzu_value*) * num_tuples);
+  for (uint64_t i = 0; i < num_tuples; ++i) {
+      kuzu_flat_tuple tuple;
+      kuzu_query_result_get_next(&result, &tuple);
+      tuples[i] = copy_flat_tuple(&tuple, kuzu_query_result_get_num_columns(&result));
+      kuzu_flat_tuple_destroy(&tuple);
+  }
+
+  for (uint64_t i = 0; i < num_tuples; ++i) {
+    for (uint64_t j = 0; j < kuzu_query_result_get_num_columns(&result); ++j) {
+      doSomething(tuples[i][j]);
+      kuzu_value_destroy(&tuples[i][j]);
+    }
+    free(tuples[i]);
+  }
+
+  free((void*)tuples);
+  kuzu_query_result_destroy(&result);
+}
+```
diff --git a/src/content/docs/client-apis/cpp.mdx b/src/content/docs/client-apis/cpp.mdx
@@ -11,10 +11,71 @@ See the following link for the full documentation of the C++ API.
   href="https://kuzudb.com/api-docs/cpp/annotated.html"
 />
 
+## Handling Kùzu output using `getNext()`
+
+For the examples in this section we will be using the following schema:
+```cypher
+CREATE NODE TABLE person(id INT64 PRIMARY KEY);
+```
+
+The `getNext()` function in a `QueryResult` returns a reference to the resulting `FlatTuple`. To reduce resource allocation, all calls to `getNext()` reuse the same
+`FlatTuple` object. This means that, for a given `QueryResult`, each call to `getNext()` overwrites the `FlatTuple` returned by the previous call.
+
+Thus, we don't recommend using `QueryResult` like this:
+
+```cpp
+std::unique_ptr<kuzu::main::QueryResult> result = conn->query("MATCH (p:person) RETURN p.*");
+std::vector<std::shared_ptr<kuzu::processor::FlatTuple>> tuples;
+while (result->hasNext()) {
+  // Each call to getNext() actually returns a pointer to the same tuple object
+  tuples.emplace_back(result->getNext());
+}
+
+// This is wrong!
+// The vector stores a bunch of pointers to the same underlying tuple object
+for (const auto& resultTuple: tuples) {
+  doSomething(resultTuple);
+}
+```
+
+Instead, we recommend processing each tuple immediately before making the next call to `getNext`:
+```cpp
+std::unique_ptr<kuzu::main::QueryResult> result = conn->query("MATCH (p:person) RETURN p.*");
+while (result->hasNext()) {
+  auto tuple = result->getNext();
+  doSomething(tuple);
+}
+```
+
+If you wish to process the tuples later, you must explicitly make a copy of each tuple:
+```cpp
+static decltype(auto) copyFlatTuple(kuzu::processor::FlatTuple* tuple) {
+  std::vector<std::unique_ptr<kuzu::common::Value>> ret;
+  for (uint32_t i = 0; i < tuple->len(); i++) {
+      ret.emplace_back(tuple->getValue(i)->copy());
+  }
+  return ret;
+}
+
+void mainFunction() {
+  std::unique_ptr<kuzu::main::QueryResult> result = conn->query("MATCH (p:person) RETURN p.*");
+  std::vector<std::vector<std::unique_ptr<kuzu::common::Value>>> tuples;
+  while (result->hasNext()) {
+      auto tuple = result->getNext();
+      tuples.emplace_back(copyFlatTuple(tuple.get()));
+  }
+  for (const auto& tuple : tuples) {
+      doSomething(tuple);
+  }
+}
+```
+
+## UDF API
+
 In addition to interfacing with the database, the C++ API offers users the ability to define custom
 functions via User Defined Functions (UDFs), described below.
 
-## UDF API
 Kùzu provides two interfaces that enable you to define your own custom scalar and vectorized functions.
 
 ### Scalar functions
@@ -211,7 +272,7 @@ conn->createVectorizedFunction<int64_t, int64_t>("addFour", &addFour);
 conn->query("MATCH (p:person) return addFour(p.age)");
 ```
 
-#### Option 2. Vectorized function with input and return type in Cypher 
+#### Option 2. Vectorized function with input and return type in Cypher
 
 Create a vectorized function with input and return type in Cypher.
 ```cpp
@@ -263,4 +324,4 @@ conn->query("MATCH (p:person) return addDate(p.birthdate, p.age)");
 
 ## Linking
 
-See the [C API Documentation](/client-apis/c#linking) for details as linking to the C++ API is more or less identical.
+See the [C API Documentation](/client-apis/c#linking) for details as linking to the C++ API is more or less identical.
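
The `getNext()` reuse behaviour documented in the `cpp.mdx` changes above can be reproduced without Kùzu. The following is a minimal, self-contained sketch: `ToyTuple`, `ToyResult`, `collectByPointer`, `collectByCopy`, and `demo` are hypothetical names invented for illustration and are not part of the Kùzu API. It only assumes, as the documentation states, that every call hands back the same tuple object and overwrites it in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flat tuple: one row with a single INT64 column.
struct ToyTuple {
    int64_t id = 0;
};

// Hypothetical cursor that hands out the same shared tuple object on every
// call and overwrites it in place (an assumption mirroring the docs above).
class ToyResult {
public:
    explicit ToyResult(std::vector<int64_t> ids)
        : ids_(std::move(ids)), tuple_(std::make_shared<ToyTuple>()) {}

    bool hasNext() const { return pos_ < ids_.size(); }

    std::shared_ptr<ToyTuple> getNext() {
        tuple_->id = ids_[pos_++];  // overwrite the single reused tuple
        return tuple_;
    }

private:
    std::vector<int64_t> ids_;
    std::size_t pos_ = 0;
    std::shared_ptr<ToyTuple> tuple_;
};

// Pitfall: stores the returned pointers, which all alias one object.
std::vector<int64_t> collectByPointer(ToyResult result) {
    std::vector<std::shared_ptr<ToyTuple>> kept;
    while (result.hasNext()) {
        kept.push_back(result.getNext());
    }
    std::vector<int64_t> seen;
    for (const auto& tuple : kept) {
        seen.push_back(tuple->id);
    }
    return seen;
}

// Fix: copy the value out before the next getNext() call.
std::vector<int64_t> collectByCopy(ToyResult result) {
    std::vector<int64_t> seen;
    while (result.hasNext()) {
        seen.push_back(result.getNext()->id);
    }
    return seen;
}

void demo() {
    const std::vector<int64_t> ids{1, 2, 3};
    // All kept pointers alias one tuple, so only the last value survives.
    assert((collectByPointer(ToyResult(ids)) == std::vector<int64_t>{3, 3, 3}));
    // Copying before the next call preserves every row.
    assert(collectByCopy(ToyResult(ids)) == ids);
}
```

Calling `demo()` from a `main` function exercises both paths. The same reasoning is why the documented examples copy each value out of the tuple before advancing the result.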
diff --git a/src/content/docs/client-apis/java.mdx b/src/content/docs/client-apis/java.mdx
@@ -10,3 +10,63 @@ See the following link for the full documentation of the Java API.
   title="Java API documentation"
   href="https://kuzudb.com/api-docs/java"
 />
+
+## Handling Kùzu output using `getNext()`
+
+For the examples in this section we will be using the following schema:
+```cypher
+CREATE NODE TABLE person(id INT64 PRIMARY KEY);
+```
+
+The `getNext()` function in a `QueryResult` returns a reference to the resulting `FlatTuple`. To reduce resource allocation, all calls to `getNext()` reuse the same
+`FlatTuple` object. This means that, for a given `QueryResult`, each call to `getNext()` overwrites the `FlatTuple` returned by the previous call.
+
+Thus, we don't recommend using `QueryResult` like this:
+
+```java
+QueryResult result = conn.query("MATCH (p:person) RETURN p.*");
+List<FlatTuple> tuples = new ArrayList<FlatTuple>();
+while (result.hasNext()) {
+  // Each call to getNext() actually returns a reference to the same tuple object
+  tuples.add(result.getNext());
+}
+
+// This is wrong!
+// The list stores a bunch of references to the same underlying tuple object
+for (FlatTuple resultTuple: tuples) {
+  doSomething(resultTuple);
+}
+```
+
+Instead, we recommend processing each tuple immediately before making the next call to `getNext`:
+```java
+QueryResult result = conn.query("MATCH (p:person) RETURN p.*");
+while (result.hasNext()) {
+  FlatTuple tuple = result.getNext();
+  doSomething(tuple);
+}
+```
+
+If you wish to process the tuples later, you must explicitly make a copy of each tuple:
+```java
+List<Value> copyFlatTuple(FlatTuple tuple, long tupleLen) throws ObjectRefDestroyedException {
+  List<Value> ret = new ArrayList<Value>();
+  for (int i = 0; i < tupleLen; i++) {
+      ret.add(tuple.getValue(i).clone());
+  }
+  return ret;
+}
+
+void mainFunction() throws ObjectRefDestroyedException {
+  QueryResult result = conn.query("MATCH (p:person) RETURN p.*");
+  List<List<Value>> tuples = new ArrayList<List<Value>>();
+  while (result.hasNext()) {
+    FlatTuple tuple = result.getNext();
+    tuples.add(copyFlatTuple(tuple, result.getNumColumns()));
+  }
+
+  for (List<Value> tuple: tuples) {
+    doSomething(tuple);
+  }
+}
+```
diff --git a/src/content/docs/client-apis/wasm.mdx b/src/content/docs/client-apis/wasm.mdx
@@ -0,0 +1,84 @@
+---
+title: WebAssembly (Wasm)
+---
+
+[WebAssembly](https://webassembly.org/), a.k.a. _Wasm_, is a binary instruction format designed as a
+portable compilation target for programming languages, enabling deployment of software within web browsers
+on a variety of devices. This page describes Kùzu's Wasm API, which enables Kùzu databases to run inside
+Wasm-capable browsers.
+
+## Benefits of Wasm
+
+Kùzu-Wasm offers several benefits:
+
+- Fast, in-browser graph analysis without ever sending data to a server.
+- Strong data privacy guarantees, as the data never leaves the browser.
+- Real-time interactive in-browser graph analytics and visualization.
+
+## Installation
+
+```bash
+npm i kuzu-wasm
+```
+
+## Example usage
+
+We provide a simple example to demonstrate how to use Kùzu-Wasm. In this example, we will create a simple graph and run a few simple queries.
+
+We provide three versions of this example: 
+- `browser_in_memory`: This example demonstrates how to use Kùzu-Wasm in a web browser with an in-memory filesystem.
+- `browser_persistent`: This example demonstrates how to use Kùzu-Wasm in a web browser with a persistent IDBFS filesystem.
+- `nodejs`: This example demonstrates how to use Kùzu-Wasm in Node.js.
+
+All three versions can be found in [the examples directory](https://github.com/kuzudb/kuzu/tree/master/tools/wasm/examples).
+
+## Understanding the package
+
+In this package, three different variants of WebAssembly modules are provided:
+- **Default**: This is the default build of the WebAssembly module. It does not support multi-threading and uses Emscripten's default filesystem. This build has the smallest size and works in both Node.js and browser environments. It has the best compatibility and does not require cross-origin isolation. However, the performance may be limited due to the lack of multithreading support. This build is located at the root level of the package.
+- **Multi-threaded**: This build supports multi-threading and uses Emscripten's default filesystem. This build has a larger size compared to the default build and only requires [cross-origin isolation](https://web.dev/articles/cross-origin-isolation-guide) in the browser environment. This build is located in the `multithreaded` directory.
+- **Node.js**: This build is optimized for Node.js and uses Node.js's filesystem instead of Emscripten's default filesystem (`NODEFS` flag is enabled). This build also supports multi-threading. It is distributed as a CommonJS module rather than an ES module to maximize compatibility. This build is located in the `nodejs` directory. Note that this build only works in Node.js and does not work in the browser environment.
+
+In each variant, there are two different versions of the WebAssembly module:
+- **Async**: This version of the module is the default version and each function call returns a Promise. This version dispatches all the function calls to the WebAssembly module to a Web Worker or Node.js worker thread to prevent blocking the main thread. However, this version may have a slight overhead due to the serialization and deserialization of the data required by the worker threads. This version is located at the root level of each variant (e.g., `kuzu-wasm`, `kuzu-wasm/multithreaded`, `kuzu-wasm/nodejs`).
+- **Sync**: This version of the module is synchronous and does not require any callbacks (other than the module initialization). This version is good for scripting / CLI / prototyping purposes but is not recommended to be used in GUI applications or web servers because it may block the main thread and cause unexpected freezes. This alternative version is located in the `sync` directory of each variant (e.g., `kuzu-wasm/sync`, `kuzu-wasm/multithreaded/sync`, `kuzu-wasm/nodejs/sync`).
+
+Note that you cannot mix and match the variants and versions. For example, a `Database` object created with the default variant cannot be passed to a function in the multithreaded variant. Similarly, a `Database` object created with the async version cannot be passed to a function in the sync version.
+
+### Loading the Worker script (for async versions)
+In each variant, the main module is bundled as one script file. However, the worker script is located in a separate file. The worker script is required to run the WebAssembly module in a Web Worker or Node.js worker thread. If you are using a build tool like Webpack, the worker script needs to be copied to the output directory. For example, in Webpack, you can use the `copy-webpack-plugin` to copy the worker script to the output directory. 
+
+By default, the worker script is resolved under the same directory / URL prefix as the main module. If you want to change the location of the worker script, you can pass the worker script's path to the `setWorkerPath` function. For example:
+```javascript
+import kuzu from "kuzu-wasm";
+kuzu.setWorkerPath('path/to/worker.js');
+```
+
+Note that this function must be called before any other function calls to the WebAssembly module. Once initialization has started, the worker script path cannot be changed, and an error is raised if the worker script cannot be found.
+
+For the Node.js variant, the worker script can be resolved automatically and you do not need to set the worker path.
+
+## API documentation
+The API documentation can be found here:
+
+**Synchronous** version: [API documentation](https://kuzudb.com/api-docs/wasm/sync/)
+
+**Asynchronous** version: [API documentation](https://kuzudb.com/api-docs/wasm/async/)
+
+## Local development
+
+This section is relevant if you are interested in contributing to Kùzu's Wasm API.
+
+First, build the WebAssembly module:
+
+```bash
+npm run build
+```
+
+This will build the WebAssembly module in the `release` directory and create a tarball ready for publishing under the current directory.
+
+You can run the tests as follows:
+
+```bash
+npm test
+```
diff --git a/src/content/docs/cypher/configuration.md b/src/content/docs/cypher/configuration.md
@@ -17,7 +17,6 @@ configuration **cannot** be used with other query clauses, such as `RETURN`.
 | `HOME_DIRECTORY`| system home directory                                                                                                           | user home directory    |
 | `FILE_SEARCH_PATH`| file search path                                                                                                                | N/A                    |
 | `PROGRESS_BAR` | enable progress bar in CLI                                                                                                      | false                  |
-| `PROGRESS_BAR_TIME` | show progress bar after time in ms                                                                                              | 1000                   |
 | `CHECKPOINT_THRESHOLD` | the WAL size threshold in bytes at which to automatically trigger a checkpoint                                                  | 16777216 (16MB)        |
 | `WARNING_LIMIT` | maximum number of [warnings](/import#warnings-table-inspect-skipped-rows) that can be stored in a single connection.            | 8192        |
 | `SPILL_TO_DISK` | spill data to disk if there is not enough memory when running `COPY FROM` (cannot be set to TRUE under in-memory or read-only mode) | true |