Automatic `tfetch` Connection Upgrade #33

bwasti · 2022-09-14T14:39:28Z

This is a proposal for a new feature that should substantially improve performance: tfetch would automatically upgrade its connection so that faster interconnects such as NVLink or InfiniBand can be used under the hood.

API

First, a review of the tfetch interface (network concerns only; the gradient utilities are not discussed because they won't need to change):

// Get a tensor, much like a GET request in HTTP
const tensor = await sm.io.tfetch(url)
// Send a tensor, without response, much like a PUT request
await sm.io.tfetch(url, tensor)
// Send and then wait on a return tensor
const new_tensor = await sm.io.tfetch(url, tensor)

A hidden feature is that the headers of these requests include a unique ID (which can be forwarded) so that users can be told apart. (This is how sm.io.serve can maintain per-user objects.)
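To make the ID mechanism concrete, here is a minimal sketch of how a server could keep one object per user, keyed by the ID in the request headers. The header name `x-user-id`, the `userFor` helper, and the shape of the user object are all hypothetical; this issue does not say what sm.io.serve actually uses.

```javascript
// Hypothetical header name; the real one is not given in this issue.
const ID_HEADER = 'x-user-id'

// One entry per unique ID seen so far.
const users = new Map()

// Return the per-user object for a request, creating it on first sight.
function userFor(headers) {
  const id = headers[ID_HEADER]
  if (id === undefined) {
    throw new Error(`missing ${ID_HEADER} header`)
  }
  if (!users.has(id)) {
    users.set(id, { id, requests: 0 })
  }
  const user = users.get(id)
  user.requests += 1
  return user
}
```

Because the ID can be forwarded, an intermediate node that passes the same header along would resolve to the same user object on the final server.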

Current Impl

tfetch currently implements point-to-point transmission with fetch, which runs over HTTP and is slow. Ideally, the connection should be upgraded to something faster when possible. (In the case of federated learning across the network, nothing needs to change.)
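One way to picture the upgrade decision is as a capability negotiation that falls back to HTTP. The sketch below is only an illustration under assumptions: the transport names, their preference order, and `chooseTransport` are hypothetical and not part of any existing API.

```javascript
// Hypothetical preference order, fastest first. HTTP is always the fallback.
const TRANSPORT_PREFERENCE = ['nvlink', 'infiniband', 'http']

// Pick the most preferred transport that both endpoints report supporting.
// If they share nothing, stay on HTTP (the current behavior).
function chooseTransport(localCaps, remoteCaps) {
  const shared = TRANSPORT_PREFERENCE.find(
    (t) => localCaps.includes(t) && remoteCaps.includes(t)
  )
  return shared || 'http'
}
```

With this shape, the federated-learning case needs no special handling: peers that only reach each other over the network share nothing faster, so they keep using HTTP.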

Better Impl

For performance reasons, the improved underlying communication protocol will need to be implemented natively. Flashlight is the right place for this. It can expose a multitude of APIs (based on hardware/OS constraints) that can be wrapped in JS easily.

There are a couple of ways to implement async functions in JavaScript:

Polling (bad, but sometimes necessary; this prevents automatic pipelining)

async function tfetch(url, optional_tensor) {
  const conn = cached_connection[url]
  conn.request(optional_tensor)
  while (conn.poll()) {} // could implement a backoff mechanism with JS event loop
  if (!conn.result) {
    throw conn.err
  }
  return conn.result
}
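The comment in the loop above mentions a backoff mechanism. A minimal sketch of that idea follows, assuming (as above) that `conn.poll()` returns true while the request is still pending and that `conn.result` / `conn.err` are set once it finishes. The names `sleep`, `pollWithBackoff`, and `makeFakeConn`, and the delay values, are hypothetical and exist only for illustration.

```javascript
// Sleep helper that yields to the JS event loop instead of spinning.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Poll `conn`, doubling the wait between polls up to `maxMs`.
async function pollWithBackoff(conn, { initialMs = 1, maxMs = 64 } = {}) {
  let delay = initialMs
  while (conn.poll()) {
    await sleep(delay)
    delay = Math.min(delay * 2, maxMs)
  }
  if (!conn.result) {
    throw conn.err
  }
  return conn.result
}

// Stand-in connection for illustration only: reports "pending" for the
// first `n` polls, then exposes `result`.
function makeFakeConn(n, result) {
  let remaining = n
  return {
    result: undefined,
    err: new Error('request failed'),
    poll() {
      if (remaining-- > 0) return true
      this.result = result
      return false
    },
  }
}
```

Awaiting a timer between polls lets other work on the event loop proceed, at the cost of up to `maxMs` of extra latency after the request actually completes.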

Promises (good! non-blocking and can be done concurrently)

async function tfetch(url, optional_tensor) {
  const conn = cached_connection[url]
  return new Promise((resolve, reject) => {
    conn.setCallback((tensor) => {
      resolve(tensor)
    })
    conn.setErrorCallback((err) => {
      reject(err)
    })
    conn.request(optional_tensor)
  })
}
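To show what "can be done concurrently" buys, here is a small self-contained sketch that runs two Promise-based requests at once. `makeFakeConn` is a hypothetical stand-in for the native connection object (it only mimics the `setCallback` / `setErrorCallback` / `request` shape used above), and the URLs are placeholder keys.

```javascript
// Hypothetical stand-in for a native connection: it invokes the registered
// callback asynchronously after `delayMs` milliseconds.
function makeFakeConn(name, delayMs, fail = false) {
  let onResult = () => {}
  let onError = () => {}
  return {
    setCallback(cb) { onResult = cb },
    setErrorCallback(cb) { onError = cb },
    request(_optional_tensor) {
      setTimeout(() => {
        if (fail) onError(new Error(`${name} failed`))
        else onResult(`reply from ${name}`)
      }, delayMs)
    },
  }
}

const cached_connection = {
  a: makeFakeConn('a', 20),
  b: makeFakeConn('b', 20),
  bad: makeFakeConn('bad', 5, true),
}

// Same shape as the Promise-based tfetch above.
function tfetch(url, optional_tensor) {
  const conn = cached_connection[url]
  return new Promise((resolve, reject) => {
    conn.setCallback(resolve)
    conn.setErrorCallback(reject)
    conn.request(optional_tensor)
  })
}

// Both requests are in flight at the same time, so the total wait is
// roughly one delay rather than the sum of both.
function fetchBoth() {
  return Promise.all([tfetch('a'), tfetch('b')])
}
```

Note that this sketch only overlaps requests to different URLs. With a single callback slot per cached connection, overlapping requests to the same URL would overwrite each other's callbacks, so a real implementation would likely need per-request callbacks or a queue.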

bwasti added the enhancement (new feature or request) and long term (the change described in this issue will likely take much longer than typical issues) labels Sep 14, 2022

bwasti assigned jacobkahn Sep 14, 2022
