Automatic tfetch Connection Upgrade #33

Open
bwasti opened this issue Sep 14, 2022 · 0 comments
Labels
enhancement New feature or request long term The change described in this issue will likely take much longer than typical issues.

bwasti commented Sep 14, 2022

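This is a proposal to have tfetch automatically upgrade its connection to a faster transport when one is available, which should substantially improve performance. Interconnects such as NVLink or InfiniBand could be used under the hood.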

API

First, a review of the tfetch interface (network concerns only; the gradient utilities are not discussed because they won't need to change):

// Get a tensor, much like a GET request in HTTP
const tensor = await sm.io.tfetch(url)
// Send a tensor, without response, much like a PUT request
await sm.io.tfetch(url, tensor)
// Send and then wait on a return tensor
const new_tensor = await sm.io.tfetch(url, tensor)

A less visible feature is that the headers of these requests include a unique ID (which can be forwarded) so that the receiving side can tell users apart. (This is how sm.io.serve can maintain user objects.)
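To make the forwarding idea concrete, here is a minimal sketch using plain objects for headers. The header name `x-user-id` and the helpers `resolveUserId`, `forwardHeaders`, and `stateFor` are hypothetical and only illustrate the pattern; the actual header and bookkeeping used by tfetch and sm.io.serve are not described in this issue.

```javascript
// Hypothetical header name; the real one used by tfetch is not stated here.
const USER_ID_HEADER = "x-user-id"

// Reuse the caller's ID if the incoming request carried one, else mint a new one.
function resolveUserId(incomingHeaders = {}) {
  return (
    incomingHeaders[USER_ID_HEADER] ??
    `user-${Math.random().toString(36).slice(2)}`
  )
}

// Build outgoing headers that forward the same ID to the next hop.
function forwardHeaders(incomingHeaders = {}, extra = {}) {
  return { ...extra, [USER_ID_HEADER]: resolveUserId(incomingHeaders) }
}

// A server could key per-user state on the ID.
const userState = new Map()
function stateFor(incomingHeaders) {
  const id = resolveUserId(incomingHeaders)
  if (!userState.has(id)) {
    userState.set(id, { requests: 0 })
  }
  const state = userState.get(id)
  state.requests += 1
  return state
}
```

Whatever transport replaces HTTP would presumably need to carry an equivalent ID, since it would no longer come for free with request headers.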

Current Impl

tfetch currently implements point-to-point transmission with fetch, which goes over HTTP and is slow. Ideally, the connection should be upgraded to a faster transport whenever one is available. (In the case of federated learning across the network, nothing needs to change.)
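One way the upgrade could be structured, as a rough sketch only: try each faster transport once per URL, cache the outcome, and fall back to the existing HTTP path when none is available. Every name here (`makeUpgradingTfetch`, the `transports` list, `connect`, `request`, `httpTfetch`) is hypothetical and not part of the current API.

```javascript
// Hypothetical sketch: none of these names are part of the existing tfetch API.
// `transports` is an ordered list of { name, connect(url) } objects, fastest
// first. connect() resolves to a connection exposing request(tensor), or
// rejects if the transport is unavailable for that URL.
function makeUpgradingTfetch(transports, httpTfetch) {
  // url -> promise of a connection (resolves to null if only HTTP works)
  const cache = new Map()

  async function upgrade(url) {
    for (const transport of transports) {
      try {
        return await transport.connect(url)
      } catch (_) {
        // transport unavailable for this peer; try the next one
      }
    }
    return null
  }

  return async function tfetch(url, optionalTensor) {
    if (!cache.has(url)) {
      // store the promise so concurrent first calls share one upgrade attempt
      cache.set(url, upgrade(url))
    }
    const conn = await cache.get(url)
    return conn
      ? conn.request(optionalTensor)
      : httpTfetch(url, optionalTensor)
  }
}
```

Because the caller-facing signature stays `tfetch(url, optionalTensor)`, code using the interface above would not need to change.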

Better Impl

For performance reasons, the improved underlying communication protocol will need to be implemented natively. Flashlight is the right place for this. It can expose a multitude of APIs (based on hardware/OS constraints) that can be wrapped in JS easily.

There are a couple of ways to implement async functions in JavaScript:

Polling (bad, but sometimes necessary; it prevents automatic pipelining)

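async function tfetch(url, optional_tensor) {
  const conn = cached_connection[url]
  conn.request(optional_tensor)
  // poll() returns true while the request is still in flight
  while (conn.poll()) {
    // yield to the JS event loop between polls (a backoff could go here)
    await new Promise((resolve) => setTimeout(resolve, 0))
  }
  if (!conn.result) {
    throw conn.err
  }
  return conn.result
}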
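The backoff mentioned in the polling example could look roughly like the following. This is a sketch under assumptions: `conn` has the same shape as above (`poll()` returns true while pending, then `result` or `err` is set), and `pollWithBackoff`, `initialMs`, and `maxMs` are hypothetical names.

```javascript
// Sleep helper built on the JS event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Poll `conn` with exponentially growing delays, capped at maxMs, so other
// work on the event loop can make progress while the request is pending.
async function pollWithBackoff(conn, { initialMs = 1, maxMs = 50 } = {}) {
  let delay = initialMs
  while (conn.poll()) {
    await sleep(delay)
    delay = Math.min(delay * 2, maxMs)
  }
  if (!conn.result) {
    throw conn.err
  }
  return conn.result
}
```

The cap trades latency for CPU: a larger `maxMs` wastes fewer wakeups but can delay noticing a finished transfer by up to that long.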

Promises (good! non-blocking and can be done concurrently)

async function tfetch(url, optional_tensor) {
  const conn = cached_connection[url]
  return new Promise((resolve, reject) => {
    conn.setCallback((tensor) => {
      resolve(tensor)
    })
    conn.setErrorCallback((err) => {
      reject(err)
    })
    conn.request(optional_tensor)
  })
}
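To illustrate the "can be done concurrently" point, here is a small sketch that issues several callback-based requests at once and awaits them together. `makeMockConnection` is a hypothetical stand-in for a native binding; it only mirrors the `setCallback` / `setErrorCallback` / `request` shape used above, and `requestOnce` and `fetchAll` are likewise illustrative names.

```javascript
// Mock connection standing in for a native binding (hypothetical shape).
// `stats` records how many requests are in flight at the same time.
function makeMockConnection(stats, { delayMs = 5, fail = false } = {}) {
  let onResult = () => {}
  let onError = () => {}
  return {
    setCallback(cb) { onResult = cb },
    setErrorCallback(cb) { onError = cb },
    request(tensor) {
      stats.inFlight += 1
      stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight)
      setTimeout(() => {
        stats.inFlight -= 1
        if (fail) onError(new Error("mock failure"))
        else onResult(tensor) // echo the input back
      }, delayMs)
    },
  }
}

// Wrap one callback-style request in a Promise, as in the example above.
function requestOnce(conn, tensor) {
  return new Promise((resolve, reject) => {
    conn.setCallback(resolve)
    conn.setErrorCallback(reject)
    conn.request(tensor)
  })
}

// Start every request before awaiting any of them.
function fetchAll(conns, tensor) {
  return Promise.all(conns.map((conn) => requestOnce(conn, tensor)))
}
```

One caveat with this shape: since each connection holds a single callback, two overlapping requests on the same cached connection would overwrite each other's callbacks, so a real binding would likely need per-request callbacks or request IDs.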