
Write-up summary of Aim 1 conclusion #71

Open

CodyCBakerPhD opened this issue Jun 10, 2024 · 9 comments

@CodyCBakerPhD (Collaborator)

For NIH report

CodyCBakerPhD self-assigned this Jun 10, 2024
@CodyCBakerPhD (Collaborator, Author)

Less about code, more about capabilities that were added (and point to documentation)

@CodyCBakerPhD (Collaborator, Author) commented Sep 28, 2024

In the last year, NeuroConv has developed fully automated processes for building and deploying Docker images of the central package, as well as tangential data transfer utilities for use in cloud environments. These workflows are triggered through free-to-use GitHub Actions on every official release, as well as daily on development branches. All Dockerfiles can be found in the public open-source repository under the /neuroconv/dockerfiles folder.
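To illustrate how the published images can be consumed, here is a minimal sketch using the Docker SDK for Python (`pip install docker`), assuming a running Docker daemon; the image path follows the GitHub Container Registry naming discussed later in this thread:

```python
# Minimal sketch: pull the published NeuroConv image from the
# GitHub Container Registry via the Docker SDK for Python.
import docker

client = docker.from_env()  # connects to the local Docker daemon
image = client.images.pull("ghcr.io/catalystneuro/neuroconv", tag="latest")
print(image.tags)  # e.g., ['ghcr.io/catalystneuro/neuroconv:latest']
```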

Additionally, a number of helper functions have been added to the neuroconv.tools.aws submodule, such as an API function for automatically setting up an entire AWS Batch infrastructure on EC2, including all related details such as compute environments, job queues, and job definitions. This tool is then leveraged to launch containers of the aforementioned images on an on-demand EC2 instance in a two-step process: (i) Rclone is used to transfer data from a remote cloud storage source (such as Google Drive or Dropbox) onto the EC2 instance, where it is then (ii) converted to NWB format via a YAML specification file and uploaded directly to the DANDI archive. When all tasks are complete, all requested resources are spun down and cleaned up, minimizing costs to the user.
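As a rough illustration of that two-step process, a call might look like the following. Note that the function name and parameters here are hypothetical placeholders, not the actual neuroconv.tools.aws signatures; consult the official documentation for the real API:

```python
# Hypothetical sketch only: the helper name and its parameters are
# illustrative assumptions, not NeuroConv's actual API.
from neuroconv.tools.aws import deploy_conversion_on_aws  # hypothetical name

deploy_conversion_on_aws(
    # Step (i): Rclone transfers the source data from remote cloud storage
    # (an already-configured Rclone remote named "my_drive" is assumed)
    # onto the on-demand EC2 instance.
    rclone_command="rclone copy my_drive:raw_data /mnt/data",
    # Step (ii): a YAML specification file drives the conversion to NWB,
    # and the results are uploaded directly to the DANDI archive.
    yaml_specification_file_path="conversion_specification.yml",
    dandiset_id="000123",  # placeholder dandiset ID
)
```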

To ensure this pipeline continues to work far into the future, all steps, from the Docker images to the helper functions, are tested via pytest in continuous integration.
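For a sense of what one such integration test can look like, here is a minimal sketch using boto3; this is an assumption-laden example, not the repository's actual test code:

```python
import boto3

def test_batch_job_succeeds():
    # Sketch only: assumes a test Batch job was already submitted in an
    # earlier step; the job ID and region here are placeholders.
    job_id = "example-job-id"
    batch_client = boto3.client("batch", region_name="us-east-2")
    response = batch_client.describe_jobs(jobs=[job_id])
    assert response["jobs"][0]["status"] == "SUCCEEDED"
```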

While individual batch job statuses can be tracked from the AWS dashboard, our entire workflow also sends status updates to a central DynamoDB table:

[screenshot: DynamoDB table of per-job status updates]

with plans to further improve the resolution and provenance of the tracking in the future.
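For a sense of what a status update might look like, a row could be written to the table with boto3 along these lines (the table name and item schema below are assumptions for illustration, not the exact layout used by our testing suite):

```python
import boto3

# Assumed table name and item layout, for illustration only.
table = boto3.resource("dynamodb", region_name="us-east-2").Table("neuroconv-status-tracker")
table.put_item(
    Item={
        "job_id": "example-batch-job-1234",  # partition key (assumed)
        "status": "Job submitted...",        # human-readable progress message
        "timestamp": "2024-09-28T12:00:00Z", # ISO 8601 submission time
    }
)
```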

All usage instructions may be found in the official NeuroConv documentation, in particular:

@oruebel (Contributor) commented Sep 28, 2024

Thanks @CodyCBakerPhD for the helpful summary. A couple of quick questions:

  1. The last two links, for the data transfer in AWS tests and the data conversion in AWS tests, are missing their URLs; could you please add those?
  2. Is there an example of how to use these images on AWS?
  3. I was looking at the NeuroConv docs and found a couple of pages on Docker, but I wanted to confirm that these are the documentation pages I should point to in the report and use to learn how to use this:
  4. Could you clarify how the release for the images works? Looking at the neuroconv repo, if I understand this correctly, the Docker images are being stored as packages in the CatalystNeuro GitHub organization at https://github.com/orgs/catalystneuro/packages?repo_name=neuroconv and then registered with the GitHub Container Registry so that they can be installed via something like docker pull ghcr.io/catalystneuro/neuroconv:latest. Is that how this works, or am I missing something important?

@CodyCBakerPhD (Collaborator, Author)

Sorry, I should have indicated this was still a WIP; I was going to ping you once it was ready.

  1. Should be done by Monday.
  2. The helper functions are the best examples.
  3. The docs for the helper functions should be done by Monday as well.
  4. Yep, exactly. That is where you can find them (they are also tagged by version, and TBH I recommend using a pinned version most of the time for easier reproducibility).

@oruebel (Contributor) commented Sep 28, 2024

Sorry, I should have indicated this was still a WIP

Got it. Sorry for being eager with questions. This is very cool stuff!

@CodyCBakerPhD (Collaborator, Author)

@oruebel OK, the rest has been filled in

Some PRs are still under review, though, so you will want to update the links after those get merged. The sections that are not yet merged are:

  • "data transfer in AWS"
  • "data conversion in AWS tests"
  • AWS usage docs (built from PR, IDK how long that lasts)

@oruebel (Contributor) commented Sep 29, 2024

OK, the rest has been filled in

Thanks for the helpful summary!

sends status updates to a central DynamoDB table

Is this table public, and if so, could you add the URL? If it is internal, is it accessible to the CN team?

some PRs are still under review, you will want to update the links for things after those get merged.

Thanks for the heads up. Will do.

@CodyCBakerPhD (Collaborator, Author) commented Sep 29, 2024

Is this table public, and if yes, could you add the URL?

Nope, since all access to/from it is metered and charged.

If it is internal, is this accessible to the CN team?

Yes, but there is nothing particularly special about this table aside from the fact that it is the one used by the testing suite.

The general idea is that the process can use DynamoDB to track status updates in any such table you want to specify. So if you used the tools yourself (including the demo), you would get your own table for your own use, or you could make a public one for your team that everyone could then use, etc.

Also, there is nothing terribly special about DynamoDB in that respect (we could send status updates to any external target, similar to how we handle progress updates in NWB GUIDE); it is just adjacent to all the other AWS entities, and so feels like a natural go-to for this kind of thing.
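For example, standing up your own status table is a one-time boto3 call along these lines (the table name and key schema here are placeholders; the actual helpers may configure this differently):

```python
import boto3

# Create a personal (or team-shared) DynamoDB table for status tracking.
dynamodb_client = boto3.client("dynamodb", region_name="us-east-2")
dynamodb_client.create_table(
    TableName="my-team-status-tracker",  # any table name you want to specify
    KeySchema=[{"AttributeName": "job_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "job_id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",  # no pre-provisioned capacity to manage
)
```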

@oruebel (Contributor) commented Sep 29, 2024

So if you used the tools yourself (including demo) you would get your own table for your own use,

Thanks for the clarification. My impression was that the linkage to the table might be hard-coded, but having it configurable by the user makes sense.
