Add Apache Beam, closes #88 #163

Merged: 1 commit, Sep 9, 2024
9 changes: 5 additions & 4 deletions README.md
@@ -13,8 +13,9 @@ Pipeline frameworks & libraries
* [Anduril](http://www.anduril.org/anduril/site/) - Component-based workflow framework for scientific data analysis.
* [Antha](https://www.antha-lang.org/) - High-level language for biology.
* [AWE](https://github.com/MG-RAST/AWE/) - Workflow and resource management system with CWL support.
- * [Balsam](https://github.com/argonne-lcf/balsam) - Python-based high throughput task and workflow engine.
+ * [Balsam](https://github.com/argonne-lcf/balsam) - Python-based high throughput task and workflow engine.
* [Bds](http://pcingola.github.io/BigDataScript/) - Scripting language for data pipelines.
+ * [Beam](https://beam.apache.org/) - Unified programming model for batch and streaming data-parallel processing pipelines.
* [BioMake](https://github.com/evoldoers/biomake) - GNU-Make-like utility for managing builds and complex workflows.
* [BioQueue](https://github.com/liyao001/BioQueue) - Explicit framework with web monitoring and resource estimation.
* [Bioshake](https://github.com/papenfusslab/bioshake) - Haskell DSL built on shake with strong typing and EDAM support.
@@ -60,7 +61,7 @@ Pipeline frameworks & libraries
* [Loom](https://github.com/StanfordBioinformatics/loom) - Tool for running bioinformatics workflows locally or in the cloud.
* [Longbow](http://www.hecbiosim.ac.uk/longbow) - Job proxying tool for biomolecular simulations.
* [Luigi](https://github.com/spotify/luigi) - Python module that helps you build complex pipelines of batch jobs.
- * [Maestro](https://github.com/LLNL/maestrowf) - YAML based HPC workflow execution tool.
+ * [Maestro](https://github.com/LLNL/maestrowf) - YAML based HPC workflow execution tool.
* [Makeflow](http://ccl.cse.nd.edu/software/makeflow/) - Workflow engine for executing large complex workflows on clusters.
* [Mara](https://github.com/mara/data-integration) - A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow.
* [Mario](https://github.com/intentmedia/mario) - Scala library for defining data pipelines.
@@ -156,7 +157,7 @@ Workflow platforms
* [VisTrails](http://www.vistrails.org/) - Scientific workflow and provenance management system.
* [Wings](http://www.wings-workflows.org) - Semantic workflow system utilizing Pegasus as execution system.
* [Watchdog](https://github.com/klugem/watchdog) - Workflow management system for the automated and distributed analysis of large-scale experimental data.
- * [FlowHub](https://www.flowhub.com.cn) - FlowHub is a new workflow cloud platform.
+ * [FlowHub](https://www.flowhub.com.cn) - FlowHub is a new workflow cloud platform.

Workflow languages
-------------------
@@ -175,7 +176,7 @@ Workflow standardization initiatives
* [Workflow Patterns Library](http://www.workflowpatterns.com/patterns)
* [ResearchObject.org](http://www.researchobject.org)

- ETL & Data orchestration
+ ETL & Data orchestration
------------------------
* [DVC](https://dvc.org) - Data version control system for ML project with lightweight pipeline support.
* [lakeFS](https://github.com/treeverse/lakeFS) - Repeatable, atomic and versioned data lake on top of object storage.