add citation cff file and update readme/index #223

Merged: 1 commit, Mar 22, 2024
19 changes: 19 additions & 0 deletions CITATION.cff
@@ -0,0 +1,19 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sochat
    given-names: Vanessa
    orcid: https://orcid.org/0000-0002-4387-3819
  - family-names: Culquicondor
    given-names: Aldo
  - family-names: Ojea
    given-names: Antonio
  - family-names: Milroy
    given-names: Daniel
    orcid: https://orcid.org/0000-0001-6500-3227
title: "The Flux Operator"
version: 0.2.0
identifiers:
  - type: doi
    value: 10.12688/f1000research.147989.1
date-released: 2024-03-21
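
As a rough illustration of how the metadata in this file can be turned into a copy-paste citation, the sketch below hard-codes the same values in a Python dictionary and joins them into one line. The helper names (`short_name`, `format_citation`), the flattened `doi` key, and the output layout are assumptions made for this example. They are not part of the CFF specification or of this repository, and the journal volume and article number that appear in the README citation are not in the CFF file, so they are left out here.

```python
# Hypothetical sketch: the values mirror the CITATION.cff fields above.
# The "doi" key is flattened from the "identifiers" list for brevity, and the
# output format is an illustration, not an official CFF rendering.
CFF = {
    "title": "The Flux Operator",
    "date-released": "2024-03-21",
    "doi": "10.12688/f1000research.147989.1",
    "authors": [
        {"family-names": "Sochat", "given-names": "Vanessa"},
        {"family-names": "Culquicondor", "given-names": "Aldo"},
        {"family-names": "Ojea", "given-names": "Antonio"},
        {"family-names": "Milroy", "given-names": "Daniel"},
    ],
}


def short_name(author):
    """Return 'Family I', using the first letter of each given name."""
    initials = "".join(part[0] for part in author["given-names"].split())
    return f"{author['family-names']} {initials}"


def format_citation(cff):
    """Build a one-line citation string from CFF-style metadata."""
    names = [short_name(a) for a in cff["authors"]]
    if len(names) > 1:
        # Join all but the last author with commas, then add "and <last>".
        author_text = ", ".join(names[:-1]) + " and " + names[-1]
    else:
        author_text = names[0]
    year = cff["date-released"][:4]
    return f"{author_text}. {cff['title']}. {year}. https://doi.org/{cff['doi']}"


if __name__ == "__main__":
    print(format_citation(CFF))
```

In practice the file would be parsed with a YAML library rather than retyped; the dictionary is used here only to keep the example free of third-party dependencies.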
35 changes: 13 additions & 22 deletions README.md
@@ -3,22 +3,12 @@
![docs/development/the-operator.jpg](docs/development/the-operator.jpg)
[![DOI](https://zenodo.org/badge/528650707.svg)](https://zenodo.org/badge/latestdoi/528650707)


The Flux Operator is a Kubernetes Cluster [Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
that you can install to your cluster to create and control [Flux Framework](https://flux-framework.org/) "Mini Clusters"
that you can install to your cluster to create and control a [Flux Framework](https://flux-framework.org/) "MiniCluster"
to launch jobs to.

Read more, including user and developer guides, and project background in our 💛 [Documentation](https://flux-framework.org/flux-operator) 💛

🚧️ Under Construction! 🚧️

**Important!** We recently removed a one-off container that ran before the MiniCluster creation to generate a certificate.
We have found [through testing](https://github.com/kubernetes-sigs/jobset/issues/104) that this somehow served as a warmup
for networking, and this means if you use the latest operator here, you may see slow times in creating the initial
broker setup. More details are available in [this post](https://github.com/converged-computing/operator-experiments/tree/main/google/service-timing).
We have fixed the zeromq timeout bug, and will hopefully be able to reproduce the issue outside of the operator
soon to report upstream.

## Presentations

- [Kubecon 2023](https://t.co/vjRydPx1rb)
@@ -27,33 +17,34 @@ soon to report upstream.
## Organization

The basic idea is that we present the idea of a **MiniCluster** that is a custom resource definition (CRD)
that defines a job container (that must have Flux) that (when submit) will create a set of config maps,
secrets (e.g., tls), and the final Batch job that has the pod containers running with flux. Since
that defines a job container (that does not need to have Flux) that (when submit) will create a set of config maps,
secrets, and the final Indexed Job that has the pod containers running with Flux. Since
this is a batchv1.Job, it will have states that we can track.

And you can find the following here:

- [Flux Controllers](controllers/flux) are under `controllers/flux` for the `MiniCluster`
- [API Spec](api/v1alpha1/) are under `api/v1alpha1/` also for `MiniCluster`
- [API Spec](api/v1alpha1/) are under `api/v1alpha2/` also for `MiniCluster`
- [Packages](pkg) include supporting packages for job conditions (state), if we eventually want that.
- [Config](config) includes mostly automatically generated yaml configuration files needed by Kubernetes

And the following external resources might be useful:

- [Flux Cloud](https://github.com/converged-computing/flux-cloud): automation of experiments using the Flux Operator
- [Flux Framework](https://flux-framework.org)
- [Flux RESTful API](https://github.com/flux-framework/flux-restful-api): provides the interface for submitting jobs, if no command provided to the operator.
- [Python SDK](sdk/python): for deploying MiniClusters and port forwarding.
- [Flux HPC Examples](https://github.com/rse-ops/flux-hpc) containers and CRD for the operator to run Flux with HPC workloads (under development)
- [Flux Cloud](https://github.com/converged-computing/flux-cloud): automation of experiments using the Flux Operator

**Note** we welcome contributions to code or to suggest features or identify bugs!

## Citation

**Note** this project is actively under development, and you can expect change and improvements!
We apologize for bugs you run into, and hope you tell us soon so we can work on resolving them.
You can follow the CITATION.cff (right sidebar in GitHub) to cite, or [view the paper directly here](https://doi.org/10.12688/f1000research.147989.1)
A direct (copy paste) citation is the following:

## TODO
> Sochat V, Culquicondor A, Ojea A and Milroy D. The Flux Operator (version 1). F1000Research 2024, 13:203 (https://doi.org/10.12688/f1000research.147989.1)

- add [jar](wget https://repo1.maven.org/maven2/org/openapitools/openapi-generator-cli/6.6.0/openapi-generator-cli-6.6.0.jar -O openapi-generator-cli.jar) to makefile
- Ensure that curve.cert is a variable (path) in case we create a custom one.
- Convert ML (e.g., mnist and pytorch) to use base containers instead of Singularity
- When JobSet is available we need it to say job is successful when main application container is done.

## License

9 changes: 4 additions & 5 deletions docs/index.md
@@ -5,17 +5,16 @@
Welcome to the Flux Operator Documentation!

The Flux Operator is a Kubernetes Cluster [Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
that you can install to your cluster to create and control [Flux Framework](https://flux-framework.org/) `MiniCluster`s
that you can install to your cluster to create and control a [Flux Framework](https://flux-framework.org/) `MiniCluster`
to launch jobs to. With the Flux Operator you can:

1. Create an ephemeral MiniCluster to run one job
2. Create a persistent MiniCluster to submit jobs to via user interface, command line, or language SDK.
3. View logs and job information via the same interfaces!

The Flux Operator is currently 🚧️ Under Construction! 🚧️
We are working on core functionality along with providing a library of
examples from HPC. This is a *converged computing* project that aims
to unite the worlds and technologies typical of cloud computing and
The Flux Operator is recently [published](https://f1000research.com/articles/13-203/v1)
and we are actively using it to run HPC workloads in cloud environments.
This is a *converged computing* project that aims to unite the worlds and technologies typical of cloud computing and
high performance computing.

To get started, check out the links below!