Simple bash action #179
@MillironX @mashehu curious to see what you both think of this. Bit of a radical approach but I wonder if a massive simplification like this will make maintenance a lot easier. Especially thinking about the bus factor here as it's such a central piece of our infrastructure now.
This is going to be a long post here. I don't really want to try and argue either way, just to lay out some of my justifications of thinking that went into the creation of this action and shaped it the way it is (esp. since I'm realizing just how poorly I've documented the thing). I've got finals next week, so as long as
Why Typescript
One of the main reasons was that I was using partial version matching a lot at the time (I don't hardly anymore, tho). It still boggles my mind that Nextflow doesn't support YY.MM version aliases, but that can be a discussion for another day.
Another reason was how GitHub Actions (GHA) is architected. From my understanding, GHA doesn't guarantee that it will bring a full FHS file system with it between action steps. When using
Caching should be much easier in Typescript. Once again, this is because the Typescript APIs allow for explicit declaration of what is cached, and what identifier is assigned to it.
Addressing changes
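For readers unfamiliar with the partial version matching mentioned under "Why Typescript", here is a rough bash sketch of the idea: given a prefix such as 22.10, pick the newest release tag that starts with it. The helper name `resolve_version` and the tag list are made up for illustration; this is not how the action itself resolves versions, and it assumes a `sort` that supports `-V`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: choose the newest tag that
# equals the requested prefix or extends it with a further ".N" component.
resolve_version() {
  local wanted="$1"; shift
  local tag
  for tag in "$@"; do
    # Strip an optional leading "v" before comparing against the prefix.
    case "${tag#v}" in
      "$wanted"|"$wanted".*) printf '%s\n' "$tag" ;;
    esac
  done | sort -V | tail -n 1
}

# Sample tags (assumed data, not fetched from any real release feed).
tags=(v22.04.5 v22.10.0 v22.10.8 v23.04.1)
resolve_version "22.10" "${tags[@]}"
```

With the sample tags above, the prefix 22.10 matches v22.10.0 and v22.10.8 and the version sort selects the latter; a prefix with no matching tag yields empty output.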
Why not GitHub Actions
I made a modest proposal that we should get away from using GHA for doing CI, instead hooking into something like Earthly or Dagger (thanks @edmundmiller) for the unit testing or really anything that requires Nextflow. I'm going to make the case here that GHA is not reproducible, and is therefore antithetical to the premise of Nextflow.
GHA has no concept of dependencies. This dead horse has already been beat better than I can by fasterthanlime in his video Why GitHub Actions Feels Bad. This is what bit us on the Java breakage. The fact that we cannot specify that Nextflow has a dependency on Java is scary. The fact that it worked for so long is scarier.
The fact that GHA changes its behavior drastically based on the environment it runs in. Last night, I ran local tests on
GHA is not testable locally. Yes, there's act, but the behavior of act is what GHA should be (per GHA's documentation), and not what GHA is. I noticed that you removed the .actrc file, and in previous drafts were questioning the example.yml workflows. This is on me for not documenting this well enough, so I'll explain myself here. .actrc makes sure that we can actually test the action locally without downloading a 50+GB docker image (and provides consistency with Apple Silicon systems), and example.yml is the way to test the action in a workflow (the unit tests only test Typescript functions). Even with those files, clearly there is a huge disconnect between running locally with act and running on a real GHA runner.
Bottom line
Clearly ripping out the CI system is a big decision, and I'm not asking it to be made lightly. The point of this section is to highlight the issues I've had with GHA as a whole and to jumpstart a discussion. So far it seems like @edmundmiller is the only one with any interest in a different setup, tho.
My maintainership issues
I made
Like I said, I've got other priorities for the next couple weeks, so I just wanted to throw my general thoughts out there for now.
I think this action lost the plot. I think if we have in the README
steps:
  - uses: actions/checkout@v3
+ - uses: actions/setup-java@v4
+ - uses: nf-core/setup-nextflow@v3
- - uses: nf-core/setup-nextflow@v1
  - run: nextflow run ${GITHUB_WORKSPACE}
And users will be happy. A lot of this came out of #5 and #19, which was working around Nextflow release oddities.
@edmundmiller yes, but:
@MillironX thanks for the great writeup, much appreciated. I'm keenly aware of how much work you've invested in this already and want to make sure that it's clear that it's recognised how key this action has become in much of the nf-core (and wider Nextflow) ecosystem. The fact that it's just worked for a long time now is significant, given how heavily it's used. Couple of quick thoughts:
What do we consider to be important here, In #78 there's a fair bit of discussion about knowing when to invalidate the cache, but I don't think that we need to care about that. As long as
As above, Nextflow should be handling assets for regular (non
Sure maybe one day, but not any day soon. GHA may not be the best solution but it is by far the easiest. Everything in nf-core is massively baked into the GitHub ecosystem and we intentionally gravitate towards using it whenever we can. In the early days we used Travis for CI (before GHA was a thing) and it was crappy. A lot of work put on core team members who had access to go in and do stuff and a lot of extra maintenance work. Adoption of GHA was a revolution for us because it democratised access. Also it was zero effort setup for anyone forking / building off nf-core for their own stuff.
I agree that writing and testing GHA is a pain in the ass, but given the volume that we run it at I think that once set up it tends to be very robust. You just have to test a lot in a manual way. Yesterday I uncovered the same bug you saw overnight by testing a separate private repo on GitHub with the branch - I've never bothered using act because I usually find it more trouble than it's worth for exactly this reason. It's not true in its replication.
Generally my preference is very strongly in favour of reducing the number of services and external dependencies that we rely on, especially for anything touching distributed resources (such as pipelines / repositories). I've been burnt a lot of times and it's easy to think that the grass is greener on the other side :)
Although this event was a bit of a 🔥 I think it's good that it's prompted us to review this. It's clear that it's a fairly key part of our infra now (given how many people it affected) and also that its current state was not easy for others to maintain (it took @mashehu and me together about 1.5 days to piece together all the different parts). My motivation for this simplification is to make the action so simple that anyone can look at it and fix it with minimal prior experience. I hope that this will go some way to making it less brittle.
Finally, the Nextflow team has had several new engineers join recently and we were already in the middle of reviewing the build and release system. The Capsule system was already removed in the summer and we're hoping to improve the process around a lot of how this works. Specifically I have written an internal postmortem on this and suggested a few concrete things:
I can't promise when or if any of this will change, but it is at least being discussed 😄 If it all comes through then this action can hopefully get even simpler (or even be deprecated entirely).
Attempt at a vastly simpler composite action using a few lines of bash.
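To give a feel for how small such a step could be, here is a rough dry-run sketch that only prints the commands a minimal bash installer might run. It is not the code from this PR: the function name `plan_install` and the install directory are made up, and it assumes Java is already provided by an earlier step. It relies on the documented `curl -s https://get.nextflow.io | bash` install method, the `NXF_VER` variable that the Nextflow launcher reads, and the `GITHUB_ENV` / `GITHUB_PATH` files that GitHub Actions provides.

```shell
#!/usr/bin/env bash
# Sketch only, NOT the code from this PR. It prints a plan instead of
# executing it, so it is safe to run anywhere.
set -euo pipefail

# Hypothetical helper: list the commands a minimal installer step might run.
plan_install() {
  local version="$1" bin_dir="$2"
  printf 'mkdir -p %s\n' "$bin_dir"
  # Documented Nextflow install method; drops a "nextflow" launcher in the cwd.
  printf 'cd %s && curl -s https://get.nextflow.io | bash\n' "$bin_dir"
  # NXF_VER tells the launcher which Nextflow version to use in later steps.
  printf 'echo NXF_VER=%s >> "$GITHUB_ENV"\n' "$version"
  # Make the launcher visible on PATH for later steps.
  printf 'echo %s >> "$GITHUB_PATH"\n' "$bin_dir"
}

# Example values are assumptions for illustration.
plan_install "23.10.0" "$HOME/.local/bin"
```

Printing the plan rather than running it keeps the example free of network access; in a real composite action the same lines would simply be the body of a `run:` step with `shell: bash`.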
Comes with some limitations:
all
dist
and I don't think I've ever seen anyone use this 🤔
On the plus side: