Use Ubuntu packages for FastTree and other programs? #127
If we are considering installation from prebuilt binaries, we might also consider installing these tools with Conda. We already rely on Conda binaries in our workflow-specific environment files and our nextstrain-base environment. We could have micromamba installed in our first pass of the Docker build and use that to install the third-party binaries we want. I'd trust these binaries that we use in multiple places over the binaries from Ubuntu that we rarely use.
@huddlej fair point about considering Conda. One thing to note is that it's best not to compromise availability of platform-specific binaries. We have a (historical?) preference for using Bioconda. From what I can tell, their package builders do not support The alternative for FastTree would be to make it available on a different channel that supports building and hosting both
This issue can be thought of as a discussion of the "Adding a new software program" section in the README which I added recently: Lines 90 to 113 in 3ab045e
There are a few separate things I'd like to discuss:
The problem with installing from package managers (dpkg or conda) is that they do not always make it easy to swap versions. For dpkg, once a package is released in a given distro version, it pretty much never updates. Considering how buggy and low quality the software in the field can be, scientists sometimes have to search for the only version that works for a given input (one that, for example, does not crash or produce garbage results). This is especially true for iqtree - you can find the adventures Cornelius went through with it on Slack. To speed up the Docker build, instead of building inside the container, we could add scripts to make prebuilt installable tarballs or debs once, host them on S3 or GH releases, and then just untar them in the Dockerfile.
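The "build once, host, then untar" idea could be sketched roughly as below. This is only an illustration in Python, not anything from the Dockerfile: the helper names `sha256_of` and `install_prebuilt` are hypothetical, and the tarball is assumed to have been downloaded already (from S3, a GitHub release, or elsewhere). The point is that a pinned checksum gives the same version-pinning guarantee a source build has.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_prebuilt(tarball: Path, expected_sha256: str, prefix: Path) -> list[str]:
    """Verify a prebuilt tarball against a pinned checksum, then unpack it.

    Hypothetical helper: assumes the tarball was built once by us and is
    therefore trusted once its checksum matches.
    """
    actual = sha256_of(tarball)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch for {tarball.name}: got {actual}")
    prefix.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        names = archive.getnames()
        # Only safe because the checksum above pins the exact archive contents.
        archive.extractall(prefix)
    return names
```

Swapping to a different program version would then mean changing one URL and one checksum, rather than rebuilding inside the container.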
These relate to considerations (3) "Is it up to date with the latest version?" and (2) "Can the version be pinned?". It seems like
Speed of the Docker build is the least important benefit I see from using package managers; the current Docker caching (at least for pinned programs) is already effective at reducing the number of times a program is built. A more important benefit would be using package managers for dependency management, since it is currently disjoint in the Dockerfile (see #126).
Lots of intersectional considerations here.
For background context, the reason we were compiling FastTree (and others) in the first place, IIRC, was that we started off using an Alpine base image which either did not provide packages for these programs (or potentially packaged too out-of-date versions). We did not reconsider this when switching to a Debian-based base image (e.g.
Also, to be precise, we're talking about using Debian packages here, not Ubuntu's repackaging of them, as our base distro is Debian "bullseye" (via
I've added my thoughts on @victorlin's questions below.
Yes, absolutely. Or at least as much as any other maintainers we implicitly trust, and we already trust Debian maintainers as a whole (a very broad and varied group, not necessarily the specific maintainers of this package) quite a bit.
For FastTree, yes to both, as you've noted.
Yes! but not with the conventional name. See below.
Maybe. We can certainly propose packaging patches, and those will go thru Debian's process at the pace set by the maintainer team for this package. That pace is an unknown, but we could look at past changes to gauge; ~all communication is open/public. But depending on the scope of changes, those may or may not be able to be included in the stable release we're using.
The Debian package actually enables double-precision for both binaries it produces, but it doesn't include the conventional
Relatedly, there's a case to be made that we shouldn't ever use a FastTree version compiled without double precision, as the results are likely to be wrong for our use cases (c.f. Not so fast, FastTree and the Debian package's only ever bug report). To enforce this in Augur, I think we'd have to do something equivalent to
in Augur (or some external wrapper Augur calls instead). Note that Conda also omits the
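One possible shape for such a check, as a wrapper Augur could call, is sketched below. It is not the snippet originally posted here and rests on two assumptions that would need verifying against the FastTree builds in question: that the executable prints a version/usage banner when run with `-help`, and that double-precision builds include the phrase "Double precision" in that banner. All function names are hypothetical.

```python
import subprocess


def is_double_precision(banner: str) -> bool:
    """Return True if banner text advertises a double-precision build.

    Assumption: double-precision builds mention "Double precision" in
    their version/usage banner; single-precision builds do not.
    """
    return "double precision" in banner.lower()


def fasttree_banner(executable: str = "FastTree") -> str:
    """Capture everything the executable prints when asked for help.

    Assumption: `-help` prints the banner; stdout and stderr are both
    collected because the output stream is not relied upon here.
    """
    result = subprocess.run(
        [executable, "-help"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def require_double_precision(executable: str = "FastTree") -> None:
    """Raise if the given executable does not look double-precision."""
    if not is_double_precision(fasttree_banner(executable)):
        raise RuntimeError(f"{executable} does not report double precision")
```

Checking the banner rather than the executable name means the guard works regardless of what the binary is called in a given package.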
The Ubuntu package is not directly relevant here (per the note about the distro we use above), but Debian also does not compile its iqtree package for
However, note that the iqtree package for Debian bullseye is 1.x not 2.x as we currently use, so we'd have to downgrade, which I think is probably a nonstarter?
We should prefer packaged versions, esp. from the base distro, as long as they're suitable. "Are they suitable?" likely mostly means, "Are they current/new enough versions?"
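The "new enough?" question can be made mechanical. The sketch below compares the upstream part of a Debian-style version string (`[epoch:]upstream[-revision]`) against a minimum we require. The function names and the version strings in the usage note are illustrative only, and the comparison is deliberately simplified: it looks at numeric components and ignores suffixes such as `+dfsg` or `~rc1`, so it is not a full implementation of Debian's version ordering.

```python
import re


def upstream_version(package_version: str) -> tuple[int, ...]:
    """Extract the numeric upstream components of a package version.

    Drops an optional "epoch:" prefix and an optional "-revision" suffix,
    then keeps only the runs of digits in what remains.
    """
    without_epoch = package_version.split(":", 1)[-1]
    if "-" in without_epoch:
        # The Debian revision is whatever follows the last hyphen.
        without_epoch = without_epoch.rsplit("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", without_epoch))


def is_suitable(package_version: str, minimum: str) -> bool:
    """Return True if the packaged upstream version is at least `minimum`."""
    return upstream_version(package_version) >= upstream_version(minimum)
```

For example, with an illustrative 1.x package version, `is_suitable("1.6.12+dfsg-1", "2")` is false, which is the iqtree situation described above: the packaged version exists but would mean a downgrade.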
If necessary and applicable, but not always. Also, this isn't always feasible, c.f. discussion above about packaging policies/cadences.
Conda packages bring along other issues. For example, they expect to bring along everything but libc, so things like openssl and other common shared libs will get duplicated (increasing image size, increasing complexity of library interactions at runtime, and more). I'm reluctant to mix Conda packages with non-Conda packages for these reasons.
That said, we might take a step back and consider building the container image entirely from a static Conda environment. We've (or at least I've) considered this before, but decided it wasn't worth it then. Maybe that's changed, particularly in light of our new Conda runtime defined by a locked package? There are downsides though, like a tighter coupling between runtimes and what they can support (e.g. architectures). Tighter is good in some ways but worse in others. Also, other considerations aside, we may not want to put all our eggs in Conda's basket.
Relatedly, but not yet mentioned, is RAxML. We could consider moving to the Debian package which is available for both
Context
FastTree is currently built from source for the nextstrain/base Docker image. While working on #123, I discovered that it is also available as an Ubuntu package fasttree which can be installed directly via apt-get install fasttree. This made me think whether we should be installing from that directly instead of building from source.
Up-sides to installing from Ubuntu's APT package manager
Notes on the above, with examples:
- mafft is available for both amd64 and arm64, whereas we have a TODO to figure out how to build it from source.
- FastTreeDblMP which is built for the Docker image. The Ubuntu fasttree package provides less-optimal versions. The FastTreeDblMP build instructions can be copied over to the Ubuntu package builder to benefit non-Nextstrain users.
- iqtree is only available as amd64. The Dockerfile also only downloads a pre-built binary for amd64. There is a TODO to build from source which would provide arm64-native IQ-TREE binaries in the nextstrain/base image. This could instead be done in the Ubuntu package builder to benefit non-Nextstrain users.
Considerations
I'm not familiar with Ubuntu packages, but it seems like all those questions can be answered by clicking around the package websites.