medaka on macOS #228

Closed
1 task
jmtsuji opened this issue Oct 4, 2024 · 5 comments
Collaborator

jmtsuji commented Oct 4, 2024

Problem description

In addition to Linux, it would be great if we could get rotary working on macOS. So far, installs that seem to be failing are:

(Check if install is addressed:)

  • medaka

medaka install issues

Rotary currently needs to be installed on Mac using the osx-64 architecture subdir of conda (i.e., for x86_64 tools), because many of the bioinformatics tools we are using only have x86_64 installs. This is a global setting across an entire rotary run. When run on a Mac with an M-series processor (e.g., M1, M2, M3, which are ARM-based), the tools are translated using Rosetta so they can run on arm64.
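To check whether a given Python process is actually being translated, something like the following might work. This is only a sketch: the helper names are made up for illustration, and it assumes the macOS `sysctl.proc_translated` key, which reports 1 for a process running under Rosetta and is absent on Intel Macs.

```python
import platform
import subprocess


def _read_sysctl(key):
    """Return the value of a sysctl key as a string, or '' if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", key], capture_output=True, text=True, check=False
        )
    except OSError:
        # sysctl binary not available (e.g., not on macOS)
        return ""
    return result.stdout.strip()


def running_under_rosetta(system=None, read_sysctl=None):
    """Hypothetical helper: True if this process appears to be translated by Rosetta.

    `system` and `read_sysctl` are injectable so the logic can be tested
    without a Mac; by default they come from the running machine.
    """
    system = system or platform.system()
    if system != "Darwin":
        return False
    read = read_sysctl or _read_sysctl
    # Assumption: "1" means translated; "0" or a missing key means native.
    return read("sysctl.proc_translated") == "1"
```

In an osx-64 conda install on an M-series Mac, `running_under_rosetta()` would be expected to return True, even though `platform.machine()` reports x86_64 there.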

Unfortunately, medaka v1 is no longer installing well this way on macOS. I think that medaka 1.8.0 (which is currently specified in our env file for medaka) used to work on macOS, but it seems that the tensorflow dependency is not translating properly: I get an "Illegal instruction: 4" error when I try to load this package.

Medaka v2 was recently released, but support for osx-64 was dropped in favour of osx-arm64. This makes sense given that Macs are no longer made with x86_64 architecture chips, but it means that medaka v2 is currently unavailable for use in the pipeline unless we hack it in somehow.

Other tools with install issues

I'll add these as replies to this post as I have the chance to keep testing rotary on macOS.

UPDATE (edit on Oct 8th, 2024): simplified this issue to just focus on the medaka install -- changed the issue title. (The rest of the issue description is unedited.)

Proposed solution

medaka

A few different solutions might be possible, ordered here by preference:

  • Ask for osx-64 support to be added for medaka v2 (done: see Support for x86_64 on macOS for medaka v2 nanoporetech/medaka#533)
  • Have macOS users make an arm64 environment for medaka v2 manually when they install rotary. We could add a setting to the config file where they can add the name of the env to replace the default environment for the medaka rules. This is not ideal, because it would make the install more laborious and less reproducible.
  • Make a hacky install of medaka v1 to work on osx-64, e.g., using combinations of conda and pip. I've gotten this to work on my own machine, but it requires pip flags (--no-deps) that cannot be added into the environment.yml file for medaka and seems pretty unstable. I don't think this is a sustainable option.

Possible caveats etc.

I suppose one other solution would be to ask the snakemake devs if there is a way to specify the conda subdir variable for each conda env that is created by the pipeline. We could then hard-code that the conda subdir for medaka on macOS must be osx-arm64. This is also a bit hacky.
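Both the manual arm64 environment and the per-env subdir idea rely on the same mechanism: conda reads the `CONDA_SUBDIR` environment variable when it solves an environment. A rough sketch of building such a command is below. The function name, the env name `medaka2-arm64`, and the `medaka>=2` spec are placeholders for illustration, not anything rotary currently defines.

```python
import os
import shlex


def arm64_env_command(env_name="medaka2-arm64", package="medaka>=2"):
    """Hypothetical helper: build a conda command that solves for osx-arm64.

    Returns (cmd, env) so that the caller decides whether to run it,
    e.g. with subprocess.run(cmd, env=env, check=True).
    """
    # Override the subdir only for this one command, leaving the
    # global osx-64 setting used by the rest of the pipeline untouched.
    env = dict(os.environ, CONDA_SUBDIR="osx-arm64")
    cmd = [
        "conda", "create", "--yes",
        "--name", env_name,
        "--channel", "conda-forge",
        "--channel", "bioconda",
        package,
    ]
    return cmd, env


def as_shell_line(cmd, env):
    """Render the command as a single copy-pasteable shell line."""
    return "CONDA_SUBDIR=" + env["CONDA_SUBDIR"] + " " + shlex.join(cmd)
```

After creating the env, running `conda config --env --set subdir osx-arm64` inside it should keep later installs into that env on the same subdir.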

jmtsuji added the enhancement (New feature or request) label Oct 4, 2024
jmtsuji self-assigned this Oct 4, 2024
Collaborator

LeeBergstrand commented Oct 4, 2024

@jmtsuji You could also create separate Conda environments for Medaka 1 and 2 (medaka.yaml and medaka2.yaml) and use Python to change the Conda environment used by the Medaka snakemake rules depending on the hardware specs.

rule polish_contig_medaka:
    input:
        calls_to_draft_bam='{sample}/{step}/medaka/calls_to_draft.bam',
        calls_to_draft_bam_index='{sample}/{step}/medaka/calls_to_draft.bam.bai'
    output:
        contig_polished=temp('{sample}/{step}/medaka/results/{sample}_{contig}.hd5')
    conda:
        some_function_that_selects_what_medaka() # Returns medaka.yaml or medaka2.yaml, depending on the CPU architecture.

https://stackoverflow.com/questions/7491391/is-there-a-reliable-way-to-determine-the-system-cpu-architecture-using-python

In medaka.yaml, you would pin tensorflow to a version that still works on x86_64 macOS.
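A minimal sketch of what such a selector could look like, using only the standard library. The function name `select_medaka_env` is made up, and the mapping (medaka v1 only for x86_64 macOS, medaka v2 everywhere else) is an assumption based on medaka v2 lacking an osx-64 build. It has not been checked against other platforms.

```python
import platform


def select_medaka_env(system=None, machine=None):
    """Hypothetical helper: pick the conda env file for the medaka rules.

    `system` and `machine` default to the values reported for the
    running Python, but can be passed explicitly for testing.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Assumption: medaka v2 has no osx-64 package, so an x86_64 Python on
    # macOS (including one translated by Rosetta) falls back to medaka v1.
    if system == "Darwin" and machine == "x86_64":
        return "medaka.yaml"
    return "medaka2.yaml"
```

The rule above could then use `conda: select_medaka_env()`. Note that `platform.machine()` describes the running Python, so a rotary install made with the osx-64 subdir on an M-series Mac would be expected to report x86_64 and get medaka.yaml.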

jmtsuji changed the title from "macOS compatibility" to "medaka on macOS" Oct 8, 2024
Collaborator Author

jmtsuji commented Oct 8, 2024

@LeeBergstrand Based on a conversation with the medaka maintainers, there is no plan to make an osx-64 install of medaka v2. Here are some updated possible courses of action for us (with additional thoughts from the medaka issue):

  • Transition from supporting osx-64 to osx-arm64 with rotary. This would require updating all of rotary's conda envs to be installable natively on osx-arm64. I have not evaluated each rotary environment yet, but based on a quick glance, it looks like several of the annotation tools (e.g., eggNOG-mapper, newer versions of DFAST) lack an osx-arm64 install, so adding osx-arm64 support for the whole pipeline would not be trivial.
    • However, it might not be that hard to support osx-arm64 installs up to the annotation module... I would need to check this.
  • Drop macOS support entirely. This would simplify maintenance and avoid issues with Mac architecture down the road, but it would also make rotary inaccessible to many individual users.
  • For now, keep medaka v1 available for macOS and move to medaka v2 for Linux, as you mentioned. However, even the medaka v1 install on macOS seems to be breaking now without manually tweaking the environment, based on personal testing, so this might not be a great option anymore.
  • Ask macOS users to make a custom env for medaka v2 with the osx-arm64 subdir while installing rotary, but run the rest of the tool as osx-64, as mentioned in my first post. This could be a temporary "hacky" workaround until a better long-term solution is reached.

@LeeBergstrand Any thoughts? I don't think macOS support is a huge priority at the moment (aside from being able to easily test rotary on laptops), but I think it is worth considering for the finished tool.

Collaborator Author

jmtsuji commented Oct 8, 2024

As shown in #229, I think that the first option above is not feasible in our development timeline:

- Transition from supporting osx-64 to osx-arm64 with rotary. This would require updating all of rotary's conda envs to be installable natively on osx-arm64. I have not evaluated each rotary environment yet, but based on a quick glance, it looks like several of the annotation tools (e.g., eggNOG-mapper, newer versions of DFAST) lack an osx-arm64 install, so adding osx-arm64 support for the whole pipeline would not be trivial.

Collaborator

LeeBergstrand commented Oct 8, 2024

Drop macOS support entirely. This would simplify maintenance and avoid issues with Mac architecture down the road, but it would also make rotary inaccessible to many individual users.

@jmtsuji Why is macOS support important for Rotary, other than for development? I don't see the entire pipeline being able to feasibly run on a Mac given the RAM requirements. Most baseline Macs people have (MacBook Pros and iMacs) top out at 24 gigs of RAM, which has to be special ordered (baseline is 8 or 16 GB). With some versions you can get 32 gigs (Mac mini and 15 inch MacBook Pro). Even the top-end modern Mac Studios ($5500) top out at 64 gigs of RAM.

I would support waiting until more of these tools have ARM support.

Collaborator Author

jmtsuji commented Oct 9, 2024

@LeeBergstrand Earlier in the development of rotary, I was thinking it would be great if most of the pipeline could be accessible to a broad group of users. Many people using Nanopore might not have a big Linux machine (or a Linux machine at all) to run on. I think most of the pipeline outside of annotation (which involves a lot of rules with high RAM / CPU needs) should be feasible on an average Mac. This was one of my key motivations for exploring a macOS install of rotary. However, I think making a macOS install should not be our main priority. Given the large challenges that seem to be involved with making a macOS version and the fact that the annotation module likely wouldn't be feasible on a typical Mac, I agree, let's table making a macOS install for now. We can potentially revisit this in the future if/when more tools have ARM support.

jmtsuji closed this as completed Oct 9, 2024