
Mac GPU Utilization Support #2789

Closed
josebarross opened this issue May 24, 2022 · 8 comments
Labels: question (Further information is requested), wontfix (This will not be worked on)


josebarross commented May 24, 2022

Torch has just released Mac M1 support via the mps device. I want to know whether flair will support it. I tried setting flair.device manually to mps, but it failed at runtime. Thank you in advance!

@josebarross josebarross added the question Further information is requested label May 24, 2022
@alanakbik (Collaborator)

Thanks for raising this issue @josebarross!

@whoisjones can you check if this works?


deakkon commented Jul 1, 2022

Hi everyone,

Not sure if this is still relevant. I tried, unsuccessfully, to set the device via .to():

import torch

from flair.data import Sentence
from flair.models import SequenceTagger


# setting the device
def get_torch_device():
    if torch.cuda.is_available():
        device = torch.device("cuda:0")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")
    return device


device = get_torch_device()
nlp = SequenceTagger.load("flair/ner-german-legal").to(device)
sentence = Sentence(text, use_tokenizer=False)  # `text` defined elsewhere
sentence.to(device=device)
nlp.predict(sentence)
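The fallback order in get_torch_device can be exercised without any GPU hardware; a minimal sketch with the availability results injected as flags (function name hypothetical, not part of flair or torch):

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Mirror the CUDA -> MPS -> CPU fallback order from get_torch_device."""
    if cuda_ok:
        return "cuda:0"
    if mps_ok:
        return "mps"
    return "cpu"


print(pick_device(False, True))   # -> mps (e.g. an M1 Mac without CUDA)
print(pick_device(False, False))  # -> cpu
```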

Find the traceback below:

Traceback (most recent call last):
  File "/Users/jurica/Projects/data-processing-pipeline/models/annotators/legal_ner.py", line 71, in <module>
    spans = model("Arbeitsbericht Nr. 188 des Büros für Technikfolgen-Abschätzung beim Deutschen Bundestag (TAB): Strukturwandel und Nachhaltigkeit in der Landwirtschaft, Nachhaltigkeitsbewertung vom landwirtschaftlichen Betrieb bis zum Agrarsektor &ndash; Stand und Perspektiven, Vergleich von konventioneller und ökologischer Landwirtschaft, Handlungsoptionen")
  File "/Users/jurica/Projects/data-processing-pipeline/models/annotators/legal_ner.py", line 53, in __call__
    self.nlp.predict(sentence)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/models/sequence_tagger_model.py", line 479, in predict
    features, gold_labels = self.forward(batch)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/models/sequence_tagger_model.py", line 282, in forward
    self.embeddings.embed(sentences)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/embeddings/token.py", line 68, in embed
    embedding.embed(sentences)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/embeddings/base.py", line 62, in embed
    self._add_embeddings_internal(data_points)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/embeddings/token.py", line 728, in _add_embeddings_internal
    all_hidden_states_in_lm = self.lm.get_representation(
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/models/language_model.py", line 155, in get_representation
    _, rnn_output, hidden = self.forward(batch, hidden)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/flair/models/language_model.py", line 75, in forward
    encoded = self.encoder(input)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 159, in forward
    return F.embedding(
  File "/Users/jurica/miniconda3/envs/dpp/lib/python3.9/site-packages/torch/nn/functional.py", line 2197, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

This seems to have already been fixed, though.

Relevant env info:

torch==1.13.0.dev20220630 # (but should work with 1.12 as well)
flair==0.11.3

Hope it helps!


stale bot commented Nov 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Nov 1, 2022
@stale stale bot closed this as completed Nov 13, 2022
@gojefferson

I've run into the same issue as @deakkon. This is still a problem.


mileszim commented Nov 3, 2023

@gojefferson I just tried to figure this out. With the merge of #3350, I am able to use torch with the M1 by setting:

import flair
import torch

flair.device = torch.device("mps:0")

then run your code as normal:

from flair.data import Sentence
from flair.nn import Classifier

# uses apple GPU
tagger = Classifier.load('ner')

sentence = Sentence("Hello there!")

tagger.predict(sentence)

it works as-is!

@aburns27

Changes to Enable Flair on MPS (Apple Silicon)

I was training and testing models with flair for a project and wanted to run it on the MPS GPU on Apple Silicon. Example code:

import flair
from flair.trainers import ModelTrainer

# Move the model (a `tagger` loaded earlier) to the MPS device
tagger.to(flair.device)

# Ensure all model parameters are in float32
for param in tagger.parameters():
    param.data = param.data.float()

# Ensure any existing gradients are also in float32
for param in tagger.parameters():
    if param.grad is not None:
        param.grad.data = param.grad.data.float()

# Initialize the trainer (`corpus` prepared earlier)
trainer = ModelTrainer(tagger, corpus)

# Train the model without AMP
trainer.train(
    f'resources/taggers/{runname}_mps',
    learning_rate=0.05,
    mini_batch_size=16,
    save_model_each_k_epochs=2,
    monitor_test=True,
    max_epochs=5,
    use_amp=False,
)

Here are the changes I had to make:

1. trainer.py

Location: flair/trainers/trainer.py

Issue: torch.autocast is entered even when AMP is manually disabled (use_amp=False), causing a RuntimeError on MPS devices.

Fix: Apply autocast only when AMP is enabled.

# Original Code:
with torch.autocast(device_type=flair.device.type, enabled=use_amp):
    loss, datapoint_count = self.model.forward_loss(batch_step)

# Modified Code:
if use_amp:
    with torch.autocast(device_type=flair.device.type):
        loss, datapoint_count = self.model.forward_loss(batch_step)
else:
    loss, datapoint_count = self.model.forward_loss(batch_step)

2. training_utils.py

Location: flair/training_utils.py

Issue: The store_embeddings function tried to pin memory on MPS, causing a NotImplementedError.

Fix: Only pin memory for CUDA devices.

# Original Code:
pin_memory = str(flair.device) != "cpu"

# Modified Code:
pin_memory = flair.device.type == "cuda"
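Note that str() of a torch device usually includes an index (e.g. "cuda:0"), so an exact string compare against "cuda" can miss CUDA devices; checking the device type (or the prefix before the colon) is safer. A small pure-Python helper sketching that check (name hypothetical):

```python
def should_pin_memory(device_str: str) -> bool:
    """Pin host memory only for CUDA devices; pinning on MPS raises
    NotImplementedError, and it is pointless on CPU."""
    return device_str.split(":", 1)[0] == "cuda"


print(should_pin_memory("cuda:0"))  # -> True
print(should_pin_memory("mps"))     # -> False
print(should_pin_memory("cpu"))     # -> False
```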

@aditya-malani

@aburns27

I am seeing this error after making the changes you suggested:
RuntimeError: The expanded size of the tensor (2) must match the existing size (24) at non-singleton dimension 0. Target sizes: [2]. Tensor sizes: [24]

@aburns27

@aditya-malani The output of your model and the target labels (ground truth) may have different dimensions. That wouldn't be an issue caused by the above changes.
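For reference, the expand error above follows PyTorch's broadcasting rule: a dimension can only be expanded if its existing size is 1 or already equals the target size. A pure-Python sketch of that check for same-rank shapes (helper name hypothetical):

```python
def can_expand(src_sizes, target_sizes):
    """A dimension can expand only if its size is 1 or equals the target."""
    if len(src_sizes) != len(target_sizes):
        return False
    return all(s == 1 or s == t for s, t in zip(src_sizes, target_sizes))


print(can_expand([24], [2]))  # -> False: 24 is neither 1 nor equal to 2
print(can_expand([1], [2]))   # -> True
```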
