GitHub - hanhaowen-mt/torch_musa: torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

torch_musa is a Python extension package built on PyTorch. It is developed as a plug-in, which decouples it from PyTorch and simplifies code maintenance. Used together with PyTorch, torch_musa lets users harness the computing power of Moore Threads graphics cards. In addition, torch_musa has two significant advantages:

torch_musa can achieve CUDA compatibility, which greatly reduces the workload of adapting new operators.
The torch_musa API is consistent with the PyTorch API in format, so users accustomed to PyTorch can migrate smoothly to torch_musa.

Installation
Getting Started
Documentation
FAQ

Installation

From Python Package

Package Download Link

# for python3.8
pip install torch-2.0.0_xxxxxx-cp38-cp38-linux_x86_64.whl
pip install torch_musa_xxxxxx-cp38-cp38-linux_x86_64.whl

# for python3.9
pip install torch-2.0.0_xxxxxx-cp39-cp39-linux_x86_64.whl
pip install torch_musa_xxxxxx-cp39-cp39-linux_x86_64.whl
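The wheel filenames above encode the CPython version as a `cp38` or `cp39` tag. As a minimal sketch, the tag for the running interpreter can be derived as follows so that the matching pair of wheels is chosen. The helper name `wheel_python_tag` is hypothetical and not part of torch_musa.

```python
import sys


def wheel_python_tag(version_info=None):
    """Return the CPython wheel tag (e.g. 'cp38') for a (major, minor, ...) version."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return f"cp{major}{minor}"


if __name__ == "__main__":
    # Prints the tag of the interpreter running this script.
    print(wheel_python_tag())
```

Both the torch wheel and the torch_musa wheel should carry the same tag.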

From Source

Prerequisites

MUSA ToolKit
MUDNN
Other Libs (including muThrust, muSparse, muAlg, muRand)
PyTorch Source Code
Docker Container Toolkits

NOTE: Some of the dependent libraries are in beta and have not yet been officially released, so we recommend compiling torch_musa in the development Docker image provided below. If you need to compile torch_musa in your own environment, please contact us for the additional dependencies.

Install Dependencies

apt-get install ccache
pip install -r requirements.txt

Set Important Environment Variables

export MUSA_HOME=path/to/musa_libraries  # directory containing mudnn and musa_toolkits; default value is /usr/local/musa/
export LD_LIBRARY_PATH=$MUSA_HOME/lib:$LD_LIBRARY_PATH
# if PYTORCH_REPO_PATH is not set, PyTorch-v2.0.0 will be downloaded outside this directory when building with build.sh
export PYTORCH_REPO_PATH=path/to/pytorch  # PyTorch source code directory
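Before building, it can help to confirm that these variables are consistent with each other. The following is a rough sketch of such a check, not part of torch_musa: the function name `check_musa_env` is hypothetical, and the fallback to `/usr/local/musa/` simply mirrors the default stated in the comment above.

```python
import os


def check_musa_env(env=None):
    """Return a list of human-readable problems found in the environment."""
    if env is None:
        env = os.environ
    problems = []
    # Fall back to the default MUSA_HOME documented above when it is unset.
    musa_home = env.get("MUSA_HOME", "/usr/local/musa/")
    lib_dir = os.path.normpath(os.path.join(musa_home, "lib"))
    ld_paths = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    normalized = {os.path.normpath(p) for p in ld_paths if p}
    if lib_dir not in normalized:
        problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")
    # PYTORCH_REPO_PATH is optional; only validate it when it is set.
    repo = env.get("PYTORCH_REPO_PATH")
    if repo is not None and not os.path.isdir(repo):
        problems.append(f"PYTORCH_REPO_PATH does not exist: {repo}")
    return problems


if __name__ == "__main__":
    for problem in check_musa_env():
        print("warning:", problem)
```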

Building With Script (Recommended)

bash build.sh   # build original PyTorch and torch_musa from scratch

# Some important parameters are as follows:
bash build.sh --torch  # build original PyTorch only
bash build.sh --musa   # build torch_musa only
bash build.sh --fp64   # compile fp64 in kernels using mcc in torch_musa
bash build.sh --debug  # build in debug mode
bash build.sh --asan   # build in asan mode
bash build.sh --clean  # clean everything built
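When the build is driven from a Python script (for example in CI), the invocations above can be wrapped so that a mistyped flag fails early. This is only a sketch: the helper names are hypothetical, the flag list is copied from this README, and whether several flags can be combined in one call is an assumption that should be checked against build.sh.

```python
import subprocess

# Flags listed in this README; --patch comes from the step-by-step section.
KNOWN_FLAGS = ("--torch", "--musa", "--fp64", "--debug", "--asan", "--clean", "--patch")


def build_command(*flags, script="build.sh"):
    """Return the argv list for one build.sh invocation, rejecting unknown flags."""
    for flag in flags:
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown build.sh flag: {flag}")
    return ["bash", script, *flags]


def run_build(*flags):
    """Run build.sh from the current directory and raise if it exits non-zero."""
    subprocess.run(build_command(*flags), check=True)
```

For example, `run_build("--musa")` corresponds to `bash build.sh --musa`.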

Building Step by Step From Source

Apply PyTorch patches

bash build.sh --patch

Building PyTorch

cd pytorch
pip install -r requirements.txt
python setup.py install
# debug mode: DEBUG=1 python setup.py install
# asan mode:  USE_ASAN=1 python setup.py install

Building torch_musa

cd torch_musa
pip install -r requirements.txt
python setup.py install
# debug mode: DEBUG=1 python setup.py install
# asan mode:  USE_ASAN=1 python setup.py install

Docker Image

NOTE: To use torch_musa in a Docker container, install mt-container-toolkit first and pass '--env MTHREADS_VISIBLE_DEVICES=all' when starting the container.

Docker Image for Developer

docker run -it --privileged --name=torch_musa_dev --env MTHREADS_VISIBLE_DEVICES=all --shm-size=80g torch_musa_develop_image /bin/bash

Docker Image List

Docker Tag	Description
latest/v1.0.0	musatoolkits rc1.4.0 (requires musa driver musa_2.1.1); mudnn rtm_2.1.1; mccl 20230627; libomp-11-dev; muAlg_dev-0.1.1; muRAND_dev1.0.0; muSPARSE_dev0.1.0; muThrust_dev-0.1.1

Docker Image for User

docker run -it --privileged --name=torch_musa_release --env MTHREADS_VISIBLE_DEVICES=all --shm-size=80g torch_musa_release_image /bin/bash

Docker Image List

Docker Tag	Description
latest/v1.0.0	musatoolkits rc1.4.0 (requires musa driver musa_2.1.1); mudnn rtm_2.1.1; mccl 20230627; libomp-11-dev; muAlg_dev-0.1.1; muRAND_dev1.0.0; muSPARSE_dev0.1.0; muThrust_dev-0.1.1
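The developer and user `docker run` commands differ only in the image and container name. A small sketch that assembles the same argument list for either image is shown below; the function name `docker_run_command` is hypothetical, and the image names are the placeholders used in this README.

```python
import shlex


def docker_run_command(image, name, shm_size="80g"):
    """Build the `docker run` argv shown above for a given image and container name."""
    return [
        "docker", "run", "-it", "--privileged",
        f"--name={name}",
        # Required so that the container can see the Moore Threads devices.
        "--env", "MTHREADS_VISIBLE_DEVICES=all",
        f"--shm-size={shm_size}",
        image, "/bin/bash",
    ]


if __name__ == "__main__":
    cmd = docker_run_command("torch_musa_release_image", "torch_musa_release")
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run` without going through a shell.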

Getting Started

Key Changes

The following two key changes are required when using torch_musa:

Import torch_musa package
```
import torch
import torch_musa
```

Change the device to musa

import torch
import torch_musa

a = torch.tensor([1.2, 2.3], dtype=torch.float32, device='musa')
b = torch.tensor([1.2, 2.3], dtype=torch.float32, device='cpu').to('musa')
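Scripts that should also run on machines without a Moore Threads card can choose the device at start-up. The sketch below falls back to the CPU when torch_musa cannot be imported or reports no device. The helper name `pick_device` is hypothetical; `torch.musa.is_available()` is the call listed in the API examples in this README.

```python
def pick_device(preferred="musa"):
    """Return 'musa' when torch_musa is importable and reports a device, else 'cpu'."""
    try:
        import torch
        import torch_musa  # noqa: F401  (must be imported before using device='musa')
    except (ImportError, OSError):
        # torch or torch_musa is not installed, or its native libraries failed to load.
        return "cpu"
    return preferred if torch.musa.is_available() else "cpu"


if __name__ == "__main__":
    print("using device:", pick_device())
```

The returned string can be passed wherever a device is expected, for example `torch.tensor([1.2, 2.3], device=pick_device())`.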

Example of Frequently Used APIs

import torch
import torch_musa

torch.musa.is_available()
torch.musa.device_count()
torch.musa.synchronize()

with torch.musa.device(0):
    assert torch.musa.current_device() == 0

if torch.musa.device_count() > 1:
    torch.musa.set_device(1)
    assert torch.musa.current_device() == 1
    torch.musa.synchronize("musa:1")

a = torch.tensor([1.2, 2.3], dtype=torch.float32, device='musa')
b = torch.tensor([1.8, 1.2], dtype=torch.float32, device='musa')
c = a + b
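One common use of `torch.musa.synchronize()` is timing device code. The sketch below assumes that it behaves like `torch.cuda.synchronize()`, that is, it blocks until work queued on the device has finished; this README does not state that explicitly. The helper `timed` is hypothetical and takes the synchronization function as a parameter, so it works for any backend.

```python
import time


def timed(fn, sync=None, repeats=10):
    """Return the mean wall-clock seconds per call of fn().

    sync, if given, is called before starting and before stopping the clock so
    that asynchronously queued device work is included in the measurement.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats
```

With the tensors above, a call might look like `timed(lambda: a + b, sync=torch.musa.synchronize)`.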

Example of Inference Demo

import torch
import torch_musa
import torchvision.models as models

model = models.resnet50().eval()
x = torch.rand((1, 3, 224, 224), device="musa")
model = model.to("musa")
# Perform the inference
y = model(x)

Example of Training Demo

import torch
import torch_musa
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# 1. prepare dataset
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
batch_size = 4
train_set = torchvision.datasets.CIFAR10(root='./data',
                                         train=True,
                                         download=True,
                                         transform=transform)
train_loader = torch.utils.data.DataLoader(train_set,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=2)
test_set = torchvision.datasets.CIFAR10(root='./data',
                                        train=False,
                                        download=True,
                                        transform=transform)
test_loader = torch.utils.data.DataLoader(test_set,
                                          batch_size=batch_size,
                                          shuffle=False,
                                          num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
device = torch.device("musa")

# 2. build network
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net().to(device)

# 3. define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# 4. train
for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs.to(device))
        loss = criterion(outputs, labels.to(device))
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
print('Finished Training')
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
net.load_state_dict(torch.load(PATH))

# 5. test
correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        outputs = net(images.to(device))
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels.to(device)).sum().item()
print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
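The demo defines `classes` but reports only the overall accuracy. A per-class breakdown can be sketched in plain Python as below. The helper `per_class_accuracy` is hypothetical; it expects integer class indices, which in the test loop above could be collected with `predicted.cpu().tolist()` and `labels.tolist()`.

```python
from collections import Counter


def per_class_accuracy(predicted, labels, class_names):
    """Map each class name to its accuracy in percent, or None if the class never occurs."""
    correct = Counter()
    total = Counter()
    for pred, label in zip(predicted, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return {
        name: (100.0 * correct[idx] / total[idx]) if total[idx] else None
        for idx, name in enumerate(class_names)
    }
```

Classes that never appear in `labels` are reported as `None` rather than 0 %, so that missing data is not mistaken for poor accuracy.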

Documentation

Developer Guide

FAQ

For more detailed information, please refer to the files in the docs folder. If you have any questions, please contact us by email.
