Fix PyTorch Compatibility link and remove incomplete rows (ROCm#4195)
* fix pytorch-compatibility filename

fix links

* remove incomplete rows in pytorch-compatibility

* fix broken refs

(cherry picked from commit f76145c)
peterjunpark committed Dec 24, 2024
1 parent b71477a commit 702500d
Showing 6 changed files with 9 additions and 43 deletions.
2 changes: 1 addition & 1 deletion docs/compatibility/compatibility-matrix-historical-6.0.csv
@@ -22,7 +22,7 @@ ROCm Version,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,
-:doc:`PyTorch <../compatibility/pytorch-compatiblity>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
+:doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
2 changes: 1 addition & 1 deletion docs/compatibility/compatibility-matrix.rst
@@ -47,7 +47,7 @@ compatibility and system requirements.
,gfx908,gfx908,gfx908
,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
-:doc:`PyTorch <../compatibility/pytorch-compatiblity>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
+:doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3
35 changes: 0 additions & 35 deletions docs/compatibility/pytorch-compatibility.rst
@@ -576,14 +576,6 @@ PyTorch interacts with the CUDA or ROCm environment.
     - Globally enables or disables the PyTorch C++ implementation within SDPA.
     - 2.1
     - ❌
-   * - ``allow_fp16_bf16_reduction_math_sdp``
-     - Globally enables FP16 and BF16 precision for reduction operations within
-       SDPA.
-     - 2.1
-     -
-..
-   FIXME:
-   - Partial?

.. Need to validate and extend.
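For context on the row retained above: its toggle is exposed in current PyTorch as ``torch.backends.cuda.enable_math_sdp``. The following is a minimal sketch, not part of this commit, with placeholder tensor shapes:

.. code-block:: python

   import torch
   import torch.nn.functional as F

   # Globally enable the PyTorch C++ (math) implementation within SDPA.
   torch.backends.cuda.enable_math_sdp(True)

   # Placeholder shapes: (batch, heads, sequence, head_dim).
   q = torch.randn(2, 4, 8, 16)
   k = torch.randn(2, 4, 8, 16)
   v = torch.randn(2, 4, 8, 16)
   out = F.scaled_dot_product_attention(q, k, v)
   print(out.shape)  # torch.Size([2, 4, 8, 16])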
@@ -671,40 +663,13 @@ of computational resources and scalability for large-scale tasks.
       those on separate machines.
     - 1.8
     - 5.4
-   * - RPC Device Map Passing
-     - RPC Device Map Passing in PyTorch refers to a feature of the Remote
-       Procedure Call (RPC) framework that enables developers to control and
-       specify how tensors are transferred between devices during remote
-       operations. It allows fine-grained management of device placement when
-       sending tensors across nodes in distributed training or execution
-       scenarios.
-     - 1.9
-     -
   * - Gloo
     - Gloo is designed for multi-machine and multi-GPU setups, enabling
       efficient communication and synchronization between processes. Gloo is
       one of the default backends for PyTorch's Distributed Data Parallel
       (DDP) and RPC frameworks, alongside other backends like NCCL and MPI.
     - 1.0
     - 2.0
-   * - MPI
-     - MPI (Message Passing Interface) in PyTorch refers to the use of the MPI
-       backend for distributed communication in the ``torch.distributed`` module.
-       It enables inter-process communication, primarily in distributed
-       training settings, using the widely adopted MPI standard.
-     - 1.9
-     -
-   * - TorchElastic
-     - TorchElastic is a PyTorch library that enables fault-tolerant and
-       elastic training in distributed environments. It is designed to handle
-       dynamically changing resources, such as adding or removing nodes during
-       training, which is especially useful in cloud-based or preemptible
-       environments.
-     - 1.9
-     -
-
-..
-   FIXME: RPC Device Map Passing "Since ROCm version"
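The Gloo row kept above is one of the default backends for ``torch.distributed``. As a minimal, hypothetical single-process sketch (the address, port, rank, and world size are placeholders, not from this commit):

.. code-block:: python

   import torch.distributed as dist

   # Initialize the Gloo backend for a single process; real jobs set
   # rank and world_size per worker.
   dist.init_process_group(
       backend="gloo",
       init_method="tcp://127.0.0.1:29500",
       rank=0,
       world_size=1,
   )
   print(dist.get_backend())  # gloo
   dist.destroy_process_group()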

torch.compiler
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8 changes: 6 additions & 2 deletions docs/how-to/deep-learning-rocm.rst
@@ -11,11 +11,14 @@ ROCm provides a comprehensive ecosystem for deep learning development, including
deep learning frameworks and libraries such as PyTorch, TensorFlow, and JAX. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.

-The following guides provide information on compatibility and supported features for ROCm-enabled deep learning frameworks.
+The following guides provide information on compatibility and supported
+features for these ROCm-enabled deep learning frameworks.

* :doc:`PyTorch compatibility <../compatibility/pytorch-compatibility>`
.. * :doc:`TensorFlow compatibility <../compatibility/tensorflow-compatibility>`
.. * :doc:`JAX compatibility <../compatibility/jax-compatibility>`
-The following chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
+This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.

.. image:: ../data/how-to/framework_install_2024_07_04.png
:alt: Flowchart for installing ROCm-aware machine learning frameworks
@@ -37,3 +40,4 @@ through the following guides.
* :doc:`rocm-for-ai/index`

* :doc:`llm-fine-tuning-optimization/index`

3 changes: 0 additions & 3 deletions docs/how-to/performance-validation/mi300x/vllm-benchmark.rst
@@ -399,9 +399,6 @@ Further reading
- To learn how to optimize inference on LLMs, see
:doc:`Fine-tuning LLMs and inference optimization </how-to/llm-fine-tuning-optimization/index>`.

-- For a list of other ready-made Docker images for ROCm, see the
-  :doc:`Docker image support matrix <rocm-install-on-linux:reference/docker-image-support-matrix>`.
-
- To compare with the previous version of the ROCm vLLM Docker image for performance validation, refer to
`LLM inference performance validation on AMD Instinct MI300X (ROCm 6.2.0) <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_.

2 changes: 1 addition & 1 deletion docs/how-to/tuning-guides/mi300x/workload.rst
@@ -92,7 +92,7 @@ involves configuring tensor parallelism, leveraging advanced features, and
ensuring efficient execution. Here’s how to optimize vLLM performance:

* Tensor parallelism: Configure the
-  :ref:`tensor-parallel-size parameter <mi300x-vllm-optimize-tp-gemm>` to distribute
+  :ref:`tensor-parallel-size parameter <mi300x-vllm-multiple-gpus>` to distribute
tensor computations across multiple GPUs. Adjust parameters such as
``batch-size``, ``input-len``, and ``output-len`` based on your workload.

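The ``tensor-parallel-size`` setting this hunk re-links corresponds to ``tensor_parallel_size`` in vLLM's offline Python API. A hypothetical sketch, with the model name and sizes as placeholders rather than values from the docs:

.. code-block:: python

   from vllm import LLM, SamplingParams

   # Distribute tensor computations across 8 GPUs; tune tensor_parallel_size,
   # batch size, and sequence lengths for your workload (placeholders here).
   llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=8)
   params = SamplingParams(max_tokens=64)
   outputs = llm.generate(["What is ROCm?"], params)
   print(outputs[0].outputs[0].text)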
