Fix PyTorch Compatibility link and remove incomplete rows (ROCm#4195)
* fix pytorch-compatibility filename

fix links

* remove incomplete rows in pytorch-compatibility

* fix broken refs

(cherry picked from commit f76145c)
peterjunpark committed Dec 24, 2024
1 parent b71477a commit 702500d
Showing 6 changed files with 9 additions and 43 deletions.
2 changes: 1 addition & 1 deletion docs/compatibility/compatibility-matrix-historical-6.0.csv
@@ -22,7 +22,7 @@ ROCm Version,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,
-:doc:`PyTorch <../compatibility/pytorch-compatiblity>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
+:doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
2 changes: 1 addition & 1 deletion docs/compatibility/compatibility-matrix.rst
@@ -47,7 +47,7 @@ compatibility and system requirements.
,gfx908,gfx908,gfx908
,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
-:doc:`PyTorch <../compatibility/pytorch-compatiblity>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
+:doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1"
:doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3
35 changes: 0 additions & 35 deletions docs/compatibility/pytorch-compatibility.rst
@@ -576,14 +576,6 @@ PyTorch interacts with the CUDA or ROCm environment.
     - Globally enables or disables the PyTorch C++ implementation within SDPA.
     - 2.1
     - ❌
-   * - ``allow_fp16_bf16_reduction_math_sdp``
-     - Globally enables FP16 and BF16 precision for reduction operations within
-       SDPA.
-     - 2.1
-     -
-..
-   FIXME:
-   - Partial?

.. Need to validate and extend.
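For context on the row retained above: its toggle is exposed in current PyTorch as ``torch.backends.cuda.enable_math_sdp``. The following is a minimal sketch, not part of this commit, with placeholder tensor shapes:

.. code-block:: python

   import torch
   import torch.nn.functional as F

   # Globally enable the PyTorch C++ (math) implementation within SDPA.
   torch.backends.cuda.enable_math_sdp(True)

   # Placeholder shapes: (batch, heads, sequence, head_dim).
   q = torch.randn(2, 4, 8, 16)
   k = torch.randn(2, 4, 8, 16)
   v = torch.randn(2, 4, 8, 16)
   out = F.scaled_dot_product_attention(q, k, v)
   print(out.shape)  # torch.Size([2, 4, 8, 16])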
@@ -671,40 +663,13 @@ of computational resources and scalability for large-scale tasks.
       those on separate machines.
     - 1.8
     - 5.4
-   * - RPC Device Map Passing
-     - RPC Device Map Passing in PyTorch refers to a feature of the Remote
-       Procedure Call (RPC) framework that enables developers to control and
-       specify how tensors are transferred between devices during remote
-       operations. It allows fine-grained management of device placement when
-       sending tensors across nodes in distributed training or execution
-       scenarios.
-     - 1.9
-     -
   * - Gloo
     - Gloo is designed for multi-machine and multi-GPU setups, enabling
       efficient communication and synchronization between processes. Gloo is
       one of the default backends for PyTorch's Distributed Data Parallel
       (DDP) and RPC frameworks, alongside other backends like NCCL and MPI.
     - 1.0
     - 2.0
-   * - MPI
-     - MPI (Message Passing Interface) in PyTorch refers to the use of the MPI
-       backend for distributed communication in the ``torch.distributed`` module.
-       It enables inter-process communication, primarily in distributed
-       training settings, using the widely adopted MPI standard.
-     - 1.9
-     -
-   * - TorchElastic
-     - TorchElastic is a PyTorch library that enables fault-tolerant and
-       elastic training in distributed environments. It is designed to handle
-       dynamically changing resources, such as adding or removing nodes during
-       training, which is especially useful in cloud-based or preemptible
-       environments.
-     - 1.9
-     -
-
-..
-   FIXME: RPC Device Map Passing "Since ROCm version"
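The Gloo row kept above is one of the default backends for ``torch.distributed``. As a minimal, hypothetical single-process sketch (the address, port, rank, and world size are placeholders, not from this commit):

.. code-block:: python

   import torch.distributed as dist

   # Initialize the Gloo backend for a single process; real jobs set
   # rank and world_size per worker.
   dist.init_process_group(
       backend="gloo",
       init_method="tcp://127.0.0.1:29500",
       rank=0,
       world_size=1,
   )
   print(dist.get_backend())  # gloo
   dist.destroy_process_group()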

torch.compiler
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
8 changes: 6 additions & 2 deletions docs/how-to/deep-learning-rocm.rst
@@ -11,11 +11,14 @@ ROCm provides a comprehensive ecosystem for deep learning development, including
deep learning frameworks and libraries such as PyTorch, TensorFlow, and JAX. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.

-The following guides provide information on compatibility and supported features for ROCm-enabled deep learning frameworks.
+The following guides provide information on compatibility and supported
+features for these ROCm-enabled deep learning frameworks.

* :doc:`PyTorch compatibility <../compatibility/pytorch-compatibility>`
.. * :doc:`TensorFlow compatibility <../compatibility/tensorflow-compatibility>`
.. * :doc:`JAX compatibility <../compatibility/jax-compatibility>`
-The following chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
+This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.

.. image:: ../data/how-to/framework_install_2024_07_04.png
:alt: Flowchart for installing ROCm-aware machine learning frameworks
@@ -37,3 +40,4 @@ through the following guides.
* :doc:`rocm-for-ai/index`

* :doc:`llm-fine-tuning-optimization/index`

3 changes: 0 additions & 3 deletions docs/how-to/performance-validation/mi300x/vllm-benchmark.rst
@@ -399,9 +399,6 @@ Further reading
- To learn how to optimize inference on LLMs, see
:doc:`Fine-tuning LLMs and inference optimization </how-to/llm-fine-tuning-optimization/index>`.

-- For a list of other ready-made Docker images for ROCm, see the
-  :doc:`Docker image support matrix <rocm-install-on-linux:reference/docker-image-support-matrix>`.
-
- To compare with the previous version of the ROCm vLLM Docker image for performance validation, refer to
`LLM inference performance validation on AMD Instinct MI300X (ROCm 6.2.0) <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_.

2 changes: 1 addition & 1 deletion docs/how-to/tuning-guides/mi300x/workload.rst
@@ -92,7 +92,7 @@ involves configuring tensor parallelism, leveraging advanced features, and
ensuring efficient execution. Here’s how to optimize vLLM performance:

* Tensor parallelism: Configure the
-  :ref:`tensor-parallel-size parameter <mi300x-vllm-optimize-tp-gemm>` to distribute
+  :ref:`tensor-parallel-size parameter <mi300x-vllm-multiple-gpus>` to distribute
tensor computations across multiple GPUs. Adjust parameters such as
``batch-size``, ``input-len``, and ``output-len`` based on your workload.

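The ``tensor-parallel-size`` setting this hunk re-links corresponds to ``tensor_parallel_size`` in vLLM's offline Python API. A hypothetical sketch, with the model name and sizes as placeholders rather than values from the docs:

.. code-block:: python

   from vllm import LLM, SamplingParams

   # Distribute tensor computations across 8 GPUs; tune tensor_parallel_size,
   # batch size, and sequence lengths for your workload (placeholders here).
   llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=8)
   params = SamplingParams(max_tokens=64)
   outputs = llm.generate(["What is ROCm?"], params)
   print(outputs[0].outputs[0].text)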
