Merge branch 'main' into tiler-helper
hunhoffe authored Oct 30, 2024
2 parents 00e632e + d3da586 commit 108fa81
Showing 23 changed files with 188 additions and 77 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/buildAndTestRyzenAI.yml
@@ -139,6 +139,12 @@ jobs:
source utils/quick_setup.sh
# quick_setup changes directory to programming_examples, so we need to return to mlir-aie
cd ..
# I have no clue why, but the system clock on GHA containers is about 12 hours ahead.
# That means the wheels contain files with timestamps in the future, which makes ninja loop
# forever when configuring. Set the time to some arbitrary stamp in the past just to be safe.
find my_install/mlir -exec touch -a -m -t 201108231405.14 {} \;
./utils/build-mlir-aie-from-wheels.sh ./my_install/mlir build install ./my_install/llvm-aie
# build is created by the build-mlir-aie-from-wheels.sh script
24 changes: 12 additions & 12 deletions docs/conferenceDescriptions/micro24TutorialDescription.md
@@ -23,20 +23,20 @@ Prerequisite: Please bring your laptop so that you can SSH into our Ryzen™ AI-
|------|-------|-----------|----------------|
| 08:00am | Intro to spatial compute and explicit data movement | Kristof | [Programming Guide](../../programming_guide/) |
| 08:15am | "Hello World" from Ryzen™ AI | Joe | [AI Engine Basic Building Blocks](../../programming_guide/section-1/) |
| 08:30am | Data movement on Ryzen™ AI with objectFIFOs | Joe | [Data Movement](../../programming_guide/section-2/) |
| 09:00am | Your First Program | Kristof | [My First Program](../../programming_guide/section-3) |
| 09:20am | Exercise 1: Build and run your first program | All | [Passthrough](../../programming_examples/basic/passthrough_kernel/) |
| 09:30am | Break | | |
| 10:00am | Exercise 2: Vector-Scalar Mul | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 10:10am | Tracing and performance analysis | Kristof | [Timers](../../programming_guide/section-4/section-4a/) and [Tracing](../../programming_guide/section-4/section-4b/) |
| 10:40am | Exercise 3: Tracing vector-scalar | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 10:50am | Vectorizing on AIE | Kristof | [Kernel Vectorization](../../programming_guide/section-4/section-4c/) |
| 11:10am | Exercise 4: Vectorized vector-scalar | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 11:20pm | Dataflow and larger designs | Joe | [Example Vector Designs](../../programming_guide/section-5/) and [Large Example Designs](../../programming_guide/section-6/) |
| 11:30pm | Exercises | All | [Programming Examples](../../programming_examples/) |
| 08:35am | Exercise 1: Build and run your first program | All | [Passthrough](../../programming_examples/basic/passthrough_kernel/) |
| 08:50am | Data movement on Ryzen™ AI with objectFIFOs | Joe | [Data Movement](../../programming_guide/section-2/) |
| 09:10am | Exercise 2: Explore AIE DMA capabilities | All | [DMA Transpose](../../programming_examples/basic/dma_transpose/) |
| 09:20am | Your First Program | Kristof | [My First Program](../../programming_guide/section-3) |
| 09:50am | Exercise 3: Vector-scalar mul | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 10:00am | Coffee Break | | |
| 10:30am | Tracing and performance analysis | Kristof | [Timers](../../programming_guide/section-4/section-4a/) and [Tracing](../../programming_guide/section-4/section-4b/) |
| 10:50am | Exercise 4: Tracing vector-scalar mul | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 11:00am | Vectorizing on AIE | Kristof | [Kernel Vectorization](../../programming_guide/section-4/section-4c/) |
| 11:20am | Exercise 5: Tracing vectorized vector-scalar | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 11:30am | Dataflow and larger designs | Joe | [Example Vector Designs](../../programming_guide/section-5/) and [Large Example Designs](../../programming_guide/section-6/) |
| 11:40am | Exercise 6: More examples | All | [Programming Examples](../../programming_examples/) |
| 11:50am | Close Tutorial | All | |


## Organizers

*Joseph Melber* is a Senior Member of Technical Staff in AMD’s Research and Advanced Development group. At AMD, he is working on hardware architectures and compiler technologies for current and future AMD devices. He received a BS in electrical engineering from the University at Buffalo, as well as MS and PhD degrees from the electrical and computer engineering department at Carnegie Mellon University. His research interests include runtime systems, compiler abstractions for data movement, and hardware prototypes for future adaptive heterogeneous computing architectures.
4 changes: 2 additions & 2 deletions programming_examples/basic/dma_transpose/aie2.py
@@ -21,8 +21,8 @@ def my_passthrough(M, K, N, generate_acccess_map=False):
data_transform = TensorTile(
tensor_height=M,
tensor_width=K,
sizes=[1, K, M, 1],
strides=[1, 1, K, 1],
sizes=[1, 1, K, M],
strides=[1, 1, 1, K],
offset=0,
)
if generate_acccess_map:
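As an aside on what the new `sizes`/`strides` values mean: they describe a column-by-column walk over the row-major `M`×`K` input, which is what yields the transpose. The following is a small illustrative sketch (a hypothetical helper, not part of this change), assuming the leftmost entry is the outermost, slowest-varying dimension:

```python
# Hypothetical sketch: enumerate the linear offsets visited by a
# 4-dimensional sizes/strides access pattern (outermost dimension first).
def access_order(sizes, strides, offset=0):
    order = []
    for i0 in range(sizes[0]):
        for i1 in range(sizes[1]):
            for i2 in range(sizes[2]):
                for i3 in range(sizes[3]):
                    order.append(offset + i0 * strides[0] + i1 * strides[1]
                                 + i2 * strides[2] + i3 * strides[3])
    return order

M, K = 2, 3  # tiny row-major example tensor: offsets [[0, 1, 2], [3, 4, 5]]
# New pattern from this change: sizes=[1, 1, K, M], strides=[1, 1, 1, K]
# walks down each column in turn, i.e. reads the input in transposed order.
print(access_order([1, 1, K, M], [1, 1, 1, K]))  # [0, 3, 1, 4, 2, 5]
```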
10 changes: 5 additions & 5 deletions programming_examples/basic/dma_transpose/test.cpp
@@ -186,11 +186,11 @@ int main(int argc, const char *argv[]) {
std::vector<uint32_t> refVecA(N);

// Doing a transpose on the source vector to produce a ref vector
for (uint32_t i = 0; i < M; i++) {
for (uint32_t j = 0; j < K; j++) {
uint32_t src_index = i * K + j;
uint32_t dst_index = j * M + i;
refVecA[dst_index] = srcVecA[src_index];
uint32_t dst_index = 0;
for (uint32_t i = 0; i < K; i++) {
for (uint32_t j = 0; j < M; j++) {
uint32_t src_index = j * K + i;
refVecA[dst_index++] = srcVecA[src_index];
}
}

27 changes: 21 additions & 6 deletions programming_examples/basic/vector_scalar_mul/Makefile
@@ -17,35 +17,50 @@ VPATH := ${srcdir}/../../../aie_kernels/aie2
targetname = vectorScalar
data_size = 4096
trace_size = 8192
CHESS ?= true

all: build/final_${data_size}.xclbin build/insts_${data_size}.txt

kristof: build/insts_${data_size}.txt

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && ${PEANO_INSTALL_DIR}/bin/clang++ ${PEANOWRAP2_FLAGS} -c $< -o ${@F}
ifeq ($(CHESS), true)
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $< -o ${@F};
else
cd ${@D} && ${PEANO_INSTALL_DIR}/bin/clang++ ${PEANOWRAP2_FLAGS} -c $< -o ${@F};
endif

build/aie_${data_size}.mlir: ${srcdir}/aie2.py
mkdir -p ${@D}
python3 $< ${data_size} 0 > $@

build/aie_trace_${data_size}.mlir: aie2.py
build/aie_trace_${data_size}.mlir: ${srcdir}/aie2.py
mkdir -p ${@D}
python3 $< ${data_size} ${trace_size} > $@

#build/insts_${data_size}.txt: build/final_${data_size}.xclbin
build/final_${data_size}.xclbin: build/aie_${data_size}.mlir build/scale.o
mkdir -p ${@D}
ifeq ($(CHESS), true)
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--no-xchesscc --no-xbridge \
--aie-generate-npu --npu-insts-name=insts_${data_size}.txt $(<:%=../%)
else
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--no-xchesscc --no-xbridge \
--aie-generate-npu --npu-insts-name=insts_${data_size}.txt $(<:%=../%)
endif

build/final_trace_${data_size}.xclbin: build/aie_trace_${data_size}.mlir build/scale.o
mkdir -p ${@D}
ifeq ($(CHESS), true)
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--no-xchesscc --no-xbridge \
--aie-generate-npu --npu-insts-name=insts_${data_size}.txt $(<:%=../%)
else
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--no-xchesscc --no-xbridge \
--aie-generate-npu --npu-insts-name=insts_${data_size}.txt $(<:%=../%)
endif

${targetname}_${data_size}.exe: ${srcdir}/test.cpp
rm -rf _build
@@ -66,11 +81,11 @@ run_py: build/final_${data_size}.xclbin build/insts_${data_size}.txt

trace: ${targetname}_${data_size}.exe build/final_trace_${data_size}.xclbin build/insts_${data_size}.txt
${powershell} ./$< -x build/final_trace_${data_size}.xclbin -i build/insts_${data_size}.txt -k MLIR_AIE -t ${trace_size}
../../utils/parse_trace.py --filename trace.txt --mlir build/aie_trace_${data_size}.mlir --colshift 1 > trace_vs.json
${srcdir}/../../utils/parse_trace.py --filename trace.txt --mlir build/aie_trace_${data_size}.mlir --colshift 1 > trace_vs.json

trace_py: build/final_trace_${data_size}.xclbin build/insts_${data_size}.txt
${powershell} python3 ${srcdir}/test.py -x build/final_trace_${data_size}.xclbin -i build/insts_${data_size}.txt -k MLIR_AIE -t ${trace_size} -s ${data_size}
../../utils/parse_trace.py --filename trace.txt --mlir build/aie_trace_${data_size}.mlir --colshift 1 > trace_vs.json
${srcdir}/../../utils/parse_trace.py --filename trace.txt --mlir build/aie_trace_${data_size}.mlir --colshift 1 > trace_vs.json


clean_trace:
@@ -3,8 +3,13 @@
//
// REQUIRES: ryzen_ai, peano
//
// RUN: mkdir -p test_peano
// RUN: cd test_peano
// RUN: make -f %S/Makefile clean
// RUN: make -f %S/Makefile
// RUN: env CHESS=false make -f %S/Makefile
// RUN: %run_on_npu make -f %S/Makefile run | FileCheck %s
// RUN: %run_on_npu make -f %S/Makefile run_py | FileCheck %s
// RUN: make -f %S/Makefile clean
// RUN: env CHESS=false %run_on_npu make -f %S/Makefile trace | FileCheck %s
// RUN: env CHESS=false %run_on_npu make -f %S/Makefile trace_py | FileCheck %s
// CHECK: PASS!
@@ -0,0 +1,15 @@
// (c) Copyright 2024 Advanced Micro Devices, Inc.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// REQUIRES: ryzen_ai, chess
//
// RUN: mkdir -p test_chess
// RUN: cd test_chess
// RUN: make -f %S/Makefile clean
// RUN: env CHESS=true make -f %S/Makefile
// RUN: %run_on_npu make -f %S/Makefile run | FileCheck %s
// RUN: %run_on_npu make -f %S/Makefile run_py | FileCheck %s
// RUN: make -f %S/Makefile clean
// RUN: env CHESS=true %run_on_npu make -f %S/Makefile trace | FileCheck %s
// RUN: env CHESS=true %run_on_npu make -f %S/Makefile trace_py | FileCheck %s
// CHECK: PASS!
2 changes: 1 addition & 1 deletion programming_examples/ml/bottleneck/Makefile
@@ -46,5 +46,5 @@ clean:
chess* *.o insts.txt \
*.log aie_partition.json *.bin BOOT.BIN _x test.exe

run_py:
run_py: build/final.xclbin
${powershell} python3 ${srcdir}/test.py -x build/final.xclbin -i build/insts.txt -k MLIR_AIE
2 changes: 1 addition & 1 deletion programming_examples/ml/conv2d/Makefile
@@ -29,7 +29,7 @@ build/final.xclbin: build/${mlirFileName}.mlir build/conv2dk1_i8.o
--no-xchesscc --no-xbridge \
--xclbin-name=${@F} --npu-insts-name=insts.txt $(<:%=../%)

run_py: build/final.xclbin build/insts.txt
run_py: build/final.xclbin
${powershell} python3 ${srcdir}/test.py -x build/final.xclbin -i build/insts.txt -k MLIR_AIE

clean:
2 changes: 1 addition & 1 deletion programming_examples/ml/conv2d_fused_relu/Makefile
@@ -38,5 +38,5 @@ clean:
chess* *.o insts.txt \
*.log aie_partition.json *.bin BOOT.BIN _x test.exe

run_py:
run_py: build/final.xclbin
${powershell} python3 ${srcdir}/test.py -x build/final.xclbin -i build/insts.txt -k MLIR_AIE
2 changes: 1 addition & 1 deletion programming_examples/vision/color_detect/Makefile
@@ -57,7 +57,7 @@ else
cp _build/${targetname} $@
endif

run: ${targetname}.exe build/final_${COLORDETECT_WIDTH}.xclbin build/insts.txt
run: ${targetname}.exe build/final_${COLORDETECT_WIDTH}.xclbin
${powershell} ./$< -x build/final_${COLORDETECT_WIDTH}.xclbin -i build/insts.txt -k MLIR_AIE

clean:
2 changes: 1 addition & 1 deletion programming_examples/vision/color_threshold/Makefile
@@ -49,7 +49,7 @@ else
cp _build/${targetname} $@
endif

run: ${targetname}.exe build/final_${COLORTHRESHOLD_WIDTH}.xclbin build/insts.txt
run: ${targetname}.exe build/final_${COLORTHRESHOLD_WIDTH}.xclbin
${powershell} ./$< -x build/final_${COLORTHRESHOLD_WIDTH}.xclbin -i build/insts.txt -k MLIR_AIE

clean:
2 changes: 1 addition & 1 deletion programming_examples/vision/edge_detect/Makefile
@@ -56,7 +56,7 @@ else
cp _build/${targetname} $@
endif

run: ${targetname}.exe build/final_${EDGEDETECT_WIDTH}.xclbin build/insts.txt
run: ${targetname}.exe build/final_${EDGEDETECT_WIDTH}.xclbin
${powershell} ./$< -x build/final_${EDGEDETECT_WIDTH}.xclbin -i build/insts.txt -k MLIR_AIE

clean:
2 changes: 1 addition & 1 deletion programming_examples/vision/vision_passthrough/Makefile
@@ -49,7 +49,7 @@ else
cp _build/${targetname} $@
endif

run: ${targetname}.exe build/final_${PASSTHROUGH_WIDTH}.xclbin build/insts.txt
run: ${targetname}.exe build/final_${PASSTHROUGH_WIDTH}.xclbin
${powershell} ./$< -x build/final_${PASSTHROUGH_WIDTH}.xclbin -i build/insts.txt -k MLIR_AIE

clean:
6 changes: 3 additions & 3 deletions programming_guide/section-3/README.md
@@ -46,8 +46,8 @@ We also need to declare that the compute core will run an external function: a k

```python
# Type declarations
tensor_ty = np.ndarray[(4096,), np.dtype[np.int16]]
tile_ty = np.ndarray[(1024,), np.dtype[np.int16]]
tensor_ty = np.ndarray[(4096,), np.dtype[np.int32]]
tile_ty = np.ndarray[(1024,), np.dtype[np.int32]]
scalar_ty = np.ndarray[(1,), np.dtype[np.int32]]

# AIE Core Function declarations
@@ -105,7 +105,7 @@ This access and execute pattern runs on the AIE compute core `ComputeTile2` and

## Kernel Code

We can program the AIE compute core using C++ code and compile it with `xchesscc` into a kernel object file. For our local version of vector scalar multiply, we will use a generic implementation of the `scale.cc` source (called [vector_scalar_mul.cc](./vector_scalar_mul.cc)) that can run on the scalar processor part of the AIE. The `vector_scalar_mul_aie_scalar` function processes one data element at a time, taking advantage of AIE scalar datapath to load, multiply and store data elements.
We can program the AIE compute core using C++ code and compile it with the selected single-core AIE compiler into a kernel object file. For our local version of vector scalar multiply, we will use a generic implementation of the `scale.cc` source (called [vector_scalar_mul.cc](./vector_scalar_mul.cc)) that can run on the scalar processor part of the AIE. The `vector_scalar_mul_aie_scalar` function processes one data element at a time, taking advantage of the AIE scalar datapath to load, multiply and store data elements.

```c
void vector_scalar_mul_aie_scalar(int32_t *a_in, int32_t *c_out,
19 changes: 14 additions & 5 deletions programming_guide/section-4/section-4b/Makefile
@@ -10,11 +10,12 @@ srcdir := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))

include ${srcdir}/../../../programming_examples/makefile-common


all: build/final.xclbin

targetname = myFirstProgram

trace_size = 8192
CHESS ?= true

build/aie.mlir: ${srcdir}/aie2.py
mkdir -p ${@D}
@@ -26,18 +27,26 @@ build/aie_trace.mlir: ${srcdir}/aie2.py

build/scale.o: ${srcdir}/vector_scalar_mul.cc
mkdir -p ${@D}
cd ${@D} && ${PEANO_INSTALL_DIR}/bin/clang++ ${PEANOWRAP2_FLAGS} -c $< -o ${@F}
ifeq ($(CHESS), true)
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $< -o ${@F};
else
cd ${@D} && ${PEANO_INSTALL_DIR}/bin/clang++ ${PEANOWRAP2_FLAGS} -c $< -o ${@F};
endif

build/final.xclbin: build/aie.mlir build/scale.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--no-xchesscc --no-xbridge \
$(if $(shell [ $(CHESS) != true ] && echo true), \
--no-xchesscc --no-xbridge \
) \
--aie-generate-npu --npu-insts-name=insts.txt $(<:%=../%)

build/trace.xclbin: build/aie_trace.mlir build/scale.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--no-xchesscc --no-xbridge \
cd ${@D} && aiecc.py -v --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
$(if $(shell [ $(CHESS) != true ] && echo true), \
--no-xchesscc --no-xbridge \
) \
--aie-generate-npu --npu-insts-name=insts.txt $(<:%=../%)

${targetname}.exe: ${srcdir}/test.cpp
30 changes: 21 additions & 9 deletions programming_guide/section-4/section-4b/README.md
@@ -34,22 +34,34 @@ Enabling trace support can be done with the following steps:
Enabling tracing means (1a) configuring the trace units for a given tile and then (1b) routing the generated event packets through the stream switches to the shim DMA, where we can write them to a buffer in DDR for post-runtime processing.

### <u>(1a) Configure trace units for an AIE tile</u>
The first necessary component for trace configuration is setting the right values for the trace control registers for each tile that we want to enable tracing for. In addition, the generated trace packets will need to be routed to shimDMA and then written to one of the 3 inout buffers. We have abstracted these two steps with the python wrapper function `configure_simple_tracing_aie2` which is in [python/utils/test.py](../../../python/utils/test.py) and is described in more detail the [README](../../../python/utils) under `python/utils`. An example of how this function is used is shown below for quick reference
The first necessary component of trace configuration is setting the right values in the trace control registers of each tile that we want to enable tracing for. In addition, the generated trace packets need to be routed to the shimDMA and then written to one of the 3 inout buffers. We have abstracted these two steps with the Python wrapper function `configure_packet_tracing_aie2`, which is in [python/utils/test.py](../../../python/utils/test.py) and is described in more detail in the [README](../../../python/utils) under `python/utils`. An example of how this function is used is shown below for quick reference:
```python
trace_utils.configure_simple_tracing_aie2(
ComputeTile2,
ShimTile,
ddr_id=2,
size=traceSizeInBytes,
offset=tensorSize,
)
trace_utils.configure_packet_tracing_aie2(tiles_to_trace, ShimTile, opts.trace_size, 4096*4)
```
The arguments for this example are:
* *tiles_to_trace* - array of compute tiles we want to trace
* *ShimTile* - shim tile that the trace is going out to
* *opts.trace_size* - the trace buffer size in bytes
* `4096*4` - the output buffer offset in bytes where the trace data begins (see the short calculation below)
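
As a quick sanity check, the offset here is simply the size of the output data that precedes the trace data in the inout buffer. A minimal sketch, assuming the 4096-element `int32` output used in the surrounding examples:

```python
import numpy as np

data_size = 4096                       # number of output elements (assumed from the examples above)
elem_bytes = np.dtype(np.int32).itemsize
trace_offset = data_size * elem_bytes  # 4096*4 = 16384 bytes
# Trace data is written into the same inout buffer starting at this offset,
# i.e. immediately after the last output element.
print(trace_offset)                    # 16384
```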

This block is defined within the sequence definition for `@runtime_sequence`, where we define the shimDMA data movement to the 3 inout buffers.
> **Note** This simplification works very well for the trace buffer from a single tile to the shimDMA. However, if we want to do something more advanced like allocating the trace buffer from multiple tiles into a single larger buffer, this function will not be able to express that. For that, please consult the [README](../../../python/utils) under `python/utils` for more guidance on how to customize the trace configuration.
> **Note** This simplified wrapper is an enhanced version of the simpler `configure_simple_tracing_aie2` used previously, which routed the trace from a single compute tile using circuit-switched routing. This enhanced version relies on packet-switched routing and supports tracing from multiple tiles by synchronizing the start event of each tile's trace unit to a user-generated event. More guidance on how to customize the trace configuration can be found in the [README](../../../python/utils) under `python/utils`.
### <u>(1b) Define trace event routes from tile to shimDMA</u>
Once the trace units and shimDMA are configured, we need to define how the trace packets are routed from the compute tile to the shim tile. This is done via circuit-switched or packet-switched flows, as described below. Note that trace units in the MemTile and ShimTile can also be configured and routed.

We can simplify defining the packet-switched flows for the tiles we're tracing with the function `configure_packet_tracing_flow`, defined in [python/utils/test.py](../../../python/utils/test.py) and described in more detail in the [README](../../../python/utils) under `python/utils`. An example of how this function is used is shown below for quick reference:
```python
trace_utils.configure_packet_tracing_flow(tiles_to_trace, ShimTile)
```
The arguments for this example are:
* *tiles_to_trace* - array of compute tiles we want to trace
* *ShimTile* - shim tile that the trace is going out to

> **Note** The synchronization of this function with the previous `configure_packet_tracing_aie2` is important because we track the route IDs and BD numbers of each configured trace. Do not mix and match these with circuit-switched routing, as they are intended to work together as a packet-tracing pair.

More details about the mechanics of circuit- and packet-switched flows are described below. Otherwise, you can skip ahead to section 2, *Configure host code to read trace data and write it to a text file*.

#### <u>Circuit switched flows</u>
An example of a simple circuit-switched routing flow to route trace event packets from a compute tile to a shimDMA is sketched below.
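
A minimal sketch of what such a flow declaration might look like, assuming the `flow(...)` helper and `WireBundle` enum from the mlir-aie Python bindings; the tile names (`ComputeTile2`, `ShimTile`) and channel numbers are placeholders carried over from the earlier examples:

```python
# Sketch: carry trace packets from the compute tile's trace port to a
# shim tile DMA input channel over a dedicated circuit-switched route.
# (Channel numbers are illustrative; this goes inside the device definition.)
flow(ComputeTile2, WireBundle.Trace, 0, ShimTile, WireBundle.DMA, 1)
```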
