Clock tree appears to be unbalanced for BoomTile running at 833 MHz #5970

Open
jeffng-or opened this issue Oct 16, 2024 · 9 comments

@jeffng-or
Contributor

Describe the bug

For megaboom v6, I upped the clock frequency to 833 MHz and the resulting clock tree appears to be unbalanced (see the branch of the tree on the far right in the viewer) and with significant skew:

[Image: BoomTile_ClockTree]

>>> report_clock_skew
Clock clock
1742.25 source latency frontend/bpd/banked_predictors_0/loop/s1_update_bits_meta[32]$_DFF_P_/CLK ^
-1407.99 target latency frontend/bpd/banked_predictors_1/btb/meta_1_ext/R0_clk ^
 -17.80 CRPR
--------------
 316.47 setup skew

>>> report_clock_latency
Clock clock
rise -> rise
    min     max
   0.00    0.00 source latency
1440.04         network latency dcache/data/array_3_1_ext/R0_clk
        1838.54 network latency dcache/data/io_resp_1_0_REG[75]$_DFF_P_/CLK
---------------
1440.04 1838.54 latency
         398.50 skew

fall -> fall
    min     max
   0.00    0.00 source latency
1527.54         network latency dcache/data/array_3_1_ext/R0_clk
        1954.27 network latency dcache/data/io_resp_1_0_REG[75]$_DFF_P_/CLK
---------------
1527.54 1954.27 latency
         426.73 skew
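
To put these numbers in context, the arithmetic behind the two reports can be checked by hand. The sketch below assumes all report values are in picoseconds (later comments in this thread quote tree delays in ps) and simply converts 833 MHz to a period; the variable names are illustrative only and are not part of any OpenROAD API.

```python
# Values copied from the report_clock_skew and report_clock_latency output above.
# Assumption: all times are in picoseconds.

FREQ_MHZ = 833.0
period_ps = 1e6 / FREQ_MHZ  # period of an 833 MHz clock, in ps

# report_clock_skew: setup skew = source latency - target latency - CRPR
source_latency = 1742.25
target_latency = 1407.99
crpr = 17.80
setup_skew = source_latency - target_latency - crpr

# report_clock_latency: skew = max network latency - min network latency
rise_skew = 1838.54 - 1440.04
fall_skew = 1954.27 - 1527.54

print(f"{'clock period':16s}: {period_ps:8.2f} ps")
for name, skew in [
    ("setup skew", setup_skew),
    ("rise->rise skew", rise_skew),
    ("fall->fall skew", fall_skew),
]:
    share = 100.0 * skew / period_ps
    print(f"{name:16s}: {skew:8.2f} ps ({share:.1f}% of period)")
```

The recomputed setup skew differs from the reported 316.47 by 0.01 because the report prints each term rounded to two decimals. All three skew figures come out at roughly 26% to 36% of the clock period, which is why the imbalance matters at this frequency.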

Tom mentioned that the clock tree is sometimes unbalanced when connecting to macros, so I'd like some feedback on whether that's the case here. Note that prior versions of BoomTile with a slower clock produced a balanced clock tree with less skew.

Expected Behavior

Balanced clock tree with minimal skew

Environment

[WARNING] Your current OpenROAD version is outdated.
It is recommened to pull the latest changes.
If problem persists, file a github issue with the re-producible test case.
kernel: Linux 6.5.0-1025-gcp
os: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
cmake version 3.24.2
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- OpenROAD version: v2.0-16316-gf9cfd9383
-- System name: Linux
-- Compiler: GNU 11.4.0
-- Build type: RELEASE
-- Install prefix: /usr/local
-- C++ Standard: 17
-- C++ Standard Required: ON
-- C++ Extensions: OFF
-- The C compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Performing Test C_COMPILER_SUPPORTS__-Wall
-- Performing Test C_COMPILER_SUPPORTS__-Wall - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wall
-- Performing Test CXX_COMPILER_SUPPORTS__-Wall - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-array-bounds
-- Performing Test C_COMPILER_SUPPORTS__-Wno-array-bounds - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-array-bounds
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-array-bounds - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-nonnull
-- Performing Test C_COMPILER_SUPPORTS__-Wno-nonnull - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-nonnull
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-nonnull - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-maybe-uninitialized
-- Performing Test C_COMPILER_SUPPORTS__-Wno-maybe-uninitialized - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-maybe-uninitialized
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-maybe-uninitialized - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format-overflow
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format-overflow - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format-overflow
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format-overflow - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-variable
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-variable - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-variable
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-variable - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-function
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-function - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-function
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-function - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-write-strings
-- Performing Test C_COMPILER_SUPPORTS__-Wno-write-strings - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-write-strings
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-write-strings - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-sign-compare
-- Performing Test C_COMPILER_SUPPORTS__-Wno-sign-compare - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-sign-compare
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-sign-compare - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-deprecated
-- Performing Test C_COMPILER_SUPPORTS__-Wno-deprecated - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-deprecated
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-deprecated - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-c++11-narrowing
-- Performing Test C_COMPILER_SUPPORTS__-Wno-c++11-narrowing - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-c++11-narrowing
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-c++11-narrowing - Failed
-- Performing Test C_COMPILER_SUPPORTS__-Wno-register
-- Performing Test C_COMPILER_SUPPORTS__-Wno-register - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-register
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-register - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal
-- PerformingCMake Warning (dev) at src/sta/CMakeLists.txt:32 (option):
  Policy CMP0077 is not set: option() honors normal variables.  Run "cmake
  --help-policy CMP0077" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  For compatibility with older versions of CMake, option is clearing the
  normal variable 'USE_TCL_READLINE'.
This warning is for project developers.  Use -Wno-dev to suppress it.

 Test C_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal - Failed
-- Performing Test C_COMPILER_SUPPORTS__-fpermissive
-- Performing Test C_COMPILER_SUPPORTS__-fpermissive - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-fpermissive
-- Performing Test CXX_COMPILER_SUPPORTS__-fpermissive - Success
-- Performing Test C_COMPILER_SUPPORTS__-x
-- Performing Test C_COMPILER_SUPPORTS__-x - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-x
-- Performing Test CXX_COMPILER_SUPPORTS__-x - Failed
-- Performing Test C_COMPILER_SUPPORTS__c++
-- Performing Test C_COMPILER_SUPPORTS__c++ - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__c++
-- Performing Test CXX_COMPILER_SUPPORTS__c++ - Failed
-- Performing Test C_COMPILER_SUPPORTS__-std=c++17
-- Performing Test C_COMPILER_SUPPORTS__-std=c++17 - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-std=c++17
-- Performing Test CXX_COMPILER_SUPPORTS__-std=c++17 - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-but-set-variable
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-but-set-variable - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-but-set-variable
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-but-set-variable - Success
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- TCL readline library: /usr/lib/x86_64-linux-gnu/libtclreadline.so
-- TCL readline header: /usr/include/x86_64-linux-gnu
-- Found SWIG: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/bin/swig (found suitable version "4.1.0", minimum required is "4.0")  
-- Using SWIG >= 4.1.0 -flatstaticmethod flag for python
-- Found Boost: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/lib/cmake/Boost-1.80.0/BoostConfig.cmake (found version "1.80.0")  
-- boost: 1.80.0
-- Found GTest: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/lib/cmake/GTest/GTestConfig.cmake (found version "1.13.0")  
-- GTest: 1.13.0
-- Found Python3: /usr/include/python3.10 (found version "3.10.12") found components: Development Development.Module Development.Embed 
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
-- Found Threads: TRUE  
-- spdlog: 1.8.1
-- Found BISON: /usr/bin/bison (found version "3.8.2") 
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- STA version: 2.6.0
-- STA git sha: b5f3a02b33b8ae1739ace8a329fde94434711dd6
-- System name: Linux
-- Compiler: GNU 11.4.0
-- Build type: RELEASE
-- Build CXX_FLAGS: -O3 -DNDEBUG
-- Install prefix: /usr/local
-- Found FLEX: /usr/bin/flex (found version "2.6.4") 
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- TCL readline library: /usr/lib/x86_64-linux-gnu/libtclreadline.so
-- TCL readline header: /usr/include/x86_64-linux-gnu/tclreadline.h
-- CUDD library: /usr/local/lib/libcudd.a
-- CUDD header: /usr/local/include/cudd.h
-- SSTA: 0
-- Found SWIG: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/bin/swig (found suitable version "4.1.0", minimum required is "3.0")  
-- STA executable: /home/jeffng/dev/main/OpenROAD-flow-scripts/tools/OpenROAD/src/sta/app/sta
-- Found re2: /opt/or-tools/lib/cmake/re2/re2Config.cmake (found version "11.0.0") 
-- Found Clp: /opt/or-tools/lib/cmake/Clp/ClpConfig.cmake (found version "1.17.7") 
-- Found Cbc: /opt/or-tools/lib/cmake/Cbc/CbcConfig.cmake (found version "2.10.7") 
-- Found SCIP: /opt/or-tools/lib/cmake/scip/scip-config.cmake (found version "9.0.0") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Found OR-Tools: /opt/or-tools/lib/cmake/ortools (version: 9.10.4067)
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Found OpenMP: TRUE (found version "4.5")  
-- GUI is enabled
-- Charts widget is enabled
-- FounNumber of processor cores: 32
d Boost: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/lib/cmake/Boost-1.80.0/BoostConfig.cmake (found version "1.80.0") found components: serialization 
-- Could NOT find VTune (missing: VTune_LIBRARIES VTune_INCLUDE_DIRS) 
-- Found Boost: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/lib/cmake/Boost-1.80.0/BoostConfig.cmake (found suitable version "1.80.0", minimum required is "1.78")  
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- Found Boost: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/lib/cmake/Boost-1.80.0/BoostConfig.cmake (found version "1.80.0") found components: serialization system thread 
-- Found Boost: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/lib/cmake/Boost-1.80.0/BoostConfig.cmake (found version "1.80.0")  
-- Found Eigen3: /home/jeffng/dev/main/OpenROAD-flow-scripts/dependencies/share/eigen3/cmake/Eigen3Config.cmake (found version "3.4.0") 
-- TCL readline enabled
-- Tcl Extended disabled
-- Python3 enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/tmp.Bt0W42tFUE

To Reproduce

  1. unpack the tarball https://drive.google.com/file/d/1eVCheqXsRstEgBQLqIi-H5i4qwO9Z6Dh/view?usp=sharing
  2. cd cts_BoomTile_asap7_base_2024-10-16_10-40
  3. ./run-me-BoomTile-asap7-base.sh gui <- I added the second argument to just pull up the GUI instead of running CTS
  4. type gui::show if the GUI doesn't come up (might be fixed)
  5. view clock tree in Clock Tree Viewer

I kept 3_place.odb in the tarball in case it's helpful.

Relevant log output

No response

Screenshots

No response

Additional Context

No response

@jeffng-or jeffng-or added the cts Clock Tree Synthesis label Oct 16, 2024
@maliberty
Member

@jeffng-or if you zoom in on the leaves on the skewed side, you can select them to see which instances they are.

@arthurjolo
Contributor

@jeffng-or Do you know if there are any clock gaters on this design? Clock gaters have some issues currently.

@jeffng-or
Contributor Author

> @jeffng-or Do you know if there are any clock gaters on this design? Clock gaters have some issues currently.

AFAIK, no clock gaters. The only cells used in the clock tree look to be bufs or invs:

(BUFx10_ASAP7_75t_R)
(BUFx12_ASAP7_75t_R)
(BUFx12f_ASAP7_75t_R)
(BUFx16f_ASAP7_75t_R)
(BUFx24_ASAP7_75t_R)
(BUFx4f_ASAP7_75t_R)
(BUFx6f_ASAP7_75t_R)
(CKINVDCx10_ASAP7_75t_R)
(CKINVDCx11_ASAP7_75t_R)
(CKINVDCx12_ASAP7_75t_R)
(CKINVDCx14_ASAP7_75t_R)
(CKINVDCx16_ASAP7_75t_R)
(CKINVDCx20_ASAP7_75t_R)
(CKINVDCx5p33_ASAP7_75t_R)
(CKINVDCx6p67_ASAP7_75t_R)
(CKINVDCx8_ASAP7_75t_R)
(CKINVDCx9p33_ASAP7_75t_R)
(INVx13_ASAP7_75t_R)
(INVx3_ASAP7_75t_R)
(INVx5_ASAP7_75t_R)
(INVx6_ASAP7_75t_R)
(INVx8_ASAP7_75t_R)
(INVxp33_ASAP7_75t_R)
(INVxp67_ASAP7_75t_R)
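
A name-based check like the one described can be sketched in a few lines. This is only a heuristic under stated assumptions: it classifies the cell masters listed above by prefix, and the `ICG` prefix used to detect integrated clock gates is an assumption about the library's naming convention, not something confirmed in this thread. A gate built from ordinary logic cells would need a different check.

```python
from collections import Counter

# Cell masters copied from the list above.
CELLS = [
    "BUFx10_ASAP7_75t_R", "BUFx12_ASAP7_75t_R", "BUFx12f_ASAP7_75t_R",
    "BUFx16f_ASAP7_75t_R", "BUFx24_ASAP7_75t_R", "BUFx4f_ASAP7_75t_R",
    "BUFx6f_ASAP7_75t_R",
    "CKINVDCx10_ASAP7_75t_R", "CKINVDCx11_ASAP7_75t_R", "CKINVDCx12_ASAP7_75t_R",
    "CKINVDCx14_ASAP7_75t_R", "CKINVDCx16_ASAP7_75t_R", "CKINVDCx20_ASAP7_75t_R",
    "CKINVDCx5p33_ASAP7_75t_R", "CKINVDCx6p67_ASAP7_75t_R", "CKINVDCx8_ASAP7_75t_R",
    "CKINVDCx9p33_ASAP7_75t_R",
    "INVx13_ASAP7_75t_R", "INVx3_ASAP7_75t_R", "INVx5_ASAP7_75t_R",
    "INVx6_ASAP7_75t_R", "INVx8_ASAP7_75t_R", "INVxp33_ASAP7_75t_R",
    "INVxp67_ASAP7_75t_R",
]


def classify(master: str) -> str:
    """Classify a cell master by name prefix (naming heuristic only)."""
    if master.startswith("ICG"):  # assumed prefix for integrated clock gates
        return "clock_gate"
    if master.startswith("BUF"):
        return "buffer"
    if master.startswith(("CKINVDC", "INV")):
        return "inverter"
    return "other"


counts = Counter(classify(c) for c in CELLS)
print(dict(counts))
```

For the list above, every master falls into the buffer or inverter bucket, which matches the observation that no clock gaters appear in the tree.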

@jeffng-or
Contributor Author

> @jeffng-or if you zoom in the leafs on the skewed side you can select them to see what instances they are.

The far right branch connects to one macro: dcache/data/array_1_0_ext/R0_clk.

The other long branch nearer to the middle of the clock tree is clknet_leaf_402_clock_regs, which connects to:

  • dcache/data/io_resp_1_0_REG[114]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[116]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[117]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[72]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[75]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[76]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[91]$_DFF_P_/CLK
  • dcache/data/io_resp_1_0_REG[99]$_DFF_P_/CLK

@maliberty
Member

It branches off the tree very high up. If you look at where it branches do you see a gate?

@jeffng-or
Contributor Author

> It branches off the tree very high up. If you look at where it branches do you see a gate?

I don't think so. I checked the connections at the colored dots in the clock tree:

[Image: clock tree with colored dots]

  • Red: clkbuf_regs_0_clock (BUFx24_ASAP7_75t_R) drives the leftmost branch
  • Orange: clkbuf_0_clock (BUFx24_ASAP7_75t_R) drives the rightmost branch
  • Blue: the path goes through several layers of delay buffers until about 796 ps, where delaybuf_7_clock/Y connects to clkbuf_1_1_0_clock/A and clkbuf_1_0_0_clock/A
  • Green: the clkbuf_1_0_0_clock/A path eventually connects to dcache/data/array_1_0_ext/R0_clk, I think. All of the cells in the clock tree between clkbuf_1_0_0_clock and dcache/data/array_1_0_ext/R0_clk are some type of BUFx

In case you want to look at the clock tree in more detail, I have the output from a clock tree connectivity dumper that I wrote (no guarantee that it's totally correct). The output is a text file which can be found in: https://drive.google.com/file/d/1bL4rkNu9erDrZYoNKofA7MbXbJbnxByT/view?usp=sharing

@maliberty
Member

@arthurjolo is this due to splitting macros from std cells?

@arthurjolo
Contributor

I am going to open the design and take a closer look at it, but I don't think this is due to splitting macros from std cells. Jeff mentioned that there are delay buffers on the macro branch, which means the average arrival time on the macro branch was smaller than on the std cells, so the other macros seem to have a better arrival time. I believe that CTS did a poor job connecting to dcache/data/array_1_0_ext/R0_clk, and I am going to try to understand why.

@arthurjolo
Contributor

One thing that I noticed from Jeff's dump is that on the array_1_0_ext/R0_clk path there are 7 wire buffers between the leaf clk buffer and the CK pin, while other leaves have 3 or fewer buffers.
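
A comparison like this could be automated once per-sink buffer counts are extracted from the dump. The sketch below is hypothetical: the dump format is not shown in this thread, so the dictionary is hand-written, and only the 7-buffer entry reflects the observation above. The other sink names and counts are placeholders standing in for "3 or fewer", and `flag_outliers` is an illustrative helper, not an OpenROAD function.

```python
from statistics import median

# Hypothetical input: buffers between the leaf clock buffer and each sink pin.
# Only the first entry comes from the comment above; the rest are placeholders.
buffers_to_sink = {
    "dcache/data/array_1_0_ext/R0_clk": 7,
    "placeholder_sink_a": 3,
    "placeholder_sink_b": 2,
    "placeholder_sink_c": 1,
}


def flag_outliers(counts, margin=2):
    """Return sinks whose buffer count exceeds the median by more than margin."""
    typical = median(counts.values())
    return sorted(sink for sink, n in counts.items() if n - typical > margin)


print(flag_outliers(buffers_to_sink))
```

Using the median as the baseline keeps one long path from shifting the reference point; the `margin` of 2 is an arbitrary choice for illustration.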
