
[RWRoute] Add --lutRoutethru option #932

Merged · 37 commits · Jan 12, 2024
Conversation

@eddieh-xlnx (Collaborator) commented Jan 4, 2024

Add an option (disabled by default) that enables unused LUT resources to be routed through.


When LUT routethrus are enabled, the optimizations added in #888 and #893 must be fully and partially disabled, respectively. Recall that the #888 optimization prevents input site pins (PINFEEDs) from being explored unless they are the target, and the #893 optimization additionally prevents some nodes that fan out to such input site pins from being explored when they are not in the vicinity of the target.

Currently, this PR disables these optimizations across the entire device, meaning that all possible LUT routethrus in a connection's bounding box will be considered. A future optimization that may be beneficial would be to dynamically disable the #888 and #893 optimizations only in the vicinity of the source/sink tiles where routability is the most constrained.
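As a rough sketch of what "disabling the optimization" means here (all names below are invented for illustration and are not RWRoute's actual API), the #888-style pruning might be gated on the new option like this:

```java
// Illustrative sketch only: when lutRoutethru is enabled, the #888-style
// pruning must be skipped so that PINFEED nodes can still be explored even
// when they are not the connection's own sink (they may continue through a LUT).
public class PinfeedPruningSketch {
    public static boolean shouldExplore(boolean lutRoutethru,
                                        boolean isPinfeed,
                                        boolean isTarget) {
        if (isPinfeed && !isTarget) {
            // Without routethrus, an input site pin that is not our target can
            // never lead anywhere useful, so it is pruned (#888). With
            // routethrus enabled, it may route through the LUT, so keep it.
            return lutRoutethru;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldExplore(false, true, false)); // prints false (pruned)
        System.out.println(shouldExplore(true,  true, false)); // prints true (explored)
    }
}
```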


Another intricacy is that the device model presents a number of routethru PIPs for each LUT; e.g. for the A LUT, there are 12 PIPs:

  • A[1-6] -> A_O
  • A[1-6] -> AMUX

One PIP for each of the 6 LUT inputs, crossed with one PIP for each SLICE output.
A limitation here appears to be that Vivado only allows one net to be routed through this LUT, meaning that you can't use A_O for one net and AMUX for another net by fracturing the LUT.

Thus, in this PR, only the A[1-6] -> A_O routethrus are considered, and all AMUX routethrus are discarded in order to guarantee that the LUT does not get fractured. A future optimization would be to emulate a PIP between A_O and AMUX so that both output nodes can still be considered for the same net.
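To make the filtering concrete, here is a toy sketch (the strings below stand in for the device model's routethru PIPs; this is not the real RapidWright API): of the 12 PIPs presented for the A LUT, only the six that drive A_O survive.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: keep the A[1-6] -> A_O routethru PIPs and discard
// the A[1-6] -> AMUX ones, guaranteeing the LUT is never fractured across nets.
public class RoutethruFilterSketch {
    public static List<String> keepOnlyOOutput(List<String> routethruPips) {
        List<String> kept = new ArrayList<>();
        for (String pip : routethruPips) {
            // A routethru PIP is modeled here as e.g. "A3->A_O" or "A3->AMUX".
            if (pip.endsWith("->A_O")) {
                kept.add(pip);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> pips = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            pips.add("A" + i + "->A_O");
            pips.add("A" + i + "->AMUX");
        }
        System.out.println(keepOnlyOOutput(pips).size()); // prints 6 (of the 12)
    }
}
```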

@eddieh-xlnx (Collaborator, Author) commented:
In the same style as #895 and #888, here's a comparison on the benchmarks from the FPGA24 Routing Contest.

Before:

+--------------------------------+------+------------------+
| Benchmark                      | Pass | Wall Clock (sec) |
+--------------------------------+------+------------------+
| boom_med_pb_rwroute            | True |           262.08 |
| vtr_mcml_rwroute               | True |           279.52 |
| rosetta_fd_rwroute             | True |           196.19 |
| corundum_25g_rwroute           | True |           290.14 |
| vtr_lu64peeng_rwroute          | True |           272.99 |
| corescore_500_rwroute          | True |           185.01 |
| corescore_500_pb_rwroute       | True |           320.64 |
| mlcad_d181_lefttwo3rds_rwroute | True |          2042.37 |
| koios_dla_like_large_rwroute   | True |           466.00 |
| boom_soc_rwroute               | True |          1389.25 |
| ispd16_example2_rwroute        | True |           685.60 |
+--------------------------------+------+------------------+

Total: 6390 seconds.

After:

+--------------------------------+------+------------------+
| Benchmark                      | Pass | Wall Clock (sec) |
+--------------------------------+------+------------------+
| boom_med_pb_rwroute            | True |           294.62 |
| vtr_mcml_rwroute               | True |           364.77 |
| rosetta_fd_rwroute             | True |           222.55 |
| corundum_25g_rwroute           | True |           345.14 |
| vtr_lu64peeng_rwroute          | True |           323.03 |
| corescore_500_rwroute          | True |           215.31 |
| corescore_500_pb_rwroute       | True |           377.79 |
| mlcad_d181_lefttwo3rds_rwroute | True |          1193.92 |
| koios_dla_like_large_rwroute   | True |           549.58 |
| boom_soc_rwroute               | True |          1729.45 |
| ispd16_example2_rwroute        | True |           806.28 |
+--------------------------------+------+------------------+

Total: 6422 seconds (0.5% increase).

However, most benchmarks saw a small increase in runtime; the exception is mlcad_d181_lefttwo3rds_rwroute, whose runtime was nearly halved.


And some statistics from TestRWRoute.testNonTimingDrivenFullRoutingWithGlobalNet():

Before:

Total wirelength:                       434815
Total INT tile nodes:                   837255
Total rnodes created:                  2860250
Average #children per node:                  4
------------------------------------------------------------------------------
Num iterations:                             14
Connections routed:                     374009
Nodes pushed:                        141263736
Nodes popped:                         65278765

After:

Total wirelength:                       399364
Total INT tile nodes:                   810400
Total rnodes created:                  3261557
Average #children per node:                  4
------------------------------------------------------------------------------
Num iterations:                             15
Connections routed:                     357638
Nodes pushed:                        211999922
Nodes popped:                        104715226

In terms of these RWRoute statistics, there is a noticeable wirelength saving, but a big increase in the nodes pushed/popped.
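For context, the relative changes in the headline numbers above work out as follows (plain arithmetic on the quoted figures, no RWRoute code involved):

```java
// Percentage change between the "Before" and "After" statistics quoted above.
public class StatsDelta {
    public static double pctChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Total wirelength: 434815 -> 399364, roughly an 8.2% saving.
        System.out.printf("Wirelength: %.1f%%%n", pctChange(434815, 399364));
        // Nodes pushed: 141263736 -> 211999922, roughly a 50% increase.
        System.out.printf("Nodes pushed: %.1f%%%n", pctChange(141263736, 211999922));
    }
}
```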

@eddieh-xlnx (Collaborator, Author) commented:
All in all, it looks like this option only improves runtime for one benchmark (though it would be expected to reduce wirelength for all benchmarks), so it's not enabled by default.

@eddieh-xlnx marked this pull request as ready for review January 8, 2024 17:35
Review comment on src/com/xilinx/rapidwright/rwroute/RWRoute.java (outdated, resolved):
// Rename SiteInst (away from "STATIC_SOURCE_<siteName>") and
// attach it to the design so that intra-site routing updates take effect
si.setName(site.getName());
design.addSiteInst(si);
Member commented:
Can you demonstrate that this doesn't cause an error when importing the designs back into Vivado? Unless a placed cell exists (a routethru cell may suffice), my understanding is that this could lead to an issue.

@eddieh-xlnx (Collaborator, Author) commented Jan 12, 2024:
I'm seeing a breakpoint hit this line in the newly added test testNonTimingDrivenFullRoutingWithLutRoutethru: with bnn.dcp for SLICE_X79Y201, and with optical-flow.dcp for SLICE_X66Y252 and 10 others.

For bnn.dcp, even after RWRoute is done, that SiteInst has no placed cells (recall that thru-site PIPs do not show up as routethru cells), yet Vivado 2022.1 seems to accept it just fine and reports a clean routing result.

[screenshot: highlighted net is GLOBAL_LOGIC0; the rest of the slice is completely empty aside from these two routethrus]

Vivado 2022.1 also accepts this optical-flow.dcp routed result.

Member commented:

We probably need to upgrade our terminology, as there are "routethrus" that terminate at a cell in the site and other "routethrus" that don't (being specified by a PIP).

Review comment on src/com/xilinx/rapidwright/rwroute/RouteNodeGraph.java (outdated, resolved).