Troubleshooting RR graphs with poor router performance #2803
Comments
@rs-dhow @AlexandreSinger @vaughnbetz I believe I have found the smoking-gun issue in this setup. It appears that, concurrently with fixing the bugs in the RR graph generation, I introduced another error in which all switch_id values in the RR graph were being set to zero. When the router is fed an RR graph composed entirely of switch_id = 0 (vpr_delayless_switch) edges, something in the router goes haywire. Chalk this up to the hazards of fixing too many things at once while trying to turn out an eFPGA quickly for a short IC design cycle. I am reporting after running only one round of experiments, but it would appear that much of the background I provided above is not relevant to the underlying issue. Let me know what's helpful from a VPR development standpoint re: next steps; in the meantime my bug fix will be subjected to a much larger battery of tests.
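As a side note, a quick diagnostic along the lines of the sketch below can catch this failure mode early. It is a minimal, illustrative script (not part of VPR), assuming the standard VPR RR graph XML schema in which edges appear as <edge src_node="..." sink_node="..." switch_id="..."/> elements under <rr_edges>:

```python
# Illustrative diagnostic (not part of VPR): count RR graph edges per switch_id
# to catch the "every edge uses switch 0" failure mode described above.
import sys
from collections import Counter
import xml.etree.ElementTree as ET

def edge_switch_histogram(rr_graph_path):
    counts = Counter()
    # iterparse keeps memory bounded for large RR graph files
    for _, elem in ET.iterparse(rr_graph_path, events=("end",)):
        if elem.tag == "edge":
            counts[elem.get("switch_id")] += 1
            elem.clear()
    return counts

if __name__ == "__main__":
    for switch_id, n in sorted(edge_switch_histogram(sys.argv[1]).items()):
        print(f"switch_id {switch_id}: {n} edges")
```

If every edge lands in the switch_id 0 bucket, the graph has the problem described above.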
Thanks @petergrossmann21 . I'm surprised setting the delay of all switches to 0 made the router go haywire, unless the delay of the nodes is also 0 (that could start getting strange, as the router might not see a benefit to branching off shorter paths for critical connections). But fixing that bug and re-running certainly makes sense. There are a couple of router report options that can be helpful in debugging issues like this. See the option at https://docs.verilogtorouting.org/en/latest/vpr/command_line_usage/#cmdoption-vpr-generate_rr_node_overuse_report and the one right above it; turning on the graphics and selecting the congestion display after a failed routing can also help.
@vaughnbetz I will need to confirm, but there's a very good chance the node delay is also zero. Additional testing is showing that correcting the switch_id settings is producing sensible results across several strawman architectures I use to verify my eFPGA generator. For context, my generator has a few different settings for delay modeling, most of which are still in beta as we have been co-designing them with ongoing IC design work. The baseline is a "zero" delay model in which no delay tags are provided in the block descriptions, but a 1 ns delay is furnished for the default switches in the <switchlist> section of the arch XML. Because all RC values are also set to zero, the 1 ns delays in these switch types should be the only delays used in this architecture model. Thanks for pointing me towards the router report options.
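For anyone reproducing this, a small illustrative check (not part of VPR) like the one below can confirm whether the RR graph actually carries the intended "1 ns switches, zero RC" model. It assumes the standard VPR RR graph XML schema, where each <switch> element has a <timing> child with Tdel/R/Cin/Cout attributes:

```python
# Illustrative check (not part of VPR): report the timing of every switch in
# an RR graph XML file so the "only the 1 ns switch delays matter" assumption
# can be verified directly.
import sys
import xml.etree.ElementTree as ET

def dump_switch_timing(rr_graph_path):
    root = ET.parse(rr_graph_path).getroot()
    for sw in root.iter("switch"):
        timing = sw.find("timing")
        attrs = timing.attrib if timing is not None else {}
        print(f"switch {sw.get('id')} ({sw.get('name')}): "
              f"Tdel={attrs.get('Tdel', '0')} R={attrs.get('R', '0')} "
              f"Cin={attrs.get('Cin', '0')} Cout={attrs.get('Cout', '0')}")

if __name__ == "__main__":
    dump_switch_timing(sys.argv[1])
```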
If the delay of everything is 0, the normalizations we do for resource cost could also start going crazy; we normalize the resource costs to the average delay so they have the same units. If the average delay is 0, you might normalize everything to infinity. If we currently let that happen silently, we should probably add an assert for that case during resource cost normalization at the start of the router.
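To make the hazard concrete, here is a minimal sketch in Python (illustrative only, with made-up names rather than VPR's actual C++ internals) of the normalization and the suggested assert:

```python
# Illustrative sketch of the hazard described above (not VPR's actual code):
# base costs are normalized by an average delay so both terms share units.
# If every switch/node delay is zero, that average is zero and the division
# produces infinite costs unless the router fails loudly first.
def normalize_base_costs(base_costs, delays):
    avg_delay = sum(delays) / len(delays)
    # The assert suggested above: fail loudly instead of silently dividing
    # by zero and handing the router infinite (or NaN) normalized costs.
    assert avg_delay > 0.0, "average RR delay is zero; check switch/node delays"
    return [cost / avg_delay for cost in base_costs]

# Example: with delays = [0.0, 0.0, 0.0] the assert fires immediately,
# rather than the router quietly working with infinite normalized costs.
```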
In a revision of an existing architecture to clean up RR graph bugs, the router wirelength and runtime have both blown up for test circuits on the order of thousands of LUTs. The particulars are given under Current Behaviour below.
Both RR graphs are generated from scratch (not derived/revised from VPR's RR graph generator). The QOR of the original graph was benchmarked against the RR graph VPR generates from the arch XML and found to be roughly equal (VPR's graph wirelength: 71017, runtime also ~15 seconds). The VPR arch XML generates a noticeably different graph from the custom graph, as discussed under Current Behaviour.
The salient differences between the two custom graphs are:
Original graph:
Revised graph:
Expected Behaviour
Because the graphs are so similar, the expected behavior is to see similar wirelength and router results across both RR graph XML versions.
Current Behaviour
The architecture is a customized eFPGA with some unusual grid properties:
*The perimeter X/Y locations (x=0, y=0, y=30, x=36) are all EMPTY.
*There are two IO block types that differ only in the number of I/Os located in them:
a. Two copies of an IOB with 8 I/Os per IOB are located on each of the four sides of the array. These are the IOBs with the spurious TOP-side IPIN in the original graph.
b. 24 copies of an IOB with 50 I/Os per IOB are located on three of the four sides of the array.
*Leftover boundary tile locations (x=1, y=1, y=29, x=35) are populated with CLBs.
*fc_in=0.15 and fc_out=0.15 for all pb_types.
example test circuit: synthetic finite state machine, packs into 631 CLBs with 8 4-LUTs each.
original graph wirelength: 70173
original graph runtime: ~15 seconds
current graph wirelength: routing fails.
current graph runtime: > 30 minutes
I have also shown that if I modify the original RR graph by manually deleting the edges from the CHANX/CHANY nodes to the IOB IPINs with side=TOP (but not deleting the nodes), the wirelength and runtime are effectively the same as the original RR graph.
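For reference, that experiment can be approximated with a short script like the illustrative sketch below (not part of VPR, and simplified: it drops every edge that sinks into any TOP-side IPIN rather than targeting only the affected IOBs). It assumes the standard VPR RR graph XML schema, where nodes carry a <loc ... side="..."/> child and edges are <edge src_node="..." sink_node="..." switch_id="..."/>:

```python
# Illustrative sketch (not part of VPR): delete edges into TOP-side IPINs
# while leaving the IPIN nodes themselves in place, approximating the
# manual-edit experiment described above.
import sys
import xml.etree.ElementTree as ET

def strip_top_side_ipin_edges(in_path, out_path):
    tree = ET.parse(in_path)
    root = tree.getroot()

    # Collect IPIN node ids whose <loc> lists TOP among its sides.
    top_ipins = set()
    for node in root.iter("node"):
        if node.get("type") == "IPIN":
            loc = node.find("loc")
            if loc is not None and "TOP" in (loc.get("side") or ""):
                top_ipins.add(node.get("id"))

    # Drop edges that sink into those IPINs; the nodes themselves stay.
    rr_edges = root.find("rr_edges")
    for edge in list(rr_edges.findall("edge")):
        if edge.get("sink_node") in top_ipins:
            rr_edges.remove(edge)

    tree.write(out_path)

if __name__ == "__main__":
    strip_top_side_ipin_edges(sys.argv[1], sys.argv[2])
```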
Regarding a comparison to a VPR-generated RR graph derived from the original architecture XML, the following observations can be made:
*VPR inserts an additional row and column of routing channels at X=0, Y=0 that are not included in the custom RR graph
*The custom RR graph attempts to compensate for the missing row/column by looping channels that would go to X=0, Y=0 back to X=1, Y=1, respectively.
*VPR's routing channel index permutation algorithm for creating edges from CHANX/CHANY to IPINs differs from the custom graph generator's algorithm (in both the original and bug-fixed versions).
Possible Solution
If the mere existence of a second side for the IPIN node helps VPR in some way (for example, by fooling the router lookahead into making better choices), it would be nice to understand why this is the case, to help motivate modifying my graph generator to add these nodes systematically instead of as part of a bug.
Additionally, if there is risk in allowing the architecture XML to specify the architecture in a way that implies a different RR graph than what is loaded with --read_rr_graph, it would be good to know what problems this kind of deviation can cause. Finally, some observations from other experiments I have run attempting to improve the health of the architecture post-bug fix:
Steps to Reproduce
Reproduction of the problem is possible on request. The original graph is available in Zero ASIC's logik release, and current graphs for the same architecture can be furnished for collaboration purposes if deemed appropriate.
Context
The goal of this work is to generate an eFPGA core to insert into an SoC. A custom RR graph generator is needed to properly support genfasm and to ensure CAD correspondence to the eFPGA hardware. For these reasons, we rely on generating a custom RR graph rather than reverse-engineering VPR's auto-generated RR graph. Given that, whatever can be done to keep the router stable against small perturbations of the routing graph is beneficial.
Your Environment