Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected High Gain Causes Unrelated Blocks to Be Added to Clusters #2763

Open
WindFrank opened this issue Oct 10, 2024 · 0 comments
Open

Comments

@WindFrank
Copy link

WindFrank commented Oct 10, 2024

Hi VTR team,
I've been investigating the root cause of the inferior solution reported in issue #2726. After analyzing the VTR source code and performing some active debugging, I discovered that unrelated blocks with a high gain value are being unexpectedly added into a cluster, which is calculated in the vpr/src/pack/cluster_util.cpp file, specifically in the function get_molecule_gain. This issue occurs regardless of whether the “allow_unrelated_clustering” option is set to “auto” or “off.”

Expected Behaviour

The molecules condition_program_first2713^MIN~52-1[0] and condition_program_first2713^MIN~52-0[0] should not be added to the condition_program_first5685 cluster, as they have no logical connection to it.

Current Behaviour

I started by trying to localize the problem that leads to the block condition_program_first2713^MIN~52-1[0] being added to a cluster where all other blocks belong to condition_program_first5685. After some debugging, I focused on the following code snippet in the vpr/pack/cluster_util.cpp file:

for (j = pb->pb_stats->num_feasible_blocks - 1; j >= 0; j--) {
    if (get_molecule_gain(pb->pb_stats->feasible_blocks[j], gain, cluster_att_grp, attraction_groups, num_molecule_failures) > 
        get_molecule_gain(molecule, gain, cluster_att_grp, attraction_groups, num_molecule_failures)) {
        pb->pb_stats->feasible_blocks[j + 1] = pb->pb_stats->feasible_blocks[j];
    } else {
        pb->pb_stats->feasible_blocks[j + 1] = molecule;
        break;
    }
}
if (j < 0) {
    pb->pb_stats->feasible_blocks[0] = molecule;
}

The value of gain determines whether the molecule should be given a higher priority. I added several VTR_LOG statements as breakpoints to observe the gain value of condition_program_first2713^MIN~52-1[0] in the function get_molecule_gain:

float get_molecule_gain(t_pack_molecule* molecule, std::map<AtomBlockId, float>& blk_gain, AttractGroupId cluster_attraction_group_id, AttractionInfo& attraction_groups, int num_molecule_failures) {
    ... //declaration

    num_introduced_inputs_of_indirectly_related_block = 0;
    for (i = 0; i < get_array_size_of_molecule(molecule); i++) {
        auto blk_id = molecule->atom_block_ids[i];
        if (blk_id) {
            if (blk_gain.count(blk_id) > 0) {
                gain += blk_gain[blk_id];
                VTR_LOG("gain += blk_gain[blk_id]: %s %s\n", atom_ctx.nlist.block_name(blk_id).c_str(), std::__cxx11::to_string(gain).c_str());
            } else {
                /* This block has no connection with current cluster, penalize molecule for having this block
                 */
                for (auto pin_id : atom_ctx.nlist.block_input_pins(blk_id)) {
                    auto net_id = atom_ctx.nlist.pin_net(pin_id);
                    VTR_ASSERT(net_id);
                    auto driver_pin_id = atom_ctx.nlist.net_driver(net_id);
                    VTR_ASSERT(driver_pin_id);
                    auto driver_blk_id = atom_ctx.nlist.pin_block(driver_pin_id);
                    num_introduced_inputs_of_indirectly_related_block++;
                    for (int iblk = 0; iblk < get_array_size_of_molecule(molecule); iblk++) {
                        if (molecule->atom_block_ids[iblk] && driver_blk_id == molecule->atom_block_ids[iblk]) {
                            //valid block which is driver (and hence not an input)
                            num_introduced_inputs_of_indirectly_related_block--;
                            VTR_LOG("num_introduced_inputs_of_indirectly_related_block: %s\n", std::__cxx11::to_string(num_introduced_inputs_of_indirectly_related_block).c_str());
                            break;
                        }
                    }
                }
            }
            AttractGroupId atom_grp_id = attraction_groups.get_atom_attraction_group(blk_id);
            if (atom_grp_id == cluster_attraction_group_id && cluster_attraction_group_id != AttractGroupId::INVALID()) {
                float att_grp_gain = attraction_groups.get_attraction_group_gain(atom_grp_id);
                gain += att_grp_gain;
                VTR_LOG("gain += att_grp_gain: %s %s\n", atom_ctx.nlist.block_name(blk_id).c_str(), std::__cxx11::to_string(gain).c_str());
            } else if (cluster_attraction_group_id != AttractGroupId::INVALID() && atom_grp_id != cluster_attraction_group_id) {
                gain -= attraction_group_penalty;
                VTR_LOG("gain -= attraction_group_penalty: %s %s\n", atom_ctx.nlist.block_name(blk_id).c_str(), std::__cxx11::to_string(gain).c_str());
            }
        }
    }

    gain += molecule->base_gain * 0.0001; /* Use base gain as tie breaker TODO: need to sweep this value and perhaps normalize */
    VTR_LOG("gain += molecule->base_gain * 0.0001: %s\n", std::__cxx11::to_string(gain).c_str());
    gain -= num_introduced_inputs_of_indirectly_related_block * (0.001);
    VTR_LOG("gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): %s\n", std::__cxx11::to_string(gain).c_str());

    if (num_molecule_failures > 0 && attraction_groups.num_attraction_groups() > 0) {
        gain -= 0.1 * num_molecule_failures;
        VTR_LOG("gain -= 0.1 * num_molecule_failures: %s\n", std::__cxx11::to_string(gain).c_str());
    }
    VTR_LOG("------------------------------------\n");
    return gain;
}

From the logs, I observed the following gain values before condition_program_first2713^MIN~52-1[0] was added to the cluster:

gain += blk_gain[blk_id]: condition_program_first2713^MIN~52-0[0] 0.000000
gain += blk_gain[blk_id]: condition_program_first2713^MIN~52-1[0] 0.000000
gain += molecule->base_gain * 0.0001: 0.003840
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.003840
------------------------------------
gain += blk_gain[blk_id]: condition_program_first5685^lNOT~77 0.000000
gain += molecule->base_gain * 0.0001: 0.000100
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.000100
------------------------------------

gain += blk_gain[blk_id]: condition_program_first2713^MIN~53-0[0] 0.000000
gain += blk_gain[blk_id]: condition_program_first2713^MIN~53-1[0] 0.000000
gain += molecule->base_gain * 0.0001: 0.003840
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.003840
------------------------------------
gain += blk_gain[blk_id]: condition_program_first5685^lNOT~77 0.000000
gain += molecule->base_gain * 0.0001: 0.000100
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.000100
------------------------------------


gain += blk_gain[blk_id]: condition_program_first2713^ADD~77-0[0] 0.000000
gain += blk_gain[blk_id]: condition_program_first2713^ADD~77-1[0] 0.000000
gain += molecule->base_gain * 0.0001: 0.003840
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.003840
------------------------------------
gain += blk_gain[blk_id]: condition_program_first5685^lNOT~77 0.000000
gain += molecule->base_gain * 0.0001: 0.000100
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.000100
------------------------------------

gain += blk_gain[blk_id]: condition_program_first5685^lNOT~76 0.000000
gain += molecule->base_gain * 0.0001: 0.000100
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.000100
------------------------------------
gain += blk_gain[blk_id]: condition_program_first5685^lNOT~77 0.000000
gain += molecule->base_gain * 0.0001: 0.000100
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.000100
------------------------------------

The molecule condition_program_first2713^MIN~52-1[0] was added to the cluster with a gain value of 0.003840, indicating that it had the highest priority among the current candidates. However, there were no blocks related to condition_program_first2713 in the cluster prior to this addition. From worst slack in the final FPGA solution, I confirmed that this inappropriate addition caused the inferior result.

Analysis

The root cause seems to stem from the gain computation process. Why did this unrelated block receive such a high gain value during clustering?

gain += blk_gain[blk_id]: condition_program_first2713^MIN~52-0[0] 0.000000
gain += blk_gain[blk_id]: condition_program_first2713^MIN~52-1[0] 0.000000
gain += molecule->base_gain * 0.0001: 0.003840
gain -= num_introduced_inputs_of_indirectly_related_block * (0.001): 0.003840

As shown in the report, neither condition_program_first2713^MIN~52-0[0] nor condition_program_first2713^MIN~52-1[0] initially received any gain, but both ended up with a gain value of 0.003840 due to the statement gain += molecule->base_gain * 0.0001.

Possible Solution

To solve the problem, I think some of following work should be done:
(1) Code Revision: Investigate whether the statement gain += molecule->base_gain * 0.0001 may introduce a potential bug. If so, the computation steps for gain should be revised to ensure accurate calculation.
(2) Improvement in Gain Calculation Strategy: The current gain calculation strategy may overemphasize certain clusters, leading to undesirable slack, critical path delay, and wire length in the final result. A more flexible and effective method to compute gain may be necessary.
(3) Optimization Process: The unrelated blocks were also ignored in the optimization steps. A stronger optimization process could potentially resolve this issue.

Steps to Reproduce

Run the following command:

vpr EArch_fixed_160_230_no_power.xml condition_program_first2713_and_condition_program_first5685.pre-vpr \
--circuit_file condition_program_first2713_and_condition_program_first5685.pre-vpr.blif \
--device fixed \
--route_chan_width 100 --alpha_clustering 1 \
--place_quench_algorithm slack_timing \
--noc_placement_weighting 0 \
--noc_latency_weighting 0 \
--noc_swap_percentage 0 \
--timing_report_detail detailed \
--timing_report_npaths 100000 \
--fix_clusters condition_program_first2713_and_condition_program_first5685_fixed_pin.place

(--allow_unrelated_clustering off is optional)
Packing_Devices_from_Two_Seperate_Parts_of_a_Netlist_into_a_Single CLB.zip
EArch_fixed_160_230_no_power.zip

The files related to problem above are:
cluster_util.cpp.txt
vpr_stdout_standard.log
vpr_stdout_allow_unrelated_clustering_off.log

Context

I am attempting to obtain a high-performance P&R solution but encountered an unexpected detour situation. I hope the findings above can help improve VTR's performance. Thank you!

Your Environment

  • VTR revision used: VTR master Oct 4th, 2024
  • Operating System and version: Ubuntu 20.04
  • Compiler version: gcc 9.4.0, cmake 3.20.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant