
⚡ Collection of optimizations and a bug fix for charge_distribution_surface #325

Closed
wants to merge 33 commits

Conversation

wlambooy
Contributor

@wlambooy wlambooy commented Nov 5, 2023

Description

This PR introduces a collection of optimisations, mainly with regard to charge_distribution_surface.
The commit messages describe the changes that are made.

Most prominently, we have:

  • Fixed a bug where, if the charge index is changed from i to i' and i' has more leading zeroes than i, the cells corresponding to those leading zeroes were left unchanged even though they need to be assigned a negative charge state (see the sketch below this list).
  • The system energy is now by default only recomputed for physically valid charge distributions.
  • The validity_check function now returns as soon as a verdict can be made.
  • Functions that modify the private storage object can apparently be made const. This may be applicable in other files as well; for now, I applied the change in charge_distribution_surface.hpp and cell_level_layout.hpp.
  • Removed an unnecessary mutex lock in quicksim.hpp. The update_after_charge_change() call before spawning the threads can also be omitted, since each thread first makes such a call before checking physical validity. Hence the local potentials and the system energy are still properly initialised before adjacent_search is called.
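
For illustration, the following is a minimal sketch of the leading-zero issue from the first bullet point. It is hypothetical code, not fiction's actual charge_distribution_surface implementation; it only assumes that a charge index encodes one base-`base` digit per cell, that digit 0 maps to the negative charge state, and that every cell beyond the most significant non-zero digit is a "leading zero" that must still be overwritten explicitly.

#include <cstdint>
#include <vector>

enum class charge_state { negative, neutral, positive };

// decode a charge index into per-cell charge states (illustrative only)
void assign_from_charge_index(std::uint64_t index, std::uint64_t base, std::vector<charge_state>& cells)
{
    // walk from the least significant digit to the most significant one
    for (auto it = cells.rbegin(); it != cells.rend(); ++it)
    {
        const auto digit = index % base;
        index /= base;

        // stopping early once `index` reaches 0 would leave the remaining
        // (leading-zero) cells at their previous charge states; they must be
        // assigned the negative state instead, which is what the bugfix ensures
        *it = digit == 0 ? charge_state::negative :
              digit == 1 ? charge_state::neutral  :
                           charge_state::positive;
    }
}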

Checklist:

  • The pull request only contains commits that are related to it.
  • I have added appropriate tests and documentation.
  • I have made sure that all CI jobs on GitHub pass.
  • The pull request introduces no new warnings and follows the project's style guidelines.

@wlambooy
Contributor Author

wlambooy commented Nov 5, 2023

Looks like I missed some tests still. Probably just some silliness.

@Drewniok
Collaborator

Drewniok commented Nov 5, 2023

@wlambooy Thank you for your efforts! As soon as it passes the tests, I will take a closer look at it. Please note that const indicates to the compiler that the function is a read-only operation.

@Drewniok Drewniok added the enhancement label Nov 5, 2023
@wlambooy
Contributor Author

wlambooy commented Nov 5, 2023

I missed some consequences of making the empty layout physically valid; hopefully everything passes now.
My intuition is to use keywords like const as much as possible to help the compiler, since the type system makes sure you can't overdo it. Yet I would have thought that a function which modifies a private object cannot be const, and still it is accepted. Do you understand why?


codecov bot commented Nov 5, 2023

Codecov Report

Merging #325 (22367d9) into main (14d3d71) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #325      +/-   ##
==========================================
- Coverage   96.04%   96.04%   -0.01%     
==========================================
  Files         102      102              
  Lines       10075    10059      -16     
==========================================
- Hits         9677     9661      -16     
  Misses        398      398              
Files Coverage Δ
...lation/sidb/exhaustive_ground_state_simulation.hpp 95.23% <100.00%> (+0.50%) ⬆️
.../fiction/algorithms/simulation/sidb/quickexact.hpp 98.89% <100.00%> (+0.01%) ⬆️
...de/fiction/algorithms/simulation/sidb/quicksim.hpp 98.68% <100.00%> (-0.02%) ⬇️
...on/algorithms/simulation/sidb/time_to_solution.hpp 97.43% <100.00%> (ø)
include/fiction/layouts/bounding_box.hpp 100.00% <100.00%> (ø)
...fiction/technology/charge_distribution_surface.hpp 99.37% <100.00%> (-0.02%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Drewniok
Collaborator

Drewniok commented Nov 6, 2023

I see what you mean, but this is not the best style. Using the keyword const when the member function obviously changes the logical state of the object violates semantic clarity and logical constness. I would ask you to only use/add const where it is really appropriate. Thank you!

@wlambooy
Contributor Author

wlambooy commented Nov 6, 2023

My assumption that the C++ compiler catches this turns out to be incorrect, and our example, in which an object reached through a dereferenced pointer is changed, shows this. I made sure to verify the logical constness in a new commit.
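
To spell that out with a minimal, self-contained sketch (the class and member names here are made up and only assume that the private storage is held behind a pointer, e.g. a shared_ptr): const on a member function only makes the pointer member itself const, not the object it points to, so the pointee can still be modified.

#include <memory>

struct storage
{
    int counter = 0;
};

class layout
{
  public:
    // compiles despite mutating state: in a const member function, `strg` is a
    // const pointer, but the `storage` object it points to is not const
    void bump() const
    {
        strg->counter += 1;
    }

  private:
    std::shared_ptr<storage> strg = std::make_shared<storage>();
};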

While trying to improve the code coverage, I stumbled on a layout that generates no charge distributions for ExGS, but two for QuickExact. The test looks as follows:

TEMPLATE_TEST_CASE(
    "ExGS simulation of two SiDBs placed directly next to each other with non-realistic relative permittivity", "[exhaustive-ground-state-simulation]",
    (cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>),
    (charge_distribution_surface<cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>>))
{
    TestType lyt{};
    lyt.assign_cell_type({1, 3, 0}, TestType::cell_type::NORMAL);
    lyt.assign_cell_type({2, 3, 0}, TestType::cell_type::NORMAL);

    const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3}};

    const auto simulation_results = quickexact<TestType>(lyt, params);

    CHECK(simulation_results.charge_distributions.empty());
}

A degenerate charge distribution is returned by QuickExact with a system energy of -3472.6959461051524. Perhaps QuickExact is not defined on such non-realistic relative permittivity values?

@Drewniok
Collaborator

Drewniok commented Nov 6, 2023


Thank you for your efforts! 🙏
That's a good example, even though it's not realistic. You are right. I found the problem and will fix it as soon as possible.
Just for your information: Be aware of this additional parameter, which can also lead to deviations between both simulators, but not here.

    const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3},
                                             quickexact_params<TestType>::automatic_base_number_detection::OFF};

@Drewniok
Collaborator

@wlambooy Thank you very much for your effort! Just click on re-request review when you are ready.

Collaborator

@Drewniok Drewniok left a comment

LGTM! Thank you so much for your efforts! 🙏

@marcelwa marcelwa self-requested a review November 27, 2023 10:38
@marcelwa
Collaborator

@wlambooy many thanks also from my side for the additions! 🙏

@Drewniok thanks to you as well for taking your time reviewing the changes! I'll have a quick scan as well and then it should be ready for merge 💪

Comment on lines 78 to 87
st.additional_simulation_parameters.emplace_back("iteration_steps", ps.interation_steps);
st.additional_simulation_parameters.emplace_back("alpha", ps.alpha);
st.physical_parameters = ps.phys_params;

if (lyt.num_cells() == 0)
{
    return st;
}

st.charge_distributions.reserve(ps.interation_steps);
Collaborator

I know you didn't name this variable st, but now that changes have been made here, it is inconsistent with the other two ground state algorithms, where this variable is called result, which is a more fitting name in my opinion. May I ask you to adjust the naming? Many thanks!

@@ -93,7 +99,7 @@ sidb_simulation_result<Lyt> quicksim(const Lyt& lyt, const quicksim_params& ps =
charge_lyt.assign_base_number(2);
charge_lyt.assign_all_charge_states(sidb_charge_state::NEGATIVE);
charge_lyt.update_after_charge_change(dependent_cell_mode::VARIABLE);
const auto negative_sidb_indices = charge_lyt.negative_sidb_detection();
const auto& negative_sidb_indices = charge_lyt.negative_sidb_detection();
Collaborator

Why the reference? Due to C++'s return value optimization, it should be equivalent performance-wise to have const auto instead of const auto&. However, const auto is safer because it doesn't fall prey to the object running out of scope and disappearing. I prefer using const auto unless there are good reasons against it.
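
As a small illustration of the lifetime hazard alluded to above (hypothetical code, not taken from fiction): a const auto& that ends up referring into another object dangles once that object is cleared or destroyed, whereas const auto takes an independent copy.

#include <vector>

// hypothetical helper that returns a reference into the container passed in
const std::vector<int>& first_group(const std::vector<std::vector<int>>& groups)
{
    return groups.front();
}

int main()
{
    std::vector<std::vector<int>> groups{{1, 2, 3}};

    const auto  copy = first_group(groups);  // independent copy, stays valid
    const auto& view = first_group(groups);  // reference into `groups`

    groups.clear();  // `view` now dangles, while `copy` is still usable

    return static_cast<int>(copy.size());
}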

@@ -105,14 +105,14 @@ void time_to_solution(Lyt& lyt, const quicksim_params& quicksim_params, const ti
sidb_simulation_result<Lyt> simulation_result{};
if (tts_params.engine == exhaustive_sidb_simulation_engine::QUICKEXACT)
{
const quickexact_params<Lyt> params{quicksim_params.phys_params};
st.algorithm = "QuickExact";
Collaborator

Should st also be result here?

const quickexact_params<Lyt> params{quicksim_params.phys_params};
st.algorithm = "QuickExact";
simulation_result = quickexact(lyt, params);
st.algorithm = "Exhaustive Ground State Simulation";
Collaborator

I believe we use ExGS as the algorithm name here. @Drewniok can you confirm this?

Collaborator

@marcelwa You are right!
@wlambooy Please use ExGS. Thanks

Comment on lines +346 to +347
if (old_params.lat_a != params.lat_a || old_params.lat_b != params.lat_b || old_params.lat_c != params.lat_c)
{ // lattice changed
Collaborator

Either move the comment directly above the if statement (if it concerns the condition) or within the block (if it concerns the statements inside the condition).

}
else if (old_params.epsilon_r != params.epsilon_r || old_params.lambda_tf != params.lambda_tf)
{ // potential changed
Collaborator

Same here

@Drewniok
Collaborator

@wlambooy Thanks again for your efforts! @marcelwa just asked me to check again whether your changes increase the simulation runtimes. When running our recently added benchmark script (fiction -> test -> benchmark -> simulation.cpp), a careful analysis of the results shows that the runtimes increase significantly (~10%), which is unacceptable. As far as I can see, this is not caused by the bugfix, but rather by the other changes.

@marcelwa
Collaborator

"Unacceptable" is maybe a bit harsh a statement. We aim for the most performant code possible for any given scenario and we indeed take pride in our achievements. @wlambooy if the benchmark code @Drewniok mentioned helps you figure out where exactly the runtime increase was caused, we'd appreciate a fix. Many thanks!

@wlambooy
Contributor Author

Thank you for doing the analysis @Drewniok! I will try to uncover which changes cause this performance decrease.

@wlambooy
Contributor Author

Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see that the simulation benchmark file contains a comment with mean, low mean, and high mean columns. Are these the results of 3 separate benchmark runs, or are they the mean/low/high of a larger set of benchmark runs?

@Drewniok
Collaborator

@wlambooy See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md for a definition of samples, iterations, etc. In our case, the crossing gate will be simulated 100 times due to 100 samples and only 1 iteration each.
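
For reference, a benchmark in that file boils down to a Catch2 BENCHMARK block along the lines of the sketch below (illustrative only; simulate_crossing_gate is a stand-in for the actual QuickSim/QuickExact call, and the binary is assumed to be linked against Catch2WithMain). With the 100 samples and 1 iteration each mentioned above, the benchmark body is timed 100 times and the mean/low/high values in the comment are the estimates derived from those samples.

#include <catch2/benchmark/catch_benchmark.hpp>
#include <catch2/catch_test_macros.hpp>

// stand-in workload for the crossing-gate simulation
static int simulate_crossing_gate()
{
    int acc = 0;
    for (int i = 0; i < 1000; ++i)
    {
        acc += i;
    }
    return acc;
}

TEST_CASE("simulation benchmark (sketch)")
{
    BENCHMARK("crossing gate")
    {
        // returning the result prevents the call from being optimized away
        return simulate_crossing_gate();
    };
}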

@marcelwa
Collaborator

I think @wlambooy's question was where the output comes from. It will be automatically generated by the benchmark script. Simply pass -DFICTION_BENCHMARK=ON to your CMake call and run the resulting binary after compilation in Release mode.

@wlambooy
Contributor Author

I see; I was running the benchmarks from within CLion, so I was getting XML output that didn't look like the output in the comment. I have to say that, on my system at least, I'm always getting quite a spread in the benchmark results, and usually the more often I run them without rebuilding, the better the times get. You can see this in the benchmark results below, which were run directly in sequence (look at the mean values).

QuickExact
<!-- All values in nano seconds -->
<mean value="2.42783e+09" lowerBound="2.35123e+09" upperBound="2.54392e+09" ci="0.95"/>
<standardDeviation value="4.74379e+08" lowerBound="3.47914e+08" upperBound="6.37069e+08" ci="0.95"/>
<outliers variance="0.936384" lowMild="0" lowSevere="0" highMild="2" highSevere="7"/>

QuickSim
<!-- All values in nano seconds -->
<mean value="1.33811e+07" lowerBound="1.31979e+07" upperBound="1.35853e+07" ci="0.95"/>
<standardDeviation value="987767" lowerBound="835378" upperBound="1.19855e+06" ci="0.95"/>
<outliers variance="0.676348" lowMild="2" lowSevere="0" highMild="6" highSevere="1"/>


QuickExact
<!-- All values in nano seconds -->
<mean value="2.11722e+09" lowerBound="2.08929e+09" upperBound="2.15057e+09" ci="0.95"/>
<standardDeviation value="1.55694e+08" lowerBound="1.30843e+08" upperBound="1.96559e+08" ci="0.95"/>
<outliers variance="0.666639" lowMild="0" lowSevere="0" highMild="3" highSevere="0"/>

QuickSim
<!-- All values in nano seconds -->
<mean value="1.06471e+07" lowerBound="1.04801e+07" upperBound="1.08103e+07" ci="0.95"/>
<standardDeviation value="845190" lowerBound="739383" upperBound="995966" ci="0.95"/>
<outliers variance="0.707146" lowMild="2" lowSevere="0" highMild="1" highSevere="0"/>


QuickExact
<!-- All values in nano seconds -->
<mean value="2.01781e+09" lowerBound="2.00066e+09" upperBound="2.04291e+09" ci="0.95"/>
<standardDeviation value="1.04298e+08" lowerBound="7.61848e+07" upperBound="1.52602e+08" ci="0.95"/>
<outliers variance="0.494789" lowMild="0" lowSevere="0" highMild="3" highSevere="2"/>

QuickSim
<!-- All values in nano seconds -->
<mean value="9.83447e+06" lowerBound="9.68097e+06" upperBound="9.98761e+06" ci="0.95"/>
<standardDeviation value="783977" lowerBound="703335" upperBound="901620" ci="0.95"/>
<outliers variance="0.707254" lowMild="0" lowSevere="0" highMild="0" highSevere="0"/>

@marcelwa
Collaborator

It would be best to run the benchmarks from the command line while having all other tasks/programs closed. Particularly your web-browser and CLion are running expensive background tasks that can impact the measurements.

@wlambooy
Contributor Author

After doing a couple of benchmark runs with pre-built binaries in an environment with less background processor usage, I am able to confirm a performance difference with the following findings:

The QuickExact time is slower (+5.4% runtime on average) while QuickSim is faster (-7.1% runtime on average). All benchmark runs were consistent in the differences: each QuickExact benchmark run of the main branch was faster than each QuickExact benchmark run of this branch, and each QuickSim benchmark run of the main branch was slower than each QuickSim benchmark run of this branch.

I'll try to figure out why this is.

@Drewniok
Collaborator

Drewniok commented Dec 4, 2023

@wlambooy Thanks again for your efforts! I know, sometimes it is not so easy to find out why the code is slower. Take your time and don't hesitate to contact us.
Since it would be great to have this bugfix in the main branch as soon as possible, can I ask you to split this PR into two - one for the bugfix and one for the other changes? What do you think about that? What do you think, @marcelwa?

@marcelwa
Collaborator

marcelwa commented Dec 4, 2023

I agree that it would be very helpful to have the bugfix available as quickly as possible.

@wlambooy
Contributor Author

wlambooy commented Dec 5, 2023

I was working on more pressing things over the last week, so I haven't gotten to a code speed investigation yet. I put the bugfixes in a new PR: #347

marcelwa added a commit that referenced this pull request Dec 15, 2023
…#325 (#347)

* 🐛 reducing the charge index to a value with leading zeroes now gives correct behaviour

(cherry picked from commit e1fe34a)

* 🐛 `bounding_box` does not update the dimension sizes for cubic coordinates

(cherry picked from commit fc01d74)

* ⚡ optimized increase_charge_index_by_one (with an enum this time)

* 🎨 remove `const` specifier from functions that are not logically const

* ✅ add `const` specifier to functions that are logically const when considering a change of charge index as such

* ✅ add `const` specifier to functions that are logically const when considering a change of charge index as such

* ✅ remove `const` specifier from functions that are not logically const when considering a change of charge index as such

* 📝 implemented Jan's suggestions

* ✅ remove `const` specifier

* 🐛 The TTS algorithm now sets the simulator name correctly for ExGS (to `ExGS`)

---------

Co-authored-by: Marcel Walter <[email protected]>
@marcelwa
Collaborator

marcelwa commented Aug 9, 2024

This ran out of sync and doesn't seem to be necessary anymore according to @Drewniok.

@marcelwa marcelwa closed this Aug 9, 2024