Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take measurement inaccuracy intervals into account #4078

Conversation

marlamb
Copy link
Contributor

@marlamb marlamb commented Oct 21, 2024

Stating one version being faster while the intervals have a considerable overlap is probably not accurate.

Stating one version being faster while the intervals have a considerable
overlap is probably not accurate.
@ghost
Copy link

ghost commented Oct 25, 2024

I agree. Saying that the performance of the 'iter' version, in this specific case, is superior (even if slightly) reveals more bias than information.
If, by chance, the benchmark result indicated that the 'for' version was faster (in this specific case), would the author try again until 'iter' was the fastest?

Copy link
Contributor

@chriskrycho chriskrycho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. It’s true that it was “within the noise” but the point—as the rest of the paragraph itself says!—is not to prove something about the two versions, but to get a general sense of how they compare performance-wise. As measured, the iterator version was slightly faster—and that’s all the text was saying!

If you want to dig into the details further:

  • The lower bound for the iter version is 18,577,700 and the lower bound for the for loop version is 18,704,600.
  • The upper bound for the iter version is 19,892,100 and the upper bound for the for loop version is 20,536,000.

You’d see this clearly on a violin plot or similar: we may not be 100% positive that the iterator is always guaranteed to be faster, but it’s totally fair to say that it was faster in this case, even when digging into those details!

@marlamb
Copy link
Contributor Author

marlamb commented Oct 26, 2024

Thanks for the suggestion. It’s true that it was “within the noise” but the point—as the rest of the paragraph itself says!—is not to prove something about the two versions, but to get a general sense of how they compare performance-wise. As measured, the iterator version was slightly faster—and that’s all the text was saying!

If you want to dig into the details further:

* The lower bound for the iter version is `18,577,700` and the lower bound for the for loop version is `18,704,600`.

* The upper bound for the iter version is `19,892,100` and the upper bound for the for loop version is `20,536,000`.

You’d see this clearly on a violin plot or similar: we may not be 100% positive that the iterator is always guaranteed to be faster, but it’s totally fair to say that it was faster in this case, even when digging into those details!

I am sorry, but I strongly disagree. From a scientific point of view stating one of them is faster if the confidence intervals overlap that strongly is simply wrong (and assuming that it is only a one-sigma interval makes it even worse).

The change I made tried to emphasize that they are of equal performance, which does not contradict the complete story the section wants to tell -> use the higher level construct, you don't pay for it performance-wise. I don't want to say my formulation is perfect and I am open for suggestions to further improve it. But keeping it as is, is not supporting the intention: everyone with scientific background should immediately recognize that the author tries to make an argument based on data, which the data do not support.

I also don't know about the usual manners in this repository, as I am contributing for the first time. But I have to admit I am slightly surprised that this gets closed without proper discussion, especially as already someone independent of the author (@odinplusplus) agreed that the general idea of the change seems ok.
@chriskrycho it would be nice if you could reply to these questions and I would appreciate reopening the pull request to perhaps get also some feedback from others.

@chriskrycho
Copy link
Contributor

First, as regards “manners”: if every time two commenters happened to agree on something we were obliged to make a change, we would be all over the place, including changing things back and forth on the same text over time. The same for “proper discussion”—there is a basic imbalance between maintainer time and the time of folks submitting suggestions. We regularly just have to make judgment calls and move on, or else we’d spend all our time just responding to issues, PRs, etc.! I hope that helps make some sense of the relatively quick and brief response.

However, it’s important to recognize that the text isn’t really “making an argument” here or aiming to be precise; it’s just reporting in a fairly casual way what the benchmark indicated. (I'm very well aware of how confidence intervals work, and yes, I agree that it’s entirely possible they’re actually “the same speed” given those error bars!) The point in the text—as in my comment above—is just to say that if we’re eyeballing it/taking the benchmark at face value, one appears to be a bit faster than the other, and it might surprise folks who assume that iterators are always slower than a hand-written loop.

Net, I don’t object to the change you suggested to the text, and I’ll consider reopening it (I want to mull on it a bit); I am just not persuaded it’s absolutely necessary, given what the text is actually trying to do here. Hope that makes sense!

@dyfrgi
Copy link

dyfrgi commented Jan 2, 2025

I came by to open an issue about this sentence. The effect of this line was to make me trust the performance assertions made by the text less. If the text is making this subtly wrong assertion about performance, what other wrong assertions is it making?

I see the point that this isn't really a page which is intended as a thorough exploration of Rust performance - explaining how to write fast programs isn't a goal of the book at all. The point it's trying to make is that you shouldn't worry about the performance of these two constructs when deciding which to use. I think that saying that they're the same is just as effective for that purpose.

I think the small wording change in this PR is a good one. You could rework things more, but this small change is a pure improvement.

@chriskrycho chriskrycho reopened this Jan 6, 2025
@chriskrycho
Copy link
Contributor

After a bit of mulling, I have reopened this and am merging it. Thanks!

@chriskrycho chriskrycho merged commit 3d058ca into rust-lang:main Jan 6, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants