
Create a performance dashboard #156

Closed
5 of 6 tasks
p-shahi opened this issue Mar 21, 2023 · 16 comments

@p-shahi
Member

p-shahi commented Mar 21, 2023

Overarching tracking issue: #63

Goal

Create a dashboard that can showcase libp2p performance benchmark results, more specifically benchmark-results.json.

Tasks

@MarcoPolo
Contributor

I have some more thoughts on what the runner looks like and what implementations should implement here: https://github.com/libp2p/test-plans/blob/marco/perf/perf-dashboard/README.md

@BigLep
Contributor

BigLep commented Jun 3, 2023

@mxinden : Some thoughts immediately coming to mind when taking a look at https://observablehq.com/@mxinden-workspace/libp2p-performance-dashboard

  1. I believe a best practice is that any time we discuss percentiles, we also share the "n". p90 isn't very meaningful for a sample size of 5. It is for a sample size of 100+.
  2. Will we expose more dimensionality or combinations (security, multiplexers)? The reasons I think we want that are to show why things like QUIC are better and also to catch when there are regressions.
  3. How are you going to show performance over time? I was assuming you would just have multiple versions on each plot.
  4. I think it could be useful to have a way to add filters for the results a viewer wants to focus on.
    • Maybe a textbox where one could enter a regex? For example, "iperf*,rust*" would match all results for iperf or rust. I'm thinking we have a bunch of "dimension sets" (I'm not sure what to call this, but basically what you have for "rust-libp2p/v0.52/quic"), and then any "dimension set" that matches the regex would be included (see the sketch after this list).
    • The set of filters that is applied should also make it into the URL so the view is bookmarkable.
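As a rough illustration of the idea (not the dashboard's actual implementation; all names below are hypothetical), a comma-separated filter string could be split into regular expressions, matched against the "dimension set" labels, and mirrored into the URL query string so the filtered view is bookmarkable:

```ts
// Hypothetical helpers, not dashboard code. A "dimension set" is a label
// such as "rust-libp2p/v0.52/quic" (as in the comment above).

/** Split a comma-separated filter string into regular expressions. */
function parseFilters(input: string): RegExp[] {
  return input
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => new RegExp(part));
}

/** Keep only the dimension sets that match at least one filter. */
function applyFilters(dimensionSets: string[], filters: RegExp[]): string[] {
  if (filters.length === 0) return dimensionSets; // no filter: show everything
  return dimensionSets.filter((set) => filters.some((re) => re.test(set)));
}

/** Mirror the filter string in the URL so the filtered view is bookmarkable. */
function bookmarkableUrl(filterInput: string): string {
  const url = new URL(window.location.href);
  url.searchParams.set("filter", filterInput);
  return url.toString();
}

// Example: "rust.*,iperf" keeps only the rust-libp2p and iperf results.
const visible = applyFilters(
  ["rust-libp2p/v0.52/quic", "go-libp2p/v0.28/tcp", "iperf/tcp"],
  parseFilters("rust.*,iperf")
);
console.log(visible, bookmarkableUrl("rust.*,iperf"));
```

(Note that "iperf*" above is glob-style; as a regular expression it would be written "iperf.*".)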

What are the next steps in your mind?

Has or will there be a maintainer review? I would like to make sure the team is aligned on where we're headed and will end up.

Thanks for putting this together.

@mxinden
Member

mxinden commented Jun 5, 2023

I believe a best practice is that any time we discuss percentiles, we also share the "n". p90 isn't very meaningful for a sample size of 5. It is for a sample size of 100+.

👍 Good point. Thanks. I plan to increase the sample size of the upload and download tests. Will also make the sample size more explicit on the dashboard.
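For illustration only (the names are made up; this is not dashboard code), a percentile readout could carry the sample size alongside it:

```ts
// Illustrative only: compute a percentile and label it with the sample size "n".
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method; other definitions interpolate between ranks.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function labelledPercentile(samples: number[], p: number): string {
  return `p${p} = ${percentile(samples, p).toFixed(2)} (n = ${samples.length})`;
}

// e.g. "p90 = 12.30 (n = 5)" makes a small sample size obvious at a glance.
console.log(labelledPercentile([10.1, 11.4, 9.8, 12.3, 10.9], 90));
```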

Will we expose more dimensionality or combinations (security, multiplexers)? The reasons I think we want that are to show why things like QUIC are better and also to catch when there are regressions.

In regards to security protocols, yes, though only tracked as a follow-up, i.e. not in the first iteration. Tracked in "Outstanding work for future pull requests:" in #184.

In regards to muxers, we really only have one muxer per transport protocol. E.g. on TCP we only have Yamux at this point. I don't think we should consider mplex a valid option at this point. See libp2p/specs#402.

How are you going to show performance over time? I was assuming you would just have multiple versions on each plot

Today there is a single version per implementation only. Once we have more than one version per implementation we can visualize the data. Support for multiple versions per implementation is already in #184. Adding support in the dashboard is not difficult.

A placeholder is already on the dashboard, see bottom:

Performance over time (across releases)

Given that all our implementations implement the perf protocol for one version only, this section is still blank.
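A minimal sketch of how such a section could be fed, assuming (purely for illustration; this is not the actual benchmark-results.json schema) that each result carries an implementation, a version, and a throughput value:

```ts
// Hypothetical result shape; the real benchmark-results.json layout may differ.
interface PerfResult {
  implementation: string; // e.g. "go-libp2p"
  version: string;        // e.g. "v0.28", "v0.29"
  uploadBytesPerSecond: number;
}

// Group results per implementation so each line in a "performance over time"
// chart can plot one point per released version.
function groupByImplementation(results: PerfResult[]): Map<string, PerfResult[]> {
  const groups = new Map<string, PerfResult[]>();
  for (const r of results) {
    const existing = groups.get(r.implementation) ?? [];
    existing.push(r);
    groups.set(r.implementation, existing);
  }
  return groups;
}
```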

I think it could be useful to have a way to add filters for the results they want to focus on.

Sounds good to me. You can already filter implementations in the visualizations, i.e. disable showing some. I suggest that once we have more graphs, we introduce more advanced filtering mechanisms. At this point, though, I don't want to add a filtering mechanism without a concrete need for filtering.

Has or will there be a maintainer review? I would like to make sure the team is aligned on where we're headed and will end up.

Not yet. At this point I want to drive consensus and get the first iteration of #184 into master. Once we have that I will continue work on the dashboard and ask for more input.

Thanks @BigLep. Updated the task list in the issue description.

@mxinden
Member

mxinden commented Jun 27, 2023

The libp2p Performance Dashboard is now ready for review.

https://observablehq.com/@libp2p-workspace/performance-dashboard

@libp2p/github-mgmt-stewards, @sukunrt and @thomaseizinger I welcome your feedback and suggestions. If you have an ObservableHQ account, feel free to comment on the notebook directly. If not, comments on this GitHub issue are just fine.

I am tracking outstanding tasks in the description of this GitHub issue and on #63.

Also in case you have an ObservableHQ account and would like to be part of the libp2p ObservableHQ workspace, ping me.

@thomaseizinger
Contributor

@libp2p/github-mgmt-stewards, @sukunrt and @thomaseizinger I welcome your feedback and suggestions

It is really cool! From looking at it, I take it that we've moved away from the idea of making this an interactive blog post and instead just present the data? For a blog post, the text between the visualisations is a bit sparse.

I know you've been identifying and fixing some minor issues already as a result of this work and #63 tracks additional work around the tooling. Do we also have an issue that collects ideas on how we can improve performance based on these measurements?

@ianopolous

Nice! Are there plans to include other languages here?

@mxinden
Member

mxinden commented Jun 28, 2023

It is really cool! From looking at it, I take it that we've moved away from the idea of making this an interactive blog post and instead just present the data? For a blog post, the text between the visualisations is a bit sparse.

@thomaseizinger blog post is shelved for now, waiting for some optimizations to land.

Non-sparse draft can be found here: https://observablehq.com/@mxinden-workspace/libp2p-perf

I know you've been identifying and fixing some minor issues already as a result of this work and #63 tracks additional work around the tooling. Do we also have an issue that collects ideas on how we can improve performance based on these measurements?

Not yet. Some items collected thus far:

Nice! Are there plans to include other languages here?

@ianopolous ideally all of them, including jvm-libp2p. Do you have time to drive a jvm-libp2p integration? My vague guess is that this would take you about 4 hours total. Happy to help.

See https://github.com/libp2p/test-plans/blob/master/perf/README.md#adding-a-new-implementation

@ianopolous

@mxinden Is the current methodology to spin up an instance, make a measurement, and then kill it? If so, we'll need to change this to include some warm-up to get meaningful results for JIT-compiled languages like JS and Java.

@mxinden
Member

mxinden commented Jun 28, 2023

Is the current methodology to spin up an instance, make a measurement, and then kill it?

Correct. No warm-up phase currently. Each iteration is a new process.

If so, we'll need to change this to include some warm-up to get meaningful results for JIT-compiled languages like JS and Java.

Good point. Though, based on intuition, I suggest we give it a shot anyway. I would assume that neither of our tests today (latency, throughput) is CPU-bound. Let's not prematurely fix this issue without seeing it happen.
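For reference, a warm-up phase along the lines suggested above could look roughly like the sketch below. runIteration is a placeholder, nothing like this exists in the perf runner today, and it only helps if the warm-up and measured runs share one process; with a new process per iteration (the current setup) the JIT state is discarded each time anyway.

```ts
// Sketch only: "runIteration" stands in for one latency or throughput run;
// nothing like this exists in the perf runner today.
async function measureWithWarmup(
  runIteration: () => Promise<number>, // returns one measured value
  warmupRuns: number,
  measuredRuns: number
): Promise<number[]> {
  // Warm-up runs exercise the same code path so JIT compilers (JS, JVM)
  // can optimize the hot paths, but their results are discarded.
  for (let i = 0; i < warmupRuns; i++) {
    await runIteration();
  }
  const results: number[] = [];
  for (let i = 0; i < measuredRuns; i++) {
    results.push(await runIteration());
  }
  return results;
}
```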

@mxinden
Member

mxinden commented Jul 3, 2023

Status Update


@libp2p/github-mgmt-stewards, @sukunrt I welcome your feedback and suggestions. If you have an ObservableHQ account, feel free to comment on the notebook directly. If not, comments on this GitHub issue are just fine.

I am tracking outstanding tasks in the description of this GitHub issue and on #63.

Also in case you have an ObservableHQ account and would like to be part of the libp2p ObservableHQ workspace, ping me.

Friendly ping @libp2p/github-mgmt-stewards, @sukunrt in case you want to review.

@BigLep
Contributor

BigLep commented Jul 17, 2023

Thanks @mxinden . Things coming to mind when looking at https://observablehq.com/@libp2p-workspace/performance-dashboard fresh...

  1. go-libp2p 0.29 has been released, but I don't see it reflected. What is the process for updating this? Ideally this should be automated or part of the release process.
    • Relatedly, it would be great to show the latest master/main alongside released versions so implementers can catch regressions before doing a release.
  2. Generally, for any of the benchmarks, I want to know when we start the timer and when it ends. I actually don't think we're explicit about this in https://github.com/libp2p/specs/blob/master/perf/perf.md and maybe we should get that updated there and just link there? https://github.com/libp2p/specs/blob/master/perf/perf.md#single-connection-throughput hits on this with "the total time it took from stream open to stream close", but that would imply we're including the "Tell the server how many bytes we want the server to send us as a single big-endian uint64 number" step, which I assume is fine in practice but is "extra work".
  3. How is someone going to discover this dashboard? I expect there will be multiple inroads. I want to make sure we're capitalizing on this work to give an overwhelming sense of "wow, libp2p maintainers take performance seriously..."
  4. I left some small cosmetic feedback in https://observablehq.com/@libp2p-workspace/performance-dashboard . I didn't quickly see how I could make edits without forking.

@MarcoPolo
Contributor

  1. Generally, for any of the benchmarks, I want to know when we start the timer and when it ends. I actually don't think we're explicit about this in https://github.com/libp2p/specs/blob/master/perf/perf.md and maybe we should get that updated there and just link there? https://github.com/libp2p/specs/blob/master/perf/perf.md#single-connection-throughput hits on this with "the total time it took from stream open to stream close", but that would imply we're including the "Tell the server how many bytes we want the server to send us as a single big-endian uint64 number" step, which I assume is fine in practice but is "extra work".

This has to be the time from when we open the stream to when we read EOF from the server. The issue is that we don't really know when the bytes have been uploaded and received. Even if they have been handed to the kernel, they may not have been sent yet. And even then they may be dropped and need to be retransmitted. The only true measurement we can make is when the test ends.
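For concreteness, here is a client-side sketch of that measurement. The stream interface is a stand-in, not a specific libp2p API: the timer starts before the stream is opened, the requested download size is written as a big-endian uint64 per the perf spec, the upload is half-closed, and the timer stops once EOF is read from the server.

```ts
// Stand-in stream interface for the sketch; not an actual js-libp2p API.
interface PerfStream {
  write(data: Uint8Array): Promise<void>;
  closeWrite(): Promise<void>;        // half-close: signals "upload done"
  read(): Promise<Uint8Array | null>; // resolves to null on EOF
}

// Measure from before the stream is opened until EOF is read from the server.
async function runPerf(
  openStream: () => Promise<PerfStream>, // hypothetical stream factory
  uploadBytes: number,
  downloadBytes: number
): Promise<number> {
  const start = performance.now();
  const stream = await openStream();

  // Tell the server how many bytes to send back: a single big-endian uint64.
  const header = new Uint8Array(8);
  new DataView(header.buffer).setBigUint64(0, BigInt(downloadBytes), false);
  await stream.write(header);

  // Upload phase, then half-close so the server knows the upload is complete.
  await stream.write(new Uint8Array(uploadBytes));
  await stream.closeWrite();

  // Download phase: read until EOF. Only EOF tells us all bytes arrived.
  let received = 0;
  while (true) {
    const chunk = await stream.read();
    if (chunk === null) break;
    received += chunk.length;
  }
  if (received !== downloadBytes) {
    throw new Error(`expected ${downloadBytes} bytes, got ${received}`);
  }
  return performance.now() - start; // elapsed milliseconds
}
```

Whether the timer also covers connection establishment (as the test-plans README describes) is exactly the kind of detail worth spelling out next to each plot.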

@BigLep
Contributor

BigLep commented Jul 18, 2023

Good points - makes sense. This still affirms for me that we should be explicit, for the benchmarks we're reporting on, about "when" the timer starts/stops and "where" the timer runs (client or server).

I guess this is fine as is. The dashboard links to https://github.com/libp2p/test-plans/tree/master/perf, which under the implementation section discusses "the measurement includes the time to (1) establish the connection, (2) upload the bytes and (3) download the bytes". I think it would be ideal to make it more front and center when/where the timing is happening, but the information is discoverable, so this can be skipped.

@mxinden
Member

mxinden commented Jul 20, 2023

go-libp2p 0.29 has been released, but I don't see it reflected. What is the process for updating this? Ideally this should be automated or part of release process.

On my TODO list for v0.29. For future versions I will try to get it into the go-libp2p release process documentation.

How is someone going to discover this dashboard? I expect there will be multiple inroads. I want to make sure we're capitalizing on this work to give an overwhelming sense of "wow, libp2p maintainers take performance seriously..."

Dashboard is linked here:

https://github.com/libp2p/test-plans/tree/master/perf#libp2p-performance-benchmarking

Another "inroad" will be the blog post.

I left some small cosmetic feedback in https://observablehq.com/@libp2p-workspace/performance-dashboard . I didn't quickly see how I could make edits without forking.

Saw them. Thanks @BigLep.

@mxinden
Member

mxinden commented Jul 26, 2023

I addressed all dashboard-related outstanding items, thus closing here. I will take the last outstanding item, namely "ideally mirrored in the URL for better sharing", to the main tracking issue.

@mxinden mxinden closed this as completed Jul 26, 2023