-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stat collection to IPTB #65
Conversation
Hey @davinci26 Thanks a bunch for this! I hope you don't mind us pulling this apart a bit. We're working on refactoring iptb to be less ipfs centric (giving it the ability to really test any libp2p based protocol). And to complete that work, it may require some cutting and pasting :) I'll leave it to @travisperson and @frrist to take it from here and see how integrating this work could happen. |
No problem, feel free to make any changes you want! I spent quite some time playing with this repo so if you want any help in the refactoring or in the integration, give me a shout :) |
This is really cool and highlights what iptb can do nicely. 🏆 You can checkout the current work we are doing in PR #61, and there is an issue which has some discussion in #62 if you want to see what changes we are making. My comments below will be with regard to this work and how we can incorporate the changes you have here. I think we can probably incorporate the idea behind these changes and improve iptb. It looks like I could see us adding a new flag to
I really like the idea of baking stats into iptb, as it seems like a perfect use case. I think we could make it more generic as well, enabling it to be used to record a lot of different commands, for example, targeting the dht specifically. We could make the The Note: (If we wanted to run it against a subset of nodes, we could do The only stat that does not carry over outside of ipfs very well is the
|
Thanks for the review! I totally agree with your suggestions. From my experience, the problem I had with the Maybe this could also be added as an optional flag in the run command: I can add the changes as you described them after you merge PR #61 to master, or you can add them directly to PR #61 :). |
Hey y'all, Can I start working on the |
Hey @davinci26 that sounds great! Let me know if you have any questions. You can find me (tperson) on our #ipfs-dev IRC channel (freenode).
It's asynchronous now (as of #61)! 🎉 |
Hey again, I made the majority of the changes so feel free to review the code. I also added unit tests on the Stats just to be sure there is no stupid miscalculation. The stats produced are the following:
Stats Implementation details:
Topology Implementation details:
Looking forward to your comments 😃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating!
Right now the majority of commands produce the stats. This could easily be disabled if you dont think its useful.
I don't think having stats around init
and start
will be very useful. Mainly because I don't think iptb is the best way to measure that behavior. The measurements would tell you more about iptb and the plugin than the actual thing behind it.
The connect
command also has the possibility for a lot of overhead. The command first has to determine how to connect one node to another (request swarm addresses, and possibly a peer id) before it tries to run a connect call.
I think it's best to only have the stats collecting on the run
command for now.
I assume it would be useful to add stats besides time on the various plugins. However, I am not 100% sure what kind of metrics the plugins you plan to add produce. If you have any reading material on that then I can take a look into it.
I think we would just want to provide a way for the user to specify different metrics to collect from each node the command was run against. This would just leverage the Metrics
interface.
With regard to the Stats struct, should we also be reporting the individual times for each node?
@@ -1,2 +1,5 @@ | |||
*.so | |||
iptb | |||
.vscode | |||
topologies/* | |||
deb.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .vscode
I think is fine, but I don't think we need the other two.
commands/connect.go
Outdated
return nil, errors.New("Line " + strconv.Itoa(lineNumber) + " does not follow the correct format") | ||
} | ||
destinations = strings.Split(lineTokenized[1], ",") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The else
of this if statement is needless, it can just be part of the regular flow when the statement is false.
@@ -14,6 +14,7 @@ import ( | |||
) | |||
|
|||
func loadPlugins(dir string) error { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code formatting tool decided to add an empty line there. I have manually checked the files and they are the same.
commands/connect.go
Outdated
for _, connectionRow := range topologyGraph { | ||
from := connectionRow[0] | ||
to := connectionRow[1:] | ||
err = connectNodes(tb, []int{from}, to, timeout, flagStats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing the flagStats
through, it might be better (and probably should of been done the first time) to return a ([]results, error)
from connectNodes
and then call buildReport at the end.
commands/utils.go
Outdated
c1 = (length-1)/2 + 1 | ||
c2 = c1 - 1 | ||
// Print individual results to JSON format | ||
func printIndivudualResults(nodeTime []float64, nodeID []int, metric string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think that's a typo and should be printIndividualResults
;)
commands/utils.go
Outdated
the strings are not castable to float | ||
the function returns an error | ||
*/ | ||
func substractArrays(lhs []string, rhs []string) ([]float64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more typo, subtractArrays
*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix all of them :) Thnx a lot for the review, I totally agree details matter a lot !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! And one of the comments I made got marked as outdated for some reason, but printIndividualResults
is typo'd as printIndivudualResults
commands/stats.go
Outdated
|
||
// Calculate the min, max mean, std with a single pass of the array | ||
sum := 0. | ||
sumSquarred := 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sumSquared
(just one r)
commands/stats.go
Outdated
} | ||
|
||
//BuildStats generates a stats struct from an input float64 slice | ||
//It returns an error if the slice is empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are missing a space after the //
to be consistent with the others (sorry, details stand out to me haha)
@@ -26,7 +26,7 @@ $ iptb shell 4 | |||
$ ipfs cat QmNqugRcYjwh9pEQUK7MLuxvLjxDNZL1DH8PJJgWtQXxuF | |||
hey! | |||
``` | |||
|
|||
Available plugins now: Local IPFS node (plugin_name: localipfs), Docker IPFS node (plugin_name: dockeripfs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would also be nice to be able to list available plugins via the CLI. Doesn't have to be in this PR though, just mentioning.
commands/connect.go
Outdated
There are three variants of the command. It can accept no arugments, | ||
a single argument, or two arguments. The no argument and single argument | ||
expands out to the two argument usage. | ||
There are four variants of the command. It can accept no arugments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arguments*
commands/connect.go
Outdated
expands out to the two argument usage. | ||
There are four variants of the command. It can accept no arugments, | ||
a single argument, two arguments or a file. The no argument and single argument | ||
expands out to the two argument usage. The file argument, parses the file and exapnds it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expands* (the second one)
commands/run.go
Outdated
nodeMetricValue[i] = value | ||
} | ||
return nodeMetricValue, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I opted to collect the metric before and after the command is executed in run.go
. I think with this approach the code is cleaner and makes more sense.
I could add it in the mapWithOutput
function but since this function is using locks the more operations I do inside the locks the more the overhead.
I would love to hear your thoughts on which design you think is more useful for the metrics you plan to implement.
commands/utils.go
Outdated
the function returns an error | ||
TODO: This could be more generic | ||
*/ | ||
func subtractArrays(lhs []string, rhs []string) ([]float64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is implemented now the metrics are collected as follows:
Let metric be collected, then we collect m1before before the execution of the run command and m1after after. Then we calculate the statistics based on Δm1 = (m1after - m1before).
I was giving this some thought over the weekend and I think we should try and only add what is minimally required to expose the functionality required to do these stat calculations. While iptb could actually do all the calculations, I don't know if there is a strong reason why it needs to do them. It might be better to just report the raw stats and allow the user to handle how to calculate values from them. So with this we would just be adding an elapsed time to the results (as you already have). Collecting metrics before and after isn't a lot different than running Per our discussion on IRC, you also pointed out that for metrics it would be better to actually sample over the course of the commands execution. I think this would probably be the correct thing to do as well for any realtime stats (cpu / memory / etc). This does get a little bit complicated though and I think we really need to start a discussion in another issue and work through exactly the requirements needed to find a good approach if we want to go down this route. For this PR though I think we can keep it pretty simple, and just always report the elapse time for all commands ran that use the standard
To summarize. I think we can easily move forward with that, and then start to discuss further how we can approach the other topics. |
On an additional note I'm actually going to push back on the topology feature as well right now. Sorry for this being a bit messy. I should have requested an issue from the start so that we could have a discussion outside of the code. My concerns at the moment with regard to the topology is that it can be accomplished with a shell script fairly easily, and I don't see the exact benefits of the topology file at the moment due to the restrictiveness of the format, as it works with exactly N nodes. Let take a step back from this and open two issues to see how we can address the requests
I'll open these issues and bring some context in, we can then discuss them from there. |
The last commit reflects the discussion in #77 #76 . The user can run |
commands/run.go
Outdated
Name: "encoding", | ||
Usage: "Specify the output format, current options JSON and text", | ||
Value: "text", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add this as a global flag under the app in main.go so that it works for other commands. The format error checking below can take place in the Before
block as well.
commands/utils.go
Outdated
Node int | ||
Output testbedi.Output | ||
Error error | ||
Elapsed float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this as the time.Duration
over converting it to seconds in a float64 would probably be good for this internal structure. It will also make it easier to provide a more human readable format as Duration#String
handles that. The JSON
output can probably say a float64 and returned in seconds though being the SI unit and all.
commands/utils.go
Outdated
if rs.Output.Error() != nil { | ||
fmt.Printf("%s", rs.Output.Error()) | ||
if encoding == "text" { | ||
fmt.Printf("node[%d] exit %d Elapsed %.4fs \n", rs.Node, rs.Output.ExitCode(), rs.Elapsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Lowercase the E
here, and remove the extra space at the end.
Also with rs.Elapsed
being a duration, we can use %s
to get a human readable format out.
commands/utils.go
Outdated
ExitCode int | ||
Errors error | ||
PluginStdOut string | ||
PluginStdErr string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors
=> Error
as it's a single error.
nit: To keep the casing similar to other usages through the code and go in general I think we should use Stdout
and Stderr
here.
commands/utils.go
Outdated
} | ||
if b, err := ioutil.ReadAll(rs.Output.Stderr()); err == nil { | ||
pluginErr = string(b) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle the other side of this error, when the error is not nil.
commands/utils.go
Outdated
PluginStdOut: pluginOut, | ||
PluginStdErr: pluginErr, | ||
Elapsed: rs.Elapsed, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to also handle this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few comments around the implementation. I think once these are addressed we can move forward with this as we will have a pretty settled interface that I think will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Just need to handle an error, and fix a newline in a few places.
commands/utils.go
Outdated
} | ||
pluginErr := string(pluginErrB) | ||
// JSONify the plugin output | ||
rsJSON, _ := json.Marshal(output{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing this error
commands/utils.go
Outdated
|
||
io.Copy(os.Stdout, rs.Output.Stdout()) | ||
io.Copy(os.Stdout, rs.Output.Stderr()) | ||
|
||
fmt.Println() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this up inside of the text encoding section, and then also make sure we have a newline outside of the loop (it can be printed for all cases that are not text).
This will make it so that all output for a single command will be ld-json, and between each execution will be a newline. This will help break up the output if appending to the same file.
The output should look like if it where being appended to the same file
{"Node": 0, ...}
{"Node": 1, ...}
{"Node": 2, ...}
{"Node": 0, ...}
{"Node": 1, ...}
{"Node": 2, ...}
I handled the error cases mentioned and also the formatting is as specified when I append to a file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @davinci26 thank you for all the work you have done here!
It looks like this PR is trying to tackle more than the title suggests. It appears that an output encoding option has been added for both text and json, in addition to stats for the elapsed time of the start command. Typically changes of this nature require previous discussion wrt design (we apologize for not making this more clear in our README). Until we have more design discussion around these features, we will be unable to accept the changes here as we cannot yet determine if they align with the long-term strategy for IPTB.
Some discussion around this has already been started in #77, and we have opened #88 to start a discussion around a similar feature set. It would be great if we could get your help hammering out the design details of these things.
If our pace here is limiting the work you are trying to do, please feel free to fork this repo -- that could even serve as a means to prototype out some of your ideas so that we can better incorporate them with the design goals of IPTB
Hey @frrist, I was thinking about this as well, that this PR is getting too big and for no reason, as there was no design discussion prior to the beginning of the work. It was also an omission from my side for not opening an issue prior to the PR. Feel free to decline the PR. I have written my ideas in #77 under general thoughts and I am more than happy to help in designing meaningful event logs. |
I'm really excited that iptb is getting so much attention. I love a lot of the direction here, let's get things designed nicely and march onwards :) Thanks @davinci26 for pushing! One thing I always try to keep in mind in the design of tools is the unix philosophy, primarily "do one thing well". What that means here specifically, I'm not sure, but I'm thinking that we may end up adding a companion tool to iptb for metrics collection, and just ensuring that iptb doesnt prevent us from getting the information we need. This would allow us to more easily tune the stats we collect for each usecase. (similar design philosophy went into the plugins architecture) |
Hey y'all,
Thanks for the project it helped me a lot!
As discussed in IPFS issue board IPFS Performance #5226, I made some changes to the IPTB framework to generate measure the performance of IPFS and generate performance graphs. In detail I added the following functions in the framework:
origin:connection 1, connection 2 ...
where origin and connections are specified with their node ID.I also added a Python3 script to plot the results that adds an additional optional dependency in the project, Matplotlib.
Finally I created a readme file, simulation.md, that explains the logic of the simulation. I also added there the response of @whyrusleeping to the issue Simulate bad network #50 so people would know that there is support for bad network.
I would appreciate your feedback and any improvements suggestions :)