Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input file with list of commands to run async on nodes #75

Merged
merged 15 commits into from
Sep 14, 2018

Conversation

dgrisham
Copy link
Member

@dgrisham dgrisham commented Sep 5, 2018

I have an application where I want to easily run commands asynchronously on nodes. While the new async update to iptb run is awesome and covers some of my use cases, there are others where I want to asynchronously run different commands on nodes. This adds that functionality.

Usage: Pass --cmdFile <cmds> to iptb run, where <cmds> is a file where each line is of the form <nodeRange>: <cmd> -- <cmd> is the command to run on all of the nodes in <nodeRange>. Each <nodeRange> should have the same format as that accepted by commands.parseRange().

Example <cmds> file:

0: echo hi
[1-3]: echo bye
[4-5]: echo hello

It is not necessary that all nodes are account for, and nodes can also appear more than once. For example, if there are 3 nodes, we can do:

0: echo hi
[0,2]: echo bye

I'm curious what others' thoughts are on the utility of this feature, and the implementation.

@travisperson
Copy link
Member

I want to asynchronously run different commands on nodes

I can see a use case for this, or at least can see that trying to accomplish this might be difficult from the cli or a shell script.

This is sort of the same idea as the topology idea in #65 / #76. The issue here though is that you are looking for all commands to start executing at or around the same time.

I think we can solve two problems here if we make it a bit more generic in our implementation, that would allow for it to also be used in the similar way to connect nodes.

A generic way to approach this would be to make every line of the file be the arguments to the command, after flags.

run

0 -- echo "Running on node 0"
[1-3] -- echo "Running on node 1,2,3"
[4-5] -- echo "Running on node 4,5"
-- echo "Running on all nodes"

connect

0 1
[2-4] [5-8]

The basic implementation to use across supported commands would be to treat everything like it was coming from a file, but the single use on the cli, would just be a file with a single line. Only the arguments section of the command would be contained in the file. No flags could be passed on a per line basis.

What are your thoughts @dgrisham?

/cc @davinci26

@dgrisham
Copy link
Member Author

dgrisham commented Sep 6, 2018

@travisperson Yeah, that all sounds great to me. I did feel like my implementation could be a bit cleaner in its connection with the CLI/existing implementation, and I think your suggestion addresses that. I can pull together an implementation of this and we can move forward from there?

@dgrisham
Copy link
Member Author

dgrisham commented Sep 6, 2018

Also -- are you thinking of having the same kind of interface as I initially suggested? Specifically having a --cmdFile flag for iptb run (and, I suppose, a --topology flag for iptb connect)?

@dgrisham
Copy link
Member Author

dgrisham commented Sep 6, 2018

That last commit reflects the approach you suggested, @travisperson.

@davinci26
Copy link

I really like the idea in general, I think it is nice superset of #76 . I would really like to see comment lines starting with # or // which is not a big overhead.

In terms of weight, if both parsers are implemented correctly and tested thoroughly then since they are an overlay to IPTB I think they will require minimal maintanance.

This was referenced Sep 6, 2018
@travisperson
Copy link
Member

I would really like to see comment lines starting with # or // which is not a big overhead.

I think we can support that. I would be an advocate for #.

Specifically having a --cmdFile flag for iptb run (and, I suppose, a --topology flag for iptb connect)?

I'm wondering what your thoughts on using stdin might be? I like the approach of using stdin because it ends up being really clean for iptb when used within a shell script (common for testing), but also supports passing a file when needed easily.

iptb run <<CMD
0     -- echo "Running on node 0"
[1-3] -- echo "Running on node 1,2,3"
CMD
iptb run < cmds.txt

Regardless I would like both run and connect to support the same usage because they both will be collecting a list of arguments to run. I think this pattern might be expanded further to other commands as well and it will be nice to keep them all similar.

We do loose the stdin for any future use, but at the moment I don't think there is a need for it. It would be useful for iptb run to pass a stdin to the command being ran on the node. However, that is tricky in itself as we'd need to buffer the entire input to pass along to multiple calls to RunCmd.

Also, with this PR, we'd want to be able to control stdin differently as you'd want to be able to supply the stdin for each line. Something we can look at down the road I think.

@dgrisham
Copy link
Member Author

dgrisham commented Sep 7, 2018

@travisperson

I'm wondering what your thoughts on using stdin might be?

I love the idea -- cleans up the interface even more and seems like a very natural approach (moreso than --cmdFile). I'm taking a stab at the implementation now, it's not immediately obvious to me what the best way to handle this code-wise is but that's what code reviews are for :)

@dgrisham
Copy link
Member Author

dgrisham commented Sep 7, 2018

Attempt made, seems to work well as far as I can tell.

@dgrisham
Copy link
Member Author

dgrisham commented Sep 7, 2018

Note on implementation: iptb run will wait for STDIN input if ran with no args and STDIN is initially empty. I think that's because I used a bufio.Scanner instead of bufio.Reader to read STDIN. Is this acceptable behavior? iptb run --help can still access the usage info, but it's overall inconsistent with how other commands response in similar cases. I think we can use bufio.Reader to check whether there's buffered input, and if not we can print the usage, which seems better to me.

Edit: Hm, after playing around with bufio.Reader it seems like that might not actually work here.

@travisperson
Copy link
Member

I think we can use bufio.Reader to check whether there's buffered input, and if not we can print the usage, which seems better to me.

Ya, there are a few ways to check if stdin has data. It's a bit tricky though to do it for all platforms nicely. There are some packages that handle it all (https://github.com/mattn/go-isatty). I want to go find how the go-ipfs project does it though it would be good to do it the same way.

wait for STDIN input if ran with no args

I think this is okay for now, works well enough for this branch at least at the moment till we can figure out how go-ipfs does it.

Have you been able to run an example? I tried and was not getting all the output I expected. I think we will need to do some scope management to make sure everything executes correctly.

@dgrisham
Copy link
Member Author

dgrisham commented Sep 7, 2018

Have you been able to run an example?

Yeah, it works in the form iptb run <cmds to me. And the example you posted before also works:

$ iptb run <<CMD
0     -- echo "Running on node 0"
[1-3] -- echo "Running on node 1,2,3"
CMD
node[0] exit 0

"Running on node 0"

node[1] exit 0

"Running on node 1,2,3"

node[2] exit 0

"Running on node 1,2,3"

node[3] exit 0

"Running on node 1,2,3"

@dgrisham
Copy link
Member Author

dgrisham commented Sep 7, 2018

Also, because I don't want to forget -- it's worth considering how commands.mapListWithOutput() is implemented. It wraps commands.mapWithOutput(), which means that it's hierarchical. So, for example, if you run:

iptb run <<CMD
[0,1] -- echo "Running on nodes 0, 1"
[2, 3] -- echo "Running on nodes 2, 3"
CMD

then mapListWithOutput() will start a goroutine that runs mapWithOutput() on [0,1], then another that runs on [2,3]. Each of those goroutines will then spawn two more goroutines, one for each of the nodes they're responsible for, that actually run the commands on each node. I think this is probably fine, but wanted to bring it up in case another approach would be preferred -- e.g. having mapListWithOutput() spawn the goroutines for all of the nodes directly that run the commands.

@dgrisham dgrisham force-pushed the feat/async-run-list branch 2 times, most recently from c61747a to 019ae29 Compare September 7, 2018 06:04
@travisperson
Copy link
Member

Ah thanks, I figured out what I was doing wrong.

commands/run.go Outdated
cArgsStr[i] = arg
}
args = append(args, cArgsStr)
terminatorPresent = append(terminatorPresent, c.IsSet("terminator"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I think we can do here to make this simpler is to normalize everything to an io.Reader that we can pass to bufio.NewScanner regardless of using stdin, or passing through arguments.

So for the non-stdin case, we construct an io.Reader using strings.NewReader, we can then just hand it to bufio.NewScanner and let it be handled the exact same way, allowing us to move that scanner logic out, and into the main execution path. This also helps clean up the some of the terminator logic, as we can just assume everything will have it present (we can inject it back in on the args case).

We only need the terminator flag for the cli args case, as the cli library will swallow it, in the stdin case, the flag terminator is always present (if the user supplied it, if they didn't it's an error anyways).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I like the abstraction but to make it happen I think we would have to reconstruct the input argument into a single string in order to pass it to strings.NewReader in the non-stdin case. This means that cli would break up the string for us, we'd put it back together, then take it apart again once we processes the command. Seems a bit roundabout, but I'm not sure how else to make that happen since we don't seem to have access to the original string once we're in the Action function.

Copy link
Member Author

@dgrisham dgrisham Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it necessarily the case that not passing -- should result in an error? Seems like the following could be supported:

iptb run <<CMD
0 "Running on node 0"
echo "Running on all nodes"
CMD

Depends on what we want here, though.

Edit Scratch that, I just realized that iptb run echo "Running on all nodes" was never supported in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that cli would break up the string for us, we'd put it back together, then take it apart again once we processes the command. Seems a bit roundabout

Ya, it does feel a bit weird, but it does provide a really simple abstraction. We don't need to do it though, as long as we think the approach we take seems simple and will be fairly straight forward to reuse.

Also, is it necessarily the case that not passing -- should result in an error?

It would be nice to make any line of the cmd file be valid arguments to the command itself. It makes it easy to copy a single line, and does not require anything special if a user decided they want to quickly tests running each line of the cmd as a separate iptb run by reading each line on their own and applying it to iptb run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the abstraction is worth it in this case, working on it now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And re:

It would be nice to make any line of the cmd file be valid arguments to the command itself.

Agreed with that, I was going for the same but was forgetting how iptb run worked.

@dgrisham
Copy link
Member Author

dgrisham commented Sep 7, 2018

@travisperson I still want to discuss the reader abstraction point you mentioned, just cleaned up the terminator logic for now.

commands/run.go Outdated
var args [][]string
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
tokens := strings.Fields(scanner.Text())
Copy link
Member Author

@dgrisham dgrisham Sep 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually bad idea -- for example, when you run iptb run 0 -- sh -c 'echo hello >file', I think the 'echo hello >file' portion needs to be passed as a single argument to the this exec function. However, the strings.Fields() function splits on spaces, which means 'echo, hello, and >file' get passed separately. Because of this, commands like that end up failing. I'm not quite sure how to handle this at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc'ing @travisperson for thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to use something like this: https://github.com/kballard/go-shellquote

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 ended up going with https://github.com/mattn/go-shellwords instead (based on code/tests), same idea though.

@dgrisham
Copy link
Member Author

dgrisham commented Sep 9, 2018

Something I want to note here because it's kind of obscured in the outdated review comments now: the current version of this introduces a new dependency, https://github.com/mattn/go-shellwords. I'm not sure what the protocol is for doing something like that. There aren't a lot of packages dedicated to parsing shell strings -- the other one we found was https://github.com/kballard/go-shellquote. I chose the former package because of the number of tests it had, and for the coding style.

@travisperson
Copy link
Member

I chose the former package because of the number of tests it had, and for the coding style.

👍

the current version of this introduces a new dependency, https://github.com/mattn/go-shellwords. I'm not sure what the protocol is for doing something like that.

It would be nice to have it gxed. This basically just takes forking and adding a package.json.

Also with regard to an earlier discussion, checking if stdin has data, there is a gxed package: https://github.com/gxed/go-isatty

@dgrisham
Copy link
Member Author

dgrisham commented Sep 11, 2018

@travisperson I gx'ed the package here (moved to a different repo, see edit below) and added it to our package.json.

I also found a way to check for data on stdin in a pretty straightforward way, just using Go's os package. The current implementation allows for each of these usages:

$ iptb run 0 -- echo hi
$ iptb run <<CMD
0 -- echo hi
CMD
$ echo '0 -- echo hi' | iptb run

but gives an error if the input is just:

$ iptb run
error: no command input and stdin is empty

It basically just checks if there's something to read on stdin, or if there's an open pipe to stdin. If neither are the case, it prints an error.

Edit: Gxed packaged is now at https://github.com/gxed/go-shellwords.

Edit 2: The error output is now:

$ iptb run
NAME:
   iptb run - run command on specified nodes (or all)

USAGE:
   iptb run [nodes] -- <command...>

CATEGORY:
   CORE

error: no command input and stdin is empty

commands/run.go Outdated Show resolved Hide resolved
Usage: Pass `--cmdFile <cmds>` to `iptb run`, where `<cmds>` is a file where
each line is of the form `<nodeRange>: <cmd>` -- `<cmd>` is the command to
run on all of the nodes in `<nodeRange>`. The format of `<nodeRange>` should be
anything accepted by the `commands.parseRange()` function.

License: MIT
Signed-off-by: David Grisham <[email protected]>

rebase me
License: MIT
Signed-off-by: David Grisham <[email protected]>
License: MIT
Signed-off-by: David Grisham <[email protected]>
License: MIT
Signed-off-by: David Grisham <[email protected]>
Signed-off-by: David Grisham <[email protected]>
License: MIT
License: MIT
Signed-off-by: David Grisham <[email protected]>
License: MIT
Signed-off-by: David Grisham <[email protected]>
Also mattn/go-shellwords -> gxed/go-shellwords.

License: MIT
Signed-off-by: David Grisham <[email protected]>
License: MIT
Signed-off-by: David Grisham <[email protected]>
License: MIT
Signed-off-by: David Grisham <[email protected]>
License: MIT
Signed-off-by: David Grisham <[email protected]>
@dgrisham
Copy link
Member Author

@travisperson I think I'm happy with this in its current state. If you have a chance/haven't already, maybe make sure you're happy with the mapListWithOutput function I introduced and let me know if there's anything else you see that you think should be changed. Once we verify that we're happy with how this works, I'll update the documentation to elaborate a bit on how this works.

Copy link
Member

@travisperson travisperson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some race concerns. Otherwise I think this is good to go. When we extend it to some other commands we can possibility look at breaking it up some more. No need to go down that route yet though I think.

wg.Add(1)
go func() {
defer wg.Done()
results_i, err := mapWithOutput(list, nodes, fns[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we want to pass list and i in as parameters. I'm pretty sure this is a race issue. On small workloads it probably won't show up, but when things actually start to get schedule the value of i or list may have changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also something we need to make clear in our plugin documentation (none existed at the moment) is that they need to be thread safe, as now with this format the same node could be access at the same time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, this is my first time using goroutines and I'm still getting used to stuff like this 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And gotcha on the documentation, I can throw that in when I do this rest of the documentation as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, just realized you said that there isn't any plugin documentation at all. I'll at least make an issue to remind us.

@dgrisham
Copy link
Member Author

@travisperson Race condition fixed, and documentation added.

Copy link
Member

@travisperson travisperson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dgrisham
Copy link
Member Author

dgrisham commented Sep 14, 2018

@travisperson Hey, just caught an inconsistency -- we decided to move the check for stdin/pipe input from Action to Before (here). This means that running only iptb run prints the usage/description/etc. and, after all of that, the error output.

By contrast, iptb shell just prints the error itself, and not usage/etc.:

$ iptb shell
Usage Error: shell takes exactly 1 argument

These seem like they should behave the same, do you have a preference as to which one should change?

Edit: More info, iptb testbed and iptb attr print out usage/etc., while iptb metric just prints out an error like iptb shell. I'll have iptb run mirror testbed and attr, and we can discuss standardizing this in a different issue.

@travisperson
Copy link
Member

More info, iptb testbed and iptb attr print out usage/etc., while iptb metric just prints out an error like iptb shell

Anything that is printing usage is due to having sub commands.

There is some commented out code in main.go that handles the inconsistency by supporting a UsageError error but its dependent on some code which we haven't updated too yet, and it also has a bug at the moment.

More Info

@dgrisham
Copy link
Member Author

dgrisham commented Sep 14, 2018

Ah, okay, I see. I guess that means I shouldn't have wrapped my error in NewUsageError, because when this change is applied it would end up printing the iptb run command help twice. Thanks for the explanation!

@dgrisham dgrisham merged commit 7928974 into ipfs:master Sep 14, 2018
@dgrisham dgrisham deleted the feat/async-run-list branch September 14, 2018 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants