std::os::argparse module #1897

Open
wants to merge 14 commits into
base: master


alexveden
Contributor

std::os::argparse module

@hwchen
Contributor

hwchen commented Jan 27, 2025

I don't know if the API for std argparse has already been discussed (It's not obvious from a quick search of issues or looking at the test runner pr). If not, I've got opinions 😄 and code I'd be willing to donate. But if this has already been decided I don't want to derail.

@alexveden
Contributor Author

There are a bunch of argparse tests in the test runner PR, so you can get a sense of its API from there. Anyway, I'm open to ideas.

@lerno
Collaborator

lerno commented Jan 29, 2025

@hwchen did you have some feedback?

@hwchen
Contributor

hwchen commented Jan 31, 2025

Just want to be clear that I'm not really commenting on the current implementation. I'm more interested in whether there's a certain type of API we're looking for in an argparse module.

I come from Rust, and not C, so I'll explain in terms of those libraries.

  • Clap is very full featured. Help text generation, deriving parser using struct attributes (tags), explicit subcommands, validation, built-in API for parsing common types.
  • lexopt is very minimal, it only provides a stream of values/options.

The ripgrep crate moved away from clap to lexopt, in part to reduce dependencies, and also because lexopt would end up providing more control over arg parsing (at the cost of having to implement more boilerplate).

I feel that the current PR API sits between the two (more towards simplicity). I think for stdlib, I'd prefer either extreme; if it's simpler, more complex parsers can be built on it, and if it has more features it can be used easily as-is for more scenarios. Odin ended up with something more comprehensive (you can define opts using a struct with tags).

Also, I believe that wherever we want to sit on the spectrum, it's good to be explicit about it.


As for my own biases, I've written an arg parsing library for c3 which follows the general structure of lexopt's API. I might prefer something like it in the std library, but I can also see the appeal of other approaches. And seeing as everybody ends up writing their own argparse, there's probably a lot of other opinions out there too :)

@tomaskallup
Contributor

tomaskallup commented Feb 1, 2025

I have a bit of feedback on this.

I feel like the API is fine; it's exactly what it says it is, an argument parser. If something more like a full-blown CLI app API were needed (to have zero-hassle subcommands and whatnot), it could be in another module, which would use argparse under the hood.

What I currently don't see is a way to provide an array option, since from the implementation it would seem that providing a single option multiple times would result in an error of "duplicated option". The value of the option could be handled by the callback function from the looks of it.

The only other thing that came to mind was a bit more "hackability", for example if I wanted to somehow implement validation of a parameter, I would have to do it myself after the parsing and I would also have to write the extra help info (if it was for example an enum). But again, this could be solved by the wrapping module, which would hold the user's hand a bit more. Although my view is similar to hwchen's above, I feel like the current implementation here is good enough, and if one wants to opt out of some of the features, they still can (for example, the help option is opt-in).

Edit: I see now that the callback function can return optional, which makes the hackability possible for validation or exclusivity of options.

@alexveden
Contributor Author

> What I currently don't see is a way to provide an array option, since from the implementation it would seem that providing a single option multiple times would result in an error of "duplicated option". The value of the option could be handled by the callback function from the looks of it.

This is the kind of thing I was thinking about. I think it's common for CLIs to have accumulated values, e.g. -vvvv for verbosity levels. I didn't implement arrays because I wanted argparse to be non-allocating. But I think it may be a good idea to add multiple values, or at least make it possible with callbacks.

So by design, the callback mechanism is the way to extend the argparse to whatever is needed. I can refine callbacks and arrays of arguments after PR approval.
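For reference, this is how the two accumulation styles look in Python's argparse, the library this module is modelled on. It is Python rather than C3 and says nothing about the final C3 API; the option names are only illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="testprog")
# action="count": every occurrence of -v bumps a counter, so -vvvv gives 4.
parser.add_argument("-v", "--verbose", action="count", default=0)
# action="append": every occurrence of the option adds one value to a list.
parser.add_argument("-n", "--num", action="append", type=int, default=[])

opts = parser.parse_args(["-vvvv", "--num=1", "-n", "5", "--num", "2"])
print(opts.verbose, opts.num)  # prints: 4 [1, 5, 2]
```

The counter needs no allocation at all, while the list version is where the allocation question comes in, which is why pushing it into a user callback keeps the core parser non-allocating.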

> The only other thing that came to mind was a bit more "hackability", for example if I wanted to somehow implement validation of a parameter, I would have to do it myself after the parsing and I would also have to write the extra help info (if it was for example an enum).

All hackability is implemented via callbacks, or via explicit parameter validation after parsing in main (or another function). The argparse module still does simple validation, so if an option expects an int value and is given a string, it raises a validation error. More complex cases should be handled by the program, either via an argparse callback or in regular code after parsing completes.
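As a rough illustration of the split between built-in checks and callback validation, here is the same idea in Python's argparse (not the C3 module). The LEVELS tuple and the log_level function are hypothetical names made up for this sketch.

```python
import argparse

LEVELS = ("debug", "info", "warn", "error")  # hypothetical enum-like option

def log_level(value: str) -> str:
    """Per-option callback: validates while parsing instead of afterwards."""
    if value not in LEVELS:
        raise argparse.ArgumentTypeError(
            f"invalid level {value!r}, expected one of: {', '.join(LEVELS)}")
    return value

parser = argparse.ArgumentParser(prog="testprog")
parser.add_argument("--level", type=log_level, default="info")  # custom callback
parser.add_argument("--jobs", type=int, default=1)              # built-in int check

opts = parser.parse_args(["--level=warn", "--jobs", "4"])
print(opts.level, opts.jobs)  # prints: warn 4
```

The int check comes for free from the declared type, and anything richer (enum membership here) lives in the callback.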

@tomaskallup
Contributor

So for arrays, the arg would just need a simple flag such as multiple? It would also require you to use the callback.

Since now it would call the callback once and then error. I'm fine with arrays not being available by default and requiring custom implementation.

@alexveden
Contributor Author

FYI, I found array args impractical in most cases; I can barely remember anything I used with array args except maybe gcc :). For simple use, it's possible to use --flag plus an array of arguments.

@tomaskallup
Contributor

That's what most tools do, single flag with values separated by some character. But sometimes you might want those values to be arbitrary strings and there might not be a feasible separator, like when specifying ENV variables for docker etc.
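A small Python sketch of the difference (both helper functions are hypothetical and exist only for illustration): splitting on a separator breaks values that contain the separator, while a repeated option keeps each value intact.

```python
def split_values(raw: str, sep: str = ",") -> list[str]:
    """Single-flag style: one option, values separated by `sep`."""
    return raw.split(sep)

def collect_repeated(argv: list[str], flag: str) -> list[str]:
    """Repeated-option style: every `flag VALUE` pair contributes one value."""
    values = []
    i = 0
    while i < len(argv):
        if argv[i] == flag and i + 1 < len(argv):
            values.append(argv[i + 1])
            i += 2  # skip the flag and its value
        else:
            i += 1
    return values

# The second variable's value contains a comma, so splitting corrupts it.
print(split_values("A=1,B=x,y"))                               # ['A=1', 'B=x', 'y']
print(collect_repeated(["-e", "A=1", "-e", "B=x,y"], "-e"))    # ['A=1', 'B=x,y']
```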

@lerno
Collaborator

lerno commented Feb 4, 2025

I am sorry this one isn't looked at yet. It's half past midnight and I don't have the time this lib deserves to check it. I'll need to push it to the weekend.

@lerno
Collaborator

lerno commented Feb 5, 2025

Maybe I'm not the kind of audience who is using something like this, but for me it's more natural with a simpler design, as you might have guessed from the way build_options.c works.

It is quite simple: have a switch which looks at each arg, then if the arg starts with - it instead runs through the switch with - opts, and if it finds another - then that's a long opt and will be checked with the longopts.

This way checking is trivially stateful, which can be useful.

So the useful functionality is not parsing the arguments but rather:

  1. Skip an argument
  2. Check if a string (argument) is a valid file or directory
  3. Check if a string (argument) is an int
  4. Check if a string (argument) is one in a list of values, and return that index.
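To make the list concrete, here is a minimal sketch of those four helpers driving a stateful loop. It is written in Python rather than C or C3, and the option names (-O, --target) and the TARGETS list are invented for the example; it is not how build_options.c is actually written.

```python
import os

TARGETS = ["linux", "macos", "windows"]  # hypothetical list of allowed values

def is_path(arg: str) -> bool:
    """2. Is the argument an existing file or directory?"""
    return os.path.isfile(arg) or os.path.isdir(arg)

def as_int(arg: str):
    """3. Return the argument as an int, or None if it is not one."""
    try:
        return int(arg)
    except ValueError:
        return None

def index_in(arg: str, choices) -> int:
    """4. Return the index of the argument in `choices`, or -1."""
    return choices.index(arg) if arg in choices else -1

def parse(args):
    opts = {"level": 0, "target": -1, "inputs": []}
    i = 0
    while i < len(args):
        arg = args[i]
        i += 1
        if arg.startswith("--"):                      # long options
            if arg == "--target" and i < len(args):
                opts["target"] = index_in(args[i], TARGETS)
                i += 1                                # 1. skip the consumed value
            else:
                raise ValueError(f"unknown option {arg}")
        elif arg.startswith("-") and len(arg) > 1:    # short options
            if arg == "-O" and i < len(args) and as_int(args[i]) is not None:
                opts["level"] = as_int(args[i])
                i += 1
            else:
                raise ValueError(f"unknown option {arg}")
        else:
            opts["inputs"].append(arg)                # plain argument
    return opts

print(parse(["-O", "2", "--target", "macos", "main.c3"]))
# {'level': 2, 'target': 1, 'inputs': ['main.c3']}
```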

What are your thoughts?

@hwchen
Contributor

hwchen commented Feb 6, 2025

My library is also basically a big switch, but there's some additional lexing it does which I think is an improvement over just checking for - or --.

  • it's easy to group the long and short options as the same case
  • separator boilerplate: allows either ' ' or = (or unseparated short) for opt-value separator
  • handles short -abc for short flags with no values
  • lexing is fairly well tested
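A compact Python sketch of that kind of lexer, covering only the rules listed above. The class and method names are made up for this sketch and are not the opter-c3 API; the short option -s for shout is added so that stacking can be shown, and the sketch does not treat -- specially.

```python
class Lexer:
    def __init__(self, args):
        self.args = list(args)
        self.pos = 0
        self.pending = ""     # rest of a short cluster, e.g. "bc" in -abc
        self.attached = None  # value after '=' in --opt=value

    def next(self):
        """Return ("short", c), ("long", name), ("value", s), or None at the end."""
        if self.pending:
            c, self.pending = self.pending[0], self.pending[1:]
            return ("short", c)
        self.attached = None
        if self.pos >= len(self.args):
            return None
        arg = self.args[self.pos]
        self.pos += 1
        if arg.startswith("--") and len(arg) > 2:
            name, eq, val = arg[2:].partition("=")
            if eq:
                self.attached = val
            return ("long", name)
        if arg.startswith("-") and len(arg) > 1 and arg != "--":
            self.pending = arg[2:]
            return ("short", arg[1])
        return ("value", arg)

    def value(self):
        """Value of the option just returned: '=' form, unseparated short, or next arg."""
        if self.attached is not None:
            val, self.attached = self.attached, None
            return val
        if self.pending:
            val, self.pending = self.pending, ""
            return val[1:] if val.startswith("=") else val
        if self.pos >= len(self.args):
            raise ValueError("missing value")
        val = self.args[self.pos]
        self.pos += 1
        return val

def parse(args):
    opts = {"number": 0, "shout": False, "thing": None}
    lex = Lexer(args)
    while (tok := lex.next()) is not None:
        if tok in (("short", "n"), ("long", "number")):   # long and short share a case
            opts["number"] = int(lex.value())
        elif tok in (("short", "s"), ("long", "shout")):
            opts["shout"] = True
        elif tok[0] == "value":
            if opts["thing"] is None:
                opts["thing"] = tok[1]
        else:
            raise ValueError(f"unexpected argument {tok[1]!r}")
    return opts

print(parse(["-sn3", "world"]))  # {'number': 3, 'shout': True, 'thing': 'world'}
```

The lexer never needs to know which options take values: the caller decides by calling value() or not, which is what keeps the state in the switch.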

I had also added some convenience methods for parsing ints and paths, although I don't think they're really necessary.

I might change up the error handling a bit; I want some decent canned error messages, but I also want it to be easy to silence when the user wants custom error messages.

(Not sure how this will work on windows; lexopt had a decent amount of logic for handling windows)

Example switch in loop:

struct Opts {
	Maybe(<String>) thing;
	uint number;
	bool shout;
}

fn void! parse_cli(String[] args, Opts* opts) {
	Opter opter;
	opter.init(args);

	while (true) {
		Arg arg = opter.next()!;
		if (arg.type == EOF) break;
		switch {
			case arg.is(SHORT, 'n'):
			case arg.is(LONG, "number"):
				opts.number = opter.value()!.as_int(uint)!;

			case arg.is(LONG, "shout"):
				opts.shout = true;

			case arg.is(VALUE):
				if (!opts.thing.has_value) {
					opts.thing = { .value = arg.value.as_str(), .has_value = true };
				}

			case arg.is(LONG, "help"):
				io::printn("Usage: hello [-n|--number=NUM] [--shout] THING");

			default:
				return opter.err_unexpected_arg();
		}
	}
}

@lerno
Collaborator

lerno commented Feb 6, 2025

Just a comment there, this:

while (true) {
	Arg arg = opter.next()!;
	if (arg.type == EOF) break;
	switch {

Should be possible to rewrite as:

while (try arg = opter.next())
{
  switch { ... }
}

@lerno
Collaborator

lerno commented Feb 6, 2025

But I think the one benefit of a module is to create some uniform way of presenting the help. I think the actual parsing might be something of a red herring.

enum Options : (String short_opt, String long_opt, String description)
{
   UNKNOWN = { "", "", "" }, // Mandatory
   VALUE = { "", "", "The value" },
   N = { "n=NUM", "number=NUM", "The number to use" },
   SHOUT = { "", "shout", "Shout out" }   
}

In the above then, we can imagine:

@parse_opt("foo.exe [options]", strings, Options; @body(OptParser* parser, OptArg arg)
{
    switch (arg.type)
    {
        case UNKNOWN:
            // Handle anything else here
            // Example using another argument
            OptArg! next = parser.next();
            if (catch excuse = next)
            {
                return report_error("Expected an argument after %s.", arg.string);
            }
            io::printfn("The argument after was %s.", next.string);
        case VALUE: 
            // Value arguments are handled in UNKNOWN
            unreachable();
        case N:
            ...
        case SHOUT:
            ...            
    }
}    

Becoming:

Usage: foo.exe <VALUE> [options]
  <VALUE>                 The value
  -n, --number=NUM        The number to use
  --shout                 Shout out
  -h, --help              Show this help

But then we can imagine essentially a mini DSL defining things instead:

enum Options : (String spec)
{
   UNKNOWN = { "EMPTY" }, 
   VALUE = { "The value" },
   N = { "n=;number=;NUM;The number to use" },
   DEBUG = { "g;debug;Use debug" }, // -g, --debug
   SHOUT = { "shout;Shout out" }   
}

It all depends on how much is in the DSL and how much is in the language. It's nice if the description and the commands are in sync, and that's the basic service provided. On top of that, the module would provide those utility methods for matching multiple options and so on.
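A minimal Python sketch of that basic service, generating the help text from a single option table so that descriptions and flags cannot drift apart. The table mirrors the first enum above; the dictionary layout, the usage function, and the added HELP row are assumptions of this sketch, and the mandatory UNKNOWN entry is left out.

```python
# name -> (short_opt, long_opt, description), mirroring the first enum above
OPTIONS = {
    "VALUE": ("", "", "The value"),              # no flags: a positional value
    "N": ("n=NUM", "number=NUM", "The number to use"),
    "SHOUT": ("", "shout", "Shout out"),
    "HELP": ("h", "help", "Show this help"),     # added for the sketch
}

def usage(prog, options):
    positionals = [name for name, (s, l, _) in options.items() if not s and not l]
    head = " ".join([prog] + [f"<{name}>" for name in positionals] + ["[options]"])
    lines = [f"Usage: {head}"]
    for name, (short, long, desc) in options.items():
        if not short and not long:
            left = f"<{name}>"
        else:
            flags = []
            if short:
                # Show the value placeholder once, on the long form if there is one.
                flags.append("-" + (short.split("=")[0] if long else short))
            if long:
                flags.append("--" + long)
            left = ", ".join(flags)
        lines.append(f"  {left:<24}{desc}")
    return "\n".join(lines)

print(usage("foo.exe", OPTIONS))
```

With this table the output has the same shape as the usage text shown earlier in this comment.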

@alexveden
Contributor Author

alexveden commented Feb 6, 2025

Let me explain the reasoning behind my design decisions for argparse. First, it's designed in the flavor of Python's argparse library, which I have used in many projects and find handy.

C3 argparse is designed with the following requirements in mind:

  • It abstracts the grunt work of matching argument types, parsing values, and running sanity checks (e.g. an int option accepts "123" but not "asd").
  • It doesn't allocate memory. The ArgParse structure is self-contained: data is stored in .values and the program is responsible for its lifetime.
  • It automatically checks whether a required option is present.
  • It doesn't require creating an extra type dedicated only to parsing.
  • It generates the help printout under the hood.
  • It supports default values: the value of the C3 variable passed to the argparse settings struct is the default.
  • It detects the option type from the type of the .value= item, e.g. for .value = &foo, if foo is a bool the option is a flag, if it is an int the option is numeric, and likewise for String and other types. The type selects the validation method without any extra configuration from the user.
  • It separates non-option (positional) arguments and stores them as a separate array of strings.
  • It supports --, which in POSIX makes all following arguments (even those with a --... prefix) plain arguments.
  • It supports both -s and --long option syntax.
  • It supports short stacking: -sev is equal to -s -e -v.
  • It supports "=" in long options: both --long-opt=value and --long-opt value work.
  • It supports subcommands via early stopping. For example, given main.exe --foo command --bar -b -f, a special flag makes parsing stop at command --bar -b -f, so you can pass that array of arguments into another argparse instance that handles command, while the --foo option goes to the main argparse.
  • For extensibility, you can pass a callback function to argparse if you need custom validation or fancy argument parsing.
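To make a few of these points concrete, here is a small Python sketch (not the C3 implementation) of type-driven options, short stacking, the -- terminator, and early stopping for subcommands. The parse function, its options table, and the stop_at_command parameter are hypothetical names for this sketch. Python has no equivalent of .value = &foo, so the sketch returns a dictionary instead of writing through pointers, and it leaves out required-option checks and help output.

```python
def parse(argv, options, stop_at_command=False):
    """options maps long name -> (short char, default); the default's type drives parsing."""
    values = {name: default for name, (_short, default) in options.items()}
    shorts = {short: name for name, (short, _default) in options.items() if short}
    rest = []
    queue = list(argv[1:])  # argv[0] is the program name
    while queue:
        arg = queue.pop(0)
        if arg == "--":                      # everything after -- is a plain argument
            rest += queue
            break
        inline = None
        if arg.startswith("--"):
            name, eq, raw = arg[2:].partition("=")
            if eq:
                inline = raw                 # --long-opt=value
        elif arg.startswith("-") and len(arg) > 1:
            name = shorts.get(arg[1])
            tail = arg[2:]
            if name is not None and tail:
                if isinstance(options[name][1], bool):
                    queue.insert(0, "-" + tail)   # -sev -> -s, then -ev
                else:
                    inline = tail                 # -n5
        else:
            if stop_at_command:              # hand the remainder to a sub-parser
                rest += [arg] + queue
                break
            rest.append(arg)
            continue
        if name not in options:
            raise ValueError(f"unknown option {arg}")
        default = options[name][1]
        if isinstance(default, bool):        # bool default: the option is a flag
            values[name] = True
            continue
        if inline is None:
            if not queue:
                raise ValueError(f"{arg} needs a value")
            inline = queue.pop(0)            # --long-opt value
        # int default: numeric option (int() rejects "asd"); anything else stays a string
        values[name] = int(inline) if isinstance(default, int) else inline
    return values, rest

OPTIONS = {"shout": ("s", False), "echo": ("e", False), "verbose": ("v", False),
           "num": ("n", 1), "name": ("", "world")}
print(parse(["prog", "-sev", "--num=3", "--name", "c3", "file", "--", "--raw"], OPTIONS))
```

Subcommands then fall out of early stopping: parse the main options with stop_at_command=True and pass the returned remainder to a second call with the subcommand's own table.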

@hwchen's alternative library looks more like syntactic sugar to me: it requires working through each argument separately and writing the help printout manually.

To have a fair comparison, we need to implement the same functionality using the alternative argparse libraries and judge which one gives cleaner, shorter, or more flexible code. The comparison should cover different use cases, different option and argument types, and maybe subcommands.

@hwchen
Contributor

hwchen commented Feb 6, 2025

Not sure how I feel about it, but I implemented @lerno 's suggested API on top of my library (without help generation yet; I don't think that's the difficult part).

Implementation: https://github.com/hwchen/opter-c3/blob/691e5b19ba5cc8e45418f7e717c1351072f083f3/fancy.c3
Example: https://github.com/hwchen/opter-c3/blob/691e5b19ba5cc8e45418f7e717c1351072f083f3/examples/fancy.c3

I think this demonstrates one perspective on my library: it's the minimum amount of lexing/parsing code that's still actually helpful. With the basic lexing/parsing out of the way, it's easy to focus on adding features if building a more advanced cli library, or handling some really twisted config logic in an application. To me, this gives a feeling of "hackability", which I associate with C3.

So if we end up taking this general approach of just a switch statement (and using some of my code), I think it would be nice if the "lower-level" was still exposed, even if there was a higher-level API.

@alexveden
Contributor Author

Example of using callbacks, in two flavors:

fn void test_custom_type_callback_unknown_type()
{

    char val = '\0';
    ArgParseCallbackFn cbf = fn void! (ArgOpt* opt, String value) {
        io::printfn("flt--callback");
        test::eq(value, "bar");
        *anycast(opt.value, char)! = value[0];
    };
    // NOTE: pretends app struct
    ArgParse my_app_state = {};

    ArgParse agp = {
        .options = {
            {
                .short_name = 'f',
                .long_name = "flt",
                .value = &val,
                .callback = cbf
            },
            {
                .short_name = 'o',
                .long_name = "other",
                .value = &my_app_state,
                .callback = fn void! (ArgOpt* opt, String value) {
                    ArgParse* ctx = anycast(opt.value, ArgParse)!;
                    io::printfn("other--callback");
                    // NOTE: pretends to update important app struct
                    ctx.usage = value;
                }
            }
        }
    };

    String[] args = { "testprog", "--flt=bar", "--other=my_callback" };
    test::eq(val, '\0');
    test::eq(agp.options[0]._is_present, false);

    agp.parse(args)!!;

    test::eq(val, 'b');
    test::eq(my_app_state.usage, "my_callback");
    test::eq(agp.options[0]._is_present, true);
}

@alexveden
Contributor Author

@tomaskallup With small amendments I managed to add multiple options parsing via callbacks

fn void test_custom_type_callback_int_accumulator()
{

    List(<int>) numbers;
    numbers.new_init(); 
    defer numbers.free();

    argparse::ArgParse agp = {
        .options = {
            {
                .short_name = 'n',
                .long_name = "num",
                .value = &numbers,
                .callback = fn void! (ArgOpt* opt, String value) {
                    io::printfn("value: %s", value);
                    List(<int>) * ctx = anycast(opt.value, List(<int>))!;
                    int val = value.to_integer(int)!;
                    ctx.push(val);
                }
            },
        }
    };

    String[] args = { "testprog", "--num=1", "-n", "5", "--num", "2" };
    test::eq(numbers.len(), 0);
    test::eq(agp.options[0]._is_present, false);

    agp.parse(args)!!;

    io::printfn("%s", numbers);
    test::eq(numbers.len(), 3);
    test::eq(numbers[0], 1);
    test::eq(numbers[1], 5);
    test::eq(numbers[2], 2);
    test::eq(agp.options[0]._is_present, true);
}

value: 1
value: 5
value: 2
[1, 5, 2]

@alexveden
Contributor Author

Ok guys, if you wish moar control, why not, it's easy to add to ArgParse too :) I added ArgParse.next(), which is a bare-metal argument processor with minimal intervention. The only thing it does is split --foo=3 into 2 consecutive calls, returning --foo and 3 separately.

This is an alternative way of parsing args (while preserving old path, so you can pick whichever you like more):

fn void test_argparse_next_all_together()
{
    List(<String>) args;
    args.new_init(); 
    defer args.free();

    String[] argv = { "testprog", "-n", "5", "--num", "2", "--foo=3", "--", "-fex", "-I./foo"};

    ArgParse agp;
    while (String arg = agp.next(argv)!!) {
        args.push(arg);
    }
    io::printfn("%s", args);
    // prints
    // [-n, 5, --num, 2, --foo, 3, --, -fex, -I./foo]
 
}


fn void test_argparse_next_switch()
{

    List(<String>) args;
    args.new_init(); 
    defer args.free();

    ArgParse agp = {
        .description = "Test number sum program",
        .options = {
            {
                .short_name = 'a',
                .help = "a short name flag"
            },
            {
                .long_name = "flag",
                .help = "a long name flag"
            },
        }
    };
    String[] argv = { "testprog", "-n", "5", "--num", "2", "--num=3"};
    int n_sum = 0;  // expected to be 5+2+3
    while (String arg = agp.next(argv)!!) {
        switch (arg) {
            case "-n":
            case "--num":
                String value = agp.next(argv)!!;
                n_sum += value.to_integer(int)!!;
            default:
                // you may use ArgParse options only for usage display as well
                // OR make your own usage printout
                agp.print_usage()!!;
                test::eq(1, 0); // unexpected here
        }
        args.push(arg);
    }

    io::printfn("%s", args);
    test::eq(args.len(), 3);
    // NOTE: we skipped values in switch "--num" handler
    test::eq(args[0], "-n");
    test::eq(args[1], "--num");
    test::eq(args[2], "--num");
    test::eq(n_sum, 5+2+3);
}
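The behaviour of next() described above can be sketched as a Python generator. This is an illustration, not the C3 implementation; next_args is a made-up name, and passing everything after -- through untouched is an assumption that happens to match the printed output in the first test.

```python
def next_args(argv):
    """Yield argv[1:] one token at a time, splitting --name=value into two tokens."""
    passthrough = False
    for arg in argv[1:]:
        if passthrough:
            yield arg                       # after --, nothing is split
        elif arg == "--":
            passthrough = True
            yield arg
        elif arg.startswith("--") and "=" in arg:
            name, _, value = arg.partition("=")
            yield name
            yield value
        else:
            yield arg

argv = ["testprog", "-n", "5", "--num", "2", "--foo=3", "--", "-fex", "-I./foo"]
print(list(next_args(argv)))
# ['-n', '5', '--num', '2', '--foo', '3', '--', '-fex', '-I./foo']

# The switch-style loop: the handler pulls its own value from the same stream.
tokens = next_args(["testprog", "-n", "5", "--num", "2", "--num=3"])
n_sum = 0
for arg in tokens:
    if arg in ("-n", "--num"):
        n_sum += int(next(tokens))
print(n_sum)  # 10
```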

@alexveden
Contributor Author

I'm currently evolving the argparse module as a side project. If you greenlight this, I'll add the updated version plus unit tests.


4 participants