std::os::argparse module #1897

alexveden · 2025-01-27T13:46:16Z

std::os::argsparse module

hwchen · 2025-01-27T15:49:18Z

I don't know if the API for std argparse has already been discussed (It's not obvious from a quick search of issues or looking at the test runner pr). If not, I've got opinions 😄 and code I'd be willing to donate. But if this has already been decided I don't want to derail.

alexveden · 2025-01-27T16:09:59Z

There are a bunch of tests of argparse there in the test runner PR. So you may try to get a sense of it API. Anyway, I'm open for ideas.

lerno · 2025-01-29T20:46:09Z

@hwchen did you have some feedback?

hwchen · 2025-01-31T04:39:01Z

Just want to be clear that I'm not really commenting on the the current implementation. I'm more interested in whether there's a certain type of API we're looking for in an argparse module.

I come from Rust, and not C, so I'll explain in terms of those libraries.

Clap is very full featured. Help text generation, deriving parser using struct attributes (tags), explicit subcommands, validation, built-in API for parsing common types.
lexopt is very minimal, it only provides a stream of values/options.

The ripgrep crate moved away from clap to lexopt, in part to reduce dependencies, and also because lexopt would end up providing more control over arg parsing (at the cost of having to implement more boilerplate).

I feel that the current PR API sits between the two (more towards simplicity). I think for stdlib, I'd prefer either extreme; if it's simpler, more complex parsers can be built on it, and if it has more features it can be used easily as-is for more scenarios. Odin ended up with something more comprehensive (can defined opts using a struct with tags).

Also, I believe that wherever we want to sit on the spectrum, it's good to be explicit about it.

As for my own biases, I've written an arg parsing library for c3 which follows the general structure of lexopt's API. I might prefer something like it in the std library, but I can also see the appeal of other approaches. And seeing as everybody ends up writing their own argparse, there's probably a lot of other opinions out there too :)

tomaskallup · 2025-02-01T21:43:36Z

I have a bit of feedback on this.

I feel like the API is fine, it's exactly what it says that it is, argument parser. If something more like a full blown CLI app API would be needed (to have 0 hassle subcommands and what not) I could be in another module, which would utilize argparse under the hood.

What I currently don't see is a way to provide an array option, since from the implementation it would seem that providing a single option multiple times would result in an error of "duplicated option". The value of the option could be handled by the callback function from the looks of it.

The only other thing that came to mind was a bit more "hackability", for example if I wanted to somehow implement validation of a parameter, I would have to do it myself after the parsing and I would also have to write the extra help info (if it was for example an enum). But again, this could be solved by the wrapping module, which would hold the users hand a bit more. Altough my view is similar to hwchens' above, I feel like the current implementation here is good enough and if one wants to opt-out of some of the features, they still can (for example the help option is opt-in).

Edit: I see now that the callback function can return optional, which makes the hackability possible for validation or exclusivity of options.

alexveden · 2025-02-02T13:39:57Z

What I currently don't see is a way to provide an array option, since from the implementation it would seem that providing a single option multiple times would result in an error of "duplicated option". The value of the option could be handled by the callback function from the looks of it.

This is a kind a thing I was thinking about. I think it's common for CLI to have accumulated values, e.g. -vvvv for verbosity levels. I didn't implement arrays, because I wanted to have argparse non-allocating. But I think it may be a good idea to add multiple values, at least make it possible to do it with callbacks.

So by design, the callback mechanism is the way to extend the argparse to whatever is needed. I can refine callbacks and arrays of arguments after PR approval.

The only other thing that came to mind was a bit more "hackability", for example if I wanted to somehow implement validation of a parameter, I would have to do it myself after the parsing and I would also have to write the extra help info (if it was for example an enum).

All hackability is implemented via callbacks, or explicit param validation after parsing in the main (or other function). argparse module still does simple validation, so if you expect int type in the option value and given a string, it will raise validation error. More complex cases, should be handled by the program via callback of argparse, or after parse completes in regular code.

tomaskallup · 2025-02-02T14:08:56Z

So for the arrays, just a simple flag multiple would be needed for the arg? Also requiring you to use the callback.

Since now it would call the callback once and then error. I'm fine with arrays not being available by default and requiring custom implementation.

alexveden · 2025-02-02T15:48:22Z

FYI, I found array args impractical in most cases, I barely can remember anything I used with array args except maybe gcc :). For simple use, it's possible to use --flag + array of arguments

tomaskallup · 2025-02-02T19:26:35Z

That's what most tools do, single flag with values separated by some character. But sometimes you might want those values to be arbitrary strings and there might not be a feasible separator, like when specifying ENV variables for docker etc.

lerno · 2025-02-04T23:33:19Z

I am sorry this one isn't looked at yet. It's half past midnight and I don't have the time this lib deserves to check it. I'll need to push it to the weekend.

lerno · 2025-02-05T23:39:11Z

Maybe I'm not the kind of audience who is using something like this, but for me it's more natural with a simpler design, as you might have guessed from the way build_options.c work.

It is quite simple: have a switch which looks at each arg, then if the arg starts with - it instead runs through the switch with - opts, and if it finds another - then that's a long opt and will be checked with the longopts.

This way checking is trivially stateful, which can be useful.

So the useful functionality is not parsing the arguments but rather:

Skip an argument
Check if a string (argument) is a vaild file or directory
Check if a string (argument) is an int
Check if a string (argument) is one in a list of values, and return that index.

What are your thoughts?

hwchen · 2025-02-06T02:46:22Z

My library is also basically a big switch, but there's some additional lexing it does which I think is an improvement over just checking for - or --.

it's easy to group the long and short options as the same case
separator boilerplate: allows either ' ' or = (or unseparated short) for opt-value separator
handles short -abc for short flags with no values
lexing is fairly well tested

I had also added some convenience methods for parsing ints and paths, although I don't think they're really necessary.

I might change up the error handling a bit; I want some decent canned error messages, but I also want it to be easy to silence when the user wants custom error messages.

(Not sure how this will work on windows; lexopt had a decent amount of logic for handling windows)

Example switch in loop:

struct Opts {
	Maybe(<String>) thing;
	uint number;
	bool shout;
}

fn void! parse_cli(String[] args, Opts* opts) {
	Opter opter;
	opter.init(args);

	while (true) {
		Arg arg = opter.next()!;
		if (arg.type == EOF) break;
		switch {
			case arg.is(SHORT, 'n'):
			case arg.is(LONG, "number"):
				opts.number = opter.value()!.as_int(uint)!;

			case arg.is(LONG, "shout"):
				opts.shout = true;

			case arg.is(VALUE):
				if (!opts.thing.has_value) {
					opts.thing = { .value = arg.value.as_str(), .has_value = true };
				}

			case arg.is(LONG, "help"):
				io::printn("Usage: hello [-n|--number=NUM] [--shout] THING");

			default:
				return opter.err_unexpected_arg();
		}
	}
}

lerno · 2025-02-06T11:00:16Z

Just a comment there, this:

while (true) {
		Arg arg = opter.next()!;
		if (arg.type == EOF) break;
		switch {

Should be possible to rewrite as:

while (try arg = opter.next())
{
  switch { ... }
}

lerno · 2025-02-06T11:26:50Z

But I think the one benefit of a module is to create some uniform way of presenting the help. I think the actual parsing might be something of a red herring.

enum Options : (String short_opt, String long_opt, String description)
{
   UNKNOWN = { "", "", "" }, // Mandatory
   VALUE = { "", "", "The value" },
   N = { "n=NUM", "number=NUM", "The number to use" },
   SHOUT = { "", "shout", "Shout out" }   
}

In the above then, we can imagine:

@parse_opt("foo.exe [options]", strings, Options; @body(OptParser* parser, OptArg arg)
{
    switch (arg.type)
    {
        case UNKNOWN:
            // Handle anything else here
            // Example using another argument
            OptArg! next = parser.next();
            if (catch next_thing = parser.next())
            {
                return report_error("Expected an argument after %s.", arg.string);
            }
            io::printfn("The argument after was %s.", next.string);
        case VALUE: 
            // Value arguments are handled in UNKNOWN
            unreachable();
        case N:
            ...
        case SHOUT:
            ...            
    }
}

Becoming:

Usage: foo.exe <VALUE> [options]
  <VALUE>                 The value
  -n, --number=NUM        The number to use
  --shout                 Shout out
  -h, --help              Show this help

But then we can imagine essentially a mini DSL defining things instead:

enum Options : (String short_opt, String long_opt, String description)
{
   UNKNOWN = { "EMPTY" }, 
   VALUE = { "The value" },
   N = { "n=;number=;NUM;The number to use" },
   DEBUG = { "g;debug;Use debug" }, // -g, --debug
   SHOUT = { "shout;Shout out" }   
}

It all depends on how much is in the DSL and how much is in the language. It's nice if the description and the commands are in sync, and that's the basic service provided. Then providing those utility methods for matching multiple options and so on.

alexveden · 2025-02-06T12:09:46Z

Let me explain and reason about my design decisions for argparse. First, it's designed with Python's argparse library flavor. Which I used in many projects and find handy.

C3 argparse is designed with the following requirements in mind:

It abstracts grunt work of matching argument type, parsing it, and making sanity checks (e.g. int is "123" but not "asd")
it doesn't allocate memory, and ArgParse structure is self containing, data is stored in .values and the program resposible for its lifetime
It automatically checks if option is required
It doesn't require creating extra type dedicated only for parsing
It generates help printout under the hood
It supports default values (so the value of C3 variable which passed to argparse settings structs is the default)
It detects option type by the type of .value= item, e.g. .value = &foo, if foo is bool it's a flag, if int - it's a numerical, String and other (which customizes validation method without any extra configuration from user side)
it refines non-optionable arguments and store them as separate array of strings
it supports -- which in POSIX makes all following arguments (even with --... prefix) arguments
it supports both -s and --long option syntax
it supports short stacking -sev is equal to -s -e -v
supports "=" in --long-opt= or --long-opt
supports subcommands via early stopping, for example main.exe --foo command --bar -b -f, with a special flag can stop at command --bar -b -f, so you can pass this array of arguments into another instance of argparse which support command. Whereas --foo option was passed to main argparse.
for extensibility you would be able to pass callback function to argparse if you need to do some validation or fancy argument parsing

@hwchen alternative library looks more like syntactic sugar to me, which requires working through each argument separately, and making printout manually.

In order to have fair comparison, we need to implement the same functionality using alternative argparse libraries. And judge which one provides cleaner or shorter or more flexible code. It has to be different use case, different options and argument types, maybe sub commands.

hwchen · 2025-02-06T16:25:39Z

Not sure how I feel about it, but I implemented @lerno 's suggested API on top of my library (without help generation yet; I don't think that's the difficult part).

Implementation: https://github.com/hwchen/opter-c3/blob/691e5b19ba5cc8e45418f7e717c1351072f083f3/fancy.c3
Example: https://github.com/hwchen/opter-c3/blob/691e5b19ba5cc8e45418f7e717c1351072f083f3/examples/fancy.c3

I think this demonstrates one perspective on my library: it's the minimum amount of lexing/parsing code that's still actually helpful. With the basic lexing/parsing out of the way, it's easy to focus on adding features if building a more advanced cli library, or handling some really twisted config logic in an application. To me, this gives a feeling of "hackability", which I associate with C3.

So if we end up taking this general approach of just a switch statement (and using some of my code), I think it would be nice if the "lower-level" was still exposed, even if there was a higher-level API.

alexveden · 2025-02-06T17:35:47Z

Example of using callbacks, 2 flavors of callbacks:

fn void test_custom_type_callback_unknown_type()
{

    char val = '\0';
    ArgParseCallbackFn cbf = fn void! (ArgOpt* opt, String value) {
        io::printfn("flt--callback");
        test::eq(value, "bar");
        *anycast(opt.value, char)! = value[0];
    };
    // NOTE: pretends app struct
    ArgParse my_app_state = {};

    ArgParse agp = {
        .options = {
            {
                .short_name = 'f',
                .long_name = "flt",
                .value = &val,
                .callback = cbf
            },
            {
                .short_name = 'o',
                .long_name = "other",
                .value = &my_app_state,
                .callback = fn void! (ArgOpt* opt, String value) {
                    ArgParse* ctx = anycast(opt.value, ArgParse)!;
                    io::printfn("other--callback");
                    // NOTE: pretends to update important app struct
                    ctx.usage = value;
                }
            }
        }
    };

    String[] args = { "testprog", "--flt=bar", "--other=my_callback" };
    test::eq(val, '\0');
    test::eq(agp.options[0]._is_present, false);

    agp.parse(args)!!;

    test::eq(val, 'b');
    test::eq(my_app_state.usage, "my_callback");
    test::eq(agp.options[0]._is_present, true);
}

alexveden · 2025-02-06T19:56:28Z

@tomaskallup With small amendments I managed to add multiple options parsing via callbacks

fn void test_custom_type_callback_int_accumulator()
{

    List(<int>) numbers;
    numbers.new_init(); 
    defer numbers.free();

    argparse::ArgParse agp = {
        .options = {
            {
                .short_name = 'n',
                .long_name = "num",
                .value = &numbers,
                .callback = fn void! (ArgOpt* opt, String value) {
                    io::printfn("value: %s", value);
                    List(<int>) * ctx = anycast(opt.value, List(<int>))!;
                    int val = value.to_integer(int)!;
                    ctx.push(val);
                }
            },
        }
    };

    String[] args = { "testprog", "--num=1", "-n", "5", "--num", "2" };
    test::eq(numbers.len(), 0);
    test::eq(agp.options[0]._is_present, false);

    agp.parse(args)!!;

    io::printfn("%s", numbers);
    test::eq(numbers.len(), 2);
    test::eq(numbers[0], 1);
    test::eq(numbers[1], 5);
    test::eq(numbers[2], 2);
    test::eq(agp.options[0]._is_present, true);
}


value: 1
value: 5
value: 2
[1, 5, 2]

alexveden · 2025-02-07T07:45:53Z

Ok guys, if you wish moar control, why not, it's easy to add to ArgParse too :) I added ArgParse.next() which bare-metal argument processor with minimal interventions. The only thing it does it split --foo=3 into 2 consecutive calls, returning --foo and 3 separately.

This is an alternative way of parsing args (while preserving old path, so you can pick whichever you like more):

fn void test_argparse_next_all_together()
{
    List(<String>) args;
    args.new_init(); 
    defer args.free();

    String[] argv = { "testprog", "-n", "5", "--num", "2", "--foo=3", "--", "-fex", "-I./foo"};

    ArgParse agp;
	while (String arg = agp.next(argv)!!) {
	    args.push(arg);
	}
    io::printfn("%s", args);
    // prints
    // [-n, 5, --num, 2, --foo, 3, --, -fex, -I./foo]
 
}


fn void test_argparse_next_switch()
{

    List(<String>) args;
    args.new_init(); 
    defer args.free();

    ArgParse agp = {
        .description = "Test number sum program",
        .options = {
            {
                .short_name = 'a',
                .help = "a short name flag"
            },
            {
                .long_name = "flag",
                .help = "a long name flag"
            },
        }
    };
    String[] argv = { "testprog", "-n", "5", "--num", "2", "--num=3"};
    int n_sum = 0;  // expected to be 5+2+3
    while (String arg = agp.next(argv)!!) {
        switch (arg) {
            case "-n":
            case "--num":
                String value = agp.next(argv)!!;
                n_sum += value.to_integer(int)!!;
            default:
                // you may use ArgParse options only for usage display as well
                // OR make your own usage printout
                agp.print_usage()!!;
                test::eq(1, 0); // unexpected here
        }
        args.push(arg);
    }

    io::printfn("%s", args);
    test::eq(args.len(), 3);
    // NOTE: we skipped values in switch "--num" handler
    test::eq(args[0], "-n");
    test::eq(args[1], "--num");
    test::eq(args[2], "--num");
    test::eq(n_sum, 5+2+3);
}

alexveden · 2025-02-07T07:56:01Z

So currently I evolve argparse module as side project, if you greenlight this, I'll add updated version + unit tests.

…+ tests

…evel parser + unit test module ported

…alexveden/args_parse

alexveden added 7 commits January 10, 2025 11:06

allowing c3c test -- <args1>

4034049

Merge branch 'master' of github.com:c3lang/c3c

111d3a5

Merge branch 'master' of github.com:c3lang/c3c

7b8b3d2

Merge branches 'master' and 'master' of github.com:c3lang/c3c

2162c30

Merge branch 'master' of github.com:c3lang/c3c

d51628c

console argument parsing module

0e17819

std::os::argparse module

473e519

Merge branch 'c3lang:master' into alexveden/args_parse

e207013

alexveden added 2 commits February 7, 2025 13:00

Added support of multi arguments + ArgParse.next() low-level parsing …

1b45bf4

…+ tests

Merge branch 'master' into argparse

afafecf

alexveden added 4 commits February 7, 2025 13:03

Merge branch 'argparse' into alexveden/args_parse

4b782b4

ArgParse added .allow_many for multiple args, ArgParse.next() low l…

fd8fb72

…evel parser + unit test module ported

Merge branch 'alexveden/args_parse' of github.com:alexveden/c3c into …

83c2d82

…alexveden/args_parse

fixed typos and some cleanup

4a4f0dc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

std::os::argparse module #1897

std::os::argparse module #1897

alexveden commented Jan 27, 2025

hwchen commented Jan 27, 2025 •

edited

Loading

alexveden commented Jan 27, 2025

lerno commented Jan 29, 2025

hwchen commented Jan 31, 2025

tomaskallup commented Feb 1, 2025 •

edited

Loading

alexveden commented Feb 2, 2025

tomaskallup commented Feb 2, 2025

alexveden commented Feb 2, 2025

tomaskallup commented Feb 2, 2025

lerno commented Feb 4, 2025

lerno commented Feb 5, 2025

hwchen commented Feb 6, 2025 •

edited

Loading

lerno commented Feb 6, 2025

lerno commented Feb 6, 2025

alexveden commented Feb 6, 2025 •

edited

Loading

hwchen commented Feb 6, 2025

alexveden commented Feb 6, 2025

alexveden commented Feb 6, 2025

alexveden commented Feb 7, 2025

alexveden commented Feb 7, 2025

std::os::argparse module #1897

Are you sure you want to change the base?

std::os::argparse module #1897

Conversation

alexveden commented Jan 27, 2025

hwchen commented Jan 27, 2025 • edited Loading

alexveden commented Jan 27, 2025

lerno commented Jan 29, 2025

hwchen commented Jan 31, 2025

tomaskallup commented Feb 1, 2025 • edited Loading

alexveden commented Feb 2, 2025

tomaskallup commented Feb 2, 2025

alexveden commented Feb 2, 2025

tomaskallup commented Feb 2, 2025

lerno commented Feb 4, 2025

lerno commented Feb 5, 2025

hwchen commented Feb 6, 2025 • edited Loading

lerno commented Feb 6, 2025

lerno commented Feb 6, 2025

alexveden commented Feb 6, 2025 • edited Loading

hwchen commented Feb 6, 2025

alexveden commented Feb 6, 2025

alexveden commented Feb 6, 2025

alexveden commented Feb 7, 2025

alexveden commented Feb 7, 2025

hwchen commented Jan 27, 2025 •

edited

Loading

tomaskallup commented Feb 1, 2025 •

edited

Loading

hwchen commented Feb 6, 2025 •

edited

Loading

alexveden commented Feb 6, 2025 •

edited

Loading