Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can multi-line comments be nested? #52

Closed
nickbattle opened this issue Feb 6, 2021 · 21 comments
Closed

Can multi-line comments be nested? #52

nickbattle opened this issue Feb 6, 2021 · 21 comments

Comments

@nickbattle
Copy link
Contributor

A request was recently raised on the VDM VSCode tool to enable nested block comments, so that you can efficiently comment out large blocks of code without concern for whether there are block comments within the range.

The parser change is simple and has been implemented, but after making the same change on Overture, objections were raised. It was proposed that the issue be discussed in the LB, since the LRM does not say whether multi-line comments can be nested or not. This should be clarified.

VDM VSCode overturetool/vdm-vscode#32
Overture overturetool/overture#774

@idhugoid
Copy link

I used nested comments in the languages where they are allowed. They are useful while troubleshooting huge files. I see this change as something that I could use myself in the midst of hacking session. I am not sure about its implications. One thing that it is really annoying is to make a block comment to avoid thinking about all of what is inside it, and then have to do changes in the inner text either because there was a lose comment end or other quirks of the language at hand.

One thing is sure: Block comments, nested or not, belong to checked in code as the Chef's knife belongs to a table in the dinning room.

@kgpierce
Copy link
Contributor

Was the objection simply that the LRM does not specify this? Or were there specific arguments against it?

@nickbattle
Copy link
Contributor Author

@kgpierce You can see Marcel's comments in the Overture issue - link in the top post of this issue. He was initially concerned that the ISO standard would prohibit them (but ISO doesn't allow block comments; they are an Overture feature). Then he was concerned that it was bad practice, but that he would see whether any of ESAs coding standards mentioned them. I said that I would raise it with the LB, and found half a dozen current languages that provide the feature, by way of support. I doubt the coding standards will mention this - it's quite a small issue really.

In general I agree with Hugo's thinking (which was Paul's original reason for the request): comment nesting is extremely useful for hacking, but probably ought to be discouraged for released systems. The LRM doesn't specify what's allowed; the original parser treated them as an error (stopping the outer comment at the enclosed */); the tweaked parser allows nesting.

@nickbattle
Copy link
Contributor Author

Incidentally, the feature just leaked out into VDM VSCode 1.1.0 - oops :) If we decide this is to be prohibited, I will tweak the parser to allow them only if a flag is set. But at the moment, it works by default (and the UI comment-colouring works too).

@leouk
Copy link

leouk commented Feb 15, 2021

I somewhat use it as well in certain circumstances, but effectively it's just like in LaTeX commenting: if you add something like

-- this is a comment

-- -- and this is an inner comment to it
-- f(x) == ...

Or something like that, works well. Usually inner comments tend to be comments to commented out definitions as above in my use case

@nlmave
Copy link
Contributor

nlmave commented Feb 15, 2021 via email

@leouk
Copy link

leouk commented Feb 16, 2021 via email

@nickbattle
Copy link
Contributor Author

@nlmave Thanks for the references.

I think it would help to distinguish two things: whether we think that nested comments are problematic from a safety/style/fragility point of view, and separately what the parser should do if it encounters them.

I haven't heard any strong arguments in favour of nested comments as a "good thing". They're ugly and confusing, and arguably can lead to mistakes. Happy to be persuaded if anyone disagrees.

But I don't think it is the job of the parser (or the LRM) to enforce a particular specification style. We can choose to handle this case in different ways though. Currently, the parser does no special processing, so /* this is fine /* and so is this */ but you get an error here */. The first example in your reference silently ignores a line with this kind of parse:

/* Comment with end comment marker unintentionally omitted
security_critical_function();
/* Some other comment */

The nested parser gives an error in this case. It says that the first comment is not terminated. That is helpful, though unclear.

So perhaps we are debating how to report nested comments, rather than have the parser ignore the possibility? I think the choices are (a) to allow nested comments, (b) to warn about nested comments but allow them, (c) to give an error for nested comments, (d) to not consider nesting and parse them naively.

Currently we have (d). Paul was proposing (a). I could live with (b), but I think (c) is being strict at the expense of the convenience of large-scale temporary "spec disabling".

@tomooda
Copy link
Member

tomooda commented Feb 16, 2021

I think convenient ways to temporarily spec disabling belong to tool design rather than language design.
For example, some editors provide a command to comment out multiple lines by inserting one-line comment marks.
In VSCode, ctrl-/ will do. (may depend on your shortcut settings)
This also means that it's also up to a tool design that an editor may employ a special parser/interpreter that temporarily allows nested comments.
But as default behavior, I think the naive rule of parsing comments would be preferable.

@nickbattle
Copy link
Contributor Author

Interesting point about the tools, @tomooda. Overture supports this key combination too, though ironically it can get confused if you try to comment out large blocks that include block comments! (But that's a bug: it should work in principle).

@nlmave
Copy link
Contributor

nlmave commented Mar 3, 2021 via email

@nickbattle
Copy link
Contributor Author

We're drifting a bit, but I would distinguish between syntax and style. I think the (technical) job of the parser (lex and syntax) is to enforce the rules from the EBNF grammar. If we were to add a style checker - which might be a valuable thing - that would be a separate analysis, akin to (and applied after) the type checker, since type information could be important when applying various style rules.

The problem we have with comments is that they need to be handled by the lexical reader, but they don't occur in the grammar as such (they can occur anywhere between symbols). So unless we're ignoring the possibility of nested comments (which exposes us to the kinds of problem given in the coding standards), we have to "know about" nested block comments to read them accurately. That just leaves us with the choice of how/whether to report nesting.

There is currently a -strict flag which picks up a variety of errors that VDMJ/Overture would otherwise tolerate, like getting your inv, eq and ord clauses in the wrong order. We can use that to control how nested comments are reported, or we can add a new property.

But what should the behaviour be, both with and without a -strict flag?

@nlmave
Copy link
Contributor

nlmave commented Mar 7, 2021 via email

@nickbattle
Copy link
Contributor Author

Reviewing the policies that the CheckStyle tool supports (which is what I think of as "style"), the majority of them are syntax related rather than being type-sensitive, which was a surprise. That is, most of them could be implemented by checks in the parser rather than after type checking. But I would not want a parser that concerns itself with (say) whether a class is overriding overloaded methods (surely a sin :-).

I can see that some languages use layout as part of their "grammar", and it is perfectly possible (though I wonder what the formal grammar looks like). But VDM isn't currently defined that way and it feels awkward to introduce one special case here.

I have a vague memory of adding block comments at some point. It may have been after a query from Peter, though I can't find any emails (it would be 10-12 years ago, and before the LB was created). They are supported by VDMTools, but with the same approach as Overture (nesting is not checked for). I can only find reference to "--" comments in the standard related documents that I have.

My proposal would be that we parse nested block comments, and don't give a warning or error. This catches the kind of issue identified in the coding standards, but allows people to use nesting in a natural way. If we want to add style checks that restrict the use of the grammar - nested comments are surely ugly - we should develop that as a separate (policy driven) style analysis.

@nlmave
Copy link
Contributor

nlmave commented Mar 14, 2021 via email

@nickbattle
Copy link
Contributor Author

I fail to see how this solution "catches it", in fact quite the contrary, and even no warning either. Basically the suggestion is to entirely ignore my criticism to the nested comments issue.

If we look at the coding standard example...

1    /* Comment with end comment marker unintentionally omitted
2    security_critical_function();
3    /* Some other comment */

The real problem here is that the user has accidentally forgotten or edited away the comment termination on line 1.

If we parse and allow nested comments, the example from the coding standards is reported as an unterminated block comment on line 1. This is why I said it is "caught", and the error location on line 1 is helpful.

If we parse and notify nested comments as a warning or error, you will get messages on lines 1 and 3. In general the nesting message is an arbitrary distance from the real mistake on line 1. It is just the next block comment down the file.

If we naively parse nested comments (ie. don't), then the bug is not caught and the security_critical_function() is missed.

I'm not ignoring your suggestion. I'm just offering an alternative which still catches the example bug and gives sensible error messages, but seems more natural to me.

[Can we get some input from others here, please?]

@leouk
Copy link

leouk commented Mar 15, 2021

Hi folks,

I'd say the -strict option is the best in my view. It allows for it to be there for those who "want" to commit the sin, and enables those who don't want it at all to chase it. From the example above, clearly the comments have gone wrong, even if without (immediate) consequence. You could even have (with the -strict flag on) the warning ignored with --@warning(XXX) for whatever wicked reason you want.

L

@tomooda
Copy link
Member

tomooda commented Mar 15, 2021

I'm in favor of the naive way of parsing multi-line comments and tools have the freedom to implement their own ways to snip-and-recover a chunk of the source, e.g. temporarily allowing nested comments, introducing yet another multi-line comment markers, a lightweight version control/history manager, "ignore me" markers at AST nodes, and whatever.
/* ... /* ... */ is smoke to be warned in either way, and there are tons of corner cases that either way of parsing comments can go different from the specifier's intention.
So, I see no golden solution and I'd prefer a simpler rule on the language.
And again, I think tool builders can extend their own notation for "ignore me" markers (including nested ones) that do not conflict with the language definition.

@kgpierce
Copy link
Contributor

Hi, just catching up here. It looks like a concrete proposal would be:

  • Previous behaviour is moved to -strict mode ("no special processing")
  • Naively parse nested comments in loose mode ("i.e. don't"), and allow tools to provide additional functionality.

I admit I lost track of some of the arguments, so there may be other options here!

@nickbattle
Copy link
Contributor Author

After discussing this issue at the LB meeting on 2nd May, the decision was to resolve this as follows:

  • The LRM should clarify that nested block comments are not handled by the language.
  • The tools allow alternative parsing to be enabled (warn, error, allow), but default to the LRM behaviour.
  • If the -strict flag is passed, this forces the LRM behaviour.

This means that the situation is clarified and the behaviour is defined, but it allows power users to enable nesting options if they have a temporary need. The alternative options are already available in VDMJ via a property; this will be back-ported to Overture, then this issue can be closed.

@tomooda
Copy link
Member

tomooda commented Jul 8, 2021

@tomooda tomooda closed this as completed Sep 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants