Can multi-line comments be nested? #52

nickbattle · 2021-02-06T08:51:31Z

A request was recently raised on the VDM VSCode tool to enable nested block comments, so that you can efficiently comment out large blocks of code without concern for whether there are block comments within the range.

The parser change is simple and has been implemented, but after making the same change on Overture, objections were raised. It was proposed that the issue be discussed in the LB, since the LRM does not say whether multi-line comments can be nested or not. This should be clarified.

VDM VSCode overturetool/vdm-vscode#32
Overture overturetool/overture#774

idhugoid · 2021-02-15T13:53:49Z

I have used nested comments in the languages where they are allowed. They are useful while troubleshooting huge files. I see this change as something that I could use myself in the midst of a hacking session. I am not sure about its implications. One thing that is really annoying is to make a block comment to avoid thinking about everything inside it, and then have to make changes to the inner text, either because there was a stray comment end or because of other quirks of the language at hand.

One thing is sure: block comments, nested or not, belong in checked-in code as the chef's knife belongs on a table in the dining room.

kgpierce · 2021-02-15T14:17:24Z

Was the objection simply that the LRM does not specify this? Or were there specific arguments against it?

nickbattle · 2021-02-15T14:55:35Z

@kgpierce You can see Marcel's comments in the Overture issue (link in the top post of this issue). He was initially concerned that the ISO standard would prohibit them (but ISO doesn't allow block comments; they are an Overture feature). Then he was concerned that it was bad practice, but said that he would see whether any of ESA's coding standards mentioned them. I said that I would raise it with the LB, and found half a dozen current languages that provide the feature, by way of support. I doubt the coding standards will mention this; it's quite a small issue really.

In general I agree with Hugo's thinking (which was Paul's original reason for the request): comment nesting is extremely useful for hacking, but probably ought to be discouraged for released systems. The LRM doesn't specify what's allowed; the original parser treated them as an error (stopping the outer comment at the enclosed */); the tweaked parser allows nesting.
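To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is not the VDMJ or Overture lexer: the function names are made up for illustration, and it assumes that `/*` and `*/` never appear inside string literals or `--` line comments.

```python
def strip_naive(src: str) -> str:
    """Remove block comments, ending each one at the first '*/' (no nesting)."""
    out = []
    i = 0
    while i < len(src):
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end < 0:
                raise SyntaxError("unterminated block comment")
            i = end + 2          # resume right after the first terminator
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


def strip_nested(src: str) -> str:
    """Remove block comments, tracking how deeply they are nested."""
    out = []
    i = 0
    depth = 0
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1
            i += 2
        elif depth > 0 and src.startswith("*/", i):
            depth -= 1
            i += 2
        else:
            if depth == 0:       # only keep text outside every comment
                out.append(src[i])
            i += 1
    if depth > 0:
        raise SyntaxError("unterminated block comment")
    return "".join(out)
```

On the input `a /* x /* y */ z */ b`, the naive version ends the comment at the first `*/` and leaves ` z */ b` behind to be parsed as specification text (hence the error), while the nested version leaves only `a` and `b`.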

nickbattle · 2021-02-15T14:57:21Z

Incidentally, the feature just leaked out into VDM VSCode 1.1.0 - oops :) If we decide this is to be prohibited, I will tweak the parser to allow them only if a flag is set. But at the moment, it works by default (and the UI comment-colouring works too).

leouk · 2021-02-15T16:55:58Z

I somewhat use it as well in certain circumstances, but effectively it's just like in LaTeX commenting: if you add something like

-- this is a comment

-- -- and this is an inner comment to it
-- f(x) == ...

Something like that works well. Usually inner comments tend to be comments on commented-out definitions, as above, in my use case.

nlmave · 2021-02-15T23:35:27Z

On 2/15/21 3:55 PM, Nick Battle wrote: .... comment nesting is extremely useful for hacking ....

... which is the root cause of my concern, and which in fact is discouraged by most secure coding standards (MISRA and CERT):

https://wiki.sei.cmu.edu/confluence/display/c/MSC04-C.+Use+comments+consistently+and+in+a+readable+fashion

https://rules.sonarsource.com/c/RSPEC-1103, which refers to:

- MISRA C:2004, 2.3 - The character sequence /* shall not be used within a comment.
- MISRA C++:2008, 2-7-1 - The character sequence /* shall not be used within a C-style comment.
- MISRA C:2012, 3.1 - The character sequences /* and // shall not be used within a comment.

So, yes: albeit formally allowed, the secure coding community strongly discourages it, and for a community that promotes a formal specification language I think we should avoid this too. I really think this is a bad idea.

leouk · 2021-02-16T08:54:47Z

Hi all,

Marcel is right that in the C world (and other languages) this is discouraged. Yet in Isabelle there are multiple ways of having such comments, even though they can have grounded (even proof/semantic!) meaning. For my use in VDM, I find them useful in experimenting and switching between options but, as Marcel says, never at the end. So perhaps a flag? Like with -strict or the cyclic check?

Best,
Leo

nickbattle · 2021-02-16T09:02:08Z

@nlmave Thanks for the references.

I think it would help to distinguish two things: whether we think that nested comments are problematic from a safety/style/fragility point of view, and separately what the parser should do if it encounters them.

I haven't heard any strong arguments in favour of nested comments as a "good thing". They're ugly and confusing, and arguably can lead to mistakes. Happy to be persuaded if anyone disagrees.

But I don't think it is the job of the parser (or the LRM) to enforce a particular specification style. We can choose to handle this case in different ways though. Currently, the parser does no special processing, so /* this is fine /* and so is this */ but you get an error here */. The first example in your reference silently ignores a line with this kind of parse:

/* Comment with end comment marker unintentionally omitted
security_critical_function();
/* Some other comment */

The nested parser gives an error in this case. It says that the first comment is not terminated. That is helpful, though unclear.

So perhaps we are debating how to report nested comments, rather than have the parser ignore the possibility? I think the choices are (a) to allow nested comments, (b) to warn about nested comments but allow them, (c) to give an error for nested comments, (d) to not consider nesting and parse them naively.

Currently we have (d). Paul was proposing (a). I could live with (b), but I think (c) is being strict at the expense of the convenience of large-scale temporary "spec disabling".

tomooda · 2021-02-16T09:32:10Z

I think convenient ways of temporarily disabling parts of a spec belong to tool design rather than language design.
For example, some editors provide a command to comment out multiple lines by inserting one-line comment marks.
In VSCode, ctrl-/ will do (this may depend on your shortcut settings).
This also means it is up to tool design whether an editor employs a special parser/interpreter that temporarily allows nested comments.
But as the default behavior, I think the naive rule for parsing comments would be preferable.
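As a rough illustration of the editor-side approach (a sketch only, not how VSCode or Overture implement their toggle; the helper names are hypothetical), prefixing each line with `--` disables a region regardless of any block comments inside it, and removing the prefix restores it exactly. It assumes, as is usual, that a line comment hides everything up to the end of the line.

```python
PREFIX = "-- "


def comment_out(lines):
    """Disable a region by turning every line into a VDM line comment."""
    return [PREFIX + line for line in lines]


def uncomment(lines):
    """Undo comment_out by removing one level of the prefix from each line."""
    return [line[len(PREFIX):] if line.startswith(PREFIX) else line
            for line in lines]


# A region that already contains a block comment and a line comment.
region = [
    "/* helper for f */",
    "f: nat -> nat",
    "f(x) == x + 1; -- increment",
]
disabled = comment_out(region)
```

Because only one level of prefix is removed, a line that was already a `--` comment before the region was disabled is still a comment after it is re-enabled.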

nickbattle · 2021-02-16T10:17:17Z

Interesting point about the tools, @tomooda. Overture supports this key combination too, though ironically it can get confused if you try to comment out large blocks that include block comments! (But that's a bug: it should work in principle).

nlmave · 2021-03-03T21:54:41Z

> But I don't think it is the job of the parser (or the LRM) to enforce a particular specification style.

I don't think this is true: many languages enforce specification/coding style in their syntax (e.g. mandatory indentation in Python to enforce readability). Furthermore, comments are now often used as a carrier of additional information (e.g. doxygen). Moreover, the static analysis tools used nowadays enforce code style rules to ensure maintainability. So from my point of view it *could* become part of the language reference, if we so wish: for example, only allow multi-line comments outside the scope of top-level definitions and only allow single-line comments inside their scope. Note that I am not necessarily arguing that this should be done right now, but I think there is real value in having style guides enforced by tools. This is also commonplace in IDEs nowadays, which enforce coding style as a built-in refactoring feature.

Coming back to the original point: option (b) or (c) should IMHO be followed, perhaps coupled to the "strict type-check" flag that we (used to) have. In that way we allow specifiers to use this feature inside the IDE as a temporary convenience measure, but nevertheless warn them about it.

nickbattle · 2021-03-04T10:59:55Z

We're drifting a bit, but I would distinguish between syntax and style. I think the (technical) job of the parser (lex and syntax) is to enforce the rules from the EBNF grammar. If we were to add a style checker - which might be a valuable thing - that would be a separate analysis, akin to (and applied after) the type checker, since type information could be important when applying various style rules.

The problem we have with comments is that they need to be handled by the lexical reader, but they don't occur in the grammar as such (they can occur anywhere between symbols). So unless we're ignoring the possibility of nested comments (which exposes us to the kinds of problem given in the coding standards), we have to "know about" nested block comments to read them accurately. That just leaves us with the choice of how/whether to report nesting.

There is currently a -strict flag which picks up a variety of errors that VDMJ/Overture would otherwise tolerate, like getting your inv, eq and ord clauses in the wrong order. We can use that to control how nested comments are reported, or we can add a new property.

But what should the behaviour be, both with and without a -strict flag?

nlmave · 2021-03-07T21:47:55Z

> We're drifting a bit, but I would distinguish between syntax and style. I think the (technical) job of the parser (lex and syntax) is to enforce the rules from the EBNF grammar. If we were to add a style checker - which might be a valuable thing - that would be a separate analysis, akin to (and applied after) the type checker, since type information could be important when applying various style rules.

I already made the case that style = syntax, e.g. in Python indentation is mandatory and leads to syntax errors if you omit it; so it IS, imho, a matter of language design, one that has been overlooked to date (at least that is my argument now). All the arguments I've seen so far relate to "productivity", not to clean specifications and reducing the likelihood of making specification errors. And the separation between lexer and parser is not as strong as you claim: in flex it is easy to handle comments (and their nesting) using local states; likewise in ANTLR using channels.

I tried to look up the origin of comments in the VDM standard, but I don't have a copy of the ISO standard. The VDM-SL syntax is presented in the "Modeling Systems" book, but it does not address comments at all. However, the Dawes 91 book presents comments on page 197, but only using the '--' notation, and it says "annotations cannot be nested". It does not mention the C-style '/* .. */' multi-line comments at all (let alone their nesting), which may well have been added by IFAD when they built VDMTools.

But let us return to the details of the topic at hand: do we want compounded comments or not?

> The problem we have with comments is that they need to be handled by the lexical reader, but they don't occur in the grammar as such (they can occur anywhere between symbols). So unless we're ignoring the possibility of nested comments (which exposes us to the kinds of problem given in the coding standards), we have to "know about" nested block comments to read them accurately. That just leaves us with the choice of how/whether to report nesting.
>
> There is currently a -strict flag which picks up a variety of errors that VDMJ/Overture would otherwise tolerate, like getting your inv, eq and ord clauses in the wrong order. We can use that to control how nested comments are reported, or we can add a new property.
>
> But what should the behaviour be, both with and without a -strict flag?

My proposal would be that:

- default behavior would be to allow nested comments, but a warning is issued if the tools detect nested comments (as I feel strongly that their use should be discouraged)
- if "-strict" is used, these warnings would be raised to error level (parse errors)

nickbattle · 2021-03-08T09:48:09Z

Reviewing the policies that the CheckStyle tool supports (which is what I think of as "style"), the majority of them are syntax related rather than being type-sensitive, which was a surprise. That is, most of them could be implemented by checks in the parser rather than after type checking. But I would not want a parser that concerns itself with (say) whether a class is overriding overloaded methods (surely a sin :-).

I can see that some languages use layout as part of their "grammar", and it is perfectly possible (though I wonder what the formal grammar looks like). But VDM isn't currently defined that way and it feels awkward to introduce one special case here.

I have a vague memory of adding block comments at some point. It may have been after a query from Peter, though I can't find any emails (it would be 10-12 years ago, and before the LB was created). They are supported by VDMTools, but with the same approach as Overture (nesting is not checked for). I can only find references to "--" comments in the standard-related documents that I have.

My proposal would be that we parse nested block comments, and don't give a warning or error. This catches the kind of issue identified in the coding standards, but allows people to use nesting in a natural way. If we want to add style checks that restrict the use of the grammar - nested comments are surely ugly - we should develop that as a separate (policy driven) style analysis.

nlmave · 2021-03-14T22:00:01Z

> My proposal would be that we parse nested block comments, and don't give a warning or error. This catches the kind of issue identified in the coding standards, but allows people to use nesting in a natural way.

I fail to see how this solution "catches it", in fact quite the contrary, and even no warning either. Basically the suggestion is to entirely ignore my criticism to the nested comments issue.

nickbattle · 2021-03-15T09:22:43Z

> I fail to see how this solution "catches it", in fact quite the contrary, and even no warning either. Basically the suggestion is to entirely ignore my criticism to the nested comments issue.

If we look at the coding standard example...

1    /* Comment with end comment marker unintentionally omitted
2    security_critical_function();
3    /* Some other comment */

The real problem here is that the user has accidentally forgotten or edited away the comment termination on line 1.

If we parse and allow nested comments, the example from the coding standards is reported as an unterminated block comment on line 1. This is why I said it is "caught", and the error location on line 1 is helpful.

If we parse and notify nested comments as a warning or error, you will get messages on lines 1 and 3. In general the nesting message is an arbitrary distance from the real mistake on line 1. It is just the next block comment down the file.

If we naively parse nested comments (ie. don't), then the bug is not caught and the security_critical_function() is missed.

I'm not ignoring your suggestion. I'm just offering an alternative which still catches the example bug and gives sensible error messages, but seems more natural to me.
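The three outcomes described above can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not the VDMJ or Overture lexer: the `Nesting` modes and the `check` function are hypothetical names, the messages are invented, and string literals and `--` comments are ignored.

```python
from enum import Enum


class Nesting(Enum):
    ALLOW = "allow"     # track nesting silently
    REPORT = "report"   # track nesting and flag each nested opener
    NAIVE = "naive"     # no nesting: the first */ closes the comment


def check(src, mode):
    """Return (line, message) diagnostics for the block comments in src."""
    diags = []
    depth = 0        # current nesting depth (0 = outside any comment)
    open_line = 0    # line of the outermost opener
    line = 1
    i = 0
    while i < len(src):
        two = src[i:i + 2]
        if src[i] == "\n":
            line += 1
            i += 1
        elif two == "/*":
            if depth == 0:
                depth, open_line = 1, line
            elif mode is not Nesting.NAIVE:
                depth += 1
                if mode is Nesting.REPORT:
                    diags.append((line, "nested block comment"))
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    if depth > 0:
        diags.append((open_line, "unterminated block comment"))
    return diags


example = (
    "/* Comment with end comment marker unintentionally omitted\n"
    "security_critical_function();\n"
    "/* Some other comment */\n"
)
```

With `Nesting.ALLOW` the only diagnostic is the unterminated comment on line 1; with `Nesting.REPORT` there is an additional nesting message on line 3; with `Nesting.NAIVE` nothing is reported and the call on line 2 is silently swallowed by the comment.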

[Can we get some input from others here, please?]

leouk · 2021-03-15T10:36:08Z

Hi folks,

I'd say the -strict option is the best in my view. It allows for it to be there for those who "want" to commit the sin, and enables those who don't want it at all to chase it. From the example above, clearly the comments have gone wrong, even if without (immediate) consequence. You could even have (with the -strict flag on) the warning ignored with --@warning(XXX) for whatever wicked reason you want.

L

tomooda · 2021-03-15T23:31:31Z

I'm in favor of the naive way of parsing multi-line comments, leaving tools free to implement their own ways to snip and recover a chunk of the source, e.g. temporarily allowing nested comments, introducing yet another pair of multi-line comment markers, a lightweight version control/history manager, "ignore me" markers at AST nodes, or whatever.
/* ... /* ... */ is smoke that deserves a warning either way, and there are tons of corner cases where either way of parsing comments can diverge from the specifier's intention.
So I see no golden solution, and I'd prefer a simpler rule in the language.
And again, I think tool builders can add their own notation for "ignore me" markers (including nested ones) that does not conflict with the language definition.

kgpierce · 2021-04-30T17:16:33Z

Hi, just catching up here. It looks like a concrete proposal would be:

- Previous behaviour is moved to -strict mode ("no special processing")
- Naively parse nested comments in loose mode ("i.e. don't"), and allow tools to provide additional functionality.

I admit I lost track of some of the arguments, so there may be other options here!

nickbattle · 2021-05-02T11:49:53Z

After discussing this issue at the LB meeting on 2nd May, the decision was to resolve this as follows:

- The LRM should clarify that nested block comments are not handled by the language.
- The tools allow alternative parsing to be enabled (warn, error, allow), but default to the LRM behaviour.
- If the -strict flag is passed, this forces the LRM behaviour.

This means that the situation is clarified and the behaviour is defined, but it allows power users to enable nesting options if they have a temporary need. The alternative options are already available in VDMJ via a property; this will be back-ported to Overture, then this issue can be closed.
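One way to read the three rules of the decision as code is sketched below. This is only an illustration: the function, the constant names and the mode strings (`naive`, `warn`, `error`, `allow`) are assumptions made for the example, not the actual VDMJ property or its values.

```python
from typing import Optional

LRM_MODE = "naive"                        # LRM behaviour: nesting is not handled
ALTERNATIVES = {"warn", "error", "allow"}  # optional tool-level behaviours


def effective_mode(strict: bool, requested: Optional[str] = None) -> str:
    """Pick the comment-parsing mode from -strict and an optional tool setting."""
    if strict:
        return LRM_MODE                   # -strict forces the LRM behaviour
    if requested is None:
        return LRM_MODE                   # tools default to the LRM behaviour
    if requested != LRM_MODE and requested not in ALTERNATIVES:
        raise ValueError(f"unknown comment mode: {requested!r}")
    return requested
```

The point of the ordering is that -strict always wins, so a specification that passes in strict mode does not depend on any tool-specific nesting option.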

tomooda · 2021-07-08T10:58:08Z

overturetool/documentation#22

nickbattle added the Request for Clarification label Feb 6, 2021

nickbattle mentioned this issue Feb 6, 2021

Allow nested comments overturetool/vdm-vscode#32

Closed

nickbattle mentioned this issue May 3, 2021

Correctly colour nested comments in VDM editor overturetool/overture#775

Open

tomooda mentioned this issue Jul 8, 2021

add clarification of parsing strategy for multi-line comments overturetool/documentation#22

Merged

tomooda closed this as completed Sep 6, 2021
