Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize comment parsing #3030

Conversation

InsertCreativityHere
Copy link
Member

@InsertCreativityHere InsertCreativityHere commented Oct 31, 2024

This PR removes a bunch of duplicate code we have for parsing comments. Most from slice2matlab and slice2swift.
Originally I was just going to fix this in Swift (see #2631), but I just went ahead and ripped it all out at once.
While I was at it, I also accidentally found/fixed some small bugs in our doc comment generation along the way.


Currently we have code for parsing Slice doc comments scattered around the various compilers, much of it nearly identical.
Sometimes we even invent a new type for holding the comment data DocElement; a near duplicate of Comment.

The main reason for this duplication was that every language had it's own formatting for doc-links.
My 'fix' for this is that now you can pass in an std::function to parseComment which is used to handle language-specific link formatting.

/// Returns a doxygen formatted link to the provided Slice identifier.
string cppLinkFormatter(string identifier)
{
return "{@link " + fixKwd(identifier) + "}";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently we weren't escaping names in C++ links...
Linking to an identifier that is a C++ keyword is a very niche case, but still good to fix.

/// Returns a javadoc formatted link to the provided Slice identifier.
string javaLinkFormatter(string identifier)
{
return "{@link " + Slice::JavaGenerator::fixKwd(identifier) + "}";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly in Java, we forgot to escape our linked-to identifiers...

@@ -1753,28 +1761,6 @@ Slice::JavaVisitor::writeDataMemberInitializers(Output& out, const DataMemberLis
}
}

StringList
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely dead code.

@@ -780,7 +780,7 @@ Slice::JavaGenerator::output() const
// otherwise, return the name unchanged.
//
string
Slice::JavaGenerator::fixKwd(const string& name) const
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was switched from a const instance function to a static function.
There was no reason for it to not be static.

@@ -510,47 +489,43 @@ Slice::JsVisitor::writeConstantValue(
return os.str();
}

StringList
Slice::JsVisitor::splitComment(const ContainedPtr& p)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After refactoring, this function was only used by writeDocCommentFor (the function underneath it). So I just inlined it into that function.

Comment on lines 359 to 363
CommentPtr doc = p->parseComment(swiftLinkFormatter, true, true);
if (!doc)
{
return;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is that the parseComment in the parser can return nullptr if there is no comment, whereas the old slice2swift specific parseComment would always return a DocElements.

So, now we do need to check if nullptr was returned before generating anything.

// Skip if there are no doc comments.
StringList docOverview = doc->overview();
StringList docMisc = doc->misc();
StringList docSeeAlso = doc->seeAlso(); // TODO we check seeAlso, but don't write any of in the generated comment?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are definitely still bugs in the slice2swift doc generation, I TODOed the ones I found, and plan to fix them when I fix #2274.

@@ -375,8 +375,10 @@ namespace Slice
int line() const;

std::string comment() const;
CommentPtr parseComment(bool stripMarkup) const;
CommentPtr parseComment(const std::string& text, bool stripMarkup) const;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need one function, the 2nd one was a hack-around to let ice2slice massage it's links before parsing. The new linkFormatter approach solves this problem though.

@@ -853,6 +848,7 @@ Slice::Contained::parseComment(const string& text, bool stripMarkup) const
{
if (!line.empty())
{
state = StateMisc; // Reset the state; `@see` cannot span multiple lines.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes a bug where we'd sometimes generate multi-line see statements.
These should always be a single line like: @see MyType.op79

Comment on lines 666 to 667
// Fix any javadoc using the provided link formatter.
const string link = "{@link ";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should always be a space after the @link.
This code would let through things like {@linkStringSeq} which is bogus.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this update break the link if you use \t or \n after {@link. A corner case...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@link cannot span multiple lines, so \n shouldn't be a problem.
But, you're right that \t used to work and won't anymore. But I'm pretty sure it only worked by accident.
Using \t here would be putting tabs between the words in a sentence. It would work, but it's a very strange thing to do. I'm not convinced any sane person is doing this.

Comment on lines 276 to 278
/// Returns a doxygen formatted link to the provided Slice identifier.
string cppLinkFormatter(string identifier) { return "{@link " + fixKwd(identifier) + "}"; }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently we weren't escaping names in C++ links...
Linking to an identifier that is a C++ keyword is a very niche case, but still good to fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does fixKwd deal with a Type#member link?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was one of the bugs I was aware of when writing this PR.
But I went ahead and added a fix to this, so now it should be properly handled.

@InsertCreativityHere InsertCreativityHere marked this pull request as ready for review October 31, 2024 17:59
@InsertCreativityHere InsertCreativityHere requested review from bernardnormier, pepone and externl and removed request for bernardnormier and pepone October 31, 2024 17:59
Copy link
Member

@bernardnormier bernardnormier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good except for the handling of deprecated, which should as per #2970:

a) in the Slice parser, the deprecated metadata no longer inserts "deprecated" in the associated doc-comment, if any. This insertion was incorrect.

b) in the Slice compilers, we no longer generate a language-specific @deprecated doc-comment for servants and servant operations.

#2970 was not comprehensive, and this was by accident.

Slice::Contained::parseComment(const string& text, bool stripMarkup) const
{
if (text.empty())
// Check for deprecated metadata and add it to the comment if necessary.
Copy link
Member

@bernardnormier bernardnormier Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do that. I removed this behavior in #2970, and looking at this PR, it appears I forgot do removed it from Swift and MATLAB. Please fixes these too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fairness, the comment code for Swift and MATLAB was completely separate from all the other comment code, so it's not surprising.

I removed all the includeDeprecated stuff, and am happy to see it go!

CommentPtr parseComment(const std::string& text, bool stripMarkup) const;
CommentPtr parseComment(
std::function<std::string(std::string)> linkFormatter,
bool includeDeprecatedMetadata = false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in .cpp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Strip HTML markup and javadoc links.
string::size_type pos = 0;
// Strip HTML markup.
pos = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to line 644

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

StringList
Slice::JsVisitor::splitComment(const ContainedPtr& p)
void
Slice::JsVisitor::writeDocCommentFor(const ContainedPtr& p, bool includeDeprecated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This includeDeprecated parameter doesn't sound correct to me.

The generation of deprecated doc-comment tags should be totally independent of whether the Contained is deprecated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look too closely at this function, but I think this field might still be useful.

To my knowledge, javascript doesn't have a standalone attribute like [Obsolete] or [[deprecated]].
The only way to mark something deprecated is at the doc-comment level, so maybe linters and other tools will check for this doc comment tag?

And there are some things that we don't want to mark as deprecated, like the generated service skeletons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want to add a special JS-only logic like: Container is deprecated but there is no @deprecated doc-comment => generate @deprecated JS doc-comment?

Sounds ok, but definitely deserves a comment.

Then:

And there are some things that we don't want to mark as deprecated, like the generated service skeletons.

Yes, I actually did something similar for C++:

writeDocSummary(Output& out, const ContainedPtr& p, GenerateDeprecated generateDeprecated = GenerateDeprecated::Yes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want to add a special JS-only logic like

No, the inverse of that. I want to add exactly what you described for C++!
I was just justifying why we might want that behavior, because I was unaware it already existed :)

I'll co-opt your code into the parser, and then use it for all the different languages.
But since this PR is already pretty big, I'd prefer to leave that for when I refactor all the deprecated stuff,
since there's like 3 separate issues for cleaning up how deprecated works...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say you define in Slice:

/// The base interface for all Widgets 
["deprecated"] interface Widget { ... }

with a doc-comment but for @deprecated doc-comment tag (which is fine at the Slice level). What do you want to generate for the corresponding proxy in JS and TS?

Since JS/TS doesn't provide a deprecated attribute (as far as I can tell), generating an @deprecated JS/TS doc-comment seems better than not generating anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your statement, about adding that logic for JS, but it's not what related to what I was originally talking about. And is something I'd rather leave to a future PR, since this one is already fairly large.

The includeDeprecated flag is only about omitting the deprecated comment tag.
If you set it false, we generate a full doc-comment, if it's set to true (ie. for servant skeletons) then we don't generate any @deprecated section in the doc comment. That's all it's controlling for here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, please create an issue to fix it in a follow-up PR.

{
string comment = c;
string comment = string{c};
string::size_type pos = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't a string preferred in this case, so that we can use std::move and avoid the copy when the parameter is an rvalue reference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I switched it to take a string, since it's a simpler type.
But, this helper function is only called from one place, which will never use move semantics.

const string link = "{@link";
pos = 0;
do
// Fix any javadoc using the provided link formatter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this comment be about Slice links, not javadoc links. I guess we have this comment because Slice doc comments are based on javadoc, still this is about parsing the Slice doc comment link.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited the comment to talk about Slice instead of javadoc.

Comment on lines 666 to 667
// Fix any javadoc using the provided link formatter.
const string link = "{@link ";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this update break the link if you use \t or \n after {@link. A corner case...

// Fix any javadoc using the provided link formatter.
const string link = "{@link ";
pos = 0;
do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This do/while loop with the inner if checking the same condition, can be refactor into a single while loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, totally right.
Fixed.


StringList result;

string::size_type pos = 0;
pos = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a comment about the lines below. I think we can also slightly simplify it, we don't need special logic for the last line if we are calling trimLines.

We can also update the calls to IceInternal::trim to pass a string_view, trim parameter is now a string_view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment and simplified the way we handled the last line of text.

{
if (text.empty())
// Split the comment's raw text up into lines.
StringList lines = splitComment(_comment, linkFormatter, stripMarkup);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linkFormater is not used here, we can move it to splitComment.

Suggested change
StringList lines = splitComment(_comment, linkFormatter, stripMarkup);
StringList lines = splitComment(_comment, std::move(linkFormatter), stripMarkup);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I guess there's no reason not to!
Fixed.

Comment on lines 276 to 278
/// Returns a doxygen formatted link to the provided Slice identifier.
string cppLinkFormatter(string identifier) { return "{@link " + fixKwd(identifier) + "}"; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does fixKwd deal with a Type#member link?

@@ -65,6 +65,12 @@ namespace Slice

void close();

//
// Check a symbol against any of the Java keywords. If a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the regular 120 line length for new comments, and avoid surrounding them with extra // empty comment lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it.
But this function was just changed from protected to private, and I think it'd be better to open a dedicated PR for fixing all the comments instead of just piecemeal fixing the ones that we come across. Otherwise we're going to have a mix of comments that never gets fixed.

{
StringList lines = splitComment(p);
CommentPtr dc = p->parseComment(false);
// TODO this whole function seems bogus. Why do we manually split the lines and also call `parseComment`?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should switch to use the lines provided by the parsed comment and remove the custom split lines logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but there's even more to cleanup/remove around this code.
So, I'm planning on doing it in a separate PR, right after this one gets merged.

There's an endless rabbit-hole of cleanup in the parser, and I need to draw the line somewhere or I'd never get around to opening my PRs : vP


if (l.find(tag) == 0)
string::size_type hashPos = identifier.find("#");
if (hashPos != string::npos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link formatter seems much complicated than others, and the fixIdent here and in others doesn't look totally correct.

I think would be simpler if we have a method to convert a Slice link identifier into a list of Slice identifiers, that can be used by all formatters.

Then the formatter can use the language mapping method for fixing keyworkds, and put back a link from the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked the parser to pre-split the link if a # is present.

Copy link
Member

@bernardnormier bernardnormier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll re-review once my initial comments and @pepone comments have been addressed.

@InsertCreativityHere
Copy link
Member Author

I reworked the link formatter.
Now the parser will pre-split the links if a # character is present.
So the linkFormatter now takes two strings (split along the #) instead of one string, which they were responsible for splitting.

It's invalid for a link to have more than one # character. So, this single split is sufficient.

Comment on lines 687 to 690
else
{
memberComponent = "";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this else is not required, memberComponent is already "" in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed!

Comment on lines 916 to 917
CommentPtr memberDoc = member->parseComment(matlabLinkFormatter, true);
if (memberDoc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is preferable to declare this inside the if with modern C++.

Suggested change
CommentPtr memberDoc = member->parseComment(matlabLinkFormatter, true);
if (memberDoc)
if (CommentPtr memberDoc = member->parseComment(matlabLinkFormatter, true))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!
I also fixed all the other places where I was also splitting the check from the function call.

StringList
Slice::JsVisitor::splitComment(const ContainedPtr& p)
void
Slice::JsVisitor::writeDocCommentFor(const ContainedPtr& p, bool includeDeprecated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, please create an issue to fix it in a follow-up PR.

@InsertCreativityHere InsertCreativityHere merged commit f683fe0 into zeroc-ice:main Nov 5, 2024
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants