-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rudimentary name mangling support #4267
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm glad to see the start on mangling! There are some bits that make sense to me, some questions about how mangling should really work.
For documentation, my sense would be that it's a design choice. If you'd like to write a proposal for that (with alternatives, etc), let me know and I'll help walk you through that. Otherwise, maybe just create a Google Doc explaining it for now? Ideally I'd say to update toolchain/docs/lower.md
, but unfortunately #4242 is still in review.
} | ||
std::string result; | ||
llvm::raw_string_ostream os(result); | ||
os << "_C"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was kind of hoping @chandlerc would review and handle this in particular. I know we'd discussed _C
as an option, but actually, considering swift mangling uses $s
, should we consider a similar $c
? I think one of the concerns with _C
might've been too high a risk of conflicting with a C-based name, but maybe using a special character in mangling is a good way to avoid that problem entirely? Or maybe c.
if we think we can use .
in names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to other ideas - The "_" form seems to be dominant, though (LLVM's demangler supports _Z for C++, _R for Rust, and _D for the D programming language).
Just looking at LLVM's demangler makes the fair point that a .
prefix would be problematic (not that that's what you were suggesting) - because of the leading .
in ELF symbol names, so the demangler strips those off by default.
Work backwards from _Z and use _Y (or perhaps _X, because X is cool)?
Though I'm really OK with whatever everyone else wants - and wouldn't totally object to using something following swift's pattern. So if $c is best for you, happy to switch to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can separate this out if you want to start with _C as probably good.
I was actually thinking further, and if C.
or C$
were an option (upper or lower case), particularly using a non-identifier character as separators instead of prefix, it might both (a) make the name easy to read and (b) offer a name that couldn't conflict with a C name.
But I'm realizing the difference between _C
and $c
isn't actually that big for us: we should be suffixing all mangled names with the package scope (e.g., _CFoo.Main
), which should already avoid name conflicts with C names if we're using .
separators. So if you don't think C<char>
is a good choice, let's stick with _C
for cross-language consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - sorry for the drip feed of responses earlier. Working on my github review workflow some more.
I think I've addressed all the comments and things are a bit tidier now.
} | ||
std::string result; | ||
llvm::raw_string_ostream os(result); | ||
os << "_C"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to other ideas - The "_" form seems to be dominant, though (LLVM's demangler supports _Z for C++, _R for Rust, and _D for the D programming language).
Just looking at LLVM's demangler makes the fair point that a .
prefix would be problematic (not that that's what you were suggesting) - because of the leading .
in ELF symbol names, so the demangler strips those off by default.
Work backwards from _Z and use _Y (or perhaps _X, because X is cool)?
Though I'm really OK with whatever everyone else wants - and wouldn't totally object to using something following swift's pattern. So if $c is best for you, happy to switch to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the approach is looking good. I did realize we need to handle the package differently, but I'm fine splitting that out as a TODO if you'd rather just finish here. I think the necessary changes won't significantly affect the current logic.
BTW, might be worth adding a brief TODO about the generic/impl mangling you note in the PR description, but I'll leave that to you.
And toolchain/docs/lower.md is now merged for edits, but that can also be done separately.
} | ||
std::string result; | ||
llvm::raw_string_ostream os(result); | ||
os << "_C"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can separate this out if you want to start with _C as probably good.
I was actually thinking further, and if C.
or C$
were an option (upper or lower case), particularly using a non-identifier character as separators instead of prefix, it might both (a) make the name easy to read and (b) offer a name that couldn't conflict with a C name.
But I'm realizing the difference between _C
and $c
isn't actually that big for us: we should be suffixing all mangled names with the package scope (e.g., _CFoo.Main
), which should already avoid name conflicts with C names if we're using .
separators. So if you don't think C<char>
is a good choice, let's stick with _C
for cross-language consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the approach is looking good. I did realize we need to handle the package differently, but I'm fine splitting that out as a TODO if you'd rather just finish here. I think the necessary changes won't significantly affect the current logic.
Ended up adding package support along the way.
BTW, might be worth adding a brief TODO about the generic/impl mangling you note in the PR description, but I'll leave that to you.
Ended up implementing the impl qualifier mangling in some of the review iterations, so that's done. I'll leave a FIXME for generic support.
And toolchain/docs/lower.md is now merged for edits, but that can also be done separately.
Mentioned that in the description, to be done in a follow-up.
(with the fix for the mangling ending too early with package support and impls - now |
@@ -31,20 +31,20 @@ fn InstanceAccess(a: A) -> i32 { | |||
// CHECK:STDOUT: ; ModuleID = 'extend_impl.carbon' | |||
// CHECK:STDOUT: source_filename = "extend_impl.carbon" | |||
// CHECK:STDOUT: | |||
// CHECK:STDOUT: define i32 @F() !dbg !4 { | |||
// CHECK:STDOUT: define i32 @"_CF.A.Main:I.Main.A.Main"() !dbg !4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mangling looks a little strange -- specifically repeating the .A.Main
part twice. It also looks like the trailing portion might be ambiguous. How should we read this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yep - that's a bug.
Fixed.
The issue was that it was mangling in the parent name scopes, including the parent name scope of wherever the impl was defined - which, if I understand correctly, isn't relevant. (an impl is uniquely defined by its interface and constraint, regardless of the scope in which that impl is defined - ie: two impls of the same interface+constraint, written in different namespaces, are invalid/do collide. (well, I guess there's no syntax to define an impl in a namespace, so it only happens to collide between, I guess, two files both containing an impl or a single file containing an inline impl inside a class definition, and one outside, which fails with:
basic.carbon:11:1: ERROR: Redefinition of `impl B as A`.
impl B as A {
^~~~~~~~~~~~~
basic.carbon:6:3: Previous definition was here.
extend impl as A {
^~~~~~~~~~~~~~~~~~
Fixed this by adding the lexical parent scope after the switch, and having the impl handling continue
resulting in the parent scope addition being skipped.
Now there shouldn't be any trailing portion after an impl, it should be <function name>.<self name>:<constraint name>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good now. For zygoloid's comment, if you can address it easily here feel free and I can take another look, but I think it'd also make sense to merge this and work in a separate PR since it'll have more narrow impact. (note, when impl.self_id
is a class, 3 scopes get added to names_to_render, maybe 1 should be cut)
Very much messed this PR state up... trying to figure out how to fix it :/ |
In preparation for adding broader name mangling support.
_C was chosen without special care or consideration - that C++ didn't choose _C for the itanium mangling probably says it's not the best letter to pick, but we don't have any particular detail about how different languages have chosen their mangling prefixes or other conflict avoidance.
* mangle the fully qualified name of the class doing the implementing * do not mangle the qualifiers of the interfaces yet
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Laying the foundation for removing the recursion from the mangler. (also fixes the nested switch type, which didn't compile)
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
This had been causing mangling to stop for impls, after reaching the package level of the first part of the name - missing the second part of the name.
Co-authored-by: Jon Ross-Perkins <[email protected]>
only the constraint and interface uniquely identify an impl
291c0b5
to
24ba431
Compare
Co-authored-by: Carbon Infra Bot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM, just pointing out a couple small things.
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
Co-authored-by: Carbon Infra Bot <[email protected]>
This seems to be enough to avoid naming collisions for functions in any of the current test cases (verified by asserting that the name of the `llvm::Function` matches the name passed to create it - not triggering LLVM's numbering that happens when names collide) It currently implements mangling for namespace scopes, class scopes, and impls. Nothing generic is mangled yet - haven't looked at how that works, though evidently it's not covered by existing testing, I guess. Follow-up change will document the current mangling algorithm in `toolchain/docs/lower.md` --------- Co-authored-by: Jon Ross-Perkins <[email protected]> Co-authored-by: Carbon Infra Bot <[email protected]>
This seems to be enough to avoid naming collisions for functions in any of the current test cases (verified by asserting that the name of the
llvm::Function
matches the name passed to create it - not triggering LLVM's numbering that happens when names collide)It currently implements mangling for namespace scopes, class scopes, and impls.
Nothing generic is mangled yet - haven't looked at how that works, though evidently it's not covered by existing testing, I guess.
Follow-up change will document the current mangling algorithm in
toolchain/docs/lower.md