-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce Language DSL v2 #610
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be a few drive-by decisions taken that aren't related to the purpose of the PR
Struct( | ||
name = PragmaDirective, | ||
fields = ( | ||
pragma_keyword = Required(Terminal([PragmaKeyword])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems overly verbose - can `Required~ be the default, and can single element arrays be replaced by list the element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to have these fine controls once we move to serde, using things like #[serde(flatten)]
and #[serde(default)]
, but for now, I suggest keeping the hand-written parser simple/straight-forward. It is expensive to add more capabilities for it only to throw it away later.
scanner = Choice([ | ||
Atom("0"), | ||
Sequence([ | ||
Range(inclusive_start = '1', exclusive_end = '9'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These exclusive_end
s should all be inclusive_end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed!
Optional(reference = NonTerminal(EventParametersDeclaration)), | ||
anonymous_keyword = | ||
Optional(reference = Terminal([AnonymousKeyword])), | ||
semicolon = Required(Terminal([Semicolon])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this to be a TerminatedBy
construction so that the error recovery can pick it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussing offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added error_recovery
field.
close_paren = Required(Terminal([CloseParen])) | ||
) | ||
), | ||
Separated( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SeparatedBy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using Separated
since it describes the item being separated (higher importance to describe/annotate). The term SeparatedBy
refers to the separator itself (lower importance). This is also reflected in the backing rust type SeparatedItem
, instead of SeparatedByItem
.
For example in Parameters
, the focus here is on the Parameter
being separated, not the Comma
. This is the thing we would typically extract/use in further passes.
body = Required(NonTerminal(FunctionBody)) | ||
) | ||
), | ||
Struct( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this to be an explicit DelimitedBy
construct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussing offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added error_recovery
field.
0a4c5c6
to
f11542a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left some nitpicks.
I'm slightly nervous that we're de facto writing compiler/analysis for the compiler introducing even more machinery for the language definition itself (despite knowing it's a good decision). I think that our immediate priority right now should be to streamline this ASAP as this hurts code navigation (which language definition framework is this file part of, again?) and increases the required mental capacity to move around and work inside the code.
} | ||
|
||
fn collect_top_level_items(&mut self) { | ||
for item in self.language.clone().items() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clone
is redundant - we should probably enable Clippy soon to catch stuff like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it is not, as analysis
is borrowed as a mutable otherwise. However, that should be removed in the next iteration with better validation that doesn't need passing these around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it was redundant in f11542a and now we can remove it with
diff --git a/crates/codegen/language/definition/src/compiler/analysis/definitions.rs b/crates/codegen/language/definition/src/compiler/analysis/definitions.rs
index 430d8587..6c30e6fc 100644
--- a/crates/codegen/language/definition/src/compiler/analysis/definitions.rs
+++ b/crates/codegen/language/definition/src/compiler/analysis/definitions.rs
@@ -18,7 +18,7 @@ pub fn analyze_definitions(analysis: &mut Analysis) {
}
fn collect_top_level_items(analysis: &mut Analysis) {
- for item in analysis.language.clone().items() {
+ for item in analysis.language.items() {
let name = get_item_name(item);
let defined_in = calculate_defined_in(analysis, item);
@@ -111,7 +111,7 @@ fn get_item_name(item: &Item) -> &Spanned<Identifier> {
}
}
-fn calculate_defined_in(analysis: &mut Analysis, item: &Item) -> VersionSet {
+fn calculate_defined_in(analysis: &Analysis, item: &Item) -> VersionSet {
return match item {
Item::Struct { item } => VersionSet::from_range(calculate_enablement(
analysis,
@@ -157,7 +157,7 @@ fn calculate_defined_in(analysis: &mut Analysis, item: &Item) -> VersionSet {
}
fn calculate_enablement(
- analysis: &mut Analysis,
+ analysis: &Analysis,
enabled_in: &Option<Spanned<Version>>,
disabled_in: &Option<Spanned<Version>>,
) -> VersionRange {
but it's a miniscule thing
crates/codegen/language/definition/src/compiler/analysis/definitions.rs
Outdated
Show resolved
Hide resolved
} | ||
|
||
self.metadata.insert( | ||
(**name).to_owned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick:
Deref can be convenient but sometimes hurts legibility - this calls potentially costly to_owned
and it took me a second to decipher that it derefs to Identifier
(given that Identifier also has another Deref
impl) and clones it. Maybe it'd be better to have an Spanned::<T>::{value, span}
and call name.(as_?)value().to_owned()
here?
Using name.value()
above would also make it more obvious that the self.metadata
doesn't care about spans but inner values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I was under the impression that Deref
here is more idiomatic, but happy to change it. I'm not a big fan of &**
everywhere either.
Can do that in the next iteration if you don't mind, since this is only specific to this crate, and shouldn't affect other work in other areas/crates.
} | ||
} | ||
|
||
fn check_reference( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, peppered here along with ReferenceFilter::apply
took some time for me to decipher - could we add more comments on why it's like this way? Maybe it'd improve legibility if we combine the "get reference by ident for a given namespace/kind or error out"?
@@ -0,0 +1,503 @@ | |||
use crate::{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a module documentation or at least for analyze_references, since the check_*
names are vague and we also account for versions it seems in addition to just checking if the references are valid. Maybe implementing a folder/visitor would be simpler since it's mostly calling update_enablement
and check_reference
while visiting the items?
crates/codegen/language/definition/src/internals/spanned/mod.rs
Outdated
Show resolved
Hide resolved
return self.ranges.is_empty(); | ||
} | ||
|
||
pub fn add(&mut self, range: &VersionRange) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this is mutable but the difference is immutable but creates a new value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not always. It sometimes serves as an accumulator, instead of cloning self
when it is not changed.
|
||
/// A wrapper type to make sure the DSL token is written as an identifier instead of a string literal. | ||
#[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] | ||
pub struct Identifier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to risk a dumb question - could we use proc_macro2/syn::Ident here directly, since we only seem to use it with the macro and we almost always want it with a span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid question!
We cannot use it instead of Identifier
, as we cannot implement other traits for this external type.
We cannot wrap it inside Identifier
either, as we would have to special case it from internal_macros
in order for it to not be wrapped unnecissarily in Spanned<>
again.
Also, it is not serde
serializable (Span
is not), at least not without workarounds, which we need for the entire model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks! Maybe it'd be worthwhile to add a comment why we're not using proc_macro2::Ident
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like for us to address #610 (comment), otherwise it's good to go and we can tackle other comments in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks a lot for the work on the next version of the DSL! Let's merge this once the CI is green 🎉
Makes the grammar a lot simpler, since it is no longer needed after enums were simplified in NomicFoundation#610
Makes the grammar a lot simpler, since it is no longer needed after enums were simplified in NomicFoundation#610
Makes the grammar a lot simpler, since it is no longer needed after enums were simplified in NomicFoundation#610
Since it is no longer needed after enums were simplified in NomicFoundation#610
Since it is no longer needed after enums were simplified in NomicFoundation#610. We can just use the `reference` item name directly.
Since it is no longer needed after enums were simplified in #610. We can just use the referenced item name directly.
About DSL v2
This introduces DSL v2 that describes the grammar in terms of AST types, not EBNF rules:
cargo check
, are formatted byrustfmt
, and have definition/references/rename IDE support because of the backing types emitted in emitter.rs.Behind the Scenes
The magic happens because of the new
codegen_language_internal_macros
crate, which reads the model types, and generates implementations for three things:Spanned
which rewrites all fields inSpanned<T>
types that preserve the input token spans for validatation.ParseInputTokens
which usessyn
crate to parse the tokens into the backing Rust types.WriteOutputTokens
which serializes the grammarSpanned
types into a Rust expression that generates the definition using the original types (withoutSpanned<T>
), so that they can be used by client crates.Next Steps
We unfortunately now have three sources of truth for the language, that we are manually keeping in sync for now:
I will start to delete the YAML grammar, and move
codegen_spec
to use DSL v2, while in parallel starting a discussion about removing DSL v1, as it is a bigger chunk of work that requires coordinating with other ongoing parser/AST work.Areas of Improvement
Using
serde
to serialize/deserialize was not originally possible, because its data model does not support token spans, which cannot be serialized, recreated, or persisted outside the context of macro invocations. So I moved to usesyn
for parsing for now. It is working well, although it has a few caveats:Rc
/IndexMap
/Box
and other data structures, thatserde
handles by default.struct X { a: u8, b: u8 }
has to be declared asX(a = 1, b = 2)
, notX(b = 2, a = 1)
.I think I found a solution/workaround to the
serde
data model limitation, that will let me remove all these extra implementations and replace it with aserde
deserializer, but I will look into this in a later iteration, since it is not blocking us right now.Additionally, validation can still be tightened in a few places, but mostly for keeping the DSL lean/readable, not about correctness, so I will also delay this work for later.