introduce Language DSL v2 #610

OmarTawfik · 2023-10-16T16:16:09Z

About DSL v2

This introduces DSL v2 that describes the grammar in terms of AST types, not EBNF rules:

The new model types in types.rs are much more accurate/restrictive in describing the semantics of different productions, which removes the need to do a lot of validation that we had for earlier grammars.
The DSL inside a Rust macro invocation in definition.rs is automatically validated on every cargo check, is formatted by rustfmt, and has definition/references/rename IDE support because of the backing types emitted in emitter.rs.

Behind the Scenes

The magic happens because of the new codegen_language_internal_macros crate, which reads the model types, and generates implementations for three things:

Spanned, which rewrites all fields into Spanned<T> types that preserve the input token spans for validation.
ParseInputTokens, which uses the syn crate to parse the tokens into the backing Rust types.
WriteOutputTokens, which serializes the grammar's Spanned types into a Rust expression that generates the definition using the original types (without Spanned<T>), so that they can be used by client crates.
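
To make the Spanned<T> idea concrete, here is a minimal sketch that uses only the standard library. It is not the code generated by codegen_language_internal_macros: the Span type is a hypothetical byte range standing in for real macro token spans, and Keyword, SpannedKeyword, validate and into_plain are names invented for illustration.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a token span: a byte range in the macro input.
/// (Assumption: the real crate carries the spans of the macro's input tokens.)
#[derive(Debug, Clone, PartialEq)]
pub struct Span(pub Range<usize>);

/// A value paired with the span it was parsed from.
#[derive(Debug, Clone, PartialEq)]
pub struct Spanned<T> {
    pub span: Span,
    pub value: T,
}

/// A plain model type, as a client crate would consume it (no spans).
#[derive(Debug, Clone, PartialEq)]
pub struct Keyword {
    pub name: String,
    pub reserved: bool,
}

/// The spanned twin of `Keyword`: every field is wrapped in `Spanned<T>`.
/// In this sketch it is written by hand; a derive could generate it.
#[derive(Debug, Clone, PartialEq)]
pub struct SpannedKeyword {
    pub name: Spanned<String>,
    pub reserved: Spanned<bool>,
}

impl SpannedKeyword {
    /// Validation can report the span of the exact field that is wrong.
    pub fn validate(&self) -> Result<(), (Span, String)> {
        if self.name.value.is_empty() {
            return Err((self.name.span.clone(), "name must not be empty".to_string()));
        }
        Ok(())
    }

    /// Drops the spans and yields the plain model type.
    pub fn into_plain(self) -> Keyword {
        Keyword {
            name: self.name.value,
            reserved: self.reserved.value,
        }
    }
}
```

The split mirrors the description above: spans are kept only while validating the macro input, and the value handed to client crates is the plain type.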

Next Steps

We unfortunately now have three sources of truth for the language, that we are manually keeping in sync for now:

The YAML grammar (here) is used to produce the HTML spec.
The DSL v1 grammar (here) is used to produce the parser.
The DSL v2 grammar (here) is introduced in this PR, and is not used by anything yet.

I will start to delete the YAML grammar, and move codegen_spec to use DSL v2, while in parallel starting a discussion about removing DSL v1, as it is a bigger chunk of work that requires coordinating with other ongoing parser/AST work.

Areas of Improvement

Using serde to serialize/deserialize was not originally possible, because its data model does not support token spans, which cannot be serialized, recreated, or persisted outside the context of macro invocations. So I moved to use syn for parsing for now. It is working well, although it has a few caveats:

I needed to implement a parser, along with custom implementations for Rc/IndexMap/Box and other data structures, that serde handles by default.
The parser is type driven, which means it is strict, and expects fields to be defined exactly in the same order as the backing Rust types. struct X { a: u8, b: u8 } has to be declared as X(a = 1, b = 2), not X(b = 2, a = 1).
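
The ordering caveat can be illustrated with a small string-based sketch. This is an assumption-laden toy, not the PR's parser (which works on Rust tokens via syn): parse_ordered and parse_x are hypothetical helpers, and values are restricted to u8 to match the struct X example.

```rust
/// Mirrors the example above: fields are declared as `a`, then `b`.
#[derive(Debug, PartialEq)]
pub struct X {
    pub a: u8,
    pub b: u8,
}

/// Parses `Name(field = value, ...)`, requiring fields in exactly the
/// order given by `expected`. Hypothetical helper for illustration only.
pub fn parse_ordered(input: &str, name: &str, expected: &[&str]) -> Result<Vec<u8>, String> {
    let body = input
        .trim()
        .strip_prefix(name)
        .and_then(|rest| rest.trim().strip_prefix('('))
        .and_then(|rest| rest.strip_suffix(')'))
        .ok_or_else(|| format!("expected `{name}(...)`"))?;

    // Empty segments (e.g. from a trailing comma) are skipped.
    let mut parts = body.split(',').filter(|part| !part.trim().is_empty());
    let mut values = Vec::new();

    // The expected field list drives the parse, so order is significant.
    for field in expected {
        let part = parts
            .next()
            .ok_or_else(|| format!("missing field `{field}`"))?;
        let (key, value) = part
            .split_once('=')
            .ok_or_else(|| format!("expected `{field} = <value>`"))?;
        if key.trim() != *field {
            return Err(format!("expected field `{field}`, found `{}`", key.trim()));
        }
        let value = value
            .trim()
            .parse::<u8>()
            .map_err(|e| format!("invalid value for `{field}`: {e}"))?;
        values.push(value);
    }

    if parts.next().is_some() {
        return Err("unexpected extra field".to_string());
    }
    Ok(values)
}

/// Builds `X` from its two fields, in declaration order.
pub fn parse_x(input: &str) -> Result<X, String> {
    let values = parse_ordered(input, "X", &["a", "b"])?;
    Ok(X { a: values[0], b: values[1] })
}
```

With this sketch, parse_x("X(a = 1, b = 2)") succeeds, while parse_x("X(b = 2, a = 1)") is rejected because the first field found is b rather than a. A deserializer that looks fields up by name would not have this restriction.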

I think I found a solution/workaround to the serde data model limitation that will let me remove all these extra implementations and replace them with a serde deserializer, but I will look into this in a later iteration, since it is not blocking us right now.

Additionally, validation can still be tightened in a few places, mostly to keep the DSL lean/readable rather than for correctness, so I will also leave this work for later.

changeset-bot · 2023-10-16T16:16:13Z

⚠️ No Changeset found

Latest commit: 17c2a55

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

AntonyBlakey

Seems to be a few drive-by decisions taken that aren't related to the purpose of the PR

AntonyBlakey · 2023-10-17T14:50:09Z