introduce Language DSL v2 (#610)
## About DSL v2

This introduces DSL v2 that describes the grammar in terms of AST types,
not EBNF rules:

- The new model types in
[types.rs](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/codegen/language/definition/src/model/types.rs)
are much more accurate/restrictive in describing the semantics of
different productions, which removes the need to do a lot of validation
that we had for earlier grammars.
- The DSL inside a Rust macro invocation in
[definition.rs](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/src/definition.rs)
is automatically validated on every `cargo check`, is formatted by
`rustfmt`, and has definition/references/rename IDE support because of
the backing types emitted in
[emitter.rs](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/codegen/language/definition/src/compiler/emitter.rs).

## Behind the Scenes

The magic happens because of the new `codegen_language_internal_macros`
crate, which reads the model types, and generates implementations for
three things:

1. `Spanned`, which rewrites all fields into `Spanned<T>` types that
preserve the input token spans for validation.
2. `ParseInputTokens`, which uses the `syn` crate to parse the tokens into
the backing Rust types.
3. `WriteOutputTokens` which serializes the grammar `Spanned` types into
a Rust expression that generates the definition using the original types
(without `Spanned<T>`), so that they can be used by client crates.
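
As a rough illustration of the three pieces above, here is a minimal, std-only sketch. It is not the code generated by `codegen_language_internal_macros`: `Span` is a hypothetical stand-in for a real token span, and `Keyword`, `SpannedKeyword`, `validate`, `into_plain`, and `to_rust_expr` are illustrative names that do not come from this PR.

```rust
// Hypothetical stand-in for a token span. In a real proc macro this would be
// a compiler-provided span, which cannot exist outside a macro invocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// A value paired with the span of the input tokens it was parsed from.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Spanned<T> {
    pub value: T,
    pub span: Span,
}

/// Plain model type, as a client crate would see it (illustrative fields).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Keyword {
    pub name: String,
    pub reserved: bool,
}

/// Span-preserving mirror of `Keyword`, used only while validating.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SpannedKeyword {
    pub name: Spanned<String>,
    pub reserved: Spanned<bool>,
}

impl SpannedKeyword {
    /// Validation can report the span of the exact field that is wrong.
    pub fn validate(&self) -> Result<(), (Span, String)> {
        if self.name.value.is_empty() {
            return Err((self.name.span, "keyword name must not be empty".to_string()));
        }
        Ok(())
    }

    /// Drops the spans to recover the plain type.
    pub fn into_plain(self) -> Keyword {
        Keyword {
            name: self.name.value,
            reserved: self.reserved.value,
        }
    }

    /// Emits a Rust expression (as text) that rebuilds the plain type.
    pub fn to_rust_expr(&self) -> String {
        format!(
            "Keyword {{ name: {:?}.to_string(), reserved: {} }}",
            self.name.value, self.reserved.value
        )
    }
}
```

The design point is that spans live only in the mirror type: validation reports errors against spans, while the emitted expression mentions only the plain type, so client crates never see `Spanned<T>`.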

## Next Steps

We unfortunately now have three sources of truth for the language, that
we are manually keeping in sync for now:

1. The YAML grammar
([here](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/definition/manifest.yml))
is used to produce the HTML spec.
2. The DSL v1 grammar
([here](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/src/dsl.rs))
is used to produce the parser.
3. The DSL v2 grammar
([here](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/src/definition.rs))
is introduced in this PR and is not used by anything yet.

I will start to delete the YAML grammar, and move `codegen_spec` to use
DSL v2, while in parallel starting a discussion about removing DSL v1,
as it is a bigger chunk of work that requires coordinating with other
ongoing parser/AST work.

## Areas of Improvement

Using `serde` to serialize/deserialize was not originally possible,
because its data model does not support token spans, which cannot be
serialized, recreated, or persisted outside the context of macro
invocations. So I moved to use `syn` for parsing for now. It is working
well, although it has a few caveats:

1. I needed to implement a parser, along with custom implementations for
`Rc`/`IndexMap`/`Box` and other data structures, that `serde` handles by
default.
2. The parser is type driven, which means it is strict, and expects
fields to be defined exactly in the same order as the backing Rust
types. `struct X { a: u8, b: u8 }` has to be declared as `X(a = 1, b =
2)`, not `X(b = 2, a = 1)`.
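
To make the ordering caveat concrete, here is a minimal sketch of a parser that accepts fields only in declaration order. It works on plain strings rather than token streams and does not use `syn`, so it only illustrates the strictness, not the actual implementation; `parse_ordered` and its parameters are hypothetical names.

```rust
/// Parses input shaped like `X(a = 1, b = 2)` into (field, value) pairs,
/// requiring the fields to appear exactly in the order given by `expected`.
pub fn parse_ordered(
    input: &str,
    type_name: &str,
    expected: &[&str],
) -> Result<Vec<(String, u8)>, String> {
    // Strip the `TypeName(` prefix and the `)` suffix.
    let body = input
        .trim()
        .strip_prefix(type_name)
        .map(str::trim_start)
        .and_then(|rest| rest.strip_prefix('('))
        .and_then(|rest| rest.strip_suffix(')'))
        .ok_or_else(|| format!("expected `{type_name}(...)`"))?;

    let parts: Vec<&str> = body.split(',').map(str::trim).collect();
    if parts.len() != expected.len() {
        return Err(format!(
            "expected {} fields, found {}",
            expected.len(),
            parts.len()
        ));
    }

    let mut fields = Vec::new();
    // Walk the input and the declared field list in lockstep: the parser is
    // driven by the type, so it never searches for a field by name.
    for (part, want) in parts.iter().zip(expected) {
        let (name, value) = part
            .split_once('=')
            .ok_or_else(|| format!("expected `{want} = <value>`, found `{part}`"))?;
        let name = name.trim();
        if name != *want {
            return Err(format!("expected field `{want}`, found `{name}`"));
        }
        let value: u8 = value
            .trim()
            .parse()
            .map_err(|_| format!("invalid value for `{want}`"))?;
        fields.push((name.to_string(), value));
    }
    Ok(fields)
}
```

With `expected = ["a", "b"]`, the input `X(a = 1, b = 2)` parses, while `X(b = 2, a = 1)` is rejected at the first field, which is the behavior described in the second caveat.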

I think I found a solution/workaround to the `serde` data model
limitation that will let me remove all these extra implementations and
replace them with a `serde` deserializer, but I will look into this in a
later iteration, since it is not blocking us right now.

Additionally, validation can still be tightened in a few places, but
mostly to keep the DSL lean/readable rather than for correctness, so I
will also defer this work until later.
OmarTawfik authored Oct 23, 2023
1 parent 7004bb5 commit f17c0f8
Showing 84 changed files with 8,269 additions and 93 deletions.
108 changes: 99 additions & 9 deletions Cargo.lock

16 changes: 16 additions & 0 deletions Cargo.toml
@@ -9,6 +9,10 @@ resolver = "2"
members = [
"crates/codegen/ebnf",
"crates/codegen/grammar",
"crates/codegen/language/definition",
"crates/codegen/language/internal_macros",
"crates/codegen/language/macros",
"crates/codegen/language/tests",
"crates/codegen/parser/generator",
"crates/codegen/parser/runtime",
"crates/codegen/schema",
@@ -35,6 +39,10 @@ members = [
#
codegen_ebnf = { path = "crates/codegen/ebnf" }
codegen_grammar = { path = "crates/codegen/grammar" }
codegen_language_definition = { path = "crates/codegen/language/definition" }
codegen_language_internal_macros = { path = "crates/codegen/language/internal_macros" }
codegen_language_macros = { path = "crates/codegen/language/macros" }
codegen_language_tests = { path = "crates/codegen/language/tests" }
codegen_parser_generator = { path = "crates/codegen/parser/generator" }
codegen_parser_runtime = { path = "crates/codegen/parser/runtime" }
codegen_schema = { path = "crates/codegen/schema" }
@@ -86,9 +94,17 @@ serde_yaml = { version = "0.9.19" }
similar-asserts = { version = "1.4.2" }
strum = { version = "0.24.0" }
strum_macros = { version = "0.24.0" }
syn = { version = "2.0.29", features = [
"fold",
"full",
"extra-traits",
"parsing",
"printing",
] }
tera = { version = "1.19.0" }
terminal_size = { version = "0.2.6" }
thiserror = { version = "1.0.40" }
trybuild = { version = "1.0.85" }
toml = { version = "0.7.6" }
typed-arena = { version = "2.0.2" }
url = { version = "2.3.1" }
21 changes: 21 additions & 0 deletions crates/codegen/language/definition/Cargo.toml
@@ -0,0 +1,21 @@
[package]
name = "codegen_language_definition"
version.workspace = true
rust-version.workspace = true
edition.workspace = true
publish = false

[dependencies]
codegen_language_internal_macros = { workspace = true }
indexmap = { workspace = true }
Inflector = { workspace = true }
infra_utils = { workspace = true }
itertools = { workspace = true }
proc-macro2 = { workspace = true }
quote = { workspace = true }
semver = { workspace = true }
serde = { workspace = true }
strum = { workspace = true }
strum_macros = { workspace = true }
syn = { workspace = true }
thiserror = { workspace = true }