## About DSL v2

This introduces DSL v2, which describes the grammar in terms of AST types, not EBNF rules:

- The new model types in [types.rs](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/codegen/language/definition/src/model/types.rs) are much more accurate/restrictive in describing the semantics of different productions, which removes the need for much of the validation that we had for earlier grammars.
- The DSL inside a Rust macro invocation in [definition.rs](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/src/definition.rs) is automatically validated on every `cargo check`, is formatted by `rustfmt`, and has definition/references/rename IDE support because of the backing types emitted in [emitter.rs](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/codegen/language/definition/src/compiler/emitter.rs).

## Behind the Scenes

The magic happens because of the new `codegen_language_internal_macros` crate, which reads the model types and generates implementations for three things:

1. `Spanned`, which rewrites all fields as `Spanned<T>` types that preserve the input token spans for validation.
2. `ParseInputTokens`, which uses the `syn` crate to parse the tokens into the backing Rust types.
3. `WriteOutputTokens`, which serializes the grammar `Spanned` types into a Rust expression that generates the definition using the original types (without `Spanned<T>`), so that they can be used by client crates.

## Next Steps

We unfortunately now have three sources of truth for the language, which we are manually keeping in sync for now:

1. The YAML grammar ([here](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/definition/manifest.yml)) is used to produce the HTML spec.
2. The DSL v1 grammar ([here](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/src/dsl.rs)) is used to produce the parser.
3. The DSL v2 grammar ([here](https://github.com/OmarTawfik-forks/slang/blob/dsl-v2/crates/solidity/inputs/language/src/definition.rs)) is introduced in this PR and is not used by anything yet.

I will start by deleting the YAML grammar and moving `codegen_spec` to use DSL v2, while in parallel starting a discussion about removing DSL v1, as that is a bigger chunk of work that requires coordinating with other ongoing parser/AST work.

## Areas of Improvement

Using `serde` to serialize/deserialize was not originally possible, because its data model does not support token spans, which cannot be serialized, recreated, or persisted outside the context of macro invocations. So I moved to `syn` for parsing for now. It is working well, although it has a few caveats:

1. I needed to implement a parser, along with custom implementations for `Rc`/`IndexMap`/`Box` and other data structures that `serde` handles by default.
2. The parser is type-driven, which means it is strict and expects fields to be defined in exactly the same order as in the backing Rust types. `struct X { a: u8, b: u8 }` has to be declared as `X(a = 1, b = 2)`, not `X(b = 2, a = 1)`.

I think I found a solution/workaround to the `serde` data model limitation that will let me remove all these extra implementations and replace them with a `serde` deserializer, but I will look into this in a later iteration, since it is not blocking us right now.

Additionally, validation can still be tightened in a few places, but mostly to keep the DSL lean/readable rather than for correctness, so I will also delay this work for later.
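
To make the field-ordering caveat concrete, here is a minimal sketch of why a type-driven parser ends up order-sensitive. It is not the `ParseInputTokens` implementation from this PR: it works on plain strings instead of `syn` token streams, and `parse_x` and `parse_field` are hypothetical helper names made up for illustration. The struct `X` and the two inputs come from the example above.

```rust
// Hypothetical sketch only: the real parser operates on `syn` tokens, not strings.
#[derive(Debug, PartialEq)]
struct X {
    a: u8,
    b: u8,
}

/// Parses input such as `X(a = 1, b = 2)`.
/// Fields must appear in the same order as they are declared in `X`.
fn parse_x(input: &str) -> Result<X, String> {
    let body = input
        .trim()
        .strip_prefix("X(")
        .and_then(|rest| rest.strip_suffix(')'))
        .ok_or_else(|| "expected `X(...)`".to_string())?;

    let mut fields = body.split(',').map(str::trim);

    // The parser is driven by the type: it asks for each field by name,
    // in declaration order, rather than collecting fields into a map first.
    let a = parse_field(fields.next(), "a")?;
    let b = parse_field(fields.next(), "b")?;

    if fields.next().is_some() {
        return Err("unexpected extra field".to_string());
    }
    Ok(X { a, b })
}

/// Parses a single `name = value` pair and checks that `name` is the expected field.
fn parse_field(field: Option<&str>, expected: &str) -> Result<u8, String> {
    let field = field.ok_or_else(|| format!("missing field `{expected}`"))?;
    let (name, value) = field
        .split_once('=')
        .ok_or_else(|| format!("expected `{expected} = <value>`"))?;
    if name.trim() != expected {
        return Err(format!("expected field `{expected}`, found `{}`", name.trim()));
    }
    value.trim().parse::<u8>().map_err(|e| e.to_string())
}
```

With this sketch, `parse_x("X(a = 1, b = 2)")` succeeds, while `parse_x("X(b = 2, a = 1)")` is rejected at the first field. A map-based deserializer (the approach `serde` takes for structs) would accept both orders, which is part of the appeal of moving back to it later.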