Introduce a `Language` type to provide consistent language information of strings. #1728
Comments
I just want to note that PyCharm allows you to tell it that any string literal is of any language it supports and basically supports all the IDE features for that language. You do have to manually tell it this though. edit: You can also add comments like This isn't a vote for or against a
Instead of an actual type, any reason not to do
This topic has been explored in some depth within this pylance discussion. To me, this doesn't seem like something that necessitates an extension to the type system. There are many ways that can be specified using existing type system or language constructs.
@dmwyatt Using comments to indicate language is fine. However, using
Automatically applied
While comments are limited to a single string and must be manually marked, types have the advantage of being automatically applied to any code that uses functions or other typed elements. For example, any code that uses the following
def execute(code: Language["js"]):
    ...
No collision when multiple strings overlap
Consider a function like this
def build_html_from_markdown(article: str, style: str, script: str) -> str:
    """`article` takes Markdown, `style` takes CSS, `script` takes JavaScript."""
    ...
This function uses a bunch of different languages at once, thus there's a bit of ambiguity when using it.
# language=???
build_html_from_markdown("# hello, world!", "h1 { background: pink; }", "alert('welcome!')")
This could be fixed by modifying the function to span multiple lines, but this effectively demonstrates that a more fundamental solution is needed. Using the
def build(
    article: Language["markdown"],
    style: Language["css"],
    script: Language["js"],
) -> Language["html"]:
    ...
build("# hello, world!", "h1 { background: pink; }", "alert('welcome!')")
Matches the context in which the type hint is used
Specifying the language as a type reduces the need to express information about it in other ways, and allows users to infer what the parameters of a function require from the type hint before reading the documentation. It also allows developers to write clearer code, since the type hint expresses the format, leaving the variable or parameter names to express other, more important information. I think this is in line with why type hints were introduced.
Provides a single, consistent, and clear way to represent the language of a string
@erictraut While there are many different ways to tell what language a string contains, there hasn't been a single, universally followed method. But it's important enough that we need a formal, documented way of doing it, and I think the
Alternatives
Yes, I agree that the comments implemented by PyCharm are not fool-proof. I was merely pointing towards prior art. I think the underlying thing you're reaching for might be the lack of a standardized way of annotating languages in strings that is accepted across all IDEs and editors? Maybe types are the best way to get to that point...I don't know. I certainly like the idea. The more generalized issue is the lack of a way to specify the structure or type of the data in a string. One can imagine there are many types of data that can be contained in a string, and programming languages are just one of them. I'm very sympathetic to all of these ideas:
Currently, Python has no consistent way to indicate that a string contains code written in a particular programming language.
As a result, editors cannot syntax-highlight code embedded in strings, which hurts productivity and readability and makes bugs and errors more likely when working with other languages as strings.
This article gives an example of the current problem.
Traditional approaches and issues
Typical case
Typically, syntax highlighting is not provided at all because there is no way for the editor to know the language of the string, which leads to several drawbacks.
Batch syntax highlighting of raw strings for regexes in VSCode
VSCode provides simple syntax highlighting for regexes when using raw strings, as shown below.
However, this approach has several drawbacks. First of all, it doesn't generalize to languages other than regexes. Also, since raw strings aren't just for regexes, it creates a visual distraction for people who want to use raw strings for non-regex reasons, such as Windows paths.
Below is an example of regex syntax highlighting applied to a Windows path, which actually reduces readability.
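As a rough illustration of the two uses of raw strings described above, here is one regex and one Windows path. Both values are made up for this example, and exactly how an editor colors them is an assumption that depends on the editor and its version.

```python
import re

# A raw string used as a regex: regex highlighting is helpful here.
version_pattern = re.compile(r"\d+\.\d+\.\d+")

# A raw string used as a Windows path. It is not a regex, but an editor that
# treats every raw string as a regex could color sequences such as `\d` and
# `.` as if they were regex syntax.
log_path = r"C:\Users\demo\data.txt"

print(bool(version_pattern.fullmatch("3.12.1")))  # True
print(log_path)  # C:\Users\demo\data.txt
```

Nothing in the string literal itself distinguishes the two cases, which is why an editor cannot choose the right highlighting from the `r` prefix alone.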
`Language` and `LiteralLanguage`
`Language` is a subtype of `str` that indicates that the string represents a specific language.
`LiteralLanguage` is a subtype of `LiteralString`, and is used in the same way as `Language`.
`Language` takes a single type argument, and in its place you put the name of the language, for example, `Language["html"]`.
Editors should provide basic syntax highlighting for string literals set to types `Language` or `LiteralLanguage`. Consider code blocks in Markdown.
The `Language` type may also be implied by the type of the parameter.
Errors
It is difficult to set the `Language` type to remain a `Language` type after an operation, as this would complicate the implementation and make it difficult to provide a clear criterion for the type.
For example, does `Language["A"] + Language["A"]` always result in `Language["A"]`? Of course it often does, but it's very hard to generalize.
The case of `Language["A"] + Language["B"]` is also tricky. Should we catch the type as `Language["A"]`, or should it be `Language["B"]`? And what about `Language["A"].strip()`? It's hard to maintain consistency or a single standard for these operations. Therefore, `Language` should be considered more as a feature for annotation than for complex static type checking.
Therefore, a type checker should accept the target of a given `Language` type as legitimate if it is a string, regardless of its contents, and an editor should not raise an error if it fails to parse.
Developers should also not expect that when they accept a value annotated with `Language` that the string is fully valid code that will pass the language's compiler.
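The "annotation only" behaviour described above can be approximated today with `typing.Annotated`. The following is a minimal sketch, not the proposed `Language` type: the `Lang` marker, the `Js` and `Html` aliases, and the `language_of` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_type_hints


@dataclass(frozen=True)
class Lang:
    """Hypothetical marker carrying a language name; not part of `typing`."""
    name: str


# Stand-ins for the proposed Language["js"] and Language["html"].
Js = Annotated[str, Lang("js")]
Html = Annotated[str, Lang("html")]


def execute(code: Js) -> None:
    """Pretend to run JavaScript; the body is irrelevant to this sketch."""


def language_of(func, param: str) -> Optional[str]:
    """Return the language name attached to a parameter, if any."""
    hint = get_type_hints(func, include_extras=True)[param]
    for meta in getattr(hint, "__metadata__", ()):
        if isinstance(meta, Lang):
            return meta.name
    return None


# A tool can read the marker without validating the string's contents.
print(language_of(execute, "code"))  # js

# The annotation does not survive operations: the result is a plain str.
fragment: Html = "<p>hi</p>"
combined = fragment + fragment
print(type(combined).__name__)  # str

# Invalid JavaScript is still accepted; nothing here parses the string.
execute("this is ( not valid js")
```

In this sketch the language is metadata for tools only: any `str` is accepted, and concatenation yields a plain `str`, which matches the treatment suggested in this section.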
Conversely, `Language` can be used for code that is "reasonably close" to the appearance of the language. Developers should consider whether syntax highlighting helps or hinders users when deciding whether to use `Language` or just use `str` for languages that are not exactly the same as the target language.
Post-operation type
The result of an operation on a `Language` should be treated as `str`, and the result of an operation on a `LiteralLanguage` should be treated as `LiteralString`.
`BytesLanguage`?
`BytesLanguage` is the bytes version of `Language`. We should think about whether we need this type.
However, there is no type called `LiteralBytes`, so at least `LiteralBytesLanguage` can't exist.
Language names
The language identifier in `Language` must be lowercase, e.g. `Language["python"]` instead of `Language["Python"]`.
For language names, it seems like a good idea to use the names developers already know from Markdown code blocks, but the exact definition is up to the editor.
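A tool that wanted to enforce the lowercase rule could do so with a very small check. This is only a sketch under that assumption: `check_language_name` is a hypothetical helper, and it deliberately does not decide which names are supported, since that is left to each editor.

```python
def check_language_name(name: str) -> str:
    """Return `name` unchanged if it is lowercase, else raise ValueError.

    Hypothetical helper: it checks only the casing rule, not whether any
    editor actually recognizes the name.
    """
    if name != name.lower():
        raise ValueError(f"language identifier must be lowercase: {name!r}")
    return name


print(check_language_name("python"))  # python

try:
    check_language_name("Python")
except ValueError as exc:
    print(exc)  # language identifier must be lowercase: 'Python'
```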
Supported languages
A list of supported languages is beyond the scope of this documentation and should be up to each editor's implementation. However, editors should be able to provide basic syntax highlighting for common languages like Python, HTML, SQL, etc.