Smarter function documentation indentation. #3
I'll start by summarizing what I found out about Emacs' indentation functionality within docstrings for other lisp-likes. Most major modes I found for lisp-likes in Emacs don't really seem to provide any support for indenting the content within a docstring. That is, pressing the tab key within a docstring doesn't appear to do anything visible. This doesn't seem too surprising given that the content of a docstring might be considered opaque, free-form, or arbitrary. The following Lisp-likes had major modes that behaved this way:
So far, the only exception I have found is clojure-mode. It detects whether point (the cursor) is currently within a docstring, and this can trigger behavior that differs from what happens when point is outside a docstring. As far as I can tell though, clojure-mode does not handle the second type of case reported in this issue, i.e. trying to indent code within docstrings in a manner that respects the language of the code being indented. |
Next, I'll share some general thoughts about indentation of text within docstrings. The issue of indentation within docstrings may be non-trivial because of a few aspects:
The questions in the first point may have straightforward answers, though I don't know what those might be yet. Detecting whether point is within a string is pretty reliable when using tree-sitter AFAICT. It's not yet clear to me whether one can reliably determine whether a particular string is a docstring though [1]. This remains to be seen. The second point is far from clear. The docstring content is somewhat markdown-like, but a couple of related problems are:
Note, I've investigated this type of issue before in the context of clojure-ts-mode -- see clojure-emacs/clojure-ts-mode#18 for details. Emacs 30 will apparently have better support for nested parsers [2]. That might be helpful in this kind of situation because we might be able to teach Emacs to have behavior that would work better for markdown-types of docstrings. However, it's unclear how much that will help due to the vagueness of the flavor of "markdown" used in Janet docstrings. This seems like something where it's unclear how well it will work until it is tried.
[1] Note that a variety of things in Janet may have docstrings.
[2] Not sure if it's this stuff or something else... |
Ask bakpakin to document janet markdown. |
It's up to someone else to do that, and I'm not sure how much help it would be to know (at least not in the short-to-medium term). Also, it's unclear to me how coherent a description might be... There is a chance that having documentation may help, but there is no guarantee that an existing tree-sitter grammar will be able to handle the specific format that Janet uses. There are two grammars for "Markdown" that I'm aware of. AFAICT, one of them has been shied away from (and doesn't look that maintained recently). The other one has this sort of text in its README:
It remains to be seen how well this could work for indentation. Would be nice if it did a decent enough job. |
Some notes about determining whether a given string is a docstring... Will try to enumerate which built-in things (e.g.
Note that contrary to what the examples above might indicate, it's a bit more work than one might expect to determine a docstring because other metadata can exist too. See below for details. Possibly I've missed some things in the above attempt at listing... One way to categorize these things is to split them into three groups:
For the sole item of group 1 ( So for:
the docstring is "mo". For the items in group 2 (e.g. So for:
the docstring is "hello". While for:
the docstring is "gday". For the items in group 3 (e.g. So for:
the docstring is "breathe". Likewise for:
the docstring is "hippo". Note that if parsing with tree-sitter-janet-simple, something like:
will produce a tree like:
This means that one cannot rely solely on node count. That is, the type of node must be accounted for. Specifically,
[1] Users can define their own macros and these can have docstrings. These constructs will not be considered here. It might be possible to handle them via some extension mechanism, e.g. somewhat like what exists for indentation. |
Another idea is to adopt special indentation rules if point is within a multi-line string (or long-string...or buffer...or long-buffer...?), regardless of whether it is a docstring. Here is an example of a situation that doesn't involve a docstring, yet one might want special indentation behavior -- though note that the content isn't really all that markdown-like, and in general, there is no reason to think it will be, so this might not be a good thing to have enabled by default. Determining whether a particular string is a multi-line string should be relatively easy with tree-sitter because boundary information (line and column numbers for the starts and ends of nodes) is available per node. For example, for a file named "hello
there" a corresponding parse tree might be like:
Note the values Similarly for a file named ``
hello
there
`` a corresponding parse tree might be like:
Here the start / end row / column info is Within a string, if point is on a line without the opening or closing delimiter, alternate indentation behavior could be adopted. What's not so clear is what to do for the boundary cases of point being on a line within a string literal, but that line happens to have an opening or closing delimiter (for the string in question) on it too. Maybe the opening delimiter case is a matter of indenting as if there were only an opening delimiter. Not so sure about the closing delimiter case. Some example code below for illustrative purposes:
(defn my-fn
``Start of text
Some intermediate text
Moar text``
[x]
(+ x 8))
Here the lines in question are:
and:
Perhaps for the closing delimiter case, indenting the first non-whitespace character (in this example
(defn my-fn
``Start of text
Some intermediate text
Moar text``
[x]
(+ x 8))
Alternatively, one could just punt for lines that have either the opening or closing delimiter on them and only behave differently for "interior" lines. On an aesthetic side note, though line count would be greater, I'd prefer to have written the code like:
(defn my-fn
``
Start of text
Some intermediate text
Moar text
``
[x]
(+ x 8)) |
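The "interior lines only" idea above could be sketched roughly as follows. This is only an illustration under assumptions: the `my/` names are hypothetical helpers, not part of any existing mode, and the node type names `str_lit` and `long_str_lit` are assumed to be those produced by tree-sitter-janet-simple. The first function is plain position arithmetic; the second needs an Emacs built with tree-sitter support and a Janet parser active in the buffer.

```elisp
(defun my/line-interior-to-span-p (start end &optional pos)
  "Return non-nil if the line containing POS is strictly between
the line containing START and the line containing END.
POS defaults to point.  START and END are buffer positions, e.g.
the start and end of a string node."
  (let ((line (line-number-at-pos (or pos (point))))
        (first-line (line-number-at-pos start))
        (last-line (line-number-at-pos end)))
    ;; The opening delimiter sits on FIRST-LINE and the closing one on
    ;; LAST-LINE, so only lines strictly in between count as interior.
    (< first-line line last-line)))

(defun my/janet-ts-on-interior-string-line-p ()
  "Return non-nil if point is on an interior line of a string node.
Assumes the node types \"str_lit\" and \"long_str_lit\"."
  (let ((node (treesit-node-at (point))))
    (and node
         (member (treesit-node-type node) '("str_lit" "long_str_lit"))
         (my/line-interior-to-span-p (treesit-node-start node)
                                     (treesit-node-end node)))))
```

With this approach, lines carrying the opening or closing delimiter fall through to whatever the regular indentation does, which sidesteps the unclear boundary cases at the cost of not handling them at all.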
Here is some initial code (not perfect, but possibly good enough for most cases [1]) to detect whether a multiline string (or long-string) is a docstring for "standard" Janet defining-ish things:

(defun janet-ts-node-is-multiline (node)
  "Check whether NODE is multiline."
  (let* ((n-start (treesit-node-start node))
         (start-line (line-number-at-pos n-start))
         (n-end (treesit-node-end node))
         (end-line (line-number-at-pos n-end)))
    (not (= start-line end-line))))

(defun janet-ts-in-multiline-docstring-p ()
  "Check whether point is in a multiline docstring."
  (let* ((curr-node (treesit-node-at (point))))
    (when (and (or (string= "long_str_lit" (treesit-node-type curr-node))
                   (string= "str_lit" (treesit-node-type curr-node)))
               (janet-ts-node-is-multiline curr-node))
      (when-let* ((parent-node (treesit-node-parent curr-node))
                  (parent-type (treesit-node-type parent-node))
                  (head-node (treesit-node-child parent-node 0 :named))
                  (head-type (treesit-node-type head-node))
                  (head-name (treesit-node-text head-node)))
        ;; XXX: this is not strictly correct, but it may be good enough
        (and (string= "par_tup_lit" parent-type)
             (string= "sym_lit" head-type)
             ;; may be order shouldn't matter too much
             (member head-name
                     '("def" "defn" "defmacro" "var"
                       "def-" "defn-" "defmacro-" "var-"
                       "defdyn")))))))

[1] Famous last words...though possibly this trade-off can turn out to be ok. |
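As a rough illustration of how a predicate like the one above might feed into indentation, here is a minimal sketch. Everything in it is an assumption for discussion: `my/indent-line-respecting-docstrings` is a hypothetical name, the predicate and the regular indentation function are passed in as arguments rather than taken from any particular mode, and "copy the indentation of the previous non-blank line" is just one possible policy for docstring content, not something settled in this thread.

```elisp
(defun my/indent-line-respecting-docstrings (in-docstring-p fallback)
  "Indent the current line, treating docstring content specially.
IN-DOCSTRING-P is a function of no arguments that returns non-nil
when point is inside a docstring.  FALLBACK is a function of no
arguments that performs the regular (code) indentation."
  (if (funcall in-docstring-p)
      ;; Inside a docstring: reuse the indentation of the previous
      ;; non-blank line instead of applying Lisp indentation rules.
      (let ((col (save-excursion
                   (beginning-of-line)
                   (skip-chars-backward " \t\n")
                   (current-indentation))))
        (indent-line-to col))
    (funcall fallback)))
```

A caller might pass `#'janet-ts-in-multiline-docstring-p` as the first argument and the mode's usual indentation function as the second. Note that with this policy, text on the line after an opening delimiter lines up with the delimiter rather than with the text following it.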
treesitter doesn't help you do this? |
The code above uses tree-sitter. |
Those are overwhelming details, but I appreciate your time. |
If I press enter at the end of line1
I want to get
instead of
If I press enter between ok and ) below, I want to get
instead of