Smarter function documentation indentation. #3

Open
amano-kenji opened this issue Jan 30, 2024 · 10 comments

Comments

@amano-kenji

amano-kenji commented Jan 30, 2024

If I press enter at the end of line1

(defn func
  ```
  line1
  ```
  []
  (print "ok"))

I want to get

(defn func
  ```
  line1
  |
  ```
  []
  (print "ok"))

instead of

(defn func
  ```
  line1
|
  ```
  []
  (print "ok"))

If I press enter between ok and ) below

(defn func
  ````
  line1

  ```
  (defn ok)
  ```
  ````
  []
  (print "ok"))

I want to get

(defn func
  ````
  line1

  ```
  (defn ok
    )
  ```
  ````
  []
  (print "ok"))

instead of

(defn func
  ````
  line1

  ```
  (defn ok
)
  ```
  ````
  []
  (print "ok"))
@sogaiu
Owner

sogaiu commented Jan 30, 2024

I'll start by summarizing what I found out about Emacs' indentation functionality within docstrings for other lisp-likes.

Most major modes I found for lisp-likes in Emacs don't really seem to provide any support for indenting the content within a docstring. That is, pressing the tab key within a docstring doesn't appear to do anything visible. This doesn't seem too surprising given that the content of a docstring might be considered opaque, free-form, or arbitrary.

The following Lisp-likes had major modes that behaved this way:

  • Common Lisp (common-lisp-mode)
  • Emacs Lisp (emacs-lisp-mode)
  • Fennel (fennel-mode - 3rd party mode)
  • Scheme (the language lacks docstrings -- didn't check scheme.el)

So far, the only exception I've discovered is clojure-mode. It detects whether point (the cursor) is currently within a docstring, and this can trigger behavior that differs from when point is outside of a docstring.

As far as I can tell though, clojure-mode does not handle the second type of case reported in this issue, i.e. that of trying to indent code within docstrings in a manner that respects the language of the code being indented.

@sogaiu
Owner

sogaiu commented Jan 30, 2024

Next, I'll share some general thoughts about indentation of text within docstrings.

The issue of indentation within docstrings may be non-trivial because of a few aspects:

  1. Should the behavior be only for docstrings or should it be for all strings? (...and can it be reliably determined whether the current line is within a docstring or just some non-docstring string?)
  2. What should the behavior be?

The questions in the first point may have straightforward answers, though I don't know what those might be yet. Detecting whether point is within a string is pretty reliable when using tree-sitter AFAICT. It's not yet clear to me whether one can reliably determine whether a particular string is a docstring though [1]. This remains to be seen.

The second point is far from clear. The docstring content is somewhat markdown-like, but a couple of related problems are:

  1. Markdown itself is not-so-well-defined
  2. Janet's version of what is used in a docstring is not-so-well-defined either

Note, I've investigated this type of issue before in the context of clojure-ts-mode -- see clojure-emacs/clojure-ts-mode#18 for details.

Emacs 30 will apparently have better support for nested parsers [2]. That might be helpful in this kind of situation because we might be able to teach Emacs to have behavior that would work better for markdown-types of docstrings.

However, it's unclear how much that will help due to the vagueness of the flavor of "markdown" used in Janet docstrings. This seems like something where it's unclear how well it will work until it is tried.


[1] Note that a variety of things in Janet may have docstrings.

[2] Not sure if it's this stuff or something else...

@amano-kenji
Author

Ask bakpakin to document janet markdown.

@sogaiu
Owner

sogaiu commented Jan 30, 2024

It's up to someone else to do that, and I'm not sure how much help it would be to know (at least not in the short-to-medium term). Also, it's unclear to me how coherent a description might be...

There is a chance that having documentation may help, but there is no guarantee that an existing tree-sitter grammar will be able to handle the specific format that Janet uses.

There are two grammars for "Markdown" that I'm aware of. AFAICT, one of them has been shied away from (and doesn't appear to have been maintained recently). The other one has this sort of text in its README:

Even though this parser has existed for some while and obvious issues are mostly solved, there are still lots of inaccuarcies in the output. These stem from restricting a complex format such as markdown to the quite restricting tree-sitter parsing rules.

As such it is not recommended to use this parser where correctness is important. The main goal for this parser is to provide syntactical information for syntax highlighting in parsers such as neovim and helix.

It remains to be seen how well this could work for indentation. Would be nice if it did a decent enough job.

@sogaiu
Owner

sogaiu commented Jan 30, 2024

Some notes about determining whether a given string is a docstring...

Will try to enumerate which built-in things (e.g. def) can have docstrings [1]:

  • def, def- - (def a "hi" 1)
  • defdyn - (defdyn *err* "Where error printing prints output to.")
  • defmacro, defmacro- - (defmacro as-macro "text" [f & args] (f ;args))
  • defn, defn- - (defn my-func "smile" [] 8)
  • var, var- - (var b "hello" 2)

Note that contrary to what the examples above might indicate, it's a bit more work than one might expect to determine a docstring because other metadata can exist too. See below for details.

Possibly I've missed some things in the above attempt at listing...


One way to categorize these things is to split them into three groups:

  1. defdyn
  2. def, def-, var, var-
  3. defmacro, defmacro-, defn, defn-

For the sole item of group 1 (defdyn), if there are two (or more) arguments, the last string is supposed to be a docstring.

So for:

(defdyn *b* :alpha "hellooooo" :beta "mo" :x)

the docstring is "mo".

For the items in group 2 (e.g. def), if there are three (or more) arguments, the last argument is the value that is being "bound" so-to-speak, and looking back from it, the first string is the docstring.

So for:

(def a :alpha "hello" :beta "smile")

the docstring is "hello".

While for:

(def g :gamma "gday" :epsilon)

the docstring is "gday".

For the items in group 3 (e.g. defn), if there are four (or more) arguments, looking back from the square bracket or paren tuple that defines arguments, the first string is the docstring.

So for:

(defn my-func "smile" "breathe" [] 8)

the docstring is "breathe".

Likewise for:

(defn my-paren-func "duck" "duck" "hippo" (x) (- 1 x))

the docstring is "hippo".


Note that if parsing with tree-sitter-janet-simple, something like:

(def a :alpha "hello" # ho ho ho
  :beta "smile")

will produce a tree like:

(source [0, 0] - [2, 0]
  (par_tup_lit [0, 0] - [1, 16]
    (sym_lit [0, 1] - [0, 4])
    (sym_lit [0, 5] - [0, 6])
    (kwd_lit [0, 7] - [0, 13])
    (str_lit [0, 14] - [0, 21])
    (comment [0, 22] - [0, 32])
    (kwd_lit [1, 2] - [1, 7])
    (str_lit [1, 8] - [1, 15])))

This means that one cannot rely solely on node count. That is, the type of node must be accounted for. Specifically, comment nodes should be ignored. That might be the only type of node that needs to be ignored. Not sure...
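
A rough sketch of the three rules above, with comment nodes filtered out first. It works on plain data instead of live tree-sitter nodes: each child of the form is a hypothetical (TYPE . TEXT) pair. The `my/` names are made up for illustration, and `sqr_tup_lit` as the node type for a square-bracket tuple is an assumption about the grammar (only `par_tup_lit` appears in the tree above).

```elisp
(require 'cl-lib)
(require 'seq)

;; Node type names; "sqr_tup_lit" is an assumption about the grammar.
(defconst my/janet-string-types '("str_lit" "long_str_lit"))
(defconst my/janet-tuple-types '("sqr_tup_lit" "par_tup_lit"))

(defun my/janet-last-string (nodes)
  "Return the text of the last string node in NODES, or nil."
  (cdr (seq-find (lambda (n) (member (car n) my/janet-string-types))
                 (reverse nodes))))

(defun my/janet-docstring (children)
  "Return the docstring text of a defining form, or nil.
CHILDREN is a list of (TYPE . TEXT) pairs, head symbol first."
  (let* ((nodes (seq-remove (lambda (n) (equal (car n) "comment"))
                            children))
         (head (cdr (car nodes)))
         (args (cdr nodes))          ; name, metadata, value or body
         (tail (cdr args)))          ; everything after the name
    (cond
     ;; Group 1: the last string is the docstring.
     ((equal head "defdyn")
      (and (>= (length args) 2)
           (my/janet-last-string tail)))
     ;; Group 2: the last argument is the value; search before it.
     ((member head '("def" "def-" "var" "var-"))
      (and (>= (length args) 3)
           (my/janet-last-string (butlast tail))))
     ;; Group 3: search before the first tuple (the parameters).
     ((member head '("defn" "defn-" "defmacro" "defmacro-"))
      (let ((pos (cl-position-if
                  (lambda (n) (member (car n) my/janet-tuple-types))
                  tail)))
        (and pos
             (>= (length args) 4)
             (my/janet-last-string (seq-take tail pos))))))))

;; The commented `def' form from above:
(my/janet-docstring
 '(("sym_lit" . "def") ("sym_lit" . "a")
   ("kwd_lit" . ":alpha") ("str_lit" . "\"hello\"")
   ("comment" . "# ho ho ho")
   ("kwd_lit" . ":beta") ("str_lit" . "\"smile\"")))
;; => "\"hello\""
```

In an actual mode, the pairs could presumably be built from `treesit-node-type` and `treesit-node-text` of the form's children. User-defined macros are not covered here.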


[1] Users can define their own macros and these can have docstrings. These constructs will not be considered here. It might be possible to handle them via some extension mechanism, e.g. somewhat like what exists for indentation.

@sogaiu
Owner

sogaiu commented Jan 30, 2024

Another idea is to adopt special indentation rules if point is within a multi-line string (or long-string...or buffer...or long-buffer...?), regardless of whether it is a docstring.

Here is an example of a situation that doesn't involve a docstring, yet one might want special indentation behavior -- though note that the content isn't really all that markdown-like, and in general, there is no reason to think it will be, so this might not be a good thing to have enabled by default.

Determining whether a particular string is a multi-line string should be relatively easy with tree-sitter because boundary information (line and column numbers for the starts and ends of nodes) is available per node.

For example, for a file named multi-line-string.janet with content:

"hello
there"

a corresponding parse tree might be like:

(source [0, 0] - [2, 0]
  (str_lit [0, 0] - [1, 6]))

Note the values [0, 0] and [1, 6]. The first 0 and the 1 represent row (or line) values. If these differ, then it's a multi-line string.

Similarly for a file named multi-line-long-string.janet with content:

``
hello
there
``

a corresponding parse tree might be like:

(source [0, 0] - [4, 0]
  (long_str_lit [0, 0] - [3, 2]))

Here the start / end row / column info is [0, 0] and [3, 2], and 0 is not the same as 3, so it's a multi-line construct.


Within a string, if point is on a line without the opening or closing delimiter, alternate indentation behavior could be adopted. What's not so clear is what to do for the boundary cases of point being on a line within a string literal, but that line happens to have an opening or closing delimiter (for the string in question) on it too.

Maybe the opening delimiter case is a matter of indenting as if there were only an opening delimiter. Not so sure about the closing delimiter case.

Some example code below for illustrative purposes:

(defn my-fn
  ``Start of text
  Some intermediate text
 Moar text``
  [x]
  (+ x 8))

Here the lines in question are:

  ``Start of text

and:

 Moar text``

Perhaps for the closing delimiter case, indenting the first non-whitespace character (in this example M) to line up with the left-most character of the opening delimiter (in this example the backtick that is beneath the e of defn) makes sense. So in this case one would end up with:

(defn my-fn
  ``Start of text
  Some intermediate text
  Moar text``
  [x]
  (+ x 8))

Alternatively, one could just punt for lines that have either the opening or closing delimiter on them and only behave differently for "interior" lines.
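
One possible sketch of the closing-delimiter idea, combined with punting on the line that holds the opening delimiter. The `my/` function names are hypothetical, and `string-start` is assumed to be the buffer position of the string's opening delimiter, which a tree-sitter based mode could presumably obtain via `treesit-node-start`.

```elisp
(defun my/janet-string-indent-column (string-start)
  "Return the column of buffer position STRING-START."
  (save-excursion
    (goto-char string-start)
    (current-column)))

(defun my/janet-indent-line-in-string (string-start)
  "Indent the current line to the opening delimiter at STRING-START.
A line that holds the opening delimiter itself is left alone."
  (when (< string-start (line-beginning-position))
    (indent-line-to (my/janet-string-indent-column string-start))))

;; With point on the " Moar text``" line of the example above, and
;; STRING-START at the first backtick (column 2), the line becomes
;; "  Moar text``".
```

Note that this treats interior lines and the closing-delimiter line the same way, so any deliberately deeper indentation inside the string (e.g. nested code) would be flattened. It is only meant to illustrate the alignment rule.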


On an aesthetic side note, though line count would be greater, I'd prefer to have written the code like:

(defn my-fn
  ``
  Start of text
  Some intermediate text
  Moar text
  ``
  [x]
  (+ x 8))

@sogaiu
Owner

sogaiu commented Jan 30, 2024

Here is some initial code (not perfect, but possibly good enough for most cases [1]) to detect whether a multiline string (or long-string) is a docstring for "standard" Janet defining-ish things:

(defun janet-ts-node-is-multiline (node)
  "Check whether NODE is multiline."
  (let* ((n-start (treesit-node-start node))
         (start-line (line-number-at-pos n-start))
         (n-end (treesit-node-end node))
         (end-line (line-number-at-pos n-end)))
    (not (= start-line end-line))))

(defun janet-ts-in-multiline-docstring-p ()
  "Check whether point is in a multiline docstring."
  (let* ((curr-node (treesit-node-at (point))))
    (when (and (or (string= "long_str_lit" (treesit-node-type curr-node))
                   (string= "str_lit" (treesit-node-type curr-node)))
               (janet-ts-node-is-multiline curr-node))
      (when-let* ((parent-node (treesit-node-parent curr-node))
                  (parent-type (treesit-node-type parent-node))
                  (head-node (treesit-node-child parent-node 0 :named))
                  (head-type (treesit-node-type head-node))
                  (head-name (treesit-node-text head-node)))
        ;; XXX: this is not strictly correct, but it may be good enough:
        ;; any multiline string directly inside a defining form counts,
        ;; even if it is the bound value rather than the docstring
        (and (string= "par_tup_lit" parent-type)
             (string= "sym_lit" head-type)
             ;; maybe order shouldn't matter too much
             (member head-name
                     '("def" "defn" "defmacro" "var"
                       "def-" "defn-" "defmacro-" "var-"
                       "defdyn")))))))

[1] Famous last words...though possibly this trade-off can turn out to be ok.

@amano-kenji
Author

treesitter doesn't help you do this?

@sogaiu
Owner

sogaiu commented Jan 31, 2024

The code above uses tree-sitter.

@amano-kenji
Author

Those are overwhelming details, but I appreciate your time.
