Slightly structured plain text to enable lightweight, language-agnostic time-travel debugging #14

akkartik · 2022-06-22T05:21:19Z

2 weeks before jam
Status: still welcoming collaborators. Leave a comment if you're interested.
Platform and tools: I'm thinking Lua and LÖVE. It's fast enough, it's been easy to get up and running, and the docs are excellent.

Debugging tools today tend to be language-specific, require lots of knowledge of the internals of a language runtime, and therefore the tools tend to be quite complex even without including the ability to go backwards in time. The focus is on debugging heavyweight programs with low overhead, and the predominant paradigm is to slow time down and permit stopping at specific lines (breakpoints). Time-travel debugging remains esoteric, as do user-configurable views of datatypes.

A complementary approach is debug by print. It requires no tooling support and naturally lays out a program run over time. However, you have to choose to either clutter up your codebase with debug prints (maybe disabled) or delete them and re-add them every time, which makes it harder to disseminate debugging tricks.

In this project I'd like to explore a way to address the drawbacks of debug by print for a simple, extensible, language-agnostic debug framework that renders logs over time using user-definable views of datatypes while putting up with potentially significant overheads in run time and disk space.

Experience

In brief, the experience will look like this:

Programs in source files will contain metadata for each line after some delimiter.
A text editor will show source files while hiding metadata by default, but allow toggling them on to edit. Lines will implicitly carry their metadata during editor operations like copying and pasting.
A runtime for a single language (Lua?) will use the metadata to decide what to log -- and how -- after each line is executed.
Simple tools will render logs generated by program runs. Here's an example I built for a past project. Most future work will be here (animation?!), but for the jam we might go with something simple like this, albeit with graphics. Or assign statement metadata to multiple tags that can be filtered for.

Code to render datatypes will be completely open to and extensible by the source codebase.
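
A minimal sketch of the first two points above, assuming a hypothetical `@@` delimiter (the proposal doesn't pick one) and using Python as a stand-in for the eventual Lua. The names `split_line` and `render` are illustrative only.

```python
DELIM = "@@"  # hypothetical delimiter; the proposal leaves the choice open

def split_line(line):
    """Split a source line into (code, metadata); metadata is '' if absent."""
    code, _, meta = line.partition(DELIM)
    return code.rstrip(), meta.strip()

def render(lines, show_metadata=False):
    """Return the lines an editor would display."""
    if show_metadata:
        return list(lines)
    # Hide everything from the delimiter onwards.
    return [split_line(line)[0] for line in lines]

source = [
    "x = compute()  @@ log('x', x)",
    "y = x + 1",
]
print(render(source))  # ['x = compute()', 'y = x + 1']
```

Because the metadata lives on the same line as the code, copying or moving a line in the underlying buffer carries its metadata along without any extra bookkeeping.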

Drawbacks

The overheads will likely be prohibitive for large codebases and long execution runs.
A place to hide stuff in text files can be confusing.
A place to hide stuff in text files can be a source of security issues.
The experience will be klunky since most of the debugger is in the source codebase.

Handmade projects

https://whitebox.handmade.network seems to have a timeline view? C-specific.

Notable projects

gdb allegedly supports time-travel debugging. I kicked the tires on it 12 years ago and couldn't get it working. It's not clear if there's been any progress since. ddd is a frontend to gdb that supports graphical views of datatypes that are smart enough to traverse pointers. No time-travel, though.
rr is an open-source tool for time-travel debugging of C and C++ programs (the commercial Pernosco service is built on it). Again, this is a heavyweight tool. Lots of engineering effort devoted to minimizing debug overhead for large applications. It doesn't look like it supports graphical views of datatypes.
Smalltalk and Glamorous Toolkit are excellent at user-configurable views of data. They also seem to have all the tools needed to support debug by print with their notion of transcripts. Programming in Smalltalk is an incomparable experience, but you do have to program in Smalltalk. It's also not exactly lightweight. There's a non-trivial learning curve.

Details for the jam

Simplifying assumptions to manage scope

Metadata will consist of just print/log statements in the source language.
The language interpreter only needs to be modified to execute each line's metadata immediately after executing the line itself. I'm hoping we can avoid messing with interpreter internals by simply replacing the metadata delimiter with the source language's statement separator. In C that would be just ;. Lua has the nice property that the grammar is always unambiguous even without any statement separator, so that we can just delete some tokens using sed.
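
A rough sketch of that delimiter-to-separator rewrite, done as a text pass before handing the file to an unmodified interpreter or compiler. The `@@` delimiter, the function name `lower`, and the `log_int` call in the C example are all assumptions made for illustration. Python is used here only as a stand-in; the same pass could be a sed one-liner.

```python
DELIM = "@@"  # hypothetical delimiter; the proposal does not specify one

def lower(line, separator):
    """Rewrite 'code @@ metadata' as 'code<separator> metadata'."""
    code, found, meta = line.partition(DELIM)
    if not found or not meta.strip():
        return code.rstrip()
    return code.rstrip() + separator + " " + meta.strip()

# Lua: no separator needed, so the delimiter is simply dropped.
print(lower("x = f(y) @@ print('x', x)", ""))    # x = f(y) print('x', x)
# C: the delimiter becomes a semicolon.
print(lower("x = f(y) @@ log_int(x);", ";"))     # x = f(y); log_int(x);
```

This naive version would also fire on a delimiter inside a string literal. One Lua case to watch: metadata that begins with an opening parenthesis would be parsed as a call on the preceding expression, so a pass like this might still want to emit `;` there.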

Open questions

What should happen to metadata when we split a statement across two lines? Consolidate a two-line statement into a single line?
Some language constructs such as for statements naturally span multiple lines. What does it mean to "run" a line containing the for keyword? I think the metadata should trigger on every iteration through the loop. This might be a lot of work, though.
What approach should we use to render graphics during the jam? There's a spectrum of possibilities:
- Check in tools for each codebase with it, allowing debug tools to coevolve with a single codebase and diverge over time from debug tools in other codebases.
- Keep source codebase and debug tools in the same language, and use conventions to import some files from the directory (tree) we run debug tools from. The core tools will be shared across codebases, but they'll also pick up local "idioms" specific to each codebase.
- Source codebase renders to a common format (svg?) that debug tools know about.

martinfouilleul · 2022-06-22T08:42:27Z

Hi Kartik!

That's a cool idea!

I've done something along those lines for my temporal programming environment Quadrant, where metadata about blocks of code is embedded into the compiled bytecode, and the runtime can signal the execution of each block. Currently I'm only using that to highlight blocks in the editor as they are executed, but it would also allow for logging and replay in the future.

It seems you intend to define meaningful "units" of execution on a line-by-line basis. What about statements that are broken into multiple lines for readability, or lines that contain multiple meaningful chunks of execution (e.g. several calls that are part of a larger expression)? Granted, it's also a problem with standard debuggers since they're line based, but since you're intending to use a more structured format, that's maybe something to take into consideration. I also think being able to log blocks of code (with timestamps) can give you a clearer picture of what's going on compared to having to mentally reconstruct that from a trace of individual lines (but the viewer can also help with that).

By the way, what are your thoughts about keeping it plain-text+markers vs. having a more structured format? Since it would need a special-purpose editor to be really useful, it seems to me that plain-text compatibility is not necessarily a strong argument, but maybe there are other reasons for you to prefer a plain-text approach.

How would the program provide a way to render datatypes? Would data be serialized into the log, and the client provides callbacks to the debugger to render various data types from the logs? I'm not sure I have a good picture of what you have in mind here, but it does seem interesting.
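
One possible shape for this, sketched under the assumption that values are serialized to a JSON-lines log with a type tag and that the codebase registers one rendering callback per tag. The names `renderer`, `log_value` and `render_entry` are hypothetical, and Python stands in for whatever language the tools end up in.

```python
import json

RENDERERS = {}  # type tag -> callback, registered by the codebase being debugged

def renderer(tag):
    """Decorator that registers a rendering callback for a type tag."""
    def register(fn):
        RENDERERS[tag] = fn
        return fn
    return register

def log_value(log, tag, value):
    """Serialize a tagged value into the log (one JSON document per entry)."""
    log.append(json.dumps({"type": tag, "value": value}))

def render_entry(line):
    """Render a log entry with its registered callback, falling back to repr."""
    entry = json.loads(line)
    fn = RENDERERS.get(entry["type"], repr)
    return fn(entry["value"])

@renderer("point")
def render_point(p):
    return f"({p['x']}, {p['y']})"

log = []
log_value(log, "point", {"x": 1, "y": 2})
log_value(log, "count", 3)
print([render_entry(line) for line in log])  # ['(1, 2)', '3']
```

Here the callbacks return strings; a graphical viewer could have them return drawing commands instead without changing the log format.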

For long execution runs, maybe you could just have the log be a ring buffer, and the time travel is limited to some amount of time in the past, configurable at startup.
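
A small sketch of the time-limited variant, assuming each entry carries a timestamp. Strictly speaking this is a time-windowed queue rather than a fixed-size ring buffer, and the class name `WindowedLog` is made up for illustration.

```python
from collections import deque

class WindowedLog:
    """Keep only entries from the last `window_seconds` of the run."""

    def __init__(self, window_seconds):
        self.window = window_seconds  # configured once at startup
        self.entries = deque()

    def append(self, timestamp, message):
        self.entries.append((timestamp, message))
        # Drop everything older than the window, measured from the newest entry.
        cutoff = timestamp - self.window
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()

log = WindowedLog(window_seconds=10)
for t in (0, 4, 9, 15):
    log.append(t, f"event at t={t}")
print([msg for _, msg in log.entries])  # ['event at t=9', 'event at t=15']
```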

Anyway, I'm very interested in seeing what you'll come up with!

Cheers,

Martin

akkartik · 2022-06-22T15:29:29Z

Author

Thanks for those great questions! I'm glad to see another mind prepared for this strange idea.

Statements (or top-level expressions) spanning lines is definitely a concern, but one I figured we'd ignore for this initial prototype in the jam. Let me start a list of simplifying assumptions and open questions up top. Here I'm assuming for the jam we'll simply execute each line's metadata immediately after executing it. If a statement spans multiple lines, either put the onus on the programmer to keep the metadata on the final line, or run metadata for all lines the statement was on after running it.
Adding a single delimiter to plain text is one simplifying assumption. That way we can start with an existing text editor and only change a little bit of functionality. I'm imagining a suite of tools that can operate on the same format, and I suspect changes to the format will multiply the amount of effort needed to support them across all tools. More sophisticated alternatives are definitely worth trying out, but the current idea already feels like a stretch for a single week.
I've spent very little time thinking about the graphical rendering, and you're right that I didn't think things through when I said, "If your program can do graphics, so can the debugging experience." That only makes sense if each step renders an image, which seems unnecessarily slow. Let me reword the description above.
We can certainly do ring buffers. However, we may also want more nuanced retention policies with perhaps separate quotas for high- and low-level logs.
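
As a sketch of what separate quotas could look like, assuming just two levels named "high" and "low" and a simple per-level entry count. The class `TieredLog` and its methods are hypothetical, with Python as a stand-in.

```python
from collections import deque

class TieredLog:
    """One bounded buffer per log level, merged back into run order for viewing."""

    def __init__(self, quotas):
        # quotas: level name -> maximum number of entries kept for that level
        self.buffers = {level: deque(maxlen=n) for level, n in quotas.items()}
        self.seq = 0  # global sequence number, used to restore ordering

    def append(self, level, message):
        self.seq += 1
        self.buffers[level].append((self.seq, level, message))

    def merged(self):
        return sorted(e for buf in self.buffers.values() for e in buf)

log = TieredLog({"high": 100, "low": 2})
log.append("high", "request started")
for i in range(5):
    log.append("low", f"loop iteration {i}")
log.append("high", "request finished")
print([msg for _, _, msg in log.merged()])
# ['request started', 'loop iteration 3', 'loop iteration 4', 'request finished']
```

The chatty low-level entries get evicted quickly while the high-level entries that frame them survive, so you can still travel far back in time at a coarse granularity.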

akkartik · 2022-06-22T16:53:34Z

Author

It might be worth throwing out an alternative approach. Instead of tagging metadata per line, just designate one level of "hidden lines". Positives:

We don't have to worry about statements spanning boundaries.
We don't have to take care that a line's metadata covers only the things that line affects (so that it stays robust to reordering lines).

Negatives:

There are new kinds of issues to worry about. If you delete a line and insert it, should it come before or after a hidden line? What if there are multiple contiguous hidden lines?
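
To make the ambiguity concrete, here's a sketch that assumes hidden lines are marked with a hypothetical `#! ` prefix. `insertion_points` lists every position in the file that is consistent with "insert before visible line N" in the editor's view.

```python
HIDDEN = "#! "  # hypothetical prefix marking a hidden line

def visible(lines):
    """The lines an editor would show with hidden lines toggled off."""
    return [line for line in lines if not line.startswith(HIDDEN)]

def insertion_points(lines, visible_index):
    """File positions consistent with inserting before the given visible line."""
    positions = [i for i, line in enumerate(lines) if not line.startswith(HIDDEN)]
    end = positions[visible_index] if visible_index < len(positions) else len(lines)
    start = positions[visible_index - 1] + 1 if visible_index > 0 else 0
    return list(range(start, end + 1))

source = [
    "a = 1",
    "#! log(a)",
    "#! log('done with a')",
    "b = 2",
]
print(visible(source))              # ['a = 1', 'b = 2']
print(insertion_points(source, 1))  # [1, 2, 3]
```

With two contiguous hidden lines there are three candidate positions for a single paste, and the editor has to pick one by convention.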

leddoo · 2022-06-22T19:02:03Z

hey kartik!
i've been thinking about something similar, mostly inspired by casey's Moustache Demo.
personally, i don't mind seeing logging statements in the code too much*. at least if they're succinct enough and don't cause runtime overhead when disabled, which is relatively easy in a language with good macros and/or reflection. *(you could even argue that they are a kind of documentation: "this thing matters for understanding what the code does")
i'd prefer it, if they could be hidden, but i think that's more something to think about for a structured ide. because once you talk about hiding certain parts of the code, why limit that to only logging statements?
what i think has more potential as a separate tool is something that helps you analyze that data: filter it, render it, compute derived values, display nesting (basically inferring the call stack from overlapping log entries - or some "scope" log entries, which you could also use for loops).
language orthogonality could be achieved by serializing to a common format (ideally something binary to reduce runtime overhead).
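
a small sketch of the "scope" entry idea, assuming the log is a flat list of `('enter', name)`, `('exit', name)` and `('log', text)` tuples. that entry format and the function names are made up for illustration, and python is just a stand-in here.

```python
def nest(entries):
    """Rebuild nesting from a flat log using enter/exit scope entries."""
    root = {"name": "<root>", "children": []}
    stack = [root]
    for kind, payload in entries:
        if kind == "enter":
            node = {"name": payload, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "exit":
            stack.pop()
        else:  # plain log entry: attach to the innermost open scope
            stack[-1]["children"].append(payload)
    return root

def outline(node, depth=0):
    """Flatten the tree back into indented lines for display."""
    lines = []
    for child in node["children"]:
        if isinstance(child, dict):
            lines.append("  " * depth + child["name"])
            lines.extend(outline(child, depth + 1))
        else:
            lines.append("  " * depth + child)
    return lines

entries = [
    ("enter", "main"),
    ("log", "x = 1"),
    ("enter", "helper"),
    ("log", "y = 2"),
    ("exit", "helper"),
    ("log", "done"),
    ("exit", "main"),
]
print("\n".join(outline(nest(entries))))
```

the same enter/exit pairs could wrap loop iterations, which would give the viewer something to fold or filter on.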

15 replies

leddoo Jun 23, 2022

change everything [...] at once

yup, i don't think that's a good idea either.
i've spent the past couple of years developing that "vision". the next few will be about figuring out how to make that a reality :D
an editor is definitely a good starting point. it's more or less language agnostic, and i already have a "prototype" of the code format.
seeing how far i can get with the editor is another jam idea. though that seems much more ambitious than the "debugging framework".

given that i believe significant improvements will require breaking changes, i wonder what you're working on now? seems like you've also explored this space quite a bit.

akkartik Jun 23, 2022
Author

I'm not sure what the future holds for me, to be honest 😄 My most recent project is http://akkartik.name/lines.html, and I'm hoping to use LÖVE in this jam as well. But it's also a relief not to build my own binaries and to be able to support Windows without thinking about it.

Over the years of building programming tools I've kinda forgotten what non-programming use I can put programming to. Lately I seem to be trying to focus more on things someone (not a programmer) can actually use, and not just programming to perpetuate programming.

All this said, though, my priority is finding people to jam with, whether in this jam or outside. Write up your proposal! Maybe I'll join it.

leddoo Jun 24, 2022

My most recent project is http://akkartik.name/lines.html

that's really cool! i like the idea of using code sections to "hide data in the text".

and not just programming to perpetuate programming

i don't think perpetuating "programming" would be all that bad, if it meant "using your computer".
many people are excellent problem solvers and could do amazing things with computers, if only they had "access" to them.
i think trying to make a programming environment for "normal people" is really important. anything that isn't an "entire computer" will always limit what a person can do with it. and i think it's a damn shame that so few people can actually "use" these machines.

Write up your proposal! Maybe I'll join it.

i'm still thinking about what i wanna do. right now, i'm working on a code based animation tool. today i've switched from using rust as the scripting language to lua. loving the flexibility so far. there's a good chance that i'll change my mind about what i want to do for the jam, as i mess around with lua and get my vector renderer into a usable state.

akkartik Jun 24, 2022
Author

i don't think perpetuating "programming" would be all that bad

Oh definitely! I'd just like to do more than that.

i think trying to make a programming environment for "normal people" is really important.

I care very much about this. But it also seems like there needs to be some sleight of hand. You have to get people programming without them realizing it at first. If you tell them they're programming, they'll back away without trying it.

But my statement was above all about what I need to avoid. Programmers today have a tendency to build programming tools. It's comforting, an illusion of progress. But to build something people will use, we have to get others to use what we build. There's an ever-present danger of going solipsistic. Tools for something besides programming are my attempt to break out of that rut.

leddoo Jul 1, 2022

But it also seems like there needs to be some sleight of hand.

i'm not so sure about that. i'd agree that you shouldn't use the word "programming", because that has a lot of negative connotations. we also definitely can't give them our current tools - intuitive interfaces with drag/drop and immediate feedback will be essential (eg: excel).
i think what it comes down to is the framing, and finding a way to explain how "this tool" can help people solve real problems they care about (-> explaining use cases, not features).

we have to get others to use what we build

i think this applies to both programmer and non-programmer tools. it is especially hard for programmer tools though, because they do so many things. a new ide with one or two cool new features won't get adopted, if people can't do half the things they need. and programmers have a tendency to underestimate the number of problems existing tools solve.
but even the non-programmer programming tool market is quickly getting saturated with low/no-code tools.
so i can definitely understand the focus on non-programming tools.
personally, i'm currently interested in improving the programmer programming-tool situation though. i think if we want to tackle the software quality problem at large, we'll have to start by making better tools for ourselves. initiatives like jai and dion are very inspiring.

khinsen · 2022-06-23T06:58:56Z

This looks like a cool project!

You mention Smalltalk among the notable projects, but I think you should in particular have a look at Beacon (https://github.com/pharo-project/pharo-beacon), well described here. Note in particular the possibility to log stack frames. It's probably the closest in spirit to what you want to do, except that it's for a Smalltalk environment rather than for a text-based language.

A nice feature of Beacon is that event sends have a very small overhead. You can leave them in your code and just remove the listener, which is the costly part (because it stores all events).
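
The general pattern can be sketched as follows. This is not Beacon's API, just a Python illustration of "cheap send, costly listener" in which the names `emit` and `listeners` are hypothetical.

```python
listeners = []  # callbacks that receive events; empty means logging is "off"

def emit(make_event):
    """Send an event, building it only if someone is listening."""
    if not listeners:
        return  # the cheap path: no event is constructed or stored
    event = make_event()
    for listener in listeners:
        listener(event)

emit(lambda: {"name": "before"})  # nobody listening, so nothing is recorded

recorded = []
record = recorded.append
listeners.append(record)          # attach a listener that stores every event
emit(lambda: {"name": "step", "value": 42})
listeners.remove(record)          # detach it again

emit(lambda: {"name": "after"})
print(recorded)  # [{'name': 'step', 'value': 42}]
```

Passing a function instead of a ready-made event means the cost of capturing data is only paid while a listener is attached.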

cben · 2022-06-29T12:55:11Z

[just lurking] Obligatory mention to Jonathan Edwards's demo of Reified execution idea (even if people here are likely aware of it).

deepakkarki · 2022-07-01T07:42:20Z

@akkartik
Not quite time travel debugging, but gives similar advantages: Metawork
Basically it claims nearly zero overhead continuous runtime profiling. As a bonus you can save sessions and search through execution history.

1 reply

leddoo Jul 1, 2022

that looks interesting. is there a recorded demo somewhere? (i don't have a code base in the supported languages to try it with)
nearly zero overhead is a very bold claim. especially since they want to support compiled languages like rust. (seems impossible actually, given large enough data structures, unless you make them persistent, which requires major changes to the source program. or the runtime for "abstract memory" languages, which is probably what they're doing.)
