Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runner state setting and getting #2

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

Conversation

mapto
Copy link

@mapto mapto commented Jan 10, 2022

I'm putting forward a first suggestion of state manipulation that would allow persisting the story state externally, as discussed in Issue #1

I don't expect this solution to be accepted as it is, it's rather a discussion of the approach. I preferred constructor and repr() interface, because another save/load functions would have to export in/import from a complex data structure that would have to be represented in a format (new class or JSON, etc) that is yet to be introduced. The proposed approach is probably more pythonic, relying on built-in functions. Of course a more efficient way to export is to access the field values, rather than the repr() string. Yet, repr() enables the reflection pattern.

Also, my interpretation of state is rather naive:

  • I need use cases to better decide whether I should be including line_buffer, option_buffer, vm_data_stack, vm_instruction_stack and program_counter.
  • Command handlers are also not really persisted, as commands themselves are external to the runner

Let me know what your thoughts are and I will adapt the solution.

@mapto mapto changed the title Runner state setting and getting (addressing issue #1) Runner state setting and getting Jan 10, 2022
@weaversam8
Copy link
Member

Thanks so much for the PR! I want to warn you that it will take me longer than normal (probably about two weeks) before I can review and comment on this, but I'm excited to do so once I get the chance!

@mapto
Copy link
Author

mapto commented Jan 15, 2022

I'm having problems serializing and deserializing the compiled_yarn or the vm_instruction_stack. Which one makes sense to persist? Any examples (I am trying with protobuf's SerializeToString/ParseFromString, but doesn't work)?Any hints you could give me?

@mapto
Copy link
Author

mapto commented Jan 15, 2022

Sorry, just saw this:
https://github.com/relaypro-open/YarnRunner-Python/blob/main/yarnrunner_python/runner.py#L113
Should do the job for me.

@mapto
Copy link
Author

mapto commented Jan 19, 2022

I've expanded my state export with other attributes, including merging your changes. However, there's still something wrong with the state (see (this test)[https://github.com/mapto/YarnRunner-Python/blob/main/tests/test_shortcuts.py#L63] that fails on line 92). It seems that after loading the lines are not picking up from where they were left. Instead they restart. Could you give me a hand identifying what's the storage that I'm missing? Thanks!

PS: I will now change the persistence to JSON, as it seems to be the format that's already partially used. However, it would be pointless if I am not capturing the complete state.

@weaversam8
Copy link
Member

Thanks for working on a first pass at this @mapto! I really appreciate you putting this forward, and look forward to working through it with you to get it merged.

A couple of things:

  • Let's definitely not use repr()/eval() to save state- as nice as it is I don't like the idea of opening a hole for someone to try to sneak arbitrary code execution into save state files. JSON as you mentioned should be fine.
  • I don't think we necessarily need to use a constructor approach where we include every piece of internal data as a parameter. I'd prefer we either:
    1. rely on **kwargs without explicitly specifying parameter names in the function signature or,
    2. create a load() function (to go with the save() function that will replace repr()) that loads state data into an already constructed runner object
      • I prefer (ii) since it means save state data can be loaded into an already constructed instance. Though it means you have to construct and then load data, that's no big deal, since you can just start one with autostart=False.
  • In concert with this, we shouldn't require people to serialize the program or string table in the save file. In many cases, the program will be the same across a large number of save states, and serializing the program will make saved states larger than they need to be.
    • However, this means that we'll need some way to verify that the version of the program used to load a saved state matches (or is compatible with) the version of the program used to save a state. We should include some sort of "program version" in the saved state, which we can verify on load. For now, a simple MD5 hash of the program and string table files should suffice, we can build a less brittle program version indicator in the future if necessary.

Overall, I think, the major information that should be serialized in the save state should be:

  • visits
  • variables
  • the line buffer
  • the option buffer
  • the VM's data stack
  • the VM's instruction stack
  • the current program counter
  • the previous instruction
  • whether or not execution has finished
    • (this last one may seem counterintuitive, but I'd like the ability to save a finished run and reference it later. in the current time, the only really valuable information after a finished run would be the visits and variables, but I expect to implement tracing of node paths in the future, which might give us another reason to save a completed execution.)

The major information that should not be serialized is probably:

  • the program file
  • the string tables
  • other constructor parameters (enable_tracing, newlines, etc)
  • command handlers

How do you feel about these suggestions? Do you feel comfortable making these changes?

@mapto
Copy link
Author

mapto commented Jan 24, 2022

Thanks for the feedback!

At a personal level, my team dropped out from the game jam, yet both our story writer and I are interested to continue to explore yarn. So this means I can continue to work on this PR, but will be able to dedicate less time in February.

Now regarding solutions: Certainly agree that JSON is a better solution and I would readily do it the way you describe. However, I already tried serializing both the information you indicated as should and the one that you indicated as shouldn't and my test with shortcuts continues to fail, so I am sceptical that switching to JSON would take us any forward.

Given that it appears to me that explicit state serialization seems to miss something that remains unidentified, I am wondering if it makes sense to try serializing with (pickle)[https://docs.python.org/3/library/pickle.html] which shouldn't leave any chances.

Anyway, I'll try with JSON and if that same test continues to fail we can discuss that other option.

@mapto
Copy link
Author

mapto commented Jan 24, 2022

Good news: a quick implementation with JSON passed the previously failing test.

TODO: the hash function for version comparison and wider testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants