Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrenCall -> foreign call causes memory corruption #1185

Open
redthing1 opened this issue Apr 4, 2024 · 11 comments
Open

wrenCall -> foreign call causes memory corruption #1185

redthing1 opened this issue Apr 4, 2024 · 11 comments

Comments

@redthing1
Copy link

redthing1 commented Apr 4, 2024

Issue

I want to wrenCall a function in a wren script, and get its return value. This function will call into a foreign function.

The foreign function needs to do some list/dictionary processing with nested data, so I need some more slots. I use wrenEnsureSlots.

However, this causes some pretty bad corruption. In valgrind I see invalid reads, and this also results in the foreign call not actually properly returning a value, and the wren stack being corrupted. The program also sometimes segfaults when trying to free the VM.

I understand that this is related to this previous issue.

My understanding is that Wren doesn't like when you wrenEnsureStack within a foreign call; however, I am not aware of any other way than wrenEnsureSlots to ensure I have enough slots to process nested data.

Is there a better way I can do this? Am I using the library incorrectly?

It seems like a fairly standard use case to use some additional slots in a foreign call implementation. But calling wrenEnsureSlots causes corruption.

Interestingly, if I don't call wrenEnsureSlots inside the foreign call, because I called it before wrenCall, the wren stack doesn't get corrupted and does return the bool, but valgrind of course reports a lot of miscellaneous memory corruption.

Overview of my code

The overview of the scripts is as follows:

builtin.wren:

class Test {
  foreign static cool_func_impl(name, platforms, hashes)

  static cool_func(meta) {
    var meta_name = meta.name
    var meta_platforms = meta.platforms
    var meta_hashes = meta.hashes

    return cool_func_impl(meta_name, meta_platforms, meta_hashes)
  }
}

test.wren:

import "builtin" for Test

System.print("foreign call test")

class ProgramMeta {
    construct new() { }

    name { "foreign call test" }
    platforms { ["windows", "linux", "macos"] }
    hashes {{
        "windows" : [
            "hash1",
            "hash2"
        ],
        "linux" : [
            "hash1",
            "hash2"
        ],
        "macos" : [
            "hash1",
            "hash2"
        ]
    }}
}

class Script {
    static run() {
        System.print("script is now running")

        return Test.cool_func(ProgramMeta.new())
    }
}

Implementation of the foreign call:

#define WREN_NEEDED_SLOTS 12

// ...

static void fcall_cool_func_impl(WrenVM* vm) {
  // read the arguments to make a CoolMeta struct
  // if we can parse successfully, return true

  // ensure that we have plenty of slots
  wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);
  size_t slot_temp = 10;

  // get name
  size_t arg_name_slot = 1;
  assert_msg(wrenGetSlotType(vm, arg_name_slot) == WREN_TYPE_STRING, "name must be a string");
  const char* name = wrenGetSlotString(vm, arg_name_slot);

  // get platforms
  size_t arg_platforms_slot = 2;
  assert_msg(wrenGetSlotType(vm, arg_platforms_slot) == WREN_TYPE_LIST, "platforms must be a list");
  WrenHandle* platforms_list = wrenGetSlotHandle(vm, arg_platforms_slot);
  std::vector<std::string> platforms;
  size_t platforms_count = wrenGetListCount(vm, arg_platforms_slot);
  for (size_t i = 0; i < platforms_count; i++) {
    size_t platform_list_el_slot = slot_temp;
    wrenGetListElement(vm, arg_platforms_slot, i, platform_list_el_slot);
    assert_msg(wrenGetSlotType(vm, platform_list_el_slot) == WREN_TYPE_STRING, "platform must be a string");
    const char* platform = wrenGetSlotString(vm, platform_list_el_slot);
    platforms.push_back(platform);
  }

  // return true
  wrenSetSlotBool(vm, 0, true);
}

My program then does the following:

  // now, we want to call Script.run()

  wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);

  // get handle to Script class
  wrenGetVariable(vm, source_module, "Script", 0);
  WrenHandle* script_class = wrenGetSlotHandle(vm, 0);
  // make call handle to run method
  WrenHandle* run_method = wrenMakeCallHandle(vm, "run()");

  // invoke Script.run()
  wrenSetSlotHandle(vm, 0, script_class);
  printf("wren: call Script.run()\n");
  wrenCall(vm, run_method);

  // get the boolean return value
  if (wrenGetSlotType(vm, 0) != WREN_TYPE_BOOL) {
    printf("wren: Script.run() must return a boolean, but it returned %d\n", wrenGetSlotType(vm, 0));
    assert_msg(false, "Script.run() returned invalid type");
  }
  bool run_result = wrenGetSlotBool(vm, 0);
  printf("wren: Script.run() returned %s\n", run_result ? "true" : "false");

Reproduction of Issue

I have made a full reproduction of the issue here: https://github.com/redthing1/wrenvm_tests/tree/f3985888d39168bd69b281083ff6c3a77dd9ec3d/foreigncall

Instructions to build the repro

install prerequisites:

  • c/c++ compiler
  • meson
  • ninja

get submodules (wren is a submodule):

git submodule update --init --recursive

build a demo, such as ./foreigncall:

cd ./foreigncall
meson setup build
ninja -C build
./build/foreigncall_test
@ruby0x1
Copy link
Member

ruby0x1 commented Apr 4, 2024

Quick question: Did you run in debug (assertions are currently only active in debug, which guards against some misplaced uses).

One other thing to try: call wrenEnsureSlots later. You should likely read the values for your arguments and receiver (slot 0-N) before calling ensure slots.

Edit: also thanks for the detailed post and reproduction ✨

@redthing1
Copy link
Author

redthing1 commented Apr 4, 2024

Quick question: Did you run in debug (assertions are currently only active in debug, which guards against some misplaced uses).

I tried enabling debug assertions with #define DEBUG in wren_common.h:
I don't see any different results with that however. It doesn't seem like it catches any assertions.

One other thing to try: call wrenEnsureSlots later. You should likely read the values for your arguments and receiver (slot 0-N) before calling ensure slots.

I just tried this: I read all the arguments out of slots first.

But even with the list processing commented out, simply that wrenEnsureSlots call alone seems to cause all the issues.

static void fcall_cool_func_impl(WrenVM* vm) {
  // read the arguments to make a CoolMeta struct
  // if we can parse successfully, return true

  // get name
  size_t arg_name_slot = 1;
  assert_msg(wrenGetSlotType(vm, arg_name_slot) == WREN_TYPE_STRING, "name must be a string");
  const char* name = wrenGetSlotString(vm, arg_name_slot);

  // get platforms
  size_t arg_platforms_slot = 2;
  assert_msg(wrenGetSlotType(vm, arg_platforms_slot) == WREN_TYPE_LIST, "platforms must be a list");
  WrenHandle* platforms_list = wrenGetSlotHandle(vm, arg_platforms_slot);
  std::vector<std::string> platforms;

  // get hashes
  size_t arg_hashes_slot = 3;
  assert_msg(wrenGetSlotType(vm, arg_hashes_slot) == WREN_TYPE_MAP, "hashes must be a map");
  WrenHandle* hashes_map = wrenGetSlotHandle(vm, arg_hashes_slot);
  std::unordered_map<std::string, std::string> hashes;

  // ensure that we have plenty of slots
  wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);
  size_t slot_temp = 10;

  // size_t platforms_count = wrenGetListCount(vm, arg_platforms_slot);
  // for (size_t i = 0; i < platforms_count; i++) {
  //   size_t platform_list_el_slot = slot_temp;
  //   wrenGetListElement(vm, arg_platforms_slot, i, platform_list_el_slot);
  //   assert_msg(wrenGetSlotType(vm, platform_list_el_slot) == WREN_TYPE_STRING, "platform must be a string");
  //   const char* platform = wrenGetSlotString(vm, platform_list_el_slot);
  //   platforms.push_back(platform);
  // }

  // release handles
  wrenReleaseHandle(vm, platforms_list);
  wrenReleaseHandle(vm, hashes_map);

  // return true
  wrenSetSlotBool(vm, 0, true);
}

Edit: also thanks for the detailed post and reproduction ✨

You're welcome! I like to make your job easier, I know what it's like to be on your end :)

@mhermier
Copy link
Contributor

mhermier commented Apr 4, 2024 via email

@ruby0x1
Copy link
Member

ruby0x1 commented Apr 4, 2024

Not quite right,

// When your foreign function is called, you are given one slot for the receiver
// and each argument to the method. The receiver is in slot 0 and the arguments
// are in increasingly numbered slots after that. You are free to read and
// write to those slots as you want. If you want more slots to use as scratch
// space, you can call wrenEnsureSlots() to add more.

...

// Ensures that the foreign method stack has at least [numSlots] available for
// use, growing the stack if needed.
//
// Does not shrink the stack if it has more than enough slots.
//
// It is an error to call this from a finalizer.
WREN_API void wrenEnsureSlots(WrenVM* vm, int numSlots);

This API is and has always been FOR foreign functions use to add more to the scratch space for your methods. I have several thousand foreign methods using this function this way in my engine just fine.

There are times when using it can definitely create problems, though, and needs to be addressed with either proper assertions or a fix. e.g if you call it in a finalizer and outside of a foreign.

So reproducible examples like this are plenty helpful in isolating stuff.

@redthing1
Copy link
Author

I was also under the impression that it was explicitly stated in the documentation that wrenEnsureSlots is for use in foreign functions.

In the meantime, is there an alternative way to accomplish the same thing?
One idea that looked promising as a hacky workaround was to pass in dummy parameters to provide additional temp slots. I don't like this however.

As for diagnosing this issue: Valgrind showed pretty clearly that Wren tries to reallocate stuff when I try to wrenEnsureSlots which then leads to invalid reads.

@mhermier
Copy link
Contributor

mhermier commented Apr 4, 2024 via email

@redthing1
Copy link
Author

redthing1 commented Apr 5, 2024

So, I did a little bit of looking into this.

Current code

I added some logs to wren, and here are the results for the current code:

wrenNewFiber: fiber->stackCapacity = 4
wrenEnsureStack: fiber->stackCapacity = 4, needed = 5
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 8
wrenEnsureStack: reallocating stack
wrenEnsureStack: fiber->stackCapacity = 8, needed = 6
wrenEnsureStack: fiber->stackCapacity = 8, needed = 9
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 16
wrenEnsureStack: reallocating stack
foreign call test
wren: ensure slots
wrenEnsureSlots: numSlots=12
wrenEnsureSlots: creating fiber
wrenNewFiber: fiber->stackCapacity = 1
wrenEnsureSlots: currentSize=0
wrenEnsureSlots: current size is insufficient, resizing
wrenEnsureSlots: wrenEnsureStack(needed=12)
wrenEnsureStack: fiber->stackCapacity = 1, needed = 12
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 16
wrenEnsureStack: reallocating stack
wren: get handle to Script class
wren: set slot 0 to Script class
wren: call Script.run()
wrenEnsureStack: fiber->stackCapacity = 16, needed = 2
wrenEnsureStack: fiber->stackCapacity = 16, needed = 4
wrenEnsureStack: fiber->stackCapacity = 16, needed = 6
wrenEnsureStack: fiber->stackCapacity = 16, needed = 9
script is now running
wrenEnsureStack: fiber->stackCapacity = 16, needed = 4
wrenEnsureStack: fiber->stackCapacity = 16, needed = 5
wrenEnsureStack: fiber->stackCapacity = 16, needed = 11
wrenEnsureStack: fiber->stackCapacity = 16, needed = 6
wrenEnsureStack: fiber->stackCapacity = 16, needed = 8
wrenEnsureStack: fiber->stackCapacity = 16, needed = 11
fcall_cool_func_impl: enter
wrenEnsureSlots: numSlots=12
wrenEnsureSlots: currentSize=4
wrenEnsureSlots: current size is insufficient, resizing
wrenEnsureSlots: wrenEnsureStack(needed=18)
wrenEnsureStack: fiber->stackCapacity = 16, needed = 18
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 32
wrenEnsureStack: reallocating stack
wren: Script.run() must return a boolean, but it returned 7
assertion failed: Script.run() returned invalid type

Notes

As we can see above, the wrenEnsureSlots(WREN_NEEDED_SLOTS) I do before the wrenCall ends up growing the stack to size 16. However, my call to the very same thing within the foreign function results in requesting a stack capacity of 18: I'm not sure what the cause of this discrepancy is, please explain it to me.

This got me thinking that if I can make my initial call allocate a large enough stack, then my wrenEnsureSlots in the foreign function won't have to reallocate the stack.

Potential workaround

So, if I do wrenEnsureSlots(WREN_NEEDED_SLOTS * 2) (or some other slack factor), I can get the initial stack reallocate prior to wrenCall to create a stack of size 32. Then, my wrenEnsureSlots call within the foreign function implementation does not need to reallocate, and no memory corruption happens.

So it looks like the issue can be worked around like this, but I would like someone to explain to me why the requested stack size differs from calls to ensure the same number of slots. Also, I think it would be beneficial to add some notes about this to the documentation. I can volunteer to do this as well, once I understand the root cause behind it.

Thank you all for your help in troubleshooting this issue.

@Brugarolas
Copy link

Brugarolas commented Jun 14, 2024

Can it be the problem that so many reallocs cause memory corruption in the values that still point to old Fiber stack (since the realloc the Fiber stack, which is allocated on heap, may be in a different memory region) or the Fibers are not realloc'ed and only the stackCapacity array or something is?

Maybe it's a stupid question, I have no idea how Fibers work, but would be a simple answer to the question about why is there memory corruption when number of Fibers grow

Anyway, my humble opinion: starting with just capacity for 4 fibers makes no sense to me since it's a major feature of the language and there's going to be a lot of reallocs, I'd set a safe high number enough for most applications, like 64 or 128 or even 256. The memory usage will be just some KBs higher and I find it a safer bet if triggering reallocs lead to bad things. I don't know which size Fiber stack allocator is but shouldn't be higher than... 32 KB being pessimistic? And the fiber stack capacity doesn't need to be full, you can set a number of, like, 16 fibers to cache them on object pool and don't have to create a new one when a Fiber is requested, and deallocate the rest

I have created Fibers in C++ with 8KB stack size and never run into a problem (but I start my Threads with 10 MB stack size because I like to abuse "alloca"/"_malloca" allocators, specially with my smart_ptr).

@mhermier
Copy link
Contributor

mhermier commented Jun 15, 2024 via email

@andydude
Copy link

This is just a wild guess from a developer who just started.

  • wrenEnsureSlots() uses
  • wrenEnsureStack() which uses
  • wrenReallocate() which uses
  • vm-config.reallocateFn

and since my first example did not work when reallocateFn=realloc, I had to write my own implementation like this:

void* myReallocate(void* ptr, size_t size, void* userData)
{
  if (newSize == 0) {
    free(ptr);
    return NULL;
  }
  if (ptr == NULL) {
    return calloc(1, size);
  }
  return realloc(ptr, size);
}

what I think is interesting, is that wrenReallocate() seems to be used in places that assume the data is moved if the pointer is changed to a new address, but the data is never moved explicitly. This is difficult to do, because the only place where that's detectable is inside of wrenReallocate, which also does the free(), but copying the data after the free, would of course be taboo, since that would introduce a use-after-free bug. The fail-safe way to do this is to copy the stack into a temporary buffer before the realloc(), then move that data into the new larger memory area that is returned. If the pointer is not moved in this process, then we can free the temporary buffer.

@mhermier
Copy link
Contributor

mhermier commented Jul 22, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants