wrenCall -> foreign call causes memory corruption #1185
Comments
Quick question: did you run in debug? (Assertions are currently only active in debug, which guards against some misplaced uses.) One other thing to try: call … Edit: also, thanks for the detailed post and reproduction ✨
I tried enabling debug assertions with
I just tried this: I read all the arguments out of slots first. But even with the list processing commented out, simply calling wrenEnsureSlots still causes the corruption:

```cpp
static void fcall_cool_func_impl(WrenVM* vm) {
    // read the arguments to make a CoolMeta struct
    // if we can parse successfully, return true

    // get name
    size_t arg_name_slot = 1;
    assert_msg(wrenGetSlotType(vm, arg_name_slot) == WREN_TYPE_STRING, "name must be a string");
    const char* name = wrenGetSlotString(vm, arg_name_slot);

    // get platforms
    size_t arg_platforms_slot = 2;
    assert_msg(wrenGetSlotType(vm, arg_platforms_slot) == WREN_TYPE_LIST, "platforms must be a list");
    WrenHandle* platforms_list = wrenGetSlotHandle(vm, arg_platforms_slot);
    std::vector<std::string> platforms;

    // get hashes
    size_t arg_hashes_slot = 3;
    assert_msg(wrenGetSlotType(vm, arg_hashes_slot) == WREN_TYPE_MAP, "hashes must be a map");
    WrenHandle* hashes_map = wrenGetSlotHandle(vm, arg_hashes_slot);
    std::unordered_map<std::string, std::string> hashes;

    // ensure that we have plenty of slots
    wrenEnsureSlots(vm, WREN_NEEDED_SLOTS);
    size_t slot_temp = 10;

    // size_t platforms_count = wrenGetListCount(vm, arg_platforms_slot);
    // for (size_t i = 0; i < platforms_count; i++) {
    //     size_t platform_list_el_slot = slot_temp;
    //     wrenGetListElement(vm, arg_platforms_slot, i, platform_list_el_slot);
    //     assert_msg(wrenGetSlotType(vm, platform_list_el_slot) == WREN_TYPE_STRING, "platform must be a string");
    //     const char* platform = wrenGetSlotString(vm, platform_list_el_slot);
    //     platforms.push_back(platform);
    // }

    // release handles
    wrenReleaseHandle(vm, platforms_list);
    wrenReleaseHandle(vm, hashes_map);

    // return true
    wrenSetSlotBool(vm, 0, true);
}
```
You're welcome! I like to make your job easier; I know what it's like to be on your end :)
wrenEnsureSlots is forbidden in foreign functions. It breaks some internal assumptions about the stack in case the stack needs to be reallocated.
Not quite right. From wren.h:

```c
// When your foreign function is called, you are given one slot for the receiver
// and each argument to the method. The receiver is in slot 0 and the arguments
// are in increasingly numbered slots after that. You are free to read and
// write to those slots as you want. If you want more slots to use as scratch
// space, you can call wrenEnsureSlots() to add more.

...

// Ensures that the foreign method stack has at least [numSlots] available for
// use, growing the stack if needed.
//
// Does not shrink the stack if it has more than enough slots.
//
// It is an error to call this from a finalizer.
WREN_API void wrenEnsureSlots(WrenVM* vm, int numSlots);
```

This API is, and has always been, intended for foreign-function use, to add more scratch space for your methods. I have several thousand foreign methods using this function this way in my engine just fine. There are times when using it can definitely create problems, though, and those need to be addressed with either proper assertions or a fix, e.g. if you call it in a finalizer or outside of a foreign method. So reproducible examples like this are plenty helpful in isolating stuff.
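To make the intended pattern concrete, here is a hedged sketch of a foreign method that uses wrenEnsureSlots for scratch space while iterating a list argument. The method name, slot numbers, and behavior are illustrative assumptions, not taken from the original reproduction; the wren.h calls themselves are the documented slot API.

```c
#include "wren.h"

// Hypothetical foreign method: sum(list) -> Num.
// Slot 0 holds the receiver, slot 1 the list argument.
static void listSum(WrenVM* vm)
{
    // Ask for one extra slot (slot 2) as scratch space for list elements.
    // Per the wren.h comment quoted above, this is the documented way to
    // get more working room inside a foreign method.
    wrenEnsureSlots(vm, 3);

    double total = 0.0;
    int count = wrenGetListCount(vm, 1);
    for (int i = 0; i < count; i++)
    {
        wrenGetListElement(vm, 1, i, 2);  // copy element i into scratch slot 2
        total += wrenGetSlotDouble(vm, 2);
    }

    // Return the sum in slot 0.
    wrenSetSlotDouble(vm, 0, total);
}
```

Note that, as this thread shows, the wrenEnsureSlots call can still trigger a fiber-stack reallocation when invoked during a wrenCall, which is the corruption under discussion.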
I was also under the impression that the documentation explicitly stated that wrenEnsureSlots was meant for this. In the meantime, is there an alternative way to accomplish the same thing? As for diagnosing this issue: Valgrind showed pretty clearly that Wren tries to reallocate stuff when I call wrenEnsureSlots.
The easiest workaround is to hack the VM configuration to increase the initial stack size so that the fiber stack never reallocates.
So, I did a little bit of looking into this.

Current code

I added some logs to wren, and here are the results for the current code:

```
wrenNewFiber: fiber->stackCapacity = 4
wrenEnsureStack: fiber->stackCapacity = 4, needed = 5
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 8
wrenEnsureStack: reallocating stack
wrenEnsureStack: fiber->stackCapacity = 8, needed = 6
wrenEnsureStack: fiber->stackCapacity = 8, needed = 9
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 16
wrenEnsureStack: reallocating stack
foreign call test
wren: ensure slots
wrenEnsureSlots: numSlots=12
wrenEnsureSlots: creating fiber
wrenNewFiber: fiber->stackCapacity = 1
wrenEnsureSlots: currentSize=0
wrenEnsureSlots: current size is insufficient, resizing
wrenEnsureSlots: wrenEnsureStack(needed=12)
wrenEnsureStack: fiber->stackCapacity = 1, needed = 12
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 16
wrenEnsureStack: reallocating stack
wren: get handle to Script class
wren: set slot 0 to Script class
wren: call Script.run()
wrenEnsureStack: fiber->stackCapacity = 16, needed = 2
wrenEnsureStack: fiber->stackCapacity = 16, needed = 4
wrenEnsureStack: fiber->stackCapacity = 16, needed = 6
wrenEnsureStack: fiber->stackCapacity = 16, needed = 9
script is now running
wrenEnsureStack: fiber->stackCapacity = 16, needed = 4
wrenEnsureStack: fiber->stackCapacity = 16, needed = 5
wrenEnsureStack: fiber->stackCapacity = 16, needed = 11
wrenEnsureStack: fiber->stackCapacity = 16, needed = 6
wrenEnsureStack: fiber->stackCapacity = 16, needed = 8
wrenEnsureStack: fiber->stackCapacity = 16, needed = 11
fcall_cool_func_impl: enter
wrenEnsureSlots: numSlots=12
wrenEnsureSlots: currentSize=4
wrenEnsureSlots: current size is insufficient, resizing
wrenEnsureSlots: wrenEnsureStack(needed=18)
wrenEnsureStack: fiber->stackCapacity = 16, needed = 18
wrenEnsureStack: stack capacity is insufficient, growing
wrenEnsureStack: new capacity = 32
wrenEnsureStack: reallocating stack
wren: Script.run() must return a boolean, but it returned 7
assertion failed: Script.run() returned invalid type
```

Notes

As we can see above, the wrenEnsureSlots call inside the foreign method requests more than the current capacity (needed = 18 against a capacity of 16), which triggers a reallocation of the fiber stack in the middle of the wrenCall. This got me thinking that if I can make my initial call allocate a large enough stack, then my foreign method's wrenEnsureSlots will never need to grow it.

Potential workaround

So, if I call wrenEnsureSlots with a sufficiently large slot count before wrenCall, the corruption no longer occurs. So it looks like the issue can be worked around like this, but I would like someone to explain to me why the requested stack size differs between calls that ensure the same number of slots. Also, I think it would be beneficial to add some notes about this to the documentation. I can volunteer to do this as well, once I understand the root cause behind it. Thank you all for your help in troubleshooting this issue.
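For reference, the workaround described above can be sketched on the host side like this. The handle names, the receiver setup, and the value 64 are illustrative assumptions rather than details from the reproduction:

```c
// Hedged sketch: pre-grow the fiber stack before wrenCall so that the
// wrenEnsureSlots call inside the foreign method never needs to grow it.
// scriptClass and runMethod are assumed to be previously created handles.
wrenEnsureSlots(vm, 64);                  // generous upper bound, pre-wrenCall
wrenSetSlotHandle(vm, 0, scriptClass);    // receiver in slot 0
WrenInterpretResult result = wrenCall(vm, runMethod);
```

This only masks the underlying stale-pointer problem: if any later call path still grows the fiber stack mid-wrenCall, the corruption returns.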
Could the problem be that so many reallocs cause memory corruption in values that still point to the old Fiber stack (since realloc'ing the Fiber stack, which is allocated on the heap, may move it to a different memory region)? Or are the Fibers not realloc'ed, and only the stackCapacity array or something like it? Maybe it's a stupid question (I have no idea how Fibers work), but it would be a simple answer to the question of why there is memory corruption when the number of Fibers grows.

Anyway, my humble opinion: starting with a capacity of just 4 makes no sense to me, since fibers are a major feature of the language and there are going to be a lot of reallocs. I'd set a high number that is safe enough for most applications, like 64, 128, or even 256. The memory usage will be just some KBs higher, and I find it a safer bet if triggering reallocs leads to bad things. I don't know what size the Fiber stack allocator uses, but it shouldn't be higher than, say, 32 KB, being pessimistic. And the fiber stack capacity doesn't need to be full: you could cache, say, 16 fibers in an object pool so a new one doesn't have to be created when a Fiber is requested, and deallocate the rest.

I have created Fibers in C++ with an 8 KB stack size and never run into a problem (though I start my threads with a 10 MB stack size because I like to abuse "alloca"/"_malloca" allocators, especially with my smart_ptr).
The problem is caused by the fact that WrenVM caches the current stack base pointer when performing a foreign call, for "performance" reasons. When performing a wrenEnsureSlots, that cached value is not easy to update because it is sometimes stored on the stack of the host. The issue was mitigated to some degree, but is not fully fixed.

About the 4 you are mentioning: it is the original stack size of the fiber. There is nothing fancy like a fiber pool in the code. And as stated, the issue has nothing to do with the number of fibers. But the higher the number of fibers, the higher the chance you hit a scenario where a stack growth triggers the bug. And you can trigger the issue with one fiber, if stack growth is performed in a way that triggers the bug.
This is just a wild guess from a developer who just started.

- wrenEnsureSlots() uses
- wrenEnsureStack() which uses
- wrenReallocate() which uses
- vm->config.reallocateFn

and since my first example did not work when reallocateFn = realloc, I had to write my own implementation like this:

```c
void* myReallocate(void* ptr, size_t newSize, void* userData)
{
    if (newSize == 0) {
        free(ptr);
        return NULL;
    }
    if (ptr == NULL) {
        return calloc(1, newSize);
    }
    return realloc(ptr, newSize);
}
```

What I think is interesting is that wrenReallocate() seems to be used in places that assume the data is moved if the pointer is changed to a new address, but the data is never moved explicitly. This is difficult to do, because the only place where that's detectable is inside of wrenReallocate, which also does the free(); and copying the data after the free would of course be taboo, since that would introduce a use-after-free bug. The fail-safe way to do this is to copy the stack into a temporary buffer before the realloc(), then move that data into the new, larger memory area that is returned. If the pointer is not moved in this process, then we can free the temporary buffer.
This is the cause, not the issue. Somewhere along the C stack, we preserve
a pointer into the old stack, and reuse it later.
On Mon, Jul 22, 2024, 18:36, Andrew Robbins ***@***.***> wrote:
Issue
I want to wrenCall a function in a wren script and get its return value. This function will call into a foreign function. The foreign function needs to do some list/dictionary processing with nested data, so I need some more slots. I use wrenEnsureSlots.

However, this causes some pretty bad corruption. In valgrind I see invalid reads, and this also results in the foreign call not properly returning a value and the wren stack being corrupted. The program also sometimes segfaults when trying to free the VM.

I understand that this is related to this previous issue.

My understanding is that Wren doesn't like it when you wrenEnsureStack within a foreign call; however, I am not aware of any way other than wrenEnsureSlots to ensure I have enough slots to process nested data. Is there a better way I can do this? Am I using the library incorrectly? It seems like a fairly standard use case to use some additional slots in a foreign call implementation. But calling wrenEnsureSlots causes corruption.

Interestingly, if I don't call wrenEnsureSlots inside the foreign call, because I called it before wrenCall, the wren stack doesn't get corrupted and does return the bool, but valgrind of course reports a lot of miscellaneous memory corruption.

Overview of my code
The overview of the scripts is as follows:

builtin.wren:

test.wren:

Implementation of the foreign call:

My program then does the following:
Reproduction of Issue

I have made a full reproduction of the issue here: https://github.com/redthing1/wrenvm_tests/tree/f3985888d39168bd69b281083ff6c3a77dd9ec3d/foreigncall

Instructions to build the repro

Install prerequisites:

- meson
- ninja

Get submodules (wren is a submodule).

Build a demo, such as ./foreigncall:

```shell
cd ./foreigncall
meson setup build
ninja -C build
./build/foreigncall_test
```