Neko thread usage causes seg faults during global free #281

Open
tobil4sk opened this issue Apr 4, 2023 · 2 comments

Comments

@tobil4sk
Member

tobil4sk commented Apr 4, 2023

Ever since haxelib was updated to use threads on Neko, it has been segfaulting intermittently in GitHub Actions. For example:

Command: haxelib [git,utest,https://github.com/haxe-utest/utest,master,--always]
Installing utest from https://github.com/haxe-utest/utest branch: master
Library utest current version is now git
Command exited with 139 in 1s: haxelib [git,utest,https://github.com/haxe-utest/utest,master,--always]
Segmentation fault (core dumped)
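
As a side note, the exit status 139 in this log is consistent with the "Segmentation fault" message: shells commonly report a process killed by signal N as 128 + N, and SIGSEGV is signal 11 on Linux. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not part of haxelib or Neko.

```c
#include <signal.h>

/* Hypothetical helper: recover the signal number from a shell-style exit
 * status, which encodes "killed by signal N" as 128 + N. Returns 0 when
 * the status does not look like a signal death. */
int signal_from_shell_status(int status) {
    if (status > 128 && status < 256) {
        return status - 128;
    }
    return 0;
}

/* Hypothetical helper: nonzero when the status corresponds to SIGSEGV. */
int is_segfault_status(int status) {
    return signal_from_shell_status(status) == SIGSEGV;
}
```

Calling is_segfault_status(139) on a Linux runner should report a segfault, which matches the log.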

I haven't been able to reproduce this on any local system, but after some troubleshooting I found that the seg fault occurs after the main function has completed, at some point after this call but before the program exits: https://github.com/HaxeFoundation/neko/blob/master/vm/main.c#L342.

I managed to download the core dump and load it in gdb, which reports that the seg fault comes from line 46 here:

neko/vm/callback.c

Lines 44 to 48 in 9076cfa

EXTERN value val_callEx( value vthis, value f, value *args, int nargs, value *exc ) {
	neko_vm *vm = NEKO_VM();
	value old_this = vm->vthis;
	value old_env = vm->env;
	value ret = val_null;

I later added a printf here and confirmed that vm is a null pointer when the segfault happens. Perhaps a finaliser is being called after the main function has already finished?
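
A null vm means the lookup behind NEKO_VM() found no VM for the calling thread. The sketch below is not Neko code. It assumes, without checking against the Neko source, that the current VM lives in a per-thread slot, and it uses made-up names (fake_vm, bind_vm, unbind_vm, read_vthis) to show how a guarded lookup would fail cleanly instead of crashing on the vm->vthis read. A guard like this would only hide the symptom; it says nothing about why the slot is empty.

```c
#include <stddef.h>

/* Stand-in for neko_vm, reduced to the field read on the faulting line. */
typedef struct {
    void *vthis;
} fake_vm;

/* Assumption: the "current VM" is stored in a per-thread slot. */
static _Thread_local fake_vm *current_vm = NULL;

/* Hypothetical helpers to attach and detach a VM for the calling thread. */
void bind_vm(fake_vm *vm) { current_vm = vm; }
void unbind_vm(void) { current_vm = NULL; }

/* Returns 0 and stores vthis in *out when a VM is bound to the calling
 * thread. Returns -1 instead of dereferencing a null pointer otherwise. */
int read_vthis(void **out) {
    fake_vm *vm = current_vm;
    if (vm == NULL) {
        return -1;
    }
    *out = vm->vthis;
    return 0;
}
```

With no VM bound, read_vthis returns -1 where the unguarded code would fault.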

Full backtrace
Core was generated by `haxelib git utest https://github.com/haxe-utest/utest master --always'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f30d8f14ef6 in neko_val_callEx (vthis=0x7f30d782a000, f=0x7f30d8b4d8a0, args=0x7f30d5e3d7f8, nargs=1, exc=0x0)
    at /src/vm/callback.c:46
46	/src/vm/callback.c: Bad file descriptor.
[Current thread is 1 (LWP 2473)]
(gdb) bt
#0  0x00007f30d8f14ef6 in neko_val_callEx (vthis=0x7f30d782a000, f=0x7f30d8b4d8a0, args=0x7f30d5e3d7f8, nargs=1, exc=0x0)
    at /src/vm/callback.c:46
#1  0x00007f30d8f17818 in neko_interp_loop (vm=0x7f30d77e61c0, m=0x7f30d8b4cea0, _acc=139847740806880, _pc=0x7f30d77109b8)
    at /src/vm/interp.c:708
#2  0x00007f30d8f20e24 in neko_interp (vm=0x7f30d77e61c0, _m=0x7f30d8b4cea0, acc=139847740806880, pc=0x7f30d77109b8)
    at /src/vm/interp.c:1214
#3  0x00007f30d8f15511 in neko_val_callEx (vthis=0x7f30d914f870 <t_null>, f=0x7f30d6e9b360, args=0x7f30d8b490f8, nargs=1,
    exc=0x7f30d5e3dd20) at /src/vm/callback.c:117
#4  0x00007f30d7909af1 in thread_loop (_p=0x7f30d8b490f0) at /src/libs/std/thread.c:237
#5  0x00007f30d8f26456 in ThreadMain (_p=0x7ffd92492990) at /src/vm/threads.c:122
#6  0x00007f30d8f41678 in GC_inner_start_routine () from fs/usr/local/lib/libneko.so.2
#7  0x00007f30d8f3558a in GC_call_with_stack_base () from fs/usr/local/lib/libneko.so.2
#8  0x00007f30d8f3b144 in GC_start_routine () from fs/usr/local/lib/libneko.so.2
#9  0x00007f30d8ed2609 in pwd_traced_file () from fs/lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) bt full
#0  0x00007f30d8f14ef6 in neko_val_callEx (vthis=0x7f30d782a000, f=0x7f30d8b4d8a0, args=0x7f30d5e3d7f8, nargs=1, exc=0x0)
    at /src/vm/callback.c:46
        vm = 0x0
        old_this = 0x0
        old_env = 0x0
        ret = 0x0
        oldjmp = {{__jmpbuf = {0, 0, 0, 0, 139845314828357, 139847775009936, 7883446016, 16}, __mask_was_saved = -706488560,
            __saved_mask = {__val = {1, 139847723636592, 139847774906397, 1, 139847770864720, 17450007603122798595, 139847750572680,
                139847723636592, 38654705672, 17450007603122798600, 139847750525952, 139847723636592, 139847774911661,
                17450007606711277424, 139847750819840, 139847750572672}}}}
#1  0x00007f30d8f17818 in neko_interp_loop (vm=0x7f30d77e61c0, m=0x7f30d8b4cea0, _acc=139847740806880, _pc=0x7f30d77109b8)
    at /src/vm/interp.c:708
        _o = 0x7f30d782a000
        _arg = 0x1
        _f = 0x7f30d8b4d8a0
        acc = 1
        pc = 0x7f30d76efe28
        instructions = {0x7f30d8f170c2 <neko_interp_loop+130>, 0x7f30d8f170dc <neko_interp_loop+156>,
          0x7f30d8f170f5 <neko_interp_loop+181>, 0x7f30d8f1710e <neko_interp_loop+206>, 0x7f30d8f17128 <neko_interp_loop+232>,
          0x7f30d8f17188 <neko_interp_loop+328>, 0x7f30d8f171ab <neko_interp_loop+363>, 0x7f30d8f171c7 <neko_interp_loop+391>,
          0x7f30d8f172d0 <neko_interp_loop+656>, 0x7f30d8f175b4 <neko_interp_loop+1396>, 0x7f30d8f18081 <neko_interp_loop+4161>,
          0x7f30d8f18417 <neko_interp_loop+5079>, 0x7f30d8f18430 <neko_interp_loop+5104>, 0x7f30d8f18453 <neko_interp_loop+5139>,
          0x7f30d8f1846f <neko_interp_loop+5167>, 0x7f30d8f18578 <neko_interp_loop+5432>, 0x7f30d8f18791 <neko_interp_loop+5969>,
          0x7f30d8f18b88 <neko_interp_loop+6984>, 0x7f30d8f18f21 <neko_interp_loop+7905>, 0x7f30d8f18f3e <neko_interp_loop+7934>,
          0x7f30d8f18f9e <neko_interp_loop+8030>, 0x7f30d8f19dc2 <neko_interp_loop+11650>, 0x7f30d8f1a804 <neko_interp_loop+14276>,
          0x7f30d8f1b24f <neko_interp_loop+16911>, 0x7f30d8f1b264 <neko_interp_loop+16932>, 0x7f30d8f1b28e <neko_interp_loop+16974>,
          0x7f30d8f1b2b8 <neko_interp_loop+17016>, 0x7f30d8f1b3c7 <neko_interp_loop+17287>, 0x7f30d8f1b4f6 <neko_interp_loop+17590>,
          0x7f30d8f1b5a2 <neko_interp_loop+17762>, 0x7f30d8f1b716 <neko_interp_loop+18134>, 0x7f30d8f1b847 <neko_interp_loop+18439>,
          0x7f30d8f1b8df <neko_interp_loop+18591>, 0x7f30d8f1b916 <neko_interp_loop+18646>, 0x7f30d8f1b94d <neko_interp_loop+18701>,
          0x7f30d8f1c72d <neko_interp_loop+22253>, 0x7f30d8f1d4d2 <neko_interp_loop+25746>, 0x7f30d8f1e269 <neko_interp_loop+29225>,
          0x7f30d8f1e822 <neko_interp_loop+30690>, 0x7f30d8f1f6d2 <neko_interp_loop+34450>, 0x7f30d8f1f910 <neko_interp_loop+35024>,
          0x7f30d8f1fb4e <neko_interp_loop+35598>, 0x7f30d8f1fd92 <neko_interp_loop+36178>, 0x7f30d8f1ffb8 <neko_interp_loop+36728>,
          0x7f30d8f201de <neko_interp_loop+37278>, 0x7f30d8f20404 <neko_interp_loop+37828>, 0x7f30d8f20487 <neko_interp_loop+37959>,
          0x7f30d8f20603 <neko_interp_loop+38339>, 0x7f30d8f20686 <neko_interp_loop+38470>, 0x7f30d8f204fd <neko_interp_loop+38077>,
          0x7f30d8f20580 <neko_interp_loop+38208>, 0x7f30d8f1b893 <neko_interp_loop+18515>, 0x7f30d8f20709 <neko_interp_loop+38601>,
          0x7f30d8f20743 <neko_interp_loop+38659>, 0x7f30d8f20808 <neko_interp_loop+38856>, 0x7f30d8f20911 <neko_interp_loop+39121>,
          0x7f30d8f20943 <neko_interp_loop+39171>, 0x7f30d8f18fe0 <neko_interp_loop+8096>, 0x7f30d8f17161 <neko_interp_loop+289>,
          0x7f30d8f17174 <neko_interp_loop+308>, 0x7f30d8f179a7 <neko_interp_loop+2407>, 0x7f30d8f17d10 <neko_interp_loop+3280>,
          0x7f30d8f207c1 <neko_interp_loop+38785>, 0x7f30d8f1929a <neko_interp_loop+8794>, 0x7f30d8f20980 <neko_interp_loop+39232>,
          0x7f30d8f1b7a2 <neko_interp_loop+18274>, 0x7f30d8f1713e <neko_interp_loop+254>, 0x7f30d8f2098f <neko_interp_loop+39247>}
        sp = 0x7f30d6eab7a8
        csp = 0x7f30d6eab058
#2  0x00007f30d8f20e24 in neko_interp (vm=0x7f30d77e61c0, _m=0x7f30d8b4cea0, acc=139847740806880, pc=0x7f30d77109b8)
    at /src/vm/interp.c:1214
        sp = 0x7f30d6eab768
        csp = 0x7f30d6eab078
        trap = 0x7f30d6eab738
        init_sp = 7
        m = 0x7f30d8b4cea0
        old = {{__jmpbuf = {0, 4064061087093578727, 140727057721422, 140727057721423, 140727057721680, 139847723638720,
              4064061087267642343, 4064050217118686183}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 16 times>}}}}
#3  0x00007f30d8f15511 in neko_val_callEx (vthis=0x7f30d914f870 <t_null>, f=0x7f30d6e9b360, args=0x7f30d8b490f8, nargs=1,
    exc=0x7f30d5e3dd20) at /src/vm/callback.c:117
        n = 1
        vm = 0x7f30d77e61c0
        old_this = 0x7f30d914f870 <t_null>
        old_env = 0x7f30d914eee0 <empty_array>
        ret = 0x7f30d914f870 <t_null>
        oldjmp = {{__jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, __saved_mask = {__val = {0 <repeats 16 times>}}}}
#4  0x00007f30d7909af1 in thread_loop (_p=0x7f30d8b490f0) at /src/libs/std/thread.c:237
        p = 0x7f30d8b490f0
        exc = 0x0
#5  0x00007f30d8f26456 in ThreadMain (_p=0x7ffd92492990) at /src/vm/threads.c:122
        lp = 0x7ffd92492990
        p = {init = 0x7f30d7909a1b <thread_init>, main = 0x7f30d7909a99 <thread_loop>, param = 0x7f30d8b490f0, lock = {__data = {
              __lock = 2, __count = 0, __owner = 2429, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
                __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000}\t\000\000\001", '\000' <repeats 26 times>, __align = 2}}
#6  0x00007f30d8f41678 in GC_inner_start_routine () from fs/usr/local/lib/libneko.so.2
No symbol table info available.
#7  0x00007f30d8f3558a in GC_call_with_stack_base () from fs/usr/local/lib/libneko.so.2
No symbol table info available.
#8  0x00007f30d8f3b144 in GC_start_routine () from fs/usr/local/lib/libneko.so.2
No symbol table info available.
#9  0x00007f30d8ed2609 in pwd_traced_file () from fs/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.

Here is the code in haxelib that uses threads: https://github.com/HaxeFoundation/haxelib/blob/4.1.x/src/haxelib/client/Vcs.hx#L162-L177

tobil4sk added a commit to HaxeFoundation/haxelib that referenced this issue Apr 4, 2023
On Ubuntu, threads can cause seg faults, see:
HaxeFoundation/neko#281
tobil4sk added a commit to tobil4sk/haxec that referenced this issue Apr 4, 2023
Haxelib was causing CI failures in the Ubuntu runners due to a threading
issue with neko:

HaxeFoundation/neko#281
kLabz pushed a commit to HaxeFoundation/haxe that referenced this issue Apr 4, 2023
* Patch haxelib to avoid segmentation faults

Haxelib was causing CI failures in the Ubuntu runners due to a threading
issue with neko:

HaxeFoundation/neko#281

* Update haxelib for run.n fix
tobil4sk added a commit to HaxeFoundation/haxelib that referenced this issue Apr 6, 2023
On Ubuntu, threads can cause seg faults, see:
HaxeFoundation/neko#281
@tobil4sk
Member Author

We just had a similar crash on Windows, so it looks like this is not specific to Linux:

Command: haxelib [git,utest,https://github.com/haxe-utest/utest,master,--always]
Installing utest from https://github.com/haxe-utest/utest branch: master
Library utest current version is now git
Command exited with -1073741819 in 3s: haxelib [git,utest,https://github.com/haxe-utest/utest,master,--always]

-1073741819 is equivalent to 0xC0000005, which is STATUS_ACCESS_VIOLATION: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
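
The equivalence can be checked by reinterpreting the signed 32-bit exit code as an unsigned value. A small sketch, with helper names invented for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: view a signed 32-bit exit code as an unsigned
 * status value. Conversion to uint32_t is modulo 2^32, so negative codes
 * map to values with the high bit set. */
uint32_t status_from_exit_code(int32_t code) {
    return (uint32_t)code;
}

/* 0xC0000005 is the STATUS_ACCESS_VIOLATION value cited above. */
int is_access_violation(int32_t code) {
    return status_from_exit_code(code) == UINT32_C(0xC0000005);
}
```

Here 4294967296 - 1073741819 = 3221225477, which is 0xC0000005.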

@tobil4sk
Member Author

This sample seems to reproduce the seg fault some of the time, at least on my Windows machine:

function main() {
	final streamsLock = new sys.thread.Lock();

	sys.thread.Thread.create(function() {
		Sys.sleep(0.2);
		streamsLock.release();
	});

	sys.thread.Thread.create(function() {
		Sys.sleep(0.2);
		streamsLock.release();
	});

	streamsLock.wait();
	streamsLock.wait();
}
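
One possible reading of this sample, not confirmed anywhere in this thread, is that wait() returning only tells the main thread that each worker has called release(), not that the worker has fully exited, so the main thread may begin tearing down while a worker is still finishing. The plain C/pthreads sketch below illustrates that window and the usual remedy of joining threads before freeing shared state. It is an analogy only: none of the names come from Neko or Haxe, and it makes no claim about how Neko's shutdown actually works.

```c
#include <pthread.h>
#include <stdlib.h>

#define WORKERS 2

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int released;   /* counts releases, like the Lock in the sample */
    int *shared;    /* stands in for runtime state freed at exit */
} ctx_t;

static void *worker(void *p) {
    ctx_t *c = p;

    /* Rough equivalent of streamsLock.release(). */
    pthread_mutex_lock(&c->mu);
    c->released++;
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->mu);

    /* The thread keeps running after the release and still touches
     * shared state. The waiter cannot observe this part. */
    pthread_mutex_lock(&c->mu);
    (*c->shared)++;
    pthread_mutex_unlock(&c->mu);
    return NULL;
}

/* Returns how many workers completed their trailing work before the
 * shared state was freed, or -1 on setup failure. */
int run_workers_then_free(void) {
    ctx_t c;
    pthread_t t[WORKERS];
    int started = 0, total, i;

    c.released = 0;
    c.shared = calloc(1, sizeof *c.shared);
    if (c.shared == NULL) return -1;
    pthread_mutex_init(&c.mu, NULL);
    pthread_cond_init(&c.cv, NULL);

    for (i = 0; i < WORKERS; i++) {
        if (pthread_create(&t[i], NULL, worker, &c) != 0) break;
        started++;
    }

    /* Rough equivalent of calling streamsLock.wait() once per worker. */
    pthread_mutex_lock(&c.mu);
    while (c.released < started) pthread_cond_wait(&c.cv, &c.mu);
    pthread_mutex_unlock(&c.mu);

    /* Join before freeing: the wait above only proves that each worker
     * released, not that it has returned. */
    for (i = 0; i < started; i++) pthread_join(t[i], NULL);

    total = *c.shared;
    free(c.shared);
    pthread_cond_destroy(&c.cv);
    pthread_mutex_destroy(&c.mu);
    return (started == WORKERS) ? total : -1;
}
```

If the joins were dropped, free(c.shared) could run while a worker is still incrementing the counter, which is the kind of use-after-teardown this reading suggests.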

@tobil4sk tobil4sk changed the title from "Neko threads cause seg faults in Ubuntu github actions environment" to "Neko thread usage causes seg faults during global free" on Aug 31, 2024