Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luajit light userdata incompatibility #223

Closed
vcunat opened this issue Jul 10, 2019 · 14 comments
Closed

luajit light userdata incompatibility #223

vcunat opened this issue Jul 10, 2019 · 14 comments

Comments

@vcunat
Copy link
Contributor

vcunat commented Jul 10, 2019

Typical test case: aarch64 running Linux configured with ARM64_VA_BITS >= 48

$ luajit -l cqueues -e 'os.exit(0)'
luajit: bad light userdata pointer

Other platforms don't commonly run into this AFAIK, but on aarch64 this kernel configuration now seems to be typical. Note that upstream luajit seems very unlikely to fix this incompatibility (and it would probably be hard without caveats).

I don't know code internals of cqueues, so I can't guess how hard this will be, but I recently did this transformation for Knot Resolver codebase, so I might be able to help, though I'm typically overloaded with higher-priority work.

@daurnimator
Copy link
Collaborator

What sort of variables can't be used? Should we be avoiding global vars? pointers to the stack? pointers as returned by malloc?

@daurnimator
Copy link
Collaborator

Judging by the fact that it failed when requiring cqueues, it's probably pushing CQUEUE__POLL in luaopen_cqueues that is failing for you (though I'm sure once we get past this first one there will be more).

As I don't have access to an arm64 platform, could you verify the above by adding prints on either side of

lua_pushlightuserdata(L, CQUEUE__POLL);

@vcunat
Copy link
Contributor Author

vcunat commented Jul 11, 2019

Well, generally there's probably no guarantee which pointers get allocated too high. From aarch64 linux experience, stack is bad and static variables (inside C functions) are OK.

I believe you're right that's the first place to violate the condition:

#0 lua_pushlightuserdata (L=0x252904278378, p=0xffffbf492038 <cqueue.poll>) at lj_api.c:699

@vcunat
Copy link
Contributor Author

vcunat commented Jul 11, 2019

I think the memory layout ends up the same as on x86_64, only using (usually) 48 bits instead of 47. Notably I see that so-libs are placed on the high end – so the static trick likely shouldn't work there, and I wouldn't expect malloc/mmap getting this high normally (especially when we're just one bit over the limit).

@daurnimator
Copy link
Collaborator

I'm not sure what to replace that code with.... a lightuserdata of cqueues_poll is part of the public ABI of the library.
How else can you share a unique value across dlopend libraries?

Maybe for LuaJIT we should mask off the high bits? Would be a bit risky that a collision occurs.
Or maybe cqueues_poll could be a struct with some luajit compat value in it? But who would initialise it and when?
None of the solutions I can think of seem particularly great :(

@vcunat
Copy link
Contributor Author

vcunat commented Jul 14, 2019

If it's just about collisions in the most common case when one bit is missing, I expect it's possible to hack around this without too much risk:

  1. sacrifice the least significant bit instead the bit 47 (i.e. 1 << 47), as existence of valid distinct addresses one byte off is practically impossible (and almost all allocators would guarantee the bit to be zero anyway), or
  2. assume sign-extension, i.e. that bit 47 and 46 have the same value... as allocations tend to keep by either of the extremes.

These hacks might even work luajit-wide... it's just about choosing a better part of the space to reject.

@daurnimator
Copy link
Collaborator

@vcunat thoughts on https://github.com/Kong/openresty-patches/pull/47/files to (mostly) fix things from the luajit side?

@vcunat
Copy link
Contributor Author

vcunat commented Jul 20, 2019

That sounds nice in principle. Still, any solution not accepted into luajit upstream might have a hard time getting into distributions, etc. I can't immediately see if this this couple lines fix everything; I don't know luajit internals.

@daurnimator
Copy link
Collaborator

@vcunat could you test out #225 for me?

@vielmetti
Copy link

Chiming in here and happy to help the best I can (at least from the point of view of having seen this general problem before, and having seen it solved a couple of times).

@daurnimator
Copy link
Collaborator

Closed via #225

@vcunat

This comment has been minimized.

@daurnimator
Copy link
Collaborator

@pspacek ^

@vcunat #241

@vcunat
Copy link
Contributor Author

vcunat commented Aug 3, 2020

Yes, let's continue on the newly open ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants