[RFC] Intrinsic implementation #116
Open
fsfod wants to merge 23 commits into LuaJIT:v2.1 from fsfod:intrinsicpr
2ed3d2e
to
ef61fa2
Compare
563a59c
to
fccea5a
Compare
072540e
to
b5f446d
Compare
d17c803
to
0312429
Compare
99c5915
to
2102433
Compare
8a198e6
to
97f8c97
Compare
This was referenced Feb 15, 2016
5bc3850
to
f57039e
Compare
This was referenced Mar 9, 2017
…to 3 byte form if needed
…_tv by using a special cast flag(CCF_INTRINS_ARG) for intrinsic vector arguments
…abled DCE of intrinsics Intrinsics are now assumed to have no side effects unless flagged to with either memory side effects(S) or non memory side effects(s)
…trinsics that have no side effects and are not forced indirect ModRM which could be a load or store
…rectly allocated an input register
…us ways. Fix wrappers truncating GCobj pointers in GC64 mode when loading them from the stack to store output registers into cdata. Fix the stack for intrinsics not being adjusted correctly in their interpreter wrapper when it uses the RID_DISPATCH register on GC64 because RSET_GPR does not contain it
…g RID_DISPATCH Make RID_DISPATCH an unallocatable register for intrinsics when building as GC64. Fix trying to evict RID_DISPATCH for LJ_GC64 builds on x64 for intrinsics and add some asserts that we never try to again. Don't set register hints for intrinsic input\output registers that are RID_DISPATCH. Restore RID_DISPATCH first when handling output registers and defer it till last for input registers of intrinsics in the JIT.
…ests causing random test failures
…ifferent builds of LuaJIT
…s to allow pointer based intrinsics to work in both 64 bit and 32 bit with the same definition.
This is an implementation of #39. It is limited to x86/x64 with the Windows and Linux ABIs for the time being.
There are some working toy examples of the current API in test/test.lua and test/intrinsic_spec.lua. JIT support for vector registers will be left as NYI because it needs various changes to the JIT's systems. If you are feeling brave, you can try out an experimental branch with JIT vector support.
An intrinsic can be either a single machine instruction that LuaJIT might have some specialized understanding of, or an opaque blob of one or more machine instructions that may be user supplied. Intrinsics behave like callable functions in the interpreter. Their argument order is the same as the order in which the input registers were declared in the register list.
API
Declaring a vector opcode intrinsic with an immediate control byte
Declaring an opcode with both a prefix and an immediate byte that takes an address and has memory side effects
Running intrinsics in the interpreter
To allow calling intrinsics in the interpreter, an internal wrapper function is generated using part of the existing JIT engine. In theory the full JIT engine could be used by generating IR instead of using the raw emit system, but that would probably require lots of fixes wherever it is assumed that the code is being generated for a trace. The wrapper is called with two pointers: the first is the input context structure, which contains the values of the input registers (or, for vectors, pointers to the values), and the second is the Lua stack to write the results to. After the intrinsic's code in the wrapper has run, the wrapper writes output registers directly to the Lua stack if they are 32-bit signed numbers; otherwise it copies them into the cdata that was created on the Lua stack before the wrapper was called.
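The output write-back rule described above can be sketched in C roughly as follows. This is only an illustration under assumptions: `StackSlot`, `OutReg` and `write_output` are hypothetical stand-ins, not LuaJIT's real `TValue`/cdata types or the code this PR generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT LuaJIT's real stack or cdata types. */
typedef enum { SLOT_NUMBER, SLOT_CDATA } SlotKind;

typedef struct {
  SlotKind kind;
  double num;            /* valid when kind == SLOT_NUMBER */
  unsigned char *cdata;  /* pre-created payload, valid when kind == SLOT_CDATA */
  size_t cdata_size;
} StackSlot;

typedef struct {
  const void *reg;  /* saved contents of one output register */
  size_t size;      /* width of the register value in bytes */
  int is_int32;     /* non-zero if the declared output is a 32-bit signed int */
} OutReg;

/* Write one output register to its result slot on the (modelled) Lua stack. */
static void write_output(StackSlot *slot, const OutReg *out)
{
  if (out->is_int32) {
    /* 32-bit signed values go straight onto the stack as a number. */
    int32_t v;
    assert(out->size >= sizeof(v));
    memcpy(&v, out->reg, sizeof(v));
    slot->kind = SLOT_NUMBER;
    slot->num = (double)v;
  } else {
    /* Anything else is copied into the cdata created before the call. */
    assert(slot->kind == SLOT_CDATA && slot->cdata_size >= out->size);
    memcpy(slot->cdata, out->reg, out->size);
  }
}
```

The point of the split is that the common integer case needs no allocation at all, while wider values land in memory that already exists by the time the wrapper runs.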
Intrinsics in the JIT
Three new IR instructions are added for intrinsics:
op2(literal) holds the fixed register id that the output value gets written to.
ASMRET instructions for fixed registers have matching register hints set in the register hint prepass.
Design notes
The mcode API/system was generalized to allow more than one mcode area, since the existing JIT one is flushed when a full trace flush happens, while the generated wrappers need to stay around until the state is closed. In theory the FFI callback stubs could also live in this mcode area instead of in fixed-size memory.
Currently, arguments passed to an intrinsic in the interpreter are handled using a data-driven approach in which they are converted and packed into a context, like the one the FFI system uses to call C functions. If the input values were treated as strongly typed (a direct ctype id match for cdata, or a built-in Lua type), the need to save and load input values into the context could be skipped by having the wrapper load the values directly off the Lua stack and move them into registers.
Currently the only way to express the memory side effects of an intrinsic is XBAR, when all that might be needed is a fake store of a particular size that the pointer aliasing system understands. See also the previous discussion of how s/l/mfence could work.
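As a rough model of why two areas are needed, the sketch below keeps a trace area and a wrapper area side by side and only resets the former on a flush. The names (`McodeArea`, `McodeState`, `area_reserve`, `flush_traces`) are hypothetical and the bump allocation is a simplification, not how LuaJIT's mcode allocator actually works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a bump-allocated region of machine code. */
typedef struct {
  unsigned char *base;
  size_t size;
  size_t used;
} McodeArea;

typedef struct {
  McodeArea trace;    /* reset on every full trace flush */
  McodeArea wrapper;  /* must survive until the state is closed */
} McodeState;

static void area_init(McodeArea *a, unsigned char *buf, size_t size)
{
  assert(buf != NULL);
  a->base = buf;
  a->size = size;
  a->used = 0;
}

/* Reserve n bytes in an area, or return NULL if it does not fit. */
static unsigned char *area_reserve(McodeArea *a, size_t n)
{
  unsigned char *p;
  if (n > a->size - a->used)
    return NULL;
  p = a->base + a->used;
  a->used += n;
  return p;
}

/* A full trace flush discards trace code only; wrappers are untouched. */
static void flush_traces(McodeState *m)
{
  m->trace.used = 0;
}
```

With a single shared area, `flush_traces` would also throw away every interpreter wrapper, which is the situation the generalization avoids.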
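To make the cost of a full barrier concrete, here is a toy model in C of the difference between a barrier and a sized store. It is purely illustrative: `KnownLoad`, `apply_barrier` and `apply_store` are hypothetical, and the address-range check is a stand-in for LuaJIT's actual alias analysis, which works on IR references rather than raw addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A previously loaded value the optimizer could still reuse. */
typedef struct {
  uintptr_t addr;
  size_t size;
  int valid;
} KnownLoad;

/* Barrier semantics: every remembered load must be forgotten. */
static void apply_barrier(KnownLoad *loads, size_t n)
{
  size_t i;
  assert(loads != NULL || n == 0);
  for (i = 0; i < n; i++)
    loads[i].valid = 0;
}

static int ranges_overlap(uintptr_t a, size_t asz, uintptr_t b, size_t bsz)
{
  return a < b + bsz && b < a + asz;
}

/* Sized-store semantics: only loads overlapping the written range are lost. */
static void apply_store(KnownLoad *loads, size_t n, uintptr_t addr, size_t size)
{
  size_t i;
  assert(loads != NULL || n == 0);
  for (i = 0; i < n; i++) {
    if (loads[i].valid && ranges_overlap(loads[i].addr, loads[i].size, addr, size))
      loads[i].valid = 0;
  }
}
```

In this model an intrinsic that writes 4 bytes through a pointer would only invalidate what it can actually touch, instead of everything.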
Tasks