-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AArch64: Implement arraytranslateTRTO255 #7499
base: master
Are you sure you want to change the base?
Conversation
I will add support for other arraytranslateTRTO and arraytranslateTROT operations in separate PRs later. |
a0b9585
to
1ad8c13
Compare
Fixed a variable name. |
TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0)); | ||
TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1)); | ||
TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4)); | ||
TR::Register *outputLenReg = cg->allocateRegister(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Children 2,3,5 still need to be evaluated even if they are "dummy".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arraytranslate evaluators of p and x platforms leave those dummy children unevaluated.
- p:
omr/compiler/p/codegen/OMRTreeEvaluator.cpp
Lines 5087 to 5090 in be39f05
TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0)); TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1)); TR::Register *stopCharReg = arraytranslateTRTO255 ? NULL : cg->gprClobberEvaluate(node->getChild(3)); TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4)); - x:
omr/compiler/x/codegen/OMRTreeEvaluator.cpp
Lines 3462 to 3465 in be39f05
bool stopUsingCopyReg1 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(0), srcPtrReg, cg); bool stopUsingCopyReg2 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(1), dstPtrReg, cg); bool stopUsingCopyReg4 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(3), termCharReg, cg); bool stopUsingCopyReg5 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(4), lengthReg, cg);
Reference counts of all the child nodes are decremented.
|
||
TR_ASSERT_FATAL(!node->isSourceByteArrayTranslate(), "Source is byte[] for arraytranslate"); | ||
TR_ASSERT_FATAL(node->isTargetByteArrayTranslate(), "Target is char[] for arraytranslate"); | ||
TR_ASSERT_FATAL(node->getChild(3)->getOpCodeValue() == TR::iconst && node->getChild(3)->getInt() == 0x0ff00ff00, "Non-ISO8859 stop character for arraytranslate"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is child(3) used for? It does not appear to be used in this evaluator, so is this assert necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
child(3) is used for the arraytranslateTRTO
for ASCII translation. I already have the code for that, and I am thinking of opening separate PRs for that after arraytranslateTRTO255
(for ISO 8859-1 translation) gets merged.
Do you want to also review arraytranslateTRTO
together with arraytranslateTRTO255
in this PR?
if (!disableTRTO255) | ||
{ | ||
cg->setSupportsArrayTranslateTRTO255(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think support for this opcode should be enabled in OMR when there isn't an implementation for it in OMR (the helper implementation is missing). In the absence of that, this logic should be enabled in the downstream projects that do provide the helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both p and x platforms do the same: Implement the helper functions in the downstream project, and enable the feature in OMR.
On the other hand, z inlines the vectorized code instead of calling helper functions.
I can move the implementation of the AArch64 helper functions to OMR. In that case, maybe we want to move the helper functions for p and x to OMR for consistency.
[p]
- https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/p/runtime/J9PPCArrayTranslate.spp
omr/compiler/p/codegen/OMRCodeGenerator.cpp
Lines 250 to 262 in 205c540
if (!disablePPCTRTO) cg->setSupportsArrayTranslateTRTO(); #ifndef __LITTLE_ENDIAN__ else if (!disablePPCTRTO255) setSupportsArrayTranslateTRTO255(); #endif if (!disablePPCTROT) cg->setSupportsArrayTranslateTROT(); if (!disablePPCTROTNoBreak) cg->setSupportsArrayTranslateTROTNoBreak(); }
[x]
- https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/x/runtime/X86ArrayTranslate.nasm
omr/compiler/x/codegen/OMRCodeGenerator.cpp
Lines 424 to 447 in 205c540
static bool disableX86TRTO = feGetEnv("TR_disableX86TRTO") != NULL; if (!disableX86TRTO) { TR_ASSERT_FATAL(comp->compileRelocatableCode() || comp->isOutOfProcessCompilation() || comp->compilePortableCode() || comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1) == self()->getX86ProcessorInfo().supportsSSE4_1(), "supportsSSE4_1() failed\n"); if (comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1)) { self()->setSupportsArrayTranslateTRTO(); } } static bool disableX86TROT = feGetEnv("TR_disableX86TROT") != NULL; if (!disableX86TROT) { TR_ASSERT_FATAL(comp->compileRelocatableCode() || comp->isOutOfProcessCompilation() || comp->compilePortableCode() || comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1) == self()->getX86ProcessorInfo().supportsSSE4_1(), "supportsSSE4_1() failed\n"); TR_ASSERT_FATAL(comp->compileRelocatableCode() || comp->isOutOfProcessCompilation() || comp->compilePortableCode() || comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE2) == self()->getX86ProcessorInfo().supportsSSE2(), "supportsSSE4_1() failed\n"); if (comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1)) { self()->setSupportsArrayTranslateTROT(); } if (comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE2)) { self()->setSupportsArrayTranslateTROTNoBreak(); } } }
TR_RuntimeHelper helper = TR_ARM64arrayTranslateTRTO255; | ||
TR::SymbolReference *helperSym = cg->symRefTab()->findOrCreateRuntimeHelper(helper); | ||
uintptr_t addr = reinterpret_cast<uintptr_t>(helperSym->getMethodAddress()); | ||
generateImmSymInstruction(cg, TR::InstOpCode::bl, node, addr, deps, helperSym, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you've provided here is essentially a stub that calls a helper that must be implemented by any downstream project that wants to support arraytranslate opcodes. You should either have a comment at the beginning of this evaluator explaining this, or you should consider moving the entire evaluator into the downstream project itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the helper implementation to OMR as discussed yesterday.
At the same time I renamed ARM64arrayCopy.spp
to ARM64ArrayCopy.spp
for the consistency of file names.
// (2) translation table (dummy) | ||
// (3) stop character (terminal character, either 0xff00ff00 (ISO8859) or 0xff80ff80 (ASCII) | ||
// (4) input length (in elements) | ||
// (5) stopping char (dummy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the dummy nodes used by other architectures? Just wondering why they're dummy on AArch64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is different between p/x/aarch64 and z. See the code below:
omr/compiler/optimizer/ValuePropagationCommon.cpp
Lines 2535 to 2565 in 205c540
if (cg()->getSupportsArrayTranslateTRTO255()|| cg()->getSupportsArrayTranslateTRTO()) | |
{ | |
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0); | |
if (isISO88591Encoder) | |
termchar = 0xff00ff00; | |
else | |
termchar = 0xff80ff80; | |
} | |
else //z | |
{ | |
bool genSIMD = comp()->cg()->getSupportsVectorRegisters() && !comp()->getOption(TR_DisableSIMDArrayTranslate); | |
stopIndex = isISO88591Encoder ? 255: 127; | |
termchar = isISO88591Encoder ? 0x0B: 0xff; | |
if (genSIMD) | |
{ | |
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0); //dummy table node, it's not gonna be used | |
} | |
else | |
{ | |
uint8_t *table = (uint8_t*)comp()->trMemory()->allocateMemory(65536, stackAlloc); | |
int i; | |
for (i = 0; i <= stopIndex; i++) | |
table[i] = (uint8_t)i; | |
for (i = stopIndex+1; i < 65536; i++) | |
table[i] = (uint8_t)termchar; | |
tableNode = createTableLoad(comp(), callNode, 16, 8, table, false); | |
stopIndex=-1; | |
} | |
} |
The code for z from line 2543 generates a non-dummy node for child(2) at line 2562, while the non-z path generates a dummy aconst 0
at line 2537.
AArch64 takes the non-z path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of the arraytranslate
node for TRTO255
:
n819n ( 0) treetop [ 0x11fa0afb0] bci=[0,13,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1
n818n ( 1) arraytranslate <arraytranslate>[#278 helper Method] [flags 0x400 0x0 ] (in GPR_0115) (char2byteXlate byteArrayXlate tableBackedByRawStorage ) [ 0x11fa0af60] bci=[0,13,180] rc=1 vc=2947 vn=- li=57 udi=63488 nc=6 flg=0xa020
n795n ( 1) aladd (in GPR_0112) (X>=0 internalPtr ) [ 0x11fa0a830] bci=[0,12,180] rc=1 vc=2947 vn=- li=57 udi=62432 nc=2 flg=0x8100
n460n ( 0) ==>aRegLoad (in &GPR_0016)
n797n ( 0) lsub (in GPR_0017) (X>=0 cannotOverflow ) [ 0x11fa0a8d0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=14464 nc=2 flg=0x1100
n798n ( 0) lshl (in GPR_0017) (X>=0 ) [ 0x11fa0a920] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=14464 nc=2 flg=0x100
n799n ( 0) i2l (highWordZero X>=0 ) [ 0x11fa0a970] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1 flg=0x4100
n12n ( 0) ==>iRegLoad (in GPR_0017) (cannotOverflow )
n803n ( 0) iconst 1 (Unsigned X!=0 X>=0 ) [ 0x11fa0aab0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x4104
n804n ( 0) lconst -16 (X!=0 X<=0 ) [ 0x11fa0ab00] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
n805n ( 1) aladd (in GPR_0113) (X>=0 internalPtr ) [ 0x11fa0ab50] bci=[0,34,185] rc=1 vc=2947 vn=- li=57 udi=62976 nc=2 flg=0x8100
n7n ( 0) ==>newarray (in &GPR_0038) (X!=0 )
n807n ( 0) lconst 16 (highWordZero X!=0 X>=0 cannotOverflow ) [ 0x11fa0abf0] bci=[0,34,185] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x5104
n1286n ( 1) iconst 0 (X==0 X>=0 X<=0 ) [ 0x11fda41b0] bci=[-1,0,161] rc=1 vc=2947 vn=- li=- udi=- nc=0 flg=0x302
n816n ( 1) iconst 0xff00ff00 (X!=0 X<=0 ) [ 0x11fa0aec0] bci=[0,10,180] rc=1 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
n815n ( 1) i2l (in GPR_0114) (highWordZero X>=0 ) [ 0x11fa0ae70] bci=[0,5,179] rc=1 vc=2947 vn=- li=57 udi=63232 nc=1 flg=0x4100
n5n ( 1) ==>iRegLoad (in GPR_0018) (cannotOverflow )
n1287n ( 1) iconst -1 (X!=0 X<=0 ) [ 0x11fda4200] bci=[-1,0,161] rc=1 vc=2947 vn=- li=- udi=- nc=0 flg=0x204
This commit implements arraytranslateTRTO255 for AArch64. Signed-off-by: KONNO Kazuhiro <[email protected]>
1ad8c13
to
ea32e0b
Compare
Jenkins build aarch64,amac |
I responded to all the review comments so far. |
|
||
deps->addPostCondition(outputLenReg, TR::RealRegister::x0); | ||
deps->addPostCondition(outputReg, TR::RealRegister::x1); | ||
deps->addPostCondition(inputLenReg, TR::RealRegister::x2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Register x3
will be used by arraytranslateTRTO
later.
This commit implements arraytranslateTRTO255 for AArch64.