Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64: Implement arraytranslateTRTO255 #7499

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

knn-k
Copy link
Contributor

@knn-k knn-k commented Oct 24, 2024

This commit implements arraytranslateTRTO255 for AArch64.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 24, 2024

I will add support for other arraytranslateTRTO and arraytranslateTROT operations in separate PRs later.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 28, 2024

Fixed a variable name.

TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0));
TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1));
TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4));
TR::Register *outputLenReg = cg->allocateRegister();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Children 2,3,5 still need to be evaluated even if they are "dummy".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arraytranslate evaluators of p and x platforms leave those dummy children unevaluated.

  • p:
    TR::Register *inputReg = cg->gprClobberEvaluate(node->getChild(0));
    TR::Register *outputReg = cg->gprClobberEvaluate(node->getChild(1));
    TR::Register *stopCharReg = arraytranslateTRTO255 ? NULL : cg->gprClobberEvaluate(node->getChild(3));
    TR::Register *inputLenReg = cg->gprClobberEvaluate(node->getChild(4));
  • x:
    bool stopUsingCopyReg1 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(0), srcPtrReg, cg);
    bool stopUsingCopyReg2 = TR::TreeEvaluator::stopUsingCopyRegAddr(node->getChild(1), dstPtrReg, cg);
    bool stopUsingCopyReg4 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(3), termCharReg, cg);
    bool stopUsingCopyReg5 = TR::TreeEvaluator::stopUsingCopyRegInteger(node->getChild(4), lengthReg, cg);

Reference counts of all the child nodes are decremented.


TR_ASSERT_FATAL(!node->isSourceByteArrayTranslate(), "Source is byte[] for arraytranslate");
TR_ASSERT_FATAL(node->isTargetByteArrayTranslate(), "Target is char[] for arraytranslate");
TR_ASSERT_FATAL(node->getChild(3)->getOpCodeValue() == TR::iconst && node->getChild(3)->getInt() == 0x0ff00ff00, "Non-ISO8859 stop character for arraytranslate");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is child(3) used for? It does not appear to be used in this evaluator, so is this assert necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

child(3) is used for the arraytranslateTRTO for ASCII translation. I already have the code for that, and I am thinking of opening separate PRs for that after arraytranslateTRTO255 (for ISO 8859-1 translation) gets merged.

Do you want to also review arraytranslateTRTO together with arraytranslateTRTO255 in this PR?

if (!disableTRTO255)
{
cg->setSupportsArrayTranslateTRTO255();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think support for this opcode should be enabled in OMR when there isn't an implementation for it in OMR (the helper implementation is missing). In the absence of that, this logic should be enabled in the downstream projects that do provide the helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both p and x platforms do the same: Implement the helper functions in the downstream project, and enable the feature in OMR.
On the other hand, z inlines the vectorized code instead of calling helper functions.

I can move the implementation of the AArch64 helper functions to OMR. In that case, maybe we want to move the helper functions for p and x to OMR for consistency.

[p]

[x]

  • https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/x/runtime/X86ArrayTranslate.nasm
  • static bool disableX86TRTO = feGetEnv("TR_disableX86TRTO") != NULL;
    if (!disableX86TRTO)
    {
    TR_ASSERT_FATAL(comp->compileRelocatableCode() || comp->isOutOfProcessCompilation() || comp->compilePortableCode() || comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1) == self()->getX86ProcessorInfo().supportsSSE4_1(), "supportsSSE4_1() failed\n");
    if (comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1))
    {
    self()->setSupportsArrayTranslateTRTO();
    }
    }
    static bool disableX86TROT = feGetEnv("TR_disableX86TROT") != NULL;
    if (!disableX86TROT)
    {
    TR_ASSERT_FATAL(comp->compileRelocatableCode() || comp->isOutOfProcessCompilation() || comp->compilePortableCode() || comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1) == self()->getX86ProcessorInfo().supportsSSE4_1(), "supportsSSE4_1() failed\n");
    TR_ASSERT_FATAL(comp->compileRelocatableCode() || comp->isOutOfProcessCompilation() || comp->compilePortableCode() || comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE2) == self()->getX86ProcessorInfo().supportsSSE2(), "supportsSSE4_1() failed\n");
    if (comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE4_1))
    {
    self()->setSupportsArrayTranslateTROT();
    }
    if (comp->target().cpu.supportsFeature(OMR_FEATURE_X86_SSE2))
    {
    self()->setSupportsArrayTranslateTROTNoBreak();
    }
    }
    }

TR_RuntimeHelper helper = TR_ARM64arrayTranslateTRTO255;
TR::SymbolReference *helperSym = cg->symRefTab()->findOrCreateRuntimeHelper(helper);
uintptr_t addr = reinterpret_cast<uintptr_t>(helperSym->getMethodAddress());
generateImmSymInstruction(cg, TR::InstOpCode::bl, node, addr, deps, helperSym, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you've provided here is essentially a stub that calls a helper that must be implemented by any downstream project that wants to support arraytranslate opcodes. You should either have a comment at the beginning of this evaluator explaining this, or you should consider moving the entire evaluator into the downstream project itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the helper implementation to OMR as discussed yesterday.
At the same time I renamed ARM64arrayCopy.spp to ARM64ArrayCopy.spp for the consistency of file names.

// (2) translation table (dummy)
// (3) stop character (terminal character, either 0xff00ff00 (ISO8859) or 0xff80ff80 (ASCII)
// (4) input length (in elements)
// (5) stopping char (dummy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the dummy nodes used by other architectures? Just wondering why they're dummy on AArch64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is different between p/x/aarch64 and z. See the code below:

if (cg()->getSupportsArrayTranslateTRTO255()|| cg()->getSupportsArrayTranslateTRTO())
{
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0);
if (isISO88591Encoder)
termchar = 0xff00ff00;
else
termchar = 0xff80ff80;
}
else //z
{
bool genSIMD = comp()->cg()->getSupportsVectorRegisters() && !comp()->getOption(TR_DisableSIMDArrayTranslate);
stopIndex = isISO88591Encoder ? 255: 127;
termchar = isISO88591Encoder ? 0x0B: 0xff;
if (genSIMD)
{
tableNode = TR::Node::create(callNode, TR::aconst, 0, 0); //dummy table node, it's not gonna be used
}
else
{
uint8_t *table = (uint8_t*)comp()->trMemory()->allocateMemory(65536, stackAlloc);
int i;
for (i = 0; i <= stopIndex; i++)
table[i] = (uint8_t)i;
for (i = stopIndex+1; i < 65536; i++)
table[i] = (uint8_t)termchar;
tableNode = createTableLoad(comp(), callNode, 16, 8, table, false);
stopIndex=-1;
}
}

The code for z from line 2543 generates a non-dummy node for child(2) at line 2562, while the non-z path generates a dummy aconst 0 at line 2537.
AArch64 takes the non-z path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of the arraytranslate node for TRTO255:

 n819n    (  0)  treetop                                                                              [       0x11fa0afb0] bci=[0,13,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1
 n818n    (  1)    arraytranslate  <arraytranslate>[#278  helper Method] [flags 0x400 0x0 ] (in GPR_0115) (char2byteXlate byteArrayXlate tableBackedByRawStorage )  [       0x11fa0af60] bci=[0,13,180] rc=1 vc=2947 vn=- li=57 udi=63488 nc=6 flg=0xa020
 n795n    (  1)      aladd (in GPR_0112) (X>=0 internalPtr )                                          [       0x11fa0a830] bci=[0,12,180] rc=1 vc=2947 vn=- li=57 udi=62432 nc=2 flg=0x8100
 n460n    (  0)        ==>aRegLoad (in &GPR_0016)
 n797n    (  0)        lsub (in GPR_0017) (X>=0 cannotOverflow )                                      [       0x11fa0a8d0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=14464 nc=2 flg=0x1100
 n798n    (  0)          lshl (in GPR_0017) (X>=0 )                                                   [       0x11fa0a920] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=14464 nc=2 flg=0x100
 n799n    (  0)            i2l (highWordZero X>=0 )                                                   [       0x11fa0a970] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=1 flg=0x4100
 n12n     (  0)              ==>iRegLoad (in GPR_0017) (cannotOverflow )
 n803n    (  0)            iconst 1 (Unsigned X!=0 X>=0 )                                             [       0x11fa0aab0] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x4104
 n804n    (  0)          lconst -16 (X!=0 X<=0 )                                                      [       0x11fa0ab00] bci=[0,12,180] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
 n805n    (  1)      aladd (in GPR_0113) (X>=0 internalPtr )                                          [       0x11fa0ab50] bci=[0,34,185] rc=1 vc=2947 vn=- li=57 udi=62976 nc=2 flg=0x8100
 n7n      (  0)        ==>newarray (in &GPR_0038) (X!=0 )
 n807n    (  0)        lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )                             [       0x11fa0abf0] bci=[0,34,185] rc=0 vc=2947 vn=- li=57 udi=- nc=0 flg=0x5104
 n1286n   (  1)      iconst 0 (X==0 X>=0 X<=0 )                                                       [       0x11fda41b0] bci=[-1,0,161] rc=1 vc=2947 vn=- li=- udi=- nc=0 flg=0x302
 n816n    (  1)      iconst 0xff00ff00 (X!=0 X<=0 )                                                   [       0x11fa0aec0] bci=[0,10,180] rc=1 vc=2947 vn=- li=57 udi=- nc=0 flg=0x204
 n815n    (  1)      i2l (in GPR_0114) (highWordZero X>=0 )                                           [       0x11fa0ae70] bci=[0,5,179] rc=1 vc=2947 vn=- li=57 udi=63232 nc=1 flg=0x4100
 n5n      (  1)        ==>iRegLoad (in GPR_0018) (cannotOverflow )
 n1287n   (  1)      iconst -1 (X!=0 X<=0 )                                                           [       0x11fda4200] bci=[-1,0,161] rc=1 vc=2947 vn=- li=- udi=- nc=0 flg=0x204

This commit implements arraytranslateTRTO255 for AArch64.

Signed-off-by: KONNO Kazuhiro <[email protected]>
@knn-k
Copy link
Contributor Author

knn-k commented Oct 31, 2024

Jenkins build aarch64,amac

@knn-k
Copy link
Contributor Author

knn-k commented Oct 31, 2024

I responded to all the review comments so far.


deps->addPostCondition(outputLenReg, TR::RealRegister::x0);
deps->addPostCondition(outputReg, TR::RealRegister::x1);
deps->addPostCondition(inputLenReg, TR::RealRegister::x2);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Register x3 will be used by arraytranslateTRTO later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants