Fast Path Math.min/max_F/D #20999

Open
wants to merge 1 commit into
base: master

Conversation

luke-li-2003
Contributor

Re-enable the fast path for Math.min/max on floating-point operands, with the behaviour around +/-0.0 and NaN handled correctly.

Depends on eclipse-omr/omr#7617
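
For reference, the semantics the fast path has to preserve can be written down as a small model. The sketch below is plain C++ rather than JIT code, and `javaMax` and `javaMin` are hypothetical names used only for illustration. It assumes the rules stated in the `java.lang.Math` Javadoc: the result is NaN if either argument is NaN, and -0.0 is treated as strictly smaller than +0.0.

```cpp
#include <cmath>
#include <limits>

// Hypothetical reference model of Java's Math.max(double, double).
double javaMax(double a, double b)
   {
   if (std::isnan(a) || std::isnan(b))
      return std::numeric_limits<double>::quiet_NaN();
   if (a == 0.0 && b == 0.0)            // +0.0 and -0.0 compare equal, so check the sign bit
      return std::signbit(a) ? b : a;   // prefer the operand that is +0.0
   return a > b ? a : b;
   }

// Hypothetical reference model of Java's Math.min(double, double).
double javaMin(double a, double b)
   {
   if (std::isnan(a) || std::isnan(b))
      return std::numeric_limits<double>::quiet_NaN();
   if (a == 0.0 && b == 0.0)
      return std::signbit(a) ? a : b;   // prefer the operand that is -0.0
   return a < b ? a : b;
   }
```

A plain `a > b ? a : b` gets both special cases wrong, which is why a fast path needs explicit handling for them.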

@luke-li-2003
Contributor Author

Contributor

@rmnattas rmnattas left a comment

LGTM

@luke-li-2003
Contributor Author

@hzongaro can you review and merge this?

@@ -12173,7 +12173,38 @@ J9::Power::CodeGenerator::inlineDirectCall(TR::Node *node, TR::Register *&result
return true;
}
break;

case TR::java_lang_Math_max_F:
Contributor

Do we need this section of code? On the OMR side, generateMaxMin already expects an f/d max/min IL opcode (i.e. the call node has already been transformed, so you will never reach this section). Please double-check.

Contributor

From my understanding of the Z code, they do both: calls are either transformed to an f/d max/min IL opcode during RecognizedCallTransformer, or inlined in inlineDirectCall if they remained calls.
The inlineDirectCall path only helps in runs where RecognizedCallTransformer is not performed.

Contributor

If it remained a call and the evaluator chose to call generateMaxMin, it would crash, wouldn't it?

   switch (node->getOpCodeValue())
      {
      case TR::imax:
      case TR::imin:
         cmp_op = TR::InstOpCode::cmp4;  break;
      case TR::iumax:
      case TR::iumin:
         cmp_op = TR::InstOpCode::cmpl4; break;
      case TR::lmax:
      case TR::lmin:
         cmp_op = TR::InstOpCode::cmp8;  break;
      case TR::lumax:
      case TR::lumin:
         cmp_op = TR::InstOpCode::cmpl8; break;
      case TR::fmax:
      case TR::fmin:
      case TR::dmax:
      case TR::dmin:
         cmp_op = TR::InstOpCode::fcmpu; break;
      default:
         TR_ASSERT(false, "assertion failure"); break;   // <=== assert here?
      }
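
The concern can be illustrated with a toy model of the switch. The enum and function names below are hypothetical stand-ins, not the OMR types, and the exception merely stands in for the assertion. The point is only that an opcode outside the listed max/min cases, such as a node that is still a call, falls through to the default case.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the IL opcodes and compare instructions.
enum class Op { imax, imin, lmax, lmin, fmax, fmin, dmax, dmin, call };
enum class Cmp { cmp4, cmp8, fcmpu };

Cmp selectCompare(Op op)
   {
   switch (op)
      {
      case Op::imax: case Op::imin: return Cmp::cmp4;
      case Op::lmax: case Op::lmin: return Cmp::cmp8;
      case Op::fmax: case Op::fmin:
      case Op::dmax: case Op::dmin: return Cmp::fcmpu;
      default:
         // Models the TR_ASSERT: an untransformed call node ends up here.
         throw std::logic_error("unexpected opcode for max/min");
      }
   }
```

In this model `selectCompare(Op::dmax)` yields `Cmp::fcmpu`, while `selectCompare(Op::call)` throws.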

Contributor Author

My testing so far shows that all calls should be caught in the transformation, so removing the code should have no effect. But my testing code likely doesn't cover all cases.

Contributor

@luke-li-2003 please attach JIT logs from compiling the targeted method both before and after your changes, so we can see that the fast path did kick in. Also, it is a little surprising that the performance benefit is not as large as expected.

Re-enable the fast-pathing of Math.min/max for floating
points with the behaviours around +/-0.0 and NaN correctly
handled.

Signed-off-by: Luke Li <[email protected]>
Contributor

@zl-wang zl-wang left a comment

LGTM

@luke-li-2003
Contributor Author

Here are the logs of the master branch build and the build with my changes.

trace.default.log
trace.mathMinMax.log

Relevant excerpts

Before:

 \\ Main.main([Ljava/lang/String;)V
 \\   34 JBinvokestatic 5 Main.max(DD)D
 \\       2 JBinvokestatic 12 java/lang/Math.max(DD)D

    0x718b5ff6aa34 000000b8 [    0x718b24585390] 4bfffce1          2 	bl 	0000718B5FF6A714		; Direct Call "java/lang/Math.max(DD)D"
 PRE: [D_GPR_0128 : gr2] [FPR_0065 : fp0] [FPR_0068 : fp1] [D_GPR_0130 : gr11] [D_GPR_0131 : gr12] [D_GPR_0132 : gr0] [D_GPR_0133 : gr3] [D_GPR_0134 : gr4] [D_GPR_0135 : gr5] [D_GPR_0136 : gr6] [D_GPR_0137 : gr7] [D_GPR_0138 : gr8] [D_GPR_0139 : gr9] [D_GPR_0140 : gr10] [D_FPR_0141 : fp2] [D_FPR_0142 : fp3] [D_FPR_0143 : fp4] [D_FPR_0144 : fp5] [D_FPR_0145 : fp6] [D_FPR_0146 : fp7] [D_FPR_0147 : fp8] [D_FPR_0148 : fp9] [D_FPR_0149 : fp10] [D_FPR_0150 : fp11] [D_FPR_0151 : fp12] [D_FPR_0152 : fp13] [D_FPR_0153 : fp14] [D_FPR_0154 : fp15] [D_FPR_0155 : fp16] [D_FPR_0156 : fp17] [D_FPR_0157 : fp18] [D_FPR_0158 : fp19] [D_FPR_0159 : fp20] [D_FPR_0160 : fp21] [D_FPR_0161 : fp22] [D_FPR_0162 : fp23] [D_FPR_0163 : fp24] [D_FPR_0164 : fp25] [D_FPR_0165 : fp26] [D_FPR_0166 : fp27] [D_FPR_0167 : fp28] [D_FPR_0168 : fp29] [D_FPR_0169 : fp30] [D_FPR_0170 : fp31] [D_CCR_0171 : cr0] 
POST: [D_GPR_0128 : gr2] [FPR_0129 : fp0] [FPR_0068 : fp1] [D_GPR_0130 : gr11] [D_GPR_0131 : gr12] [D_GPR_0132 : gr0] [D_GPR_0133 : gr3] [D_GPR_0134 : gr4] [D_GPR_0135 : gr5] [D_GPR_0136 : gr6] [D_GPR_0137 : gr7] [D_GPR_0138 : gr8] [D_GPR_0139 : gr9] [D_GPR_0140 : gr10] [D_FPR_0141 : fp2] [D_FPR_0142 : fp3] [D_FPR_0143 : fp4] [D_FPR_0144 : fp5] [D_FPR_0145 : fp6] [D_FPR_0146 : fp7] [D_FPR_0147 : fp8] [D_FPR_0148 : fp9] [D_FPR_0149 : fp10] [D_FPR_0150 : fp11] [D_FPR_0151 : fp12] [D_FPR_0152 : fp13] [D_FPR_0153 : fp14] [D_FPR_0154 : fp15] [D_FPR_0155 : fp16] [D_FPR_0156 : fp17] [D_FPR_0157 : fp18] [D_FPR_0158 : fp19] [D_FPR_0159 : fp20] [D_FPR_0160 : fp21] [D_FPR_0161 : fp22] [D_FPR_0162 : fp23] [D_FPR_0163 : fp24] [D_FPR_0164 : fp25] [D_FPR_0165 : fp26] [D_FPR_0166 : fp27] [D_FPR_0167 : fp28] [D_FPR_0168 : fp29] [D_FPR_0169 : fp30] [D_FPR_0170 : fp31] [D_CCR_0171 : cr0] 
    0x718b5ff6aa38 000000bc [    0x718b2460e9b0] c82e0050          2 	lfd 	fp1, [gr14, 80]		; spilled for dcall # #SPILL8		# SymRef  <#SPILL8_466     0x718b2460e850>[#466  Auto +80] [flags 0x80000000 0x0 ]

After:

 \\ Main.main([Ljava/lang/String;)V
 \\   34 JBinvokestatic 5 Main.max(DD)D
 \\       2 JBinvokestatic 12 java/lang/Math.max(DD)D

    0x788a54707fa8 000000ac [    0x788a4c7bcb00] fc001000          2 	fcmpu 	cr0, fp0, fp2
    0x788a54707fac 000000b0 [    0x788a4c7bcba0] 4183000c          2 	bun 	cr0, Label L0083
    0x788a54707fb0 000000b4 [    0x788a4c7bcc40] f0001500          2 	xsmaxdp 	vsr0, vsr0, vsr2
    0x788a54707fb4 000000b8 [    0x788a4c7bcce0] 48000008          2 	b 	Label L0082	
    0x788a54707fb8 000000bc [    0x788a4c7bcd70]                   2 	Label L0083:	
    0x788a54707fb8 000000bc [    0x788a4c7bce00] fc00102a          2 	fadd 	fp0, fp0, fp2
    0x788a54707fbc 000000c0 [    0x788a4c7bcf40]                   2 	Label L0082:	; (End of internal control flow)	
 PRE: 
POST: [CCR_0080 : cr0] [FPR_0065 : fp0] [FPR_0065 : fp0] [FPR_0066 : fp2] 
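
Read as pseudocode, the sequence above takes the `bun` branch when `fcmpu` reports an unordered result (at least one NaN operand) and lets `fadd` propagate the NaN; otherwise `xsmaxdp` selects the result. A rough C++ model of that control flow follows. `emittedMax` and `xsmaxdpModel` are hypothetical names, and the model of `xsmaxdp` covers only non-NaN inputs, assuming (from my reading of the Power ISA) that the instruction treats +0.0 as greater than -0.0.

```cpp
#include <cmath>

// Hypothetical model of xsmaxdp, valid for non-NaN inputs only.
static double xsmaxdpModel(double a, double b)
   {
   if (a == 0.0 && b == 0.0)
      return std::signbit(a) ? b : a;   // assumed: max(+0.0, -0.0) is +0.0
   return a > b ? a : b;
   }

// Hypothetical model of the emitted instruction sequence.
double emittedMax(double fp0, double fp2)
   {
   bool unordered = std::isnan(fp0) || std::isnan(fp2);   // fcmpu cr0, fp0, fp2
   if (unordered)                                         // bun cr0, L0083
      return fp0 + fp2;                                   // fadd: a sum with a NaN operand is NaN
   return xsmaxdpModel(fp0, fp2);                         // xsmaxdp, then b L0082
   }
```

The `fadd` on the unordered path is a compact way to produce a NaN result without loading a constant.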

4 participants