Error running SpecJBB test #5
Comments
I'm using the inbuilt IBM Java |
Hey Adi, |
I got around that error but am now running into another one:
Reading property file: /home/rack/op-benchmark-recipes/standard-benchmarks/Java/SPEC-jbb2015/iso/18-06-30_012343/./config/specjbb2015.props
|
This is a dual 22-core Power9 system (3.3 GHz - 3.8 GHz) |
Hey Adi, from the above it is not clear why the run was terminated. It would be great if you could zip up the run directory (it should look something like "18-......" with the date and time of the run) and attach it here, as it would give more insight into what is failing. |
Please find attached log |
Hi Adi, the zip file is empty for some reason. |
Hi Adi, Please try with the latest IBMJDK build from: https://developer.ibm.com/javasdk/downloads/sdk8/ ... "ibm-java-ppc64le-sdk-8.0-5.17" or "ibm-java-ppc64le-jre-8.0-5.17.bin" Please send a new zip of the run folder or have a look at the log files in the run folder to check for glaring errors (18*/*log) |
specjbb_failure_new_java.zip |
System config:
root@ubuntu:/home/ubuntu/SPEC-jbb2015# lscpu
root@ubuntu:/home/ubuntu/SPEC-jbb2015# lshw -short
|
Hi Adi, I have reviewed the run folder. The main issue stems from the run script: run_multi.sh.ibmjdk_829_20C_2S_2grp_63GB.sh
In the run script, please replace the Java options for the controller, TXI and BE with the below. Also, replace the Java execution command for the controller, TXI and BE.
JAVA_OPTS_C="-XX:-RuntimeInstrumentation -Xms1g -Xmx1g -Xmn800m -Xcompressedrefs -XX:-EnableHCR"
echo "Start Controller JVM"
echo " Start $TI_NAME"
echo " Start $BE_NAME" |
It seems as though these arguments work (the test is still in progress). This is the first time I've gotten IBM Java to run on Barreleye G2 Power9 systems. |
Here are the numbers I got from my run. As you can see, they are not as stellar as the numbers you have observed. I think it could be because I ran this test with SMT=0. With SMT=4 I was running out of memory. Do you know how I can work around this? Thanks in advance for the suggestions. |
Hi Adi, great to hear it is working now. Odd to see that changing SMT causes OOM. I see you are running two groups, and I'm guessing each backend JVM is using about 63GB... so the benchmark should be using 140GB total system memory. You have 256GB RAM from above. Can you please check that you are not over-allocating hugepages in the tune script? |
Here is my tune script:
ulimit -n 1048576
swapoff -a
echo 120000 > /proc/sys/vm/nr_hugepages
ppc64_cpu --dscr=1
# Network tuning
sysctl -w net.ipv4.udp_rmem_min=1024
# Scheduler tuning
echo 1000 > /proc/sys/kernel/sched_migration_cost_ns
# CPU governor
cpupower frequency-set -g performance |
Hi Adi, try reducing the number of memory pages (currently you are allocating 245GB of hugepages): Also, check that your hugepages are divided equally between your two sockets: |
Hey @Tom-Tran I realized that I was allocating almost all my memory and reduced it to half. Thanks for confirming that's fine. I've started the run now. Hopefully the numbers are better with SMT-4. The hugepages seem equally divided. This tuning script is what I got from @basuv. I need to look into the new patch / tuning scripts uploaded by @johnjmar. If things still look shitty, hopefully we can arrange a screen share next week to debug. I thoroughly appreciate your help in tuning this here.
root@ubuntu:/home/ubuntu/SPEC-jbb2015# cat /sys/devices/system/node/node*/hugepages/hugepages-*/nr_hugepages |
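For anyone repeating the per-node check from the `cat` command above, here is a minimal Python sketch of the same idea. It assumes the standard Linux sysfs layout used in that command; the function names `hugepages_per_node` and `is_balanced` are made up for illustration and are not part of any benchmark tooling.

```python
from pathlib import Path


def hugepages_per_node(sysfs_root="/sys/devices/system/node"):
    """Return {node: {page_size_dir: nr_hugepages}} read from sysfs."""
    counts = {}
    pattern = "node*/hugepages/hugepages-*/nr_hugepages"
    for f in sorted(Path(sysfs_root).glob(pattern)):
        node = f.parts[-4]   # e.g. "node0"
        size = f.parts[-2]   # e.g. "hugepages-2048kB"
        counts.setdefault(node, {})[size] = int(f.read_text().strip())
    return counts


def is_balanced(counts, size):
    """True if every listed node reports the same count for this page size."""
    values = [sizes.get(size, 0) for sizes in counts.values()]
    return len(set(values)) <= 1
```

On the system itself, `is_balanced(hugepages_per_node(), "hugepages-2048kB")` would report whether the nodes match. Note that a node without memory would report 0 and make the check fail, so the result still needs a human look.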
Sorry to bother you on the weekend. Failure again (OOM).
[ 1156.011862] bash (6599): drop_caches: 3
|
Hi Adi, can you try
Sorry, 30K on each socket only gives 58GB. |
ibm_java_run2_smt4_hpg_90k.zip
@Tom-Tran much better. From here, I'd like to tune it further to get a bit closer to the published 24-core numbers. Appreciate your help so far! |
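The hugepage arithmetic here is easy to get wrong, so a small Python sketch may help. It assumes a 2 MiB hugepage size, which is consistent with "30K each socket only gives 58GB" but should be confirmed from the `Hugepagesize` line in /proc/meminfo. Both helper names are hypothetical.

```python
def parse_hugepagesize_kib(meminfo_text):
    """Extract the Hugepagesize value (in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])
    raise ValueError("no Hugepagesize line found")


def hugepages_to_gib(nr_pages, page_kib=2048):
    """Memory reserved by nr_pages hugepages, in GiB.

    page_kib=2048 (2 MiB pages) is an assumption; check your system.
    """
    return nr_pages * page_kib / (1024 * 1024)


print(hugepages_to_gib(30000))   # pages per socket mentioned above
print(hugepages_to_gib(120000))  # value from the original tune script
```

Under that assumption, 30000 pages come to roughly 58.6 GiB and the original 120000 pages to roughly 234 GiB, which leaves very little of a 256GB system for normal pages.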
Hi Adi, great to hear the issues are resolved. What is your performance goal? The largest performance gain you can probably get is from increasing the number of groups per socket. I recommend 2 JVM groups per socket with 11 cores bound to each JVM group. Since you are limited to 128GB RAM per socket, I'd recommend you bump down the backend JVM heap in the JAVA_OPT_BE by 5 to 15 GB, i.e., let's try "-Xms50g -Xmx50g -Xmn48g". Also, you'll have to increase the number of hugepages again.
echo 115000 > /proc/sys/vm/nr_hugepages |
@Tom-Tran My goal is to beat the Dual EPYC 7601 by a margin of 10% or so. The 7601 clocks around 120K max-jops as well. |
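As a rough sanity check on the suggested values, the sketch below works out how many hugepages the backend heaps alone would need. It assumes one backend JVM per group (four in total), a 2 MiB hugepage size, and that only the backend heaps are counted; the controller, TxI JVMs and JVM overhead are not modelled. `pages_for_heaps` is a hypothetical helper.

```python
import math


def pages_for_heaps(n_backends, heap_gib, page_kib=2048):
    """Hugepages needed to back n_backends heaps of heap_gib GiB each.

    Assumes 2 MiB pages by default and counts heap memory only.
    """
    heap_kib = n_backends * heap_gib * 1024 * 1024
    return math.ceil(heap_kib / page_kib)


needed = pages_for_heaps(4, 50)   # 4 backends with -Xmx50g
spare = 115000 - needed           # headroom in the suggested allocation
print(needed, spare)
```

Under these assumptions the four 50g heaps need 102400 pages, so 115000 leaves about 12600 pages (around 24.6 GiB) for everything else that uses hugepages.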
@Tom-Tran Here is what I went with: procs[1]="0-43" What do you think? I'm not sure if this is going to give better results, and I don't want to fiddle with the system while the test is running. I got the "out of band" power measurement and the chips seem to be running 10% hotter, wattage-wise, than in the previous test. |
Those bindings look good. I'm curious if you'll get any minor performance gains from: Is there any vmstat/mpstat output to complement the wattage output from the two runs? Is the hotter run achieving higher max-jops, and therefore maximizing the CPUs, which can increase wattage? |
ibm_java_run3_4_groups (1).zip
Hi Tom, here is the output from the previous run. It is showing an 8% improvement :) Is there anything else that comes to mind to try? |
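Only procs[1]="0-43" is shown in the thread. Purely as an illustration, here is how the full set of ranges could be derived for 2 sockets, 22 cores per socket, SMT4 and 2 groups per socket. It assumes logical CPUs are numbered contiguously across cores and sockets, which should be verified with `lscpu` or `ppc64_cpu --info` before use; `group_cpu_ranges` is a hypothetical helper.

```python
def group_cpu_ranges(sockets=2, cores_per_socket=22, smt=4, groups_per_socket=2):
    """Return one 'first-last' logical CPU range per JVM group.

    Assumes contiguous logical CPU numbering (an assumption, not a
    property confirmed anywhere in this thread).
    """
    if cores_per_socket % groups_per_socket != 0:
        raise ValueError("cores per socket must divide evenly among groups")
    cpus_per_group = (cores_per_socket // groups_per_socket) * smt
    ranges = []
    for g in range(sockets * groups_per_socket):
        first = g * cpus_per_group
        ranges.append(f"{first}-{first + cpus_per_group - 1}")
    return ranges


print(group_cpu_ranges())
```

With the defaults, the first range comes out as "0-43", matching the binding quoted above; the remaining ranges are only what the contiguous-numbering assumption implies.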
That's great. Here are some other items that may get a couple more percent.
|
Hi @Tom-Tran |
|
Interesting that 2) hampered results. Was the difference between Xmx and Xmn 2? I'm guessing this is run-to-run variation. |
Hi @Tom-Tran Until now I was running the tests with the 2400 MHz host build setting. I moved to 2666 MHz and seem to be running into errors: Does this make sense to you? I used the same parameters and settings that we discussed before. dmesg |
SpeccJBB ran error while we set Barreleye_G2 memory freq to 2666. |
Hi Adi, OOM sounds more like an OS problem than a Java one. Do you still have the same amount of available memory on the OS with the 2666 MHz memory freq? Is Terry doing the runs now? |
Both Terry and I did the runs and ran into issues. Yes, still 16x 16 GB DIMMs, so the same amount of memory. |
I see. @basuv please comment on any side effects of changing the memory frequency from 2400 to 2666. From the outputs, one of the JVMs on node 8 got killed by the kernel because of OOM. I don't think it is due to a lack of allocated hugepages. It probably ran out of normal 64K page memory. Can we try reducing the hugepage allocation per node from 54500 to 53000 or 52500? |
Is this issue resolved? Wondering if I should close it. |
Hi @johnjmar
Can you let me know if you've run into this error with IBM Java (the one built into your benchmarking suite)?
root@ubuntu:/home/ubuntu/op-benchmark-recipes/standard-benchmarks/Java/SPEC-jbb2015/18-06-14_174600# cat controller.log
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: joptsimple.HelpFormatter
at java.lang.J9VMInternals.prepareClassImpl(Native Method)
at java.lang.J9VMInternals.prepare(J9VMInternals.java:291)
at java.lang.Class.getMethod(Class.java:1216)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:556)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:538)
Caused by: java.lang.ClassNotFoundException: joptsimple.HelpFormatter
at java.net.URLClassLoader.findClass(URLClassLoader.java:607)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:850)
at java.lang.ClassLoader.loadClass(ClassLoader.java:829)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:325)
at java.lang.ClassLoader.loadClass(ClassLoader.java:809)
... 5 more
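A `NoClassDefFoundError` like this usually means the jar that provides the class is not on the classpath the launcher actually used. One way to narrow it down is to check which jars under the benchmark directory contain `joptsimple/HelpFormatter.class`. The sketch below assumes the dependency jars live somewhere under a directory you pass in; the helper name `jars_containing` is made up.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dir):
    """Return jar paths under search_dir that contain the given class."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(search_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # skip corrupt or truncated jars
    return hits
```

Calling `jars_containing("joptsimple.HelpFormatter", ".")` from the benchmark directory would list candidate jars. An empty result suggests the library is missing; a non-empty one points at the classpath or working directory used by the run script.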