This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

Return a DataFrame from HPAT Jited function #173

Open
bigwater opened this issue Sep 26, 2019 · 1 comment


bigwater commented Sep 26, 2019

Hi,

I am trying to use HPAT to accelerate an ETL process. HPAT gives a significant speedup for the data frame transformations on a multi-core CPU, but I have run into an issue that I cannot figure out.

When the data frame is returned from the jitted function, HPAT either gives no speedup or raises an error. Here is a minimal example:

```python
import time

import pandas
import hpat


@hpat.jit
def test2():
    # Step 1: read the CSV (columns A-H are floats, I and J are strings).
    t0 = time.time()
    df = pandas.read_csv('random.csv', names=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype={'A': 'float', 'B': 'float', 'C': 'float', 'D': 'float', 'E': 'float', 'F': 'float', 'G': 'float', 'H': 'float', 'I': 'str', 'J': 'str'})
    t_readcsv = time.time() - t0
    print('t_readcsv = ', t_readcsv)

    # Step 2: simple column reductions on A and B.
    t0 = time.time()
    res = (df.A.mean(), df.A.max(), df.A.min(), df.B.mean(), df.B.max(), df.B.min())
    t_calc = time.time() - t0
    print('t_calc = ', t_calc)

    # Returning df is what triggers the slowdown/error described below.
    return df


df = test2()
```

In the baseline case (a single process without MPI), `time python test_hpat3.py` reports 30.31 s.

```
time mpiexec -n 2 python test_hpat3.py
real    0m33.568s

time mpiexec -n 8 python test_hpat3.py
real    0m32.557s

time mpiexec -n 16 python test_hpat3.py
real    0m37.037s

time mpiexec -n 32 python test_hpat3.py
real    0m48.858s
```

Using more MPI processes for this example only makes it slower.
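To make the scaling problem concrete, the speedup at each process count can be computed from the wall-clock times above as baseline time divided by parallel time. This is a quick sketch of that arithmetic; the variable and function names are illustrative and not part of the original script. Every speedup comes out below 1, i.e. each MPI run is slower than the single-process baseline.

```python
# Wall-clock ("real") times reported above, in seconds.
baseline = 30.31  # time python test_hpat3.py (single process)
mpi_runs = {2: 33.568, 8: 32.557, 16: 37.037, 32: 48.858}


def speedup(serial_s: float, parallel_s: float) -> float:
    """Serial time divided by parallel time (> 1 means the parallel run is faster)."""
    return serial_s / parallel_s


for n, t in sorted(mpi_runs.items()):
    s = speedup(baseline, t)
    # Parallel efficiency = speedup per process; 1.0 would be ideal scaling.
    print(f"n={n:2d}  real={t:7.3f}s  speedup={s:.2f}x  efficiency={s / n:.3f}")
```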

The behavior is different when I remove `return df` from the jitted function: the speedup then grows as the number of processes increases.

In addition, with even more processes the program fails with an error:

```
time mpiexec -n 44 python test_hpat3.py

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 68909 RUNNING AT CR3PPM-SER010
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

real    0m38.648s
user    22m1.044s
sys     4m20.828s
```
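The `time` summary of the failed 44-process run can also be read as an average core count: total CPU time (user + sys) divided by wall-clock time. This is a small sketch of that arithmetic, assuming the reported user/sys times include the MPI ranks that `mpiexec` launched on this node. It works out to roughly 41 cores busy on average before the run was killed.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time` field such as 22m1.044s into seconds."""
    return minutes * 60 + seconds


real = to_seconds(0, 38.648)
user = to_seconds(22, 1.044)
sys_time = to_seconds(4, 20.828)

# Average number of cores kept busy = total CPU time / wall-clock time.
cpu_total = user + sys_time
busy_cores = cpu_total / real
print(f"{cpu_total:.1f}s CPU over {real:.1f}s wall clock = {busy_cores:.1f} cores on average")
```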

Since I am quite new to HPAT, I am not sure whether this slowdown/error is expected.

Could you explain what is happening and suggest how to fix it? Let me know if you need any other information.

I would like to feed the resulting data frame into later processing steps after the ETL stage, so how can I return it from an HPAT-jitted function?

Thank you so much.

Best regards,
Hongyuan Liu

Software configuration:
hpat 0.30.0 py37hc547734_15 intel/label/test
numba 0.45.0 py37h962f231_0

bigwater changed the title Return DataFrame from a HPAT Jiited function Return a DataFrame from HPAT Jiited function Sep 26, 2019
@fschlimb (Contributor)

@bigwater thanks for your report. We'll look into it.

bigwater changed the title Return a DataFrame from HPAT Jiited function Return a DataFrame from HPAT Jited function Sep 30, 2019