Skip to content
This repository has been archived by the owner on Feb 2, 2024. It is now read-only.

HPAT unable to handle non-existent dataframe index name. #132

Open
triskadecaepyon opened this issue Sep 3, 2019 · 3 comments
Open

HPAT unable to handle non-existent dataframe index name. #132

triskadecaepyon opened this issue Sep 3, 2019 · 3 comments

Comments

@triskadecaepyon
Copy link

HPAT crashes if it gets a Pandas dataframe with non-existent dataframe index. Many users create ones without such indexes (as is shown in Pandas docs), so it would be useful to fix this bug.

@akharche perhaps the changes to indexing made this behavior happen?

import pandas as pd
import numpy as np
import hpat

df = pd.DataFrame({'0':[100,200,300,400,200,100]})
df2 = pd.DataFrame([100,200,300,400,200,100])
df3 = pd.DataFrame({'A':[100,200,300,400,200,100]})

@hpat.jit
def test_func(data_frame):
    return data_frame

test_func(df) # works with warnings
test_func(df2) # fails and crashes kernel
test_func(df3) # works with warnings

@akharche
Copy link
Contributor

akharche commented Sep 5, 2019

Thank you for the example. I guess this issue is not connected to non-existent dataframe index (I mean the additional column "Index").
Now Hpat is limited in handling DataFrames created such way df2 = pd.DataFrame([100,200,300,400,200,100]), Hpat expects column name like df2 = pd.DataFrame([100,200,300,400,200,100], columns=['A']) or a dictionary like in your examples.
It is mentioned in Hpat docs.

We will add ability to handle such cases.

@triskadecaepyon
Copy link
Author

So an additional thought then since we are adding this ability:

Is there a way we can elegantly handle unsupported features in the future? As in throw a warning, don't continue, leave as Python Object code?

@fschlimb
Copy link
Contributor

fschlimb commented Sep 6, 2019

It is of course possible to just not compile anything and leave the entire function uncompiled.
It's less straight forward to partially fall back to object mode. It is conceptually possible for certain functions, e.g. those which are trivially data parallel. There is no default fallback for others, like reductions, joins etc. The situation is different if we separated distribution from compilation: going to object-mode in (a hypothetical) non-distributed mode should always be possible.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants