Interchange Protocol: unused nan_as_null keyword? #125
Comments
I would vote to remove this. This is redundant to |
Seems okay to remove indeed. The original motivation was that Arrow uses bit masks rather than byte masks, and NumPy does not support bit masks. So the nan-to-null mapping has to be done somewhere, and it's easier to do on the producer side than on the consumer side. However, Joris says pandas already has the code to handle this on the consumer side. So, since no one currently uses |
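To make the bit-mask point concrete, here is a minimal pure-Python sketch of the kind of nan-to-null mapping a consumer has to do when a float column arrives with its nulls described by an Arrow-style validity bitmap. It assumes Arrow's layout (one bit per element, least-significant bit first, 1 = valid, 0 = null); the helper names `bitmask_is_valid` and `nulls_to_nan` are hypothetical, and plain lists stand in for the column buffers, whereas a real consumer such as pandas would operate on NumPy arrays.

```python
import math
from typing import List, Optional, Sequence


def bitmask_is_valid(bitmap: bytes, i: int) -> bool:
    """Return True if element i is marked valid in an Arrow-style bitmap.

    Assumed layout: element i lives in byte i // 8 at bit i % 8
    (least-significant bit first); a set bit means "valid".
    """
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def nulls_to_nan(values: Sequence[float], bitmap: Optional[bytes]) -> List[float]:
    """Hypothetical consumer-side helper: replace null slots with NaN.

    `values` stands in for the data buffer and `bitmap` for the validity
    buffer; `None` means the column has no nulls.
    """
    if bitmap is None:
        return [float(v) for v in values]
    # The data value behind a null slot is undefined, so it is overwritten.
    return [float(v) if bitmask_is_valid(bitmap, i) else math.nan
            for i, v in enumerate(values)]


# Elements 0, 1 and 3 valid, element 2 null (bits read LSB-first: 1, 1, 0, 1).
print(nulls_to_nan([1.0, 2.0, 0.0, 4.0], bytes([0b00001011])))
# prints [1.0, 2.0, nan, 4.0]
```

Doing this on the producer side (the point of nan_as_null) would mean the producer runs the same loop before handing over its buffers; doing it on the consumer side keeps the protocol simpler.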
gh-228, which deprecates the |
No one was using it yet and it seemed easier to clean it up from libraries than to ask everyone to add support for it - see data-apis/dataframe-api#125.
The DataFrame Interchange Protocol has a nan_as_null keyword in __dataframe__ that can be specified by the consumer, i.e. the person/library calling this method. The docstring explains its goal; see dataframe-api/protocol/dataframe_protocol.py, lines 400 to 403 in d10a096.
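As an illustration of what honoring the keyword would mean, here is a toy producer. Only the keyword names `nan_as_null` and `allow_copy` come from the protocol's __dataframe__ signature; the class `ToyFrame`, its list-of-Optional column storage, and the plain dict it returns (instead of a real interchange DataFrame object) are hypothetical simplifications.

```python
import math
from typing import Dict, List, Optional


class ToyFrame:
    """Hypothetical producer: each column is a list where None marks a null."""

    def __init__(self, columns: Dict[str, List[Optional[float]]]):
        self._columns = columns

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        # A real producer returns an interchange DataFrame object; this toy
        # returns a dict of lists. allow_copy is accepted but ignored here.
        if not nan_as_null:
            # Default: hand the data over unchanged; nulls stay distinct from NaN.
            return {name: list(col) for name, col in self._columns.items()}
        # nan_as_null=True: the producer overwrites nulls with NaN, so a
        # consumer without mask support can treat NaN as "missing".
        return {
            name: [math.nan if v is None else v for v in col]
            for name, col in self._columns.items()
        }
```

An implementation that ignores the keyword simply always takes the first branch, which is the behaviour reported below for most libraries.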
However, at the moment I think none of the existing implementations we are aware of actually support this keyword: pandas, vaex, modin and cudf all simply ignore it (and cudf even uses the wrong default), while pyarrow ignores it as well but raises an error if you pass a non-default value (i.e. True).
The following disclaimer in the docstring seems to have been copied around, as some version of it appears in all the listed projects:
I am not entirely sure what this note is meant to convey. I suppose it originates from the pandas implementation, where the keyword could be implemented for the nullable extension dtypes; since the default NumPy dtypes already use NaN as null, the keyword would have no effect for those. But for other libraries that explanation doesn't necessarily hold.
And also, if nobody has implemented it, do we actually need it?