Interchange Protocol: unused `nan_as_null` keyword? #125

jorisvandenbossche · 2023-03-30T12:14:38Z

The DataFrame Interchange Protocol has a nan_as_null keyword in __dataframe__ that can be specified by the consumer, i.e. the person/library calling this method. The docstring explains its goal:

dataframe-api/protocol/dataframe_protocol.py

Lines 400 to 403 in d10a096

    
        ``nan_as_null`` is a keyword intended for the consumer to tell the
        producer to overwrite null values in the data with ``NaN``.
        It is intended for cases where the consumer does not support the bit
        mask or byte mask that is the producer's native representation.

However, at the moment I think none of the existing implementations we are aware of actually support this keyword: pandas, vaex, modin and cudf all simply ignore it (and cudf actually uses a wrong default). pyarrow ignores it as well, but raises an error if you pass a non-default value (i.e. True).
The following disclaimer in the docstring seems to have been copied around as some version of this appears in all the listed projects:

        `nan_as_null` currently has no effect; once support for nullable extension
        dtypes is added, this value should be propagated to columns.

I am not fully sure what this note is meant for. I suppose it originates from the pandas implementation, where the keyword could be implemented for nullable extension types; since the default NumPy dtypes already use NaN as null, the keyword wouldn't have any effect for those. But for other libraries that explanation doesn't necessarily hold.
And also, if nobody implemented it, do we actually need it?
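To make the two behaviours described above concrete, here is a minimal sketch. The classes `IgnoringProducer` and `StrictProducer` are hypothetical stand-ins, not code from any of the listed libraries, and the returned dict is only a placeholder for a real interchange object. The exception type is also an assumption.

```python
class IgnoringProducer:
    """Hypothetical producer that accepts ``nan_as_null`` but never uses it."""

    def __init__(self, data):
        self._data = data

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # The keyword is accepted for signature compatibility and then dropped.
        # A real producer would return an interchange object, not a dict.
        return {"data": self._data, "allow_copy": allow_copy}


class StrictProducer(IgnoringProducer):
    """Hypothetical producer that rejects a non-default ``nan_as_null``."""

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        if nan_as_null is True:
            # Assumed error type; the point is only that the call fails.
            raise RuntimeError("nan_as_null=True is not supported")
        return super().__dataframe__(allow_copy=allow_copy)


# The lenient producer silently returns the data unchanged.
lenient = IgnoringProducer([1.0, None]).__dataframe__(nan_as_null=True)

# The strict producer fails as soon as the consumer asks for the conversion.
try:
    StrictProducer([1.0, None]).__dataframe__(nan_as_null=True)
    strict_raised = False
except RuntimeError:
    strict_raised = True
```

In both cases the consumer never gets nulls rewritten to NaN, which is why the keyword is effectively unused.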


kkraus14 · 2023-03-31T20:45:17Z

I would vote to remove this. It is redundant with describe_null at the column level, and has much more rigid capabilities.

rgommers · 2023-03-31T22:05:22Z

Seems okay to remove indeed. The original motivation was that Arrow uses bit masks rather than byte masks, and those aren't supported by NumPy. So the null-to-NaN mapping has to be done somewhere, and it's easier to do on the producer side than on the consumer side. However, Joris says pandas already has the code to handle this on the consumer side. So, since no one currently uses nan_as_null, it seems safe to remove it.
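For illustration, consumer-side handling of a bit mask could look roughly like the sketch below. It assumes an Arrow-style validity bitmap (least-significant bit first, 1 meaning valid); the helper names `unpack_validity_bitmap` and `nulls_to_nan` are made up for this example and are not the pandas implementation.

```python
import math


def unpack_validity_bitmap(bitmap: bytes, length: int, offset: int = 0) -> list[bool]:
    """Expand an LSB-first validity bitmap (1 = valid) to one bool per element."""
    return [
        bool((bitmap[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]


def nulls_to_nan(values: list[float], validity: list[bool]) -> list[float]:
    """Replace every element flagged as null with NaN."""
    return [v if ok else math.nan for v, ok in zip(values, validity)]


# Bits 0, 2 and 3 are set, so only the element at index 1 is null.
bitmap = bytes([0b00001101])
validity = unpack_validity_bitmap(bitmap, length=4)
result = nulls_to_nan([1.0, 2.0, 3.0, 4.0], validity)
```

Since this conversion is a few lines on the consumer side, the producer does not need a keyword for it.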

rgommers · 2023-08-29T12:06:05Z

gh-228, which deprecates the nan_as_null keyword, is now merged. As discussed in gh-226, we should ensure it's removed from all the consumers before removing it from the signature of __dataframe__. So let's leave this issue open, and use it as the reference to point to for PRs where the keyword is removed from consumers which call the __dataframe__ method.
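As a rough sketch of what the consumer-side cleanup amounts to: the consumer calls `__dataframe__` without passing `nan_as_null` at all. The helper `get_interchange_object` and the `RecordingProducer` test double are hypothetical names for this example, not code from any of the linked PRs.

```python
def get_interchange_object(obj, allow_copy=True):
    """Call ``__dataframe__`` without the deprecated ``nan_as_null`` keyword."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError("object does not implement __dataframe__")
    # Only allow_copy is forwarded; nan_as_null is deliberately omitted.
    return obj.__dataframe__(allow_copy=allow_copy)


class RecordingProducer:
    """Hypothetical producer that records the keywords it was called with."""

    def __init__(self):
        self.seen_kwargs = None

    def __dataframe__(self, **kwargs):
        self.seen_kwargs = kwargs
        return self


producer = RecordingProducer()
get_interchange_object(producer)
```

Once no consumer passes the keyword, producers can drop it from the signature without breaking callers.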

No one was using it yet and it seemed easier to clean it up from libraries than to ask everyone to add support for it - see data-apis/dataframe-api#125.

rgommers added the interchange-protocol label Mar 31, 2023

rgommers mentioned this issue Aug 7, 2023

feat(python): Native implementation of dataframe interchange protocol pola-rs/polars#10267

Merged

MarcoGorelli changed the title ~~Interchange Protocol: usused nan_as_null keyword?~~ Interchange Protocol: unused nan_as_null keyword? Aug 7, 2023

stinodego mentioned this issue Aug 7, 2023

Remove nan_as_null parameter for DataFrame protocol #226

Closed

rgommers mentioned this issue Aug 29, 2023

DEPR: remove use of nan_as_null from callers of __dataframe__ pandas-dev/pandas#54846

Merged
