read_ndjson with null column drops the column #8572

Closed
2 tasks done
CHDev93 opened this issue Apr 28, 2023 · 1 comment
Labels
bug Something isn't working python Related to Python Polars


CHDev93 commented Apr 28, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

When read_ndjson reads a file in which a column contains only null values, that column is dropped from the resulting DataFrame instead of being kept as a column holding a single null entry.

Reproducible example

from pathlib import Path

import pandas as pd
import polars as pl

# Two single-record NDJSON payloads: one with a null "text" value, one without.
payload_w_null = """{"x":1,"text":null}"""
payload_wo_null = """{"x":1,"text":"a"}"""

file_w_null = "jsonwithnull.jsonl"
file_wo_null = "jsonwithoutnull.jsonl"
Path(file_w_null).write_text(payload_w_null)
Path(file_wo_null).write_text(payload_wo_null)

# Compare polars on both files against pandas on the file with the null value.
print(
    pl.read_ndjson(file_w_null),
    pl.read_ndjson(file_wo_null),
    pd.read_json(file_w_null, lines=True),
    sep="\n",
)

# shape: (1, 1)
# ┌─────┐
# │ x   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘
# shape: (1, 2)
# ┌─────┬──────┐
# │ x   ┆ text │
# │ --- ┆ ---  │
# │ i64 ┆ str  │
# ╞═════╪══════╡
# │ 1   ┆ a    │
# └─────┴──────┘
#    x  text
# 0  1   NaN

Expected behavior

I would expect the output to match the following:

pl.DataFrame({"x": [1], "text": [None]})

shape: (1, 2)
┌─────┬──────┐
│ x   ┆ text │
│ --- ┆ ---  │
│ i64 ┆ f32  │
╞═════╪══════╡
│ 1   ┆ null │
└─────┴──────┘
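
Until the reader keeps such columns, one possible workaround is to parse the file with the standard library and build the column dict by hand. The sketch below is only an illustration of the expected behavior, not how Polars works internally; the helper name ndjson_to_columns is hypothetical, and it assumes every line is a flat JSON object.

```python
import json
from typing import Any, Dict, List


def ndjson_to_columns(text: str) -> Dict[str, List[Any]]:
    """Parse NDJSON text into a column-oriented dict, keeping all-null columns.

    Hypothetical helper for illustration; assumes each line is a flat JSON object.
    """
    # One JSON object per non-empty line.
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]

    # Register every key in first-seen order, including keys whose values are null.
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, [])

    # Fill each column; a key missing from a row becomes None.
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))
    return columns


columns = ndjson_to_columns('{"x":1,"text":null}')
print(columns)  # {'x': [1], 'text': [None]}
```

The resulting dict has the same shape as the literal passed to pl.DataFrame in the expected-behavior example above, so it could be handed to pl.DataFrame directly.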

Installed versions

---Version info---
Polars: 0.17.9
Index type: UInt32
Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.29
Python: 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0]
---Optional dependencies---
numpy: 1.23.5
pandas: 1.5.3
pyarrow: 11.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2022.7.1
matplotlib: 3.7.1
xlsx2csv: <not installed>
xlsxwriter: 3.1.0
CHDev93 added the bug and python labels Apr 28, 2023

universalmind303 (Collaborator) commented

Duplicate of #7858
