'drop_invalid_rows' always False with from_json() and to_json() #1339

Open
2 of 3 tasks
Nico-VC opened this issue Sep 15, 2023 · 1 comment
Labels
bug Something isn't working

Comments


Nico-VC commented Sep 15, 2023

Describe the bug
The 'drop_invalid_rows' argument at the DataFrameSchema level is always set to False when a schema is loaded with .from_json().
Generating JSON with .to_json() from a Python schema that has 'drop_invalid_rows=True' does not preserve the value either.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Calling .to_json() on this inferred schema ignores the 'drop_invalid_rows=True' argument:

from pandera import DataFrameSchema, Column, Check, Index, MultiIndex
import numpy as np
import pandas as pd

schema = DataFrameSchema(
    columns={
        "Model": Column(
            dtype=np.int32,
            checks=None,
            nullable=False,
            unique=False,
            coerce=True,
            required=True,
            regex=True,
            description=None,
            title=None,
        ),
        "ID": Column(
            dtype=np.int32,
            checks=None,
            nullable=False,
            unique=False,
            coerce=False,
            required=True,
            regex=False,
            description=None,
            title=None,
        ),
    },
    checks=None,
    index=Index(
        dtype="int64",
        checks=[],
        nullable=False,
        coerce=False,
        name=None,
        description=None,
        title=None,
    ),
    dtype=None,
    coerce=True,
    strict=True,
    name=None,
    ordered=False,
    unique=None,
    report_duplicates="all",
    unique_column_names=False,
    add_missing_columns=False,
    title=None,
    description=None,
    drop_invalid_rows=True
)

schema.to_json()

The same behavior is observed if the argument is set at the Column level.
As a workaround, I end up having to manually set this value to True in the schema class.
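
To narrow down whether the flag is lost on the way out (.to_json()) or on the way in (.from_json()), it may help to inspect the serialized JSON directly. The following is a minimal sketch using only the standard library. The helper name `serialized_flag` is hypothetical, and the assumption that 'drop_invalid_rows' would appear as a top-level key in the serialized schema is mine, not confirmed by this report. The two payloads are hand-written stand-ins, not real pandera output.

```python
import json
from typing import Optional


def serialized_flag(schema_json: str, key: str = "drop_invalid_rows") -> Optional[bool]:
    """Return the top-level value of `key` in a serialized schema.

    Returns None when the key is absent, which would mean the flag was
    dropped during serialization. Assumes the flag is a top-level key.
    """
    payload = json.loads(schema_json)
    return payload.get(key)


# Hypothetical stand-in payloads (hand-written, not real pandera output).
flag_missing = '{"schema_type": "dataframe", "coerce": true}'
flag_present = '{"schema_type": "dataframe", "coerce": true, "drop_invalid_rows": true}'

print(serialized_flag(flag_missing))
print(serialized_flag(flag_present))
```

With the schema above, passing the string returned by schema.to_json() to this helper should show whether the key is written at all. If it is missing, the problem is on the serialization side; if it is present but the loaded schema still reports False, the problem is in .from_json().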

Nico-VC added the bug label Sep 15, 2023

hugolytics commented Jul 16, 2024

I am also running into this issue!
So I made a PR to fix it: #1743
