polars.exceptions.ColumnNotFoundError when coerce=True and Optional field is missing #1804

Open
2 of 3 tasks
antonioalegria opened this issue Sep 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@antonioalegria

Describe the bug

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.


Code Sample, a copy-pastable example

from pandera.polars import Field # type: ignore
from pandera.polars import DataFrameModel # type: ignore

from typing import Optional

import polars as pl


class MyModel(DataFrameModel):
    a: Optional[str] = Field(description="some description", nullable=True)
    b: Optional[str] = Field(description="some description") # BOOM
    c: Optional[str] = Field(description="some description", str_contains=".", nullable=True)
    d: Optional[str] = Field(description="some description", str_contains=".") # BOOM

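df = pl.DataFrame({})  # empty frame: none of the Optional columns are present
schema = MyModel.to_schema()
schema.strict = True
schema.coerce = True  # without this line, validation succeeds
print(schema.validate(df))  # raises polars.exceptions.ColumnNotFoundError: a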

Exception:
(.venv) antonioalegria@shiro dojo % /Users/antonioalegria/Developer/dojo/.venv/bin/python /Users/antonioalegria/Developer/dojo/dojo/test.py
Traceback (most recent call last):
  File ".../test.py", line 22, in <module>
    print(schema.validate(df))
          ^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/api/polars/container.py", line 58, in validate
    output = self.get_backend(check_obj).validate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/backends/polars/container.py", line 63, in validate
    check_obj = parser(check_obj, *args)
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/backends/polars/container.py", line 396, in coerce_dtype
    check_obj = self._coerce_dtype_helper(check_obj, schema)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/backends/polars/container.py", line 455, in _coerce_dtype_helper
    obj = getattr(col_schema.dtype, coerce_fn)(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/engines/polars_engine.py", line 181, in try_coerce
    lf.collect()
  File ".../.venv/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2034, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ColumnNotFoundError: a

Expected behavior

The DataFrame should have validated successfully, since every column in the model is Optional.

Desktop (please complete the following information):

OS: macOS 14.6.1
Python 3.12.4
polars-lts-cpu 1.6.0
pandera 0.20.3


@antonioalegria antonioalegria added the bug Something isn't working label Sep 7, 2024
@IsaiasGutierrezCruz

I have the same problem. With coerce disabled, Optional works as expected (the DataFrame is not required to contain that column). With coerce=True, the schema requires the column.
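
To make the expected behavior concrete, here is a minimal, library-free sketch of coercion that skips optional columns absent from the data. It is only a model of the behavior described above, not pandera's implementation: `ColumnSpec` and `coerce_present_columns` are hypothetical names, plain dicts of lists stand in for a DataFrame, and `strict` is not modelled.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class ColumnSpec(NamedTuple):
    """Hypothetical stand-in for a schema column (not pandera's Column class)."""

    cast: Callable[[Any], Any]  # function used to coerce each non-null value
    required: bool              # False models an Optional[...] field


def coerce_present_columns(
    data: Dict[str, List[Any]], schema: Dict[str, ColumnSpec]
) -> Dict[str, List[Any]]:
    """Coerce only the schema columns that actually exist in `data`."""
    coerced = dict(data)  # shallow copy so the input mapping is not mutated
    for name, spec in schema.items():
        if name not in data:
            if spec.required:
                raise KeyError(f"required column missing: {name}")
            continue  # optional and absent: nothing to coerce
        # Preserve nulls and cast every other value.
        coerced[name] = [None if v is None else spec.cast(v) for v in data[name]]
    return coerced


# Assumed example schema: two optional string columns.
schema = {"a": ColumnSpec(str, required=False), "b": ColumnSpec(str, required=False)}

empty_result = coerce_present_columns({}, schema)
partial_result = coerce_present_columns({"a": [1, None]}, schema)
```

Under these assumptions an empty input passes through unchanged, and a frame containing only `a` has just that column coerced. A missing column is an error only when it is marked as required.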
