Hello pandera community, I am trying out pandera to validate a plain polars.LazyFrame, as described in the first example in the docs.
If I understand the docs correctly, calling the validate method on a LazyFrame only checks the schema, by design. I have the following questions:
What extra benefit does the user get from declaring a pandera.DataFrameSchema here, when they could just use the == operator to compare the frame's schema with a predefined polars.Schema object?
For in-depth data validation on a LazyFrame, we have to call the collect method on it first. But if the frame has, say, 50 columns and the pandera.DataFrameSchema declares only 3 of them, does it make sense to pull the remaining columns into memory?
Would it make more sense to control this behaviour inside the validate method? pandera could add a projection that selects only the columns defined in the pandera.DataFrameSchema, run the validation checks, and then call collect internally, instead of asking the user to call collect before validating.
Question about pandera