[Bug?] Ordered: True results in alphabetical column ordering #1769

Open
data-han opened this issue Jul 31, 2024 · 0 comments
Labels
bug Something isn't working

data-han commented Jul 31, 2024

Describe the bug
I'm not sure whether this is a bug or a mistake in my setup.

I have specified ordered: true in my YAML file because I want to validate the column order of the pandas DataFrame that I pass to schema.validate(). However, validation keeps raising a COLUMN_NOT_ORDERED error, because the first column of my DataFrame does not match the first column of the schema.

Digging deeper: when I parse the YAML file with yaml.safe_load(<>) and then inspect the resulting DataFrameSchema, the columns in the schema object are in alphabetical order rather than the order I defined in the YAML file, shown below:

  • Column order 1: GId
  • Column order 2: Full Name
  • Column order 3: Access group
schema:
  schema_type: dataframe
  version: 0.18.0
  columns:
    GId:
      title: null
      description: null
      # dtype: object
      nullable: true
      checks: null
      unique: false
      coerce: false
      required: true
      regex: false
    Full Name:
      title: null
      description: null
      # dtype: object
      nullable: true
      checks: null
      unique: false
      coerce: false
      required: true
      regex: false
    Access group:
      title: null
      description: null
      # dtype: object
      nullable: true
      checks: null
      unique: false
      coerce: false
      required: true
      regex: false
ordered: true
...
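One way to narrow this down is to check whether the YAML parsing step itself changes the order. A minimal sketch, assuming PyYAML is installed and using a trimmed, illustrative copy of the mapping above (not the full file): on Python 3.7+ yaml.safe_load returns ordinary dicts, which keep keys in the order they appear in the document.

```python
import yaml  # PyYAML (third-party); assumed to be installed

# Trimmed, illustrative copy of the schema from this issue: only the column
# keys and the top-level "ordered" flag are kept.
SCHEMA_YAML = """
schema:
  schema_type: dataframe
  columns:
    GId:
      nullable: true
    Full Name:
      nullable: true
    Access group:
      nullable: true
ordered: true
"""

parsed = yaml.safe_load(SCHEMA_YAML)

# safe_load builds plain dicts, which preserve insertion (document) order.
column_order = list(parsed["schema"]["columns"])
print(column_order)
print(parsed["ordered"])
```

If this prints the columns in file order, the alphabetical ordering is most likely introduced after parsing, when the mapping is turned into a DataFrameSchema, rather than by yaml.safe_load.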

With this setup, I want to check that my DataFrame's columns are in this order. However, when I run pa.DataFrameSchema.from_yaml(schema_content), the resulting schema lists the columns in alphabetical order, so validation reports that my DataFrame's column order is wrong.
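The effect described above can be illustrated with a simplified, stdlib-only sketch. The helper first_order_mismatch is hypothetical and is not pandera's implementation of the ordered check; it only compares the relative order of the columns that the schema and the DataFrame share.

```python
from typing import List, Optional, Sequence, Tuple


def first_order_mismatch(
    schema_columns: Sequence[str], df_columns: Sequence[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (position, expected, found) for the first out-of-order column.

    Hypothetical helper, not pandera's implementation: only columns that
    appear in both the schema and the DataFrame are compared. Returns None
    when their relative order agrees.
    """
    shared = set(schema_columns) & set(df_columns)
    expected: List[str] = [c for c in schema_columns if c in shared]
    found: List[str] = [c for c in df_columns if c in shared]
    for pos, (want, got) in enumerate(zip(expected, found)):
        if want != got:
            return pos, want, got
    return None


yaml_order = ["GId", "Full Name", "Access group"]  # order written in the YAML
df_order = ["GId", "Full Name", "Access group"]    # order in the DataFrame

# Schema kept in YAML order: no mismatch, prints None.
print(first_order_mismatch(yaml_order, df_order))
# Schema sorted alphabetically: the very first column already differs.
print(first_order_mismatch(sorted(yaml_order), df_order))
```

With the alphabetically sorted schema, the first expected column is "Access group" while the DataFrame starts with "GId", which matches the mismatch reported here.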

This is what the Pandera columns look like:

Schema DataFrameSchema(
    columns={
        'Access group': <Schema Column(name=Access group, type=None)>
        'GId': <Schema Column(name=Company ID, type=None)>
        'Full Name': <Schema Column(name=Create Date/Time, type=None)>

  • [Y] I have checked that this issue has not already been reported.
  • [Y] I have confirmed this bug exists on the latest version of pandera.
  • [ ] (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

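import pandera as pa

# from_yaml is a classmethod; schema_content holds the YAML shown above
schema = pa.DataFrameSchema.from_yaml(schema_content)
# lazy=True collects all failures (including COLUMN_NOT_ORDERED) before raising
schema.validate(df_manual_file, lazy=True)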

Expected behavior

I should be able to specify the column order I want in the YAML file, and pandera should not reorder the columns alphabetically. The schema should look like this:

Schema DataFrameSchema(
    columns={
        'GId': <Schema Column(name=Company ID, type=None)>
        'Full Name': <Schema Column(name=Create Date/Time, type=None)>
        'Access group': <Schema Column(name=Access group, type=None)>

Desktop (please complete the following information):

  • OS: macOS
  • Browser: not relevant
  • Version: latest

Screenshots

When I print(schema), the first column shown is "Access group", not "GId".

data-han added the bug (Something isn't working) label Jul 31, 2024