fix(python): Selectors should raise on + between themselves #20825

Merged

@alexander-beedie alexander-beedie commented Jan 21, 2025

Closes #20821.

Selectors as sets

Selectors act as sets and support the standard set ops:
https://docs.pola.rs/api/python/stable/reference/selectors.html#set-operations

Operation             Expression
UNION                 A | B
INTERSECTION          A & B
DIFFERENCE            A - B
SYMMETRIC DIFFERENCE  A ^ B
COMPLEMENT            ~A

Standard set behaviour

However, Python sets do not support the + operator between themselves:

{1,2,3} + {2,3,4}
# TypeError: unsupported operand type(s) for +: 'set' and 'set'
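
For comparison, here is a minimal sketch of what built-in Python sets do support, using the same operators as the table above. The variable names are illustrative only. Built-in sets have no `~` complement, so that row is omitted.

```python
a = {1, 2, 3}
b = {2, 3, 4}

# The four binary set operators that built-in sets define.
union = a | b                 # elements in either set
intersection = a & b          # elements in both sets
difference = a - b            # elements in a but not in b
symmetric_difference = a ^ b  # elements in exactly one of the sets

# "+" is not defined between sets, so Python raises a TypeError.
try:
    a + b
    plus_error = None
except TypeError as exc:
    plus_error = str(exc)

print(union, intersection, difference, symmetric_difference)
print(plus_error)
```
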

We currently pass + between two selectors through to Expr inadvertently, which produces very peculiar (and unintended) results or errors when the caller meant to use | or & instead. This PR fixes that by raising a TypeError for + between two selectors, while retaining support for broadcasting a selector against a non-selector value.
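
The general shape of such a guard can be sketched in plain Python. This is a toy model, not the Polars implementation: `ToyExpr` and `ToySelector` are hypothetical classes invented for illustration, and the real change may be structured quite differently.

```python
class ToyExpr:
    """Hypothetical stand-in for an expression; not a Polars class."""

    def __init__(self, description):
        self.description = description

    def __add__(self, other):
        # Expression-style "+": build a new expression describing the sum.
        return ToyExpr(f"({self.description} + {other!r})")


class ToySelector(ToyExpr):
    """Hypothetical selector: a set of column names that can also act as an expression."""

    def __init__(self, columns):
        super().__init__(f"cols({sorted(columns)})")
        self.columns = frozenset(columns)

    def __or__(self, other):
        # Set-style union is only meaningful between two selectors.
        if isinstance(other, ToySelector):
            return ToySelector(self.columns | other.columns)
        return NotImplemented

    def __add__(self, other):
        # Reject selector + selector instead of silently treating it as an expression.
        if isinstance(other, ToySelector):
            raise TypeError(
                "unsupported operand type(s) for op: ('Selector' + 'Selector')"
            )
        # Anything else (e.g. a scalar) still falls through to expression "+".
        return super().__add__(other)


numeric = ToySelector({"col1", "col2"})
string = ToySelector({"col3"})

print(sorted((numeric | string).columns))
print((numeric + 100).description)
```

The design point is that the type check happens before delegating to the parent class, so only the selector-with-selector case is blocked and every other operand keeps its existing behaviour.
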

Example

import polars as pl
import polars.selectors as cs

data = {
    "col1": [1, 2, 3],
    "col2": [4.1, 5.2, 6.3],
    "col3": ["x", "y", "z"],
}
df = pl.DataFrame(data)

Use of + between selectors now raises...

df.select(cs.numeric() + cs.string())
# TypeError: unsupported operand type(s) for op: ('Selector' + 'Selector')

...but continues to broadcast:

df.select(cs.numeric() + 100)
# shape: (3, 2)
# ┌──────┬───────┐
# │ col1 ┆ col2  │
# │ ---  ┆ ---   │
# │ i64  ┆ f64   │
# ╞══════╪═══════╡
# │ 101  ┆ 104.1 │
# │ 102  ┆ 105.2 │
# │ 103  ┆ 106.3 │
# └──────┴───────┘

@MarcoGorelli MarcoGorelli left a comment

nice one!

@alexander-beedie alexander-beedie merged commit e567c79 into pola-rs:main Jan 21, 2025
20 checks passed
@alexander-beedie alexander-beedie deleted the fix-selector-add-set-op branch January 21, 2025 14:22
codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.69%. Comparing base (099ee3c) to head (bf228b0).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20825      +/-   ##
==========================================
+ Coverage   79.63%   79.69%   +0.06%     
==========================================
  Files        1568     1568              
  Lines      222970   222981      +11     
  Branches     2544     2545       +1     
==========================================
+ Hits       177555   177702     +147     
+ Misses      44831    44695     -136     
  Partials      584      584              

Labels: fix (Bug fix), python (Related to Python Polars)
Development

Successfully merging this pull request may close these issues.

Add + as an operator for set operations in selectors
2 participants