-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Renamed columns are not being handled correctly in join_by()
translation
#1524
Comments
To add on: #> ! no such column: tbl_b.rhs_val The `VALUE` AS `rhs_val` As in show_query(tbl_b2)
#> <SQL>
#> SELECT `ID` AS `id`, `VALUE` AS `rhs_val`
#> FROM `tbl_b` Trying with column-based joins the rename is in the final SELECT, but not applied to the table being joined to, so you still get the no such column error, as well as the mismatch between left and right. col_query <- dplyr::tbl(con, "tbl_a") |>
janitor::clean_names() |>
dplyr::left_join( # left instead of semi join
tbl_b2,
join_by(
"id",
between(y$rhs_val, x$left, x$right)
)
)
col_query
#> Error in `collect()`:
#> ! Failed to collect lazy table.
#> Caused by error:
#> ! no such column: tbl_b.rhs_val
#> Run `rlang::last_trace()` to see where the error occurred.
show_query(col_query)
#> <SQL>
#> SELECT
#> `tbl_a`.`ID` AS `id`,
#> `LEFT` AS `left`,
#> `RIGHT` AS `right`,
#> `VALUE` AS `rhs_val`
#> FROM `tbl_a`
#> LEFT JOIN `tbl_b`
#> ON (
#> `tbl_a`.`ID` = `tbl_b`.`ID` AND
#> `tbl_a`.`LEFT` <= `tbl_b`.`VALUE` AND
#> `tbl_a`.`RIGHT` >= `tbl_b`.`rhs_val`
#> ) |
I've just discovered https://dbplyr.tidyverse.org/articles/reprex.html and I'm going to try to supply a tighter, smaller, lower-level reprex of the issue if I can home in on it. |
library(dbplyr)
translate_sql(between(x, 2L, 4L), con = simulate_sqlite())
#> <SQL> `x` BETWEEN 2 AND 4
translate_sql(between(x, 2L, 4L), con = simulate_postgres())
#> <SQL> `x` BETWEEN 2 AND 4
translate_sql(between(x, 2L, 4L), con = simulate_mssql())
#> Error in if (context$clause == "WHERE") {: argument is of length zero Created on 2024-07-17 with reprex v2.1.1 |
Another reprex - sorry, I'm sorting of using this issue as a place to work through this. it doesn't make a difference if you use library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
v1 <- list(id = LETTERS[1:5], x = 1:5, y = 5:1)
v2 <- list(id = LETTERS[1:5], w = 2:6)
lf1 <- lazy_frame(!!!v1, con = simulate_mssql())
lf2 <- lazy_frame(!!!v2, con = simulate_mssql()) |>
rename(z = "w")
lf1 |>
semi_join(lf2, join_by("id", between(y$z, x$x, x$y)))
#> <SQL>
#> SELECT `df_LHS`.*
#> FROM `df` AS `df_LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM `df` AS `df_RHS`
#> WHERE
#> (`df_LHS`.`id` = `df_RHS`.`id`) AND
#> (`df_LHS`.`x` <= `df_RHS`.`w`) AND
#> (`df_LHS`.`y` >= `df_RHS`.`z`)
#> )
lf1 <- lazy_frame(!!!v1, con = simulate_sqlite())
lf2 <- lazy_frame(!!!v2, con = simulate_sqlite()) |>
rename(z = "w")
lf1 |>
semi_join(lf2, join_by("id", between(y$z, x$x, x$y)))
#> <SQL>
#> SELECT `df_LHS`.*
#> FROM `df` AS `df_LHS`
#> WHERE EXISTS (
#> SELECT 1 FROM `df` AS `df_RHS`
#> WHERE
#> (`df_LHS`.`id` = `df_RHS`.`id`) AND
#> (`df_LHS`.`x` <= `df_RHS`.`w`) AND
#> (`df_LHS`.`y` >= `df_RHS`.`z`)
#> ) Created on 2024-07-17 with reprex v2.1.1 |
There is a test that seems to relate to this here - with a comment to say that the query generated is wrong, although all the tests there pass. If I amend the example situation in that test so that the last condition with library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
# from https://github.com/tidyverse/dbplyr/blob/860bd6adf988a2b039485030d234eba05ee46a3c/tests/testthat/test-verb-joins.R#L656-L681
lf1 <- lazy_frame(x = 1, a = 2, .name = "lf1")
lf2 <- lazy_frame(x = 1, b = 3, .name = "lf2")
lf3 <- lazy_frame(x = 1, c = 4, .name = "lf3")
lf4 <- lazy_frame(x = 1, a = 5, .name = "lf4") |>
rename(a4 = "a")
lf1 |>
inner_join(lf2, by = "x") |>
inner_join(lf3, by = "x") |>
inner_join(lf4, by = join_by("x", a < a4))
#> <SQL>
#> SELECT `lf1`.*, `b`, `c`, `lf4`.`a` AS `a4`
#> FROM `lf1`
#> INNER JOIN `lf2`
#> ON (`lf1`.`x` = `lf2`.`x`)
#> INNER JOIN `lf3`
#> ON (`lf1`.`x` = `lf3`.`x`)
#> INNER JOIN `lf4`
#> ON (`lf1`.`x` = `lf4`.`x` AND `lf1`.`a` < `lf4`.`a`) Created on 2024-07-18 with reprex v2.1.1 Suggesting that the rename is being kept in the top-level SELECT `lf1`.*, `b`, `c`, `a4`
FROM `lf1`
INNER JOIN `lf2`
ON (`lf1`.`x` = `lf2`.`x`)
INNER JOIN `lf3`
ON (`lf1`.`x` = `lf3`.`x`)
INNER JOIN (SELECT `x`, `a` AS `a4` FROM `lf4`) AS `lf4`
ON (`lf1`.`x` = `lf4`.`x` AND `lf1`.`a` < `lf4`.`a4`) |
between()
join_by()
translation
Here's a really small reprex showing the issue with library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
lf1 <- lazy_frame(x = seq(2), con = simulate_sqlite(), .name = "lf1")
lf2 <- lazy_frame(x = 1, con = simulate_sqlite(), .name = "lf2") |>
rename(y = "x")
lf1 |>
left_join(lf2, by = c(x = "y"), keep = TRUE)
#> <SQL>
#> SELECT `lf1`.`x` AS `x`, `lf2`.`x` AS `y`
#> FROM `lf1`
#> LEFT JOIN `lf2`
#> ON (`lf1`.`x` = `lf2`.`x`) Created on 2024-07-18 with reprex v2.1.1 The behaviour in my first post above with the left and right sides of |
I discovered a strange error in my work, it seems like a bug?
It probably looks like a pretty contrived situation in the reprex below, but that is what I had to do to reproduce what I found.
For some reason, the column name
y$rhs_val
is treated differently when being compared to theright
argument ofbetween()
, to when compared to theleft
argument.In the former case it retains its renamed name, in the latter case it reverts to its original name (
VALUE
).I would expect the "new" column name to be used identically on both halves of the between step.
Created on 2024-07-16 with reprex v2.1.1
Needless to say, if you
collect()
both tables before attempting the semi_join, it works fine.So it seems a
dbplyr
issue specifically, as theshow_query()
code seems to confirm.I have dbplyr v2.5.0 (CRAN)
(PS I was expecting the SQL translation here to use
WHERE ... rhs_val BETWEEN LEFT AND RIGHT
instead of the two-step comparison using<=
and>=
- but maybeBETWEEN ... AND ...
doesn't work across all SQL implementations?)The text was updated successfully, but these errors were encountered: