Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify escaping strategy #1396

Merged
merged 55 commits into from
Feb 20, 2024
Merged

Simplify escaping strategy #1396

merged 55 commits into from
Feb 20, 2024

Conversation

hadley
Copy link
Member

@hadley hadley commented Nov 7, 2023

The big idea in this PR is to simplify the implementation by immediately escaping as soon as we get a table name. That allows user facing functions to accept a wide range of table specifications while internal functions only need to deal with a new table_name class. Additionally, only the table_name class is vectorised, so we can ensure that user supplies a single table name (or SQL, where appropriate), while still providing vectorised tools internally. Since we escape table ids early we also need a new tool to extract the name of just the table; this is sort of the opposite problem to table_ident which needed tools to stitch components together instead of pulling them apart.

I've also simplified the implementation a little by taking I() to mean "don't escape". My expectation is that I("foo.bar")/I("foo.bar.baz")) will become the preferred way to specify nested tables.

To do:

  • Figure out how to make this more backward compatible for backends
  • db_table_temporary.Microsoft SQL Server
  • Implement check_table_name() and review as_table_name() usage
  • Add con argument to unique_table_name() and unique_column_name()
  • Polish db_parse_table_name()

Fixes #1388. Fixes #1396. Fixes #1413. Fixes #1408. Closes #1385. Fixes #1416. Fixes #1404.

@hadley

This comment was marked as outdated.

@mgirlich
Copy link
Collaborator

@hadley I think it would be great to merge this and release an updated dbplyr soon 😄. What do you think?

@hadley
Copy link
Member Author

hadley commented Jan 31, 2024

@mgirlich yeah agreed. Unfortunately I need to finish rvest/chromote this week and then we have a company-wide work week next week. So unless I manage to get to this on Friday, it's likely be another week or so before I can finish this off.

Apart from responding to your comments, I need to think through the downstream implications of this fix; ideally it wouldn't require patches to every backend 😬.

@krlmlr
Copy link
Member

krlmlr commented Jan 31, 2024

Testing dm with this branch in cynkra/dm#2189 . I'd like the dbplyr release to work with all database backends in dm, I'll post more details tomorrow.

@krlmlr
Copy link
Member

krlmlr commented Jan 31, 2024

Had to add collapse() in dm: cynkra/dm@60fa74e (#2189)

This seems to be a regression in this PR, works for me in the main branch and in the CRAN version:

options(conflicts.policy = list(warn = FALSE))
library(dplyr)
library(dbplyr)

con <- simulate_mssql()

fkc <- lazy_frame(
  con = con,
  catalog = character(),
  constraint_object_id = character(),
  constraint_column_id = character(),
  referenced_object_id = character(),
  referenced_column_id = character(),
  .name = "fkc"
)

columns <- lazy_frame(
  con = con,
  catalog = character(),
  column_name = character(),
  object_id = character(),
  column_id = character(),
  .name = "columns"
)

tables <- lazy_frame(
  con = con,
  catalog = character(),
  schema_id = character(),
  table_name = character(),
  object_id = character(),
  .name = "tables"
)

schemas <- lazy_frame(
  con = con,
  catalog = character(),
  schema_id = character(),
  table_schema = character(),
  .name = "schemas"
)

objects <- lazy_frame(
  con = con,
  constraint_name = character(),
  object_id = character(),
  .name = "objects"
)

fkc |>
  left_join(columns, by = c("catalog", "referenced_object_id" = "object_id", "referenced_column_id" = "column_id")) |>
  left_join(tables, by = c("catalog", "referenced_object_id" = "object_id")) |>
  left_join(schemas, by = c("catalog", "schema_id"))
#> Error in `purrr::map()`:
#> ℹ In index: 3.
#> Caused by error in `sql_table_name_prefix()`:
#> ! `table` must be a single string, not a character vector.

# Can't use collapse() for simulated frames
# Full query:

# fkc |>
#   left_join(columns, by = c("catalog", "referenced_object_id" = "object_id", "referenced_column_id" = "column_id")) |>
#   left_join(tables, by = c("catalog", "referenced_object_id" = "object_id")) |>
#   left_join(schemas, by = c("catalog", "schema_id")) |>
#   left_join(objects, by = c("constraint_object_id" = "object_id")) |>
#   transmute(constraint_catalog = catalog, constraint_schema = table_schema, constraint_name, table_schema, table_name, column_name, ordinal_position = constraint_column_id)

Created on 2024-02-01 with reprex v2.1.0

@mgirlich
Copy link
Collaborator

mgirlich commented Feb 1, 2024

Simpliefied reprex

options(conflicts.policy = list(warn = FALSE))
library(dplyr)
library(dbplyr)

lf1 <- lazy_frame(x1 = 1)
lf2 <- lazy_frame(x = 1, y2 = 1)
lf3 <- lazy_frame(x = 1, y = 1)

lf1 %>% 
  left_join(lf2, by = c(x1 = "x")) %>% 
  left_join(lf3, by = c(x1 = "x", y2 = "y"))
#> Error in `purrr::map()`:
#> ℹ In index: 2.
#> Caused by error in `sql_table_name_prefix()`:
#> ! `table` must be a single string, not a character vector.

Created on 2024-02-01 with reprex v2.1.0

R/lazy-join-query.R Outdated Show resolved Hide resolved
R/query-join.R Outdated Show resolved Hide resolved
R/query-join.R Outdated Show resolved Hide resolved
R/query-join.R Outdated Show resolved Hide resolved
@hadley
Copy link
Member Author

hadley commented Feb 15, 2024

I think I've fixed all the small issues (including the problem that @krlmlr had), so the last thing I'm going to do in this PR is rename table_name() to table_path(). Then I plan to merge this PR, and start to look at the impact on the packages that supply backends for dbplyr.

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

Running another check in dm now.

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

The last check tonight failed: https://github.com/cynkra/dm/actions/runs/7909198372

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

I restarted the run, we'll see.

@krlmlr
Copy link
Member

krlmlr commented Feb 15, 2024

Looks green now. Thanks!

@hadley
Copy link
Member Author

hadley commented Feb 15, 2024

@mgirlich are you ok with me merging this now? I've filed a few other issues to make sure I follow up on the other issues before we release.

@mgirlich
Copy link
Collaborator

@hadley Looks good to me.

@hadley hadley merged commit ae1debb into main Feb 20, 2024
13 checks passed
@hadley hadley deleted the simplify-escaping branch February 20, 2024 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment