`Spark SQL` translate_sql for "add_years" is double-quoting column name '`foo`' #1510

bschulth · 2024-06-04T17:42:20Z

When translating sql on a Spark SQL connection, the 'add_year' translation is double quoting the column name:

# SQL:  add_years(dob, 1)
<SQL> ADD_MONTHS('`dob`', 1.0 * 12.0)

This fails on Spark. I would expect single quotes like:

# SQL:  add_years(dob, 1)
<SQL> ADD_MONTHS(`dob`, 1.0 * 12.0)

This seems to be coming from the triple-bang expression here !!!x:

dbplyr/R/backend-spark-sql.R

Line 47 in 8f2fcb0

sql_expr(add_months(!!!x, !!n*12))

       add_years = function(x, n, ...) {
         check_dots_empty()
         sql_expr(add_months(!!!x, !!n*12))
       }

There even seems to be a bad unit test for this here:

https://github.com/tidyverse/dbplyr/blob/8f2fcb0852994afe11cc6792ef59ef5ffda84936/tests/testthat/test-backend-spark-sql.R#L3C35-L3C50

expect_equal(test_translate_sql(add_years(x, 1)), sql("ADD_MONTHS('`x`', 1.0 * 12.0)"))

If I change the triple-bang !!!x to a double-bang !!x:

       add_years = function(x, n, ...) {
         check_dots_empty()
         # sql_expr(add_months(!!!x, !!n*12))
         sql_expr(add_months(!!x, !!n*12))
       }

it seems to produce the correct output.

Here is the reprex:

dbplyr:::local_con(dbplyr:::simulate_spark_sql())
dbplyr:::test_translate_sql(add_years(dob, 1))
#> <SQL> ADD_MONTHS('`dob`', 1.0 * 12.0)
reprex::reprex()
#> ℹ Non-interactive session, setting `html_preview = FALSE`.
#> CLIPR_ALLOW has not been set, so clipr will not run interactively
#> Error in switch(where, expr = stringify_expression(x_expr), clipboard = ingest_clipboard(), : EXPR must be a length 1 vector

The text was updated successfully, but these errors were encountered:

ablack3 · 2024-06-05T13:47:13Z

Here is a reprex on spark showing correct SQL works and current SQL does not.

con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = Sys.getenv("DATABRICKS_HTTPPATH"),
  useNativeQuery = FALSE,
  bigint = "numeric"
)


DBI::dbGetQuery(con,
                "select observation_period_start_date, ADD_MONTHS('`observation_period_start_date`', 1.0 * 12.0) as day 
                from gibleed.observation_period limit 5")
#>   observation_period_start_date  day
#> 1                    1963-12-31 <NA>
#> 2                    2009-04-26 <NA>
#> 3                    2002-01-30 <NA>
#> 4                    1971-10-14 <NA>
#> 5                    2009-05-30 <NA>


DBI::dbGetQuery(con,
                "select observation_period_start_date, ADD_MONTHS(`observation_period_start_date`, 1.0 * 12.0) as day 
                from gibleed.observation_period limit 5")
#>   observation_period_start_date        day
#> 1                    1963-12-31 1964-12-31
#> 2                    2009-04-26 2010-04-26
#> 3                    2002-01-30 2003-01-30
#> 4                    1971-10-14 1972-10-14
#> 5                    2009-05-30 2010-05-30


DBI::dbDisconnect(con)

^{Created on 2024-06-05 with reprex v2.1.0}

simonpcouch · 2024-08-27T17:44:08Z

On current main:

library(odbc)
library(clock)
library(dplyr)
con <- dbConnect(databricks(), HTTPPath = "...")

df <- data.frame(x = as.POSIXct("2000-01-01"))
copy_to(con, df)

q <- tbl(con, "df") %>% mutate(y = add_years(x, 1))

q %>% show_query()
#> <SQL>
#> SELECT `df`.*, ADD_MONTHS('`x`', 1.0 * 12.0) AS `y`
#> FROM `df`

# note y is NA
q %>% collect()
#> # A tibble: 1 × 2
#>   x                   y     
#>   <dttm>              <date>
#> 1 2000-01-01 06:00:00 NA

^{Created on 2024-08-27 with reprex v2.1.1}

ablack3 mentioned this issue Jun 5, 2024

Fix add_years translation on spark #1511

Merged

ablack3 added a commit to ablack3/dbplyr that referenced this issue Oct 28, 2024

add news entry for issue tidyverse#1510

1617c0a

simonpcouch linked a pull request Oct 31, 2024 that will close this issue

Fix add_years translation on spark #1511

Merged

simonpcouch closed this as completed in #1511 Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Spark SQL` translate_sql for "add_years" is double-quoting column name '`foo`' #1510

`Spark SQL` translate_sql for "add_years" is double-quoting column name '`foo`' #1510

bschulth commented Jun 4, 2024

ablack3 commented Jun 5, 2024 •

edited

Loading

simonpcouch commented Aug 27, 2024

Spark SQL translate_sql for "add_years" is double-quoting column name 'foo' #1510

Spark SQL translate_sql for "add_years" is double-quoting column name 'foo' #1510

Comments

bschulth commented Jun 4, 2024

ablack3 commented Jun 5, 2024 • edited Loading

simonpcouch commented Aug 27, 2024

`Spark SQL` translate_sql for "add_years" is double-quoting column name '`foo`' #1510

`Spark SQL` translate_sql for "add_years" is double-quoting column name '`foo`' #1510

ablack3 commented Jun 5, 2024 •

edited

Loading