You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Connecting to an AWS Redshift database via an odbc connection. I'm using corrr version 0.4.0. Using the correlate() function when variables are doubles seem to work. Adding an integer when connected to Redshift throws an error. Included Reprex below to demonstrate a workaround with mutate_all(as.numeric) and to illustrate that issue doesn't happen with a local data frame.
rs_table in this example has ~7.3 million observations.
# set up packages
library(odbc)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3#> #> Attaching package: 'dplyr'#> The following objects are masked from 'package:stats':#> #> filter, lag#> The following objects are masked from 'package:base':#> #> intersect, setdiff, setequal, union
library(dbplyr)
#> Warning: package 'dbplyr' was built under R version 3.5.3#> #> Attaching package: 'dbplyr'#> The following objects are masked from 'package:dplyr':#> #> ident, sql
library(corrr)
#> Warning: package 'corrr' was built under R version 3.5.3# connect to dbredshift<- dbConnect(odbc::odbc(), "my_dsn", bigint="integer")
# create pointer to db tablers_table<- tbl(redshift, in_schema("my_schema", "my_table"))
# preview tablers_table#> # Source: table<my_schema.my_table> [?? x 5]#> # Database: Redshift 8.0.2[...]#> x1 x2 x3 x4 x5#> <dbl> <dbl> <dbl> <int> <int>#> 1 84.8 103. 18.0 0 0#> 2 215. 249. 33.9 0 0#> 3 196. 263. 67.0 0 0#> 4 9.79 15.7 5.95 0 0#> 5 80.0 97.9 18.0 1 0#> 6 210. 240. 30.0 0 0#> 7 260. 311. 51.2 0 0#> 8 100.0 129. 29.0 0 0#> 9 158. 195. 37.0 0 91#> 10 66.0 82.9 17.0 0 0#> # ... with more rows# correlate() works when vars are all dblrs_table %>%
select(x1:x3) %>%
correlate(quiet=TRUE)
#> # A tibble: 3 x 4#> rowname x1 x2 x3#> <chr> <dbl> <dbl> <dbl>#> 1 x1 NA 0.995 0.867#> 2 x2 0.995 NA 0.894#> 3 x3 0.867 0.894 NA# correlate() breaks when int vars are includedrs_table %>%
select(x1:x4) %>%
correlate(quiet=TRUE)
#> Error: <SQL> 'SELECT SUM("x1") AS "x1_sum", SUM("x2") AS "x2_sum", SUM("x3") AS "x3_sum", SUM("x4") AS "x4_sum", SUM("x1" * "x1") AS "x1_two", SUM("x2" * "x2") AS "x2_two", SUM("x3" * "x3") AS "x3_two", SUM("x4" * "x4") AS "x4_two", SUM("x1" * "x2") AS "x1_x2", SUM("x1" * "x3") AS "x1_x3", SUM("x1" * "x4") AS "x1_x4", SUM("x2" * "x3") AS "x2_x3", SUM("x2" * "x4") AS "x2_x4", SUM("x3" * "x4") AS "x3_x4", COUNT(*) AS "obs"#> FROM (SELECT "x1", "x2", "x3", "x4"#> FROM my_schema.my_table) "dbplyr_002"'#> nanodbc/nanodbc.cpp:1587: XX000: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR: Small integer data overflow (multiplication)#> DETAIL: #> -----------------------------------------------#> error: Small integer data overflow (multiplication)#> code: 1058#> context: #> query: ...#> location: ...#> process: ...#> -----------------------------------------------#> #> # mutate_all(as.numeric) as workaroundrs_table %>%
select(x1:x5) %>%
mutate_all(as.numeric) %>%
correlate(quiet=TRUE)
#> # A tibble: 5 x 6#> rowname x1 x2 x3 x4 x5#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 x1 NA 0.995 0.868 0.0707 0.00203#> 2 x2 0.995 NA 0.895 0.0607 0.00269#> 3 x3 0.868 0.895 NA 0.144 0.00559#> 4 x4 0.0707 0.0607 0.144 NA -0.0160 #> 5 x5 0.00203 0.00269 0.00559 -0.0160 NA# problem doesn't occur in local data framers_table %>%
select(x1:x5) %>%
head(1000) %>%
collect() %>%
correlate(quiet=TRUE)
#> # A tibble: 5 x 6#> rowname x1 x2 x3 x4 x5#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 x1 NA 0.995 0.868 0.0889 -0.0615#> 2 x2 0.995 NA 0.907 0.0939 -0.0575#> 3 x3 0.868 0.907 NA 0.109 -0.0340#> 4 x4 0.0889 0.0939 0.109 NA -0.0326#> 5 x5 -0.0615 -0.0575 -0.0340 -0.0326 NA
Connecting to an AWS Redshift database via an odbc connection. I'm using
corrr
version 0.4.0. Using thecorrelate()
function when variables are doubles seem to work. Adding an integer when connected to Redshift throws an error. Included Reprex below to demonstrate a workaround withmutate_all(as.numeric)
and to illustrate that issue doesn't happen with a local data frame.rs_table
in this example has ~7.3 million observations.Created on 2019-08-27 by the reprex package (v0.3.0)
Session info
The text was updated successfully, but these errors were encountered: