n_distinct() on sqlite with more columns returns an error #101

Closed
ghost opened this issue Jun 20, 2018 · 3 comments
Labels
bug (an unexpected problem or unintended behavior), verb trans 🤖 (translation of dplyr verbs to SQL)

ghost commented Jun 20, 2018

@edoardomichielon commented on Jun 20, 2018, 11:10 AM UTC:


When I use a connection to an SQLite database on disk, n_distinct() returns an error if it is given two or more columns. The same code works if I pass only one column or collect the data before counting.

# remove objects
rm(list =  ls())

# require packages
require(dplyr)
require(dbplyr)
require(RSQLite)

# create a sqlite db and connect to it
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# copy data into sqlite
copy_to(con, mtcars)

# point to table 
my_tbl <- tbl(con, "mtcars")

# n_distinct with one column (THIS WORKS)
my_tbl %>% group_by(gear) %>% summarise(n_distinct(mpg))

# if I use the local (or collected) data it is ok (THIS WORKS)
my_tbl %>% collect() %>% group_by(gear) %>% summarise(n_distinct(mpg, cyl))

# n_distinct with two columns (THIS DOES NOT WORK)
my_tbl %>% group_by(gear) %>% summarise(n_distinct(mpg, cyl))

## Error in result_create(conn@ptr, statement) : 
## wrong number of arguments to function COUNT()

The code is correctly translated into SQL:

# Show query
my_tbl %>% group_by(gear) %>% summarise(n_distinct(mpg, cyl)) %>% show_query()

## SELECT `gear`, COUNT(DISTINCT `mpg`, `cyl`) AS `n_distinct(mpg, cyl)`
## FROM `mtcars`
## GROUP BY `gear`
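
To check that the error comes from SQLite itself rather than from the translation step, the generated SQL can be sent straight through DBI. This is a minimal sketch, assuming RSQLite is installed; the alias n and the tryCatch() wrapper are illustrative additions, not part of the original report, and the exact error text may differ between SQLite versions.

```r
library(DBI)

# In-memory SQLite database holding the same data as the report
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Single-argument COUNT(DISTINCT ...) is accepted by SQLite
one_col <- dbGetQuery(
  con,
  "SELECT gear, COUNT(DISTINCT mpg) AS n FROM mtcars GROUP BY gear"
)
print(one_col)

# The multi-argument form is rejected; capture the message instead of stopping
res <- tryCatch(
  dbGetQuery(
    con,
    "SELECT gear, COUNT(DISTINCT mpg, cyl) AS n FROM mtcars GROUP BY gear"
  ),
  error = function(e) conditionMessage(e)
)
print(res)

dbDisconnect(con)
```

If the second query returns an error message here as well, the failure is independent of dplyr and dbplyr.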

This issue was moved by batpigandme from tidyverse/dplyr/issues/3687.

@hadley
Member

hadley commented Jan 2, 2019

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)

mf <- dbplyr::memdb_frame(x = c(1, 1), y = c(2, 2), z = 1:2)
mf %>% group_by(x) %>% summarise(n_distinct(y, z))
#> Error in result_create(conn@ptr, statement): wrong number of arguments to function COUNT()

Created on 2019-01-02 by the reprex package (v0.2.1)

hadley added the bug and verb trans 🤖 labels on Jan 2, 2019
hadley added this to the v1.4.0 milestone on Jan 10, 2019
hadley closed this as completed in 1b58b68 on Jan 10, 2019
@Jollywatt

Why was this closed as completed? 1b58b68 simply makes dbplyr return a different error message. Multi-column n_distinct() is still not supported...

@hadley
Member

hadley commented Aug 3, 2023

Because COUNT() only supports a single argument...
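
For readers who still need a multi-column distinct count on a database backend, one possible workaround (a sketch, not something proposed in this thread) is to deduplicate the column combinations first and then count rows per group. It reuses the data from the minimal reprex above; the names n and out are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

mf <- dbplyr::memdb_frame(x = c(1, 1), y = c(2, 2), z = 1:2)

# Keep one row per (x, y, z) combination, then count rows within each x.
# This avoids COUNT(DISTINCT y, z), which SQLite does not accept.
out <- mf %>%
  distinct(x, y, z) %>%
  group_by(x) %>%
  summarise(n = n()) %>%
  collect()

print(out)
```

This translates to a SELECT DISTINCT subquery wrapped in a grouped COUNT(*), so it only relies on single-argument SQL features. Rows containing NULL are kept as their own combinations, which should correspond to n_distinct() with its default na.rm = FALSE.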
