Lines duplicated when ordering on a n-n join table using find() method in the DAO #249

ThibBal · 2021-02-10T09:30:10Z

Data model :

admin_users table (extends users table)
zones table
admin_users_zones table to store admin_user_id,zone_id

Problem encountered: when listing admin_users (using find() method in UserDao.php) with their zones and ordering on them, my lines are duplicated. But the "count" method still returns the correct number.

Here is the SQL request generated by TDBM:

SELECT DISTINCT 
  `users`.`id` AS `users____id`, `users`.`email` AS `users____email`, 
 `zones`.`id` AS `zones____id`,  `zones`.`label` AS `zones____label`
FROM   `admin_users`   
  LEFT JOIN `users` ON (    `admin_users`.`id` =   `users`.`id`)   
  LEFT JOIN `admin_users_zones` ON (    `admin_users_zones`.`admin_user_id` =   `admin_users`.`id`)   
  LEFT JOIN `zones` ON (    `admin_users_zones`.`zone_id` =   `zones`.`id`)   
  LEFT JOIN `admin_users_poles` ON (    `admin_users`.`pole_id` =   `admin_users_poles`.`id`)   
  LEFT JOIN `admin_users_roles` ON (    `admin_users`.`role_id` =   `admin_users_roles`.`id`)   
  LEFT JOIN `user_status` ON (    `users`.`status_id` =   `user_status`.`id`)
WHERE   (  (  (    `users`.`lastname` LIKE   '%%')
  OR (    `users`.`firstname` LIKE   '%%')))
  AND (  (    `users`.`email` LIKE   '%[email protected]%'))
ORDER BY   `zones`.`label` ASC LIMIT 10

Data model creation:

create users (id, email)
create roles (id, label)
create users_roles (id_user, id_role)

insert into roles (1, 'A'), (2, 'b')
insert into users (1, 'admin')
insert into users_roles (1, 1), (1, 2)

Method called: ->find('', [], 'roles.label ASC')

The text was updated successfully, but these errors were encountered:

homersimpsons · 2021-02-11T15:01:24Z

Overview

Thank you for your report !

This problem is kind of tricky as we may not reflect the correct user intent.

Reproducibility

I will use the following schema for this comment:

Database Setup

Here is a MySQL script to set up the issue

CREATE TABLE `users` (
`id` INT NOT NULL,
`email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`));

CREATE TABLE `roles` (
`id` INT NOT NULL,
`label` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`));

CREATE TABLE `users_roles` (
`id_user` INT NOT NULL,
`id_role` INT NOT NULL,
PRIMARY KEY (`id_user`, `id_role`),
CONSTRAINT `fk_id_user__users` FOREIGN KEY (`id_user`) REFERENCES `users` (`id`),
CONSTRAINT `fk_id_role__roles` FOREIGN KEY (`id_role`) REFERENCES `roles` (`id`));

INSERT INTO `users` (`id`, `email`) VALUES ('1', 'admin');
INSERT INTO `roles` (`id`, `label`) VALUES ('1', 'a');
INSERT INTO `roles` (`id`, `label`) VALUES ('2', 'b');
INSERT INTO `users_roles` (`id_user`, `id_role`) VALUES ('1', '1');
INSERT INTO `users_roles` (`id_user`, `id_role`) VALUES ('1', '2');

Using the schema and running the generated SQL query leads to the aforementioned issue:

SELECT DISTINCT users.*, roles.label
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  ORDER BY roles.label ASC;

users.id	users.email	roles.label
1	admin	a
1	admin	b

A Solution

I guess the best we can try is to use a GROUP BY in conjunction with MIN and MAX aggregate that would depend on the expected order.

This would lead to generate something like

SELECT users.*, MIN(roles.label) as order_0 -- Here we use `MIN` as the order is `ASC`
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  GROUP BY users.id
  ORDER BY order_0 ASC;

SELECT users.*, MAX(roles.label) as order_0 -- Here we use `MAX` as the order is `DESC`
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  GROUP BY users.id
  ORDER BY order_0 DESC;

Notice that we can easily extend this to more order clauses, thus we could have something like:

SELECT users.*, MIN(roles.label1) as order_0, MAX(roles.label2) as order_1
  FROM users
  JOIN users_roles ON users_roles.id_user = users.id
  JOIN roles ON roles.id = users_roles.id_role
  GROUP BY users.id
  ORDER BY order_0 ASC, order_1 DESC;

We should care about the order_* naming as this can conflict with a retrieved column

Additional notes

Also, PostgreSQL seems to have a DISTINCT ON keyword to retrieve distinct results on a subset of fields. The doc states the following:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first

So under the hood it might act the same as the proposed implementation.

moufmouf · 2021-02-12T10:21:17Z

Hey @homersimpsons , hey @ThibBal ,

Ok, @homersimpsons , your comment is spot on.
Now, I have a real question. We are trying to order a row by a column that can have many values (we don't even need a many to many relationship... a one to many relationship has the same problem). As far as I can tell, this does not make any sense. If we choose "MIN" or "MAX" arbitrarily, we might actually confuse the user that is maybe expecting the opposite result.

Wouldn't the right thing to do be to throw an exception instead?
We could detect we try to order an a column that is related to the main bean in a 1 to many or many to many way, and then throw an exception, no?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lines duplicated when ordering on a n-n join table using find() method in the DAO #249

Lines duplicated when ordering on a n-n join table using find() method in the DAO #249

ThibBal commented Feb 10, 2021 •

edited

Loading

homersimpsons commented Feb 11, 2021

moufmouf commented Feb 12, 2021

Lines duplicated when ordering on a n-n join table using find() method in the DAO #249

Lines duplicated when ordering on a n-n join table using find() method in the DAO #249

Comments

ThibBal commented Feb 10, 2021 • edited Loading

homersimpsons commented Feb 11, 2021

Overview

Reproducibility

A Solution

Additional notes

moufmouf commented Feb 12, 2021

ThibBal commented Feb 10, 2021 •

edited

Loading