ArcadeDB Version:
ArcadeDB Server v24.11.1-SNAPSHOT (build 4207821d055e37cbd74b76683e45073da759c2de/1729083296142/console)
OS and JDK Version:
Running on Mac OS X 12.7.6 - OpenJDK 64-Bit Server VM 17.0.12 (Homebrew)
Querying a database can return the wrong number of records, specifically too many. The problem is not easy to reproduce, so I attach a small database (75 records) generated from public data. The behavior occurs when certain databases, such as the one provided below, are queried via SQL (i.e. SELECT) and a non-unique indexed property is used both in the projection and as the ordering key (in ORDER BY). The schema is given here: https://github.com/ulbmuenster/dataasee/blob/main/database/schema.sql
SELECT count(*) FROM metadata; -- There are 75 records in the database
SELECT name, publicationYear FROM metadata; -- Returns 75 records correctly
SELECT name FROM metadata ORDER BY publicationYear; -- Returns 75 records correctly
SELECT name, publicationYear FROM metadata ORDER BY publicationYear; -- Returns 117 records INCORRECTLY
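The comparison above can also be scripted, which makes it easier to re-check after a fix. The following is only a sketch: it assumes the server is reachable through its HTTP command endpoint (`/api/v1/command/{database}`) with basic authentication and that the response carries the rows under `result`. The helper names, the `total` alias, and any base URL, database name, or credentials passed in are placeholders, not taken from this report.

```python
import base64
import json
import urllib.request

# The three SELECT variants from the report, in the same order.
QUERIES = [
    "SELECT name, publicationYear FROM metadata",
    "SELECT name FROM metadata ORDER BY publicationYear",
    "SELECT name, publicationYear FROM metadata ORDER BY publicationYear",
]


def make_runner(base_url, database, user, password):
    """Build a callable that sends one SQL command and returns the result rows.

    Assumption: the endpoint path and the JSON payload/response layout match
    the server's HTTP API; adjust if the deployment differs.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    def run(command):
        payload = json.dumps({"language": "sql", "command": command}).encode("utf-8")
        request = urllib.request.Request(
            f"{base_url}/api/v1/command/{database}",
            data=payload,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Basic {token}",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["result"]

    return run


def surplus_per_query(run):
    """Return, per query, how many more rows came back than count(*) reports."""
    total = run("SELECT count(*) AS total FROM metadata")[0]["total"]
    return {query: len(run(query)) - total for query in QUERIES}
```

With the attached database, a non-zero surplus would be expected only for the last query. `surplus_per_query` takes the runner as an argument so it can be exercised with a stub instead of a live server.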
Notes
At the end of August (2024-08-28 to 2024-08-31), @lvca and I looked into this; potential sources of this behavior could be:
FetchFromIndexStep
The UPSERT entering the data.
Here is a sample UPSERT statement that generated the provided database:
UPDATE metadata MERGE {"creators":[{"@type":"pair","name":"Neumann, Georg"},{"@type":"pair","name":"Boivin, Odette"},{"@type":"pair","name":"Kleber, Kristin"},{"@type":"pair","name":"Neumann, Georg"},{"@type":"pair","name":"Boivin, Odette"},{"@type":"pair","name":"Kleber, Kristin"}],"dataLocation":"https://data-management.uni-muenster.de/direct-access/wwurdm/07988641231","dataSteward":"https://datastore.uni-muenster.de","description":"The dataset comprises the RTI-data of the cuneiform tablet VAT 17966 (N5:27).\\n\\nRTI = Reflectance Transformation Imaging (RTI) is a computational photographic method that captures a subject’s surface shape and color and enables the interactive re-lighting of the subject from any direction in a software viewer, revealing details not visible with the naked eye (cf. https://culturalheritageimaging.org/).","identifiers":[{"@type":"pair","data":"10.17879/07988641225","name":"DE-6"},{"@type":"pair","data":"10.17879/07988641225","name":"null"}],"keywords":"Babylon,Cuneiform,Neo-Babylonian","language":"#4:0","metadataQuality":"Incomplete","name":"VAT 17966 - RTI dataset","publicationYear":2024,"publisher":"Universität Münster","rawChecksum":"TcrJlnTQYfugD2rROs2Y3g==","rawType":"marc21","recordId":"MTA0MzQ4NDE4NDMwODAyNDg3NA","resourceType":null,"rights":"CC BY-NC-SA 4.0","source":"https://datastore.uni-muenster.de/oai","synonyms":[{"@type":"pair","data":"GoviB","name":"Alternative Title"}]} UPSERT WHERE recordId == 'MTA0MzQ4NDE4NDMwODAyNDg3NA';
As a workaround, the projection can be given an alias, i.e.:
SELECT name, publicationYear AS pubYear FROM metadata ORDER BY publicationYear;
but this also prevents the index from being used!
gramian changed the title from "Index returns too many entries when used with ORDER BY" to "SQL: Index returns too many entries when used with ORDER BY" (Oct 16, 2024)
gramian changed the title from "SQL: Index returns too many entries when used with ORDER BY" to "SQL: Index can return too many entries when used with ORDER BY" (Oct 16, 2024)
Here is some more testing which may help locate the source:
Both sorting directions, ASC and DESC, produce duplicate results, but in different numbers: 146 for ASC and 117 for DESC, for a database with 81 records.
This index behavior also occurs for INSERT INTO ... CONTENT ...
Using DISTINCT removes the duplicates, but this is no workaround because in paged results the duplicates may be spread across pages.
Using multiple order keys, like ORDER BY publicationYear, name, returns the correct number of results. @lvca This also does not use the publicationYear index, right?
The data type does not seem to make a difference (tested with SHORT, INTEGER, and LONG).
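One way to read the paging remark: if de-duplication only ever sees one page at a time, two copies of a record that land on different pages both survive. The sketch below illustrates this on the client side with invented rows. Only the property names `recordId` and `publicationYear` come from the report; the values, the page size, and the helper names are made up for illustration.

```python
def paginate(rows, page_size):
    """Split rows into consecutive pages of at most page_size rows."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def dedupe(rows, key):
    """Drop later rows whose key value was already seen, keeping order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Invented result set: record "b" is returned twice, as in the reported bug.
rows = [
    {"recordId": "a", "publicationYear": 2022},
    {"recordId": "b", "publicationYear": 2023},
    {"recordId": "c", "publicationYear": 2023},
    {"recordId": "b", "publicationYear": 2023},
    {"recordId": "d", "publicationYear": 2024},
]

pages = paginate(rows, page_size=2)

# De-duplicating each page on its own misses the copy on the other page.
per_page_total = sum(len(dedupe(page, "recordId")) for page in pages)

# De-duplicating the whole result removes it.
global_total = len(dedupe(rows, "recordId"))

print(per_page_total, global_total)
```

Here the two copies of "b" fall on pages 1 and 2, so the per-page total stays at 5 while the whole-result total is 4. A client that pages through results would have to track already seen records across pages to compensate.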
Expected behavior
Correct number of records returned.
Actual behavior
Too many records are returned.
Steps to reproduce
Restore this backup: metadatalake-backup-20240829-100753793.zip (337KB)