You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on May 17, 2024. It is now read-only.
Describe the bug
The counts reported on the command line by the open source data-diff program are correct for rows exclusive to table a, exclusive to b and changed rows. However, the counts of the result rows in the materialized results table do not match what I would expect based on those counts. Only the count of rows exclusive to table B is correct. SQL outside of data-diff confirms the inaccuracy of the materialized results.
Make sure to include the following (minus sensitive information):
The run output + error you're getting. (including tracestack)
247415 rows in table A
246058 rows in table B
1560 rows exclusive to table A (not present in B)
203 rows exclusive to table B (not present in A)
929 rows updated
244926 rows unchanged
1.01% difference score
Extra-Info:
diff_counts = {'service_request_id_a': 19, 'service_code_a': 15, 'service_name_a': 19, 'address_a': 19, 'subject_a': 19,
'y_a': 19, 'x_a': 19, 'status_a': 19, 'geometry_a': 19}
exclusive_count = 19
table1_count = 247415
table1_sum_x = -18593115.109934963
table1_sum_y = 9894894.846454354
table2_count = 246058
table2_sum_x = -18491140.8768517
table2_sum_y = 9840625.990917146
Count from SQL against the materialized table:
select is_exclusive_a, is_exclusive_b, count(*) from backup_.li_complaints_diff
group by is_exclusive_a,is_exclusive_b;
is_exclusive_a | is_exclusive_b | count
----------------+----------------+-------
f | f | 832
f | t | 203
t | f | 795
Raw counts from the source and target tables:
select count(*) from li_complaints p where not exists (select service_request_id from
backup_.li_complaints where service_request_id = p.service_request_id) ;
count
-------
1560
(1 row)
select count(*) from backup_.li_complaints b where not exists (select service_request_id from
li_complaints where service_request_id = b.service_request_id) ;
count
-------
203
(1 row)
Run data-diff with the -d switch for extra debug information.
see attached file data-diff-debug.txt data-diff-debug.txt
Describe the environment
data-diff 0.11.1
Python 3.11.4
os: Ubuntu Linux on WSL on Windows 10
PostgreSQL 14 on Ubuntu
The text was updated successfully, but these errors were encountered:
zigouras
changed the title
Counts from summary do not match up with row counts in materialized results
Counts from command line summary do not match up with row counts in materialized results
May 15, 2024
I think I figured it out, the default --table-write-limit of 1000 was the problem. I increased it and the numbers look correct in the materialized table now.
Describe the bug
The counts reported on the command line by the open source data-diff program are correct for rows exclusive to table a, exclusive to b and changed rows. However, the counts of the result rows in the materialized results table do not match what I would expect based on those counts. Only the count of rows exclusive to table B is correct. SQL outside of data-diff confirms the inaccuracy of the materialized results.
Make sure to include the following (minus sensitive information):
Count from SQL against the materialized table:
Raw counts from the source and target tables:
-d
switch for extra debug information.see attached file data-diff-debug.txt
data-diff-debug.txt
Describe the environment
The text was updated successfully, but these errors were encountered: