
Daff to output row-level changes without before->after in-cell differences #199

Open
dwrapson-arc opened this issue May 15, 2024 · 1 comment


dwrapson-arc commented May 15, 2024

tl;dr: Can I get only row-level changes flagged in the @@ column of daff output, without the in-cell before->after values?


Daff is (almost) exactly the tool I've been searching for, and I rejoiced when I found it. I need to diff two full-export datasets day by day and generate a delta (for CDC) to feed into an import framework.

However, while I do need comparisons made cell by cell (row by row, respecting keys), I don't need to know the specifics of those changes. I just need to know which rows were added, modified, or deleted, and the framework will take care of the rest (SCD2).

In fact, having before->after within a cell makes the output much harder to work with, because I would have to parse it out, which I'd rather not do. It is possible to simulate this output with some code, by taking the output and merging it with the newer file in dataframes, but it would be amazing if daff could do it all directly.

I've gone through the options and maybe I'm missing something, but can I get a purely row-level output, with only the modified values, in the patch file output?

So if I run daff 1.csv 2.csv, I would get +++ for adds, --- for deletes, and just -> in the @@ column for modifications, with no other change markup on the row.

I should note that I've mostly been looking at the CLI and have not yet tried the Python or other language bindings. If more options are available there, I haven't been able to track them down from the specification.

@dwrapson-arc
Author

Example for clarity:

$ daff 1.csv 2.csv --context 0

Regular Output

| @@ | bridge | designer | length |
| --- | --- | --- | --- |
| +++ | Manhattan | G. Lindenthal | 1470 |
| -> | Williamsburg | D. Duck->L. L. Buck | 1600 |
| --- | Spamspan | S. Spamington | 10000 |

Desired Output

| @@ | bridge | designer | length |
| --- | --- | --- | --- |
| +++ | Manhattan | G. Lindenthal | 1470 |
| -> | Williamsburg | L. L. Buck | 1600 |
| --- | Spamspan | S. Spamington | 10000 |
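
For reference, here is roughly what the workaround could look like as a post-processing step. This is only a sketch, not something daff provides: it assumes the diff has been saved as CSV text with the action marker in the first column, and that a modified row's action cell holds the same separator used inside its cells (`->`, or a lengthened form such as `-->`). The function name `keep_after_values` is made up for illustration.

```python
import csv
import io
import re

# Assumption: a modified row is marked by an action cell made of
# one or more dashes followed by ">", e.g. "->" or "-->".
SEPARATOR = re.compile(r"-+>")


def keep_after_values(diff_csv_text):
    """Return the diff rows with 'before->after' cells reduced to 'after'."""
    rows = []
    for row in csv.reader(io.StringIO(diff_csv_text)):
        if row and SEPARATOR.fullmatch(row[0]):
            sep = row[0]
            # Keep the action cell; in every other cell keep only the text
            # after the last separator (unchanged cells pass through as-is).
            row = [sep] + [cell.rsplit(sep, 1)[-1] for cell in row[1:]]
        rows.append(row)
    return rows


# Sample input mirroring the "Regular Output" example in this issue.
sample = """@@,bridge,designer,length
+++,Manhattan,G. Lindenthal,1470
->,Williamsburg,D. Duck->L. L. Buck,1600
---,Spamspan,S. Spamington,10000
"""

for row in keep_after_values(sample):
    print(",".join(row))
```

Added and deleted rows pass through untouched, so the result keeps the +++ / -> / --- markers in the @@ column and only drops the "before" half of modified cells. Other action markers (for example schema changes) are not handled here.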
