Match results show 100% equal for functions with differences #313

grimdoomer · 2024-08-31T19:37:37Z

I'm trying to use Diaphora to diff different versions of the same binary and detect changed functions down to single-instruction granularity. When diffing two versions of this binary that differ by only a single instruction, I noticed that the affected function is reported as "100% equal" with a ratio of 1.0.

If I diff the assembly for the functions I can see the single instruction change:

I understand the "lwz" line is a false positive because I changed the immediate display type in one of the databases before exporting, but I would still expect the slwi/sldi instruction change to be detected. Are there settings I can change to make the comparisons stricter? I thought some of the heuristics used the MD5 hash of the function data, which I would expect to differ between these two functions.

For additional confirmation I diffed the two binaries in a hex editor and can clearly see the 4-byte change for the different instructions:
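The same check can be scripted instead of done by eye in a hex editor. This is a minimal sketch, assuming two images of equal length held in memory; the helper name `diff_offsets` and the byte values are made up for illustration and are not taken from the attached samples.

```python
def diff_offsets(a: bytes, b: bytes):
    """Return (start, length) runs where two equal-length images differ."""
    if len(a) != len(b):
        raise ValueError("images must have the same length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # Open a new run at the first differing byte.
            if start is None:
                start = i
        elif start is not None:
            # Bytes match again: close the current run.
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(a) - start))
    return runs


# Hypothetical data: 16 bytes with one 4-byte word replaced.
old = bytes(range(16))
new = old[:8] + b"\xde\xad\xbe\xef" + old[12:]
print(diff_offsets(old, new))  # [(8, 4)]
```

With real images, a changed 4-byte instruction may share some bytes with the original, so a single instruction change can show up as a run shorter than 4 bytes or as several short runs.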

grimdoomer · 2024-09-01T04:11:53Z

Digging into this a bit more, I got the "best" matches to run by changing DIFFING_ENABLE_EXPERIMENTAL to False, but the results still showed "100% equal" for the function in question. I checked the SQLite databases and confirmed that the function has a different bytes_hash in the two versions.

In diaphora.py I found the find_equal_matches function, which reports functions as "100% equal". However, it only compares them on the following fields: id, address, mangled_function, nodes, edges, size. After I added bytes_hash, the function in question is reported as a partial match with a ratio of 0.99, which is what I was expecting.
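To illustrate the effect of adding bytes_hash, here is a minimal sketch using an in-memory SQLite database. It is not Diaphora's actual schema or query: the table names `main_funcs` and `diff_funcs`, the helpers `make_row` and `equal_matches`, and the function bytes are all hypothetical. The `id` field is left out for brevity, and MD5 is used only as a stand-in for however bytes_hash is really computed.

```python
import hashlib
import sqlite3

# Fields compared by the "100% equal" check, per the description above (id omitted).
COLUMNS = ["address", "mangled_function", "nodes", "edges", "size"]


def make_row(address, name, nodes, edges, code):
    """Build one function row; bytes_hash is an MD5 over the raw bytes (assumption)."""
    return (address, name, nodes, edges, len(code), hashlib.md5(code).hexdigest())


def equal_matches(conn, columns):
    """Return names of functions whose rows agree on every listed column."""
    cond = " AND ".join(f"a.{c} = b.{c}" for c in columns)
    sql = f"SELECT a.mangled_function FROM main_funcs a JOIN diff_funcs b ON {cond}"
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
for table in ("main_funcs", "diff_funcs"):
    conn.execute(
        f"CREATE TABLE {table} (address INTEGER, mangled_function TEXT, "
        "nodes INTEGER, edges INTEGER, size INTEGER, bytes_hash TEXT)"
    )

# Two made-up 8-byte functions that differ in one 4-byte instruction word.
old_code = bytes.fromhex("00000001" "00000002")
new_code = bytes.fromhex("00000001" "00000003")
conn.execute("INSERT INTO main_funcs VALUES (?,?,?,?,?,?)",
             make_row(0x1000, "sub_1000", 1, 0, old_code))
conn.execute("INSERT INTO diff_funcs VALUES (?,?,?,?,?,?)",
             make_row(0x1000, "sub_1000", 1, 0, new_code))

loose = equal_matches(conn, COLUMNS)
strict = equal_matches(conn, COLUMNS + ["bytes_hash"])
print(loose)   # ['sub_1000']
print(strict)  # []
```

Without bytes_hash the changed function still counts as an equal match, because its name, address, size, and graph shape are unchanged. With bytes_hash in the comparison it drops out of the equal set and would be left for the later, ratio-based heuristics.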

So it seems like as long as two functions have the same name, address, size, and control flow they get reported as 100% equal even though the instructions in the functions could have changed? Is this intended behavior or a bug?

joxeankoret · 2024-09-02T14:43:20Z

> So it seems like as long as two functions have the same name, address, size, and control flow they get reported as 100% equal even though the instructions in the functions could have changed? Is this intended behavior or a bug?

This is intended behaviour, but according to the very detailed report you made, it might be wrong. I'm going to add the patch you made (adding bytes_hash). Could you please share the two samples? (Or their hashes, and I will look for them myself.)

grimdoomer · 2024-09-09T08:01:25Z

I have attached the sample files and the IDC scripts used to reproduce the IDA databases I had set up. You can load each .bin file as "PowerPC big-endian", use the default memory layout settings, analyze as 32-bit if asked, and use all default settings for IO ports, etc. Please let me know if you have any other questions or issues loading the samples.

hv_images.zip

joxeankoret · 2024-09-09T08:15:43Z

Bug fixed locally, waiting for all the tests to pass. Thanks a lot!

ML: Dropped support for training local models. They were not working properly at all.
BUG: HEUR: Added field 'bytes_hash' to the '100% equal' heuristic, as it was ignoring some minimal changes (issue #313).
BUG: HEUR: Always check if there are differences even for structurally 100% equal databases (issue #313).

joxeankoret self-assigned this Sep 4, 2024

joxeankoret added the bug label Sep 4, 2024
