Android APK D2D: Match converted JARs to PurlDB #1371

Closed
Tracked by #1366
pombredanne opened this issue Aug 29, 2024 · 8 comments

Comments

@pombredanne
Member

pombredanne commented Aug 29, 2024

Once classes.dex has been converted back to a Java JAR, we need to match the result to the PurlDB.
This is a special case because Android bytecode that is converted back to Java .class files or a JAR will not be exactly the same as the original Java bytecode it was derived from. We need to create special techniques for this.

@JonoYang
Member

The purldb directory matching step would be useful here. jadx decompiles dex files to a directory adjacent to the dex file, where the classes in the dex files are converted to Java source. We can fingerprint those directories and match them against the purldb. One caveat I can think of now: we would match to the source distribution of a Java package rather than the binary, since the fingerprinted directories from the dex files would contain .java files instead of .class files.

@mjherzog
Member

From recent Android project experience, my understanding of JADX is that (1) it will produce a set of .class files and a set of .source files, neither of which will be a fingerprint match to the original Java code, and (2) the decompiled source will diverge more from the original source than the binaries do. So wouldn't fuzzy matching for the .class files be the most important first step?

@JonoYang
Member

@mjherzog

I was thinking in terms of using the directory structure fingerprints, which are created from the paths of the resources within a directory. We would not match anything at the binary level, since the decompilation output could vary.
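
To illustrate the idea of a fingerprint built only from resource paths, here is a minimal sketch. It is not the purldb or matchcode implementation: the function names `structure_fingerprint` and `path_similarity`, the use of SHA-1, and the sample paths are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def structure_fingerprint(paths):
    """Return a SHA-1 hex digest computed over the sorted relative paths.

    Hypothetical helper: file contents are never read, so two directories
    with the same layout get the same fingerprint even if the decompiled
    code inside the files differs.
    """
    normalized = sorted({PurePosixPath(p).as_posix() for p in paths})
    digest = hashlib.sha1("\n".join(normalized).encode("utf-8"))
    return digest.hexdigest()


def path_similarity(paths_a, paths_b):
    """Return the Jaccard similarity (0.0 to 1.0) between two sets of paths.

    Hypothetical helper for the case where the layouts only partly overlap.
    """
    a, b = set(paths_a), set(paths_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up example paths: same layout, listed in a different order.
decompiled = ["com/example/Foo.java", "com/example/Bar.java"]
indexed = ["com/example/Bar.java", "com/example/Foo.java"]

same_layout = structure_fingerprint(decompiled) == structure_fingerprint(indexed)
partial = path_similarity(decompiled, ["com/example/Foo.java"])
```

Because only paths are hashed, a tree of .java files and a tree of .class files with the same package layout would still produce different fingerprints in this sketch, which is the source-versus-binary caveat mentioned earlier in the thread.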

@mjherzog
Member

That makes sense - I had missed that point.

@chinyeungli
Contributor

The directory structure fingerprints make sense.

JonoYang added a commit that referenced this issue Sep 27, 2024
@JonoYang
Member

There is a step in the Android D2D pipeline for matching directories to packages on purldb.

@pombredanne
Member Author

Thanks for completing this! Some specific issues wrt. Kotlin are tracked in:

@pombredanne
Member Author

As a follow up there are several refinements we can implement. These are tracked in:


4 participants