syft extract the full description of the license in python #3088
Comments
Hi @tomersein, thanks for the report! I was able to reproduce this issue. I appreciate the inclusion of a Dockerfile - it makes reproducing the issue a lot easier.
You can see that the license set has a value that's the whole text of the license and the SPDX expression is blank. Interestingly, on GitHub, the license is also shown as just "license" rather than having some automatic identification: https://github.com/numpy/numpy?tab=License-1-ov-file
My suggestion is to split on \n and take the first part of the value in cases like that.
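For reference, the split-and-take-the-first-part idea could look roughly like the sketch below. `firstLine` is a hypothetical helper, not an existing syft function, and the sample input is made up. Note that this discards everything after the first line.

```go
package main

import (
	"fmt"
	"strings"
)

// firstLine returns the text before the first newline in value, with
// surrounding whitespace trimmed. Hypothetical helper for illustration;
// it is not part of syft.
func firstLine(value string) string {
	line, _, _ := strings.Cut(value, "\n")
	return strings.TrimSpace(line)
}

func main() {
	// Made-up license text standing in for a long License metadata value.
	raw := "Example License\n\nRedistribution and use in source and binary forms..."
	fmt.Println(firstLine(raw)) // prints "Example License"
}
```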
👋 thanks for the license issue @tomersein! It looks like syft is pulling this value from the following path in the container: This happens when the Python cataloger runs and we construct package details from the Egg/Wheel metadata: syft/syft/pkg/cataloger/python/parse_wheel_egg.go Lines 186 to 192 in 9d40d11
Here is the field definition we key off of from the Python Metadata specification: In this case the package distributor opted to put the full text of the license in this field. This is a valid value for when we go and decode the map structure of this file here: syft/syft/pkg/cataloger/python/parse_wheel_egg_metadata.go Lines 39 to 51 in 9d40d11
I appreciate the suggestion of using a If you look at the contents of this license block, the authors take a good amount of effort to address the different distributions they consume as a part of NumPy:
More information (not in their github license.txt) about what they distribute
There are also considerations for the following (there are more truncated for brevity):
Cutting the license off at the new line here would remove this information from the SBOM. I know long field values are ugly and bloat the document, but in this case the value of the license file is as accurate as we can make it without resorting to some specialized parsing for this unique case. @anchore/tools should we NOT be including this when we discover it associated with the package?
Maybe we should add a new
Hi @wagoodman, what is the decision?
Hey @tomersein! We talked about this on our livestream the other day. We're moving forward with a FullText field being added to the license struct. What do you think is the best way forward for detecting this? Just doing a simple len(arbitraryNumber) to see if it's the full text?
I suggest the \n character. I don't think saving the long license value twice is good practice.
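A minimal sketch of the two detection ideas discussed here (a newline check and a length cutoff), storing the long text only once. The `License` struct, `looksLikeFullText`, `newLicense`, and the 64-character cutoff are all assumptions for illustration; they are not syft's actual types or thresholds.

```go
package main

import (
	"fmt"
	"strings"
)

// License is a hypothetical, simplified stand-in for a license record.
// It is not syft's actual struct.
type License struct {
	Value    string // short identifier or name, e.g. "MIT"
	FullText string // full license text, when that is what was found
}

// maxShortLicenseLen is an arbitrary illustrative cutoff, not a value
// taken from syft.
const maxShortLicenseLen = 64

// looksLikeFullText reports whether raw contains a newline or is longer
// than the cutoff, after trimming surrounding whitespace.
func looksLikeFullText(raw string) bool {
	s := strings.TrimSpace(raw)
	return strings.Contains(s, "\n") || len(s) > maxShortLicenseLen
}

// newLicense stores raw in exactly one field so the long text is never
// saved twice.
func newLicense(raw string) License {
	s := strings.TrimSpace(raw)
	if looksLikeFullText(s) {
		return License{FullText: s}
	}
	return License{Value: s}
}

func main() {
	fmt.Printf("%+v\n", newLicense("MIT"))
	fmt.Printf("%+v\n", newLicense("Example License\n\nRedistribution and use..."))
}
```

The newline check catches wrapped license text regardless of length, while the length cutoff catches long single-line values; either one alone would also be a reasonable choice.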
@spiffcs, https://peps.python.org/pep-0639/ has been provisionally accepted which will hopefully make this metadata field more standardised for future python package releases at least
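PEP 639 adds a `License-Expression` metadata field intended to hold an SPDX expression. As a rough sketch, a consumer could prefer that field and fall back to the older `License` field when it is absent. The `pickLicense` function and the map-of-fields representation are assumptions for illustration, not how syft decodes metadata, and the sketch does not validate the expression.

```go
package main

import (
	"fmt"
	"strings"
)

// pickLicense prefers the PEP 639 License-Expression field and falls
// back to the older free-form License field. Hypothetical helper; the
// map keyed by metadata field name is an assumed representation.
func pickLicense(fields map[string]string) (value string, isExpression bool) {
	if expr := strings.TrimSpace(fields["License-Expression"]); expr != "" {
		return expr, true
	}
	return strings.TrimSpace(fields["License"]), false
}

func main() {
	// Made-up metadata for illustration.
	newer := map[string]string{"License-Expression": "BSD-3-Clause"}
	older := map[string]string{"License": "Example License\n\nFull text..."}

	fmt.Println(pickLicense(newer))
	v, isExpr := pickLicense(older)
	fmt.Printf("%q %v\n", v, isExpr)
}
```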
Regarding:
My suggestion would be to do this:
What happened:
I scanned an image of python, and one of the packages has a full description of the license.
here is an example:
What you expected to happen:
If the license value contains \n, I think syft should trim it so that the SBOM does not look weird.
Steps to reproduce the issue:
Here is the Dockerfile to build a sample of an image:
Anything else we need to know?:
Environment:
syft version: 1.9.0
OS (cat /etc/os-release or similar): mac