-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: introduce new internal representation for Opossum files #192
base: main
Are you sure you want to change the base?
Conversation
1e1d053
to
057727c
Compare
60b1e47
to
59f5dc7
Compare
attribution_to_id: dict[OpossumPackage, str] = field( | ||
default_factory=default_attribution_id_mapper | ||
) | ||
output_file: OpossumOutputFile | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really optimal. We are mixing here two things in a model which are different from business point of view
- What is currently stored in the opossum input file at the end but from business point of view is the results of the scan
- What is currently stored in the opossum output file but actually is the result of the human interaction in the opossumUI tool
Proposal: Structure:
Opossum --> ScanResults (What is now opossum but without the output_file
|-----> ReviewResults (what is now the opossum output file)
Note: I would still use the opossum output file here as object ATM as we do not yet know the best format here
attribution_breakpoints=self.attribution_breakpoints, | ||
external_attribution_sources=external_attribution_sources, | ||
frequent_licenses=frequent_licenses, | ||
files_with_children=self.files_with_children, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to deep-copy here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to be defensive and do a deepcopy just in case
] = {} | ||
|
||
def process_node(node: Resource) -> None: | ||
path = str(node.path).replace("\\", "/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As replace("\\", "/")
is
- used at two different places
- is not totally obvious what it is doing
I would propose to extract to a function with an explanatory name
self.get_attribution_key(a): a.to_opossum_file_format() | ||
for a in attributions | ||
} | ||
external_attributions.update(new_attributions_with_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could principally do a merge. After all associated discussions, is that by purpose?
|
||
|
||
class BaseUrlsForSources(BaseModel): | ||
model_config = ConfigDict(frozen=True, extra="allow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it by purpose that some of these models are frozen
while others are not?
default_text: str | ||
|
||
def to_opossum_file_format(self) -> opossum_file.FrequentLicense: | ||
return opossum_file.FrequentLicense(**self.model_dump()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the type safe way of converting
package_version: str | None = None | ||
package_namespace: str | None = None | ||
package_type: str | None = None | ||
package_p_u_r_l_appendix: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not need to use this ugly name here as we do no longer need to match the structure of the file
is_relevant_for_preferred: bool | None = None | ||
|
||
def to_opossum_file_format(self) -> opossum_file.ExternalAttributionSource: | ||
return opossum_file.ExternalAttributionSource(**self.model_dump()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again think about a type safe conversion
return opossum_file.Metadata(**self.model_dump()) | ||
|
||
|
||
class ResourceType(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move that closer to the Resource class?
* this model encapsulates also the semantic relationships of resources, resourcesToAttribuions and externalAttributions. These are not enforced by the file structure alone. * This will be used as a target for the other file format frontends and simplify their logic. * It also allows for easier testing since it allows to check for semantic/structural equivalence among opossum files (e.g. the IDs of the attribution carry no semantic semantic information themselves i.e. are arbitrary labels)
* Opossum class should be able to carry all information that could be present in an .opossum file
* resources can now by added to the resource structure without knowledge about internals * for this reason: - resources can be created with just a path (i.e. without type) - resources can now be merged together if the types are compatible. types are compatible if at least one is not set or types are identical - when converting to opossum file format, unset type defaults to folder. Maybe this should raise an error instead, but being more permissible probably just makes things more ergonomic without hurting correctness.
* default to treating a Resource as file if the type is undefined and no children present * provide slightly more information in an error message
54f5c61
to
bd3735f
Compare
Note: rebased onto current main |
) | ||
output_file: OpossumOutputFile | None = None | ||
|
||
def to_opossum_file_format(self) -> opossum_file_content.OpossumFileContent: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would be nicer to read if at the top of the file you don't use namespace imports but import exactly the classes you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some duplication of names between opossum_file and opossum_model. That's why I wanted to explicit where which name comes from.
This duplication comes from the fact that opossum_model is quite similar to opossum_file generally but we chose to separate the implementations which meant copying some models verbatim.
The specific instance you marked could be removed though as there are only clashes between opossum_file and opossum_model.
attribution_breakpoints=[], | ||
external_attribution_sources={}, | ||
frequent_licenses=None, | ||
files_with_children=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the two None values? These are default values on the Opossum Models.
segments = path_segments(file.path) | ||
temp_root.get_path(segments).file = file | ||
resource = opossum_model.Resource( | ||
path=PurePath(file.path.replace("\\", "/")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again the replacement.
process_node(child) | ||
|
||
return external_attributions, resources_to_attributions | ||
def convert_resource_type(val: FileType) -> opossum_model.ResourceType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename val -> file_type
Summary of changes
This PR does not contain tests for the new model. These are done separately see #189
Context and reason for change
See: #190
Closes: #195
Part1 of: #196 (tests follow in a follow-up)