Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data licence addition #257

Merged
merged 30 commits into from
Apr 18, 2024
Merged

Data licence addition #257

merged 30 commits into from
Apr 18, 2024

Conversation

ItIsJordan
Copy link
Collaborator

@ItIsJordan ItIsJordan commented Feb 27, 2024

  • Adds functionality to insert resource file licence information into HEPData submissions in the add_additional_resource() function.
  • Also adds a new add_data_license() function to the Table class.
  • Updates documentation and testing for new features.
  • Changes default license written by the get_license() function from CC BY 4.0 to CC0.
  • Closes Add ability to specify license information #245.

📚 Documentation preview 📚: https://hepdata-lib--257.org.readthedocs.build/en/257/

ItIsJordan and others added 17 commits January 9, 2024 14:31
Adds support for adding an optional licence to the add_additional_resource function in additional resource objects
Adds a function to add licence information to the Table class
Adds a test for the new add_data_licence function in the Table class
Fixed incorrect docstring type added to the AdditionalResourceMixin object
Completes and updates the test_add_additional_resource test.
Completes implementation of incomplete test_copy_files function
Adds validation of licence additions to the add_additional_resource function for AdditionalResource objects.
Updates error check in TestTable.test_copy_files
Adds a test to check licence validation in TestTable.add_additional_resource.
Updates the usage documentation in USAGE.rst to include adding licences to additional resource and table objects.
* Use spelling "license" rather than "licence" consistently.
* Rename argument "license" as "resource_license" to avoid pylint issue.
* Rename Table "license" as "data_license" and write to YAML.
* Check that "resource_license" is a dict and correct mandatory keys.
* Not supported by JSON schema or HEPData web application.
* Modify "Getting_started.ipynb" to add license information.
* Also print out submission.yaml and YAML data file at end of notebook.
* Fix incorrect use of isinstance added in 919a5ca.
* Add a test for resource_license not a dictionary to increase coverage.
* Suppress empty "related_records" and "related_to_table_dois" keys.
# Conflicts:
#	examples/Getting_started.ipynb
@ItIsJordan
Copy link
Collaborator Author

This PR is intended to close #245.

Copy link

codecov bot commented Feb 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.37%. Comparing base (1f006c8) to head (2ce0035).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #257      +/-   ##
==========================================
+ Coverage   89.78%   90.37%   +0.58%     
==========================================
  Files           5        5              
  Lines        1087     1112      +25     
  Branches      242      251       +9     
==========================================
+ Hits          976     1005      +29     
+ Misses         80       78       -2     
+ Partials       31       29       -2     
Flag Coverage Δ
unittests-3.10 90.37% <100.00%> (+0.58%) ⬆️
unittests-3.11 90.37% <100.00%> (+0.58%) ⬆️
unittests-3.6 90.09% <100.00%> (+0.60%) ⬆️
unittests-3.7 90.09% <100.00%> (+0.60%) ⬆️
unittests-3.8 90.19% <100.00%> (+0.59%) ⬆️
unittests-3.9 90.19% <100.00%> (+0.59%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ItIsJordan
Copy link
Collaborator Author

I just noticed the coverage. Sorry, will look into that this week.

@GraemeWatt
Copy link
Member

@clelange : when testing Jordan's branch, I noticed that hepdata_lib already defines a default license:

def get_license():
"""Return the default license."""
data_license = {}
data_license["name"] = "cc-by-4.0"
data_license["url"] = "https://creativecommons.org/licenses/by/4.0/"
data_license[
"description"] = "The content can be shared and adapted but you must\
give appropriate credit and cannot restrict access to others."
return data_license

and writes it as data_license in the first YAML document of the submission.yaml file:

submission["data_license"] = self.get_license()

This has been the behaviour ever since the first hepdata_lib version from 2018. However, data_license has never been supported in the relevant additional_info_schema.json. The JSON schema only support a data_license in the submission_schema.json (data tables) or a license in the additional_resources_schema.json (additional resources). It would be difficult to modify our database model to support a license for a whole submission rather than just individual tables and resource files. Moreover, the default hepdata_lib license (CC BY 4.0) is slightly more restrictive than the default HEPData license (CC0).

Therefore, I removed the writing of a default license in commit 2cf9046. We could revert that commit if you disagree with the removal. However, I think it will be confusing if we provide new methods to write license information for data tables and additional resources, but keep a redundant default license for the whole submission that will just be ignored when uploading to HEPData.

@clelange
Copy link
Collaborator

@GraemeWatt - I think there should be a default license that is applied, and I agree that this should be CC0 since that's the default stated on the HEPData web page. I don't remember why we chose CC BY 4.0 back then. As part of this PR, I think it would be good to state in the documentation that this license is applied by default and that this is the default for all HEPData records as well as mentioning that users can change to their own license (with the newly added functionality of this PR).

Updates test data in TestTable.test_write_yaml to test missed lines.
@GraemeWatt
Copy link
Member

Looks like the CI has started failing when building a wheel for wrapt v1.12.1, installed by astroid v2.6.6 via pylint v2.9.6. One solution might be to address #234 by upgrading pylint to a more recent version.

@GraemeWatt
Copy link
Member

I opened a PR #260 that removes the pin on Pylint and fixes the build, so better to wait for that PR to be merged.

Updates test_create_files function in test_submission.py to cover missing lines
Fixed some issues output by pylint in the actions output
Updates usage documentation to note default licence
Fixes pylint error around use of open() for files
Updates the documentation for usage in usage.rst to improve licence notice clarity. Now mentions that a licence does not need to be specified if it is CC0.
Removes duplicate reference for `CC0` URL in usage.rst.
@clelange
Copy link
Collaborator

Looks like this is now good to go? @ItIsJordan

@ItIsJordan
Copy link
Collaborator Author

Should be ready now, yes! @clelange

Copy link
Collaborator

@clelange clelange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two comments for discussion that I think are important.

examples/Getting_started.ipynb Outdated Show resolved Hide resolved
hepdata_lib/__init__.py Show resolved Hide resolved
* A license will usually not be needed, so remove for clarity.
* Written in the first document of the submission.yaml file, although
  it is not supported by the JSON schema or the HEPData web application.
@clelange clelange merged commit e88fd41 into main Apr 18, 2024
26 checks passed
@clelange clelange deleted the data-license-update branch April 18, 2024 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to specify license information
3 participants