Data licence addition #257

Merged
merged 30 commits into from
Apr 18, 2024
Commits
47e27a8
Add optional licence argument to add_additional_resource
ItIsJordan Jan 9, 2024
14ccd11
Add licence function to Table class
ItIsJordan Jan 9, 2024
de8661a
Add test for Table data licence function
ItIsJordan Jan 9, 2024
04464bf
Fix incorrect type AdditionalResource docs
ItIsJordan Jan 10, 2024
ea1335f
Update add_additional_resource test
ItIsJordan Jan 10, 2024
1383a3c
Complete test_copy_files
ItIsJordan Jan 10, 2024
b83359c
Add validation check to additional resource licence add
ItIsJordan Jan 12, 2024
63f820d
Update error in TestTable
ItIsJordan Jan 12, 2024
efedcaf
Add additional resource licence check test
ItIsJordan Jan 12, 2024
d880784
Update usage documentation to include licences
ItIsJordan Jan 24, 2024
919a5ca
Some fixes and cleaning up for pylint
GraemeWatt Feb 14, 2024
2cf9046
Remove get_license() from Submission class
GraemeWatt Feb 14, 2024
7211f55
examples: show how to add license information
GraemeWatt Feb 15, 2024
3918d70
tests: fix isinstance, add test, suppress keys
GraemeWatt Feb 15, 2024
f440480
Merge branch 'main' into data-licence-update
GraemeWatt Feb 15, 2024
eb176e3
Merge branch 'main' into data-license-update
GraemeWatt Feb 16, 2024
d75d2c0
Merge branch 'main' into data-license-update
ItIsJordan Feb 27, 2024
d84bef6
Update test data
ItIsJordan Mar 25, 2024
5881aef
Merge branch 'main' into data-license-update
ItIsJordan Mar 25, 2024
4d06096
Merge branch 'main' into data-license-update
ItIsJordan Apr 16, 2024
14f65e3
Update test for coverage
ItIsJordan Apr 16, 2024
edcf916
Pylint fixes
ItIsJordan Apr 16, 2024
c579a96
Update usage.rst
ItIsJordan Apr 16, 2024
60bcbe0
Pylint fix
ItIsJordan Apr 17, 2024
1968670
Update usage documentation
ItIsJordan Apr 17, 2024
e4d7e7e
Remove duplicate ref in usage.rst
ItIsJordan Apr 17, 2024
3742535
Remove license addition from Getting_started.ipynb
GraemeWatt Apr 18, 2024
ec1359d
Merge branch 'main' into data-license-update
GraemeWatt Apr 18, 2024
d87017d
Revert "Remove get_license() from Submission class"
GraemeWatt Apr 18, 2024
2ce0035
Change default license from "CC BY 4.0" to "CC0"
GraemeWatt Apr 18, 2024
22 changes: 22 additions & 0 deletions docs/usage.rst
@@ -123,13 +123,21 @@ Additional resources, hosted either externally or locally, can be linked with th

    sub.add_additional_resource("Web page with auxiliary material", "https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/STDM-2012-02/")
    sub.add_additional_resource("Some file", "root_file.root", copy_file=True)
    sub.add_additional_resource("Some file", "root_file.root", copy_file=True, resource_license={"name": "CC BY 4.0", "url": "https://creativecommons.org/licenses/by/4.0/", "description": "This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator."})
    sub.add_additional_resource("Archive of full likelihoods in the HistFactory JSON format", "Likelihoods.tar.gz", copy_file=True, file_type="HistFactory")

The first argument is a ``description`` and the second is the ``location`` of the external link or local resource file.
The optional argument ``copy_file=True`` (default value of ``False``) will copy a local file into the output directory.
The optional argument ``resource_license`` (default value of ``None``) defines a data license for an additional resource.
It must be a dictionary with mandatory string ``name`` and ``url`` values and an optional ``description``; any other set of keys raises a ``ValueError``.
The optional argument ``file_type="HistFactory"`` (default value of ``None``) can be used to identify statistical models provided in the HistFactory JSON
format rather than relying on certain trigger words in the ``description`` (see `pyhf section of submission documentation`_).

**Please note:** The default license applied to all data uploaded to HEPData is `CC0`_. You do not
need to specify a license for a resource file unless it differs from `CC0`_.

.. _`CC0`: https://creativecommons.org/public-domain/cc0/
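
As a minimal sketch of the dictionary shape described above, the helper below builds a ``resource_license`` with the two mandatory keys and adds ``description`` only when one is supplied. The name ``make_resource_license`` is hypothetical and not part of hepdata_lib; only the key names come from the documentation.

```python
def make_resource_license(name, url, description=None):
    """Build a dict in the shape expected for ``resource_license`` (illustrative helper)."""
    resource_license = {"name": name, "url": url}
    # Leave the optional key out entirely rather than storing None
    if description:
        resource_license["description"] = description
    return resource_license


cc_by = make_resource_license(
    "CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/"
)
print(sorted(cc_by))
```

The resulting dictionary could then be passed as ``resource_license=cc_by`` in a call such as ``sub.add_additional_resource("Some file", "root_file.root", copy_file=True, resource_license=cc_by)``.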

The ``add_link`` function can alternatively be used to add a link to an external resource:

::
@@ -320,6 +328,20 @@ The documentation for this feature can be found here: `Linking tables`_.

.. _`Linking tables`: https://hepdata-submission.readthedocs.io/en/latest/bidirectional.html#linking-tables

Adding a data license
^^^^^^^^^^^^^^^^^^^^^

You can add data license information to a table using the ``add_data_license`` function of the ``Table`` class.
This function takes mandatory ``name`` and ``url`` string arguments, as well as an optional ``description``.
An empty ``name`` or ``url`` raises a ``ValueError``.

**Please note:** The default license applied to all data uploaded to HEPData is `CC0`_. You do not
need to specify a license for a data table unless it differs from `CC0`_.

::

    table.add_data_license("CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/")
    table.add_data_license("CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/", "This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.")

Uncertainties
+++++++++++++

154 changes: 150 additions & 4 deletions examples/Getting_started.ipynb
@@ -27,8 +27,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"Welcome to JupyROOT 6.26/06\n",
-"hepdata_lib version 0.14.1\n"
+"Welcome to JupyROOT 6.30/04\n",
+"hepdata_lib version 0.15.0\n"
]
}
],
@@ -278,7 +278,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"This is all that's needed for the table/figure. We still need it to the submission:"
+"This is all that's needed for the table/figure. We still need to add it to the submission:"
]
},
{
@@ -372,6 +372,152 @@
"source": [
"!ls example_output"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---\n",
"additional_resources:\n",
"- description: Created with hepdata_lib 0.15.0\n",
" location: https://doi.org/10.5281/zenodo.1217998\n",
"- description: Webpage with all figures and tables\n",
" location: https://cms-results.web.cern.ch/cms-results/public-results/publications/B2G-16-029/\n",
"- description: arXiv\n",
" location: http://arxiv.org/abs/arXiv:1802.09407\n",
"- description: Original abstract file\n",
" location: abstract.txt\n",
"comment: A search for a new heavy particle decaying to a pair of vector bosons (WW\n",
" or WZ) is presented using data from the CMS detector corresponding to an integrated\n",
" luminosity of $35.9~\\mathrm{fb}^{-1}$ collected in proton-proton collisions at a\n",
" centre-of-mass energy of 13~TeV in 2016. One of the bosons is required to be a W\n",
" boson decaying to $e\\nu$ or $mu\\nu$, while the other boson is required to be reconstructed\n",
" as a single massive jet with substructure compatible with that of a highly-energetic\n",
" quark pair from a W or Z boson decay. The search is performed in the resonance mass\n",
" range between 1.0 and 4.5~TeV. The largest deviation from the background-only hypothesis\n",
" is observed for a mass near 1.4~TeV and corresponds to a local significance of 2.5\n",
" standard deviations. The result is interpreted as an upper bound on the resonance\n",
" production cross section. Comparing the excluded cross section values and the expectations\n",
" from theoretical calculations in the bulk graviton and heavy vector triplet models,\n",
" spin-2 WW resonances with mass smaller than 1.07~TeV and spin-1 WZ resonances lighter\n",
" than 3.05~TeV, respectively, are excluded at 95\\% confidence level.\n",
"data_license:\n",
" description: CC0 enables reusers to distribute, remix, adapt, and build upon the\n",
" material in any medium or format, with no conditions.\n",
" name: CC0\n",
" url: https://creativecommons.org/publicdomain/zero/1.0/\n",
"record_ids:\n",
"- id: 1657397\n",
" type: inspire\n",
"---\n",
"additional_resources:\n",
"- description: Original data file\n",
" location: effacc_signal.txt\n",
"- description: Image file\n",
" location: signalEffVsMass.png\n",
"- description: Thumbnail image file\n",
" location: thumb_signalEffVsMass.png\n",
"data_file: additional_figure_1.yaml\n",
"description: Signal selection efficiency times acceptance as a function of resonance\n",
" mass for a spin-2 bulk graviton decaying to WW and a spin-1 W' decaying to WZ.\n",
"keywords:\n",
"- name: observables\n",
" values:\n",
" - ACC\n",
" - EFF\n",
"- name: reactions\n",
" values:\n",
" - P P --> GRAVITON --> W+ W-\n",
" - P P --> WPRIME --> W+/W- Z0\n",
"- name: cmenergies\n",
" values:\n",
" - 13000\n",
"location: Data from additional Figure 1\n",
"name: Additional Figure 1\n"
]
}
],
"source": [
"!cat example_output/submission.yaml"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dependent_variables:\n",
"- header:\n",
" name: Efficiency times acceptance\n",
" qualifiers:\n",
" - name: Efficiency times acceptance\n",
" value: Bulk graviton --> WW\n",
" - name: SQRT(S)\n",
" units: TeV\n",
" value: 13\n",
" values:\n",
" - value: 0.4651\n",
" - value: 0.50336\n",
" - value: 0.5126\n",
" - value: 0.52474\n",
" - value: 0.531\n",
" - value: 0.5391\n",
" - value: 0.54943\n",
" - value: 0.55378\n",
" - value: 0.56216\n",
" - value: 0.56454\n",
" - value: 0.56682\n",
"- header:\n",
" name: Efficiency times acceptance\n",
" qualifiers:\n",
" - name: Efficiency times acceptance\n",
" value: Wprime --> WZ\n",
" - name: SQRT(S)\n",
" units: TeV\n",
" value: 13\n",
" values:\n",
" - value: 0.45136\n",
" - value: 0.5109\n",
" - value: 0.54016\n",
" - value: 0.5513\n",
" - value: 0.56724\n",
" - value: 0.5728\n",
" - value: 0.5856\n",
" - value: 0.58952\n",
" - value: 0.60324\n",
" - value: .nan\n",
" - value: 0.59978\n",
"independent_variables:\n",
"- header:\n",
" name: Resonance mass\n",
" units: GeV\n",
" values:\n",
" - value: 1000.0\n",
" - value: 1200.0\n",
" - value: 1400.0\n",
" - value: 1600.0\n",
" - value: 1800.0\n",
" - value: 2000.0\n",
" - value: 2500.0\n",
" - value: 3000.0\n",
" - value: 3500.0\n",
" - value: 4000.0\n",
" - value: 4500.0\n"
]
}
],
"source": [
"!cat example_output/additional_figure_1.yaml"
]
}
],
"metadata": {
@@ -390,7 +536,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.10.12"
+"version": "3.12.2"
}
},
"nbformat": 4,
77 changes: 69 additions & 8 deletions hepdata_lib/__init__.py
@@ -55,7 +55,8 @@ def __init__(self):
         self.files_to_copy = []
         self.additional_resources = []

-    def add_additional_resource(self, description, location, copy_file=False, file_type=None):
+    def add_additional_resource(self, description, location, copy_file=False, file_type=None,
+                                resource_license=None):
         """
         Add any kind of additional resource.
         If copy_file is set to False, the location and description will be added as-is.
@@ -80,8 +81,13 @@ def add_additional_resource(self, description, location, copy_file=False, file_t

         :param file_type: Type of the resource file. Currently, only "HistFactory" has any effect.
         :type file_type: string
+
+        :param resource_license: License information comprising name, url and optional description.
+        :type resource_license: dict
         """

+        #pylint: disable=too-many-arguments
+
         resource = {}
         resource["description"] = description
         if copy_file:
@@ -95,6 +101,28 @@ def add_additional_resource(self, description, location, copy_file=False, file_t
         if file_type:
             resource["type"] = file_type

+        # Confirm that license does not contain extra keys,
+        # and has the mandatory name and url values
+        if resource_license:
+
+            if not isinstance(resource_license, dict):
+                raise ValueError("resource_license must be a dictionary.")
+
+            # Get the license dict keys as a set
+            license_keys = set(resource_license.keys())
+
+            # Create sets for both possibilities
+            mandatory_keys = {"name", "url"}
+            all_keys = mandatory_keys.union(["description"])
+
+            # If license matches either of the correct values
+            if license_keys in (mandatory_keys, all_keys):
+                resource["license"] = resource_license
+            else:
+                raise ValueError("Incorrect resource_license format: "
+                                 "resource_license must be a dictionary containing a "
+                                 "name, url and optional description.")
+
         self.additional_resources.append(resource)

     def copy_files(self, outdir):
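
The key check in the hunk above can be tried in isolation. The following is a rough standalone sketch, not the library code itself: `check_resource_license` and `is_valid` are hypothetical names, and the error text is shortened. It keeps the same exact-set comparison, so both missing and unexpected keys are rejected.

```python
def check_resource_license(resource_license):
    """Return the license dict if its keys are valid, None if nothing was given."""
    # Falsy input (None or an empty dict) is treated as "no license"
    if not resource_license:
        return None
    if not isinstance(resource_license, dict):
        raise ValueError("resource_license must be a dictionary.")

    mandatory_keys = {"name", "url"}
    all_keys = mandatory_keys | {"description"}

    # Exact set comparison: missing and unexpected keys are both rejected
    if set(resource_license) not in (mandatory_keys, all_keys):
        raise ValueError("resource_license needs name, url and optional description.")
    return resource_license


def is_valid(candidate):
    """Hypothetical convenience wrapper turning the ValueError into a bool."""
    try:
        check_resource_license(candidate)
    except ValueError:
        return False
    return True


print(is_valid({"name": "CC BY 4.0", "url": "https://creativecommons.org/licenses/by/4.0/"}))
print(is_valid({"name": "CC BY 4.0"}))
print(is_valid({"name": "x", "url": "y", "shortname": "z"}))
```

Note that only the keys are compared, so a dictionary with empty-string values would pass this check, whereas `add_data_license` on a table rejects an empty `name` or `url`.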
@@ -308,6 +336,7 @@ def __init__(self, name):
         self.location = "Example location"
         self.keywords = {}
         self.image_files = set()
+        self.data_license = {}

     @property
     def name(self):
@@ -365,6 +394,34 @@ def add_related_doi(self, doi):
         else:
             raise ValueError(f"DOI does not match the correct pattern: {pattern}.")

+    def add_data_license(self, name, url, description=None):
+        """
+        Verify and store the given license data.
+
+        :param name: The license name
+        :type name: string
+        :param url: The license URL
+        :type url: string
+        :param description: The (optional) license description
+        :type description: string
+        """
+        license_data = {}
+
+        if name:
+            license_data["name"] = name
+        else:
+            raise ValueError("You must insert a value for the license's name.")
+
+        if url:
+            license_data["url"] = url
+        else:
+            raise ValueError("You must insert a value for the license's url.")
+
+        if description:
+            license_data["description"] = description
+
+        self.data_license = license_data
+
     def write_output(self, outdir):
         """
         Write the table files into the output directory.
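
To see how `add_data_license` behaves without installing the package, here is a small sketch using a stand-in class. `LicensedTable` is hypothetical and keeps only the license logic from the hunk above; it is not the real `Table`.

```python
class LicensedTable:
    """Hypothetical stand-in for Table, keeping only the license logic."""

    def __init__(self):
        self.data_license = {}

    def add_data_license(self, name, url, description=None):
        # Empty or missing mandatory values are rejected up front
        if not name:
            raise ValueError("You must insert a value for the license's name.")
        if not url:
            raise ValueError("You must insert a value for the license's url.")
        license_data = {"name": name, "url": url}
        if description:
            license_data["description"] = description
        # Assignment, not update: a second call replaces the first license
        self.data_license = license_data


table = LicensedTable()
table.add_data_license(
    "CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/", "Attribution required."
)
table.add_data_license("CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/")
print(table.data_license)
```

Because the method assigns a fresh dictionary each time, the description from the first call does not survive the second one.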
@@ -471,11 +528,14 @@ def write_yaml(self, outdir="."):
         submission["name"] = self.name
         submission["description"] = self.description
         submission["location"] = self.location
-        submission["related_to_table_dois"] = self.related_tables
+        if self.related_tables:
+            submission["related_to_table_dois"] = self.related_tables
         submission["data_file"] = f'{shortname}.yaml'
         submission["keywords"] = []
         if self.additional_resources:
             submission["additional_resources"] = self.additional_resources
+        if self.data_license:
+            submission["data_license"] = self.data_license

         for name, values in list(self.keywords.items()):
             submission["keywords"].append({"name": name, "values": values})
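
The effect of the new `if` guards is that optional keys no longer appear in the table entry when they are empty. A simplified sketch of that pattern follows; `build_table_entry` is a hypothetical function that covers only a few of the fields written by `write_yaml`.

```python
def build_table_entry(name, data_file, related_tables=(), additional_resources=(),
                      data_license=None):
    """Assemble a table entry, skipping optional keys that have no content."""
    entry = {"name": name, "data_file": data_file, "keywords": []}
    # Each optional key is written only when its value is non-empty
    if related_tables:
        entry["related_to_table_dois"] = list(related_tables)
    if additional_resources:
        entry["additional_resources"] = list(additional_resources)
    if data_license:
        entry["data_license"] = dict(data_license)
    return entry


plain = build_table_entry("Additional Figure 1", "additional_figure_1.yaml")
licensed = build_table_entry(
    "Additional Figure 1",
    "additional_figure_1.yaml",
    data_license={"name": "CC BY 4.0", "url": "https://creativecommons.org/licenses/by/4.0/"},
)
print(sorted(plain))
print(sorted(licensed))
```

This matches the notebook output earlier in the diff, where the table document carries no `data_license` or `related_to_table_dois` key because neither was set.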
@@ -509,11 +569,11 @@ def __init__(self):
     def get_license():
         """Return the default license."""
         data_license = {}
-        data_license["name"] = "cc-by-4.0"
-        data_license["url"] = "https://creativecommons.org/licenses/by/4.0/"
-        data_license[
-            "description"] = "The content can be shared and adapted but you must\
-            give appropriate credit and cannot restrict access to others."
+        data_license["name"] = "CC0"
+        data_license["url"] = "https://creativecommons.org/publicdomain/zero/1.0/"
+        data_license["description"] = (
+            "CC0 enables reusers to distribute, remix, adapt, and build upon the material "
+            "in any medium or format, with no conditions.")
         return data_license

     def add_table(self, table):
@@ -612,7 +672,8 @@ def create_files(self, outdir=".", validate=True, remove_old=False):
         submission = {}
         submission["data_license"] = self.get_license()
         submission["comment"] = self.comment
-        if self.related_records:
+        submission["related_to_hepdata_records"] = self.related_records
+        if self.related_records:
+            submission["related_to_hepdata_records"] = self.related_records

         if self.additional_resources:
             submission["additional_resources"] = self.additional_resources
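
With this change the submission-level document always carries the default CC0 license, while a table document carries a `data_license` only if one was added. How a consumer combines the two is not shown in this PR; the sketch below assumes the table-level value, when present, takes precedence over the submission-wide default. `effective_license` is a hypothetical helper and the dictionaries are abbreviated.

```python
# Abbreviated copy of the default returned by get_license() in the diff above
DEFAULT_LICENSE = {
    "name": "CC0",
    "url": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def effective_license(table_entry, submission_entry):
    """Assumed rule: a table's own license wins, else the submission-wide one."""
    return table_entry.get("data_license") or submission_entry.get("data_license")


submission_entry = {"data_license": DEFAULT_LICENSE, "comment": "example"}
table_without = {"name": "Additional Figure 1"}
table_with = {
    "name": "Additional Figure 2",
    "data_license": {
        "name": "CC BY 4.0",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
}

print(effective_license(table_without, submission_entry)["name"])
print(effective_license(table_with, submission_entry)["name"])
```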