Develop project vault iceberg #89

Open
wants to merge 346 commits into
base: develop
from

Conversation

MichaelTiemannOSC
Contributor

This PR augments the existing ITR prototype by adding a connection to the Data Commons. It preserves and incorporates the Pint Units enhancements recently added to the develop branch. To test, this PR requires access to credentials for certain GitHub user identities. We really need to sort out a better way to integrate credential management with CI/CD testing.

Happy to answer any questions...

@MichaelTiemannOSC MichaelTiemannOSC added this to the NZAOA Demo milestone May 5, 2022
@MichaelTiemannOSC
Contributor Author

I can run the tests as me with my dotenv permissions, but the data vault requires special handling (as mentioned above). Help please ;-)

ImportError: Failed to import test module: test_vault_providers
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.12/x64/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/opt/hostedtoolcache/Python/3.9.12/x64/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/home/runner/work/ITR/ITR/test/test_vault_providers.py", line 13, in <module>
    from ITR.data.vault_providers import VaultCompanyDataProvider, VaultProviderProductionBenchmark, \
  File "/home/runner/work/ITR/ITR/ITR/data/vault_providers.py", line 3, in <module>
    from dotenv import load_dotenv
ModuleNotFoundError: No module named 'dotenv'
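
The traceback shows the whole test module failing at import time because `dotenv` is not installed on the runner. Apart from adding the package to the requirements, one option is to let the test suite skip cleanly when the package is absent. A minimal sketch, assuming `dotenv` is the import name provided by python-dotenv; the class name `TestVaultProvidersSketch` is hypothetical and is not the PR's actual test class:

```python
import importlib.util
import unittest

# True only when the "dotenv" module can be imported on this machine.
HAVE_DOTENV = importlib.util.find_spec("dotenv") is not None


@unittest.skipUnless(HAVE_DOTENV, "python-dotenv is not installed")
class TestVaultProvidersSketch(unittest.TestCase):
    """Hypothetical stand-in for the vault provider tests."""

    def test_dotenv_importable(self):
        # Imported lazily so that a missing package yields a skip
        # instead of an ImportError during test discovery.
        from dotenv import load_dotenv

        self.assertTrue(callable(load_dotenv))
```

With this shape, `unittest` discovery reports the class as skipped rather than aborting on `ModuleNotFoundError`. It does not help if the module under test itself imports `dotenv` unconditionally, which is the case in the traceback above.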

MichaelTiemannOSC and others added 26 commits February 15, 2023 12:36
Many companies report S1-only targets.  We should handle that, and translate to S1S2 according to methodology.

Signed-off-by: Michael Tiemann <[email protected]>
Added mentions for pint and pint-pandas, as well as latest pandas.

Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: David Kroon <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…ctor

Signed-off-by: David Kroon <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Added openscm-units

Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: David Kroon <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: David Kroon <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: David Kroon <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Provide target projections based on S1 and S2 scopes, not only S1S2.  Also fix base year data exclusion bug (we don't have to abandon projection if base year == last_year).

Also, simplify input data as we have not yet implemented  everything described in #32

Signed-off-by: Michael Tiemann <[email protected]>
Fix more data errors...now somewhat demoable (but strange results when target values are well above existing attainment, such as Duke Energy).

Signed-off-by: Michael Tiemann <[email protected]>
Update openpyxl version number.

Signed-off-by: Michael Tiemann <[email protected]>
The current benchmark data treats Asia as "Global" but that doesn't mean we cannot properly list Asia as a distinct region for display and aggregation purposes.  Accordingly, change POSCO's region to Asia.

Signed-off-by: Michael Tiemann <[email protected]>
Convert to base units before calculating magnitude (need to check elsewhere for this error!) and clamp CAGR to non-positive result.

Signed-off-by: Michael Tiemann <[email protected]>
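
To illustrate the commit above (convert to base units before taking the magnitude, then clamp CAGR to a non-positive result), here is a minimal sketch using plain floats. It is not the ITR implementation, which works with Pint quantities; the conversion table `TO_BASE` and both function names are assumptions made for illustration:

```python
# Hypothetical conversion factors to a single base unit (t CO2 per MWh).
TO_BASE = {
    "t CO2/MWh": 1.0,
    "kg CO2/MWh": 1e-3,
    "t CO2/GWh": 1e-3,
}


def to_base_magnitude(value: float, unit: str) -> float:
    """Return the bare number expressed in the base unit."""
    return value * TO_BASE[unit]


def clamped_cagr(first: tuple, last: tuple, years: int) -> float:
    """Compound annual growth rate between two (value, unit) pairs.

    Both endpoints are converted to the base unit first, so that a unit
    mismatch cannot masquerade as growth. The result is clamped to <= 0.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    start = to_base_magnitude(*first)
    end = to_base_magnitude(*last)
    if start <= 0 or end < 0:
        raise ValueError("start must be positive and end non-negative")
    cagr = (end / start) ** (1.0 / years) - 1.0
    return min(cagr, 0.0)
```

Comparing the raw magnitudes of `1.0 t CO2/MWh` and `500.0 kg CO2/MWh` would look like a 500-fold increase; after conversion it is a halving, and the rate comes out negative as expected.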
There was a bug in how EITargetProjector::project_ei_targets was projecting target data to 2050 absent specific targets with that as an explicit target year.  These changes fix that bug, as well as enabling the functionality of using the netzero_year field of the input template.

Also update template to use netzero_year instead of netzero_date.

Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…nups)

Other cleanups include:

Let DataWarehouse call _calculate_target_projections so user doesn't have to worry about it.

Fix more spellings of emission_ to emissions_

When creating the one-row company_sector_region_info DataFrame, don't just initialize with singleton elements; put those elements into lists (so we can pass a Quantity as an ExtensionArray instead of having it be seen as a dict).

Comment highly suspect declaration of projected_targets, which are available in the base class of ICompanyAggregates

Signed-off-by: Michael Tiemann <[email protected]>
Connected to previous checkin: construct company_sector_region_info DataFrame using dictionary of [] not singleton elements to make Quantity work as ExtensionArray instead of dict.

Signed-off-by: Michael Tiemann <[email protected]>
Modify the original GUI app to work with new unitized ITR backend:

* Added unitized JSON files
* Use new initialization procedures for data Template
* Unitize quantities within the GUI, such as specific temperature score values.

Not fully working: a graph of production output wrongly tries to mix Steel production numbers (Fe_ton) with Electricity production numbers (TWh).  It's good that the unit code caught it!

Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
There was long-standing confusion about the meaning of GHG_SCOPE12 (which, when looked at through one functional path, seemed to depend first and only on production values, and when looked at other ways, seemed to represent emissions values).  It was finally determined that this was, indeed, an emissions-based quantity, and that the production value pathway fed a ratio calculation that resolved to a dimensionless quantity (so it could be calculated just as well from emissions).  In any case, these changes principally fix these and some other problems in the way various column names and variable names work and work together.

Signed-off-by: Michael Tiemann <[email protected]>
Updated to work with unit-aware code.

Signed-off-by: Michael Tiemann <[email protected]>
…hancements

Refactor _calculate_target_projections into BaseCompanyDataProvider and reorganize class definition order to accommodate.

Also fix some latent unit errors in excel.py and test_excel_provider.py resulting from GHG_SCOPE12 fixes.

Update quick example notebooks.

Signed-off-by: Michael Tiemann <[email protected]>
One more row of data!

Signed-off-by: Michael Tiemann <[email protected]>
MichaelTiemannOSC and others added 25 commits February 15, 2023 12:56
…, Chemicals, Textiles).

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…(using m**2 right now).

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…OECM benchmarks.

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
… guide S3 handling.

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…enchmark-ingest.

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
See #157 for long-term fix.

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…. Now ready for Real Estate!

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
… add.

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
…ata.

Signed-off-by: MichaelTiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Company ids are critical when converting from Excel to
ICompanyData.

If the input Excel is missing company ids, then the subsequent
conversion raises a hard-to-debug exception:

~~~~
ValueError: Shape of passed values is (265, 6), indices imply (260, 6)
~~~~

With this commit, we prevent further conversion and raise a
more understandable exception:

~~~~
ValueError: Missing company ids
~~~~
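
The early check this commit describes can be sketched as follows. This is a stdlib-only illustration over a list of row dictionaries rather than the DataFrame the ITR code operates on; the function names and the `company_id` column name are assumptions:

```python
import math


def is_blank(value) -> bool:
    """True for None, NaN, or a string that is empty after stripping."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def check_company_ids(rows, id_column="company_id"):
    """Fail fast, before any conversion, if a row lacks a company id."""
    if any(is_blank(row.get(id_column)) for row in rows):
        raise ValueError("Missing company ids")
```

Running the check before building the model objects means the user sees `Missing company ids` instead of a shape mismatch raised several layers further down.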

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
If the input Excel is missing a company name during the call to
`_company_df_to_model`, the error log message says:

~~~~
ERROR - (One of) the input(s) of company <NA> is invalid
~~~~

This message is not very helpful when looking for an invalid
row in the Excel.

This commit prints the company id instead, to make a problematic
row easy to find when the company name is missing:

~~~~
ERROR - (One of) the input(s) of company with ID US00130H1059 is invalid
~~~~

Company ids are used in earlier steps of Excel validation.
Also, earlier steps validate that company ids exist for all rows.
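
A minimal sketch of the message change, assuming the ID is always what gets reported; the function name `invalid_input_message` and the logger name are hypothetical, and the real logic lives in `_company_df_to_model`:

```python
import logging

logger = logging.getLogger("itr_sketch")  # hypothetical logger name


def invalid_input_message(company_id: str) -> str:
    """Build the error text keyed on the company id, which earlier
    validation steps guarantee to be present for every row."""
    return f"(One of) the input(s) of company with ID {company_id} is invalid"


def report_invalid_company(company_id: str) -> None:
    # The id is safe to print even when the company name is <NA>.
    logger.error(invalid_input_message(company_id))
```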

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
For this particular benchmark, only S3 matters.
In the following commits, we use the absence of S1S2
as an indicator of S3-scope calculations.

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
The scope to calculate is usually S1S2, unless the benchmark doesn't specify it.
The second candidate is S3.
If the scope to calculate is S3, `DataWarehouse` shouldn't merge S3 into S1S2.
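
The selection rule in this commit can be sketched as below. `EScope` appears elsewhere in this PR, but the member list here, the two function names, and the idea of passing the benchmark's scopes as a set are assumptions for illustration only:

```python
from enum import Enum


class EScope(Enum):
    # Member list is an assumption; only S3 and S1S2 are needed below.
    S1 = "S1"
    S2 = "S2"
    S1S2 = "S1S2"
    S3 = "S3"


def scope_to_calc(benchmark_scopes) -> EScope:
    """Prefer S1S2; fall back to S3 when the benchmark lacks S1S2."""
    for candidate in (EScope.S1S2, EScope.S3):
        if candidate in benchmark_scopes:
            return candidate
    raise ValueError("benchmark defines neither S1S2 nor S3")


def should_merge_s3_into_s1s2(scope: EScope) -> bool:
    """S3 must stay separate when it is the scope being calculated."""
    return scope is not EScope.S3
```

Keeping the decision in one function lets the benchmark, the `TemperatureScore` object, and the GUI table all agree on which scope is in play.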

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Modify production benchmark input to `AnyScope`

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Add support for `scope` argument where necessary.
Give benchmark scope_to_calc as argument.

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Use benchmark scope_to_calc for:

* creation of `TemperatureScore` object
* selection of scope content in the table in GUI

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
This allows us to calculate temperature scores for companies
that provided S3 data.
Otherwise, we get the exception
"The value for S3 is missing for the following companies"

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
The change only touches the GUI, where the term 'scenario' was used
for the Emissions Intensity benchmark.

This could confuse users.

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
GitHub token was required for this notebook, but never used.
At the same time, not all ITR users have a GitHub token,
which made this notebook not executable for them.

When I removed it, its absence didn't affect execution
of the notebook, which means that it can be removed safely,
making the notebook more accessible for ITR users.

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
This workaround resolves the exception:

~~~~
KeyError: "[('Europe', 'Construction Buildings', <EScope.S3: 'S3'>)]
not in index"
~~~~

TODO: Remove this workaround when the associated EI benchmarks become available

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
The notebook separately loads EI benchmarks for S1S2 and S3,
and separately calculates temperature scores for selected scopes.

The output is 2 separate tables with separately calculated scores

Signed-off-by: Kirill Marinushkin <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
This also includes an update to the organization name for pip-audit.
"trailofbits" redirects to pypa now, and while this is functional for
now, having the current name decreases the likelihood of problems
down the line.

Signed-off-by: Eric Ball <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
@MichaelTiemannOSC MichaelTiemannOSC force-pushed the develop-project-vault-iceberg branch from b732e52 to 6d4cfa7 Compare February 15, 2023 18:29
@MichaelTiemannOSC
Contributor Author

The merge above was done hastily to fix long-standing DCO problems. It almost certainly won't run, but it will create a workable baseline from which the code can be resurrected.

@dp90
Collaborator

dp90 commented Feb 16, 2023

Some of the current errors I was able to resolve by installing some missing packages (added those to requirements.txt).

Regarding the credentials, I think a common practice is to include them as GitHub secrets (under the repo settings in Secrets and variables -> Actions). In the CI/CD testing we could then add a step to the workflow .yaml file such as

    - name: Create .env file
      run: |
        echo "${{ secrets.ENV_FILE }}" > .env

The ENV_FILE secret would hold the environment variables in the same format as they are locally. What are your thoughts?
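
On the test side, a complementary guard could skip the vault tests whenever the credentials are not present in the environment. A minimal sketch; the variable names `VAULT_USER` and `VAULT_TOKEN` are placeholders, since this PR does not list the settings the vault actually needs, and `VaultSmokeTest` is a hypothetical class:

```python
import os
import unittest

# Placeholder names; replace with whatever the .env file really defines.
REQUIRED_VARS = ("VAULT_USER", "VAULT_TOKEN")


def missing_credentials(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


@unittest.skipIf(missing_credentials(), "vault credentials not configured")
class VaultSmokeTest(unittest.TestCase):
    """Hypothetical test class that only runs with credentials present."""

    def test_credentials_present(self):
        self.assertEqual(missing_credentials(), [])
```

This matters because GitHub does not pass repository secrets to workflows triggered by pull requests from forks, so the `.env` file would be empty there and the tests would otherwise fail rather than skip.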

Signed-off-by: David Kroon <[email protected]>
10 participants