Create credible temp score using RMI data #16
base: develop
Also disable Steel sector, for which RMI provides no data.
The previous S1+S2 did not correct MMT vs MT as the unit of measure for NOX emissions.
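A minimal sketch of the kind of unit normalization this fix implies, assuming "MT" means metric tons and "MMT" means million metric tons (the source does not spell the units out). The table and function names are hypothetical and are not part of the ITR code base.

```python
# Assumption: "MT" = metric tons, "MMT" = million metric tons.
# TO_METRIC_TONS and to_metric_tons are hypothetical names for illustration.
TO_METRIC_TONS = {"MT": 1.0, "MMT": 1_000_000.0}


def to_metric_tons(value: float, unit: str) -> float:
    """Convert an emissions value to metric tons, rejecting unknown units."""
    try:
        factor = TO_METRIC_TONS[unit.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown emissions unit: {unit!r}") from None
    return value * factor


# Two figures reported in different units become directly comparable.
nox_a = to_metric_tons(2.5, "MMT")
nox_b = to_metric_tons(1200.0, "MT")
```

Failing loudly on an unrecognized unit, rather than assuming a default, is what keeps a mismatch like this from passing silently.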
There appeared to be considerable ambiguity in how emissions vs. production data was being handled. This PR pervasively changes EScope (emissions scope) to PScope (projection scope); this can be renamed further if there's a better wording for it. Regardless of the EScope/PScope naming, there were several other changes needed to make production projections their own thing (and not confused with emissions projections). The to_numpy() casts in DataWarehouse were covering up an egregious error in keeping production projections and emissions/intensity projections aligned. The rows are no longer scrambled. Included are files based on original portfolio data but also based on real RMI data (using 2019 as a base year). Comments welcome!
As @BertKramer explained to me, it may be that I was overzealous in terms of preferring company production projections over sectoral benchmark projections. If that's the case, then this pull request can be greatly simplified to just the part that fixes the calculations in If, however, there always was a plan to use company production projections, as an option or as a preference, this pull request lays groundwork for that implementation.
ITR data pipeline now properly uses ISO3166 to put countries in the correct regions (with the help of ESSD's UN region definitions).
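As a rough sketch of the idea, a country-to-region lookup keyed on ISO 3166-1 alpha-2 codes might look like the following. The table excerpt, the region labels, the `"Global"` fallback, and the function name are all assumptions for illustration; they are not the ESSD/UN definitions the pipeline actually uses.

```python
# Hypothetical excerpt of an ISO 3166-1 alpha-2 -> region lookup table.
# The region labels are illustrative, not the ESSD/UN region definitions.
ISO_TO_REGION = {
    "US": "North America",
    "CA": "North America",
    "DE": "Europe",
    "FR": "Europe",
}


def region_for(country_code: str, default: str = "Global") -> str:
    """Return the region for an ISO alpha-2 code, or `default` if unmapped."""
    return ISO_TO_REGION.get(country_code.strip().upper(), default)


regions = [region_for(code) for code in ["us", "DE", "ZZ"]]
```

Normalizing the code before the lookup avoids misclassifying companies whose country field differs only in case or whitespace.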
I also talked to Bert Kramer about the production projections, and it seems best, for now, to not merge that part back, until the relevant methodology parts have been discussed.
Not sure what the easiest way is to merge part of this PR back, while keeping the remainder, but I'd be happy to help out in any way I can.
As I already started, I did finish the review of the entire PR, so hopefully it's still useful for future reference.
self.column_config.GHG_SCOPE12]]
ei_at_base = self._get_company_intensity_at_year(base_year, company_ids).rename(self.column_config.BASE_EI)
# print(f"BA: company_info.loc[] = {company_info.loc['US0185223007']}")
Remove commented code
get the projected productions for list of companies in ghg_scope12
:param ghg_scope12: DataFrame with at least the following columns :
ColumnsConfig.COMPANY_ID,ColumnsConfig.GHG_SCOPE12, ColumnsConfig.SECTOR and ColumnsConfig.REGION
get the projected productions for list of companies (PRODUCTIONS not S1S2)
"(PRODUCTIONS not S1S2)" might be superfluous
@@ -1,4 +1,4 @@
from abc import ABC
from abc import ABC # _project
Consider removing "# _project"
assert pd.Series(company_ids).isin(df_company_data.loc[:, self.column_config.COMPANY_ID]).all(), \
"some of the company ids are not included in the fundamental data"

company_info_at_base_year = self.company_data.get_company_intensity_and_production_at_base_year(company_ids)
# print(f"DW: company_info_at_base_year.loc[] = {company_info_at_base_year.loc['US0185223007']}")
Remove commented code
# print(f"BUDG:\n{df_company_data.loc[df_company_data.index<40,['company_id',self.column_config.CUMULATIVE_BUDGET]]}\n\n")
# print(f"CIABY:\n{company_info_at_base_year.loc[df_company_data.index<40,:]}\n\n")
# print(f"""SDA:\n{self.benchmarks_projected_emission_intensity.get_SDA_intensity_benchmarks(
# company_info_at_base_year).loc[df_company_data.index<40,:]}\n\n""")
Remove commented code
# print(projected_emission_intensity.index[0:3])
# print(projected_emission_intensity.iloc[0:3])
# print(projected_production.index[0:3])
# print(projected_production.iloc[0:3])
Remove commented code
@@ -30,7 +30,7 @@ def convert_benchmark_excel_to_model(df_excel: pd.DataFrame, sheetname: str, col
result.append(bm)
return IBenchmarks(benchmarks=result)

# ??? This duplicates info from
I don't think it duplicates anything: it refers to tabs of an Excel file rather than columns. We could consider moving it to configs.py.
The comment should be removed, I think.
use_S1S2 = (data[self.c.COLS.SCOPE] == EScope.S1S2) | (data[self.c.COLS.SCOPE] == EScope.S1S2S3)
use_S3 = (data[self.c.COLS.SCOPE] == EScope.S3) | (data[self.c.COLS.SCOPE] == EScope.S1S2S3)
use_S1S2 = (data[self.c.COLS.SCOPE] == PScope.S1S2) | (data[self.c.COLS.SCOPE] == PScope.S1S2S3)
use_S3 = (data[self.c.COLS.SCOPE] == PScope.S3) | (data[self.c.COLS.SCOPE] == PScope.S1S2S3)
I like what you did on line 95. We could consider doing the same here.
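The code on line 95 is not shown in this thread, so the following is only one possible simplification and not necessarily the same approach: the paired equality checks can be collapsed with `Series.isin`. The `PScope` enum and the `"scope"` column here are hypothetical stand-ins for the project's own definitions.

```python
from enum import Enum

import pandas as pd


class PScope(Enum):
    """Hypothetical stand-in for the project's scope enum."""
    S1S2 = "S1+S2"
    S3 = "S3"
    S1S2S3 = "S1+S2+S3"


# Hypothetical frame with one row per scope value.
data = pd.DataFrame({"scope": [PScope.S1S2, PScope.S3, PScope.S1S2S3]})

# Each mask is True where the row's scope is any of the listed members.
use_S1S2 = data["scope"].isin([PScope.S1S2, PScope.S1S2S3])
use_S3 = data["scope"].isin([PScope.S3, PScope.S1S2S3])
```

The masks are equivalent to the `==`/`|` form above, and adding another scope later means extending a list rather than chaining one more comparison.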
If I have read the code, the diagram, and the methodology documentation correctly, it appears that there was quite some ambiguity in how emissions vs. production data was being handled, and a variety of other problems hiding therein. A sign that something was wrong was temperature scores that ranged from a fraction of a degree to several hundred degrees(!).
This PR pervasively changes the EScope (emissions scope) type label to PScope (projection scope). This can be further renamed if another wording is preferred (or if there's actually a second branch of logic that we should fully pull apart from the main branch).
Regardless of the EScope/PScope naming, there were several other changes needed to make production projections their own thing (and not confused with emissions projections).
The to_numpy() casts in DataWarehouse were covering up an egregious error in keeping production projections and emissions/intensity projections aligned. The rows are no longer scrambled.
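To illustrate the class of bug described here, the sketch below uses made-up data (not the RMI files or the actual DataWarehouse code) to show how casting to NumPy drops the pandas index, so values are paired by position instead of by company.

```python
import pandas as pd

# Hypothetical data: the same three companies, stored in different row orders.
intensity = pd.Series([0.5, 0.2, 0.9], index=["A", "B", "C"], name="intensity")
production = pd.Series([300.0, 100.0, 200.0], index=["C", "A", "B"], name="production")

# Index-aligned multiplication pairs each company with its own production.
aligned = intensity * production

# Casting to NumPy discards the index, so values are paired by row position
# and company A's intensity is multiplied by company C's production.
scrambled = pd.Series(
    intensity.to_numpy() * production.to_numpy(), index=intensity.index
)

# Companies whose result differs between the two approaches.
mismatched = [cid for cid in intensity.index
              if abs(aligned[cid] - scrambled[cid]) > 1e-9]
```

The positional version raises no error and returns a plausible-looking result, which is why such a misalignment can go unnoticed until the downstream numbers look implausible.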
Included are files based on original portfolio data but also based on real RMI data (using 2019 as a base year). As you can see from the Notebook file, there are some US-based utilities that are not Paris-aligned (or the data we have doesn't show them in a very good light), but most of the temperature scores are in the 2-3 degree range, which is credible.
Comments welcome!