StEWI is a collection of Python modules that provide processed USEPA facility-based emission and waste generation inventory data in standard tabular formats. The standard outputs may be further aggregated or filtered based on given criteria, and can be combined based on common facilities and flows across the inventories.
StEWI consists of a core module, `stewi`, that digests and provides the USEPA inventory data in standard formats. Two matcher modules, `facilitymatcher` and `chemicalmatcher`, provide common IDs for facilities and flows across inventories, which are used by the `stewicombo` module to combine the data and, optionally, to remove overlaps and double counting of groups of chemicals based on user preferences.
StEWI v1 was peer-reviewed internally at USEPA and externally through Applied Sciences. An article describing StEWI was published in a special issue of Applied Sciences: Advanced Data Engineering for Life Cycle Applications.
Source | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
---|---|---|---|---|---|---|---|---|---|---|---|
Discharge Monitoring Reports* | x | x | x | x | x | x | x | x | x | x | x |
Greenhouse Gas Reporting Program | x | x | x | x | x | x | x | x | x | x | x |
Emissions & Generation Resource Integrated Database | x | x | x | x | x | x | |||||
National Emissions Inventory** | x | i | i | x | i | i | x | i | i | x | |
RCRA Biennial Report* | x | x | x | x | x | x | |||||
Toxic Release Inventory* | x | x | x | x | x | x | x | x | x | x | x |
*Earlier data exist and are accessible but have not been validated
**Only point sources from NEI are included at this time; years marked "i" are interim years between triennial releases, are accessed through the Emissions Inventory System, and are not validated
The core `stewi` module produces the following output formats:
- Flow-By-Facility: Each row represents the total amount of release or waste of a single type in a given year from the given facility.
- Flow-By-Process: Each row represents the total amount of release or waste of a single type in a given year from a specific process within the given facility. Applicable only to NEI and GHGRP.
- Facility: Each row represents a unique facility in a given inventory and given year.
- Flow: Each row represents a unique flow (substance or waste) in a given inventory and given year.
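The relationship between the Flow-By-Process and Flow-By-Facility formats can be illustrated with a minimal sketch. It is not StEWI code: the records, the field names (`FacilityID`, `Process`, `FlowName`, `Compartment`, `FlowAmount`), and the helper `to_flow_by_facility` are assumptions made for illustration, and the sketch only shows how process-level rows would roll up to one row per facility and flow.

```python
from collections import defaultdict

# Hypothetical flow-by-process records; field names and values are
# illustrative assumptions, not actual StEWI output.
flow_by_process = [
    {"FacilityID": "F1", "Process": "boiler", "FlowName": "Carbon dioxide",
     "Compartment": "air", "FlowAmount": 120.0},
    {"FacilityID": "F1", "Process": "turbine", "FlowName": "Carbon dioxide",
     "Compartment": "air", "FlowAmount": 30.0},
    {"FacilityID": "F2", "Process": "boiler", "FlowName": "Methane",
     "Compartment": "air", "FlowAmount": 4.5},
]


def to_flow_by_facility(records):
    """Sum process-level amounts into one row per facility, flow, and compartment."""
    totals = defaultdict(float)
    for record in records:
        key = (record["FacilityID"], record["FlowName"], record["Compartment"])
        totals[key] += record["FlowAmount"]
    return [
        {"FacilityID": facility, "FlowName": flow,
         "Compartment": compartment, "FlowAmount": amount}
        for (facility, flow, compartment), amount in sorted(totals.items())
    ]


flow_by_facility = to_flow_by_facility(flow_by_process)
for row in flow_by_facility:
    print(row)
```

The two process rows for facility F1 collapse into a single facility-level row, while F2 keeps its single row.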
The `chemicalmatcher` module produces:
- Chemical Matches: Each row provides a common identifier for an inventory flow chemical.

The `facilitymatcher` module produces:
- Facility Matches: Each row provides a common identifier for an inventory facility.

The `stewicombo` module produces:
- Flow-By-Facility Combined: Analogous to Flow-By-Facility, with chemical and facility matches added.
The following describes details of dataset access, processing, and validation for each inventory.
Processing of the DMR uses the custom search option of the Water Pollutant Loading Tool with the following parameters:
- Parameter grouping: On - applies a parameter grouping function to avoid double-counting loads for pollutant parameters that represent the same pollutant
- Detection limit: Half - sets all non-detects to ½ the detection limit
- Estimation: On - estimates loads when monitoring data are not reported for one or more monitoring periods in a reporting year
- Nutrient Aggregation: On - nitrogen and phosphorus flows are converted to N and P equivalents
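The "Detection limit: Half" setting is applied by the Water Pollutant Loading Tool itself, not by StEWI. The sketch below only illustrates the substitution rule under stated assumptions: the function name `effective_concentration`, the use of `None` to mark a non-detect, and the sample values are all hypothetical.

```python
def effective_concentration(measured, detection_limit):
    """Return the value used in a load estimate.

    Assumption for this sketch: a non-detect is represented by measured=None
    and is replaced with half of the detection limit.
    """
    if measured is None:
        return detection_limit / 2
    return measured


# Hypothetical (measured, detection_limit) pairs
samples = [(3.0, 0.5), (None, 0.5), (None, 2.0)]
values = [effective_concentration(m, dl) for m, dl in samples]
print(values)
```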
For validation, the sum of facility releases (excluding N & P) is compared against reported state totals. Some validation issues are expected due to differences in the default parameters used by the Water Pollutant Loading Tool for calculating state totals.
eGRID data are sourced from EPA's eGRID site. For validation, the sum of facility releases is compared against reported U.S. totals by flow.
GHGRP data are sourced from EPA's Envirofacts API. For validation, the sum of facility releases by subpart is compared against reported U.S. totals by subpart and flow. Validation totals for some flows (HFC, HFE, and PFCs) are reported in carbon dioxide equivalents. Mixed reporting of these flows in the source data, in units of either mass or carbon dioxide equivalents, results in validation issues.
NEI data are downloaded from the EPA Emissions Inventory System (EIS) Gateway and hosted on EPA Data Commons for access by StEWI. For validation, the sum of facility releases is compared against reported totals by flow. Validation is only available for triennial datasets.
RCRAInfo data are sourced from the Public Data Files. For validation, the sum of facility waste generation is compared against reported state totals as calculated for the National Biennial Report.
TRI data are sourced from the Basic Plus Data files. For validation, the sum of facility releases is compared to national totals by flow from the TRI Explorer.
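The validation approach described for each inventory follows a common pattern: sum the facility-level amounts and compare them with an independently reported total. The sketch below illustrates that pattern only. It is not StEWI's validation code; the function `validate_totals`, the field names, the sample data, and the 5% tolerance are assumptions chosen for illustration.

```python
from collections import defaultdict


def validate_totals(facility_records, reference_totals, tolerance_pct=5.0):
    """Compare summed facility amounts per flow with reference totals.

    tolerance_pct is an arbitrary threshold assumed for this sketch.
    """
    sums = defaultdict(float)
    for record in facility_records:
        sums[record["FlowName"]] += record["FlowAmount"]

    results = {}
    for flow, reference in reference_totals.items():
        inventory = sums.get(flow, 0.0)
        if reference == 0:
            # Avoid dividing by zero when the reference total is zero
            pct = 0.0 if inventory == 0 else float("inf")
        else:
            pct = 100.0 * (inventory - reference) / reference
        results[flow] = {
            "inventory": inventory,
            "reference": reference,
            "percent_difference": pct,
            "passes": abs(pct) <= tolerance_pct,
        }
    return results


# Hypothetical facility records and reported totals
records = [
    {"FacilityID": "F1", "FlowName": "Ammonia", "FlowAmount": 40.0},
    {"FacilityID": "F2", "FlowName": "Ammonia", "FlowAmount": 60.0},
    {"FacilityID": "F1", "FlowName": "Benzene", "FlowAmount": 45.0},
]
reported = {"Ammonia": 100.0, "Benzene": 50.0}

validation = validate_totals(records, reported)
for flow, result in validation.items():
    print(flow, result)
```

In this example the ammonia sum matches its reported total, while the benzene sum falls 10% short and is flagged under the assumed tolerance.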
The `stewicombo` module combines inventory data from within and across selected inventories by matching facilities in the Facility Registry Service and chemical flows using the Substance Registry Service.
If the `remove_overlap` parameter is set to True (default), `stewicombo` combines records using the following default logic:
- Records that share a common compartment, SRS ID, and FRS ID within an inventory are summed.
- Records that share a common compartment, SRS ID, and FRS ID across inventories are assessed by compartment preference (see `INVENTORY_PREFERENCE_BY_COMPARTMENT`).
- Additional steps are taken to avoid overlap of:
  - nutrient flow releases to water between the TRI and DMR
  - particulate matter releases to air reflecting PM < 10 and PM < 2.5 in the NEI
  - Volatile Organic Compound (VOC) releases to air for individually reported VOCs and grouped VOCs
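The first two rules above (summing within an inventory, then choosing one inventory per compartment) can be sketched as follows. This is not the `stewicombo` implementation: the `PREFERENCE` mapping holds made-up values standing in for `INVENTORY_PREFERENCE_BY_COMPARTMENT`, the field names and the `combine` helper are hypothetical, and the additional nutrient, PM, and VOC steps are not covered.

```python
from collections import defaultdict

# Illustrative preference order only (an assumption); the actual values are
# defined by INVENTORY_PREFERENCE_BY_COMPARTMENT in stewicombo.
PREFERENCE = {"air": ["NEI", "TRI"], "water": ["DMR", "TRI"]}


def combine(records):
    """Sum records within each inventory, then keep the preferred inventory."""
    # Rule 1: sum records sharing facility, substance, compartment, and inventory
    within = defaultdict(float)
    for r in records:
        key = (r["FRS_ID"], r["SRS_ID"], r["Compartment"], r["Source"])
        within[key] += r["FlowAmount"]

    # Group the per-inventory sums by facility, substance, and compartment
    grouped = defaultdict(dict)
    for (frs, srs, compartment, source), amount in within.items():
        grouped[(frs, srs, compartment)][source] = amount

    # Rule 2: where several inventories overlap, keep the most preferred one
    combined = []
    for (frs, srs, compartment), by_source in sorted(grouped.items()):
        order = PREFERENCE.get(compartment, [])

        def rank(source):
            # Inventories missing from the preference list sort last
            return (order.index(source) if source in order else len(order), source)

        chosen = min(by_source, key=rank)
        combined.append({"FRS_ID": frs, "SRS_ID": srs, "Compartment": compartment,
                         "Source": chosen, "FlowAmount": by_source[chosen]})
    return combined


# Hypothetical records for one facility and one substance
records = [
    {"FRS_ID": "110000000001", "SRS_ID": "1", "Compartment": "air",
     "Source": "NEI", "FlowAmount": 5.0},
    {"FRS_ID": "110000000001", "SRS_ID": "1", "Compartment": "air",
     "Source": "NEI", "FlowAmount": 3.0},
    {"FRS_ID": "110000000001", "SRS_ID": "1", "Compartment": "air",
     "Source": "TRI", "FlowAmount": 10.0},
    {"FRS_ID": "110000000001", "SRS_ID": "1", "Compartment": "water",
     "Source": "TRI", "FlowAmount": 2.0},
]

combined = combine(records)
for row in combined:
    print(row)
```

Under the assumed preference order, the two NEI air records are summed and kept in place of the TRI air record, while the TRI water record is kept because no other inventory reports that flow to water.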
Install a release directly from GitHub using pip. From a command line interface, run:
pip install git+https://github.com/USEPA/standardizedinventories.git@v1.1.0#egg=StEWI
where you can replace 'v1.1.0' with the version you wish to use under Releases.
Alternatively, to install from the most current point on the repository:
git clone https://github.com/USEPA/standardizedinventories.git
cd standardizedinventories
pip install . # or pip install -e . for devs
To enable calculation and assignment of urban/rural secondary contexts, please refer to esupy's README.md for installation instructions, which may require a copy of the `env_sec_ctxt.yaml` file included here.
StEWI output for selected releases can be accessed without running StEWI. See the Data Product Links page for direct links to StEWI output files in Apache Parquet format.
See the Wiki for instructions on installation and use and for citation and contact information.
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.