2 script to convert sr14 land use files to sr15 #9
66e8af1
15c5c9b
de7c12a
c3c5765
a3760cd
cb9ff37
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# 1.0 Introduction
The _conversion_sr14_sr15_ script converts Series 14 (SR14) data, used primarily in ABM2+, into Series 15 (SR15) format, which is used in ABM3. It adjusts SR14 land use and synthetic population files to match SR15 MGRA boundaries and uses existing SR15 land use data to estimate SR15-equivalent land use columns. The processing procedures are detailed in the following section.

# 2.0 Methodology
## 2.1 Input Files
Table 1 lists the input files required to run the conversion script and their purpose in the process. All files must be for the same scenario year. For clarity, SR15 refers to the converted SR14 file (the output of this script), while ABM3 refers to the existing ABM3 land use input file (in SR15 format) that is used in the conversion process.

**Table 1. Input Files**
|File Name|Purpose|
|---|---|
|SR14 synthetic household file|Recoded to SR15 MGRAs, with fields renamed to be consistent with ABM3 field names. Also summarized for household totals to be used in the converted MGRA data.|
|SR14 synthetic person file|Fields renamed to be consistent with ABM3 field names. Summarized for person totals to be used in the converted MGRA data.|
|SR14 land use file|Source of employment and income to be used in the converted MGRA data file.|
|ABM3 land use|Source of SR15 MGRAs, TAZs, acres, park space, parking costs, and a few other fields held constant in the converted MGRA data.|
|MGRA SR14 to SR15 crosswalk|Crosswalk used for the conversion. Created via a separate process in which the centroid of each SR14 MGRA was located within the polygons of the SR15 MGRAs.|

## 2.2 Data Processing
### Households and Persons
The synthetic household and person files undergo minimal processing. For the household file, the MGRA column is updated using the SR14 to SR15 crosswalk and the column “hworkers” is renamed to “num_workers”. For the persons file, the column ‘miltary’ is renamed to ‘military’, and the columns ‘indcen’, ‘weeks’, ‘hours’, ‘rac1p’, ‘hisp’, and ‘version’ are removed because they are not used in the modeling process. These files are then used to calculate household and person totals in the converted SR15 land use file.
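
The household and person steps above can be illustrated on single records. This is a minimal plain-Python sketch rather than the script's pandas code: the crosswalk values, the `perid` field, and the helper names `convert_household` and `convert_person` are hypothetical, while the renamed and dropped column names come from the text.

```python
# Toy crosswalk: SR14 MGRA id -> SR15 MGRA id (hypothetical values).
XWALK = {101: 9001, 102: 9001, 103: 9002}

# Person columns that the text says are removed.
DROPPED_PERSON_COLS = ("indcen", "weeks", "hours", "rac1p", "hisp", "version")


def convert_household(row, xwalk):
    """Recode the MGRA and rename hworkers -> num_workers for one record."""
    out = dict(row)
    # Ids that are missing from the crosswalk are left unchanged in this sketch.
    out["mgra"] = xwalk.get(out["mgra"], out["mgra"])
    out["num_workers"] = out.pop("hworkers")
    return out


def convert_person(row):
    """Drop the unused columns and fix the 'miltary' spelling for one record."""
    out = {k: v for k, v in row.items() if k not in DROPPED_PERSON_COLS}
    out["military"] = out.pop("miltary")
    return out


household = {"hhid": 1, "mgra": 102, "hworkers": 2}
person = {"hhid": 1, "perid": 1, "miltary": 0, "indcen": 0, "weeks": 1,
          "hours": 40, "rac1p": 1, "hisp": 1, "version": 0}

print(convert_household(household, XWALK))
print(convert_person(person))
```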

### Land Use
The land use file undergoes extensive processing. One of three procedures is applied, depending on the column:
1) The converted SR14 synthetic population is used to populate the household and person total columns

2) Some ABM3 columns are maintained in the output MGRA file exactly as they are specified in the input file

3) Some of the ABM3 columns are set based on the SR14 MGRA data file. In some cases, these are based on distributions obtained from the existing ABM3 (SR15-format) input data.

Note that during the conversion process values may be rounded to match the ABM3 file format, which causes some loss of accuracy and prevents the output SR15 totals from exactly matching the ABM3 totals. Table 2 below details the processing procedures and notes the columns affected by this rounding error. In the table, the term “converted” refers to data from the SR14 input file that has been converted to SR15 MGRAs using the crosswalk file. For example, the converted SR14 emp_total (total employment) field was created by merging the crosswalk file with the SR14 MGRA data file, summing emp_total by SR15 MGRA, and then merging that result with the full list of SR15 MGRAs so that SR15 MGRAs with no SR14 employment have an emp_total of 0.
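
The emp_total example can be traced with a small plain-Python sketch. The MGRA ids and values are made up, `convert_total` is a hypothetical helper name, and the sketch assumes every SR14 MGRA appears in the crosswalk; the script itself performs the equivalent steps with pandas.

```python
from collections import defaultdict


def convert_total(sr14_values, xwalk, sr15_mgras):
    """Sum an SR14 column by SR15 MGRA; SR15 MGRAs with no match get 0."""
    totals = defaultdict(int)
    for sr14_mgra, value in sr14_values.items():
        totals[xwalk[sr14_mgra]] += value
    # Report every SR15 MGRA, including those with no SR14 source rows.
    return {mgra: totals.get(mgra, 0) for mgra in sr15_mgras}


# Hypothetical inputs: two SR14 MGRAs fall inside SR15 MGRA 9001.
sr14_emp_total = {101: 10, 102: 5, 103: 7}
xwalk = {101: 9001, 102: 9001, 103: 9002}
sr15_mgras = [9001, 9002, 9003]

converted_emp_total = convert_total(sr14_emp_total, xwalk, sr15_mgras)
print(converted_emp_total)
```

In this toy case SR15 MGRA 9001 receives 15, MGRA 9002 receives 7, and MGRA 9003, which has no SR14 source, receives 0.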

**Table 2: Converted Series 15 Land Use File Fields and How They Are Calculated**
| Column Name | Calculation Procedure |
|---|---|
|mgra, taz, luz_id, pseudomsa, zip09, parkactive, openspaceparkpreserve, beachactive, district27, milestocoast, acres, land_acres, effective_acres, truckregiontype, remoteAVParking, refueling_stations, MicroAccessTime, microtransit, ech_dist, hch_dist, nev|Transferred directly from the ABM3 land use file for each MGRA.|
|pop, hhp, gq_civ, gq_mil|Calculated from the converted synthetic population. The ‘military’ column of the persons file was used to determine whether a person was gq_mil.|
|hh, hh_sf, hh_mf, hh_mh|Calculated from the converted synthetic population. The ‘bldgsz’ column of the households file was used to determine the type of household.|
|hhs|Calculated as hhp/hh (set to 0 where hh is 0).|
|i1,i2,…,i10|ABM3 input data is used to calculate the ratio of each income column to the number of households (hh). These ratios are then used to create the new income groups in the output SR15 file. For example, from the ABM3 file, we calculate i1_share = i1/hh. We then multiply hh obtained from the converted synthetic population file by i1_share to calculate the final SR15 i1 values. Due to rounding errors, the sum of all income columns may not exactly equal hh in each MGRA.|
|emp_total, all employment categories|Employment categories are calculated by applying the share of each employment category at the TAZ level to the scaled SR14 employment totals, as follows: <br><br>1. The total employment scaling factor is calculated as: <br>scaling factor = sum(ABM3 emp_total) / sum(SR14 emp_total)<br><br>2. SR15 emp_total = converted SR14 emp_total * scaling factor<br><br>3. ABM3 employment categories are aggregated to the TAZ level and the share of each employment category is calculated as: <br>ABM3_category_share = emp_category/emp_total<br><br>4. The share is applied to all MGRAs in the same TAZ in the SR15 output file: SR15 emp_category = ABM3_category_share * SR15 emp_total<br><br>5. Final employment values are rounded to zero decimals.<br><br>6. SR15 emp_total is recalculated to maintain internal consistency and correct rounding errors: SR15 emp_total = sum(all SR15 employment categories)<br><br>Due to rounding errors from step 5, the sum of the SR15 output emp_total does not exactly match the sum of the ABM3 emp_total.|
|hs, hs_sf, hs_mf, hs_mh, enrollgradekto8, enrollgrade9to12, collegeenroll, othercollegeenroll, hotelroomtotal|Obtained by applying the MGRA crosswalk to the SR14 land use file and summing the rows that fall in the same SR15 MGRA.|
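
The income row of Table 2 can be worked through for a single MGRA. This is a plain-Python sketch under stated assumptions: the numbers are invented, only three income columns are used to keep it short, and `income_columns` is a hypothetical helper name.

```python
def income_columns(abm3_row, hh_converted, income_cols):
    """Apply ABM3 income shares (iN / hh) to the converted household count."""
    hh_abm3 = abm3_row["hh"]
    result = {}
    for col in income_cols:
        # A share of 0 is used when the ABM3 MGRA has no households.
        share = abm3_row[col] / hh_abm3 if hh_abm3 != 0 else 0
        result[col] = round(hh_converted * share)
    return result


# Hypothetical ABM3 row: 3 households split evenly over three income groups.
abm3_row = {"hh": 3, "i1": 1, "i2": 1, "i3": 1}
converted = income_columns(abm3_row, hh_converted=10, income_cols=["i1", "i2", "i3"])
print(converted, sum(converted.values()))
```

Each share is 1/3, so each column becomes round(3.33) = 3 and the three columns sum to 9 rather than the 10 converted households, which is the rounding discrepancy the table describes.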
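
The six employment steps in Table 2 can be sketched in plain Python as follows. The category names `emp_ret` and `emp_off`, the ids, and the helper name `employment_columns` are illustrative assumptions, and the scaling factor here uses the sum of the converted totals, which equals the SR14 sum only when every SR14 MGRA maps to an SR15 MGRA.

```python
from collections import defaultdict


def employment_columns(abm3_rows, converted_emp_total, categories):
    """Return one dict per SR15 MGRA with category values and emp_total."""
    # Step 1: regional scaling factor.
    abm3_sum = sum(row["emp_total"] for row in abm3_rows)
    sr14_sum = sum(converted_emp_total.values())
    factor = abm3_sum / sr14_sum

    # Step 3: aggregate ABM3 employment to the TAZ level.
    taz_totals = defaultdict(lambda: defaultdict(float))
    for row in abm3_rows:
        for col in categories + ["emp_total"]:
            taz_totals[row["taz"]][col] += row[col]

    output = []
    for row in abm3_rows:
        # Step 2: scale the converted SR14 total for this MGRA.
        scaled_total = converted_emp_total.get(row["mgra"], 0) * factor
        taz = taz_totals[row["taz"]]
        out = {"mgra": row["mgra"]}
        for col in categories:
            share = taz[col] / taz["emp_total"] if taz["emp_total"] else 0
            # Steps 4 and 5: apply the TAZ share, then round to an integer.
            out[col] = round(share * scaled_total)
        # Step 6: recompute emp_total from the rounded categories.
        out["emp_total"] = sum(out[col] for col in categories)
        output.append(out)
    return output


abm3_rows = [
    {"mgra": 1, "taz": 10, "emp_total": 60, "emp_ret": 30, "emp_off": 30},
    {"mgra": 2, "taz": 10, "emp_total": 40, "emp_ret": 10, "emp_off": 30},
    {"mgra": 3, "taz": 20, "emp_total": 100, "emp_ret": 100, "emp_off": 0},
]
converted_emp_total = {1: 25, 2: 25, 3: 50}

result = employment_columns(abm3_rows, converted_emp_total, ["emp_ret", "emp_off"])
for row in result:
    print(row)
```

Note that both MGRAs in TAZ 10 receive the same category split (40% and 60%), because the shares are taken at the TAZ level rather than from each MGRA's own ABM3 values.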
## 2.3 Consistency Checks
Several checks were conducted to verify consistency across the converted synthetic population and land use files. Specifically, we checked the following in the converted SR15 land use file:
- Same number of MGRAs as the ABM3 land use file
- taz, luz_id, and other fields were unchanged from the ABM3 land use file
- Sum of pop = total population records in synthetic population
- Sum of hh = total household records in synthetic population
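
One way the four checks listed above could be expressed is shown below. This is only a sketch on toy records held as lists of dicts: the helper name `check_consistency` and the choice of `taz` and `luz_id` as the compared fields are assumptions, and the PR does not show how the checks were actually run.

```python
def check_consistency(landuse_s15, landuse_abm3, households, persons,
                      fixed_cols=("taz", "luz_id")):
    """Return a list of human-readable problems; an empty list means all pass."""
    problems = []
    if len(landuse_s15) != len(landuse_abm3):
        problems.append("MGRA count differs from the ABM3 land use file")

    abm3_by_mgra = {row["mgra"]: row for row in landuse_abm3}
    for row in landuse_s15:
        ref = abm3_by_mgra.get(row["mgra"])
        if ref is None or any(row[c] != ref[c] for c in fixed_cols):
            problems.append(f"fixed fields changed for MGRA {row['mgra']}")

    if sum(row["pop"] for row in landuse_s15) != len(persons):
        problems.append("sum of pop does not match person records")
    if sum(row["hh"] for row in landuse_s15) != len(households):
        problems.append("sum of hh does not match household records")
    return problems


# Hypothetical toy data that satisfies every check.
landuse_abm3 = [{"mgra": 1, "taz": 10, "luz_id": 5},
                {"mgra": 2, "taz": 10, "luz_id": 5}]
landuse_s15 = [{"mgra": 1, "taz": 10, "luz_id": 5, "pop": 3, "hh": 1},
               {"mgra": 2, "taz": 10, "luz_id": 5, "pop": 0, "hh": 0}]
households = [{"hhid": 1, "mgra": 1}]
persons = [{"hhid": 1}, {"hhid": 1}, {"hhid": 1}]

print(check_consistency(landuse_s15, landuse_abm3, households, persons))
```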
Review comment: I checked the current MGRA based land use and synthetic population files and found the total population and household population are not equal to each other. Why do we need to have the checks (line 47 & 48)?

## 2.4 Output Files
The conversion script outputs three files: households, persons, and land use. All three files contain the columns required to run ABM3. Refer to the SANDAG ABM3 documentation at https://sandag.github.io/ABM/inputs.html for more information.

# 3.0 User Guide
To run the conversion, two files are required:
1. _config.yaml_: contains the input and output directories and file names
2. _conversion_sr14_sr15.py_: contains the code to convert the files

Users should clone (or download) these files to a local directory. Next, follow the steps below to convert Series 14 data to Series 15 format:
1. Save the input files in the directory of your choice. Input files should include:
   - Series 14 synthetic person file
   - Series 14 synthetic household file
   - Series 14 land use file
   - ABM3 land use file
   - SR14 to SR15 MGRA crosswalk
2. Open the config.yaml file and update the input and output directories as well as the input and output file names.

![image](images/config.png)

3. Open a terminal and navigate to the folder where conversion_sr14_sr15.py is saved.
Review comment: any specific environment needed?

4. Once in the folder, type the following: **python conversion_sr14_sr15.py**
5. When the conversion ends, the converted files will be saved in the specified output directory (line 13 of config.yaml).
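
Before step 4, it may help to confirm that every input file named in the configuration exists. The sketch below assumes the configuration has already been parsed into a dictionary with the same keys as config.yaml; `missing_inputs` is a hypothetical helper and is not part of the conversion script, and the `inputs` directory is a placeholder.

```python
from pathlib import Path


def missing_inputs(config):
    """List configured input file names that are not found in input_dir."""
    input_dir = Path(config["input"]["input_dir"])
    return [name for name in config["input"]["filenames"].values()
            if not (input_dir / name).is_file()]


# Placeholder configuration mirroring the structure of config.yaml.
config = {
    "input": {
        "input_dir": "inputs",
        "filenames": {
            "households": "households.csv",
            "persons": "persons.csv",
            "mgra_xwalk": "xref_mgra13-15.csv",
            "land_use": "mgra13_based_input2035.csv",
            "land_use_abm3": "mgra15_based_input2035.csv",
        },
    }
}

for name in missing_inputs(config):
    print(f"Missing input file: {name}")
```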
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Input settings
input:
  input_dir: '/home/eamcevoy/edna.aguilar/OneDrive - Resource Systems Group, Inc/Documents/20_Projects/SANDAG/inputs'
  filenames:
    households: 'households.csv'
    persons: 'persons.csv'
    mgra_xwalk: 'xref_mgra13-15.csv'
Review comment: Is this crosswalk file same format as the one that I shared before? If not, could you share it?
    land_use: 'mgra13_based_input2035.csv'
    land_use_abm3: 'mgra15_based_input2035.csv'

# Output settings
output:
  output_dir: '/home/eamcevoy/edna.aguilar/OneDrive - Resource Systems Group, Inc/Documents/20_Projects/SANDAG/outputs'
  filenames:
    households: 'households.csv'
    persons: 'persons.csv'
    land_use: 'mgra13_based_input2035.csv'
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
import pandas as pd
import yaml
import os

class ConfigLoader:
    def __init__(self, config_path):
        self.config_path = config_path
        self.config = self.load_config()

    def load_config(self):
        # Switch to the script's folder so a relative config path resolves there.
        os.chdir(os.path.dirname(os.path.abspath(__file__)))
        with open(self.config_path, 'r') as yamlfile:
            # safe_load is sufficient for a plain settings file.
            return yaml.safe_load(yamlfile)

class DataLoader:
    def __init__(self, config):
        self.config = config
        self.hh_s14 = None
        self.per_s14 = None
        self.mgra_xwalk = None
        self.landuse_s14 = None
        self.landuse_abm3 = None
        self.load_data()

    def load_data(self):
        os.chdir(self.config['input']['input_dir'])
        self.hh_s14 = pd.read_csv(self.config['input']['filenames']['households'])
        self.per_s14 = pd.read_csv(self.config['input']['filenames']['persons'])
        self.mgra_xwalk = pd.read_csv(self.config['input']['filenames']['mgra_xwalk'])
        self.landuse_s14 = pd.read_csv(self.config['input']['filenames']['land_use'])
        self.landuse_abm3 = pd.read_csv(self.config['input']['filenames']['land_use_abm3'])

class Converter:
    def __init__(self, data_loader):
        self.data_loader = data_loader
        self.mgra_xwalk_dict = self.data_loader.mgra_xwalk.set_index('MGRA13')['MGRA15'].to_dict()

    def convert_hh(self):
        hh_s14 = self.data_loader.hh_s14
        hh_s14['mgra'] = hh_s14['mgra'].replace(self.mgra_xwalk_dict)
        hh_s15_converted = hh_s14.rename(columns={'hworkers': 'num_workers'})
        return hh_s15_converted

    def convert_per(self):
        per_s14 = self.data_loader.per_s14
        per_s14.rename(columns={'miltary': 'military'}, inplace=True)
        per_s15_converted = per_s14.drop(columns=['indcen', 'weeks', 'hours', 'rac1p', 'hisp', 'version'])
        return per_s15_converted

    def convert_landuse(self, hh_s15_converted, per_s15_converted):
        landuse_s14 = self.data_loader.landuse_s14
        landuse_abm3 = self.data_loader.landuse_abm3
        landuse_s15 = pd.DataFrame()
        landuse_s14['mgra_15'] = landuse_s14['mgra'].replace(self.mgra_xwalk_dict)

        # Columns copied unchanged from the ABM3 land use file.
        cols_to_keep_from_abm3 = ['taz', 'luz_id', 'pseudomsa', 'zip09', 'parkactive', 'openspaceparkpreserve', 'beachactive',
                                  'district27', 'milestocoast', 'acres', 'land_acres', 'effective_acres', 'truckregiontype', 'nev',
                                  'remoteAVParking', 'refueling_stations', 'MicroAccessTime', 'microtransit', 'ech_dist', 'hch_dist']

        landuse_s15 = landuse_abm3[['mgra'] + cols_to_keep_from_abm3].sort_values(by='mgra')


        # Population columns counted from the converted synthetic population.
        cols_to_adjust_to_match_synth_pop = ['pop', 'hhp', 'gq_civ', 'gq_mil']
        per_hh_s15 = per_s15_converted.merge(hh_s15_converted[['mgra', 'hhid', 'unittype']], on='hhid', how='left')
        landuse_s15 = (per_hh_s15.groupby('mgra').size().reset_index().rename(columns={0: 'pop'})
                       .merge(landuse_s15, on='mgra', how='right'))

        landuse_s15 = (per_hh_s15[per_hh_s15['unittype'] == 1].groupby(['mgra', 'military']).size().unstack().reset_index()
                       .rename(columns={0: 'gq_civ', 1: 'gq_mil'})
                       .merge(landuse_s15, on='mgra', how='right'))

        landuse_s15[['pop', 'gq_mil', 'gq_civ']] = landuse_s15[['pop', 'gq_mil', 'gq_civ']].fillna(0)
        landuse_s15['hhp'] = landuse_s15['pop'] - landuse_s15['gq_civ'] - landuse_s15['gq_mil']
        landuse_s15[cols_to_adjust_to_match_synth_pop] = landuse_s15[cols_to_adjust_to_match_synth_pop].round(0).astype(int)

        # Household counts by structure type, based on bldgsz.
        cols_to_calc_from_synth_pop = ['hh_mf', 'hh_mh', 'hh_sf', 'hh']
        bldgsz_map = {0: 'hh_mf', 1: 'hh_mh', 2: 'hh_sf', 3: 'hh_sf', 4: 'hh_mf', 5: 'hh_mf', 6: 'hh_mf', 7: 'hh_mf', 8: 'hh_mf', 9: 'hh_mf', 10: 'hh_mh'}
        gb = (hh_s15_converted[['mgra', 'bldgsz']]
              .assign(bldgsz_rm=lambda df: df['bldgsz'].replace(bldgsz_map))
              .groupby(['mgra', 'bldgsz_rm']).size().unstack(fill_value=0).reset_index())
        gb['hh'] = gb[['hh_mf', 'hh_mh', 'hh_sf']].sum(axis=1)
        landuse_s15 = landuse_s15.merge(gb, on='mgra', how='left')
        landuse_s15[cols_to_calc_from_synth_pop] = landuse_s15[cols_to_calc_from_synth_pop].fillna(0).astype(int)

        # Average household size; 0 where there are no households.
        landuse_s15['hhs'] = landuse_s15.apply(lambda row: round(row['hhp'] / row['hh'], 3) if row['hh'] != 0 else 0, axis=1)

        # Income columns: ABM3 shares of hh applied to the converted hh.
        inc_cols = ['i1', 'i2', 'i3', 'i4', 'i5', 'i6', 'i7', 'i8', 'i9', 'i10']
        abm3_i_shares = landuse_abm3[['mgra']].copy()
        temp = landuse_abm3[['mgra'] + inc_cols + ['hh']]

        for col in inc_cols:
            abm3_i_shares[f'{col}_pct'] = temp.apply(lambda row: row[col] / row['hh'] if row['hh'] != 0 else 0, axis=1)

        for col in inc_cols:
            # .values aligns rows by position, so this assumes landuse_abm3 is already sorted by mgra.
            landuse_s15[col] = landuse_s15['hh'] * abm3_i_shares[f'{col}_pct'].values

        landuse_s15[inc_cols] = landuse_s15[inc_cols].round(0).astype(int)

        # Employment: scale converted SR14 totals to the ABM3 regional total.
        emp_adj_fac = landuse_abm3['emp_total'].sum() / landuse_s14['emp_total'].sum()
        temp = (landuse_s14.groupby('mgra_15')['emp_total'].sum()
                .reset_index()
                .assign(emp_total=lambda df: df['emp_total'] * emp_adj_fac))

        landuse_s15 = landuse_s15.merge(temp, left_on='mgra', right_on='mgra_15', how='left')

        # TAZ-level shares of each ABM3 employment category.
        emp15_cols = [x for x in landuse_abm3.columns if x.startswith('emp_') and x != 'emp_total']
        amb3_emp_shares_taz = (landuse_abm3[['taz']].drop_duplicates().sort_values(by='taz')
                               .merge(landuse_abm3.groupby('taz')[emp15_cols + ['emp_total']].sum().reset_index(), on='taz', how='left'))

        for col in emp15_cols:
            amb3_emp_shares_taz[f'{col}_pct'] = amb3_emp_shares_taz[col] / amb3_emp_shares_taz['emp_total']

        amb3_emp_shares_mgra = landuse_abm3[['mgra', 'taz']].merge(amb3_emp_shares_taz, on='taz', how='left')

        for col in emp15_cols:
            # Rows align on the default integer index, which also assumes landuse_abm3 is sorted by mgra.
            landuse_s15[col] = amb3_emp_shares_mgra[f'{col}_pct'] * landuse_s15['emp_total']

        landuse_s15[emp15_cols] = landuse_s15[emp15_cols].fillna(0).round(0).astype(int)
        landuse_s15['emp_total'] = landuse_s15[emp15_cols].sum(axis=1)

        # Columns summed directly from SR14 rows that map to the same SR15 MGRA.
        cols = ['hs', 'hs_sf', 'hs_mf', 'hs_mh', 'enrollgradekto8', 'enrollgrade9to12', 'collegeenroll', 'othercollegeenroll', 'hotelroomtotal']
        landuse_s15 = landuse_s14.groupby('mgra_15')[cols].sum().merge(landuse_s15, right_on='mgra', left_on=['mgra_15'], how='right')
        landuse_s15[cols] = landuse_s15[cols].fillna(0).astype(int)

        # Drop the helper column and follow the ABM3 column order.
        landuse_s15 = landuse_s15.drop(columns=['mgra_15', ]).fillna(0)
        cols_order = [col for col in landuse_abm3.columns if col in landuse_s15.columns]
        landuse_s15 = landuse_s15[cols_order]

        return landuse_s15

class Main:
    def __init__(self, config_path):
        self.config_loader = ConfigLoader(config_path)
        self.data_loader = DataLoader(self.config_loader.config)
        self.converter = Converter(self.data_loader)

    def run(self):
        hh_s15 = self.converter.convert_hh()
        per_s15 = self.converter.convert_per()
        landuse_s15 = self.converter.convert_landuse(hh_s15, per_s15)

        os.chdir(self.config_loader.config['output']['output_dir'])
        hh_s15.to_csv(self.config_loader.config['output']['filenames']['households'], index=False)
        per_s15.to_csv(self.config_loader.config['output']['filenames']['persons'], index=False)
        landuse_s15.to_csv(self.config_loader.config['output']['filenames']['land_use'], index=False)

if __name__ == "__main__":
    main = Main('config.yaml')
    main.run()
Review comment: Need clarification: the converted SR14 synthetic population is SR15 files, correct?