This project processes NOAA's High Tide Flooding (HTF) data to create county-level estimates of both historical flooding events and future flooding projections. It combines gauge-level measurements with spatial relationships between tide gauges and counties to generate weighted estimates of flooding frequency for coastal counties.
graph TD
subgraph Raw Data
A1[US Medium Shoreline Shapefile] --> B1[shapefile_converter.py]
A2[County Shapefile] --> B1
A3[Regional Shoreline Shapefiles] --> B1
end
subgraph Shapefile Processing
B1 --> C1[shoreline.parquet]
B1 --> C2[county.parquet]
B1 --> C3[regional_shorelines/*.parquet]
end
subgraph County Processing
C2 --> D1[coastal_counties_finder.py]
C3 --> D1
D1 --> E1[coastal_counties.parquet]
end
subgraph Reference Point Generation
E1 --> F1[coastal_points.py]
C3 --> F1
F1 --> G1[coastal_reference_points.parquet]
end
graph TD
subgraph Data Collection
A[NOAA Historical HTF API] --> B[htf_fetcher.py]
C[NOAA Projected HTF API] --> D[htf_projector.py]
E[Gauge-County Mapping] --> F[imputed_gauge_county_mapping.parquet]
end
subgraph Data Processing
B --> G[historical_htf.parquet]
D --> H[projected_htf.parquet]
G --> I[data_loader.py]
H --> I
F --> I
I --> J[assignment.py]
J --> K[calculate_historical_county_htf]
J --> L[calculate_projected_county_htf]
end
subgraph Output Generation
K --> M[historical_county_htf.csv/parquet]
L --> N[projected_county_htf.csv/parquet]
M --> O[Visualization & Analysis]
N --> O
O --> P[HTF Analysis Report]
O --> Q[Data Visualizations]
end
The project generates two main datasets:
- Description: Observed high tide flooding events aggregated to county level
- Time Range: 1950-2022
- Contents:
  - Flood day counts by severity (minor, moderate, major)
  - Total flood days per year
  - Coverage: 328 coastal counties
- Location: data/processed/county_htf/
  - CSV: historical_county_htf.csv
  - Parquet: historical_county_htf.parquet
  - Data Dictionary: historical_county_htf_data_dictionary.md
- Description: Projected future flooding frequency under different sea level rise scenarios
- Time Range: 2020-2100
- Contents:
  - Flood day projections for 5 scenarios:
    - Low
    - Intermediate Low
    - Intermediate
    - Intermediate High
    - High
  - Coverage: 328 coastal counties
- Location: data/processed/county_htf/
  - CSV: projected_county_htf.csv
  - Parquet: projected_county_htf.parquet
  - Data Dictionary: projected_county_htf_data_dictionary.md
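As a minimal sketch of how the CSV outputs might be consumed, the example below reads a per-county series using only the standard library. The column names (`county_fips`, `year`, `total_flood_days`) and the sample rows are assumptions made up for illustration; the data dictionaries list the actual fields.

```python
import csv
import io

# Made-up sample rows standing in for historical_county_htf.csv.
# Column names are assumptions; check the data dictionary for the real ones.
SAMPLE_CSV = """county_fips,year,total_flood_days
00001,2020,3.5
00001,2021,4.25
00002,2020,1.0
"""


def flood_days_by_year(csv_file, county_fips):
    """Return {year: total flood days} for one county from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {
        int(row["year"]): float(row["total_flood_days"])
        for row in reader
        if row["county_fips"] == county_fips
    }


# With the real file, pass open("data/processed/county_htf/historical_county_htf.csv")
# instead of the in-memory sample.
series = flood_days_by_year(io.StringIO(SAMPLE_CSV), "00001")
```

Reading county codes as strings, as `csv.DictReader` does, preserves any leading zeros in the identifiers.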
This analysis is based on two primary NOAA data products:
- Annual Flood Count Product
  - Historical observations of high tide flooding events
  - Categorized by severity (minor, moderate, major)
  - Collected at NOAA tide gauge stations
- Decadal Projections Product
  - Future flooding frequency estimates
  - Multiple sea level rise scenarios
  - Based on NOAA tide gauge locations
The county-level estimates are generated through the following process:
- Gauge-County Mapping
  - Each county is associated with up to 3 nearest tide gauges
  - Weights are assigned based on proximity and other relevant factors
  - Stored in: data/processed/imputed_gauge_county_mapping.parquet
- Data Processing
  - Historical and projected data are processed separately
  - Gauge measurements are weighted and aggregated to county level
  - Quality checks ensure data completeness and validity
- Output Generation
  - Results saved in both CSV (for sharing) and Parquet (for analysis)
  - Comprehensive data dictionaries document all fields
  - Summary statistics included for data validation
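The gauge-county mapping step can be illustrated with a small sketch. It assumes plain inverse-distance weighting over great-circle distances, which is only one way to weight "by proximity"; the project's actual weights also account for other factors not described here. The function names, gauge identifiers, and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_gauge_weights(county_point, gauges, k=3):
    """Return {gauge_id: weight} for the k nearest gauges, weights summing to 1.

    county_point: (lat, lon) reference point for the county
    gauges: {gauge_id: (lat, lon)}
    """
    nearest = sorted(
        (haversine_km(*county_point, lat, lon), gauge_id)
        for gauge_id, (lat, lon) in gauges.items()
    )[:k]
    # Inverse-distance weighting (an assumption); epsilon avoids division by zero.
    inverse = {gauge_id: 1.0 / (dist + 1e-6) for dist, gauge_id in nearest}
    total = sum(inverse.values())
    return {gauge_id: w / total for gauge_id, w in inverse.items()}


# Hypothetical gauges and county reference point.
gauges = {
    "G1": (32.0, -81.0),
    "G2": (32.5, -80.5),
    "G3": (33.0, -79.9),
    "G4": (40.0, -74.0),
}
weights = nearest_gauge_weights((32.1, -81.0), gauges)
```

Here `G4` is dropped because it is the farthest of the four gauges, and `G1`, being closest, receives the largest weight. If fewer than `k` gauges exist, the function simply returns weights for those available.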
To process the HTF data and generate county-level datasets:
python3 -m src.county_htf.main
This will:
- Load gauge-county mapping and HTF data
- Process historical observations
- Process future projections
- Generate output files and data dictionaries
.
├── data/
│   └── processed/
│       ├── county_htf/          # Output datasets
│       ├── historical_htf/      # Input historical data
│       ├── projected_htf/       # Input projected data
│       └── imputed_gauge_county_mapping.parquet
├── src/
│   └── county_htf/
│       ├── __init__.py
│       ├── main.py              # Main processing pipeline
│       ├── data_loader.py       # Data loading utilities
│       └── assignment.py        # Core processing logic
└── README.md
- Some counties may have fewer than three associated gauges
- Flood day counts are weighted averages and may include fractional days
- Historical data represents actual observations while projections are model-based estimates
- Two counties (FIPS codes 13073 and 45037) have no gauge data and are excluded from the results
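The notes above can be made concrete with a small sketch of a weighted county average. It assumes that when a gauge has no value for a given year, the remaining weights are renormalized, and that a county with no gauge data yields no estimate; whether `assignment.py` handles missing gauges this way is not confirmed here. The function name and gauge identifiers are hypothetical.

```python
def county_flood_days(gauge_weights, gauge_flood_days):
    """Weighted average of gauge flood days for one county and year.

    gauge_weights: {gauge_id: weight}
    gauge_flood_days: {gauge_id: flood days, or None when the gauge has no data}
    Returns None when no weighted gauge has data.
    """
    # Keep only gauges that actually report a value.
    available = {
        gauge_id: weight
        for gauge_id, weight in gauge_weights.items()
        if gauge_flood_days.get(gauge_id) is not None
    }
    total_weight = sum(available.values())
    if total_weight == 0:
        return None
    # Renormalize by the weight of the gauges that reported (an assumption).
    weighted_sum = sum(w * gauge_flood_days[g] for g, w in available.items())
    return weighted_sum / total_weight


# Hypothetical county with three gauges, one of which has no data this year.
example = county_flood_days(
    {"G1": 0.5, "G2": 0.3, "G3": 0.2},
    {"G1": 4, "G2": 7, "G3": None},
)
```

With these inputs the result is about 5.125 flood days, (0.5 × 4 + 0.3 × 7) / 0.8, which shows how whole-day gauge counts become fractional at county level.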
For more information about the source data and methodology, refer to:
- NOAA HTF Documentation
- Data dictionaries in the output directory