msu-econ-data-analytics/course-materials

Course Materials for Advanced Data Analytics in Economics

Nick Hagerty, Montana State University

Except where otherwise noted, this work is licensed under Creative Commons BY-NC-SA 4.0.


Skip to: Lecture slides | Supplemental labs | External resources


Lecture slides

Fall 2024

Part I: Pre-Analytical Tools

Topic 1: R Basics

  • About R
  • Operators
  • Objects and functions
  • Data frames
  • Vectors
  • Indexing

Topic 2: Programming in R

  • If/else statements
  • For-loops
  • Functions
  • Vectorization
  • Parallelization

Topic 3: Productivity Tools

Topic 4: Data Wrangling

  • Philosophy of tidy data
  • Wrangling data with dplyr
  • Joining data with dplyr
  • Tidying data with tidyr
  • Importing data with readr

Topic 5: Data Cleaning

  • Join safety
  • Keys and relational data
  • String cleaning
  • Number storage
  • Data Cleaning Checklist (pdf version)

Topic 6: Data Acquisition

  • Where data comes from
  • Webscraping
  • Using APIs

Topic 7: Best Practices of Coding and Workflows

  • The perils of bad data cleaning
  • Reproducibility and transparency
  • Best practices (code organization, file organization, version control, abstraction, commenting, unit tests)

Part II: Core Analytical Tools

Topic 8: Distinguishing Goals of Data Analysis

  • The Data Generating Process
  • Potential outcomes, counterfactuals, and causal inference
  • Descriptive, Predictive, or Causal Analysis?

Topic 9: Exploratory Analysis

  • Part 1: Understanding variables
    • Summaries, frequency tables and crosstabs in R
    • Characterizing distributions
    • Handling extreme values
    • Handling variable transformations
    • Handling missing data
  • Part 2: Understanding relationships
    • Characterizing relationships
    • Binscatter
    • The Conditional Expectation Function
    • Adjusting for other variables
    • Bin smoothing and local regression

Topic 10: Regression Modeling

  • Basic regression in R
  • Indicator and interaction terms
  • Econometrics packages in R
  • Modeling nonlinear relationships
  • Using regression models for prediction

Topic 11: Data Visualization

  • Basics of ggplot2
  • Plotting examples
  • Colors and themes
  • Principles of data visualization
  • Case studies

Part III: Extended Analytical Tools

Topic 12: Spatial Analysis

  • Intro to Geospatial Data
  • Part 1
    • Spatial data and quick mapping
    • Reference systems and projections
  • Part 2
    • Spatial queries (measurement, relationships)
    • Spatial subsetting
    • Geometry operations
    • Spatial joins

Topic 13: Machine Learning Fundamentals

  • Overview: Statistical learning
  • Assessing model accuracy
  • Cross-validation

Topic 14: Prediction Methods

Topic 15: Machine Learning in Economics

  • Predicting outcomes
  • Constructing new data
  • Selecting covariates
  • Predicting causal effects

Topic 16: Databases and Big Data

  • Tools for big data
  • Databases in R
  • Writing SQL queries
  • Getting started with BigQuery

Part IV: Further extensions not covered in class

ML Methods for Classification Tasks

  • Part 1: Methods
    • Classification
    • Logistic regression
    • k-nearest neighbors
    • Model assessment
    • Decision trees
  • Part 2: Examples
    • Logistic regression and KNN
    • Cross-validation
    • Decision trees
    • Teach your laptop to read

Supplemental labs

By Laura Sikoski


External resources

This is a list of further resources that you may find helpful throughout (and after!) this course. Start with the course materials above, but check these out for alternative explanations or if you want to take a deeper dive into a particular topic. If one isn't speaking to you, try another.

Basics of R

Programming in R

R Markdown

Git and GitHub

Data wrangling with the tidyverse

Data cleaning

Data acquisition and webscraping

Best practices for coding and workflows

Distinguishing goals of data analysis

Exploratory analysis

Regression modeling in R

Data visualization

Spatial analysis

Fundamentals of machine learning

Shrinkage methods

Classification methods

Machine learning with tidymodels

Unsupervised learning

Further methods in machine learning

  • ISLR (James, Witten, Hastie, Tibshirani).
    • Ch. 8: Tree-Based Methods
    • Ch. 9: Support Vector Machines
    • Ch. 10: Deep Learning
  • Prediction and Machine Learning Lectures (Ed Rubin).
    • Lecture 007: Decision Trees
    • Lecture 008: Ensemble Methods
    • Lecture 009: Support Vector Machines

Applications of machine learning in economics

Databases (SQL)

Distributed and cloud computing
