Part 0: Introduction

Data science articulated, data science examples, history and context, technology landscape

Readings

Part 1: Data Manipulation, at Scale

Databases and the relational algebra

Readings

MapReduce, Hadoop, relationship to databases, algorithms, extensions, language; key-value stores and NoSQL; tradeoffs of SQL and NoSQL

Readings

Data cleaning, entity resolution, data integration, information extraction

Readings

  • Elmagarmid et al. "Duplicate Record Detection: A Survey"
  • Koudas et al. "Record Linkage: Similarity Measures and Algorithms"

Part 2: Analytics

Basic statistical modeling, experiment design

Readings

Introduction to Machine Learning, supervised learning, decision trees/forests, simple nearest neighbor

Readings

Unsupervised learning: k-means, multi-dimensional scaling

Readings

Part 3: Interpreting and Communicating Results

Visualization, visual data analytics

Readings (well, watchings)

Ethics, privacy

Part 4: Special Topics

  • Graph Analytics: PageRank, community detection, recursive queries, iterative processing
  • Guest Lecture: Datameer
  • Guest Lecture: Wibidata

Readings