This repository contains the materials for a topic modeling tutorial: a presentation, intro-to-tm.ipynb, and a step-by-step walkthrough, running-MALLET.ipynb. Both are IPython Notebooks and can be executed locally or viewed read-only on GitHub.
- Introduction to Topic Modeling is an IPython slideshow that provides a conceptual introduction to topic modeling. For a more extensive discussion, please see my blog post The Joy of Topic Modeling. The presentation ends with a list of useful links.
- Running MALLET documents the commands one would use to train a topic model on a set of documents contained in the data/ directory.
- The data/ directory contains around 200 text files, which are blog posts scraped from the DHNow Editor's Choice index.
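For readers who want to see the shape of a MALLET run before opening the notebook, here is a minimal sketch of the usual two-step workflow: import a directory of text files into MALLET's binary format, then train a topic model on it. It assumes MALLET 2.0 is installed with its launcher at `bin/mallet` relative to the working directory. The output filenames, the 20-topic setting, and the helper names `mallet_commands` and `run` are illustrative assumptions, not necessarily what running-MALLET.ipynb uses. The script only prints the commands by default. Pass `dry_run=False` to actually run them.

```python
import shlex
import subprocess


def mallet_commands(mallet_bin="bin/mallet", data_dir="data/", num_topics=20):
    """Build the two MALLET invocations: import the text files, then train.

    The binary path, output filenames, and topic count are illustrative
    assumptions, not values taken from the tutorial notebook.
    """
    import_cmd = [
        mallet_bin, "import-dir",
        "--input", data_dir,              # one document per text file
        "--output", "topic-input.mallet", # hypothetical name for the imported corpus
        "--keep-sequence",                # train-topics needs token sequences, not bags of features
        "--remove-stopwords",             # drop MALLET's built-in English stopword list
    ]
    train_cmd = [
        mallet_bin, "train-topics",
        "--input", "topic-input.mallet",
        "--num-topics", str(num_topics),
        "--output-topic-keys", "topic-keys.txt",  # top words for each topic
        "--output-doc-topics", "doc-topics.txt",  # topic proportions for each document
    ]
    return import_cmd, train_cmd


def run(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in mallet_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if MALLET exits with an error


if __name__ == "__main__":
    run()
```

Building the commands as argument lists rather than shell strings keeps paths with spaces safe, and the dry-run default lets you check the commands before MALLET is installed.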