All three groups are off to a good start on quite different projects:
- Logistic Regression and Image Recognition
- Nearest Neighbors and Recommender Systems
- Game playing
Looking ahead to this week, here's where you should be spending your time.
All of you have found example code of various sorts illustrating your problem. Working from those examples, you should be able to use Jupyter to produce an introduction to your data and the problem. This means:
- loading your data into an appropriate format
- displaying examples from your data set
- producing summary statistics on your data as appropriate
- generally getting more comfortable working in this environment
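As a rough illustration of the first three items, here is a minimal sketch using only the Python standard library. The column names (`height`, `width`, `label`) and the four rows of data are made up for the example; your own data will need its own loader, and in a notebook you may well prefer a library such as pandas (`pandas.read_csv`, `DataFrame.describe`) for the same steps.

```python
import csv
import io
import statistics
from collections import Counter

# Made-up toy data standing in for a real file.
# With a real data set you would open your own file instead.
RAW = """height,width,label
5.1,3.5,a
4.9,3.0,a
6.2,2.9,b
5.9,3.0,b
"""

NUMERIC_COLUMNS = ("height", "width")  # hypothetical column names


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {col: float(row[col]) for col in NUMERIC_COLUMNS}
        parsed["label"] = row["label"]
        rows.append(parsed)
    return rows


rows = load_rows(RAW)

# Display a few examples from the data set.
for row in rows[:3]:
    print(row)

# Summary statistics for each numeric column.
summary = {}
for col in NUMERIC_COLUMNS:
    values = [r[col] for r in rows]
    summary[col] = {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

# How many examples fall in each class.
label_counts = Counter(r["label"] for r in rows)

print(summary)
print(label_counts)
```

Checking the class counts early is worthwhile: a badly unbalanced data set changes how you should read accuracy numbers later on.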
The second goal for this week is to understand the underlying structure of the method you are studying.
Both of the rather advanced references for the course:
- Pattern Recognition and Machine Learning by Christopher Bishop
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
contain treatments of Logistic Regression and Nearest Neighbor methods. However, there are many other resources, and more elementary books on statistics may offer more accessible introductions.
Particularly important for this goal are:
- to clarify any underlying assumptions that your method makes about how the data is distributed;
- to get a sense of practical limitations of your method (how much space, time and so on are involved in using it).
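To make the question of space and time concrete, here is a minimal sketch of brute-force k-nearest-neighbor classification. It is one simple way to implement the idea, not the method any particular library uses, and the toy points and labels are made up. The point to notice is that "training" is just storing all n points (space proportional to n times the dimension d), while every single prediction computes a distance to every stored point (time proportional to n times d, plus a sort).

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # One Euclidean distance per stored training point: O(n * d) work per query.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sorting all n distances costs O(n log n); fine for a sketch.
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the k closest neighbors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: two small clusters in the plane.
train_points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (1.0, 1.0), (0.9, 1.1), (1.1, 0.9),
]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_points, train_labels, (0.15, 0.15))
near_b = knn_predict(train_points, train_labels, (0.95, 1.0))
print(near_a, near_b)
```

Contrast this with a method like logistic regression, where fitting is the expensive step and a prediction only needs the fitted coefficients rather than the whole training set.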
Each group will make a progress report on Monday 2/10, following the standard rules:
- no more than 15 minutes per group,
- everybody talks.
In addition, this week I plan to do some talking on Wednesday 2/5 and Friday 2/7 about some important fundamentals:
- the curse of dimensionality
- the bias/variance tradeoff and overfitting
- evaluating a classifier (false positives and negatives, precision and recall).
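As a preview of the last item, here is a minimal sketch of computing precision and recall from true and predicted labels for a two-class problem. The labels below are made up, and returning 0.0 when a denominator is zero is a convention chosen for this sketch, not a universal rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Convention for this sketch: report 0.0 if the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 1 is the positive class, 0 the negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
print(counts, precision, recall)
```

Precision answers "of the examples I flagged as positive, how many really were?", while recall answers "of the examples that really were positive, how many did I flag?".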