Overview.
Python tools.
Buzzwords.
Applications.
https://www.codecademy.com/en/tracks/npr https://www.codecademy.com/en/tracks/nhtsa
http://maxberggren.github.io/2015/08/04/basemap/ http://matplotlib.org/basemap/users/examples.html
http://geoffboeing.com/2014/08/visualizing-summer-travels-with-cartodb/
https://jakevdp.github.io/blog/2012/08/18/matplotlib-animation-tutorial/ http://www.christianmoscardi.com/blog/2015/08/12/embedding-d3-in-ipython-notebook.html Savvas on NBA motion
http://blog.ionelmc.ro/2013/06/05/python-debugging-tools/
http://multithreaded.stitchfix.com/blog/2015/07/16/pyxley/
Scrapy, Beautiful Soup
http://pbpython.com/web-scraping-mn-budget.html
http://savvastjortjoglou.com/nba-shot-sharts.html http://www.gregreda.com/tag/scraping.html http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/
https://blog.scrapinghub.com/2016/02/24/scrapy-tips-from-the-pros-february-2016-edition/
http://www.r-bloggers.com/the-star-wars-grossing-war/
http://rowanv.com/portfolio/oecd_unemployment/ https://github.com/rowanv/giraffe_viz/blob/master/oecd_unemployment.py
https://twitter.com/pwang/status/703040765459468288
This uses flask: http://dash.rowanv.com/ https://github.com/rowanv/giraffe_dash
Processing text...
http://nbviewer.jupyter.org/url/norvig.com/ipython/How%20to%20Do%20Things%20with%20Words.ipynb
http://gaussiangeek.blogspot.com/2015/06/ever-since-i-heard-abbey-road-i.html
Other resources from our politics group: https://www.youtube.com/watch?v=AOU-Yw1qdJs http://www.nyu.edu/projects/politicsdatalab/learning_extra.html
http://ptrckprry.com/course/ssd/nltk-tutorial.pdf http://ptrckprry.com/ssd/
From Itamar: I tried playing around with some of the examples below and others I found. My key insight is that the data sets are extremely large (at least a few GB each), so the way to access the data is to run SQL-lite queries as part of the API request and load only aggregated data into memory. As far as I remember, we said SQL is not a focus of this course. For that reason I don't think this data set will be handy for us. Let me know if you guys think otherwise.
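The "aggregate in the database, load only the summary" idea in Itamar's note can be sketched as follows. This is a minimal illustration, not the actual NYC Open Data workflow: it assumes SQLite via Python's built-in `sqlite3` module, and the `trips` table, its columns, and the sample rows are all hypothetical stand-ins for a multi-GB data set.

```python
import sqlite3

# Hypothetical stand-in for a large table. A real data set would sit in a
# file on disk (or behind an API), not in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (borough TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("Manhattan", 12.5),
        ("Brooklyn", 8.0),
        ("Manhattan", 7.5),
        ("Queens", 20.0),
    ],
)


def fares_by_borough(connection):
    """Return {borough: (trip_count, average_fare)}, aggregated inside SQLite."""
    query = """
        SELECT borough, COUNT(*), AVG(fare)
        FROM trips
        GROUP BY borough
        ORDER BY borough
    """
    # Only one row per borough comes back to Python; the raw rows stay in the database.
    return {borough: (n, avg) for borough, n, avg in connection.execute(query)}


summary = fares_by_borough(conn)
print(summary)
```

The point of the design is that the `GROUP BY` runs where the data lives, so Python only ever holds one row per group no matter how large the underlying table is.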
On Thursday, November 19, 2015 at 4:21:05 PM UTC-5, David Backus wrote:
Data https://data.cityofnewyork.us/ https://nycopendata.socrata.com/dashboard
Examples https://plot.ly/ipython-notebooks/big-data-analytics-with-pandas-and-sqlite/ http://iquantny.tumblr.com/ http://fivethirtyeight.com/features/uber-is-serving-new-yorks-outer-boroughs-more-than-taxis-are/ http://fivethirtyeight.com/features/how-data-made-me-a-believer-in-new-york-citys-restaurant-grades/
Pokemon or Big Data? https://pixelastic.github.io/pokemonorbigdata/
SQL + Python: http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/
There are lots of coding environments out there. Spyder is the easiest, in our view, but others have their fans.
- Dave uses Spyder because he likes its old-school Matlab look and feel.
- Chase and Spencer use Sublime Text, an editor that can be customized to do almost anything. Paul does the same with Vim. Both are text editors only. You run Python from the command line, which is even more old school.
- Lots of people recommend PyCharm. Dave thinks this is the tool of choice for someone who wants to go to the next level: slightly harder than Spyder to get going, but way more powerful once you do. Among other things, it looks really cool.
Here are two lists if you'd like to get a sense of what's out there and what others think about it.
Install R. If you decide you'd like to try R some time, choose a "mirror" and download the appropriate version. We recommend you run it in RStudio, a popular coding environment. Both are free. Once you've installed them, start up RStudio and it will access R as needed.
Things to check out
- Conda. This could have promise: Continuum support of R. Two things:
    - You can install from conda, which you have if you installed Anaconda.
    - You can run in a Jupyter notebook, as Brian mentioned.

  The bad news is that it doesn't seem to have RStudio, which I like.
  More at https://www.continuum.io/blog/developer/jupyter-and-conda-r
- List of online resources.
http://www.r-bloggers.com/learning-r-index-of-online-r-courses-october-2015/
- Princeton's intro.
Learn R. If you want to learn how to program in R, there's lots of good stuff online -- too much, really. We like these:
- Try R is like Codecademy: you run code online.
- Princeton: http://data.princeton.edu/R/
- Index of online R courses: http://www.r-bloggers.com/learning-r-index-of-online-r-courses-october-2015/
- Kelly Black has a tutorial that covers more advanced topics, including introductory statistics.
- Introduction to Statistical Learning combines R programming with an introduction to modern statistics and machine learning.
- NYU https://github.com/pablobarbera/data-science-workshop
We also like the blog aggregator R-bloggers, which is filled with applications, including code.
http://www.curiousefficiency.org/posts/2015/10/languages-to-improve-your-python.html
https://cloud.google.com/datalab/
Also Wakari, AWS...