More cool stuff


Overview.

Python tools.

Buzzwords.

Applications.


Using APIs

https://www.codecademy.com/en/tracks/npr https://www.codecademy.com/en/tracks/nhtsa

Maps

http://maxberggren.github.io/2015/08/04/basemap/ http://matplotlib.org/basemap/users/examples.html

http://geoffboeing.com/2014/08/visualizing-summer-travels-with-cartodb/

Animations

https://jakevdp.github.io/blog/2012/08/18/matplotlib-animation-tutorial/ http://www.christianmoscardi.com/blog/2015/08/12/embedding-d3-in-ipython-notebook.html Savvas on NBA motion

Debugging

http://blog.ionelmc.ro/2013/06/05/python-debugging-tools/

Dashboards

http://multithreaded.stitchfix.com/blog/2015/07/16/pyxley/

Scraping websites

Scrapy, Beautiful Soup

http://pbpython.com/web-scraping-mn-budget.html

http://savvastjortjoglou.com/nba-shot-sharts.html http://www.gregreda.com/tag/scraping.html http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/

https://blog.scrapinghub.com/2016/02/24/scrapy-tips-from-the-pros-february-2016-edition/

http://blog.webhose.io/2015/08/16/dead-simple-for-devs-python-crawler-script-for-extracting-structured-data-from-any-almost-website-into-csv/

https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Dashboard/Medicare-Drug-Spending/Drug_Spending_Dashboard.html

http://www.r-bloggers.com/the-star-wars-grossing-war/

Bokeh

http://rowanv.com/portfolio/oecd_unemployment/ https://github.com/rowanv/giraffe_viz/blob/master/oecd_unemployment.py

https://twitter.com/pwang/status/703040765459468288

Dashboards

This uses flask: http://dash.rowanv.com/ https://github.com/rowanv/giraffe_dash

Natural language processing

Processing text...

http://nbviewer.jupyter.org/url/norvig.com/ipython/How%20to%20Do%20Things%20with%20Words.ipynb

http://gaussiangeek.blogspot.com/2015/06/ever-since-i-heard-abbey-road-i.html

Other resources from our politics group: https://www.youtube.com/watch?v=AOU-Yw1qdJs http://www.nyu.edu/projects/politicsdatalab/learning_extra.html

http://spacy.io/

http://ptrckprry.com/course/ssd/nltk-tutorial.pdf http://ptrckprry.com/ssd/

FuzzyWuzzy

Large datasets

From Itamar: I tried playing around with some of the examples below and others I found. My key insight is that the data sets are extremely large (at least a few GB each), so the way to access the data is to run SQLite queries as part of the API request and load only aggregated data into memory. As far as I remember we said SQL is not a focus of this course. For that reason I don't think these data sets can be handy for us. Let me know if you guys think otherwise.
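A minimal sketch of the approach Itamar describes: let the database do the aggregation and pull only the summary into memory. Everything specific here is made up for illustration: the `trips` table, its `borough` and `fare` columns, and the three rows standing in for a file that would really be several GB. It uses only Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical stand-in for a multi-GB database file: a tiny in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (borough TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Brooklyn", 12.5), ("Brooklyn", 7.5), ("Queens", 20.0)],
)

# The GROUP BY runs inside SQLite, so Python receives one row per borough
# rather than one row per trip.
query = """
    SELECT borough, COUNT(*) AS n_trips, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY borough
    ORDER BY borough
"""
summary = conn.execute(query).fetchall()
conn.close()

for borough, n_trips, avg_fare in summary:
    print(borough, n_trips, avg_fare)
```

With a real file you would pass its path to `sqlite3.connect` instead of `":memory:"`. If pandas is installed, `pd.read_sql_query(query, conn)` (called before the connection is closed) should return the same summary as a DataFrame.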

On Thursday, November 19, 2015 at 4:21:05 PM UTC-5, David Backus wrote: Data https://data.cityofnewyork.us/ https://nycopendata.socrata.com/dashboard

Examples https://plot.ly/ipython-notebooks/big-data-analytics-with-pandas-and-sqlite/ http://iquantny.tumblr.com/ http://fivethirtyeight.com/features/uber-is-serving-new-yorks-outer-boroughs-more-than-taxis-are/ http://fivethirtyeight.com/features/how-data-made-me-a-believer-in-new-york-citys-restaurant-grades/

Pokemon or Big Data? https://pixelastic.github.io/pokemonorbigdata/

SQL + Python: http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/

Plot.ly

Google App Engine

Other coding environments

There are lots of coding environments out there. Spyder is the easiest, in our view, but there are lots of choices.

  • Dave uses Spyder because he likes its old school Matlab look and feel.
  • Chase and Spencer use Sublime Text, an editor that can be customized to do almost anything. Paul does the same with Vim. Both are text editors only. You run Python from the command line, which is even more old school.
  • Lots of people recommend PyCharm. Dave thinks this is the tool of choice for someone who wants to go to the next level: slightly harder than Spyder to get going, but way more powerful once you do. Among other things, it looks really cool.

Here are two lists if you'd like to get a sense of what's out there and what others think about it.

R

Install R. If you decide you'd like to try R some time, choose a "mirror" and download the appropriate version. We recommend you run it in RStudio, a popular coding environment. Both are free. Once you've installed them, start up RStudio and it will access R as needed.

Things to check out

  1. Conda. This could have promise: Continuum support of R. Two things:
  • You can install from conda, which you have if you installed Anaconda.
  • You can run in a Jupyter notebook, as Brian mentioned.

The bad news is that it doesn't seem to have RStudio, which I like.

More at https://www.continuum.io/blog/developer/jupyter-and-conda-r

  2. List of online resources.

http://www.r-bloggers.com/learning-r-index-of-online-r-courses-october-2015/

  3. Princeton's intro.

http://data.princeton.edu/R/

Learn R. If you want to learn how to program in R, there's lots of good stuff online -- too much, really. We like these:

We also like the blog aggregator R-bloggers, which is filled with applications, including code.

Other languages

http://www.curiousefficiency.org/posts/2015/10/languages-to-improve-your-python.html

Google's Cloud Datalab

https://cloud.google.com/datalab/

Also Wakari, AWS...

SQLite