Transforming the Web into Data (with Python)

Workshop Materials

This repository contains the materials for the ULS/iSchool Digital Scholarship workshop held on Friday, April 17th, 2015.

There are two main documents in this repository (plus a couple of supporting images): the slides (Web Scraping Tutorial.ipynb) and an example (Web Scraping Example.ipynb).

These documents are stored in this repository as IPython Notebooks, which means they are JSON documents. The links below point to nbviewer so you can read them as a normal human being and not a machine.
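Since a notebook is just JSON, you can inspect one with nothing but the Python standard library. The sketch below is not part of the repository, and the function name `count_cell_types` is hypothetical. It tallies cells by type and accepts both the nbformat 4 layout (a top-level "cells" list) and the older nbformat 3 layout (cells nested inside "worksheets"), since notebooks from 2015 may use either.

```python
import json
from collections import Counter


def count_cell_types(notebook_text):
    """Count cells by type in the JSON text of an .ipynb file.

    Handles nbformat 4 (top-level "cells") and the older nbformat 3
    layout, where cells are nested inside a "worksheets" list.
    """
    nb = json.loads(notebook_text)
    if "cells" in nb:
        cells = nb["cells"]
    else:
        cells = [
            cell
            for worksheet in nb.get("worksheets", [])
            for cell in worksheet.get("cells", [])
        ]
    # Cells missing a "cell_type" key are counted as "unknown".
    return Counter(cell.get("cell_type", "unknown") for cell in cells)
```

On a local clone you could call it as `count_cell_types(open("Web Scraping Example.ipynb", encoding="utf-8").read())` to see how the example splits between code and prose.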

If you are interested in building on top of these materials, feel free to fork this repository. You are free to SHARE and ADAPT these materials as long as you ATTRIBUTE them, as per the following Creative Commons license: CC-BY 2.0.

Slides

These slides contain a conceptual introduction to web scraping. They can be viewed as a document or as a set of slides.

Presentation as an HTML document
Presentation as Slides

Example

This notebook contains an example web scrape using Python with some in-line documentation about what is happening at each step.

Scraping the iConference 2015 Program
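The notebook is the real walkthrough. As a rough, standard-library-only illustration of the same fetch-then-parse idea, here is a sketch; the notebook itself may well use different libraries. The `<h3 class="session-title">` markup and all function and class names are hypothetical, not taken from the iConference page or from the notebook, so inspect the actual HTML and adjust the tag and class before pointing this at a real page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class SessionTitleParser(HTMLParser):
    """Collect the text inside every <tag class="css_class"> element.

    The defaults (h3 / session-title) are hypothetical placeholders.
    """

    def __init__(self, tag="h3", css_class="session-title"):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.titles = []
        self._in_title = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes come through as None, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        # Text from nested tags (e.g. a link inside the heading) is kept too.
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False


def fetch_html(url):
    """Download a page, decoding with its declared charset or UTF-8."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def extract_titles(html):
    """Return the text of each matching element, in document order."""
    parser = SessionTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

A run against a live page would then be `extract_titles(fetch_html(url))`. Keeping fetching and parsing in separate functions means you can save a copy of the page once and iterate on the parser offline instead of re-downloading it on every attempt.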

Using tmpnb, the temporary notebook service

The materials in this repository can be served to participants using the jupyter/tmpnb service. I've included a Dockerfile in this repository that builds an image containing IPython, the necessary Python libraries, and the notebooks in this repository. I built an image from this Dockerfile and named it jupyter/minimal, because that is the name of the default image tmpnb launches for temporary notebooks, so tmpnb runs it automatically. This is pretty bad documentation, so if you have questions, just hit me up on Twitter at @mcburton. I'll probably write up something more comprehensive about setting up temporary teaching environments with tmpnb.
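Because the workshop relies on the image already having everything installed, a quick smoke test run inside a container built from it can catch a missing library or notebook before participants arrive. This is a sketch, not something shipped in the repository: the notebook file names come from this README, IPython is named above, but "requests" and "bs4" in the module list are assumptions. Replace the list with whatever the Dockerfile actually installs.

```python
import importlib.util
import os

# Assumed module list: only IPython is named in the README; "requests" and
# "bs4" are guesses at typical scraping libraries, not read from the Dockerfile.
REQUIRED_MODULES = ["IPython", "requests", "bs4"]
NOTEBOOKS = ["Web Scraping Tutorial.ipynb", "Web Scraping Example.ipynb"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_files(names, directory="."):
    """Return the file names that are not present in `directory`."""
    return [
        name for name in names
        if not os.path.isfile(os.path.join(directory, name))
    ]


def check_image(directory="."):
    """Report missing modules and notebooks; an empty dict means all present."""
    problems = {
        "modules": missing_modules(REQUIRED_MODULES),
        "notebooks": missing_files(NOTEBOOKS, directory),
    }
    return {key: value for key, value in problems.items() if value}
```

One way to use it is to call `check_image()` from a cell in a freshly launched tmpnb session, passing the directory the notebooks were copied into.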
