
Terms of Service Scraper #108

Open
laurenmarietta opened this issue Jan 22, 2020 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), scraper

Comments

@laurenmarietta (Collaborator) commented:

We would like to develop a new web scraper that will (1) determine whether government websites have a terms of service/privacy policy page, and (2) evaluate the quality of that page.

Steps we need to take to actualize these goals (not necessarily in order):

  • Verify that the existing software infrastructure for developing/running scrapers is functional. (I've heard @kbalajisrinivas might be able to help with this.)
  • Get a sense of where government sites tend to keep their terms of service/privacy policy pages
  • Define the metrics and evaluation system for a "good" terms of service page
  • Write a new Python class in scrapers/scrapers/ that builds upon base_scraper.py and contains methods for scraping webpages, finding their terms of service/privacy policy pages (if they exist), and analyzing their contents (per the metrics defined in the previous step)
  • Write tests for this new class

I invite anyone to add/modify this list!
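As a rough starting point for the link-finding step, here is a standalone sketch using only the Python standard library. It does not touch the existing base_scraper.py infrastructure (whose interface I haven't checked), and the keyword list is illustrative, not exhaustive:

```python
from html.parser import HTMLParser

# Phrases that commonly appear in links to terms-of-service / privacy-policy
# pages. Assumption: this list is illustrative and would need tuning against
# real government sites.
TOS_KEYWORDS = ("terms of service", "terms of use", "privacy policy", "privacy")


class TosLinkFinder(HTMLParser):
    """Collect hrefs whose anchor text or URL suggests a ToS/privacy page."""

    def __init__(self):
        super().__init__()
        self._current_href = None
        self.matches = []  # list of (href, anchor_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only inspect text that appears inside an <a> tag.
        if self._current_href is not None:
            text = data.strip().lower()
            href = self._current_href.lower()
            # Match either the anchor text ("Privacy Policy") or a
            # slugified keyword in the URL ("/terms-of-use").
            if any(k in text or k.replace(" ", "-") in href for k in TOS_KEYWORDS):
                self.matches.append((self._current_href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def find_tos_links(html):
    """Return candidate (href, anchor_text) pairs for ToS/privacy pages."""
    parser = TosLinkFinder()
    parser.feed(html)
    return parser.matches
```

For example, `find_tos_links('<a href="/privacy-policy">Privacy Policy</a> <a href="/about">About</a>')` returns only the privacy-policy link. A real scraper method would fetch the page first and probably also probe common paths like /privacy directly, but this shows the general shape.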

@laurenmarietta added the enhancement, help wanted, and scraper labels on Jan 22, 2020
@laurenmarietta self-assigned this on Jan 22, 2020