
Terms of Service Scraper #108

Open
laurenmarietta opened this issue Jan 22, 2020 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), scraper

Comments

@laurenmarietta (Collaborator) commented:

We would like to develop a new web scraper that will (1) determine whether government websites have a terms of service/privacy policy page, and (2) evaluate the quality of that page.

Steps we need to take to actualize these goals (not necessarily in order):

  • Verify that the existing software infrastructure for developing/running scrapers is functional. (I've heard @kbalajisrinivas might be able to help with this.)
  • Get a sense of where government sites tend to keep their terms of service/privacy policy pages
  • Define the metrics and evaluation system for a "good" terms of service page
  • Write a new Python class in scrapers/scrapers/ that builds upon base_scraper.py and contains methods for scraping webpages, finding their terms of service/privacy policy pages (if they exist), and analyzing their contents (per the metrics defined in the previous step)
  • Write tests for this new class

I invite anyone to add/modify this list!
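As a rough starting point for the link-finding step, here is a standalone sketch using only the Python standard library. It does not touch the existing base_scraper.py infrastructure (whose interface I haven't checked), and the keyword list is illustrative, not exhaustive:

```python
from html.parser import HTMLParser

# Phrases that commonly appear in links to terms-of-service / privacy-policy
# pages. Assumption: this list is illustrative and would need tuning against
# real government sites.
TOS_KEYWORDS = ("terms of service", "terms of use", "privacy policy", "privacy")


class TosLinkFinder(HTMLParser):
    """Collect hrefs whose anchor text or URL suggests a ToS/privacy page."""

    def __init__(self):
        super().__init__()
        self._current_href = None
        self.matches = []  # list of (href, anchor_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only inspect text that appears inside an <a> tag.
        if self._current_href is not None:
            text = data.strip().lower()
            href = self._current_href.lower()
            # Match either the anchor text ("Privacy Policy") or a
            # slugified keyword in the URL ("/terms-of-use").
            if any(k in text or k.replace(" ", "-") in href for k in TOS_KEYWORDS):
                self.matches.append((self._current_href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def find_tos_links(html):
    """Return candidate (href, anchor_text) pairs for ToS/privacy pages."""
    parser = TosLinkFinder()
    parser.feed(html)
    return parser.matches
```

For example, `find_tos_links('<a href="/privacy-policy">Privacy Policy</a> <a href="/about">About</a>')` returns only the privacy-policy link. A real scraper method would fetch the page first and probably also probe common paths like /privacy directly, but this shows the general shape.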

@laurenmarietta added the enhancement, help wanted, and scraper labels on Jan 22, 2020
@laurenmarietta self-assigned this on Jan 22, 2020