A new web scraper we would like to develop would (1) determine whether government websites have a terms of service/privacy policy page, and (2) evaluate how good that page is.
Steps we need to take to actualize these goals (not necessarily in order):
Verify that the existing software infrastructure for developing/running scrapers is functional. (I've heard @kbalajisrinivas might be useful for this.)
Get a sense for where government sites tend to keep their terms of service/privacy policy pages
Define what our metrics and evaluation system are for a "good" terms of service page
Write a new Python class in scrapers/scrapers/ that builds upon base_scraper.py and contains methods for scraping webpages, finding their terms of service/privacy policy page locations (if they exist), and analyzing their contents (as determined by the previous step)
Write tests for this new class
I invite anyone to add/modify this list!
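To make the class-writing step a bit more concrete, here is a rough sketch of what the link-finding part might look like. It is only a starting point under assumptions: I have not checked what base_scraper.py actually exposes, so the class below stands alone instead of subclassing it, and the names `PolicyPageScraper`, `find_policy_links`, and `POLICY_KEYWORDS` are all hypothetical. The keyword list is a placeholder until we have surveyed where government sites keep these pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder keywords (an assumption); the real list should come from the survey step.
POLICY_KEYWORDS = (
    "privacy",
    "terms of service",
    "terms of use",
    "terms-of-service",
    "terms-of-use",
)


class _LinkCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a> tag with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


class PolicyPageScraper:
    """Hypothetical class; in the repo it would build on base_scraper.py."""

    def find_policy_links(self, html, base_url):
        """Return absolute URLs of links whose href or text matches a keyword."""
        collector = _LinkCollector()
        collector.feed(html)
        collector.close()

        found = []
        for href, text in collector.links:
            haystack = (href + " " + text).lower()
            if any(keyword in haystack for keyword in POLICY_KEYWORDS):
                url = urljoin(base_url, href)
                if url not in found:  # keep first occurrence, drop repeats
                    found.append(url)
        return found
```

This version takes the page HTML as a string, so fetching stays separate and the method is easy to cover in the tests step with small hand-written HTML snippets. Evaluating how good a page is would be a separate method once the metrics are defined.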