Continuation of #48.

We have not yet been able to get a VM up to run the scraper, so we need your help running the scraper locally to gather an initial dataset that the NILC can look at. The documentation for the scraper can be found here: https://github.com/Data4Democracy/immigration-connect/tree/master/public-charge/scraper
Ping me (@dotj) or @alejandrox1 here, or post in the #immigration-connect Slack channel if you need help.
We've seen each page (50 comments) take about 4 minutes to scrape, and there are currently almost 10k comments (about 200 pages), so a full run will take roughly 13 hours. Of course, this depends on your internet speed and various other factors.
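For a rough sense of where that estimate comes from, here is the arithmetic (the 4-minutes-per-page figure is from our own runs; your numbers may differ):

```python
# Back-of-the-envelope runtime estimate for a full scrape.
comments = 10_000        # approximate number of comments at time of writing
per_page = 50            # comments shown per page
minutes_per_page = 4     # observed scrape time per page

pages = comments / per_page                  # ~200 pages
total_hours = pages * minutes_per_page / 60  # ~13.3 hours
print(f"~{pages:.0f} pages, ~{total_hours:.1f} hours")
```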
Tasks

- Set up the scraper locally
- Let the scraper run and collect all the comments (~13 hours)
I started to take a look at this issue as a first contribution and have a couple of questions.
I was able to get the Docker container running, but I was unable to run `python get_comments.py` because the file wasn't being added to the container. I believe all the Python files and the database should be added to the container as well. Would it be preferable to add the Python files after installing requirements, so that changes to the Python files don't invalidate the cached requirements-install build step?
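A minimal sketch of that layer ordering, assuming the project installs from a `requirements.txt` (the base image and file names here are assumptions based on this thread, not the repo's actual Dockerfile):

```dockerfile
FROM python:3.6

WORKDIR /opt/app

# Copy only the requirements file first, so the expensive install layer
# is cached and only rebuilt when the dependency list itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the Python sources (and database) last; editing them now only
# invalidates this cheap final layer, not the pip install above.
COPY . .

CMD ["python", "get_comments.py"]
```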
I also ran into an error running `python get_comments.py`. What is the development process? The Dockerfile uses `ADD` rather than a volume, so file changes are not being synced into the container.
Ahh, I see where the volume is being created, in the `docker run` command: `-v $(CURDIR):/opt/app`.
It seems the issue is that I don't have `CURDIR` set. I'm not too familiar with it, but I believe `CURDIR` is a built-in make variable, so `$(CURDIR)` only expands when the command is run through make, not when it is pasted directly into a shell. The other issue I had is that the `DISPLAY` env var also isn't getting set properly.
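If you are invoking `docker run` directly rather than through the Makefile, a sketch of an equivalent command would substitute the shell's working directory and pass through `DISPLAY` explicitly (the mount point `/opt/app` is from this thread; the image name `scraper` and everything else here is an assumption):

```sh
# $(CURDIR) is a GNU make built-in; outside of make, use $(pwd) instead.
# DISPLAY must already be set on the host (e.g. by X11/XQuartz) for the
# container to reach a display.
docker run --rm -it \
  -v "$(pwd)":/opt/app \
  -e DISPLAY="$DISPLAY" \
  scraper \
  python get_comments.py
```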