Continuation of #48.

We have not yet been able to get a VM up to run the scraper, so we need your help running the scraper locally to gather an initial dataset that the NILC can look at. The documentation for the scraper can be found here: https://github.com/Data4Democracy/immigration-connect/tree/master/public-charge/scraper
Ping me (@dotj) or @alejandrox1 here, or post in the #immigration-connect Slack channel if you need help.
We've seen each page (50 comments) take about 4 minutes to scrape, and there are currently almost 10k comments (about 200 pages), so a full run will take roughly 13 hours. Of course, this depends on your internet speed and various other factors.
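For a rough sense of where that estimate comes from, here is the arithmetic (the 4-minutes-per-page figure is from our own runs; your numbers may differ):

```python
# Back-of-the-envelope runtime estimate for a full scrape.
comments = 10_000        # approximate number of comments at time of writing
per_page = 50            # comments shown per page
minutes_per_page = 4     # observed scrape time per page

pages = comments / per_page                  # ~200 pages
total_hours = pages * minutes_per_page / 60  # ~13.3 hours
print(f"~{pages:.0f} pages, ~{total_hours:.1f} hours")
```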
Tasks

- Set up the scraper locally
- Let the scraper run and collect all the comments (~13 hours)
I started to take a look at this issue as a first contribution and have a couple of questions.
I was able to get the Docker container running, but I was unable to run `python get_comments.py` because the file wasn't being added to the container. I believe all the Python files and the database should be added to the container as well. Would it be preferable to add the Python files after installing requirements, so that changes to the Python files don't invalidate the cached requirements-install build step?
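A minimal sketch of that layer ordering, assuming the project installs from a `requirements.txt` (the base image and file names here are assumptions based on this thread, not the repo's actual Dockerfile):

```dockerfile
FROM python:3.6

WORKDIR /opt/app

# Copy only the requirements file first, so the expensive install layer
# is cached and only rebuilt when the dependency list itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the Python sources (and database) last; editing them now only
# invalidates this cheap final layer, not the pip install above.
COPY . .

CMD ["python", "get_comments.py"]
```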
I also ran into an error running `python get_comments.py`. What is the development process? The Dockerfile uses `ADD` rather than a volume, so file changes are not being synced into the container.
Ahh, I see where the volume is being created, in the `docker run` command: `-v $(CURDIR):/opt/app`.
It seems the issue is that I don't have `CURDIR` set. I'm not too familiar with it, but I believe `CURDIR` is a built-in make variable, so `$(CURDIR)` only expands when the command is run through make, not when it is pasted directly into a shell. The other issue I had is that the `DISPLAY` env var also isn't getting set properly.
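If you are invoking `docker run` directly rather than through the Makefile, a sketch of an equivalent command would substitute the shell's working directory and pass through `DISPLAY` explicitly (the mount point `/opt/app` is from this thread; the image name `scraper` and everything else here is an assumption):

```sh
# $(CURDIR) is a GNU make built-in; outside of make, use $(pwd) instead.
# DISPLAY must already be set on the host (e.g. by X11/XQuartz) for the
# container to reach a display.
docker run --rm -it \
  -v "$(pwd)":/opt/app \
  -e DISPLAY="$DISPLAY" \
  scraper \
  python get_comments.py
```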