This repository has been archived by the owner on Jul 4, 2019. It is now read-only.

implement a task queue #7

Open
prgr4m opened this issue Nov 27, 2014 · 3 comments

Comments

@prgr4m
Contributor

prgr4m commented Nov 27, 2014

It just doesn't feel right... Yes, the project works with the current solution, but with the district court issue, solving captchas and stateful browser hacking (wish mechanize was up to date, but there's always robobrowser d^_^b )... it definitely seems like we should be using gearman or celery for this sort of background work, with server-sent events to report progress, but I'm not necessarily aware of the original requirements (if there are any). Here are a couple of questions:

  • what's the lowest end of browser support (if there are any requirements; I noticed the jQuery action on the front end)?
  • what do you currently have up on heroku (resource-wise)? (and who's paying for it?)
  • what do you feel most comfortable with in regards to hashing out a solution?
  • do you think there should be a delay of some sort? (to emulate human usage. If I were playing admin, I'd be playing super sleuth, because that would be my job if I saw significant traffic coming from one IP)

I'm pretty flexible but I need to know what your thoughts are on this. I noticed you opened up a ticket for this but closed it.
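
To make the background-worker idea above concrete, here is a minimal sketch. It is only an illustration: it uses Python's standard-library `queue` and `threading` instead of gearman or celery, and every name in it (`search_court`, `submit`, `start_worker`, `results`) is hypothetical rather than taken from the project.

```python
import queue
import threading
import uuid

# Hypothetical in-process queue; a real deployment would more likely
# hand this work to Celery or Gearman, as suggested in the comment.
tasks = queue.Queue()
results = {}
results_lock = threading.Lock()


def search_court(name):
    """Placeholder for the slow scraping work (hypothetical)."""
    return f"results for {name}"


def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        task_id, name = item
        try:
            outcome = search_court(name)
        except Exception as exc:  # keep the worker alive on failures
            outcome = f"error: {exc}"
        with results_lock:
            results[task_id] = outcome
        tasks.task_done()


def submit(name):
    """Enqueue a search and return an id the browser can poll with."""
    task_id = uuid.uuid4().hex
    tasks.put((task_id, name))
    return task_id


def start_worker():
    """Start one daemon worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

With something like this, the web request would only call `submit(...)` and return the id right away; the browser could then poll for the result or receive it over server-sent events, instead of waiting on the scrape itself.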

@bschoenfeld
Member

Yeah, technically all this work shouldn't be done while the browser is waiting for a response. A background worker would be more appropriate but I did the simplest thing that could work.

  • We want to support whatever our users have. GA says 8% are using IE and 20% of those are using IE 8. I'm using jQuery for Ajax requests, so I went with 1.x for more browser support. If there's a good reason to drop support for an older browser, we can probably do so.
  • We've got 2 dynos on Heroku and I'm paying for them myself for now. We could probably get support from our users since they use the site for business or CfA but we also may be able to drop to one dyno and keep it free.
  • Not sure what you mean by hashing out a solution. We can just plan through GitHub Issues or maybe a Google Hangout. I'd also like to make sure we listen to the users. I've been talking to a bunch of them through Twitter. So far they are just happy that it exists, but I've already had a couple feature requests.
  • Yes, my current solution has a small delay built in. It's still much faster than manually searching, but I left a 1 second pause between requests on the client. I'm also not super concerned with getting shut down. I don't consider this a long term solution. I've reached out to a couple of my city gov contacts about how the data transfer works on the backend. The ultimate goal would be to open this data so we could create a search that doesn't have to scrape the data.
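
The 1 second pause mentioned in the last bullet lives in the client-side code. As a rough sketch of the same idea on the Python side (an assumption, not how the project does it), a small throttle could enforce a minimum gap between outgoing requests. The `Throttle` class and its parameters are hypothetical names for illustration.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the class can be tested
        # without actually pausing.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Sleep just long enough to keep calls min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each scrape request would keep requests at least one second apart, no matter how quickly the rest of the code runs.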

I'm busy with some other things, so if you want to work on this thing, go for it. I'd like to see more sites be incorporated into the search. I think that's what the users want. I tried to add Alexandria to the search but their site is very difficult to scrape. Fairfax doesn't even have a public site.

@prgr4m
Contributor Author

prgr4m commented Dec 4, 2014

I know what you mean on the busy part. It's taken this long to reply. I can only dedicate time on the weekends to this project after I take care of a couple of things. Paying out of pocket can be pretty pricey (will keep that in mind)... What I meant in regards to hashing out a solution was using libraries/techniques that you would be comfortable with, since I don't know your background. Since I found out about this project through a local meetup, I figured I could contribute but would like to be congruent with the main developer on the project. That's all. I'm offline a bit but you should see something in my repo in about a week. Then I'll just add to this issue and get feedback before submitting a pull request. Have a good week.

@bschoenfeld
Member

I'm open to using and learning whatever works. I rarely use Python, but I know it well enough to get a quick web server running. I don't have plans to take this thing much further, so if it goes in a different direction, that's cool. I'm prepared to step back if someone else wants to lead the way. I think there may be some interest in Code for DC, so I hope we hear from them soon.

Did you find out about this through the Code for HR meetup?

I should be getting billed by Heroku, but I can't find any charges on my account. I hope I'm not eating up CfA's credits. At any rate, GA says I'm only getting a couple hits a day so I've backed it down to one dyno.
