-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: made scrapers not able to run twice at the same time #33
Conversation
@tyboro2002 i think this will not behave how you want/intend it to behave. Python variables and state is not thread safe at all. State is not guaranteed to be the same across threads. You are spawning a new thread and in there you check and modify the scrapestate. This state is not guaranteed to be available in any other thread after changing it and it can also easily lead to race conditions. 2 fast clicks on a sync button, 2 requests to the web server, 2 threads get started, both read the scrapestate variable and both get the answer that nothing is scraping yet, they both set scraping to true and start their business. If you want to share state, you will need to pass that object to the thread explicitly. You will probably also need to use a locking mechanism if you don't want a race condition. On a second note, if you are running flask on multiple threads the standard python state is also not guaranteed to be the same across those. I think you can save state in the global flask object, that at least solves thread synced state on the webserver level. |
I will look into it, can you make an issue for this? |
@tyboro2002 je kan ook naar kelderapi kijken, daar hebben we dat ook globale state in Flask https://github.com/ZeusWPI/kelderapi/blob/6f69f1f0afcfc6c9988473a03b3c5fe36cd6e842/app/app.py#L16 Edit: daar kan er wel maar 1 instantie de with lock nemen, dus maybe dat dat niet ideaal is voor wat jij wil? |
@app.route("/scrape/<restaurant_name>", methods=['POST'])
def scrape(restaurant_name):
"""
Start the scraper in a background thread for the given restaurant.
"""
try:
# Start a new thread to run the scraper for the given restaurant
scraper_thread = threading.Thread(target=run_scraper_in_background, args=(restaurant_name,))
scraper_thread.start()
return jsonify({"message": f"Scraping started for {restaurant_name}"}), 200
except Exception as e:
# If there's an error, return the error message in JSON format
return jsonify({"error": str(e)}), 500 I use threading threads isnt this different than just flask threads |
It is, but I think this might be useable regardless? but if you don't see how it could be useful, you don't have to use it; I just commented because the problem here made me think of the problem we had in kelderapi. |
I don't really understand how it works there couldn't we just simply get a lock on the data? |
I think this is fixed in #34 please review that one. |
No description provided.