Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: made scrapers not able to run twice at the same time #33

Merged
merged 1 commit into from
Dec 26, 2024

Conversation

tyboro2002
Copy link
Member

No description provided.

@tyboro2002 tyboro2002 merged commit e33e91b into master Dec 26, 2024
1 check passed
@mcbloch
Copy link
Member

mcbloch commented Dec 26, 2024

@tyboro2002 i think this will not behave how you want/intend it to behave. Python variables and state is not thread safe at all. State is not guaranteed to be the same across threads.

You are spawning a new thread and in there you check and modify the scrapestate. This state is not guaranteed to be available in any other thread after changing it and it can also easily lead to race conditions. 2 fast clicks on a sync button, 2 requests to the web server, 2 threads get started, both read the scrapestate variable and both get the answer that nothing is scraping yet, they both set scraping to true and start their business.

If you want to share state, you will need to pass that object to the thread explicitly. You will probably also need to use a locking mechanism if you don't want a race condition.

On a second note, if you are running flask on multiple threads the standard python state is also not guaranteed to be the same across those. I think you can save state in the global flask object, that at least solves thread synced state on the webserver level.

@tyboro2002
Copy link
Member Author

tyboro2002 commented Dec 26, 2024

I will look into it, can you make an issue for this?

@redfast00
Copy link
Member

redfast00 commented Dec 26, 2024

@tyboro2002 je kan ook naar kelderapi kijken, daar hebben we dat ook globale state in Flask https://github.com/ZeusWPI/kelderapi/blob/6f69f1f0afcfc6c9988473a03b3c5fe36cd6e842/app/app.py#L16

Edit: daar kan er wel maar 1 instantie de with lock nemen, dus maybe dat dat niet ideaal is voor wat jij wil?

@tyboro2002
Copy link
Member Author

@tyboro2002 je kan ook naar kelderapi kijken, daar hebben we dat ook globale state in Flask https://github.com/ZeusWPI/kelderapi/blob/6f69f1f0afcfc6c9988473a03b3c5fe36cd6e842/app/app.py#L16

Edit: daar kan er wel maar 1 instantie de with lock nemen, dus maybe dat dat niet ideaal is voor wat jij wil?

@app.route("/scrape/<restaurant_name>", methods=['POST'])
def scrape(restaurant_name):
    """
    Start the scraper in a background thread for the given restaurant.
    """
    try:
        # Start a new thread to run the scraper for the given restaurant
        scraper_thread = threading.Thread(target=run_scraper_in_background, args=(restaurant_name,))
        scraper_thread.start()

        return jsonify({"message": f"Scraping started for {restaurant_name}"}), 200

    except Exception as e:
        # If there's an error, return the error message in JSON format
        return jsonify({"error": str(e)}), 500

I use threading threads isnt this different than just flask threads

@redfast00
Copy link
Member

It is, but I think this might be useable regardless? but if you don't see how it could be useful, you don't have to use it; I just commented because the problem here made me think of the problem we had in kelderapi.

@tyboro2002
Copy link
Member Author

It is, but I think this might be useable regardless? but if you don't see how it could be useful, you don't have to use it; I just commented because the problem here made me think of the problem we had in kelderapi.

I don't really understand how it works there couldn't we just simply get a lock on the data?

@tyboro2002
Copy link
Member Author

tyboro2002 commented Dec 26, 2024

I think this is fixed in #34 please review that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants