
Retry mechanism for transient errors #49

Merged (7 commits, Jul 3, 2024)

Conversation

@Mews (Collaborator) commented on Jun 29, 2024

Closes #39

Changes

  • Added an is_transient_error function inside fetcher.py;
    • I hardcoded the list of transient error codes as [408, 502, 503, 504] since they're the most common ones, but let me know if I should add or remove any;
  • Modified fetch_url to retry fetching the URL when it encounters a transient error, until retries reaches 0;
  • Added tests covering the retry feature, both directly through the fetch_url function and via the crawl method;
  • Added a max_retry_attempts option to CrawlSettings.

P.S.: The wait times between consecutive retry attempts are just 1, 2, 3, ... seconds. Let me know if that's okay.
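To illustrate the behavior described above, here is a minimal sketch of the retry logic: a transient error check over the hardcoded status codes, plus a retry loop with linear 1, 2, 3, ... second waits. The `fetch` callable, its `(status, body)` return shape, and the function signatures are hypothetical stand-ins for illustration; the real implementation lives in fetcher.py and may differ.

```python
import time

# List of transient HTTP status codes, mirroring the PR description.
TRANSIENT_ERRORS = [408, 502, 503, 504]


def is_transient_error(status_code: int) -> bool:
    """Return True if the status code is one of the hardcoded transient errors."""
    return status_code in TRANSIENT_ERRORS


def fetch_url(fetch, url, max_retry_attempts=3):
    """Fetch a URL, retrying on transient errors until retries reaches 0.

    `fetch` is a hypothetical callable returning (status_code, body).
    Waits 1s, 2s, 3s, ... between consecutive retry attempts.
    """
    retries = max_retry_attempts
    attempt = 0
    while True:
        status, body = fetch(url)
        if not is_transient_error(status) or retries <= 0:
            return status, body
        attempt += 1
        retries -= 1
        time.sleep(attempt)  # linear backoff: 1, 2, 3, ... seconds
```

With `max_retry_attempts=3`, a URL that returns 503 twice and then 200 succeeds on the third call; a URL that never stops returning 503 is returned as-is once the retry budget is exhausted.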

@Mews Mews requested a review from indrajithi June 29, 2024 19:53
@indrajithi indrajithi merged commit 8ed15c5 into DataCrawl-AI:master Jul 3, 2024
10 checks passed
Successfully merging this pull request may close these issues: Implement a retry mechanism for transient errors (#39).