Spec.qc.ca not loading events #82
Comments
Notes: The crawl works fine on a local machine but fails when it runs on a GitHub runner. Task for @dev-aravind: add the User-Agent header to every step of the crawling process, including fetching entity URLs and fetching entity details (in both headless and headful mode). @saumier will try to contact the Spec.qc.ca developer team and ask them to allow our user agent to crawl their website.
@troughc I sent you an email for Isabelle, asking her tech team to allow the Artsdata crawler User-Agent "artsdata-crawler/3.3.0". Additional note: the tech teams have been informed to match only on "artsdata-crawler", because the version number (currently 3.3.0) changes with each update.
The email was sent.
@saumier The User-Agent header is now added to every step.
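For readers following along, here is a minimal sketch of what "the User-Agent on every step" could look like for the plain HTTP fetches. It is not the crawler's actual code: the helper names `build_request` and `fetch`, the retry count, and the delay are assumptions made for illustration. Only the User-Agent value and the wording of the error message come from this thread. Headless and headful browser steps would need the equivalent setting in whichever browser driver is used.

```python
import time
import urllib.error
import urllib.request

# User-Agent value quoted in this thread; everything else here is hypothetical.
USER_AGENT = "artsdata-crawler/3.3.0"


def build_request(url: str) -> urllib.request.Request:
    """Return a request that carries the crawler's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, max_retries: int = 3, delay: float = 1.0,
          opener=urllib.request.urlopen) -> bytes:
    """Fetch a page, retrying on network errors.

    Every attempt goes through build_request, so no step can
    accidentally fall back to the library's default User-Agent.
    """
    for attempt in range(1, max_retries + 1):
        try:
            with opener(build_request(url)) as response:
                return response.read()
        except urllib.error.URLError:
            # HTTPError (e.g. a 403 from a blocked agent) is a URLError too.
            if attempt < max_retries:
                time.sleep(delay)
    raise RuntimeError(
        f"Max retries reached. Unable to fetch the content for page {url}."
    )
```

Routing both the entity-URL fetch and the entity-detail fetch through one helper like `build_request` is one way to make sure the header is never omitted.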
@dev-aravind The tech teams have been informed to match only on "artsdata-crawler", because the version number (currently 3.3.0) changes with each update.
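To illustrate the version-independent match, here is a small sketch of how an allow-list check might compare only the product name. How Spec.qc.ca actually filters requests is unknown; the function name `is_artsdata_crawler` and the choice to compare the token before the first "/" are assumptions.

```python
# Product name from this thread; the version after "/" is deliberately ignored.
ALLOWED_PRODUCT = "artsdata-crawler"


def is_artsdata_crawler(user_agent: str) -> bool:
    """Return True when the User-Agent's product token is the Artsdata crawler."""
    product = user_agent.strip().split("/", 1)[0]
    return product.lower() == ALLOWED_PRODUCT


print(is_artsdata_crawler("artsdata-crawler/3.3.0"))  # current version
print(is_artsdata_crawler("artsdata-crawler/4.0.0"))  # a future version still matches
print(is_artsdata_crawler("Mozilla/5.0"))             # unrelated agent
```

With this kind of check, a crawler release that bumps the version number would not need a new allow-list entry.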
@fjjulien Please let me know if you hear anything from Isabelle at Spec about our crawler being allowed in. Once the Artsdata crawler is allowed in, I will run another crawl of their event JSON-LD.
When running the workflow for spec.qc.ca the system exits with an error:
Max retries reached. Unable to fetch the content for page .