Automate broken link detection for documentation #1168

Open
pushfoo opened this issue Apr 5, 2022 · 3 comments

@pushfoo
Member

pushfoo commented Apr 5, 2022

Enhancement request:

What should be added/changed?

A way to detect links in the documentation that return 404.

Recap of today's Discord discussion after I brought the topic up:

  1. A BeautifulSoup script run against the generated HTML files should be enough to get started
  2. Selenium is overkill, given that we don't use much JS in the docs
  3. 404 detection could run either as a unit test or as part of the Sphinx build
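A minimal sketch of point 1, assuming the HTML output sits in a build directory such as doc/_build/html (hypothetical path). It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs; `LinkCollector`, `parse_page` and `find_broken_links` are illustrative names, not existing project code. It only checks internal links: a relative href must resolve to an existing file, and a `#fragment` must match an `id` (or `<a name>`) on the target page.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collect link targets and anchor ids from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        # Sphinx anchors are element ids; older HTML may use <a name="...">.
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.ids.add(attrs["name"])


def parse_page(path):
    collector = LinkCollector()
    collector.feed(path.read_text(encoding="utf-8"))
    collector.close()
    return collector


def find_broken_links(build_dir):
    """Return (page, href) pairs whose local target file or anchor is missing.

    External links (http, https, mailto, ...) are skipped on purpose.
    """
    root = Path(build_dir).resolve()
    pages = {path.resolve(): parse_page(path) for path in root.rglob("*.html")}
    broken = []
    for page in sorted(pages):
        for href in pages[page].hrefs:
            parts = urlsplit(href)
            if parts.scheme or parts.netloc:
                continue  # external link: out of scope for this sketch
            # An empty path means a same-page link such as "#section".
            if parts.path:
                target = (page.parent / unquote(parts.path)).resolve()
            else:
                target = page
            if target.is_dir():
                target = target / "index.html"
            if target not in pages:
                # Non-HTML files (images, downloads) only need to exist.
                if not target.exists():
                    broken.append((page.relative_to(root).as_posix(), href))
                continue
            if parts.fragment and unquote(parts.fragment) not in pages[target].ids:
                broken.append((page.relative_to(root).as_posix(), href))
    return broken
```

Run as a unit test (point 3), this could be as small as `assert not find_broken_links("doc/_build/html")` after the docs are built; the path is an assumption about the build layout.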

What would it help with?

Finding broken links. We have multiple recent examples of broken links (#1165, #1130, #998), and there are likely more that haven't been detected yet. The upcoming doc reorganization will create even more, so we need a way to find them all.

@pushfoo
Member Author

pushfoo commented Apr 5, 2022

For now, developers can run make linkcheck locally from inside the doc directory. This Sphinx builder contacts external sites in addition to checking internal links, so you may get rate limited. It can also be very slow, since it sometimes sleeps partway through a run.

Next, I'll look into ways to customize its behavior, such as conf.py overrides and CLI flags for running the linkcheck builder on its own. If anyone beats me to it, comment here and/or open a PR.
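As a possible starting point for the conf.py side, a fragment like this could make local runs less painful. The option names are existing Sphinx linkcheck settings, but the values and the ignored URL pattern are placeholders, not settings tested on this project. The rate-limit option governs the long waits: when a server answers 429 without a Retry-After header, Sphinx waits and keeps doubling the delay until it succeeds or exceeds this limit.

```python
# doc/conf.py (fragment): hypothetical linkcheck tuning; values are guesses.

# Seconds to wait for a response before giving up on a link.
linkcheck_timeout = 15
# Number of worker threads checking links in parallel (Sphinx default: 5).
linkcheck_workers = 5
# Times to retry a link before reporting it as broken (Sphinx default: 1).
linkcheck_retries = 2
# Upper bound, in seconds, on the back-off after "429 Too Many Requests"
# (Sphinx 3.4+; default is 300).
linkcheck_rate_limit_timeout = 60.0
# Regular expressions for URLs never to contact, e.g. sites known to rate
# limit CI runners. This pattern is a placeholder.
linkcheck_ignore = [r"https://example\.com/.*"]
```

Settings with a typed default can also be overridden for a single run via sphinx-build's -D option, e.g. `sphinx-build -b linkcheck -D linkcheck_workers=10 . _build/linkcheck`, assuming conf.py sits in the current directory.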

@einarf
Member

einarf commented Apr 6, 2022

So we should run make linkcheck in GitHub Actions?
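Running linkcheck in CI would need a pass/fail signal. One option, sketched under the assumption that the job reads the output.json file that Sphinx's linkcheck builder writes into its output directory (one JSON object per line, with fields such as filename, lineno, status and uri), is a small script that keeps only the entries marked broken. The function name `broken_links` is illustrative.

```python
import json
from pathlib import Path


def broken_links(output_json):
    """Return (filename, lineno, uri) for every link linkcheck marked broken.

    Assumes the JSON-lines format of linkcheck's output.json; other statuses
    (working, redirected, ignored, ...) are not treated as failures.
    """
    results = []
    for line in Path(output_json).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("status") == "broken":
            results.append((record.get("filename"), record.get("lineno"), record.get("uri")))
    return results
```

A CI step could then end with `sys.exit(1 if broken_links("doc/_build/linkcheck/output.json") else 0)`; the path is hypothetical and depends on the build directory.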

@pushfoo
Member Author

pushfoo commented Apr 8, 2022

Not exactly. The default make linkcheck behavior does some things we don't want:

  1. it detects all broken links everywhere, including the release notes
  2. if an external page is up but missing a specified anchor, linkcheck treats the link as broken

That's why I asked on Discord about splitting the release notes into separate files. It would make it easier to choose where we run link checks, i.e. only on the most recent releases. Old release notes contain broken links to old page locations or deleted GitHub accounts. We might not want to prioritize fixing those, so they would be noise if CI flagged them.
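Both points might be handled in conf.py rather than by changing how the build is invoked. `linkcheck_anchors` and `linkcheck_exclude_documents` are existing Sphinx options (the latter needs Sphinx 4.4 or newer), but the release-notes pattern below is hypothetical, since it depends on how the release notes end up being split into files.

```python
# doc/conf.py (fragment): sketch only; the document pattern is hypothetical.

# Point 2: don't report a link as broken just because an external page lacks
# the "#anchor" it points to; the page itself only has to load.
linkcheck_anchors = False

# Point 1: skip link checking inside old release notes entirely.
# Entries are regular expressions matched against document names.
linkcheck_exclude_documents = [r"release_notes/old/.*"]
```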
