Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate html-proofer -> lychee, only check internal links #108

Open
yongrenjie opened this issue Jun 21, 2023 · 4 comments
Open

Migrate html-proofer -> lychee, only check internal links #108

yongrenjie opened this issue Jun 21, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@yongrenjie
Copy link
Contributor

Problem

Checking the validity of external links is fragile, and many of the external links require authentication anyway, meaning that it is not possible to meaningfully check their validity.

lychee also seems like an improvement over html-proofer, which we currently use in CI.

Proposed steps to solve

  1. Replace html-proofer with lychee in CI. (There is an action available here)
  2. Configure it to only check internal links
  3. Set up a pre-commit hook so that it matches CI. This needs some extra work as lychee doesn't provide one out of the box. I think the most sensible approach would actually be to make a PR to lychee to include one, as the developers seem quite open to this sort of enhancement.
    • pre-commit says that it will bootstrap rust if it's not already installed, so people editing the handbook should not need to separately install rust. (a plus!)
    • Note that html-proofer doesn't have a pre-commit hook either, so we are not losing anything by switching.

@JimMadge I think I have some extra time these couple of weeks, so could start looking into it. Edits etc welcome.

@yongrenjie yongrenjie added the enhancement New feature or request label Jun 21, 2023
@JimMadge
Copy link
Member

If this can work it would be good, more alignment between local tests and CI.

Somethings I'm not sure about,

  • What we would really want to do is hugo build --minify && lychee --offline. I'm not sure if pre-commit is that flexible
  • Do we need to do this at all? If we always use the ref and relref short codes I think Hugo should raise an error if the link is broken
  • Lychee is distributed as a binary, having pre commit install it's own copy of cargo/rust and building Lychee seems like a waste.

@yongrenjie
Copy link
Contributor Author

yongrenjie commented Jun 21, 2023

Do we need to do this at all? If we always use the ref and relref short codes I think Hugo should raise an error if the link is broken

🤨 That is true.

It doesn't check for missing anchors (e.g. {{< ref "/content/docs/contributing/contributing_changes.md#doesnt-exist" >}} is still ok), but I don't know if any tool can do that really.

In which case, the task reduces to making sure that all internal links are specified using either ref or relref. (Do you know if one of these is better than the other?)

In any case, that should be easy enough to do as it'd just be a simple bash script grepping the markdown source code for https://alan-turing-institute.github.io/REG-handbook/*.

@JimMadge
Copy link
Member

It doesn't check for missing anchors (e.g. {{< ref "/content/docs/contributing/contributing_changes.md#doesnt-exist" >}} is still ok), but I don't know if any tool can do that really.

Oh, that is interesting. I assumed that would be checked.
Not sure if it can be checked when it is translated to html.

In which case, the task reduces to making sure that all internal links are specified using either ref or relref. (Do you know if one of these is better than the other?)

In any case, that should be easy enough to do as it'd just be a simple bash script grepping the markdown source code for https://alan-turing-institute.github.io/REG-handbook/*.

That would be good. Might need to check for Markdown style links [something](../../index.md) too? A little more effort on contributors to save potential headaches. Especially if it can be integrated into pre-commit with some friendly output.

If you do something like that, it would also be appreciated in The Turing Way.

@yongrenjie
Copy link
Contributor Author

yongrenjie commented Jun 21, 2023

In principle you should be able to check anchors. (for example, I think Sphinx does this)

There are a couple of ways I guess — one would be checking the generated HTML, to ensure that a HTML element on the given page with the given id exists. However, making this robust might be a bit difficult, as just grepping for id="the-anchor-name" might lead to false positives. To do it properly would require parsing the HTML.

I suspect it might be easier to check anchors on the Markdown itself. This would rely on knowledge of which HTML anchors are generated from Markdown (and how), i.e. ## A heading is translated into <h2 id="a-heading">A heading</h2>. The downside is that such a tool would only work with a particular build system, but I think it'd be far easier to write, since Markdown syntax is much more restricted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants