feat: simplify exclude list, latest muffet, update redirects #1733

Open
wants to merge 19 commits into
base: main

Conversation

mrjones-plip
Contributor

@mrjones-plip mrjones-plip commented Dec 16, 2024

Description

This PR updates the muffet link checker to:

  • homogenize the exclude list so all entries look the same and are easier to read and maintain:
    • all links start the same: http[s]*:
    • (almost) all links end the same: .*" \
    • Google Drive/Docs/Sheets links are all "http[s]*://.*google.com/.*DOC-ID-HERE.*" \
    • "weird" links like 127* and localhost are at the end
  • add echis ke repos to the exclude list to avoid 404s on private repos
  • use the latest 2.x muffet (major version bump from 1.x)
  • embed the muffet binary in the repo for ease of running on Linux x86 systems (this may lead to testing against the production URL and thus having no install dependencies in CI?!)
  • update the flags used per the latest muffet version
  • remove the hard-coded DNS script, as muffet now natively supports the --dns-resolver flag
  • work around redirects: I had some trouble with redirects*, so I went ahead and changed these 4 link URLs using the 4 bash commands below:
    • github.com/medic/medic/ -> github.com/medic/cht-core/
    • github.com/medic/medic-webapp/ -> github.com/medic/cht-core/
    • github.com/medic/medic-conf/ -> github.com/medic/cht-conf/
    • github.com/medic/medic-android/ -> github.com/medic/cht-android/
    find . -type f -name "*.md" -exec sed -i 's/github\.com\/medic\/medic\/issues/github\.com\/medic\/cht-core\/issues/g' {} +
    find . -type f -name "*.md" -exec sed -i 's/github\.com\/medic\/medic-webapp\/issues/github\.com\/medic\/cht-core\/issues/g' {} +
    find . -type f -name "*.md" -exec sed -i 's/github\.com\/medic\/medic-conf\/issues/github\.com\/medic\/cht-conf\/issues/g' {} +
    find . -type f -name "*.md" -exec sed -i 's/github\.com\/medic\/medic-android\/issues/github\.com\/medic\/cht-android\/issues/g' {} +
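
To make the exclude-list conventions above concrete, here is a minimal sketch of how patterns in that shape match URLs. The three patterns and the `is_excluded` helper are hypothetical illustrations, not the contents of muffet.sh, and `grep -E` is only assumed to behave closely enough to muffet's own regular expression matching for patterns this simple. `DOC-ID-HERE` is the placeholder from the description, not a real document ID.

```shell
# Hypothetical exclude patterns, one per line, following the conventions above:
# same prefix (http[s]*:), same suffix (.*), "weird" local entries last.
EXCLUDE_PATTERNS='http[s]*://.*google.com/.*DOC-ID-HERE.*
http[s]*://localhost.*
http[s]*://127.*'

# is_excluded URL
# Succeeds (exit 0) when URL matches any pattern, fails (exit 1) otherwise.
is_excluded() {
  url="$1"
  while IFS= read -r pattern; do
    # Unanchored extended-regex match of the URL against one pattern.
    if printf '%s\n' "$url" | grep -Eq -- "$pattern"; then
      return 0
    fi
  done <<EOF
$EXCLUDE_PATTERNS
EOF
  return 1
}
```

For example, `is_excluded "http://localhost:1313/core/" && echo skipped` prints `skipped`, while a `github.com` issue URL is not matched by any of these three patterns.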
    

*Redirect issue

OMG - what in the world!? I saw this and just decided to rename the repos in the links to avoid this case. Two identical calls to curl returned two different responses 🪦

$ curl -I https://github.com/medic/medic-webapp/issues/4743                                 
HTTP/2 404                             
server: GitHub.com
date: Mon, 16 Dec 2024 20:48:51 GMT                                                        
content-type: text/plain; charset=utf-8
content-length: 9                                                                          
server-timing: nginx;desc="NGINX";dur=1.348052,glb;desc="GLB";dur=32.624074
x-voltron-version: 69a2227
vary: Accept-Encoding, Accept, X-Requested-With
x-frame-options: DENY
set-cookie: _gh_sess=Ls%2FeJszygf32o1n8ntIuB1TIpJiJd8mST73I1UXMf1Gh%2FeZTjUZQXsWd2DW67hF73NblBq6QgdUoQYZmLUfeyiIwZBsYnmyXQQfQlEJp4jVKt%2BmanRqlyLiO3lAIKBV7M%2BvJHu0XlpypRvXNGQ86eeL2eXM9%2FAOyvaj7bgLWaCDUnNUFdwElqAukz4O1iZqbRL3UmRXw9CrD%2FttvjBkyLWtTu9vjKz%2F3Zhw9vcLrAZXLB%2Bq7zuWEHEbABaQmXBIhYbRnhbRoiifClbLfi1Z4zw%3D%3D--oYXqIV7pZApgxCMJ--X9IBBS%2F%2B1C%2Bx7di5BKAshg%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
set-cookie: _octo=GH1.1.122198198.1734382145; Path=/; Domain=github.com; Expires=Tue, 16 Dec 2025 20:49:05 GMT; Secure; SameSite=Lax
set-cookie: logged_in=no; Path=/; Domain=github.com; Expires=Tue, 16 Dec 2025 20:49:05 GMT; HttpOnly; Secure; SameSite=Lax
x-github-request-id: E6C3:22D725:1F4686F8:1FF39F8A:67609241
                                             
$ curl -I https://github.com/medic/medic-webapp/issues/4743
HTTP/2 301                             
server: GitHub.com
date: Mon, 16 Dec 2024 20:49:05 GMT                                                        
content-length: 0         
location: https://github.com/medic/cht-core/issues/4743
server-timing: nginx;desc="NGINX";dur=0.660681,glb;desc="GLB";dur=28.681684
x-voltron-version: 69a2227
vary: Accept-Encoding, Accept, X-Requested-With
x-frame-options: DENY
set-cookie: _gh_sess=o27xqANUe1pi9TWw%2BEgJammbHM6%2Ffblmg59hC2ciHIJUNOFBsZevb0Emj1mEZHoqkLFmsRxw%2FqUCv8r9%2FqG1NspH7Bw9yHlX%2F%2FBFAt2ZAY0d41hqyN1ZZm3tZ97u6pOzpqCWezMTdqH9zuykwu6KERlDLnq4Z1zN0laXk1Z5pBOK%2B68gbux7n6M5m5tl35CrsDtyC4pvIFde7IBRXZETIL6lWYwGUwJIMGys8f5IdX5Igeyxd5FN3SOG4JhlkF2%2FGUEbB2gADzbOp3We7W9aTg%3D%3D--4ULYk9PP26BItW43--CLNBm1xXIbTtPue2X3pyig%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
set-cookie: _octo=GH1.1.212428826.1734382151; Path=/; Domain=github.com; Expires=Tue, 16 Dec 2025 20:49:11 GMT; Secure; SameSite=Lax
set-cookie: logged_in=no; Path=/; Domain=github.com; Expires=Tue, 16 Dec 2025 20:49:11 GMT; HttpOnly; Secure; SameSite=Lax
x-github-request-id: 45D4:19A744:1DBD6461:1E6368B6:67609247                                          
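
One way to check whether the inconsistent responses above recur is to request the same URL several times and print only the status codes. This is a rough sketch, not something from the PR: `probe_status` is a hypothetical helper name, and the default of 5 attempts is an arbitrary assumption.

```shell
# probe_status URL [COUNT]
# Print the HTTP status code of COUNT consecutive HEAD requests to URL.
probe_status() {
  url="$1"
  count="${2:-5}"   # assumed default, not from the PR
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s: no progress meter, -I: HEAD request, -o /dev/null: discard headers,
    # -w: print only the status code followed by a newline.
    curl -s -o /dev/null -I -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done
}
```

Piping the result through `sort | uniq -c`, for example `probe_status https://github.com/medic/medic-webapp/issues/4743 10 | sort | uniq -c`, would summarize how often each status came back.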

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@mrjones-plip mrjones-plip changed the title from "homogenize exclude list, add echis ke repos," to "muffet link checket: homogenize exclude list, add echis ke repos" Dec 16, 2024
@mrjones-plip mrjones-plip changed the title from "muffet link checket: homogenize exclude list, add echis ke repos" to "feat: link checker - homogenize exclude list, add echis ke repos, use latest muffet" Dec 16, 2024
@mrjones-plip mrjones-plip marked this pull request as ready for review December 16, 2024 19:29
@mrjones-plip mrjones-plip changed the title from "feat: link checker - homogenize exclude list, add echis ke repos, use latest muffet" to "feat: homogenize exclude list, add echis ke repos, use latest muffet" Dec 16, 2024
@mrjones-plip
Contributor Author

Oops! Hold on - I'm getting false positives on GitHub redirects for older repos that we've moved. Lemme try and resolve that before your review!

@mrjones-plip mrjones-plip changed the title from "feat: homogenize exclude list, add echis ke repos, use latest muffet" to "feat: homogenize exclude list, exclude echis repos, latest muffet, update redirects" Dec 16, 2024
@mrjones-plip mrjones-plip removed the request for review from andrablaj December 16, 2024 23:22
@mrjones-plip
Contributor Author

Using muffet.sh as of 69dec48 yields the results below. Better, but still not even close! More work to be done.

http://localhost:1313/hosting/3.x/offline/
        403     https://stackoverflow.com/questions/6370017/mapping-a-hostname-to-an-ip-address-on-android
http://localhost:1313/hosting/analytics/setup-docker-compose/
        403     https://www.docker.com/blog/announcing-compose-v2-general-availability/
        403     https://www.docker.com/products/docker-desktop/
http://localhost:1313/building/reference/app-settings/header_tabs/
        lookup fontawesome.com: i/o timeout     https://fontawesome.com/v4.7.0/
http://localhost:1313/contribute/code/core/dev-environment/
        403     https://www.docker.com/products/docker-desktop
http://localhost:1313/hosting/requirements/
        403     https://www.docker.com/blog/announcing-compose-v2-general-availability/
        403     https://www.docker.com/products/docker-desktop/
http://localhost:1313/building/contact-management/user-management-tool/
        403     https://gallery.ecr.aws/medic/cht-user-management
http://localhost:1313/building/guides/interoperability/fhir/
        403     https://hl7.org/fhir/
http://localhost:1313/hosting/3.x/ec2-setup-guide/
        403     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html
        403     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
        403     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
        403     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html
        403     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html#snapshot-lifecycle-console
        403     https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-restoring-volume.html
        403     https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html
http://localhost:1313/core/overview/architecture/
        403     https://www.nginx.com/
http://localhost:1313/contribute/code/cht-conf/
        403     https://www.docker.com/
http://localhost:1313/building/guides/database/rdbms-from-windows/
        403     https://stackoverflow.com/questions/2224066/how-to-convert-ssh-keypairs-generated-using-puttygen-windows-into-key-pairs-us/2224204#2224204
http://localhost:1313/building/examples/pharmacovigilance-reference-app/
        tls: failed to verify certificate: x509: certificate signed by unknown authority        https://www.intellisoftkenya.com/
http://localhost:1313/contribute/medic/product-development-process/deploy-on-eks/
        403     https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
        403     https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
        403     https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html
http://localhost:1313/contribute/medic/onboarding/all-the-things/
        timeout https://academy.communityhealthtoolkit.org/
http://localhost:1313/hosting/monitoring/production/
        403     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html
http://localhost:1313/hosting/4.x/production/docker/backups/
        403     https://gallery.ecr.aws/s5s3h4s7/
http://localhost:1313/building/tutorials/couch2pg-setup/
        403     https://postgresqlco.nf/doc/en/param/password_encryption/14/
http://localhost:1313/contribute/medic/product-development-process/ux-research-repo/
        400     http://airtable.com
http://localhost:1313/core/releases/0.4.15-and-earlier/
        401     https://docs.google.com/a/medicmobile.org/spreadsheet/ccc?key=0Ao9l2yegOFn7dEJRTEw1Z3RmZm0wTEo4Nk92NjVocnc
http://localhost:1313/contribute/technical-resources/
        403     https://www.pluralsight.com/courses/kubernetes-packaging-applications-helm
        403     https://www.pluralsight.com/paths/using-kubernetes-as-a-developer
        431     https://twitter.com/iximiuz/status/1423984739514454033?s=21
        timeout https://academy.communityhealthtoolkit.org/courses/course-v1:cht-academy+201+2022/about
http://localhost:1313/building/examples/direct-to-client/
        403     https://sites.uw.edu/twowaytexting/
http://localhost:1313/building/local-setup/
        403     https://www.docker.com/products/docker-desktop
http://localhost:1313/hosting/4.x/production/docker/
        403     https://askubuntu.com/a/477554
http://localhost:1313/building/tutorials/application-tests/
        lookup pptr.dev: i/o timeout (following redirect https://pptr.dev/guides/getting-started)       https://developers.google.com/web/tools/puppeteer/get-started#default_runtime_settings
http://localhost:1313/core/releases/3.10.0/
        404     https://docs.communityhealthtoolkit.org/apps/guides/security/privacy-policy/
http://localhost:1313/core/releases/2.13.0/
        404     http://localhost:1313/core/releases/2.13.0/building/contact-summary/contact-summary-overview/
http://localhost:1313/building/guides/messaging/gateways/africas-talking/
        403     https://account.africastalking.com
http://localhost:1313/core/releases/3.4.0/
        404     https://github.com/medic/medic-docs/blob/master/features/webapp-branding.pdf
http://localhost:1313/core/releases/3.11.0/
        404     https://docs.communityhealthtoolkit.org/apps/guides/messaging/rapidpro
http://localhost:1313/core/releases/2.16.0/
        401     https://docs.google.com/document/d/1uXSqntenhxlGOeFtP7ScLcFmoid3kagPYn-EDoodP3s/edit#heading=h.4bwl8oo2mtpi
        404     http://localhost:1313/core/releases/2.16.0/2.15.0.md#death-reporting

@mrjones-plip
Contributor Author

OK!! muffet doesn't send a user agent apparently? A lot of sites don't like that, hence all the 403s even at 1 request a second.

With commit b978454, only these (below) remain, and I think they are all valid failures. Almost there!

        404     https://docs.communityhealthtoolkit.org/apps/guides/messaging/rapidpro
        404     https://github.com/medic/medic-docs/blob/master/features/webapp-branding.pdf
        401     https://docs.google.com/a/medicmobile.org/spreadsheet/ccc?key=0Ao9l2yegOFn7dEJRTEw1Z3RmZm0wTEo4Nk92NjVocnc
        400     http://airtable.com
        431     https://twitter.com/iximiuz/status/1423984739514454033?s=21
        404     https://docs.communityhealthtoolkit.org/apps/guides/security/privacy-policy/
        404     https://github.com/medic/medic-nootils/issues/9
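
For reference, the user agent fix described above could look roughly like the sketch below. It assumes muffet 2.x accepts `--header` and `--rate-limit` as its README describes (worth confirming with `muffet --help`). The `run_link_check` name and the default user agent string are hypothetical and are not taken from commit b978454. The site URL and the rate of 1 request per second come from the output and comment above.

```shell
# run_link_check [SITE] [USER_AGENT]
# Run muffet against SITE at 1 request per second with an explicit User-Agent.
run_link_check() {
  site="${1:-http://localhost:1313}"
  # Hypothetical default value; any browser-like string may work.
  user_agent="${2:-Mozilla/5.0 (compatible; docs-link-check)}"
  muffet \
    --rate-limit=1 \
    --header="User-Agent: ${user_agent}" \
    "$site"
}
```

Calling `run_link_check` with no arguments checks the local Hugo site seen in the results above. Exclude patterns would be passed as additional `--exclude` arguments.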

@mrjones-plip
Contributor Author

When I run this at home, I often see i/o timeout errors (see below). Running it in EC2, I get no errors, and manually testing with curl from 3 different locations gives no errors either.

Given it runs fine on EC2, I think it should be fine in GH CI then! 🤞

http://localhost:1313/core/overview/offline-first/
        lookup alistapart.com: i/o timeout      https://alistapart.com/article/offline-first/
        lookup blog.couchdb.org: i/o timeout    https://blog.couchdb.org/2017/09/19/couchdb-takes-medic-mobile-to-the-front-lines-of-healthcare-work/
http://localhost:1313/core/releases/2.6.2/
        lookup code.jquery.com: i/o timeout     https://code.jquery.com/jquery-3.7.1.min.js
        lookup docs.google.com: i/o timeout     https://docs.google.com/document/d

@mrjones-plip mrjones-plip changed the title from "feat: homogenize exclude list, exclude echis repos, latest muffet, update redirects" to "feat: simplify exclude list, latest muffet, update redirects" Dec 20, 2024