[FEATURE REQ] Rate limit requests to the same host #83

Open
hugobuddel opened this issue Sep 13, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@hugobuddel
Contributor

hugobuddel commented Sep 13, 2024

Is your feature request related to a problem? Please describe.

I've got a README.md that has two links to Hacker News posts.
For the past couple of days, Linkspector has failed with a 503 on the second URL in the file, regardless of the order. Linkspector succeeds when either of the URLs is commented out.

It therefore seems that the server responds with a 503 because Linkspector fires the requests too quickly after one another, or something like that.

Describe the solution you'd like

Perhaps linkspector could wait a (configurable) amount of time between requests to the same host.
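
To make the idea concrete, here is a minimal sketch of a per-host throttle. None of this is Linkspector code: `createHostThrottle`, `reserve`, `throttled`, and the `minDelayMs` setting are hypothetical names used for illustration, and how such a helper would hook into the existing link checker is left open.

```javascript
// Hypothetical helper, not part of Linkspector: spaces out requests per host.
function createHostThrottle(minDelayMs) {
  // hostname -> earliest time (ms since epoch) the next request may start
  const nextFreeAt = new Map();

  // Reserves the next slot for the URL's host and returns how long (in ms)
  // the caller should wait before sending the request.
  return function reserve(url, now = Date.now()) {
    const host = new URL(url).hostname;
    const startAt = Math.max(now, nextFreeAt.get(host) ?? 0);
    nextFreeAt.set(host, startAt + minDelayMs);
    return startAt - now;
  };
}

// Returns a promise that resolves after the specified number of ms.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Usage sketch: wait for the reserved slot, then run the actual check.
// `task` stands in for whatever function performs the request.
async function throttled(reserve, url, task) {
  await delay(reserve(url));
  return task();
}
```

With `createHostThrottle(1000)`, two URLs on the same host that are reserved at the same moment get waits of 0 ms and 1000 ms, while a URL on a different host is not delayed at all.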

@hugobuddel hugobuddel added the enhancement New feature or request label Sep 13, 2024
@hugobuddel
Contributor Author

Everything works fine if I adjust the batchSize from 100 to 1 in checkHyperlinks().

const { batchSize = 100, retryCount = 3, aliveStatusCodes } = options

@hugobuddel
Contributor Author

Keeping the batchSize at 100 and adding a random delay of up to 10 seconds à la https://stackoverflow.com/a/45010143/2097 also works:

// returns a promise that resolves after the specified number of ms
function delay(ms) {
    return new Promise(resolve => {
        setTimeout(resolve, ms);
    });
}

async function checkHyperlinks(nodes, options = {}, filePath) {
  const { batchSize = 100, retryCount = 3, aliveStatusCodes } = options

...

    for (let i = 0; i < tempArray.length; i += batchSize) {
      const batch = tempArray.slice(i, i + batchSize)
      const promises = batch.map(async (link) => {
        await delay(Math.random() * 10000); // wait a random 0-10 s before opening the page
        const page = await browser.newPage()

A delay of 1 second, however, is not enough.

@hugobuddel
Contributor Author

Or maybe URLs to the same host should always be in separate batches?
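
A rough sketch of what that batching could look like, assuming the links are available as plain URL strings. `batchByHost` is a hypothetical helper, not an existing Linkspector function, and whether separate batches alone leave enough time between requests would still need testing.

```javascript
// Hypothetical helper: groups URLs into batches of at most `batchSize`,
// never placing two URLs with the same hostname in one batch.
function batchByHost(urls, batchSize = 100) {
  const batches = []; // each entry: { urls: string[], hosts: Set<string> }
  for (const url of urls) {
    const host = new URL(url).hostname;
    // Pick the first batch that has room and does not contain this host yet.
    let target = batches.find(
      (b) => b.urls.length < batchSize && !b.hosts.has(host)
    );
    if (!target) {
      target = { urls: [], hosts: new Set() };
      batches.push(target);
    }
    target.urls.push(url);
    target.hosts.add(host);
  }
  return batches.map((b) => b.urls);
}
```

For a file with the two Hacker News links and one link to another host, this puts the first Hacker News link and the other link in one batch and the second Hacker News link in a batch of its own.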

@hugobuddel
Contributor Author

Example README.md, since I've updated mine to have only a single hackernews link in there:

## Quotes

These quotes highlight the goal and status of this repository.

[kachnuv_ocasek](https://news.ycombinator.com/item?id=36354589)

[arghwhat](https://news.ycombinator.com/item?id=36354464)
