Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMPROVEMENT] Prevent 429 error with bandcamp api #55

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thegass
Copy link
Contributor

@thegass thegass commented Oct 24, 2024

  • add random wait between requests
  • increases runtime but handles my > 5000 releases collection fine.

* add random wait between requests
* increases runtime but handles my > 5000 releases collection fine.
@Ezwen
Copy link
Owner

Ezwen commented Oct 26, 2024

Thanks for the proposal! Instead of modifying business logic in two different places, would it make sense instead to simply replace the Thread.sleep(1000) the the Util.retry function with the line you propose? This could (1) make the code cleaner, (2) only trigger a sleep when retrying (and thus avoiding slowing down all download scenarios).

WDYT?

@thegass
Copy link
Contributor Author

thegass commented Oct 26, 2024

I think that would work f. the actual downloads.

But I think if you get a retry while collecting the available releases you already have triggered the rate-limiter (seems to be a rather stupid ip-based (maybe + cookie) limiter which blocks your ip for a few minutes, if you get 429 on cli you also get them inside the browser).
My understanding is, that we need to be more "human like" while crawling the collection for avaible releases.
How long the delay has to be is difficult to say without knowing details about the implemented rate-limiter.
Maybe a quater or half a second would be enough. ( --request-delay param?)

@Ezwen
Copy link
Owner

Ezwen commented Oct 27, 2024

Just to make sure I understand your analysis correctly, you think that when a retry happens, it's already too late? Which means we require random sleeps systematically, to prevent these cases from happening entirely?

@thegass
Copy link
Contributor Author

thegass commented Oct 27, 2024

I don't know if it needs to be random or if it would be enough to sleep long enough.
The randomnes would help if the limiter has some kind of bot-detection in addition to the requests/timeframe limiter.
And it seems that bandcamp doesn't like bots according to their robots.txt ;-)
To know for sure we would need details how bandcamp has implemented the ratelimiter.
But I don't thing they will answer such questions.

But in my tests (with my rather big collection) the crucial part was the
retrieveDownloadURLs Method in the BandcampApiConnector.

I also tried smaller sleep there (to speed up).
But for example
TimeUnit.MILLISECONDS.sleep(100 + Random().nextInt(500).toLong())
triggerd the limiter.

TimeUnit.MILLISECONDS.sleep(300 + Random().nextInt(300).toLong())seems to work mostly stable.

I settled on Seconds for my build to be on the safe side.
I have little problems with long runtimes. I ust let it run in the background and have my new releases next day.

@Ezwen
Copy link
Owner

Ezwen commented Oct 29, 2024

As a first attempt to provide this feature, we could consider a new option --delay-requests-randomly which would enable a random sleep before each Jsoup.connect call.

That would match your proposal and need?

@thegass
Copy link
Contributor Author

thegass commented Oct 30, 2024

Sounds like a good idea.

@Ezwen
Copy link
Owner

Ezwen commented Oct 30, 2024

Great. Feel free to make the changes inside this PR directly (since I will squash the commits in the end), or to create a new one if you prefer.

@thegass
Copy link
Contributor Author

thegass commented Oct 30, 2024

Ok. Hope I find some time this weekend to implement it.

@thegass
Copy link
Contributor Author

thegass commented Nov 21, 2024

sorry for delay.
hopefully i can finish this before christmas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants