ChunkedEncodingError while scraping subreddit submissions #40
@aryamansharma01 I haven't seen this error before, I'll take a look.
Usually the query should be able to run to completion, but at first glance it looks like this error might be due to a network connection issue between the client and the host.
I have that issue too. I'm trying to scrape the subreddit "antiwork" during 2021. It works fine until October and then throws this error. I think it may be triggered by large amounts of data. Does anyone have a solution?
I run into this issue as well, typically with larger amounts. Is there maybe a chance of some size-related spillover with the limit parameter?
Same issue for me. I downloaded several months of WallStreetBets data without any problem, but for the past few days the issue keeps occurring. Did you find a solution yet?
No. I think it's related to issues with the Pushshift server...
Does anyone have a solution? Can we have the problematic ones skipped? Currently mine just gets stuck halfway through a scrape.
@ReichYang When I was looking into this before, it appeared to be a connection issue between the client and server. It would help to see any debug logs from when this error occurs, to try to pinpoint what is causing it.
Thanks for the speedy response. I think mine is also a connection error, but I tried several times and it just kept throwing this error:

ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

I'm wondering if we could have a parameter that ensures the scraping won't stop because of this kind of issue, e.g. skipping the requests that consistently hit connection errors and moving on to the next.
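In the meantime, the skip-and-continue behavior described above can be approximated on the caller's side. Below is a minimal sketch (not part of pmaw itself): `fetch_with_retry` is a hypothetical helper that wraps any zero-argument fetch callable, retries on `ChunkedEncodingError`, and returns `None` when a request consistently fails so the surrounding loop can skip it and move on.

```python
import time

from requests.exceptions import ChunkedEncodingError


def fetch_with_retry(fetch, max_retries=3, backoff=5):
    """Call `fetch` (any zero-argument callable that performs the request),
    retrying when the connection breaks mid-response instead of crashing.

    Returns None when every attempt fails, so the caller can log the
    failed request and continue with the next one.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except ChunkedEncodingError:
            if attempt == max_retries:
                return None  # give up on this request; caller skips it
            time.sleep(backoff * attempt)  # simple linear backoff
```

A scraping loop would then call `fetch_with_retry(lambda: api.search_submissions(...))` per window and collect only the non-`None` results.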
@mattpodolak Hey, I just encountered another one. Here is the error log.
Hi guys, original poster here. I played around with the libraries a bit and found that when I installed different versions of urllib3 and requests, the error no longer appeared. It might also help to scrape smaller amounts of data at a time.
Hope this helps!
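The "smaller amounts at a time" suggestion can be sketched by splitting a long date range into short windows and issuing one query per window. `month_windows` below is an illustrative helper, not part of pmaw; the commented usage assumes pmaw forwards Pushshift's `after`/`before` epoch parameters through `search_submissions`.

```python
from datetime import datetime, timedelta


def month_windows(start, end, days=30):
    """Split [start, end) into roughly month-long (after, before) pairs
    of epoch seconds, so each query requests a smaller slice of data."""
    windows = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        windows.append((int(cur.timestamp()), int(nxt.timestamp())))
        cur = nxt
    return windows


# Sketch of use with pmaw (needs network access to Pushshift):
# from pmaw import PushshiftAPI
# api = PushshiftAPI()
# for after, before in month_windows(datetime(2021, 1, 1), datetime(2021, 7, 1)):
#     posts = list(api.search_submissions(subreddit="antiwork",
#                                         after=after, before=before))
```

If one window fails, only that slice needs to be re-run rather than the whole six-month scrape.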
Hi Matt, I'm trying to scrape subreddit posts within a time period of six months, with the limit set to None. After irregular periods of time, however, the connection apparently gets broken. Below are the code snippet and the error. I tried restarting it multiple times, but the same issue comes up. Is there any way to ensure that all data is scraped in one go? Thanks
Code:
Error: