This repository has been archived by the owner on Jan 31, 2021. It is now read-only.

Investigate "unknown error" on koPmuEyP3a0 #46

Open
spiralofhope opened this issue May 27, 2020 · 5 comments

Comments

@spiralofhope

spiralofhope commented May 27, 2020

Note that this issue is now blocked by #47


Prompted by "Suggestion: Comment limiting + Don't discard on fail", I've been experimenting with downloading the comments for koPmuEyP3a0 using --stream.

I am using youtube-comment-scraper 1.0.1 and Node.js v10.19.0 on 64-bit Debian (stable), in a VirtualBox guest on a Windows 10 host.

All attempts have failed with "unknown error", and all attempts have resulted in a file with a different size.

Either I need advice on how to troubleshoot further, or debugging functionality would need to be added to learn more. Perhaps a counter of elapsed time and data collected would help; I might be able to implement that on the user side (perhaps some combination of watch and du for manual logging?).
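One rough way to get that time-vs-size counter from the user side is a small shell sampler run alongside the scraper (a sketch; the file name output.csv and the one-second interval are assumptions):

```shell
# size_log: append SAMPLES timestamped size samples (in kB) for FILE to
# LOG, one per second. A stalled download would show up as a flat run of
# identical sizes; a disconnect would show up as the samples stopping.
size_log() {
  file=$1; log=$2; samples=$3
  i=0
  while [ "$i" -lt "$samples" ]; do
    # du -k reports size in kB; default to 0 until the file exists
    size=$(du -k "$file" 2>/dev/null | cut -f1)
    printf '%s\t%s\n' "$(date +%s)" "${size:-0}" >> "$log"
    i=$((i + 1))
    sleep 1
  done
}

# Example: sample the tee target for 10 minutes while the scraper runs.
# size_log output.csv size.log 600 &
```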


Suspicions

  • Flaky internet connection somewhere along the chain
  • Throttling or limitations by YouTube
    • Note that I'm running this from a non-proxy IP which also has a browser logged into an account, so IP-based spam protection shouldn't be the issue, but usage-based limits still might be.
  • Invalid data? I doubt this is something like invalid data (an odd character, for example) within the stream, because each attempt fails at a different file size, and nothing looks obviously wrong at the tail of the data.
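One cheap way to test the invalid-data suspicion on the JSON runs is to check whether the partial file still parses (a sketch; assumes python3 is available and that the scraper's non-stream output is a single JSON document):

```shell
# json_ok: succeed iff FILE is well-formed JSON. python3 -m json.tool
# exits non-zero on malformed input, so a stream that was cut off
# mid-object fails here, while valid-but-incomplete data passes.
json_ok() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Example:
# json_ok output.json && echo "well-formed" || echo "truncated or invalid"
```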

CSV tests

youtube-comment-scraper --format csv --stream -- koPmuEyP3a0 | tee output.csv
✕ unknown error
  • Test 1 - a 13,652 kB file
  • Test 2 - a 1,244 kB file (using just a redirect instead of tee)
  • Test 3 - a 36,632 kB file

JSON tests

youtube-comment-scraper --stream -- koPmuEyP3a0 | tee output.json
✕ unknown error
  • Test 1 - a 2,440 kB file
  • Test 2 - a 57,436 kB file

Test 3 used an increased Node.js heap size:

node --max-old-space-size=10000 /usr/local/bin/youtube-comment-scraper --stream -- koPmuEyP3a0 | tee output3.json
  • Test 3 - a 68,644 kB file
@philbot9
Owner

I believe the root cause is A/B testing of a changed YouTube video page.

See philbot9/youtube-comments-task#26

@philbot9
Owner

philbot9 commented May 27, 2020

@spiralofhope This should be fixed in 1.0.2 of youtube-comment-scraper-cli. I ran the scraper for a while and it no longer failed where it previously did.

Please update and let me know if it's working for you.

$ npm install -g [email protected]

@spiralofhope
Author

spiralofhope commented May 28, 2020

Thanks for your efforts!

sudo npm install -g [email protected]
output
npm WARN npm npm does not support Node.js v10.19.0
npm WARN npm You should probably upgrade to a newer version of node as we
npm WARN npm can't make any promises that npm will work with this version.
npm WARN npm Supported releases of Node.js are the latest release of 4, 6, 7, 8, 9.
npm WARN npm You can find the latest version at https://nodejs.org/
npm WARN deprecated [email protected]: request has been deprecated, see https://github.com/request/request/issues/3142
/usr/local/bin/youtube-comment-scraper -> /usr/local/lib/node_modules/youtube-comment-scraper-cli/bin/youtube-comment-scraper
+ [email protected]
removed 1 package and updated 18 packages in 9.124s
youtube-comment-scraper --format csv --stream -- koPmuEyP3a0 | tee output4.csv
✕ API response does not contain a "content_html" field

The following appeared to work for some time, but ended up with the same error:

youtube-comment-scraper --outputfile output4.json -- koPmuEyP3a0
✕ API response does not contain a "content_html" field

I'll continue with some other tests, using --stream, for example:

youtube-comment-scraper --stream -- koPmuEyP3a0 | tee output4.json

@spiralofhope
Author

I have spent some time running tests with and without --stream, and with JSON vs. CSV output; every attempt produces a file of a different size and ends with the same error:

✕ API response does not contain a "content_html" field
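Since each run dies at a different point, a wrapper that retries and keeps every attempt's partial output would make it easier to compare where the failures land (a sketch; it assumes the CLI exits non-zero on this error, which I haven't verified):

```shell
# retry_scrape: run a command up to N times, saving each attempt's
# stdout to its own file so partial downloads can be compared.
retry_scrape() {
  n=$1; shift
  i=1
  while [ "$i" -le "$n" ]; do
    if "$@" > "attempt-$i.out"; then
      echo "succeeded on attempt $i"
      return 0
    fi
    echo "attempt $i failed" >&2
    i=$((i + 1))
  done
  return 1
}

# Example (hypothetical invocation):
# retry_scrape 5 youtube-comment-scraper --stream -- koPmuEyP3a0
```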

@spiralofhope
Author

Note that this issue is now blocked by #47

I tested another downloader to determine whether the problem is YouTube throttling.

https://github.com/egbertbouman/youtube-comment-downloader

./downloader.py --youtubeid=koPmuEyP3a0 --output=koPmuEyP3a0.json

I did eventually get an error, but the download seems to have completed successfully (the resulting file is huge).

This other downloader doesn't seem to have self-throttling, and I don't think YouTube disconnected me during the process.
