This repository has been archived by the owner on Jan 31, 2021. It is now read-only.

YouTube comment scraper deactivated #28

Open
Yakabuff opened this issue Oct 17, 2019 · 24 comments

Comments

@Yakabuff

502 bad gateway

@philbot9
Owner

@Yakabuff Thanks, there are issues with retrieving the video info and the scraper stopped working. I have taken the site down so I don't get flooded with error reports from users 😉

While I work on a solution you can use https://github.com/philbot9/youtube-comment-scraper-cli locally.

@sevecose

@Yakabuff Thanks, there are issues with retrieving the video info and the scraper stopped working. I have taken the site down so I don't get flooded with error reports from users

While I work on a solution you can use https://github.com/philbot9/youtube-comment-scraper-cli locally.

Not everyone is a programmer, and not everyone has Linux. If it were an .exe, no problem. This is desperate; many comments are disappearing from YT. I only found your service last month. It was a great help!

@philbot9
Owner

philbot9 commented Oct 21, 2019

@sevecose Neither programming skills nor Linux are required to run the CLI. As per the installation instructions in the README, you do have to have Node.js with npm installed (and I would suggest PowerShell on Windows).

If that's not an option for you, keep an eye on this GitHub issue. When I have the time to fix this problem and the site is back up, I will close this issue.
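For anyone unsure what that looks like in practice, the rough sequence is something like the following (a sketch; see the youtube-comment-scraper-cli README for the authoritative install and usage instructions and the available flags):

```sh
# install the CLI globally with npm (requires Node.js with npm already installed)
npm install -g youtube-comment-scraper-cli

# scrape the comments of a video by its ID and save them to a JSON file
youtube-comment-scraper VIDEO_ID > comments.json
```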

@philbot9
Owner

The site is back up for the short term!

http://ytcomments.klostermann.ca

Depending on usage it may go down again. I'm exploring options to find a more permanent solution.

@philbot9
Owner

philbot9 commented Oct 29, 2019

Unfortunately, I had to disable the scraper again.

It looks like we have reached critical mass. Due to the high number of comments being scraped by users, YouTube has blocked the server IP. I'm looking at alternative options such as proxy servers or something like AWS Lambda.

Until I have found a solution the scraper will remain deactivated. As an alternative, users can use this project to run the scraper locally: https://github.com/philbot9/youtube-comment-scraper-cli

philbot9 reopened this Oct 29, 2019
philbot9 changed the title from "youtube comment scraper site down" to "YouTube comment scraper deactivated" Oct 29, 2019
@memento

memento commented Nov 11, 2019

Unfortunately, I had to disable the scraper again.

It looks like we have reached critical mass. Due to the high number of comments being scraped by users, YouTube has blocked the server IP. I'm looking at alternative options such as proxy servers or something like AWS Lambda.

Until I have found a solution the scraper will remain deactivated. As an alternative, users can use this project to run the scraper locally: https://github.com/philbot9/youtube-comment-scraper-cli

What is the volume of requests that triggered the ban?

@philbot9
Owner

@everythinginitsrightplace Please post issues with the CLI on that repo: https://github.com/philbot9/youtube-comment-scraper-cli

This has actually come up before though: philbot9/youtube-comment-scraper-cli#19

@henryross0fof

Hello. Does anyone have an idea of how soon the site will be back? I tried installing Node.js, but I'm pretty sure it's not my cup of tea. If anyone could help me set it up, keeping in mind that I'm a total noob at programming, I would really appreciate it :)

@spiralofhope

spiralofhope commented Nov 19, 2019

I tried installing Node.js, but I'm pretty sure it's not my cup of tea.

I feel your pain; I had it going once for a while, but even I couldn't get it working again under Windows.

If you're stuck on Windows, you might have better luck trying things within Windows Subsystem for Linux. I got things running on pure Linux. If you go the Linux route, then my notes might help:
https://blog.spiralofhope.com/?p=45279

If you have no idea what you're doing with Windows Subsystem for Linux, I have notes here:
https://blog.spiralofhope.com/?p=39613

I can update my stuff if you have any breakthroughs, but I'm not in a position to mentor for this problem since it's solved for me.

Repository owner deleted a comment from sevecose Dec 7, 2019
@tissatussa

Thanks for this scraper script! It works very well, even when a YT video has many comments. Last year I used your online version, but the script also works fine locally, so I've been using this terminal method since the online version was disabled due to the issue described here (which I only discovered recently).

It's questionable what we can do with all that comment info from certain YT videos, but your script helps in still finding good info!

But I wonder: should YT ever change their comment page (layout / navigation) code, your script might no longer function properly..? Do you keep track of their changes? Can I keep using the script in the future? I'm thinking of writing a program / script that rebuilds an HTML / JS page from the JSON/CSV comment data, and maybe even creates a PDF of all comments & replies of a certain YT video. That way searching for text parts is easy; I've often found lots of stunning info and links. An HTML / JS version could even sort all comments by 'newest' and hide replies.

So, are you still working on this script / keeping it updated with respect to any future YT code changes?
About the "disabling issue", I hope you'll solve it.
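As a starting point for that HTML idea, a minimal Node sketch could read the scraper's JSON output and emit a single searchable page. This assumes the output is an array of comment objects; the "author", "text" and "replies" field names below are assumptions, so check them against one of your own exports:

```js
// rebuild-html.js - minimal sketch, assumes comments.json is an array of
// comment objects with "author", "text" and (optionally) "replies" fields.
const fs = require('fs')

const comments = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'))

// escape HTML special characters so comment text can't break the page
const esc = s => String(s)
  .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')

// render one comment and, indented below it, its replies
const render = (c, depth = 0) =>
  `<div style="margin-left:${depth * 2}em"><p><b>${esc(c.author)}</b>: ${esc(c.text)}</p></div>\n` +
  (c.replies || []).map(r => render(r, depth + 1)).join('')

const html = '<!doctype html><meta charset="utf-8"><title>Comments</title>\n' +
  comments.map(c => render(c)).join('')

fs.writeFileSync('comments.html', html)
console.log(`wrote comments.html (${comments.length} top-level comments)`)
```

Run it as `node rebuild-html.js comments.json`; the resulting page can be searched in the browser or printed to PDF, and sorting or hiding replies is just a matter of changing how the array is processed.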

@spiralofhope

But I wonder: should YT ever change their comment page (layout / navigation) code, your script might no longer function properly..? Do you keep track of their changes? Can I keep using the script in the future?

The author is alive and the project is active. Maybe it can be updated, maybe it can't. Maybe YouTube will make it harder in the future. Nobody knows.

I'm thinking of writing a program / script that rebuilds an HTML / JS page from the JSON/CSV comment data, and maybe even creates a PDF of all comments & replies of a certain YT video. That way searching for text parts is easy.

Maybe the author could implement a way to dump only certain fields into the file and make it much easier for you.
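In the meantime, trimming an existing JSON export down to a few fields only takes a couple of lines of Node (a sketch; the field names here are assumptions, so pick whichever ones your export actually contains):

```js
// keep-fields.js - write a slimmed-down copy of a JSON comment export.
const fs = require('fs')

const comments = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'))

// keep only a handful of fields per comment (adjust to taste)
const slim = comments.map(({ author, text, time }) => ({ author, text, time }))

fs.writeFileSync('comments-slim.json', JSON.stringify(slim, null, 2))
```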

About the "disabling issue", hope you'll solve this.

The author said this is a YouTube server IP block, so there is nothing he can do.

@rboye

rboye commented Mar 7, 2020

So sad to see the tool disabled. For now I've switched to this one as a backup solution: https://seobots.io/bots/youtube-comment-scraper ; it works in a similar way.

@spiralofhope

I'd bet it's just a matter of time until that website also gets blacklisted.

Using philbot9's command-line scraper is working great for me, since I won't overuse an IP the way a website/service would.

@AsgerH

AsgerH commented Apr 3, 2020

NetLab is a Danish national infrastructure for research use of archived web content. In order to support our target group, I have just created a tutorial - it went live last night. People here may also find it useful.

The tutorial covers Windows and Mac (the latter with thanks to my colleagues for details and feedback).

I believe that one key issue for those having trouble with Philip's excellent script may be the need to use admin privileges in order to get everything running correctly.

The tutorial may be found on this page:
http://www.netlab.dk/services/tools-and-tutorials/youtube-comment-scraper/

I hope it will prove helpful to some.

Repository owner deleted a comment from spiralofhope Apr 3, 2020
Repository owner deleted a comment from AsgerH Apr 3, 2020
Repository owner deleted a comment from spiralofhope Apr 3, 2020
@philbot9
Owner

philbot9 commented Apr 3, 2020

Thank you @AsgerH for taking the time to create this tutorial. I have added a link to the youtube-comment-scraper-cli.

@AsgerH

AsgerH commented Apr 3, 2020

You are very welcome, Philip. I'm happy to see that you added it to your repository.

On a side note, thank you for handling the strange attack. You handled it exactly as I had asked GitHub support to, by deleting it all.

@andrscyv

Could the client browser request the pages from YouTube and then send them to the server for data extraction? Then YouTube wouldn't see a single IP making a lot of requests. I know a first obstacle would be bypassing CORS policies in Chrome (but I've seen that's possible with some flags on the executable).

Another approach could be to run this as a Chrome extension; if you browse to the web page that contains the video, I guess you could extract the HTML without CORS issues.

I'm sure somebody has already thought of this, just want to know your opinion.

Repository owner deleted a comment from M-Y-bit Apr 27, 2020
@philbot9
Owner

@M-Y-bit This is an issue tracker, not a support forum.

If you are having problems with the youtube-comment-scraper-cli, please refer to the information in that repository: https://github.com/philbot9/youtube-comment-scraper-cli

There is no further information or support available.

Repository owner deleted a comment from M-Y-bit Apr 28, 2020
@philbot9
Owner

@andrscyv Thanks for your suggestion.

I considered a completely client-side solution at one point (fetch and parse), but as you point out, CORS makes this impossible. YouTube does not include any CORS headers in their responses, so we can't fetch the data client-side.

A browser extension might work, but that's not a path I'd like to go down. I would like to support most browsers, and maintaining several extensions would be cumbersome as browser platforms change.
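To illustrate why the client-side route is a dead end, a plain browser fetch of a watch page from another origin is blocked before the page's scripts ever see the response (a sketch; VIDEO_ID is a placeholder):

```js
// Run from the devtools console of any page that is not on youtube.com:
// because YouTube sends no Access-Control-Allow-Origin header, the browser
// blocks the cross-origin response and the promise rejects (typically with
// "TypeError: Failed to fetch"), so the HTML never reaches our code.
fetch('https://www.youtube.com/watch?v=VIDEO_ID')
  .then(res => res.text())
  .then(html => console.log('got', html.length, 'characters'))
  .catch(err => console.error('blocked by CORS:', err))
```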

@itzmeharsha

Hi, while trying the local CLI I ran into this error. Can you tell me why it occurs?
API response does not contain a "content_html" field

@itzmeharsha

@Yakabuff Thanks, there are issues with retrieving the video info and the scraper stopped working. I have taken the site down so I don't get flooded with error reports from users

While I work on a solution you can use https://github.com/philbot9/youtube-comment-scraper-cli locally.

While using this, I encountered "API response does not contain a "content_html" field".
Please let me know why it happens and what the solution is. Thank you for your script.

@PiyumithaNirman

Previously, youtube-comment-scraper-cli worked correctly. Unfortunately, it now gives an error:
"API response does not contain a "content_html" field". @philbot9, can you help me solve this error?

@spiralofhope

@PiyumithaNirman

Previously, youtube-comment-scraper-cli worked correctly. Unfortunately, it now gives an error:
"API response does not contain a "content_html" field". @philbot9, can you help me solve this error?

If you have an issue with that other program, then you should check its issues list for your problem. I think this is the one you should subscribe to:

philbot9/youtube-comment-scraper-cli#47

@pke

pke commented Jan 25, 2021

Getting 404s only. Has the YT download API URL changed?
