Too many comments? #35

Closed
jerryspan opened this issue Jan 21, 2020 · 6 comments

@jerryspan

I am trying to scrape comments from a video with many comments (this one: IZXgjR9INsA) and I keep getting errors like the one below:

<--- Last few GCs --->

[31198:0x103ec9000]  2834281 ms: Mark-sweep 2041.5 (2067.8) -> 2039.2 (2067.5) MB, 983.2 / 0.0 ms  (average mu = 0.277, current mu = 0.306) allocation failure scavenge might not succeed
[31198:0x103ec9000]  2835233 ms: Mark-sweep 2041.3 (2067.5) -> 2039.2 (2067.8) MB, 939.7 / 0.0 ms  (average mu = 0.166, current mu = 0.014) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1006cfa8d]
Security context: 0x384a2c3c0921 <JSObject>
    1: toString [0x384af47b43f1] [buffer.js:~753] [pc=0xd95ac9259f2](this=0x384a50c0dff9 <Uint8Array map = 0x384a6c3e5151>,0x384a6ee804b9 <undefined>,0x384a6ee804b9 <undefined>,0x384a6ee804b9 <undefined>)
    2: arguments adaptor frame: 1->3
    3: /* anonymous */ [0x384a50c07e09] [/usr/local/lib/node_modules/youtube-comment-scraper-cli/node_modules/request/req...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x100b8ff9a node::Abort() (.cold.1) [/usr/local/bin/node]
 2: 0x1000832c4 node::FatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1000833ec node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 4: 0x1001728fd v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0x1001728a7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 6: 0x10028c7e7 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 7: 0x10028db6c v8::internal::Heap::MarkCompactPrologue() [/usr/local/bin/node]
 8: 0x10028b73a v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 9: 0x10028a219 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
10: 0x100291bb8 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
11: 0x100291c0e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
12: 0x10026edad v8::internal::Factory::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/usr/local/bin/node]
13: 0x1002712ff v8::internal::Factory::NewRawTwoByteString(int, v8::internal::AllocationType) [/usr/local/bin/node]
14: 0x100271282 v8::internal::Factory::NewStringFromUtf8(v8::internal::Vector<char const> const&, v8::internal::AllocationType) [/usr/local/bin/node]
15: 0x10018a7c7 v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int) [/usr/local/bin/node]
16: 0x100102fb3 node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local<v8::Value>*) [/usr/local/bin/node]
17: 0x10006a8d6 void node::Buffer::(anonymous namespace)::StringSlice<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]
18: 0x1006cfa8d Builtins_CallApiCallback [/usr/local/bin/node]
19: 0xd95ac9259f2 
20: 0x1006c8c19 Builtins_ArgumentsAdaptorTrampoline [/usr/local/bin/node]
21: 0x1006cecdb Builtins_InterpreterEntryTrampoline [/usr/local/bin/node]
22: 0xd95ac91af16 
Abort trap: 6

@FloPinguin

I have the same problem... I tried to fix it by setting --max_old_space_size=4096 (4 GB of heap) for Node.js, but now I get something like "call stack size exceeded".
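
One way to make sure that flag actually reaches the CLI's node process (assuming a global install run via the node on your PATH) is the NODE_OPTIONS environment variable:

$ NODE_OPTIONS=--max-old-space-size=4096 youtube-comment-scraper <VideoID>

Note that raising the heap limit only postpones the failure: without streaming, the scraper still buffers every comment in memory, so a large enough video can exhaust any limit.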

@philbot9 (Owner)

Can you try using the --stream option? That way the scraper won't try to load all comments into memory, but will write them directly to stdout or a file.

$ youtube-comment-scraper --stream <VideoID> --outputFile <outputFile>
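
To illustrate the difference (a minimal sketch; fetchCommentPages is a hypothetical paged async iterator, not the scraper's real API): without --stream every comment is pushed onto one ever-growing array before serializing, while streaming writes each comment out as soon as it arrives, so heap usage stays roughly flat.

const fs = require('fs')

async function scrapeStreaming (videoId, outputFile) {
  const out = fs.createWriteStream(outputFile)
  // fetchCommentPages(videoId): hypothetical async iterator yielding
  // arrays of comment objects, one page at a time.
  for await (const page of fetchCommentPages(videoId)) {
    for (const comment of page) {
      // Write each comment immediately instead of accumulating the
      // whole result set in memory.
      out.write(JSON.stringify(comment) + '\n')
    }
  }
  out.end()
}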

@guyman70718

I have the same problem. When I use --stream, or --stream and -d, it still crashes, but the error is just a red special character and "unknown error".

@patrickcdoyle

Looks like adding --stream helped get around the memory issue. I got an "unknown error" message a few times, but re-running the same command eventually worked.

However, the CSV is saved in a wonky format: instead of 21 columns with their own titles, there appear to be at least 70 columns, and the column titles are missing. Any thoughts?

@philbot9 (Owner) commented May 28, 2020

The Unknown Video error should be fixed in 1.0.2. See #46.

@patrickcdoyle The scraper moves replies into a separate set of columns when using CSV format. You can disable that using the --collapseReplies flag.
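
To illustrate why that produces so many columns (a sketch with hypothetical field names, not the scraper's actual schema): in the default CSV layout every reply adds its own group of columns, so the comment with the most replies dictates the header width.

function toCsvRow (comment) {
  // Base columns, then two extra columns per reply: a video whose
  // busiest comment has 30 replies yields 60+ extra columns.
  const cells = [comment.author, comment.text]
  for (const reply of comment.replies || []) {
    cells.push(reply.author, reply.text)
  }
  return cells
}

// With --collapseReplies, each reply becomes its own row instead of extra columns:
function toCollapsedRows (comment) {
  const rows = [[comment.author, comment.text]]
  for (const reply of comment.replies || []) {
    rows.push([reply.author, reply.text])
  }
  return rows
}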

@patrickcdoyle

Still can't quite get the formatting down, and it doesn't seem to be the replies that are tripping everything up. It's happening most aggressively with video WYodBfRxKWI: the output has columns out to at least spreadsheet column BRL, i.e. more than 1,800 columns. Any thoughts would be appreciated!
