Hi,
Hope you are all well!
I forked your code and created a Flask server for generating questions from webpages I scrape. (And, of course, I convert the HTML into clean text ^^)
It takes a long time (120s on average) to generate questions (sentences only), even though CUDA is available.
Is there a way to optimise the processing time? I have 3 GPUs on my server; is it possible to enable a parallel or distributed mode for question_generator?
Cheers,
X
That does sound like quite a long time! Currently question_generator doesn't support multiple GPUs, but I suppose it should be possible using torch.distributed.
To be honest I don't really know much about it, and these tutorials seem to be mostly about distributed training rather than inference, but they might help. I don't currently have access to an environment with multiple GPUs to do any testing, though.
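In the meantime, a simpler alternative to a full torch.distributed setup might be plain data parallelism: shard the scraped texts across the GPUs and run one worker process per device with torch.multiprocessing. The sketch below is only a rough idea, not something from this repo: it assumes a `QuestionGenerator` class importable from `questiongenerator` with a `generate(text)` method that runs on the current CUDA device, and those names may not match the actual code.

```python
# Hedged sketch: data-parallel inference by sharding input texts across GPUs,
# one worker process per device. QuestionGenerator / questiongenerator /
# generate() are assumed names and may differ from the real project.
import torch
import torch.multiprocessing as mp


def worker(rank, texts, return_dict):
    # Pin this process to one GPU; "cuda" then resolves to this device.
    torch.cuda.set_device(rank)

    # Load a separate model copy inside each worker (assumed import/API).
    from questiongenerator import QuestionGenerator
    qg = QuestionGenerator()

    # Simple round-robin shard of the texts for this worker.
    shard = texts[rank::torch.cuda.device_count()]
    return_dict[rank] = [qg.generate(text) for text in shard]


if __name__ == "__main__":
    texts = ["first cleaned article ...", "second cleaned article ..."]
    world_size = torch.cuda.device_count()  # e.g. 3 on your server
    manager = mp.Manager()
    return_dict = manager.dict()
    mp.spawn(worker, args=(texts, return_dict), nprocs=world_size, join=True)
    questions = [q for rank in range(world_size) for q in return_dict[rank]]
```

This won't make a single document generate faster, but if you're processing many webpages it should give you roughly linear throughput across the 3 GPUs, without needing the process-group setup that torch.distributed requires.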