Timings for training the indicTrans model. #56
Unanswered · upadrasta84 asked this question in Q&A
Dear all,
We have taken several stabs at getting this model to work following the README in this repository, and we have tried several AWS EC2 instance configurations. Our latest is a g3.8xlarge with 32 vCPUs, 244 GB RAM, and 2 GPUs.
I believe I created the experiment directory properly, but I hope someone can validate it:
We created a directory called en-indic-exp. Under it are the training directories such as en-ta, en-te, en-hi, etc., and inside each of them are the train.en and train.xx files for that language pair. We also have a directory called devtests inside en-indic-exp. Within devtests there is an 'all' directory, which again contains directories like en-ta, en-te, en-hi, etc.; the files inside those are named dev.te, dev.en, test.te, and test.en.
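To make that concrete, here is the tree as we have set it up (showing en-ta as the example pair; en-te, en-hi, etc. follow the same pattern):

```
en-indic-exp/
├── en-ta/
│   ├── train.en
│   └── train.ta
├── en-te/
├── en-hi/
└── devtests/
    └── all/
        ├── en-ta/
        │   ├── dev.en
        │   ├── dev.ta
        │   ├── test.en
        │   └── test.ta
        ├── en-te/
        └── en-hi/
```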
We kick off data preparation with `./prepare_data_joint_training.sh '../en-indic-exp' 'en' 'indic'`.
The 'learning target BPE' step takes a very long time. It started off showing an estimate of about 500 hours, and it has now settled at roughly 250 hours:

```
learning target BPE
  0%| | 18/32000 [10:55<249:34:01, 28.09s/it]
```
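For what it's worth, that estimate is consistent with the per-iteration time shown: 32,000 merge operations at ~28.09 s each is about 899,000 s, i.e. roughly 250 hours.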
After this, the remainder of the steps are also more or less this slow. The `fairseq-preprocess` and `fairseq-train` steps are almost impossible to get through, and I have never reached the finish line despite trying for the past couple of weeks.
Are these processes expected to take this much time? Is my machine configuration close to the mark, better than needed, or much worse?
Hope someone can point me in the right direction.