I am training a GPT-2 model with pipeline parallelism. The Flops Profiler settings in the DeepSpeed config have no effect: nothing is printed.
So I added some code like this:
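from deepspeed.profiling.flops_profiler import FlopsProfiler

# With DeepSpeed, model is a list of modules; profile the first one.
prof = None
if args.deepspeed:
    prof = FlopsProfiler(model[0])
else:
    prof = FlopsProfiler(model)
...
train_step(forward_step_func,
           train_data_iterator,
           model,
           optimizer,
           lr_scheduler)
....
# Print only once: first data-parallel rank, last pipeline stage, first tensor-parallel rank.
if iteration == profile_step and mpu.get_data_parallel_rank() == 0 and mpu.is_pipeline_last_stage() and mpu.get_tensor_model_parallel_rank() == 0:
    prof.print_model_profile(profile_step=profile_step)
    prof.end_profile()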
But the output only has fwd info, like:
-------------------------- DeepSpeed Flops Profiler --------------------------
Profile Summary at step 8:
Notations:
data parallel size (dp_size), model parallel size(mp_size),
number of parameters (params), number of multiply-accumulate operations(MACs),
number of floating-point operations (flops), floating-point operations per second (FLOPS),
fwd latency (forward propagation latency), bwd latency (backward propagation latency),
step (weights update latency), iter latency (sum of fwd, bwd and step latency)
params per gpu: 463.21 M
params of model = params per GPU * mp_size: 463.21 M
fwd MACs per GPU: 242304.87 GMACs
fwd flops per GPU: 484678.47 G
fwd flops of model = fwd flops per GPU * mp_size: 484678.47 G
fwd latency: 29.87 s
fwd FLOPS per GPU = fwd flops per GPU / fwd latency: 16.22 TFLOPS
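As a sanity check on the summary above, the derived figures can be recomputed from the printed values. This is only a sketch: the variable and function names are made up for illustration, and the inputs are the rounded numbers from the output, so the result can differ from the reported 16.22 TFLOPS in the last digit.

```python
# Values copied from the profiler summary above (already rounded by the profiler).
fwd_macs_g = 242304.87     # fwd MACs per GPU, in GMACs
fwd_flops_g = 484678.47    # fwd flops per GPU, in G
fwd_latency_s = 29.87      # fwd latency, in seconds


def throughput_tflops(flops_g: float, latency_s: float) -> float:
    """fwd FLOPS per GPU = fwd flops per GPU / fwd latency, converted from G to T."""
    return flops_g / latency_s / 1000.0


fwd_tflops = throughput_tflops(fwd_flops_g, fwd_latency_s)

# Each MAC counts as two floating-point operations, so this ratio should be
# close to 2; any excess comes from operations that are not multiply-accumulates.
flops_per_mac = fwd_flops_g / fwd_macs_g

print(f"fwd throughput: {fwd_tflops:.2f} TFLOPS")
print(f"flops per MAC:  {flops_per_mac:.4f}")
```

The recomputed throughput agrees with the reported value up to rounding, so the fwd numbers are at least internally consistent; what is missing is the bwd and step information.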
How should I interpret the following data? Depth 0 and depth 1 spend less time than depth 2. Aren't they inclusive relationships?