d norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.886 | TFLOPs: 78.46 |
iteration 5426/ 250000 | consumed samples: 43408 | consumed tokens: 88899584 | elapsed time per iteration (ms): 4247.9 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.883 | TFLOPs: 78.36 |
iteration 5427/ 250000 | consumed samples: 43416 | consumed tokens: 88915968 | elapsed time per iteration (ms): 4225.8 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.893 | TFLOPs: 78.77 |
iteration 5428/ 250000 | consumed samples: 43424 | consumed tokens: 88932352 | elapsed time per iteration (ms): 4229.2 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.892 | TFLOPs: 78.71 |
iteration 5429/ 250000 | consumed samples: 43432 | consumed tokens: 88948736 | elapsed time per iteration (ms): 4233.6 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.890 | TFLOPs: 78.63 |
iteration 5430/ 250000 | consumed samples: 43440 | consumed tokens: 88965120 | elapsed time per iteration (ms): 4247.0 | learning rate: 2.999E-04 | global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.884 | TFLOPs: 78.38 |
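In these lines the gradient norm is `nan` on every iteration, while `number of nan iterations` and `number of skipped iterations` both stay at 0. The sketch below parses one log line and checks it. It is a minimal sketch that assumes every line uses the same `key: value |` layout shown above. The helper names `parse_log_line` and `check_line` are hypothetical and do not come from the training code. The sketch verifies only the log's internal arithmetic: consumed tokens should equal consumed samples × actual seqlen, and samples per second should be close to global batch size ÷ step time. It also flags the non-finite gradient norm. It does not diagnose why the norm is `nan`.

```python
import math
import re


def parse_log_line(line: str) -> dict:
    """Split one '|'-separated training-log line into {field: value string}."""
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        m = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if m:
            # "iteration 5426/ 250000" is the only field without a colon
            fields["iteration"] = m.group(1)
            fields["total iterations"] = m.group(2)
            continue
        # split on the last colon so keys like "... (ms)" stay intact
        key, _, value = part.rpartition(":")
        fields[key.strip()] = value.strip()
    return fields


def check_line(fields: dict) -> dict:
    """Cross-check derived fields and flag a non-finite gradient norm."""
    samples = int(fields["consumed samples"])
    tokens = int(fields["consumed tokens"])
    seqlen = int(fields["actual seqlen"])
    batch = int(fields["global batch size"])
    step_s = float(fields["elapsed time per iteration (ms)"]) / 1000.0
    reported_sps = float(fields["samples per second"])
    grad_norm = float(fields["grad norm"])  # float("nan") parses to NaN
    return {
        # consumed tokens should equal consumed samples * sequence length
        "tokens_consistent": tokens == samples * seqlen,
        # throughput should be batch size / step time (log rounds to 3 decimals)
        "throughput_consistent": math.isclose(
            batch / step_s, reported_sps, abs_tol=1e-3
        ),
        "grad_norm_finite": math.isfinite(grad_norm),
        "nan_iterations_reported": int(fields["number of nan iterations"]),
    }


# Iteration 5426, copied from the log above.
LINE = (
    "iteration 5426/ 250000 | consumed samples: 43408 | consumed tokens: 88899584 | "
    "elapsed time per iteration (ms): 4247.9 | learning rate: 2.999E-04 | "
    "global batch size: 8 | loss scale: 1.0 | grad norm: nan | actual seqlen: 2048 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 | "
    "samples per second: 1.883 | TFLOPs: 78.36 |"
)

print(check_line(parse_log_line(LINE)))
# {'tokens_consistent': True, 'throughput_consistent': True, 'grad_norm_finite': False, 'nan_iterations_reported': 0}
```

For iteration 5426 the arithmetic is consistent (43408 × 2048 = 88899584, and 8 / 4.2479 s ≈ 1.883 samples/s). Only the gradient norm is non-finite, and the nan-iteration counter does not reflect it.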