1124_0823.log
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /root/miniconda3/envs/qwen_env did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Asia/Shanghai')}
warn(msg)
/root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//192.168.1.88'), PosixPath('http'), PosixPath('33232')}
warn(msg)
/root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_iop8tmln/none_ycgnevcx/attempt_0/0/error.json')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /root/miniconda3/envs/qwen_env/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
[2023-11-24 08:23:28,671] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-11-24 08:23:30.910 | INFO | __main__:init_components:98 - Initializing components...
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023. Please use the latest model and code; in particular, if you started using Qwen-7B before September 25, take care not to use the wrong code and model.
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards:   0%|          | 0/15 [00:00<?, ?it/s]
Loading checkpoint shards:   7%|▋         | 1/15 [00:01<00:24, 1.75s/it]
Loading checkpoint shards:  13%|█▎        | 2/15 [00:02<00:14, 1.11s/it]
Loading checkpoint shards:  20%|██        | 3/15 [00:02<00:10, 1.16it/s]
Loading checkpoint shards:  27%|██▋       | 4/15 [00:03<00:08, 1.33it/s]
Loading checkpoint shards:  33%|███▎      | 5/15 [00:04<00:07, 1.42it/s]
Loading checkpoint shards:  40%|████      | 6/15 [00:04<00:05, 1.52it/s]
Loading checkpoint shards:  47%|████▋     | 7/15 [00:05<00:05, 1.58it/s]
Loading checkpoint shards:  53%|█████▎    | 8/15 [00:05<00:04, 1.62it/s]
Loading checkpoint shards:  60%|██████    | 9/15 [00:06<00:03, 1.64it/s]
Loading checkpoint shards:  67%|██████▋   | 10/15 [00:07<00:03, 1.66it/s]
Loading checkpoint shards:  73%|███████▎  | 11/15 [00:07<00:02, 1.67it/s]
Loading checkpoint shards:  80%|████████  | 12/15 [00:08<00:01, 1.66it/s]
Loading checkpoint shards:  87%|████████▋ | 13/15 [00:08<00:01, 1.67it/s]
Loading checkpoint shards:  93%|█████████▎| 14/15 [00:09<00:00, 1.67it/s]
Loading checkpoint shards: 100%|██████████| 15/15 [00:09<00:00, 1.82it/s]
Loading checkpoint shards: 100%|██████████| 15/15 [00:09<00:00, 1.51it/s]
memory footprint of model: 11.741138577461243 GB
trainable params: 223,150,080 || all params: 8,085,877,760 || trainable%: 2.759750847383575
verify all params of the model
torch.float32 1781314560 0.2202994669066083
torch.uint8 6304563200 0.7797005330933917
torch.float32 ['base_model.model.transformer.h.0.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.0.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.0.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.0.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.0.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.0.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.0.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.0.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.0.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.0.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.1.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.1.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.1.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.1.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.1.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.1.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.1.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.1.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.1.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.1.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.2.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.2.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.2.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.2.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.2.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.2.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.2.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.2.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.2.mlp.c_proj.lora_A.default.weight', 
'base_model.model.transformer.h.2.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.3.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.3.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.3.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.3.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.3.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.3.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.3.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.3.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.3.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.3.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.4.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.4.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.4.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.4.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.4.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.4.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.4.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.4.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.4.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.4.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.5.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.5.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.5.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.5.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.5.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.5.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.5.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.5.mlp.w2.lora_B.default.weight', 
'base_model.model.transformer.h.5.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.5.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.6.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.6.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.6.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.6.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.6.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.6.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.6.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.6.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.6.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.6.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.7.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.7.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.7.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.7.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.7.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.7.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.7.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.7.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.7.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.7.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.8.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.8.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.8.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.8.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.8.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.8.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.8.mlp.w2.lora_A.default.weight', 
'base_model.model.transformer.h.8.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.8.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.8.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.9.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.9.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.9.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.9.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.9.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.9.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.9.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.9.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.9.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.9.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.10.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.10.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.10.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.10.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.10.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.10.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.10.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.10.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.10.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.10.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.11.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.11.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.11.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.11.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.11.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.11.mlp.w1.lora_B.default.weight', 
'base_model.model.transformer.h.11.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.11.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.11.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.11.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.12.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.12.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.12.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.12.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.12.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.12.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.12.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.12.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.12.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.12.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.13.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.13.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.13.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.13.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.13.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.13.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.13.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.13.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.13.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.13.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.14.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.14.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.14.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.14.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.14.mlp.w1.lora_A.default.weight', 
'base_model.model.transformer.h.14.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.14.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.14.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.14.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.14.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.15.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.15.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.15.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.15.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.15.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.15.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.15.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.15.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.15.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.15.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.16.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.16.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.16.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.16.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.16.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.16.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.16.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.16.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.16.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.16.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.17.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.17.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.17.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.17.attn.c_proj.lora_B.default.weight', 
'base_model.model.transformer.h.17.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.17.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.17.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.17.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.17.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.17.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.18.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.18.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.18.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.18.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.18.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.18.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.18.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.18.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.18.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.18.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.19.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.19.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.19.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.19.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.19.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.19.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.19.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.19.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.19.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.19.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.20.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.20.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.20.attn.c_proj.lora_A.default.weight', 
'base_model.model.transformer.h.20.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.20.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.20.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.20.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.20.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.20.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.20.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.21.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.21.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.21.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.21.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.21.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.21.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.21.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.21.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.21.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.21.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.22.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.22.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.22.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.22.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.22.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.22.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.22.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.22.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.22.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.22.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.23.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.23.attn.c_attn.lora_B.default.weight', 
'base_model.model.transformer.h.23.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.23.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.23.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.23.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.23.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.23.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.23.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.23.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.24.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.24.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.24.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.24.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.24.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.24.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.24.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.24.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.24.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.24.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.25.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.25.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.25.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.25.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.25.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.25.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.25.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.25.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.25.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.25.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.26.attn.c_attn.lora_A.default.weight', 
'base_model.model.transformer.h.26.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.26.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.26.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.26.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.26.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.26.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.26.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.26.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.26.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.27.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.27.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.27.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.27.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.27.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.27.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.27.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.27.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.27.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.27.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.28.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.28.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.28.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.28.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.28.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.28.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.28.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.28.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.28.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.28.mlp.c_proj.lora_B.default.weight', 
'base_model.model.transformer.h.29.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.29.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.29.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.29.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.29.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.29.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.29.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.29.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.29.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.29.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.30.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.30.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.30.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.30.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.30.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.30.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.30.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.30.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.30.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.30.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.31.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.31.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.31.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.31.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.31.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.31.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.31.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.31.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.31.mlp.c_proj.lora_A.default.weight', 
'base_model.model.transformer.h.31.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.32.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.32.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.32.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.32.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.32.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.32.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.32.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.32.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.32.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.32.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.33.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.33.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.33.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.33.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.33.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.33.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.33.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.33.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.33.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.33.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.34.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.34.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.34.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.34.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.34.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.34.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.34.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.34.mlp.w2.lora_B.default.weight', 
'base_model.model.transformer.h.34.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.34.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.35.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.35.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.35.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.35.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.35.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.35.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.35.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.35.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.35.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.35.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.36.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.36.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.36.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.36.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.36.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.36.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.36.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.36.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.36.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.36.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.37.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.37.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.37.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.37.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.37.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.37.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.37.mlp.w2.lora_A.default.weight', 
'base_model.model.transformer.h.37.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.37.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.37.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.38.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.38.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.38.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.38.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.38.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.38.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.38.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.38.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.38.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.38.mlp.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.39.attn.c_attn.lora_A.default.weight', 'base_model.model.transformer.h.39.attn.c_attn.lora_B.default.weight', 'base_model.model.transformer.h.39.attn.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.39.attn.c_proj.lora_B.default.weight', 'base_model.model.transformer.h.39.mlp.w1.lora_A.default.weight', 'base_model.model.transformer.h.39.mlp.w1.lora_B.default.weight', 'base_model.model.transformer.h.39.mlp.w2.lora_A.default.weight', 'base_model.model.transformer.h.39.mlp.w2.lora_B.default.weight', 'base_model.model.transformer.h.39.mlp.c_proj.lora_A.default.weight', 'base_model.model.transformer.h.39.mlp.c_proj.lora_B.default.weight']
verify trainable params the model
torch.float32 223150080 1.0
torch.float32 223150080
2023-11-24 08:25:23.706 | INFO | component.dataset:__init__:12 - Loading data: ./data/text_matching_data_train.jsonl
2023-11-24 08:25:23.755 | INFO | component.dataset:__init__:15 - there are 49500 data in dataset
2023-11-24 08:25:23.767 | INFO | __main__:main:197 - *** starting training ***
0%| | 0/1547 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
0%| | 1/1547 [00:04<2:07:19, 4.94s/it] 0%| | 2/1547 [00:08<1:49:09, 4.24s/it] 0%| | 3/1547 [00:12<1:42:35, 3.99s/it] 0%| | 4/1547 [00:15<1:38:10, 3.82s/it] 0%| | 5/1547 [00:19<1:37:13, 3.78s/it] 0%| | 6/1547 [00:23<1:36:08, 3.74s/it] 0%| | 7/1547 [00:27<1:37:07, 3.78s/it] 1%| | 8/1547 [00:30<1:35:55, 3.74s/it] 1%| | 9/1547 [00:34<1:34:53, 3.70s/it] 1%| | 10/1547 [00:38<1:33:54, 3.67s/it] {'loss': 6.9888, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.01}
1%| | 10/1547 [00:38<1:33:54, 3.67s/it] 1%| | 11/1547 [00:41<1:32:30, 3.61s/it] 1%| | 12/1547 [00:45<1:33:11, 3.64s/it] 1%| | 13/1547 [00:48<1:31:18, 3.57s/it] 1%| | 14/1547 [00:52<1:32:25, 3.62s/it] 1%| | 15/1547 [00:55<1:31:30, 3.58s/it] 1%| | 16/1547 [00:59<1:33:58, 3.68s/it] 1%| | 17/1547 [01:03<1:35:12, 3.73s/it] 1%| | 18/1547 [01:07<1:33:00, 3.65s/it] 1%| | 19/1547 [01:10<1:34:10, 3.70s/it] 1%|β | 20/1547 [01:14<1:34:42, 3.72s/it] {'loss': 6.77, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.01}
1%|β | 20/1547 [01:14<1:34:42, 3.72s/it] 1%|β | 21/1547 [01:18<1:33:48, 3.69s/it] 1%|β | 22/1547 [01:22<1:37:11, 3.82s/it] 1%|β | 23/1547 [01:26<1:36:27, 3.80s/it] 2%|β | 24/1547 [01:30<1:37:58, 3.86s/it] 2%|β | 25/1547 [01:33<1:37:33, 3.85s/it] 2%|β | 26/1547 [01:37<1:38:14, 3.88s/it] 2%|β | 27/1547 [01:41<1:33:25, 3.69s/it] 2%|β | 28/1547 [01:44<1:32:57, 3.67s/it] 2%|β | 29/1547 [01:48<1:30:49, 3.59s/it] 2%|β | 30/1547 [01:51<1:31:19, 3.61s/it] {'loss': 6.3807, 'learning_rate': 5e-06, 'epoch': 0.02}
2%|β | 30/1547 [01:51<1:31:19, 3.61s/it] 2%|β | 31/1547 [01:55<1:33:05, 3.68s/it] 2%|β | 32/1547 [01:59<1:34:54, 3.76s/it] 2%|β | 33/1547 [02:03<1:32:18, 3.66s/it] 2%|β | 34/1547 [02:06<1:31:19, 3.62s/it] 2%|β | 35/1547 [02:10<1:32:59, 3.69s/it] 2%|β | 36/1547 [02:14<1:35:05, 3.78s/it] 2%|β | 37/1547 [02:18<1:38:37, 3.92s/it] 2%|β | 38/1547 [02:22<1:36:05, 3.82s/it] 3%|β | 39/1547 [02:26<1:36:58, 3.86s/it] 3%|β | 40/1547 [02:29<1:33:29, 3.72s/it] {'loss': 5.5086, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.03}
3%|β | 40/1547 [02:29<1:33:29, 3.72s/it] 3%|β | 41/1547 [02:33<1:33:57, 3.74s/it] 3%|β | 42/1547 [02:37<1:36:36, 3.85s/it] 3%|β | 43/1547 [02:41<1:34:22, 3.76s/it] 3%|β | 44/1547 [02:44<1:33:58, 3.75s/it] 3%|β | 45/1547 [02:48<1:31:28, 3.65s/it] 3%|β | 46/1547 [02:51<1:30:31, 3.62s/it] 3%|β | 47/1547 [02:55<1:32:23, 3.70s/it] 3%|β | 48/1547 [02:59<1:32:45, 3.71s/it] 3%|β | 49/1547 [03:02<1:30:30, 3.62s/it] 3%|β | 50/1547 [03:06<1:31:18, 3.66s/it] {'loss': 4.2526, 'learning_rate': 8.333333333333334e-06, 'epoch': 0.03}
3%|β | 50/1547 [03:06<1:31:18, 3.66s/it] 3%|β | 51/1547 [03:10<1:30:59, 3.65s/it] 3%|β | 52/1547 [03:14<1:32:44, 3.72s/it] 3%|β | 53/1547 [03:17<1:33:21, 3.75s/it] 3%|β | 54/1547 [03:21<1:33:24, 3.75s/it] 4%|β | 55/1547 [03:25<1:33:17, 3.75s/it] 4%|β | 56/1547 [03:29<1:32:53, 3.74s/it] 4%|β | 57/1547 [03:32<1:33:20, 3.76s/it] 4%|β | 58/1547 [03:36<1:31:51, 3.70s/it] 4%|β | 59/1547 [03:40<1:33:38, 3.78s/it] 4%|β | 60/1547 [03:44<1:32:43, 3.74s/it] {'loss': 3.6927, 'learning_rate': 1e-05, 'epoch': 0.04}
4%|β | 60/1547 [03:44<1:32:43, 3.74s/it] 4%|β | 61/1547 [03:47<1:30:46, 3.67s/it] 4%|β | 62/1547 [03:51<1:31:21, 3.69s/it] 4%|β | 63/1547 [03:55<1:31:15, 3.69s/it] 4%|β | 64/1547 [03:58<1:32:16, 3.73s/it] 4%|β | 65/1547 [04:02<1:30:03, 3.65s/it] 4%|β | 66/1547 [04:05<1:29:06, 3.61s/it] 4%|β | 67/1547 [04:10<1:34:24, 3.83s/it] 4%|β | 68/1547 [04:13<1:34:10, 3.82s/it] 4%|β | 69/1547 [04:17<1:34:30, 3.84s/it] 5%|β | 70/1547 [04:21<1:32:54, 3.77s/it] {'loss': 3.3055, 'learning_rate': 1.1666666666666668e-05, 'epoch': 0.05}
5%|β | 70/1547 [04:21<1:32:54, 3.77s/it] 5%|β | 71/1547 [04:25<1:32:26, 3.76s/it] 5%|β | 72/1547 [04:28<1:31:20, 3.72s/it] 5%|β | 73/1547 [04:32<1:30:34, 3.69s/it] 5%|β | 74/1547 [04:35<1:29:24, 3.64s/it] 5%|β | 75/1547 [04:39<1:29:52, 3.66s/it] 5%|β | 76/1547 [04:44<1:35:15, 3.89s/it] 5%|β | 77/1547 [04:47<1:33:20, 3.81s/it] 5%|β | 78/1547 [04:51<1:35:50, 3.91s/it] 5%|β | 79/1547 [04:55<1:33:48, 3.83s/it] 5%|β | 80/1547 [04:59<1:32:42, 3.79s/it] {'loss': 2.9347, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.05}
5%|β | 80/1547 [04:59<1:32:42, 3.79s/it] 5%|β | 81/1547 [05:03<1:33:29, 3.83s/it] 5%|β | 82/1547 [05:06<1:32:16, 3.78s/it] 5%|β | 83/1547 [05:11<1:36:36, 3.96s/it] 5%|β | 84/1547 [05:14<1:34:34, 3.88s/it] 5%|β | 85/1547 [05:19<1:36:21, 3.95s/it] 6%|β | 86/1547 [05:22<1:32:51, 3.81s/it] 6%|β | 87/1547 [05:25<1:28:41, 3.65s/it] 6%|β | 88/1547 [05:29<1:32:53, 3.82s/it] 6%|β | 89/1547 [05:33<1:34:20, 3.88s/it] 6%|β | 90/1547 [05:38<1:35:10, 3.92s/it] {'loss': 2.4141, 'learning_rate': 1.5e-05, 'epoch': 0.06}
6%|β | 90/1547 [05:38<1:35:10, 3.92s/it] 6%|β | 91/1547 [05:41<1:32:20, 3.81s/it] 6%|β | 92/1547 [05:45<1:31:54, 3.79s/it] 6%|β | 93/1547 [05:49<1:32:21, 3.81s/it] 6%|β | 94/1547 [05:53<1:32:32, 3.82s/it] 6%|β | 95/1547 [05:56<1:33:11, 3.85s/it] 6%|β | 96/1547 [06:00<1:31:37, 3.79s/it] 6%|β | 97/1547 [06:04<1:31:34, 3.79s/it] 6%|β | 98/1547 [06:08<1:31:48, 3.80s/it] 6%|β | 99/1547 [06:11<1:31:26, 3.79s/it] 6%|β | 100/1547 [06:16<1:34:38, 3.92s/it] {'loss': 1.6325, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.06}
6%|β | 100/1547 [06:16<1:34:38, 3.92s/it] 7%|β | 101/1547 [06:19<1:31:30, 3.80s/it] 7%|β | 102/1547 [06:23<1:33:45, 3.89s/it] 7%|β | 103/1547 [06:27<1:33:32, 3.89s/it] 7%|β | 104/1547 [06:31<1:35:21, 3.96s/it] 7%|β | 105/1547 [06:35<1:34:55, 3.95s/it] 7%|β | 106/1547 [06:39<1:34:58, 3.95s/it] 7%|β | 107/1547 [06:43<1:31:45, 3.82s/it] 7%|β | 108/1547 [06:47<1:34:13, 3.93s/it] 7%|β | 109/1547 [06:51<1:32:06, 3.84s/it] 7%|β | 110/1547 [06:54<1:32:16, 3.85s/it] {'loss': 0.9953, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.07}
7%|β | 110/1547 [06:54<1:32:16, 3.85s/it] 7%|β | 111/1547 [06:58<1:31:56, 3.84s/it] 7%|β | 112/1547 [07:02<1:29:53, 3.76s/it] 7%|β | 113/1547 [07:05<1:29:21, 3.74s/it] 7%|β | 114/1547 [07:09<1:30:38, 3.80s/it] 7%|β | 115/1547 [07:13<1:30:38, 3.80s/it] 7%|β | 116/1547 [07:17<1:29:41, 3.76s/it] 8%|β | 117/1547 [07:21<1:31:16, 3.83s/it] 8%|β | 118/1547 [07:24<1:28:31, 3.72s/it] 8%|β | 119/1547 [07:28<1:26:44, 3.64s/it] 8%|β | 120/1547 [07:32<1:31:22, 3.84s/it] {'loss': 0.5852, 'learning_rate': 2e-05, 'epoch': 0.08}
8%|β | 120/1547 [07:32<1:31:22, 3.84s/it] 8%|β | 121/1547 [07:36<1:29:25, 3.76s/it] 8%|β | 122/1547 [07:39<1:27:35, 3.69s/it] 8%|β | 123/1547 [07:43<1:27:00, 3.67s/it] 8%|β | 124/1547 [07:47<1:28:51, 3.75s/it] 8%|β | 125/1547 [07:51<1:30:31, 3.82s/it] 8%|β | 126/1547 [07:55<1:33:24, 3.94s/it] 8%|β | 127/1547 [07:59<1:32:16, 3.90s/it] 8%|β | 128/1547 [08:02<1:30:12, 3.81s/it] 8%|β | 129/1547 [08:06<1:28:09, 3.73s/it] 8%|β | 130/1547 [08:10<1:27:46, 3.72s/it] {'loss': 0.516, 'learning_rate': 2.1666666666666667e-05, 'epoch': 0.08}
8%|β | 130/1547 [08:10<1:27:46, 3.72s/it] 8%|β | 131/1547 [08:14<1:29:08, 3.78s/it] 9%|β | 132/1547 [08:17<1:28:49, 3.77s/it] 9%|β | 133/1547 [08:21<1:29:19, 3.79s/it] 9%|β | 134/1547 [08:25<1:27:53, 3.73s/it] 9%|β | 135/1547 [08:29<1:28:49, 3.77s/it] 9%|β | 136/1547 [08:32<1:27:23, 3.72s/it] 9%|β | 137/1547 [08:36<1:26:00, 3.66s/it] 9%|β | 138/1547 [08:39<1:25:30, 3.64s/it] 9%|β | 139/1547 [08:43<1:25:32, 3.65s/it] 9%|β | 140/1547 [08:47<1:25:45, 3.66s/it] {'loss': 0.4702, 'learning_rate': 2.3333333333333336e-05, 'epoch': 0.09}
9%|β | 140/1547 [08:47<1:25:45, 3.66s/it] 9%|β | 141/1547 [08:50<1:26:44, 3.70s/it] 9%|β | 142/1547 [08:54<1:26:03, 3.68s/it] 9%|β | 143/1547 [08:58<1:26:56, 3.72s/it] 9%|β | 144/1547 [09:02<1:27:00, 3.72s/it] 9%|β | 145/1547 [09:05<1:27:49, 3.76s/it] 9%|β | 146/1547 [09:09<1:28:26, 3.79s/it] 10%|β | 147/1547 [09:13<1:28:33, 3.80s/it] 10%|β | 148/1547 [09:17<1:28:40, 3.80s/it] 10%|β | 149/1547 [09:21<1:29:21, 3.84s/it] 10%|β | 150/1547 [09:25<1:28:18, 3.79s/it] {'loss': 0.3644, 'learning_rate': 2.5e-05, 'epoch': 0.1}
10%|β | 150/1547 [09:25<1:28:18, 3.79s/it] 10%|β | 151/1547 [09:28<1:27:18, 3.75s/it] 10%|β | 152/1547 [09:32<1:27:08, 3.75s/it] 10%|β | 153/1547 [09:35<1:25:21, 3.67s/it] 10%|β | 154/1547 [09:39<1:25:00, 3.66s/it] 10%|β | 155/1547 [09:43<1:27:18, 3.76s/it] 10%|β | 156/1547 [09:47<1:27:25, 3.77s/it] 10%|β | 157/1547 [09:51<1:28:53, 3.84s/it] 10%|β | 158/1547 [09:54<1:27:18, 3.77s/it] 10%|β | 159/1547 [09:58<1:27:03, 3.76s/it] 10%|β | 160/1547 [10:02<1:26:50, 3.76s/it] {'loss': 0.2911, 'learning_rate': 2.6666666666666667e-05, 'epoch': 0.1}
10%|β | 160/1547 [10:02<1:26:50, 3.76s/it] 10%|β | 161/1547 [10:06<1:28:10, 3.82s/it] 10%|β | 162/1547 [10:10<1:28:00, 3.81s/it] 11%|β | 163/1547 [10:13<1:25:36, 3.71s/it] 11%|β | 164/1547 [10:17<1:27:20, 3.79s/it] 11%|β | 165/1547 [10:21<1:27:17, 3.79s/it] 11%|β | 166/1547 [10:25<1:28:37, 3.85s/it] 11%|β | 167/1547 [10:29<1:27:26, 3.80s/it] 11%|β | 168/1547 [10:33<1:28:54, 3.87s/it] 11%|β | 169/1547 [10:36<1:26:36, 3.77s/it] 11%|β | 170/1547 [10:40<1:27:30, 3.81s/it] {'loss': 0.2558, 'learning_rate': 2.8333333333333335e-05, 'epoch': 0.11}
11%|β | 170/1547 [10:40<1:27:30, 3.81s/it] 11%|β | 171/1547 [10:44<1:27:35, 3.82s/it] 11%|β | 172/1547 [10:48<1:27:07, 3.80s/it] 11%|β | 173/1547 [10:51<1:26:09, 3.76s/it] 11%|β | 174/1547 [10:55<1:24:53, 3.71s/it] 11%|ββ | 175/1547 [10:59<1:27:17, 3.82s/it] 11%|ββ | 176/1547 [11:03<1:27:17, 3.82s/it] 11%|ββ | 177/1547 [11:07<1:29:47, 3.93s/it] 12%|ββ | 178/1547 [11:11<1:28:35, 3.88s/it] 12%|ββ | 179/1547 [11:15<1:28:07, 3.86s/it] 12%|ββ | 180/1547 [11:18<1:25:47, 3.77s/it] {'loss': 0.1759, 'learning_rate': 3e-05, 'epoch': 0.12}
12%|ββ | 180/1547 [11:18<1:25:47, 3.77s/it] 12%|ββ | 181/1547 [11:22<1:26:50, 3.81s/it] 12%|ββ | 182/1547 [11:26<1:25:05, 3.74s/it] 12%|ββ | 183/1547 [11:29<1:24:40, 3.72s/it] 12%|ββ | 184/1547 [11:33<1:24:46, 3.73s/it] 12%|ββ | 185/1547 [11:37<1:24:27, 3.72s/it] 12%|ββ | 186/1547 [11:41<1:24:46, 3.74s/it] 12%|ββ | 187/1547 [11:44<1:25:16, 3.76s/it] 12%|ββ | 188/1547 [11:48<1:24:19, 3.72s/it] 12%|ββ | 189/1547 [11:52<1:23:34, 3.69s/it] 12%|ββ | 190/1547 [11:55<1:23:16, 3.68s/it] {'loss': 0.1376, 'learning_rate': 3.1666666666666666e-05, 'epoch': 0.12}
12%|ββ | 190/1547 [11:55<1:23:16, 3.68s/it] 12%|ββ | 191/1547 [11:59<1:23:35, 3.70s/it] 12%|ββ | 192/1547 [12:03<1:25:05, 3.77s/it] 12%|ββ | 193/1547 [12:07<1:23:44, 3.71s/it] 13%|ββ | 194/1547 [12:11<1:27:06, 3.86s/it] 13%|ββ | 195/1547 [12:14<1:26:03, 3.82s/it] 13%|ββ | 196/1547 [12:18<1:24:29, 3.75s/it] 13%|ββ | 197/1547 [12:22<1:27:30, 3.89s/it] 13%|ββ | 198/1547 [12:26<1:24:28, 3.76s/it] 13%|ββ | 199/1547 [12:30<1:26:17, 3.84s/it] 13%|ββ | 200/1547 [12:34<1:26:52, 3.87s/it] {'loss': 0.168, 'learning_rate': 3.3333333333333335e-05, 'epoch': 0.13}
13%|ββ | 200/1547 [12:34<1:26:52, 3.87s/it] 13%|ββ | 201/1547 [12:38<1:26:35, 3.86s/it] 13%|ββ | 202/1547 [12:41<1:25:33, 3.82s/it] 13%|ββ | 203/1547 [12:45<1:28:09, 3.94s/it] 13%|ββ | 204/1547 [12:49<1:26:39, 3.87s/it] 13%|ββ | 205/1547 [12:53<1:25:35, 3.83s/it] 13%|ββ | 206/1547 [12:57<1:25:39, 3.83s/it] 13%|ββ | 207/1547 [13:00<1:24:28, 3.78s/it] 13%|ββ | 208/1547 [13:04<1:25:08, 3.82s/it] 14%|ββ | 209/1547 [13:08<1:22:43, 3.71s/it] 14%|ββ | 210/1547 [13:12<1:25:38, 3.84s/it] {'loss': 0.1489, 'learning_rate': 3.5e-05, 'epoch': 0.14}
14%|ββ | 210/1547 [13:12<1:25:38, 3.84s/it] 14%|ββ | 211/1547 [13:16<1:26:08, 3.87s/it] 14%|ββ | 212/1547 [13:20<1:25:16, 3.83s/it] 14%|ββ | 213/1547 [13:23<1:24:01, 3.78s/it] 14%|ββ | 214/1547 [13:27<1:23:06, 3.74s/it] 14%|ββ | 215/1547 [13:31<1:22:59, 3.74s/it] 14%|ββ | 216/1547 [13:35<1:24:54, 3.83s/it] 14%|ββ | 217/1547 [13:38<1:23:53, 3.78s/it] 14%|ββ | 218/1547 [13:42<1:24:28, 3.81s/it] 14%|ββ | 219/1547 [13:46<1:23:41, 3.78s/it] 14%|ββ | 220/1547 [13:50<1:25:15, 3.85s/it] {'loss': 0.1226, 'learning_rate': 3.6666666666666666e-05, 'epoch': 0.14}
14%|ββ | 220/1547 [13:50<1:25:15, 3.85s/it] 14%|ββ | 221/1547 [13:55<1:30:38, 4.10s/it] 14%|ββ | 222/1547 [13:58<1:27:26, 3.96s/it] 14%|ββ | 223/1547 [14:02<1:26:25, 3.92s/it] 14%|ββ | 224/1547 [14:06<1:27:31, 3.97s/it] 15%|ββ | 225/1547 [14:10<1:26:49, 3.94s/it] 15%|ββ | 226/1547 [14:14<1:29:23, 4.06s/it] 15%|ββ | 227/1547 [14:18<1:27:52, 3.99s/it] 15%|ββ | 228/1547 [14:22<1:25:29, 3.89s/it] 15%|ββ | 229/1547 [14:26<1:28:15, 4.02s/it] 15%|ββ | 230/1547 [14:30<1:27:46, 4.00s/it] {'loss': 0.1139, 'learning_rate': 3.8333333333333334e-05, 'epoch': 0.15}
15%|ββ | 230/1547 [14:30<1:27:46, 4.00s/it] 15%|ββ | 231/1547 [14:34<1:27:51, 4.01s/it] 15%|ββ | 232/1547 [14:38<1:27:50, 4.01s/it] 15%|ββ | 233/1547 [14:42<1:27:36, 4.00s/it] 15%|ββ | 234/1547 [14:46<1:25:12, 3.89s/it] 15%|ββ | 235/1547 [14:50<1:26:38, 3.96s/it] 15%|ββ | 236/1547 [14:54<1:26:10, 3.94s/it] 15%|ββ | 237/1547 [14:58<1:25:58, 3.94s/it] 15%|ββ | 238/1547 [15:02<1:25:06, 3.90s/it] 15%|ββ | 239/1547 [15:06<1:26:39, 3.97s/it] 16%|ββ | 240/1547 [15:10<1:25:10, 3.91s/it] {'loss': 0.101, 'learning_rate': 4e-05, 'epoch': 0.16}
16%|ββ | 240/1547 [15:10<1:25:10, 3.91s/it] 16%|ββ | 241/1547 [15:14<1:26:08, 3.96s/it] 16%|ββ | 242/1547 [15:17<1:25:29, 3.93s/it] 16%|ββ | 243/1547 [15:21<1:25:42, 3.94s/it] 16%|ββ | 244/1547 [15:25<1:25:21, 3.93s/it] 16%|ββ | 245/1547 [15:29<1:26:23, 3.98s/it] 16%|ββ | 246/1547 [15:33<1:23:45, 3.86s/it] 16%|ββ | 247/1547 [15:37<1:22:50, 3.82s/it] 16%|ββ | 248/1547 [15:41<1:23:20, 3.85s/it] 16%|ββ | 249/1547 [15:44<1:22:04, 3.79s/it] 16%|ββ | 250/1547 [15:49<1:24:51, 3.93s/it] {'loss': 0.1144, 'learning_rate': 4.166666666666667e-05, 'epoch': 0.16}
16%|ββ | 250/1547 [15:49<1:24:51, 3.93s/it] 16%|ββ | 251/1547 [15:52<1:22:21, 3.81s/it] 16%|ββ | 252/1547 [15:56<1:24:04, 3.90s/it] 16%|ββ | 253/1547 [16:00<1:24:32, 3.92s/it] 16%|ββ | 254/1547 [16:04<1:23:30, 3.87s/it] 16%|ββ | 255/1547 [16:08<1:22:40, 3.84s/it] 17%|ββ | 256/1547 [16:11<1:21:08, 3.77s/it] 17%|ββ | 257/1547 [16:15<1:21:07, 3.77s/it] 17%|ββ | 258/1547 [16:19<1:21:54, 3.81s/it] 17%|ββ | 259/1547 [16:23<1:20:28, 3.75s/it] 17%|ββ | 260/1547 [16:27<1:22:48, 3.86s/it] {'loss': 0.1183, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.17}
17%|ββ | 260/1547 [16:27<1:22:48, 3.86s/it] 17%|ββ | 261/1547 [16:30<1:21:15, 3.79s/it] 17%|ββ | 262/1547 [16:35<1:23:40, 3.91s/it] 17%|ββ | 263/1547 [16:39<1:25:02, 3.97s/it] 17%|ββ | 264/1547 [16:42<1:21:57, 3.83s/it] 17%|ββ | 265/1547 [16:46<1:20:26, 3.76s/it] 17%|ββ | 266/1547 [16:50<1:22:11, 3.85s/it] 17%|ββ | 267/1547 [16:53<1:20:28, 3.77s/it] 17%|ββ | 268/1547 [16:57<1:19:54, 3.75s/it] 17%|ββ | 269/1547 [17:01<1:18:39, 3.69s/it] 17%|ββ | 270/1547 [17:04<1:18:34, 3.69s/it] {'loss': 0.1201, 'learning_rate': 4.5e-05, 'epoch': 0.17}
17%|ββ | 270/1547 [17:04<1:18:34, 3.69s/it] 18%|ββ | 271/1547 [17:08<1:17:29, 3.64s/it] 18%|ββ | 272/1547 [17:12<1:19:40, 3.75s/it] 18%|ββ | 273/1547 [17:15<1:17:12, 3.64s/it] 18%|ββ | 274/1547 [17:19<1:17:31, 3.65s/it] 18%|ββ | 275/1547 [17:23<1:17:21, 3.65s/it] 18%|ββ | 276/1547 [17:26<1:17:48, 3.67s/it] 18%|ββ | 277/1547 [17:30<1:21:06, 3.83s/it] 18%|ββ | 278/1547 [17:35<1:22:54, 3.92s/it] 18%|ββ | 279/1547 [17:38<1:21:45, 3.87s/it] 18%|ββ | 280/1547 [17:42<1:22:22, 3.90s/it] {'loss': 0.1271, 'learning_rate': 4.666666666666667e-05, 'epoch': 0.18}
18%|ββ | 280/1547 [17:42<1:22:22, 3.90s/it] 18%|ββ | 281/1547 [17:46<1:20:35, 3.82s/it] 18%|ββ | 282/1547 [17:50<1:18:58, 3.75s/it] 18%|ββ | 283/1547 [17:53<1:18:47, 3.74s/it] 18%|ββ | 284/1547 [17:57<1:19:04, 3.76s/it] 18%|ββ | 285/1547 [18:01<1:18:11, 3.72s/it] 18%|ββ | 286/1547 [18:04<1:17:29, 3.69s/it] 19%|ββ | 287/1547 [18:08<1:17:44, 3.70s/it] 19%|ββ | 288/1547 [18:12<1:16:52, 3.66s/it] 19%|ββ | 289/1547 [18:15<1:16:18, 3.64s/it] 19%|ββ | 290/1547 [18:19<1:16:42, 3.66s/it] {'loss': 0.0923, 'learning_rate': 4.8333333333333334e-05, 'epoch': 0.19}
19%|ββ | 290/1547 [18:19<1:16:42, 3.66s/it] 19%|ββ | 291/1547 [18:23<1:16:52, 3.67s/it] 19%|ββ | 292/1547 [18:26<1:16:01, 3.63s/it] 19%|ββ | 293/1547 [18:30<1:16:24, 3.66s/it] 19%|ββ | 294/1547 [18:34<1:18:00, 3.74s/it] 19%|ββ | 295/1547 [18:37<1:17:32, 3.72s/it] 19%|ββ | 296/1547 [18:41<1:17:46, 3.73s/it] 19%|ββ | 297/1547 [18:45<1:17:18, 3.71s/it] 19%|ββ | 298/1547 [18:49<1:16:52, 3.69s/it] 19%|ββ | 299/1547 [18:52<1:18:10, 3.76s/it] 19%|ββ | 300/1547 [18:56<1:18:43, 3.79s/it] {'loss': 0.1278, 'learning_rate': 5e-05, 'epoch': 0.19}
19%|ββ | 300/1547 [18:56<1:18:43, 3.79s/it] 19%|ββ | 301/1547 [19:01<1:23:15, 4.01s/it] 20%|ββ | 302/1547 [19:04<1:20:34, 3.88s/it] 20%|ββ | 303/1547 [19:08<1:19:59, 3.86s/it] 20%|ββ | 304/1547 [19:12<1:18:47, 3.80s/it] 20%|ββ | 305/1547 [19:16<1:18:51, 3.81s/it] 20%|ββ | 306/1547 [19:20<1:20:56, 3.91s/it] 20%|ββ | 307/1547 [19:24<1:19:42, 3.86s/it] 20%|ββ | 308/1547 [19:27<1:19:17, 3.84s/it] 20%|ββ | 309/1547 [19:31<1:17:49, 3.77s/it] 20%|ββ | 310/1547 [19:35<1:18:04, 3.79s/it] {'loss': 0.1064, 'learning_rate': 5e-05, 'epoch': 0.2}
20%|ββ | 310/1547 [19:35<1:18:04, 3.79s/it] 20%|ββ | 311/1547 [19:38<1:16:08, 3.70s/it] 20%|ββ | 312/1547 [19:42<1:16:46, 3.73s/it] 20%|ββ | 313/1547 [19:46<1:18:26, 3.81s/it] 20%|ββ | 314/1547 [19:50<1:18:09, 3.80s/it] 20%|ββ | 315/1547 [19:54<1:17:44, 3.79s/it] 20%|ββ | 316/1547 [19:58<1:18:23, 3.82s/it] 20%|ββ | 317/1547 [20:02<1:20:15, 3.91s/it] 21%|ββ | 318/1547 [20:05<1:18:17, 3.82s/it] 21%|ββ | 319/1547 [20:09<1:18:50, 3.85s/it] 21%|ββ | 320/1547 [20:13<1:18:00, 3.81s/it] {'loss': 0.1265, 'learning_rate': 5e-05, 'epoch': 0.21}
21%|ββ | 320/1547 [20:13<1:18:00, 3.81s/it] 21%|ββ | 321/1547 [20:17<1:17:20, 3.78s/it] 21%|ββ | 322/1547 [20:20<1:17:05, 3.78s/it] 21%|ββ | 323/1547 [20:24<1:16:16, 3.74s/it] 21%|ββ | 324/1547 [20:28<1:16:43, 3.76s/it] 21%|ββ | 325/1547 [20:31<1:15:37, 3.71s/it] 21%|ββ | 326/1547 [20:36<1:18:14, 3.84s/it] 21%|ββ | 327/1547 [20:39<1:17:37, 3.82s/it] 21%|ββ | 328/1547 [20:43<1:18:34, 3.87s/it] 21%|βββ | 329/1547 [20:48<1:20:38, 3.97s/it] 21%|βββ | 330/1547 [20:52<1:21:23, 4.01s/it] {'loss': 0.0925, 'learning_rate': 5e-05, 'epoch': 0.21}
21%|βββ | 330/1547 [20:52<1:21:23, 4.01s/it] 21%|βββ | 331/1547 [20:56<1:21:01, 4.00s/it] 21%|βββ | 332/1547 [20:59<1:18:13, 3.86s/it] 22%|βββ | 333/1547 [21:03<1:19:42, 3.94s/it] 22%|βββ | 334/1547 [21:07<1:18:25, 3.88s/it] 22%|βββ | 335/1547 [21:11<1:18:42, 3.90s/it] 22%|βββ | 336/1547 [21:15<1:17:59, 3.86s/it] 22%|βββ | 337/1547 [21:19<1:16:58, 3.82s/it] 22%|βββ | 338/1547 [21:22<1:15:53, 3.77s/it] 22%|βββ | 339/1547 [21:26<1:13:29, 3.65s/it] 22%|βββ | 340/1547 [21:29<1:13:45, 3.67s/it] {'loss': 0.1021, 'learning_rate': 5e-05, 'epoch': 0.22}
22%|βββ | 340/1547 [21:29<1:13:45, 3.67s/it] 22%|βββ | 341/1547 [21:33<1:14:15, 3.69s/it] 22%|βββ | 342/1547 [21:37<1:14:26, 3.71s/it] 22%|βββ | 343/1547 [21:40<1:13:55, 3.68s/it] 22%|βββ | 344/1547 [21:44<1:13:18, 3.66s/it] 22%|βββ | 345/1547 [21:48<1:15:33, 3.77s/it] 22%|βββ | 346/1547 [21:52<1:14:59, 3.75s/it] 22%|βββ | 347/1547 [21:55<1:14:33, 3.73s/it] 22%|βββ | 348/1547 [21:59<1:16:45, 3.84s/it] 23%|βββ | 349/1547 [22:03<1:14:25, 3.73s/it] 23%|βββ | 350/1547 [22:07<1:14:59, 3.76s/it] {'loss': 0.1239, 'learning_rate': 5e-05, 'epoch': 0.23}
23%|βββ | 350/1547 [22:07<1:14:59, 3.76s/it] 23%|βββ | 351/1547 [22:10<1:14:40, 3.75s/it] 23%|βββ | 352/1547 [22:14<1:14:19, 3.73s/it] 23%|βββ | 353/1547 [22:18<1:14:51, 3.76s/it] 23%|βββ | 354/1547 [22:22<1:16:49, 3.86s/it] 23%|βββ | 355/1547 [22:26<1:15:23, 3.80s/it] 23%|βββ | 356/1547 [22:30<1:16:47, 3.87s/it] 23%|βββ | 357/1547 [22:34<1:17:32, 3.91s/it] 23%|βββ | 358/1547 [22:38<1:17:00, 3.89s/it] 23%|βββ | 359/1547 [22:41<1:14:23, 3.76s/it] 23%|βββ | 360/1547 [22:45<1:14:46, 3.78s/it] {'loss': 0.1255, 'learning_rate': 5e-05, 'epoch': 0.23}
23%|βββ | 360/1547 [22:45<1:14:46, 3.78s/it] 23%|βββ | 361/1547 [22:49<1:14:22, 3.76s/it] 23%|βββ | 362/1547 [22:52<1:13:54, 3.74s/it] 23%|βββ | 363/1547 [22:56<1:14:45, 3.79s/it] 24%|βββ | 364/1547 [23:00<1:17:19, 3.92s/it] 24%|βββ | 365/1547 [23:04<1:16:32, 3.89s/it] 24%|βββ | 366/1547 [23:08<1:16:22, 3.88s/it] 24%|βββ | 367/1547 [23:12<1:15:38, 3.85s/it] 24%|βββ | 368/1547 [23:15<1:13:19, 3.73s/it] 24%|βββ | 369/1547 [23:19<1:12:45, 3.71s/it] 24%|βββ | 370/1547 [23:23<1:13:33, 3.75s/it] {'loss': 0.1031, 'learning_rate': 5e-05, 'epoch': 0.24}
24%|βββ | 370/1547 [23:23<1:13:33, 3.75s/it] 24%|βββ | 371/1547 [23:27<1:14:08, 3.78s/it] 24%|βββ | 372/1547 [23:30<1:12:56, 3.72s/it] 24%|βββ | 373/1547 [23:34<1:12:59, 3.73s/it] 24%|βββ | 374/1547 [23:38<1:12:53, 3.73s/it] 24%|βββ | 375/1547 [23:42<1:13:27, 3.76s/it] 24%|βββ | 376/1547 [23:45<1:11:50, 3.68s/it] 24%|βββ | 377/1547 [23:49<1:12:48, 3.73s/it] 24%|βββ | 378/1547 [23:53<1:12:17, 3.71s/it] 24%|βββ | 379/1547 [23:56<1:11:34, 3.68s/it] 25%|βββ | 380/1547 [24:00<1:10:12, 3.61s/it] {'loss': 0.0955, 'learning_rate': 5e-05, 'epoch': 0.25}
25%|βββ | 380/1547 [24:00<1:10:12, 3.61s/it] 25%|βββ | 381/1547 [24:03<1:11:08, 3.66s/it] 25%|βββ | 382/1547 [24:07<1:11:50, 3.70s/it] 25%|βββ | 383/1547 [24:11<1:10:58, 3.66s/it] 25%|βββ | 384/1547 [24:14<1:10:36, 3.64s/it] 25%|βββ | 385/1547 [24:19<1:13:57, 3.82s/it] 25%|βββ | 386/1547 [24:22<1:13:48, 3.81s/it] 25%|βββ | 387/1547 [24:27<1:16:02, 3.93s/it] 25%|βββ | 388/1547 [24:30<1:15:05, 3.89s/it] 25%|βββ | 389/1547 [24:34<1:15:17, 3.90s/it] 25%|βββ | 390/1547 [24:38<1:15:52, 3.93s/it] {'loss': 0.0955, 'learning_rate': 5e-05, 'epoch': 0.25}
25%|βββ | 390/1547 [24:38<1:15:52, 3.93s/it] 25%|βββ | 391/1547 [24:42<1:16:45, 3.98s/it] 25%|βββ | 392/1547 [24:46<1:16:03, 3.95s/it] 25%|βββ | 393/1547 [24:50<1:15:36, 3.93s/it] 25%|βββ | 394/1547 [24:54<1:13:54, 3.85s/it] 26%|βββ | 395/1547 [24:58<1:13:38, 3.84s/it] 26%|βββ | 396/1547 [25:01<1:11:45, 3.74s/it] 26%|βββ | 397/1547 [25:05<1:11:13, 3.72s/it] 26%|βββ | 398/1547 [25:09<1:13:04, 3.82s/it] 26%|βββ | 399/1547 [25:13<1:11:50, 3.75s/it] 26%|βββ | 400/1547 [25:16<1:11:46, 3.75s/it] {'loss': 0.1461, 'learning_rate': 5e-05, 'epoch': 0.26}
26%|βββ | 400/1547 [25:16<1:11:46, 3.75s/it] 26%|βββ | 401/1547 [25:20<1:11:34, 3.75s/it] 26%|βββ | 402/1547 [25:25<1:16:10, 3.99s/it] 26%|βββ | 403/1547 [25:28<1:14:02, 3.88s/it] 26%|βββ | 404/1547 [25:32<1:13:32, 3.86s/it] 26%|βββ | 405/1547 [25:35<1:11:11, 3.74s/it] 26%|βββ | 406/1547 [25:39<1:08:22, 3.60s/it] 26%|βββ | 407/1547 [25:43<1:10:31, 3.71s/it] 26%|βββ | 408/1547 [25:47<1:10:47, 3.73s/it] 26%|βββ | 409/1547 [25:51<1:12:17, 3.81s/it] 27%|βββ | 410/1547 [25:54<1:10:58, 3.75s/it] {'loss': 0.1166, 'learning_rate': 5e-05, 'epoch': 0.27}
27%|βββ | 410/1547 [25:54<1:10:58, 3.75s/it] 27%|βββ | 411/1547 [25:58<1:11:26, 3.77s/it] 27%|βββ | 412/1547 [26:02<1:10:53, 3.75s/it] 27%|βββ | 413/1547 [26:05<1:11:22, 3.78s/it] 27%|βββ | 414/1547 [26:09<1:10:11, 3.72s/it] 27%|βββ | 415/1547 [26:13<1:12:01, 3.82s/it] 27%|βββ | 416/1547 [26:17<1:09:50, 3.71s/it] 27%|βββ | 417/1547 [26:20<1:10:21, 3.74s/it] 27%|βββ | 418/1547 [26:24<1:10:34, 3.75s/it] 27%|βββ | 419/1547 [26:28<1:09:56, 3.72s/it] 27%|βββ | 420/1547 [26:31<1:09:29, 3.70s/it] {'loss': 0.099, 'learning_rate': 5e-05, 'epoch': 0.27}
27%|βββ | 420/1547 [26:31<1:09:29, 3.70s/it] 27%|βββ | 421/1547 [26:36<1:13:12, 3.90s/it] 27%|βββ | 422/1547 [26:39<1:11:38, 3.82s/it] 27%|βββ | 423/1547 [26:44<1:12:57, 3.89s/it] 27%|βββ | 424/1547 [26:47<1:11:48, 3.84s/it] 27%|βββ | 425/1547 [26:51<1:10:52, 3.79s/it] 28%|βββ | 426/1547 [26:55<1:10:44, 3.79s/it] 28%|βββ | 427/1547 [26:59<1:11:39, 3.84s/it] 28%|βββ | 428/1547 [27:03<1:12:32, 3.89s/it] 28%|βββ | 429/1547 [27:06<1:11:15, 3.82s/it] 28%|βββ | 430/1547 [27:10<1:10:05, 3.76s/it] {'loss': 0.097, 'learning_rate': 5e-05, 'epoch': 0.28}
28%|βββ | 430/1547 [27:10<1:10:05, 3.76s/it] 28%|βββ | 431/1547 [27:14<1:10:42, 3.80s/it] 28%|βββ | 432/1547 [27:17<1:09:25, 3.74s/it] 28%|βββ | 433/1547 [27:21<1:09:30, 3.74s/it] 28%|βββ | 434/1547 [27:25<1:09:49, 3.76s/it] 28%|βββ | 435/1547 [27:29<1:10:13, 3.79s/it] 28%|βββ | 436/1547 [27:32<1:09:27, 3.75s/it] 28%|βββ | 437/1547 [27:36<1:09:41, 3.77s/it] 28%|βββ | 438/1547 [27:40<1:09:56, 3.78s/it] 28%|βββ | 439/1547 [27:44<1:09:16, 3.75s/it] 28%|βββ | 440/1547 [27:48<1:10:17, 3.81s/it] {'loss': 0.113, 'learning_rate': 5e-05, 'epoch': 0.28}
28%|βββ | 440/1547 [27:48<1:10:17, 3.81s/it] 29%|βββ | 441/1547 [27:52<1:10:04, 3.80s/it] 29%|βββ | 442/1547 [27:55<1:07:35, 3.67s/it] 29%|βββ | 443/1547 [27:59<1:09:27, 3.78s/it] 29%|βββ | 444/1547 [28:03<1:09:41, 3.79s/it] 29%|βββ | 445/1547 [28:06<1:07:54, 3.70s/it] 29%|βββ | 446/1547 [28:10<1:06:37, 3.63s/it] 29%|βββ | 447/1547 [28:13<1:04:55, 3.54s/it] 29%|βββ | 448/1547 [28:17<1:07:16, 3.67s/it] 29%|βββ | 449/1547 [28:20<1:06:00, 3.61s/it] 29%|βββ | 450/1547 [28:24<1:06:16, 3.62s/it] {'loss': 0.1409, 'learning_rate': 5e-05, 'epoch': 0.29}
29%|βββ | 450/1547 [28:24<1:06:16, 3.62s/it] 29%|βββ | 451/1547 [28:28<1:07:08, 3.68s/it] 29%|βββ | 452/1547 [28:32<1:07:56, 3.72s/it] 29%|βββ | 453/1547 [28:36<1:08:09, 3.74s/it] 29%|βββ | 454/1547 [28:40<1:09:31, 3.82s/it] 29%|βββ | 455/1547 [28:43<1:08:38, 3.77s/it] 29%|βββ | 456/1547 [28:47<1:08:57, 3.79s/it] 30%|βββ | 457/1547 [28:51<1:08:25, 3.77s/it] 30%|βββ | 458/1547 [28:54<1:07:53, 3.74s/it] 30%|βββ | 459/1547 [28:58<1:07:35, 3.73s/it] 30%|βββ | 460/1547 [29:02<1:08:01, 3.75s/it] {'loss': 0.0949, 'learning_rate': 5e-05, 'epoch': 0.3}
30%|βββ | 460/1547 [29:02<1:08:01, 3.75s/it] 30%|βββ | 461/1547 [29:06<1:09:34, 3.84s/it] 30%|βββ | 462/1547 [29:10<1:10:05, 3.88s/it] 30%|βββ | 463/1547 [29:13<1:08:14, 3.78s/it] 30%|βββ | 464/1547 [29:17<1:08:24, 3.79s/it] 30%|βββ | 465/1547 [29:21<1:10:29, 3.91s/it] 30%|βββ | 466/1547 [29:26<1:11:09, 3.95s/it] 30%|βββ | 467/1547 [29:29<1:10:11, 3.90s/it] 30%|βββ | 468/1547 [29:33<1:11:32, 3.98s/it] 30%|βββ | 469/1547 [29:37<1:11:04, 3.96s/it] 30%|βββ | 470/1547 [29:41<1:09:17, 3.86s/it] {'loss': 0.0983, 'learning_rate': 5e-05, 'epoch': 0.3}
30%|βββ | 470/1547 [29:41<1:09:17, 3.86s/it] 30%|βββ | 471/1547 [29:45<1:08:51, 3.84s/it] 31%|βββ | 472/1547 [29:48<1:07:00, 3.74s/it] 31%|βββ | 473/1547 [29:52<1:07:16, 3.76s/it] 31%|βββ | 474/1547 [29:56<1:07:49, 3.79s/it] 31%|βββ | 475/1547 [30:00<1:08:02, 3.81s/it] 31%|βββ | 476/1547 [30:03<1:07:11, 3.76s/it] 31%|βββ | 477/1547 [30:07<1:07:12, 3.77s/it] 31%|βββ | 478/1547 [30:11<1:08:21, 3.84s/it] 31%|βββ | 479/1547 [30:15<1:08:42, 3.86s/it] 31%|βββ | 480/1547 [30:19<1:08:31, 3.85s/it] {'loss': 0.1114, 'learning_rate': 5e-05, 'epoch': 0.31}
31%|βββ | 480/1547 [30:19<1:08:31, 3.85s/it] 31%|βββ | 481/1547 [30:22<1:05:55, 3.71s/it] 31%|βββ | 482/1547 [30:26<1:07:03, 3.78s/it] 31%|βββ | 483/1547 [30:30<1:07:56, 3.83s/it] 31%|ββββ | 484/1547 [30:34<1:07:40, 3.82s/it] 31%|ββββ | 485/1547 [30:38<1:09:53, 3.95s/it] 31%|ββββ | 486/1547 [30:42<1:10:04, 3.96s/it] 31%|ββββ | 487/1547 [30:46<1:07:51, 3.84s/it] 32%|ββββ | 488/1547 [30:49<1:06:32, 3.77s/it] 32%|ββββ | 489/1547 [30:53<1:07:05, 3.81s/it] 32%|ββββ | 490/1547 [30:57<1:07:56, 3.86s/it] {'loss': 0.1298, 'learning_rate': 5e-05, 'epoch': 0.32}
32%|ββββ | 490/1547 [30:57<1:07:56, 3.86s/it] 32%|ββββ | 491/1547 [31:01<1:07:02, 3.81s/it] 32%|ββββ | 492/1547 [31:05<1:06:26, 3.78s/it] 32%|ββββ | 493/1547 [31:09<1:07:25, 3.84s/it] 32%|ββββ | 494/1547 [31:13<1:09:01, 3.93s/it] 32%|ββββ | 495/1547 [31:17<1:09:46, 3.98s/it] 32%|ββββ | 496/1547 [31:21<1:08:12, 3.89s/it] 32%|ββββ | 497/1547 [31:24<1:07:02, 3.83s/it] 32%|ββββ | 498/1547 [31:28<1:05:20, 3.74s/it] 32%|ββββ | 499/1547 [31:31<1:04:28, 3.69s/it] 32%|ββββ | 500/1547 [31:35<1:04:16, 3.68s/it] {'loss': 0.107, 'learning_rate': 5e-05, 'epoch': 0.32}
32%|ββββ | 500/1547 [31:35<1:04:16, 3.68s/it] 32%|ββββ | 501/1547 [31:42<1:21:37, 4.68s/it] 32%|ββββ | 502/1547 [31:46<1:16:43, 4.40s/it] 33%|ββββ | 503/1547 [31:49<1:12:23, 4.16s/it] 33%|ββββ | 504/1547 [31:53<1:09:27, 4.00s/it] 33%|ββββ | 505/1547 [31:57<1:06:41, 3.84s/it] 33%|ββββ | 506/1547 [32:01<1:08:39, 3.96s/it] 33%|ββββ | 507/1547 [32:05<1:08:57, 3.98s/it] 33%|ββββ | 508/1547 [32:08<1:06:32, 3.84s/it] 33%|ββββ | 509/1547 [32:12<1:04:56, 3.75s/it] 33%|ββββ | 510/1547 [32:16<1:05:15, 3.78s/it] {'loss': 0.0865, 'learning_rate': 5e-05, 'epoch': 0.33}
33%|ββββ | 510/1547 [32:16<1:05:15, 3.78s/it] 33%|ββββ | 511/1547 [32:19<1:04:32, 3.74s/it] 33%|ββββ | 512/1547 [32:23<1:04:46, 3.75s/it] 33%|ββββ | 513/1547 [32:27<1:04:00, 3.71s/it] 33%|ββββ | 514/1547 [32:30<1:03:17, 3.68s/it] 33%|ββββ | 515/1547 [32:34<1:03:59, 3.72s/it] 33%|ββββ | 516/1547 [32:38<1:03:48, 3.71s/it] 33%|ββββ | 517/1547 [32:42<1:04:11, 3.74s/it] 33%|ββββ | 518/1547 [32:45<1:04:09, 3.74s/it] 34%|ββββ | 519/1547 [32:49<1:04:18, 3.75s/it] 34%|ββββ | 520/1547 [32:53<1:05:18, 3.82s/it] {'loss': 0.0956, 'learning_rate': 5e-05, 'epoch': 0.34}
34%|ββββ | 520/1547 [32:53<1:05:18, 3.82s/it] 34%|ββββ | 521/1547 [32:57<1:04:27, 3.77s/it] 34%|ββββ | 522/1547 [33:01<1:04:04, 3.75s/it] 34%|ββββ | 523/1547 [33:04<1:02:49, 3.68s/it] 34%|ββββ | 524/1547 [33:08<1:04:50, 3.80s/it] 34%|ββββ | 525/1547 [33:12<1:04:53, 3.81s/it] 34%|ββββ | 526/1547 [33:16<1:05:19, 3.84s/it] 34%|ββββ | 527/1547 [33:20<1:04:49, 3.81s/it] 34%|ββββ | 528/1547 [33:23<1:03:39, 3.75s/it] 34%|ββββ | 529/1547 [33:27<1:03:05, 3.72s/it] 34%|ββββ | 530/1547 [33:31<1:03:30, 3.75s/it] {'loss': 0.0917, 'learning_rate': 5e-05, 'epoch': 0.34}
34%|ββββ | 530/1547 [33:31<1:03:30, 3.75s/it] 34%|ββββ | 531/1547 [33:35<1:04:11, 3.79s/it] 34%|ββββ | 532/1547 [33:39<1:04:44, 3.83s/it] 34%|ββββ | 533/1547 [33:42<1:04:08, 3.79s/it] 35%|ββββ | 534/1547 [33:46<1:03:32, 3.76s/it] 35%|ββββ | 535/1547 [33:49<1:01:18, 3.63s/it] 35%|ββββ | 536/1547 [33:53<1:02:27, 3.71s/it] 35%|ββββ | 537/1547 [33:57<1:02:58, 3.74s/it] 35%|ββββ | 538/1547 [34:01<1:05:02, 3.87s/it] 35%|ββββ | 539/1547 [34:05<1:03:43, 3.79s/it] 35%|ββββ | 540/1547 [34:08<1:03:26, 3.78s/it] {'loss': 0.1227, 'learning_rate': 5e-05, 'epoch': 0.35}
36% | 550/1547 [34:46<1:02:58, 3.79s/it] {'loss': 0.0914, 'learning_rate': 5e-05, 'epoch': 0.36}
36% | 560/1547 [35:25<1:02:57, 3.83s/it] {'loss': 0.0857, 'learning_rate': 5e-05, 'epoch': 0.36}
37% | 570/1547 [36:03<1:01:52, 3.80s/it] {'loss': 0.1086, 'learning_rate': 5e-05, 'epoch': 0.37}
37% | 580/1547 [36:42<1:01:51, 3.84s/it] {'loss': 0.1278, 'learning_rate': 5e-05, 'epoch': 0.37}
38% | 590/1547 [37:20<1:00:02, 3.76s/it] {'loss': 0.0823, 'learning_rate': 5e-05, 'epoch': 0.38}
39% | 600/1547 [37:57<58:49, 3.73s/it] {'loss': 0.078, 'learning_rate': 5e-05, 'epoch': 0.39}
39% | 610/1547 [38:34<56:02, 3.59s/it] {'loss': 0.105, 'learning_rate': 5e-05, 'epoch': 0.39}
40% | 620/1547 [39:11<57:43, 3.74s/it] {'loss': 0.116, 'learning_rate': 5e-05, 'epoch': 0.4}
41% | 630/1547 [39:49<58:11, 3.81s/it] {'loss': 0.1161, 'learning_rate': 5e-05, 'epoch': 0.41}
41% | 640/1547 [40:28<57:19, 3.79s/it] {'loss': 0.0967, 'learning_rate': 5e-05, 'epoch': 0.41}
42% | 650/1547 [41:05<56:51, 3.80s/it] {'loss': 0.1, 'learning_rate': 5e-05, 'epoch': 0.42}
43% | 660/1547 [41:43<56:52, 3.85s/it] {'loss': 0.1056, 'learning_rate': 5e-05, 'epoch': 0.43}
43% | 670/1547 [42:21<54:34, 3.73s/it] {'loss': 0.1039, 'learning_rate': 5e-05, 'epoch': 0.43}
44% | 680/1547 [42:59<53:35, 3.71s/it] {'loss': 0.1086, 'learning_rate': 5e-05, 'epoch': 0.44}
45% | 690/1547 [43:37<55:49, 3.91s/it] {'loss': 0.1018, 'learning_rate': 5e-05, 'epoch': 0.45}
45% | 700/1547 [44:15<55:15, 3.91s/it] {'loss': 0.096, 'learning_rate': 5e-05, 'epoch': 0.45}
46% | 710/1547 [44:54<53:43, 3.85s/it] {'loss': 0.0989, 'learning_rate': 5e-05, 'epoch': 0.46}
47% | 720/1547 [45:33<52:02, 3.78s/it] {'loss': 0.1099, 'learning_rate': 5e-05, 'epoch': 0.47}
47% | 730/1547 [46:11<51:39, 3.79s/it] {'loss': 0.0894, 'learning_rate': 5e-05, 'epoch': 0.47}
48% | 740/1547 [46:47<49:32, 3.68s/it] {'loss': 0.0815, 'learning_rate': 5e-05, 'epoch': 0.48}
48% | 750/1547 [47:25<48:43, 3.67s/it] {'loss': 0.1, 'learning_rate': 5e-05, 'epoch': 0.48}
49% | 760/1547 [48:03<49:51, 3.80s/it] {'loss': 0.1063, 'learning_rate': 5e-05, 'epoch': 0.49}
50% | 770/1547 [48:42<50:38, 3.91s/it] {'loss': 0.0882, 'learning_rate': 5e-05, 'epoch': 0.5}
50% | 780/1547 [49:20<48:46, 3.82s/it] {'loss': 0.0743, 'learning_rate': 5e-05, 'epoch': 0.5}
51% | 790/1547 [49:58<47:58, 3.80s/it] {'loss': 0.0983, 'learning_rate': 5e-05, 'epoch': 0.51}
52% | 800/1547 [50:36<45:59, 3.69s/it] {'loss': 0.1079, 'learning_rate': 5e-05, 'epoch': 0.52}
52% | 810/1547 [51:13<45:29, 3.70s/it] {'loss': 0.0833, 'learning_rate': 5e-05, 'epoch': 0.52}
53% | 820/1547 [51:50<46:33, 3.84s/it] {'loss': 0.0942, 'learning_rate': 5e-05, 'epoch': 0.53}
54% | 830/1547 [52:30<47:29, 3.97s/it] {'loss': 0.0978, 'learning_rate': 5e-05, 'epoch': 0.54}
54% | 840/1547 [53:08<45:13, 3.84s/it] {'loss': 0.1151, 'learning_rate': 5e-05, 'epoch': 0.54}
55% | 850/1547 [53:46<43:35, 3.75s/it] {'loss': 0.095, 'learning_rate': 5e-05, 'epoch': 0.55}
56% | 860/1547 [54:24<44:08, 3.86s/it] {'loss': 0.0869, 'learning_rate': 5e-05, 'epoch': 0.56}
56% | 870/1547 [55:03<43:30, 3.86s/it] {'loss': 0.0963, 'learning_rate': 5e-05, 'epoch': 0.56}
57% | 880/1547 [55:41<42:41, 3.84s/it] {'loss': 0.0992, 'learning_rate': 5e-05, 'epoch': 0.57}
58% | 890/1547 [56:19<41:37, 3.80s/it] {'loss': 0.0947, 'learning_rate': 5e-05, 'epoch': 0.58}
58% | 900/1547 [56:57<42:05, 3.90s/it] {'loss': 0.084, 'learning_rate': 5e-05, 'epoch': 0.58}
59% | 910/1547 [57:36<40:43, 3.84s/it] {'loss': 0.0855, 'learning_rate': 5e-05, 'epoch': 0.59}
59% | 920/1547 [58:14<39:57, 3.82s/it] {'loss': 0.0817, 'learning_rate': 5e-05, 'epoch': 0.59}
60% | 930/1547 [58:53<39:39, 3.86s/it] {'loss': 0.095, 'learning_rate': 5e-05, 'epoch': 0.6}
61% | 940/1547 [59:31<39:11, 3.87s/it] {'loss': 0.1123, 'learning_rate': 5e-05, 'epoch': 0.61}
61% | 950/1547 [1:00:09<36:45, 3.70s/it] {'loss': 0.0835, 'learning_rate': 5e-05, 'epoch': 0.61}
62% | 960/1547 [1:00:48<38:15, 3.91s/it] {'loss': 0.0747, 'learning_rate': 5e-05, 'epoch': 0.62}
63% | 970/1547 [1:01:26<37:26, 3.89s/it] {'loss': 0.103, 'learning_rate': 5e-05, 'epoch': 0.63}
63% | 980/1547 [1:02:03<35:36, 3.77s/it] {'loss': 0.0995, 'learning_rate': 5e-05, 'epoch': 0.63}
64% | 990/1547 [1:02:41<35:55, 3.87s/it] {'loss': 0.0945, 'learning_rate': 5e-05, 'epoch': 0.64}
65% | 1000/1547 [1:03:18<33:35, 3.69s/it] {'loss': 0.0847, 'learning_rate': 5e-05, 'epoch': 0.65}
65% | 1010/1547 [1:03:58<33:24, 3.73s/it] {'loss': 0.0921, 'learning_rate': 5e-05, 'epoch': 0.65}
66% | 1020/1547 [1:04:37<33:18, 3.79s/it] {'loss': 0.0895, 'learning_rate': 5e-05, 'epoch': 0.66}
67% | 1030/1547 [1:05:15<33:11, 3.85s/it] {'loss': 0.0973, 'learning_rate': 5e-05, 'epoch': 0.67}
67% | 1040/1547 [1:05:54<32:00, 3.79s/it] {'loss': 0.0749, 'learning_rate': 5e-05, 'epoch': 0.67}
68% | 1050/1547 [1:06:32<31:57, 3.86s/it] {'loss': 0.0952, 'learning_rate': 5e-05, 'epoch': 0.68}
69% | 1060/1547 [1:07:11<30:59, 3.82s/it] {'loss': 0.0805, 'learning_rate': 5e-05, 'epoch': 0.69}
69% | 1070/1547 [1:07:49<30:35, 3.85s/it] {'loss': 0.0799, 'learning_rate': 5e-05, 'epoch': 0.69}
70% | 1080/1547 [1:08:27<29:47, 3.83s/it] {'loss': 0.1063, 'learning_rate': 5e-05, 'epoch': 0.7}
70% | 1090/1547 [1:09:06<29:10, 3.83s/it] {'loss': 0.0962, 'learning_rate': 5e-05, 'epoch': 0.7}
71% | 1100/1547 [1:09:43<27:29, 3.69s/it] {'loss': 0.0854, 'learning_rate': 5e-05, 'epoch': 0.71}
72% | 1110/1547 [1:10:21<27:05, 3.72s/it] {'loss': 0.1033, 'learning_rate': 5e-05, 'epoch': 0.72}
72% | 1120/1547 [1:11:00<27:00, 3.79s/it] {'loss': 0.0829, 'learning_rate': 5e-05, 'epoch': 0.72}
73% | 1130/1547 [1:11:38<26:59, 3.88s/it] {'loss': 0.0896, 'learning_rate': 5e-05, 'epoch': 0.73}
74% | 1140/1547 [1:12:16<24:56, 3.68s/it] {'loss': 0.0928, 'learning_rate': 5e-05, 'epoch': 0.74}
74% | 1150/1547 [1:12:54<24:38, 3.72s/it] {'loss': 0.1004, 'learning_rate': 5e-05, 'epoch': 0.74}
75% | 1160/1547 [1:13:33<24:51, 3.85s/it] {'loss': 0.0976, 'learning_rate': 5e-05, 'epoch': 0.75}
76% | 1170/1547 [1:14:11<23:05, 3.68s/it] {'loss': 0.0964, 'learning_rate': 5e-05, 'epoch': 0.76}
76% | 1180/1547 [1:14:49<23:23, 3.83s/it] {'loss': 0.1032, 'learning_rate': 5e-05, 'epoch': 0.76}
76%|ββββββββ | 1180/1547 [1:14:49<23:23, 3.83s/it] 76%|ββββββββ | 1181/1547 [1:14:53<23:15, 3.81s/it] 76%|ββββββββ | 1182/1547 [1:14:57<23:04, 3.79s/it] 76%|ββββββββ | 1183/1547 [1:15:01<22:34, 3.72s/it] 77%|ββββββββ | 1184/1547 [1:15:04<22:22, 3.70s/it] 77%|ββββββββ | 1185/1547 [1:15:08<22:02, 3.65s/it] 77%|ββββββββ | 1186/1547 [1:15:12<22:18, 3.71s/it] 77%|ββββββββ | 1187/1547 [1:15:15<22:18, 3.72s/it] 77%|ββββββββ | 1188/1547 [1:15:19<22:02, 3.68s/it] 77%|ββββββββ | 1189/1547 [1:15:23<22:30, 3.77s/it] 77%|ββββββββ | 1190/1547 [1:15:27<22:09, 3.72s/it] {'loss': 0.0918, 'learning_rate': 5e-05, 'epoch': 0.77}
77%|ββββββββ | 1190/1547 [1:15:27<22:09, 3.72s/it] 77%|ββββββββ | 1191/1547 [1:15:30<22:14, 3.75s/it] 77%|ββββββββ | 1192/1547 [1:15:34<22:45, 3.85s/it] 77%|ββββββββ | 1193/1547 [1:15:38<22:20, 3.79s/it] 77%|ββββββββ | 1194/1547 [1:15:42<21:57, 3.73s/it] 77%|ββββββββ | 1195/1547 [1:15:46<22:15, 3.79s/it] 77%|ββββββββ | 1196/1547 [1:15:49<22:06, 3.78s/it] 77%|ββββββββ | 1197/1547 [1:15:53<21:36, 3.70s/it] 77%|ββββββββ | 1198/1547 [1:15:57<21:34, 3.71s/it] 78%|ββββββββ | 1199/1547 [1:16:00<21:31, 3.71s/it] 78%|ββββββββ | 1200/1547 [1:16:04<21:34, 3.73s/it] {'loss': 0.0934, 'learning_rate': 5e-05, 'epoch': 0.78}
78%|ββββββββ | 1200/1547 [1:16:04<21:34, 3.73s/it] 78%|ββββββββ | 1201/1547 [1:16:08<21:40, 3.76s/it] 78%|ββββββββ | 1202/1547 [1:16:11<21:11, 3.69s/it] 78%|ββββββββ | 1203/1547 [1:16:15<21:24, 3.73s/it] 78%|ββββββββ | 1204/1547 [1:16:19<21:13, 3.71s/it] 78%|ββββββββ | 1205/1547 [1:16:23<21:37, 3.79s/it] 78%|ββββββββ | 1206/1547 [1:16:27<21:48, 3.84s/it] 78%|ββββββββ | 1207/1547 [1:16:31<21:26, 3.78s/it] 78%|ββββββββ | 1208/1547 [1:16:34<21:09, 3.74s/it] 78%|ββββββββ | 1209/1547 [1:16:38<20:49, 3.70s/it] 78%|ββββββββ | 1210/1547 [1:16:41<20:19, 3.62s/it] {'loss': 0.0773, 'learning_rate': 5e-05, 'epoch': 0.78}
78%|ββββββββ | 1210/1547 [1:16:41<20:19, 3.62s/it] 78%|ββββββββ | 1211/1547 [1:16:45<20:12, 3.61s/it] 78%|ββββββββ | 1212/1547 [1:16:48<19:54, 3.57s/it] 78%|ββββββββ | 1213/1547 [1:16:52<19:55, 3.58s/it] 78%|ββββββββ | 1214/1547 [1:16:56<20:17, 3.66s/it] 79%|ββββββββ | 1215/1547 [1:17:00<20:45, 3.75s/it] 79%|ββββββββ | 1216/1547 [1:17:03<20:44, 3.76s/it] 79%|ββββββββ | 1217/1547 [1:17:07<20:13, 3.68s/it] 79%|ββββββββ | 1218/1547 [1:17:11<20:01, 3.65s/it] 79%|ββββββββ | 1219/1547 [1:17:14<20:06, 3.68s/it] 79%|ββββββββ | 1220/1547 [1:17:18<19:41, 3.61s/it] {'loss': 0.0984, 'learning_rate': 5e-05, 'epoch': 0.79}
79%|ββββββββ | 1220/1547 [1:17:18<19:41, 3.61s/it] 79%|ββββββββ | 1221/1547 [1:17:22<20:00, 3.68s/it] 79%|ββββββββ | 1222/1547 [1:17:25<20:18, 3.75s/it] 79%|ββββββββ | 1223/1547 [1:17:29<20:13, 3.75s/it] 79%|ββββββββ | 1224/1547 [1:17:33<19:57, 3.71s/it] 79%|ββββββββ | 1225/1547 [1:17:36<19:43, 3.68s/it] 79%|ββββββββ | 1226/1547 [1:17:40<19:43, 3.69s/it] 79%|ββββββββ | 1227/1547 [1:17:44<20:07, 3.77s/it] 79%|ββββββββ | 1228/1547 [1:17:48<20:06, 3.78s/it] 79%|ββββββββ | 1229/1547 [1:17:52<20:38, 3.89s/it] 80%|ββββββββ | 1230/1547 [1:17:56<20:09, 3.81s/it] {'loss': 0.1089, 'learning_rate': 5e-05, 'epoch': 0.8}
80%|ββββββββ | 1230/1547 [1:17:56<20:09, 3.81s/it] 80%|ββββββββ | 1231/1547 [1:17:59<19:39, 3.73s/it] 80%|ββββββββ | 1232/1547 [1:18:03<19:27, 3.71s/it] 80%|ββββββββ | 1233/1547 [1:18:07<19:27, 3.72s/it] 80%|ββββββββ | 1234/1547 [1:18:11<19:57, 3.83s/it] 80%|ββββββββ | 1235/1547 [1:18:15<19:53, 3.83s/it] 80%|ββββββββ | 1236/1547 [1:18:18<19:46, 3.81s/it] 80%|ββββββββ | 1237/1547 [1:18:22<19:45, 3.82s/it] 80%|ββββββββ | 1238/1547 [1:18:26<19:22, 3.76s/it] 80%|ββββββββ | 1239/1547 [1:18:29<19:09, 3.73s/it] 80%|ββββββββ | 1240/1547 [1:18:33<19:33, 3.82s/it] {'loss': 0.1017, 'learning_rate': 5e-05, 'epoch': 0.8}
80%|ββββββββ | 1240/1547 [1:18:33<19:33, 3.82s/it] 80%|ββββββββ | 1241/1547 [1:18:37<18:56, 3.72s/it] 80%|ββββββββ | 1242/1547 [1:18:41<19:02, 3.74s/it] 80%|ββββββββ | 1243/1547 [1:18:45<19:29, 3.85s/it] 80%|ββββββββ | 1244/1547 [1:18:48<19:04, 3.78s/it] 80%|ββββββββ | 1245/1547 [1:18:52<19:01, 3.78s/it] 81%|ββββββββ | 1246/1547 [1:18:56<18:54, 3.77s/it] 81%|ββββββββ | 1247/1547 [1:19:00<19:34, 3.91s/it] 81%|ββββββββ | 1248/1547 [1:19:04<19:19, 3.88s/it] 81%|ββββββββ | 1249/1547 [1:19:08<19:19, 3.89s/it] 81%|ββββββββ | 1250/1547 [1:19:12<19:04, 3.85s/it] {'loss': 0.0861, 'learning_rate': 5e-05, 'epoch': 0.81}
81%|ββββββββ | 1250/1547 [1:19:12<19:04, 3.85s/it] 81%|ββββββββ | 1251/1547 [1:19:16<19:08, 3.88s/it] 81%|ββββββββ | 1252/1547 [1:19:19<18:55, 3.85s/it] 81%|ββββββββ | 1253/1547 [1:19:23<18:49, 3.84s/it] 81%|ββββββββ | 1254/1547 [1:19:27<18:58, 3.88s/it] 81%|ββββββββ | 1255/1547 [1:19:31<18:37, 3.83s/it] 81%|ββββββββ | 1256/1547 [1:19:35<18:46, 3.87s/it] 81%|βββββββββ | 1257/1547 [1:19:39<18:53, 3.91s/it] 81%|βββββββββ | 1258/1547 [1:19:43<18:34, 3.86s/it] 81%|βββββββββ | 1259/1547 [1:19:46<18:15, 3.80s/it] 81%|βββββββββ | 1260/1547 [1:19:50<18:05, 3.78s/it] {'loss': 0.0769, 'learning_rate': 5e-05, 'epoch': 0.81}
81%|βββββββββ | 1260/1547 [1:19:50<18:05, 3.78s/it] 82%|βββββββββ | 1261/1547 [1:19:53<17:17, 3.63s/it] 82%|βββββββββ | 1262/1547 [1:19:57<17:24, 3.66s/it] 82%|βββββββββ | 1263/1547 [1:20:01<17:32, 3.71s/it] 82%|βββββββββ | 1264/1547 [1:20:05<17:47, 3.77s/it] 82%|βββββββββ | 1265/1547 [1:20:08<17:22, 3.70s/it] 82%|βββββββββ | 1266/1547 [1:20:12<17:13, 3.68s/it] 82%|βββββββββ | 1267/1547 [1:20:16<17:14, 3.70s/it] 82%|βββββββββ | 1268/1547 [1:20:19<17:01, 3.66s/it] 82%|βββββββββ | 1269/1547 [1:20:23<17:08, 3.70s/it] 82%|βββββββββ | 1270/1547 [1:20:27<17:20, 3.76s/it] {'loss': 0.0819, 'learning_rate': 5e-05, 'epoch': 0.82}
82%|βββββββββ | 1270/1547 [1:20:27<17:20, 3.76s/it] 82%|βββββββββ | 1271/1547 [1:20:31<17:11, 3.74s/it] 82%|βββββββββ | 1272/1547 [1:20:34<17:06, 3.73s/it] 82%|βββββββββ | 1273/1547 [1:20:38<16:33, 3.63s/it] 82%|βββββββββ | 1274/1547 [1:20:42<16:44, 3.68s/it] 82%|βββββββββ | 1275/1547 [1:20:45<16:20, 3.60s/it] 82%|βββββββββ | 1276/1547 [1:20:49<16:34, 3.67s/it] 83%|βββββββββ | 1277/1547 [1:20:53<16:58, 3.77s/it] 83%|βββββββββ | 1278/1547 [1:20:57<17:01, 3.80s/it] 83%|βββββββββ | 1279/1547 [1:21:00<16:32, 3.70s/it] 83%|βββββββββ | 1280/1547 [1:21:04<16:32, 3.72s/it] {'loss': 0.0948, 'learning_rate': 5e-05, 'epoch': 0.83}
83%|βββββββββ | 1280/1547 [1:21:04<16:32, 3.72s/it] 83%|βββββββββ | 1281/1547 [1:21:08<16:23, 3.70s/it] 83%|βββββββββ | 1282/1547 [1:21:11<16:25, 3.72s/it] 83%|βββββββββ | 1283/1547 [1:21:15<16:05, 3.66s/it] 83%|βββββββββ | 1284/1547 [1:21:18<15:47, 3.60s/it] 83%|βββββββββ | 1285/1547 [1:21:23<16:40, 3.82s/it] 83%|βββββββββ | 1286/1547 [1:21:27<17:14, 3.96s/it] 83%|βββββββββ | 1287/1547 [1:21:30<16:31, 3.81s/it] 83%|βββββββββ | 1288/1547 [1:21:34<16:39, 3.86s/it] 83%|βββββββββ | 1289/1547 [1:21:38<16:48, 3.91s/it] 83%|βββββββββ | 1290/1547 [1:21:42<16:58, 3.96s/it] {'loss': 0.0967, 'learning_rate': 5e-05, 'epoch': 0.83}
83%|βββββββββ | 1290/1547 [1:21:42<16:58, 3.96s/it] 83%|βββββββββ | 1291/1547 [1:21:46<16:34, 3.88s/it] 84%|βββββββββ | 1292/1547 [1:21:50<16:32, 3.89s/it] 84%|βββββββββ | 1293/1547 [1:21:54<15:57, 3.77s/it] 84%|βββββββββ | 1294/1547 [1:21:58<16:07, 3.82s/it] 84%|βββββββββ | 1295/1547 [1:22:02<16:23, 3.90s/it] 84%|βββββββββ | 1296/1547 [1:22:05<16:10, 3.87s/it] 84%|βββββββββ | 1297/1547 [1:22:09<16:07, 3.87s/it] 84%|βββββββββ | 1298/1547 [1:22:13<15:34, 3.75s/it] 84%|βββββββββ | 1299/1547 [1:22:16<15:18, 3.70s/it] 84%|βββββββββ | 1300/1547 [1:22:20<14:52, 3.61s/it] {'loss': 0.1048, 'learning_rate': 5e-05, 'epoch': 0.84}
84%|βββββββββ | 1300/1547 [1:22:20<14:52, 3.61s/it] 84%|βββββββββ | 1301/1547 [1:22:23<14:50, 3.62s/it] 84%|βββββββββ | 1302/1547 [1:22:27<14:46, 3.62s/it] 84%|βββββββββ | 1303/1547 [1:22:31<15:10, 3.73s/it] 84%|βββββββββ | 1304/1547 [1:22:35<14:56, 3.69s/it] 84%|βββββββββ | 1305/1547 [1:22:39<15:11, 3.77s/it] 84%|βββββββββ | 1306/1547 [1:22:42<14:55, 3.72s/it] 84%|βββββββββ | 1307/1547 [1:22:46<15:09, 3.79s/it] 85%|βββββββββ | 1308/1547 [1:22:50<15:30, 3.90s/it] 85%|βββββββββ | 1309/1547 [1:22:54<15:01, 3.79s/it] 85%|βββββββββ | 1310/1547 [1:22:58<15:04, 3.82s/it] {'loss': 0.0976, 'learning_rate': 5e-05, 'epoch': 0.85}
85%|βββββββββ | 1310/1547 [1:22:58<15:04, 3.82s/it] 85%|βββββββββ | 1311/1547 [1:23:01<15:01, 3.82s/it] 85%|βββββββββ | 1312/1547 [1:23:05<14:34, 3.72s/it] 85%|βββββββββ | 1313/1547 [1:23:09<14:33, 3.73s/it] 85%|βββββββββ | 1314/1547 [1:23:12<14:23, 3.71s/it] 85%|βββββββββ | 1315/1547 [1:23:16<14:11, 3.67s/it] 85%|βββββββββ | 1316/1547 [1:23:19<13:52, 3.60s/it] 85%|βββββββββ | 1317/1547 [1:23:23<14:22, 3.75s/it] 85%|βββββββββ | 1318/1547 [1:23:27<13:53, 3.64s/it] 85%|βββββββββ | 1319/1547 [1:23:31<14:00, 3.69s/it] 85%|βββββββββ | 1320/1547 [1:23:34<13:53, 3.67s/it] {'loss': 0.1009, 'learning_rate': 5e-05, 'epoch': 0.85}
85%|βββββββββ | 1320/1547 [1:23:34<13:53, 3.67s/it] 85%|βββββββββ | 1321/1547 [1:23:38<13:28, 3.58s/it] 85%|βββββββββ | 1322/1547 [1:23:41<13:37, 3.63s/it] 86%|βββββββββ | 1323/1547 [1:23:46<14:05, 3.78s/it] 86%|βββββββββ | 1324/1547 [1:23:49<13:54, 3.74s/it] 86%|βββββββββ | 1325/1547 [1:23:53<13:39, 3.69s/it] 86%|βββββββββ | 1326/1547 [1:23:57<13:46, 3.74s/it] 86%|βββββββββ | 1327/1547 [1:24:00<13:41, 3.73s/it] 86%|βββββββββ | 1328/1547 [1:24:05<14:31, 3.98s/it] 86%|βββββββββ | 1329/1547 [1:24:09<14:08, 3.89s/it] 86%|βββββββββ | 1330/1547 [1:24:12<13:46, 3.81s/it] {'loss': 0.0877, 'learning_rate': 5e-05, 'epoch': 0.86}
86%|βββββββββ | 1330/1547 [1:24:12<13:46, 3.81s/it] 86%|βββββββββ | 1331/1547 [1:24:16<13:33, 3.77s/it] 86%|βββββββββ | 1332/1547 [1:24:19<13:19, 3.72s/it] 86%|βββββββββ | 1333/1547 [1:24:23<13:11, 3.70s/it] 86%|βββββββββ | 1334/1547 [1:24:27<13:11, 3.72s/it] 86%|βββββββββ | 1335/1547 [1:24:31<13:01, 3.69s/it] 86%|βββββββββ | 1336/1547 [1:24:34<12:45, 3.63s/it] 86%|βββββββββ | 1337/1547 [1:24:38<13:00, 3.71s/it] 86%|βββββββββ | 1338/1547 [1:24:42<13:21, 3.84s/it] 87%|βββββββββ | 1339/1547 [1:24:46<13:01, 3.76s/it] 87%|βββββββββ | 1340/1547 [1:24:50<13:20, 3.86s/it] {'loss': 0.1017, 'learning_rate': 5e-05, 'epoch': 0.87}
87%|βββββββββ | 1340/1547 [1:24:50<13:20, 3.86s/it] 87%|βββββββββ | 1341/1547 [1:24:53<13:03, 3.80s/it] 87%|βββββββββ | 1342/1547 [1:24:57<12:59, 3.80s/it] 87%|βββββββββ | 1343/1547 [1:25:01<12:50, 3.78s/it] 87%|βββββββββ | 1344/1547 [1:25:04<12:29, 3.69s/it] 87%|βββββββββ | 1345/1547 [1:25:08<12:19, 3.66s/it] 87%|βββββββββ | 1346/1547 [1:25:11<12:05, 3.61s/it] 87%|βββββββββ | 1347/1547 [1:25:15<11:57, 3.59s/it] 87%|βββββββββ | 1348/1547 [1:25:19<11:53, 3.59s/it] 87%|βββββββββ | 1349/1547 [1:25:23<12:16, 3.72s/it] 87%|βββββββββ | 1350/1547 [1:25:27<12:27, 3.79s/it] {'loss': 0.092, 'learning_rate': 5e-05, 'epoch': 0.87}
87%|βββββββββ | 1350/1547 [1:25:27<12:27, 3.79s/it] 87%|βββββββββ | 1351/1547 [1:25:30<12:27, 3.81s/it] 87%|βββββββββ | 1352/1547 [1:25:34<12:02, 3.70s/it] 87%|βββββββββ | 1353/1547 [1:25:37<11:38, 3.60s/it] 88%|βββββββββ | 1354/1547 [1:25:41<11:41, 3.64s/it] 88%|βββββββββ | 1355/1547 [1:25:45<11:41, 3.66s/it] 88%|βββββββββ | 1356/1547 [1:25:49<11:53, 3.74s/it] 88%|βββββββββ | 1357/1547 [1:25:52<11:50, 3.74s/it] 88%|βββββββββ | 1358/1547 [1:25:56<11:56, 3.79s/it] 88%|βββββββββ | 1359/1547 [1:26:00<11:49, 3.77s/it] 88%|βββββββββ | 1360/1547 [1:26:03<11:29, 3.69s/it] {'loss': 0.0852, 'learning_rate': 5e-05, 'epoch': 0.88}
88%|βββββββββ | 1360/1547 [1:26:03<11:29, 3.69s/it] 88%|βββββββββ | 1361/1547 [1:26:07<11:05, 3.58s/it] 88%|βββββββββ | 1362/1547 [1:26:11<11:40, 3.78s/it] 88%|βββββββββ | 1363/1547 [1:26:14<11:11, 3.65s/it] 88%|βββββββββ | 1364/1547 [1:26:18<11:22, 3.73s/it] 88%|βββββββββ | 1365/1547 [1:26:22<11:24, 3.76s/it] 88%|βββββββββ | 1366/1547 [1:26:26<10:59, 3.64s/it] 88%|βββββββββ | 1367/1547 [1:26:30<11:33, 3.85s/it] 88%|βββββββββ | 1368/1547 [1:26:34<11:30, 3.86s/it] 88%|βββββββββ | 1369/1547 [1:26:38<11:23, 3.84s/it] 89%|βββββββββ | 1370/1547 [1:26:42<11:34, 3.92s/it] {'loss': 0.0669, 'learning_rate': 5e-05, 'epoch': 0.89}
89%|βββββββββ | 1370/1547 [1:26:42<11:34, 3.92s/it] 89%|βββββββββ | 1371/1547 [1:26:46<11:47, 4.02s/it] 89%|βββββββββ | 1372/1547 [1:26:49<11:19, 3.88s/it] 89%|βββββββββ | 1373/1547 [1:26:53<11:09, 3.85s/it] 89%|βββββββββ | 1374/1547 [1:26:57<10:55, 3.79s/it] 89%|βββββββββ | 1375/1547 [1:27:01<10:53, 3.80s/it] 89%|βββββββββ | 1376/1547 [1:27:04<10:48, 3.79s/it] 89%|βββββββββ | 1377/1547 [1:27:08<10:50, 3.83s/it] 89%|βββββββββ | 1378/1547 [1:27:12<10:36, 3.77s/it] 89%|βββββββββ | 1379/1547 [1:27:16<10:27, 3.73s/it] 89%|βββββββββ | 1380/1547 [1:27:19<10:23, 3.73s/it] {'loss': 0.084, 'learning_rate': 5e-05, 'epoch': 0.89}
89%|βββββββββ | 1380/1547 [1:27:19<10:23, 3.73s/it] 89%|βββββββββ | 1381/1547 [1:27:23<10:10, 3.68s/it] 89%|βββββββββ | 1382/1547 [1:27:27<10:25, 3.79s/it] 89%|βββββββββ | 1383/1547 [1:27:31<10:08, 3.71s/it] 89%|βββββββββ | 1384/1547 [1:27:34<09:58, 3.67s/it] 90%|βββββββββ | 1385/1547 [1:27:38<09:52, 3.66s/it] 90%|βββββββββ | 1386/1547 [1:27:41<09:53, 3.69s/it] 90%|βββββββββ | 1387/1547 [1:27:45<10:01, 3.76s/it] 90%|βββββββββ | 1388/1547 [1:27:49<09:59, 3.77s/it] 90%|βββββββββ | 1389/1547 [1:27:53<09:56, 3.77s/it] 90%|βββββββββ | 1390/1547 [1:27:57<09:53, 3.78s/it] {'loss': 0.0757, 'learning_rate': 5e-05, 'epoch': 0.9}
90%|βββββββββ | 1390/1547 [1:27:57<09:53, 3.78s/it] 90%|βββββββββ | 1391/1547 [1:28:01<10:03, 3.87s/it] 90%|βββββββββ | 1392/1547 [1:28:05<10:10, 3.94s/it] 90%|βββββββββ | 1393/1547 [1:28:09<09:48, 3.82s/it] 90%|βββββββββ | 1394/1547 [1:28:12<09:42, 3.81s/it] 90%|βββββββββ | 1395/1547 [1:28:16<09:30, 3.75s/it] 90%|βββββββββ | 1396/1547 [1:28:20<09:37, 3.82s/it] 90%|βββββββββ | 1397/1547 [1:28:24<09:32, 3.81s/it] 90%|βββββββββ | 1398/1547 [1:28:28<09:32, 3.84s/it] 90%|βββββββββ | 1399/1547 [1:28:31<09:19, 3.78s/it] 90%|βββββββββ | 1400/1547 [1:28:35<09:16, 3.78s/it] {'loss': 0.1045, 'learning_rate': 5e-05, 'epoch': 0.9}
90%|βββββββββ | 1400/1547 [1:28:35<09:16, 3.78s/it] 91%|βββββββββ | 1401/1547 [1:28:39<09:25, 3.87s/it] 91%|βββββββββ | 1402/1547 [1:28:43<09:21, 3.87s/it] 91%|βββββββββ | 1403/1547 [1:28:47<09:04, 3.78s/it] 91%|βββββββββ | 1404/1547 [1:28:50<09:00, 3.78s/it] 91%|βββββββββ | 1405/1547 [1:28:54<09:09, 3.87s/it] 91%|βββββββββ | 1406/1547 [1:28:58<09:01, 3.84s/it] 91%|βββββββββ | 1407/1547 [1:29:02<08:52, 3.80s/it] 91%|βββββββββ | 1408/1547 [1:29:05<08:36, 3.72s/it] 91%|βββββββββ | 1409/1547 [1:29:10<08:50, 3.85s/it] 91%|βββββββββ | 1410/1547 [1:29:13<08:35, 3.76s/it] {'loss': 0.0862, 'learning_rate': 5e-05, 'epoch': 0.91}
91%|βββββββββ | 1410/1547 [1:29:13<08:35, 3.76s/it] 91%|βββββββββ | 1411/1547 [1:29:17<08:32, 3.77s/it] 91%|ββββββββββ| 1412/1547 [1:29:21<08:35, 3.82s/it] 91%|ββββββββββ| 1413/1547 [1:29:25<08:27, 3.79s/it] 91%|ββββββββββ| 1414/1547 [1:29:28<08:17, 3.74s/it] 91%|ββββββββββ| 1415/1547 [1:29:32<08:28, 3.85s/it] 92%|ββββββββββ| 1416/1547 [1:29:36<08:17, 3.80s/it] 92%|ββββββββββ| 1417/1547 [1:29:40<08:10, 3.77s/it] 92%|ββββββββββ| 1418/1547 [1:29:43<07:57, 3.70s/it] 92%|ββββββββββ| 1419/1547 [1:29:47<08:02, 3.77s/it] 92%|ββββββββββ| 1420/1547 [1:29:51<08:08, 3.85s/it] {'loss': 0.0856, 'learning_rate': 5e-05, 'epoch': 0.92}
92%|ββββββββββ| 1420/1547 [1:29:51<08:08, 3.85s/it] 92%|ββββββββββ| 1421/1547 [1:29:55<08:01, 3.82s/it] 92%|ββββββββββ| 1422/1547 [1:29:58<07:46, 3.73s/it] 92%|ββββββββββ| 1423/1547 [1:30:02<07:51, 3.81s/it] 92%|ββββββββββ| 1424/1547 [1:30:06<07:44, 3.78s/it] 92%|ββββββββββ| 1425/1547 [1:30:10<07:51, 3.87s/it] 92%|ββββββββββ| 1426/1547 [1:30:14<07:57, 3.94s/it] 92%|ββββββββββ| 1427/1547 [1:30:18<07:33, 3.78s/it] 92%|ββββββββββ| 1428/1547 [1:30:21<07:27, 3.76s/it] 92%|ββββββββββ| 1429/1547 [1:30:25<07:20, 3.73s/it] 92%|ββββββββββ| 1430/1547 [1:30:29<07:20, 3.77s/it] {'loss': 0.0938, 'learning_rate': 5e-05, 'epoch': 0.92}
92%|ββββββββββ| 1430/1547 [1:30:29<07:20, 3.77s/it] 93%|ββββββββββ| 1431/1547 [1:30:33<07:31, 3.89s/it] 93%|ββββββββββ| 1432/1547 [1:30:37<07:18, 3.82s/it] 93%|ββββββββββ| 1433/1547 [1:30:41<07:12, 3.79s/it] 93%|ββββββββββ| 1434/1547 [1:30:44<07:01, 3.73s/it] 93%|ββββββββββ| 1435/1547 [1:30:48<06:57, 3.72s/it] 93%|ββββββββββ| 1436/1547 [1:30:52<06:51, 3.71s/it] 93%|ββββββββββ| 1437/1547 [1:30:55<06:48, 3.71s/it] 93%|ββββββββββ| 1438/1547 [1:30:59<06:46, 3.73s/it] 93%|ββββββββββ| 1439/1547 [1:31:03<06:37, 3.68s/it] 93%|ββββββββββ| 1440/1547 [1:31:06<06:37, 3.71s/it] {'loss': 0.1154, 'learning_rate': 5e-05, 'epoch': 0.93}
93%|ββββββββββ| 1440/1547 [1:31:06<06:37, 3.71s/it] 93%|ββββββββββ| 1441/1547 [1:31:10<06:45, 3.82s/it] 93%|ββββββββββ| 1442/1547 [1:31:14<06:37, 3.79s/it] 93%|ββββββββββ| 1443/1547 [1:31:18<06:45, 3.90s/it] 93%|ββββββββββ| 1444/1547 [1:31:22<06:50, 3.99s/it] 93%|ββββββββββ| 1445/1547 [1:31:26<06:29, 3.82s/it] 93%|ββββββββββ| 1446/1547 [1:31:29<06:16, 3.73s/it] 94%|ββββββββββ| 1447/1547 [1:31:34<06:29, 3.90s/it] 94%|ββββββββββ| 1448/1547 [1:31:38<06:22, 3.86s/it] 94%|ββββββββββ| 1449/1547 [1:31:41<06:16, 3.84s/it] 94%|ββββββββββ| 1450/1547 [1:31:45<06:01, 3.73s/it] {'loss': 0.0802, 'learning_rate': 5e-05, 'epoch': 0.94}
94%|ββββββββββ| 1450/1547 [1:31:45<06:01, 3.73s/it] 94%|ββββββββββ| 1451/1547 [1:31:48<05:56, 3.71s/it] 94%|ββββββββββ| 1452/1547 [1:31:52<05:49, 3.68s/it] 94%|ββββββββββ| 1453/1547 [1:31:56<05:46, 3.68s/it] 94%|ββββββββββ| 1454/1547 [1:32:00<05:45, 3.71s/it] 94%|ββββββββββ| 1455/1547 [1:32:03<05:39, 3.69s/it] 94%|ββββββββββ| 1456/1547 [1:32:07<05:34, 3.67s/it] 94%|ββββββββββ| 1457/1547 [1:32:10<05:31, 3.68s/it] 94%|ββββββββββ| 1458/1547 [1:32:14<05:22, 3.62s/it] 94%|ββββββββββ| 1459/1547 [1:32:18<05:23, 3.68s/it] 94%|ββββββββββ| 1460/1547 [1:32:21<05:15, 3.63s/it] {'loss': 0.0809, 'learning_rate': 5e-05, 'epoch': 0.94}
94%|ββββββββββ| 1460/1547 [1:32:21<05:15, 3.63s/it] 94%|ββββββββββ| 1461/1547 [1:32:25<05:18, 3.70s/it] 95%|ββββββββββ| 1462/1547 [1:32:29<05:15, 3.71s/it] 95%|ββββββββββ| 1463/1547 [1:32:32<05:04, 3.63s/it] 95%|ββββββββββ| 1464/1547 [1:32:36<05:11, 3.75s/it] 95%|ββββββββββ| 1465/1547 [1:32:40<05:11, 3.80s/it] 95%|ββββββββββ| 1466/1547 [1:32:44<05:11, 3.84s/it] 95%|ββββββββββ| 1467/1547 [1:32:48<05:16, 3.95s/it] 95%|ββββββββββ| 1468/1547 [1:32:52<05:08, 3.90s/it] 95%|ββββββββββ| 1469/1547 [1:32:56<05:03, 3.89s/it] 95%|ββββββββββ| 1470/1547 [1:33:00<04:51, 3.79s/it] {'loss': 0.085, 'learning_rate': 5e-05, 'epoch': 0.95}
95%|ββββββββββ| 1470/1547 [1:33:00<04:51, 3.79s/it] 95%|ββββββββββ| 1471/1547 [1:33:03<04:47, 3.78s/it] 95%|ββββββββββ| 1472/1547 [1:33:07<04:47, 3.83s/it] 95%|ββββββββββ| 1473/1547 [1:33:11<04:43, 3.83s/it] 95%|ββββββββββ| 1474/1547 [1:33:15<04:38, 3.82s/it] 95%|ββββββββββ| 1475/1547 [1:33:19<04:38, 3.87s/it] 95%|ββββββββββ| 1476/1547 [1:33:23<04:44, 4.00s/it] 95%|ββββββββββ| 1477/1547 [1:33:27<04:26, 3.80s/it] 96%|ββββββββββ| 1478/1547 [1:33:30<04:15, 3.70s/it] 96%|ββββββββββ| 1479/1547 [1:33:34<04:11, 3.70s/it] 96%|ββββββββββ| 1480/1547 [1:33:38<04:11, 3.75s/it] {'loss': 0.0777, 'learning_rate': 5e-05, 'epoch': 0.96}
96%|ββββββββββ| 1480/1547 [1:33:38<04:11, 3.75s/it] 96%|ββββββββββ| 1481/1547 [1:33:41<04:08, 3.76s/it] 96%|ββββββββββ| 1482/1547 [1:33:46<04:13, 3.90s/it] 96%|ββββββββββ| 1483/1547 [1:33:49<04:04, 3.82s/it] 96%|ββββββββββ| 1484/1547 [1:33:53<04:01, 3.83s/it] 96%|ββββββββββ| 1485/1547 [1:33:57<03:56, 3.81s/it] 96%|ββββββββββ| 1486/1547 [1:34:01<03:50, 3.78s/it] 96%|ββββββββββ| 1487/1547 [1:34:04<03:44, 3.75s/it] 96%|ββββββββββ| 1488/1547 [1:34:08<03:41, 3.76s/it] 96%|ββββββββββ| 1489/1547 [1:34:12<03:36, 3.74s/it] 96%|ββββββββββ| 1490/1547 [1:34:15<03:31, 3.70s/it] {'loss': 0.1182, 'learning_rate': 5e-05, 'epoch': 0.96}
96%|ββββββββββ| 1490/1547 [1:34:15<03:31, 3.70s/it] 96%|ββββββββββ| 1491/1547 [1:34:19<03:27, 3.70s/it] 96%|ββββββββββ| 1492/1547 [1:34:23<03:24, 3.72s/it] 97%|ββββββββββ| 1493/1547 [1:34:27<03:25, 3.80s/it] 97%|ββββββββββ| 1494/1547 [1:34:31<03:23, 3.84s/it] 97%|ββββββββββ| 1495/1547 [1:34:35<03:21, 3.88s/it] 97%|ββββββββββ| 1496/1547 [1:34:38<03:09, 3.72s/it] 97%|ββββββββββ| 1497/1547 [1:34:42<03:10, 3.82s/it] 97%|ββββββββββ| 1498/1547 [1:34:46<03:01, 3.70s/it] 97%|ββββββββββ| 1499/1547 [1:34:49<02:58, 3.72s/it] 97%|ββββββββββ| 1500/1547 [1:34:53<02:53, 3.68s/it] {'loss': 0.087, 'learning_rate': 5e-05, 'epoch': 0.97}
97%|ββββββββββ| 1500/1547 [1:34:53<02:53, 3.68s/it] 97%|ββββββββββ| 1501/1547 [1:34:59<03:28, 4.53s/it] 97%|ββββββββββ| 1502/1547 [1:35:03<03:12, 4.28s/it] 97%|ββββββββββ| 1503/1547 [1:35:07<03:08, 4.29s/it] 97%|ββββββββββ| 1504/1547 [1:35:11<02:56, 4.10s/it] 97%|ββββββββββ| 1505/1547 [1:35:15<02:52, 4.12s/it] 97%|ββββββββββ| 1506/1547 [1:35:19<02:42, 3.97s/it] 97%|ββββββββββ| 1507/1547 [1:35:23<02:37, 3.93s/it] 97%|ββββββββββ| 1508/1547 [1:35:27<02:36, 4.02s/it] 98%|ββββββββββ| 1509/1547 [1:35:31<02:30, 3.97s/it] 98%|ββββββββββ| 1510/1547 [1:35:35<02:26, 3.97s/it] {'loss': 0.0933, 'learning_rate': 5e-05, 'epoch': 0.98}
98%|ββββββββββ| 1510/1547 [1:35:35<02:26, 3.97s/it] 98%|ββββββββββ| 1511/1547 [1:35:39<02:26, 4.06s/it] 98%|ββββββββββ| 1512/1547 [1:35:43<02:16, 3.90s/it] 98%|ββββββββββ| 1513/1547 [1:35:46<02:10, 3.85s/it] 98%|ββββββββββ| 1514/1547 [1:35:50<02:03, 3.75s/it] 98%|ββββββββββ| 1515/1547 [1:35:53<01:59, 3.73s/it] 98%|ββββββββββ| 1516/1547 [1:35:57<01:55, 3.73s/it] 98%|ββββββββββ| 1517/1547 [1:36:01<01:53, 3.79s/it] 98%|ββββββββββ| 1518/1547 [1:36:05<01:52, 3.87s/it] 98%|ββββββββββ| 1519/1547 [1:36:09<01:49, 3.90s/it] 98%|ββββββββββ| 1520/1547 [1:36:13<01:45, 3.91s/it] {'loss': 0.1024, 'learning_rate': 5e-05, 'epoch': 0.98}
98%|ββββββββββ| 1520/1547 [1:36:13<01:45, 3.91s/it] 98%|ββββββββββ| 1521/1547 [1:36:17<01:38, 3.78s/it] 98%|ββββββββββ| 1522/1547 [1:36:20<01:34, 3.78s/it] 98%|ββββββββββ| 1523/1547 [1:36:24<01:30, 3.76s/it] 99%|ββββββββββ| 1524/1547 [1:36:28<01:25, 3.73s/it] 99%|ββββββββββ| 1525/1547 [1:36:31<01:21, 3.69s/it] 99%|ββββββββββ| 1526/1547 [1:36:35<01:18, 3.74s/it] 99%|ββββββββββ| 1527/1547 [1:36:39<01:15, 3.77s/it] 99%|ββββββββββ| 1528/1547 [1:36:43<01:11, 3.75s/it] 99%|ββββββββββ| 1529/1547 [1:36:46<01:07, 3.75s/it] 99%|ββββββββββ| 1530/1547 [1:36:50<01:04, 3.79s/it] {'loss': 0.0904, 'learning_rate': 5e-05, 'epoch': 0.99}
99%|ββββββββββ| 1530/1547 [1:36:50<01:04, 3.79s/it] 99%|ββββββββββ| 1531/1547 [1:36:55<01:02, 3.91s/it] 99%|ββββββββββ| 1532/1547 [1:36:58<00:58, 3.89s/it] 99%|ββββββββββ| 1533/1547 [1:37:02<00:53, 3.85s/it] 99%|ββββββββββ| 1534/1547 [1:37:06<00:50, 3.85s/it] 99%|ββββββββββ| 1535/1547 [1:37:10<00:46, 3.85s/it] 99%|ββββββββββ| 1536/1547 [1:37:14<00:42, 3.84s/it] 99%|ββββββββββ| 1537/1547 [1:37:17<00:38, 3.83s/it] 99%|ββββββββββ| 1538/1547 [1:37:21<00:34, 3.78s/it] 99%|ββββββββββ| 1539/1547 [1:37:25<00:30, 3.79s/it]100%|ββββββββββ| 1540/1547 [1:37:29<00:26, 3.77s/it] {'loss': 0.0866, 'learning_rate': 5e-05, 'epoch': 1.0}
100%|ββββββββββ| 1540/1547 [1:37:29<00:26, 3.77s/it]100%|ββββββββββ| 1541/1547 [1:37:32<00:22, 3.76s/it]100%|ββββββββββ| 1542/1547 [1:37:36<00:19, 3.80s/it]100%|ββββββββββ| 1543/1547 [1:37:40<00:14, 3.74s/it]100%|ββββββββββ| 1544/1547 [1:37:44<00:11, 3.78s/it]100%|ββββββββββ| 1545/1547 [1:37:48<00:07, 3.80s/it]100%|ββββββββββ| 1546/1547 [1:37:51<00:03, 3.78s/it]100%|ββββββββββ| 1547/1547 [1:37:55<00:00, 3.82s/it] {'train_runtime': 5875.8343, 'train_samples_per_second': 8.424, 'train_steps_per_second': 0.263, 'train_loss': 0.39508816813682845, 'epoch': 1.0}
100%|ββββββββββ| 1547/1547 [1:37:55<00:00, 3.82s/it]100%|ββββββββββ| 1547/1547 [1:37:55<00:00, 3.80s/it]
***** train metrics *****
epoch = 1.0
train_loss = 0.3951
train_runtime = 1:37:55.83
train_samples_per_second = 8.424
train_steps_per_second = 0.263
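
The summary figures above can be cross-checked against one another. The following is a minimal sketch, not part of the training script: the helper names `parse_metrics` and `format_runtime` are hypothetical, and it assumes each logged metrics dict is a valid Python literal at the end of a progress line, as it appears in this log.

```python
import ast
import re

# Matches a flat {...} metrics dict as printed at the end of a progress line.
METRICS_RE = re.compile(r"\{[^{}]*\}")


def parse_metrics(line):
    """Return the last {...} dict found on a log line, or None if there is none."""
    matches = METRICS_RE.findall(line)
    if not matches:
        return None
    return ast.literal_eval(matches[-1])


def format_runtime(seconds):
    """Format seconds as H:MM:SS.ss, the layout used by the train_runtime line."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)}:{int(minutes):02d}:{secs:05.2f}"


# Final summary line from this log, with the progress-bar glyphs omitted.
summary = parse_metrics(
    "1547/1547 [1:37:55<00:00, 3.82s/it] {'train_runtime': 5875.8343, "
    "'train_samples_per_second': 8.424, 'train_steps_per_second': 0.263, "
    "'train_loss': 0.39508816813682845, 'epoch': 1.0}"
)

total_steps = 1547  # taken from the progress counter "1547/1547"
steps_per_second = round(total_steps / summary["train_runtime"], 3)
runtime_text = format_runtime(summary["train_runtime"])
samples_per_step = (
    summary["train_samples_per_second"] / summary["train_steps_per_second"]
)

print(runtime_text, steps_per_second, round(samples_per_step, 1))
```

Under these assumptions the runtime and step rate reproduce the reported `train_runtime` and `train_steps_per_second`. The ratio of samples per second to steps per second comes out near 32, which suggests (but does not confirm, since both inputs are rounded) roughly 32 samples per optimizer step.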