Report for allenai/OLMo-2-1124-7B
Model Info:
Tied embeddings: False
LM head uses bias: False
Embeddings shape: [100352, 4096]
Tokenizer Info:
Vocab Size: 100278
Tokenizer Class: GPT2Tokenizer
Tokenizer Type: BPE
Bytes handling: Byte Input
Token for verification prompt building: abcdefghijklmnopqrstuvwxyz
Token id for verification prompt building: 68612
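The embedding matrix has 100352 rows, while the tokenizer reports 100278 entries. The small check below uses only the figures above. It shows that the gap is 74 rows and that 100352 is the next multiple of 128 above the vocabulary size. Reading those extra rows as alignment padding is an assumption; this report does not say so.

```python
# Figures copied from the Model Info and Tokenizer Info blocks.
EMBEDDING_ROWS = 100352
EMBEDDING_DIM = 4096
TOKENIZER_VOCAB = 100278

# Rows of the embedding matrix with no tokenizer entry behind them.
extra_rows = EMBEDDING_ROWS - TOKENIZER_VOCAB

# Smallest multiple of 128 that can hold the whole vocabulary (ceiling division).
# Assumption: 128 is the padding granularity; the report does not state this.
next_multiple_of_128 = -(-TOKENIZER_VOCAB // 128) * 128

# Parameters in the input embedding matrix alone.
params_in_embedding = EMBEDDING_ROWS * EMBEDDING_DIM

print(extra_rows)                              # 74
print(next_multiple_of_128 == EMBEDDING_ROWS)  # True
print(f"{params_in_embedding:,}")              # 411,041,792
```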
Indicator summary:
Indicator for under-trained tokens: E_{out} Cosine Distance
Overall distribution: 0.350 +/- 0.079
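The indicator is a cosine distance computed on the rows of the output embedding (E_out). A token whose row points in the same direction as a reference "unused" direction therefore scores near 0. This matches the values of around 1e-07 in magnitude (float rounding noise) in the tables below.

The sketch below assumes a simplified recipe:

1. Mean-centre the rows.
2. Average the rows of tokens believed to be unused to get a reference vector.
3. Take 1 - cosine similarity of each row to that reference vector.

The real pipeline may normalise the matrix differently, for example by removing a principal component. `indicator_scores` is a hypothetical helper name.

```python
import math
import statistics


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def indicator_scores(e_out, reference_ids):
    """Hypothetical helper: distance of each mean-centred E_out row to the
    mean of the rows listed in reference_ids (tokens assumed to be unused)."""
    centre = mean_vector(e_out)
    centred = [[x - c for x, c in zip(row, centre)] for row in e_out]
    reference = mean_vector([centred[i] for i in reference_ids])
    return [cosine_distance(row, reference) for row in centred]


# Toy 4-token, 3-dimensional output embedding (made-up numbers, not OLMo weights).
# Tokens 2 and 3 share an identical row, mimicking rows that ended up in the same place.
e_out = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.3],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
scores = indicator_scores(e_out, reference_ids=[2, 3])
print([round(s, 3) for s in scores])
print(f"{statistics.mean(scores):.3f} +/- {statistics.stdev(scores):.3f}")
```

The summary line above is presumably this kind of mean and standard deviation, taken over the whole vocabulary. The toy numbers here are not meant to reproduce it.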
Detected Token Counts:
Number of tested under-trained tokens: 1992, 1973 non-special, 179 below p = 0.01 threshold, 82 below soft indicator threshold
Number of single byte tokens: 256, of which 13 below indicator threshold
Number of special tokens: 0, of which 0 below indicator threshold
Number of non-single-byte UTF-fragment tokens: 645, of which 3 below soft indicator threshold
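The summary counts two different things: tokens below the indicator cut-off, and tokens below p = 0.01 in verification. Assume max_prob is that p, that is, the highest probability the model gave to reproducing the token across its verification prompts. The filter below is applied to five rows copied from the main table. It shows that the two criteria can disagree: all five rows are under the 0.010 indicator threshold, but only two have max_prob < 0.01. The dataclass and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenResult:
    token_id: int
    token: str
    indicator: float  # E_out cosine distance; lower is more suspicious
    max_prob: float   # assumed: best probability of reproducing the token in verification


# Five rows copied from the main verification table of this report.
rows = [
    TokenResult(89472, "useRalative", -2.38419e-07, 1.5e-09),
    TokenResult(76613, "extracomment", 0.00804365, 0.022),
    TokenResult(81325, ".bindingNavigatorMove", 0.00914651, 0.16),
    TokenResult(62761, ".layoutControl", 0.00955373, 0.031),
    TokenResult(55557, "((&___", 0.00971556, 0.0028),
]

INDICATOR_THRESHOLD = 0.010  # cut-off quoted above the main table
P_THRESHOLD = 0.01           # verification cut-off quoted in the summary

below_indicator = [r for r in rows if r.indicator < INDICATOR_THRESHOLD]
verified = [r for r in rows if r.max_prob < P_THRESHOLD]

print(len(below_indicator), len(verified))  # 5 2
print([r.token for r in verified])          # ['useRalative', '((&___']
```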
[Plot not reproduced: under-trained token indicators]
Under-trained token verification results
82 entries below indicator threshold of 0.010
token_id  token                     indicator      max_prob  in_other_tokens
89472     useRalative               -2.38419e-07   1.5e-09   useRalativeImagePath
89471     useRal                    -1.19209e-07   1.4e-11   useRalativeImagePath, useRalative
100262    |||EMAIL_ADDRESS|||       -1.19209e-07   1.5e-10
33786     webElementProperties      -1.19209e-07   7.3e-11
57779     \tRTLU                    0              2e-10
85069     PostalCodesNL             0              8.4e-11   $PostalCodesNL
47072     webElementX               0              8.4e-11   webElementXpaths
85071     $PostalCodesNL            0              2.8e-11
95812     \tRTCK                    0              7.5e-11
41550     \tRTHOOK                  0              1.3e-10
80370     ▁ForCanBeConvertedToF     0              8.8e-11   ▁ForCanBeConvertedToForeach
80369     ▁ForCanBeConverted        1.19209e-07    5.2e-11   ▁ForCanBeConvertedToF, ▁ForCanBeConvertedToForeach
47073     webElementXpaths          1.19209e-07    1.9e-09
58508     :-------------</          1.19209e-07    1.9e-09
100261    |||PHONE_NUMBER|||        1.19209e-07    1.4e-10
83315     richTextPanel             1.19209e-07    5.2e-11
95073     -vesm                     1.19209e-07    7.3e-11
80154     \tRTLI                    1.19209e-07    1.2e-10
73018     ▁StreamLazy               1.19209e-07    4.1e-10
79883     \tTokenNameIdentifier     1.19209e-07    1.7e-10
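The in_other_tokens column lists longer vocabulary entries that contain the flagged string. A plausible reading, consistent with how BPE merges work but not spelled out in this report, is that a token such as useRal was an intermediate merge. Its occurrences would then have been absorbed by useRalative and useRalativeImagePath, so it was rarely seen on its own.

The sketch below is a minimal substring search over a hand-picked vocabulary excerpt. The helper name and the shortest-first ordering are our own choices; the report's ordering varies from row to row.

```python
def containing_tokens(token, vocab):
    """Hypothetical helper: other vocabulary entries that contain `token`
    as a substring, shortest first."""
    hits = [other for other in vocab if other != token and token in other]
    return sorted(hits, key=lambda s: (len(s), s))


# Small vocabulary excerpt using token strings from the table above.
vocab = [
    "useRal", "useRalative", "useRalativeImagePath",
    "PostalCodesNL", "$PostalCodesNL",
    "webElementX", "webElementXpaths",
    "richTextPanel",
]

print(containing_tokens("useRal", vocab))         # ['useRalative', 'useRalativeImagePath']
print(containing_tokens("PostalCodesNL", vocab))  # ['$PostalCodesNL']
print(containing_tokens("richTextPanel", vocab))  # []
```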
62 additional entries below indicator threshold of 0.010
token_id  token                     indicator      max_prob  in_other_tokens
70784     Japgolly                  1.19209e-07    1e-10     ▁typingsJapgolly
89475     elementGuidId             1.19209e-07    1.8e-10
98100     (stypy                    1.19209e-07    1.9e-09
89473     useRalativeImagePath      1.78814e-07    1.7e-11
100263    |||IP_ADDRESS|||          1.78814e-07    1.9e-11
50325     adaptiveStyles            1.78814e-07    1.2e-10
67901     \tRTDBG                   1.78814e-07    6.2e-11
52362     SpecWarn                  2.98023e-07    8.7e-11
96656     methodPointerType         7.15256e-07    2.7e-09
99202     (statearr                 8.9407e-07     3.4e-09
56930     \tRTLR                    1.16229e-05    4.6e-11
81259     artisanlib                1.18017e-05    4.9e-11
91198     externalActionCode        1.9908e-05     8.9e-08
82929     CppMethodIntialized       2.54512e-05    7.6e-05
93905     ▁QtAws                    2.65837e-05    1.1e-11
84576     ▁AppMethodBeat            3.3319e-05     7.8e-11
76371     LANGADM                   5.98431e-05    5e-10
72740     ▁typingsJapgolly          8.30889e-05    1.3e-10
31960     quotelev                  0.000137806    3e-06
90050     _ComCallableWrapper       0.00014472     2.8e-09
88023     /ayushman                 0.000174642    8.3e-08
80612     MethodBeat                0.000183165    7.6e-11   ▁AppMethodBeat
71337     +lsi                      0.000186622    4.1e-10
98668     );\r\r\r\n                0.000294089    6.8e-05
57361     _REALTYPE                 0.00043869     2.1e-05
68896     ;\r\r\r\n                 0.000684261    0.00014   );\r\r\r\n
97736     \tRTCT                    0.000716388    7.8e-07
90412     selectorMethod            0.000768423    1.4e-10
56225     .sulake                   0.000790775    2e-05
91817     (InitializedTypeInfo      0.000829816    9.5e-06
58944     /Subthreshold             0.000984609    7.3e-05
89496     _FieldOffsetTable         0.00121212     0.00021
73016     ▁EnumerableStream         0.00126624     0.00011
96737     departureday              0.00172448     0.0002
67750     _typeDefinitionSize       0.00231582     0.0023
73228     _InternalArray            0.00237793     0.0008
26009     methodVisitor             0.00238055     0.00031
88039     ♀♀♀♀                      0.0024671      0.0002
37370     \tEIF                     0.00255948     0.00072
87551     CppGuid                   0.00259966     0.00055
70316     erusform                  0.00260186     0.00049   numerusform
67444     CppTypeDefinitionSizes    0.00339979     0.0026
39866     .xrLabel                  0.00416869     0.0045
71390     ▁PodsDummy                0.00445569     2.5e-05
59839     ConstraintMaker           0.00497901     0.0039    MASConstraintMaker
67705     _typeDefinition           0.00510728     0.0012    _typeDefinitionSize
34956     ▁+#+#+#+                  0.00535917     5e-05     ▁+#+#+#+#+#+
87941     $fdata                    0.00576878     6.7e-05
67727     |()\n                     0.00612545     0.00015
66235     CppTypeDefinition         0.00619704     0.0023    CppTypeDefinitionSizes
84993     rPid                      0.00621617     0.0016
85154     buttonShape               0.00623816     0.0084
24452     <lemma                    0.00646198     0.0018
45146     %timeout                  0.00674826     0.00023
75520     ▁NUITKA                   0.00730926     0.0022
75630     雅黑                        0.00752032     0.0016    微软雅黑, 软雅黑
76613     extracomment              0.00804365     0.022
43944     orThunk                   0.00812399     0.0019    _AdjustorThunk
71227     ▁FINSEQ                   0.00825447     0.002
81325     .bindingNavigatorMove     0.00914651     0.16
62761     .layoutControl            0.00955373     0.031
55557     ((&___                    0.00971556     0.0028
Tokens with partial UTF-8 sequences
3 entries below indicator threshold of 0.010
token_id  token             indicator      in_other_tokens
36225     <0xB7><0xBB>加     -2.38419e-07   添加, ▁添加
28587     <0x8E><0xB7>取     -1.19209e-07   ▁获取, 获取
52188     <0x9D>始化         1.78814e-07    初始化, ▁初始化
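Each of these tokens begins with UTF-8 continuation bytes (0x80-0xBF). It can therefore only occur as the tail of a multi-byte character, and the in_other_tokens column shows the full words. The stdlib check below assumes the report's <0xNN> notation stands for one raw byte. It confirms that each fragment's bytes end the UTF-8 encoding of the containing word, and that the fragment is not valid UTF-8 on its own. The helper names are illustrative.

```python
import re


def fragment_bytes(token):
    """Turn the report's notation (e.g. "<0xB7><0xBB>加") into raw bytes:
    each <0xNN> is one byte, any other text is UTF-8 encoded."""
    out = bytearray()
    for hex_byte, text in re.findall(r"<0x([0-9A-Fa-f]{2})>|([^<]+)", token):
        out += bytes([int(hex_byte, 16)]) if hex_byte else text.encode("utf-8")
    return bytes(out)


def is_valid_utf8(data):
    """True if the byte string decodes as UTF-8 on its own."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Fragment -> containing word, taken from the in_other_tokens column ("▁" dropped).
checks = {
    "<0xB7><0xBB>加": "添加",
    "<0x8E><0xB7>取": "获取",
    "<0x9D>始化": "初始化",
}
for fragment, word in checks.items():
    raw = fragment_bytes(fragment)
    print(fragment, raw.hex(" "), word.encode("utf-8").endswith(raw), is_valid_utf8(raw))
```

Each printed line ends with `True False`. For example, 添 encodes as E6 B7 BB, so dropping its lead byte E6 leaves exactly the B7 BB that starts token 36225.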
Single byte tokens
13 entries below indicator threshold of 0.017
token_id  token   indicator      ord  hex   byte_type
181       <0xF9>  -2.38419e-07   249  0xF9  unused_utf8
125       <0xC1>  0              193  0xC1  unused_utf8
183       <0xFB>  0              251  0xFB  unused_utf8
180       <0xF8>  0              248  0xF8  unused_utf8
124       <0xC0>  1.19209e-07    192  0xC0  unused_utf8
187       <0xFF>  1.19209e-07    255  0xFF  unused_utf8
186       <0xFE>  1.19209e-07    254  0xFE  unused_utf8
179       <0xF7>  1.19209e-07    247  0xF7  unused_utf8
177       <0xF5>  1.19209e-07    245  0xF5  unused_utf8
178       <0xF6>  1.19209e-07    246  0xF6  unused_utf8
182       <0xFA>  1.78814e-07    250  0xFA  unused_utf8
184       <0xFC>  1.78814e-07    252  0xFC  unused_utf8
185       <0xFD>  2.98023e-07    253  0xFD  unused_utf8
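All 13 flagged byte tokens are byte values that never appear in well-formed UTF-8 (RFC 3629):

- 0xC0 and 0xC1 could only start overlong two-byte encodings.
- 0xF5-0xFF would start sequences above U+10FFFF, or are not lead bytes at all.

Byte-level tokenization of valid UTF-8 text therefore never produces these tokens, which would explain why they are under-trained. The check below confirms that the listed set is exactly that set. The category names other than unused_utf8 are hypothetical.

```python
# Byte values that never occur in well-formed UTF-8 (RFC 3629):
# 0xC0 and 0xC1 could only start overlong 2-byte encodings, and
# 0xF5-0xFF would start sequences above U+10FFFF or are not lead bytes at all.
NEVER_IN_UTF8 = {0xC0, 0xC1} | set(range(0xF5, 0x100))


def byte_type(value):
    """Coarse classification of one byte; only "unused_utf8" comes from the report."""
    if value in NEVER_IN_UTF8:
        return "unused_utf8"
    if value < 0x80:
        return "ascii"
    if value < 0xC0:
        return "continuation"
    return "lead"


# The ord column of the single byte table above.
listed = [249, 193, 251, 248, 192, 255, 254, 247, 245, 246, 250, 252, 253]

print(len(NEVER_IN_UTF8))              # 13
print(set(listed) == NEVER_IN_UTF8)    # True
print({byte_type(b) for b in listed})  # {'unused_utf8'}
```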
Special tokens
18 entries below indicator threshold of 0.017
token_id  token             indicator      max_prob
100272    <|extra_id_7|>    -1.19209e-07   7.8e-11
100260    <|fim_suffix|>    -1.19209e-07   9.8e-11
100275    <|extra_id_10|>   -1.19209e-07   2.9e-11
100271    <|extra_id_6|>    0              1.3e-09
100267    <|extra_id_2|>    0              1.7e-10
100266    <|extra_id_1|>    0              3e-11
100277    <|pad|>           0              2.9e-11
100256    <|extra_id_0|>    1.19209e-07    8.4e-11
100276    <|endofprompt|>   1.19209e-07    4.7e-11
100273    <|extra_id_8|>    1.19209e-07    5.3e-11
100274    <|extra_id_9|>    1.19209e-07    3e-11
100258    <|fim_prefix|>    1.19209e-07    1e-10
100259    <|fim_middle|>    1.19209e-07    7.8e-11
100265    <|im_end|>        1.19209e-07    1.1e-10
100268    <|extra_id_3|>    1.19209e-07    2.3e-09
100269    <|extra_id_4|>    1.19209e-07    2.1e-11
100270    <|extra_id_5|>    1.19209e-07    8e-11
100264    <|im_start|>      1.78814e-07    7.4e-10