ValNamesGeo_ValNombresGeo.txt
--------------------------------------------------------------------------------
---Do not copy /No Copiar
--EN
SiB Colombia scripts for Biodiversity data validation and cleaning in OPEN REFINE
https://github.com/SIB-Colombia/data-quality-open-refine
Script Name: Geographic name validation according to DIVIPOLA
Language: GREL 'General Refine Expression Language'
Created:2018-03-21
Last Update:2021-07-16
Contributors: Camila Plata, Ricardo Ortiz
This script:
1-Creates concatenated columns of geographic names to check geographic hierarchy
2-Matches single and concatenated columns against DIVIPOLA names
3-Returns validation columns using a boolean descriptor (1,0)
Boolean descriptor conventions
0-The geographic name DOES NOT match any name in DIVIPOLA
1-The geographic name matches a name in DIVIPOLA
Conditions
Dataset with the DwC elements 'stateProvince' (minimum), 'county', 'municipality'
DIVIPOLA archive uploaded to OpenRefine as a project named 'DIVIPOLA_20210416' (the project name referenced by the cross() expressions in this script); the latest version can be found in the GitHub repository
New data will be stored in columns at the beginning of the dataset
Rows where a validation column is marked '0' need to be checked and fixed
Conventions
spc = stateProvince+County
spcm = stateProvince+County+Municipality
spValidation = boolean descriptor, 1 if the stateProvince matches DIVIPOLA, 0 if not
spcValidation = boolean descriptor, 1 if the combination of stateProvince+County matches DIVIPOLA, 0 if not
spcmValidation = boolean descriptor, 1 if the combination of stateProvince+County+Municipality matches DIVIPOLA, 0 if not
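Illustration (not part of the script)
The conventions above can be sketched outside OpenRefine. The Python below is only an approximation of the logic, assuming exact string comparison against the reference names; the function names and the small DIVIPOLA excerpt are hypothetical and exist only for this example.

```python
import re

def concat_names(*parts):
    """Join name parts with single spaces, treating blanks as empty strings."""
    joined = " ".join(p or "" for p in parts)
    # Collapse runs of whitespace and trim the ends, as the spc/spcm columns intend.
    return re.sub(r"\s+", " ", joined).strip()

def validate(value, reference_names):
    """Return "1" when value matches a reference name exactly, otherwise "0"."""
    return "1" if value in reference_names else "0"

# Hypothetical excerpt; the real names come from the DIVIPOLA project.
divipola_sp = {"Antioquia", "Boyacá"}
divipola_spc = {"Antioquia Medellín", "Boyacá Tunja"}

# A record whose department is valid but whose county belongs to another department.
row = {"stateProvince": "Antioquia", "county": "Tunja", "municipality": None}

spc = concat_names(row["stateProvince"], row["county"])
sp_validation = validate(row["stateProvince"], divipola_sp)
spc_validation = validate(spc, divipola_spc)

print(spc, sp_validation, spc_validation)
```

In this sketch the record passes the stateProvince check but fails the stateProvince+County check, which is the kind of hierarchy error the concatenated columns are meant to expose.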
Warnings
Before running the script for the first time, upload the dataset and the DIVIPOLA archive to OpenRefine and restart OpenRefine; then run the script
--ES
Rutinas del SiB Colombia para la validación y limpieza de datos primarios
de Biodiversidad en OPEN REFINE
https://github.com/SIB-Colombia/data-quality-open-refine
Nombre rutina: Validación de entidades geográficas de acuerdo a la División político administrativa oficial de Colombia (DIVIPOLA)
Lenguaje: GREL 'General Refine Expression Language'
Creado:2018-03-21
Última Actualización:2021-07-16
Autores:Camila Plata, Ricardo Ortiz
Esta rutina:
1-Crea columnas con las entidades geográficas concatenadas para validar la jerarquía geográfica
2-Compara las columnas creadas con los nombres en DIVIPOLA
3-Genera columnas de validación con descriptores booleanos (1,0)
Convenciones descriptores booleanos
0-El nombre geográfico NO coincide con ningún nombre en DIVIPOLA
1-El nombre geográfico coincide con DIVIPOLA
Requerimientos:
Conjunto de datos con los elementos DwC 'stateProvince'(mínimo),'county','municipality'
Archivo DIVIPOLA cargado en OpenRefine como proyecto con nombre 'DIVIPOLA_20210416' (el nombre de proyecto usado en las expresiones cross() de esta rutina); la última versión está disponible en el repositorio de GitHub
Los nuevos datos serán guardados en columnas al inicio del conjunto de datos
Revise los datos cuando las columnas de validación estén marcadas con '0'; estos datos necesitan ser revisados y ajustados
Convenciones
spc = stateProvince+County
spcm = stateProvince+County+Municipality
spValidation = descriptor booleano, 1 si stateProvince (departamento) coincide con DIVIPOLA, 0 si no coincide
spcValidation = descriptor booleano, 1 si la combinación stateProvince+County (departamento+municipio) coincide con DIVIPOLA, 0 si no coincide
spcmValidation = descriptor booleano, 1 si la combinación stateProvince+County+Municipality (departamento+municipio+centroPoblado) coincide con DIVIPOLA, 0 si no coincide
Advertencias
Si está usando el script por primera vez se recomienda cargar el conjunto de datos y el archivo de divipola, y reiniciar openRefine antes de correr el script
---Do not copy /No Copiar
--------------------------------------------------------------------------------
{
"op": "core/column-addition",
"description": "Create column spcm at index 1 based on column stateProvince using expression grel:(cells['stateProvince'].value+' '+cells['county'].value+' '+cells['municipality'].value).trim().replace(/\\s+/,' ')",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spcm",
"columnInsertIndex": 1,
"baseColumnName": "stateProvince",
"expression": "grel:(cells['stateProvince'].value+' '+cells['county'].value+' '+cells['municipality'].value).trim().replace(/\\s+/,' ')",
"onError": "set-to-blank"
},
{
"op": "core/column-addition",
"description": "Create column spcmSuggested at index 1 based on column spcm using expression grel:cell.cross('DIVIPOLA_20210416','SPCM')[0].cells['SPCM'].value",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spcmSuggested",
"columnInsertIndex": 1,
"baseColumnName": "spcm",
"expression": "grel:cell.cross('DIVIPOLA_20210416','SPCM')[0].cells['SPCM'].value",
"onError": "set-to-blank"
},
{
"op": "core/column-addition",
"description": "Create column spcmValidation at index 1 based on column spcm using expression grel:if(value==cells[\"spcmSuggested\"].value,1,0).toString()",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spcmValidation",
"columnInsertIndex": 1,
"baseColumnName": "spcm",
"expression": "grel:if(value==cells[\"spcmSuggested\"].value,1,0).toString()",
"onError": "set-to-blank"
},
{
"op": "core/column-move",
"description": "Move column spcm to position 1",
"columnName": "spcm",
"index": 1
},
{
"op": "core/column-addition",
"description": "Create column spc at index 1 based on column stateProvince using expression grel:(cells['stateProvince'].value+' '+cells['county'].value).trim().replace(/\\s+/,' ')",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spc",
"columnInsertIndex": 1,
"baseColumnName": "stateProvince",
"expression": "grel:(cells['stateProvince'].value+' '+cells['county'].value).trim().replace(/\\s+/,' ')",
"onError": "set-to-blank"
},
{
"op": "core/column-addition",
"description": "Create column spcSuggested at index 1 based on column spc using expression grel:cell.cross('DIVIPOLA_20210416','SPC')[0].cells['SPC'].value",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spcSuggested",
"columnInsertIndex": 1,
"baseColumnName": "spc",
"expression": "grel:cell.cross('DIVIPOLA_20210416','SPC')[0].cells['SPC'].value",
"onError": "set-to-blank"
},
{
"op": "core/column-addition",
"description": "Create column spcValidation at index 1 based on column spc using expression grel:if(value==cells[\"spcSuggested\"].value,1,0).toString()",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spcValidation",
"columnInsertIndex": 1,
"baseColumnName": "spc",
"expression": "grel:if(value==cells[\"spcSuggested\"].value,1,0).toString()",
"onError": "set-to-blank"
},
{
"op": "core/column-move",
"description": "Move column spc to position 1",
"columnName": "spc",
"index": 1
},
{
"op": "core/column-addition",
"description": "Create column spSuggested at index 1 based on column stateProvince using expression grel:cell.cross('DIVIPOLA_20210416','stateProvince')[0].cells['stateProvince'].value",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spSuggested",
"columnInsertIndex": 1,
"baseColumnName": "stateProvince",
"expression": "grel:cell.cross('DIVIPOLA_20210416','stateProvince')[0].cells['stateProvince'].value",
"onError": "set-to-blank"
},
{
"op": "core/column-addition",
"description": "Create column spValidation at index 1 based on column stateProvince using expression grel:if(value==cells[\"spSuggested\"].value,1,0).toString()",
"engineConfig": {
"mode": "row-based",
"facets": []
},
"newColumnName": "spValidation",
"columnInsertIndex": 1,
"baseColumnName": "stateProvince",
"expression": "grel:if(value==cells[\"spSuggested\"].value,1,0).toString()",
"onError": "set-to-blank"
},
{
"op": "core/column-move",
"description": "Move column municipality to position 1",
"columnName": "municipality",
"index": 1
},
{
"op": "core/column-move",
"description": "Move column county to position 1",
"columnName": "county",
"index": 1
},
{
"op": "core/column-move",
"description": "Move column stateProvince to position 1",
"columnName": "stateProvince",
"index": 1
},
{
"op": "core/column-removal",
"description": "Remove column spSuggested",
"columnName": "spSuggested"
},
{
"op": "core/column-removal",
"description": "Remove column spcSuggested",
"columnName": "spcSuggested"
},
{
"op": "core/column-removal",
"description": "Remove column spcmSuggested",
"columnName": "spcmSuggested"
}