Commit

Initial commit
rkaravia committed Jun 16, 2020
0 parents commit 2373624
Showing 39 changed files with 6,520 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
/node_modules
120 changes: 120 additions & 0 deletions README.md
@@ -0,0 +1,120 @@
# Import regions from OpenStreetMap

The total process takes about 1 hour.

## Prerequisites

### Land polygons

Download "WGS 84, Large polygons not split" from osmdata, unpack and store the shapefile in

- `00-static-data/land-polygons-complete-4326`

Direct link: https://osmdata.openstreetmap.de/download/land-polygons-complete-4326.zip

Landing page: https://osmdata.openstreetmap.de/data/land-polygons.html

You may want to open the file in mapshaper and check that the polygons do not self-intersect, because clipping with self-intersecting polygons corrupts the output. For example, clipping with the polygons shown in the screenshot below removes most of Japan from the output:

<img src="screenshot_mapshaper.png" alt="Screenshot" width="640">

Alternatively, use
[this snapshot of land polygons](https://nzz-q-assets-stage.s3.amazonaws.com/q-locator-map/land-polygons-complete-4326_2019-11-18.zip)
from 2019-11-18, which contains valid polygons.
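
As a rough illustration of the property to check, the sketch below tests one closed ring for self-intersections by comparing every pair of non-adjacent edges. `ringSelfIntersects` is a hypothetical helper, not part of this repository or of mapshaper. It is brute force (O(n²)) and treats coordinates as planar, so it only suits small rings, not the complete land polygons file.

```javascript
// Hypothetical helper, not part of this repository: checks whether one closed
// GeoJSON ring (first position repeated at the end) crosses or touches itself.
// Brute force O(n^2) over planar coordinates; only meant for small rings.

// Sign of the cross product (b - a) x (c - a): 1 = left turn, -1 = right turn, 0 = collinear.
function orientation(a, b, c) {
  return Math.sign((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]));
}

// For a point p known to be collinear with a and b: is p within segment [a, b]?
function onSegment(a, b, p) {
  return (
    Math.min(a[0], b[0]) <= p[0] && p[0] <= Math.max(a[0], b[0]) &&
    Math.min(a[1], b[1]) <= p[1] && p[1] <= Math.max(a[1], b[1])
  );
}

// Do segments [p1, p2] and [p3, p4] share at least one point?
function segmentsIntersect(p1, p2, p3, p4) {
  const d1 = orientation(p3, p4, p1);
  const d2 = orientation(p3, p4, p2);
  const d3 = orientation(p1, p2, p3);
  const d4 = orientation(p1, p2, p4);
  // Proper crossing: each segment has endpoints strictly on both sides of the other.
  if (d1 * d2 < 0 && d3 * d4 < 0) return true;
  // Touching or collinear overlap.
  return (
    (d1 === 0 && onSegment(p3, p4, p1)) ||
    (d2 === 0 && onSegment(p3, p4, p2)) ||
    (d3 === 0 && onSegment(p1, p2, p3)) ||
    (d4 === 0 && onSegment(p1, p2, p4))
  );
}

function ringSelfIntersects(ring) {
  const edgeCount = ring.length - 1; // the last position closes the ring
  for (let i = 0; i < edgeCount; i++) {
    for (let j = i + 1; j < edgeCount; j++) {
      // Neighboring edges always share a vertex, so skip them
      // (edge 0 and the last edge are neighbors too).
      if (j === i + 1 || (i === 0 && j === edgeCount - 1)) continue;
      if (segmentsIntersect(ring[i], ring[i + 1], ring[j], ring[j + 1])) {
        return true;
      }
    }
  }
  return false;
}

console.log(ringSelfIntersects([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]])); // false (square)
console.log(ringSelfIntersects([[0, 0], [1, 1], [1, 0], [0, 1], [0, 0]])); // true (bow tie)
```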

### Natural Earth

Natural Earth data (1:10m Cultural Vectors) is used for zoom levels 0 to 4 for compatibility with OpenMapTiles.

Download "countries" and "states and provinces", then unpack the archives and store the shapefiles in

- `00-static-data/ne_10m_admin_0_countries`
- `00-static-data/ne_10m_admin_1_states_provinces`

Direct links:

- https://naciscdn.org/naturalearth/10m/cultural/ne_10m_admin_0_countries.zip
- https://naciscdn.org/naturalearth/10m/cultural/ne_10m_admin_1_states_provinces.zip

Landing page: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/

## Steps

Run this script to execute all steps listed below:

```bash
import-osm/import-osm.sh
```

#### 1. Query list of countries (Overpass)

Input: Nothing.
Output: List of countries with ISO3166-1 codes.
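
Step 1 requests Overpass output in CSV mode and converts it to JSON with `npx tsv2json` (see `list-countries.sh`). The sketch below shows what that conversion amounts to, assuming a header row followed by tab-separated rows. `tsvToObjects` is a hypothetical helper, not the `tsv2json` package's code, and the sample values are illustrative only. Each record is keyed by the header names, which matches how `query-regions.js` later reads `country["ISO3166-1"]`.

```javascript
// Hypothetical helper, not the tsv2json package: turns tab-separated text with a
// header row into an array of objects keyed by the header names.
function tsvToObjects(tsv) {
  const lines = tsv.split(/\r?\n/).filter(line => line.length > 0);
  if (lines.length === 0) return [];
  const [headerLine, ...rows] = lines;
  const headers = headerLine.split("\t");
  return rows.map(row => {
    const cells = row.split("\t");
    const record = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      record[header] = cells[i] === undefined ? "" : cells[i];
    });
    return record;
  });
}

// Illustrative input only, not actual Overpass output.
const sample = "ISO3166-1\tname\nCH\tSwitzerland\nLI\tLiechtenstein\n";
console.log(tsvToObjects(sample));
```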

#### 2. Query regions by country (Overpass)

Input: List of countries with ISO3166-1 codes.
Output: For every country, one GeoJSON file with country and subdivision polygons.

The raw Overpass results are also stored in `output/raw`; a country is only downloaded if no raw data is available for it yet.

#### 3. Clip with land polygons

Input: For every country, one GeoJSON file with country and subdivision polygons.
Output: For every country, one GeoJSON file with country and subdivision polygons.

#### 4. Reduce regions (remove small disconnected parts, e.g. remove French Guiana from France)

Input: For every country, one GeoJSON file with country and subdivision polygons.
Output: For every country, one GeoJSON file with country and subdivision polygons.
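
The exact criterion step 4 uses to decide which parts are "small" and "disconnected" is not shown in this section. One simple possibility, sketched below, is to drop every part of a MultiPolygon whose area is below a fraction of the largest part. `dropSmallParts` and `minAreaRatio` are hypothetical. Areas are computed with the shoelace formula on raw longitude/latitude values, so they are only a rough relative measure, and an area threshold alone does not distinguish a remote territory from a small nearby island.

```javascript
// Shoelace area of a ring of [x, y] positions. Longitude/latitude are treated as
// planar values, so the result is only useful for comparing parts with each other.
function ringArea(ring) {
  let sum = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[(i + 1) % ring.length];
    sum += x1 * y2 - x2 * y1;
  }
  return Math.abs(sum) / 2;
}

// Polygon area: exterior ring minus holes.
function polygonArea(rings) {
  const [exterior, ...holes] = rings;
  return holes.reduce((area, hole) => area - ringArea(hole), ringArea(exterior));
}

// Hypothetical reduction: keep only parts whose area is at least
// minAreaRatio times the area of the largest part.
function dropSmallParts(geometry, minAreaRatio) {
  if (geometry.type !== "MultiPolygon") return geometry;
  const areas = geometry.coordinates.map(polygonArea);
  const largest = areas.reduce((max, area) => Math.max(max, area), 0);
  const kept = geometry.coordinates.filter(
    (_, i) => areas[i] >= minAreaRatio * largest
  );
  // A single remaining part is returned as a plain Polygon.
  return kept.length === 1
    ? { type: "Polygon", coordinates: kept[0] }
    : { type: "MultiPolygon", coordinates: kept };
}
```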

#### 5. Split by region

Input: For every country, one GeoJSON file with country and subdivision polygons.
Output: For every region, one GeoJSON file.
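
A minimal sketch of step 5, assuming each region is identified by its `wikidata` property (step 2 already removes duplicate wikidata IDs and collects them in `list/list.json`). `splitByRegion` is hypothetical and returns the collections in memory; how the actual step names its output files is not shown in this section.

```javascript
// Hypothetical sketch: one FeatureCollection per region, keyed by wikidata ID.
function splitByRegion(countryCollection) {
  const regions = new Map();
  for (const feature of countryCollection.features) {
    regions.set(feature.properties.wikidata, {
      type: "FeatureCollection",
      features: [feature]
    });
  }
  return regions;
}

// Illustrative input with placeholder IDs and no geometry.
const country = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { wikidata: "Q1" }, geometry: null },
    { type: "Feature", properties: { wikidata: "Q2" }, geometry: null }
  ]
};
console.log([...splitByRegion(country).keys()]);
```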

#### 6. Simplify regions

Input: For every region, one GeoJSON file.
Output: For every region, one GeoJSON file.
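
The simplification algorithm used in step 6 is not shown in this section. For illustration, here is the classic Douglas-Peucker algorithm applied to a single line of coordinates; it is not necessarily what this repository uses. It simplifies each line independently, so it knows nothing about borders shared by neighboring regions, and closed rings need extra care so they keep at least four positions.

```javascript
// Perpendicular distance from point p to the line through a and b (planar).
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const length = Math.hypot(dx, dy);
  if (length === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / length;
}

// Douglas-Peucker: keep both endpoints; if the farthest intermediate point is
// more than `tolerance` away from the chord, keep it and recurse on both halves.
function douglasPeucker(points, tolerance) {
  if (points.length <= 2) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDistance = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const distance = perpendicularDistance(points[i], first, last);
    if (distance > maxDistance) {
      maxDistance = distance;
      index = i;
    }
  }
  if (maxDistance <= tolerance) return [first, last];
  const left = douglasPeucker(points.slice(0, index + 1), tolerance);
  const right = douglasPeucker(points.slice(index), tolerance);
  // The split point appears in both halves; drop it from the left one.
  return left.slice(0, -1).concat(right);
}

console.log(douglasPeucker([[0, 0], [1, 0], [2, 0], [3, 0]], 0.1)); // [ [ 0, 0 ], [ 3, 0 ] ]
```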

#### 7. Merge regions

Input: For every country, one GeoJSON file with country and subdivision polygons.
Output: One GeoJSON file with all countries, one GeoJSON file with all subdivisions.
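
Step 2 tags every feature with `type: "country"` or `type: "subdivision"`, which suggests how the two merged outputs can be formed. A minimal sketch, assuming the per-region features have already been read into one array; `mergeRegions` is hypothetical, and features without a `type` are dropped.

```javascript
// Hypothetical sketch: split features into a countries collection and a
// subdivisions collection based on the `type` property set in step 2.
function mergeRegions(features) {
  const countries = { type: "FeatureCollection", features: [] };
  const subdivisions = { type: "FeatureCollection", features: [] };
  for (const feature of features) {
    if (feature.properties.type === "country") {
      countries.features.push(feature);
    } else if (feature.properties.type === "subdivision") {
      subdivisions.features.push(feature);
    }
  }
  return { countries, subdivisions };
}
```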

#### 8. Generate vector tiles

Input: One GeoJSON file with all countries, one GeoJSON file with all subdivisions.
Output: mbtiles file with 2 layers (countries, subdivisions).
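
Vector tiles are addressed by zoom level, column and row. Assuming the usual Web Mercator XYZ scheme, the tile containing a point can be computed as below. MBTiles stores tile rows in TMS order (row 0 at the bottom), so XYZ rows have to be flipped. Both helpers are illustrative and not part of this repository; the tiling tool used in step 8 is not shown in this section.

```javascript
// Web Mercator XYZ tile containing (lon, lat) at the given zoom level.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so lon = 180 or latitudes near the Mercator limit stay in range.
  const clamp = v => Math.min(Math.max(v, 0), n - 1);
  return { x: clamp(x), y: clamp(y) };
}

// Convert an XYZ row to the TMS row used in the MBTiles tile_row column.
function xyzToTmsRow(y, zoom) {
  return 2 ** zoom - 1 - y;
}

console.log(lonLatToTile(-90, 45, 1)); // { x: 0, y: 0 }
```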

#### 9. Convert Natural Earth data to GeoJSON

Input: Shapefiles with countries and states/provinces.
Output: GeoJSON files with countries and states/provinces.

#### 10. Generate vector tiles (Natural Earth)

Input: GeoJSON files with countries and states/provinces.
Output: mbtiles file with 2 layers (countries, subdivisions).

#### 11. Join tiles

Input: mbtiles files from steps 8 and 10.
Output: mbtiles file with 2 layers (countries, subdivisions), using Natural Earth data for zoom levels 0-4 and OpenStreetMap data for zoom levels 5-10.
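
Conceptually, the join takes zoom levels 0 to 4 from the Natural Earth tileset and all higher zoom levels from the OpenStreetMap tileset. The sketch below expresses that rule on plain arrays of tile records; the `{ z, x, y }` record shape and `joinTiles` are assumptions for illustration, while the actual step operates on mbtiles files with a tool not shown in this section.

```javascript
// Hypothetical sketch of the zoom split in step 11: tiles at zoom <= maxNaturalEarthZoom
// come from Natural Earth, all higher zoom levels from OpenStreetMap.
function joinTiles(naturalEarthTiles, osmTiles, maxNaturalEarthZoom = 4) {
  return [
    ...naturalEarthTiles.filter(tile => tile.z <= maxNaturalEarthZoom),
    ...osmTiles.filter(tile => tile.z > maxNaturalEarthZoom)
  ];
}
```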

### Clean up

Run this to remove all `output` folders:

```bash
import-osm/remove-outputs.sh
```

# Preview vector tiles

Run this script to preview the vector tiles generated in step 11.

```bash
import-osm/preview-tiles.sh
```
2 changes: 2 additions & 0 deletions import-osm/00-static-data/.gitignore
@@ -0,0 +1,2 @@
*
!.gitignore
1 change: 1 addition & 0 deletions import-osm/01-list-countries/.gitignore
@@ -0,0 +1 @@
/output
16 changes: 16 additions & 0 deletions import-osm/01-list-countries/list-countries.sh
@@ -0,0 +1,16 @@
#!/bin/bash
set -o errexit
set -o nounset

step_root=$(dirname "$0")
output_dir="$step_root/output"

mkdir -p "$output_dir"

curl https://overpass-api.de/api/interpreter \
  --compressed \
  --data 'data=[out:csv(::"id", "ISO3166-1", wikidata, name, "name:de", "name:en")]; relation[boundary=administrative][admin_level=2]["ISO3166-1"]; out;' \
  | npx tsv2json \
  | npx prettier \
    --parser json \
  > "$output_dir/countries.json"
1 change: 1 addition & 0 deletions import-osm/02-query-regions/.gitignore
@@ -0,0 +1 @@
/output
171 changes: 171 additions & 0 deletions import-osm/02-query-regions/query-regions-by-country.js
@@ -0,0 +1,171 @@
const fs = require("fs");
const { geoBounds } = require("d3-geo");
const queryOverpassWithCallback = require("query-overpass");
const turf = require("@turf/turf");

async function queryRegionsByCountry(countryCode, overpassResult) {
  const query = `
    [out:json];
    (
      relation
        [boundary=administrative]
        ["ISO3166-1"="${countryCode}"];
      relation
        [boundary=administrative]
        ["ISO3166-2"~"^${countryCode}"];
    );
    out; >; out skel;`;
  const keepTags = [
    "ISO3166-1",
    "ISO3166-2",
    "admin_level",
    "wikidata",
    "name",
    "name:de",
    "name:en"
  ];

  if (overpassResult) {
    console.log("Reuse existing data");
  } else {
    overpassResult = await queryOverpass(query);
  }

  const geojson = await parseOverpassResult(
    overpassResult,
    keepTags,
    countryCode
  );

  return {
    geojson,
    rawData: overpassResult
  };
}

function queryOverpass(query) {
  return new Promise((resolve, reject) => {
    const runQuery = () => {
      queryOverpassWithCallback(query, (error, data) => {
        if (error) {
          if (error.statusCode === 429) {
            console.log("Too many requests, will retry in 30 seconds...");
            sleep(30).then(runQuery);
          } else if (error.statusCode === 504) {
            console.log("Gateway timeout, will retry in 30 seconds...");
            sleep(30).then(runQuery);
          } else {
            reject(error); // throwing inside the callback would escape the Promise
          }
        } else {
          resolve(data);
        }
      });
    };
    runQuery();
  });
}

function sleep(seconds) {
  return new Promise(resolve => setTimeout(resolve, seconds * 1000));
}

async function parseOverpassResult(overpassResult, keepTags, countryCode) {
  const geojson = turf.clone(overpassResult);

  // Add bounding box
  geojson.bbox = getBbox(geojson);

  // Keep only Polygon and MultiPolygon features
  geojson.features = geojson.features.filter(feature => {
    const { type } = feature.geometry;
    return type === "Polygon" || type === "MultiPolygon";
  });

  geojson.features.forEach(feature => {
    // Keep only a subset of tags
    const { tags } = feature.properties;
    const properties = {};
    keepTags.forEach(keepTag => {
      if (tags[keepTag] === undefined) {
        properties[keepTag] = null;
      } else {
        properties[keepTag] = tags[keepTag];
      }
    });

    // Add OSM relation id (feature id "relation/<id>") as property and remove feature id
    properties.osmRelationId = parseInt(feature.id.split("/")[1], 10);
    delete feature.id;

    // Set type to country / subdivision
    // ---
    // Some regions (usually "dependent territories") are both countries and subdivisions and also
    // have a separate ISO3166-1 country code, in addition to the ISO3166-2 subdivision code.
    // For example American Samoa has these codes:
    // ISO3166-1: AS, ISO3166-2: US-AS
    // These regions will be labeled as "subdivision" here.
    // See https://en.wikipedia.org/wiki/ISO_3166-2#Subdivisions_included_in_ISO_3166-1
    if (properties["ISO3166-1"] === countryCode) {
      properties.type = "country";
    } else if (properties["ISO3166-2"]) {
      properties.type = "subdivision";
      properties["ISO3166-1"] = countryCode;
    }

    feature.properties = properties;
  });

  // Remove duplicate wikidata entries, keeping the feature with the lowest admin_level
  const featuresByWikidata = {};
  geojson.features.forEach(feature => {
    const { wikidata } = feature.properties;
    if (wikidata) {
      if (!featuresByWikidata[wikidata]) {
        featuresByWikidata[wikidata] = [];
      }
      featuresByWikidata[wikidata].push(feature);
    } else {
      console.warn(
        "Discarded feature without wikidata tag",
        JSON.stringify(feature.properties)
      );
    }
  });
  geojson.features = [];
  Object.values(featuresByWikidata).forEach(features => {
    features.sort(
      (a, b) => a.properties.admin_level - b.properties.admin_level
    );
    geojson.features.push(features[0]);
    if (features.length > 1) {
      console.log(
        `Discarded ${features.length - 1} features with duplicate wikidata tags`
      );
    }
  });

  return geojson;
}

function getBbox(geojson) {
  // D3 requires the opposite of the standard (RFC 7946) GeoJSON winding order:
  // the exterior ring of a polygon must be clockwise.
  const geojsonClockwise = turf.rewind(geojson, {
    reverse: true
  });
  // D3 is used instead of turf to get a correct bounding box for countries that
  // cross the antimeridian (180° east/west), for example Russia and the United States.
  return geoBounds(geojsonClockwise).flat();
}

if (require.main === module) {
  queryRegionsByCountry("CH").then(({ geojson }) => {
    fs.writeFileSync("CH-regions.json", JSON.stringify(geojson));
  });
}

module.exports = queryRegionsByCountry;
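
`getBbox` relies on D3 because a naive min/max over longitudes yields a box spanning almost the whole globe for regions that cross the antimeridian. The sketch below illustrates the underlying idea on vertex longitudes only: find the largest empty gap on the 360° circle and place the box on the other side of it. It is a simplification, not D3's algorithm (`geoBounds` also accounts for the edges between vertices), and `longitudeExtent` is hypothetical. As in an RFC 7946 bbox, a west value greater than the east value means the box crosses 180°.

```javascript
// Hypothetical sketch: [west, east] longitude extent of a set of vertex longitudes,
// choosing the narrowest interval on the circle that contains all of them.
function longitudeExtent(longitudes) {
  const sorted = [...new Set(longitudes)].sort((a, b) => a - b);
  if (sorted.length === 0) throw new Error("no longitudes");
  // Start with the gap that wraps from the largest longitude back to the smallest,
  // which corresponds to an ordinary box that does not cross 180°.
  let largestGap = sorted[0] + 360 - sorted[sorted.length - 1];
  let west = sorted[0];
  let east = sorted[sorted.length - 1];
  for (let i = 1; i < sorted.length; i++) {
    const gap = sorted[i] - sorted[i - 1];
    if (gap > largestGap) {
      // The box starts after the gap and ends before it, crossing 180°.
      largestGap = gap;
      west = sorted[i];
      east = sorted[i - 1];
    }
  }
  return [west, east];
}

console.log(longitudeExtent([30, 60, 120, 170, -170])); // [ 30, -170 ]
```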
37 changes: 37 additions & 0 deletions import-osm/02-query-regions/query-regions.js
@@ -0,0 +1,37 @@
const queryRegionsByCountry = require("./query-regions-by-country");
const fs = require("fs");
const path = require("path");

async function queryRegions(countriesFile, outputDir) {
  const countries = JSON.parse(fs.readFileSync(countriesFile, "utf8"));
  const allRegions = new Set();
  for (const country of countries) {
    const countryCode = country["ISO3166-1"];
    console.log(`Querying regions for country ${countryCode}...`);

    const rawDataPath = path.join(outputDir, "raw", `${countryCode}.json`);
    let oldRawData;
    if (fs.existsSync(rawDataPath)) {
      oldRawData = JSON.parse(fs.readFileSync(rawDataPath, "utf8"));
    }

    const { geojson, rawData } = await queryRegionsByCountry(
      countryCode,
      oldRawData
    );
    if (!oldRawData) {
      fs.writeFileSync(rawDataPath, JSON.stringify(rawData));
    }
    const outputFile = path.join(outputDir, `${countryCode}.json`);
    fs.writeFileSync(outputFile, JSON.stringify(geojson));

    geojson.features.forEach(({ properties: { wikidata } }) => {
      allRegions.add(wikidata);
    });
  }
  const listFile = path.join(outputDir, "list", "list.json");
  fs.writeFileSync(listFile, JSON.stringify(Array.from(allRegions).sort()));
}

const [countriesFile, outputDir] = process.argv.slice(2);
queryRegions(countriesFile, outputDir).catch(error => { console.error(error); process.exitCode = 1; });
13 changes: 13 additions & 0 deletions import-osm/02-query-regions/query-regions.sh
@@ -0,0 +1,13 @@
#!/bin/bash
set -o errexit
set -o nounset

step_root=$(dirname "$0")
countries_file="$step_root/../01-list-countries/output/countries.json"
output_dir="$step_root/output"

mkdir -p "$output_dir"
mkdir -p "$output_dir/raw"
mkdir -p "$output_dir/list"

node "$step_root/query-regions.js" "$countries_file" "$output_dir"
1 change: 1 addition & 0 deletions import-osm/03-clip-regions/.gitignore
@@ -0,0 +1 @@
/output
20 changes: 20 additions & 0 deletions import-osm/03-clip-regions/clip-regions.sh
@@ -0,0 +1,20 @@
#!/bin/bash
set -o errexit
set -o nounset

step_root=$(dirname "$0")
input_dir="$step_root/../02-query-regions/output"
output_dir="$step_root/output"

land_polygons="$step_root/../00-static-data/land-polygons-complete-4326/land_polygons.shp"

mkdir -p "$output_dir"

# Split by admin_level before clipping to work around a bug (?) with overlapping geometries.
npx mapshaper \
  -i "$input_dir"/*.json combine-files no-topology -merge-layers \
  -split admin_level \
  -clip "$land_polygons" \
  -merge-layers \
  -split ISO3166-1 \
  -o format=geojson "$output_dir"
1 change: 1 addition & 0 deletions import-osm/04-reduce-regions/.gitignore
@@ -0,0 +1 @@
/output