A city autocomplete menu using the GeoNames datasets and the Algolia search engine
This repository contains a process to generate a city index based on the GeoNames dataset. When uploaded to the Algolia platform, this index can be used to create a city autocomplete menu. The demo of the autocomplete menu using the generated index can be found here.
The exported file ./data/export/city_index.csv
contains the following columns:
Column name | Description | GeoNames Table |
---|---|---|
name | The city name (utf8) | geoname |
alternatenames | The alternate names (comma separated) | geoname |
country | The country name | countryInfo |
postalcodes | The postalcode(s) (comma separated) | postalcodes |
population | The city population | geoname |
lat | latitude in decimal degrees (wgs84) | geoname |
lng | longitude in decimal degrees (wgs84) | geoname |
The data processing pipline is built with Docker. Specifically, it is built with Docker Compose thus allowing to define a multi-container architecture defined in a single file. To run through the city index generation process both need to be installed:
- Install Docker
- Install Docker Compose
In order to generate the city index the following steps need to be followed:
- Clone the repository and change directory to it:
git clone https://github.com/manuelroth/city-autocomplete.git
cd city-autocomplete
- Run the
download-geonames
command:
docker-compose run --rm download-geonames
This downloads the necessary GeoNames datasets, extracts and places it into the ./data/import
folder -> Source Code
- Now start up the database container (as a daemon process):
docker-compose up -d postgres
- Import the GeoNames datasets into the PostgreSQL database:
docker-compose run --rm import-geonames
This takes the datasets previously placed into the ./data/import
folder and imports them into the database -> Source Code
- Generate and export the city index:
docker-compose run --rm export-city-index
This generates the city index and saves it as CSV-file in the ./data/export
folder -> Source Code
(optional) 6. Upload the city-index to the Algolia platform:
docker-compose run --rm upload-city-index
Before uploading the city-index the environment variables APP_ID
, API_KEY
and INDEX_NAME
in the docker-compose.yml file need to be updated -> Source Code
environment:
APP_ID: <replace with your Algolia app_id>
API_KEY: <replace with your Algolia app_key>
INDEX_NAME: <replace with your Algolia index_name>
The city index contains data of different GeoNames datasets (Place data of all countries, Postalcodes of all countries and the Country Info). In order to combine the different datasets, a relation between them needs to be established. Datasets usually use id's for that purpose, but many times there are no id's which help to link the datasets. Therefore other methodes need to be applied.
For both the countryInfo and the postalcodes dataset I decided to link the dataset using one or multiple columns which have the same values. This works very well if the datasets are complete and there is enough information to be able to create a correct match between the tables.
This approach worked very well to create a relation between the geonames and the countryInfo tables. The country code column of both tables is complete. So the strings can be matched and a relation is established.
However the same approach did not work as seamless with the geonames and postalcodes tables. Both tables have columns for the city name, country code, 1. administrative division (state), 2. adm. division (county/province), 3. adm. division (community), latitude and longitude fields. As with every dataset, the geonames dataset is not perfect. There are many rows which do not have data for all the columns. Therefore it was not possible to use all the columns to create a relation between the two tables.
With manual testing I found out that in most cases it is suitable to create a relation using the country code, 1. administrative division (state) and the city name between the tables. These are the minimal required fields. Otherwise postalcodes get related to the wrong citys. The current solution is not perfect yet and should be further improved in the future.