This app takes a Neo4j database snapshot and copies all content to an Azure Cosmos DB Graph database using the BulkExecutor library.
- The app is NOT intended to synchronize a live production database.
- Node or Relationship property names which are system-reserved in Cosmos DB will be prepended with `prop_`, e.g. `id` will become `prop_id`.
- Because Cosmos DB stores vertices and edges in the same Container, Neo4j Relationship Ids will be prefixed with `edge_` in order to avoid conflicts with Node Ids.
- This project is NOT officially supported by Microsoft. It is an independent effort, although we really appreciate it if you submit PRs to improve it.
The first thing you'll need is a Neo4j database, and Docker is the quickest way to get started. If you're looking for Neo4j Docker images with pre-populated graph datasets, we've got you covered! For example, the following will spin up a container with the Game of Thrones dataset:
```sh
docker run --name neo4j-got -p 7474:7474 -p 7687:7687 -d syedhassaanahmed/neo4j-game-of-thrones
```
Browse the data by pointing your browser to http://localhost:7474. The initial Neo4j login/password will be "neo4j/neo4j".
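Optionally, you can sanity-check that the dataset loaded before starting the migration. The following is just a sketch; it assumes `cypher-shell` is available inside the container (it ships with recent official Neo4j images) and that the default credentials above are still in place:

```sh
# Count the nodes in the Game of Thrones graph
docker exec neo4j-got cypher-shell -u neo4j -p neo4j "MATCH (n) RETURN count(n);"
```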
Before you run the app, you'll need to supply environment variables containing the settings for your Neo4j and Cosmos DB databases.
```sh
COSMOSDB_ENDPOINT=https://<COSMOSDB_ACCOUNT>.documents.azure.com:443/
COSMOSDB_AUTHKEY=<COSMOSDB_AUTHKEY>
COSMOSDB_DATABASE=graphdb
COSMOSDB_CONTAINER=graphcont
COSMOSDB_PARTITIONKEY=someProperty
COSMOSDB_OFFERTHROUGHPUT=1000 #default is 400
NEO4J_ENDPOINT=neo4j://<NEO4J_ENDPOINT>:7687
NEO4J_USERNAME=neo4j #default is 'neo4j'
NEO4J_PASSWORD=<NEO4J_PASSWORD>
CACHE_PATH=<PATH_TO_CACHE_DIRECTORY> #default is 'cache'
```
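One possible way to supply these settings (a sketch, assuming a bash shell and that the variables above are saved verbatim in a file; the file name `neo2cosmos.env` is just a placeholder):

```sh
# Export every variable assigned while sourcing the settings file,
# so the app process started afterwards inherits them
set -a
source neo2cosmos.env
set +a
```

With the variables in place, run the app: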
```sh
dotnet NeoToCosmos.dll
```
and watch your data being copied. If for some reason the data couldn't be transferred completely, simply rerun the command. For a fresh, clean start, add the `-r` switch, as shown below.
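For example, a clean restart would look like this (only the `-r` switch comes from the text above; everything else is the same command):

```sh
# Start the copy from scratch instead of resuming from a previous run
dotnet NeoToCosmos.dll -r
```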
Here is how to run the containerized version of the tool.
```sh
docker run -d -e <ENVIRONMENT_VARIABLES> syedhassaanahmed/neo-to-cosmos
```
- Add `--network "host"` in order to access a local Neo4j instance in a dev environment, e.g. as in the sketch below.
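For instance, a full invocation against a local Neo4j might look like the following sketch. All values are placeholders taken from the settings listed earlier; variables with defaults are omitted:

```sh
# Sketch: pass each setting as its own -e flag (adjust values to your environment)
docker run -d --network "host" \
  -e COSMOSDB_ENDPOINT=https://<COSMOSDB_ACCOUNT>.documents.azure.com:443/ \
  -e COSMOSDB_AUTHKEY=<COSMOSDB_AUTHKEY> \
  -e COSMOSDB_DATABASE=graphdb \
  -e COSMOSDB_CONTAINER=graphcont \
  -e COSMOSDB_PARTITIONKEY=someProperty \
  -e NEO4J_ENDPOINT=neo4j://localhost:7687 \
  -e NEO4J_PASSWORD=<NEO4J_PASSWORD> \
  syedhassaanahmed/neo-to-cosmos
```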
Copying a large volume of data from Neo4j to Cosmos DB using a single instance of the app may not be entirely feasible, even with maxed-out RUs and a cache layer. Hence we've provided an ARM template which orchestrates the deployment of Cosmos DB and N Azure Container Instances, each of which performs a portion of the data migration.
In order to achieve resilience during the migration, we also persist a RocksDB cache on an `emptyDir` volume, which can survive container crashes.
To deploy the template using the latest Azure CLI 2.0:
```sh
az group deployment create -g <RESOURCE_GROUP> \
    --template-file azuredeploy.json \
    --parameters \
        cosmosDbPartitionKey=someProperty \
        neo4jEndpoint=neo4j://<NEO4J_ENDPOINT>:7687 \
        neo4jPassword=<NEO4J_PASSWORD>
```
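Once the deployment completes, you can follow each instance's progress from its logs. This is just a sketch; the exact container group names depend on how the template names its resources:

```sh
# List the container groups created in the resource group, then tail one instance's logs
az container list -g <RESOURCE_GROUP> -o table
az container logs -g <RESOURCE_GROUP> -n <CONTAINER_GROUP_NAME>
```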
This work builds upon the great effort Brian Sherwin has put into this repo.