syedhassaanahmed/neo-to-cosmos

neo-to-cosmos

This app takes a Neo4j database snapshot and copies all content to an Azure Cosmos DB Graph database using the BulkExecutor library.

Disclaimer

  • The app is NOT intended to synchronize a live production database.
  • Node or Relationship property names that are reserved by Cosmos DB will be prefixed with prop_, e.g. id will become prop_id.
  • Because Cosmos DB stores vertices and edges in the same Container, Neo4j Relationship Ids will be appended with edge_ in order to avoid conflicts with Node Ids.
  • This project is NOT officially supported by Microsoft. It is an independent effort, although we would really appreciate PRs that improve it.
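
The two renaming rules above can be illustrated with a small shell sketch. It is not the app's actual code: the function names are hypothetical, the reserved-name list (id, label) is only a stand-in for whatever list the app uses, and it assumes edge_ is placed in front of the Relationship Id.

```shell
#!/bin/sh
# Illustration of the renaming rules from the disclaimer; not the app's code.

# ASSUMPTION: "id" and "label" stand in for the reserved names; the real list may differ.
cosmos_property_name() {
    case "$1" in
        id|label) printf 'prop_%s\n' "$1" ;;   # reserved name: add the prop_ prefix
        *)        printf '%s\n' "$1" ;;        # any other name is kept as-is
    esac
}

# ASSUMPTION: edge_ goes in front of the Neo4j Relationship Id.
cosmos_edge_id() {
    printf 'edge_%s\n' "$1"
}

cosmos_property_name id      # prints prop_id
cosmos_property_name name    # prints name
cosmos_edge_id 42            # prints edge_42
```

With this scheme a Node with Id 42 and a Relationship with Id 42 no longer share an identifier in the Container.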

Get Started

The first thing you'll need is a Neo4j database. Docker is the quickest way to get started. If you're looking for Neo4j Docker images with pre-populated graph datasets, we've got you covered! For example, the following will spin up a container with the Game of Thrones dataset:

docker run --name neo4j-got -p 7474:7474 -p 7687:7687 -d syedhassaanahmed/neo4j-game-of-thrones

Browse the data by pointing your browser to http://localhost:7474. The initial Neo4j login/password is "neo4j/neo4j".

Configuration

Before you run the app, you'll need to supply environment variables containing the settings for your Neo4j and Cosmos DB databases.

COSMOSDB_ENDPOINT=https://<COSMOSDB_ACCOUNT>.documents.azure.com:443/
COSMOSDB_AUTHKEY=<COSMOSDB_AUTHKEY>
COSMOSDB_DATABASE=graphdb
COSMOSDB_CONTAINER=graphcont
COSMOSDB_PARTITIONKEY=someProperty
COSMOSDB_OFFERTHROUGHPUT=1000 #default is 400

NEO4J_ENDPOINT=neo4j://<NEO4J_ENDPOINT>:7687
NEO4J_USERNAME=neo4j #default is 'neo4j'
NEO4J_PASSWORD=<NEO4J_PASSWORD>

CACHE_PATH=<PATH_TO_CACHE_DIRECTORY> #default is 'cache'
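
Before starting a long copy, it may help to confirm that the settings are present. The sketch below lists every variable from the block above that has no documented default and is still unset or empty. The function name missing_settings is hypothetical, and treating these seven variables as mandatory is an assumption rather than something the app documents.

```shell
#!/bin/sh
# Prints the name of each setting that is unset or empty.
# ASSUMPTION: settings without a documented default are treated as required.
missing_settings() {
    for name in COSMOSDB_ENDPOINT COSMOSDB_AUTHKEY COSMOSDB_DATABASE \
                COSMOSDB_CONTAINER COSMOSDB_PARTITIONKEY \
                NEO4J_ENDPOINT NEO4J_PASSWORD; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

# Report anything missing, without aborting the calling shell.
missing=$(missing_settings)
if [ -n "$missing" ]; then
    echo "Missing settings:" >&2
    echo "$missing" >&2
fi
```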

Run the tool

Run dotnet NeoToCosmos.dll and watch your data being copied. If the transfer does not complete for any reason, simply rerun the command. For a fresh start, add the -r switch.
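
Since an incomplete transfer is handled by rerunning the command, the rerun can be scripted. This is a minimal sketch, assuming the tool exits with a non-zero status when the transfer does not complete (the README does not confirm this). The names run_until_done and RETRY_DELAY_SECONDS are hypothetical.

```shell
#!/bin/sh
# Re-runs a command until it succeeds or the attempt limit is reached.
# ASSUMPTION: the tool exits non-zero when a transfer is incomplete.
run_until_done() {
    max_attempts=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Pause between attempts; defaults to 5 seconds.
        sleep "${RETRY_DELAY_SECONDS:-5}"
    done
}

# Example, commented out so that sourcing this sketch has no side effects:
# run_until_done 5 dotnet NeoToCosmos.dll
```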

Here is how to run the containerized version of the tool.

docker run -d -e <ENVIRONMENT_VARIABLES> syedhassaanahmed/neo-to-cosmos
  • Add --network "host" in order to access local Neo4j in dev environment.
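
One possible way to fill in <ENVIRONMENT_VARIABLES> is sketched below. It relies on docker run copying a variable's value from the calling environment when -e is given a name without a value, so the variables must be exported first. The helper name docker_env_args is hypothetical.

```shell
#!/bin/sh
# Prints "-e NAME " for each setting name passed in. docker run then takes
# each value from the exported environment of the calling shell.
docker_env_args() {
    for name in "$@"; do
        printf '%s %s ' -e "$name"
    done
}

# Example, commented out because it needs Docker and exported settings:
# docker run -d $(docker_env_args COSMOSDB_ENDPOINT COSMOSDB_AUTHKEY \
#     COSMOSDB_DATABASE COSMOSDB_CONTAINER COSMOSDB_PARTITIONKEY \
#     NEO4J_ENDPOINT NEO4J_PASSWORD) syedhassaanahmed/neo-to-cosmos
```

Passing names rather than NAME=value pairs keeps secrets such as COSMOSDB_AUTHKEY out of the command line and shell history.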

Scale out

Copying a large volume of data from Neo4j to Cosmos DB with a single instance of the app may not be feasible, even with maxed-out RUs and a cache layer. Hence we've provided an ARM template that orchestrates the deployment of Cosmos DB and N Azure Container Instances, each of which performs a portion of the data migration.

In order to achieve resilience during the migration, we also persist a RocksDB cache on an emptyDir volume. An emptyDir can survive container crashes.
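
The README does not say how the work is divided among the N instances. As a purely illustrative sketch, one plausible scheme gives each instance a contiguous, near-equal range of items. The function name instance_range and the skip/limit output are assumptions, not the app's actual partitioning logic.

```shell
#!/bin/sh
# Prints "SKIP LIMIT" for one instance when a total number of items is
# split across a number of instances.
# ASSUMPTION: contiguous, near-equal ranges; the app may divide work differently.
instance_range() {
    total=$1
    count=$2
    index=$3                        # zero-based instance index
    base=$((total / count))         # items every instance gets
    extra=$((total % count))        # leftover items, one each for the first instances
    if [ "$index" -lt "$extra" ]; then
        skip=$((index * (base + 1)))
        limit=$((base + 1))
    else
        skip=$((index * base + extra))
        limit=$base
    fi
    printf '%s %s\n' "$skip" "$limit"
}

instance_range 10 3 0    # prints 0 4
instance_range 10 3 1    # prints 4 3
instance_range 10 3 2    # prints 7 3
```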

To deploy the template using Azure CLI 2.0 (newer CLI releases deprecate az group deployment create in favor of az deployment group create):

az group deployment create -g <RESOURCE_GROUP> \
    --template-file azuredeploy.json \
    --parameters \
        cosmosDbPartitionKey=someProperty \
        neo4jEndpoint=neo4j://<NEO4J_ENDPOINT>:7687 \
        neo4jPassword=<NEO4J_PASSWORD>

Credits

This work builds upon the great effort Brian Sherwin has done in this repo.