Blockchain to Database (Blk2DB) is a software to convert the Bitcoin Blockchain into a relational database. The design of the database is flexible enough to allow you to query virtually any aspect of the data within the Blockchain.
Note that this software is not a Bitcoin client. It can not retrieve the Blockchain on it's own, so you will need a Bitcoin client and an up-to-date copy of the Blockchain.
We have a live demo at BlkChnQrs.org where you can try out the database. There are instances for both mainnet and testnet3. Use these credentials to login:
- username: pubweb
- password: privweb
If you're looking at the database and feeling lost, have a look at the database schema.
This README is supposed to be a manual for both users and developers. Here's a table of content:
- Requirements
- Configuration
- Testing / Development setup
- Production use
- Policy of crashing
- Webinterface
- License
To run the software, you will need a couple of software components, that you can find listed below. Please note that if you plan on loading in the entire Bitcoin Mainnet Blockchain, you will also want a very fast computer with a fast database. We have done our initial read in to an in-memory MariaDB database and it still took almost 10 days.
To retrieve and update the Blockchain, you will also want a
If you want to build the software yourself, or work with it, you will need a couple of additional tools
You will probably also want
However, if you'll be using Maven, you won't necessarily need Flyway.
There are a couple of important configuration files for Blk2DB:
- Database configuration db.properties
- Blockchain configuration blockchain.properties
- Flyway configuration flyway.properties
And then there are also these:
- Configuration for Unit tests test_db.properties
- Log4j configuration log4j2.xml
All of them can be found in src/main/resources/. If you're
working with the Development setup you can edit
them in place. If, however, you're working in a production
environment you will probably want to copy them out either to
/etc/Blockchain2database/
or into the current working directory.
Let's have a quick look at each of the configuration files and their content.
This file is used by Blk2DB to connect to the database. Here are the values you should set in this file:
- dbdriver: The driver to be used. You probably want to leave this at
org.mariadb.jdbc.Driver
- dburl: This is the JDBC URL to connect to the database, including both the host and the specific database
- user: The database user. The user needs write access on the database configured in the dburl parameter
- password: The user's password
There are only two values here:
- directory: The directory where the Blockchain can be found. This is
typically something like
/home/username/.bitcoin/blocks
or/home/username/.bitcoin/testnet3/blocks
, depending on the network you're using - testnet: Boolean value, indicating whether you want to operate on the testnet3 Bitcoin network. This is important because certain network parameters (such as address prefixes) depend on this setting
This is, as you've probably guessed already, the configuration file for Flyway. You can configure Flyway's database user separately, since it will need to be able to create a new database.
- flyway.user: The database user to be used by flyway
- flyway.password: That user's password
Note that in a production use, the variables flyway.url
and flyway.schemas
are required whereas in the development setup, they are configured through
Maven's pom.xml
.
You will want to configure a different database for the unit tests than for the rest of the program. This database will be wiped and recreated several times during unit tests, so be careful with this setting.
Other than that, it is exactly the same as db.properties.
As is, our scripts will create the database with the MEMORY storage
engine. Default MariaDB configurations do not allow large in-memory databases.
So you will need to edit /etc/mysql/my.cnf
and set at least these variables:
max_heap_table_size=256G
tmp_table_size=256G
Whilst you are at it, you will probably also want to optimize some other
settings such as key_buffer_size
, sort_buffer_size
and others. General
optimization of a MariaDB instance is out of the scope of this README though.
Please consult a MariaDB optimization guide for that instead.
Changing MariaDB configuration might require you to restart the service. The MEMORY tables will get deleted on each restart of MariaDB. If you want to avoid data loss, you have to create a Dump of the data first. If you have to regularly restart the service or if you can't trust your MEMORY (for example, if you don't have ECC Memory), you should probably use a different storage engine.
In our experience, it's a good idea, to use the MEMORY storage engine for the initial read in of the data which can take a long time and causes high load on the DBMS and then switch over to InnoDB for daily use.
If you don't have the necessary amount of memory in your system, you can switch to another MariaDB storage engine. We have tested the software to work with InnoDB. If you do this, you will want your InnoDB database to be stored on an SSD or similarly fast storage backend.
To change the storage engine, you have to edit the SQL files, since the used storage engine is specified explicitly in each file. Note that you need to do that before the first run of Flyway, otherwise you will confuse it.
You can change all the storage engine definitions with something like the command below. Keep in mind, that this example is specific to GNU find and GNU sed and the arguments might be different for different implementations of these utilities (such as under BSD).
~$ find src/main/resources/db/migration/ -name "*.sql" -exec sed -i -e's/ENGINE\s*=\s*MEMORY/ENGINE = InnoDB/g' {} \
If you want to develop Blk2DB or if you just want to check it out and toy around, you can use Maven for pretty much anything. However, if you plan on using the software for more serious purposes, you should probably get an executable version and write proper configuration files.
Once you have written the flyway config file, you can run Flyway straight from Maven. Please note, that Flyway is not run during any of the regular Maven build phases. This means that every time you want to run Flyway, you have to specifically tell Maven to do so.
You can read details on the usage of Flyway on their website, but for now, it will do if you can create and delete the database:
# Initialize the database
~$ mvn flyway:migrate
# Delete the database
~$ mvn flyway:clean
If you want to do this on the testnet3 database rather than the mainnet databae, simply add @testnet3 to the end of the command:
# Initialize the database
~$ mvn flyway:migrate@testnet3
# Delete the database
~$ flyway:clean@testnet3
You can compile the software using the usual Maven workflow. For the Unit tests
to work, the test_db.properties
needs to be configured. The unit tests have
most of their Flyway configuration hardcoded in the
DBManager.java
class, however, they still load the username and password used to administrate
the database from the flyway.properties. If you want to build the software
without running the tests in the process, use Maven like this:
mvn -Dmaven.test.skip=true clean package
Compiling the software with Maven will also produce an executable JAR.
You can use Maven to directly run the software. If you do that, you will probably don't want to run the tests either. Once you have written all the configuration files and readied the database, you can run the Blockchain to Database software with this Maven command:
mvn -Dmaven.test.skip=true clean package exec:java -Dexec.mainClass="ch.bfh.blk2.bitcoin.Blockchain2database.Blk2DB"
To use the software in a production environment where you will want to run it in
regular intervals to update the database, we recommend you use an executable
JAR. The JAR gets created when you compile the software using Maven. It contains
all the configuration files, so you could edit them before compiling the
software. This however, is probably undesirable. To give you more flexibility,
we allow overriding the config files from the outside. To do that, either put
the configuration files, with the same name, in either
/etc/Blockchain2database
or in the current working directory. Blk2DB will
prefer configuration files in these locations over the one it has compiled in.
It also prefers those configuration files over the ones found in the classpath,
so you will want to keep your development environment and production environment
separate to avoid undesired effects when your development instance uses the
stable configuration.
After placing the configuration file in /etc/Blockchain2database
you will
want to edit it. Both flyway.url
and flyway.schemas
need to be set.
You should also pick a location to store the migration files. The migration
files for testnet3 and mainnet might interfere, so you need to make sure that
flyway only uses one set of migration files at a time. This is easiest when you
only use one schema. In that case, you can copy the files from the folders or
the entire folders
mainnet or
testnet3 (depending on which schema
you want) and
all into one and the same directory, say
/var/lib/Blockchain2database/migration
and then specify this directory on the
flyway command line using the -locations
option.
If however, you want both schemas, copy all three folders to the location and then specify the ones you need on the command line. Since this last one is the most complicated option, the example below shows how to call flyway with such a configuration for the Mainnet schema.
~$ flyway -configFile=/etc/Blockchain2database/flyway.properties \
-locations="filesystem:/var/lib/Blockchain2database/migration/mainnet/,filesystem:/var/lib/Blockchain2database/migration/all/" \
migrate
Once all the configuration files are in place and the database has been initialized, running the software is as simple as typing
java -jar Blockchain2database-1.0.jar
We consider this a typical use case:
- You create the database using the MEMORY storage engine
- You perform the initial read in of the Blockchain into the MEMORY engine backed database
- You dump the database and reload it into an InnoDB
- You continue using the database on the InnoDB
The trick when doing that is to only run a subset of the flyway migrations for the initial database setup.
# Initial run of flyway
~$ flyway -configFile=/etc/Blockchain2database/flyway.properties \
-locations="filesystem:/var/lib/Blockchain2database/migration/" \
-target=1.99 \
migrate
# Init the database
java -jar Blockchain2database-1.0.jar
# Dump the database
mysqldump --default-character-set=utf8 -h localhost -u root -p --databases mainnet > mainnet_$(date +%Ft%R).dmp
# Delete the database
~$ mysql -h localhost -u root -p
...
> DROP DATABASE mainnet;
# Read in the dump into InnoDB backed tables
cat mainnet_2016-06-11t02\:38.dmp | sed -e's/ENGINE=MEMORY/ENGINE=InnoDB/g' | mysql --default-character-set=utf8 -h localhost -u root -p
# Run the rest of the flyway migrations
~$ flyway -configFile=/etc/Blockchain2database/flyway.properties \
-locations="filesystem:/var/lib/Blockchain2database/migration/" \
migrate
# Subsequential runs of the software to update the database
java -jar Blockchain2database-1.0.jar
Inside the current working directory, the software will create a file called
fileMap.serial
. This file contains serialized information on the Blockchain
inside the .blk
files. On every run after the initial one, having this file
around makes the software a lot faster.
If you delete the file, the software will continue working, however the next run will take a lot longer. You should only delete the file when either you canceled read in of the database in an early state or when the `.blk files got replaced (instead of just getting updated).
To keep your database up to date, you should run the Blockchain 2 Database software regularly, say once every hour. The way to do that is probably via a cronjob. You also want to make sure that no two instances of the software are running at the same time.
Here is an example for a simple cronjob to run the software once every 30
minutes, using the cronjob.sh
file from the resources
folder (it has to be located in the same folder as the executable jar).
~$ crontab -l
# m h dom mon dow command
5,35 * * * * cd /home/mainnet/Blockchain2database/ && ./cronjob.sh
It is important to us that the resulting database contains valid data. To ensure this, we don't try to do anything fancy when we hit a problem while parsing the Blockchain. Instead, the application will just crash. On crash, the application prints (hopefully) helpful information on where exactly the application crashed. This allows a user to figure out what happened, fix the software (or file a bug report) and then continue reading in the Blockchain once the problem has been resolved.
We expect this to happen on rare occasions only though. As of today (2016-06-15) we have complete read ins of both Mainnet and testnet3 without any crashes.
In the webinterface branch, there is a simple webfrontend for the database. It is based on MyWebSQL. All copyright for that software stays with the original authors. The copyright stays the same (GPLv2 / GPLv3) and our changes are released under the same license.
Our version of the software contains some minor changes. In particular, it can
load .sql
files from a directory on the server. We use this to present to the
user some example queries. The setup procedure stays the same as for the
original software.
This software is published under the GNU PGL v3. All copyright remains with the original authors.