Source URL: http://github.com/dwayn/snowflaked
This is a C implementation of Twitter's snowflake algorithm as a standalone daemon and client.
Snowflake is an uncoordinated, distributed unique ID generator that produces roughly ordered 64-bit integer IDs.
The algorithm generates a time-based ID that gains its entropy from the region ID, worker ID, and sequence bits. The region ID and worker ID must be a unique combination for each snowflaked process generating IDs for a single ID space. The sequence number resets to 0 at the beginning of each millisecond; the first ID a snowflaked daemon generates within a millisecond uses sequence 0, and the sequence increments for each subsequent request in that millisecond. If the clocks on all servers running snowflaked are in sync, the IDs are ordered down to the millisecond, though IDs generated within a single millisecond are not ordered. In practice, clock skew between hosts means IDs can be out of order over a coarser window than a single millisecond. The daemon handles NTP clock adjustments so that IDs are never reissued when NTP steps the system clock backwards. Note: NTP should be configured and running on all hosts running snowflaked.
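As a hedged illustration of that behavior (not the daemon's actual code; the helper names here are hypothetical), the usual approach is to remember the last millisecond an ID was issued for and spin until the clock moves past it, which covers both a backwards NTP step and an exhausted sequence:

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: current wall-clock time in milliseconds. */
static uint64_t current_time_ms(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000 + (uint64_t)tv.tv_usec / 1000;
}

/* Sketch of the wait loop: if the clock reads at or before the last
 * millisecond an ID was issued for (sequence exhausted, or NTP stepped
 * the clock backwards), spin until a fresh millisecond arrives so no
 * (timestamp, sequence) pair is ever reused. */
static uint64_t wait_for_next_ms(uint64_t last_ms) {
    uint64_t now = current_time_ms();
    while (now <= last_ms)
        now = current_time_ms();
    return now;
}
```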
Parts of a snowflake ID:
- sign bit: will always be 0
  - this is because some languages don't support unsigned integers properly (Java and PHP are notable cases)
- next 41 bits: milliseconds since custom epoch
- next 4 bits: region id
- next 10 bits: worker id
- lowest 8 bits: sequence number
| sign | ms since custom epoch | region | worker | sequence |
| ---- | --------------------- | ------ | ------ | -------- |
| 0 | 16917051000 | 2 | 26 | 37 |
| 0 | 00000001111110000010101011011011001111000 | 0010 | 0000011010 | 00100101 |

ID: 70955254678034981
This has been implemented to always provide positive signed 64-bit integers rather than unsigned 64-bit integers so that Java (prior to version 8) can use these IDs without painful workarounds.
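To make the layout concrete, here is a minimal sketch (not the daemon's code) that packs the fields with the default 41/4/10/8 split; it reproduces the worked example above:

```c
#include <stdint.h>
#include <stdio.h>

/* Shift amounts follow from the default layout: the sequence occupies the
 * lowest 8 bits, the worker the next 10, the region the next 4, and the
 * millisecond timestamp the 41 bits above those. */
static int64_t make_snowflake(int64_t ms_since_epoch, int64_t region,
                              int64_t worker, int64_t sequence) {
    return (ms_since_epoch << 22) /* 22 = 4 + 10 + 8 */
         | (region << 18)         /* 18 = 10 + 8 */
         | (worker << 8)          /*  8 = sequence bits */
         | sequence;
}

int main(void) {
    /* The example row from the table: prints 70955254678034981. */
    printf("%lld\n", (long long)make_snowflake(16917051000LL, 2, 26, 37));
    return 0;
}
```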
The number of bits assigned to each of the parts, as well as the custom epoch start time, can be adjusted in src/snowflake.h by modifying the SNOWFLAKE_* constants and recompiling the project. Be sure that the sum of the 4 *_BITS constants does not exceed 63 bits, as exceeding that will overflow the ID.
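As an illustration only (the actual names and values live in src/snowflake.h and may differ), the tunables look something like this:

```c
/* Hypothetical sketch of the SNOWFLAKE_* tunables; check src/snowflake.h
 * for the real names and values before relying on these. */
#define SNOWFLAKE_EPOCH          1356998400000ULL /* custom epoch, ms (illustrative) */
#define SNOWFLAKE_TIME_BITS      41
#define SNOWFLAKE_REGIONID_BITS  4
#define SNOWFLAKE_WORKERID_BITS  10
#define SNOWFLAKE_SEQUENCE_BITS  8
/* 41 + 4 + 10 + 8 = 63: the four *_BITS values must sum to at most 63
 * so the sign bit of the signed 64-bit ID stays clear. */
```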
The theoretical limit of the algorithm is approximately 2^4 * 2^10 * 2^8 * 1000, or ~4.2B IDs per second, across the 2^4 * 2^10 possible worker processes; a single worker is capped at 2^8 * 1000 = 256,000 IDs per second. Due to the single-threaded nature of the daemon, CPU clock speed is the primary limit on the rate at which IDs can be generated. On a 2.8 GHz Intel Core i7, I have benchmarked about 130k IDs per second, which is why I chose 8 bits for the sequence (it would take roughly a 5 GHz CPU to overrun the sequence). Unfortunately, I have been unable to squeeze any more performance out of the daemon because the majority of the CPU time is spent in the libevent library handling the connections.
This is the main server daemon
Required Arguments:
-r REGIONID, --region REGIONID
Region ID for the process
-w WORKERID, --worker WORKERID
Worker ID for the process
Optional Arguments:
-p PORT, --port PORT
Port number for the server to run on.
Default: 8008
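For example, to run a daemon as region 1, worker 1 on the default port (assuming the binary is installed as snowflaked):
snowflaked --region 1 --worker 1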
Command line client for snowflaked
Optional Arguments:
-h HOST, --host HOST
Hostname of the server to query
Default: localhost
-p PORT, --port PORT
Port number on the server to query
Default: 8008
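For example (the client binary name here is a placeholder for whatever the build produces):
snowflake_client --host localhost --port 8008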
Command line client for generating load on the snowflaked daemon
Optional Arguments:
-h HOST, --host HOST
Hostname of the server to query
Default: localhost
-p PORT, --port PORT
Port number on the server to query
Default: 8008
-n ITERATIONS, --iterations ITERATIONS
Number of ID GET calls to make to the server
Default: 1000
-m MODVALUE, --output-every MODVALUE
Modulus value to set how often to write to stdout
Default: 1 (write a line to stdout for every GET call)
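For example, to make 100000 GET calls and print every 1000th result (the binary name is again a placeholder):
snowflake_load --host localhost --port 8008 --iterations 100000 --output-every 1000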
There is an extension for PHP in clients/php-extension that can be compiled. See clients/php-extension/README.md for details.
Coming soon
Node.js, Ruby, Java, Scala, etc.: I plan to chip away at writing clients for many other languages, but if anyone would like to help write more client libraries, let me know and I would be happy to help out.
- libevent 2.x
- check 0.9.8 (need to test if check lib detection is working properly)
- libtoolize/glibtoolize
- autotools (aclocal, autoheader, autoconf, automake)
- gperftools
If you have homebrew installed, you can run the following to install all needed build dependencies:
./homebrew_install_dependencies.sh
sudo apt-get install libevent-dev pkg-config check libtool automake autoconf gperftools
You can leave out gperftools if you are not planning to compile the debug version.
sudo yum install libevent-devel pkgconfig check libtool automake autoconf gperftools
You can leave out gperftools if you are not planning to compile the debug version.
- ./bootstrap.sh
- ./configure [--enable-debug]
- make
- sudo make install
The daemon implements a simple text protocol:
GET\r\n
The response is a 64-bit integer preceded by "+"
+70955254678034981\r\n
If there is an error, the response will be preceded by "-"
-ERROR error message\r\n
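As a minimal sketch of a client speaking the GET command (assumptions: the daemon is on localhost:8008, and error handling is trimmed; a robust client would loop on recv until it sees the trailing \r\n):

```c
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8008);                      /* default port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  /* default host */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        perror("connect");
        return 1;
    }

    send(fd, "GET\r\n", 5, 0);  /* request a single ID */

    /* Success: "+<id>\r\n"; error: "-ERROR <message>\r\n". */
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof buf - 1, 0);
    if (n > 0) {
        buf[n] = '\0';
        fputs(buf[0] == '+' ? buf + 1 : buf, stdout);
    }
    close(fd);
    return 0;
}
```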
INFO\r\n
The following is actually a contiguous string, but it is broken up into multiple lines for the sake of readability.
+uptime:16761\r
version:00.02.00\r
region:0\r
worker:39\r
seq_cap:255\r
seq_max:84\r
ids:334241\r
waits:0\r\n
- uptime - Seconds since server started
- version - Compiled version of the server
- region - Region ID for the server
- worker - Worker ID for the server
- seq_cap - Max sequence ID that can be generated
- seq_max - The highest sequence number that the server has generated since it was started
- ids - The number of IDs that the server has generated since it was started
- waits - The number of times the server has had to enter a wait loop to generate an ID
  - This can be caused by hitting the sequence number cap within a single millisecond, or by an NTP clock adjustment that rolled time backwards on the host
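Since the INFO payload is one "+"-prefixed string with \r between fields and \r\n at the end, splitting it into key/value stats is straightforward; a small sketch under that assumption (not the project's code):

```c
#include <stdio.h>
#include <string.h>

/* Splits an INFO reply (already read off the socket) on \r / \r\n and
 * prints each key:value stat. */
static void print_info_fields(char *reply) {
    if (reply[0] != '+')
        return;  /* a "-ERROR ..." reply would land here */
    for (char *f = strtok(reply + 1, "\r\n"); f; f = strtok(NULL, "\r\n")) {
        char *colon = strchr(f, ':');
        if (colon)
            printf("%.*s = %s\n", (int)(colon - f), f, colon + 1);
    }
}

int main(void) {
    /* The example reply shown above, as one contiguous string. */
    char reply[] = "+uptime:16761\rversion:00.02.00\rregion:0\rworker:39\r"
                   "seq_cap:255\rseq_max:84\rids:334241\rwaits:0\r\n";
    print_info_fields(reply);
    return 0;
}
```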
- Tests for snowflaked components