Skip to content

Latest commit

 

History

History
220 lines (145 loc) · 7.31 KB

README.md

File metadata and controls

220 lines (145 loc) · 7.31 KB

Snowflaked

Source URL: http://github.com/dwayn/snowflaked

Introduction

This is a C implementation of Twitter's snowflake algorithm as a standalone daemon and client

Snowflake is an uncoordinated, distributed unique id generator that generates roughly ordered 64bit integer IDs.

The algorithm generates a time based ID that gains its entropy from the region id, worker id, and sequence bits. Region id and worker id must be a unique combination for each snowflaked process that is generating IDs for a single ID space. The sequence number is reset to 0 at the beginning of each millisecond and the first ID generated by a snowflaked daemon will use 0 as its sequence and then increment the sequence number for the next request within that millisecond. If clocks on all servers running snowflaked process are in sync, the IDs are ordered down to the millisecond, but IDs generated within a single millisecond are not ordered. In practice, due to clock skew, the IDs generated could be unordered at a higher granularity. NTP clock adjustments are handled by the daemon to ensure that IDs are not regenerated when NTP adjusts the system clock backwards. Note: NTP should be configured and running on all hosts running snowflaked.

Parts of a snowflake ID:

  • sign bit: will always be 0

  • this is because some languages don't support unsigned integers properly (Java and PHP are notable cases)

  • next 41 bits: milliseconds since custom epoch

  • next 4 bits: region id

  • next 10 bits: worker id

  • lowest 8 bits: sequence number

       sign           ms since custom epoch            region    worker     sequence  
      | 0 |                               16917051000 |    2 |         26 |       37 |
      | 0 | ----------------------------------------- | ---- | ---------- | -------- |
      | 0 | 00000001111110000010101011011011001111000 | 0010 | 0000011010 | 00100101 |
      
      ID: 70955254678034981
    

This has been implemented to always provide positive signed 64bit integers rather than using unsigned 64bit integers so that Java (prior to version 8) can use these IDs without painful workarounds.

The number of bits assigned to each of the parts as well as the custom epoch start time can be adjusted in src/snowflake.h by modifying the SNOWFLAKE_* constants and the project recompiled. Be sure that the sum of the 4 *_BITS constants does not exceed 63 bits as that will cause overflow on the ID.

Limits

The theoretical limit to the algorithm is approximately 2^4 * 2^10 * 2^8 * 1000 or ~4.2B IDs per second by 2^4 * 2^10 worker processes. Due to the single threaded nature of the daemon, CPU clock speed is the primary limiter to the rate at which IDs can be generated. On a 2.8ghz Intel Core i7, I have benchmarked to about 130k IDs per second which is why I chose 8 bits for the sequence (would take approx 5ghz CPU to overrun the sequence). Unfortunately, I have been unable to squeeze anymore performance out of the daemon due to the fact that the majority of the CPU is spent in libevent library handling the connections.

Components

Core

snowflaked

This is the main server daemon

Required Arguments:

    -r REGIONID, --region REGIONID
                        Region ID for the process

    -w WORKERID, --worker WORKERID
                        Worker ID for the process

Optional Arguments:

    -p PORT, --port PORT
                        Port number for the server to run on. 
                        Default: 8008

snowflake

Command line client for snowflaked

Optional Arguments

    -h HOST, --host HOST
                        Hostname of the server to query
                        Default: localhost

    -p PORT, --port PORT
                        Port number on the server to query
                        Default: 8008

sfbench

Command line client for generating load on snowflaked daemon

    -h HOST, --host HOST
                        Hostname of the server to query
                        Default: localhost

    -p PORT, --port PORT
                        Port number on the server to query
                        Default: 8008

    -n ITERATIONS, --iterations ITERATIONS
                        Number of ID GET calls to make to the server
                        Default: 1000

    -m MODVALUE, --output-every MODVALUE
                        Modulus value to set how often to write to stdout
                        Default: 1 (write line to stdout for every GET call)

Clients

PHP

There is an extension for PHP in clients/php-extension that can be compiled. See clients/php-extension/README.md

Python

Coming soon

Others client libraries

Node.js, Ruby, Java, Scala, etc: I plan to chip away at writing clients for many other languages, but if anyone would like to help with writing more client libraries, let me know and I would be happy to help out

Building

Dependencies

Library dependencies:

  • libevent 2.x
  • check 0.9.8 (need to test if check lib detection is working properly)

Build dependencies:

  • libtoolize/glibtoolize
  • autotools (aclocal, autoheader, autoconf, automake)

Debug build dependencies:

  • gperftools

Installing dependencies

Mac OSX

If you have homebrew installed, you can run the following to install all needed build dependencies:

    ./homebrew_install_dependencies.sh

Debian/Ubuntu

    sudo apt-get install libevent-dev pkg-config check libtool automake autoconf gperftools

you can leave out gperftools if you are not planning on compiling the debug version

RHEL/CentOS

    sudo yum install libevent-devel pkgconfig check libtool automake autoconf gperftools

you can leave out gperftools if you are not planning on compiling the debug version

Compiling/Installing:

  • ./bootstrap.sh
  • ./configure [--enable-debug]
  • make
  • sudo make install

Protocol

The daemon implements a simple text protocol:

Get ID request:

    GET\r\n
Server ID response:

64bit integer preceeded by "+"

    +70955254678034981\r\n
Server error response:

If there is an error, the response will be preceeded by "-"

    -ERROR error message\r\n

Get server stats:

    INFO\r\n
Server info response

The following is actually a contiguous string, but it is broken up into multiple lines for the sake of readability.

    +uptime:16761\r
    version:00.02.00\r
    region:0\r
    worker:39\r
    seq_cap:255\r
    seq_max:84\r
    ids:334241\r
    waits:0\r\n
  • uptime - Seconds since server started
  • version - Compiled version of the server
  • region - Region ID for the server
  • worker - Worker ID for the server
  • seq_cap - Max sequence ID that can be generated
  • seq_max - The highest sequence number that the server has generated since it was started
  • ids - The number of IDs that the server has generated since it was started
  • waits - The number of times the server has had to enter a wait loop to generate an ID
  • This can be caused by hitting the sequence number cap or by waiting due to an NTP clock adjustment that rolled time backwards on the host

TODO

  • Tests for snowflaked components