jumboDB

Website: http://comsysto.github.com/jumbodb/

Wiki: https://github.com/comsysto/jumbodb/wiki

Quick Installation: https://github.com/comsysto/jumbodb/wiki/Quick-installation-guide

Download: http://repository-comsysto.forge.cloudbees.com/release/org/jumbodb/database/

Twitter: @devproof http://twitter.com/devproof

Latest version: 0.1.0 (12th Sep 2014)

What is it good for?

  • As a data store for low-latency 'Big Data' apps
  • Fast analysis of 'Big Data' on a low budget
  • Store, index and query huge amounts of data
  • Make your Hadoop outputs accessible to every application (e.g. aggregated statistics)
  • Serve billions of datasets in a very short time
  • Store terabytes of data on a single instance without any performance impact!
  • Only immutable data is supported; you cannot insert or update single datasets
  • Works well on AWS infrastructure, even on provisioned EBS volumes
  • Data delivery management and versioning

Features

  • Index your JSON data
  • Query over indexed and non-indexed data
  • Geospatial indexes
  • Range queries (between, greater than, less than and so on)
  • Data replication (to another database)
  • Sharding and replication (planned, not yet implemented)
  • Very fast imports (the bottleneck is the network interface or the disk)
  • Multithreaded search
  • High compression
  • No downtime on import (the current data stays available until the next import has finished)
  • Fast rollbacks
  • Java Driver and R Connector
  • Data delivery management and versioning
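
The import, rollback and delivery-versioning features can be pictured with a small in-memory model. This is only a sketch of the general idea, not jumboDB's implementation: the class `DeliveryStore` and its methods are hypothetical names chosen for illustration. Each delivery is stored as a complete immutable version, readers keep seeing the active version while a new one is imported, and a rollback merely re-points to the previous version.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryStore:
    """Hypothetical model of versioned, immutable data deliveries."""

    versions: dict = field(default_factory=dict)  # version id -> tuple of records
    history: list = field(default_factory=list)   # activation order, newest last

    def import_delivery(self, version, records):
        # The import builds a complete new version on the side; readers
        # keep seeing the previously active version while this runs.
        self.versions[version] = tuple(records)

    def activate(self, version):
        # Switching the active version is one small step, so readers
        # never see a half-imported delivery.
        if version not in self.versions:
            raise KeyError(f"unknown delivery version: {version}")
        self.history.append(version)

    def rollback(self):
        # A rollback only re-points to the previous version; no data is copied.
        if len(self.history) < 2:
            raise RuntimeError("no earlier delivery to roll back to")
        self.history.pop()

    def read(self):
        # Serve the records of the currently active version, if any.
        return self.versions[self.history[-1]] if self.history else ()


store = DeliveryStore()
store.import_delivery("v1", [{"id": 1, "clicks": 10}])
store.activate("v1")

store.import_delivery("v2", [{"id": 1, "clicks": 12}])
served_during_import = store.read()     # still the v1 records

store.activate("v2")
served_after_activation = store.read()  # now the v2 records

store.rollback()
served_after_rollback = store.read()    # back to the v1 records
```

Because versions are never modified after import, activation and rollback are cheap pointer changes rather than data rewrites.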

Core ideas of jumboDB

  • Process and index the data in a parallelized environment like Hadoop (you can also run it locally)
  • All data is immutable, because data usually gets replaced or extended by further data deliveries from Hadoop
  • Immutable data makes it easy to parallelize searches
  • Pre-organized and sorted data is easier to search and yields faster responses
  • Sorted data allows grouped read actions
  • Sort your data by the major use case to speed up queries
  • Compression helps to increase disk speed
  • Don't keep all indexes in memory, because the data is too big!
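
The ideas about sorted data, grouped reads and not holding every index entry in memory can be illustrated with a sparse index. The following is a minimal sketch under assumptions, not jumboDB's actual storage format: the block size, the function names and the `ts` field are all made up for illustration. Records are sorted once, split into blocks, and only the first key of each block is kept in memory; a range query then reads one contiguous run of blocks.

```python
import bisect

BLOCK_SIZE = 4  # records per block; deliberately tiny for illustration


def build_blocks(records, key):
    """Sort records by `key` once and split them into fixed-size blocks."""
    ordered = sorted(records, key=lambda r: r[key])
    return [ordered[i:i + BLOCK_SIZE] for i in range(0, len(ordered), BLOCK_SIZE)]


def build_sparse_index(blocks, key):
    """Keep only the first key of every block in memory."""
    return [block[0][key] for block in blocks]


def range_query(blocks, sparse_index, key, low, high):
    """Return records with low <= record[key] <= high and the number of blocks read."""
    # Start at the last block whose first key is strictly below `low`;
    # its tail may still hold matching records.
    start = max(bisect.bisect_left(sparse_index, low) - 1, 0)
    # Blocks whose first key is above `high` cannot contain matches.
    stop = bisect.bisect_right(sparse_index, high)
    hits = []
    for block in blocks[start:stop]:  # one contiguous, grouped read
        hits.extend(r for r in block if low <= r[key] <= high)
    return hits, stop - start


# Unsorted input: 20 records with ts = 19, 18, ..., 0
records = [{"ts": t} for t in reversed(range(20))]
blocks = build_blocks(records, "ts")
sparse_index = build_sparse_index(blocks, "ts")

hits, blocks_read = range_query(blocks, sparse_index, "ts", 5, 9)
```

In this example the index holds 5 entries for 20 records, and the query touches 2 of the 5 blocks. Since the data never changes after import, the index never needs to be updated, and separate block ranges could be scanned by independent threads.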

Big Data for the masses!

Balancing performance and cost efficiency

  1. Affordable Big Data: low IO requirements, efficient use of disk space, low memory footprint

  2. Fast disk access through compression: Snappy achieves compression ratios of up to 5x, increasing disk IO efficiency and saving storage cost

  3. Batch processing with a delivery-driven approach: "write once, read many"; each batch of data is an atomic write that can be rolled back

  4. Supports JSON documents: schema flexibility for rapid application development

  5. Power and scalability of Apache Hadoop: for batch processing, aggregation and indexing of your data (e.g. writes of up to 500,000 JSON documents per second into the data store)

  6. Low read latency for end-user apps: optimized querying even for large result sets through multithreading and efficient data streaming (e.g. 100,000 JSON documents returned in less than a second)

  7. Hadoop Connector, Java Driver and R Connector are available
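
To see why compression can speed up disk access, it helps to measure a compression ratio and multiply it by the raw disk throughput. The sketch below uses `zlib` from the Python standard library purely as a stand-in, because Snappy is not part of the standard library; ratios and speeds with Snappy will differ. The sample documents, the function names and the 100 MB/s disk figure are assumptions for illustration, and the estimate ignores the CPU time spent decompressing.

```python
import json
import zlib


def compression_ratio(documents):
    """Ratio of raw to compressed size for newline-delimited JSON."""
    raw = "\n".join(json.dumps(d, sort_keys=True) for d in documents).encode("utf-8")
    packed = zlib.compress(raw)  # stand-in codec, not Snappy
    if zlib.decompress(packed) != raw:
        raise ValueError("round trip failed")
    return len(raw) / len(packed)


def effective_read_rate(disk_mb_per_s, ratio):
    """Uncompressed MB delivered per second when the disk reads compressed data."""
    return disk_mb_per_s * ratio


# Hypothetical, repetitive sample documents
docs = [
    {"user": f"user-{i % 50}", "country": "DE", "clicks": i % 7}
    for i in range(10_000)
]

ratio = compression_ratio(docs)
rate = effective_read_rate(100.0, ratio)  # assumed 100 MB/s raw disk throughput
print(f"compression ratio: {ratio:.1f}x, effective read rate: {rate:.0f} MB/s")
```

The same reasoning applies to storage cost: a ratio of r means the data occupies roughly 1/r of its uncompressed size on disk.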

Setup JumboDB

Please see the jumboDB Wiki: https://github.com/comsysto/jumbodb/wiki

Licenses

The connectors are licensed under Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html

The database is licensed under Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html