
Small Object Allocator: a C++ library for efficient work with reference-counted small objects


Small Object Allocator

The original thread synchronization algorithm described by Andrei Alexandrescu is no longer a good fit for modern hardware. The synchronization approach he used turns memory allocation into a bottleneck in applications optimized for multi-core CPUs, so the default libc memory allocator from the C++ runtime is faster in most cases.

Small Object Allocator uses the same low-level algorithm as the Loki small object allocator (a free-list memory pool) at the chunk level, but different approaches to thread synchronization, chunk lookup and so on.

The library is based on Boost C++ libraries, including Boost.Atomic, Boost.Config, Boost.Move (portable move semantics), Boost.System and Boost.Thread.

The library is also designed around small objects managed by boost::intrusive_ptr rather than unique or shared smart pointers, for better reference counting performance and a lower total number of cache misses.

Unlike general purpose memory allocators such as ptmalloc or jemalloc, Small Object Allocator is used only for 'small' objects, i.e. objects whose size is at most 128 bytes for 64-bit or at most 64 bytes for 32-bit addressing applications. This means Small Object Allocator does not override the global new and delete operators.

Small Object Allocator can be used with a C++03 compatible compiler; the runtime library can be linked statically or dynamically.

Design

Top level allocator

A thread-cached SLAB sits at the top level. All memory is split into pools of fixed sizes (chosen according to compiler alignment), i.e. if an object's size is 14 bytes the allocator uses the 16-byte pool and allocates 16 bytes for it (simple segregated storage with small internal fragmentation). Each pool keeps a thread-specific (hazard) pointer to a middle level allocator of that fixed size, and stores all middle level allocators in a lock-free, doubly linked list. When a thread makes an allocation attempt, the SLAB:

- locates the pool for the requested fixed size via the thread-specific value
- if no middle level allocator is in use yet, reserves one (by setting an atomic flag) from the list of existing allocators, or creates a new one if all of them are already reserved, and stores it in the thread-specific pointer
- uses the middle level allocator to allocate the memory; the operation may be guarded by a read-write lock (read side)

Memory can be freed either from the same thread that allocated it or from another thread. In the first scenario no blocking happens at all; in the second, the free takes the write side of the lock.
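A rough sketch of the size-to-pool routing described above; the names (align_up, pool_index, MAX_SMALL_SIZE) are illustrative only and are not the library's actual API:

#include <cstddef>

// Hypothetical constants: the real library derives these from compiler alignment.
static const std::size_t MIN_ALIGN = sizeof(void*); // e.g. 8 bytes on a 64-bit target
static const std::size_t MAX_SMALL_SIZE = 128;      // upper bound for "small" objects

// Round a requested size up to the nearest pool size (simple segregated storage).
inline std::size_t align_up(std::size_t size) {
    return (size + MIN_ALIGN - 1) & ~(MIN_ALIGN - 1);
}

// Route a request to its fixed-size pool: a 14-byte object lands in the 16-byte pool;
// anything above MAX_SMALL_SIZE bypasses the small object allocator entirely.
inline std::size_t pool_index(std::size_t size) {
    return align_up(size) / MIN_ALIGN - 1;
}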

Middle level allocator

Arena: an AVL interval tree (the tree allocator) that holds the low level allocators in its nodes. The previously used tree node is saved in a variable for caching purposes. AVL balancing was chosen over a red-black tree because the tree is mostly used for searching and for inserting new nodes, while deleting nodes is a rare operation. AVL trees are better for searching and roughly equal to red-black trees for insertion, but slower for node deletion.
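As a rough illustration of the interval lookup (the node layout below is invented for this sketch and does not match the library's internals), each node covers the address range of one low level chunk, and freeing a pointer means finding the node whose interval contains it:

// Hypothetical node layout: one node per low level chunk, keyed by its address range.
struct chunk_node {
    const unsigned char* begin;  // first byte owned by the chunk
    const unsigned char* end;    // one past the last owned byte
    chunk_node* left;
    chunk_node* right;
    // ... AVL balance factor and chunk bookkeeping omitted
};

// Binary search over disjoint address intervals: returns the node whose
// [begin, end) range contains p, or 0 if the pointer is not ours.
inline chunk_node* find_owning_chunk(chunk_node* root, const void* p) {
    const unsigned char* addr = static_cast<const unsigned char*>(p);
    while (root) {
        if (addr < root->begin)
            root = root->left;
        else if (addr >= root->end)
            root = root->right;
        else
            return root;         // begin <= addr < end
    }
    return 0;
}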

Low level allocator

A free list of fixed-size objects: chunks, each of which can serve up to 256 (UCHAR_MAX + 1) objects of the same size.
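The chunk technique is the classic one from Loki: the first byte of every free block stores the index of the next free block, so the free list costs no extra memory. A condensed sketch (not the library's exact code):

#include <climits>   // UCHAR_MAX
#include <cstddef>

// One chunk serves up to UCHAR_MAX + 1 (256) blocks of a single fixed size.
struct chunk {
    unsigned char* data;        // block_size * 256 bytes of raw storage
    unsigned char  first_free;  // index of the free-list head
    unsigned int   free_count;  // how many blocks are still available (0..256)

    void init(unsigned char* mem, std::size_t block_size) {
        data = mem;
        first_free = 0;
        free_count = UCHAR_MAX + 1;
        // Each free block's first byte holds the index of the next free block.
        unsigned char* p = mem;
        for (unsigned int i = 1; i <= UCHAR_MAX + 1; ++i, p += block_size)
            *p = static_cast<unsigned char>(i);   // the last link is never followed
    }

    void* allocate(std::size_t block_size) {
        if (free_count == 0) return 0;            // chunk exhausted
        unsigned char* p = data + first_free * block_size;
        first_free = *p;                          // pop the free-list head
        --free_count;
        return p;
    }

    void deallocate(void* block, std::size_t block_size) {
        unsigned char* p = static_cast<unsigned char*>(block);
        *p = first_free;                          // push back onto the free list
        first_free = static_cast<unsigned char>((p - data) / block_size);
        ++free_count;
    }
};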

System level allocator

The Windows implementation uses a private heap in addition to the process default heap. The UNIX (POSIX) implementation relies on the standard library allocator, which is usually a variant of the Doug Lea allocator, e.g. GNU libc implements ptmalloc.
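A minimal sketch of that split, assuming the two backends described above (only HeapCreate/HeapAlloc/HeapFree and std::malloc/std::free are real APIs here; the wrapper names are made up):

#include <cstdlib>

#ifdef _WIN32
#include <windows.h>

// Windows: serve requests from a private heap, separate from the process default heap.
static HANDLE private_heap = ::HeapCreate(0, 0, 0);   // growable, no fixed maximum

inline void* sys_allocate(std::size_t size) {
    return ::HeapAlloc(private_heap, 0, size);
}
inline void sys_release(void* p) {
    ::HeapFree(private_heap, 0, p);
}

#else

// POSIX: fall back to the C library allocator (typically a Doug Lea / ptmalloc variant).
inline void* sys_allocate(std::size_t size) {
    return std::malloc(size);
}
inline void sys_release(void* p) {
    std::free(p);
}

#endif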

Small Object Allocator vs. jemalloc

jemalloc is a general purpose allocator optimized for server-side applications and operating system kernels; FreeBSD, for example, uses it inside the kernel. jemalloc is also generally better from the compiler optimization perspective. For example, if you have something like:

 MySmallObject **ptrArray = static_cast<MySmallObject**>( ::je_malloc( sizeof(MySmallObject*) * 100 ) );
 for(std::size_t i = 0; i < 100; ++i) {
   void *p = ::je_malloc( sizeof(MySmallObject) );
   ptrArray[i] = new (p) MySmallObject();   // placement new into jemalloc-owned memory
 }
 ....

then a C++ compiler is likely to merge the multiple je_malloc calls (exactly as it may for std::malloc or operator new) into a single allocation of one big chunk of memory. As you would expect, the compiler cannot do the same for Small Object Allocator.

jemalloc is general purpose, i.e. a complicated and large library; for example, the Windows version compiled with MinGW 64 adds about 23 additional megabytes to your application binary. With static linking that is +23 MB in the app.exe file; with dynamic linking it is a jemalloc.dll (23 MB) in the distribution directory.

jemalloc allocates a large amount of virtual memory from the operating system and does not release it for a while; shrinking happens on an additional (garbage collector) thread.

Small Object Allocator adds only about 120-150 kilobytes and does not create any additional threads, but it is much worse from the compiler optimization perspective and it is not general purpose. If you allocate medium blocks (from 129 bytes up to the virtual page granularity, usually 4096 bytes [4 KB]), big blocks (from 1 to 4 pages) or huge blocks (from a megabyte to a few gigabytes), it simply calls the system allocator; the extent concept from jemalloc works much better for those cases.

Use cases

The allocator is not designed to speed up std::allocator for STL containers such as std::vector or std::map. For that, use the Boost.Pool library or the polymorphic allocators from the C++17 standard.
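For completeness, a minimal example of the C++17 alternative mentioned above (standard library only, nothing from Small Object Allocator):

#include <memory_resource>
#include <vector>

int main() {
    // A pool resource batches node allocations for containers, which is the
    // job Small Object Allocator deliberately does not take on.
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::vector<int> values(&pool);
    for (int i = 0; i < 1000; ++i)
        values.push_back(i);
    return 0;
}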

Usage

A typical use case for the library:

#include <iostream>

#include <object.hpp>

class Widget:public virtual smallobject::object {
public:
  explicit Widget():
    smallobject::object()
  {}
  virtual ~Widget() BOOST_NOEXCEPT_OR_NOTHROW {
    std::cout<<"Widget destroyed"<<std::endl;
  }
};

DECLARE_SPTR_TYPE(Widget);

int main(int argc, const char **argv) {
   s_Widget wdg( new Widget() );
}
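Judging by the design notes above, DECLARE_SPTR_TYPE(Widget) presumably declares the s_Widget smart pointer type on top of boost::intrusive_ptr, so the Widget is destroyed, and its memory returned to the allocator, when the last s_Widget referencing it goes out of scope.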

Tested compilers and Platforms

  • Microsoft Windows (Vista is the minimal supported version)
      — GCC-TDM (MinGW 32/64) v5.1
      — LLVM Clang

  • Linux (kernel 3.x)
      — GCC 5.5
      — LLVM Clang