This document describes how the heap profiler works and how to add heap profiling support to your allocator. If you just want to know how to use it, see Heap Profiling with MemoryInfra.
[TOC]
## Overview

The heap profiler consists of three main components:
- The Context Tracker: Responsible for providing context (pseudo stack backtrace) when an allocation occurs.
- The Allocation Register: A specialized hash table that stores allocation details by address.
- The Heap Dump Writer: Extracts the most important information from a set of recorded allocations and converts it into a format that can be dumped into the trace log.
These components are designed to work well together, but to be usable independently as well.
When there is a way to get notified of all allocations and frees, this is the normal flow:
1. When an allocation occurs, call
   `AllocationContextTracker::GetInstanceForCurrentThread()->GetContextSnapshot()`
   to get an `AllocationContext`.
2. Insert that context, together with the address and size, into an
   `AllocationRegister` by calling `Insert()`.
3. When memory is freed, remove it from the register with `Remove()`.
4. On memory dump, collect the allocations from the register, call
   `ExportHeapDump()`, and add the generated heap dump to the memory dump.
*** aside
An allocator can skip steps 2 and 3 if it is able to store the context itself,
and if it is able to enumerate all allocations for step 4.
***
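Expressed as code, steps 1 to 3 look roughly like this (a minimal sketch for a
hypothetical allocator hook; `register_` stands for an `AllocationRegister`
owned by the allocator, and all locking is omitted; the full dump provider
example further down shows the complete picture):

```cpp
// Hypothetical hooks in an allocator; the names are illustrative only.
void OnAllocated(void* address, size_t size) {
  // Step 1: capture the current context (pseudo stack backtrace, type name).
  AllocationContext context =
      AllocationContextTracker::GetInstanceForCurrentThread()
          ->GetContextSnapshot();
  // Step 2: record the allocation; access to |register_| must be
  // synchronized by the caller.
  register_->Insert(address, size, context);
}

void OnFreed(void* address) {
  // Step 3: forget the allocation again.
  register_->Remove(address);
}
```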
When heap profiling is enabled (the `--enable-heap-profiling` flag is passed),
the memory dump manager calls `OnHeapProfilingEnabled()` on every
`MemoryDumpProvider` as early as possible, so allocators can start recording
allocations. This should be done even when tracing has not been started,
because these allocations might still be around when a heap dump happens during
tracing.
## Context Tracker

The `AllocationContextTracker` is a thread-local object. Its main purpose is to
keep track of a pseudo stack of trace events. Chrome has been instrumented with
lots of `TRACE_EVENT` macros. These trace events push their name onto a
thread-local stack when they go into scope, and pop it when they go out of
scope, if all of the following conditions have been met:

- A trace is being recorded.
- The category of the event is enabled in the trace config.
- Heap profiling is enabled (with the `--enable-heap-profiling` flag).
This means that allocations that occur before tracing is started will not have backtrace information in their context.
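For example (a sketch; the category and function are made up for illustration):

```cpp
#include <vector>

#include "base/trace_event/trace_event.h"

void ParseChunk() {
  // While this event is in scope, and assuming a trace is being recorded, the
  // "blink" category is enabled, and heap profiling is on, "ParseChunk" is
  // pushed onto this thread's pseudo stack; it is popped again when the
  // function returns.
  TRACE_EVENT0("blink", "ParseChunk");

  // Any allocation made here (directly or in callees) gets a backtrace that
  // ends in "ParseChunk".
  std::vector<char> buffer(4096);
}
```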
A thread-local instance of the context tracker is initialized lazily when it is
first accessed, either because a trace event was pushed or popped, or because
`GetContextSnapshot()` was called when an allocation occurred.
`AllocationContext` is what is used to group and break down allocations.
Currently `AllocationContext` has the following fields:

- Backtrace: filled in by the context tracker, obtained from the thread-local
  pseudo stack.
- Type name: to be filled in at a point where the type of a pointer is known;
  set to `[unknown]` by default.
It is possible to modify this context after insertion into the register, for instance to set the type name if it was not known at the time of allocation.
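Conceptually, the context looks roughly like this (a simplified sketch, not the
exact definition in `base/trace_event`):

```cpp
struct AllocationContext {
  // Pseudo stack captured from the thread-local context tracker.
  Backtrace backtrace;
  // Statically allocated string with the type name, "[unknown]" by default;
  // it can be filled in later, once the type behind the pointer is known.
  const char* type_name;
};
```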
## Allocation Register

The `AllocationRegister` is a hash table specialized for storing
`(size, AllocationContext)` pairs by address. It has been optimized for
Chrome's typical number of unfreed allocations, and it is backed by `mmap`
memory directly, so there are no reentrancy issues when using it to record
`malloc` allocations.

The allocation register is threading-agnostic; the caller is responsible for
synchronizing access to it.
## Heap Dump Writer

Dumping every single allocation in the allocation register straight into the
trace log is not an option due to the sheer volume (~300k unfreed allocations).
The role of the `ExportHeapDump()` function is to group allocations, striking a
balance between trace log size and detail.

See the Heap Dump Format document for more details about the structure of the
heap dump in the trace log.
## Adding Heap Profiling Support to Your Allocator

Below is an example of adding heap profiling support to an allocator that has
an existing memory dump provider.
```cpp
class FooDumpProvider : public MemoryDumpProvider {
  // Kept as a pointer because |AllocationRegister| allocates a lot of virtual
  // address space when constructed, so only construct it when heap profiling
  // is enabled.
  scoped_ptr<AllocationRegister> allocation_register_;
  Lock allocation_register_lock_;

  static FooDumpProvider* GetInstance();

  void InsertAllocation(void* address, size_t size) {
    AllocationContext context =
        AllocationContextTracker::GetInstanceForCurrentThread()
            ->GetContextSnapshot();
    AutoLock lock(allocation_register_lock_);
    allocation_register_->Insert(address, size, context);
  }

  void RemoveAllocation(void* address) {
    AutoLock lock(allocation_register_lock_);
    allocation_register_->Remove(address);
  }

  // Will be called as early as possible by the memory dump manager.
  void OnHeapProfilingEnabled(bool enabled) override {
    AutoLock lock(allocation_register_lock_);
    if (enabled)
      allocation_register_.reset(new AllocationRegister());

    // At this point, make sure that from now on, for every allocation and
    // free, |FooDumpProvider::GetInstance()->InsertAllocation()| and
    // |RemoveAllocation()| are called.
  }

  bool OnMemoryDump(const MemoryDumpArgs& args,
                    ProcessMemoryDump* pmd) override {
    // Do regular dumping here.

    // Dump the heap only for detailed dumps.
    if (args.level_of_detail == MemoryDumpLevelOfDetail::DETAILED) {
      TraceEventMemoryOverhead overhead;
      hash_map<AllocationContext, size_t> bytes_by_context;
      {
        AutoLock lock(allocation_register_lock_);
        if (allocation_register_) {
          // Group allocations in the register into |bytes_by_context|, but do
          // no additional processing inside the lock.
          for (const auto& alloc_size : *allocation_register_)
            bytes_by_context[alloc_size.context] += alloc_size.size;

          allocation_register_->EstimateTraceMemoryOverhead(&overhead);
        }
      }

      if (!bytes_by_context.empty()) {
        scoped_refptr<TracedValue> heap_dump = ExportHeapDump(
            bytes_by_context,
            pmd->session_state()->stack_frame_deduplicator(),
            pmd->session_state()->type_name_deduplicator());
        pmd->AddHeapDump("foo_allocator", heap_dump);
        overhead.DumpInto("tracing/heap_profiler", pmd);
      }
    }

    return true;
  }
};
```
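Note that the dump provider still has to be registered with the memory dump
manager so that `OnHeapProfilingEnabled()` and `OnMemoryDump()` are actually
invoked; something along these lines (the exact registration call and its
arguments depend on the Chromium revision):

```cpp
MemoryDumpManager::GetInstance()->RegisterDumpProvider(
    FooDumpProvider::GetInstance(), "FooAllocator", nullptr);
```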
*** aside
The implementation for `malloc` is more complicated because it needs to deal
with reentrancy.
***
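One common way to break that recursion is a thread-local guard, so that
allocations made by the profiler itself (for example by the register or by the
hash map used during a dump) are not recorded again. A minimal sketch, not
Chrome's actual malloc shim:

```cpp
// Illustrative only; the real malloc hooks in Chrome are more involved.
thread_local bool tls_in_heap_profiler = false;

void OnMallocHook(void* address, size_t size) {
  if (tls_in_heap_profiler)
    return;  // An allocation made while recording another one: skip it.
  tls_in_heap_profiler = true;
  FooDumpProvider::GetInstance()->InsertAllocation(address, size);
  tls_in_heap_profiler = false;
}
```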