I was planning to write some notes about ptmalloc, tcmalloc, and jemalloc, but covering all three at once is not realistic. So I decided to start with jemalloc, since it is the first malloc library I learned about while reading the redis source code.
Abbreviations:
- TSD, tsd: thread specific data
- TLS, tls: thread local storage
Jemalloc is a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support; for further information, check the links down below. I will focus on je_malloc, the main function that overrides libc malloc.
I am reading HEAD commit fb56766ca9b398d07e2def5ead75a021fc08da03, because it contains a new je_malloc implementation aimed at improving performance.
For jemalloc to take over allocation when statically linked with glibc, it must also provide its own definitions of glibc's malloc entry points. The entry point is in jemalloc.c:
...
# ifdef JEMALLOC_OVERRIDE___LIBC_MALLOC
void *__libc_malloc(size_t size) PREALIAS(je_malloc);
# endif
Tracing into je_malloc: the je_malloc on this dev branch is not the same as in the released versions. Code was added to improve performance, built on two ideas (a minimal sketch follows the list):
- caching by tcache (thread cache)
- tail-calling the old je_malloc
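To make the shape of the change concrete, here is a minimal, hypothetical sketch; none of these names exist in jemalloc, and the real code is shown further down. The fast path consults a tiny thread-local cache and tail-calls the general slow path on any miss.
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical sketch only: a per-thread stash of recently freed blocks.
 * jemalloc's real tcache is organized by size class and is far richer;
 * the free-side code that would refill this cache is omitted here. */
#define CACHE_SLOTS 8
static _Thread_local void  *cache_ptr[CACHE_SLOTS];
static _Thread_local size_t cache_len[CACHE_SLOTS];

static void *slow_path(size_t size) {
    return malloc(size);    /* stand-in for the old, general je_malloc */
}

void *sketch_malloc(size_t size) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache_ptr[i] != NULL && cache_len[i] >= size) {
            void *ret = cache_ptr[i];
            cache_ptr[i] = NULL;    /* fast path: reuse a cached block */
            return ret;
        }
    }
    return slow_path(size);    /* every miss tail-calls the slow path */
}
The real fast path below has the same structure, except that its cache is jemalloc's tcache and every check that cannot be satisfied cheaply (uninitialized runtime, missing TSD, oversized request, pending tcache maintenance, profiling sample due) falls back to malloc_default.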
Misc
tsd_get_allocates
This function returns a bool whose value depends on the platform, i.e., on which TSD backend gets compiled in. The same goes for tsd_boot0, tsd_boot1, tsd_boot, tsd_booted_get, tsd_get, and tsd_set; each backend header supplies its own definitions:
#ifdef JEMALLOC_MALLOC_THREAD_CLEANUP
#include "jemalloc/internal/tsd_malloc_thread_cleanup.h"
#elif (defined(JEMALLOC_TLS))
#include "jemalloc/internal/tsd_tls.h"
#elif (defined(_WIN32))
#include "jemalloc/internal/tsd_win.h"
#else
#include "jemalloc/internal/tsd_generic.h"
#endif
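The net effect is roughly the following (paraphrased from memory, not copied from the headers; USE_NATIVE_TLS is a made-up macro standing in for the real configure-time checks above): each backend header defines the same inline predicates, and tsd_get_allocates() in particular reports whether merely fetching the TSD might call back into the allocator, which is exactly what the je_malloc fast path checks first.
#include <stdbool.h>

/* Paraphrased sketch of the per-backend definitions; USE_NATIVE_TLS is a
 * hypothetical stand-in for jemalloc's real configuration macros. */
#if defined(USE_NATIVE_TLS)
/* Native __thread storage: reading the slot never allocates. */
static inline bool tsd_get_allocates(void) { return false; }
#else
/* Generic pthread_getspecific() backend: the TSD wrapper is created lazily
 * on first access, and that creation may allocate. */
static inline bool tsd_get_allocates(void) { return true; }
#endif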
unlikely, likely
These macros are hints for static branch prediction. The compiler uses them to lay out the generated code so that the path expected to be taken falls straight through, while the unlikely branch is placed out of the way of the hot path.
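On GCC and Clang such hints are typically thin wrappers around __builtin_expect; the sketch below is paraphrased rather than copied from jemalloc's internal macro headers, which also handle compilers without the builtin.
#include <stdbool.h>

/* Paraphrased definitions of the hints plus a small usage example. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), true)
#  define unlikely(x) __builtin_expect(!!(x), false)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

/* The error branch is marked cold, so the hot path falls through. */
int checked_div(int a, int b, int *out) {
    if (unlikely(b == 0)) {
        return -1;      /* cold: laid out away from the hot path */
    }
    *out = a / b;       /* hot: straight-line fall-through */
    return 0;
}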
JEMALLOC_EXPORT JEMALLOC_ALLOCATOR JEMALLOC_RESTRICT_RETURN
void JEMALLOC_NOTHROW *
JEMALLOC_ATTR(malloc) JEMALLOC_ALLOC_SIZE(1)
je_malloc(size_t size) {
LOG("core.malloc.entry", "size: %zu", size);
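/* If fetching TSD could itself allocate and jemalloc is not fully initialized yet, defer to the general slow path. */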
if (tsd_get_allocates() && unlikely(!malloc_initialized())) {
return malloc_default(size);
}
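/* Grab this thread's TSD without triggering initialization; missing TSD, a non-fast TSD state, or a size beyond the lookup table all mean slow path. */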
tsd_t *tsd = tsd_get(false);
if (unlikely(!tsd || !tsd_fast(tsd) || (size > SC_LOOKUP_MAXCLASS))) {
return malloc_default(size);
}
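/* The gc ticker counts down allocation events; when it fires, the tcache is due for maintenance, which only the slow path performs. */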
tcache_t *tcache = tsd_tcachep_get(tsd);
if (unlikely(ticker_trytick(&tcache->gc_ticker))) {
return malloc_default(size);
}
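/* Map the request size to its small size-class index; the usable size is only needed when stats or profiling are compiled in. */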
szind_t ind = sz_size2index_lookup(size);
size_t usize;
if (config_stats || config_prof) {
usize = sz_index2size(ind);
}
/* Fast path relies on size being a bin. I.e. SC_LOOKUP_MAXCLASS < SC_SMALL_MAXCLASS */
assert(ind < SC_NBINS);
assert(size <= SC_SMALL_MAXCLASS);
if (config_prof) {
int64_t bytes_until_sample = tsd_bytes_until_sample_get(tsd);
bytes_until_sample -= usize;
tsd_bytes_until_sample_set(tsd, bytes_until_sample);
if (unlikely(bytes_until_sample < 0)) {
/*
* Avoid a prof_active check on the fastpath.
* If prof_active is false, set bytes_until_sample to
* a large value. If prof_active is set to true,
* bytes_until_sample will be reset.
*/
if (!prof_active) {
tsd_bytes_until_sample_set(tsd, SSIZE_MAX);
}
return malloc_default(size);
}
}
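/* Try to pop a pointer from this thread's cache bin for that size class; a miss falls through to malloc_default at the bottom. */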
cache_bin_t *bin = tcache_small_bin_get(tcache, ind);
bool tcache_success;
void* ret = cache_bin_alloc_easy(bin, &tcache_success);
if (tcache_success) {
if (config_stats) {
*tsd_thread_allocatedp_get(tsd) += usize;
bin->tstats.nrequests++;
}
if (config_prof) {
tcache->prof_accumbytes += usize;
}
LOG("core.malloc.exit", "result: %p", ret);
/* Fastpath success */
return ret;
}
return malloc_default(size);
}