I was planning to write some notes about ptmalloc, tcmalloc, and jemalloc, but covering all three at once is not realistic. So I decided to start with jemalloc, since it is the first malloc library I learned about while reading the redis source code.
Abbreviations:
- TSD, tsd: thread specific data
- TLS, tls: thread local storage
Jemalloc is a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support; for further information, check the links down below. I will focus on je_malloc, the main function that overrides libc malloc.
I am reading HEAD commit fb56766ca9b398d07e2def5ead75a021fc08da03, because it contains a new je_malloc implementation aimed at improving performance.
For jemalloc to take over allocation when statically linked with glibc, it must also provide its own definitions of glibc's malloc entry points. The entry point is in jemalloc.c:
...
# ifdef JEMALLOC_OVERRIDE___LIBC_MALLOC
void *__libc_malloc(size_t size) PREALIAS(je_malloc);
# endif
Tracing into je_malloc: the je_malloc on this dev branch is not the same as in the released versions. Code was added to improve performance, built on two ideas (a minimal sketch follows the list):
- caching by tcache (thread cache)
- tail-calling the old je_malloc
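To make the shape of the change concrete, here is a minimal, hypothetical sketch; none of these names exist in jemalloc, and the real code is shown further down. The fast path consults a tiny thread-local cache and tail-calls the general slow path on any miss.
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical sketch only: a per-thread stash of recently freed blocks.
 * jemalloc's real tcache is organized by size class and is far richer;
 * the free-side code that would refill this cache is omitted here. */
#define CACHE_SLOTS 8
static _Thread_local void  *cache_ptr[CACHE_SLOTS];
static _Thread_local size_t cache_len[CACHE_SLOTS];

static void *slow_path(size_t size) {
    return malloc(size);    /* stand-in for the old, general je_malloc */
}

void *sketch_malloc(size_t size) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache_ptr[i] != NULL && cache_len[i] >= size) {
            void *ret = cache_ptr[i];
            cache_ptr[i] = NULL;    /* fast path: reuse a cached block */
            return ret;
        }
    }
    return slow_path(size);    /* every miss tail-calls the slow path */
}
The real fast path below has the same structure, except that its cache is jemalloc's tcache and every check that cannot be satisfied cheaply (uninitialized runtime, missing TSD, oversized request, pending tcache maintenance, profiling sample due) falls back to malloc_default.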
Misc
tsd_get_allocates
This function returns a bool whose value depends on the platform, i.e., on which TSD backend gets compiled in. The same goes for tsd_boot0, tsd_boot1, tsd_boot, tsd_booted_get, tsd_get, and tsd_set; each backend header supplies its own definitions:
#ifdef JEMALLOC_MALLOC_THREAD_CLEANUP
#include "jemalloc/internal/tsd_malloc_thread_cleanup.h"
#elif (defined(JEMALLOC_TLS))
#include "jemalloc/internal/tsd_tls.h"
#elif (defined(_WIN32))
#include "jemalloc/internal/tsd_win.h"
#else
#include "jemalloc/internal/tsd_generic.h"
#endif
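The net effect is roughly the following (paraphrased from memory, not copied from the headers; USE_NATIVE_TLS is a made-up macro standing in for the real configure-time checks above): each backend header defines the same inline predicates, and tsd_get_allocates() in particular reports whether merely fetching the TSD might call back into the allocator, which is exactly what the je_malloc fast path checks first.
#include <stdbool.h>

/* Paraphrased sketch of the per-backend definitions; USE_NATIVE_TLS is a
 * hypothetical stand-in for jemalloc's real configuration macros. */
#if defined(USE_NATIVE_TLS)
/* Native __thread storage: reading the slot never allocates. */
static inline bool tsd_get_allocates(void) { return false; }
#else
/* Generic pthread_getspecific() backend: the TSD wrapper is created lazily
 * on first access, and that creation may allocate. */
static inline bool tsd_get_allocates(void) { return true; }
#endif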
unlikely, likely
These macros are hints for static branch prediction. The compiler uses them to lay out the generated code so that the path expected to be taken falls straight through, while the unlikely branch is placed out of the way of the hot path.
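On GCC and Clang such hints are typically thin wrappers around __builtin_expect; the sketch below is paraphrased rather than copied from jemalloc's internal macro headers, which also handle compilers without the builtin.
#include <stdbool.h>

/* Paraphrased definitions of the hints plus a small usage example. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), true)
#  define unlikely(x) __builtin_expect(!!(x), false)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

/* The error branch is marked cold, so the hot path falls through. */
int checked_div(int a, int b, int *out) {
    if (unlikely(b == 0)) {
        return -1;      /* cold: laid out away from the hot path */
    }
    *out = a / b;       /* hot: straight-line fall-through */
    return 0;
}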
JEMALLOC_EXPORT JEMALLOC_ALLOCATOR JEMALLOC_RESTRICT_RETURN
void JEMALLOC_NOTHROW *
JEMALLOC_ATTR(malloc) JEMALLOC_ALLOC_SIZE(1)
je_malloc(size_t size) {
LOG("core.malloc.entry", "size: %zu", size);
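/* If fetching TSD could itself allocate and jemalloc is not fully initialized yet, defer to the general slow path. */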
if (tsd_get_allocates() && unlikely(!malloc_initialized())) {
return malloc_default(size);
}
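/* Grab this thread's TSD without triggering initialization; missing TSD, a non-fast TSD state, or a size beyond the lookup table all mean slow path. */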
tsd_t *tsd = tsd_get(false);
if (unlikely(!tsd || !tsd_fast(tsd) || (size > SC_LOOKUP_MAXCLASS))) {
return malloc_default(size);
}
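/* The gc ticker counts down allocation events; when it fires, the tcache is due for maintenance, which only the slow path performs. */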
tcache_t *tcache = tsd_tcachep_get(tsd);
if (unlikely(ticker_trytick(&tcache->gc_ticker))) {
return malloc_default(size);
}
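/* Map the request size to its small size-class index; the usable size is only needed when stats or profiling are compiled in. */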
szind_t ind = sz_size2index_lookup(size);
size_t usize;
if (config_stats || config_prof) {
usize = sz_index2size(ind);
}
/* Fast path relies on size being a bin. I.e. SC_LOOKUP_MAXCLASS < SC_SMALL_MAXCLASS */
assert(ind < SC_NBINS);
assert(size <= SC_SMALL_MAXCLASS);
if (config_prof) {
int64_t bytes_until_sample = tsd_bytes_until_sample_get(tsd);
bytes_until_sample -= usize;
tsd_bytes_until_sample_set(tsd, bytes_until_sample);
if (unlikely(bytes_until_sample < 0)) {
/*
* Avoid a prof_active check on the fastpath.
* If prof_active is false, set bytes_until_sample to
* a large value. If prof_active is set to true,
* bytes_until_sample will be reset.
*/
if (!prof_active) {
tsd_bytes_until_sample_set(tsd, SSIZE_MAX);
}
return malloc_default(size);
}
}
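/* Try to pop a pointer from this thread's cache bin for that size class; a miss falls through to malloc_default at the bottom. */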
cache_bin_t *bin = tcache_small_bin_get(tcache, ind);
bool tcache_success;
void* ret = cache_bin_alloc_easy(bin, &tcache_success);
if (tcache_success) {
if (config_stats) {
*tsd_thread_allocatedp_get(tsd) += usize;
bin->tstats.nrequests++;
}
if (config_prof) {
tcache->prof_accumbytes += usize;
}
LOG("core.malloc.exit", "result: %p", ret);
/* Fastpath success */
return ret;
}
return malloc_default(size);
}