When mi_os_prim_alloc_aligned falls back on overallocation, it drops allow_large #1290

@moskupols

Hi, I noticed that in my app some OS allocations are not mapped to 2MiB large pages and are mapped to 4KiB pages instead.
I've only been able to reproduce this with OS allocation sizes strictly larger than 1GiB (which by default start after the first eight 1GiB allocations), and only in multi-threaded applications.

Here's a repro; I've checked it on 1.9.7 and on 1.9.10:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
enum {
  BLOCK = 1ull << 20,
  BLOCKS = 20 * 1024,
  THREADS = 4,
  BLOCKS_PER_THREAD = BLOCKS / THREADS,
};
 
static void* ptrs[THREADS][BLOCKS_PER_THREAD];
 
static void* worker(void* arg) {
  long t = (long)arg;
  for (int i = 0; i < BLOCKS_PER_THREAD; ++i) {
    ptrs[t][i] = malloc(BLOCK);
    if (!ptrs[t][i]) {
      perror("malloc");
      return (void*)1;
    }
    memset(ptrs[t][i], 0xab, BLOCK);
  }
  return NULL;
}
 
int main(void) {
  pthread_t threads[THREADS];
  for (long t = 0; t < THREADS; ++t) {
    if (pthread_create(&threads[t], NULL, worker, (void*)t) != 0) {
      perror("pthread_create");
      return 1;
    }
  }
  for (int t = 0; t < THREADS; ++t) {
    void* res = NULL;
    pthread_join(threads[t], &res);
    if (res) return 1;
  }
  return 0;
}

I run it like this on Linux 5.15 (this requires 20GiB in 2MiB hugepages to maximize the chance of reproduction, but this requirement could probably be reduced):

$ gcc -O2 repro.c path/to/libmimalloc.a -pthread -o repro
$ echo 10240 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ MIMALLOC_ALLOW_LARGE_OS_PAGES=1 MIMALLOC_ARENA_RESERVE=2G MIMALLOC_VERBOSE=1 ./repro

In an environment with enough hugepages, I would expect all the allocations to be logged with the "(in large pages)" suffix.
However, some allocations don't have this suffix, and the kernel indeed reports a few gigabytes of memory not in hugepages.
Each allocation that is not in large pages is preceded by a log line of this form:

mimalloc: warning: thread 0x7FFC377E7700: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0x80000000 bytes, address: 0x7FFA47600000, alignment: 0x400000, commit: 1)

I believe what is happening in my case is this:

  1. For sizes larger than 1GiB, we don't pass an aligned address hint to the OS.
  2. At the same time, we require 4MiB alignment for our segments, so arenas use 4MiB alignment for growth.
  3. When we mmap 2MiB hugepages, Linux can return an address that is aligned to 2MiB but not to 4MiB, especially without an address hint.
  4. In this case, on Linux, we take the overallocate... route and allocate alloc_size + alignment bytes, which is usually still divisible by 2MiB, as 2MiB hugepages require. However, this fallback branch always passes allow_large as false.
  5. I believe that in 1.9.10 we could pass allow_large && _mi_os_canuse_large_page(size, alignment) instead of false in this branch, since mi_os_prim_alloc seems to bump the alignment from 1 to the OS page size if needed.
  6. I'm not sure why I can't reproduce this with a single-threaded program.
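
The arithmetic behind steps 3 and 4 can be checked with the values from the warning above. This is only a sketch: it assumes the address in the log is the one the OS returned, the constants are copied from this report rather than from mimalloc headers, and fallback_could_use_large is a hypothetical stand-in for the proposed condition, not mimalloc's _mi_os_canuse_large_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants taken from this report, not from mimalloc headers. */
#define HUGE_PAGE_SIZE    (UINT64_C(2) << 20)  /* 2MiB hugepage size */
#define SEGMENT_ALIGNMENT (UINT64_C(4) << 20)  /* 4MiB alignment (0x400000 in the log) */

/* True when `value` is a multiple of `alignment`, which must be a power of two. */
static bool is_aligned(uint64_t value, uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value & (alignment - 1)) == 0;
}

/* Size requested by the over-allocation fallback as described in step 4. */
static uint64_t overalloc_size(uint64_t size, uint64_t alignment) {
  return size + alignment;
}

/* Hypothetical helper (not a mimalloc function): large pages stay usable in
   the fallback if the caller allows them and the over-allocated size is
   still a multiple of the hugepage size. */
static bool fallback_could_use_large(bool allow_large, uint64_t size,
                                     uint64_t alignment) {
  return allow_large &&
         is_aligned(overalloc_size(size, alignment), HUGE_PAGE_SIZE);
}
```

With the logged values, is_aligned(0x7FFA47600000, HUGE_PAGE_SIZE) holds while is_aligned(0x7FFA47600000, SEGMENT_ALIGNMENT) does not, and overalloc_size(0x80000000, SEGMENT_ALIGNMENT) is 0x80400000, which is still a multiple of 2MiB.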
