feat: enable THP for guest memory #6003
This commit adds transparent huge page (THP) support for guest memory, exposed as a new value for the `huge_pages` option.

Signed-off-by: Marco Marangoni <mamarang@amazon.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #6003 +/- ##
=======================================
Coverage 83.03% 83.03%
=======================================
Files 277 277
Lines 30204 30231 +27
=======================================
+ Hits 25079 25102 +23
- Misses 5125 5129 +4
Flags with carried forward coverage won't be shown.
Setting `huge_pages` to `Transparent` enables transparent huge pages for guest
memory via `madvise(MADV_HUGEPAGE)`. This allows the kernel to opportunistically
back guest memory with 2MB pages without requiring a pre-allocated hugetlbfs
With mTHP, can't the huge pages also be sizes other than 2 MiB?
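On kernels with multi-size THP (mTHP, Linux 6.8 and later), each supported size gets its own `hugepages-<size>kB` directory under `/sys/kernel/mm/transparent_hugepage/`, so the set of possible sizes depends on the host. A minimal sketch that lists the sizes a host exposes; `list_thp_sizes_kib` is a hypothetical helper, not part of Firecracker or its test framework.

```python
import os
import re

# Standard sysfs location for THP controls; per-size directories
# (hugepages-<N>kB) only exist on kernels with multi-size THP support.
THP_SYSFS = "/sys/kernel/mm/transparent_hugepage"


def list_thp_sizes_kib(sysfs_dir: str = THP_SYSFS) -> list:
    """Return the THP sizes (in KiB) exposed under sysfs, sorted ascending.

    Returns an empty list if the directory is missing or the kernel predates mTHP.
    """
    if not os.path.isdir(sysfs_dir):
        return []
    sizes = []
    for name in os.listdir(sysfs_dir):
        match = re.fullmatch(r"hugepages-(\d+)kB", name)
        if match:
            sizes.append(int(match.group(1)))
    return sorted(sizes)


print(list_thp_sizes_kib())
```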
let divisor = match self {
    // Any integer memory size expressed in MiB will be a multiple of 4 KiB.
    HugePageConfig::None => 1,
    // Note: THP technically supports memory not 2MB aligned, however that would mean
The wording is slightly confusing: here it talks about alignment (if the region is misaligned, there can be 4 KiB pages at the head too), but then it talks about enforcing the size, not the alignment.
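To make the alignment-vs-size distinction concrete, here is a small sketch that splits a region into the part coverable by whole 2 MiB pages and the head and tail that can only use base pages. It assumes 2 MiB huge pages (x86_64 with 4 KiB base pages); `split_region` is a hypothetical helper for illustration only.

```python
KIB = 1024
MIB = 1024 * KIB
HUGE_PAGE = 2 * MIB  # assumed PMD-sized THP (x86_64 with 4 KiB base pages)


def split_region(start: int, size: int) -> tuple:
    """Split [start, start + size) into (head, huge, tail) byte counts.

    head: bytes before the first 2 MiB boundary (non-zero only if start is misaligned)
    huge: bytes coverable by whole 2 MiB pages
    tail: bytes after the last 2 MiB boundary (non-zero if the end is misaligned)
    Head and tail can only be backed by base pages.
    """
    end = start + size
    first_boundary = -(-start // HUGE_PAGE) * HUGE_PAGE  # round start up
    last_boundary = (end // HUGE_PAGE) * HUGE_PAGE       # round end down
    if first_boundary >= last_boundary:
        # Not even one whole huge page fits in the region.
        return size, 0, 0
    return first_boundary - start, last_boundary - first_boundary, end - last_boundary


# Aligned start, size not a 2 MiB multiple: only a tail.
print([x // MIB for x in split_region(0, 1025 * MIB)])        # [0, 1024, 1]
# Size is a 2 MiB multiple, but start is misaligned: head and tail.
print([x // MIB for x in split_region(1 * MIB, 1024 * MIB)])  # [1, 1022, 1]
```

The second case shows why enforcing only the size does not guarantee full THP coverage: a misaligned start still leaves base pages at both ends.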
    // some pages at the tail would be forced to be 4k size. To avoid performance/fragmentation surprises,
Is everybody "aligned" (no pun intended) with this? I'm not necessarily against it, but equally I don't think having a few 4K pages at the end of the memory region is a big problem (especially since THP is not guaranteed anyway). And we already know that internal customers often use sizes that are not multiples of 2 MiB.
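For a sense of scale on "a few 4K pages at the end": with MiB-granular sizes and a 2 MiB-aligned start, the tail is at most 1 MiB. A back-of-the-envelope sketch assuming 2 MiB huge pages and 4 KiB base pages; `tail_base_pages` is a hypothetical helper.

```python
def tail_base_pages(mem_size_mib: int, huge_page_mib: int = 2, base_page_kib: int = 4) -> tuple:
    """Return (tail_pages, tail_fraction) for a guest of mem_size_mib MiB.

    Assumes guest memory starts 2 MiB-aligned, so only the tail past the last
    whole huge page has to be backed by base pages.
    """
    tail_mib = mem_size_mib % huge_page_mib
    tail_pages = tail_mib * 1024 // base_page_kib
    return tail_pages, tail_mib / mem_size_mib


pages, fraction = tail_base_pages(1025)
print(pages, f"{fraction:.3%}")  # 256 0.098%
```

Under these assumptions a 1025 MiB guest leaves 256 base pages, about 0.1% of its memory, outside THP.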
return int(stdout.strip())
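Only the last line of the existing helper is visible in the review. An alternative sketch reads the value straight from procfs instead of parsing command output, assuming the metric of interest is the `AnonHugePages` field of `/proc/<pid>/smaps_rollup` (available since Linux 4.14). Both function names are hypothetical. This field only counts anonymous THP, so memfd-backed guest memory would not show up in it.

```python
import re

# Matches lines like "AnonHugePages:     65536 kB" in /proc/<pid>/smaps(_rollup).
_ANON_HUGE_RE = re.compile(r"^AnonHugePages:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_anon_huge_pages_kb(smaps_text: str) -> int:
    """Sum every AnonHugePages field; works for both smaps and smaps_rollup."""
    return sum(int(value) for value in _ANON_HUGE_RE.findall(smaps_text))


def read_anon_huge_pages_kb(pid: int) -> int:
    """Read the total THP-backed anonymous memory (kB) of a process."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        return parse_anon_huge_pages_kb(f.read())
```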
@pytest.mark.parametrize(
Shall we check that the host has THP enabled?
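One way to check this: the global THP mode lives in `/sys/kernel/mm/transparent_hugepage/enabled`, where the kernel brackets the active choice (e.g. `always [madvise] never`). Since the feature relies on `madvise(MADV_HUGEPAGE)`, both `always` and `madvise` would work, while `never` would not. A sketch with hypothetical helper names; on mTHP kernels the per-size `hugepages-<size>kB/enabled` files can override the global value, which this sketch ignores.

```python
import re

# sysfs file listing the global THP mode, e.g. "always [madvise] never".
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text: str) -> str:
    """Return the active mode, which the kernel marks with square brackets."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"unexpected THP mode string: {text!r}")
    return match.group(1)


def host_thp_available(path: str = THP_ENABLED) -> bool:
    """True if MADV_HUGEPAGE regions can get THP, i.e. the mode is not 'never'."""
    try:
        with open(path) as f:
            mode = parse_thp_mode(f.read())
    except FileNotFoundError:
        return False  # kernel built without THP support
    return mode in ("always", "madvise")
```

In the test, a `False` result could translate into `pytest.skip(...)` rather than a failure.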
anon_huge_kb = get_anon_huge_pages_kb(vm.firecracker_pid)
if huge_pages == HugePagesConfig.TRANSPARENT:
    # With THP enabled, the kernel should have promoted some pages (let's say 100 MB out of 128MB)
Can this be fragile? Also, the check below is for 64MiB, not 100MiB (nit: let's use MiB rather than MB).
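On the consistency nit, one option is to state the threshold once, in MiB, and derive everything else from it, so the comment and the assertion cannot drift apart. A sketch with a hypothetical helper name and the 64 MiB value from the current check. It does not address the fragility question itself: how much memory actually ends up THP-backed still depends on host conditions such as memory fragmentation.

```python
# Illustrative threshold: stated once, in MiB, so the comment and the check agree.
MIN_THP_MIB = 64


def check_thp_backed(anon_huge_kb: int, min_mib: int = MIN_THP_MIB) -> None:
    """Raise AssertionError if less than min_mib MiB of memory is THP-backed.

    procfs reports sizes in "kB", which are KiB (1024 bytes).
    """
    anon_huge_mib = anon_huge_kb / 1024
    if anon_huge_mib < min_mib:
        raise AssertionError(
            f"expected >= {min_mib} MiB THP-backed, got {anon_huge_mib:.1f} MiB"
        )
```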
# Allocate and touch anonymous memory inside the guest to trigger host-side
# page faults on the guest memory region (which is what THP backs).
vm.ssh.check_output("python3 -c 'x = bytearray(128 * 1024 * 1024)'")
nit: Hmm, I didn't know that this would actually touch the memory. It looks like it's doing a memset? Shall we call it out explicitly?
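Python documents that `bytearray(n)` is initialized with null bytes, but whether that happens through an explicit memset or through lazily zeroed memory is a CPython implementation detail. If the test should not rely on it, the guest-side script can write one byte per page explicitly. A sketch assuming a 4 KiB guest page size; `touch_pages` is a hypothetical name.

```python
# Assumed guest base page size (4 KiB on x86_64).
PAGE_SIZE = 4096


def touch_pages(size_bytes: int, page_size: int = PAGE_SIZE) -> bytearray:
    """Allocate size_bytes and write one byte per page so every page is faulted in."""
    buf = bytearray(size_bytes)
    for offset in range(0, size_bytes, page_size):
        # Write rather than read: reading untouched anonymous memory can be
        # satisfied by the shared zero page without allocating a new one.
        buf[offset] = 1
    return buf
```

The same loop could be passed to `vm.ssh.check_output` as a short script in place of the bare allocation.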
Have we concluded the performance analysis on this?
Changes
Benchmarks show an improvement of TBD% (waiting on https://buildkite.com/firecracker/mamarang-nested-a-b/builds/110/list).
Note: this is only the first step towards making full use of THP. Further gains are possible by aligning guest memory to 2 MiB and by further work on the UFFD protocol.
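A common way to get a 2 MiB-aligned region from an allocator that only guarantees base-page alignment is to reserve one extra huge page of address space and round the start up inside it. A sketch of that arithmetic only, assuming 2 MiB huge pages; the helper names are hypothetical and this is not necessarily how the follow-up work will implement alignment.

```python
HUGE_PAGE = 2 * 1024 * 1024  # assumed huge page size (x86_64 with 4 KiB base pages)


def align_up(addr: int, align: int = HUGE_PAGE) -> int:
    """Round addr up to the next multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def aligned_window(reserved_start: int, size: int, align: int = HUGE_PAGE) -> tuple:
    """Pick an aligned [start, end) of `size` bytes inside a reservation of size + align bytes.

    Over-reserving by one huge page guarantees that an aligned start exists
    within the reservation; the unused slack on either side can be unmapped.
    """
    start = align_up(reserved_start, align)
    end = start + size
    assert end <= reserved_start + size + align  # always holds by construction
    return start, end


start, end = aligned_window(0x7F0000001000, 128 * 1024 * 1024)
print(hex(start), (end - start) // (1024 * 1024))  # 0x7f0000200000 128
```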
Design choices:
- `madvise` will just emit a warning
- `madvise` is called even for memfd, even though it is ignored there (`/sys/kernel/mm/transparent_hugepage/shmem_enabled` is `never` in most distros, see https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html#shmem-internal-tmpfs)

Reason
Reduce page fault frequency and TLB usage.
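The design choice and the rationale can be illustrated together. Python's `mmap.mmap.madvise` (Python 3.8+, Linux) wraps the same `madvise(2)` call, so the sketch requests THP for a private anonymous region and only warns if the hint is rejected; reading "will just emit a warning" as "a failed call is logged rather than treated as fatal" is an interpretation. `faults_to_populate` is back-of-the-envelope arithmetic for the page-fault reduction, assuming one fault per page and that every huge page allocation succeeds. This is not Firecracker's Rust code, and all names are hypothetical.

```python
import mmap
import warnings

KIB = 1024
MIB = 1024 * KIB


def advise_hugepages(region) -> bool:
    """Ask the kernel to back `region` (an mmap.mmap) with THP; warn instead of failing."""
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # only defined where the OS has it
    if advice is None:
        warnings.warn("MADV_HUGEPAGE is not available on this platform")
        return False
    try:
        region.madvise(advice)
    except OSError as err:
        # e.g. EINVAL when the kernel is built without THP support
        warnings.warn(f"madvise(MADV_HUGEPAGE) failed: {err}")
        return False
    return True


def faults_to_populate(region_bytes: int, page_bytes: int) -> int:
    """First-touch faults needed to populate a region, one per page (ceiling division)."""
    return -(-region_bytes // page_bytes)


# fileno -1 gives an anonymous mapping; MAP_PRIVATE keeps it off shmem.
with mmap.mmap(-1, 4 * MIB, flags=mmap.MAP_PRIVATE) as region:
    print("THP hint accepted:", advise_hugepages(region))

print(faults_to_populate(128 * MIB, 4 * KIB))  # 32768 with 4 KiB pages
print(faults_to_populate(128 * MIB, 2 * MIB))  # 64 with 2 MiB pages
```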
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist
`tools/devtool checkbuild --all` to verify that the PR passes build checks on all supported architectures.
`tools/devtool checkstyle` to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.
Runbook for Firecracker API changes.
integration tests.
TODO.
rust-vmm.