feat: enable THP for guest memory#6003

Draft
marco-marangoni wants to merge 1 commit into firecracker-microvm:main from marco-marangoni:thp

@marco-marangoni marco-marangoni commented Jun 26, 2026

Changes

This commit adds transparent huge page (THP) support for guest memory, selected through a new value for the huge_pages option.

Benchmarks show an improvement of TBD% (waiting on https://buildkite.com/firecracker/mamarang-nested-a-b/builds/110/list).

Note: this is only the first step towards making full use of THP. Additional benefit can be gained by aligning guest memory to 2 MiB and by working on the UFFD protocol.

Design choices:

  • require guest memory to be a multiple of 2 MiB: this is not technically required, but I think it will reduce surprises for end users when a VM configuration leaves some pages that cannot be merged into huge pages
  • not enabled by default: THP has some impact on the host, and feedback from internal customers was that they want to enable THP on their own terms. We can re-evaluate the default in future Firecracker versions
  • requested optimistically: there is no guarantee that the host will actually use huge pages.
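
A minimal sketch of the first and last design choices, written in Python (the language of the integration tests) rather than Firecracker's Rust. The helper names `validate_mem_size_mib` and `request_thp` are hypothetical and not taken from this PR; the only real interfaces used are Python's `mmap.mmap.madvise` and the Linux-only `mmap.MADV_HUGEPAGE` constant. The 2 MiB huge page size is an assumption that holds for x86_64 with 4 KiB base pages.

```python
import mmap

THP_SIZE_MIB = 2  # assumption: 2 MiB PMD-sized huge pages


def validate_mem_size_mib(mem_size_mib: int) -> None:
    """Reject guest memory sizes that are not a multiple of 2 MiB (hypothetical helper)."""
    if mem_size_mib <= 0 or mem_size_mib % THP_SIZE_MIB != 0:
        raise ValueError(
            f"mem_size_mib={mem_size_mib} is not a positive multiple of {THP_SIZE_MIB}"
        )


def request_thp(region: mmap.mmap) -> bool:
    """Optimistically ask the kernel for THP; return False if the hint was rejected."""
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # only defined on Linux
    if advice is None or not hasattr(region, "madvise"):
        return False
    try:
        region.madvise(advice)
    except OSError:
        # For example EINVAL when the kernel was built without THP support.
        return False
    return True  # the hint was accepted; promotion to huge pages is still not guaranteed


validate_mem_size_mib(4)
guest_mem = mmap.mmap(-1, 4 * 1024 * 1024)  # anonymous mapping standing in for guest memory
hint_accepted = request_thp(guest_mem)
```

Even when `request_thp` returns `True`, the kernel may still back the region with base pages, which is why the request is described as optimistic.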

Reason

Reduce page fault frequency and TLB pressure.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@marco-marangoni marco-marangoni requested a review from ilstam June 26, 2026 16:26
This commit adds THP for the guest memory, with a new value for the
huge_pages option.

Signed-off-by: Marco Marangoni <mamarang@amazon.com>

codecov Bot commented Jun 26, 2026

Codecov Report

❌ Patch coverage is 84.84848% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.03%. Comparing base (ce26972) to head (743de82).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/vmm/src/vstate/memory.rs | 80.00% | 3 Missing ⚠️ |
| src/vmm/src/vmm_config/machine_config.rs | 77.77% | 2 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6003   +/-   ##
=======================================
  Coverage   83.03%   83.03%           
=======================================
  Files         277      277           
  Lines       30204    30231   +27     
=======================================
+ Hits        25079    25102   +23     
- Misses       5125     5129    +4     
Flag Coverage Δ
5.10-m5n.metal 83.34% <84.84%> (-0.01%) ⬇️
5.10-m6a.metal 82.69% <84.84%> (+<0.01%) ⬆️
5.10-m6g.metal 80.00% <84.84%> (+<0.01%) ⬆️
5.10-m6i.metal 83.33% <84.84%> (-0.01%) ⬇️
5.10-m7a.metal-48xl 82.68% <84.84%> (+<0.01%) ⬆️
5.10-m7g.metal 80.00% <84.84%> (+<0.01%) ⬆️
5.10-m7i.metal-24xl 83.31% <84.84%> (-0.01%) ⬇️
5.10-m7i.metal-48xl 83.31% <84.84%> (-0.01%) ⬇️
5.10-m8g.metal-24xl 80.00% <84.84%> (+<0.01%) ⬆️
5.10-m8g.metal-48xl 80.00% <84.84%> (+<0.01%) ⬆️
5.10-m8i.metal-48xl 83.31% <84.84%> (-0.01%) ⬇️
5.10-m8i.metal-96xl 83.32% <84.84%> (+<0.01%) ⬆️
6.1-m5n.metal 83.36% <84.84%> (+<0.01%) ⬆️
6.1-m6a.metal 82.71% <84.84%> (-0.01%) ⬇️
6.1-m6g.metal 80.00% <84.84%> (+<0.01%) ⬆️
6.1-m6i.metal 83.36% <84.84%> (+<0.01%) ⬆️
6.1-m7a.metal-48xl 82.70% <84.84%> (+<0.01%) ⬆️
6.1-m7g.metal 80.00% <84.84%> (+<0.01%) ⬆️
6.1-m7i.metal-24xl 83.37% <84.84%> (-0.01%) ⬇️
6.1-m7i.metal-48xl 83.37% <84.84%> (+<0.01%) ⬆️
6.1-m8g.metal-24xl 80.00% <84.84%> (-0.01%) ⬇️
6.1-m8g.metal-48xl 80.00% <84.84%> (+<0.01%) ⬆️
6.1-m8i.metal-48xl 83.38% <84.84%> (+<0.01%) ⬆️
6.1-m8i.metal-96xl 83.38% <84.84%> (+<0.01%) ⬆️
6.18-m5n.metal 83.37% <84.84%> (+<0.01%) ⬆️
6.18-m6a.metal 82.71% <84.84%> (+<0.01%) ⬆️
6.18-m6g.metal 80.00% <84.84%> (+<0.01%) ⬆️
6.18-m6i.metal 83.36% <84.84%> (+<0.01%) ⬆️
6.18-m7a.metal-48xl 82.70% <84.84%> (+<0.01%) ⬆️
6.18-m7g.metal 80.00% <84.84%> (+<0.01%) ⬆️
6.18-m7i.metal-24xl 83.37% <84.84%> (+<0.01%) ⬆️
6.18-m7i.metal-48xl 83.38% <84.84%> (+<0.01%) ⬆️
6.18-m8g.metal-24xl 80.00% <84.84%> (+<0.01%) ⬆️
6.18-m8g.metal-48xl 80.00% <84.84%> (+<0.01%) ⬆️
6.18-m8i.metal-48xl 83.38% <84.84%> (+<0.01%) ⬆️
6.18-m8i.metal-96xl 83.38% <84.84%> (+<0.01%) ⬆️

Comment thread docs/hugepages.md

Setting `huge_pages` to `Transparent` enables transparent huge pages for guest
memory via `madvise(MADV_HUGEPAGE)`. This allows the kernel to opportunistically
back guest memory with 2MB pages without requiring a pre-allocated hugetlbfs

With mTHP, can't it also be sizes other than 2 MiB?
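
One way to see which THP sizes a host exposes: to my knowledge, kernels with multi-size THP (Linux 6.8 and later) publish one `hugepages-<size>kB` directory per supported size under `/sys/kernel/mm/transparent_hugepage/`. The sketch below only enumerates those directory names; the function names are hypothetical and the sysfs layout should be checked against the target kernel's documentation.

```python
import glob
import os
import re

SYSFS_THP = "/sys/kernel/mm/transparent_hugepage"


def size_kb_from_dirname(name: str):
    """Extract the size in KiB from a name like 'hugepages-2048kB', or None."""
    match = re.fullmatch(r"hugepages-(\d+)kB", name)
    return int(match.group(1)) if match else None


def list_mthp_sizes_kb(root: str = SYSFS_THP):
    """Return the sorted THP sizes (KiB) the host advertises; empty if none are exposed."""
    sizes = []
    for path in glob.glob(os.path.join(root, "hugepages-*kB")):
        size = size_kb_from_dirname(os.path.basename(path))
        if size is not None:
            sizes.append(size)
    return sorted(sizes)
```

An empty result means the directories are absent, which is expected on kernels without mTHP.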

let divisor = match self {
    // Any integer memory size expressed in MiB will be a multiple of 4 KiB (4096 bytes).
    HugePageConfig::None => 1,
    // Note: THP technically supports memory not 2MB aligned, however that would mean

The wording is slightly confusing because here it talks about alignment (if it's misaligned there can also be 4KiB pages at the head too), but then it talks about enforcing size, not alignment.
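
The difference between size and alignment can be made concrete with a little arithmetic. The sketch below splits a region into the bytes before the first 2 MiB boundary (head), the 2 MiB-aligned middle that could be backed by huge pages, and the bytes after the last boundary (tail). `thp_split` is a hypothetical helper and the 2 MiB huge page size is an assumption.

```python
MIB = 1024 * 1024
HUGE = 2 * MIB  # assumption: 2 MiB huge pages


def thp_split(start: int, size: int):
    """Split [start, start + size) into (head, huge, tail) byte counts.

    head and tail can only be backed by base pages; huge is the 2 MiB-aligned middle.
    """
    end = start + size
    first_aligned = -(-start // HUGE) * HUGE  # round start up to a 2 MiB boundary
    last_aligned = (end // HUGE) * HUGE       # round end down to a 2 MiB boundary
    if first_aligned >= last_aligned:
        return size, 0, 0  # no aligned 2 MiB range fits; everything is reported as head
    return first_aligned - start, last_aligned - first_aligned, end - last_aligned


aligned_odd_size = thp_split(0, 129 * MIB)        # aligned start, size not a 2 MiB multiple
misaligned_even_size = thp_split(MIB, 128 * MIB)  # size is a 2 MiB multiple, start is not
```

The second case shows that enforcing a 2 MiB multiple on the size alone does not remove base pages: a region that starts 1 MiB off a boundary still has 1 MiB at the head and 1 MiB at the tail that cannot be huge.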

    // Any integer memory size expressed in MiB will be a multiple of 4 KiB (4096 bytes).
    HugePageConfig::None => 1,
    // Note: THP technically supports memory not 2MB aligned, however that would mean
    // some pages at the tail would be forced to be 4k size. To avoid performance/fragmentation surprises,

Is everybody "aligned" (no pun intended) with this? I'm not necessarily against this, but equally I don't think it's a big problem having a few 4K pages at the end of the memory region (especially since THP is not a guarantee anyway). And we already know that internal customers often use non 2MiB multiples.

return int(stdout.strip())


@pytest.mark.parametrize(

Shall we check that the host has THP enabled?
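
A possible shape for such a check, assuming the host exposes the usual `/sys/kernel/mm/transparent_hugepage/enabled` file, whose content marks the active mode in brackets (for example `always [madvise] never`). The function names are hypothetical and not part of this PR.

```python
import pathlib

THP_ENABLED = pathlib.Path("/sys/kernel/mm/transparent_hugepage/enabled")


def selected_thp_mode(text: str) -> str:
    """Return the bracketed mode from text such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no selected mode in {text!r}")


def host_allows_madvise_thp(path: pathlib.Path = THP_ENABLED) -> bool:
    """True if the host honours MADV_HUGEPAGE, i.e. the mode is 'always' or 'madvise'."""
    try:
        return selected_thp_mode(path.read_text()) in ("always", "madvise")
    except (OSError, ValueError):
        return False  # file missing or unparsable: treat THP as unavailable
```

In the test, a call such as `pytest.skip(...)` guarded by `not host_allows_madvise_thp()` would avoid failures on hosts where THP is set to `never`.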

anon_huge_kb = get_anon_huge_pages_kb(vm.firecracker_pid)

if huge_pages == HugePagesConfig.TRANSPARENT:
    # With THP enabled, the kernel should have promoted some pages (let's say 100 MB out of 128MB)

Can this be fragile? Also, the check below is for 64MiB, not 100MiB (nit: let's use MiB rather than MB).
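
For reference, a sketch of how the counter could be read and compared with the threshold expressed in one unit. This is not the PR's `get_anon_huge_pages_kb` implementation, which is not shown here; it assumes the `AnonHugePages:` field of `/proc/<pid>/smaps_rollup`, and the function names and sample text are illustrative only.

```python
import re


def anon_huge_pages_kb(smaps_text: str) -> int:
    """Sum every 'AnonHugePages: <n> kB' line in smaps or smaps_rollup text."""
    values = re.findall(r"^AnonHugePages:\s+(\d+) kB$", smaps_text, re.MULTILINE)
    return sum(int(value) for value in values)


def read_anon_huge_pages_kb(pid: int) -> int:
    """Read the counter for a live process (Linux only)."""
    with open(f"/proc/{pid}/smaps_rollup", encoding="ascii") as smaps:
        return anon_huge_pages_kb(smaps.read())


MIN_PROMOTED_KIB = 64 * 1024  # 64 MiB, the threshold the review says the test checks

# Illustrative sample, not real output from this PR's test run.
sample = (
    "Rss:              131072 kB\n"
    "AnonHugePages:     98304 kB\n"
    "ShmemPmdMapped:        0 kB\n"
)
promoted_kib = anon_huge_pages_kb(sample)
enough_promoted = promoted_kib >= MIN_PROMOTED_KIB
```

Because promotion depends on host fragmentation and khugepaged timing, any fixed threshold of this kind can still fail intermittently; a lower bound or a retry loop would reduce that risk.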


# Allocate and touch anonymous memory inside the guest to trigger host-side
# page faults on the guest memory region (which is what THP backs).
vm.ssh.check_output("python3 -c 'x = bytearray(128 * 1024 * 1024)'")

nit: Hmm, I didn't know that this would actually touch the memory. It looks like it's doing a memset? Shall we explicitly call it out?
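
One way to make the touch explicit is to write a byte to every page instead of relying on how `bytearray` initialises its buffer. As far as I know CPython zero-fills the buffer, but that is an implementation detail. The helper below is a hypothetical sketch, and the 4 KiB page size is an assumption about the guest.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages in the guest


def touch_pages(size: int, page_size: int = PAGE_SIZE) -> bytearray:
    """Allocate `size` bytes and write one byte per page so every page is faulted in."""
    buf = bytearray(size)
    for offset in range(0, size, page_size):
        buf[offset] = 1  # explicit write: does not rely on how bytearray initialises memory
    return buf


buf = touch_pages(8 * 1024 * 1024)
```

The returned buffer must stay referenced for as long as the pages need to remain resident in the guest.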

ilstam commented Jun 29, 2026

Have we concluded the performance analysis on this?
