Fix trailing guard elimination for aligned large allocations #350

@t-h-i-n-k-er

allocate_pages_aligned() laid out the mapping as [guard][lead_pad][usable][trail_pad][guard], and the trail unmap removed trail_size bytes starting at base+usable_size+guard_size. That address falls within the trailing guard rather than after it: when alignment exceeds guard_size, trail_size > guard_size and the munmap carves trail_size bytes out of the trailing guard region, fully removing the original guard page and replacing it with alignment padding that happens to be PROT_NONE. As a result, the guard_size stored in the region metadata no longer describes a real guard region. deallocate_pages() and the quarantine both assume [guard][usable][guard] with guard_size bytes on each side, but the trailing guard has been eliminated, and what sits after the usable area is leftover alignment padding from the initial PROT_NONE mapping.

Change:
Restructure allocate_pages_aligned() so the trailing guard sits directly after the usable area with alignment padding beyond it: [guard][lead_pad][usable][guard][trail_pad]. When trail_size is nonzero, the old trail region (trail_size + guard_size bytes starting at base+usable_size) is unmapped and just the guard (guard_size bytes) is remapped at base+usable_size. The alignment padding beyond the guard is discarded. The total mapping size is unchanged, and the final layout is exactly [guard][usable][guard] with guard_size bytes of PROT_NONE on each side, which is what deallocate_pages() and the quarantine expect.
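The sequence described above can be sketched with plain POSIX calls. This is an illustration only, not the project's code: sketch_alloc_aligned() is a hypothetical name, it calls mmap/munmap/mprotect directly instead of the project's memory_* wrappers, and it assumes Linux, page-multiple sizes, a power-of-two alignment, and no overflow in the size sums.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the restructured sequence; not the project's code.
 * Assumes usable_size and guard_size are nonzero multiples of the page size,
 * alignment is a power-of-two multiple of the page size, and the size sums
 * do not overflow (a real allocator must check that). */
static void *sketch_alloc_aligned(size_t usable_size, size_t alignment, size_t guard_size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(usable_size && guard_size && alignment >= page);
    assert((alignment & (alignment - 1)) == 0);

    size_t alloc_size = usable_size + alignment - page; /* usable plus worst-case padding */
    size_t real_size = alloc_size + 2 * guard_size;

    char *real = mmap(NULL, real_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (real == MAP_FAILED) {
        return NULL;
    }

    uintptr_t first = (uintptr_t)real + guard_size;
    uintptr_t aligned = (first + alignment - 1) & ~((uintptr_t)alignment - 1);
    size_t lead_size = aligned - first;
    size_t trail_size = alloc_size - lead_size - usable_size;
    char *base = (char *)aligned;

    /* Drop the leading padding; guard_size bytes of PROT_NONE stay before base. */
    if (lead_size && munmap(real, lead_size) != 0) {
        goto fail;
    }

    if (trail_size) {
        /* Unmap everything after the usable area, then put back only the guard. */
        if (munmap(base + usable_size, trail_size + guard_size) != 0) {
            goto fail;
        }
        void *guard = mmap(base + usable_size, guard_size, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (guard == MAP_FAILED) {
            goto fail;
        }
    }

    /* Make the usable pages read-write only after the layout is final. */
    if (mprotect(base, usable_size, PROT_READ | PROT_WRITE) != 0) {
        goto fail;
    }
    return base;

fail:
    /* Best-effort cleanup for a single-threaded sketch; munmap() accepts
     * ranges that contain already-unmapped pages on Linux. */
    munmap(real, real_size);
    return NULL;
}
```

A caller would pass, for example, two pages of usable size, a 64-page alignment, and a one-page guard, then check that the returned pointer is aligned and writable.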

The mprotect of the usable area is moved after the unmap/remap so that the RW VMA is established after the VMA layout is finalized, avoiding a window where the usable pages sit between PROT_NONE fragments from the remap.

Why not the old approach:
The old code assumed that removing trail_size bytes from base+usable_size+guard_size trims the "end" of the mapping, but the trailing guard does not start at base+usable_size; it starts at base+usable_size+trail_size. So the munmap target is inside the guard when guard_size > trail_size (the common case for the default config) and inside the trail padding when guard_size <= trail_size (large alignment, small guard). In both cases the total PROT_NONE after the usable area happens to be guard_size bytes, which is why the bug is not immediately visible: the guard region is functionally replaced by padding that is also PROT_NONE. However, the guard_size in the region metadata is wrong: it describes a guard page that no longer exists at the recorded position, and any code that reasons about the guard's location (e.g. the in-place realloc shrink path, which installs a new guard at old+size and quarantines old+size+old_guard_size) operates on an incorrect layout.
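The address arithmetic in this comparison can be worked through with a small sketch. It only reproduces the offsets named in this issue for the [guard][lead_pad][usable][trail_pad][guard] layout. The struct, the function names, and the sample addresses in the usage note are hypothetical, and the alloc_size formula (usable_size + alignment - page_size) is an assumption about how the padding is sized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum old_unmap_target { IN_TRAILING_GUARD, IN_TRAIL_PADDING };

/* Offsets for the layout [guard][lead_pad][usable][trail_pad][guard]. */
struct aligned_layout {
    size_t lead_size;
    size_t trail_size;
    uintptr_t base;            /* start of the usable area */
    uintptr_t guard_start;     /* base + usable_size + trail_size */
    uintptr_t old_unmap_start; /* base + usable_size + guard_size */
};

/* real is the start address of the whole mapping (hypothetical input). */
static struct aligned_layout describe_layout(uintptr_t real, size_t usable_size,
                                             size_t alignment, size_t guard_size,
                                             size_t page_size) {
    assert(alignment >= page_size && (alignment & (alignment - 1)) == 0);
    struct aligned_layout l;
    /* Assumed padding budget: usable size plus worst-case alignment slack. */
    size_t alloc_size = usable_size + alignment - page_size;
    uintptr_t first = real + guard_size;
    l.base = (first + alignment - 1) & ~((uintptr_t)alignment - 1);
    l.lead_size = l.base - first;
    l.trail_size = alloc_size - l.lead_size - usable_size;
    l.guard_start = l.base + usable_size + l.trail_size;
    l.old_unmap_start = l.base + usable_size + guard_size;
    return l;
}

/* Classification as stated in the text: guard_size > trail_size puts the
 * old munmap start inside the guard, otherwise inside the trail padding. */
static enum old_unmap_target classify(const struct aligned_layout *l, size_t guard_size) {
    return guard_size > l->trail_size ? IN_TRAILING_GUARD : IN_TRAIL_PADDING;
}
```

For instance, with a mapping at the made-up address 0x10000000, 4 KiB pages, an 8 KiB usable size, 8 KiB alignment, and a 16 KiB guard, trail_size comes out as 4 KiB and the old unmap start lands past the guard start. With a 2 MiB alignment, a 4 KiB guard, and a mapping at 0x10100000, trail_size is 1 MiB and the old unmap start lands in the trail padding.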

RSS and VmSize are unchanged. VMA count goes from ~2.9 to exactly 3.0 per allocation (leading guard, usable, trailing guard), because the guard is no longer split by the trail unmap. Under CONFIG_LABEL_MEMORY the improvement is larger: the old code's trail unmap could split the trailing guard VMA, forcing prctl(PR_SET_VMA_ANON_NAME) to operate on a fragment; the new code produces a single clean guard VMA.

Verification
Builds clean under gcc -Werror, default and light configs, with and without CONFIG_LABEL_MEMORY. Test suite passes all 51 tests in every configuration.

Guard pages fault on overflow and underflow for aligned large allocations with alignments from 8 KiB to 2 MiB, verified by forking and accessing the page before and after the usable area. deallocate_pages() correctly unmaps the full [guard][usable][guard] range. The quarantine path correctly remaps the usable area as PROT_NONE and later unmaps the full guard-bounded region.
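The fork-and-access check can be sketched as follows. This is not the test used for the verification above; write_faults() and map_guarded_page() are hypothetical helpers, and the sketch assumes Linux, where touching a PROT_NONE page delivers SIGSEGV.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if writing one byte at addr kills a forked child with SIGSEGV,
 * 0 if the write succeeds, and -1 on fork/wait failure or any other outcome. */
static int write_faults(volatile char *addr) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        *addr = 1; /* faults here if addr is in a guard page */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV) {
        return 1;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return 0;
    }
    return -1;
}

/* Hypothetical fixture: one RW page with a PROT_NONE page on each side.
 * Returns a pointer to the RW page, or NULL on failure. */
static char *map_guarded_page(size_t page) {
    assert(page > 0);
    char *p = mmap(NULL, 3 * page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    if (mprotect(p + page, page, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, 3 * page);
        return NULL;
    }
    return p + page;
}
```

Against a real allocation, the same probe would be pointed at the byte just before the returned pointer and the byte just after the usable area; the fixture here only stands in for that layout.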

The memory_map_fixed() call for the remapped guard uses MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE with PROT_NONE, the same combination the quarantine path already uses through memory_map_fixed(), so the kernel behavior is well-tested.
