Add sub-RFC for interleaving memory allocation #2032
Open. Alexandr-Konovalov wants to merge 6 commits into master from dev/Alexandr-Konovalov/rfc-interleaving
+72
−0
Commit 55b3f1c: Add sub-RFC for interleaving memory allocation. (Alexandr-Konovalov)
Commit 21eed50: Update rfcs/proposed/numa_support/interleaved-allocation.md (Alexandr-Konovalov)
Commit 6ef4042: Update rfcs/proposed/numa_support/interleaved-allocation.md (Alexandr-Konovalov)
Commit 7a5c7d6: Update rfcs/proposed/numa_support/interleaved-allocation.md (Alexandr-Konovalov)
Commit 4dc0ce3: Update rfcs/proposed/numa_support/interleaved-allocation.md (Alexandr-Konovalov)
Commit b370fec: Addressing comments. (Alexandr-Konovalov)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| # API to allocate memory interleaved between NUMA nodes | ||
|
|
||
| *Note:* This document is a sub-RFC of the [umbrella RFC about improving NUMA | ||
| support](README.md). | ||
|
|
||
| ## Motivation | ||
|
|
||
| There are two kinds of NUMA-related performance bottlenecks: increased latency when | ||
| accessing a remote node, and limited bandwidth when different CPUs simultaneously access | ||
| a single NUMA memory node. A well-known way to mitigate both is to distribute memory | ||
| objects that are accessed from different CPUs across NUMA nodes in a way | ||
| that matches the access pattern. If the access pattern is complex enough, a simple | ||
| round-robin distribution can be good enough. The distribution can be achieved either by | ||
| relying on the first-touch policy of NUMA memory allocation or via a special platform-dependent | ||
| API. Generally, the latter requires less overhead. | ||
|
|
||
| ## Requirements for the public API | ||
|
|
||
| Free, stateless functions, similar to `malloc`, are sufficient for the allocation of large | ||
| blocks of memory. To guide the spreading of blocks across NUMA nodes, two additional | ||
| parameters are proposed: `interleaving step` and `list of NUMA nodes to perform allocations on`. | ||
| The allocation function then serves as a provider of memory blocks with at | ||
| least page granularity and does not employ internal caching. If high-performance, smaller | ||
| and repetitive allocations are needed, then `std::pmr` or other solutions should be used. | ||
|
|
||
| `interleaving step` is the size of the contiguous memory block taken from a particular NUMA | ||
| node; it has page granularity. Currently there are no clear use cases for a step larger | ||
| than the page size. | ||
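
To make the role of the step concrete, here is a minimal sketch of how a byte offset could map to a node under plain round-robin interleaving. Both helpers (`node_index_for_offset`, `round_up_to_page`) are hypothetical names introduced for illustration, and round-robin assignment is an assumption; the RFC does not prescribe the exact mapping.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the proposed API: returns the index (into the
// node list) of the node backing the byte at `offset`, assuming round-robin
// interleaving with a fixed step.
inline std::size_t node_index_for_offset(std::size_t offset,
                                         std::size_t interleaving_step,
                                         std::size_t node_count) {
    assert(interleaving_step > 0 && node_count > 0);
    return (offset / interleaving_step) % node_count;
}

// Hypothetical helper: rounds a requested step up to a multiple of the page size,
// reflecting that the step has page granularity.
inline std::size_t round_up_to_page(std::size_t step, std::size_t page_size) {
    assert(page_size > 0);
    return (step + page_size - 1) / page_size * page_size;
}
```

With a 4096-byte step and two nodes, offsets 0..4095 land on index 0, offsets 4096..8191 on index 1, and the pattern then repeats.
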
|
|
||
| `list of nodes for allocation` is conceptually a set of `tbb::numa_node_id`. However, | ||
| because `tbb::numa_nodes()` returns a `std::vector` and creating a `std::set` from it | ||
| requires allocation, a `std::vector` can be used. Because the semantics of `tbb::numa_node_id` are | ||
| not defined, we can't use it to construct, e.g., a bit mask. Allocation that is unbalanced | ||
| between NUMA nodes doesn't seem to have useful applications, so repeated elements in | ||
| `list of nodes` are an error. | ||
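
Since repeated nodes are an error, an implementation needs some way to detect them without building a `std::set`. The sketch below uses a pairwise scan, which is allocation-free and adequate for the small node counts of typical machines. The `numa_node_id` alias is a stand-in so the example compiles without oneTBB, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for tbb::numa_node_id so the sketch compiles without oneTBB (assumption).
using numa_node_id = int;

// Returns true if `nodes` contains the same id more than once.
// Quadratic pairwise scan: no allocation, acceptable for short node lists.
inline bool has_repeated_nodes(const std::vector<numa_node_id>& nodes) {
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (std::size_t j = i + 1; j < nodes.size(); ++j)
            if (nodes[i] == nodes[j]) return true;
    return false;
}

// Debug-build guard that an allocator could apply to its `nodes` argument.
inline void assert_valid_node_list(const std::vector<numa_node_id>& nodes) {
    assert(!has_repeated_nodes(nodes) && "repeated NUMA node in list");
}
```

Only equality comparison of ids is used here, so the check does not depend on any particular meaning of the id values.
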
|
|
||
| One use case for the `list of nodes` argument is running parallel work on a subset of | ||
| nodes and therefore getting memory only from those nodes. | ||
|
|
||
| The most common usage of the allocation function is expected to pass only the `size` parameter. | ||
| In this case, `interleaving_step` defaults to the page size and memory is allocated on all | ||
| NUMA nodes. | ||
|
|
||
| ```c++ | ||
| void *tbb::numa::alloc_interleaved(size_t size, size_t interleaving_step = 0, | ||
| const std::vector<tbb::numa_node_id> *nodes = nullptr); | ||
| void tbb::numa::free_interleaved(void *ptr, size_t size); | ||
| ``` | ||
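
The following sketch shows how calls against the proposed signatures might look. Because the functions do not exist yet, the `sketch` namespace contains stand-ins that ignore the placement hints and fall back to `malloc`/`free`; the namespace, the `numa_node_id` alias, and `demo_usage` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

namespace sketch {
using numa_node_id = int;  // stand-in for tbb::numa_node_id (assumption)

// Stand-in with the proposed signature. It ignores the placement hints and
// simply falls back to malloc, so no actual interleaving happens here.
inline void* alloc_interleaved(std::size_t size, std::size_t interleaving_step = 0,
                               const std::vector<numa_node_id>* nodes = nullptr) {
    (void)interleaving_step;
    (void)nodes;
    return std::malloc(size);
}

inline void free_interleaved(void* ptr, std::size_t size) {
    (void)size;  // an mmap-based implementation would need this for munmap
    std::free(ptr);
}
}  // namespace sketch

// Two call patterns: size only, then an explicit node subset with the default step.
inline bool demo_usage() {
    const std::size_t size = std::size_t(1) << 20;  // 1 MiB

    void* all_nodes = sketch::alloc_interleaved(size);
    if (!all_nodes) return false;
    unsigned char* bytes = static_cast<unsigned char*>(all_nodes);
    std::memset(bytes, 0xAB, size);
    assert(bytes[0] == 0xAB && bytes[size - 1] == 0xAB);
    sketch::free_interleaved(all_nodes, size);

    const std::vector<sketch::numa_node_id> subset = {0, 1};
    void* two_nodes = sketch::alloc_interleaved(size, /*interleaving_step=*/0, &subset);
    if (!two_nodes) return false;
    sketch::free_interleaved(two_nodes, size);
    return true;
}
```

Note that the caller has to keep `size` around until the block is freed, which is the cost of the stateless design.
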
|
|
||
| ## Implementation details | ||
|
|
||
| Under Linux, only allocations with default interleaving can be supported via HWLOC. Other | ||
| interleaving steps require direct libnuma usage, which creates yet another run-time | ||
| dependency. It's possible to implement allocation with a constant number of system calls with | ||
| respect to the allocation size. | ||
|
|
||
| Under Windows, starting with Windows 10 and Windows Server 2016, `VirtualAlloc2(MEM_REPLACE_PLACEHOLDER)` | ||
| can be used to provide the desired interleaving, but the number of system calls is proportional to | ||
| the allocation size. For older Windows versions, either a fallback to `VirtualAlloc` or manual touching | ||
| from threads pre-pinned to NUMA nodes can be used. | ||
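
To illustrate why the call count grows with the allocation size when each step-sized piece has to be placed separately, here is a platform-neutral sketch that only plans the pieces and makes no system calls. The `chunk` struct and `plan_chunks` are hypothetical, and the one-call-per-chunk correspondence is an assumption about how such an implementation would be structured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous piece of the allocation that should come from a single node.
struct chunk {
    std::size_t offset;      // byte offset from the start of the allocation
    std::size_t length;      // bytes in this chunk (the last one may be shorter)
    std::size_t node_index;  // index into the caller's node list
};

// Hypothetical planner: splits `size` bytes into step-sized chunks assigned
// round-robin. An implementation that places each chunk with its own call
// would issue plan.size() calls, i.e. roughly size / step of them.
inline std::vector<chunk> plan_chunks(std::size_t size, std::size_t step,
                                      std::size_t node_count) {
    assert(step > 0 && node_count > 0);
    std::vector<chunk> plan;
    plan.reserve((size + step - 1) / step);
    for (std::size_t offset = 0; offset < size; offset += step) {
        const std::size_t length = (size - offset < step) ? size - offset : step;
        plan.push_back({offset, length, (offset / step) % node_count});
    }
    return plan;
}
```

For example, 10000 bytes with a 4096-byte step over two nodes yields three chunks, the last one covering the remaining 1808 bytes.
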
|
|
||
| There is no NUMA memory support under macOS, so the implementation can only fall back to | ||
| `malloc`. | ||
|
|
||
| ## Open Questions | ||
|
|
||
| When can a non-default `interleaving step` be used? | ||
|
|
||
| The `size` argument for `free_interleaved()` appeared because what we have are wrappers over | ||
| `mmap`/`munmap` and there is no place to put the size after memory is allocated. We can | ||
| put it in, say, an internal cumap. Would that be useful? | ||
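
One way to weigh this question is to look at what a size-tracking side table would involve. The sketch below is not the author's design: it swaps in a `std::unordered_map` guarded by a `std::mutex` as a simple stand-in for whatever internal map would be used, and the `size_registry` class and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

namespace sketch {
// Hypothetical registry that remembers the size of each live block, so that a
// free function could look the size up instead of taking it as an argument.
class size_registry {
public:
    // Records the size of a freshly allocated block.
    void remember(void* ptr, std::size_t size) {
        assert(ptr != nullptr && size > 0);
        std::lock_guard<std::mutex> lock(mutex_);
        sizes_[ptr] = size;
    }

    // Removes and returns the recorded size; returns 0 for unknown pointers.
    std::size_t forget(void* ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = sizes_.find(ptr);
        if (it == sizes_.end()) return 0;
        const std::size_t size = it->second;
        sizes_.erase(it);
        return size;
    }

private:
    std::mutex mutex_;
    std::unordered_map<void*, std::size_t> sizes_;
};
}  // namespace sketch
```

The trade-off is visible even in this small form: the table adds shared state, synchronization, and its own allocations to functions that are otherwise stateless.
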