shim and vxdna: update to support unlimited dev no allocation#1234
shim and vxdna: update to support unlimited dev no allocation#1234wendyliang25 wants to merge 2 commits intoamd:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the shim and vxdna virt-accel implementation to support “unlimited” device BO allocations by enabling a dynamically expandable device heap made up of multiple 64MB chunks (up to a configured maximum).
Changes:
- Add DEV_HEAP VA-range reservation and fixed-address mappings in vxdna to place heap chunks contiguously in userspace VA.
- Add an expandable heap
bufferin the shim with a max size (512MB) and retry/expand behavior on allocation retry errors. - Adjust ccmd exception handling/logging behavior to avoid QEMU-side cascading failures.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/vxdna/src/vaccel_amdxdna.h |
Adds heap VA tracking in vxdna_context, BO type accessor, and changes ccmd error-wrap behavior. |
src/vxdna/src/vaccel_amdxdna.cpp |
Implements heap VA reservation + MAP_FIXED mapping for DEV_HEAP and updates error response behavior. |
src/shim/kmq/pcidev.cpp |
Creates an expandable DEV_HEAP BO on first open and retries DEV BO allocation by expanding heap on retry error. |
src/shim/buffer.h |
Introduces shared heap_page_size constant and new expandable buffer constructor signature. |
src/shim/buffer.cpp |
Implements expandable buffer constructor and refines expand() behavior for heap growth. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fe78f13 to
a550fea
Compare
maxzhen
left a comment
There was a problem hiding this comment.
Please add comment and clarify locking schema when we need to expand heap.
a550fea to
818c473
Compare
Hi @maxzhen , i have added the comments about the locking: Please check again. thanks |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/shim/kmq/pcidev.cpp:19
Debug.num_heap_pagescan be configured to 0, which makesheap_szbecome 0 and results in constructing an expandable DEV_HEAP buffer withinitial_size == 0(eventually attempting to create a 0-byte BO). Consider clampingget_heap_num_pages()to at least 1 (or explicitly rejecting 0 with a clear error) so the first-open heap allocation is always valid.
unsigned int
get_heap_num_pages()
{
static unsigned int num = 0;
if (!num)
num = xrt_core::config::detail::get_uint_value("Debug.num_heap_pages", 1);
return num;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add userspace shim support for the kernel's expandable device heap. The shim drives heap expansion by catching -EAGAIN from BO allocation ioctls and requesting heap growth before retrying. - buffer.cpp: Add bo_buf::expand() to create a new DEV_HEAP BO via CREATE_BO ioctl, then mmap it at the correct offset to maintain a contiguous virtual address range. Track expansion chunks and unmap them on destruction. - buffer.h: Add m_expand_bo_handles vector and expand() declaration. - kmq/pcidev.cpp: Catch -EAGAIN in pdev::alloc_bo(), call expand() on the heap BO to grow it, and retry the allocation. Log when expansion is triggered. Signed-off-by: Wendy Liang <wendy.liang@amd.com>
Add expandable device heap support in vxdna (QEMU virtio-accel backend) with contiguous virtual address management and robust error handling. Heap VA management: - Reserve a contiguous HEAP_MAX_SIZE (512MB) VA range with PROT_NONE on first DEV_HEAP BO creation. Use MAP_FIXED to place each expansion chunk at m_heap_base + m_heap_cur_offset, ensuring the guest kernel sees a contiguous UVA space matching its expectations. - Track heap state with m_heap_destroyed flag: once any DEV_HEAP BO is destroyed, the contiguous VA invariant is broken so all further DEV/DEV_HEAP allocations are rejected with -EINVAL. - Add bounds check: reject expansion if m_heap_cur_offset + size exceeds HEAP_MAX_SIZE with -ENOSPC. - Move m_heap_cur_offset update after add_bo() and response write to avoid permanently wasting VA budget on failed operations. Error handling: - vxdna_ccmd_error_wrap: Remove exception re-throwing after catch. Re-throwing caused QEMU to send 0x1200 error responses and skip fence signaling, hanging the guest indefinitely on sync_wait. Now errors are logged and written to the response buffer only. - write_err_rsp: Log and return instead of throwing when resp resource is missing, preventing exceptions from escaping the error wrapper. - vxdna_bo constructor: Add mmap_target parameter to support MAP_FIXED placement. Add get_type() accessor for BO type queries. Cleanup: - Clean up reserved VA range in vxdna_context destructor. - Update Doxygen for vxdna_ccmd_error_wrap to reflect non-re-throwing behavior. Signed-off-by: Wendy Liang <wendy.liang@amd.com>
818c473 to
ccdbad5
Compare
See details description in this PR: https://github.com/amd/xdna-driver/pull/1233