accel/amdxdna: implement unlimited dev bo support by wendyliang25 · Pull Request #1233 · amd/xdna-driver

wendyliang25 · 2026-04-02T20:37:47Z

implement unlimited dev bo support.
AIE2 executes instructions on a buffer which is internal to XRT. Due to TLB requirement from NPU firmware, each chunk of the memory has 64MB limitation. If we want to have more than 64MB npu heap buffer, we need to have multiple chunks.

Today the staging driver in drivers/accel/amdxdna only supports single chunk. this patch set enable multiple chunks support.
In the beginning, when the shim opens drm client, it will allocate first 64MB chunk of heap memory. At runtime, if heap buffer allocation fails, userspace shim layer will request to expand the heap, every expansion will add one 64MB chunk until it meets the requirement or hit the max heap size, and then try to allocate the buffer from heap again.
also see PR: #1234

Test with baremetal Linux with/without force_iova= module parameter, and with QEMU KVM guest.

example output from QEMU KVM:

installed_testcases/xrt_unlimited_bo/xrt_unlimited_bo
-------------------------------------------------------
    [#37] : runlist
-------------------------------------------------------
        Going to start the runlist #5
                Sync bos to device
                Sync bos to device
                Sync ofm from device
                Validating ofm  passed
                Sync ofm from device
                Validating ofm  passed
-------------------------------------------------------
    [#38] : runlist
-------------------------------------------------------
        Going to start the runlist #6
                Sync bos to device
                Sync bos to device
                Sync bos to device
                Sync ofm from device
                Validating ofm  passed
                Sync ofm from device
                Validating ofm  passed
                Sync ofm from device
                Validating ofm  passed
INFO: TEST PASSED
                                       .--.
                _______             .-"  .'
        .---u"""       """"---._  ."    %
      .'                        "--.    %
 __.--'  o                          "".. "
(____.                                  ":
 `----.__                                 ".
         `----------__                     ".
               ".   . ""--.                 ".
                 ". ". bIt ""-.              ".
                   "-.)        ""-.           ".
                                   "".         ".
                                      "".       ".
                                         "".      ".
                                            "".    ".
                      ^~^~^~^~^~^~^~^~^~^~^~^~^"".  "^~^~^~^~^
                                            ^~^~^~^  ~^~
                                                 ^~^~^~


=== PASSED: configs/runlist_bo_on_another_bank.ini ===

===============================
Results: 15 passed, 0 failed out of 15 total
===================s

Copilot

Pull request overview

This PR adds “unlimited” device BO support for the amdxdna stack by enabling the device heap to grow in 64MB chunks (up to a configured maximum) and updating both the userspace shim and kernel driver to coordinate heap expansion and firmware mappings.

Changes:

Kernel: track a per-client list of DEV_HEAP chunks, grow committed heap on demand, and notify firmware/hwctxs when new heap chunks are added.
Userspace shim: retry DEV BO allocations on -EAGAIN by expanding the heap one 64MB chunk at a time (up to 512MB).
vxdna virt layer: reserve a large VA range for a contiguous heap layout and map new heap chunks into it with fixed-address mmaps.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/vxdna/src/vaccel_amdxdna.h	Adds heap VA reservation state and adjusts teardown/error wrapping behavior.
src/vxdna/src/vaccel_amdxdna.cpp	Maps DEV_HEAP chunks into a reserved VA window; tracks committed heap growth.
src/shim/kmq/pcidev.cpp	Retries DEV BO allocation by expanding heap on `EAGAIN`.
src/shim/buffer.h	Moves heap page size constant to a shared header; adds expandable buffer ctor.
src/shim/buffer.cpp	Implements expandable heap buffer allocation via multiple DRM BO chunks.
drivers/accel/amdxdna/npu6_regs.c	Plumbs `dev_heap_max_size` for NPU6.
drivers/accel/amdxdna/npu5_regs.c	Plumbs `dev_heap_max_size` for NPU5.
drivers/accel/amdxdna/npu4_regs.c	Plumbs `dev_heap_max_size` for NPU4.
drivers/accel/amdxdna/npu1_regs.c	Plumbs `dev_heap_max_size` for NPU1.
drivers/accel/amdxdna/amdxdna_pci_drv.h	Adds heap expansion notification hook and client heap chunk tracking fields.
drivers/accel/amdxdna/amdxdna_pci_drv.c	Initializes and tears down per-client heap chunk list.
drivers/accel/amdxdna/amdxdna_gem.h	Adds per-heap-chunk list linkage in GEM objects.
drivers/accel/amdxdna/amdxdna_gem.c	Implements committed-heap allocation windowing, chunk expansion, and DEV BO vmap across chunks.
drivers/accel/amdxdna/aie2_pci.h	Adds AIE2 heap size constant and new FW API prototypes.
drivers/accel/amdxdna/aie2_pci.c	Wires heap expansion notification op into AIE2 ops.
drivers/accel/amdxdna/aie2_msg_priv.h	Adds new mailbox opcode for “add host buffer”.
drivers/accel/amdxdna/aie2_message.c	Splits host buffer mapping into MAP + ADD messages for multi-chunk heaps.
drivers/accel/amdxdna/aie2_ctx.c	Pins/unpins all heap chunks and maps/extends heap buffers per hwctx; implements notify hook.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/vxdna/src/vaccel_amdxdna.h

src/shim/buffer.cpp

drivers/accel/amdxdna/amdxdna_pci_drv.c

src/vxdna/src/vaccel_amdxdna.cpp

drivers/accel/amdxdna/aie2_message.c

drivers/accel/amdxdna/amdxdna_gem.c

maxzhen · 2026-04-02T20:54:02Z

Hi @wendyliang25 , can we split the change into multiple pull requests. The first PR is for driver change only. This will ensure the driver change does not break backward compatibility. Then, we can file another PR for user space change. I'd like to set up the rule that UMD and KMD changes are not allowed in one PR to avoid breaking changes for old UMD.

wendyliang25 · 2026-04-02T22:38:49Z

Hi @wendyliang25 , can we split the change into multiple pull requests. The first PR is for driver change only. This will ensure the driver change does not break backward compatibility. Then, we can file another PR for user space change. I'd like to set up the rule that UMD and KMD changes are not allowed in one PR to avoid breaking changes for old UMD.

Hi @maxzhen , i have split the original PR, this one only has the kernel driver change, the UMD (shim and vxdna) changes are in this PR: #1234. could you check again? thanks

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

drivers/accel/amdxdna/amdxdna_gem.c:1110

amdxdna_gem_pin_nolock()/amdxdna_gem_unpin() now no-op for AMDXDNA_BO_DEV, but call sites like amdxdna_drm_sync_bo_ioctl() rely on pinning to ensure a stable page backing before calling amdxdna_gem_vmap() (which will invoke the DEV BO .vmap path and expects heap-chunk pages/sgt to exist). With the current change, syncing/vmapping a DEV BO can fail (or map invalid pages) unless some other path already pinned all heap chunks.

Consider making DEV BO pin/unpin propagate to the underlying heap-chunk(s) (e.g., pin/unpin the chunk list, or at least the chunk range that covers abo->mm_node), or special-case the sync/vmap paths to pin the heap chunks when operating on a DEV BO.

	if (abo->type == AMDXDNA_BO_DEV)
		return 0; /* Heap chunks are pinned at expansion time */

	if (is_import_bo(abo))
		return 0;

drivers/accel/amdxdna/amdxdna_pci_drv.c:111

PR description says the driver allocates the first 64MB heap chunk when the DRM client is opened, but amdxdna_drm_open() still only initializes client state (no DEV_HEAP BO is created/queued there). The only creation path for heap chunks appears to be AMDXDNA_BO_DEV_HEAP in amdxdna_drm_create_bo_ioctl().

Either update the PR description to match the actual behavior (heap created by userspace via ioctl), or add the missing kernel-side allocation at open if that’s required for the design.

	init_srcu_struct(&client->hwctx_srcu);
	xa_init_flags(&client->hwctx_xa, XA_FLAGS_ALLOC);
	mutex_init(&client->mm_lock);
	INIT_LIST_HEAD(&client->dev_heap_chunks);

	mutex_lock(&xdna->dev_lock);
	list_add_tail(&client->node, &xdna->client_list);
	mutex_unlock(&xdna->dev_lock);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

drivers/accel/amdxdna/aie2_ctx.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

drivers/accel/amdxdna/npu1_regs.c:145

dev_heap_max_size for NPU1 is set to AIE2_DEVM_SIZE (64MB) while NPU4/5/6 use AIE2_DEVM_MAX_SIZE (512MB). If NPU1 hardware/firmware supports the same multi-chunk heap expansion, this will unintentionally disable “unlimited dev bo” on NPU1. If the smaller max is intentional for NPU1, please add a brief comment explaining why to avoid confusion.

	.dev_mem_base      = AIE2_DEVM_BASE,
	.dev_mem_size      = AIE2_DEVM_SIZE,
	.default_vbnv      = "RyzenAI-npu1",
	.dev_heap_max_size = AIE2_DEVM_SIZE,
	.device_type       = AMDXDNA_DEV_TYPE_KMQ,
	.dev_priv          = &npu1_dev_priv,
	.fw_feature_tbl    = npu1_fw_feature_table,
	.ops               = &aie2_ops,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

drivers/accel/amdxdna/aie2_ctx.c

drivers/accel/amdxdna/aie2_msg_priv.h

drivers/accel/amdxdna/aie2_message.c

drivers/accel/amdxdna/amdxdna_gem.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

drivers/accel/amdxdna/aie2_ctx.c

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

drivers/accel/amdxdna/aie2_ctx.c

drivers/accel/amdxdna/amdxdna_gem.c

Implement userspace-driven device heap expansion for the AMDXDNA accelerator driver. Instead of allocating a single large heap upfront, the heap starts with a single 64MB chunk and grows on demand when device BO allocations fail with -EAGAIN, signaling userspace shim to expand the heap and retry. The heap is managed as a linked list of chunks per client, tracked by client->dev_heap_chunks. A drm_mm allocator spans the full max heap address range, with allocations constrained to committed chunks via drm_mm_insert_node_in_range. Each chunk is a shmem-backed GEM object with its kva lazily vmapped for CPU access. DEV BO kernel address resolution walks the chunk list to locate the containing chunk. Per-hwctx heap pinning uses all-or-nothing semantics: map_heap pins every committed chunk and notifies firmware (map_host_buf for the first, add_host_buf for subsequent chunks). On failure, all chunks pinned for that hwctx are rolled back. The heap_pinned flag tracks whether a hwctx has successfully pinned the heap. New chunks are broadcast to existing hwctxs via hwctx_heap_expand. For PASID mode, a lazy UVA computation derives each chunk's expected user virtual address from the first chunk's UVA plus preceding chunk sizes, since userspace reserves a contiguous VA range for the entire heap. Key changes: - amdxdna_gem.c: Add chunk list management, drm_mm-based allocation with -EAGAIN expansion, early handle creation in amdxdna_drm_create_dev_heap_bo, DEV BO kva resolution across chunks, lazy UVA computation for PASID mode, and chunk-aware sync BO cache flush. - aie2_ctx.c: Add per-hwctx heap pin/unpin with all-or-nothing rollback (aie2_hwctx_map_heap / aie2_hwctx_release_heap), firmware heap mapping, and heap expansion notification (aie2_hwctx_heap_expand). - aie2_message.c: Refactor aie2_map_host_buf to send per-chunk messages, add aie2_add_host_buf for heap expansion. - npu*_regs.c: Add max heap size device info fields. NPU1 retains single-chunk (64MB) limit; NPU4+ support up to 512MB. Signed-off-by: Wendy Liang <wendy.liang@amd.com>

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

drivers/accel/amdxdna/amdxdna_gem.c

drivers/accel/amdxdna/aie2_ctx.c

Add bounds checking for user-supplied offset and length parameters in amdxdna_gem_dev_bo_clflush() before computing flush range. Without this, out-of-range values could cause cache flush operations on unrelated kernel memory. Signed-off-by: Wendy Liang <wendy.liang@amd.com>

The DRM core calls .close (via drm_gem_release) before .postclose, which means abo->client is set to NULL while hwctxs are still alive. If a TDR fires between .close and .postclose, the restart path calls amdxdna_obj_dma_addr() on the heap BO, dereferencing the now-NULL abo->client. DEV_HEAP BOs cannot outlive the client since they are always cleaned up in amdxdna_client_cleanup(). Skip the NULL assignment for them to keep abo->client valid throughout the driver's usage of the BO. Signed-off-by: Wendy Liang <wendy.liang@amd.com>

Set gobj->import_attach in amdxdna_gem_prime_import() so that drm_gem_shmem_vmap_locked() takes the dma-buf import path instead of calling drm_gem_get_pages() on an object with a NULL file pointer. Without this, imported BOs (e.g. from virtio-gpu in QEMU) trigger a WARNING in drm_gem_get_pages and vmap fails with -EINVAL. Signed-off-by: Wendy Liang <wendy.liang@amd.com>

Copilot AI review requested due to automatic review settings April 2, 2026 20:37

wendyliang25 requested review from houlz0507, maxzhen and xdavidz as code owners April 2, 2026 20:37

Copilot started reviewing on behalf of wendyliang25 April 2, 2026 20:39 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

wendyliang25 force-pushed the accel-dev-bo branch from 1effc8a to 3f0a9db Compare April 2, 2026 20:58

Copilot AI review requested due to automatic review settings April 2, 2026 22:32

wendyliang25 force-pushed the accel-dev-bo branch from 3f0a9db to baa0db4 Compare April 2, 2026 22:32

Copilot started reviewing on behalf of wendyliang25 April 2, 2026 22:34 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/aie2_ctx.c Outdated Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from baa0db4 to 7464942 Compare April 3, 2026 00:48

Copilot AI review requested due to automatic review settings April 3, 2026 01:33

wendyliang25 force-pushed the accel-dev-bo branch from 7464942 to 1713621 Compare April 3, 2026 01:33

Copilot started reviewing on behalf of wendyliang25 April 3, 2026 01:34 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from 1713621 to 42f33f8 Compare April 3, 2026 02:12

Copilot AI review requested due to automatic review settings April 3, 2026 06:20

wendyliang25 force-pushed the accel-dev-bo branch from 42f33f8 to de8ed00 Compare April 3, 2026 06:20

Copilot started reviewing on behalf of wendyliang25 April 3, 2026 06:21 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from de8ed00 to e0d833e Compare April 3, 2026 17:16

houlz0507 reviewed Apr 3, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings April 6, 2026 06:01

wendyliang25 force-pushed the accel-dev-bo branch from e0d833e to ebeaf65 Compare April 6, 2026 06:01

Copilot started reviewing on behalf of wendyliang25 April 6, 2026 06:03 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from ebeaf65 to 3013cd5 Compare April 7, 2026 07:25

wendyliang25 force-pushed the accel-dev-bo branch 2 times, most recently from 7372cbb to ca2ab31 Compare April 9, 2026 23:16

Copilot started reviewing on behalf of wendyliang25 April 9, 2026 23:16 View session

wendyliang25 requested a review from houlz0507 April 9, 2026 23:16

Copilot AI reviewed Apr 9, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from ca2ab31 to 67972ac Compare April 9, 2026 23:41

Copilot AI review requested due to automatic review settings April 10, 2026 17:43

wendyliang25 force-pushed the accel-dev-bo branch from 67972ac to 67e1e92 Compare April 10, 2026 17:43

Copilot started reviewing on behalf of wendyliang25 April 10, 2026 17:45 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from 67e1e92 to 484f50d Compare April 10, 2026 18:07

Copilot AI review requested due to automatic review settings April 10, 2026 18:18

wendyliang25 force-pushed the accel-dev-bo branch from 484f50d to 8808026 Compare April 10, 2026 18:18

Copilot started reviewing on behalf of wendyliang25 April 10, 2026 18:20 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/aie2_ctx.c Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch 2 times, most recently from 69084ba to f9dc266 Compare April 12, 2026 07:49

Copilot AI review requested due to automatic review settings April 12, 2026 07:49

Copilot started reviewing on behalf of wendyliang25 April 12, 2026 07:50 View session

Copilot AI reviewed Apr 12, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

drivers/accel/amdxdna/aie2_ctx.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

wendyliang25 force-pushed the accel-dev-bo branch from f9dc266 to 206dafc Compare April 12, 2026 08:27

Copilot AI review requested due to automatic review settings April 13, 2026 06:45

wendyliang25 force-pushed the accel-dev-bo branch from 206dafc to 6f90d68 Compare April 13, 2026 06:45

Copilot started reviewing on behalf of wendyliang25 April 13, 2026 06:47 View session

Copilot AI reviewed Apr 13, 2026

View reviewed changes

drivers/accel/amdxdna/amdxdna_gem.c Show resolved Hide resolved

drivers/accel/amdxdna/amdxdna_gem.c Outdated Show resolved Hide resolved

drivers/accel/amdxdna/aie2_ctx.c Show resolved Hide resolved

wendyliang25 added 3 commits April 13, 2026 00:46

wendyliang25 force-pushed the accel-dev-bo branch from 6f90d68 to 6eae18c Compare April 13, 2026 07:46

Conversation

wendyliang25 commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

maxzhen commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wendyliang25 commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

wendyliang25 commented Apr 2, 2026 •

edited

Loading

maxzhen commented Apr 2, 2026 •

edited

Loading