loongarch64: add virtio IRQ support with ACPI boot path and dynamic memory allocation #90

Open
BoneInscri wants to merge 4 commits into syswonder:main from BoneInscri:main
Conversation

@BoneInscri

Overview

This PR adds LoongArch64-specific support to hvisor-tool for two scenarios:

  1. Virtio IRQ delivery when the guest Linux boots via ACPI (no DTB available)
  2. Dynamic physical memory allocation for zone RAM at runtime, avoiding the need
    for statically reserved memory regions

Changes

driver/hvisor.c

Virtio IRQ via ACPI boot (no DTB):

  • When /hvisor_virtio_device DTB node is absent (ACPI boot), look up the CPUINTC
    IRQ domain via irq_find_matching_fwnode("CPUINTC") and map SWI0 (hwirq 0) as a
    per-CPU IRQ. The hypervisor triggers SWI0 by writing ESTAT.SIP0 in the guest CSR.
    init_IRQ() already enables ECFGF_SIP0 in CSR.ECFG, so the line is live
    immediately after handler registration.
  • Add kthread + hrtimer polling fallback: if IRQ registration fails, a 1ms hrtimer
    periodically wakes a poll thread (hvisor_poll_fn) that checks
    virtio_bridge->need_wakeup and signals the userspace daemon via eventfd.
    Uses wait_event_interruptible to avoid busy-looping.

Dynamic memory allocation (HVISOR_ZONE_M_ALLOC / HVISOR_ZONE_M_FREE):

  • hvisor_m_alloc(): allocates physically contiguous pages from the root Linux buddy
    allocator using __get_free_pages. Uses ilog2 (floor to power of 2) instead of
    get_order (ceil) to avoid over-allocation: e.g. for 752MB, get_order would round
    up to 1GB (272MB wasted), while ilog2 rounds down to 512MB and leaves the
    remainder for subsequent calls. Retries with decreasing order until allocation
    succeeds. Returns the physical address and the actual allocated size back to
    userspace via copy_to_user.
  • hvisor_m_free(): frees pages by physical address, looking up the original order
    from the per-fd tracking list.
  • All allocations are tracked in a global hvisor_alloc_list (spinlock-protected)
    keyed by the owning struct file *. hvisor_release() automatically frees all
    outstanding pages when the daemon's fd is closed, preventing memory leaks on crash.
  • Add compat shim: MAX_ORDER was renamed to MAX_PAGE_ORDER in kernel 6.3 and its
    semantics changed in 6.1; the shim handles both old and new kernels transparently.
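
The floor-versus-ceil choice described above can be illustrated with a small userspace sketch. It is not the driver code: the function names are hypothetical, and the 16 KiB page size is an assumption (a common LoongArch configuration), whereas the kernel helpers work from the real PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 14 /* assumption: 16 KiB pages */

/* Largest order such that (page_size << order) <= size, i.e. round down. */
unsigned floor_order(uint64_t size)
{
    uint64_t pages = size >> PAGE_SHIFT_ASSUMED;
    unsigned order = 0;

    assert(pages > 0); /* size must cover at least one page */
    while ((pages >> (order + 1)) != 0)
        order++;
    return order;
}

/* Smallest order such that (page_size << order) >= size, i.e. round up. */
unsigned ceil_order(uint64_t size)
{
    unsigned order = floor_order(size);

    if (((uint64_t)1 << (order + PAGE_SHIFT_ASSUMED)) < size)
        order++;
    return order;
}

/* Number of bytes covered by an allocation of the given order. */
uint64_t order_to_bytes(unsigned order)
{
    return (uint64_t)1 << (order + PAGE_SHIFT_ASSUMED);
}
```

For a 752MB request, floor_order yields a 512MB block and ceil_order a 1GB block, which matches the example in the description; the remaining 240MB would be covered by further allocations.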

Other fixes:

  • Fix CROSS_COMPILE for LoongArch: change loongarch64-unknown-linux-gnu- to
    loongarch64-linux-gnu- (matches GCC 13.2.0 package from LoongsonLab)
  • Switch hvcl hypercall from raw .word 0x002b8000 encoding to proper inline asm
    (asm volatile("hvcl 0" ...)) supported by loongarch64-linux-gnu-gcc 13.2.0

include/hvisor.h

  • Add HVISOR_ZONE_M_ALLOC / HVISOR_ZONE_M_FREE ioctl definitions (LoongArch64 only)
  • Add kmalloc_info_t struct: { __u64 pa; __u64 size; } used by both ioctls
  • Add HVISOR_CLEAR_INJECT_IRQ ioctl and HVISOR_HC_CLEAR_INJECT_IRQ hypercall code
  • Move hvisor_call() inline implementation into the header under #ifdef LOONGARCH64
    using the corrected inline asm form

tools/hvisor.c

  • Add load_chunked_image_to_memory(): loads a kernel/DTB image into dynamically
    allocated physical memory chunks, splitting across multiple RAM regions from a
    runtime-generated JSON descriptor; updates config->memory_regions in-place
  • Add zone_start_from_json_dynamic(): reads a zone config JSON + a separate RAM JSON
    (produced by virtio_start_from_json_dynamic), loads kernel and DTB images into
    dynamically allocated chunks, then calls HVISOR_ZONE_START
  • Handle boot_method == "acpi" in zone start: passes cmd_line_ptr and
    efi_system_table via kernel args for EFI boot; skips arch interrupt config parsing
    (parse_arch_config) on LoongArch64

tools/virtio/virtio.c

  • Add virtio_start_from_json_dynamic(): reads a unified virtio config JSON
    (virtio_cfg.json) describing multiple zones with memory_region arrays; for each
    region, calls HVISOR_ZONE_M_ALLOC in a loop to split the region into
    buddy-aligned chunks, maps each chunk with mmap, and records
    zone_mem[zone_id][chunk] = {virt_addr, zone0_ipa, zonex_ipa, size}; builds
    runtime RAM and SHM JSON descriptors passed to zone_start_from_json_dynamic
  • virtio_close(): iterates zone_mem and calls HVISOR_ZONE_M_FREE + munmap
    for every allocated chunk
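
As a rough illustration of how a region ends up split into buddy-aligned chunks, the sketch below simulates the loop in userspace. It is an assumption-laden model, not the PR's code: split_region, floor_pow2, MIN_CHUNK and the max_chunk cap are hypothetical names, and in the real path the kernel decides each chunk size (and may fall back to smaller orders) inside HVISOR_ZONE_M_ALLOC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_CHUNK (1ULL << 14) /* assumption: 16 KiB page granularity */

/* Largest power of two that is <= n (n must be non-zero). */
uint64_t floor_pow2(uint64_t n)
{
    uint64_t p = 1;

    assert(n != 0);
    while (p <= n / 2)
        p *= 2;
    return p;
}

/*
 * Split `size` bytes into power-of-two chunks, largest first, none larger
 * than `max_chunk` (itself a power of two). Writes the chunk sizes to `out`
 * and returns how many were written, or 0 if `size` is not page-aligned or
 * more than `cap` chunks would be needed.
 */
size_t split_region(uint64_t size, uint64_t max_chunk, uint64_t *out, size_t cap)
{
    size_t n = 0;

    assert(max_chunk >= MIN_CHUNK && floor_pow2(max_chunk) == max_chunk);
    if (size == 0 || size % MIN_CHUNK != 0)
        return 0;

    while (size > 0) {
        uint64_t chunk = floor_pow2(size);

        if (chunk > max_chunk)
            chunk = max_chunk;
        if (n == cap)
            return 0;
        out[n++] = chunk;
        size -= chunk;
    }
    return n;
}
```

With a 1GB cap, a 752MB region splits into 512MB, 128MB, 64MB, 32MB and 16MB; each chunk would then get its own mmap and its own zone_mem entry.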

examples/3a6000-loongarch64/

  • zone1_linux.json: zone 1 config for Linux guest on CPUs 4–5, boot_method: acpi,
    3 RAM regions (0x90000000/1.6GB, 0xfa000000/80MB, 0x800000000/2GB), 4 IO
    passthrough regions, PCIe config (ecam_base: 0xfe00000000)
  • zone3_linux.json, zone4_rtthread.json, zone5_npucore.json, zone2_sel4.json:
    additional zone configs for multi-zone scenarios
  • virtio_cfg.json: unified dynamic-alloc virtio config for 5 zones (linux×2, seL4,
    rt-thread, npucore) with per-zone memory regions and virtio-console device entries
  • test/virtio_cfg1.json, test/virtio_cfg2.json: test virtio configs
  • daemon.sh, start_zone/start_zone1.sh: helper scripts for starting the virtio
    daemon and zone 1

build-la.sh

  • Convenience build script: sets ARCH=loongarch, CROSS_COMPILE=loongarch64-linux-gnu-,
    LOG=LOG_INFO, and accepts KDIR as first argument

Tested on

  • LoongArch 3A6000 (ls3a6000), booting Linux zone via ACPI with virtio-blk/net,
    dynamic RAM allocation from root Linux heap

- Add SWI0 IRQ registration via CPUINTC domain for ACPI boot (no DTB)
- Add kthread + hrtimer polling fallback infrastructure (currently
  disabled) for when IRQ registration fails
- Support per-CPU IRQ registration path (hvisor_irq_is_percpu)
- Fix CROSS_COMPILE toolchain prefix for loongarch target
- Switch hvcl hypercall from raw .word encoding to inline asm with
  loongarch64-linux-gnu-gcc 13.2.0 support
- Add HVISOR_HC_GET_VIRTIO_IRQ hypercall definition
- Fix unused-variable warnings in tools/hvisor.c
- driver: add HVISOR_ZONE_M_ALLOC/M_FREE ioctls for buddy-system page
  allocation from root Linux heap; track allocations per-fd for automatic
  cleanup on release; compat shim for MAX_PAGE_ORDER rename (kernel 6.3+)
- tools/virtio: add virtio_start_from_json_dynamic() that splits large
  memory regions into buddy-aligned chunks via HVISOR_ZONE_M_ALLOC,
  builds zone RAM/SHM JSON descriptors at runtime, and maps each chunk
  with mmap; free all allocated pages in virtio_close()
- tools/hvisor: wire up new dynamic-alloc path for LoongArch64 boot
- include/hvisor.h: expose open_json_file helper and MIN macro
- examples/3a6000-loongarch64: add zone config and virtio cfg examples

Copilot AI left a comment

Pull request overview

This PR extends hvisor-tool with LoongArch64 support for (a) VirtIO IRQ delivery when the guest boots via ACPI (no DTB) and (b) dynamic runtime allocation/free of guest RAM from the root Linux buddy allocator, plus associated tooling and example configs/scripts.

Changes:

  • Add LoongArch64-specific VirtIO notification paths (ACPI/SWI0 mapping, plus intended polling fallback) and new ioctls/hypercalls for dynamic zone RAM allocation.
  • Add dynamic JSON-driven RAM chunk allocation/mapping in the virtio daemon and a dynamic zone start path that consumes a runtime RAM descriptor.
  • Add LoongArch64 build convenience script and example multi-zone JSON configs/scripts.

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| tools/virtio/virtio.c | Adds strict-aliasing-safe event idx access, dynamic RAM allocation/mmap logic, and LoongArch64-specific virtio start/close changes |
| tools/virtio/include/virtio.h | Adds helpers for avail-event access and declares JSON helper/macros used by new dynamic paths |
| tools/log.c | Changes logging to also emit to stderr in addition to syslog |
| tools/hvisor.c | Adds dynamic zone start path and chunked image loading into dynamically allocated RAM regions |
| include/hvisor.h | Adds LoongArch64 ioctls/struct for dynamic allocation and updates LoongArch64 hypercall inline asm |
| driver/hvisor.c | Adds LoongArch64 ACPI/SWI0 IRQ mapping logic, dynamic alloc/free tracking, and release-time cleanup |
| driver/Makefile | Updates LoongArch64 CROSS_COMPILE prefix |
| build-la.sh | Adds a LoongArch64 build helper script |
| examples/3a6000-loongarch64/zone1_linux.json | Adds an ACPI-boot Linux zone example with multiple RAM regions |
| examples/3a6000-loongarch64/zone3_linux.json | Adds a normal-boot Linux zone example |
| examples/3a6000-loongarch64/zone4_rtthread.json | Adds an RT-Thread zone example |
| examples/3a6000-loongarch64/virtio_cfg.json | Adds a unified dynamic virtio config example |
| examples/3a6000-loongarch64/test/virtio_cfg1.json | Adds a small dynamic virtio config test fixture |
| examples/3a6000-loongarch64/test/virtio_cfg2.json | Adds a small dynamic virtio config test fixture |
| examples/3a6000-loongarch64/daemon.sh | Adds helper to start the virtio daemon |
| examples/3a6000-loongarch64/start_zone/start_zone1.sh | Adds helper to start zone1 using the dynamic RAM JSON descriptor |

Comments suppressed due to low confidence (3)

tools/virtio/virtio.c:1260

  • In the LOONGARCH64 build, the epoll_wait/signalfd handling is compiled out with #ifndef LOONGARCH64, leaving an unconditional while (true) that repeatedly calls consume_pending_requests() with no blocking wait. This becomes a tight busy-loop and can peg a CPU core even when there are no requests. Keep a blocking wait path for LoongArch (e.g., still epoll on eventfd, relying on the kernel IRQ/poll fallback to signal it) or add an explicit sleep/backoff when idle.
    while (true) {
#ifndef LOONGARCH64
        log_info("signal_count is %d, proc_count is %d", signal_count,
                 proc_count);

        // Wait indefinitely for a signal or a kernel kick
        int nfds = epoll_wait(epoll_fd, events, 16, -1);
        ++signal_count;
        if (nfds == -1) {
            if (errno == EINTR)
                continue;
            log_error("epoll_wait failed");
            virtio_close();
            break;
        }

        for (int i = 0; i < nfds; ++i) {
            if (events[i].data.fd == sfd) {
                struct signalfd_siginfo fdsi;
                if (read(sfd, &fdsi, sizeof(fdsi)) == sizeof(fdsi)) {
                    log_info("Received termination signal %d. Exiting...",
                             fdsi.ssi_signo);
                    virtio_close();
                    return;
                }
            } else if (events[i].data.fd == efd) {
                uint64_t u;
                // Clear the eventfd counter to acknowledge the notification
                if (read(efd, &u, sizeof(uint64_t)) != sizeof(uint64_t)) {
                    continue;
                }
#endif

                // Process all pending requests until the ring is empty
                proc_count += consume_pending_requests();
#ifndef LOONGARCH64
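
One way to keep a blocking wait on LoongArch64, as the comment above suggests, is to sleep in epoll_wait on the eventfd until the kernel side signals it. The following is a minimal sketch under that assumption; make_kick_epoll and wait_for_kick are hypothetical helper names, not functions from the PR, and the real daemon also watches a signalfd.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create an eventfd plus an epoll instance watching it.
 * Returns the epoll fd (and the eventfd via efd_out), or -1 on error. */
int make_kick_epoll(int *efd_out)
{
    struct epoll_event ev;
    int efd, epfd;

    assert(efd_out != NULL);
    efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(efd);
        return -1;
    }
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
        close(epfd);
        close(efd);
        return -1;
    }
    *efd_out = efd;
    return epfd;
}

/* Sleep until efd is signalled or timeout_ms elapses (-1 waits forever).
 * Returns the eventfd counter (> 0) on a kick, 0 on timeout, -1 on error. */
long wait_for_kick(int epfd, int efd, int timeout_ms)
{
    struct epoll_event ev;
    uint64_t count;
    int nfds;

    do {
        nfds = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (nfds < 0 && errno == EINTR); /* retry if interrupted by a signal */

    if (nfds < 0)
        return -1;
    if (nfds == 0)
        return 0;
    if (ev.data.fd != efd)
        return -1;
    /* Reading resets the counter, acknowledging the notification. */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (long)count;
}
```

A main loop built on this would call wait_for_kick(epfd, efd, -1) and only then drain the request ring, so an idle daemon consumes no CPU.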

tools/virtio/virtio.c:1336

  • virtio_init() uses goto unmap when mmap() returns (void*)-1, but the unmap: label unconditionally calls munmap((void *)virtio_bridge, MMAP_SIZE). Calling munmap(MAP_FAILED, ...) is invalid and will set errno. Only munmap if the mapping succeeded, or restructure the error handling to avoid munmapping MAP_FAILED.
    virtio_bridge = (struct virtio_bridge *)mmap(
        NULL, MMAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, ko_fd, 0);
    if (virtio_bridge == (void *)-1) {
        log_error("mmap failed");
        goto unmap;
    }
    printf("[trace], virtio bridge ok!\n");

    // Initialize event_monitor used by console and net devices
    initialize_event_monitor();
    log_info("hvisor init okay!");
    printf("[trace], hvisor init okay!\n");

    return 0;
unmap:
    munmap((void *)virtio_bridge, MMAP_SIZE);
    return -1;
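
A possible restructuring, sketched here with hypothetical helper names (map_bridge and unmap_bridge are not in the PR), is to translate MAP_FAILED into NULL at the call site and make the cleanup path a no-op when nothing was mapped:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of `fd` as shared read/write memory.
 * Returns the mapping, or NULL if mmap failed (nothing is mapped then). */
void *map_bridge(int fd, size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    return p;
}

/* Unmap a mapping obtained from map_bridge(); NULL is accepted and ignored,
 * so error paths can call this unconditionally. Returns 0 on success. */
int unmap_bridge(void *p, size_t len)
{
    if (p == NULL)
        return 0;
    return munmap(p, len);
}
```

With this shape, the mmap failure branch can simply return -1, and an unmap label is only reached from steps that run after a successful mapping.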

tools/virtio/virtio.c:404

  • virtqueue_enable_notify() clears VRING_USED_F_NO_NOTIFY using vq->used_ring->flags &= !(uint16_t)VRING_USED_F_NO_NOTIFY; which applies logical-not (!) instead of bitwise-not. This will typically mask the flags with 0/1 and corrupt the flags field. Use a bitwise clear (e.g., flags &= ~VRING_USED_F_NO_NOTIFY) to clear just that bit.
void virtqueue_enable_notify(VirtQueue *vq) {
    if (vq->event_idx_enabled) {
        vq_avail_event_set(vq, vq->avail_ring->idx);
    } else {
        vq->used_ring->flags &= !(uint16_t)VRING_USED_F_NO_NOTIFY;
    }
    write_barrier();
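
The difference between the two operators is easy to see in isolation. The sketch below is illustrative only: the function names are hypothetical, and VRING_USED_F_NO_NOTIFY is defined locally as 1, the value used by the virtio ring definitions.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_USED_F_NO_NOTIFY 1 /* locally defined for the sketch */

/* Correct: bitwise NOT builds a mask with only the target bit cleared. */
uint16_t clear_no_notify(uint16_t flags)
{
    return flags & (uint16_t)~VRING_USED_F_NO_NOTIFY;
}

/* Buggy form from the excerpt: logical NOT of a non-zero value is 0,
 * so the AND wipes every bit in flags, not just NO_NOTIFY. */
uint16_t clear_no_notify_buggy(uint16_t flags)
{
    return flags & !(uint16_t)VRING_USED_F_NO_NOTIFY;
}
```

Starting from flags 0x0003, the bitwise version leaves 0x0002 while the logical-not version leaves 0x0000.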

Comment thread tools/virtio/virtio.c Outdated
Comment on lines +1295 to +1332
@@ -1292,10 +1323,13 @@ int virtio_init() {
log_error("mmap failed");
goto unmap;
}
printf("[trace], virtio bridge ok!\n");

// Initialize event_monitor used by console and net devices
initialize_event_monitor();
log_info("hvisor init okay!");
printf("[trace], hvisor init okay!\n");

Comment thread tools/virtio/virtio.c
Comment on lines 1792 to +1805
int virtio_start(int argc, char *argv[]) {
int opt, err = 0;
int err = 0;
(void)argc;

printf("virtio start, test!!!");
err = virtio_init(); // Initialize virtio dependencies
if (err)
return -1;

err = virtio_start_from_json(
argv[3]); // Start virtio devices based on virtio_cfg_*.json
#ifdef LOONGARCH64
err = virtio_start_from_json_dynamic(argv[3]);
#else
err = virtio_start_from_json(argv[3]); // Start virtio devices based on virtio_cfg_*.json
#endif
Comment thread tools/virtio/virtio.c
Comment on lines +1613 to +1711
int chunk = 0;
// Memory regions

int exist_shm_flag = 0;

printf("[trace] ready to mmap regions");
for (int j = 0; j < num_mems; j++) {
cJSON *mem_region = cJSON_GetArrayItem(memory_region_json, j);

char *region_type = cJSON_GetObjectItem(mem_region, "type")->valuestring;

zonex_ipa = strtoull(
cJSON_GetObjectItem(mem_region, "zonex_ipa")->valuestring, NULL,
16);

mem_size = strtoull(
cJSON_GetObjectItem(mem_region, "size")->valuestring, NULL, 16);

if (mem_size == 0) {
log_error("Invalid memory size");
continue;
}

// split the large memory into small chunks
__u64 offset = 0;

kmalloc_info_t kmalloc_info;
kmalloc_info.pa = 0;
kmalloc_info.size = mem_size;

while (kmalloc_info.size > 0) {
__u64 old_memsize = kmalloc_info.size;

long ret =
ioctl(ko_fd, HVISOR_ZONE_M_ALLOC, &kmalloc_info);
__u64 zone0_ipa_chunk = kmalloc_info.pa;
if (ret) {
log_error("HVISOR_ZONE_M_ALLOC ioctl failed");
close(ko_fd);
exit(1);
}
__u64 chunk_size = old_memsize - kmalloc_info.size;
__u64 zonex_ipa_chunk = zonex_ipa + offset;

__u64 virt_addr_chunk = (__u64)(uintptr_t)mmap(NULL, chunk_size, PROT_READ | PROT_WRITE, MAP_SHARED,
ko_fd, (off_t)zone0_ipa_chunk);

if (virt_addr_chunk == (__u64)(uintptr_t)(void *)-1) {
log_error("virt_addr_chunk mmap failed");
err = -1;
goto err_out;
}
zone_mem[zone_id][chunk][VIRT_ADDR] = virt_addr_chunk;// the virtual address of the chunk for zone0_ipa
zone_mem[zone_id][chunk][ZONE0_IPA] = zone0_ipa_chunk;// one-to-one mapping (zone0_ipa == zone0_hpa)
zone_mem[zone_id][chunk][ZONEX_IPA] = zonex_ipa_chunk;
zone_mem[zone_id][chunk][MEM_SIZE] = chunk_size;

// save the memory region information to zone_ram_root(all types of memory regions)
cJSON *region = cJSON_CreateObject();
char buf[128];

char type_str[128] = "ram";
cJSON_AddStringToObject(region, "type", type_str);

snprintf(buf, sizeof(buf), "0x%llx", zonex_ipa_chunk);
cJSON_AddStringToObject(region, "ipa", buf);

snprintf(buf, sizeof(buf), "0x%llx", zone0_ipa_chunk);
cJSON_AddStringToObject(region, "hpa", buf);

snprintf(buf, sizeof(buf), "0x%llx", chunk_size);
cJSON_AddStringToObject(region, "size", buf);

cJSON_AddItemToArray(memory_regions, region);

// save the shm region information to zone_shm_root(only for shm)
if (strncmp(region_type, "shm", 3) == 0) {
cJSON *region = cJSON_CreateObject();
char buf[128];

cJSON_AddStringToObject(region, "flag", cJSON_GetObjectItem(mem_region, "flag")->valuestring);

snprintf(buf, sizeof(buf), "0x%llx", zone0_ipa_chunk); // for linux, zone0_ram_ipa == zone0_ram_hpa
cJSON_AddStringToObject(region, "zone0_ram_ipa", buf); // used for linux -> zonex (in linux address space)

snprintf(buf, sizeof(buf), "0x%llx", zonex_ipa_chunk); // for linux, zone0_ram_ipa == zone0_ram_hpa
cJSON_AddStringToObject(region, "zonex_ram_ipa", buf); // used for zonex -> zonex (in zonex address space)

snprintf(buf, sizeof(buf), "0x%llx", chunk_size);
cJSON_AddStringToObject(region, "size", buf);

cJSON_AddItemToArray(shm_regions, region);

exist_shm_flag = 1;
}

offset += chunk_size; // decrease
chunk++;
}
Comment on lines +135 to +143
// avail event idx: read/write via memcpy to avoid strict-aliasing violation.
static inline uint16_t vq_avail_event_get(VirtQueue *vq) {
uint16_t val;
memcpy(&val, (const char *)&(vq)->used_ring->ring[(vq)->num], sizeof(val));
return val;
}
static inline void vq_avail_event_set(VirtQueue *vq, uint16_t val) {
memcpy((char *)&(vq)->used_ring->ring[(vq)->num], &val, sizeof(val));
}
Comment thread driver/hvisor.c Outdated
Comment on lines +108 to +117
int err;

#ifdef LOONGARCH64
// do nothing
#elif
if (virtio_irq == -1) {
pr_err("virtio device is not available\n");
return ENOTTY;
}
#endif
Comment thread build-la.sh
Comment on lines +9 to +11
KDIR=${1:-/media/boneinscri/Data/linux-v6.19/linux}
shift 2>/dev/null || true

Comment thread tools/virtio/virtio.c
Comment on lines +1555 to +1563
(void)zone0_ipa;

unsigned long long mem_size;
buffer = read_file(json_path, &file_size);
buffer[file_size] = '\0';

// Read zones
cJSON *root = cJSON_Parse(buffer);
cJSON *zones_json = cJSON_GetObjectItem(root, "zones");
Comment thread tools/log.c
Comment on lines +77 to +81
/* Output to syslog */
syslog(syslog_levels[level], "%s:%d: %s", file, line, buf);

/* Also output to stderr so errors are visible in terminal */
fprintf(stderr, "[%s] %s:%d: %s\n", level_strings[level], file, line, buf);
Comment thread tools/hvisor.c
Comment on lines +1474 to +1488
static int zone_list(int argc, char *argv[] __attribute__((unused))) {


if (argc != 0) {
help(1);
}
__u64 cnt = CONFIG_MAX_ZONES;
zone_info_t *zones = malloc(sizeof(zone_info_t) * cnt);
zone_list_args_t args = {cnt, zones};
// printf("zone_list: cnt %llu, zones %p\n", cnt, zones);
printf("zone_list: cnt %llu, zones %p\n", cnt, zones);
int fd = open_dev();
printf("zone_list, step1\n");
int ret = ioctl(fd, HVISOR_ZONE_LIST, &args);
printf("[trace] zone_list: ret = %d\n", ret);

Comment thread driver/Makefile
     CROSS_COMPILE := riscv64-unknown-linux-gnu-
 else ifeq ($(ARCH), loongarch)
-    CROSS_COMPILE := loongarch64-unknown-linux-gnu-
+    CROSS_COMPILE := loongarch64-linux-gnu-

@enkerewpo

Member

please format the code (see ci / code style check)

@enkerewpo

Member

Why are we using dynamic memory allocation for non-root zones? If we can shrink the boot memory of root Linux (e.g. from 32G to 4G) by editing the UEFI/ACPI tables, along with some allocated devices (currently all devices are visible to root Linux), then we won't need this dynamic allocation design.


3 participants