[VFIO] add basic implementation #5870

Open
ShadowCurse wants to merge 23 commits into firecracker-microvm:main from ShadowCurse:vfio_with_dependencies

Conversation

@ShadowCurse ShadowCurse commented May 8, 2026

Contributor

Changes

Add a basic implementation of VFIO device pass-through.
The current version only allows devices to be added before VM boot.
Other limitations:

  • Only devices with MSI-X interrupts are supported.
  • No INTx interrupt support
  • No ROM BAR/IO BAR support
  • No BAR relocation/resizing

Reason

Provide a way to pass physical PCI devices through to a VM.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this May 8, 2026

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 30.75630% with 824 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.01%. Comparing base (7503f25) to head (d8f211a).

⚠️ Current head d8f211a differs from pull request most recent head b3e961b

Please upload reports for the commit b3e961b to get more accurate results.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmm/src/vfio.rs | 21.25% | 600 Missing ⚠️ |
| src/vmm/src/device_manager/pci_mngr.rs | 0.00% | 69 Missing ⚠️ |
| src/vmm/src/rpc_interface.rs | 8.33% | 44 Missing ⚠️ |
| src/vmm/src/device_manager/mod.rs | 2.56% | 38 Missing ⚠️ |
| src/vmm/src/lib.rs | 4.16% | 23 Missing ⚠️ |
| src/vmm/src/resources.rs | 35.00% | 13 Missing ⚠️ |
| .../firecracker/src/api_server/request/hotplug/mod.rs | 0.00% | 8 Missing ⚠️ |
| src/vmm/src/pci/mod.rs | 85.45% | 8 Missing ⚠️ |
| src/vmm/src/builder.rs | 63.15% | 7 Missing ⚠️ |
| src/vmm/src/pci/msix.rs | 79.31% | 6 Missing ⚠️ |

... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5870      +/-   ##
==========================================
- Coverage   83.08%   81.01%   -2.08%     
==========================================
  Files         277      280       +3     
  Lines       30201    31295    +1094     
==========================================
+ Hits        25094    25355     +261     
- Misses       5107     5940     +833     
Flag Coverage Δ
5.10-m5n.metal 81.11% <30.75%> (-2.31%) ⬇️
5.10-m6a.metal 80.41% <30.75%> (-2.36%) ⬇️
5.10-m6g.metal 77.81% <30.75%> (-2.25%) ⬇️
5.10-m6i.metal 81.11% <30.75%> (-2.31%) ⬇️
5.10-m7a.metal-48xl 80.40% <30.75%> (-2.36%) ⬇️
5.10-m7g.metal 77.82% <30.75%> (-2.25%) ⬇️
5.10-m7i.metal-24xl 81.09% <30.75%> (-2.31%) ⬇️
5.10-m7i.metal-48xl 81.08% <30.75%> (-2.31%) ⬇️
5.10-m8g.metal-24xl 77.81% <30.75%> (-2.25%) ⬇️
5.10-m8g.metal-48xl 77.81% <30.75%> (-2.25%) ⬇️
5.10-m8i.metal-48xl 81.08% <30.75%> (-2.31%) ⬇️
5.10-m8i.metal-96xl 81.09% <30.75%> (-2.30%) ⬇️
6.1-m5n.metal 81.13% <30.75%> (-2.31%) ⬇️
6.1-m6a.metal 80.43% <30.75%> (-2.36%) ⬇️
6.1-m6g.metal 77.81% <30.75%> (-2.25%) ⬇️
6.1-m6i.metal 81.13% <30.75%> (-2.31%) ⬇️
6.1-m7a.metal-48xl 80.43% <30.75%> (-2.35%) ⬇️
6.1-m7g.metal 77.81% <30.75%> (-2.25%) ⬇️
6.1-m7i.metal-24xl 81.15% <30.75%> (-2.31%) ⬇️
6.1-m7i.metal-48xl 81.15% <30.75%> (-2.31%) ⬇️
6.1-m8g.metal-24xl 77.81% <30.75%> (-2.25%) ⬇️
6.1-m8g.metal-48xl 77.81% <30.75%> (-2.25%) ⬇️
6.1-m8i.metal-48xl 81.14% <30.75%> (-2.32%) ⬇️
6.1-m8i.metal-96xl 81.15% <30.75%> (-2.30%) ⬇️
6.18-m5n.metal 81.13% <30.75%> (-2.31%) ⬇️
6.18-m6a.metal 80.43% <30.75%> (-2.36%) ⬇️
6.18-m6g.metal 77.81% <30.75%> (-2.25%) ⬇️
6.18-m6i.metal 81.14% <30.75%> (-2.30%) ⬇️
6.18-m7a.metal-48xl 80.42% <30.75%> (-2.36%) ⬇️
6.18-m7g.metal 77.81% <30.75%> (-2.25%) ⬇️
6.18-m7i.metal-24xl 81.15% <30.75%> (-2.30%) ⬇️
6.18-m7i.metal-48xl 81.15% <30.75%> (-2.30%) ⬇️
6.18-m8g.metal-24xl 77.81% <30.75%> (-2.25%) ⬇️
6.18-m8g.metal-48xl 77.82% <30.75%> (-2.25%) ⬇️
6.18-m8i.metal-48xl 81.15% <30.75%> (-2.31%) ⬇️
6.18-m8i.metal-96xl 81.15% <30.75%> (-2.30%) ⬇️

@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 11 times, most recently from f6d6fea to 50e789e Compare May 14, 2026 16:29
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 12 times, most recently from 2f84f01 to a21e87e Compare May 27, 2026 13:32
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 4 times, most recently from b2ea5ea to 528e62b Compare May 29, 2026 11:53
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from 528e62b to efa67e3 Compare June 8, 2026 13:25
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 5 times, most recently from 234ee05 to d8f211a Compare June 19, 2026 15:45
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from d8f211a to c895e2b Compare June 26, 2026 17:13
With the addition of VFIO, we need a way for a user to tell us which
PCIe device they want to pass to the VM. We will do this with an SBDF
(segment, bus, device, function). Since an SBDF can be written in
several ways, we try to parse them all. PciSBDF will be used in the next
commit as part of the user-provided VFIO configuration.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
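To illustrate the SBDF parsing described in the commit message above, here is a minimal sketch. It is not the PR's `PciSBDF` code: the `Sbdf` type, its fields, the error enum, and the two accepted spellings (`SSSS:BB:DD.F` and the short `BB:DD.F` with segment 0) are assumptions made for this example, and the real parser may accept more or different forms.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the PR's `PciSBDF`; the field layout is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Sbdf {
    pub segment: u16,
    pub bus: u8,
    pub device: u8,   // 5 bits on the wire: 0..=31
    pub function: u8, // 3 bits on the wire: 0..=7
}

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SbdfParseError {
    Format, // wrong number of separators
    Number, // a field is not valid hexadecimal
    Range,  // device or function does not fit its bit width
}

impl FromStr for Sbdf {
    type Err = SbdfParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let hex_u8 = |t: &str| u8::from_str_radix(t, 16).map_err(|_| SbdfParseError::Number);

        // Split off the function after the last '.'.
        let (rest, func) = s.trim().rsplit_once('.').ok_or(SbdfParseError::Format)?;
        let parts: Vec<&str> = rest.split(':').collect();

        // Accept "SSSS:BB:DD" or the short "BB:DD" (segment defaults to 0).
        let (seg, bus, dev) = match parts.as_slice() {
            [bus, dev] => ("0", *bus, *dev),
            [seg, bus, dev] => (*seg, *bus, *dev),
            _ => return Err(SbdfParseError::Format),
        };

        let segment = u16::from_str_radix(seg, 16).map_err(|_| SbdfParseError::Number)?;
        let bus = hex_u8(bus)?;
        let device = hex_u8(dev)?;
        let function = hex_u8(func)?;
        if device > 0x1f || function > 0x7 {
            return Err(SbdfParseError::Range);
        }
        Ok(Sbdf { segment, bus, device, function })
    }
}
```

With this sketch, `"0000:01:00.0".parse::<Sbdf>()` and `"01:00.0".parse::<Sbdf>()` yield the same value, while a device number above `1f` is rejected.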
Add the VfioConfig and VfioConfigs types for describing VFIO device
configuration. Wire them into VmResources and VmmConfig so that VFIO
devices can be specified before boot. Actual device setup will be added
in later commits.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add PUT /vfio/{id} API endpoint for configuring VFIO passthrough
devices.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Upcoming VFIO code will use ArrayVec for BAR mappings and MSI-X hole
tracking, so make it a required dependency.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices can expose both 32-bit and 64-bit BARs. The existing
Bars type only handled 64-bit BARs. Add set_bar_32() for 32-bit
BARs and get_bar_addr() that works with both widths. Additionally add
get_bar_size() and is_64bit() utility functions.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
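As background for the 32-bit versus 64-bit distinction in the commit message above, the following sketch decodes raw memory BAR register values according to the standard PCI encoding (bit 0 selects I/O space, bits 2:1 give the memory type, the low 4 bits are flags). The free functions and their names are hypothetical and exist only for this example; they are not the methods of the PR's `Bars` type.

```rust
const BAR_IO_SPACE: u32 = 0x1; // bit 0: 1 = I/O BAR, 0 = memory BAR
const BAR_MEM_TYPE_MASK: u32 = 0x6; // bits 2:1: memory BAR type
const BAR_MEM_TYPE_64: u32 = 0x4; // type 0b10 = 64-bit
const BAR_MEM_ADDR_MASK: u32 = !0xF; // low 4 bits of a memory BAR are flags

/// True if the register describes a memory (not I/O) BAR.
pub fn is_memory_bar(low: u32) -> bool {
    (low & BAR_IO_SPACE) == 0
}

/// True if the register describes a 64-bit memory BAR, which also
/// consumes the next BAR register as its upper 32 address bits.
pub fn is_64bit(low: u32) -> bool {
    is_memory_bar(low) && (low & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64
}

/// Base address of a memory BAR. `high` is the following BAR register
/// and is ignored for 32-bit BARs.
pub fn bar_addr(low: u32, high: u32) -> u64 {
    let lo = u64::from(low & BAR_MEM_ADDR_MASK);
    if is_64bit(low) {
        (u64::from(high) << 32) | lo
    } else {
        lo
    }
}

/// Size of a memory BAR, computed from the values read back after
/// writing all ones to the register(s) (the usual sizing probe).
pub fn bar_size_from_probe(probe_low: u32, probe_high: u32) -> u64 {
    if is_64bit(probe_low) {
        let v = (u64::from(probe_high) << 32) | u64::from(probe_low & BAR_MEM_ADDR_MASK);
        (!v).wrapping_add(1)
    } else {
        u64::from((!(probe_low & BAR_MEM_ADDR_MASK)).wrapping_add(1))
    }
}
```

For example, a register pair of `0xFE00_000C` and `0x1` decodes to a 64-bit BAR at `0x1_FE00_0000`, while a probe read-back of `0xFFFF_C000` corresponds to a 16 KiB 32-bit BAR.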
VFIO passthrough needs to emulate the MSI-X table and PBA regions
within device BARs. Add accessor methods to MsixCap for extracting
table/PBA BIR, offset, size, and the enabled/masked status bits.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
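The fields named in the commit message above (table/PBA BIR, offset, size, enabled and masked bits) follow the MSI-X capability layout from the PCI specification. The sketch below shows how they can be extracted from the raw registers. `MsixCapRegs` and its method names are hypothetical and are not the accessors added to the PR's `MsixCap`.

```rust
/// Raw MSI-X capability registers (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct MsixCapRegs {
    pub msg_ctl: u16, // Message Control
    pub table: u32,   // Table Offset / BIR
    pub pba: u32,     // PBA Offset / BIR
}

impl MsixCapRegs {
    /// Number of table entries; bits 10:0 encode N - 1.
    pub fn table_len(&self) -> u16 {
        (self.msg_ctl & 0x7FF) + 1
    }
    /// Function Mask, bit 14.
    pub fn function_masked(&self) -> bool {
        (self.msg_ctl & (1 << 14)) != 0
    }
    /// MSI-X Enable, bit 15.
    pub fn enabled(&self) -> bool {
        (self.msg_ctl & (1 << 15)) != 0
    }
    /// Index of the BAR holding the table; low 3 bits.
    pub fn table_bir(&self) -> u8 {
        (self.table & 0x7) as u8
    }
    /// Byte offset of the table within that BAR (8-byte aligned).
    pub fn table_offset(&self) -> u32 {
        self.table & !0x7
    }
    /// Index of the BAR holding the PBA; low 3 bits.
    pub fn pba_bir(&self) -> u8 {
        (self.pba & 0x7) as u8
    }
    /// Byte offset of the PBA within that BAR (8-byte aligned).
    pub fn pba_offset(&self) -> u32 {
        self.pba & !0x7
    }
    /// Table size in bytes; each entry is 16 bytes.
    pub fn table_bytes(&self) -> u32 {
        u32::from(self.table_len()) * 16
    }
    /// PBA size in bytes; one pending bit per vector, stored in 64-bit words.
    pub fn pba_bytes(&self) -> u32 {
        (u32::from(self.table_len()) + 63) / 64 * 8
    }
}
```

A capability with `msg_ctl = 0x8007` and `table = 0x2003` therefore describes an enabled, unmasked MSI-X function with 8 vectors whose 128-byte table lives at offset `0x2000` of BAR 3.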
VFIO BAR regions containing MSI-X table/PBA must be split into
mmappable and emulated parts. KVM memory slots require host-page
alignment, but MSI-X structures can sit at arbitrary offsets
within a BAR. Add align_up_host_page, align_down_host_page, and
offset_from_lower_host_page helpers to expand emulated regions
to page boundaries.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
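The three helpers named in the commit message above can be sketched as plain bit arithmetic. This version takes the page size as an explicit parameter and uses shortened, hypothetical names; how the PR obtains the host page size, and its exact signatures, are not shown in the source.

```rust
/// Round `addr` down to the nearest multiple of `page_size`.
/// `page_size` must be a power of two.
pub fn align_down(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    addr & !(page_size - 1)
}

/// Round `addr` up to the nearest multiple of `page_size`.
/// Assumes `addr + page_size - 1` does not overflow `u64`.
pub fn align_up(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    align_down(addr + (page_size - 1), page_size)
}

/// Distance from `addr` back to the start of the page that contains it.
pub fn offset_from_lower_page(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    addr & (page_size - 1)
}
```

With 4 KiB pages, an MSI-X table starting at `0x2345` sits `0x345` bytes into the page beginning at `0x2000`, and the emulated region must extend up to `0x3000`. On a host with 64 KiB pages the same structure forces a much larger emulated region.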
Add the rust-vmm vfio-bindings (0.6.2) and vfio-ioctls (0.6.0)
crates that provide wrappers around VFIO kernel interfaces.
These are needed by the VFIO code.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
These functions will be used in the VFIO implementation, so they need to
be public.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add the core VFIO passthrough implementation. This allows physical PCI
devices bound to vfio-pci on the host to be presented to the guest with
minimal overhead.

The implementation covers:
- PCI config space: most reads/writes are proxied to the physical
  device. BARs, MSI-X capability, and select extended capabilities are
  emulated or masked by Firecracker.
- BAR regions: device MMIO regions are mmap'd from the VFIO device fd
  and mapped into guest address space as KVM memory slots. BARs
  containing MSI-X table/PBA are split around the emulated regions using
  either sparse-mmap caps or manual hole calculation.
- MSI-X interrupts: the table and PBA are emulated in Firecracker.
  Physical device interrupts are delivered via eventfds wired through
  KVM irqfd.
- DMA: guest RAM regions are mapped into the VFIO container's IOMMU so
  the device can DMA directly to guest memory.

Only MSI-X interrupts are supported. IO BARs, ROM BARs, legacy INTx, and
MSI (non-X) are not handled.

Hot-plug/unplug of VFIO devices is also not supported at this point, so
there is no cleanup for created VFIO devices. The only code concerned
with cleanup is the device setup path, which ensures that all resources
are released if any error occurs during device setup.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
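The "manual hole calculation" mentioned in the BAR regions item above can be pictured as follows: expand the emulated MSI-X region to host-page boundaries, then mmap whatever remains of the BAR on either side. This is a simplified sketch with a single hole and a hypothetical function name; the PR's code may track the table and PBA separately and also honours sparse-mmap capabilities, neither of which is modelled here.

```rust
/// Return the `(offset, length)` ranges of a BAR that can still be
/// mmap'd after carving out one emulated region, with the hole widened
/// to `page_size` boundaries. `page_size` must be a power of two.
pub fn mmappable_ranges(
    bar_size: u64,
    hole_start: u64,
    hole_len: u64,
    page_size: u64,
) -> Vec<(u64, u64)> {
    debug_assert!(page_size.is_power_of_two());
    let mask = page_size - 1;

    // Widen the hole to whole pages and clamp its end to the BAR.
    let lo = hole_start & !mask;
    let hi = ((hole_start + hole_len + mask) & !mask).min(bar_size);

    let mut ranges = Vec::new();
    if lo > 0 {
        ranges.push((0, lo)); // part of the BAR below the hole
    }
    if hi < bar_size {
        ranges.push((hi, bar_size - hi)); // part of the BAR above the hole
    }
    ranges
}
```

For a 64 KiB BAR with a 128-byte MSI-X table at offset `0x2000` and 4 KiB pages, this yields two mappable ranges, `(0, 0x2000)` and `(0x3000, 0xD000)`. If the page-expanded hole covers the whole BAR, the result is empty and every access to that BAR has to be emulated.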
Add devtool options for preparing a PCI device for VFIO passthrough
testing. --vfio-device accepts a block device path (e.g. /dev/nvme1n1)
or a PCI SBDF, resolves it to a PCI device, binds it to vfio-pci, and
passes the SBDF and sysfs path to the test container via environment
variables. --first-vfio-pci-device is a fallback that searches for the
first NVMe device already bound to vfio-pci if the primary device is not
found.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add integration tests that verify VFIO passthrough with a physical
NVMe device. Tests are gated behind the `vfio` pytest mark and
FC_VFIO_PCI_SBDF environment variable so they only run when a suitable
device is available.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Current VFIO implementation has some restrictions:
- Does not work without PCI since VFIO devices are PCI devices
- Does not work with virtio-mem device since we don't update DMA
  mappings on hot-plug/unplug
- Does not work with virtio-balloon since it can `fadvise` on memory

To prevent VMs from being launched with invalid configurations,
implement checks at two levels:
- At the API level, prevent adding incompatible combinations (VFIO after
  balloon/mem or the reverse)
- At VM creation or snapshot restoration, since those paths get
  VmResources from other sources.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
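The restrictions listed in the commit message above amount to a small compatibility check. The sketch below shows one way such a check could look. `ResourcesSummary`, `VfioConfigError`, and `validate_vfio` are hypothetical names invented for this example and do not correspond to the PR's `VmResources` API.

```rust
/// Hypothetical summary of the parts of a VM configuration that matter
/// for VFIO compatibility.
#[derive(Debug, Default, Clone, Copy)]
pub struct ResourcesSummary {
    pub pci_enabled: bool,
    pub vfio_devices: usize,
    pub has_balloon: bool,
    pub has_virtio_mem: bool,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VfioConfigError {
    PciDisabled,      // VFIO devices are PCI devices
    BalloonPresent,   // incompatible with virtio-balloon
    VirtioMemPresent, // DMA mappings are not updated on memory hot-plug/unplug
}

/// Reject configurations that combine VFIO with an incompatible feature.
/// Configurations without VFIO devices always pass.
pub fn validate_vfio(r: &ResourcesSummary) -> Result<(), VfioConfigError> {
    if r.vfio_devices == 0 {
        return Ok(());
    }
    if !r.pci_enabled {
        return Err(VfioConfigError::PciDisabled);
    }
    if r.has_balloon {
        return Err(VfioConfigError::BalloonPresent);
    }
    if r.has_virtio_mem {
        return Err(VfioConfigError::VirtioMemPresent);
    }
    Ok(())
}
```

Running the same function from both the API handlers and the VM creation or restore path would cover the two levels described above, since the check only depends on the final set of resources and not on the order in which devices were added.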
VFIO device state is opaque to the VMM and cannot be serialized
or restored. Add VFIO devices to the list of snapshot-incompatible
devices so that snapshot requests are rejected with a clear error
instead of producing a corrupt snapshot.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices will use the pread64/pwrite64 syscalls (from vfio-ioctls)
to interact with BARs at runtime. Add them to the vCPU thread syscall
lists.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The VFIO integration tests use an NVMe device to verify passthrough
functionality. Update a kernel config to enable the NVMe core and block
device drivers so the guest can detect and use the passthrough device.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO tests need exclusive access to the passthrough device, so they
cannot run in parallel with other tests. Add a separate Buildkite step
in the PR pipeline that runs only the vfio-marked tests, similar to the
existing performance step. CI instances will have an additional 1GB NVMe
device at /dev/nvme1n1 for this purpose.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Wire up the code to allow hot-plugging of VFIO devices after VM boot.
The API is the same as for adding a VFIO device before boot. Adding
devices with duplicate ids is disallowed.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
With VFIO device hot-plug support, we need to add all syscalls needed
for VFIO device creation to the VMM thread.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Implement VFIO device deinit logic and wire it up to the DELETE API.

During VFIO device removal, the device returns all resources it
allocated back to the VM (except kvm_slots, since we are not currently
concerned with running out of them). The destruction happens in two
parts (just like initialization) because it requires cooperation from
both the device and the pci_mngr.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a changelog entry for the new VFIO PCI device passthrough feature.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add docs/vfio.md covering how VFIO passthrough works in Firecracker,
prerequisites (IOMMU, vfio-pci binding), configuration via API and
config file, security considerations, snapshot incompatibility, and
current limitations.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
do not merge: point to vfio artifacts

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from c895e2b to b3e961b Compare June 29, 2026 09:04

Labels

Status: Awaiting review (indicates that a pull request is ready to be reviewed)
Type: Enhancement (indicates new feature requests)

Projects

None yet

2 participants