[VFIO] add basic implementation #5870
Open
ShadowCurse wants to merge 23 commits into
Codecov Report
❌ Patch coverage is
Please upload reports for the commit b3e961b to get more accurate results.

Coverage diff, main vs. #5870:

| Metric | main | #5870 | +/- |
|---|---|---|---|
| Coverage | 83.08% | 81.01% | -2.08% |
| Files | 277 | 280 | +3 |
| Lines | 30201 | 31295 | +1094 |
| Hits | 25094 | 25355 | +261 |
| Misses | 5107 | 5940 | +833 |
With the addition of VFIO, we need a way for a user to tell us which PCIe device they want to pass to the VM. We do this with an SBDF (segment, bus, device, function) address. Since an SBDF can be written in several ways, we try to parse them all. PciSBDF will be used in the next commit as part of the user-provided VFIO configuration. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
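As an illustration of what parsing several SBDF spellings can look like, here is a minimal sketch. The `Sbdf` type, its field names, and the two accepted forms (`ssss:bb:dd.f` and `bb:dd.f`, all hexadecimal) are assumptions made for this example; the PR's actual `PciSBDF` type and the set of formats it accepts may differ.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the PR's `PciSBDF`; the real type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Sbdf {
    pub segment: u16,
    pub bus: u8,
    pub device: u8,   // 5 bits: 0..=0x1f
    pub function: u8, // 3 bits: 0..=0x7
}

impl FromStr for Sbdf {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The function number follows the last '.'.
        let (rest, func) = s
            .trim()
            .rsplit_once('.')
            .ok_or_else(|| format!("missing '.' before function in {s:?}"))?;

        // The segment is optional in this sketch and defaults to 0.
        let parts: Vec<&str> = rest.split(':').collect();
        let (seg, bus, dev) = match parts.as_slice() {
            [bus, dev] => ("0", *bus, *dev),
            [seg, bus, dev] => (*seg, *bus, *dev),
            _ => return Err(format!("expected [ssss:]bb:dd.f, got {s:?}")),
        };

        // Parse one hexadecimal field and check it against its maximum value.
        let hex = |field: &str, max: u32| -> Result<u32, String> {
            let v = u32::from_str_radix(field, 16).map_err(|e| format!("{field:?}: {e}"))?;
            if v > max {
                Err(format!("{field:?} exceeds {max:#x}"))
            } else {
                Ok(v)
            }
        };

        Ok(Sbdf {
            segment: hex(seg, 0xffff)? as u16,
            bus: hex(bus, 0xff)? as u8,
            device: hex(dev, 0x1f)? as u8,
            function: hex(func, 0x7)? as u8,
        })
    }
}
```

With this sketch, `"0000:01:00.0".parse::<Sbdf>()` and `"01:00.0".parse::<Sbdf>()` yield the same value, while out-of-range device or function numbers are rejected.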
Add the VfioConfig and VfioConfigs types for describing VFIO device configuration. Wire them into VmResources and VmmConfig so that VFIO devices can be specified before boot. Actual device setup will be added in later commits. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add PUT /vfio/{id} API endpoint for configuring VFIO passthrough
devices.
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The VFIO code in future commits will use ArrayVec for BAR mappings and MSI-X hole tracking, so make it a required dependency. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices can expose both 32-bit and 64-bit BARs. The existing Bars type only handled 64-bit BARs. Add set_bar_32() for 32-bit BARs and get_bar_addr() that works with both widths. Additionally add get_bar_size() and is_64bit() utility functions. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
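For context on the 32-bit versus 64-bit distinction, the following sketch decodes raw memory BAR registers as defined by the PCI specification. The function names echo the ones mentioned in the commit message, but their signatures here (operating on raw register values) are assumptions for illustration and not the PR's `Bars` API.

```rust
/// Bit 0 of a BAR register: 1 = I/O space, 0 = memory space.
const BAR_IO_SPACE: u32 = 0x1;
/// Bits 2:1 of a memory BAR encode its width: 0b00 = 32-bit, 0b10 = 64-bit.
const BAR_MEM_TYPE_MASK: u32 = 0x6;
const BAR_MEM_TYPE_64: u32 = 0x4;
/// The low 4 bits of a memory BAR hold flags, not address bits.
const BAR_MEM_ADDR_MASK: u32 = !0xf;

/// Returns true if the register describes a 64-bit memory BAR.
pub fn is_64bit(bar_lo: u32) -> bool {
    (bar_lo & BAR_IO_SPACE) == 0 && (bar_lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64
}

/// Base address of a memory BAR. `bar_hi` is the next BAR register and is
/// only consulted when the BAR is 64-bit.
pub fn bar_addr(bar_lo: u32, bar_hi: u32) -> u64 {
    let lo = u64::from(bar_lo & BAR_MEM_ADDR_MASK);
    if is_64bit(bar_lo) {
        (u64::from(bar_hi) << 32) | lo
    } else {
        lo
    }
}

/// Size of a memory BAR, computed from the values read back after writing
/// all ones to the BAR register(s). Returns 0 for an unimplemented BAR.
pub fn bar_size(readback_lo: u32, readback_hi: u32) -> u64 {
    let masked_lo = readback_lo & BAR_MEM_ADDR_MASK;
    if is_64bit(readback_lo) {
        let masked = (u64::from(readback_hi) << 32) | u64::from(masked_lo);
        (!masked).wrapping_add(1)
    } else {
        u64::from((!masked_lo).wrapping_add(1))
    }
}
```

A 64-bit BAR consumes two consecutive BAR registers, which is why the address and size helpers take a low and a high half.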
VFIO passthrough needs to emulate the MSI-X table and PBA regions within device BARs. Add accessor methods to MsixCap for extracting table/PBA BIR, offset, size, and the enabled/masked status bits. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
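The fields those accessors extract follow the MSI-X capability layout from the PCI specification. The sketch below decodes them from raw register values; the `MsixCapRegs` struct and its method names are hypothetical and do not reflect the actual `MsixCap` methods added in the PR.

```rust
/// Raw MSI-X capability registers (hypothetical container for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct MsixCapRegs {
    /// Message Control register (16 bits).
    pub msg_ctl: u16,
    /// Table Offset / BIR register.
    pub table: u32,
    /// PBA Offset / BIR register.
    pub pba: u32,
}

impl MsixCapRegs {
    /// Number of table entries; the register stores N - 1 in bits 10:0.
    pub fn table_size(&self) -> u16 {
        (self.msg_ctl & 0x07ff) + 1
    }
    /// MSI-X Enable is bit 15 of Message Control.
    pub fn enabled(&self) -> bool {
        (self.msg_ctl & (1 << 15)) != 0
    }
    /// Function Mask is bit 14 of Message Control.
    pub fn masked(&self) -> bool {
        (self.msg_ctl & (1 << 14)) != 0
    }
    /// BAR index (BIR) holding the table: bits 2:0.
    pub fn table_bir(&self) -> u8 {
        (self.table & 0x7) as u8
    }
    /// Byte offset of the table within its BAR: bits 31:3.
    pub fn table_offset(&self) -> u32 {
        self.table & !0x7
    }
    /// BAR index (BIR) holding the PBA: bits 2:0.
    pub fn pba_bir(&self) -> u8 {
        (self.pba & 0x7) as u8
    }
    /// Byte offset of the PBA within its BAR: bits 31:3.
    pub fn pba_offset(&self) -> u32 {
        self.pba & !0x7
    }
    /// Table size in bytes; each entry is 16 bytes.
    pub fn table_bytes(&self) -> u32 {
        u32::from(self.table_size()) * 16
    }
    /// PBA size in bytes; one pending bit per vector, rounded up to 8 bytes.
    pub fn pba_bytes(&self) -> u32 {
        u32::from(self.table_size()).div_ceil(64) * 8
    }
}
```

The byte sizes are what an emulation layer would need in order to know which part of a BAR it has to intercept.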
VFIO BAR regions containing MSI-X table/PBA must be split into mmappable and emulated parts. KVM memory slots require host-page alignment, but MSI-X structures can sit at arbitrary offsets within a BAR. Add align_up_host_page, align_down_host_page, and offset_from_lower_host_page helpers to expand emulated regions to page boundaries. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
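The page-alignment arithmetic behind such helpers is small enough to sketch. The helper names below are taken from the commit message, but the signatures are assumptions: here the page size is passed explicitly so the example stays dependency-free, whereas the real helpers presumably obtain the host page size themselves.

```rust
/// Round `addr` down to the nearest multiple of `page_size`.
/// Assumes `page_size` is a non-zero power of two.
pub fn align_down_host_page(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    addr & !(page_size - 1)
}

/// Round `addr` up to the nearest multiple of `page_size`.
/// Assumes `addr + page_size - 1` does not overflow.
pub fn align_up_host_page(addr: u64, page_size: u64) -> u64 {
    align_down_host_page(addr + page_size - 1, page_size)
}

/// Distance from `addr` back to the start of the page that contains it.
pub fn offset_from_lower_host_page(addr: u64, page_size: u64) -> u64 {
    debug_assert!(page_size.is_power_of_two());
    addr & (page_size - 1)
}
```

For example, with 4 KiB pages an MSI-X table at BAR offset 0x2345 lives in the page starting at 0x2000, so the emulated region has to begin there rather than at the table itself.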
Add the rust-vmm vfio-bindings (0.6.2) and vfio-ioctls (0.6.0) crates that provide wrappers around VFIO kernel interfaces. These are needed by the VFIO code. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
These functions will be used in the VFIO implementation, so they need to be public. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add the core VFIO passthrough implementation. This allows physical PCI devices bound to vfio-pci on the host to be presented to the guest with minimal overhead.

The implementation covers:
- PCI config space: most reads/writes are proxied to the physical device. BARs, the MSI-X capability, and select extended capabilities are emulated or masked by Firecracker.
- BAR regions: device MMIO regions are mmap'd from the VFIO device fd and mapped into guest address space as KVM memory slots. BARs containing the MSI-X table/PBA are split around the emulated regions using either sparse-mmap caps or manual hole calculation.
- MSI-X interrupts: the table and PBA are emulated in Firecracker. Physical device interrupts are delivered via eventfds wired through KVM irqfd.
- DMA: guest RAM regions are mapped into the VFIO container's IOMMU so the device can DMA directly to guest memory.

Only MSI-X interrupts are supported. IO BARs, ROM BARs, legacy INTx, and MSI (non-X) are not handled. Hot-plug/unplug of VFIO devices is also not supported at this point, so there is no cleanup for created VFIO devices. The only cleanup-related part is the device setup code, which ensures that all resources are released if an error occurs during device setup.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
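To make the "manual hole calculation" for BAR regions more concrete, here is a simplified sketch that splits a BAR around a single emulated region after expanding that region to page boundaries. The `Range` type and `split_around_hole` function are hypothetical, the sketch handles only one hole, and it returns a `Vec` where the PR reportedly uses `ArrayVec`; the actual implementation is likely more involved.

```rust
/// A byte range inside a BAR (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub offset: u64,
    pub size: u64,
}

/// Split a BAR of `bar_size` bytes into the parts that can be mmap'd
/// directly, leaving out the pages touched by one emulated region
/// (for example the MSI-X table).
///
/// Assumes `page_size` is a power of two and the region lies within the BAR.
pub fn split_around_hole(
    bar_size: u64,
    hole_offset: u64,
    hole_size: u64,
    page_size: u64,
) -> Vec<Range> {
    let mask = page_size - 1;
    // Expand the emulated region outwards to page boundaries.
    let hole_start = hole_offset & !mask;
    let hole_end = ((hole_offset + hole_size + mask) & !mask).min(bar_size);

    let mut mappable = Vec::new();
    // Part of the BAR before the emulated pages.
    if hole_start > 0 {
        mappable.push(Range { offset: 0, size: hole_start });
    }
    // Part of the BAR after the emulated pages.
    if hole_end < bar_size {
        mappable.push(Range { offset: hole_end, size: bar_size - hole_end });
    }
    mappable
}
```

If the emulated pages cover the whole BAR, the result is empty and every access to that BAR has to be trapped and forwarded instead of being served by a direct mapping.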
Add devtool options for preparing a PCI device for VFIO passthrough testing. --vfio-device accepts a block device path (e.g. /dev/nvme1n1) or a PCI SBDF, resolves it to a PCI device, binds it to vfio-pci, and passes the SBDF and sysfs path to the test container via environment variables. --first-vfio-pci-device is a fallback that searches for the first NVMe device already bound to vfio-pci if the primary device is not found. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add integration tests that verify VFIO passthrough with a physical NVMe device. Tests are gated behind the `vfio` pytest mark and the FC_VFIO_PCI_SBDF environment variable so they only run when a suitable device is available. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The current VFIO implementation has some restrictions:
- It does not work without PCI, since VFIO devices are PCI devices.
- It does not work with the virtio-mem device, since we don't update DMA mappings on hot-plug/unplug.
- It does not work with virtio-balloon, since it can `fadvise` on memory.

To prevent VMs from being launched with invalid configurations, implement several checks:
- At the API level, prevent adding incompatible combinations (VFIO after balloon/mem, or the reverse).
- At VM creation or snapshot restoration, since these get VmResources from other sources.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
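The restrictions listed in this commit can be expressed as a single validation function. The sketch below is only an illustration: `ResourcesSummary`, `VfioConfigError`, and `validate_vfio` are hypothetical names, and the real checks operate on Firecracker's `VmResources` rather than on a flat summary struct.

```rust
/// Hypothetical, flattened view of the configuration relevant to VFIO.
#[derive(Debug, Default, Clone, Copy)]
pub struct ResourcesSummary {
    pub pci_enabled: bool,
    pub vfio_devices: usize,
    pub has_balloon: bool,
    pub has_virtio_mem: bool,
}

/// Hypothetical error type naming each incompatible combination.
#[derive(Debug, PartialEq, Eq)]
pub enum VfioConfigError {
    PciDisabled,
    BalloonPresent,
    VirtioMemPresent,
}

/// Reject configurations that combine VFIO with unsupported features.
pub fn validate_vfio(r: &ResourcesSummary) -> Result<(), VfioConfigError> {
    // Nothing to check when no VFIO device is configured.
    if r.vfio_devices == 0 {
        return Ok(());
    }
    if !r.pci_enabled {
        return Err(VfioConfigError::PciDisabled);
    }
    if r.has_balloon {
        return Err(VfioConfigError::BalloonPresent);
    }
    if r.has_virtio_mem {
        return Err(VfioConfigError::VirtioMemPresent);
    }
    Ok(())
}
```

Running the same function from the API handlers and again at VM creation or snapshot restoration would cover both entry points the commit message describes.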
VFIO device state is opaque to the VMM and cannot be serialized or restored. Add VFIO devices to the list of snapshot-incompatible devices so that snapshot requests are rejected with a clear error instead of producing a corrupt snapshot. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices will use the pread64/pwrite64 syscalls (from vfio-ioctls) to interact with BARs at runtime. Add them to the vCPU thread syscall lists. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The VFIO integration tests use an NVMe device to verify passthrough functionality. Update a kernel config to enable the NVMe core and block device drivers so the guest can detect and use the passthrough device. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO tests need exclusive access to the passthrough device, so they cannot run in parallel with other tests. Add a separate Buildkite step in the PR pipeline that runs only the vfio-marked tests, similar to the existing performance step. CI instances will have an additional 1GB NVMe device at /dev/nvme1n1 for this purpose. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Wire up the code to allow hot-plugging of VFIO devices after VM boot. The API is the same as for adding a VFIO device before boot. Adding devices with duplicate ids is disallowed. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
With VFIO device hot-plug support, we need to add all syscalls needed for VFIO device creation to the VMM thread syscall list. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Implement VFIO device deinit logic and wire it up to the DELETE API. During VFIO device removal, the device returns all resources it allocated back to the VM (except kvm_slots, since we are not currently concerned with running out of them). The destruction happens in two parts (just like initialization) because it requires cooperation from both the device and the pci_mngr. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a changelog entry for the new VFIO PCI device passthrough feature. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add docs/vfio.md covering how VFIO passthrough works in Firecracker, prerequisites (IOMMU, vfio-pci binding), configuration via API and config file, security considerations, snapshot incompatibility, and current limitations. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
do not merge: point to vfio artifacts Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Changes
Add a basic implementation of VFIO device pass-through.
Current version only allows devices to be added before VM boot.
Other limitations:
Reason
Provide a way to pass physical PCI devices into a VM.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
tools/devtool checkbuild --all to verify that the PR passes build checks on all supported architectures.
tools/devtool checkstyle to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.
Runbook for Firecracker API changes.
integration tests.
TODO.
rust-vmm.