Merge dstack-cloud build system + upgrade to Yocto 6.0 wrynose (kernel 6.18) #64
Merged
add gcp service account and scope config
The 6.18 defconfig was missing nftables config options that are required by Docker's iptables-nft backend. These were defined in dstack-docker.cfg for the linux-yocto kernel but linux-custom_6.18.7 does not use bbappend fragments, so they must be in the defconfig directly. Without these, rootfs build fails with missing kernel-module-nf-tables and related packages.
Add xt_comment and nf_tables kernel modules for k3s support
NVIDIA's open kernel driver (nvidia.ko) gates its LKCA-backed libspdm crypto provider on `CONFIG_CRYPTO_ECDSA` being defined when the driver is built (see `kernel-open/nvidia/internal_crypt_lib.h`: the `USE_LKCA` macro requires the kernel to advertise ECDSA, ECDH, RSA, HMAC, AKCIPHER, etc.). When `CONFIG_CRYPTO_ECDSA` is missing, libspdm falls back to stubs and at runtime prints `libspdm expects LKCA but found stubs!` then fails `spdmEstablishSession`, so H100 in Confidential Compute mode (e.g. GCP TDX + a3-highgpu-1g) never finishes init and `nvidia-smi` reports no devices. `meta-nvidia/recipes-kernel/linux/files/nvidia.cfg` already sets this config, but it ships as a `linux-yocto%.bbappend`, which does not attach to the in-tree `linux-custom_*.bb` recipes that build the dstack kernel from a defconfig. Add the option directly to the 6.17 and 6.18 defconfigs so all flavors (incl. nvidia) pick it up. Verified end-to-end on GCP a3-highgpu-1g + TDX after rebuilding the kernel + nvidia kernel modules with this change: SPDM session establishes, `nvidia-smi conf-compute -f` reports `CC status: ON`, and a PyTorch matmul runs at ~38 TFLOPs.
kernel: enable CONFIG_CRYPTO_ECDSA for H100 confidential compute
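Because the `linux-custom` recipes take their options straight from a defconfig, a missing symbol only surfaces late (at rootfs build or at driver runtime). The check described above can be sketched as a small pre-build lint. This is a minimal sketch, assuming the defconfig is plain `CONFIG_FOO=y` text; the function names and the option list are illustrative and are not the driver's actual `USE_LKCA` condition.

```python
import re

# Illustrative list based on the families named in the commit message;
# the real condition lives in the NVIDIA driver sources.
LKCA_OPTIONS = [
    "CONFIG_CRYPTO_ECDSA",
    "CONFIG_CRYPTO_ECDH",
    "CONFIG_CRYPTO_RSA",
    "CONFIG_CRYPTO_HMAC",
    "CONFIG_CRYPTO_AKCIPHER",
]

_ENABLED = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)$")


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        match = _ENABLED.match(line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_options(config_text, required):
    """Return the required symbols that are not enabled, in input order."""
    enabled = enabled_options(config_text)
    return [opt for opt in required if opt not in enabled]
```

Running `missing_options(open(path).read(), LKCA_OPTIONS)` against each defconfig before a build would flag a flavor that lacks `CONFIG_CRYPTO_ECDSA`. Lines such as `# CONFIG_FOO is not set` are treated as disabled.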
Many GCP projects only ship preemptible (SPOT) quota for newer GPUs —
in particular `PREEMPTIBLE-NVIDIA-H100-GPUS-per-project-{region,zone}`
is granted by default while `NVIDIA-H100-GPUS-per-project-region` is
zero. Without on-demand quota, the only way to launch H100 in a
Confidential TDX VM is to request `--provisioning-model=SPOT`.
Expose a `provisioning_model` field in `gcp_config` (default
`STANDARD`, backwards-compatible). When set to `SPOT`, also emit
`--instance-termination-action=STOP` so the boot/data disks survive
preemption and the instance can be resumed via `dstack-cloud start`
(important for the LUKS-encrypted data disk, which is keyed by the
KMS-provisioned per-instance secret).
Anything other than `STANDARD`/`SPOT` raises an early error rather
than silently dropping through.
Example `app.json` snippet for an H100 deploy:
    "gcp_config": {
        "machine_type": "a3-highgpu-1g",
        "zone": "us-central1-a",
        "provisioning_model": "SPOT"
    }
…isioning dstack-cloud: add gcp_config.provisioning_model for SPOT instances
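The validation and flag mapping described in this commit message can be sketched as follows. This is a minimal sketch, not the `dstack-cloud` implementation: the function name `provisioning_flags` is hypothetical, and returning no extra flags for `STANDARD` is an assumption (the commit message only says the default is backwards-compatible).

```python
VALID_MODELS = ("STANDARD", "SPOT")


def provisioning_flags(gcp_config):
    """Map gcp_config.provisioning_model to extra gcloud arguments.

    Hypothetical helper: STANDARD adds nothing (assumed), SPOT adds the
    two flags named in the commit message, anything else is an error.
    """
    model = gcp_config.get("provisioning_model", "STANDARD")
    if model not in VALID_MODELS:
        raise ValueError(
            f"unsupported provisioning_model {model!r}; "
            f"expected one of {', '.join(VALID_MODELS)}"
        )
    if model == "SPOT":
        return [
            "--provisioning-model=SPOT",
            # STOP (rather than DELETE) keeps the disks across preemption.
            "--instance-termination-action=STOP",
        ]
    return []
```

Raising before any cloud call is made keeps a typo such as `"spot"` from silently producing an on-demand request that then fails on quota.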
Pulls 315 commits of guest-agent / kms / gateway / vmm fixes into the recipe inputs. This is the state the v0.6.1 release tarballs were built against, so bumping the pointer here makes `git clone --recurse-submodules` reproduce the released images. dstack 603c6ee5..b051018a (Phala-Network/dstack-cloud:master tip).
bump dstack submodule to b051018a
Tags the artifacts produced by `FLAVORS=... make dist` as 0.6.1, so
`dstack-cloud pull dstack-cloud{,-nvidia}-0.6.1` resolves against the
released tarballs at
https://github.com/Phala-Network/meta-dstack-cloud/releases/tag/v0.6.1.
The 0.6.1 cycle ships the H100 CC kernel fix (#14), the SPOT
provisioning flag in `dstack-cloud` (#15), and the dstack submodule
bump to b051018a (#16). See the v0.6.1 release notes for details.
…@ wrynose
- adopt official split layout (bitbake 2.18 + openembedded-core@wrynose + meta-yocto@wrynose) since combined poky has no wrynose branch; drop poky
- bring cloud build system: --flavor multiconfig, UKI mkimage, dstack-cloud CLI, zfs 2.4, sysbox UNPACKDIR, multiconfig confs
- rebase dstack forks onto wrynose: meta-virtualization/security/confidential-compute/meta-rust-bin LAYERSERIES_COMPAT -> wrynose
- dstack submodule -> merged Apache-2.0 component; .gitmodules dstack url back to Dstack-TEE
- dstack.conf: DISTRO_VERSION 0.6.0, kernel -> linux-yocto 6.18, keep NVIDIA + EFI/UKI
- keep mainline OVMF (edk2-stable202502 pinning) and hardened dstack-docker.cfg
- DISTRO_NAME back to DStack
Kernel recipe cleanup (drop linux-custom) and tdx-guest-mod removal follow.
- drop self-written linux-custom_{6.17.6,6.18.7}.bb and their flat defconfigs
- linux-yocto%.bbappend now carries the dstack .scc/.cfg fragments (already
the mainline mechanism) on official linux-yocto 6.18 (wrynose)
- wire 0001-x86-tdx-select-dma-direct-remap.patch via SRC_URI:append:tdx
- dstack-tdx.cfg: CONFIG_TDX_GUEST_DRIVER=y + TSM_REPORTS=y (in-tree ConfigFS
TSM replaces the out-of-tree mod-tdx-guest module)
…on path
- Makefile flavor_to_dist maps to dstack/dstack-dev/dstack-nvidia/dstack-nvidia-dev
- dstack-uki.bb: glob python3.* site-packages instead of hardcoded python3.13 (wrynose native python version differs)
- dev-setup: TEMPLATECONF -> openembedded-core/meta/conf/templates/default (wrynose moved templates out of meta-poky); add meta-poky + meta-yocto-bsp to LAYERS explicitly (oe-core default template no longer pulls them)
- meta-dstack/meta-nvidia layer.conf: LAYERSERIES_COMPAT -> wrynose
- dstack.conf: DISTRO_FEATURES_BACKFILL_CONSIDERED -> DISTRO_FEATURES_OPTED_OUT
- meta-confidential-compute: move wic/ -> files/wic/ (wrynose wks search path)
- dstack-rootfs-base.inc: drop stray diff3 conflict marker
verified: virtual/kernel = official linux-yocto 6.18.24, dma-direct-remap patch wired via SRC_URI:append:tdx
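The `dstack-uki.bb` item above replaces a hardcoded `python3.13` path with a glob. A minimal sketch of that idea in plain Python, assuming the usual `<libdir>/python3.X/site-packages` layout; the helper name and the `lib_dir` parameter are hypothetical and do not come from the recipe:

```python
from pathlib import Path


def find_site_packages(lib_dir):
    """Return every python3.*/site-packages directory under lib_dir.

    Matching on the pattern instead of a fixed minor version means the
    lookup keeps working when the native Python version changes.
    """
    matches = sorted(
        path
        for path in Path(lib_dir).glob("python3.*/site-packages")
        if path.is_dir()
    )
    if not matches:
        raise FileNotFoundError(f"no python3.*/site-packages under {lib_dir}")
    return matches
```

The sort is lexicographic, so a caller that needs the newest interpreter when several are present should compare the parsed minor versions rather than rely on list order.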
poky.conf defaults POKY_INIT_MANAGER=sysvinit, which pulls init-manager-sysvinit.inc and appends sysvinit to DISTRO_FEATURES — that conflicts with systemd so both systemd and eudev get skipped and nothing RPROVIDES udev (breaks cryptsetup -> dstack-initramfs). Setting INIT_MANAGER before requiring the poky-derived cvm.conf selects init-manager-systemd.inc.
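The fix above depends on statement order inside the distro configuration, which is easy to regress. A minimal sketch of a lint for that ordering, assuming the conf file uses ordinary BitBake assignment and `require`/`include` lines; the function name and the `conf/distro/cvm.conf` path used in the usage note are hypothetical:

```python
import re

# Matches INIT_MANAGER followed by any of =, ?=, ??=, :=
_ASSIGN = re.compile(r"^INIT_MANAGER\s*(\?\?=|\?=|:=|=)")
_REQUIRE = re.compile(r"^(require|include)\s+(\S+)")


def init_manager_set_before(conf_text, required_file):
    """Return True if INIT_MANAGER is assigned above the require line.

    Raises ValueError when required_file is never required, since the
    ordering question is meaningless in that case.
    """
    assigned = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if _ASSIGN.match(line):
            assigned = True
            continue
        match = _REQUIRE.match(line)
        if match and match.group(2).endswith(required_file):
            return assigned
    raise ValueError(f"{required_file} is not required in this file")
```

For example, `init_manager_set_before(text, "cvm.conf")` returns `False` for a file that requires `conf/distro/cvm.conf` on its first line and only sets `INIT_MANAGER` afterwards.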
- bump edk2 stable202502 -> stable202511 (202502 won't assemble with wrynose NASM 3.01); changes RTMR[0] -> needs new dstack-mr OvmfVariant baseline
- add oe-core's CpuExceptionHandlerLib push-instruction NASM 3.0 backport
- drop 0003/0004 reproducibility patches (don't apply to 202511 template; not needed for functional image; rebase from oe-core versions for production)
- drop 0005-Declare-ProcessLibraryConstructorList (edk2 202511 declares it natively)
- OVMF_VARIANT -> stable202511
linux-yocto-tiny ships CONFIG_CRYPTO_SHA256=m; dm-verity in the initramfs can't load modules, so early rootfs verity failed with 'Cannot initialize hash function (-ENOENT)' and init died -> kernel panic. Force SHA256/SHA512 built-in.
…nose
dstack-mr can't yet compute measurements for newer edk2, so the pre202505 RTMR[0] layout must be preserved. Make 202502 build on wrynose by:
- backporting edk2's NASM-3.0 CpuExceptionHandlerLib push-instruction fix
- dropping 0005-Declare-ProcessLibraryConstructorList (GCC 15 rejects the K&R 'void f()' forward-decl as conflicting with edk2's EFIAPI prototype; the prototype is already in scope in 202502 so the manual decl is redundant)
OVMF_VARIANT stays pre202505.
Pull request overview
This pull request merges the meta-dstack-cloud build system back into this repository while upgrading the Yocto stack to Yocto 6.0 (wrynose) and switching the kernel over to official linux-yocto 6.18 with dstack-specific config/patches. It also introduces multiconfig “flavors” (prod/dev/nvidia/nvidia-dev), adds a UKI build flow, and updates multiple recipes/layers for the new layout and tooling.
Changes:
- Switch repo layout from `poky` to split `bitbake` + `openembedded-core` + `meta-yocto` submodules and update layers to `wrynose`.
- Replace the custom kernel path with `linux-yocto` 6.18 plus dstack `.scc`/`.cfg` fragments and a TDX-specific Kconfig patch.
- Add/extend build and release tooling: multiconfig flavors, UKI generation + Authenticode hashing, and a new `dstack-cloud` deployment CLI.
Reviewed changes
Copilot reviewed 82 out of 88 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| scripts/bin/dstack.py | Rename config/manager classes and update CLI strings for consistent “dstack” naming. |
| scripts/bin/dstack-cloud | Add multi-cloud (currently GCP-focused) VM lifecycle CLI including KMS/env encryption and firewall helpers. |
| scripts/bin/authenticode_hash.py | Add PE/COFF Authenticode SHA256 calculator for UKI measurement compatibility. |
| repro-build/repro-build.sh | Limit reproducible builds to release flavors and adjust reproduce script repo/paths. |
| repro-build/Dockerfile.repro | Add build dependencies for GPT/FAT tooling used by new image/UKI flows. |
| repro-build/check.sh | Add an image whitelist comparison mode to avoid known non-reproducible artifacts. |
| README.md | Update project naming, links, and reproducible build instructions. |
| mkimage.sh | Add --flavor multiconfig support; create partitioned rootfs; optionally build UKI disk + Authenticode hash; split tar outputs. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-persistenced_1.0.bb | Use UNPACKDIR for installed files (wrynose fetch/unpack behavior). |
| meta-nvidia/recipes-graphics/nvidia/nvidia-modprobe-config_1.0.bb | Use UNPACKDIR for installed files. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-libs.inc | Minor syntax/style fix for FILES append. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.105.08.bb | Add updated Fabric Manager recipe version. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_570.172.08.bb | Remove older Fabric Manager recipe version. |
| meta-nvidia/recipes-graphics/nvidia/nvidia_580.105.08.bb | Update driver recipe to use UNPACKDIR and refresh checksum metadata. |
| meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_580.105.08.bb | Update source dir handling and adjust installed payload. |
| meta-nvidia/recipes-graphics/nvidia-container-toolkit/nvidia-container-toolkit.inc | Switch toolkit fetch to main + new SRCREV; adjust source layout. |
| meta-nvidia/recipes-graphics/nvidia-container-toolkit/nvidia-container-toolkit_1.00.bb | Install config from UNPACKDIR. |
| meta-nvidia/recipes-graphics/libnvidia-container/libtirpc134_1.3.4.bb | Set UNPACKDIR source dir and add GCC-15-related CFLAGS adjustments. |
| meta-nvidia/recipes-graphics/libnvidia-container/libnvidia-container/*.patch | Add Upstream-Status headers for OE patch hygiene. |
| meta-nvidia/recipes-graphics/libnvidia-container/libnvidia-container.inc | Update libnvidia-container + modprobe fetch strategy/versions. |
| meta-nvidia/recipes-graphics/libnvidia-container/libnvidia-container_1.00.bb | Add task to relocate modprobe sources into expected subtree. |
| meta-nvidia/recipes-graphics/ldconfig-compatibility-symlink/ldconfig-compatibility-symlink_1.0.0.bb | Minor syntax/style fix for FILES append. |
| meta-nvidia/recipes-graphics/containerd-config/containerd-config_1.0.0.bb | Install config from UNPACKDIR and fix FILES syntax. |
| meta-nvidia/conf/layer.conf | Declare wrynose layer compatibility. |
| meta-dstack/recipes-kernel/tdx-guest-mod/tdx-guest.bb | Remove out-of-tree TDX guest module recipe. |
| meta-dstack/recipes-kernel/linux/linux-yocto%.bbappend | Add TDX-only Kconfig patch selection for DMA_DIRECT_REMAP. |
| meta-dstack/recipes-kernel/linux/files/.scc/.cfg | Update dstack kernel config fragments, enable in-tree TDX driver/TSM reports, ensure SHA256 built-in. |
| meta-dstack/recipes-kernel/linux/files/0001-x86-tdx-select-dma-direct-remap.patch | Add TDX guest Kconfig select patch. |
| meta-dstack/recipes-devtools/gptfdisk/gptfdisk_%.bbappend | Disable ncurses/cgdisk option to avoid unwanted deps. |
| meta-dstack/recipes-devtools/gcc/libgcc-initial_%.bbappend | Add a configure prefunc that stubs stdio.h in staging for toolchain build. |
| meta-dstack/recipes-core/systemd/systemd_%.bbappend | Remove vconsole pieces and GPT auto generator; add systemd-resolved ordering drop-in; tweak PACKAGECONFIG. |
| meta-dstack/recipes-core/pahole/pahole_1.25.bbappend | Remove prior pahole SRCREV override. |
| meta-dstack/recipes-core/images/dstack-uki.bb | Add UKI image recipe building via ukify using verity hash/size from work-shared env. |
| meta-dstack/recipes-core/images/dstack-rootfs.bb | Unify rootfs recipe and select prod/dev/nvidia via multiconfig variables. |
| meta-dstack/recipes-core/images/dstack-rootfs-*.inc | Refactor prod/dev includes and dev features. |
| meta-dstack/recipes-core/images/dstack-rootfs-base.inc | Remove tdx-guest module dependency; add tpm2-tools; add containerd state dir. |
| meta-dstack/recipes-core/images/dstack-*-rootfs.bb | Remove separate nvidia/dev rootfs wrapper recipes in favor of unified rootfs + multiconfig. |
| meta-dstack/recipes-core/images/dstack-initscript/init | Update initramfs init logic to resolve root by PARTLABEL and require verity params. |
| meta-dstack/recipes-core/images/dstack-initscript.bb | Switch to UNPACKDIR as S for initramfs content. |
| meta-dstack/recipes-core/dstack-zfs/dstack-zfs_2.4.0.bb | Upgrade ZFS branch/SRCREV; drop patches; relax buildpaths QA for modules. |
| meta-dstack/recipes-core/dstack-sysbox/dstack-sysbox_0.6.7.bb | Update paths to use UNPACKDIR consistently. |
| meta-dstack/recipes-core/dstack-ovmf/dstack-ovmf/*.patch | Refresh OVMF patch set for NASM 3.x + reproducibility notes/metadata. |
| meta-dstack/recipes-core/dstack-ovmf/dstack-ovmf_git.bb | Keep stable202502 pin; swap patch 0005 to NASM push-instruction fix; add rationale. |
| meta-dstack/recipes-core/dstack-guest/dstack-guest.bb | Adjust source dir to UNPACKDIR, install extra script, and relax buildpaths QA for Cargo output. |
| meta-dstack/recipes-core/docker/docker-moby%.bbappend | Install override from UNPACKDIR. |
| meta-dstack/recipes-core/base-files/files/dstack-motd | Update MOTD casing to “dstack”. |
| meta-dstack/recipes-core/base-files/base-files%.bbappend | Use UNPACKDIR for MOTD installation and diagnostics. |
| meta-dstack/recipes-connectivity/openssh/openssh_%.bbappend | Install sshd drop-in from UNPACKDIR. |
| meta-dstack/conf/multiconfig/*.conf | Add prod/dev/nvidia/nvidia-dev flavor definitions with separate TMPDIRs. |
| meta-dstack/conf/local.conf | Add GNU mirror override and set BBMULTICONFIG default flavors. |
| meta-dstack/conf/layer.conf | Update layer series compatibility to wrynose. |
| meta-dstack/conf/distro/dstack.conf | Set INIT_MANAGER=systemd early, bump version, switch kernel provider to linux-yocto 6.18, add EFI/UKI settings. |
| Makefile | Build common artifacts + per-flavor multiconfig rootfs/UKI; run mkimage per flavor. |
| LICENSE | Change repository license text to Business Source License 1.1 with AGPL change license. |
| dev-setup | Move to oe-core’s oe-init-build-env, sync conf into build dir, add meta-yocto layers. |
| build.sh | Update release basename naming to dstack-cloud variants and adjust download URL. |
| .gitmodules | Replace poky with split bitbake/openembedded-core/meta-yocto submodules; adjust meta-rust-bin URL. |
| .gitignore | Ignore .vscode and .claude. |
- mkimage.sh: deterministic GPT GUIDs (reproducible partitioned images); check verity env exists and sgdisk is installed before use
- repro-build/check.sh: compare rootfs.img.parted.verity (new name); define YELLOW
- build.sh: download from Dstack-TEE/meta-dstack releases (not the fork)
- README: clone Dstack-TEE/meta-dstack for the reproducible-build steps
- systemd bbappend: drop dangling blacklist-autofs4.conf FILES entry (never installed)
- dstack-uki.bb: run ukify via argv list (no shell); fail clearly if ROOT_HASH/DATA_SIZE missing
- Makefile: build dstack-guest in images-common to avoid multiconfig fetch races
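The first item above makes partitioned images reproducible by fixing the GPT GUIDs, which are random by default. One way to obtain stable GUIDs is to derive them from names with UUIDv5. This is a minimal sketch of that technique and an assumption about approach: the commit does not say how `mkimage.sh` picks its GUIDs, and the namespace string, function name, and labels below are all hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
IMAGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "images.example.invalid")


def partition_guid(image_name, label):
    """Derive a stable GUID from the image name and partition label.

    uuid5 is a SHA-1 based, name-derived UUID, so the same inputs give
    the same GUID on every build and on every machine.
    """
    return str(uuid.uuid5(IMAGE_NAMESPACE, f"{image_name}/{label}")).upper()
```

The resulting strings could then be handed to `sgdisk` through its `--disk-guid` and `--partition-guid` options so that two builds of the same inputs produce byte-identical partition tables.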
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The Copilot autofix (fed4cc1) added an sgdisk check right before create_partitioned_rootfs, but an equivalent early check already existed. Drop the redundant early check and keep the call-site one, with a clearer message (the 'set ENABLE_UKI_IMAGE=0' hint was misleading — sgdisk is needed for the partitioned bare-metal image, not the UKI path).
kvinwang added a commit that referenced this pull request Jun 4, 2026
Follow-up to #64: advance the dstack submodule from the early cloud-merge commit to the current Dstack-TEE/dstack#701 head (bde0d038) — GCP TDX + AWS Nitro attestation, verified-PCR hardening, vendored dstack-cloud CLI.
Merges the meta-dstack-cloud build system back into mainline and upgrades to Yocto 6.0 (wrynose) with official linux-yocto 6.18 LTS, replacing the self-written `linux-custom` kernel.

Depends on component PR Dstack-TEE/dstack#701 (the `dstack` submodule points at its merged Apache-2.0 commit).

Layout / Yocto upgrade
- `poky` repo has no wrynose branch, so this adopts the official split layout: `bitbake` 2.18 + `openembedded-core`@wrynose + `meta-yocto`@wrynose (drops `poky`).
- … (`dstack-wrynose` branches): meta-virtualization, meta-security, meta-confidential-compute, meta-rust-bin.

Kernel
- `linux-custom_*.bb`; use official `linux-yocto` 6.18 via `linux-yocto%.bbappend` + dstack `.scc`/`.cfg`.
- `dma-direct-remap` Kconfig patch (Intel TDX still doesn't select it; needed for NVMe DMA).
- `CONFIG_TDX_GUEST_DRIVER=y` + `TSM_REPORTS=y` (in-tree ConfigFS TSM replaces out-of-tree `mod-tdx-guest`).
- `CONFIG_CRYPTO_SHA256=y` built-in (dm-verity rootfs hash in initramfs).

Build system (from cloud)
`--flavor` multiconfig (prod/dev/nvidia/nvidia-dev), UKI image (`dstack-uki.bb` + mkimage), zfs 2.4, dstack-sysbox UNPACKDIR, `scripts/bin/dstack-cloud` CLI.

wrynose migration fixes
TEMPLATECONF moved to oe-core; `INIT_MANAGER=systemd` (else sysvinit conflicts break udev); layer `LAYERSERIES_COMPAT=wrynose`; wic→files/wic; `DISTRO_FEATURES_OPTED_OUT`; runc patch-fuzz; OVMF builds on NASM 3.0 / GCC 15 (backport NASM-3.0 fix, drop a now-redundant forward-decl) while keeping edk2-stable202502 for dstack-mr measurement compatibility (`OVMF_VARIANT=pre202505`).

Verification
`bitbake` parses clean (28395 targets, 0 errors). Built `dstack-0.6.0` prod image and booted a TDX guest end-to-end: dm-verity rootfs, NVMe/DMA data disk, in-tree TDX quote, KMS/`prpc` onboard succeeds, docker workload runs, reaches Multi-User System.