
Merge dstack-cloud build system + upgrade to Yocto 6.0 wrynose (kernel 6.18) #64

Merged
kvinwang merged 131 commits into main from merge/dstack-cloud-wrynose
Jun 4, 2026


Collaborator

@kvinwang kvinwang commented Jun 1, 2026

Merges the meta-dstack-cloud build system back into mainline and upgrades to Yocto 6.0 (wrynose) with official linux-yocto 6.18 LTS, replacing the self-written linux-custom kernel.

Depends on component PR Dstack-TEE/dstack#701 (the dstack submodule points at its merged Apache-2.0 commit).

Layout / Yocto upgrade

  • The combined poky repo has no wrynose branch, so this adopts the official split layout: bitbake 2.18 + openembedded-core@wrynose + meta-yocto@wrynose (drops poky).
  • dstack fork submodules rebased onto wrynose (dstack-wrynose branches): meta-virtualization, meta-security, meta-confidential-compute, meta-rust-bin.

Kernel

  • Drop linux-custom_*.bb; use official linux-yocto 6.18 via linux-yocto%.bbappend + dstack .scc/.cfg.
  • Keep the TDX dma-direct-remap Kconfig patch (Intel TDX still doesn't select it; needed for NVMe DMA).
  • CONFIG_TDX_GUEST_DRIVER=y + TSM_REPORTS=y (in-tree ConfigFS TSM replaces out-of-tree mod-tdx-guest).
  • CONFIG_CRYPTO_SHA256=y built-in (dm-verity rootfs hash in initramfs).

Build system (from cloud)

  • --flavor multiconfig (prod/dev/nvidia/nvidia-dev), UKI image (dstack-uki.bb + mkimage), zfs 2.4, dstack-sysbox UNPACKDIR, scripts/bin/dstack-cloud CLI.
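As a rough illustration of the flavor mechanism, the sketch below maps each flavor to a dist name and to a BitBake multiconfig target. The pairing of flavors to dist names follows the order given in the commit messages and is an assumption, as are the helper name `bitbake_target`, the default recipe `dstack-rootfs`, and the idea that each multiconfig is named after its flavor. Only the `mc:<config>:<recipe>` target syntax is standard BitBake.

```python
# Flavor names come from the PR description; dist names come from the
# Makefile commit message. Pairing them in this order is an assumption.
FLAVOR_TO_DIST = {
    "prod": "dstack",
    "dev": "dstack-dev",
    "nvidia": "dstack-nvidia",
    "nvidia-dev": "dstack-nvidia-dev",
}


def bitbake_target(flavor: str, recipe: str = "dstack-rootfs") -> str:
    """Return the BitBake multiconfig target string for one flavor.

    Hypothetical helper: assumes the multiconfig is named after the flavor.
    """
    if flavor not in FLAVOR_TO_DIST:
        expected = ", ".join(sorted(FLAVOR_TO_DIST))
        raise ValueError(f"unknown flavor {flavor!r}; expected one of: {expected}")
    # BitBake addresses a recipe inside a multiconfig as mc:<config>:<recipe>.
    return f"mc:{flavor}:{recipe}"
```

Rejecting unknown flavors up front keeps a typo from silently building the default configuration.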

wrynose migration fixes

  • TEMPLATECONF moved to oe-core.
  • INIT_MANAGER=systemd (otherwise sysvinit conflicts break udev).
  • Layer LAYERSERIES_COMPAT=wrynose.
  • wic→files/wic.
  • DISTRO_FEATURES_OPTED_OUT.
  • runc patch-fuzz.
  • OVMF builds on NASM 3.0 / GCC 15 (backport NASM-3.0 fix, drop a now-redundant forward-decl) while keeping edk2-stable202502 for dstack-mr measurement compatibility (OVMF_VARIANT=pre202505).

Verification

bitbake parses clean (28395 targets, 0 errors). Built dstack-0.6.0 prod image and booted a TDX guest end-to-end: dm-verity rootfs, NVMe/DMA data disk, in-tree TDX quote, KMS /prpc onboard succeeds, docker workload runs, reaches Multi-User System.

kvinwang added 22 commits April 3, 2026 00:28
add gcp service account and scope config
The 6.18 defconfig was missing nftables config options that are required
by Docker's iptables-nft backend. These were defined in dstack-docker.cfg
for the linux-yocto kernel but linux-custom_6.18.7 does not use bbappend
fragments, so they must be in the defconfig directly.

Without these, rootfs build fails with missing kernel-module-nf-tables
and related packages.
Add xt_comment and nf_tables kernel modules for k3s support
NVIDIA's open kernel driver (nvidia.ko) gates its LKCA-backed libspdm
crypto provider on `CONFIG_CRYPTO_ECDSA` being defined when the driver
is built (see `kernel-open/nvidia/internal_crypt_lib.h`: the
`USE_LKCA` macro requires the kernel to advertise ECDSA, ECDH, RSA,
HMAC, AKCIPHER, etc.). When `CONFIG_CRYPTO_ECDSA` is missing, libspdm
falls back to stubs and at runtime prints
`libspdm expects LKCA but found stubs!` then fails
`spdmEstablishSession`, so H100 in Confidential Compute mode (e.g. GCP
TDX + a3-highgpu-1g) never finishes init and `nvidia-smi` reports no
devices.

`meta-nvidia/recipes-kernel/linux/files/nvidia.cfg` already sets this
config, but it ships as a `linux-yocto%.bbappend`, which does not
attach to the in-tree `linux-custom_*.bb` recipes that build the
dstack kernel from a defconfig. Add the option directly to the 6.17
and 6.18 defconfigs so all flavors (incl. nvidia) pick it up.

Verified end-to-end on GCP a3-highgpu-1g + TDX after rebuilding the
kernel + nvidia kernel modules with this change: SPDM session
establishes, `nvidia-smi conf-compute -f` reports `CC status: ON`,
and a PyTorch matmul runs at ~38 TFLOPs.
kernel: enable CONFIG_CRYPTO_ECDSA for H100 confidential compute
Many GCP projects only ship preemptible (SPOT) quota for newer GPUs —
in particular `PREEMPTIBLE-NVIDIA-H100-GPUS-per-project-{region,zone}`
is granted by default while `NVIDIA-H100-GPUS-per-project-region` is
zero. Without on-demand quota, the only way to launch H100 in a
Confidential TDX VM is to request `--provisioning-model=SPOT`.

Expose a `provisioning_model` field in `gcp_config` (default
`STANDARD`, backwards-compatible). When set to `SPOT`, also emit
`--instance-termination-action=STOP` so the boot/data disks survive
preemption and the instance can be resumed via `dstack-cloud start`
(important for the LUKS-encrypted data disk, which is keyed by the
KMS-provisioned per-instance secret).

Anything other than `STANDARD`/`SPOT` raises an early error rather
than silently dropping through.

Example `app.json` snippet for an H100 deploy:

    "gcp_config": {
      "machine_type": "a3-highgpu-1g",
      "zone": "us-central1-a",
      "provisioning_model": "SPOT"
    }
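
The validation described above could look roughly like the following sketch. It is not the actual `dstack-cloud` code: the function name `provisioning_flags` is hypothetical, and the choice to emit no extra flag for `STANDARD` is an assumption. The two `SPOT` flags are the ones named in this commit message.

```python
from typing import Dict, List

VALID_MODELS = ("STANDARD", "SPOT")


def provisioning_flags(gcp_config: Dict[str, str]) -> List[str]:
    """Translate gcp_config.provisioning_model into gcloud create flags."""
    # A missing field falls back to STANDARD, keeping older app.json files valid.
    model = gcp_config.get("provisioning_model", "STANDARD")
    if model not in VALID_MODELS:
        # Fail early instead of silently launching with the wrong model.
        raise ValueError(
            f"unsupported provisioning_model {model!r}; expected STANDARD or SPOT"
        )
    if model == "SPOT":
        return [
            "--provisioning-model=SPOT",
            # STOP (rather than delete) keeps boot/data disks across preemption.
            "--instance-termination-action=STOP",
        ]
    # Assumption: STANDARD needs no extra flags.
    return []
```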
…isioning

dstack-cloud: add gcp_config.provisioning_model for SPOT instances
Pulls 315 commits of guest-agent / kms / gateway / vmm fixes into the
recipe inputs. This is the state the v0.6.1 release tarballs were
built against, so bumping the pointer here makes
`git clone --recurse-submodules` reproduce the released images.

dstack 603c6ee5..b051018a (Phala-Network/dstack-cloud:master tip).
Tags the artifacts produced by `FLAVORS=... make dist` as 0.6.1, so
`dstack-cloud pull dstack-cloud{,-nvidia}-0.6.1` resolves against the
released tarballs at
https://github.com/Phala-Network/meta-dstack-cloud/releases/tag/v0.6.1.

The 0.6.1 cycle ships the H100 CC kernel fix (#14), the SPOT
provisioning flag in `dstack-cloud` (#15), and the dstack submodule
bump to b051018a (#16). See the v0.6.1 release notes for details.
…@ wrynose

- adopt official split layout (bitbake 2.18 + openembedded-core@wrynose +
  meta-yocto@wrynose) since combined poky has no wrynose branch; drop poky
- bring cloud build system: --flavor multiconfig, UKI mkimage, dstack-cloud CLI,
  zfs 2.4, sysbox UNPACKDIR, multiconfig confs
- rebase dstack forks onto wrynose: meta-virtualization/security/confidential-compute
  /meta-rust-bin LAYERSERIES_COMPAT -> wrynose
- dstack submodule -> merged Apache-2.0 component; .gitmodules dstack url back to Dstack-TEE
- dstack.conf: DISTRO_VERSION 0.6.0, kernel -> linux-yocto 6.18, keep NVIDIA + EFI/UKI
- keep mainline OVMF (edk2-stable202502 pinning) and hardened dstack-docker.cfg
- DISTRO_NAME back to DStack
Kernel recipe cleanup (drop linux-custom) and tdx-guest-mod removal follow.
- drop self-written linux-custom_{6.17.6,6.18.7}.bb and their flat defconfigs
- linux-yocto%.bbappend now carries the dstack .scc/.cfg fragments (already
  the mainline mechanism) on official linux-yocto 6.18 (wrynose)
- wire 0001-x86-tdx-select-dma-direct-remap.patch via SRC_URI:append:tdx
- dstack-tdx.cfg: CONFIG_TDX_GUEST_DRIVER=y + TSM_REPORTS=y (in-tree ConfigFS
  TSM replaces the out-of-tree mod-tdx-guest module)
…on path

- Makefile flavor_to_dist maps to dstack/dstack-dev/dstack-nvidia/dstack-nvidia-dev
- dstack-uki.bb: glob python3.* site-packages instead of hardcoded python3.13
  (wrynose native python version differs)
- dev-setup: TEMPLATECONF -> openembedded-core/meta/conf/templates/default
  (wrynose moved templates out of meta-poky); add meta-poky + meta-yocto-bsp
  to LAYERS explicitly (oe-core default template no longer pulls them)
- meta-dstack/meta-nvidia layer.conf: LAYERSERIES_COMPAT -> wrynose
- dstack.conf: DISTRO_FEATURES_BACKFILL_CONSIDERED -> DISTRO_FEATURES_OPTED_OUT
- meta-confidential-compute: move wic/ -> files/wic/ (wrynose wks search path)
- dstack-rootfs-base.inc: drop stray diff3 conflict marker
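The site-packages change in the list above can be sketched as follows. This is an illustration only: the helper name `find_site_packages` and the `usr/lib` layout under the sysroot are assumptions, not the recipe's actual code.

```python
from pathlib import Path
from typing import List


def find_site_packages(sysroot: Path) -> List[Path]:
    """Return every python3.*/site-packages directory under sysroot/usr/lib.

    Globbing avoids hardcoding a minor version such as python3.13, which
    breaks when the native Python version changes between releases.
    """
    lib_dir = sysroot / "usr" / "lib"
    matches = [p for p in lib_dir.glob("python3.*/site-packages") if p.is_dir()]
    # Sorted only to make the result deterministic across runs.
    return sorted(matches)
```

A caller would typically expect exactly one match and fail with a clear message on zero or several.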
verified: virtual/kernel = official linux-yocto 6.18.24, dma-direct-remap
patch wired via SRC_URI:append:tdx
poky.conf defaults POKY_INIT_MANAGER=sysvinit, which pulls
init-manager-sysvinit.inc and appends sysvinit to DISTRO_FEATURES — that
conflicts with systemd so both systemd and eudev get skipped and nothing
RPROVIDES udev (breaks cryptsetup -> dstack-initramfs). Setting INIT_MANAGER
before requiring the poky-derived cvm.conf selects init-manager-systemd.inc.
- bump edk2 stable202502 -> stable202511 (202502 won't assemble with wrynose
  NASM 3.01); changes RTMR[0] -> needs new dstack-mr OvmfVariant baseline
- add oe-core's CpuExceptionHandlerLib push-instruction NASM 3.0 backport
- drop 0003/0004 reproducibility patches (don't apply to 202511 template; not
  needed for functional image — rebase from oe-core versions for production)
- drop 0005-Declare-ProcessLibraryConstructorList (edk2 202511 declares it natively)
- OVMF_VARIANT -> stable202511
linux-yocto-tiny ships CONFIG_CRYPTO_SHA256=m; dm-verity in the initramfs can't
load modules, so early rootfs verity failed with 'Cannot initialize hash
function (-ENOENT)' and init died -> kernel panic. Force SHA256/SHA512 built-in.
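A check along these lines can catch the `=m` case before boot. This is a minimal sketch, assuming a plain-text kernel `.config`; the function names `parse_kconfig` and `not_builtin` are hypothetical and not part of this PR.

```python
from typing import Dict, Iterable, List


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse 'CONFIG_FOO=value' lines; comments and blank lines are skipped."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # '# CONFIG_FOO is not set' counts as absent
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def not_builtin(config_text: str, required: Iterable[str]) -> List[str]:
    """Return the required options that are not set to 'y' (built-in)."""
    options = parse_kconfig(config_text)
    # '=m' is not good enough here: the initramfs cannot load modules.
    return [name for name in required if options.get(name) != "y"]
```

For example, `not_builtin(config_text, ["CONFIG_CRYPTO_SHA256", "CONFIG_CRYPTO_SHA512"])` lists whichever of the two hash options would still be unavailable to dm-verity in the initramfs.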
…nose

dstack-mr can't yet compute measurements for newer edk2, so the pre202505
RTMR[0] layout must be preserved. Make 202502 build on wrynose by:
- backporting edk2's NASM-3.0 CpuExceptionHandlerLib push-instruction fix
- dropping 0005-Declare-ProcessLibraryConstructorList (GCC 15 rejects the K&R
  'void f()' forward-decl as conflicting with edk2's EFIAPI prototype; the
  prototype is already in scope in 202502 so the manual decl is redundant)
OVMF_VARIANT stays pre202505.
Copilot AI review requested due to automatic review settings June 1, 2026 02:28
Contributor

Copilot AI left a comment


Pull request overview

This pull request merges the meta-dstack-cloud build system back into this repository while upgrading the Yocto stack to Yocto 6.0 (wrynose) and switching the kernel over to official linux-yocto 6.18 with dstack-specific config/patches. It also introduces multiconfig “flavors” (prod/dev/nvidia/nvidia-dev), adds a UKI build flow, and updates multiple recipes/layers for the new layout and tooling.

Changes:

  • Switch repo layout from poky to split bitbake + openembedded-core + meta-yocto submodules and update layers to wrynose.
  • Replace the custom kernel path with linux-yocto 6.18 plus dstack .scc/.cfg fragments and a TDX-specific Kconfig patch.
  • Add/extend build and release tooling: multiconfig flavors, UKI generation + Authenticode hashing, and a new dstack-cloud deployment CLI.

Reviewed changes

Copilot reviewed 82 out of 88 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| scripts/bin/dstack.py | Rename config/manager classes and update CLI strings for consistent “dstack” naming. |
| scripts/bin/dstack-cloud | Add multi-cloud (currently GCP-focused) VM lifecycle CLI including KMS/env encryption and firewall helpers. |
| scripts/bin/authenticode_hash.py | Add PE/COFF Authenticode SHA256 calculator for UKI measurement compatibility. |
| repro-build/repro-build.sh | Limit reproducible builds to release flavors and adjust reproduce script repo/paths. |
| repro-build/Dockerfile.repro | Add build dependencies for GPT/FAT tooling used by new image/UKI flows. |
| repro-build/check.sh | Add an image whitelist comparison mode to avoid known non-reproducible artifacts. |
| README.md | Update project naming, links, and reproducible build instructions. |
| mkimage.sh | Add --flavor multiconfig support; create partitioned rootfs; optionally build UKI disk + Authenticode hash; split tar outputs. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-persistenced_1.0.bb | Use UNPACKDIR for installed files (wrynose fetch/unpack behavior). |
| meta-nvidia/recipes-graphics/nvidia/nvidia-modprobe-config_1.0.bb | Use UNPACKDIR for installed files. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-libs.inc | Minor syntax/style fix for FILES append. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.105.08.bb | Add updated Fabric Manager recipe version. |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_570.172.08.bb | Remove older Fabric Manager recipe version. |
| meta-nvidia/recipes-graphics/nvidia/nvidia_580.105.08.bb | Update driver recipe to use UNPACKDIR and refresh checksum metadata. |
| meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_580.105.08.bb | Update source dir handling and adjust installed payload. |
| meta-nvidia/recipes-graphics/nvidia-container-toolkit/nvidia-container-toolkit.inc | Switch toolkit fetch to main + new SRCREV; adjust source layout. |
| meta-nvidia/recipes-graphics/nvidia-container-toolkit/nvidia-container-toolkit_1.00.bb | Install config from UNPACKDIR. |
| meta-nvidia/recipes-graphics/libnvidia-container/libtirpc134_1.3.4.bb | Set UNPACKDIR source dir and add GCC-15-related CFLAGS adjustments. |
| meta-nvidia/recipes-graphics/libnvidia-container/libnvidia-container/*.patch | Add Upstream-Status headers for OE patch hygiene. |
| meta-nvidia/recipes-graphics/libnvidia-container/libnvidia-container.inc | Update libnvidia-container + modprobe fetch strategy/versions. |
| meta-nvidia/recipes-graphics/libnvidia-container/libnvidia-container_1.00.bb | Add task to relocate modprobe sources into expected subtree. |
| meta-nvidia/recipes-graphics/ldconfig-compatibility-symlink/ldconfig-compatibility-symlink_1.0.0.bb | Minor syntax/style fix for FILES append. |
| meta-nvidia/recipes-graphics/containerd-config/containerd-config_1.0.0.bb | Install config from UNPACKDIR and fix FILES syntax. |
| meta-nvidia/conf/layer.conf | Declare wrynose layer compatibility. |
| meta-dstack/recipes-kernel/tdx-guest-mod/tdx-guest.bb | Remove out-of-tree TDX guest module recipe. |
| meta-dstack/recipes-kernel/linux/linux-yocto%.bbappend | Add TDX-only Kconfig patch selection for DMA_DIRECT_REMAP. |
| meta-dstack/recipes-kernel/linux/files/.scc/.cfg | Update dstack kernel config fragments, enable in-tree TDX driver/TSM reports, ensure SHA256 built-in. |
| meta-dstack/recipes-kernel/linux/files/0001-x86-tdx-select-dma-direct-remap.patch | Add TDX guest Kconfig select patch. |
| meta-dstack/recipes-devtools/gptfdisk/gptfdisk_%.bbappend | Disable ncurses/cgdisk option to avoid unwanted deps. |
| meta-dstack/recipes-devtools/gcc/libgcc-initial_%.bbappend | Add a configure prefunc that stubs stdio.h in staging for toolchain build. |
| meta-dstack/recipes-core/systemd/systemd_%.bbappend | Remove vconsole pieces and GPT auto generator; add systemd-resolved ordering drop-in; tweak PACKAGECONFIG. |
| meta-dstack/recipes-core/pahole/pahole_1.25.bbappend | Remove prior pahole SRCREV override. |
| meta-dstack/recipes-core/images/dstack-uki.bb | Add UKI image recipe building via ukify using verity hash/size from work-shared env. |
| meta-dstack/recipes-core/images/dstack-rootfs.bb | Unify rootfs recipe and select prod/dev/nvidia via multiconfig variables. |
| meta-dstack/recipes-core/images/dstack-rootfs-*.inc | Refactor prod/dev includes and dev features. |
| meta-dstack/recipes-core/images/dstack-rootfs-base.inc | Remove tdx-guest module dependency; add tpm2-tools; add containerd state dir. |
| meta-dstack/recipes-core/images/dstack-*-rootfs.bb | Remove separate nvidia/dev rootfs wrapper recipes in favor of unified rootfs + multiconfig. |
| meta-dstack/recipes-core/images/dstack-initscript/init | Update initramfs init logic to resolve root by PARTLABEL and require verity params. |
| meta-dstack/recipes-core/images/dstack-initscript.bb | Switch to UNPACKDIR as S for initramfs content. |
| meta-dstack/recipes-core/dstack-zfs/dstack-zfs_2.4.0.bb | Upgrade ZFS branch/SRCREV; drop patches; relax buildpaths QA for modules. |
| meta-dstack/recipes-core/dstack-sysbox/dstack-sysbox_0.6.7.bb | Update paths to use UNPACKDIR consistently. |
| meta-dstack/recipes-core/dstack-ovmf/dstack-ovmf/*.patch | Refresh OVMF patch set for NASM 3.x + reproducibility notes/metadata. |
| meta-dstack/recipes-core/dstack-ovmf/dstack-ovmf_git.bb | Keep stable202502 pin; swap patch 0005 to NASM push-instruction fix; add rationale. |
| meta-dstack/recipes-core/dstack-guest/dstack-guest.bb | Adjust source dir to UNPACKDIR, install extra script, and relax buildpaths QA for Cargo output. |
| meta-dstack/recipes-core/docker/docker-moby%.bbappend | Install override from UNPACKDIR. |
| meta-dstack/recipes-core/base-files/files/dstack-motd | Update MOTD casing to “dstack”. |
| meta-dstack/recipes-core/base-files/base-files%.bbappend | Use UNPACKDIR for MOTD installation and diagnostics. |
| meta-dstack/recipes-connectivity/openssh/openssh_%.bbappend | Install sshd drop-in from UNPACKDIR. |
| meta-dstack/conf/multiconfig/*.conf | Add prod/dev/nvidia/nvidia-dev flavor definitions with separate TMPDIRs. |
| meta-dstack/conf/local.conf | Add GNU mirror override and set BBMULTICONFIG default flavors. |
| meta-dstack/conf/layer.conf | Update layer series compatibility to wrynose. |
| meta-dstack/conf/distro/dstack.conf | Set INIT_MANAGER=systemd early, bump version, switch kernel provider to linux-yocto 6.18, add EFI/UKI settings. |
| Makefile | Build common artifacts + per-flavor multiconfig rootfs/UKI; run mkimage per flavor. |
| LICENSE | Change repository license text to Business Source License 1.1 with AGPL change license. |
| dev-setup | Move to oe-core’s oe-init-build-env, sync conf into build dir, add meta-yocto layers. |
| build.sh | Update release basename naming to dstack-cloud variants and adjust download URL. |
| .gitmodules | Replace poky with split bitbake/openembedded-core/meta-yocto submodules; adjust meta-rust-bin URL. |
| .gitignore | Ignore .vscode and .claude. |


Comment thread mkimage.sh
Comment thread repro-build/check.sh
Comment thread README.md Outdated
Comment thread meta-dstack/recipes-core/systemd/systemd_%.bbappend
Comment thread meta-dstack/recipes-core/images/dstack-uki.bb
Comment thread build.sh
Comment thread mkimage.sh Outdated
Comment thread repro-build/check.sh
Comment thread Makefile Outdated
kvinwang and others added 3 commits May 31, 2026 20:28
- mkimage.sh: deterministic GPT GUIDs (reproducible partitioned images); check
  verity env exists and sgdisk is installed before use
- repro-build/check.sh: compare rootfs.img.parted.verity (new name); define YELLOW
- build.sh: download from Dstack-TEE/meta-dstack releases (not the fork)
- README: clone Dstack-TEE/meta-dstack for the reproducible-build steps
- systemd bbappend: drop dangling blacklist-autofs4.conf FILES entry (never installed)
- dstack-uki.bb: run ukify via argv list (no shell); fail clearly if ROOT_HASH/
  DATA_SIZE missing
- Makefile: build dstack-guest in images-common to avoid multiconfig fetch races
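The `dstack-uki.bb` item above (argv list, no shell, clear failure on missing values) might be sketched like this. It is not the recipe's code: the function name `build_ukify_argv` and the `example.*` kernel command-line keys are hypothetical placeholders, since the real parameter names are not shown in this PR. The `ukify build` options used (`--linux`, `--initrd`, `--cmdline`, `--output`) are standard ukify options.

```python
from typing import Dict, List

REQUIRED_KEYS = ("ROOT_HASH", "DATA_SIZE")


def build_ukify_argv(env: Dict[str, str], kernel: str, initrd: str, output: str) -> List[str]:
    """Build a ukify command as an argv list, failing early on missing verity values."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # A clear error beats an image that boots without a valid root hash.
        raise RuntimeError("verity env is missing: " + ", ".join(missing))
    # Hypothetical command-line keys, used only for illustration.
    cmdline = (
        f"example.root_hash={env['ROOT_HASH']} "
        f"example.data_size={env['DATA_SIZE']}"
    )
    # Each argument is its own list element, so no shell quoting is involved.
    return [
        "ukify",
        "build",
        f"--linux={kernel}",
        f"--initrd={initrd}",
        f"--cmdline={cmdline}",
        f"--output={output}",
    ]
```

The resulting list would be passed to `subprocess.run(argv, check=True)` without `shell=True`, so spaces in the command line cannot be reinterpreted by a shell.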
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The Copilot autofix (fed4cc1) added an sgdisk check right before
create_partitioned_rootfs, but an equivalent early check already existed.
Drop the redundant early check and keep the call-site one, with a clearer
message (the 'set ENABLE_UKI_IMAGE=0' hint was misleading — sgdisk is needed
for the partitioned bare-metal image, not the UKI path).
@kvinwang kvinwang merged commit ebdc817 into main Jun 4, 2026
3 checks passed
kvinwang added a commit that referenced this pull request Jun 4, 2026
Follow-up to #64: advance the dstack submodule from the early cloud-merge
commit to the current Dstack-TEE/dstack#701 head (bde0d038) — GCP TDX + AWS
Nitro attestation, verified-PCR hardening, vendored dstack-cloud CLI.

3 participants