[WIP] We Have Oxide At Home - Part 2: Nested Virtualization
Debugging and patching Propolis to support nested virtualization by decoding the CPU instructions causing the crash.
We're building a single-node Oxide lab — real Oxide control plane software running on hardware we actually have. The full stack: Nexus API server, CockroachDB, ClickHouse, sled agent, DNS, Oximeter — 22 zones orchestrated by Omicron on a Helios VM inside Proxmox. Everything was working. Then we tried to create a VM.
It crashed in 350 milliseconds.
This is the story of tracking down the crash, understanding why it only happens outside of Oxide's production hardware, trying the obvious fix (which didn't work), and writing a patch that did.
Unsupported or Unsupportable?
After getting the non-simulated Oxide control plane running, everything in the UI seemed to be working, but until I could run an actual VM there was only so much I could do. Since nested virtualization isn't supported, my original plan was to set up an older Dell mini PC I had available as an additional "sled" to run the VMs, while leaving all the supporting infrastructure services on my Proxmox VM.
I created a 1 vCPU, 1 GB RAM VM and tried to start it, and unsurprisingly it failed. The sled-agent logs told the story:
```
Propolis monitor unexpectedly reported no instance, marking as failed
```

Digging into the logs revealed the error:
```
thread 'vcpu-0' panicked at bin/propolis-server/src/lib/vm/state_driver.rs:324:9:
vCPU 0: Unhandled VM exit: InstEmul(InstEmul {
    inst_data: <guest data redacted>,
    len: 15
})
```

Propolis (Oxide's hypervisor/VMM built on bhyve) was crashing on an instruction it couldn't emulate, and the instruction bytes were redacted, so I didn't know which instruction it was.
Unredacting the crash
Propolis wraps guest data in a GuestData<T> type that redacts values in production builds. The display behavior is controlled by a global AtomicBool called DISPLAY_GUEST_DATA in lib/propolis/src/common.rs, which defaults to false.
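The mechanism is roughly this pattern (a minimal standalone sketch of the idea, not Propolis's actual code; the names GuestData and DISPLAY_GUEST_DATA match the ones in common.rs, everything else is simplified):

```rust
use std::fmt;
use std::sync::atomic::{AtomicBool, Ordering};

// Global switch: when false (the default), guest data is redacted on display.
static DISPLAY_GUEST_DATA: AtomicBool = AtomicBool::new(false);

// Wrapper marking a value as guest-controlled data.
struct GuestData<T>(T);

impl<T: fmt::Debug> fmt::Debug for GuestData<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if DISPLAY_GUEST_DATA.load(Ordering::Relaxed) {
            // Flag flipped: show the real value.
            fmt::Debug::fmt(&self.0, f)
        } else {
            write!(f, "<guest data redacted>")
        }
    }
}

fn main() {
    let bytes = GuestData(vec![243u8, 110]);
    println!("{bytes:?}"); // <guest data redacted>

    DISPLAY_GUEST_DATA.store(true, Ordering::Relaxed);
    println!("{bytes:?}"); // [243, 110]
}
```

The point of the wrapper is that a panic message can still print `inst_data` without leaking guest memory contents into production logs, unless someone deliberately flips the flag.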
After digging around for a bit, we had two options:
Option 1: Build flag
Propolis has a guest_data_display cargo feature (enabled with --features guest_data_display) that sets DISPLAY_GUEST_DATA to true at compile time.
Option 2: Modify the default.
Edit common.rs directly and change AtomicBool::new(false) to AtomicBool::new(true). Crude, but effective for a one-off debug build.
I should have gone with Option 1, but I chose Option 2: I didn't know if this was going to lead anywhere, it was just meant to be a quick test, and since it's a one-line change I wanted to avoid the risk of a feature flag changing anything else.
I cloned the propolis repo on the Helios VM, checked out the commit matching the build I was using (36f20be9), flipped the value, and built:
```shell
git clone https://github.com/oxidecomputer/propolis.git
cd propolis
git checkout 36f20be9bb4c3b362029237f5feb6377c982395f

# Edit lib/propolis/src/common.rs: AtomicBool::new(false) → AtomicBool::new(true)
RUSTUP_TOOLCHAIN=1.91.1 cargo build --release --bin propolis-server
```

I went and made a coffee, and when I came back I had a debug binary. But getting it into the running system wasn't straightforward.
Deploying a custom Propolis
Propolis runs inside an illumos zone managed by the Oxide sled-agent. Every time you create an instance, the sled-agent unpacks /opt/oxide/propolis-server.tar.gz to create a new zone. I couldn't load the patched binary into an existing zone; I had to replace the tarball and re-create the zone.
The tarball has a specific structure that the zone installer expects:
```
propolis-server.tar.gz
├── oxide.json                       ← metadata file (REQUIRED)
└── root/
    └── opt/oxide/propolis-server/
        └── bin/
            └── propolis-server
```

The tarball must include oxide.json at the top level alongside root/. Without it, the zone installer fails, and I spent a fair bit of time tracing this down.
Build And Cleanup Process
```shell
# 1. Generate a tarball with correct oxide.json metadata
cd ~/omicron
./target/release/omicron-package package --only propolis-server --no-rebuild

# 2. Extract, swap in our debug binary, repack
cd /tmp
tar xzf ~/omicron/out/propolis-server.tar.gz
cp ~/propolis/target/release/propolis-server root/opt/oxide/propolis-server/bin/propolis-server
tar czf propolis-server.tar.gz oxide.json root/

# 3. Deploy the new tarball
pfexec cp /tmp/propolis-server.tar.gz /opt/oxide/propolis-server.tar.gz

# 4-5. Delete the existing incomplete instances
zoneadm list -cv | grep propolis | awk '$3 == "incomplete" {print $2}' | \
while read zone; do
  pfexec zoneadm -z "$zone" uninstall -F
  pfexec zonecfg -z "$zone" delete -F
done

# 6. Create a fresh instance through the Oxide Console or API
```

The crash data
With the debug binary deployed and a fresh instance created, I needed to find the logs. The panic goes to stderr of the propolis-server process, which SMF captures in the service log. But you can't just svcs -L into a zone that has already crashed, because it gets automatically torn down. This is where you need to understand zone bundles.
Zone Bundles
A zone bundle is a diagnostic snapshot that sled-agent automatically creates when a zone fails. It captures the zone's SMF service logs, configuration, and other diagnostic state into a directory before the zone is cleaned up.
Think of it as a black-box recorder for crashed zones.
The bundles are stored at:

```
/pool/int/<zpool-uuid>/debug/bundle/zone/<zone-name>/
```

- zpool-uuid — the internal zpool that sled-agent uses for debug data. Run zpool list to get a list.
- zone-name — the full zone name as shown in zoneadm list -cv, e.g. oxz_propolis-server_9be5fc93-154b-4de7-bbb7-475e8a2e4300
Since we just cleaned up all the incomplete zones, there should only be one propolis zone (the one that just crashed). We can find the bundle path directly:
```shell
# Find the zone bundle for the crashed propolis zone
find /pool/int/*/debug/bundle/zone/ -name "oxz_propolis-server_*" -type d

# If multiple bundles exist, find the one(s) containing the crash output
grep -rl "InstEmul\|panicked" /pool/int/*/debug/bundle/zone/oxz_propolis-server_*/
```

With the debug binary deployed and a fresh instance created, we got our unredacted crash:
```
vCPU 0: Unhandled VM exit: InstEmul(InstEmul {
    inst_data: [243, 110, 235, 10, 227, 8, 138, 6, 238, 72, 255, 198, 226, 248, 76],
    len: 15
})
```

Decoding the instruction
The inst_data array in the panic output is decimal (Rust's default Debug format for [u8]). The first step is converting it to hex, which is what x86 instruction references use. A quick way to do this:
```shell
printf '%02x ' 243 110 235 10 227 8 138 6 238 72 255 198 226 248 76; echo
# f3 6e eb 0a e3 08 8a 06 ee 48 ff c6 e2 f8 4c
```

This gives us this view:
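If you'd rather do the conversion somewhere you can paste the array verbatim from the panic message, the same thing in Rust (a standalone snippet, nothing Propolis-specific):

```rust
fn main() {
    // The decimal bytes exactly as printed in the panic output
    let inst_data: [u8; 15] = [243, 110, 235, 10, 227, 8, 138, 6, 238, 72, 255, 198, 226, 248, 76];

    // Format each byte as two lowercase hex digits
    let hex: Vec<String> = inst_data.iter().map(|b| format!("{b:02x}")).collect();
    println!("{}", hex.join(" "));
    // f3 6e eb 0a e3 08 8a 06 ee 48 ff c6 e2 f8 4c
}
```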
```
Decimal: [243, 110, 235,  10, 227,   8, 138,   6, 238,  72, 255, 198, 226, 248,  76]
Hex:     [ f3,  6e,  eb,  0a,  e3,  08,  8a,  06,  ee,  48,  ff,  c6,  e2,  f8,  4c]
```

With the hex bytes in hand, we can decode them against the Intel x86 opcode reference. The buffer contains the instruction that caused the crash plus whatever follows it in memory (up to 15 bytes total, the maximum x86 instruction length).
Decoded byte by byte against an x86 opcode map, the buffer breaks down as:
| Bytes | Instruction | Meaning |
|---|---|---|
| `f3 6e` | `REP OUTSB` | Repeatedly output bytes from memory (DS:[RSI]) to port (DX), RCX times — the crash |
| `eb 0a` | `JMP +10` | Skip ahead 10 bytes (jump over the fallback loop below) |
| `e3 08` | `JRCXZ +8` | If RCX is zero, skip the loop entirely |
| `8a 06` | `MOV AL, [RSI]` | Load one byte from memory at RSI into AL |
| `ee` | `OUT DX, AL` | Output that byte to the port in DX |
| `48 ff c6` | `INC RSI` | Advance the memory pointer |
| `e2 f8` | `LOOP -8` | Decrement RCX, jump back to MOV AL, [RSI] if not zero |
For deeper reference on any individual instruction, the Felix Cloutier x86 reference is a great starting point.
What is REP OUTSB?
On x86 hardware the CPU sends and receives data through numbered I/O ports. These are not like network ports, which are logical addresses (e.g. TCP port 443). An I/O port is a physical address on the CPU's I/O bus that connects directly to another hardware device.
An `OUT` instruction sends a single byte to a port. `OUTSB` ("output string byte") is a specialized version that reads a byte from a memory address and sends it to the port.
The `REP` prefix tells the CPU to repeat the instruction RCX times, decrementing RCX and advancing RSI on each iteration.
So `REP OUTSB` means "send bytes from memory to this port, advancing through the buffer automatically." It's the x86 equivalent of a bulk write: one instruction replaces an entire loop.
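To make the semantics concrete, here is a rough model of what the VMM has to emulate for REP OUTSB, which is also what the firmware's fallback loop (JRCXZ / MOV / OUT / INC / LOOP) does one byte at a time. This is a simplified sketch: real hardware also honors the direction flag, operand sizes, and I/O permission checks, all ignored here, and `rep_outsb` and the Vec-based "port" are made-up names for illustration.

```rust
// Simplified model of REP OUTSB: send RCX bytes starting at RSI to the
// I/O port in DX, advancing RSI and decrementing RCX as the CPU would.
fn rep_outsb(mem: &[u8], mut rsi: usize, mut rcx: usize, port: &mut Vec<u8>) -> (usize, usize) {
    // JRCXZ: if RCX is already zero, do nothing.
    while rcx > 0 {
        port.push(mem[rsi]); // MOV AL, [RSI]; OUT DX, AL
        rsi += 1;            // INC RSI
        rcx -= 1;            // LOOP decrements RCX and repeats
    }
    (rsi, rcx) // final register state after the instruction retires
}

fn main() {
    let mem = [0xde_u8, 0xad, 0xbe, 0xef];
    let mut port = Vec::new();
    let (rsi, rcx) = rep_outsb(&mem, 0, mem.len(), &mut port);
    assert_eq!(port, mem);          // every byte reached the "port"
    assert_eq!((rsi, rcx), (4, 0)); // pointer advanced, counter exhausted
    println!("wrote {} bytes", port.len());
}
```

The single-instruction form and the fallback loop are equivalent; the firmware presumably keeps both because not every environment handles the repeated string form.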
Just like a physical machine has a BIOS or UEFI, a virtual machine needs its own equivalent. Propolis uses OVMF (Open Virtual Machine Firmware), an open-source UEFI firmware for virtual machines built from the EDK II codebase. Oxide maintains its own fork with customizations for Propolis.
OVMF is the first code that runs when a VM starts, before any operating system loads. It initializes virtual hardware, sets up memory, enumerates PCI devices, and provides the UEFI environment that a guest OS expects to find.
At this point I knew OVMF was sending data to a port using REP OUTSB. The instruction itself is completely normal, so the firmware wasn't doing anything wrong. What I didn't know yet was why Propolis couldn't handle REP OUTSB.