[WIP] We Have Oxide At Home - Part 2: Nested Virtualization
Debugging and patching Propolis to support nested virtualization by decoding the CPU instructions causing the crash.
We're building a single-node Oxide lab — real Oxide control plane software running on hardware we actually have. The full stack: Nexus API server, CockroachDB, ClickHouse, sled agent, DNS, Oximeter — 22 zones orchestrated by Omicron on a Helios VM inside Proxmox. Everything was working. Then we tried to create a VM.
It crashed in 350 milliseconds.
This is the story of tracking down the crash, understanding why it only happens outside of Oxide's production hardware, trying the obvious fix (which didn't work), and writing a patch that did.
Unsupported or Unsupportable?
After getting the non-simulated Oxide control plane running, everything in the UI seemed to be working, but until I could run an actual VM there was only so much I could do. Since nested virtualization isn't supported, my original plan was to set up an older Dell mini PC I had available as an additional "sled" to run the VMs, while leaving all the supporting infrastructure services on my Proxmox VM.
I created a 1 vCPU, 1 GB RAM VM and tried to start it, and unsurprisingly it failed. The sled-agent logs told the story:
```
Propolis monitor unexpectedly reported no instance, marking as failed
```

Digging into the logs revealed the error:
```
thread 'vcpu-0' panicked at bin/propolis-server/src/lib/vm/state_driver.rs:324:9:
vCPU 0: Unhandled VM exit: InstEmul(InstEmul {
    inst_data: <guest data redacted>,
    len: 15
})
```

Propolis (Oxide's hypervisor/VMM built on bhyve) was crashing on an instruction it couldn't emulate, and the instruction bytes were redacted, so I didn't know which instruction it was.
Unredacting the crash
Propolis wraps guest data in a GuestData<T> type that redacts values in production builds. The display behavior is controlled by a global AtomicBool called DISPLAY_GUEST_DATA in lib/propolis/src/common.rs, which defaults to false.
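The mechanism is roughly this pattern (a minimal standalone sketch of the idea, not Propolis's actual code; the names GuestData and DISPLAY_GUEST_DATA match the ones in common.rs, everything else is simplified):

```rust
use std::fmt;
use std::sync::atomic::{AtomicBool, Ordering};

// Global switch: when false (the default), guest data is redacted on display.
static DISPLAY_GUEST_DATA: AtomicBool = AtomicBool::new(false);

// Wrapper marking a value as guest-controlled data.
struct GuestData<T>(T);

impl<T: fmt::Debug> fmt::Debug for GuestData<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if DISPLAY_GUEST_DATA.load(Ordering::Relaxed) {
            // Flag flipped: show the real value.
            fmt::Debug::fmt(&self.0, f)
        } else {
            write!(f, "<guest data redacted>")
        }
    }
}

fn main() {
    let bytes = GuestData(vec![243u8, 110]);
    println!("{bytes:?}"); // <guest data redacted>

    DISPLAY_GUEST_DATA.store(true, Ordering::Relaxed);
    println!("{bytes:?}"); // [243, 110]
}
```

The point of the wrapper is that a panic message can still print `inst_data` without leaking guest memory contents into production logs, unless someone deliberately flips the flag.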
After digging around for a bit, we had two options:
Option 1: Build flag
Propolis has a guest_data_display cargo feature (enabled with --features guest_data_display) that sets DISPLAY_GUEST_DATA to true at compile time.
Option 2: Modify the default.
Edit common.rs directly and change AtomicBool::new(false) to AtomicBool::new(true). Crude, but effective for a one-off debug build.
I should have gone with Option 1, but I chose Option 2: I didn't know if this was going to lead anywhere, it was just meant to be a quick test, and since it's a one-line change I wanted to avoid the risk of a feature flag changing anything else.
I cloned the propolis repo on the Helios VM, checked out the commit matching the build I was using (36f20be9), flipped the value, and built:
```shell
git clone https://github.com/oxidecomputer/propolis.git
cd propolis
git checkout 36f20be9bb4c3b362029237f5feb6377c982395f

# Edit lib/propolis/src/common.rs: AtomicBool::new(false) → AtomicBool::new(true)
RUSTUP_TOOLCHAIN=1.91.1 cargo build --release --bin propolis-server
```

I went and made a coffee, and when I came back I had a debug binary. But getting it into the running system wasn't straightforward.
Deploying a custom Propolis
Propolis runs inside an illumos zone managed by the Oxide sled-agent. Every time you create an instance, the sled-agent unpacks /opt/oxide/propolis-server.tar.gz to create a new zone. I couldn't load the patched binary into an existing zone; I had to replace the tarball and re-create the zone.
The tarball has a specific structure that the zone installer expects:
```
propolis-server.tar.gz
├── oxide.json                       ← metadata file (REQUIRED)
└── root/
    └── opt/oxide/propolis-server/
        └── bin/
            └── propolis-server
```

The tarball must include oxide.json at the top level alongside root/. Without it, the zone installer fails, and I spent a fair bit of time tracing this down.
Build And Cleanup Process
```shell
# 1. Generate a tarball with correct oxide.json metadata
cd ~/omicron
./target/release/omicron-package package --only propolis-server --no-rebuild

# 2. Extract, swap in our debug binary, repack
cd /tmp
tar xzf ~/omicron/out/propolis-server.tar.gz
cp ~/propolis/target/release/propolis-server root/opt/oxide/propolis-server/bin/propolis-server
tar czf propolis-server.tar.gz oxide.json root/

# 3. Deploy the new tarball
pfexec cp /tmp/propolis-server.tar.gz /opt/oxide/propolis-server.tar.gz

# 4-5. Delete the existing incomplete instances
zoneadm list -cv | grep propolis | awk '$3 == "incomplete" {print $2}' | \
while read zone; do
  pfexec zoneadm -z "$zone" uninstall -F
  pfexec zonecfg -z "$zone" delete -F
done

# 6. Create a fresh instance through the Oxide Console or API
```

The crash data
With the debug binary deployed and a fresh instance created, I needed to find the logs. The panic goes to stderr of the propolis-server process, which SMF captures in the service log. But you can't just svcs -L into a zone that has already crashed, because it gets automatically torn down. This is where you need to understand zone bundles.
Zone Bundles
A zone bundle is a diagnostic snapshot that sled-agent automatically creates when a zone fails. It captures the zone's SMF service logs, configuration, and other diagnostic state into a directory before the zone is cleaned up.
Think of it as a black-box recorder for crashed zones.
The bundles are stored at:

```
/pool/int/<zpool-uuid>/debug/bundle/zone/<zone-name>/
```

- zpool-uuid — the internal zpool that sled-agent uses for debug data. Run zpool list to get a list.
- zone-name — the full zone name as shown in zoneadm list -cv, e.g. oxz_propolis-server_9be5fc93-154b-4de7-bbb7-475e8a2e4300
Since we just cleaned up all the incomplete zones, there should only be one propolis zone (the one that just crashed). We can find the bundle path directly:
```shell
# Find the zone bundle for the crashed propolis zone
find /pool/int/*/debug/bundle/zone/ -name "oxz_propolis-server_*" -type d

# If multiple bundles exist, find the one(s) containing the crash output
grep -rl "InstEmul\|panicked" /pool/int/*/debug/bundle/zone/oxz_propolis-server_*/
```

With the debug binary deployed and a fresh instance created, we got our unredacted crash:
```
vCPU 0: Unhandled VM exit: InstEmul(InstEmul {
    inst_data: [243, 110, 235, 10, 227, 8, 138, 6, 238, 72, 255, 198, 226, 248, 76],
    len: 15
})
```

Decoding the instruction
The inst_data array in the panic output is decimal (Rust's default Debug format for [u8]). The first step is converting it to hex, which is what x86 instruction references use. A quick way to do this:
```shell
printf '%02x ' 243 110 235 10 227 8 138 6 238 72 255 198 226 248 76; echo
# f3 6e eb 0a e3 08 8a 06 ee 48 ff c6 e2 f8 4c
```

This gives us this view:
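If you'd rather do the conversion somewhere you can paste the array verbatim from the panic message, the same thing in Rust (a standalone snippet, nothing Propolis-specific):

```rust
fn main() {
    // The decimal bytes exactly as printed in the panic output
    let inst_data: [u8; 15] = [243, 110, 235, 10, 227, 8, 138, 6, 238, 72, 255, 198, 226, 248, 76];

    // Format each byte as two lowercase hex digits
    let hex: Vec<String> = inst_data.iter().map(|b| format!("{b:02x}")).collect();
    println!("{}", hex.join(" "));
    // f3 6e eb 0a e3 08 8a 06 ee 48 ff c6 e2 f8 4c
}
```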
```
Decimal: [243, 110, 235,  10, 227,   8, 138,   6, 238,  72, 255, 198, 226, 248,  76]
Hex:     [ f3,  6e,  eb,  0a,  e3,  08,  8a,  06,  ee,  48,  ff,  c6,  e2,  f8,  4c]
```

With the hex bytes in hand, we can decode them against the Intel x86 opcode reference. The buffer contains the instruction that caused the crash plus whatever follows it in memory (up to 15 bytes total, the maximum x86 instruction length).
Decoded byte by byte against an x86 opcode map, the buffer breaks down as:
| Bytes | Instruction | Meaning |
|---|---|---|
| `f3 6e` | `REP OUTSB` | Repeatedly output bytes from memory (DS:[RSI]) to port (DX), RCX times — the crash |
| `eb 0a` | `JMP +10` | Skip ahead 10 bytes (jump over the fallback loop below) |
| `e3 08` | `JRCXZ +8` | If RCX is zero, skip the loop entirely |
| `8a 06` | `MOV AL, [RSI]` | Load one byte from memory at RSI into AL |
| `ee` | `OUT DX, AL` | Output that byte to the port in DX |
| `48 ff c6` | `INC RSI` | Advance the memory pointer |
| `e2 f8` | `LOOP -8` | Decrement RCX, jump back to MOV AL, [RSI] if not zero |
For deeper reference on any individual instruction, the Felix Cloutier x86 reference is a great starting point.
What is REP OUTSB?
On x86 hardware the CPU sends and receives data through numbered I/O ports. These are not like network ports, which are logical addresses (e.g. TCP port 443). An I/O port is a physical address on the CPU's I/O bus that connects directly to another hardware device.
An `OUT` instruction sends a single byte to a port. `OUTSB` ("output string byte") is a specialized version that reads a byte from a memory address and sends it to the port.
The `REP` prefix tells the CPU to repeat the instruction RCX times, decrementing RCX and advancing RSI on each iteration.
So `REP OUTSB` means "send bytes from memory to this port, advancing through the buffer automatically." It's the x86 equivalent of a bulk write: one instruction replaces an entire loop.
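To make the semantics concrete, here is a rough model of what the VMM has to emulate for REP OUTSB, which is also what the firmware's fallback loop (JRCXZ / MOV / OUT / INC / LOOP) does one byte at a time. This is a simplified sketch: real hardware also honors the direction flag, operand sizes, and I/O permission checks, all ignored here, and `rep_outsb` and the Vec-based "port" are made-up names for illustration.

```rust
// Simplified model of REP OUTSB: send RCX bytes starting at RSI to the
// I/O port in DX, advancing RSI and decrementing RCX as the CPU would.
fn rep_outsb(mem: &[u8], mut rsi: usize, mut rcx: usize, port: &mut Vec<u8>) -> (usize, usize) {
    // JRCXZ: if RCX is already zero, do nothing.
    while rcx > 0 {
        port.push(mem[rsi]); // MOV AL, [RSI]; OUT DX, AL
        rsi += 1;            // INC RSI
        rcx -= 1;            // LOOP decrements RCX and repeats
    }
    (rsi, rcx) // final register state after the instruction retires
}

fn main() {
    let mem = [0xde_u8, 0xad, 0xbe, 0xef];
    let mut port = Vec::new();
    let (rsi, rcx) = rep_outsb(&mem, 0, mem.len(), &mut port);
    assert_eq!(port, mem);          // every byte reached the "port"
    assert_eq!((rsi, rcx), (4, 0)); // pointer advanced, counter exhausted
    println!("wrote {} bytes", port.len());
}
```

The single-instruction form and the fallback loop are equivalent; the firmware presumably keeps both because not every environment handles the repeated string form.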
Just like a physical machine has a BIOS or UEFI, a virtual machine needs its own equivalent. Propolis uses OVMF (Open Virtual Machine Firmware), an open-source UEFI firmware for virtual machines built from the EDK II codebase. Oxide maintains its own fork with customizations for Propolis.
OVMF is the first code that runs when a VM starts, before any operating system loads. It initializes virtual hardware, sets up memory, enumerates PCI devices, and provides the UEFI environment that a guest OS expects to find.
At this point I knew OVMF was sending data to a port using REP OUTSB. The instruction itself is completely normal, so the firmware wasn't doing anything wrong. What I didn't know yet was why Propolis couldn't handle REP OUTSB.