# PVE Install: Same-Disk Boot Problem — Solution Options

## Problem Statement

NETSetup boots the Proxmox ISO from the same physical disk it installs to:

```
Partition 1 (gpt1): 4 GB FAT32 ESP — GRUB, Void Linux, NETSetup, ISO files, answer.toml
Partition 2 (gpt2): Raw Proxmox ISO (dd'd by BootISOGrub2)
```

The Proxmox installer's `extract_data()` (in `Proxmox/Install.pm`) executes in this order:

1. **Line ~809:** `vgchange -an` — deactivate VGs
2. **Line ~931:** `wipe_disk($target_hd)` — zeros first 16 MB of every partition, wipefs
3. **Line ~938:** `partition_bootable_disk(...)` — `sgdisk -Z` (destroy GPT), create new layout
4. **Line ~955:** `create_lvm_volumes(...)` — PV, VG, LVs
5. **Line ~969:** `mkswap`, `mkfs.vfat`, `mkfs.ext4`
6. **Line ~1025:** `stat($basefile)` — **reads base squashfs FROM the ISO** for extraction

Steps 2 and 3 wipe the disk, which destroys the ISO on partition 2 before step 6 tries to read the
base system from it. The kernel and initrd are already in RAM, so the installer keeps running, but
it can no longer read the installation payload.

---

## Option RAM: Copy ISO to RAM Before Installer Starts

**Principle:** Modify the `netsetup-check.sh` script (injected into the Proxmox initrd via the
existing `rdinit` wrapper) to copy the ISO from the partition into a tmpfs before setting `cdrom=`.
The installer then reads the ISO from RAM, and the disk can be freely wiped.

### Prerequisites

- Machine has enough RAM: ISO (~1.2 GB) + installer live environment (~1 GB) + working memory.
  Minimum ~4 GB RAM. Typical bare-metal servers (16+ GB) have no issue.
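- The kernel parameter `ramdisk_size=16777216` (in KiB, i.e. 16 GiB max) is already set. It only
  caps `/dev/ram*` block devices; the tmpfs used here is bounded by its own `size=` mount option.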
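
The RAM prerequisite can be checked inside the check script before the copy starts. The following is a minimal sketch, not part of the existing script: the function name `have_ram_for_iso`, the 2048 MiB headroom, and the 1229 MiB ISO size are illustrative assumptions. It parses `MemTotal` with shell builtins only, since it is not confirmed which tools the initrd ships.

```bash
# Hypothetical helper: succeed when MemTotal covers the ISO plus a headroom.
#   $1  ISO size in MiB
#   $2  headroom in MiB for the live environment (assumed value, tune per site)
#   $3  meminfo path, defaults to /proc/meminfo (parameter exists for testing)
have_ram_for_iso() {
    _hr_iso_mib="$1"
    _hr_headroom_mib="$2"
    _hr_meminfo="${3:-/proc/meminfo}"
    _hr_total_kib=""
    # /proc/meminfo lines look like "MemTotal:       16314208 kB"
    while read -r _hr_key _hr_val _hr_unit; do
        if [ "$_hr_key" = "MemTotal:" ]; then
            _hr_total_kib="$_hr_val"
            break
        fi
    done < "$_hr_meminfo"
    [ -n "$_hr_total_kib" ] || return 1
    [ $((_hr_total_kib / 1024)) -ge $((_hr_iso_mib + _hr_headroom_mib)) ]
}

# Usage inside the check script (sizes are assumptions):
# if have_ram_for_iso 1229 2048; then ...copy to tmpfs...; else ...direct mount...; fi
```

Using `MemTotal` rather than `MemAvailable` is a deliberate simplification: this early in boot almost nothing else is resident, and `MemTotal` is present on every kernel.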

### Required Changes

#### Step 1: Modify `GetCheckScript()` in `BootISOGrub2.cs`

**File:** `src/NETSetup/Stages/4-LinuxPE/BootISOGrub2.cs`, method `GetCheckScript()` (line ~458)

Replace the Proxmox check script with a version that copies the ISO to RAM first:

```bash
# NETSetup: copy ISO to RAM, then mount from RAM so disk can be wiped
_ns_isodev=""
for _ns_par in $(cat /proc/cmdline); do
    case $_ns_par in
        netsetup.isodev=*) _ns_isodev="${_ns_par#netsetup.isodev=}" ;;
    esac
done
if [ -n "$_ns_isodev" ]; then
    echo "NETSetup: copying ISO from '$_ns_isodev' to RAM..."
    mkdir -p /iso-ram
    mount -t tmpfs -o size=2G tmpfs /iso-ram

    # No status=progress: stderr is discarded here, and a BusyBox dd may reject the option.
    if dd if="$_ns_isodev" of=/iso-ram/proxmox.iso bs=4M 2>/dev/null; then
        echo "NETSetup: ISO copied to RAM, loop-mounting..."
        if mount -o loop,ro /iso-ram/proxmox.iso /mnt >/dev/null 2>&1; then
            if [ -r "/mnt/$CDID_FN" ] && [ "X$(cat "/mnt/$CDID_FN")" = "X$reqid" ]; then
                echo "NETSetup: valid ISO in RAM, disk is free for wiping"
                cdrom="loop-ram"
            else
                echo "NETSetup: wrong cd-id in RAM copy, continuing scan"
                umount /mnt
                umount /iso-ram
            fi
        else
            echo "NETSetup: failed to loop-mount RAM copy, continuing scan"
            umount /iso-ram
        fi
    else
        echo "NETSetup: dd failed, falling back to direct mount"
        umount /iso-ram

        # Fallback: mount directly (will break on wipe, but at least boots)
        if mount -t auto -o ro "$_ns_isodev" /mnt >/dev/null 2>&1; then
            if [ -r "/mnt/$CDID_FN" ] && [ "X$(cat "/mnt/$CDID_FN")" = "X$reqid" ]; then
                cdrom=$_ns_isodev
            else
                umount /mnt
            fi
        fi
    fi
fi
```
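
The fixed `size=2G` holds only while the ISO partition stays below 2 GiB, because `dd` copies the whole partition rather than just the ISO payload. A possible refinement, sketched here under assumptions, sizes the tmpfs from the partition's sector count in sysfs. The helper name `tmpfs_size_mib` and the 64 MiB default margin are hypothetical; `/sys/class/block/<name>/size` reports 512-byte units regardless of the device's logical block size.

```bash
# Hypothetical helper: MiB needed to hold a partition of $1 512-byte sectors,
# rounded up, plus a safety margin of $2 MiB (default 64, an assumed value).
tmpfs_size_mib() {
    _ts_sectors="$1"
    _ts_margin_mib="${2:-64}"
    # 2048 sectors of 512 bytes make 1 MiB; add 2047 first to round up.
    echo $(( (_ts_sectors + 2047) / 2048 + _ts_margin_mib ))
}

# Usage sketch (assumes $_ns_isodev is a plain /dev/<name> path, not a symlink):
# _ns_sectors=$(cat "/sys/class/block/$(basename "$_ns_isodev")/size")
# mount -t tmpfs -o "size=$(tmpfs_size_mib "$_ns_sectors")m" tmpfs /iso-ram
```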

#### Step 2: Verify `cdrom=` value handling

The Proxmox `/init` script uses `$cdrom` to decide whether to skip the device-scanning loop.
Investigate whether setting `cdrom="loop-ram"` (a non-device value) is accepted, or whether it
must be a real block device path. If the latter, look up the loop device backing the mount and use
that, e.g. `cdrom=/dev/loop0`. Note that `losetup -l` is a util-linux option and may be missing
if the initrd is BusyBox-based; `losetup -a` is the more portable listing.
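
If `cdrom=` turns out to need a block device, the loop device behind the RAM copy has to be looked up. The sketch below parses `losetup -a` style output, where each line is expected to start with the device path followed by a colon. The helper name `loopdev_for_file` is hypothetical, and the exact output format should be confirmed in the actual initrd before relying on it.

```bash
# Hypothetical helper: read `losetup -a` style lines on stdin and print the
# first loop device whose line mentions the backing file $1.
loopdev_for_file() {
    _lf_file="$1"
    while IFS= read -r _lf_line; do
        case "$_lf_line" in
            /dev/loop*:*"$_lf_file"*)
                # Keep everything before the first colon: the device path.
                printf '%s\n' "${_lf_line%%:*}"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage sketch:
# cdrom=$(losetup -a | loopdev_for_file /iso-ram/proxmox.iso) || cdrom="loop-ram"
```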

#### Step 3: Verify `/mnt` mount survives into the installer

After `/init` finishes and pivots root into the live squashfs, the ISO mount at `/mnt` must
remain accessible from the installer's perspective. The Proxmox init typically bind-mounts or
copies the ISO mount path into the live environment. Verify this works with the loop mount.
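
One way to check this from a shell in the live environment is to look up what backs the mount point in `/proc/mounts`. This is a minimal sketch with a hypothetical helper `mount_source`; the mount point the installer actually uses after the pivot is not confirmed here and has to be taken from the Proxmox init script.

```bash
# Hypothetical helper: print the source device of mount point $1 as listed in
# a /proc/mounts style file ($2, defaults to /proc/mounts).
mount_source() {
    _ms_target="$1"
    _ms_table="${2:-/proc/mounts}"
    _ms_found=""
    # Fields: source, mount point, fstype, options, dump, pass.
    # Later entries shadow earlier ones, so keep the last match.
    while read -r _ms_src _ms_dir _ms_rest; do
        [ "$_ms_dir" = "$_ms_target" ] && _ms_found="$_ms_src"
    done < "$_ms_table"
    [ -n "$_ms_found" ] && printf '%s\n' "$_ms_found"
}

# Usage sketch: a /dev/loopN result means the installer reads the RAM copy.
# case "$(mount_source /mnt)" in /dev/loop*) echo "ISO served from RAM" ;; esac
```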

### Risks

| Risk | Severity | Mitigation |
|------|----------|------------|
| Not enough RAM on target machine | Medium | Check RAM in check script, fall back to direct mount |
| `dd` is slow for large ISOs | Low | The copy is bound by disk read speed, not RAM speed: 1.2 GB takes roughly 8 s at 150 MB/s and under 3 s at 500 MB/s |
| `mount -o loop` not available in initrd | Low | Proxmox initrd has loop support (used internally) |
| `cdrom=` must be a real device path | Medium | Look up the backing `/dev/loopN` (e.g. with `losetup -a`) |
| Proxmox `/init` remounts `/mnt` | Medium | Trace the init script to verify mount propagation |

### Validation Steps

1. Boot NETSetup on a test machine with ≥4 GB RAM
2. Verify the console shows "NETSetup: valid ISO in RAM" (the script uses `echo`, so the message does not reach `dmesg`)
3. Verify the installer completes the full auto-install without "unable to open file" errors
4. Verify the installed system boots correctly

---

## Option PATCH: Skip Installer's Wipe/Partition, Do It Manually

**Principle:** Pre-create the exact partition layout and LVM volumes before the auto-installer
starts, then patch `Proxmox/Install.pm` (via sed in the rdinit wrapper) to skip the
`wipe_disk` + `partition_bootable_disk` + `create_lvm_volumes` calls. The installer then only
does filesystem creation, extraction, and configuration.

### Prerequisites

- Deep understanding of the exact Proxmox partitioning scheme (documented in `docs/pve-install-disk.md`)
- The sed patch anchors must match the Proxmox version's `Install.pm`
- Must be updated when Proxmox changes the installer code

### Required Changes

#### Step 1: Create a pre-partition script

Write a shell script that runs in the Proxmox live environment BEFORE the auto-installer starts.
This script must replicate the exact partition layout from `pve-install-disk.md`:

```bash
#!/bin/sh
# Pre-partition the target disk with the exact Proxmox layout.
# Runs before proxmox-auto-installer, e.g. from a systemd unit or the rdinit wrapper.

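TARGET_DEV="${1:-}"    # e.g. /dev/sda
HDSIZE_GB="${2:-}"     # optional cap
ESP_SIZE=1024          # MB (assuming >100GB disk, adjust if needed)
ESP_END=$((ESP_SIZE + 1))

# Refuse to run without a valid block device: the next step is destructive.
[ -b "$TARGET_DEV" ] || { echo "usage: $0 <disk> [hdsize_gb]" >&2; exit 1; }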

# 1. Wipe (safe — ISO is still in partition 2 but we skip zeroing it,
#    or ISO is already in RAM per Option RAM)
sgdisk -Z "$TARGET_DEV"

# 2. Create ESP (partition 2 first, matching Proxmox order)
sgdisk -n2:1M:+${ESP_SIZE}M -t2:EF00 "$TARGET_DEV"

# 3. Create OS/LVM partition
if [ -n "$HDSIZE_GB" ]; then
    MAX_MB=$((HDSIZE_GB * 1024))
    sgdisk -n3:${ESP_END}M:${MAX_MB}M -t3:8E00 "$TARGET_DEV"
else
    sgdisk -n3:${ESP_END}M:0 -t3:8E00 "$TARGET_DEV"
fi

# 4. BIOS boot partition (only for 512-byte sector disks)
LBSIZE=$(cat /sys/block/$(basename "$TARGET_DEV")/queue/logical_block_size 2>/dev/null)
if [ "$LBSIZE" != "4096" ]; then
    sgdisk -a1 -n1:34:2047 -t1:EF02 "$TARGET_DEV"
fi

# 5. Trigger udev
sleep 1
udevadm trigger --subsystem-match block
udevadm settle --timeout 10

# 6. Zero first 256 MB of ESP and OS partitions (same as installer)
PART2=$(lsblk -no KNAME -p "$TARGET_DEV" | grep -E "${TARGET_DEV}p?2$" | head -1)
PART3=$(lsblk -no KNAME -p "$TARGET_DEV" | grep -E "${TARGET_DEV}p?3$" | head -1)
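# Stop if either partition node is missing, otherwise dd would get an empty of=.
[ -b "$PART2" ] && [ -b "$PART3" ] || { echo "partitions not found on $TARGET_DEV" >&2; exit 1; }
dd if=/dev/zero of="$PART2" bs=1M count=256 2>/dev/null
dd if=/dev/zero of="$PART3" bs=1M count=256 2>/dev/null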

# 7. Create LVM
pvcreate --metadatasize 250k -y -ff "$PART3"
vgcreate pve "$PART3"

# Swap and root size calculation would go here — must replicate
# compute_swapsize() and create_lvm_volumes() logic from Install.pm.
# See docs/pve-install-disk.md for the exact formulas.
```

#### Step 2: Inject the pre-partition script into the boot flow

**Option A — systemd unit:** Create a systemd service that runs before `proxmox-auto-installer.service`.
Add it to the cpio overlay alongside `netsetup-init` and `netsetup-check.sh`.

**Option B — rdinit wrapper:** Extend the wrapper to run the partitioning after `/init` has set up
the live environment but before the auto-installer starts. This is complex because the wrapper runs
very early (before pivot_root).

**Option C — patched auto-installer:** Patch the auto-installer binary or its wrapper to run the
pre-partition script first.

#### Step 3: Patch `Install.pm` to skip wipe+partition

Add a sed patch to the rdinit wrapper (in `CreateInitWrapper()`) or to a post-boot hook that
modifies `Proxmox/Install.pm` before the installer runs.

Target: the `else` branch in `extract_data()` (the LVM/single-disk path, lines ~926–959):

```perl
# Original code to skip:
        } else {
            my $target_hd = Proxmox::Install::Config::get_target_hd();
            die "target '$target_hd' is not a valid block device\n" if !-b $target_hd;
            $diskcount = 1;

            wipe_disk($target_hd);                                    # ← SKIP

            update_progress(0, 0.02, $maxper, "create partitions");

            my $logical_bsize = Proxmox::Sys::Block::logical_blocksize($target_hd);

            my ($os_size, $osdev, $efidev) =
                partition_bootable_disk($target_hd, $hdsize, '8E00'); # ← SKIP

            Proxmox::Sys::Block::udevadm_trigger_block();

            # ...
            update_progress(0, 0.03, $maxper, "create LVs");

            my $swap_size = compute_swapsize($os_size);
            ($rootdev, $swapfile, $datadev) = create_lvm_volumes($osdev, $os_size, $swap_size); # ← SKIP
```

The patch must:
1. Skip `wipe_disk`, `partition_bootable_disk`, `create_lvm_volumes`
2. Instead, discover the pre-created partitions and LVs
3. Set `$osdev`, `$efidev`, `$rootdev`, `$swapfile`, `$datadev`, `$os_size` to the correct values

This is a complex sed patch. One possible approach neutralizes the three calls in place instead of
replacing the whole `else` block:

```bash
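# In the rdinit wrapper or a post-boot hook, patch Install.pm:
INSTALL_PM="/usr/share/perl5/Proxmox/Install.pm"  # path in live environment

# Replace only the call expressions so every statement stays complete.
# Commenting out the rest of the line would leave "my (...) =" dangling, and
# Perl would silently assign the result of the next statement instead.
sed -i '/wipe_disk(\$target_hd)/,/create_lvm_volumes/ {
    s/wipe_disk(\$target_hd);/# NETSetup: skipped wipe_disk/
    s/partition_bootable_disk(.*8E00.*);/(); # NETSetup: skipped partition_bootable_disk/
    s/create_lvm_volumes(.*swap_size.*);/(); # NETSetup: skipped create_lvm_volumes/
}' "$INSTALL_PM"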
```

**WARNING:** This sed approach is extremely fragile. Any whitespace change, variable rename, or
code reflow in a Proxmox update will break it silently.
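
A guard in front of the patch can turn the silent failure into a loud one. The sketch below, with a hypothetical helper `check_anchors`, requires each of the three call sites to appear on exactly one line of the file before any `sed -i` runs. The anchor strings are taken from the excerpt above and would need re-checking for every Proxmox release.

```bash
# Hypothetical guard: succeed only if every anchor occurs on exactly one line.
# $1 is the path to Install.pm. Anchors are fixed strings, not regexes.
check_anchors() {
    _ca_file="$1"
    for _ca_anchor in \
        'wipe_disk($target_hd);' \
        "partition_bootable_disk(\$target_hd, \$hdsize, '8E00');" \
        'create_lvm_volumes($osdev, $os_size, $swap_size);'
    do
        # grep -c exits non-zero on zero matches but still prints the count.
        _ca_count=$(grep -cF -- "$_ca_anchor" "$_ca_file" || true)
        if [ "$_ca_count" != "1" ]; then
            echo "NETSetup: anchor '$_ca_anchor' found $_ca_count times, refusing to patch" >&2
            return 1
        fi
    done
    return 0
}

# Usage sketch:
# check_anchors "$INSTALL_PM" || exit 1   # run before the sed -i above
```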

#### Step 4: Inject pre-created device paths

After skipping the LVM creation, the installer expects these variables to be set:
- `$osdev` — e.g. `/dev/sda3`
- `$efidev` — e.g. `/dev/sda2`
- `$rootdev` — e.g. `/dev/pve/root`
- `$swapfile` — e.g. `/dev/pve/swap`
- `$datadev` — e.g. `/dev/pve/data`
- `$os_size` — in KB

These must be injected into the patched code. This requires either:
- Hardcoded values in the sed patch (inflexible)
- A helper script that detects the pre-created layout and writes a Perl snippet
- Environment variables read by the patched Install.pm
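
The helper-script variant could look roughly like this. It is a sketch under assumptions: the function names `part_path` and `emit_perl_devs` are hypothetical, the LV paths simply follow the examples listed above, and how the emitted snippet gets spliced into `Install.pm` is left open. Partition paths follow the kernel naming rule that a `p` separator is inserted when the disk name ends in a digit.

```bash
# Hypothetical helper: partition path for disk $1 and partition number $2.
# Disks whose name ends in a digit (nvme0n1, mmcblk0) get a "p" separator.
part_path() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
        *)      printf '%s%s\n' "$1" "$2" ;;
    esac
}

# Hypothetical helper: print Perl assignments for the pre-created layout.
#   $1  target disk, e.g. /dev/sda
#   $2  size of the OS partition in KB (e.g. blockdev --getsize64 divided by 1024)
emit_perl_devs() {
    printf "\$efidev = '%s';\n" "$(part_path "$1" 2)"
    printf "\$osdev = '%s';\n" "$(part_path "$1" 3)"
    printf "\$os_size = %s;\n" "$2"
    printf "\$rootdev = '/dev/pve/root';\n"
    printf "\$swapfile = '/dev/pve/swap';\n"
    printf "\$datadev = '/dev/pve/data';\n"
}
```

For example, `emit_perl_devs /dev/nvme0n1 1048576` includes the line `$osdev = '/dev/nvme0n1p3';` among its six assignments.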

### Risks

| Risk | Severity | Mitigation |
|------|----------|------------|
| sed anchors break on PVE updates | **High** | Pin PVE ISO version, re-verify on each update |
| LVM size formulas drift from installer | **High** | Compare with `pve-install-disk.md` on each update |
| Variable injection wrong | High | Extensive testing per disk size |
| Race condition: partition script vs installer | Medium | Use systemd ordering (Before=) |
| BIOS boot partition edge case (4kn disks) | Low | Check logical block size in pre-partition script |
| Non-PVE products (PMG/PBS) have different LV layout | Medium | Branch in pre-partition script |

### Validation Steps

1. Boot NETSetup on test machine
2. Verify partitions exist before installer starts (`lsblk`, `pvs`, `lvs`)
3. Verify the installer's `extract_data()` picks up pre-created devices
4. Verify `mkfs`, extraction, and full install complete
5. Verify installed system boots correctly
6. **Repeat for each new Proxmox ISO version**

---

## Comparison

| Aspect | Option RAM | Option PATCH |
|--------|-----------|--------------|
| Complexity | Low: change one shell script | High: pre-partition script + sed patches + variable injection |
| Fragility | Low: depends only on the `/init` check hook (`$cdrom`, `$CDID_FN`, `$reqid`), not on installer code | **High**: breaks on PVE installer code changes |
| RAM requirement | ~1.2 GB extra | None |
| Maintenance | Minimal | Must re-verify sed anchors per PVE release |
| Risk of silent failure | Low: dd/mount failures are reported on the console | High: wrong sed patch = corrupted Install.pm |
| Scope of change | `GetCheckScript()` only | `GetCheckScript()` + `CreateInitWrapper()` + new pre-partition script + new systemd unit |

**Recommendation:** Option RAM. It is simpler, does not depend on installer internals, and its RAM
cost is trivial for bare-metal servers.
