From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.gruenbichler@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id B0C87652A1
 for <pve-devel@lists.proxmox.com>; Tue,  1 Feb 2022 12:35:48 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 97F0C2DEF3
 for <pve-devel@lists.proxmox.com>; Tue,  1 Feb 2022 12:35:18 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id C21AC2DEE8
 for <pve-devel@lists.proxmox.com>; Tue,  1 Feb 2022 12:35:16 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 9C14B436C8
 for <pve-devel@lists.proxmox.com>; Tue,  1 Feb 2022 12:35:16 +0100 (CET)
Date: Tue, 01 Feb 2022 12:35:09 +0100
From: Fabian =?iso-8859-1?q?Gr=FCnbichler?= <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <20220131175918.2099575-1-s.ivanov@proxmox.com>
 <20220131175918.2099575-4-s.ivanov@proxmox.com>
In-Reply-To: <<20220131175918.2099575-4-s.ivanov@proxmox.com>
MIME-Version: 1.0
User-Agent: astroid/0.15.0 (https://github.com/astroidmail/astroid)
Message-Id: <1643711031.6fpgdd4qz1.astroid@nora.none>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.218 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [systemd.io]
Subject: Re: [pve-devel] [PATCH pve-kernel-meta 3/5] proxmox-boot: fix #3671
 add pin/unpin for kernel-version
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 01 Feb 2022 11:35:48 -0000

On January 31, 2022 6:59 pm, Stoiko Ivanov wrote:
> The 2 commands follow the mechanics of p-b-t kernel add/remove in
> writing the desired abi-version to a config-file in /etc/kernel and
> actually modifying the boot-loader configuration upon p-b-t refresh.
>
> A dedicated new file is used instead of writing the version (with some
> kind of annotation) to the manual kernel list to keep parsing the file
> simple (and hopefully also cause fewer problems with manually edited
> files)

one thing I noticed while playing around - the following sequence of
actions is a bit surprising:

- pin (old) version FOO
- refresh
- ... (long time, different admin, ..)
- apt remove pve-kernel-$FOO

the last step prints

 No linux-image /boot/vmlinuz-$FOO found - skipping

which is hard to understand without knowing about p-b-t internals:
skipping here means we don't copy the kernel/initrd from /boot to the
ESP (since there is no source). the $FOO kernel (and initrd) are now on
the ESPs, but no longer in /boot. since the package is no longer
installed, future ABI-compatible upgrades are not installed, and the
initrd is never regenerated when triggered by other factors.

worse, if I pinned that kernel for important reasons (e.g., HW-compat),
removing the pin (via unpin, by pinning another version, or via
next-boot to try whether an updated kernel improves the situation!)
will remove the only copy of it.

I am not sure what we can do here (except making the message more
prominent?) - failing apt is ugly, and removing the kernel on the ESP
when removing it from /boot despite it being pinned only makes it worse.

OTOH, since a pinned kernel is by definition never auto-removed, hooking
into the APT hook might work, since that would mean the removal is never
started and the resulting dpkg/apt state is clean? obviously this is
only possible for our kernels, where we know the naming scheme -
anything custom could still run into the issue.
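
a rough sketch of the check such a hook could perform, in Python purely
for illustration. it assumes the pve-kernel-$FOO package naming from the
example above; the function name, the way the hook learns the pinned
version and the list of packages scheduled for removal, and the version
strings in the demo are all hypothetical and not part of the patch:

```python
def pinned_packages_in_removal(pinned_version, packages_to_remove):
    """Return the packages scheduled for removal that ship the pinned kernel.

    pinned_version: ABI version of the pinned kernel, or None if nothing is pinned.
    packages_to_remove: iterable of package names the transaction would remove.
    """
    if not pinned_version:
        return []
    # Assumption: Proxmox kernel packages are named pve-kernel-<abi-version>.
    pinned_package = f"pve-kernel-{pinned_version}"
    return [pkg for pkg in packages_to_remove if pkg == pinned_package]


if __name__ == "__main__":
    # Example values only; a real hook would get these from the pin file and apt.
    blocked = pinned_packages_in_removal(
        "5.13.19-3-pve",
        ["pve-kernel-5.13.19-3-pve", "pve-kernel-5.15.12-1-pve"],
    )
    if blocked:
        print("refusing to remove pinned kernel package(s): " + ", ".join(blocked))
```

a hook that fails when the returned list is non-empty would stop the
removal before it starts. kernels from packages with other names would
not be caught, as noted above.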

> For systemd-boot we write the entry into the loader.conf on the ESP(s)
> instead of relying on the `bootctl set-default` mechanics (bootctl(1))
> which write the entry in an EFI-var. This was preferred, because of a
> few reports of unwriteable EFI-vars on some systems (e.g. DELL servers
> have a setting preventing writing EFI-vars from the OS). The rationale
> in `Why not simply rely on the EFI boot menu logic?` from [0] also
> makes a few points in that direction.
>
> For grub the following choices were made:
> * write the pinned version (or actually the menu-path leading to it)
>   to /etc/default/grub instead of editing the grub.cfg files on the
>   partition. Mostly to divert as little as possible from the
>   grub-workflow I assume people are used to.

did you test whether adding a snippet that overrides GRUB_DEFAULT also
works? we already do that to set the distributor for the various
products. creating/deleting a

/etc/default/grub.d/y_proxmox_pinned_kernel.cfg

and (if we want to make the latter be separate from pinning, see other
mail)

/etc/default/grub.d/z_proxmox_next_boot.cfg

seems like the cleaner approach compared to modifying the admin-managed
/etc/default/grub.
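
for illustration, a minimal Python sketch of managing such a snippet.
the helper names, the grub_d parameter and the placeholder menu path are
hypothetical; the sketch only assumes that the snippet is sourced as
shell like /etc/default/grub and needs a single GRUB_DEFAULT assignment
holding the menu path of the pinned entry:

```python
import shlex
import tempfile
from pathlib import Path

# File name taken from the suggestion above; everything else is illustrative.
PIN_SNIPPET = "y_proxmox_pinned_kernel.cfg"


def render_pin_snippet(menu_path):
    """Return the snippet body: one shell-quoted GRUB_DEFAULT assignment."""
    # Menu paths contain '>', so the value has to be quoted for the shell.
    return f"GRUB_DEFAULT={shlex.quote(menu_path)}\n"


def pin(grub_d, menu_path):
    """Create or overwrite the pin snippet inside the directory grub_d."""
    grub_d.mkdir(parents=True, exist_ok=True)
    (grub_d / PIN_SNIPPET).write_text(render_pin_snippet(menu_path))


def unpin(grub_d):
    """Delete the pin snippet; do nothing if it is not there."""
    (grub_d / PIN_SNIPPET).unlink(missing_ok=True)


if __name__ == "__main__":
    # Demo in a scratch directory instead of /etc/default/grub.d.
    with tempfile.TemporaryDirectory() as tmp:
        grub_d = Path(tmp) / "grub.d"
        pin(grub_d, "submenu-id>entry-id")  # placeholder menu path
        print((grub_d / PIN_SNIPPET).read_text(), end="")
        unpin(grub_d)
```

pin/unpin then never touch /etc/default/grub itself. as with editing
that file, the change only takes effect once the grub configuration is
regenerated.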

> * the 'root-device-id' part of the menu-entries is parsed from
>   /boot/grub/grub.cfg since it was stable (the same on all ESPs and in
>   /boot/grub), saves us from copying the part of "find device behind
>   /, mangle it if zfs/btrfs, call grub_probe a few times" part of
>   grub-mkconfig - and seems a bit more robust
>
> Tested with a BIOS and an UEFI VM with / on ZFS.
>
> [0] https://systemd.io/BOOT_LOADER_SPECIFICATION/
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>