Date: Wed, 6 Dec 2023 15:33:23 +0100 (CET)
From: Fabian Grünbichler
To: Gabriel Goller , Proxmox Backup Server development discussion
Message-ID: <1189176039.2029.1701873203100@webmail.proxmox.com>
In-Reply-To: <9696737a-6235-4f9f-92ac-f92418dba4ed@proxmox.com>
Subject: Re: [pbs-devel] [PATCH v2 proxmox{, -backup} 0/2] Move ProcessLocker to tmpfs

> Gabriel Goller wrote on 06.12.2023 15:21 CET:
>
> On 12/6/23 15:14, Fabian Grünbichler wrote:
> >> Gabriel Goller wrote on 06.12.2023 14:56 CET:
> >> On 12/6/23 14:41, Fabian Grünbichler wrote:
> >>> [..]
> >> Just spoke with Stefan Sterz about this and we will probably
> >> apply/release this with a major version bump (kernel update), so that
> >> the user
> >> is forced to reboot the system (same as with his tmpfs locking series).
> >> I don't think there is another way, because the lockfiles get moved to
> >> another dir. Although F_SETLK and F_OFD_SETLK should be compatible,
> >> so having one process use F_SETLK and another F_OFD_SETLK *should* still
> >> work (don't take my word for it though).
> > that doesn't really help though, unless we also add machinery to detect the missing reboot and block any process-locker-requiring stuff in the new process until it has happened? or we make "set all datastores to read-only or offline" a requirement for upgrading from 3 to 4, instead of optional like for 2 to 3[0].
> > otherwise even just the time between "postinst of PBS package is called" to "upgrade of whole system is fully done" can be big enough to cause a problem..
> >
> > 0: https://pbs.proxmox.com/wiki/index.php/Upgrade_from_2_to_3#Optional:_Enable_Maintenance_Mode
> That's a good idea.
> Optionally we could also somehow remove the `.lock` file in the
> datastore and remove the `.create(true)`,
> so that creating the 'old' `.lock` file will fail?
> Although not sure how we would do this...

I don't see that working with the old code still running? and if the old code is not running (anymore), we don't have the problem anyway ;)

> But can we also somehow force the user to have the datastore in a
> maintenance mode? I guess not...

forcing is hard, but we could both
- make it a required step in the upgrade guide (it's not our fault then if the user didn't follow it ;))
- check in post-inst, print a big fat warning, and *not reload* but just keep the old process running

that way, the user will only get an actual 4.x process running if they manually reload or restart the service(s), or reboot the machine:
- reload could be handled by touching a flag file in tmpfs in postinst if the maintenance-mode pre-requisites are not met, and refusing to reload if it is found (that part could already be added to 3.x if needed)
- restart and reboot are okay, since in both cases the old process is killed/stopped, and no lock path mismatch can happen

still, the other variant with passing along the "need to double-lock" flag would also not be too complex I think, if we don't want to wait that long - postinst touches a flag file in tmpfs before reloading (on the first upgrade from a pre-change version), and as long as that file exists the new code uses a compat mode that obtains both old and new lock paths.
once the flag file is gone (reboot, or process detects no more old processes are around), the compat code path becomes dead code at runtime, and can be removed altogether with the next major release.
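
a rough sketch of how the compat-mode path selection could look, just to illustrate the idea. it only decides *which* lock files need to be taken, the actual locking (F_SETLK/F_OFD_SETLK) is left out. `LockLayout`, its field names and `paths_to_lock` are all made up for this example and are not taken from the series or from the existing ProcessLocker code:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of the lock files involved in the transition.
/// None of these names or locations come from the actual patches.
struct LockLayout {
    /// Pre-change lock file, e.g. the `.lock` file inside the datastore.
    old_lock: PathBuf,
    /// Post-change lock file living on tmpfs.
    new_lock: PathBuf,
    /// Flag file on tmpfs, touched by postinst before reloading.
    compat_flag: PathBuf,
}

impl LockLayout {
    /// Returns the lock files a new-style process has to obtain.
    ///
    /// While the compat flag exists, both the old and the new path are
    /// returned, always in the same order, so that two new-style
    /// processes never try to take them in opposite order. Once the
    /// flag is gone, only the new path is returned.
    fn paths_to_lock(&self) -> Vec<&Path> {
        if self.compat_flag.exists() {
            vec![self.old_lock.as_path(), self.new_lock.as_path()]
        } else {
            vec![self.new_lock.as_path()]
        }
    }
}
```

since the flag lives on tmpfs, a reboot clears it without any extra cleanup code, which matches the "flag file is gone" case above. detecting that no old processes are left and removing the flag at runtime would need extra logic that is not shown here.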