From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.gruenbichler@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 450C1B8DB0
 for <pbs-devel@lists.proxmox.com>; Wed,  6 Dec 2023 15:33:25 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 158BB3A57
 for <pbs-devel@lists.proxmox.com>; Wed,  6 Dec 2023 15:33:25 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Wed,  6 Dec 2023 15:33:24 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 150C44250A
 for <pbs-devel@lists.proxmox.com>; Wed,  6 Dec 2023 15:33:24 +0100 (CET)
Date: Wed, 6 Dec 2023 15:33:23 +0100 (CET)
From: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= <f.gruenbichler@proxmox.com>
To: Gabriel Goller <g.goller@proxmox.com>,
 Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
Message-ID: <1189176039.2029.1701873203100@webmail.proxmox.com>
In-Reply-To: <9696737a-6235-4f9f-92ac-f92418dba4ed@proxmox.com>
References: <20231206132834.240700-1-g.goller@proxmox.com>
 <1764237283.1899.1701870086441@webmail.proxmox.com>
 <2507e464-7b0a-4814-b089-dc5b1d8d2904@proxmox.com>
 <695531623.1949.1701872060137@webmail.proxmox.com>
 <9696737a-6235-4f9f-92ac-f92418dba4ed@proxmox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Priority: 3
Importance: Normal
X-Mailer: Open-Xchange Mailer v7.10.6-Rev55
X-Originating-Client: open-xchange-appsuite
Subject: Re: [pbs-devel] [PATCH v2 proxmox{,
 -backup} 0/2] Move ProcessLocker to tmpfs
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 06 Dec 2023 14:33:25 -0000


> Gabriel Goller <g.goller@proxmox.com> wrote on 06.12.2023 15:21 CET:
>
>
> On 12/6/23 15:14, Fabian Grünbichler wrote:
> >> Gabriel Goller <g.goller@proxmox.com> wrote on 06.12.2023 14:56 CET:
> >> On 12/6/23 14:41, Fabian Grünbichler wrote:
> >>> [..]
> >> Just spoke with Stefan Sterz about this and we will probably
> >> apply/release this with a major version bump (kernel update), so that
> >> the user
> >> is forced to reboot the system (same as with his tmpfs locking series).
> >> I don't think there is another way, because the lockfiles get moved to
> >> another dir. Although F_SETLK and F_OFD_SETLK should be compatible,
> >> so having one process use F_SETLK and another F_OFD_SETLK *should* still
> >> work (don't take my word for it though).
> > that doesn't really help though, unless we also add machinery to detect the missing reboot and block any process-locker-requiring stuff in the new process until it has happened? or we make "set all datastores to read-only or offline" a requirement for upgrading from 3 to 4, instead of optional like for 2 to 3[0]. otherwise even just the time between "postinst of PBS package is called" to "upgrade of whole system is fully done" can be big enough to cause a problem..
> >
> > 0: https://pbs.proxmox.com/wiki/index.php/Upgrade_from_2_to_3#Optional:_Enable_Maintenance_Mode
> That's a good idea.
> Optionally we could also somehow remove the `.lock` file in the
> datastore and remove the `.create(true)`,
> so that creating the 'old' `.lock` file will fail?
> Although not sure how we would do this...

I don't see that working while the old code is still running? and if the old code is not running (anymore), we don't have the problem anyway ;)

> But can we also somehow force the user to have the datastore in a
> maintenance mode? I guess not...

forcing is hard, but we could do both:
- make it a required step in the upgrade guide (it's not our fault then if the user didn't follow it ;))
- check in postinst, print a big fat warning, and *not reload* but just keep the old process running

that way, the user will only get an actual 4.x process running if they manually reload or restart the service(s), or reboot the machine:
- reload could be handled by touching a flag file in tmpfs in postinst if the maintenance-mode prerequisites are not met, and refusing to reload if it is found (that part could already be added to 3.x if needed)
- restart and reboot are okay, since in both cases the old process is killed/stopped, and no lock path mismatch can happen
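
to make the reload part concrete, here is a minimal sketch of the flag-file check, assuming nothing about the actual daemon code. the names `ReloadDecision`, `mark_reload_blocked` and `check_reload` are made up for illustration, and the flag path is left to the caller since no path has been decided in this thread. in practice the postinst side would likely be a shell `touch`; it is written in Rust here only to keep the example self-contained:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of a reload request (hypothetical type, not from the PBS code base).
#[derive(Debug, PartialEq)]
pub enum ReloadDecision {
    /// No flag file found: re-executing into the new binary is fine.
    Proceed,
    /// Flag file found: keep the old process running.
    Refuse,
}

/// postinst side: create the (empty) flag file when the maintenance-mode
/// prerequisites are not met. `flag` is assumed to live on tmpfs, so a
/// reboot removes it automatically.
pub fn mark_reload_blocked(flag: &Path) -> io::Result<()> {
    fs::OpenOptions::new()
        .create(true)
        .write(true)
        .open(flag)
        .map(|_| ())
}

/// daemon side: refuse to reload for as long as the flag file exists.
pub fn check_reload(flag: &Path) -> ReloadDecision {
    if flag.exists() {
        ReloadDecision::Refuse
    } else {
        ReloadDecision::Proceed
    }
}
```

restart and reboot need no such check, since the old process is gone in both cases.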

still, the other variant with passing along the "need to double-lock" flag would also not be too complex I think, if we don't want to wait that long: postinst touches a flag file in tmpfs before reloading (on the first upgrade from a pre-change version), and as long as that file exists the new code uses a compat mode that obtains locks on both the old and new lock paths. once the flag file is gone (reboot, or process detects no more old processes are around), the compat code path becomes dead code at runtime, and can be removed altogether with the next major release.
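
a rough sketch of how the compat-mode decision could look, under stated assumptions: `LockLayout` and its fields are hypothetical, the concrete paths are not decided in this thread, and the actual lock acquisition (fcntl) is left out on purpose. it only shows which paths the new code would need to lock depending on the flag file:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of the lock locations; none of these paths
/// are taken from the actual patches.
pub struct LockLayout {
    /// Pre-change lock file inside the datastore.
    pub old_path: PathBuf,
    /// Post-change lock file on tmpfs.
    pub new_path: PathBuf,
    /// Flag file touched by postinst, also on tmpfs.
    pub compat_flag: PathBuf,
}

impl LockLayout {
    /// Returns the paths the new code has to lock. While the flag file
    /// exists, both paths are returned (old first, so every process in
    /// compat mode takes them in the same order). Once the flag is gone,
    /// only the new path is returned.
    pub fn paths_to_lock(&self) -> Vec<&Path> {
        if self.compat_flag.exists() {
            vec![self.old_path.as_path(), self.new_path.as_path()]
        } else {
            vec![self.new_path.as_path()]
        }
    }
}
```

a caller would iterate over `paths_to_lock()` and take the existing process lock on each entry. once the flag file disappears, the two-path branch is never taken again and can be dropped with the next major release.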