From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id BCE837C514 for ; Fri, 15 Jul 2022 13:40:07 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id ACE382791A for ; Fri, 15 Jul 2022 13:39:37 +0200 (CEST) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Fri, 15 Jul 2022 13:39:36 +0200 (CEST) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id BBBEB4231C for ; Fri, 15 Jul 2022 13:39:36 +0200 (CEST) Message-ID: Date: Fri, 15 Jul 2022 13:39:35 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Thunderbird/103.0 From: Thomas Lamprecht To: Proxmox VE development discussion , =?UTF-8?Q?Fabian_Gr=c3=bcnbichler?= References: <20220713131359.2771787-1-f.gruenbichler@proxmox.com> Content-Language: en-GB In-Reply-To: <20220713131359.2771787-1-f.gruenbichler@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-SPAM-LEVEL: Spam detection results: 0 AWL 0.004 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record T_SCC_BODY_TEXT_LINE -0.01 - Subject: [pve-devel] applied: [PATCH access-control] auth key: fix double rotation in clusters X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Jul 2022 11:40:07 -0000 Am 13/07/2022 um 15:13 schrieb Fabian Gr=C3=BCnbichler: > there is a (hard to trigger) race that can cause a double rotation of > the auth key, with potentially confusing fallout (various processes on > different nodes having an inconsistent view of the current and previous= > auth keys, resulting in "random" invalid ticket errors until the next > proper key rotation 24h later). >=20 > the underlying cause is that `stat()` calls are excempt from our > otherwise non-cached/direct_io handling of pmxcfs I/O, which allows the= > following sequence of events to take place: >=20 > LAST: mtime of current auth key >=20 > - current epoch advances to LAST + 24h >=20 > the following can be arbitrarly interleaved between node A and B: > - LAST+24h node A: pvedaemon/pvestatd on node A calls check_authkey(1) > - LAST+24h node A: it returns 0 (rotation required) > - LAST+24h node A: rotate_key() is called > - LAST+24h node A: cfs_lock_authkey is called > - LAST+24h node B: pvedaemon/pvestatd calls check_authkey(1) > - LAST+24h node B: key is not yet cached in-memory by current process > - LAST+24h node B: key file is opened, stat-ed, read, parsed, and conte= nt+mtime > is cached (the kernel will now cache this stat result for 1s unless > the path is opened) > - LAST+24h node B: it returns 0 (rotation required) > - LAST+24h node B: rotate_key() is called > - LAST+24h node B: cfs_lock_authkey is called >=20 > the following is mutex-ed via a cfs_lock: > - LAST+24h node A: lock is obtained > - LAST+24h node A: check_authkey() is called > - LAST+24h node A: key is stat-ed, mtime is still (correctly) LAST, > cached mtime and content are returned > - LAST+24h node A: it returns 0 (rotation still required) > - LAST+24h node A: get_pubkey() is called and returns current auth key > - LAST+24h node A: new keypair is generated and persisted > - LAST+24h node A: cfs_lock is released > - LAST+24h node B: changes by node A are processed by pmxcfs > - LAST+24h node B: lock is obtained > - LAST+24h node B: check_authkey() is called > - LAST+24h node B: key is stat-ed, mtime is (incorrectly!) still LAST > since the stat call is handled by the kernel/page cache, not by > pmxcfs, cached mtime and content are returned > - LAST+24h node B: it returns 0 (rotation still required) > - LAST+24h node B: get_pubkey() is called and returns either previous o= r > key written by node A (depending on whether page cache or pmxcfs > answers stat call) > - LAST+24h node B: new keypair is generated, key returned by last > get_pubkey call is written as old key >=20 > the end result is that some nodes and process will treat the key > generated by node A as "current", while others will treat the one > generated by nodoe B as "current". both have the same mtime, so the > in-memory cache hash won't be updated unless the service is restarted o= r > another rotation happens. depending on who generated the ticket and who= > attempts validating it, a ticket might be rejected as invalid even > though the generating party would treat it as valid, and time on all > nodes is properly synced. >=20 > there seems to be now way for pmxcfs to pro-actively invalidate the pag= e > cache entry safely (since we'd need to do so while writes to the same > path can happen concurrently), so work around by forcing an open/close > at the (stat) call site which does the work for us. regular reads are > not affected since those already bypass the page cache entirely anyway.= >=20 > thankfully in almost all cases, the following sequence has enough > synchronization overhead baked in to avoid triggering the issue almost > entirely: >=20 > - cfs_lock > - generate key > - create tmp file for old key > - write tmp file > - rename tmp file into proper place > - create tmp file for new pub key > - write tmp file > - rename tmp file into proper place > - create tmp file for new priv key > - write tmp file > - rename tmp file into proper place > - release lock >=20 > that being said, there has been at least one report where this was > triggered in the wild from time to time. >=20 > it is easy to reproduce by increasing the attr_timeout and entry_timeou= t > fuse settings inside pmxcfs to increase the time stat results are > treated as valid/retained in the page cache: >=20 > -----8<----- >=20 applied, thanks!