From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: [pve-devel] applied: [PATCH access-control] auth key: fix double rotation in clusters
Date: Fri, 15 Jul 2022 13:39:35 +0200 [thread overview]
Message-ID: <f83363a2-74b3-6fe2-82f7-ef6d7932f066@proxmox.com> (raw)
In-Reply-To: <20220713131359.2771787-1-f.gruenbichler@proxmox.com>
Am 13/07/2022 um 15:13 schrieb Fabian Grünbichler:
> there is a (hard to trigger) race that can cause a double rotation of
> the auth key, with potentially confusing fallout (various processes on
> different nodes having an inconsistent view of the current and previous
> auth keys, resulting in "random" invalid ticket errors until the next
> proper key rotation 24h later).
>
> the underlying cause is that `stat()` calls are excempt from our
> otherwise non-cached/direct_io handling of pmxcfs I/O, which allows the
> following sequence of events to take place:
>
> LAST: mtime of current auth key
>
> - current epoch advances to LAST + 24h
>
> the following can be arbitrarly interleaved between node A and B:
> - LAST+24h node A: pvedaemon/pvestatd on node A calls check_authkey(1)
> - LAST+24h node A: it returns 0 (rotation required)
> - LAST+24h node A: rotate_key() is called
> - LAST+24h node A: cfs_lock_authkey is called
> - LAST+24h node B: pvedaemon/pvestatd calls check_authkey(1)
> - LAST+24h node B: key is not yet cached in-memory by current process
> - LAST+24h node B: key file is opened, stat-ed, read, parsed, and content+mtime
> is cached (the kernel will now cache this stat result for 1s unless
> the path is opened)
> - LAST+24h node B: it returns 0 (rotation required)
> - LAST+24h node B: rotate_key() is called
> - LAST+24h node B: cfs_lock_authkey is called
>
> the following is mutex-ed via a cfs_lock:
> - LAST+24h node A: lock is obtained
> - LAST+24h node A: check_authkey() is called
> - LAST+24h node A: key is stat-ed, mtime is still (correctly) LAST,
> cached mtime and content are returned
> - LAST+24h node A: it returns 0 (rotation still required)
> - LAST+24h node A: get_pubkey() is called and returns current auth key
> - LAST+24h node A: new keypair is generated and persisted
> - LAST+24h node A: cfs_lock is released
> - LAST+24h node B: changes by node A are processed by pmxcfs
> - LAST+24h node B: lock is obtained
> - LAST+24h node B: check_authkey() is called
> - LAST+24h node B: key is stat-ed, mtime is (incorrectly!) still LAST
> since the stat call is handled by the kernel/page cache, not by
> pmxcfs, cached mtime and content are returned
> - LAST+24h node B: it returns 0 (rotation still required)
> - LAST+24h node B: get_pubkey() is called and returns either previous or
> key written by node A (depending on whether page cache or pmxcfs
> answers stat call)
> - LAST+24h node B: new keypair is generated, key returned by last
> get_pubkey call is written as old key
>
> the end result is that some nodes and process will treat the key
> generated by node A as "current", while others will treat the one
> generated by nodoe B as "current". both have the same mtime, so the
> in-memory cache hash won't be updated unless the service is restarted or
> another rotation happens. depending on who generated the ticket and who
> attempts validating it, a ticket might be rejected as invalid even
> though the generating party would treat it as valid, and time on all
> nodes is properly synced.
>
> there seems to be now way for pmxcfs to pro-actively invalidate the page
> cache entry safely (since we'd need to do so while writes to the same
> path can happen concurrently), so work around by forcing an open/close
> at the (stat) call site which does the work for us. regular reads are
> not affected since those already bypass the page cache entirely anyway.
>
> thankfully in almost all cases, the following sequence has enough
> synchronization overhead baked in to avoid triggering the issue almost
> entirely:
>
> - cfs_lock
> - generate key
> - create tmp file for old key
> - write tmp file
> - rename tmp file into proper place
> - create tmp file for new pub key
> - write tmp file
> - rename tmp file into proper place
> - create tmp file for new priv key
> - write tmp file
> - rename tmp file into proper place
> - release lock
>
> that being said, there has been at least one report where this was
> triggered in the wild from time to time.
>
> it is easy to reproduce by increasing the attr_timeout and entry_timeout
> fuse settings inside pmxcfs to increase the time stat results are
> treated as valid/retained in the page cache:
>
> -----8<-----
>
applied, thanks!
prev parent reply other threads:[~2022-07-15 11:40 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-13 13:13 [pve-devel] " Fabian Grünbichler
2022-07-15 11:39 ` Thomas Lamprecht [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f83363a2-74b3-6fe2-82f7-ef6d7932f066@proxmox.com \
--to=t.lamprecht@proxmox.com \
--cc=f.gruenbichler@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.