public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: [pve-devel] applied: [PATCH access-control] auth key: fix double rotation in clusters
Date: Fri, 15 Jul 2022 13:39:35 +0200	[thread overview]
Message-ID: <f83363a2-74b3-6fe2-82f7-ef6d7932f066@proxmox.com> (raw)
In-Reply-To: <20220713131359.2771787-1-f.gruenbichler@proxmox.com>

Am 13/07/2022 um 15:13 schrieb Fabian Grünbichler:
> there is a (hard to trigger) race that can cause a double rotation of
> the auth key, with potentially confusing fallout (various processes on
> different nodes having an inconsistent view of the current and previous
> auth keys, resulting in "random" invalid ticket errors until the next
> proper key rotation 24h later).
> 
> the underlying cause is that `stat()` calls are excempt from our
> otherwise non-cached/direct_io handling of pmxcfs I/O, which allows the
> following sequence of events to take place:
> 
> LAST: mtime of current auth key
> 
> - current epoch advances to LAST + 24h
> 
> the following can be arbitrarly interleaved between node A and B:
> - LAST+24h node A: pvedaemon/pvestatd on node A calls check_authkey(1)
> - LAST+24h node A: it returns 0 (rotation required)
> - LAST+24h node A: rotate_key() is called
> - LAST+24h node A: cfs_lock_authkey is called
> - LAST+24h node B: pvedaemon/pvestatd calls check_authkey(1)
> - LAST+24h node B: key is not yet cached in-memory by current process
> - LAST+24h node B: key file is opened, stat-ed, read, parsed, and content+mtime
>   is cached (the kernel will now cache this stat result for 1s unless
>   the path is opened)
> - LAST+24h node B: it returns 0 (rotation required)
> - LAST+24h node B: rotate_key() is called
> - LAST+24h node B: cfs_lock_authkey is called
> 
> the following is mutex-ed via a cfs_lock:
> - LAST+24h node A: lock is obtained
> - LAST+24h node A: check_authkey() is called
> - LAST+24h node A: key is stat-ed, mtime is still (correctly) LAST,
>   cached mtime and content are returned
> - LAST+24h node A: it returns 0 (rotation still required)
> - LAST+24h node A: get_pubkey() is called and returns current auth key
> - LAST+24h node A: new keypair is generated and persisted
> - LAST+24h node A: cfs_lock is released
> - LAST+24h node B: changes by node A are processed by pmxcfs
> - LAST+24h node B: lock is obtained
> - LAST+24h node B: check_authkey() is called
> - LAST+24h node B: key is stat-ed, mtime is (incorrectly!) still LAST
>   since the stat call is handled by the kernel/page cache, not by
>   pmxcfs, cached mtime and content are returned
> - LAST+24h node B: it returns 0 (rotation still required)
> - LAST+24h node B: get_pubkey() is called and returns either previous or
>   key written by node A (depending on whether page cache or pmxcfs
>   answers stat call)
> - LAST+24h node B: new keypair is generated, key returned by last
>   get_pubkey call is written as old key
> 
> the end result is that some nodes and process will treat the key
> generated by node A as "current", while others will treat the one
> generated by nodoe B as "current". both have the same mtime, so the
> in-memory cache hash won't be updated unless the service is restarted or
> another rotation happens. depending on who generated the ticket and who
> attempts validating it, a ticket might be rejected as invalid even
> though the generating party would treat it as valid, and time on all
> nodes is properly synced.
> 
> there seems to be now way for pmxcfs to pro-actively invalidate the page
> cache entry safely (since we'd need to do so while writes to the same
> path can happen concurrently), so work around by forcing an open/close
> at the (stat) call site which does the work for us. regular reads are
> not affected since those already bypass the page cache entirely anyway.
> 
> thankfully in almost all cases, the following sequence has enough
> synchronization overhead baked in to avoid triggering the issue almost
> entirely:
> 
> - cfs_lock
> - generate key
> - create tmp file for old key
> - write tmp file
> - rename tmp file into proper place
> - create tmp file for new pub key
> - write tmp file
> - rename tmp file into proper place
> - create tmp file for new priv key
> - write tmp file
> - rename tmp file into proper place
> - release lock
> 
> that being said, there has been at least one report where this was
> triggered in the wild from time to time.
> 
> it is easy to reproduce by increasing the attr_timeout and entry_timeout
> fuse settings inside pmxcfs to increase the time stat results are
> treated as valid/retained in the page cache:
> 
> -----8<-----
> 

applied, thanks!




      reply	other threads:[~2022-07-15 11:40 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-13 13:13 [pve-devel] " Fabian Grünbichler
2022-07-15 11:39 ` Thomas Lamprecht [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f83363a2-74b3-6fe2-82f7-ef6d7932f066@proxmox.com \
    --to=t.lamprecht@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal