From: Stefan Hanreich <s.hanreich@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
Proxmox Datacenter Manager development discussion
<pdm-devel@lists.proxmox.com>
Subject: Re: [pdm-devel] RFC: Synchronizing configuration changes across remotes
Date: Tue, 4 Feb 2025 11:34:25 +0100 [thread overview]
Message-ID: <6a9d506b-8f50-4a00-97bd-825060f3d06f@proxmox.com> (raw)
In-Reply-To: <8c24a4ad-2b53-49ac-acf9-b2f191fbeb8b@proxmox.com>
On 2/3/25 18:02, Thomas Lamprecht wrote:
> Yeah, digest is not giving you anything here, at least for anything that
> consists of more than one change; and adding a dedicated central API
> endpoint for every variant of batch update we might need seems hardly
> scalable nor like good API design.
Yes, I did consider adding an endpoint for getting / setting the whole
SDN configuration at some point, but I scrapped that idea since it's
unnecessary for what I'm currently implementing (adding single
zones / vnets / ...).
> Does it really require sweeping changes? I'd think modifications are
> already hedging against concurrent access now, so this should not mean
> we change to a completely new edit paradigm here.
We'd at least have to touch every non-read request in SDN to check for
the global lock - but yes, the wording is a bit overly dramatic. We
already have an existing lock_sdn_config, so adding another layer of
locking there shouldn't be an issue. If I decide to go for the .new
config route described below, this will be a bit more involved though.
> My thoughts when we talked was to go roughly for:
> Add a new endpoint that 1) ensures basic healthiness and 2) registers a
> lock for the whole, or potentially only some parts, of the SDN stack.
> This should work by returning a lock-cookie random string to be used by
> subsequent calls to do various updates in one go while ensuring nothing
> else can do so or just steal our lock. Then check this lock centrally
> on any write-config and be basically done I think?
That was basically what I envisioned as the implementation for the
lock too.
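For illustration, the central check could look roughly like the sketch
below (all names are made up by me, not the actual PDM API; the real
endpoint would generate the cookie server-side from a CSPRNG instead of
taking it as a parameter):

```rust
use std::collections::HashMap;

/// In-memory registry of held SDN locks (hypothetical sketch).
struct SdnLockRegistry {
    // Maps the locked scope (e.g. "sdn" or "sdn/zones") to the cookie holding it.
    locks: HashMap<String, String>,
}

impl SdnLockRegistry {
    fn new() -> Self {
        Self { locks: HashMap::new() }
    }

    /// Acquire the lock for a scope, returning the cookie the caller must
    /// present on subsequent write calls. Fails if the scope is already locked.
    fn acquire(&mut self, scope: &str, cookie: String) -> Result<String, String> {
        if self.locks.contains_key(scope) {
            return Err(format!("scope '{scope}' is already locked"));
        }
        self.locks.insert(scope.to_string(), cookie.clone());
        Ok(cookie)
    }

    /// Central check on every write-config path: the presented cookie must
    /// match the registered one, so nobody else can write or steal the lock.
    fn check(&self, scope: &str, cookie: &str) -> bool {
        self.locks
            .get(scope)
            .map(|c| c.as_str() == cookie)
            .unwrap_or(false)
    }

    /// Release on commit or abort; only the holder may release.
    fn release(&mut self, scope: &str, cookie: &str) -> bool {
        if self.check(scope, cookie) {
            self.locks.remove(scope);
            true
        } else {
            false
        }
    }
}
```

The nice property is that only the acquire endpoint and the one central
write-config check need to know about the cookie at all.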
> A slightly more elaborate variant might be to also split the edit step,
> i.e.
> 1. check all remotes and get lock
> 2. extend the config(s) with a section (or a separate ".new" config) for
> pending changes, write all new changes to that.
> 3. commit the pending sections or .new config file.
>
> With that you would have the smallest possibility for failure due to
> unrelated node/connection hickups and reduce the time gap for actually
> activating the changes. If something is off an admin even could manually
> apply these directly on the cluster/nodes.
This sounds like an even better idea; I'll look into how I could
implement that. As a first step, I think I'll simply go for the
lock-cookie approach, since the more elaborate variant can always be
implemented on top of it.
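The stage/commit split above could be sketched roughly like this (file
names and layout are my assumption for illustration; the rename is atomic
on a regular filesystem, pmxcfs semantics would need checking):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stage pending SDN changes into a separate ".new" file next to the
/// live config, leaving the live config untouched.
fn stage_pending(config_path: &Path, pending: &str) -> io::Result<()> {
    let new_path = config_path.with_extension("new");
    fs::write(new_path, pending)
}

/// Commit step: atomically replace the live config with the staged one.
/// rename(2) within one filesystem is atomic, so readers see either the
/// old or the new config, never a partial write.
fn commit_pending(config_path: &Path) -> io::Result<()> {
    let new_path = config_path.with_extension("new");
    fs::rename(new_path, config_path)
}

/// Abort step: drop the staged changes, leaving the live config as-is.
fn abort_pending(config_path: &Path) -> io::Result<()> {
    let new_path = config_path.with_extension("new");
    fs::remove_file(new_path)
}
```

This also matches the manual-recovery point: an admin could inspect or
apply the ".new" file directly on the nodes if PDM dies mid-transaction.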
>> * In case of failures on the PDM side it is harder to recover, since
>> it requires manual intervention (removing the lock manually).
>
> Well, a partially rolled out SDN update might always be (relatively)
> hard to recover from; which approach would avoid that (and not require
> paxos, or raft level guarantees)?
One idea that came to my mind was automatic rollback after a timeout if
some health check on the PVE side fails - similar to how a resolution
change in a graphics driver reverts unless you confirm it in time.
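In pseudo-Rust, the confirm-or-revert window I have in mind would look
something like this (a sketch only; the "confirmation" stands in for a
successful health check arriving from PDM):

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Apply a config change, then roll it back automatically unless a
/// confirmation (e.g. a passing health check) arrives within the timeout.
/// Returns the config that is active after the window closes.
fn apply_with_rollback(
    old: String,
    new: String,
    timeout: Duration,
    confirm_rx: mpsc::Receiver<()>,
) -> String {
    // The new config is active for the duration of the window.
    let active = new;
    match confirm_rx.recv_timeout(timeout) {
        Ok(()) => active, // confirmed in time: keep the new config
        Err(_) => old,    // no confirmation: automatic rollback
    }
}
```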
> FWIW, we already got pmxcfs backed domain locks, which I added for the
> HA stack back in the day. These allow relatively cheaply to take a lock
> that only one pmxcfs instance (i.e., one node) at a time can hold. Pair
> that with some local lock (e.g., flock, in single-process, many threads
> rust land it could be an even cheaper mutex) and you can quite simply
> and not too expensively lock edits – and I'd figure SDN modifications do
> not have _that_ high a frequency for such locking to become a
> performance problem.
I'll look into those - thanks for the pointer.
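As I understand the suggestion, the layering would look roughly like the
sketch below - local mutex first, then the cluster-wide lock. The
`cluster_lock` closure here is a stand-in for a pmxcfs domain-lock
acquisition; everything in this snippet is hypothetical, the real lock
lives in pmxcfs, not in Rust:

```rust
use std::sync::Mutex;

/// Process-local guard paired with a (stubbed) cluster-wide domain lock.
struct SdnEditLock {
    // Cheap in-process mutex: serializes threads within this daemon.
    local: Mutex<()>,
}

impl SdnEditLock {
    fn new() -> Self {
        Self { local: Mutex::new(()) }
    }

    /// Run an edit while holding both layers: first the local mutex (only
    /// one thread in this process proceeds), then the cluster-wide lock
    /// (only one node proceeds). The closure returns false if another
    /// node already holds the domain lock.
    fn with_edit_lock<F, T>(
        &self,
        cluster_lock: impl Fn() -> bool,
        edit: F,
    ) -> Result<T, String>
    where
        F: FnOnce() -> T,
    {
        let _local = self
            .local
            .lock()
            .map_err(|_| "poisoned local lock".to_string())?;
        if !cluster_lock() {
            return Err("cluster-wide SDN lock held by another node".to_string());
        }
        Ok(edit())
    }
}
```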
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
Thread overview: 3+ messages
2025-01-30 15:48 Stefan Hanreich
2025-02-03 17:02 ` Thomas Lamprecht
2025-02-04 10:34 ` Stefan Hanreich [this message]