From: Stefan Hanreich <s.hanreich@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
Proxmox Datacenter Manager development discussion
<pdm-devel@lists.proxmox.com>
Subject: Re: [pdm-devel] RFC: Synchronizing configuration changes across remotes
Date: Tue, 4 Feb 2025 11:34:25 +0100
Message-ID: <6a9d506b-8f50-4a00-97bd-825060f3d06f@proxmox.com>
In-Reply-To: <8c24a4ad-2b53-49ac-acf9-b2f191fbeb8b@proxmox.com>
On 2/3/25 18:02, Thomas Lamprecht wrote:
> Yeah, digest is not giving you anything here, at least for anything that
> consists of more than one change; and adding a dedicated central API
> endpoint for every variant of batch update we might need seems hardly
> scalable nor like good API design.
Yes. I did consider adding an endpoint for getting / setting the whole
SDN configuration at some point, but I scrapped that idea since it's
unnecessary for what I'm currently implementing (adding single
zones / vnets / ...).
> Does it really require sweeping changes? I'd think modifications are
> already hedging against concurrent access now, so this should not mean
> we change to a completely new edit paradigm here.
We'd at least have to touch every non-read request in SDN to check for
the global lock - but yes, the wording is a bit overly dramatic. We
already have lock_sdn_config, so adding another layer of locking there
shouldn't be an issue. If I decide to go for the .new config route
described below, this will be a bit more involved, though.
> My thoughts when we talked was to go roughly for:
> Add a new endpoint that 1) ensures basic healthiness and 2) registers a
> lock for the whole, or potentially only some parts, of the SDN stack.
> This should work by returning a lock-cookie random string to be used by
> subsequent calls to do various updates in one go while ensuring nothing
> else can do so or just steal our lock. Then check this lock centrally
> on any write-config and be basically done I think?
That was basically what I envisioned as the implementation for the
lock too.
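
A rough sketch of how I picture the lock-cookie part, in Rust since
that's what PDM uses. All names here (SdnLock, acquire, check_write,
release) are made up for illustration and are not existing PDM/PVE API;
the cookie generator is only a placeholder, a real implementation
should use a proper CSPRNG:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum LockError {
    Unhealthy,
    AlreadyLocked,
    NotLocked,
    WrongCookie,
}

/// Placeholder cookie generator: RandomState is randomly keyed, but this
/// is NOT cryptographically secure.
fn random_cookie() -> String {
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u64(0);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical global SDN edit lock, identified by a random cookie.
struct SdnLock {
    cookie: Mutex<Option<String>>,
}

impl SdnLock {
    fn new() -> Self {
        SdnLock { cookie: Mutex::new(None) }
    }

    /// Lock endpoint: 1) check health, 2) register the lock, hand out a cookie.
    fn acquire(&self, healthy: impl FnOnce() -> bool) -> Result<String, LockError> {
        let mut held = self.cookie.lock().unwrap();
        if held.is_some() {
            return Err(LockError::AlreadyLocked);
        }
        if !healthy() {
            return Err(LockError::Unhealthy);
        }
        let cookie = random_cookie();
        *held = Some(cookie.clone());
        Ok(cookie)
    }

    /// Checked centrally on every config write.
    fn check_write(&self, cookie: Option<&str>) -> Result<(), LockError> {
        let held = self.cookie.lock().unwrap();
        match (held.as_deref(), cookie) {
            (None, None) => Ok(()),                       // no global lock: plain single edit
            (None, Some(_)) => Err(LockError::NotLocked), // stale cookie
            (Some(h), Some(c)) if h == c => Ok(()),       // the lock holder
            (Some(_), _) => Err(LockError::WrongCookie),  // everybody else
        }
    }

    /// Only the holder of the cookie may release the lock.
    fn release(&self, cookie: &str) -> Result<(), LockError> {
        let mut held = self.cookie.lock().unwrap();
        let matches = match held.as_deref() {
            None => return Err(LockError::NotLocked),
            Some(h) => h == cookie,
        };
        if !matches {
            return Err(LockError::WrongCookie);
        }
        *held = None;
        Ok(())
    }
}
```

Writes without a cookie keep working as long as nobody holds the global
lock, so the existing single-object endpoints wouldn't need to change
their semantics.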
> A slightly more elaborate variant might be to also split the edit step,
> i.e.
> 1. check all remotes and get lock
> 2. extend the config(s) with a section (or a separate ".new" config) for
> pending changes, write all new changes to that.
> 3. commit the pending sections or .new config file.
>
> With that you would have the smallest possibility for failure due to
> unrelated node/connection hiccups and reduce the time gap for actually
> activating the changes. If something is off, an admin could even manually
> apply these directly on the cluster/nodes.
This sounds like an even better idea; I'll look into how I could
implement it. As a first step, I think I'll simply go for the
lock-cookie approach, since we can always implement this more elaborate
approach on top of that.
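
For the ".new" config variant, a minimal sketch of steps 2 and 3 using
plain file operations. The function names are hypothetical, and it
assumes that rename() atomically replaces the live file on the target
filesystem; I haven't verified what pmxcfs guarantees there:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Path of the pending config, e.g. "zones.cfg" -> "zones.cfg.new".
fn pending_path(config: &Path) -> PathBuf {
    let mut name = config.as_os_str().to_owned();
    name.push(".new");
    PathBuf::from(name)
}

/// Step 2: apply one change to the pending copy. The first change starts
/// from the live config, later ones build on the existing ".new" file.
fn stage_change(config: &Path, edit: impl FnOnce(&str) -> String) -> io::Result<()> {
    let pending = pending_path(config);
    let base = match fs::read_to_string(&pending) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => fs::read_to_string(config)?,
        Err(e) => return Err(e),
    };
    fs::write(&pending, edit(&base))
}

/// Step 3: activate all pending changes at once by renaming over the live file.
fn commit(config: &Path) -> io::Result<()> {
    fs::rename(pending_path(config), config)
}

/// Abort: drop the pending changes, leaving the live config untouched.
fn discard(config: &Path) -> io::Result<()> {
    match fs::remove_file(pending_path(config)) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}
```

Until commit() runs, the live config stays untouched, which is also what
would let an admin inspect or apply the pending file by hand.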
>> * In case of failures on the PDM side it is harder to recover, since
>> it requires manual intervention (removing the lock manually).
>
> Well, a partially rolled out SDN update might always be (relatively)
> hard to recover from; which approach would avoid that (and not require
> Paxos or Raft level guarantees)?
One idea that came to my mind was an automatic rollback after a timeout
if some health check on the PVE side fails, similar to how a changed
screen resolution gets reverted automatically unless you confirm it in
time.
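
Roughly, the rollback idea as a sketch; apply_with_rollback and its
parameters are hypothetical, and what the health check actually tests on
the PVE side is left open here:

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum Outcome {
    Committed,
    RolledBack,
}

/// Apply a change, then poll a health check until it passes or the
/// deadline expires; on timeout, revert the change automatically.
fn apply_with_rollback(
    apply: impl FnOnce(),
    mut healthy: impl FnMut() -> bool,
    rollback: impl FnOnce(),
    timeout: Duration,
    poll_interval: Duration,
) -> Outcome {
    apply();
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if healthy() {
            return Outcome::Committed;
        }
        thread::sleep(poll_interval);
    }
    // Health check never passed in time: restore the previous state.
    rollback();
    Outcome::RolledBack
}
```

The important property is that the default outcome is the rollback, so a
PDM that crashes or loses connection mid-update doesn't leave the remote
in the new, possibly broken state.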
> FWIW, we already got pmxcfs backed domain locks, which I added for the
> HA stack back in the day. These allow taking a lock relatively cheaply
> that only one pmxcfs instance (i.e., one node) at a time can hold. Pair
> that with some local lock (e.g., flock, in single-process, many threads
> Rust land it could be an even cheaper mutex) and you can quite simply
> and not too expensively lock edits – and I'd figure SDN modifications do
> not have _that_ high of a frequency to make performance here too critical
> for such locking to become a problem.
I'll look into those - thanks for the pointer.
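
To check my understanding of the two-level scheme, a minimal sketch. The
DomainLock trait is a hypothetical stand-in for the pmxcfs domain lock (I
haven't looked at its actual interface yet), and InMemoryCluster only
fakes it for illustration:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical interface to a cluster-wide domain lock.
trait DomainLock {
    fn try_acquire(&self, domain: &str) -> bool;
    fn release(&self, domain: &str);
}

/// In-memory fake of the cluster lock, for illustration only.
#[derive(Default)]
struct InMemoryCluster {
    held: Mutex<HashSet<String>>,
}

impl InMemoryCluster {
    fn is_held(&self, domain: &str) -> bool {
        self.held.lock().unwrap().contains(domain)
    }
}

impl DomainLock for InMemoryCluster {
    fn try_acquire(&self, domain: &str) -> bool {
        self.held.lock().unwrap().insert(domain.to_string())
    }
    fn release(&self, domain: &str) {
        self.held.lock().unwrap().remove(domain);
    }
}

/// Releases the domain lock on drop, even if the edit closure panics.
struct DomainGuard<'a, D: DomainLock> {
    cluster: &'a D,
    domain: &'a str,
}

impl<D: DomainLock> Drop for DomainGuard<'_, D> {
    fn drop(&mut self) {
        self.cluster.release(self.domain);
    }
}

struct SdnEditLock<D: DomainLock> {
    local: Mutex<()>,
    cluster: D,
    domain: String,
}

impl<D: DomainLock> SdnEditLock<D> {
    fn new(cluster: D, domain: &str) -> Self {
        SdnEditLock { local: Mutex::new(()), cluster, domain: domain.to_string() }
    }

    fn with_lock<T>(&self, edit: impl FnOnce() -> T) -> Result<T, &'static str> {
        // 1. Cheap in-process mutex: threads of this process queue up here
        //    instead of all contending for the cluster-wide lock.
        let _local = self.local.lock().map_err(|_| "local lock poisoned")?;
        // 2. Cluster-wide lock: only one node at a time gets past this point.
        if !self.cluster.try_acquire(&self.domain) {
            return Err("domain lock held by another node");
        }
        let _guard = DomainGuard { cluster: &self.cluster, domain: &self.domain };
        Ok(edit())
    }
}
```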
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel