From: Stefan Hanreich <s.hanreich@proxmox.com>
To: pdm-devel@lists.proxmox.com
Date: Thu, 30 Jan 2025 16:48:10 +0100
Subject: [pdm-devel] RFC: Synchronizing configuration changes across remotes

I'm currently working on the SDN integration, and for that I need a way to deploy SDN configuration changes to multiple remotes simultaneously. In general, I will need to do the following:

* Create / Update / Delete parts of the SDN configuration of multiple remotes, preferably synchronized across the remotes.
* Apply the new SDN configuration (possibly opt-in) to all or some nodes of multiple remotes.

During this operation we should make sure that there are no pending changes in the SDN configuration, so users do not accidentally apply unrelated changes via PDM. For the same reason, we also need to prevent any concurrent changes to the SDN configuration while the operation is running.

The question is: do we also want to prevent concurrent changes across multiple remotes, or is it enough to prevent concurrent changes on a per-remote level?
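To make the cross-remote variant of that question more concrete, here is a minimal sketch of a coordinator that only starts when every remote has no pending SDN changes and can be locked, and that restores the previous configuration and releases all locks if anything fails. `Remote` is a hypothetical in-memory stand-in for a remote's SDN configuration, and `lock_if_clean`, `apply` and `deploy` are illustrative names, not existing PVE or PDM APIs.

```rust
/// In-memory stand-in for one remote's SDN configuration (hypothetical; it only
/// models the steps discussed here, not a real PVE or PDM API).
#[derive(Debug, Default)]
pub struct Remote {
    pub name: String,
    /// Configuration as written, possibly not yet applied ("pending").
    pub config: String,
    /// Configuration currently active on the nodes.
    pub applied: String,
    pub locked: bool,
    /// Lets a caller simulate a failing apply on this remote.
    pub fail_apply: bool,
}

impl Remote {
    pub fn new(name: &str, config: &str) -> Self {
        Remote {
            name: name.to_string(),
            config: config.to_string(),
            applied: config.to_string(),
            ..Default::default()
        }
    }

    /// Check for pending changes and take the lock in one step, modelling
    /// "check and lock atomically in one API call".
    fn lock_if_clean(&mut self) -> Result<(), String> {
        if self.locked {
            return Err(format!("{}: SDN configuration is already locked", self.name));
        }
        if self.config != self.applied {
            return Err(format!("{}: SDN configuration has pending changes", self.name));
        }
        self.locked = true;
        Ok(())
    }

    fn apply(&mut self) -> Result<(), String> {
        if self.fail_apply {
            return Err(format!("{}: applying the SDN configuration failed", self.name));
        }
        self.applied = self.config.clone();
        Ok(())
    }
}

/// Lock all remotes, write and apply `new_config` on each, then release the locks.
/// If any step fails, every remote touched so far gets its previous configuration back.
pub fn deploy(remotes: &mut [Remote], new_config: &str) -> Result<(), String> {
    // Phase 1: lock every remote or none of them.
    for i in 0..remotes.len() {
        if let Err(e) = remotes[i].lock_if_clean() {
            // Only release the locks we took; remote `i` may be locked by someone else.
            for r in &mut remotes[..i] {
                r.locked = false;
            }
            return Err(e);
        }
    }

    // Phase 2: change and apply the configuration, keeping backups for rollback.
    let mut backups: Vec<String> = Vec::new();
    let mut result: Result<(), String> = Ok(());
    for r in remotes.iter_mut() {
        backups.push(r.config.clone());
        r.config = new_config.to_string();
        if let Err(e) = r.apply() {
            result = Err(e);
            break;
        }
    }

    // Rollback: restore and re-apply the old configuration on every touched
    // remote (re-applying is assumed to always succeed in this model).
    if result.is_err() {
        for (r, old) in remotes.iter_mut().zip(&backups) {
            r.config = old.clone();
            r.applied = old.clone();
        }
    }

    // Phase 3: all remotes were locked by us in phase 1, so release them all.
    for r in remotes.iter_mut() {
        r.locked = false;
    }
    result
}
```

Taking all locks before touching any configuration is what keeps the rollback simple: while the locks are held, nothing else can have changed the remotes that were already handled. The sketch assumes that re-applying the old configuration always succeeds, which a real implementation could not rely on.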
With network configuration affecting more than one remote, I think it would be better to synchronize changes across remotes: oftentimes applying the configuration to only one remote doesn't really make sense, and a failure to apply the configuration on one remote could affect the other remotes.

Depending on the answer to that question, I see two options:

* introduce some form of lock that prevents any changes to the SDN configuration from other sources
* do something based on the existing digest functionality

With the lock-based approach, the general process for making changes to the SDN configuration would look as follows:

* check for pending changes and, if there are none, lock the SDN configuration (atomically, in one API call)
* make the changes to the SDN configuration
* apply the SDN configuration changes
* release the lock
* in case of errors, roll back the configuration changes and then release all locks

I currently gravitate towards the lock-based approach for the following reasons:

* It enables us to synchronize changes across multiple remotes, which a digest-based approach does not.
* It's a lot more ergonomic for developers, since you simply acquire and release the lock. With a digest-based approach, modifications that require multiple API calls need to fetch a new digest every time and track it across all those calls. With SDN specifically, we would also need to provide and check the digest when applying the configuration.
* It is simply easier to prevent concurrent changes in the first place than to react to them. If they cannot occur, rolling back is easier and less error-prone, since the developer can assume that nothing changed on the previously handled remotes either.

The downsides I can see with this approach:

* It requires sweeping changes to basically the whole SDN API, and keeping backwards compatibility is harder.
* Many API endpoints in PVE already provide the digest functionality, so a digest-based approach would be a lot easier to retrofit for use with PDM, possibly without requiring any changes at all.
* In case of failures on the PDM side, recovery is harder, since it requires manual intervention (removing the lock manually).

For single configuration files, the digest-based approach could work quite well in cases where we don't need to synchronize changes across multiple remotes. For SDN, however, it is a bit more complicated. First, we currently generate digests for each section of a configuration file instead of for the file as a whole, though this would be relatively easy to add. Second, the configuration is split across multiple files, so every API call would need to either check the digests of all configuration files or check a 'global' SDN configuration digest. Again, this is certainly solvable, but it also requires some work.

Even with our best effort, we will run into situations where the lock doesn't get released properly, so we should provide a simple escape hatch to unlock the SDN configuration (like qm unlock). One such scenario is PDM losing connectivity to one of the remotes while holding the lock; there's not really anything we can do in that case.

Since we will probably need something similar for other configuration files as well, I wanted to ask for your input. I think this concept could be applied generally to configuration changes that need to be synchronized across multiple remotes (syncing the firewall configuration comes to mind). This is just a rough draft of how this could work, and I have probably overlooked some edge cases. I'm happy to hear any input or alternative ideas!
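For the second SDN problem with the digest-based approach, a 'global' digest could be derived from all configuration files at once. A minimal sketch, assuming the files are held in memory as name/content pairs; `global_digest` and `update_file` are hypothetical helpers, not existing PVE functions, and `DefaultHasher` only stands in for a proper cryptographic digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Combined ("global") digest over all SDN configuration files, keyed by file name.
/// DefaultHasher is only a stand-in: it is not cryptographic and its output is not
/// guaranteed to be stable across Rust releases, so a real implementation would
/// use something like SHA-256 instead.
pub fn global_digest(files: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    // BTreeMap iterates in key order, so the result does not depend on the
    // order in which the files were read or inserted.
    for (name, content) in files {
        // Hashing the file name too means moving content between files changes the digest.
        name.hash(&mut hasher);
        content.hash(&mut hasher);
    }
    hasher.finish()
}

/// Hypothetical update call: change one file only if the caller's digest still
/// matches the digest of the whole configuration, and return the new digest.
pub fn update_file(
    files: &mut BTreeMap<String, String>,
    expected: u64,
    name: &str,
    content: &str,
) -> Result<u64, String> {
    let current = global_digest(files);
    if current != expected {
        return Err(format!(
            "digest mismatch: expected {expected:016x}, found {current:016x}"
        ));
    }
    files.insert(name.to_string(), content.to_string());
    Ok(global_digest(files))
}
```

Every call that modifies any SDN file would then carry the digest the caller last saw and receive a new one, which is the bookkeeping across multiple API calls listed above as a drawback of the digest-based approach.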