From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
 by lore.proxmox.com (Postfix) with ESMTPS id DA7C31FF172
 for <inbox@lore.proxmox.com>; Tue, 1 Apr 2025 13:37:58 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 60B6E1FFB8;
 Tue, 1 Apr 2025 13:37:47 +0200 (CEST)
Message-ID: <247c3273-6fc2-4c2e-96c7-53c8eb037e4a@proxmox.com>
Date: Tue, 1 Apr 2025 13:37:43 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <20250324111529.338025-1-alexandre.derumier@groupe-cyllene.com>
 <mailman.127.1742814976.359.pve-devel@lists.proxmox.com>
 <502327008.3598.1743501156797@webmail.proxmox.com>
 <ce2544c2-d467-4eff-ba4f-6df6c3350fe2@proxmox.com>
 <52ef2b59-21ec-4a6c-b528-47f1e11c691e@proxmox.com>
 <6b44b21c-5399-47a5-8e06-3c1d3e1eaab9@proxmox.com>
Content-Language: en-US
From: Dominik Csapak <d.csapak@proxmox.com>
In-Reply-To: <6b44b21c-5399-47a5-8e06-3c1d3e1eaab9@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server 1/1] qemu: add offline migration from dead node
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>,
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>,
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On 4/1/25 12:46, Thomas Lamprecht wrote:
> On 01.04.25 at 12:19, Dominik Csapak wrote:
>> while I also agree with everything said here, I have one counterpoint to offer:
>>
>> In the case that such an operation is necessary (e.g. HA is not wanted/needed/possible
>> for whatever reason), the user will fall back to doing it manually (iow.
>> 'mv source target')
>> which is at least as dangerous as exposing it over the API, since
>>
>> * now the admins sharing the system must share root@pam credentials (ssh/console access)
>> (alternatively set up sudo, which has its own problems)
>
> Setups with many admins already need to handle how they can log in as root, be
> it through a jump user (`doas` is a thing if sudo is deemed too complex), some
> identity provider (LDAP, OIDC, ... with PAM configuration), as root operations
> are required for other things too.

You're right.

>
>> * it promotes manually modifying /etc/pve/ content
>
> Yeah, as that's what's actually required after manual assessment, abstracting
> that away won't really bring a big benefit IMO.

It would reduce the need to do things via the CLI, and not having to use the
CLI is IMO a strong point of PVE (but you're right, the assessment part can't
be removed anyway)

>
>>
>> * any error could be even more fatal than if done via the API
>> (e.g. mv of the wrong file, from the wrong node, etc.)
>
> This cannot be said for sure, these are unknown unknowns. FWIW, the API could
> make it worse too compared to an admin carefully fixing this according to the
> needs of a specific situation at hand.

Mhmm, what I meant here is that instructing the user to manually do
'mv some-path some-other-path' has more error potential (e.g. typos,
misremembering node names/VMIDs/etc.) than e.g. clicking the VM on the offline
node and pressing a button (or following a CLI tool's output/options)

>
>> IMHO ways forward for this scenario could be:
>>
>> * use cluster level locking only for config move? (not sure if performance is still
>> a concern for this action, since parallel moves don't happen too much?)
>
> What does this solve? The old node is still in an unknown state and does not
> see any pmxcfs changes at all. The VM can still run and cause issues with
> duplicate unsynchronized resource access and all the other woes that can
> happen if the same guest runs twice.

I mentioned it because Fabian wrote we could maybe solve it with a cluster-wide
VM lock. I think restricting the move to such a lock could work, under the
assumption that the admin makes sure the offline node is and stays offline
(which they have to do anyway).

>
>>
>> * provide a special CLI tool/cmd to deal with that -> would minimize potential
>> errors but is still contained to root equivalent users
>
> This would still have your own arguments w.r.t. root login speaking against
> that. And it would not be that big of a difference as for local involved
> resources the tool cannot work if the source node cannot be talked with and
> for all-shared resources the simple config move is as safe as such a tool
> would get in the context of a dead source node, as for either the admin must
> ensure it's actually dead.

It still improves the UX for that situation, since it's then a provided/guided
way vs. mv'ing files on the filesystem (e.g. such a tool could check if the
source node is reachable, etc.).

>
>> * link to the doc section for it from the UI with a big caveat
>> https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_recovery
>
> As Fabian wrote, such disclaimers might be nice for shifting the blame
> but are not enough in practice for such an operation.
>
> And Fabian's point wasn't that doing it on the CLI is less dangerous, it's
> about the same either way, but that exposing this as a well-integrated feature
> makes it seem much less dangerous to the user, especially those that are
> less experienced and should be stumped and ask some support channel for
> help.
>
> That said, the actual first step to move this forward would IMO be to create
> an extensive documentation/how-to for how such things can be resolved and what
> one needs to watch out for, sort of check-list style might be a good format.
> As that alone should help users a lot already, and that would also make it
> much clearer what a more integrated (semi-automated) way could look like.
> Which could be a check tool that helps with assessing the recovery depending
> on config, storage (types), network, mappings, ... which would ensure that
> common issues/blockers are not missed and will even help experienced admins.
> If that cannot be first documented and then optionally transformed into a
> hands-off evaluation checker tool, or if that's deemed to not help users, I
> really do not see how an API integrated solution can do so without just
> hand-waving all actual and real issues for why this does not already exist
> away.
>
>

Yes, e.g. that's what I meant when I said tooling on the CLI is one possibility
to improve it. Also, as Fabian wrote in the other message, STONITH can improve
that (but comes with its own set of difficulties & complexity)

Just to clarify, I'm not for blindly implementing such an API call/CLI tool/etc.
but wanted to argue that we probably want to improve the UX of that situation as
well as we can, and offered my thoughts on how we could do it.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel