From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id D01D61FF149 for ; Sat, 30 May 2026 01:25:28 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 905E919B3; Sat, 30 May 2026 01:25:27 +0200 (CEST) Message-ID: Date: Sat, 30 May 2026 01:25:22 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Subject: Re: [RFC PATCH datacenter-manager 1/4] server: connection: multi client: use correct client error for retrying To: Dominik Csapak , pdm-devel@lists.proxmox.com References: <20260529133026.3149896-1-d.csapak@proxmox.com> <20260529133026.3149896-2-d.csapak@proxmox.com> Content-Language: en-US From: Thomas Lamprecht In-Reply-To: <20260529133026.3149896-2-d.csapak@proxmox.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1780097092548 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.005 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [proxmox.com] Message-ID-Hash: 3YBWPHP2ZNSQLYIIOLXYE6FBVMLZ4WUX X-Message-ID-Hash: 3YBWPHP2ZNSQLYIIOLXYE6FBVMLZ4WUX X-MailFrom: t.lamprecht@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox Datacenter Manager development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Am 29.05.26 um 15:30 schrieb Dominik Csapak: > Since commit > 8e3b1f06 (client: split out Connect error and factor out error classification) > in the proxmox workspace, Connection errors are split out to it's own > type of `proxmox_client::Error`s. Since then, every connection error > was returned normally and not retried via other hosts from the remote > config. > > Use the correct error type to only retry when the actual connection is > not working, not if the api or the server returns an issue. > > Signed-off-by: Dominik Csapak > --- > Not sure if just changing this is a good idea, or if we want to catch > both connect and client errors? tracking only connect errors seemed > the correct way to me here, but maybe someone has a different opinion. This is not exactly how it was intended and also not fully correct, but many thanks for reminding me about having to pick this up proper. This was triggered originally by the occasionally noticed multi client error pop up I saw on long living tab with a PDM open, e.g. after having been idle for a while and then noticing that it basically always affected a remote with a single node (but did not a thorough statistical probe here). When checking the multi client's retry logic with that in mind I noticed that any stale connection or blip for a remote with just one node made the retry logic useless. And when wanting to fix that I also stumbled into the client here and the error types. Anyway, pushed now three commits that should fix that proper, one for the client [0] as there was still an issue left, and then two commits in PDM that fix the error usage [1] (replacing your patch here) and implement a final fallback to the first node [2], that short-circuits to the sole node for single-node remotes immediately and for clusters it retries the first, in the (rare but not theoretic) case all other tracked nodes are genuinely down and the first one just had a network blip it will help and has no real downside (given my implementation is correct naturally ;P). [0]: https://git.proxmox.com/?p=proxmox.git;a=commitdiff;h=52cc43f90b95d650286a568ed737264631f333e9 [1]: https://git.proxmox.com/?p=proxmox-datacenter-manager.git;a=commitdiff;h=2807bc729e131ea2cbce247bfa39c2e3db3f7fa6 [2]: https://git.proxmox.com/?p=proxmox-datacenter-manager.git;a=commitdiff;h=59135da87a5a0df2b0d3a0c82e8f282fb9ef0657