From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id BC59C1FF138 for ; Mon, 01 Jun 2026 09:20:19 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 10F711083D; Mon, 1 Jun 2026 09:20:18 +0200 (CEST) Message-ID: Date: Mon, 1 Jun 2026 09:19:35 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Subject: Re: [RFC PATCH datacenter-manager 1/4] server: connection: multi client: use correct client error for retrying To: Thomas Lamprecht , pdm-devel@lists.proxmox.com References: <20260529133026.3149896-1-d.csapak@proxmox.com> <20260529133026.3149896-2-d.csapak@proxmox.com> Content-Language: en-US From: Dominik Csapak In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1780298350026 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.049 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: ZXJRTFDQXUUR54K7PU4FBJF7P3GH54FN X-Message-ID-Hash: ZXJRTFDQXUUR54K7PU4FBJF7P3GH54FN X-MailFrom: d.csapak@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox Datacenter Manager development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On 5/30/26 1:24 AM, Thomas Lamprecht wrote: > Am 29.05.26 um 15:30 schrieb Dominik Csapak: >> Since commit >> 8e3b1f06 (client: split out Connect error and factor out error classification) >> in the proxmox workspace, Connection errors are split out to it's own >> type of `proxmox_client::Error`s. Since then, every connection error >> was returned normally and not retried via other hosts from the remote >> config. >> >> Use the correct error type to only retry when the actual connection is >> not working, not if the api or the server returns an issue. >> >> Signed-off-by: Dominik Csapak >> --- >> Not sure if just changing this is a good idea, or if we want to catch >> both connect and client errors? tracking only connect errors seemed >> the correct way to me here, but maybe someone has a different opinion. > > This is not exactly how it was intended and also not fully correct, but > many thanks for reminding me about having to pick this up proper. > > This was triggered originally by the occasionally noticed multi client > error pop up I saw on long living tab with a PDM open, e.g. after having > been idle for a while and then noticing that it basically always affected a > remote with a single node (but did not a thorough statistical probe here). > When checking the multi client's retry logic with that in mind I noticed that > any stale connection or blip for a remote with just one node made the retry > logic useless. And when wanting to fix that I also stumbled into the client > here and the error types. > > Anyway, pushed now three commits that should fix that proper, one for the > client [0] as there was still an issue left, and then two commits in PDM that > fix the error usage [1] (replacing your patch here) and implement a final > fallback to the first node [2], that short-circuits to the sole node for > single-node remotes immediately and for clusters it retries the first, in > the (rare but not theoretic) case all other tracked nodes are genuinely down > and the first one just had a network blip it will help and has no real > downside (given my implementation is correct naturally ;P). > > [0]: https://git.proxmox.com/?p=proxmox.git;a=commitdiff;h=52cc43f90b95d650286a568ed737264631f333e9 > [1]: https://git.proxmox.com/?p=proxmox-datacenter-manager.git;a=commitdiff;h=2807bc729e131ea2cbce247bfa39c2e3db3f7fa6 > [2]: https://git.proxmox.com/?p=proxmox-datacenter-manager.git;a=commitdiff;h=59135da87a5a0df2b0d3a0c82e8f282fb9ef0657 nice thanks, but after reading the commits wouldn't this now not also retry the api call if it was a client one? it the first node returns a 'Client' error for any api call, it will still retry on the second node, etc. ? the new code only does not retry the first node if it was a client error? (maybe I'm misunderstanding something here though, the code with its Iterator/macro use is not super trivial)