From: Dominik Csapak <d.csapak@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>, pdm-devel@lists.proxmox.com
Subject: Re: [RFC PATCH datacenter-manager 1/4] server: connection: multi client: use correct client error for retrying
Date: Mon, 1 Jun 2026 09:19:35 +0200 [thread overview]
Message-ID: <e01d7208-6ff7-4d48-a8ae-1838741fb33f@proxmox.com> (raw)
In-Reply-To: <ebea3bce-d6e9-418d-bd51-cd9832be084d@proxmox.com>
On 5/30/26 1:24 AM, Thomas Lamprecht wrote:
> Am 29.05.26 um 15:30 schrieb Dominik Csapak:
>> Since commit
>> 8e3b1f06 (client: split out Connect error and factor out error classification)
>> in the proxmox workspace, Connection errors are split out to it's own
>> type of `proxmox_client::Error`s. Since then, every connection error
>> was returned normally and not retried via other hosts from the remote
>> config.
>>
>> Use the correct error type to only retry when the actual connection is
>> not working, not if the api or the server returns an issue.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> Not sure if just changing this is a good idea, or if we want to catch
>> both connect and client errors? tracking only connect errors seemed
>> the correct way to me here, but maybe someone has a different opinion.
>
> This is not exactly how it was intended and also not fully correct, but
> many thanks for reminding me about having to pick this up proper.
>
> This was triggered originally by the occasionally noticed multi client
> error pop up I saw on long living tab with a PDM open, e.g. after having
> been idle for a while and then noticing that it basically always affected a
> remote with a single node (but did not a thorough statistical probe here).
> When checking the multi client's retry logic with that in mind I noticed that
> any stale connection or blip for a remote with just one node made the retry
> logic useless. And when wanting to fix that I also stumbled into the client
> here and the error types.
>
> Anyway, pushed now three commits that should fix that proper, one for the
> client [0] as there was still an issue left, and then two commits in PDM that
> fix the error usage [1] (replacing your patch here) and implement a final
> fallback to the first node [2], that short-circuits to the sole node for
> single-node remotes immediately and for clusters it retries the first, in
> the (rare but not theoretic) case all other tracked nodes are genuinely down
> and the first one just had a network blip it will help and has no real
> downside (given my implementation is correct naturally ;P).
>
> [0]: https://git.proxmox.com/?p=proxmox.git;a=commitdiff;h=52cc43f90b95d650286a568ed737264631f333e9
> [1]: https://git.proxmox.com/?p=proxmox-datacenter-manager.git;a=commitdiff;h=2807bc729e131ea2cbce247bfa39c2e3db3f7fa6
> [2]: https://git.proxmox.com/?p=proxmox-datacenter-manager.git;a=commitdiff;h=59135da87a5a0df2b0d3a0c82e8f282fb9ef0657
nice thanks, but after reading the commits wouldn't this now not also
retry the api call if it was a client one?
it the first node returns a 'Client' error for any api call,
it will still retry on the second node, etc. ?
the new code only does not retry the first node if it was a client
error?
(maybe I'm misunderstanding something here though, the code
with its Iterator/macro use is not super trivial)
next prev parent reply other threads:[~2026-06-01 7:20 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-29 13:30 [PATCH datacenter-manager 0/4] implement back-off mechanism for Dominik Csapak
2026-05-29 13:30 ` [RFC PATCH datacenter-manager 1/4] server: connection: multi client: use correct client error for retrying Dominik Csapak
2026-05-29 23:25 ` Thomas Lamprecht
2026-06-01 7:19 ` Dominik Csapak [this message]
2026-06-03 11:48 ` Thomas Lamprecht
2026-06-03 11:59 ` Dominik Csapak
2026-06-04 16:30 ` Thomas Lamprecht
2026-05-29 13:30 ` [PATCH datacenter-manager 2/4] server: remote cache: prepare for back-off mechanism Dominik Csapak
2026-05-29 23:40 ` Thomas Lamprecht
2026-05-29 13:30 ` [PATCH datacenter-manager 3/4] server: connection: multi-client: use back-off state from remote cache Dominik Csapak
2026-05-29 13:30 ` [PATCH datacenter-manager 4/4] server: pbs client: rework to use the back-off mechanism " Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e01d7208-6ff7-4d48-a8ae-1838741fb33f@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pdm-devel@lists.proxmox.com \
--cc=t.lamprecht@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox