From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pdm-devel@lists.proxmox.com
Subject: Re: [PATCH datacenter-manager 2/4] server: remote cache: prepare for back-off mechanism
Date: Sat, 30 May 2026 01:40:29 +0200
Message-ID: <683ab04a-a2e6-4f29-9eb4-21b0f5464879@proxmox.com>
In-Reply-To: <20260529133026.3149896-3-d.csapak@proxmox.com>
Am 29.05.26 um 15:30 schrieb Dominik Csapak:
> this introduces a new field for the RemoteMappingCache that contains the
> current status of a 'BackOffState'. This is intended to mark remotes as
> unreachable when the connection to them fails and only to retry if
> enough time elapsed. This is to prevent sending numerous connections out
> to a remote that is known to not be reachable.
>
> The back-off timeout is increased exponentially from 10 seconds up to
> 600 seconds, so at most it takes 10 minutes for a remote to be reachable
> again if it was offline for a prolonged period of time.
I'd prefer if we only start this backoff after a remote has already been
failing for a little while (10m to 1h), and still cap the rechecking
at a relatively low period, like 1m to 5m. That already cuts
the checks down quite a bit and still keeps it responsive. One idea
might be to also scale this depending on other remotes being configured
and online, i.e., if all are offline, use the first one for heuristic, more
frequent polling and reset the backoff if that one comes back online. Or, if
only a subset of nodes of a single remote is offline, give them a
higher delay. This fine-tuning could provide quite a bit better UX,
but might need some thought to encapsulate it right, to avoid having
those edge cases leak all over the place into the code that has to
handle them.
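
To make the grace-period idea a bit more concrete, here is a rough sketch
of how such a policy could be kept in one place. All names here
(BackOffPolicy, next_delay, describe) are hypothetical and not taken from
the patch; the values in the note below are just picked from the ranges
mentioned above, so treat this as an illustration under those assumptions
and not as a proposed implementation.

```rust
use std::time::Duration;

/// Hypothetical policy parameters, not part of the actual patch.
#[derive(Clone, Copy, Debug)]
pub struct BackOffPolicy {
    /// How long a remote must have been failing before any back-off applies.
    pub grace: Duration,
    /// Delay used for the first backed-off recheck.
    pub initial: Duration,
    /// Upper bound for the delay between rechecks.
    pub max: Duration,
}

impl BackOffPolicy {
    /// Returns the delay before the next recheck, or `None` while the remote
    /// is still within the grace period and should be retried as usual.
    ///
    /// `failing_for` is the time since the first failure of the current
    /// streak; `backed_off_attempts` counts the failed rechecks that happened
    /// after the grace period ended.
    pub fn next_delay(&self, failing_for: Duration, backed_off_attempts: u32) -> Option<Duration> {
        if failing_for < self.grace {
            return None;
        }
        // Double the delay per failed recheck, saturating instead of overflowing.
        let factor = 2u32.checked_pow(backed_off_attempts).unwrap_or(u32::MAX);
        Some(self.initial.saturating_mul(factor).min(self.max))
    }
}

/// Builds a message that could be logged so an admin can trace the behavior.
pub fn describe(remote: &str, delay: Option<Duration>) -> String {
    match delay {
        None => format!("remote '{}': within grace period, retrying normally", remote),
        Some(d) => format!("remote '{}': backing off, next check in {}s", remote, d.as_secs()),
    }
}
```

With grace = 10m, initial = 10s and max = 5m, a remote would be retried
normally for the first ten minutes of an outage, then rechecked after
10s, 20s, 40s, 80s and 160s, and every 5m from then on.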
What I'm generally missing:
- some more details in the commit message, like the current check periods
without this change
- actually logging when we back off, so that this can be traced and the
behavior understood by an admin (might be here or in 3/4, just mentioning
it in general).
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> Note that this now takes up to 10 minutes for pdm to mark a remote as
> reachable again, since it won't retry sooner. We could combat that by
> e.g. retrying every 10th connection, even if the back-off timeout has
> not run out yet. (probably has to be scaled by the nodes and tasks
> we are running?). Another possibility would be to have either a special
> API call to force refresh it, but my guess is that most users would
> just abuse that button?
>
> I'm very open for other ideas on how to improve this, maybe it's just
> a matter of finetuning the back-off scale and maximum to get a well
> working system.