From: Dominik Csapak <d.csapak@proxmox.com>
To: pdm-devel@lists.proxmox.com
Subject: [PATCH datacenter-manager v2 0/4] implement back-off mechanism for connection errors for remotes
Date: Mon, 8 Jun 2026 15:25:28 +0200
Message-ID: <20260608132539.2949407-1-d.csapak@proxmox.com>

When a remote is not reachable (e.g. due to a network outage or a
crash), PDM still tries to connect on every attempt and runs into the
connection timeout each time.
This leads to heavily delayed API calls in the PDM UI.

To counter that, this series implements a basic back-off mechanism that
exponentially increases the time between actual API calls, up to a
maximum.

For details on how the back-off mechanism works, see patch 1/4.

Possible Improvements/Future Work:
* We could expose the back-off values via a config (either global or
  per remote) to give the admin some fine-grained control over this
  behavior.
* There is still quite a lot of log output after this, but that can be
  cleaned up/improved later too.

changes from v1:
* rebased on master (dropped equivalent patches)
* reworked most of the code

Dominik Csapak (4):
  server: remote cache: prepare for back-off mechanism
  server: remote cache: introduce canary remote when none is reachable
  server: connection: multi-client: use back-off state from remote cache
  tasks: remote node mapping: use host cache for PBS too

 .../tasks/remote_node_mapping.rs    |  34 ++-
 server/src/connection.rs            | 118 +++++++---
 server/src/remote_cache/back_off.rs | 128 ++++++++++
 server/src/remote_cache/mod.rs      | 222 +++++++++++++++++-
 4 files changed, 439 insertions(+), 63 deletions(-)
 create mode 100644 server/src/remote_cache/back_off.rs

-- 
2.47.3
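
For readers who have not opened patch 1/4, the following is a minimal
sketch of what "exponential back-off up to a maximum" could look like.
It is not the code from server/src/remote_cache/back_off.rs: the type
`BackOff`, its fields and its methods are hypothetical names chosen for
illustration, and the doubling factor and the base/maximum values are
assumptions.

```rust
use std::time::Duration;

/// Hypothetical per-remote back-off state (illustration only, not the
/// type introduced by this series).
#[derive(Debug, Clone)]
pub struct BackOff {
    /// Delay applied after the first failure (assumed value).
    base: Duration,
    /// Upper bound for the delay (assumed value).
    max: Duration,
    /// Number of consecutive connection failures seen so far.
    failures: u32,
}

impl BackOff {
    pub fn new(base: Duration, max: Duration) -> Self {
        Self { base, max, failures: 0 }
    }

    /// How long to wait before the next real connection attempt.
    /// Returns zero while the remote has not failed.
    pub fn delay(&self) -> Duration {
        if self.failures == 0 {
            return Duration::ZERO;
        }
        // base * 2^(failures - 1); any overflow falls back to the cap.
        let factor = 2u32.checked_pow(self.failures - 1).unwrap_or(u32::MAX);
        self.base
            .checked_mul(factor)
            .unwrap_or(self.max)
            .min(self.max)
    }

    /// Call after a failed connection attempt.
    pub fn record_failure(&mut self) {
        self.failures = self.failures.saturating_add(1);
    }

    /// Call after a successful connection; resets the back-off.
    pub fn record_success(&mut self) {
        self.failures = 0;
    }
}
```

With a base of 1s and a maximum of 30s, the delays in this sketch would
be 1s, 2s, 4s, 8s, 16s and then stay at 30s until a connection
succeeds, at which point the delay drops back to zero.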