Date: Fri, 03 Apr 2026 15:28:11 +0200
Subject: superseded: [PATCH proxmox-backup v3 0/3] fix #7400: improve handling of corrupted job statefiles
From: Michael Köppl
To: Michael Köppl
References: <20260325160617.342295-1-m.koeppl@proxmox.com>
In-Reply-To: <20260325160617.342295-1-m.koeppl@proxmox.com>
List-Id: Proxmox Backup Server development discussion

Superseded by
https://lore.proxmox.com/pbs-devel/20260403132628.210128-1-m.koeppl@proxmox.com

On Wed Mar 25, 2026 at 5:06 PM CET, Michael Köppl wrote:
> This patch series fixes a problem where an empty or corrupted job state
> file (due to I/O error, abrupt shutdown, ...) would cause API endpoints
> for listing jobs to return an error, breaking the web UI for users
> because they could not view any of their configured jobs of that type.
> It would also cause proxmox-backup-proxy to indefinitely skip the jobs
> until a user manually triggered it to rewrite the statefile.
>
> 1/3 is a preparatory patch that centralizes job statefile loading
> in compute_schedule_status instead of having every handler function
> open the statefile, handle potential errors and then passing the
> JobState to compute_schedule_status.
>
> 2/3 introduces a new JobState `Unknown`, representing cases in which the
> job state could not be determined. In addition, the patch also updates
> the scheduling functions such that errors during reading the statefiles
> will result in the Unknown state.
>
> 3/3 then utilizes this Unknown state and adapts the scheduling functions
> such that the Unknown state will then lead to the statefile being
> overwritten with a new Created state and the job running again at its
> next scheduled run.
>
> changes since v2:
> - introduced the Unknown state in 2/3, adapted 3/3 accordingly (thanks,
>   @Fabian and @Christian)
> - make sure the "could not open statefile" error is also printed in
>   garbage_collection_status if status_in_memory.upid is None (thanks,
>   @Christian)
> - inline jobtype and err variables in error logging
>
> changes since v1:
> - added preparatory patch 1/3, centralizing the statefile loading before
>   adapting the handling of the error case in that centralized place
>   (compute_schedule_status). Thanks, Christian for the suggestion!
> - adapted the error message if job statefile loading fails to make clear
>   that the default status will be returned as a fallback
>
> proxmox-backup:
>
> Michael Köppl (3):
>   api: move statefile loading into compute_schedule_status
>   fix #7400: api: gracefully handle corrupted job statefiles
>   fix #7400: proxy: self-heal corrupted job statefiles
>
>  src/api2/admin/datastore.rs     | 15 ++++------
>  src/api2/admin/prune.rs         |  9 ++----
>  src/api2/admin/sync.rs          |  9 ++----
>  src/api2/admin/verify.rs        |  9 ++----
>  src/api2/tape/backup.rs         |  9 ++----
>  src/bin/proxmox-backup-proxy.rs |  4 ++-
>  src/server/jobstate.rs          | 52 +++++++++++++++++++++++++++++----
>  7 files changed, 67 insertions(+), 40 deletions(-)
>
>
> Summary over all repositories:
>   7 files changed, 67 insertions(+), 40 deletions(-)
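
For readers skimming the thread, the behaviour described for 2/3 and 3/3 can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the series: the `JobState` enum is reduced to three variants, the one-line text format is made up, and `parse_state`, `load_state` and `load_or_heal` are hypothetical names that do not appear in src/server/jobstate.rs. As a further simplification, a missing file is treated like any other read error here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Simplified stand-in for the job state (assumption: the real enum
/// carries more data than a single timestamp).
#[derive(Debug, Clone, PartialEq)]
enum JobState {
    /// Job exists but has not run since `time` (epoch seconds).
    Created { time: i64 },
    /// Job finished at `endtime` (epoch seconds).
    Finished { endtime: i64 },
    /// The statefile was unreadable, empty, or could not be parsed.
    Unknown,
}

/// Hypothetical one-line format: "created <t>" or "finished <t>".
/// Returns None for empty or malformed input.
fn parse_state(raw: &str) -> Option<JobState> {
    let mut parts = raw.split_whitespace();
    let kind = parts.next()?;
    let time: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // trailing garbage counts as corruption
    }
    match kind {
        "created" => Some(JobState::Created { time }),
        "finished" => Some(JobState::Finished { endtime: time }),
        _ => None,
    }
}

/// Listing side (idea of 2/3): every failure maps to `Unknown`
/// instead of bubbling up as an error to the caller.
fn load_state(path: &Path) -> JobState {
    match fs::read_to_string(path) {
        Ok(raw) => parse_state(&raw).unwrap_or(JobState::Unknown),
        Err(err) => {
            eprintln!("could not open statefile {path:?}: {err}");
            JobState::Unknown
        }
    }
}

/// Scheduler side (idea of 3/3): an `Unknown` state is overwritten
/// with a fresh `Created` state so the job is scheduled again.
fn load_or_heal(path: &Path, now: i64) -> io::Result<JobState> {
    match load_state(path) {
        JobState::Unknown => {
            fs::write(path, format!("created {now}\n"))?;
            Ok(JobState::Created { time: now })
        }
        state => Ok(state),
    }
}
```

The point of splitting the two functions is that a listing path can call `load_state` and still show the job, while only the scheduler path rewrites the file.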