Date: Mon, 13 Apr 2026 15:21:41 +0200
Subject: superseded: [PATCH proxmox-backup v4 0/3] fix #7400: improve handling of corrupted job statefiles
From: Michael Köppl <m.koeppl@proxmox.com>
To: Michael Köppl,
References: <20260403132628.210128-1-m.koeppl@proxmox.com>
In-Reply-To: <20260403132628.210128-1-m.koeppl@proxmox.com>
List-Id: Proxmox Backup Server development discussion

Superseded by: https://lore.proxmox.com/pbs-devel/20260413132000.49889-1-m.koeppl@proxmox.com

On Fri Apr 3,
2026 at 3:26 PM CEST, Michael Köppl wrote:
> This patch series fixes a problem [0] where an empty or corrupted job
> state file (due to an I/O error, abrupt shutdown, ...) would cause the
> API endpoints for listing jobs to return an error, breaking the web UI
> for users because they could not view any of their configured jobs of
> that type. It would also cause proxmox-backup-proxy to skip the
> affected jobs indefinitely, until a user manually triggered a rewrite
> of the statefile.
>
> 1/3 is a preparatory patch that centralizes job statefile loading in
> compute_schedule_status instead of having every handler function open
> the statefile, handle potential errors, and then pass the JobState to
> compute_schedule_status.
>
> 2/3 introduces a new JobState `Unknown`, representing cases in which
> the job state could not be determined. In addition, the patch updates
> the scheduling functions so that errors while reading the statefiles
> result in the Unknown state.
>
> 3/3 then makes use of this Unknown state and adapts the scheduling
> functions so that the Unknown state leads to the statefile being
> overwritten with a new Created state, with the job running again at
> its next scheduled time.
>
> changes since v3:
> - adapted the commit message of 1/3 to mention the change in behavior
>   regarding the handling of UPID parsing errors with garbage
>   collection state files
> - in 2/3, adapt JobState::load to return early with JobState::Unknown
> - defined a constant for the scheduling offset used when calculating
>   the last run time. The constant is introduced in 2/3 and also used
>   in 3/3
>
> Thanks for the feedback on v3, @Christian!
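[Editor's note: the load-and-self-heal flow described in the cover letter can be sketched roughly as below. This is an illustrative simplification, not the actual proxmox-backup code; the enum variants, statefile format, and function names here are assumptions.]

```rust
// Sketch of the behavior described in 2/3 and 3/3: reading a statefile
// never fails the API call; unreadable or unparsable state becomes
// Unknown, which the scheduler later heals back to Created.

#[derive(Debug, PartialEq)]
enum JobState {
    Created,
    Finished,
    /// New in 2/3: the job state could not be determined
    /// (missing, empty, or corrupted statefile).
    Unknown,
}

/// 2/3: instead of propagating I/O or parse errors to every API
/// handler, loading returns JobState::Unknown early on any failure.
fn load_state(raw: Result<String, std::io::Error>) -> JobState {
    let Ok(content) = raw else {
        return JobState::Unknown; // I/O error reading the statefile
    };
    match content.trim() {
        "created" => JobState::Created,
        "finished" => JobState::Finished,
        // empty (e.g. truncated after abrupt shutdown) or garbage
        _ => JobState::Unknown,
    }
}

/// 3/3: the scheduler self-heals an Unknown state by rewriting it as
/// Created, so the job runs again at its next scheduled time instead
/// of being skipped indefinitely.
fn heal(state: JobState) -> JobState {
    match state {
        JobState::Unknown => JobState::Created,
        other => other,
    }
}

fn main() {
    // A corrupted (empty) statefile yields Unknown instead of an error,
    // so job listings in the API keep working.
    let state = load_state(Ok(String::new()));
    assert_eq!(state, JobState::Unknown);
    // The scheduler then rewrites it so the job is scheduled again.
    assert_eq!(heal(state), JobState::Created);
}
```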
>
> changes since v2:
> - introduced the Unknown state in 2/3, adapted 3/3 accordingly
>   (thanks, @Fabian and @Christian)
> - make sure the "could not open statefile" error is also printed in
>   garbage_collection_status if status_in_memory.upid is None (thanks,
>   @Christian)
> - inline the jobtype and err variables in the error logging
>
> changes since v1:
> - added preparatory patch 1/3, centralizing the statefile loading
>   before adapting the handling of the error case in that centralized
>   place (compute_schedule_status). Thanks, Christian, for the
>   suggestion!
> - adapted the error message shown if job statefile loading fails to
>   make clear that the default status will be returned as a fallback
>
> [0] https://bugzilla.proxmox.com/show_bug.cgi?id=7400
>
> proxmox-backup:
>
> Michael Köppl (3):
>   api: move statefile loading into compute_schedule_status
>   fix #7400: api: gracefully handle corrupted job statefiles
>   fix #7400: proxy: self-heal corrupted job statefiles
>
>  src/api2/admin/datastore.rs     | 15 +++-----
>  src/api2/admin/prune.rs         |  9 ++---
>  src/api2/admin/sync.rs          |  9 ++---
>  src/api2/admin/verify.rs        |  9 ++---
>  src/api2/tape/backup.rs         |  9 ++---
>  src/bin/proxmox-backup-proxy.rs |  6 ++-
>  src/server/jobstate.rs          | 65 +++++++++++++++++++++++++++++----
>  7 files changed, 80 insertions(+), 42 deletions(-)
>
>
> Summary over all repositories:
>  7 files changed, 80 insertions(+), 42 deletions(-)