From: Michael Köppl <m.koeppl@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [PATCH proxmox-backup v5 0/3] fix #7400: improve handling of corrupted job statefiles
Date: Mon, 13 Apr 2026 15:19:57 +0200
Message-ID: <20260413132000.49889-1-m.koeppl@proxmox.com>
This patch series fixes a problem [0] where an empty or corrupted job statefile (due to an I/O error, abrupt shutdown, ...) would cause the API endpoints for listing jobs to return an error, breaking the web UI because users could no longer view any of their configured jobs of that type. It would also cause proxmox-backup-proxy to skip the affected jobs indefinitely until a user manually triggered one, causing the statefile to be rewritten.

1/3 is a preparatory patch that centralizes job statefile loading in compute_schedule_status, instead of having every handler function open the statefile, handle potential errors, and then pass the JobState to compute_schedule_status.

2/3 introduces a new JobState, `Unknown`, representing cases in which the job state could not be determined. The patch also updates the scheduling functions so that errors while reading a statefile result in the Unknown state.

3/3 then makes use of the Unknown state and adapts the scheduling functions so that an Unknown state leads to the statefile being overwritten with a fresh Created state, letting the job run again at its next scheduled time.
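To illustrate the idea behind 2/3 and 3/3, here is a minimal, self-contained sketch. It does not use the actual PBS types or statefile format; the enum variants, the `parse_statefile` helper, and the line-based format are all hypothetical stand-ins for what src/server/jobstate.rs does: any read/parse failure maps to `Unknown` instead of an error, and the scheduler later heals `Unknown` back to `Created`.

```rust
/// Hypothetical, simplified job state (the real JobState in
/// src/server/jobstate.rs has different variants and fields).
#[derive(Debug, PartialEq)]
enum JobState {
    /// Job was (re)created and has not run yet.
    Created,
    /// Job finished; the statefile recorded an end time.
    Finished { endtime: i64 },
    /// Statefile was missing, empty, or unparsable.
    Unknown,
}

/// Parse statefile contents, mapping any failure to `Unknown`
/// instead of returning an error, so listing endpoints never fail.
/// The "created" / "finished:<endtime>" format is an assumption
/// made up for this sketch.
fn parse_statefile(contents: &str) -> JobState {
    match contents.trim() {
        "" => JobState::Unknown, // empty file (e.g. after abrupt shutdown)
        "created" => JobState::Created,
        s => match s.strip_prefix("finished:").and_then(|t| t.parse::<i64>().ok()) {
            Some(endtime) => JobState::Finished { endtime },
            None => JobState::Unknown, // corrupted contents -> Unknown, not an error
        },
    }
}

/// Sketch of the self-healing step from 3/3: a job whose state is
/// Unknown is treated as freshly created, so the scheduler will run
/// it again at its next scheduled time.
fn self_heal(state: JobState) -> JobState {
    match state {
        JobState::Unknown => JobState::Created, // would also rewrite the statefile
        other => other,
    }
}

fn main() {
    assert_eq!(parse_statefile(""), JobState::Unknown);
    assert_eq!(parse_statefile("created"), JobState::Created);
    assert_eq!(
        parse_statefile("finished:1776086327"),
        JobState::Finished { endtime: 1776086327 }
    );
    assert_eq!(parse_statefile("%%garbage%%"), JobState::Unknown);
    assert_eq!(self_heal(JobState::Unknown), JobState::Created);
    println!("ok");
}
```

The key design point, as described above, is that corruption is absorbed at the loading boundary: callers always receive a valid state value, and only the scheduler decides how to recover from `Unknown`.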
changes since v4:
- updated the docstring for the Unknown JobState, no functional changes (thanks, @Shannon)

changes since v3:
- adapted the commit message of 1/3 to mention the change in behavior regarding the handling of UPID parsing errors with garbage collection statefiles
- in 2/3, adapted JobState::load to return early with JobState::Unknown
- defined a constant for the scheduling offset used when calculating the last run time. The constant is introduced in 2/3 and also used in 3/3

Thanks for the feedback on v3, @Christian!

changes since v2:
- introduced the Unknown state in 2/3 and adapted 3/3 accordingly (thanks, @Fabian and @Christian)
- made sure the "could not open statefile" error is also printed in garbage_collection_status if status_in_memory.upid is None (thanks, @Christian)
- inlined the jobtype and err variables in error logging

changes since v1:
- added preparatory patch 1/3, centralizing the statefile loading before adapting the handling of the error case in that centralized place (compute_schedule_status). Thanks, @Christian, for the suggestion!
- adapted the error message emitted when loading a job statefile fails to make clear that the default status is returned as a fallback

proxmox-backup:

Michael Köppl (3):
  api: move statefile loading into compute_schedule_status
  fix #7400: api: gracefully handle corrupted job statefiles
  fix #7400: proxy: self-heal corrupted job statefiles

 src/api2/admin/datastore.rs     | 15 +++----
 src/api2/admin/prune.rs         |  9 ++---
 src/api2/admin/sync.rs          |  9 ++---
 src/api2/admin/verify.rs        |  9 ++---
 src/api2/tape/backup.rs         |  9 ++---
 src/bin/proxmox-backup-proxy.rs |  6 ++-
 src/server/jobstate.rs          | 67 +++++++++++++++++++++++++++++----
 7 files changed, 82 insertions(+), 42 deletions(-)

Summary over all repositories:
  7 files changed, 82 insertions(+), 42 deletions(-)

--
Generated by murpp 0.11.0