From: Michael Köppl
To: "Christian Ebner", Michael Köppl
Subject: Re: [PATCH proxmox-backup v1 1/2] fix #7400: api: gracefully handle corrupted job statefiles
Date: Thu, 19 Mar 2026 11:19:20 +0100
References: <20260317160722.201693-1-m.koeppl@proxmox.com> <20260317160722.201693-2-m.koeppl@proxmox.com> <57632a32-1565-486a-bc05-ccd426d070f5@proxmox.com>
In-Reply-To: <57632a32-1565-486a-bc05-ccd426d070f5@proxmox.com>
List-Id: Proxmox Backup Server development discussion

On Thu Mar 19, 2026 at 9:05 AM CET, Christian Ebner wrote:
> On 3/18/26 6:22 PM, Michael Köppl wrote:
>> On Tue Mar 17, 2026 at 5:07 PM CET, Michael Köppl wrote:
>>
>> [snip]
>>
>>>  };
>>>  use pbs_config::prune;
>>>  use pbs_config::CachedUserInfo;
>>> @@ -73,10 +73,13 @@ pub fn list_prune_jobs(
>>>      let mut list = Vec::new();
>>>
>>>      for job in job_config_iter {
>>> -        let last_state = JobState::load("prunejob", &job.id)
>>> -            .map_err(|err| format_err!("could not open statefile for {}: {}", &job.id, err))?;
>>> -
>>> -        let mut status = compute_schedule_status(&last_state, Some(&job.schedule))?;
>>> +        let mut status = match JobState::load("prunejob", &job.id) {
>>> +            Ok(last_state) => compute_schedule_status(&last_state, Some(&job.schedule))?,
>>> +            Err(err) => {
>>> +                log::error!("could not open statefile for {}: {}", &job.id, err);
>>
>> Since I'm currently preparing v2, would it make sense to instead make
>> this a warning? Not quite sure about it, but displaying an error to the
>> user and then just continuing (and having self-healing behavior) seems a
>> bit odd to me.
>
> IMO it makes sense as an error. There was an error reading the file
> after all. And you might not always be able to self-heal. E.g. what if
> you cannot re-write the state file (although this should be logged as well).

Ack, I see your point. Thanks for the feedback, I'll leave it as an error.

>
> Further, I do not expect these to show up frequently and the error
> message could be adapted to include the default being used as fallback?

Yeah, something like "could not open statefile for {}: {} - falling back
to default job schedule status". I'll adapt it and send the v2.
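
For reference, a rough std-only sketch of the log-and-fall-back pattern discussed above. This is not the actual v2 code: `ScheduleStatus`, `StateError`, `fallback_message` and `status_or_default` are hypothetical stand-ins for illustration, and `eprintln!` takes the place of `log::error!` so the snippet has no crate dependencies. Only the message wording comes from this thread.

```rust
use std::fmt;

/// Hypothetical stand-in for the schedule status reported per job.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ScheduleStatus {
    pub last_run_state: Option<String>,
    pub next_run: Option<i64>,
}

/// Hypothetical error type for an unreadable or corrupted statefile.
#[derive(Debug)]
pub struct StateError(pub String);

impl fmt::Display for StateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Builds the error message with the fallback hint proposed in this thread.
pub fn fallback_message(job_id: &str, err: &StateError) -> String {
    format!(
        "could not open statefile for {}: {} - falling back to default job schedule status",
        job_id, err
    )
}

/// Returns the loaded status, or logs an error and falls back to the default
/// status so that one broken statefile does not fail the whole job listing.
pub fn status_or_default(
    job_id: &str,
    loaded: Result<ScheduleStatus, StateError>,
) -> ScheduleStatus {
    match loaded {
        Ok(status) => status,
        Err(err) => {
            // Assumption: the real code would call log::error! here.
            eprintln!("{}", fallback_message(job_id, &err));
            ScheduleStatus::default()
        }
    }
}
```

With this shape, a caller would pass the result of its statefile load for each job and always get a status back, with the failure still visible in the log at error level.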