* [PATCH proxmox-backup v2 0/3] fix #7400: improve handling of corrupted job statefiles
@ 2026-03-19 11:03 Michael Köppl
2026-03-19 11:03 ` [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status Michael Köppl
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Michael Köppl @ 2026-03-19 11:03 UTC (permalink / raw)
To: pbs-devel
This patch series fixes a problem where an empty or corrupted job state
file (due to an I/O error, abrupt shutdown, ...) would cause the API
endpoints for listing jobs to return an error, breaking the web UI for
users because they could not view any of their configured jobs of that
type. It would also cause proxmox-backup-proxy to skip the affected
jobs indefinitely until a user triggered them manually, rewriting the
statefile.
1/3 is a preparatory patch that centralizes job statefile loading
in compute_schedule_status instead of having every handler function
open the statefile, handle potential errors, and then pass the
JobState to compute_schedule_status.
2/3 fixes the API layer by catching the load error, logging an error
message, and returning a default JobScheduleStatus. This ensures the API
returns the job list, showing the affected job without its last run
status.
3/3 updates the scheduler loop inside the proxy to match the behavior of
a newly created job, triggering the job at its next interval (which
then rewrites the statefile).
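The fallback behavior described in 2/3 can be sketched roughly as
follows. This is a minimal, self-contained model for illustration only:
`JobScheduleStatus` is reduced to a single field and the statefile
"format" is a plain integer, neither of which matches the actual PBS
types.

```rust
/// Simplified stand-in for the real JobScheduleStatus type.
#[derive(Debug, Default, PartialEq)]
struct JobScheduleStatus {
    last_run_endtime: Option<i64>,
}

/// Simplified stand-in for JobState::load: "parse" a statefile whose
/// contents are just an end-time epoch.
fn load_state(contents: &str) -> Result<i64, String> {
    contents
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("corrupted statefile: {e}"))
}

/// Mirrors the idea of patch 2/3: on a load error, log and fall back to
/// a default status instead of propagating the error to the caller.
fn compute_schedule_status(contents: &str) -> JobScheduleStatus {
    match load_state(contents) {
        Ok(endtime) => JobScheduleStatus {
            last_run_endtime: Some(endtime),
        },
        Err(err) => {
            eprintln!("{err} - falling back to default job schedule status");
            JobScheduleStatus::default()
        }
    }
}

fn main() {
    // A valid statefile yields its recorded end time.
    assert_eq!(
        compute_schedule_status("1710000000").last_run_endtime,
        Some(1710000000)
    );
    // An empty/corrupted statefile no longer errors out the whole
    // job listing; the job just shows up without a last run status.
    assert_eq!(compute_schedule_status("").last_run_endtime, None);
}
```

The key point of the series: the error is swallowed at the point where
the statefile is loaded, so a single unreadable statefile can no longer
take down the entire job list endpoint.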
changes since v1:
- added preparatory patch 1/3, centralizing the statefile loading before
adapting the handling of the error case in that centralized place
(compute_schedule_status). Thanks, Christian, for the suggestion!
- adapted the error message if job statefile loading fails to make clear
that the default status will be returned as a fallback
proxmox-backup:
Michael Köppl (3):
api: move statefile loading into compute_schedule_status
fix #7400: api: gracefully handle corrupted job statefiles
fix #7400: proxy: self-heal corrupted job statefiles
src/api2/admin/datastore.rs | 13 +++----------
src/api2/admin/prune.rs | 9 +++------
src/api2/admin/sync.rs | 9 +++------
src/api2/admin/verify.rs | 9 +++------
src/api2/tape/backup.rs | 9 +++------
src/bin/proxmox-backup-proxy.rs | 2 +-
src/server/jobstate.rs | 17 +++++++++++++++--
7 files changed, 31 insertions(+), 37 deletions(-)
Summary over all repositories:
7 files changed, 31 insertions(+), 37 deletions(-)
--
Generated by murpp 0.10.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status
2026-03-19 11:03 [PATCH proxmox-backup v2 0/3] fix #7400: improve handling of corrupted job statefiles Michael Köppl
@ 2026-03-19 11:03 ` Michael Köppl
2026-03-19 11:24 ` Christian Ebner
2026-03-19 11:03 ` [PATCH proxmox-backup v2 2/3] fix #7400: api: gracefully handle corrupted job statefiles Michael Köppl
2026-03-19 11:03 ` [PATCH proxmox-backup v2 3/3] fix #7400: proxy: self-heal " Michael Köppl
2 siblings, 1 reply; 8+ messages in thread
From: Michael Köppl @ 2026-03-19 11:03 UTC (permalink / raw)
To: pbs-devel
Centralize loading of the job statefiles in compute_schedule_status,
reducing code duplication across the job management API endpoints.
Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
---
src/api2/admin/datastore.rs | 13 +++----------
src/api2/admin/prune.rs | 9 +++------
src/api2/admin/sync.rs | 9 +++------
src/api2/admin/verify.rs | 9 +++------
src/api2/tape/backup.rs | 9 +++------
src/server/jobstate.rs | 8 ++++++--
6 files changed, 21 insertions(+), 36 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index cca340553..4018e0301 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -70,7 +70,7 @@ use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
use crate::api2::backup::optional_ns_param;
use crate::api2::node::rrd::create_value_from_rrd;
use crate::backup::{check_ns_privs_full, ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK};
-use crate::server::jobstate::{compute_schedule_status, Job, JobState};
+use crate::server::jobstate::{compute_schedule_status, Job};
use crate::tools::{backup_info_to_snapshot_list_item, get_all_snapshot_files, read_backup_index};
// helper to unify common sequence of checks:
@@ -1167,19 +1167,12 @@ pub fn garbage_collection_status(
let datastore = DataStore::lookup_datastore(&store, Operation::Read)?;
let status_in_memory = datastore.last_gc_status();
- let state_file = JobState::load("garbage_collection", &store)
- .map_err(|err| log::error!("could not open GC statefile for {store}: {err}"))
- .ok();
let mut last = proxmox_time::epoch_i64();
if let Some(ref upid) = status_in_memory.upid {
- let mut computed_schedule: JobScheduleStatus = JobScheduleStatus::default();
- if let Some(state) = state_file {
- if let Ok(cs) = compute_schedule_status(&state, Some(upid)) {
- computed_schedule = cs;
- }
- }
+ let computed_schedule: JobScheduleStatus =
+ compute_schedule_status("garbage_collection", &store, Some(upid))?;
if let Some(endtime) = computed_schedule.last_run_endtime {
last = endtime;
diff --git a/src/api2/admin/prune.rs b/src/api2/admin/prune.rs
index a5ebf2975..1b1d2f1ba 100644
--- a/src/api2/admin/prune.rs
+++ b/src/api2/admin/prune.rs
@@ -1,6 +1,6 @@
//! Datastore Prune Job Management
-use anyhow::{format_err, Error};
+use anyhow::Error;
use serde_json::Value;
use proxmox_router::{
@@ -18,7 +18,7 @@ use pbs_config::CachedUserInfo;
use crate::server::{
do_prune_job,
- jobstate::{compute_schedule_status, Job, JobState},
+ jobstate::{compute_schedule_status, Job},
};
#[api(
@@ -73,10 +73,7 @@ pub fn list_prune_jobs(
let mut list = Vec::new();
for job in job_config_iter {
- let last_state = JobState::load("prunejob", &job.id)
- .map_err(|err| format_err!("could not open statefile for {}: {}", &job.id, err))?;
-
- let mut status = compute_schedule_status(&last_state, Some(&job.schedule))?;
+ let mut status = compute_schedule_status("prunejob", &job.id, Some(&job.schedule))?;
if job.disable {
status.next_run = None;
}
diff --git a/src/api2/admin/sync.rs b/src/api2/admin/sync.rs
index 6722ebea0..2384ede75 100644
--- a/src/api2/admin/sync.rs
+++ b/src/api2/admin/sync.rs
@@ -1,6 +1,6 @@
//! Datastore Synchronization Job Management
-use anyhow::{bail, format_err, Error};
+use anyhow::{bail, Error};
use serde::{Deserialize, Serialize};
use serde_json::Value;
@@ -19,7 +19,7 @@ use pbs_config::CachedUserInfo;
use crate::{
api2::config::sync::{check_sync_job_modify_access, check_sync_job_read_access},
- server::jobstate::{compute_schedule_status, Job, JobState},
+ server::jobstate::{compute_schedule_status, Job},
server::sync::do_sync_job,
};
@@ -112,10 +112,7 @@ pub fn list_config_sync_jobs(
continue;
}
- let last_state = JobState::load("syncjob", &job.id)
- .map_err(|err| format_err!("could not open statefile for {}: {}", &job.id, err))?;
-
- let status = compute_schedule_status(&last_state, job.schedule.as_deref())?;
+ let status = compute_schedule_status("syncjob", &job.id, job.schedule.as_deref())?;
list.push(SyncJobStatus {
config: job,
diff --git a/src/api2/admin/verify.rs b/src/api2/admin/verify.rs
index 66695236c..af5b7fff4 100644
--- a/src/api2/admin/verify.rs
+++ b/src/api2/admin/verify.rs
@@ -1,6 +1,6 @@
//! Datastore Verify Job Management
-use anyhow::{format_err, Error};
+use anyhow::Error;
use serde_json::Value;
use proxmox_router::{
@@ -19,7 +19,7 @@ use pbs_config::CachedUserInfo;
use crate::server::{
do_verification_job,
- jobstate::{compute_schedule_status, Job, JobState},
+ jobstate::{compute_schedule_status, Job},
};
#[api(
@@ -73,10 +73,7 @@ pub fn list_verification_jobs(
let mut list = Vec::new();
for job in job_config_iter {
- let last_state = JobState::load("verificationjob", &job.id)
- .map_err(|err| format_err!("could not open statefile for {}: {}", &job.id, err))?;
-
- let status = compute_schedule_status(&last_state, job.schedule.as_deref())?;
+ let status = compute_schedule_status("verificationjob", &job.id, job.schedule.as_deref())?;
list.push(VerificationJobStatus {
config: job,
diff --git a/src/api2/tape/backup.rs b/src/api2/tape/backup.rs
index 47e8d0209..2c1aa5c0a 100644
--- a/src/api2/tape/backup.rs
+++ b/src/api2/tape/backup.rs
@@ -1,6 +1,6 @@
use std::sync::{Arc, Mutex};
-use anyhow::{bail, format_err, Error};
+use anyhow::{bail, Error};
use serde_json::Value;
use tracing::{info, warn};
@@ -23,7 +23,7 @@ use pbs_datastore::{DataStore, StoreProgress};
use crate::tape::{assert_datastore_type, TapeNotificationMode};
use crate::{
server::{
- jobstate::{compute_schedule_status, Job, JobState},
+ jobstate::{compute_schedule_status, Job},
TapeBackupJobSummary,
},
tape::{
@@ -97,10 +97,7 @@ pub fn list_tape_backup_jobs(
continue;
}
- let last_state = JobState::load("tape-backup-job", &job.id)
- .map_err(|err| format_err!("could not open statefile for {}: {}", &job.id, err))?;
-
- let status = compute_schedule_status(&last_state, job.schedule.as_deref())?;
+ let status = compute_schedule_status("tape-backup-job", &job.id, job.schedule.as_deref())?;
let next_run = status.next_run.unwrap_or(current_time);
diff --git a/src/server/jobstate.rs b/src/server/jobstate.rs
index dc9f6c90d..cfb0b8945 100644
--- a/src/server/jobstate.rs
+++ b/src/server/jobstate.rs
@@ -301,11 +301,15 @@ impl Job {
}
pub fn compute_schedule_status(
- job_state: &JobState,
+ jobtype: &str,
+ jobname: &str,
schedule: Option<&str>,
) -> Result<JobScheduleStatus, Error> {
+ let job_state = JobState::load(jobtype, jobname)
+ .map_err(|err| format_err!("could not open statefile for {}: {}", jobname, err))?;
+
let (upid, endtime, state, last) = match job_state {
- JobState::Created { time } => (None, None, None, *time),
+ JobState::Created { time } => (None, None, None, time),
JobState::Started { upid } => {
let parsed_upid: UPID = upid.parse()?;
(Some(upid), None, None, parsed_upid.starttime)
--
2.47.3
* [PATCH proxmox-backup v2 2/3] fix #7400: api: gracefully handle corrupted job statefiles
2026-03-19 11:03 [PATCH proxmox-backup v2 0/3] fix #7400: improve handling of corrupted job statefiles Michael Köppl
2026-03-19 11:03 ` [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status Michael Köppl
@ 2026-03-19 11:03 ` Michael Köppl
2026-03-19 11:23 ` Christian Ebner
2026-03-19 11:03 ` [PATCH proxmox-backup v2 3/3] fix #7400: proxy: self-heal " Michael Köppl
2 siblings, 1 reply; 8+ messages in thread
From: Michael Köppl @ 2026-03-19 11:03 UTC (permalink / raw)
To: pbs-devel
Previously, if a job statefile was empty or corrupted (e.g. due to an
I/O error or an abrupt shutdown), the JobState::load method would
return an error that would be propagated up, causing the endpoint to
return an error to the user, meaning users would not see any of their
jobs if a single job had a corrupted statefile.
Instead, handle the error explicitly, logging an error message and
returning a default JobScheduleStatus such that job lists can still be
fetched, displaying the affected job as configured but simply missing
its last run status.
Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
---
src/server/jobstate.rs | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/src/server/jobstate.rs b/src/server/jobstate.rs
index cfb0b8945..638beac3e 100644
--- a/src/server/jobstate.rs
+++ b/src/server/jobstate.rs
@@ -305,8 +305,17 @@ pub fn compute_schedule_status(
jobname: &str,
schedule: Option<&str>,
) -> Result<JobScheduleStatus, Error> {
- let job_state = JobState::load(jobtype, jobname)
- .map_err(|err| format_err!("could not open statefile for {}: {}", jobname, err))?;
+ let job_state = match JobState::load(jobtype, jobname) {
+ Ok(job_state) => job_state,
+ Err(err) => {
+ log::error!(
+ "could not open statefile for {}: {} - falling back to default job schedule status",
+ jobname,
+ err
+ );
+ return Ok(JobScheduleStatus::default());
+ }
+ };
let (upid, endtime, state, last) = match job_state {
JobState::Created { time } => (None, None, None, time),
--
2.47.3
* [PATCH proxmox-backup v2 3/3] fix #7400: proxy: self-heal corrupted job statefiles
2026-03-19 11:03 [PATCH proxmox-backup v2 0/3] fix #7400: improve handling of corrupted job statefiles Michael Köppl
2026-03-19 11:03 ` [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status Michael Köppl
2026-03-19 11:03 ` [PATCH proxmox-backup v2 2/3] fix #7400: api: gracefully handle corrupted job statefiles Michael Köppl
@ 2026-03-19 11:03 ` Michael Köppl
2 siblings, 0 replies; 8+ messages in thread
From: Michael Köppl @ 2026-03-19 11:03 UTC (permalink / raw)
To: pbs-devel
Previously, if a job statefile was corrupted or missing, last_run_time()
would return an error. The job scheduling loops handled this error by
skipping the job indefinitely; recovering required manual intervention
(i.e., triggering the job manually).
Instead, match the behavior of a newly created job, ensuring that the
job runs at its next scheduled interval. The job file will then be
rewritten upon successful completion of the job.
Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
---
src/bin/proxmox-backup-proxy.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index c1fe3ac15..8359ac0f1 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -936,7 +936,7 @@ fn check_schedule(worker_type: &str, event_str: &str, id: &str) -> bool {
Ok(time) => time,
Err(err) => {
eprintln!("could not get last run time of {worker_type} {id}: {err}");
- return false;
+ proxmox_time::epoch_i64() - 30
}
};
--
2.47.3
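The effect of this `epoch_i64() - 30` fallback can be sketched with a
simplified scheduler model. This is illustrative only: `next_event_after`
and `should_run` are stand-ins for the proxy's calendar-event logic, not
actual PBS code, and the fixed-interval arithmetic replaces the real
systemd-calendar-event computation.

```rust
/// Next multiple of `interval` strictly after `last` (stand-in for
/// computing the next calendar event after the last run).
fn next_event_after(last: i64, interval: i64) -> i64 {
    (last / interval + 1) * interval
}

/// Models check_schedule: with the fallback, a corrupted statefile
/// behaves as if the job last ran ~30 seconds ago, so the scheduler
/// picks the next scheduled slot instead of skipping the job forever.
fn should_run(last_run: Result<i64, ()>, now: i64, interval: i64) -> bool {
    let last = match last_run {
        Ok(t) => t,
        Err(()) => now - 30, // fallback from patch 3/3
    };
    next_event_after(last, interval) <= now
}

fn main() {
    // Healthy statefile, job ran recently: not due yet.
    assert!(!should_run(Ok(9990), 10_000, 3600));
    // Corrupted statefile: not triggered immediately...
    assert!(!should_run(Err(()), 10_000, 3600));
    // ...but due once the next scheduled event passes, after which a
    // successful run rewrites the statefile.
    assert!(should_run(Err(()), 10_800, 3600));
}
```

Returning `false` (the old behavior) made the job permanently ineligible;
the fallback makes it eligible at the next event, matching a new job.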
* Re: [PATCH proxmox-backup v2 2/3] fix #7400: api: gracefully handle corrupted job statefiles
2026-03-19 11:03 ` [PATCH proxmox-backup v2 2/3] fix #7400: api: gracefully handle corrupted job statefiles Michael Köppl
@ 2026-03-19 11:23 ` Christian Ebner
0 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2026-03-19 11:23 UTC (permalink / raw)
To: Michael Köppl, pbs-devel
one nit inline.
On 3/19/26 12:02 PM, Michael Köppl wrote:
> Previously, if a job statefile was empty or corrupted (e.g. due to an
> I/O error or an abrupt shutdown), the JobState::load method would
> return an error that would be propagated up, causing the endpoint to
> return an error to the user, meaning users would not see any of their
> jobs if a single job had a corrupted statefile.
>
> Instead, handle the error explicitly, logging an error message and
> returning a default JobScheduleStatus such that job lists can still be
> fetched, displaying the affected job as configured but simply missing
> its last run status.
>
> Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
> ---
> src/server/jobstate.rs | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/server/jobstate.rs b/src/server/jobstate.rs
> index cfb0b8945..638beac3e 100644
> --- a/src/server/jobstate.rs
> +++ b/src/server/jobstate.rs
> @@ -305,8 +305,17 @@ pub fn compute_schedule_status(
> jobname: &str,
> schedule: Option<&str>,
> ) -> Result<JobScheduleStatus, Error> {
> - let job_state = JobState::load(jobtype, jobname)
> - .map_err(|err| format_err!("could not open statefile for {}: {}", jobname, err))?;
> + let job_state = match JobState::load(jobtype, jobname) {
> + Ok(job_state) => job_state,
> + Err(err) => {
> + log::error!(
> + "could not open statefile for {}: {} - falling back to default job schedule status",
> + jobname,
> + err
> + );
nit: please inline `jobname` and `err` into the format string directly.
While there are still a lot of pre-existing occurrences without
inlining, inlining is preferred for new code.
* Re: [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status
2026-03-19 11:03 ` [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status Michael Köppl
@ 2026-03-19 11:24 ` Christian Ebner
2026-03-19 14:47 ` Michael Köppl
0 siblings, 1 reply; 8+ messages in thread
From: Christian Ebner @ 2026-03-19 11:24 UTC (permalink / raw)
To: Michael Köppl, pbs-devel
one comment inline.
On 3/19/26 12:03 PM, Michael Köppl wrote:
> Centralize loading of the job statefiles in compute_schedule_status,
> reducing code duplication across the job management API endpoints.
>
> Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
> ---
> src/api2/admin/datastore.rs | 13 +++----------
> src/api2/admin/prune.rs | 9 +++------
> src/api2/admin/sync.rs | 9 +++------
> src/api2/admin/verify.rs | 9 +++------
> src/api2/tape/backup.rs | 9 +++------
> src/server/jobstate.rs | 8 ++++++--
> 6 files changed, 21 insertions(+), 36 deletions(-)
>
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index cca340553..4018e0301 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -70,7 +70,7 @@ use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
> use crate::api2::backup::optional_ns_param;
> use crate::api2::node::rrd::create_value_from_rrd;
> use crate::backup::{check_ns_privs_full, ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK};
> -use crate::server::jobstate::{compute_schedule_status, Job, JobState};
> +use crate::server::jobstate::{compute_schedule_status, Job};
> use crate::tools::{backup_info_to_snapshot_list_item, get_all_snapshot_files, read_backup_index};
>
> // helper to unify common sequence of checks:
> @@ -1167,19 +1167,12 @@ pub fn garbage_collection_status(
>
> let datastore = DataStore::lookup_datastore(&store, Operation::Read)?;
> let status_in_memory = datastore.last_gc_status();
> - let state_file = JobState::load("garbage_collection", &store)
> - .map_err(|err| log::error!("could not open GC statefile for {store}: {err}"))
> - .ok();
>
> let mut last = proxmox_time::epoch_i64();
>
> if let Some(ref upid) = status_in_memory.upid {
> - let mut computed_schedule: JobScheduleStatus = JobScheduleStatus::default();
> - if let Some(state) = state_file {
> - if let Ok(cs) = compute_schedule_status(&state, Some(upid)) {
> - computed_schedule = cs;
> - }
> - }
> + let computed_schedule: JobScheduleStatus =
> + compute_schedule_status("garbage_collection", &store, Some(upid))?;
This alters behavior: the state file is now never loaded if
status_in_memory.upid is None, so no error is logged in that case.
This must be expanded with an else branch where loading is also
attempted for that case and any potential error logged.
* Re: [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status
2026-03-19 11:24 ` Christian Ebner
@ 2026-03-19 14:47 ` Michael Köppl
2026-03-19 15:27 ` Christian Ebner
0 siblings, 1 reply; 8+ messages in thread
From: Michael Köppl @ 2026-03-19 14:47 UTC (permalink / raw)
To: Christian Ebner, Michael Köppl, pbs-devel
On Thu Mar 19, 2026 at 12:24 PM CET, Christian Ebner wrote:
> one comment inline.
>
> On 3/19/26 12:03 PM, Michael Köppl wrote:
>> Centralize loading of the job statefiles in compute_schedule_status,
>> reducing code duplication across the job management API endpoints.
>>
>> Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
>> ---
>> src/api2/admin/datastore.rs | 13 +++----------
>> src/api2/admin/prune.rs | 9 +++------
>> src/api2/admin/sync.rs | 9 +++------
>> src/api2/admin/verify.rs | 9 +++------
>> src/api2/tape/backup.rs | 9 +++------
>> src/server/jobstate.rs | 8 ++++++--
>> 6 files changed, 21 insertions(+), 36 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index cca340553..4018e0301 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -70,7 +70,7 @@ use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
>> use crate::api2::backup::optional_ns_param;
>> use crate::api2::node::rrd::create_value_from_rrd;
>> use crate::backup::{check_ns_privs_full, ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK};
>> -use crate::server::jobstate::{compute_schedule_status, Job, JobState};
>> +use crate::server::jobstate::{compute_schedule_status, Job};
>> use crate::tools::{backup_info_to_snapshot_list_item, get_all_snapshot_files, read_backup_index};
>>
>> // helper to unify common sequence of checks:
>> @@ -1167,19 +1167,12 @@ pub fn garbage_collection_status(
>>
>> let datastore = DataStore::lookup_datastore(&store, Operation::Read)?;
>> let status_in_memory = datastore.last_gc_status();
>> - let state_file = JobState::load("garbage_collection", &store)
>> - .map_err(|err| log::error!("could not open GC statefile for {store}: {err}"))
>> - .ok();
>>
>> let mut last = proxmox_time::epoch_i64();
>>
>> if let Some(ref upid) = status_in_memory.upid {
>> - let mut computed_schedule: JobScheduleStatus = JobScheduleStatus::default();
>> - if let Some(state) = state_file {
>> - if let Ok(cs) = compute_schedule_status(&state, Some(upid)) {
>> - computed_schedule = cs;
>> - }
>> - }
>> + let computed_schedule: JobScheduleStatus =
>> + compute_schedule_status("garbage_collection", &store, Some(upid))?;
>
> This alters behavior: the state file is now never loaded if
> status_in_memory.upid is None, so no error is logged in that case.
>
> This must be expanded with an else branch where loading is also
> attempted for that case and any potential error logged.
Missed that while refactoring, sorry for the oversight. Also noticed
that there is an additional change in behavior regarding the handling
of any *other* error that might occur in compute_schedule_status:
previously, we would basically ignore any error and return the default
status here, e.g. if the UPID could not be parsed for a started job. To
match this behavior, I could just do
let computed_schedule: JobScheduleStatus =
compute_schedule_status("garbage_collection", &store, Some(upid))
.unwrap_or_else(|_| JobScheduleStatus::default());
But the question is whether the behavior here *should* differ from all
other endpoints if the UPID could not be parsed, because everywhere else
we'd still return an error in that case.
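In isolation, that caller-side `unwrap_or_else` fallback would look
roughly like this. This is a simplified, self-contained model: the
"UPID parsing" below (last colon-separated field as an end time) is a
made-up stand-in, not the real UPID parser.

```rust
/// Simplified stand-in for the real JobScheduleStatus type.
#[derive(Debug, Default, PartialEq)]
struct JobScheduleStatus {
    last_run_endtime: Option<i64>,
}

/// Simplified stand-in: fails e.g. when the UPID cannot be parsed,
/// modeling the *other* error paths besides statefile loading.
fn compute_schedule_status(upid: &str) -> Result<JobScheduleStatus, String> {
    let endtime = upid
        .rsplit(':')
        .next()
        .and_then(|s| s.parse::<i64>().ok())
        .ok_or_else(|| format!("unable to parse UPID '{upid}'"))?;
    Ok(JobScheduleStatus {
        last_run_endtime: Some(endtime),
    })
}

fn main() {
    // Caller-side fallback: *any* error yields the default status,
    // matching the GC endpoint's previous lenient behavior.
    let status = compute_schedule_status("not-a-upid")
        .unwrap_or_else(|_| JobScheduleStatus::default());
    assert_eq!(status, JobScheduleStatus::default());

    // A parseable UPID yields the computed status as usual.
    let ok = compute_schedule_status("UPID:task:42")
        .unwrap_or_else(|_| JobScheduleStatus::default());
    assert_eq!(ok.last_run_endtime, Some(42));
}
```

The trade-off being debated: this silently hides parse errors too,
whereas the other list endpoints propagate them to the caller.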
* Re: [PATCH proxmox-backup v2 1/3] api: move statefile loading into compute_schedule_status
2026-03-19 14:47 ` Michael Köppl
@ 2026-03-19 15:27 ` Christian Ebner
0 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2026-03-19 15:27 UTC (permalink / raw)
To: Michael Köppl, pbs-devel
On 3/19/26 3:47 PM, Michael Köppl wrote:
> On Thu Mar 19, 2026 at 12:24 PM CET, Christian Ebner wrote:
>> one comment inline.
>>
>> On 3/19/26 12:03 PM, Michael Köppl wrote:
>>> Centralize loading of the job statefiles in compute_schedule_status,
>>> reducing code duplication across the job management API endpoints.
>>>
>>> Signed-off-by: Michael Köppl <m.koeppl@proxmox.com>
>>> ---
>>> src/api2/admin/datastore.rs | 13 +++----------
>>> src/api2/admin/prune.rs | 9 +++------
>>> src/api2/admin/sync.rs | 9 +++------
>>> src/api2/admin/verify.rs | 9 +++------
>>> src/api2/tape/backup.rs | 9 +++------
>>> src/server/jobstate.rs | 8 ++++++--
>>> 6 files changed, 21 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>>> index cca340553..4018e0301 100644
>>> --- a/src/api2/admin/datastore.rs
>>> +++ b/src/api2/admin/datastore.rs
>>> @@ -70,7 +70,7 @@ use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
>>> use crate::api2::backup::optional_ns_param;
>>> use crate::api2::node::rrd::create_value_from_rrd;
>>> use crate::backup::{check_ns_privs_full, ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK};
>>> -use crate::server::jobstate::{compute_schedule_status, Job, JobState};
>>> +use crate::server::jobstate::{compute_schedule_status, Job};
>>> use crate::tools::{backup_info_to_snapshot_list_item, get_all_snapshot_files, read_backup_index};
>>>
>>> // helper to unify common sequence of checks:
>>> @@ -1167,19 +1167,12 @@ pub fn garbage_collection_status(
>>>
>>> let datastore = DataStore::lookup_datastore(&store, Operation::Read)?;
>>> let status_in_memory = datastore.last_gc_status();
>>> - let state_file = JobState::load("garbage_collection", &store)
>>> - .map_err(|err| log::error!("could not open GC statefile for {store}: {err}"))
>>> - .ok();
>>>
>>> let mut last = proxmox_time::epoch_i64();
>>>
>>> if let Some(ref upid) = status_in_memory.upid {
>>> - let mut computed_schedule: JobScheduleStatus = JobScheduleStatus::default();
>>> - if let Some(state) = state_file {
>>> - if let Ok(cs) = compute_schedule_status(&state, Some(upid)) {
>>> - computed_schedule = cs;
>>> - }
>>> - }
>>> + let computed_schedule: JobScheduleStatus =
>>> + compute_schedule_status("garbage_collection", &store, Some(upid))?;
>>
>> This alters behavior: the state file is now never loaded if
>> status_in_memory.upid is None, so no error is logged in that case.
>>
>> This must be expanded with an else branch where loading is also
>> attempted for that case and any potential error logged.
>
> Missed that while refactoring, sorry for the oversight. Also noticed
> that there is an additional change in behavior regarding the handling of
> any *other* error that might occur in compute_schedule_status because
> previously, we would basically ignore any error and return the
> default status here, e.g. if the UPID could not be parsed for a started
> job. To match this behavior, I could just do
>
> let computed_schedule: JobScheduleStatus =
> compute_schedule_status("garbage_collection", &store, Some(upid))
> .unwrap_or_else(|_| JobScheduleStatus::default());
>
> But the question is if the behavior here *should* differ from all other
> endpoints if the UPID could not be parsed? Because everywhere else we'd
> still return an error in that case.
True: this was introduced with commit fe1d34d2e ("api: garbage collect
job status") and then adapted with commit 3ae21d87c ("GC: flatten
existing status into job status"). So I guess this is related to the
mentioned renaming.
Maybe Fabian can give us a clue?