* [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling
@ 2020-09-24 13:52 Dominik Csapak
  2020-09-24 13:52 ` [pbs-devel] [PATCH proxmox-backup 2/2] use jobstate mechanism for verify/garbage_collection schedules Dominik Csapak
  2020-09-24 14:01 ` [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling Stefan Reiter
  0 siblings, 2 replies; 4+ messages in thread
From: Dominik Csapak @ 2020-09-24 13:52 UTC (permalink / raw)
  To: pbs-devel

we rely on the jobstate handling to write the error of the worker
into its state file, but we used '?' inside a plain block, where it
does not make the block evaluate to the error but returns it from the
surrounding function/closure instead

so if a prune job failed because of such a '?', we never wrote to the
state file and ended up with a wrong state there

instead, execute the code in an immediately-invoked closure, so the
error ends up in the result as intended

in the future, we can use 'try blocks' (currently not in stable)
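
for illustration, a minimal standalone sketch of the difference (not the
actual proxy code; the file path and error type are just placeholders):

use std::{fs, io};

fn plain_block() -> Result<(), io::Error> {
    let result = {
        // '?' in a plain block returns from plain_block() on error,
        // so on failure we never reach the code below
        fs::read_to_string("/nonexistent")?;
        Ok(())
    };
    println!("state to write: {:?}", result);
    result
}

fn closure_block() -> Result<(), io::Error> {
    let result = (|| {
        // '?' here only returns from the inner closure
        fs::read_to_string("/nonexistent")?;
        Ok(())
    })();
    // the Err is still available here and could be written to a state file
    println!("state to write: {:?}", result);
    result
}

fn main() {
    let _ = plain_block();
    let _ = closure_block();
}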

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 src/bin/proxmox-backup-proxy.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 8a6dfe36..96001214 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -434,7 +434,7 @@ async fn schedule_datastore_prune() {
 
                 job.start(&worker.upid().to_string())?;
 
-                let result = {
+                let result = (|| {
 
                     worker.log(format!("Starting datastore prune on store \"{}\"", store));
                     worker.log(format!("task triggered by schedule '{}'", event_str));
@@ -463,7 +463,7 @@ async fn schedule_datastore_prune() {
                         }
                     }
                     Ok(())
-                };
+                })();
 
                 let status = worker.create_state(&result);
 
-- 
2.20.1

* [pbs-devel] [PATCH proxmox-backup 2/2] use jobstate mechanism for verify/garbage_collection schedules
  2020-09-24 13:52 [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling Dominik Csapak
@ 2020-09-24 13:52 ` Dominik Csapak
  2020-09-24 14:01 ` [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling Stefan Reiter
  1 sibling, 0 replies; 4+ messages in thread
From: Dominik Csapak @ 2020-09-24 13:52 UTC (permalink / raw)
  To: pbs-devel

also changes:
* correct the comment about the state file reset (replace 'sync' with 'action')
* detect schedule changes correctly (only reset the state file when the
  schedule actually changed, not on every update)

with these changes, we can drop the 'lookup_last_worker' method

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 src/api2/config/datastore.rs    | 33 +++++++++--
 src/bin/proxmox-backup-proxy.rs | 98 +++++++++++++++++----------------
 2 files changed, 78 insertions(+), 53 deletions(-)

diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index 870324a3..298dd252 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -132,6 +132,8 @@ pub fn create_datastore(param: Value) -> Result<(), Error> {
     datastore::save_config(&config)?;
 
     crate::config::jobstate::create_state_file("prune", &datastore.name)?;
+    crate::config::jobstate::create_state_file("garbage_collection", &datastore.name)?;
+    crate::config::jobstate::create_state_file("verify", &datastore.name)?;
 
     Ok(())
 }
@@ -313,13 +315,23 @@ pub fn update_datastore(
         }
     }
 
-    if gc_schedule.is_some() { data.gc_schedule = gc_schedule; }
+    let mut gc_schedule_changed = false;
+    if gc_schedule.is_some() {
+        gc_schedule_changed = data.gc_schedule != gc_schedule;
+        data.gc_schedule = gc_schedule;
+    }
+
     let mut prune_schedule_changed = false;
     if prune_schedule.is_some() {
-        prune_schedule_changed = true;
+        prune_schedule_changed = data.prune_schedule != prune_schedule;
         data.prune_schedule = prune_schedule;
     }
-    if verify_schedule.is_some() { data.verify_schedule = verify_schedule; }
+
+    let mut verify_schedule_changed = false;
+    if verify_schedule.is_some() {
+        verify_schedule_changed = data.verify_schedule != verify_schedule;
+        data.verify_schedule = verify_schedule;
+    }
 
     if keep_last.is_some() { data.keep_last = keep_last; }
     if keep_hourly.is_some() { data.keep_hourly = keep_hourly; }
@@ -332,12 +344,20 @@ pub fn update_datastore(
 
     datastore::save_config(&config)?;
 
-    // we want to reset the statefile, to avoid an immediate sync in some cases
+    // we want to reset the statefiles, to avoid an immediate action in some cases
     // (e.g. going from monthly to weekly in the second week of the month)
+    if gc_schedule_changed {
+        crate::config::jobstate::create_state_file("garbage_collection", &name)?;
+    }
+
     if prune_schedule_changed {
         crate::config::jobstate::create_state_file("prune", &name)?;
     }
 
+    if verify_schedule_changed {
+        crate::config::jobstate::create_state_file("verify", &name)?;
+    }
+
     Ok(())
 }
 
@@ -377,7 +397,10 @@ pub fn delete_datastore(name: String, digest: Option<String>) -> Result<(), Erro
 
     datastore::save_config(&config)?;
 
-    crate::config::jobstate::remove_state_file("prune", &name)?;
+    // ignore errors
+    let _ = crate::config::jobstate::remove_state_file("prune", &name);
+    let _ = crate::config::jobstate::remove_state_file("garbage_collection", &name);
+    let _ = crate::config::jobstate::remove_state_file("verify", &name);
 
     Ok(())
 }
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 96001214..49741dc6 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -202,40 +202,14 @@ async fn schedule_tasks() -> Result<(), Error> {
     Ok(())
 }
 
-fn lookup_last_worker(worker_type: &str, worker_id: &str) -> Result<Option<server::UPID>, Error> {
-
-    let list = proxmox_backup::server::read_task_list()?;
-
-    let mut last: Option<&server::UPID> = None;
-
-    for entry in list.iter() {
-        if entry.upid.worker_type == worker_type {
-            if let Some(ref id) = entry.upid.worker_id {
-                if id == worker_id {
-                    match last {
-                        Some(ref upid) => {
-                            if upid.starttime < entry.upid.starttime {
-                                last = Some(&entry.upid)
-                            }
-                        }
-                        None => {
-                            last = Some(&entry.upid)
-                        }
-                    }
-                }
-            }
-        }
-    }
-
-    Ok(last.cloned())
-}
-
-
 async fn schedule_datastore_garbage_collection() {
 
     use proxmox_backup::backup::DataStore;
     use proxmox_backup::server::{UPID, WorkerTask};
-    use proxmox_backup::config::datastore::{self, DataStoreConfig};
+    use proxmox_backup::config::{
+        jobstate::{self, Job},
+        datastore::{self, DataStoreConfig}
+    };
     use proxmox_backup::tools::systemd::time::{
         parse_calendar_event, compute_next_event};
 
@@ -291,11 +265,10 @@ async fn schedule_datastore_garbage_collection() {
                 }
             }
         } else {
-            match lookup_last_worker(worker_type, &store) {
-                Ok(Some(upid)) => upid.starttime,
-                Ok(None) => 0,
+            match jobstate::last_run_time(worker_type, &store) {
+                Ok(time) => time,
                 Err(err) => {
-                    eprintln!("lookup_last_job_start failed: {}", err);
+                    eprintln!("could not get last run time of {} {}: {}", worker_type, store, err);
                     continue;
                 }
             }
@@ -314,6 +287,11 @@ async fn schedule_datastore_garbage_collection() {
 
         if next > now  { continue; }
 
+        let mut job = match Job::new(worker_type, &store) {
+            Ok(job) => job,
+            Err(_) => continue, // could not get lock
+        };
+
         let store2 = store.clone();
 
         if let Err(err) = WorkerTask::new_thread(
@@ -322,9 +300,20 @@ async fn schedule_datastore_garbage_collection() {
             Userid::backup_userid().clone(),
             false,
             move |worker| {
+                job.start(&worker.upid().to_string())?;
+
                 worker.log(format!("starting garbage collection on store {}", store));
                 worker.log(format!("task triggered by schedule '{}'", event_str));
-                datastore.garbage_collection(&worker)
+
+                let result = datastore.garbage_collection(&worker);
+
+                let status = worker.create_state(&result);
+
+                if let Err(err) = job.finish(status) {
+                    eprintln!("could not finish job state for {}: {}", worker_type, err);
+                }
+
+                result
             }
         ) {
             eprintln!("unable to start garbage collection on store {} - {}", store2, err);
@@ -482,7 +471,10 @@ async fn schedule_datastore_prune() {
 async fn schedule_datastore_verification() {
     use proxmox_backup::backup::{DataStore, verify_all_backups};
     use proxmox_backup::server::{WorkerTask};
-    use proxmox_backup::config::datastore::{self, DataStoreConfig};
+    use proxmox_backup::config::{
+        jobstate::{self, Job},
+        datastore::{self, DataStoreConfig}
+    };
     use proxmox_backup::tools::systemd::time::{
         parse_calendar_event, compute_next_event};
 
@@ -526,16 +518,10 @@ async fn schedule_datastore_verification() {
 
         let worker_type = "verify";
 
-        let last = match lookup_last_worker(worker_type, &store) {
-            Ok(Some(upid)) => {
-                if proxmox_backup::server::worker_is_active_local(&upid) {
-                    continue;
-                }
-                upid.starttime
-            }
-            Ok(None) => 0,
+        let last = match jobstate::last_run_time(worker_type, &store) {
+            Ok(time) => time,
             Err(err) => {
-                eprintln!("lookup_last_job_start failed: {}", err);
+                eprintln!("could not get last run time of {} {}: {}", worker_type, store, err);
                 continue;
             }
         };
@@ -553,6 +539,11 @@ async fn schedule_datastore_verification() {
 
         if next > now { continue; }
 
+        let mut job = match Job::new(worker_type, &store) {
+            Ok(job) => job,
+            Err(_) => continue, // could not get lock
+        };
+
         let worker_id = store.clone();
         let store2 = store.clone();
         if let Err(err) = WorkerTask::new_thread(
@@ -561,18 +552,29 @@ async fn schedule_datastore_verification() {
             Userid::backup_userid().clone(),
             false,
             move |worker| {
+                job.start(&worker.upid().to_string())?;
                 worker.log(format!("starting verification on store {}", store2));
                 worker.log(format!("task triggered by schedule '{}'", event_str));
-                if let Ok(failed_dirs) = verify_all_backups(datastore, worker.clone()) {
+                let result = (|| {
+                    let failed_dirs = verify_all_backups(datastore, worker.clone())?;
                     if failed_dirs.len() > 0 {
                         worker.log("Failed to verify following snapshots:");
                         for dir in failed_dirs {
                             worker.log(format!("\t{}", dir));
                         }
-                        bail!("verification failed - please check the log for details");
+                        Err(format_err!("verification failed - please check the log for details"))
+                    } else {
+                        Ok(())
                     }
+                })();
+
+                let status = worker.create_state(&result);
+
+                if let Err(err) = job.finish(status) {
+                    eprintln!("could not finish job state for {}: {}", worker_type, err);
                 }
-                Ok(())
+
+                result
             },
         ) {
             eprintln!("unable to start verification on store {} - {}", store, err);
-- 
2.20.1

* Re: [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling
  2020-09-24 13:52 [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling Dominik Csapak
  2020-09-24 13:52 ` [pbs-devel] [PATCH proxmox-backup 2/2] use jobstate mechanism for verify/garbage_collection schedules Dominik Csapak
@ 2020-09-24 14:01 ` Stefan Reiter
  2020-09-24 14:04   ` Dominik Csapak
  1 sibling, 1 reply; 4+ messages in thread
From: Stefan Reiter @ 2020-09-24 14:01 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 9/24/20 3:52 PM, Dominik Csapak wrote:
> we rely on the jobstate handling to write the error of the worker
> into its state file, but we used '?' inside a plain block, where it
> does not make the block evaluate to the error but returns it from the
> surrounding function/closure instead
> 
> so if a prune job failed because of such a '?', we never wrote to the
> state file and ended up with a wrong state there
> 
> instead, execute the code in an immediately-invoked closure, so the
> error ends up in the result as intended
> 
> in the future, we can use 'try blocks' (currently not in stable)
> 

Don't we also have the proxmox::try_block! macro? What's the difference 
to that then?

> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>   src/bin/proxmox-backup-proxy.rs | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
> index 8a6dfe36..96001214 100644
> --- a/src/bin/proxmox-backup-proxy.rs
> +++ b/src/bin/proxmox-backup-proxy.rs
> @@ -434,7 +434,7 @@ async fn schedule_datastore_prune() {
>   
>                   job.start(&worker.upid().to_string())?;
>   
> -                let result = {
> +                let result = (|| {
>   
>                       worker.log(format!("Starting datastore prune on store \"{}\"", store));
>                       worker.log(format!("task triggered by schedule '{}'", event_str));
> @@ -463,7 +463,7 @@ async fn schedule_datastore_prune() {
>                           }
>                       }
>                       Ok(())
> -                };
> +                })();
>   
>                   let status = worker.create_state(&result);
>   
> 

* Re: [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling
  2020-09-24 14:01 ` [pbs-devel] [PATCH proxmox-backup 1/2] proxy: fix error handling in prune scheduling Stefan Reiter
@ 2020-09-24 14:04   ` Dominik Csapak
  0 siblings, 0 replies; 4+ messages in thread
From: Dominik Csapak @ 2020-09-24 14:04 UTC (permalink / raw)
  To: Stefan Reiter, Proxmox Backup Server development discussion

On 9/24/20 4:01 PM, Stefan Reiter wrote:
> Don't we also have the proxmox::try_block! macro? What's the difference 
> to that then?


ah yes, you're right. i missed that this exists^^
i'll send a v2 for both patches using the try_block! macro
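
for reference, a standalone sketch of the general idea (the macro below is
just a stand-in that wraps its body in an immediately-invoked closure; it is
not the actual proxmox::try_block! source):

use std::{fs, io};

// stand-in for a try_block-style macro: run the body in an immediately-invoked
// closure so '?' only exits the block and the Result stays inspectable
macro_rules! try_block {
    ($($body:tt)*) => {
        (|| { $($body)* })()
    };
}

fn main() {
    let result: Result<(), io::Error> = try_block!({
        fs::read_to_string("/nonexistent")?; // the error stays inside the block
        Ok(())
    });
    // here the Err could still be turned into a job state
    println!("{:?}", result);
}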




