* [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries @ 2025-11-20 6:02 Hannes Laimer 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler 0 siblings, 1 reply; 6+ messages in thread From: Hannes Laimer @ 2025-11-20 6:02 UTC (permalink / raw) To: pbs-devel Refactor the active operation tracking to use the `check_process_running_pstart` helper. This simplifies the logic for identifying stale entries caused by PID reuse and aligns the update path with the read path. Additionally, fix a logic bug where decrementing the operation count for a non-existent entry would incorrectly create a new entry with a positive count of 1. Now, such operations are ignored, and new entries are only created when the count is positive. Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com> --- v2, thanks @Fabian!: - use `check_process_running_pstart` instead of `check_process_running` + starttime comparison - improve readability by flattening match logic - fix bug for decrements when no entry exists (led to new entry with 1 before) pbs-datastore/src/task_tracking.rs | 52 ++++++++++++++++-------------- 1 file changed, 27 insertions(+), 25 deletions(-) diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs index 44a4522d..10afebbe 100644 --- a/pbs-datastore/src/task_tracking.rs +++ b/pbs-datastore/src/task_tracking.rs @@ -103,43 +103,45 @@ pub fn update_active_operations( let pid = std::process::id(); let starttime = procfs::PidStat::read_from_pid(Pid::from_raw(pid as pid_t))?.starttime; - let mut updated_active_operations = match operation { - Operation::Read => ActiveOperationStats { read: 1, write: 0 }, - Operation::Write => ActiveOperationStats { read: 0, write: 1 }, - Operation::Lookup => ActiveOperationStats { read: 0, write: 0 }, - }; + let mut updated_active_operations = ActiveOperationStats::default(); let mut found_entry = false; let mut updated_tasks: Vec<TaskOperations> = match file_read_optional_string(&path)? { Some(data) => serde_json::from_str::<Vec<TaskOperations>>(&data)? - .iter_mut() - .filter_map( - |task| match procfs::check_process_running(task.pid as pid_t) { - Some(stat) if pid == task.pid && stat.starttime != task.starttime => None, - Some(_) => { - if pid == task.pid { - found_entry = true; - match operation { - Operation::Read => task.active_operations.read += count, - Operation::Write => task.active_operations.write += count, - Operation::Lookup => (), // no IO must happen there - }; - updated_active_operations = task.active_operations; - } - Some(task.clone()) + .into_iter() + .filter_map(|mut task| { + match procfs::check_process_running_pstart(task.pid as pid_t, task.starttime) { + // update entry for current PID + Some(_stat) if pid == task.pid => { + found_entry = true; + match operation { + Operation::Read => task.active_operations.read += count, + Operation::Write => task.active_operations.write += count, + Operation::Lookup => (), // no IO must happen there + }; + updated_active_operations = task.active_operations; + Some(task) } - _ => None, - }, - ) + // keep other entries + Some(_stat) => Some(task), + // drop entries for PIDs which are not running or have been recycled + None => None, + } + }) .collect(), None => Vec::new(), }; - if !found_entry { + if !found_entry && count > 0 { + match operation { + Operation::Read => updated_active_operations.read = count, + Operation::Write => updated_active_operations.write = count, + Operation::Lookup => (), + }; updated_tasks.push(TaskOperations { pid, starttime, active_operations: updated_active_operations, - }) + }); } replace_file( &path, -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced 2025-11-20 6:02 [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries Hannes Laimer @ 2025-11-20 9:01 ` Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Fabian Grünbichler @ 2025-11-20 9:01 UTC (permalink / raw) To: pbs-devel and warn about it. this *should* never happen unless the tracking file got somehow messed with manually.. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> --- This one fixes the replied-to patch to also correctly store an entry with no tasks for the current PID, instead of just returning that there are none.. I am actually not sure how we should handle such a desync, we now pretend it's the last task even though we don't know for sure.. maybe we should just error out and let the Drop handler (not) handle it? pbs-datastore/src/task_tracking.rs | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs index 10afebbe2..755d88fdf 100644 --- a/pbs-datastore/src/task_tracking.rs +++ b/pbs-datastore/src/task_tracking.rs @@ -94,7 +94,7 @@ pub fn get_active_operations_locked( pub fn update_active_operations( name: &str, operation: Operation, - count: i64, + mut count: i64, ) -> Result<ActiveOperationStats, Error> { let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name)); @@ -131,7 +131,15 @@ pub fn update_active_operations( None => Vec::new(), }; - if !found_entry && count > 0 { + if !found_entry { + if count < 0 { + // if we don't have any operations at the moment, decrementing is not possible.. + log::warn!( + "Active operations tracking mismatch - no current entry for {pid} but asked +to decrement by {count}!" + ); + count = 0; + }; match operation { Operation::Read => updated_active_operations.read = count, Operation::Write => updated_active_operations.write = count, -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler @ 2025-11-20 9:01 ` Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler 2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer 2 siblings, 0 replies; 6+ messages in thread From: Fabian Grünbichler @ 2025-11-20 9:01 UTC (permalink / raw) To: pbs-devel no semantic changes intended, but make the code a little more readable and slightly faster, by only initializing the new entry when needed. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> --- pbs-datastore/src/task_tracking.rs | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs index 755d88fdf..b7e569efb 100644 --- a/pbs-datastore/src/task_tracking.rs +++ b/pbs-datastore/src/task_tracking.rs @@ -101,10 +101,7 @@ pub fn update_active_operations( let (_lock, options) = open_lock_file(name)?; let pid = std::process::id(); - let starttime = procfs::PidStat::read_from_pid(Pid::from_raw(pid as pid_t))?.starttime; - - let mut updated_active_operations = ActiveOperationStats::default(); - let mut found_entry = false; + let mut current_pid_operations = None; let mut updated_tasks: Vec<TaskOperations> = match file_read_optional_string(&path)? { Some(data) => serde_json::from_str::<Vec<TaskOperations>>(&data)? .into_iter() @@ -112,13 +109,12 @@ pub fn update_active_operations( match procfs::check_process_running_pstart(task.pid as pid_t, task.starttime) { // update entry for current PID Some(_stat) if pid == task.pid => { - found_entry = true; match operation { Operation::Read => task.active_operations.read += count, Operation::Write => task.active_operations.write += count, Operation::Lookup => (), // no IO must happen there }; - updated_active_operations = task.active_operations; + current_pid_operations = Some(task.active_operations); Some(task) } // keep other entries @@ -131,7 +127,9 @@ pub fn update_active_operations( None => Vec::new(), }; - if !found_entry { + let active_operations = if let Some(current) = current_pid_operations { + current + } else { if count < 0 { // if we don't have any operations at the moment, decrementing is not possible.. log::warn!( @@ -140,22 +138,27 @@ to decrement by {count}!" ); count = 0; }; + let starttime = procfs::PidStat::read_from_pid(Pid::from_raw(pid as pid_t))?.starttime; + + let mut active_operations = ActiveOperationStats::default(); match operation { - Operation::Read => updated_active_operations.read = count, - Operation::Write => updated_active_operations.write = count, + Operation::Read => active_operations.read = count, + Operation::Write => active_operations.write = count, Operation::Lookup => (), }; updated_tasks.push(TaskOperations { pid, starttime, - active_operations: updated_active_operations, + active_operations, }); - } + active_operations + }; replace_file( &path, serde_json::to_string(&updated_tasks)?.as_bytes(), options, false, - ) - .map(|_| updated_active_operations) + )?; + + Ok(active_operations) } -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler @ 2025-11-20 9:01 ` Fabian Grünbichler 2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer 2 siblings, 0 replies; 6+ messages in thread From: Fabian Grünbichler @ 2025-11-20 9:01 UTC (permalink / raw) To: pbs-devel instead of an update fn that is always called with 1 or -1, use add/remove_ wrappers. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> --- Notes: we could go one step further, and make remove_ return whether it was the last one, since that is basically the only thing we are interested in when calling either helper.. pbs-datastore/src/datastore.rs | 12 ++++++------ pbs-datastore/src/task_tracking.rs | 16 +++++++++++++++- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs index 0a5179230..8bc58dcbb 100644 --- a/pbs-datastore/src/datastore.rs +++ b/pbs-datastore/src/datastore.rs @@ -43,7 +43,7 @@ use crate::fixed_index::{FixedIndexReader, FixedIndexWriter}; use crate::hierarchy::{ListGroups, ListGroupsType, ListNamespaces, ListNamespacesRecursive}; use crate::index::IndexFile; use crate::s3::S3_CONTENT_PREFIX; -use crate::task_tracking::{self, update_active_operations}; +use crate::task_tracking::{self, add_active_operation, remove_active_operation}; use crate::{DataBlob, LocalDatastoreLruCache}; static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> = @@ -175,7 +175,7 @@ impl Clone for DataStore { fn clone(&self) -> Self { let mut new_operation = self.operation; if let Some(operation) = self.operation { - if let Err(e) = update_active_operations(self.name(), operation, 1) { + if let Err(e) = add_active_operation(self.name(), operation) { log::error!("could not update active operations - {}", e); new_operation = None; } @@ -192,7 +192,7 @@ impl Drop for DataStore { fn drop(&mut self) { if let Some(operation) = self.operation { let mut last_task = false; - match update_active_operations(self.name(), operation, -1) { + match remove_active_operation(self.name(), operation) { Err(e) => log::error!("could not update active operations - {}", e), Ok(updated_operations) => { last_task = updated_operations.read + updated_operations.write == 0; @@ -356,7 +356,7 @@ impl DataStore { let last_digest = datastore.last_digest.as_ref(); if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) { if let Some(operation) = operation { - update_active_operations(name, operation, 1)?; + add_active_operation(name, operation)?; } return Ok(Arc::new(Self { inner: Arc::clone(datastore), @@ -382,7 +382,7 @@ impl DataStore { datastore_cache.insert(name.to_string(), datastore.clone()); if let Some(operation) = operation { - update_active_operations(name, operation, 1)?; + add_active_operation(name, operation)?; } Ok(Arc::new(Self { @@ -469,7 +469,7 @@ impl DataStore { )?); if let Some(operation) = operation { - update_active_operations(&name, operation, 1)?; + add_active_operation(&name, operation)?; } Ok(Arc::new(Self { inner, operation })) diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs index b7e569efb..f4f6c57e7 100644 --- a/pbs-datastore/src/task_tracking.rs +++ b/pbs-datastore/src/task_tracking.rs @@ -91,7 +91,21 @@ pub fn get_active_operations_locked( Ok((data, lock.unwrap())) } -pub fn update_active_operations( +pub fn add_active_operation( + name: &str, + operation: Operation, +) -> Result<ActiveOperationStats, Error> { + update_active_operations(name, operation, 1) +} + +pub fn remove_active_operation( + name: &str, + operation: Operation, +) -> Result<ActiveOperationStats, Error> { + update_active_operations(name, operation, -1) +} + +fn update_active_operations( name: &str, operation: Operation, mut count: i64, -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler @ 2025-11-20 9:37 ` Hannes Laimer 2025-11-20 10:22 ` Fabian Grünbichler 2 siblings, 1 reply; 6+ messages in thread From: Hannes Laimer @ 2025-11-20 9:37 UTC (permalink / raw) To: Fabian Grünbichler, pbs-devel hmm, I'm not sure pushing a new 0/0 entry in that case adds much... logging this though makes a lot if sense actually, I think my patch is not correct. If we have `0/0` and call update with -1 we'd end up with a -1 count in the tracking file. decrementing is also a problem with a 0 counter, not just with non-existing entries. On 11/20/25 10:03, Fabian Grünbichler wrote: > and warn about it. this *should* never happen unless the tracking file got > somehow messed with manually.. > > Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> > --- > This one fixes the replied-to patch to also correctly store an entry with no > tasks for the current PID, instead of just returning that there are none.. > > I am actually not sure how we should handle such a desync, we now pretend it's > the last task even though we don't know for sure.. maybe we should just error > out and let the Drop handler (not) handle it? > > pbs-datastore/src/task_tracking.rs | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs > index 10afebbe2..755d88fdf 100644 > --- a/pbs-datastore/src/task_tracking.rs > +++ b/pbs-datastore/src/task_tracking.rs > @@ -94,7 +94,7 @@ pub fn get_active_operations_locked( > pub fn update_active_operations( > name: &str, > operation: Operation, > - count: i64, > + mut count: i64, > ) -> Result<ActiveOperationStats, Error> { > let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name)); > > @@ -131,7 +131,15 @@ pub fn update_active_operations( > None => Vec::new(), > }; > > - if !found_entry && count > 0 { > + if !found_entry { > + if count < 0 { > + // if we don't have any operations at the moment, decrementing is not possible.. > + log::warn!( > + "Active operations tracking mismatch - no current entry for {pid} but asked > +to decrement by {count}!" > + ); > + count = 0; > + }; > match operation { > Operation::Read => updated_active_operations.read = count, > Operation::Write => updated_active_operations.write = count, _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced 2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer @ 2025-11-20 10:22 ` Fabian Grünbichler 0 siblings, 0 replies; 6+ messages in thread From: Fabian Grünbichler @ 2025-11-20 10:22 UTC (permalink / raw) To: Hannes Laimer, pbs-devel; +Cc: Thomas Lamprecht On November 20, 2025 10:37 am, Hannes Laimer wrote: > hmm, I'm not sure pushing a new 0/0 entry in that case adds much... > logging this though makes a lot if sense > > actually, I think my patch is not correct. If we have `0/0` and call > update with -1 we'd end up with a -1 count in the tracking file. > decrementing is also a problem with a 0 counter, not just with > non-existing entries. that's true. maybe we should first answer the question how we want to handle such a mismatch, and then think about implementation details ;) AFAICT: - we add an operation during datastore lookup (two calls) - we add an operation when cloning a datastore instance (one call) - we remove an operation when dropping a datastore instance (one call) there's some more which are only used by examples and should maybe be dropped.. if a process crashes without executing the drop handler, a left-over entry could exist. but such an entry will be cleaned up by the next update_active_operations call since the PID is no longer valid. so the only remaining issues would be: - explicitly leaking instead of dropping a datastore (should never be done) - manually editing the active operations file - unlinking the lock file while it is used effectively, if we would ever end up with an active operation count < 0 for a given PID, we know something is wrong. but we can not recover for this particular PID, so maybe we should add a poison flag (or use a negative count as such), and require that process to exit before considering the datastore to be "sane" again? there are only a few places where the operation counts matter: - removal from the cache map to close FDs when the last task exits, in case certain maintenance mode is set - waiting for active tasks to be done before activating certain maintenance modes neither of this can be done (safely) if we can no longer tell whether there are active tasks.. > > On 11/20/25 10:03, Fabian Grünbichler wrote: >> and warn about it. this *should* never happen unless the tracking file got >> somehow messed with manually.. >> >> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> >> --- >> This one fixes the replied-to patch to also correctly store an entry with no >> tasks for the current PID, instead of just returning that there are none.. >> >> I am actually not sure how we should handle such a desync, we now pretend it's >> the last task even though we don't know for sure.. maybe we should just error >> out and let the Drop handler (not) handle it? >> >> pbs-datastore/src/task_tracking.rs | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs >> index 10afebbe2..755d88fdf 100644 >> --- a/pbs-datastore/src/task_tracking.rs >> +++ b/pbs-datastore/src/task_tracking.rs >> @@ -94,7 +94,7 @@ pub fn get_active_operations_locked( >> pub fn update_active_operations( >> name: &str, >> operation: Operation, >> - count: i64, >> + mut count: i64, >> ) -> Result<ActiveOperationStats, Error> { >> let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name)); >> >> @@ -131,7 +131,15 @@ pub fn update_active_operations( >> None => Vec::new(), >> }; >> >> - if !found_entry && count > 0 { >> + if !found_entry { >> + if count < 0 { >> + // if we don't have any operations at the moment, decrementing is not possible.. >> + log::warn!( >> + "Active operations tracking mismatch - no current entry for {pid} but asked >> +to decrement by {count}!" >> + ); >> + count = 0; >> + }; >> match operation { >> Operation::Read => updated_active_operations.read = count, >> Operation::Write => updated_active_operations.write = count, > > _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-11-20 10:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-11-20 6:02 [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries Hannes Laimer 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler 2025-11-20 9:01 ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler 2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer 2025-11-20 10:22 ` Fabian Grünbichler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox