* [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries
@ 2025-11-20 6:02 Hannes Laimer
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler
0 siblings, 1 reply; 6+ messages in thread
From: Hannes Laimer @ 2025-11-20 6:02 UTC (permalink / raw)
To: pbs-devel
Refactor the active operation tracking to use the
`check_process_running_pstart` helper. This simplifies the logic for
identifying stale entries caused by PID reuse and aligns the update path
with the read path.
Additionally, fix a logic bug where decrementing the operation count for
a non-existent entry would incorrectly create a new entry with a
positive count of 1. Now, such operations are ignored, and new entries
are only created when the count is positive.
Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
---
v2, thanks @Fabian!:
- use `check_process_running_pstart` instead of `check_process_running` + starttime comparison
- improve readability by flattening match logic
- fix bug for decrements when no entry exists (led to new entry with 1
before)
pbs-datastore/src/task_tracking.rs | 52 ++++++++++++++++--------------
1 file changed, 27 insertions(+), 25 deletions(-)
diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
index 44a4522d..10afebbe 100644
--- a/pbs-datastore/src/task_tracking.rs
+++ b/pbs-datastore/src/task_tracking.rs
@@ -103,43 +103,45 @@ pub fn update_active_operations(
let pid = std::process::id();
let starttime = procfs::PidStat::read_from_pid(Pid::from_raw(pid as pid_t))?.starttime;
- let mut updated_active_operations = match operation {
- Operation::Read => ActiveOperationStats { read: 1, write: 0 },
- Operation::Write => ActiveOperationStats { read: 0, write: 1 },
- Operation::Lookup => ActiveOperationStats { read: 0, write: 0 },
- };
+ let mut updated_active_operations = ActiveOperationStats::default();
let mut found_entry = false;
let mut updated_tasks: Vec<TaskOperations> = match file_read_optional_string(&path)? {
Some(data) => serde_json::from_str::<Vec<TaskOperations>>(&data)?
- .iter_mut()
- .filter_map(
- |task| match procfs::check_process_running(task.pid as pid_t) {
- Some(stat) if pid == task.pid && stat.starttime != task.starttime => None,
- Some(_) => {
- if pid == task.pid {
- found_entry = true;
- match operation {
- Operation::Read => task.active_operations.read += count,
- Operation::Write => task.active_operations.write += count,
- Operation::Lookup => (), // no IO must happen there
- };
- updated_active_operations = task.active_operations;
- }
- Some(task.clone())
+ .into_iter()
+ .filter_map(|mut task| {
+ match procfs::check_process_running_pstart(task.pid as pid_t, task.starttime) {
+ // update entry for current PID
+ Some(_stat) if pid == task.pid => {
+ found_entry = true;
+ match operation {
+ Operation::Read => task.active_operations.read += count,
+ Operation::Write => task.active_operations.write += count,
+ Operation::Lookup => (), // no IO must happen there
+ };
+ updated_active_operations = task.active_operations;
+ Some(task)
}
- _ => None,
- },
- )
+ // keep other entries
+ Some(_stat) => Some(task),
+ // drop entries for PIDs which are not running or have been recycled
+ None => None,
+ }
+ })
.collect(),
None => Vec::new(),
};
- if !found_entry {
+ if !found_entry && count > 0 {
+ match operation {
+ Operation::Read => updated_active_operations.read = count,
+ Operation::Write => updated_active_operations.write = count,
+ Operation::Lookup => (),
+ };
updated_tasks.push(TaskOperations {
pid,
starttime,
active_operations: updated_active_operations,
- })
+ });
}
replace_file(
&path,
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced
2025-11-20 6:02 [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries Hannes Laimer
@ 2025-11-20 9:01 ` Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Fabian Grünbichler @ 2025-11-20 9:01 UTC (permalink / raw)
To: pbs-devel
and warn about it. this *should* never happen unless the tracking file got
somehow messed with manually..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
This one fixes the replied-to patch to also correctly store an entry with no
tasks for the current PID, instead of just returning that there are none..
I am actually not sure how we should handle such a desync, we now pretend it's
the last task even though we don't know for sure.. maybe we should just error
out and let the Drop handler (not) handle it?
pbs-datastore/src/task_tracking.rs | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
index 10afebbe2..755d88fdf 100644
--- a/pbs-datastore/src/task_tracking.rs
+++ b/pbs-datastore/src/task_tracking.rs
@@ -94,7 +94,7 @@ pub fn get_active_operations_locked(
pub fn update_active_operations(
name: &str,
operation: Operation,
- count: i64,
+ mut count: i64,
) -> Result<ActiveOperationStats, Error> {
let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name));
@@ -131,7 +131,15 @@ pub fn update_active_operations(
None => Vec::new(),
};
- if !found_entry && count > 0 {
+ if !found_entry {
+ if count < 0 {
+ // if we don't have any operations at the moment, decrementing is not possible..
+ log::warn!(
+ "Active operations tracking mismatch - no current entry for {pid} but asked
+to decrement by {count}!"
+ );
+ count = 0;
+ };
match operation {
Operation::Read => updated_active_operations.read = count,
Operation::Write => updated_active_operations.write = count,
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler
@ 2025-11-20 9:01 ` Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler
2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer
2 siblings, 0 replies; 6+ messages in thread
From: Fabian Grünbichler @ 2025-11-20 9:01 UTC (permalink / raw)
To: pbs-devel
no semantic changes intended, but make the code a little more readable and
slightly faster, by only initializing the new entry when needed.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
pbs-datastore/src/task_tracking.rs | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
index 755d88fdf..b7e569efb 100644
--- a/pbs-datastore/src/task_tracking.rs
+++ b/pbs-datastore/src/task_tracking.rs
@@ -101,10 +101,7 @@ pub fn update_active_operations(
let (_lock, options) = open_lock_file(name)?;
let pid = std::process::id();
- let starttime = procfs::PidStat::read_from_pid(Pid::from_raw(pid as pid_t))?.starttime;
-
- let mut updated_active_operations = ActiveOperationStats::default();
- let mut found_entry = false;
+ let mut current_pid_operations = None;
let mut updated_tasks: Vec<TaskOperations> = match file_read_optional_string(&path)? {
Some(data) => serde_json::from_str::<Vec<TaskOperations>>(&data)?
.into_iter()
@@ -112,13 +109,12 @@ pub fn update_active_operations(
match procfs::check_process_running_pstart(task.pid as pid_t, task.starttime) {
// update entry for current PID
Some(_stat) if pid == task.pid => {
- found_entry = true;
match operation {
Operation::Read => task.active_operations.read += count,
Operation::Write => task.active_operations.write += count,
Operation::Lookup => (), // no IO must happen there
};
- updated_active_operations = task.active_operations;
+ current_pid_operations = Some(task.active_operations);
Some(task)
}
// keep other entries
@@ -131,7 +127,9 @@ pub fn update_active_operations(
None => Vec::new(),
};
- if !found_entry {
+ let active_operations = if let Some(current) = current_pid_operations {
+ current
+ } else {
if count < 0 {
// if we don't have any operations at the moment, decrementing is not possible..
log::warn!(
@@ -140,22 +138,27 @@ to decrement by {count}!"
);
count = 0;
};
+ let starttime = procfs::PidStat::read_from_pid(Pid::from_raw(pid as pid_t))?.starttime;
+
+ let mut active_operations = ActiveOperationStats::default();
match operation {
- Operation::Read => updated_active_operations.read = count,
- Operation::Write => updated_active_operations.write = count,
+ Operation::Read => active_operations.read = count,
+ Operation::Write => active_operations.write = count,
Operation::Lookup => (),
};
updated_tasks.push(TaskOperations {
pid,
starttime,
- active_operations: updated_active_operations,
+ active_operations,
});
- }
+ active_operations
+ };
replace_file(
&path,
serde_json::to_string(&updated_tasks)?.as_bytes(),
options,
false,
- )
- .map(|_| updated_active_operations)
+ )?;
+
+ Ok(active_operations)
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler
@ 2025-11-20 9:01 ` Fabian Grünbichler
2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer
2 siblings, 0 replies; 6+ messages in thread
From: Fabian Grünbichler @ 2025-11-20 9:01 UTC (permalink / raw)
To: pbs-devel
instead of an update fn that is always called with 1 or -1, use add/remove_
wrappers.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
Notes:
we could go one step further, and make remove_ return whether it was the last
one, since that is basically the only thing we are interested in when calling
either helper..
pbs-datastore/src/datastore.rs | 12 ++++++------
pbs-datastore/src/task_tracking.rs | 16 +++++++++++++++-
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0a5179230..8bc58dcbb 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -43,7 +43,7 @@ use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
use crate::hierarchy::{ListGroups, ListGroupsType, ListNamespaces, ListNamespacesRecursive};
use crate::index::IndexFile;
use crate::s3::S3_CONTENT_PREFIX;
-use crate::task_tracking::{self, update_active_operations};
+use crate::task_tracking::{self, add_active_operation, remove_active_operation};
use crate::{DataBlob, LocalDatastoreLruCache};
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
@@ -175,7 +175,7 @@ impl Clone for DataStore {
fn clone(&self) -> Self {
let mut new_operation = self.operation;
if let Some(operation) = self.operation {
- if let Err(e) = update_active_operations(self.name(), operation, 1) {
+ if let Err(e) = add_active_operation(self.name(), operation) {
log::error!("could not update active operations - {}", e);
new_operation = None;
}
@@ -192,7 +192,7 @@ impl Drop for DataStore {
fn drop(&mut self) {
if let Some(operation) = self.operation {
let mut last_task = false;
- match update_active_operations(self.name(), operation, -1) {
+ match remove_active_operation(self.name(), operation) {
Err(e) => log::error!("could not update active operations - {}", e),
Ok(updated_operations) => {
last_task = updated_operations.read + updated_operations.write == 0;
@@ -356,7 +356,7 @@ impl DataStore {
let last_digest = datastore.last_digest.as_ref();
if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ add_active_operation(name, operation)?;
}
return Ok(Arc::new(Self {
inner: Arc::clone(datastore),
@@ -382,7 +382,7 @@ impl DataStore {
datastore_cache.insert(name.to_string(), datastore.clone());
if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ add_active_operation(name, operation)?;
}
Ok(Arc::new(Self {
@@ -469,7 +469,7 @@ impl DataStore {
)?);
if let Some(operation) = operation {
- update_active_operations(&name, operation, 1)?;
+ add_active_operation(&name, operation)?;
}
Ok(Arc::new(Self { inner, operation }))
diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
index b7e569efb..f4f6c57e7 100644
--- a/pbs-datastore/src/task_tracking.rs
+++ b/pbs-datastore/src/task_tracking.rs
@@ -91,7 +91,21 @@ pub fn get_active_operations_locked(
Ok((data, lock.unwrap()))
}
-pub fn update_active_operations(
+pub fn add_active_operation(
+ name: &str,
+ operation: Operation,
+) -> Result<ActiveOperationStats, Error> {
+ update_active_operations(name, operation, 1)
+}
+
+pub fn remove_active_operation(
+ name: &str,
+ operation: Operation,
+) -> Result<ActiveOperationStats, Error> {
+ update_active_operations(name, operation, -1)
+}
+
+fn update_active_operations(
name: &str,
operation: Operation,
mut count: i64,
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler
@ 2025-11-20 9:37 ` Hannes Laimer
2025-11-20 10:22 ` Fabian Grünbichler
2 siblings, 1 reply; 6+ messages in thread
From: Hannes Laimer @ 2025-11-20 9:37 UTC (permalink / raw)
To: Fabian Grünbichler, pbs-devel
hmm, I'm not sure pushing a new 0/0 entry in that case adds much...
logging this though makes a lot if sense
actually, I think my patch is not correct. If we have `0/0` and call
update with -1 we'd end up with a -1 count in the tracking file.
decrementing is also a problem with a 0 counter, not just with
non-existing entries.
On 11/20/25 10:03, Fabian Grünbichler wrote:
> and warn about it. this *should* never happen unless the tracking file got
> somehow messed with manually..
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> This one fixes the replied-to patch to also correctly store an entry with no
> tasks for the current PID, instead of just returning that there are none..
>
> I am actually not sure how we should handle such a desync, we now pretend it's
> the last task even though we don't know for sure.. maybe we should just error
> out and let the Drop handler (not) handle it?
>
> pbs-datastore/src/task_tracking.rs | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
> index 10afebbe2..755d88fdf 100644
> --- a/pbs-datastore/src/task_tracking.rs
> +++ b/pbs-datastore/src/task_tracking.rs
> @@ -94,7 +94,7 @@ pub fn get_active_operations_locked(
> pub fn update_active_operations(
> name: &str,
> operation: Operation,
> - count: i64,
> + mut count: i64,
> ) -> Result<ActiveOperationStats, Error> {
> let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name));
>
> @@ -131,7 +131,15 @@ pub fn update_active_operations(
> None => Vec::new(),
> };
>
> - if !found_entry && count > 0 {
> + if !found_entry {
> + if count < 0 {
> + // if we don't have any operations at the moment, decrementing is not possible..
> + log::warn!(
> + "Active operations tracking mismatch - no current entry for {pid} but asked
> +to decrement by {count}!"
> + );
> + count = 0;
> + };
> match operation {
> Operation::Read => updated_active_operations.read = count,
> Operation::Write => updated_active_operations.write = count,
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced
2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer
@ 2025-11-20 10:22 ` Fabian Grünbichler
0 siblings, 0 replies; 6+ messages in thread
From: Fabian Grünbichler @ 2025-11-20 10:22 UTC (permalink / raw)
To: Hannes Laimer, pbs-devel; +Cc: Thomas Lamprecht
On November 20, 2025 10:37 am, Hannes Laimer wrote:
> hmm, I'm not sure pushing a new 0/0 entry in that case adds much...
> logging this though makes a lot if sense
>
> actually, I think my patch is not correct. If we have `0/0` and call
> update with -1 we'd end up with a -1 count in the tracking file.
> decrementing is also a problem with a 0 counter, not just with
> non-existing entries.
that's true. maybe we should first answer the question how we want to
handle such a mismatch, and then think about implementation details ;)
AFAICT:
- we add an operation during datastore lookup (two calls)
- we add an operation when cloning a datastore instance (one call)
- we remove an operation when dropping a datastore instance (one call)
there's some more which are only used by examples and should maybe be
dropped..
if a process crashes without executing the drop handler, a left-over
entry could exist. but such an entry will be cleaned up by the next
update_active_operations call since the PID is no longer valid.
so the only remaining issues would be:
- explicitly leaking instead of dropping a datastore (should never be
done)
- manually editing the active operations file
- unlinking the lock file while it is used
effectively, if we would ever end up with an active operation count < 0
for a given PID, we know something is wrong. but we can not recover for
this particular PID, so maybe we should add a poison flag (or use a
negative count as such), and require that process to exit before
considering the datastore to be "sane" again?
there are only a few places where the operation counts matter:
- removal from the cache map to close FDs when the last task exits, in
case certain maintenance mode is set
- waiting for active tasks to be done before activating certain
maintenance modes
neither of this can be done (safely) if we can no longer tell whether
there are active tasks..
>
> On 11/20/25 10:03, Fabian Grünbichler wrote:
>> and warn about it. this *should* never happen unless the tracking file got
>> somehow messed with manually..
>>
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>> ---
>> This one fixes the replied-to patch to also correctly store an entry with no
>> tasks for the current PID, instead of just returning that there are none..
>>
>> I am actually not sure how we should handle such a desync, we now pretend it's
>> the last task even though we don't know for sure.. maybe we should just error
>> out and let the Drop handler (not) handle it?
>>
>> pbs-datastore/src/task_tracking.rs | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
>> index 10afebbe2..755d88fdf 100644
>> --- a/pbs-datastore/src/task_tracking.rs
>> +++ b/pbs-datastore/src/task_tracking.rs
>> @@ -94,7 +94,7 @@ pub fn get_active_operations_locked(
>> pub fn update_active_operations(
>> name: &str,
>> operation: Operation,
>> - count: i64,
>> + mut count: i64,
>> ) -> Result<ActiveOperationStats, Error> {
>> let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name));
>>
>> @@ -131,7 +131,15 @@ pub fn update_active_operations(
>> None => Vec::new(),
>> };
>>
>> - if !found_entry && count > 0 {
>> + if !found_entry {
>> + if count < 0 {
>> + // if we don't have any operations at the moment, decrementing is not possible..
>> + log::warn!(
>> + "Active operations tracking mismatch - no current entry for {pid} but asked
>> +to decrement by {count}!"
>> + );
>> + count = 0;
>> + };
>> match operation {
>> Operation::Read => updated_active_operations.read = count,
>> Operation::Write => updated_active_operations.write = count,
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-11-20 10:22 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-20 6:02 [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries Hannes Laimer
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler
2025-11-20 9:01 ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler
2025-11-20 9:37 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer
2025-11-20 10:22 ` Fabian Grünbichler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox