From: Hannes Laimer <h.laimer@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	pbs-devel@lists.proxmox.com
Cc: Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced
Date: Mon, 24 Nov 2025 09:12:02 +0100
Message-ID: <b6429230-8ab7-453a-a20c-a541ec6c6e64@proxmox.com>
In-Reply-To: <1763633501.91d4npp4ky.astroid@yuna.none>

On 11/20/25 11:22, Fabian Grünbichler wrote:
> On November 20, 2025 10:37 am, Hannes Laimer wrote:
>> hmm, I'm not sure pushing a new 0/0 entry in that case adds much...
>> logging this, though, makes a lot of sense
>>
>> actually, I think my patch is not correct. If we have `0/0` and call
>> update with -1 we'd end up with a -1 count in the tracking file.
>> decrementing is also a problem with a 0 counter, not just with
>> non-existing entries.
> 
> that's true. maybe we should first answer the question how we want to
> handle such a mismatch, and then think about implementation details ;)
> 
> AFAICT:
> 
> - we add an operation during datastore lookup (two calls)
> - we add an operation when cloning a datastore instance (one call)
> - we remove an operation when dropping a datastore instance (one call)
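The add-on-lookup/clone, remove-on-drop accounting in the list above is essentially an RAII guard. A minimal in-memory sketch, assuming a shared atomic counter in place of the on-disk tracking file; `OpCounter`, `OpGuard` and `lookup` are hypothetical names, not the actual `pbs-datastore` types:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for one process's active operation count.
/// The real code persists counts in a tracking file; an atomic keeps
/// this sketch self-contained.
#[derive(Default)]
pub struct OpCounter {
    active: AtomicI64,
}

impl OpCounter {
    pub fn get(&self) -> i64 {
        self.active.load(Ordering::SeqCst)
    }
}

/// Handle that counts as one active operation for as long as it lives.
pub struct OpGuard {
    counter: Arc<OpCounter>,
}

/// "Lookup": register one operation and hand out a guard for it.
pub fn lookup(counter: &Arc<OpCounter>) -> OpGuard {
    counter.active.fetch_add(1, Ordering::SeqCst);
    OpGuard { counter: Arc::clone(counter) }
}

impl Clone for OpGuard {
    // Cloning registers one more operation.
    fn clone(&self) -> Self {
        self.counter.active.fetch_add(1, Ordering::SeqCst);
        OpGuard { counter: Arc::clone(&self.counter) }
    }
}

impl Drop for OpGuard {
    // Dropping removes exactly one operation. A leaked guard
    // (e.g. via std::mem::forget) never runs this, so its count stays.
    fn drop(&mut self) {
        self.counter.active.fetch_sub(1, Ordering::SeqCst);
    }
}
```

Since every increment is paired with exactly one `Drop`, the count can only drift if a guard is leaked or the shared state is changed behind the guard's back, which matches the remaining failure cases Fabian lists.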
> 
> there are some more call sites which are only used by examples and should
> maybe be dropped..
> 
> if a process crashes without executing the drop handler, a left-over
> entry could exist. but such an entry will be cleaned up by the next
> update_active_operations call since the PID is no longer valid.
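The clean-up of left-over entries described above amounts to filtering the per-PID entries by liveness. A sketch under stated assumptions: the `Entry` struct and the `is_alive` callback are hypothetical, and the callback stands in for whatever PID validity check the real tracking code performs:

```rust
/// Hypothetical per-process entry in the active operations file.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub pid: u32,
    pub read: i64,
    pub write: i64,
}

/// Drop entries of processes that no longer exist, e.g. because they
/// crashed before their Drop handlers could decrement the counts.
pub fn prune_stale<F>(entries: Vec<Entry>, is_alive: F) -> Vec<Entry>
where
    F: Fn(u32) -> bool,
{
    entries.into_iter().filter(|e| is_alive(e.pid)).collect()
}

/// Naive, Linux-only liveness check for illustration. PIDs can be reused,
/// so a robust check would also need to compare something that identifies
/// the process instance, not just the PID.
pub fn pid_exists(pid: u32) -> bool {
    std::path::Path::new(&format!("/proc/{pid}")).exists()
}
```

Passing the liveness check as a closure keeps the pruning logic testable without real processes.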
> 
> so the only remaining issues would be:
> - explicitly leaking instead of dropping a datastore (should never be
>    done)
> - manually editing the active operations file
> - unlinking the lock file while it is used
> 
> effectively, if we would ever end up with an active operation count < 0
> for a given PID, we know something is wrong. but we can not recover for
> this particular PID, so maybe we should add a poison flag (or use a
> negative count as such), and require that process to exit before
> considering the datastore to be "sane" again?
> 

yes, treating a negative value as such makes sense. I guess in such a
case we just shouldn't allow the creation of new (IO) tasks at all.
This would still allow running tasks to finish, together with a log
message saying something like
`looks like something went wrong... please restart proxy.service`
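One way to express the poison idea: stop clamping, let a per-PID count go negative, keep it negative until that process is gone, and refuse new IO tasks while any count is negative. A minimal sketch of that assumption; `OpState`, `apply_delta` and `may_start_io_task` are hypothetical names, and `eprintln!` stands in for the `log::warn!` call used in the patch:

```rust
use std::collections::HashMap;

/// Hypothetical per-PID counts; a negative value acts as the poison flag.
#[derive(Default)]
pub struct OpState {
    counts: HashMap<u32, i64>,
}

impl OpState {
    /// Apply `delta` for `pid` without clamping, so an underflow stays
    /// visible. Returns the resulting count.
    pub fn apply_delta(&mut self, pid: u32, delta: i64) -> i64 {
        let count = self.counts.entry(pid).or_insert(0);
        if *count < 0 {
            // Already poisoned: stay poisoned until the process exits.
            return *count;
        }
        *count += delta;
        if *count < 0 {
            eprintln!("looks like something went wrong... please restart proxy.service");
        }
        *count
    }

    /// True if any process has a negative (poisoned) count.
    pub fn is_poisoned(&self) -> bool {
        self.counts.values().any(|&c| c < 0)
    }

    /// New IO tasks are refused while poisoned; running tasks can still
    /// finish and decrement their own (non-poisoned) counts.
    pub fn may_start_io_task(&self) -> bool {
        !self.is_poisoned()
    }

    /// Called once the process is known to have exited, which clears
    /// its entry and with it the poison.
    pub fn remove_pid(&mut self, pid: u32) {
        self.counts.remove(&pid);
    }
}
```

Making the negative value sticky avoids a later increment silently "healing" a count that is already known to be wrong.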

not sure if lookups are fine, since we don't really enforce that the
reference is actually never used for IO...
([1] from some time ago would address that, but that is only tangentially
relevant here)


[1] https://lore.proxmox.com/pbs-devel/20250526141445.228717-1-h.laimer@proxmox.com/

> there are only a few places where the operation counts matter:
> - removal from the cache map to close FDs when the last task exits, in
>    case certain maintenance mode is set
> - waiting for active tasks to be done before activating certain
>    maintenance modes
> 
> neither of these can be done (safely) if we can no longer tell whether
> there are active tasks..
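Both consumers listed above need a definite answer to "are there active tasks?". A sketch of how they could refuse to act on a desynced state, assuming the per-PID counts for a datastore are available as a slice; `Activity`, `may_evict_from_cache` and `maintenance_check` are hypothetical names:

```rust
/// Summary of the per-PID counts for one datastore.
#[derive(Debug, PartialEq)]
pub enum Activity {
    Idle,
    Busy(u64),
    /// At least one count is negative: we cannot tell whether tasks run.
    Unknown,
}

pub fn activity(counts: &[i64]) -> Activity {
    // Check for negatives first: summing [1, -1] would give 0 and
    // wrongly report the datastore as idle.
    if counts.iter().any(|&c| c < 0) {
        return Activity::Unknown;
    }
    let total: u64 = counts.iter().map(|&c| c as u64).sum();
    if total == 0 {
        Activity::Idle
    } else {
        Activity::Busy(total)
    }
}

/// Only drop the cached instance (closing its FDs) if known to be idle.
pub fn may_evict_from_cache(counts: &[i64]) -> bool {
    activity(counts) == Activity::Idle
}

/// Ok(true): activate the maintenance mode now; Ok(false): keep waiting;
/// Err: refuse, since the active task state is unknown.
pub fn maintenance_check(counts: &[i64]) -> Result<bool, String> {
    match activity(counts) {
        Activity::Idle => Ok(true),
        Activity::Busy(_) => Ok(false),
        Activity::Unknown => Err("active operation counts are desynced".to_string()),
    }
}
```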
> 
>>
>> On 11/20/25 10:03, Fabian Grünbichler wrote:
>>> and warn about it. this *should* never happen unless the tracking file got
>>> somehow messed with manually..
>>>
>>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>>> ---
>>> This one fixes the replied-to patch to also correctly store an entry with no
>>> tasks for the current PID, instead of just returning that there are none..
>>>
>>> I am actually not sure how we should handle such a desync, we now pretend it's
>>> the last task even though we don't know for sure.. maybe we should just error
>>> out and let the Drop handler (not) handle it?
>>>
>>>    pbs-datastore/src/task_tracking.rs | 12 ++++++++++--
>>>    1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
>>> index 10afebbe2..755d88fdf 100644
>>> --- a/pbs-datastore/src/task_tracking.rs
>>> +++ b/pbs-datastore/src/task_tracking.rs
>>> @@ -94,7 +94,7 @@ pub fn get_active_operations_locked(
>>>    pub fn update_active_operations(
>>>        name: &str,
>>>        operation: Operation,
>>> -    count: i64,
>>> +    mut count: i64,
>>>    ) -> Result<ActiveOperationStats, Error> {
>>>        let path = PathBuf::from(format!("{}/{}", crate::ACTIVE_OPERATIONS_DIR, name));
>>>    
>>> @@ -131,7 +131,15 @@ pub fn update_active_operations(
>>>            None => Vec::new(),
>>>        };
>>>    
>>> -    if !found_entry && count > 0 {
>>> +    if !found_entry {
>>> +        if count < 0 {
>>> +            // if we don't have any operations at the moment, decrementing is not possible..
>>> +            log::warn!(
>>> +                "Active operations tracking mismatch - no current entry for {pid} but asked to decrement by {count}!"
>>> +            );
>>> +            count = 0;
>>> +        };
>>>            match operation {
>>>                Operation::Read => updated_active_operations.read = count,
>>>                Operation::Write => updated_active_operations.write = count,
>>
>>
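To make the earlier objection concrete (an existing `0/0` entry plus a decrement still ends up at `-1`), here is a deliberately simplified single-counter model of the two branches. It is not the code from `task_tracking.rs`; both function names are hypothetical:

```rust
/// Model of the follow-up: a missing entry is clamped to 0 on decrement,
/// while an existing entry is updated without any check.
pub fn follow_up_update(existing: Option<i64>, count: i64) -> i64 {
    match existing {
        Some(current) => current + count,
        None => count.max(0),
    }
}

/// Variant treating "missing" as 0 and applying the same underflow check
/// to both cases; Err carries the would-be negative value.
pub fn checked_update(existing: Option<i64>, count: i64) -> Result<i64, i64> {
    let new = existing.unwrap_or(0) + count;
    if new < 0 {
        Err(new)
    } else {
        Ok(new)
    }
}
```

The model shows that the guard only covers the missing-entry case, so the existing-entry case needs the same treatment, whichever way a desync ends up being handled.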



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


Thread overview: 7+ messages
2025-11-20  6:02 [pbs-devel] [PATCH proxmox-backup v2] task tracking: improve pruning and fix accounting for missing entries Hannes Laimer
2025-11-20  9:01 ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Fabian Grünbichler
2025-11-20  9:01   ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 3/4] task tracking: refactor code Fabian Grünbichler
2025-11-20  9:01   ` [pbs-devel] [RFC FOLLOW-UP proxmox-backup 4/4] task tracking: simplify public interface Fabian Grünbichler
2025-11-20  9:37   ` [pbs-devel] [PATCH FOLLOW-UP proxmox-backup 2/4] task tracking: actually reset entry if desynced Hannes Laimer
2025-11-20 10:22     ` Fabian Grünbichler
2025-11-24  8:12       ` Hannes Laimer [this message]
