public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
@ 2022-03-09 13:50 Stefan Sterz
  2022-03-11 12:20 ` Thomas Lamprecht
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Sterz @ 2022-03-09 13:50 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index d416c8d8..623b7688 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -346,6 +346,28 @@ impl DataStore {
                 )
             })?;
 
+        // check if this was the last snapshot and if so remove the group
+        if backup_dir
+            .group()
+            .list_backups(&self.base_path())?
+            .is_empty()
+        {
+            let group_path = self.group_path(backup_dir.group());
+            let _guard = proxmox_sys::fs::lock_dir_noblock(
+                &group_path,
+                "backup group",
+                "possible running backup",
+            )?;
+
+            std::fs::remove_dir_all(&group_path).map_err(|err| {
+                format_err!(
+                    "removing backup group directory {:?} failed - {}",
+                    group_path,
+                    err,
+                )
+            })?;
+        }
+
         // the manifest does not exists anymore, we do not need to keep the lock
         if let Ok(path) = self.manifest_lock_path(backup_dir) {
             // ignore errors
-- 
2.30.2






* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-09 13:50 [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed Stefan Sterz
@ 2022-03-11 12:20 ` Thomas Lamprecht
  2022-03-14  9:36   ` Wolfgang Bumiller
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Lamprecht @ 2022-03-11 12:20 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Sterz

On 09.03.22 14:50, Stefan Sterz wrote:
> Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index d416c8d8..623b7688 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -346,6 +346,28 @@ impl DataStore {
>                  )
>              })?;
>  
> +        // check if this was the last snapshot and if so remove the group
> +        if backup_dir
> +            .group()
> +            .list_backups(&self.base_path())?
> +            .is_empty()
> +        {

a log::info could be appropriate in the "success" (i.e., delete dir) case.

I'd factor this block below out into a non-pub (or pub(crate)) remove_empty_group_dir fn.

> +            let group_path = self.group_path(backup_dir.group());
> +            let _guard = proxmox_sys::fs::lock_dir_noblock(
> +                &group_path,
> +                "backup group",
> +                "possible running backup",
> +            )?;
> +
> +            std::fs::remove_dir_all(&group_path).map_err(|err| {

this is still unsafe as there's a TOCTOU race, the lock does not protect you from the
following sequence with two threads/async-executions t1 and t2:

t1.1 snapshot deleted
t1.2 empty dir check holds up, entering "delete group dir" code branch
t2.1                                        create new snapshot in group -> lock group dir
t2.2                                        finish new snapshot in group -> unlock group dir
t1.3 lock group dir
t1.4 delete all files, including the new snapshot made in-between.

Rather, just use the safer "remove_dir" variant; that way the TOCTOU race doesn't matter and
the check merely becomes a shortcut. If we'd explicitly check for
  `err.kind() != ErrorKind::DirectoryNotEmpty`
and silence it, we could even do away with the check. That should result in the same amount of
syscalls in the best case (one rmdir vs. one readdir) and can be better on success
(readdir + rmdir vs. rmdir only), not that performance matters much in this case.
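
A rough sketch of that variant (illustrative only, not the actual patch; it matches the raw
ENOTEMPTY errno since the ErrorKind variant may not be usable on stable, and assumes the
anyhow and libc crates):

    // sketch: tolerate ENOTEMPTY so the emptiness check above is only a
    // shortcut -- if another writer re-created a snapshot in the meantime,
    // rmdir(2) simply fails and the group directory is kept
    fn remove_empty_group_dir(group_path: &std::path::Path) -> Result<(), anyhow::Error> {
        if let Err(err) = std::fs::remove_dir(group_path) {
            if err.raw_os_error() != Some(libc::ENOTEMPTY) {
                anyhow::bail!("removing backup group directory {:?} failed - {}", group_path, err);
            }
        }
        Ok(())
    }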

fyi, "remove_backup_group", the place where I think you copied this part, can use the
remove_dir_all safely because there's no check to made there, so no TOCTOU.

> +                format_err!(
> +                    "removing backup group directory {:?} failed - {}",
> +                    group_path,
> +                    err,
> +                )
> +            })?;
> +        }
> +
>          // the manifest does not exists anymore, we do not need to keep the lock
>          if let Ok(path) = self.manifest_lock_path(backup_dir) {
>              // ignore errors






* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-11 12:20 ` Thomas Lamprecht
@ 2022-03-14  9:36   ` Wolfgang Bumiller
  2022-03-14 10:19     ` Thomas Lamprecht
  0 siblings, 1 reply; 10+ messages in thread
From: Wolfgang Bumiller @ 2022-03-14  9:36 UTC (permalink / raw)
  To: Thomas Lamprecht
  Cc: Proxmox Backup Server development discussion, Stefan Sterz

On Fri, Mar 11, 2022 at 01:20:22PM +0100, Thomas Lamprecht wrote:
> On 09.03.22 14:50, Stefan Sterz wrote:
> > Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
> > ---
> >  pbs-datastore/src/datastore.rs | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> > index d416c8d8..623b7688 100644
> > --- a/pbs-datastore/src/datastore.rs
> > +++ b/pbs-datastore/src/datastore.rs
> > @@ -346,6 +346,28 @@ impl DataStore {
> >                  )
> >              })?;
> >  
> > +        // check if this was the last snapshot and if so remove the group
> > +        if backup_dir
> > +            .group()
> > +            .list_backups(&self.base_path())?
> > +            .is_empty()
> > +        {
> 
> a log::info could be appropriate in the "success" (i.e., delete dir) case.
> 
> I'd factor this block below out into a non-pub (or pub(crate)) remove_empty_group_dir fn.
> 
> > +            let group_path = self.group_path(backup_dir.group());
> > +            let _guard = proxmox_sys::fs::lock_dir_noblock(
> > +                &group_path,
> > +                "backup group",
> > +                "possible running backup",
> > +            )?;
> > +
> > +            std::fs::remove_dir_all(&group_path).map_err(|err| {
> 
> this is still unsafe as there's a TOCTOU race, the lock does not protect you from the
> following sequence with two threads/async-executions t1 and t2:
> 
> t1.1 snapshot deleted
> t1.2 empty dir check holds up, entering "delete group dir" code branch
> t2.1                                        create new snapshot in group -> lock group dir
> t2.2                                        finish new snapshot in group -> unlock group dir
> t1.3 lock group dir
> t1.4 delete all files, including the new snapshot made in-between.
> 
> Rather, just use the safer "remove_dir" variant; that way the TOCTOU race doesn't matter and
> the check merely becomes a shortcut. If we'd explicitly check for
>   `err.kind() != ErrorKind::DirectoryNotEmpty`
> and silence it, we could even do away with the check. That should result in the same amount of
> syscalls in the best case (one rmdir vs. one readdir) and can be better on success
> (readdir + rmdir vs. rmdir only), not that performance matters much in this case.
> 
> fyi, "remove_backup_group", the place where I think you copied this part, can use the
> remove_dir_all safely because there's no check to made there, so no TOCTOU.

Correct me if I'm wrong, but I think we need to rethink our locking
there in general. We can't lock the directory itself if we also want to
be allowed to delete it (same reasoning as with regular files):

-> A locks backup group
    -> B begins locking: opens dir handle
-> A deletes group, group is now gone
        -> C recreates the backup group, _locked_
-> A drops directory handle (& with it the lock)
    -> B acquires lock on deleted directory handle which works just fine

now B and C both think they're holding an exclusive lock

We *could* use a lock helper that also stats before and after the lock
(on the handle first, then on the *path* for the second one) to see if
the inode changed, to catch this...
Or we just live with empty directories or (hidden) lock files lingering.
(which would only be safe to clean up during a maintenance mode
operation).
Or we introduce a create/delete lock one level up, held only for the
duration of mkdir()/rmdir() calls.

(But in any case, all the current inline `lock_dir_noblock` calls should
instead go over a safe helper dealing with this properly...)
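
A very rough sketch of what such a helper could look like (purely illustrative, using plain
nix/std calls instead of the existing lock_dir_noblock; naming and error handling made up):

    use std::os::unix::io::AsRawFd;
    use anyhow::{bail, Error};
    use nix::fcntl::{flock, FlockArg};
    use nix::sys::stat::{fstat, stat};

    /// Open and flock a directory, then verify it was not deleted or
    /// replaced while we were acquiring the lock.
    fn try_lock_dir(path: &std::path::Path) -> Result<std::fs::File, Error> {
        let handle = std::fs::File::open(path)?;                      // dir handle
        flock(handle.as_raw_fd(), FlockArg::LockExclusiveNonblock)?;  // take the lock
        let by_fd = fstat(handle.as_raw_fd())?;                       // what we actually locked
        let by_path = stat(path)?;                                    // what the path points to now
        if (by_fd.st_dev, by_fd.st_ino) != (by_path.st_dev, by_path.st_ino) {
            bail!("directory {:?} was removed/recreated while locking", path);
        }
        Ok(handle) // dropping the handle releases the lock
    }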





* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14  9:36   ` Wolfgang Bumiller
@ 2022-03-14 10:19     ` Thomas Lamprecht
  2022-03-14 11:13       ` Stefan Sterz
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Lamprecht @ 2022-03-14 10:19 UTC (permalink / raw)
  To: Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion, Stefan Sterz

On 14.03.22 10:36, Wolfgang Bumiller wrote:
> On Fri, Mar 11, 2022 at 01:20:22PM +0100, Thomas Lamprecht wrote:
>> On 09.03.22 14:50, Stefan Sterz wrote:
>>> Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
>>> ---
>>>  pbs-datastore/src/datastore.rs | 22 ++++++++++++++++++++++
>>>  1 file changed, 22 insertions(+)
>>>
>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>> index d416c8d8..623b7688 100644
>>> --- a/pbs-datastore/src/datastore.rs
>>> +++ b/pbs-datastore/src/datastore.rs
>>> @@ -346,6 +346,28 @@ impl DataStore {
>>>                  )
>>>              })?;
>>>  
>>> +        // check if this was the last snapshot and if so remove the group
>>> +        if backup_dir
>>> +            .group()
>>> +            .list_backups(&self.base_path())?
>>> +            .is_empty()
>>> +        {
>>
>> a log::info could be appropriate in the "success" (i.e., delete dir) case.
>>
>> I'd factor this block below out into a non-pub (or pub(crate)) remove_empty_group_dir fn.
>>
>>> +            let group_path = self.group_path(backup_dir.group());
>>> +            let _guard = proxmox_sys::fs::lock_dir_noblock(
>>> +                &group_path,
>>> +                "backup group",
>>> +                "possible running backup",
>>> +            )?;
>>> +
>>> +            std::fs::remove_dir_all(&group_path).map_err(|err| {
>>
>> this is still unsafe as there's a TOCTOU race, the lock does not protect you from the
>> following sequence with two threads/async-executions t1 and t2:
>>
>> t1.1 snapshot deleted
>> t1.2 empty dir check holds up, entering "delete group dir" code branch
>> t2.1                                        create new snapshot in group -> lock group dir
>> t2.2                                        finish new snapshot in group -> unlock group dir
>> t1.3 lock group dir
>> t1.4 delete all files, including the new snapshot made in-between.
>>
>> Rather, just use the safer "remove_dir" variant; that way the TOCTOU race doesn't matter and
>> the check merely becomes a shortcut. If we'd explicitly check for
>>   `err.kind() != ErrorKind::DirectoryNotEmpty`
>> and silence it, we could even do away with the check. That should result in the same amount of
>> syscalls in the best case (one rmdir vs. one readdir) and can be better on success
>> (readdir + rmdir vs. rmdir only), not that performance matters much in this case.
>>
>> fyi, "remove_backup_group", the place where I think you copied this part, can use
>> remove_dir_all safely because there's no check to be made there, so no TOCTOU.
> 
> Correct me if I'm wrong, but I think we need to rethink our locking
> there in general. We can't lock the directory itself if we also want to
> be allowed to delete it (same reasoning as with regular files):
> 
> -> A locks backup group
>     -> B begins locking: opens dir handle
> -> A deletes group, group is now gone
>         -> C recreates the backup group, _locked_
> -> A drops directory handle (& with it the lock)
>     -> B acquires lock on deleted directory handle which works just fine
> 
> now B and C both think they're holding an exclusive lock

hmm, reads as "can really happen" to me.

> 
> We *could* use a lock helper that also stats before and after the lock
> (on the handle first, then on the *path* for the second one) to see if
> the inode changed, to catch this...
> Or we just live with empty directories or (hidden) lock files lingering.
> (which would only be safe to clean up during a maintenance mode
> operation).
> Or we introduce a create/delete lock one level up, held only for the
> duration of mkdir()/rmdir() calls.

Or fstat the lock fd after and check for st_nlink > 0 to determine if it's
still existing (i.e., not deleted).
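
Roughly (sketch only; assumes `handle` is the std::fs::File for the locked directory and
AsRawFd is in scope):

    // if the directory was unlinked while we waited for the flock,
    // its inode has no remaining links
    let st = nix::sys::stat::fstat(handle.as_raw_fd())?;
    if st.st_nlink == 0 {
        anyhow::bail!("lock target was deleted while acquiring the lock");
    }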

Or move locking into a tmpfs backed per-datastore directory, e.g. in run or
a separate mounted one, which would give several benefits:

 1. independent of FS issues w.r.t. flock like especially network FSs
    sometimes have
 2. faster locking as we won't produce (metadata) IO to a real block dev
 3. making locks independent of the actual datastore, avoiding above issue

Only thing we would need to check is setting the tmpfs inode count high
enough and avoid inode-per-directory limits, as we probably want to have
the locks flattened to a single hierarchy (avoids directory creation/owner
issues), at least if that's a problem at all for tmpfs (total inode count
for sure can be)

> 
> (But in any case, all the current inline `lock_dir_noblock` calls should
> instead go over a safe helper dealing with this properly...)






* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14 10:19     ` Thomas Lamprecht
@ 2022-03-14 11:13       ` Stefan Sterz
  2022-03-14 11:36         ` Thomas Lamprecht
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Sterz @ 2022-03-14 11:13 UTC (permalink / raw)
  To: Thomas Lamprecht, Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion

On 14.03.22 11:19, Thomas Lamprecht wrote:
> On 14.03.22 10:36, Wolfgang Bumiller wrote:
>> On Fri, Mar 11, 2022 at 01:20:22PM +0100, Thomas Lamprecht wrote:
>>> On 09.03.22 14:50, Stefan Sterz wrote:
>>>> Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
>>>> ---
>>>>  pbs-datastore/src/datastore.rs | 22 ++++++++++++++++++++++
>>>>  1 file changed, 22 insertions(+)
>>>>
>>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>>> index d416c8d8..623b7688 100644
>>>> --- a/pbs-datastore/src/datastore.rs
>>>> +++ b/pbs-datastore/src/datastore.rs
>>>> @@ -346,6 +346,28 @@ impl DataStore {
>>>>                  )
>>>>              })?;
>>>>  
>>>> +        // check if this was the last snapshot and if so remove the group
>>>> +        if backup_dir
>>>> +            .group()
>>>> +            .list_backups(&self.base_path())?
>>>> +            .is_empty()
>>>> +        {
>>>
>>> a log::info could be appropriate in the "success" (i.e., delete dir) case.
>>>
>>> I'd factor this block below out into a non-pub (or pub(crate)) remove_empty_group_dir fn.
>>>
>>>> +            let group_path = self.group_path(backup_dir.group());
>>>> +            let _guard = proxmox_sys::fs::lock_dir_noblock(
>>>> +                &group_path,
>>>> +                "backup group",
>>>> +                "possible running backup",
>>>> +            )?;
>>>> +
>>>> +            std::fs::remove_dir_all(&group_path).map_err(|err| {
>>>
>>> this is still unsafe as there's a TOCTOU race, the lock does not protect you from the
>>> following sequence with two threads/async-executions t1 and t2:
>>>
>>> t1.1 snapshot deleted
>>> t1.2 empty dir check holds up, entering "delete group dir" code branch
>>> t2.1                                        create new snapshot in group -> lock group dir
>>> t2.2                                        finish new snapshot in group -> unlock group dir
>>> t1.3 lock group dir
>>> t1.4 delete all files, including the new snapshot made in-between.
>>>
>>> Rather, just use the safer "remove_dir" variant; that way the TOCTOU race doesn't matter and
>>> the check merely becomes a shortcut. If we'd explicitly check for
>>>   `err.kind() != ErrorKind::DirectoryNotEmpty`
>>> and silence it, we could even do away with the check. That should result in the same amount of
>>> syscalls in the best case (one rmdir vs. one readdir) and can be better on success
>>> (readdir + rmdir vs. rmdir only), not that performance matters much in this case.
>>>

as discussed off list, this is not an option, because the directory
still contains the "owner" file at that point and, thus, is never
empty in this case (also DirectoryNotEmpty is not stabilized yet [1]).
one solution would be to lock and then check if we are deleting the
last group. however, that would still be affected by the locking issue
outlined below:

[1]: https://github.com/rust-lang/rust/issues/86442

>>> fyi, "remove_backup_group", the place where I think you copied this part, can use
>>> remove_dir_all safely because there's no check to be made there, so no TOCTOU.
>>
>> Correct me if I'm wrong, but I think we need to rethink our locking
>> there in general. We can't lock the directory itself if we also want to
>> be allowed to delete it (same reasoning as with regular files):
>>
>> -> A locks backup group
>>     -> B begins locking: opens dir handle
>> -> A deletes group, group is now gone
>>         -> C recreates the backup group, _locked_
>> -> A drops directory handle (& with it the lock)
>>     -> B acquires lock on deleted directory handle which works just fine
>>
>> now B and C both think they're holding an exclusive lock
> 
> hmm, reads as "can really happen" to me.
> 
>>
>> We *could* use a lock helper that also stats before and after the lock
>> (on the handle first, then on the *path* for the second one) to see if
>> the inode changed, to catch this...
>> Or we just live with empty directories or (hidden) lock files lingering.
>> (which would only be safe to clean up during a maintenance mode
>> operation).
>> Or we introduce a create/delete lock one level up, held only for the
>> duration of mkdir()/rmdir() calls.
> 
> Or fstat the lock fd after and check for st_nlink > 0 to determine if it's
> still existing (i.e., not deleted).
> 
> Or move locking into a tmpfs backed per-datastore directory, e.g. in run or
> a separate mounted one, which would give several benefits:
> 
>  1. independent of FS issues w.r.t. flock like especially network FSs
>     sometimes have
>  2. faster locking as we won't produce (metadata) IO to a real block dev
>  3. making locks independent of the actual datastore, avoiding above issue
> 
> Only thing we would need to check is setting the tmpfs inode count high
> enough and avoid inode-per-directory limits, as we probably want to have
> the locks flattened to a single hierarchy (avoids directory creation/owner
> issues), at least if that's a problem at all for tmpfs (total inode count
> for sure can be)
> 

how do we move forward on this issue? the changes proposed above sound
rather far-reaching and not really connected to the bug that sparked
the original patch. it might make sense to break them out into their
own patch series and fix the issue at hand (bug #3336) after it
has been applied. alternatively, we could just remove the "owner" file
in a given group. this should fix the bug too and would not suffer
from the locking problem (as we would lock its parent directory), but
would leave empty directories behind. please advise :)

>>
>> (But in any case, all the current inline `lock_dir_noblock` calls should
>> instead go over a safe helper dealing with this properly...)
> 






* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14 11:13       ` Stefan Sterz
@ 2022-03-14 11:36         ` Thomas Lamprecht
  2022-03-14 14:18           ` Stefan Sterz
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Lamprecht @ 2022-03-14 11:36 UTC (permalink / raw)
  To: Stefan Sterz, Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion

On 14.03.22 12:13, Stefan Sterz wrote:
> how do we move forward on this issue? the changes proposed above sound
> rather far-reaching and not really connected to the bug that sparked
> the original patch. it might make sense to break them out into their
> own patch series and fix the issue at hand (bug #3336) after it
> has been applied. alternatively, we could just remove the "owner" file
> in a given group. this should fix the bug too and would not suffer
> from the locking problem (as we would lock its parent directory), but
> would leave empty directories behind. please advise 😄
> 

I reread the actual bug and it seems that if we're Ok with just deleting
the owner with the rather implicit reason of the last snapshot being
deleted, allowing another authid to "snatch up" that backup group ownership,
then just deleting the owner file would be the simplest solution.

I'm not against that, and I definitely agree with the bug report that
doing so is less work, but given how seriously we honor the owner in general,
it feels a bit odd to just implicitly do so on a single snapshot deletion.

On the other hand, we also handle creation in a similar implicit manner,
so maybe I'm overthinking it and just removing it would actually be more
consistent/expected for users.

So, if you don't see a problem/issue with that approach and agree with
the last paragraph above feel free to go for deleting the owner file only.
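
A minimal sketch of that variant (assuming the ownership info is the "owner" file directly
inside the group directory; the helper name is made up):

    // sketch: drop only the ownership marker of the now-empty group,
    // leaving the (empty) group directory itself in place
    fn remove_group_owner(group_path: &std::path::Path) -> Result<(), anyhow::Error> {
        let owner_path = group_path.join("owner");
        std::fs::remove_file(&owner_path).map_err(|err| {
            anyhow::format_err!("removing owner file {:?} failed - {}", owner_path, err)
        })
    }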





* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14 11:36         ` Thomas Lamprecht
@ 2022-03-14 14:18           ` Stefan Sterz
  2022-03-14 14:53             ` Thomas Lamprecht
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Sterz @ 2022-03-14 14:18 UTC (permalink / raw)
  To: Thomas Lamprecht, Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion

On 14.03.22 12:36, Thomas Lamprecht wrote:
> On 14.03.22 12:13, Stefan Sterz wrote:
>> how do we move forward on this issue? the changes proposed above sound
>> rather far-reaching and not really connected to the bug that sparked
>> the original patch. it might make sense to break them out into their
>> own patch series and fix the issue at hand (bug #3336) after it
>> has been applied. alternatively, we could just remove the "owner" file
>> in a given group. this should fix the bug too and would not suffer
>> from the locking problem (as we would lock its parent directory), but
>> would leave empty directories behind. please advise 😄
>>
> 
> I reread the actual bug and it seems that if we're Ok with just deleting
> the owner with the rather implicit reason of the last snapshot being
> deleted, allowing another authid to "snatch up" that backup group ownership,
> then just deleting the owner file would be the simplest solution.
> 
> I'm not against that, and I definitely agree with the bug report that
> doing so is less work, but given how seriously we honor the owner in general,
> it feels a bit odd to just implicitly do so on a single snapshot deletion.
>
> On the other hand, we also handle creation in a similar implicit manner,
> so maybe I'm overthinking it and just removing it would actually be more
> consistent/expected for users.
> 
> So, if you don't see a problem/issue with that approach and agree with
> the last paragraph above feel free to go for deleting the owner file only.

for the most part i agree with you. i would also like to point out
that when a group is deleted (as in, not the last snapshot, but the
entire group at once) the owner is also implicitly removed (because
the entire group directory is removed). so in a way, we already delete
ownership information implicitly and the proposed solution would just
be consistent with that behavior.

however, i did some more digging and testing and it turns out that we
currently assume the owner file to be present when a group directory
exists. this affects not only sync jobs, but also verification and
more. thus, i would need to do quite a bit of refactoring to get this
to work and even more testing. so while this issue seemed simple
enough, as far as i can tell our current options are:

1. re-factor locking and remove the directory
2. re-factor how an empty group directory and the owner file is
treated
3. add "empty" groups to the gui

in light of this, taking the gui route is possibly the easiest option.
sorry for not being aware of this earlier.





* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14 14:18           ` Stefan Sterz
@ 2022-03-14 14:53             ` Thomas Lamprecht
  2022-03-14 15:19               ` Stefan Sterz
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Lamprecht @ 2022-03-14 14:53 UTC (permalink / raw)
  To: Stefan Sterz, Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion

On 14.03.22 15:18, Stefan Sterz wrote:
> On 14.03.22 12:36, Thomas Lamprecht wrote:
>> On the other hand, we also handle creation in a similar implicit manner,
>> so maybe I'm overthinking it and just removing it would actually be more
>> consistent/expected for users.
>>
>> So, if you don't see a problem/issue with that approach and agree with
>> the last paragraph above feel free to go for deleting the owner file only.
> 
> for the most part i agree with you. i would also like to point out
> that when a group is deleted (as in, not the last snapshot, but the
> entire group at once) the owner is also implicitly removed (because
> the entire group directory is removed). so in a way, we already delete
> ownership information implicitly and the proposed solution would just
> be consistent with that behavior.

I really do not think that's comparable or would count as implicit deletion
;-)

A user triggering a whole-group removal explicitly expects that all the
associated stuff gets removed too, including owner + group directory;
there's nothing left to own after that.

Iow., the difference would be like:
`rm -rf group-dir/` vs. `rm -rf group-dir/snapshot-dir`

> 
> however, i did some more digging and testing and it turns out that we
> currently assume the owner file to be present when a group directory
> exists. this affects not only sync jobs, but also verification and
> more. thus, i would need to do quite a bit of refactoring to get this
> to work and even more testing. so while this issue seemed simple
> enough, as far as i can tell our current options are:
>
> 1. re-factor locking and remove the directory
> 2. re-factor how an empty group directory and the owner file is
> treated

meh, not really liking this one as it could conflict with some assumptions.

> 3. add "empty" groups to the gui

Thinking more about it with past user behavior in mind, I wouldn't be surprised if we
then got the bug report for not auto-removing this in one step ^^


> 
> in light of this, taking the gui route is possibly the easiest option.
> sorry, for not being aware of this earlier.

I mean, the locking problem Wolfgang pointed out already exists currently,
meaning that we could either:

1. stay ignorant (for now) and just delete the directory
2. fix that up-front already, as it has its own merits

I don't see 1. as _that_ problematic, as the deletion of the last snapshot
always has to be a manual action; (auto)pruning will never cause that.
This would allow the assumption that the user/admin already took care
of periodic backup jobs before cleaning up stuff. But yeah, it definitely
has a slight sour taste. Putting this on hold and seeing how we can best
improve the locking w.r.t. full backup-dir removals would IMO be the
cleanest solution.





* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14 14:53             ` Thomas Lamprecht
@ 2022-03-14 15:19               ` Stefan Sterz
  2022-03-14 17:12                 ` Thomas Lamprecht
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Sterz @ 2022-03-14 15:19 UTC (permalink / raw)
  To: Thomas Lamprecht, Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion

On 14.03.22 15:53, Thomas Lamprecht wrote:
> On 14.03.22 15:18, Stefan Sterz wrote:
>> On 14.03.22 12:36, Thomas Lamprecht wrote:
>>> On the other hand, we also handle creation in a similar implicit manner,
>>> so maybe I'm overthinking it and just removing it would actually be more
>>> consistent/expected for users.
>>>
>>> So, if you don't see a problem/issue with that approach and agree with
>>> the last paragraph above feel free to go for deleting the owner file only.

-- snip --

>>
>> however, i did some more digging and testing and it turns out that we
>> currently assume the owner file to be present when a group directory
>> exists. this affects not only sync jobs, but also verification and
>> more. thus, i would need to do quite a bit of refactoring to get this
>> to work and even more testing. so while this issue seemed simple
>> enough, as far as i can tell our current options are:
>> 1. re-factor locking and remove the directory
>> 2. re-factor how an empty group directory and the owner file is
>> treated
> 
> meh, not really liking this one as it could conflict with some assumptions.
> 
>> 3. add "empty" groups to the gui
> 
> Thinking more about it with past user behavior in mind, I wouldn't be surprised if we
> then got the bug report for not auto-removing this in one step ^^
> 
> 
>>
>> in light of this, taking the gui route is possibly the easiest option.
>> sorry, for not being aware of this earlier.
> 
> I mean, the locking problem Wolfgang pointed out already exists currently,
> meaning that we could either:
>
> 1. stay ignorant (for now) and just delete the directory
> 2. fix that up-front already, as it has its own merits
>
> I don't see 1. as _that_ problematic, as the deletion of the last snapshot
> always has to be a manual action; (auto)pruning will never cause that.
> This would allow the assumption that the user/admin already took care
> of periodic backup jobs before cleaning up stuff. But yeah, it definitely
> has a slight sour taste. Putting this on hold and seeing how we can best
> improve the locking w.r.t. full backup-dir removals would IMO be the
> cleanest solution.

agreed, would you mind if i open a bug for overhauling the locking
mechanism and start work on that? i looked a bit into your proposed
solution regarding tmpfs; afaict there is no per-directory inode limit,
only an overall limit corresponding to half the physical memory
pages. we could use a completely flat structure based on either
encoded or hashed canonical paths. i'm assuming that's pretty close to
what you had in mind?





* Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed
  2022-03-14 15:19               ` Stefan Sterz
@ 2022-03-14 17:12                 ` Thomas Lamprecht
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Lamprecht @ 2022-03-14 17:12 UTC (permalink / raw)
  To: Stefan Sterz, Wolfgang Bumiller
  Cc: Proxmox Backup Server development discussion

On 14.03.22 16:19, Stefan Sterz wrote:
> agreed, would you mind if i open a bug for overhauling the locking
> mechanism and start work on that? i looked a bit into your proposed

that's fine.

> solution regarding tmpfs; afaict there is no per-directory inode limit,
> only an overall limit corresponding to half the physical memory
> pages. we could use a completely flat structure based on either
> encoded or hashed canonical paths. i'm assuming that's pretty close to
> what you had in mind?

Yeah, I'd use proxmox_sys::systemd::escape_unit for flattening lock paths
and ship a systemd .mount unit file setting up a separate tmpfs in a place
like /run/proxmox-backup/locks with a relatively high inode count (size
can/should? be relatively low)
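
For illustration, such a mount unit could look roughly like this (sketch only, the size and
inode values are just placeholders):

    # /etc/systemd/system/run-proxmox\x2dbackup-locks.mount
    # (the unit file name must match the escaped mount point path)
    [Unit]
    Description=Proxmox Backup Server lock directory (tmpfs)

    [Mount]
    What=tmpfs
    Where=/run/proxmox-backup/locks
    Type=tmpfs
    Options=size=16M,nr_inodes=1048576,mode=0700

    [Install]
    WantedBy=multi-user.target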





end of thread, other threads:[~2022-03-14 17:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-09 13:50 [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove backup group if the last snapshot is removed Stefan Sterz
2022-03-11 12:20 ` Thomas Lamprecht
2022-03-14  9:36   ` Wolfgang Bumiller
2022-03-14 10:19     ` Thomas Lamprecht
2022-03-14 11:13       ` Stefan Sterz
2022-03-14 11:36         ` Thomas Lamprecht
2022-03-14 14:18           ` Stefan Sterz
2022-03-14 14:53             ` Thomas Lamprecht
2022-03-14 15:19               ` Stefan Sterz
2022-03-14 17:12                 ` Thomas Lamprecht
