From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.sterz@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 0B9AA69CBC
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Mar 2022 12:13:55 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id E91E832E1
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Mar 2022 12:13:24 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id BBD3832D3
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Mar 2022 12:13:23 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 8C14645938
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Mar 2022 12:13:23 +0100 (CET)
Message-ID: <738d037f-ed3c-db76-287f-5b6d37a3b7f3@proxmox.com>
Date: Mon, 14 Mar 2022 12:13:20 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.6.2
Content-Language: en-US
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
References: <20220309135031.1995207-1-s.sterz@proxmox.com>
 <717c8999-d3f8-a01b-a8f5-da0f5960d23f@proxmox.com>
 <20220314093617.n2mc2jv4k6ntzroo@wobu-vie.proxmox.com>
 <e5ddc12e-dc3a-6a5d-f1b4-242e118db85e@proxmox.com>
From: Stefan Sterz <s.sterz@proxmox.com>
In-Reply-To: <e5ddc12e-dc3a-6a5d-f1b4-242e118db85e@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.000 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
Subject: Re: [pbs-devel] [PATCH proxmox-backup] fix #3336: api: remove
 backup group if the last snapshot is removed
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 14 Mar 2022 11:13:55 -0000

On 14.03.22 11:19, Thomas Lamprecht wrote:
> On 14.03.22 10:36, Wolfgang Bumiller wrote:
>> On Fri, Mar 11, 2022 at 01:20:22PM +0100, Thomas Lamprecht wrote:
>>> On 09.03.22 14:50, Stefan Sterz wrote:
>>>> Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
>>>> ---
>>>>  pbs-datastore/src/datastore.rs | 22 ++++++++++++++++++++++
>>>>  1 file changed, 22 insertions(+)
>>>>
>>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>>> index d416c8d8..623b7688 100644
>>>> --- a/pbs-datastore/src/datastore.rs
>>>> +++ b/pbs-datastore/src/datastore.rs
>>>> @@ -346,6 +346,28 @@ impl DataStore {
>>>>                  )
>>>>              })?;
>>>>  
>>>> +        // check if this was the last snapshot and if so remove the group
>>>> +        if backup_dir
>>>> +            .group()
>>>> +            .list_backups(&self.base_path())?
>>>> +            .is_empty()
>>>> +        {
>>>
>>> a log::info could be appropriate in the "success" (i.e., delete dir) case.
>>>
>>> I'd factor this block below out into a non-pub (or pub(crate)) remove_empty_group_dir fn.
>>>
>>>> +            let group_path = self.group_path(backup_dir.group());
>>>> +            let _guard = proxmox_sys::fs::lock_dir_noblock(
>>>> +                &group_path,
>>>> +                "backup group",
>>>> +                "possible running backup",
>>>> +            )?;
>>>> +
>>>> +            std::fs::remove_dir_all(&group_path).map_err(|err| {
>>>
>>> this is still unsafe as there's a TOCTOU race, the lock does not protect you from the
>>> following sequence with two threads/async-executions t1 and t2
>>>
>>> t1.1 snapshot deleted
>>> t1.2 empty dir check holds up, entering "delete group dir" code branch
>>> t2.1                                        create new snapshot in group -> lock group dir
>>> t2.2                                        finish new snapshot in group -> unlock group dir
>>> t1.3 lock group dir
>>> t1.4 delete all files, including the new snapshot made in-between.
>>>
>>> Rather, just use the safer "remove_dir" variant, that way the TOCTOU race doesn't matter,
>>> the check merely becomes a short cut; if we'd explicitly check for
>>>   `err.kind() != ErrorKind::DirectoryNotEmpty`
>>> and silence it we could even do away with the check, should result in the same amount of
>>> syscalls in the best-case (one rmdir vs. one readdir) and can be better on success
>>> (readdir + rmdir vs. rmdir only), not that performance matters much in this case.
>>>

as discussed off list, this is not an option: the directory still
contains the "owner" file at that point, so it is never empty in this
case (also, ErrorKind::DirectoryNotEmpty is not stabilized yet [1]).
one solution would be to take the lock first and only then check
whether we are deleting the last snapshot in the group. however, that
would still be affected by the locking issue outlined below:

[1]: https://github.com/rust-lang/rust/issues/86442

>>> fyi, "remove_backup_group", the place where I think you copied this part, can use the
>>> remove_dir_all safely because there's no check to be made there, so no TOCTOU.
>>
>> Correct me if I'm wrong, but I think we need to rethink our locking
>> there in general. We can't lock the directory itself if we also want to
>> be allowed to delete it (same reasoning as with regular files):
>>
>> -> A locks backup group
>>     -> B begins locking: opens dir handle
>> -> A deletes group, group is now gone
>>         -> C recreates the backup group, _locked_
>> -> A drops directory handle (& with it the lock)
>>     -> B acquires lock on deleted directory handle which works just fine
>>
>> now B and C both think they're holding an exclusive lock
> 
> hmm, reads as "can really happen" to me.
> 
>>
>> We *could* use a lock helper that also stats before and after the lock
>> (on the handle first, then on the *path* for the second one) to see if
>> the inode changed, to catch this...
>> Or we just live with empty directories or (hidden) lock files lingering.
>> (which would only be safe to clean up during a maintenance mode
>> operation).
>> Or we introduce a create/delete lock one level up, held only for the
>> duration of mkdir()/rmdir() calls.
> 
> Or fstat the lock fd after and check for st_nlink > 0 to determine if it's
> still existing (i.e., not deleted).
> 
> Or move locking into a tmpfs backed per-datastore directory, e.g. in run or
> a separate mounted one, which would give several benefits:
> 
>  1. independent of FS issues w.r.t. flock, like the ones network FSs
>     especially sometimes have
>  2. faster locking as we won't produce (metadata) IO to a real block dev
>  3. making locks independent of the actual datastore, avoiding above issue
> 
> Only thing we would need to check is setting the tmpfs inode count high
> enough and avoid inode-per-directory limits, as we probably want to have
> the locks flattened to a single hierarchy (avoids directory creation/owner
> issues), at least if that's a problem at all for tmpfs (total inode count
> for sure can be)
> 

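to make the "stat before and after the lock" idea quoted above a bit
more concrete, here is a rough sketch of the comparison step only. it
assumes a unix target and std only; the helper name
handle_matches_path is made up for illustration and is not an existing
proxmox_sys function. taking the actual flock on the handle is left
out, the caller would do that between opening the handle and calling
this check:

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper (not part of proxmox_sys): returns true if the open
/// `handle` still refers to whatever currently lives at `path`.
///
/// Intended to be called right after the lock on `handle` was acquired. If it
/// returns false, the directory was deleted (and possibly recreated) in the
/// meantime, so the lock is held on a stale inode and must not be trusted.
fn handle_matches_path(handle: &File, path: &Path) -> io::Result<bool> {
    // fstat() on the handle we hold the lock on
    let held = handle.metadata()?;
    // stat() on the path, which may point to a different inode by now
    match fs::metadata(path) {
        Ok(current) => Ok(held.dev() == current.dev() && held.ino() == current.ino()),
        // the path is gone, so the handle refers to a deleted directory
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}
```

a lock helper built on this would have to treat a `false` result like
a failed lock attempt (bail out or retry from the open). it only
detects the stale handle after the fact, so it does not replace any of
the other options discussed above.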
how do we move forward on this issue? the changes proposed above sound
rather far reaching and not really connected to the bug that sparked
the original patch. it might make sense to break them out into their
own patch series and fix the issue at hand (bug #3336) once that
series has been applied. alternatively, we could just remove the
"owner" file of a group once its last snapshot is gone. this should
fix the bug too and would not suffer from the locking problem (as we
would lock its parent directory, which stays in place), but it would
leave empty directories behind. please advise :)
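
for illustration, the "only remove the owner file" variant could look
roughly like the sketch below. this is not the actual patch: the
function name is made up, it uses std only, it assumes that every
sub-directory of the group directory is a snapshot, and it leaves out
the locking, so the caller would have to hold the group lock around
the whole call:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: removes the "owner" file of a backup group if no
/// snapshot is left in it. The group directory itself is kept.
///
/// Assumptions: every sub-directory of `group_path` is a snapshot, and the
/// caller already holds the lock on the group directory.
///
/// Returns Ok(true) if the owner file was removed.
fn remove_owner_if_no_snapshots(group_path: &Path) -> io::Result<bool> {
    for entry in fs::read_dir(group_path)? {
        if entry?.file_type()?.is_dir() {
            // at least one snapshot remains, keep the owner
            return Ok(false);
        }
    }

    match fs::remove_file(group_path.join("owner")) {
        Ok(()) => Ok(true),
        // no owner file present, nothing to do
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}
```

since the group directory is never deleted here, the lock on it stays
valid for everyone else, at the cost of the empty directory staying
around.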

>>
>> (But in any case, all the current inline `lock_dir_noblock` calls should
>> instead go over a safe helper dealing with this properly...)
>