From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <g.goller@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id A746EAE2D
 for <pbs-devel@lists.proxmox.com>; Fri,  8 Sep 2023 14:56:19 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 90A7012159
 for <pbs-devel@lists.proxmox.com>; Fri,  8 Sep 2023 14:56:19 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Fri,  8 Sep 2023 14:56:18 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 4A7E843276
 for <pbs-devel@lists.proxmox.com>; Fri,  8 Sep 2023 14:56:18 +0200 (CEST)
Message-ID: <318f786b-a715-d99b-7819-1b25899bb6ac@proxmox.com>
Date: Fri, 8 Sep 2023 14:56:17 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.15.0
Content-Language: en-US
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
References: <20230905093747.54879-1-g.goller@proxmox.com>
 <482af171-fcb0-4dc6-b662-66d1511fc4a3@proxmox.com>
 <d014a127-fb73-63c5-6707-af897a82946d@proxmox.com>
 <4111ea6e-b478-488e-b954-d6d9f3add1d4@proxmox.com>
From: Gabriel Goller <g.goller@proxmox.com>
In-Reply-To: <4111ea6e-b478-488e-b954-d6d9f3add1d4@proxmox.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.319 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -1.473 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore
 vanished files when walking directory
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 08 Sep 2023 12:56:19 -0000


On 9/8/23 11:36, Thomas Lamprecht wrote:
> On 08/09/2023 09:41, Gabriel Goller wrote:
>>> that's some deep indentation level.. not a must, but maybe you find
>>> some good/simple way to refactor some of this to make it a bit less
>>> crowded here (if, then in a separate patch please)
>> I could check if `err` is an `io:Error`, thus returning early. Then calling `.unwrap()`
>> to get the actual `io::Error` later on.
> I re-checked the whole code and pushed a clean-up that reduces the
> indentation level and also code lines. Can you please re-check on that
> if to ensure I did not missed something by mistake?
>
> Oh, and while checking this out I noticed something odd, i.e., that we
> silence any error that isn't a io::Error one,  as of Walkdir::Error
> rustdocs [0] that can only happen if there's a (symlink) loop, but
> still. So I checked the history and it seems that I caused this way
> back in 2020 by the original addition of the "ignore lost+found" EPERM
> commit [1].
> But just making that an error again now might be a breaking change,
> depending on what effect symlink loops currently actually have on GC.
> Can you test this please? If it does makes GC endlessly loop then we
> can just introduce the error again, as it's now broken already anyway.
> But if walkdir breaks the loop, or the like, then we might just make
> that an log with warning level. As we do not enable WalkDir's opt-in
> follow_symlinks it might not matter after all as the Loop error
> shouldn't be triggered if symlinks aren't followed, but still good
> to test and be certain.
>
> [0]: https://rust.velas.com/walkdir/struct.Error.html#method.io_error
> [1]: c3b090ac ("backup: list images: handle walkdir error, catch "lost+found"")

We don't follow symbolic links when traversing files.
`walkdir` has a builder function `follow_links()` with which you can
select if you want to follow symbolic links or not, the default is no [0].

[0]: 
https://docs.rs/walkdir/latest/walkdir/struct.WalkDir.html#method.follow_links 

>>>> +                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
>>>> +                            // a user got file permissions wrong on datastore rsync to new server
>>>> +                            if err.depth() > 1 || !path.ends_with("lost+found") {
>>>> +                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
>>>> +                            }
>>>> +                        }
>>>> +                        io::ErrorKind::NotFound => {
>>>> +                            // ignore vanished file
>>> would be still good to log that here, at least at debug level
>>> if it can be noisy; but as there wasn't many that run into this
>>> in the four years of PBS existing I'd guess a always visible
>>> level is fine as long as the log message doesn't sounds scary.
>> How about a "ignoring/skipping vanished file: {path}" on the info log level?
> Yeah, something like that, but I'd use just one of "ignoring" or
> "skipping"
>
> Oh, and I would also mention in the commit message that backups that
> are just being created (where a atomic rename from e.g. a tmpdir to
> the actual one might also cause a ENOENT) are not problematic here,
> as GC is handling those explicitly anyway.
Submitted a new patch.