From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <g.goller@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id A5EE8C693
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Aug 2023 11:33:22 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 7E784BBA
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Aug 2023 11:32:52 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Aug 2023 11:32:51 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 7DDB2473DC
 for <pbs-devel@lists.proxmox.com>; Mon, 14 Aug 2023 11:32:51 +0200 (CEST)
Message-ID: <a352cc28-fa40-0b67-e5af-27615a0e563a@proxmox.com>
Date: Mon, 14 Aug 2023 11:32:50 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.1
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
References: <20230809101913.81818-1-g.goller@proxmox.com>
 <c7waqbj7iysa2qjmumsqtajd6sfh6fliknryy4njorg65t2tf7@nt3hanxfqw3u>
Content-Language: en-US
From: Gabriel Goller <g.goller@proxmox.com>
In-Reply-To: <c7waqbj7iysa2qjmumsqtajd6sfh6fliknryy4njorg65t2tf7@nt3hanxfqw3u>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.643 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -2.265 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH pathpatterns] match_list: added
 `matches_path()` function, which matches only the path
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Mon, 14 Aug 2023 09:33:22 -0000

On 8/11/23 10:26, Wolfgang Bumiller wrote:

> On Wed, Aug 09, 2023 at 12:19:12PM +0200, Gabriel Goller wrote:
>> Added `matches_path()` function, which only matches against the path and returns
>> an error if a file_mode pattern is found/needed in the matching list. This is
>> useful when we want to check if a file is excluded before running `stat()` on
>> the file to get the file_mode (which could fail).
>>
>> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
>> ---
>>   src/match_list.rs | 159 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 158 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/match_list.rs b/src/match_list.rs
>> index c5b14e0..acad328 100644
>> --- a/src/match_list.rs
>> +++ b/src/match_list.rs
>> @@ -1,6 +1,6 @@
>>   //! Helpers for include/exclude lists.
>> -
>>   use bitflags::bitflags;
>> +use std::fmt;
>>   
>>   use crate::PatternFlag;
>>   
>> @@ -39,6 +39,17 @@ impl Default for MatchFlag {
>>       }
>>   }
>>   
>> +#[derive(Debug, PartialEq)]
>> +pub struct FileModeRequiredForMatching;
> Let's shorten this to just `FileModeRequired` ;-)
Agree
>> +
>> +impl fmt::Display for FileModeRequiredForMatching {
>> +    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>> +        write!(f, "File mode is required for matching")
>> +    }
>> +}
>> +
>> +impl std::error::Error for FileModeRequiredForMatching {}
>> +
>>   /// A pattern entry. (Glob patterns or literal patterns.)
>>   // Note:
>>   // For regex we'd likely use the POSIX extended REs via `regexec(3)`, since we're targetting
>> @@ -304,12 +315,32 @@ impl MatchEntry {
>>   
>>           self.matches_path_exact(path)
>>       }
>> +
>> +    /// Check whether the path contains a matching suffix. Returns an error if a file mode is required.
>> +    pub fn matches_path<T: AsRef<[u8]>>(
>> +        &self,
>> +        path: T,
>> +    ) -> Result<bool, FileModeRequiredForMatching> {
>> +        self.matches_path_do(path.as_ref())
>> +    }
>> +
>> +    fn matches_path_do(&self, path: &[u8]) -> Result<bool, FileModeRequiredForMatching> {
>> +        if !self.flags.contains(MatchFlag::ANY_FILE_TYPE) {
>> +            return Err(FileModeRequiredForMatching);
>> +        }
>> +
>> +        Ok(self.matches_path_suffix_do(path))
>> +    }
>>   }
>>   
>>   #[doc(hidden)]
>>   pub trait MatchListEntry {
>>       fn entry_matches(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
>>       fn entry_matches_exact(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
>> +    fn entry_matches_path(
>> +        &self,
>> +        path: &[u8],
>> +    ) -> Result<Option<MatchType>, FileModeRequiredForMatching>;
>>   }
>>   
>>   impl MatchListEntry for &'_ MatchEntry {
>> @@ -328,6 +359,21 @@ impl MatchListEntry for &'_ MatchEntry {
>>               None
>>           }
>>       }
>> +
>> +    fn entry_matches_path(
>> +        &self,
>> +        path: &[u8],
>> +    ) -> Result<Option<MatchType>, FileModeRequiredForMatching> {
>> +        if let Ok(b) = self.matches_path(path) {
> This can just use `?`, it's the exact same error type after all.
> (Also `if let Ok` is generally best avoided since the 'else' branch
> discards the error, and if not, it's often a case like this where it can
> juse use '?' ;-) ).
>
> Effectively this could be as short as
>
>      Ok(self.matches_path(path)?.then(|| self.match_type()))
>
>> +            if b {
> As an additional hint: when you nest ifs around things you can match,
> you can just include both cases in the patterns:
>      match self.matches_path(path) {
>          Ok(true) => Ok(Some(self.match_type())),
>          Ok(false) => Ok(None),
>          Err(err) => Err(err),
>          // where this Err() case already tells you that you can use '?' instead
>      }
>
>> +                Ok(Some(self.match_type()))
>> +            } else {
>> +                Ok(None)
>> +            }
>> +        } else {
>> +            Err(FileModeRequiredForMatching)
>> +        }
>> +    }
>>   }
>>   
>>   impl MatchListEntry for &'_ &'_ MatchEntry {
>> @@ -346,6 +392,21 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
>>               None
>>           }
>>       }
>> +
>> +    fn entry_matches_path(
>> +        &self,
>> +        path: &[u8],
>> +    ) -> Result<Option<MatchType>, FileModeRequiredForMatching> {
> same
>
>> +        if let Ok(b) = self.matches_path(path) {
>> +            if b {
>> +                Ok(Some(self.match_type()))
>> +            } else {
>> +                Ok(None)
>> +            }
>> +        } else {
>> +            Err(FileModeRequiredForMatching)
>> +        }
>> +    }
>>   }
>>   
>>   /// This provides [`matches`](MatchList::matches) and [`matches_exact`](MatchList::matches_exact)
>> @@ -374,6 +435,20 @@ pub trait MatchList {
>>       }
>>   
>>       fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
>> +
>> +    /// Check whether this list contains anything exactly matching the path, returns error if
>> +    /// `file_mode` is required for exact matching.
>> +    fn matches_path<T: AsRef<[u8]>>(
>> +        &self,
>> +        path: T,
>> +    ) -> Result<Option<MatchType>, FileModeRequiredForMatching> {
>> +        self.matches_path_do(path.as_ref())
>> +    }
>> +
>> +    fn matches_path_do(
>> +        &self,
>> +        path: &[u8],
>> +    ) -> Result<Option<MatchType>, FileModeRequiredForMatching>;
>>   }
>>   
>>   impl<'a, T> MatchList for T
>> @@ -408,6 +483,24 @@ where
>>   
>>           None
>>       }
>> +
>> +    fn matches_path_do(
>> +        &self,
>> +        path: &[u8],
>> +    ) -> Result<Option<MatchType>, FileModeRequiredForMatching> {
> Given the amount of tiny match helpers we run through with those 2
> traits already I wonder if we should just make a breaking change here
> instead and only have the versions with the `Result` while users (or
> defaulted helpers in the trait) just pass `file_mode.unwrap_or(!0)`
> (since a mode of 0 should match anything ;-) ).
>
> But I don't have any strong feelings about this, so either way is fine
> with me.

Hmm, I don't get what you mean by the `versions with the Result`?
Do you mean we should just modify the `matches()` and `matches_exact()` 
function to
return a `Result` if the mode is `ANY_FILE_TYPE` and we need one (a file 
mode) to match?

>> +        // This is an &self method on a `T where T: 'a`.
>> +        let this: &'a Self = unsafe { std::mem::transmute(self) };
>> +
>> +        for m in this.into_iter().rev() {
>> +            if let Ok(mt) = m.entry_matches_path(path) {
> IIRC the intention was actually to immediately fail if we hit a pattern
> with a file mode, since we wouldn't be able to tell if it would already
> exclude the file, otherwise we really *could* just skip this entirely
> and have a failing stat() call just use `Some(!0)` as file mode.
>
> Basically, if the user runs into an inaccessible file they have to
> append `--exclude=/that/one/file` to the CLI invocation to fix it,
> whereas otherwise they'd still get an error.
>
> So this should just be
>
>      if let Some(mt) = m.entry_matches_path(path)? {
>          return Some(mt);
>      }

Hmm, my solution would have been to traverse all the patterns with a 
`ANY_FILE_TYPE` mode and
if there is a match, nice, return. If there is no match, check if there 
are other patterns without
the `ANY_FILE_TYPE` mode (then we need to fail because we don't know the 
`file_mode` yet...) or
return nothing if there are no patterns left. I don't know if this is 
the correct approach though...

>> +                if mt.is_some() {
>> +                    return Ok(mt);
>> +                }
>> +            }
>> +        }
>> +
>> +        Err(FileModeRequiredForMatching)
> Also this is wrong. If nothing matches, nothing matches ;-) (just like
> in the other matching variants).
> Which immediately tells you that skipping the error above doesn't make
> much sense :-) )
Yes, you are right, corrected it.