public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Lukas Wagner <l.wagner@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>,
	Philipp Hufnagl <p.hufnagl@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v4 1/3] fix #4315: jobs: modify GroupFilter so include/exclude is tracked
Date: Thu, 14 Dec 2023 17:22:47 +0100	[thread overview]
Message-ID: <1f3aa0a6-9cad-4e39-8ac3-5c75262d2f07@proxmox.com> (raw)
In-Reply-To: <20231204100414.152770-2-p.hufnagl@proxmox.com>

Hi Philipp,

some comments inline :)

On 12/4/23 11:04, Philipp Hufnagl wrote:
> After some discussion I canged the include/exclude behavior to first run
> all include filter and after that all exclude filter (rather then
> allowing to alternate inbetween). This is simply done by sorting the
> list (include first) before executing it.
> 
> Since a GroupFilter now also features an behavior, the Struct has been
> renamed To GroupType (since simply type is a keyword). The new
> GroupFilter now has a behaviour as a flag 'is_exclude'.
> 
> I considered calling it 'is_include' but a reader later then might not
> know what the opposite of 'include' is (do not include?  deactivate?). I
> also considered making a new enum 'behaviour' but since there are only 2
> values I considered it over engeneered.
> 
> Matching a filter will now iterate with a forech loop in order to also
> exclude matches.
> 
> Signed-off-by: Philipp Hufnagl <p.hufnagl@proxmox.com>
> ---
>   pbs-api-types/src/datastore.rs | 11 +++---
>   pbs-api-types/src/jobs.rs      | 64 +++++++++++++++++++++++++++-------
>   src/api2/pull.rs               | 11 +++++-
>   src/api2/tape/backup.rs        | 17 +++++++--
>   src/server/pull.rs             | 23 +++++++++---
>   5 files changed, 99 insertions(+), 27 deletions(-)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index d4ead1d1..c8f26b57 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -843,17 +843,16 @@ impl BackupGroup {
>       }
>   
>       pub fn matches(&self, filter: &crate::GroupFilter) -> bool {
> -        use crate::GroupFilter;
> -
> -        match filter {
> -            GroupFilter::Group(backup_group) => {
> +        use crate::FilterType;
> +        match &filter.filter_type {
> +            FilterType::Group(backup_group) => {
>                   match backup_group.parse::<BackupGroup>() {
>                       Ok(group) => *self == group,
>                       Err(_) => false, // shouldn't happen if value is schema-checked
>                   }
>               }
> -            GroupFilter::BackupType(ty) => self.ty == *ty,
> -            GroupFilter::Regex(regex) => regex.is_match(&self.to_string()),
> +            FilterType::BackupType(ty) => self.ty == *ty,
> +            FilterType::Regex(regex) => regex.is_match(&self.to_string()),
>           }
>       }
>   }
> diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
> index 1f5b3cf1..dff02395 100644
> --- a/pbs-api-types/src/jobs.rs
> +++ b/pbs-api-types/src/jobs.rs
> @@ -3,6 +3,7 @@ use std::str::FromStr;
>   
>   use regex::Regex;
>   use serde::{Deserialize, Serialize};
> +use std::cmp::Ordering;
>   
>   use proxmox_schema::*;
>   
> @@ -388,7 +389,7 @@ pub struct TapeBackupJobStatus {
>   
>   #[derive(Clone, Debug)]
>   /// Filter for matching `BackupGroup`s, for use with `BackupGroup::filter`.
> -pub enum GroupFilter {
> +pub enum FilterType {
>       /// BackupGroup type - either `vm`, `ct`, or `host`.
>       BackupType(BackupType),
>       /// Full identifier of BackupGroup, including type
> @@ -397,7 +398,7 @@ pub enum GroupFilter {
>       Regex(Regex),
>   }
>   
> -impl PartialEq for GroupFilter {
> +impl PartialEq for FilterType {
>       fn eq(&self, other: &Self) -> bool {
>           match (self, other) {
>               (Self::BackupType(a), Self::BackupType(b)) => a == b,
> @@ -408,27 +409,64 @@ impl PartialEq for GroupFilter {
>       }
>   }
>   
> +#[derive(Clone, Debug)]
> +pub struct GroupFilter {
> +    pub is_exclude: bool,
> +    pub filter_type: FilterType,
> +}
> +
> +impl PartialEq for GroupFilter {
> +    fn eq(&self, other: &Self) -> bool {
> +        self.filter_type == other.filter_type && self.is_exclude == other.is_exclude
> +    }
> +} > +
> +impl Eq for GroupFilter {}
> +
> +impl PartialOrd for GroupFilter {
> +    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
> +        self.is_exclude.partial_cmp(&other.is_exclude)
> +    }
> + > +
> +impl Ord for GroupFilter {
> +    fn cmp(&self, other: &Self) -> Ordering {
> +        self.is_exclude.cmp(&other.is_exclude)
> +    }
> +}

Having Ord/ParitalOrd based on the exclude flag is extremely confusing, 
please don't do this.

See later comments for a less confusing way.

> +
>   impl std::str::FromStr for GroupFilter {
>       type Err = anyhow::Error;
>   
>       fn from_str(s: &str) -> Result<Self, Self::Err> {
> -        match s.split_once(':') {
> -            Some(("group", value)) => BACKUP_GROUP_SCHEMA.parse_simple_value(value).map(|_| GroupFilter::Group(value.to_string())),
> -            Some(("type", value)) => Ok(GroupFilter::BackupType(value.parse()?)),
> -            Some(("regex", value)) => Ok(GroupFilter::Regex(Regex::new(value)?)),
> +        let (is_exclude, type_str) = match s.split_once(':') {
> +            Some(("include", value)) => (false, value),
> +            Some(("exclude", value)) => (true, value),
> +            _ => (false, s),
> +        };
> +
> +        let filter_type = match type_str.split_once(':') {
> +            Some(("group", value)) => BACKUP_GROUP_SCHEMA.parse_simple_value(value).map(|_| FilterType::Group(value.to_string())),
> +            Some(("type", value)) => Ok(FilterType::BackupType(value.parse()?)),
> +            Some(("regex", value)) => Ok(FilterType::Regex(Regex::new(value)?)),
>               Some((ty, _value)) => Err(format_err!("expected 'group', 'type' or 'regex' prefix, got '{}'", ty)),
>               None => Err(format_err!("input doesn't match expected format '<group:GROUP||type:<vm|ct|host>|regex:REGEX>'")),
> -        }.map_err(|err| format_err!("'{}' - {}", s, err))
> +        }?;

Is there a reason why you change the error format here?


> +        Ok(GroupFilter {
> +            is_exclude,
> +            filter_type,
> +        })
>       }
>   }
>   
>   // used for serializing below, caution!
>   impl std::fmt::Display for GroupFilter {
>       fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> -        match self {
> -            GroupFilter::BackupType(backup_type) => write!(f, "type:{}", backup_type),
> -            GroupFilter::Group(backup_group) => write!(f, "group:{}", backup_group),
> -            GroupFilter::Regex(regex) => write!(f, "regex:{}", regex.as_str()),
> +        let exclude = if self.is_exclude { "exclude:" } else { "" };
> +        match &self.filter_type {
> +            FilterType::BackupType(backup_type) => write!(f, "{}type:{}", exclude, backup_type),
> +            FilterType::Group(backup_group) => write!(f, "{}group:{}", exclude, backup_group),
> +            FilterType::Regex(regex) => write!(f, "{}regex:{}", exclude, regex.as_str()),
>           }
>       }
>   }
> @@ -441,9 +479,9 @@ fn verify_group_filter(input: &str) -> Result<(), anyhow::Error> {
>   }
>   
>   pub const GROUP_FILTER_SCHEMA: Schema = StringSchema::new(
> -    "Group filter based on group identifier ('group:GROUP'), group type ('type:<vm|ct|host>'), or regex ('regex:RE').")
> +    "Group filter based on group identifier ('group:GROUP'), group type ('type:<vm|ct|host>'), or regex ('regex:RE'). Can be inverted by adding 'exclude:' before.")

'adding ... before' sounds a bit odd - maybe "Can be inverted by 
prepending 'exclude:'" would be better here?
Also 'include' is not documented here.

>       .format(&ApiStringFormat::VerifyFn(verify_group_filter))
> -    .type_text("<type:<vm|ct|host>|group:GROUP|regex:RE>")
> +    .type_text("[<exclude:|include:>]<type:<vm|ct|host>|group:GROUP|regex:RE>")
>       .schema();
>   
>   pub const GROUP_FILTER_LIST_SCHEMA: Schema =
> diff --git a/src/api2/pull.rs b/src/api2/pull.rs
> index eb9a2199..f174926c 100644
> --- a/src/api2/pull.rs
> +++ b/src/api2/pull.rs
> @@ -72,6 +72,15 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
>       type Error = Error;
>   
>       fn try_from(sync_job: &SyncJobConfig) -> Result<Self, Self::Error> {
> +        let filters = match &sync_job.group_filter {
> +            Some(v) => {
> +                let mut f = v.clone();
> +                f.sort();
> +                Some(f)
> +            }
> +            None => None,
> +        };
> +

I don't think that .sort()'ing is a good way to separate include/exclude 
groups. PartialEq/PartialOrd/Ord being only based on the exclude flag is 
extremely confusing.

Rather split the GroupFilter into two groups manually via a helper 
(since you need to do it in multiple places), based on the exclude flag.
Then, first process the includes and subtract the excludes afterwards.
I'd do that at [1].


>           PullParameters::new(
>               &sync_job.store,
>               sync_job.ns.clone().unwrap_or_default(),
> @@ -85,7 +94,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
>                   .clone(),
>               sync_job.remove_vanished,
>               sync_job.max_depth,
> -            sync_job.group_filter.clone(),
> +            filters,
>               sync_job.limit.clone(),
>               sync_job.transfer_last,
>           )
> diff --git a/src/api2/tape/backup.rs b/src/api2/tape/backup.rs
> index 2f9385a7..80dcdd1d 100644
> --- a/src/api2/tape/backup.rs
> +++ b/src/api2/tape/backup.rs
> @@ -412,14 +412,25 @@ fn backup_worker(
>       group_list.sort_unstable_by(|a, b| a.group().cmp(b.group()));
>   
>       let (group_list, group_count) = if let Some(group_filters) = &setup.group_filter {
> -        let filter_fn = |group: &BackupGroup, group_filters: &[GroupFilter]| {
> -            group_filters.iter().any(|filter| group.matches(filter))
> +        let filter_fn = |group: &BackupGroup, group_filters: &[GroupFilter], start_with: bool| {
> +            let mut is_match = start_with;
> +            for filter in group_filters.iter() {
I think calling .iter() is not necessary here.

> +                if group.matches(filter) {
> +                    is_match = !filter.is_exclude;
> +                }
> +            }
> +            is_match
>           };
>   
>           let group_count_full = group_list.len();
> +        // if there are only exclude filter, inculude everything

Typo in 'include'
> +        let mut include_all = false;
> +        if !group_filters.is_empty() || group_filters.first().unwrap().is_exclude {
> +            include_all = true;
> +        }
I think the logic is off here.

If group_filters only includes INCLUDE filters, we only want to include 
those groups.

So .is_empty() returns false and we invert that, we set include_all to 
true... which is not what we want.

Just to illustrate the different cases:
   - no filters: All groups
   - only include filters: ONLY the included ones
   - only exclude filters: ALL BUT the excluded ones
   - both: ONLY the included ones, minus the excluded ones

  ----

[1]: I would split the GroupFilters into includes/excludes here.

>           let list: Vec<BackupGroup> = group_list
>               .into_iter()
> -            .filter(|group| filter_fn(group, group_filters))
> +            .filter(|group| filter_fn(group, group_filters, include_all))
>               .collect();
>           let group_count = list.len();
>           task_log!(
> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 3b71c156..027194a1 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -1368,15 +1368,26 @@ pub(crate) async fn pull_ns(
>           }
>       });
>   
> -    let apply_filters = |group: &BackupGroup, filters: &[GroupFilter]| -> bool {
> -        filters.iter().any(|filter| group.matches(filter))
> +    let apply_filters = |group: &BackupGroup, filters: &[GroupFilter], start_with: bool| -> bool {
> +        let mut is_match = start_with;
> +        for filter in filters.iter() {
> +            if group.matches(filter) {
> +                is_match = !filter.is_exclude;
> +            }
> +        }
> +        is_match
>       };
>   
>       let list = if let Some(ref group_filter) = &params.group_filter {
> +        // if there are only exclude filter, inculude everything
> +        let mut include_all = false;
> +        if !group_filter.is_empty() || group_filter.first().unwrap().is_exclude {
> +            include_all = true;
> +        }

Same logic error here.

>           let unfiltered_count = list.len();
>           let list: Vec<BackupGroup> = list
>               .into_iter()
> -            .filter(|group| apply_filters(group, group_filter))
> +            .filter(|group| apply_filters(group, group_filter, include_all))
>               .collect();
>           task_log!(
>               worker,
> @@ -1458,7 +1469,11 @@ pub(crate) async fn pull_ns(
>                       continue;
>                   }
>                   if let Some(ref group_filter) = &params.group_filter {
> -                    if !apply_filters(local_group, group_filter) {
> +                    let mut include_all = false;
> +                    if !group_filter.is_empty() || group_filter.first().unwrap().is_exclude {
> +                        include_all = true;
> +                    }

Same logic error here.
> +                    if !apply_filters(local_group, group_filter, include_all) {
>                           continue;
>                       }
>                   }

-- 
- Lukas




  reply	other threads:[~2023-12-14 16:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04 10:04 [pbs-devel] [PATCH proxmox-backup v4 0/3] fix #4315: datastore: Exclude entries from sync Philipp Hufnagl
2023-12-04 10:04 ` [pbs-devel] [PATCH proxmox-backup v4 1/3] fix #4315: jobs: modify GroupFilter so include/exclude is tracked Philipp Hufnagl
2023-12-14 16:22   ` Lukas Wagner [this message]
2023-12-15  9:44   ` Lukas Wagner
2023-12-15  9:47     ` Philipp Hufnagl
2023-12-04 10:04 ` [pbs-devel] [PATCH proxmox-backup v4 2/3] ui: Show if Filter includes or excludes Philipp Hufnagl
2023-12-04 10:04 ` [pbs-devel] [PATCH proxmox-backup v4 3/3] docs: document new include/exclude paramenter Philipp Hufnagl
2023-12-14 16:22 ` [pbs-devel] [PATCH proxmox-backup v4 0/3] fix #4315: datastore: Exclude entries from sync Lukas Wagner
2023-12-15  8:45   ` Philipp Hufnagl
  -- strict thread matches above, loose matches on Subject: below --
2023-11-28 14:34 Philipp Hufnagl
2023-11-28 14:34 ` [pbs-devel] [PATCH proxmox-backup v4 1/3] fix #4315: jobs: modify GroupFilter so include/exclude is tracked Philipp Hufnagl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1f3aa0a6-9cad-4e39-8ac3-5c75262d2f07@proxmox.com \
    --to=l.wagner@proxmox.com \
    --cc=p.hufnagl@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal