From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <s.ivanov@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 4ABFB9380B
 for <pmg-devel@lists.proxmox.com>; Tue, 20 Feb 2024 12:10:40 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 2BCA25A73
 for <pmg-devel@lists.proxmox.com>; Tue, 20 Feb 2024 12:10:40 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pmg-devel@lists.proxmox.com>; Tue, 20 Feb 2024 12:10:39 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 1DD4E43EE5
 for <pmg-devel@lists.proxmox.com>; Tue, 20 Feb 2024 12:10:39 +0100 (CET)
Date: Tue, 20 Feb 2024 12:10:35 +0100
From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>
Cc: pmg-devel@lists.proxmox.com
Message-ID: <20240220121035.5b7f6889@rosa.proxmox.com>
In-Reply-To: <20240209125440.2572239-4-d.csapak@proxmox.com>
References: <20240209125440.2572239-1-d.csapak@proxmox.com>
 <20240209125440.2572239-4-d.csapak@proxmox.com>
X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.086 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [rulecache.pm, remove.pm]
Subject: Re: [pmg-devel] [PATCH pmg-api 03/12] RuleCache: reorganize how we
 gather marks and spaminfo
X-BeenThere: pmg-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Mail Gateway development discussion
 <pmg-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pmg-devel>, 
 <mailto:pmg-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pmg-devel/>
List-Post: <mailto:pmg-devel@lists.proxmox.com>
List-Help: <mailto:pmg-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pmg-devel>, 
 <mailto:pmg-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 20 Feb 2024 11:10:40 -0000

On Fri,  9 Feb 2024 13:54:27 +0100
Dominik Csapak <d.csapak@proxmox.com> wrote:

> instead of collecting the spaminfo (+match) seperately, collect this
> per target together with the regular marks. With this, we can omit the
> 'global' marks list, since each target has their own anyway.
> 
> We want this, since when we'll implement and/invert for matches, the marks
> can differ between targets, since the spamlevel can diverge for them and
> that can be and-combined with objects that add marks. For that to be
> possible we have to save each match + info per target instead of
> globally.
> 
> Since we don't change the actual matching behaviour with this patch,
> for the remove action, we can simply use the marks from the first target
> (as they currently have to be identical).
I don't think this premise holds - or rather the reasoning seems a bit off?

* marks are generated by what_match
* global (not per-part) matches are virus and spam - these mark with an
  empty array-ref [], indicating that they affect the whole mail
* per-part what_match implementations are MatchField and the
  content-type/filename matches - they add a list of all parts they match
* the only what_match that might differ per user/target is the spam-match,
  which marks the complete mail

So marks are identical per rule across all targets: the only place where
they could differ just pushes the contents of an empty array to the list.

(sorry if this sounds a bit pedantic - but it sadly took me 30 minutes
with Data::Dumper to get my head around this)
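
To illustrate the point above, a minimal standalone sketch (not the actual
PMG code - the addresses, part numbers and variable names are made up, and
it assumes a perl with postfix dereferencing, as the patch does):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for what_match results: per-part matchers return
# the list of matching parts, whole-mail matchers (virus, spam) return [].
my $part_match = [1, 3];    # e.g. a MatchField hit on parts 1 and 3
my $spam_match = [];        # spam level reached, affects the whole mail

my %marks;
for my $target (qw(a@example.com b@example.com)) {
    push $marks{$target}->@*, $part_match->@*;
    # assumption: only the first target reaches its spam level
    push $marks{$target}->@*, $spam_match->@* if $target eq 'a@example.com';
}

# pushing the contents of [] adds nothing, so both lists are identical
print join(',', $marks{$_}->@*), "\n" for sort keys %marks;
```

Both targets end up with the marks 1,3, although only one of them matched
the spam level.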

> 
> Conversely, we currently save the spaminfo per target, but later in
> pmg-smtp-filter we only ever use the first one we encounter, so instead
> save it only the first time and use that.
With this patch we get the spaminfo as part of the hashref returned by
RuleCache::what_match, next to the only other member 'targets'.
Maybe we could return it as a second value from what_match instead and save
ourselves the second level of nesting (see inline).
Please disregard this if it becomes obsolete with one of the later patches.
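
Roughly what I have in mind, as an untested sketch - what_match_sketch is a
made-up, heavily simplified stand-in for RuleCache::what_match, and the
'score' key in the spaminfo is only a placeholder:

```perl
use strict;
use warnings;

# Hypothetical stand-in: $target_info mimics the per-target hash a
# what_match_targets call might return.
sub what_match_sketch {
    my ($target_info) = @_;

    my ($res, $spaminfo);
    for my $k (sort keys $target_info->%*) {
        # marks live directly below the target, no {targets} level needed
        push $res->{$k}->{marks}->@*, $target_info->{$k}->{marks}->@*;
        # only keep the first spaminfo we encounter
        $spaminfo //= $target_info->{$k}->{spaminfo};
    }
    return ($res, $spaminfo);
}

my ($marks, $spaminfo) = what_match_sketch({
    'a@example.com' => { marks => [], spaminfo => { score => 7 } },
    'b@example.com' => { marks => [] },
});
```

The caller would then keep $marks->{$target}->{marks} as before and pass
$spaminfo on separately.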

> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  src/PMG/RuleCache.pm     | 32 ++++++++++----------------------
>  src/PMG/RuleDB/Remove.pm | 19 +++++++++++++++----
>  src/bin/pmg-smtp-filter  | 18 +++++-------------
>  3 files changed, 30 insertions(+), 39 deletions(-)
> 
> diff --git a/src/PMG/RuleCache.pm b/src/PMG/RuleCache.pm
> index fd22a16..4f7ebe7 100644
> --- a/src/PMG/RuleCache.pm
> +++ b/src/PMG/RuleCache.pm
> @@ -304,37 +304,25 @@ sub what_match {
>      if (scalar($what->{groups}->@*) == 0) {
>  	# match all targets
>  	foreach my $target (@{$msginfo->{targets}}) {
> -	    $res->{$target}->{marks} = [];
> +	    $res->{targets}->{$target}->{marks} = [];
here this could become $res->{$target}->{marks}
>  	}
> -
> -	$res->{marks} = [];
>  	return $res;
>      }
>  
> -    my $marks;
> -
>      for my $group ($what->{groups}->@*) {
>  	for my $obj ($group->{objects}->@*) {
>  	    if (!$obj->can('what_match_targets')) {
>  		if (my $match = $obj->what_match($queue, $element, $msginfo, $dbh)) {
> -		    push @$marks, @$match;
> +		    for my $target ($msginfo->{targets}->@*) {
> +			push $res->{targets}->{$target}->{marks}->@*, $match->@*;
here as well

> +		    }
>  		}
> -	    }
> -	}
> -    }
> -
> -    foreach my $target (@{$msginfo->{targets}}) {
> -	$res->{$target}->{marks} = $marks;
> -	$res->{marks} = $marks;
> -    }
> -
> -    for my $group ($what->{groups}->@*) {
> -	for my $obj ($group->{objects}->@*) {
> -	    if ($obj->can ("what_match_targets")) {
> -		my $target_info;
> -		if ($target_info = $obj->what_match_targets($queue, $element, $msginfo, $dbh)) {
> -		    foreach my $k (keys %$target_info) {
> -			$res->{$k} = $target_info->{$k};
> +	    } else {
> +		if (my $target_info = $obj->what_match_targets($queue, $element, $msginfo, $dbh)) {
> +		    foreach my $k (keys $target_info->%*) {
> +			push $res->{targets}->{$k}->{marks}->@*, $target_info->{$k}->{marks}->@*;
and here
> +			# only save spaminfo once
> +			$res->{spaminfo} = $target_info->{$k}->{spaminfo} if !defined($res->{spaminfo});
this would need to be changed (and returned as second value below)

>  		    }
>  		}
>  	    }
> diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
> index e7c353c..5812602 100644
> --- a/src/PMG/RuleDB/Remove.pm
> +++ b/src/PMG/RuleDB/Remove.pm
> @@ -198,9 +198,15 @@ sub execute {
>  
>      my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>  
> -    if (!$self->{all} && ($#$marks == -1)) {
> -	# no marks
> -	return;
> +    if (!$self->{all}) {
> +	my $found_mark = 0;
> +	for my $target (keys $marks->{targets}->%*) {
> +	    if (scalar($marks->{targets}->{$target}->{marks}->@*) > 0) {
> +		$found_mark = 1;
> +		last;
> +	    }
> +	}
> +	return if !$found_mark;
>      }
>  
>      my $subgroups = $mod_group->subgroups ($targets);
> @@ -256,7 +262,12 @@ sub execute {
>  	}
>  
>  	$self->{message_seen} = 0;
> -	$self->delete_marked_parts($queue, $entity, $html, $rtype, $marks, $rulename);
> +
> +	# since all matches are or combinded, marks for all targets must be the same if they exist
> +	# so simply use the first one here
maybe "since currently all marks are equal for all targets, use the first
one"?
> +	my $match_marks = $marks->{targets}->{$tg->[0]}->{marks};
> +
> +	$self->delete_marked_parts($queue, $entity, $html, $rtype, $match_marks, $rulename);
>  	delete $self->{message_seen};
>  
>  	if ($msginfo->{testmode}) {
> diff --git a/src/bin/pmg-smtp-filter b/src/bin/pmg-smtp-filter
> index 7da3de8..71043b0 100755
> --- a/src/bin/pmg-smtp-filter
> +++ b/src/bin/pmg-smtp-filter
> @@ -276,8 +276,9 @@ sub apply_rules {
>  	foreach my $target (@{$msginfo->{targets}}) {
>  	    next if $final->{$target};
>  	    next if !defined ($rule_marks{$rule->{id}});
> -	    next if !defined ($rule_marks{$rule->{id}}->{$target});
> -	    next if !defined ($rule_marks{$rule->{id}}->{$target}->{marks});
> +	    next if !defined ($rule_marks{$rule->{id}}->{targets});
here you could get rid of this line if what_match returns the spaminfo as
a second value.

> +	    next if !defined ($rule_marks{$rule->{id}}->{targets}->{$target});
> +	    next if !defined ($rule_marks{$rule->{id}}->{targets}->{$target}->{marks});
and here get rid of {targets}->
>  	    next if !$rulecache->to_match ($rule->{id}, $target, $ldap);
>  
>  	    $final->{$target} = $fin;
> @@ -320,24 +321,15 @@ sub apply_rules {
>  	my $targets = $rule_targets{$rule->{id}};
>  	next if !$targets;
>  
> -	my $spaminfo;
> -	foreach my $t (@$targets) {
> -	    if ($rule_marks{$rule->{id}}->{$t} && $rule_marks{$rule->{id}}->{$t}->{spaminfo}) {
> -		$spaminfo = $rule_marks{$rule->{id}}->{$t}->{spaminfo};
> -		# we assume spam info is the same for all matching targets
> -		last;
> -	    }
> -	}
> -
>  	my $vars = $self->get_prox_vars (
> -	    $queue, $entity, $msginfo, $rule, $rule_targets{$rule->{id}}, $spaminfo);
> +	    $queue, $entity, $msginfo, $rule, $rule_targets{$rule->{id}}, $rule_marks{$rule->{id}}->{spaminfo});
>  
>  	my @sorted_actions = sort {$a->priority <=> $b->priority} @{$rule_actions{$rule->{id}}};
>  
>  	foreach my $action (@sorted_actions) {
>  	    $action->execute(
>  		$queue, $self->{ruledb}, $mod_group, $rule_targets{$rule->{id}}, $msginfo, $vars,
> -		$rule_marks{$rule->{id}}->{marks}, $ldap
> +		$rule_marks{$rule->{id}}, $ldap
>  	    );
>  	    last if $action->final;
>  	}