public inbox for pmg-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Dominik Csapak <d.csapak@proxmox.com>
To: Stoiko Ivanov <s.ivanov@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Mon, 14 Nov 2022 15:36:13 +0100	[thread overview]
Message-ID: <d9b5e2ba-6e1c-5cb2-4564-22e685b9d8f2@proxmox.com> (raw)
In-Reply-To: <20221109182728.629576-4-s.ivanov@proxmox.com>

misin

comments inline:

On 11/9/22 19:27, Stoiko Ivanov wrote:
> This patch adds support for storing rule names, comments(info), and
> most relevant values (e.g. the header content to match) in utf-8 in
> the database.
> 
> backwards-compatibility should not be an issue:
> * following the argumentation from commit
>    43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
> * we only need to consider that the valid multibyte utf-8 characters
>    do not really yield sensible combinations of single-byte characters
>    (starting with a byte > 127 - e.g. "£")
> 
> the database is created with SQL_ASCII encoding - which behaves by
> interpreting bytes <= 127 as ascii and those > 127 are not interpreted
> (see [0], which just means that we have to explicitly en-/decode upon
> storing/reading from there)
> 
> This patch currently omits most Who objects:
> * for email/domain we'd still need to consider how to store them
>    (puny-code for the domain part, or everything as UTF-8) and it would
>    need changes to the API-types.
> * the LDAP objects currently would not work too well, since our LDAPCache
>    is not UTF-8 safe - and fixing warants its own patch-series
> * WhoRegex should work and be able to handle many use-cases
> 
> The ContentType values should also contain only ascii characters per
> RFC6838 [1] and RFC2045 [2].
> 
> [0] https://www.postgresql.org/docs/13/multibyte.html
> [1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
> [2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
> 
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>   src/PMG/RuleDB.pm               | 23 ++++++++++++++++-------
>   src/PMG/RuleDB/Accept.pm        |  2 +-
>   src/PMG/RuleDB/BCC.pm           |  2 +-
>   src/PMG/RuleDB/Block.pm         |  2 +-
>   src/PMG/RuleDB/Disclaimer.pm    |  2 +-
>   src/PMG/RuleDB/Group.pm         |  4 ++--
>   src/PMG/RuleDB/MatchField.pm    |  6 +++++-
>   src/PMG/RuleDB/MatchFilename.pm |  5 ++++-
>   src/PMG/RuleDB/ModField.pm      |  6 ++++--
>   src/PMG/RuleDB/Notify.pm        |  2 +-
>   src/PMG/RuleDB/Quarantine.pm    |  3 ++-
>   src/PMG/RuleDB/Remove.pm        | 12 +++++++-----
>   src/PMG/RuleDB/Rule.pm          |  2 +-
>   src/PMG/RuleDB/WhoRegex.pm      |  5 ++++-
>   src/PMG/Utils.pm                |  5 +++++
>   15 files changed, 55 insertions(+), 26 deletions(-)
> 
> diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
> index 895acc6..9d6d99d 100644
> --- a/src/PMG/RuleDB.pm
> +++ b/src/PMG/RuleDB.pm
> @@ -5,6 +5,7 @@ use warnings;
>   use DBI;
>   use HTML::Entities;
>   use Data::Dumper;
> +use Encode qw(encode decode);
>   
>   use PVE::SafeSyslog;
>   
> @@ -72,7 +73,8 @@ sub create_group_with_obj {
>   
>       $name //= '';
>       $info //= '';
> -
> +    $name = encode('UTF-8', $name);
> +    $info = encode('UTF-8',$info);

nit: spacing and we could combine with the lines above:

$name = encode('UTF-8', $name // '');
...


>       eval {
>   
>   	$self->{dbh}->begin_work;
> @@ -174,7 +176,9 @@ sub save_group {
>   	$self->{dbh}->do("UPDATE Objectgroup " .
>   			 "SET Name = ?, Info = ? " .
>   			 "WHERE ID = ?", undef,
> -			 $og->{name}, $og->{info}, $og->{id});
> +			 encode('UTF-8', $og->{name}),
> +			 encode('UTF-8', $og->{info}),
> +			 $og->{id});
>   
>   	return $og->{id};
>   
> @@ -183,7 +187,7 @@ sub save_group {
>   	    "INSERT INTO Objectgroup (Name, Info, Class) " .
>   	    "VALUES (?, ?, ?);");
>   
> -	$sth->execute($og->name, $og->info, $og->class);
> +	$sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
>   
>   	return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
>       }
> @@ -212,7 +216,9 @@ sub delete_group {
>   	$sth->execute($groupid);
>   
>   	if (my $ref = $sth->fetchrow_hashref()) {
> -	    die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
> +	    my $groupname = PMG::Utils::try_deocode_utf8($ref->{groupname});
> +	    my $rulename = PMG::Utils::try_deocode_utf8($ref->{rulename});

typo: deocode vs decode

> +	    die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
>   	}
>   
>           $sth->finish();
> @@ -474,6 +480,7 @@ sub load_object_full {
>   sub load_group_by_name {
>       my ($self, $name) = @_;
>   
> +    $name = PMG::Utils::try_decode_utf8($name);

isn't this wrong?
we encode the name before writing into the database, so we nave to
encode it again when getting it out of the db, no?

otoh, after a short grep, we never call 'load_group_by_name' anyway,
so we could just delete

>       my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
>   				    "WHERE name = ?");
>   
> @@ -598,13 +605,14 @@ sub save_rule {
>       defined($rule->{direction}) ||
>   	die "undefined rule attribute - direction: ERROR";
>   
> +    my $rulename = encode('UTF-8', $rule->{name});
>       if (defined($rule->{id})) {
>   
>   	$self->{dbh}->do(
>   	    "UPDATE Rule " .
>   	    "SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
>   	    "WHERE ID = ?", undef,
> -	    $rule->{name}, $rule->{priority}, $rule->{active},
> +	    $rulename, $rule->{priority}, $rule->{active},
>   	    $rule->{direction}, $rule->{id});
>   
>   	return $rule->{id};
> @@ -614,7 +622,7 @@ sub save_rule {
>   	    "INSERT INTO Rule (Name, Priority, Active, Direction) " .
>   	    "VALUES (?, ?, ?, ?);");
>   
> -	$sth->execute($rule->name, $rule->priority, $rule->active,
> +	$sth->execute($rulename, $rule->priority, $rule->active,
>   		      $rule->direction);
>   
>   	return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
> @@ -779,7 +787,8 @@ sub load_rules {
>       $sth->execute();
>   
>       while (my $ref = $sth->fetchrow_hashref()) {
> -	my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
> +	my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
> +	my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
>   					  $ref->{active}, $ref->{direction});
>   	$rule->{id} = $ref->{id};
>   	push @$rules, $rule;
> diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
> index cd67ea2..4ebd6da 100644
> --- a/src/PMG/RuleDB/Accept.pm
> +++ b/src/PMG/RuleDB/Accept.pm
> @@ -93,7 +93,7 @@ sub execute {
>       my $dkim = $msginfo->{dkim} // {};
>       my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       foreach my $ta (@$subgroups) {
>   	my ($tg, $entity) = (@$ta[0], @$ta[1]);
> diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
> index d364690..c1225f3 100644
> --- a/src/PMG/RuleDB/BCC.pm
> +++ b/src/PMG/RuleDB/BCC.pm
> @@ -115,7 +115,7 @@ sub execute {
>   
>       my $subgroups = $mod_group->subgroups($targets, 1);
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
>   
> diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
> index c758787..25bb74e 100644
> --- a/src/PMG/RuleDB/Block.pm
> +++ b/src/PMG/RuleDB/Block.pm
> @@ -89,7 +89,7 @@ sub execute {
>       my ($self, $queue, $ruledb, $mod_group, $targets,
>   	$msginfo, $vars, $marks) = @_;
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       if ($msginfo->{testmode}) {
>   	my $fh = $msginfo->{test_fh};
> diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
> index d3003b2..c6afe54 100644
> --- a/src/PMG/RuleDB/Disclaimer.pm
> +++ b/src/PMG/RuleDB/Disclaimer.pm
> @@ -193,7 +193,7 @@ sub execute {
>       my ($self, $queue, $ruledb, $mod_group, $targets,
>   	$msginfo, $vars, $marks) = @_;
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       my $subgroups = $mod_group->subgroups($targets);
>   
> diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
> index 2508305..baa68ce 100644
> --- a/src/PMG/RuleDB/Group.pm
> +++ b/src/PMG/RuleDB/Group.pm
> @@ -12,8 +12,8 @@ sub new {
>       my ($type, $name, $info, $class) = @_;
>   
>       my $self = {
> -	name => $name,
> -	info => $info,
> +	name => PMG::Utils::try_decode_utf8($name),
> +	info => PMG::Utils::try_decode_utf8($info),
>   	class => $class,
>       };
>   
> diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
> index 2671ea4..8246e6e 100644
> --- a/src/PMG/RuleDB/MatchField.pm
> +++ b/src/PMG/RuleDB/MatchField.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(decode encode);

again here

>   use MIME::Words;
>   
>   use PVE::SafeSyslog;
> @@ -50,9 +51,10 @@ sub load_attr {
>       defined($field) || die "undefined object attribute: ERROR";
>       defined($field_value) || die "undefined object attribute: ERROR";
>   
> +    my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
>       # use known constructor, bless afterwards (because sub class can have constructor
>       # with other parameter signature).
> -    my $obj =  PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
> +    my $obj =  PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
>       bless $obj, $class;
>   
>       $obj->{id} = $id;
> @@ -69,6 +71,7 @@ sub save {
>   
>       my $new_value = "$self->{field}:$self->{field_value}";
>       $new_value =~ s/\\/\\\\/g;
> +    $new_value = encode('UTF-8', $new_value);
>   
>       if (defined ($self->{id})) {
>   	# update
> @@ -106,6 +109,7 @@ sub parse_entity {
>   	    chomp $value;
>   
>   	    my $decvalue = MIME::Words::decode_mimewords($value);
> +	    $decvalue = PMG::Utils::try_decode_utf8($decvalue);

doesn't that do utf8 decoding already? we have the mail here not the database
object...

>   
>   	    if ($decvalue =~ m|$self->{field_value}|i) {
>   		push @$res, $id;
> diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
> index 7e5b486..06bf931 100644
> --- a/src/PMG/RuleDB/MatchFilename.pm
> +++ b/src/PMG/RuleDB/MatchFilename.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(encode decode);
>   use MIME::Words;
>   
>   use PMG::Utils;
> @@ -41,8 +42,9 @@ sub load_attr {
>       my $class = ref($type) || $type;
>   
>       defined($value) || die "undefined value: ERROR";;
> +    my $decvalue = PMG::Utils::try_decode_utf8($value);
>   
> -    my $obj = $class->new($value, $ogroup);
> +    my $obj = $class->new($decvalue, $ogroup);
>       $obj->{id} = $id;
>   
>       $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
> @@ -57,6 +59,7 @@ sub save {
>   
>       my $new_value = $self->{fname};
>       $new_value =~ s/\\/\\\\/g;
> +    $new_value = encode('UTF-8', $new_value);
>   
>       if (defined($self->{id})) {
>   	# update
> diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
> index fb15076..1e1727f 100644
> --- a/src/PMG/RuleDB/ModField.pm
> +++ b/src/PMG/RuleDB/ModField.pm
> @@ -57,7 +57,9 @@ sub load_attr {
>   
>       (defined($field) && defined($field_value)) || return undef;
>   
> -    my $obj = $class->new($field, $field_value, $ogroup);
> +    my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
> +
> +    my $obj = $class->new($field, $dec_field_value, $ogroup);
>       $obj->{id} = $id;
>   
>       $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
> @@ -70,7 +72,7 @@ sub save {
>   
>       defined($self->{ogroup}) || return undef;
>   
> -    my $new_value = "$self->{field}:$self->{field_value}";
> +    my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
>   
>       if (defined ($self->{id})) {
>   	# update
> diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
> index af853a3..bca5ebf 100644
> --- a/src/PMG/RuleDB/Notify.pm
> +++ b/src/PMG/RuleDB/Notify.pm
> @@ -208,7 +208,7 @@ sub execute {
>   :bd
>       my $from = 'postmaster';
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       my $body = PMG::Utils::subst_values($self->{body}, $vars);
>       my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
> diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
> index 1426393..30bc5ec 100644
> --- a/src/PMG/RuleDB/Quarantine.pm
> +++ b/src/PMG/RuleDB/Quarantine.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(decode encode);

why decode too?

>   
>   use PVE::SafeSyslog;
>   
> @@ -89,7 +90,7 @@ sub execute {
>       
>       my $subgroups = $mod_group->subgroups($targets, 1);
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       foreach my $ta (@$subgroups) {
>   	my ($tg, $entity) = (@$ta[0], @$ta[1]);
> diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
> index 6b27b91..da6c25f 100644
> --- a/src/PMG/RuleDB/Remove.pm
> +++ b/src/PMG/RuleDB/Remove.pm
> @@ -63,12 +63,14 @@ sub load_attr {
>   
>       defined ($value) || die "undefined value: ERROR";
>   
> -    my $obj;
> +    my ($obj, $text);
>   
>       if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
> -	$obj = $class->new($1, $4, $ogroup, $2);
> +	$text = PMG::Utils::try_decode_utf8($4);
> +	$obj = $class->new($1, $text, $ogroup, $2);
>       } elsif ($value =~ m/^([01])(\:(.*))?$/s) {
> -	$obj = $class->new($1, $3, $ogroup);
> +	$text = PMG::Utils::try_decode_utf8($3);
> +	$obj = $class->new($1, $text, $ogroup);
>       } else {
>   	$obj = $class->new(0, undef, $ogroup);
>       }
> @@ -89,7 +91,7 @@ sub save {
>       $value .= ','. ($self->{quarantine} ? '1' : '0');
>   
>       if ($self->{text}) {
> -	$value .= ":$self->{text}";
> +	$value .= encode('UTF-8', ":$self->{text}");
>       }
>   
>       if (defined ($self->{id})) {
> @@ -194,7 +196,7 @@ sub execute {
>       my ($self, $queue, $ruledb, $mod_group, $targets,
>   	$msginfo, $vars, $marks, $ldap) = @_;
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       if (!$self->{all} && ($#$marks == -1)) {
>   	# no marks
> diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
> index c49ad21..e7c9146 100644
> --- a/src/PMG/RuleDB/Rule.pm
> +++ b/src/PMG/RuleDB/Rule.pm
> @@ -12,7 +12,7 @@ sub new {
>       my ($type, $name, $priority, $active, $direction) = @_;
>   
>       my $self = {
> -	name => $name // '',
> +	name => PMG::Utils::try_decode_utf8($name) // '',
>   	priority => $priority // 0,
>   	active => $active // 0,
>       };
> diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
> index 37ec3aa..ccc94a0 100644
> --- a/src/PMG/RuleDB/WhoRegex.pm
> +++ b/src/PMG/RuleDB/WhoRegex.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(decode encode);

again here

>   
>   use PMG::Utils;
>   use PMG::RuleDB::Object;
> @@ -43,7 +44,8 @@ sub load_attr {
>   
>       defined($value) || die "undefined value: ERROR";
>   
> -    my $obj = $class->new ($value, $ogroup);
> +    my $decoded_value = PMG::Utils::try_decode_utf8($value);
> +    my $obj = $class->new ($decoded_value, $ogroup);
>       $obj->{id} = $id;
>   
>       $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
> @@ -59,6 +61,7 @@ sub save {
>   
>       my $adr = $self->{address};
>       $adr =~ s/\\/\\\\/g;
> +    $adr = encode('UTF-8', $adr);
>   
>       if (defined ($self->{id})) {
>   	# update
> diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
> index cef232b..23f60eb 100644
> --- a/src/PMG/Utils.pm
> +++ b/src/PMG/Utils.pm
> @@ -1542,4 +1542,9 @@ sub get_existing_object_id {
>       return;
>   }
>   
> +sub try_decode_utf8 {
> +    my ($data) = @_;
> +    return eval { decode('UTF-8', $data, 1) } // $data;
> +}
> +
>   1;





  reply	other threads:[~2022-11-14 14:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-09 18:27 [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 1/5] ruledb: modfield: properly encode field after variable substitution Stoiko Ivanov
2022-11-11 13:56   ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 2/5] ruledb: add deprecation warnings for unused actions Stoiko Ivanov
2022-11-14 16:02   ` Dominik Csapak
2022-11-15 14:32   ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database Stoiko Ivanov
2022-11-14 14:36   ` Dominik Csapak [this message]
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 4/5] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-14 14:49   ` Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 5/5] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-14 16:03   ` Dominik Csapak
2022-11-14 16:02 ` [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9b5e2ba-6e1c-5cb2-4564-22e685b9d8f2@proxmox.com \
    --to=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    --cc=s.ivanov@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal