From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 5B9E491961 for ; Mon, 14 Nov 2022 15:36:20 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 36A8A24DBF for ; Mon, 14 Nov 2022 15:36:20 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Mon, 14 Nov 2022 15:36:17 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id C643443CC9; Mon, 14 Nov 2022 15:36:16 +0100 (CET) Message-ID: Date: Mon, 14 Nov 2022 15:36:13 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Thunderbird/107.0 Content-Language: en-US To: Stoiko Ivanov , pmg-devel@lists.proxmox.com References: <20221109182728.629576-1-s.ivanov@proxmox.com> <20221109182728.629576-4-s.ivanov@proxmox.com> From: Dominik Csapak In-Reply-To: <20221109182728.629576-4-s.ivanov@proxmox.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: =?UTF-8?Q?0=0A=09?=AWL 0.066 Adjusted score from AWL reputation of From: =?UTF-8?Q?address=0A=09?=BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict =?UTF-8?Q?Alignment=0A=09?=NICE_REPLY_A -0.001 Looks like a legit reply (A) SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF =?UTF-8?Q?Record=0A=09?=SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database X-BeenThere: pmg-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Mail Gateway development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Nov 2022 14:36:20 -0000 misin comments inline: On 11/9/22 19:27, Stoiko Ivanov wrote: > This patch adds support for storing rule names, comments(info), and > most relevant values (e.g. the header content to match) in utf-8 in > the database. > > backwards-compatibility should not be an issue: > * following the argumentation from commit > 43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage > * we only need to consider that the valid multibyte utf-8 characters > do not really yield sensible combinations of single-byte characters > (starting with a byte > 127 - e.g. "£") > > the database is created with SQL_ASCII encoding - which behaves by > interpreting bytes <= 127 as ascii and those > 127 are not interpreted > (see [0], which just means that we have to explicitly en-/decode upon > storing/reading from there) > > This patch currently omits most Who objects: > * for email/domain we'd still need to consider how to store them > (puny-code for the domain part, or everything as UTF-8) and it would > need changes to the API-types. > * the LDAP objects currently would not work too well, since our LDAPCache > is not UTF-8 safe - and fixing warants its own patch-series > * WhoRegex should work and be able to handle many use-cases > > The ContentType values should also contain only ascii characters per > RFC6838 [1] and RFC2045 [2]. > > [0] https://www.postgresql.org/docs/13/multibyte.html > [1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2 > [2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1 > > Signed-off-by: Stoiko Ivanov > --- > src/PMG/RuleDB.pm | 23 ++++++++++++++++------- > src/PMG/RuleDB/Accept.pm | 2 +- > src/PMG/RuleDB/BCC.pm | 2 +- > src/PMG/RuleDB/Block.pm | 2 +- > src/PMG/RuleDB/Disclaimer.pm | 2 +- > src/PMG/RuleDB/Group.pm | 4 ++-- > src/PMG/RuleDB/MatchField.pm | 6 +++++- > src/PMG/RuleDB/MatchFilename.pm | 5 ++++- > src/PMG/RuleDB/ModField.pm | 6 ++++-- > src/PMG/RuleDB/Notify.pm | 2 +- > src/PMG/RuleDB/Quarantine.pm | 3 ++- > src/PMG/RuleDB/Remove.pm | 12 +++++++----- > src/PMG/RuleDB/Rule.pm | 2 +- > src/PMG/RuleDB/WhoRegex.pm | 5 ++++- > src/PMG/Utils.pm | 5 +++++ > 15 files changed, 55 insertions(+), 26 deletions(-) > > diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm > index 895acc6..9d6d99d 100644 > --- a/src/PMG/RuleDB.pm > +++ b/src/PMG/RuleDB.pm > @@ -5,6 +5,7 @@ use warnings; > use DBI; > use HTML::Entities; > use Data::Dumper; > +use Encode qw(encode decode); > > use PVE::SafeSyslog; > > @@ -72,7 +73,8 @@ sub create_group_with_obj { > > $name //= ''; > $info //= ''; > - > + $name = encode('UTF-8', $name); > + $info = encode('UTF-8',$info); nit: spacing and we could combine with the lines above: $name = encode('UTF-8', $name // ''); ... > eval { > > $self->{dbh}->begin_work; > @@ -174,7 +176,9 @@ sub save_group { > $self->{dbh}->do("UPDATE Objectgroup " . > "SET Name = ?, Info = ? " . > "WHERE ID = ?", undef, > - $og->{name}, $og->{info}, $og->{id}); > + encode('UTF-8', $og->{name}), > + encode('UTF-8', $og->{info}), > + $og->{id}); > > return $og->{id}; > > @@ -183,7 +187,7 @@ sub save_group { > "INSERT INTO Objectgroup (Name, Info, Class) " . > "VALUES (?, ?, ?);"); > > - $sth->execute($og->name, $og->info, $og->class); > + $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class); > > return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq'); > } > @@ -212,7 +216,9 @@ sub delete_group { > $sth->execute($groupid); > > if (my $ref = $sth->fetchrow_hashref()) { > - die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n"; > + my $groupname = PMG::Utils::try_deocode_utf8($ref->{groupname}); > + my $rulename = PMG::Utils::try_deocode_utf8($ref->{rulename}); typo: deocode vs decode > + die "Group '$groupname' is used by rule '$rulename' - unable to delete\n"; > } > > $sth->finish(); > @@ -474,6 +480,7 @@ sub load_object_full { > sub load_group_by_name { > my ($self, $name) = @_; > > + $name = PMG::Utils::try_decode_utf8($name); isn't this wrong? we encode the name before writing into the database, so we nave to encode it again when getting it out of the db, no? otoh, after a short grep, we never call 'load_group_by_name' anyway, so we could just delete > my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " . > "WHERE name = ?"); > > @@ -598,13 +605,14 @@ sub save_rule { > defined($rule->{direction}) || > die "undefined rule attribute - direction: ERROR"; > > + my $rulename = encode('UTF-8', $rule->{name}); > if (defined($rule->{id})) { > > $self->{dbh}->do( > "UPDATE Rule " . > "SET Name = ?, Priority = ?, Active = ?, Direction = ? " . > "WHERE ID = ?", undef, > - $rule->{name}, $rule->{priority}, $rule->{active}, > + $rulename, $rule->{priority}, $rule->{active}, > $rule->{direction}, $rule->{id}); > > return $rule->{id}; > @@ -614,7 +622,7 @@ sub save_rule { > "INSERT INTO Rule (Name, Priority, Active, Direction) " . > "VALUES (?, ?, ?, ?);"); > > - $sth->execute($rule->name, $rule->priority, $rule->active, > + $sth->execute($rulename, $rule->priority, $rule->active, > $rule->direction); > > return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq'); > @@ -779,7 +787,8 @@ sub load_rules { > $sth->execute(); > > while (my $ref = $sth->fetchrow_hashref()) { > - my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority}, > + my $rulename = PMG::Utils::try_decode_utf8($ref->{name}); > + my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority}, > $ref->{active}, $ref->{direction}); > $rule->{id} = $ref->{id}; > push @$rules, $rule; > diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm > index cd67ea2..4ebd6da 100644 > --- a/src/PMG/RuleDB/Accept.pm > +++ b/src/PMG/RuleDB/Accept.pm > @@ -93,7 +93,7 @@ sub execute { > my $dkim = $msginfo->{dkim} // {}; > my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign}); > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > foreach my $ta (@$subgroups) { > my ($tg, $entity) = (@$ta[0], @$ta[1]); > diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm > index d364690..c1225f3 100644 > --- a/src/PMG/RuleDB/BCC.pm > +++ b/src/PMG/RuleDB/BCC.pm > @@ -115,7 +115,7 @@ sub execute { > > my $subgroups = $mod_group->subgroups($targets, 1); > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars); > > diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm > index c758787..25bb74e 100644 > --- a/src/PMG/RuleDB/Block.pm > +++ b/src/PMG/RuleDB/Block.pm > @@ -89,7 +89,7 @@ sub execute { > my ($self, $queue, $ruledb, $mod_group, $targets, > $msginfo, $vars, $marks) = @_; > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > if ($msginfo->{testmode}) { > my $fh = $msginfo->{test_fh}; > diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm > index d3003b2..c6afe54 100644 > --- a/src/PMG/RuleDB/Disclaimer.pm > +++ b/src/PMG/RuleDB/Disclaimer.pm > @@ -193,7 +193,7 @@ sub execute { > my ($self, $queue, $ruledb, $mod_group, $targets, > $msginfo, $vars, $marks) = @_; > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > my $subgroups = $mod_group->subgroups($targets); > > diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm > index 2508305..baa68ce 100644 > --- a/src/PMG/RuleDB/Group.pm > +++ b/src/PMG/RuleDB/Group.pm > @@ -12,8 +12,8 @@ sub new { > my ($type, $name, $info, $class) = @_; > > my $self = { > - name => $name, > - info => $info, > + name => PMG::Utils::try_decode_utf8($name), > + info => PMG::Utils::try_decode_utf8($info), > class => $class, > }; > > diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm > index 2671ea4..8246e6e 100644 > --- a/src/PMG/RuleDB/MatchField.pm > +++ b/src/PMG/RuleDB/MatchField.pm > @@ -4,6 +4,7 @@ use strict; > use warnings; > use DBI; > use Digest::SHA; > +use Encode qw(decode encode); again here > use MIME::Words; > > use PVE::SafeSyslog; > @@ -50,9 +51,10 @@ sub load_attr { > defined($field) || die "undefined object attribute: ERROR"; > defined($field_value) || die "undefined object attribute: ERROR"; > > + my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value); > # use known constructor, bless afterwards (because sub class can have constructor > # with other parameter signature). > - my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup); > + my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup); > bless $obj, $class; > > $obj->{id} = $id; > @@ -69,6 +71,7 @@ sub save { > > my $new_value = "$self->{field}:$self->{field_value}"; > $new_value =~ s/\\/\\\\/g; > + $new_value = encode('UTF-8', $new_value); > > if (defined ($self->{id})) { > # update > @@ -106,6 +109,7 @@ sub parse_entity { > chomp $value; > > my $decvalue = MIME::Words::decode_mimewords($value); > + $decvalue = PMG::Utils::try_decode_utf8($decvalue); doesn't that do utf8 decoding already? we have the mail here not the database object... > > if ($decvalue =~ m|$self->{field_value}|i) { > push @$res, $id; > diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm > index 7e5b486..06bf931 100644 > --- a/src/PMG/RuleDB/MatchFilename.pm > +++ b/src/PMG/RuleDB/MatchFilename.pm > @@ -4,6 +4,7 @@ use strict; > use warnings; > use DBI; > use Digest::SHA; > +use Encode qw(encode decode); > use MIME::Words; > > use PMG::Utils; > @@ -41,8 +42,9 @@ sub load_attr { > my $class = ref($type) || $type; > > defined($value) || die "undefined value: ERROR";; > + my $decvalue = PMG::Utils::try_decode_utf8($value); > > - my $obj = $class->new($value, $ogroup); > + my $obj = $class->new($decvalue, $ogroup); > $obj->{id} = $id; > > $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup); > @@ -57,6 +59,7 @@ sub save { > > my $new_value = $self->{fname}; > $new_value =~ s/\\/\\\\/g; > + $new_value = encode('UTF-8', $new_value); > > if (defined($self->{id})) { > # update > diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm > index fb15076..1e1727f 100644 > --- a/src/PMG/RuleDB/ModField.pm > +++ b/src/PMG/RuleDB/ModField.pm > @@ -57,7 +57,9 @@ sub load_attr { > > (defined($field) && defined($field_value)) || return undef; > > - my $obj = $class->new($field, $field_value, $ogroup); > + my $dec_field_value = PMG::Utils::try_decode_utf8($field_value); > + > + my $obj = $class->new($field, $dec_field_value, $ogroup); > $obj->{id} = $id; > > $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup); > @@ -70,7 +72,7 @@ sub save { > > defined($self->{ogroup}) || return undef; > > - my $new_value = "$self->{field}:$self->{field_value}"; > + my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}"); > > if (defined ($self->{id})) { > # update > diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm > index af853a3..bca5ebf 100644 > --- a/src/PMG/RuleDB/Notify.pm > +++ b/src/PMG/RuleDB/Notify.pm > @@ -208,7 +208,7 @@ sub execute { > :bd > my $from = 'postmaster'; > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > my $body = PMG::Utils::subst_values($self->{body}, $vars); > my $subject = PMG::Utils::subst_values($self->{subject}, $vars); > diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm > index 1426393..30bc5ec 100644 > --- a/src/PMG/RuleDB/Quarantine.pm > +++ b/src/PMG/RuleDB/Quarantine.pm > @@ -4,6 +4,7 @@ use strict; > use warnings; > use DBI; > use Digest::SHA; > +use Encode qw(decode encode); why decode too? > > use PVE::SafeSyslog; > > @@ -89,7 +90,7 @@ sub execute { > > my $subgroups = $mod_group->subgroups($targets, 1); > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > foreach my $ta (@$subgroups) { > my ($tg, $entity) = (@$ta[0], @$ta[1]); > diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm > index 6b27b91..da6c25f 100644 > --- a/src/PMG/RuleDB/Remove.pm > +++ b/src/PMG/RuleDB/Remove.pm > @@ -63,12 +63,14 @@ sub load_attr { > > defined ($value) || die "undefined value: ERROR"; > > - my $obj; > + my ($obj, $text); > > if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) { > - $obj = $class->new($1, $4, $ogroup, $2); > + $text = PMG::Utils::try_decode_utf8($4); > + $obj = $class->new($1, $text, $ogroup, $2); > } elsif ($value =~ m/^([01])(\:(.*))?$/s) { > - $obj = $class->new($1, $3, $ogroup); > + $text = PMG::Utils::try_decode_utf8($3); > + $obj = $class->new($1, $text, $ogroup); > } else { > $obj = $class->new(0, undef, $ogroup); > } > @@ -89,7 +91,7 @@ sub save { > $value .= ','. ($self->{quarantine} ? '1' : '0'); > > if ($self->{text}) { > - $value .= ":$self->{text}"; > + $value .= encode('UTF-8', ":$self->{text}"); > } > > if (defined ($self->{id})) { > @@ -194,7 +196,7 @@ sub execute { > my ($self, $queue, $ruledb, $mod_group, $targets, > $msginfo, $vars, $marks, $ldap) = @_; > > - my $rulename = $vars->{RULE} // 'unknown'; > + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); > > if (!$self->{all} && ($#$marks == -1)) { > # no marks > diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm > index c49ad21..e7c9146 100644 > --- a/src/PMG/RuleDB/Rule.pm > +++ b/src/PMG/RuleDB/Rule.pm > @@ -12,7 +12,7 @@ sub new { > my ($type, $name, $priority, $active, $direction) = @_; > > my $self = { > - name => $name // '', > + name => PMG::Utils::try_decode_utf8($name) // '', > priority => $priority // 0, > active => $active // 0, > }; > diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm > index 37ec3aa..ccc94a0 100644 > --- a/src/PMG/RuleDB/WhoRegex.pm > +++ b/src/PMG/RuleDB/WhoRegex.pm > @@ -4,6 +4,7 @@ use strict; > use warnings; > use DBI; > use Digest::SHA; > +use Encode qw(decode encode); again here > > use PMG::Utils; > use PMG::RuleDB::Object; > @@ -43,7 +44,8 @@ sub load_attr { > > defined($value) || die "undefined value: ERROR"; > > - my $obj = $class->new ($value, $ogroup); > + my $decoded_value = PMG::Utils::try_decode_utf8($value); > + my $obj = $class->new ($decoded_value, $ogroup); > $obj->{id} = $id; > > $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup); > @@ -59,6 +61,7 @@ sub save { > > my $adr = $self->{address}; > $adr =~ s/\\/\\\\/g; > + $adr = encode('UTF-8', $adr); > > if (defined ($self->{id})) { > # update > diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm > index cef232b..23f60eb 100644 > --- a/src/PMG/Utils.pm > +++ b/src/PMG/Utils.pm > @@ -1542,4 +1542,9 @@ sub get_existing_object_id { > return; > } > > +sub try_decode_utf8 { > + my ($data) = @_; > + return eval { decode('UTF-8', $data, 1) } // $data; > +} > + > 1;