From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 001EBB861 for ; Thu, 24 Nov 2022 13:22:00 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id D77572F251 for ; Thu, 24 Nov 2022 13:21:30 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Thu, 24 Nov 2022 13:21:24 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 35F5B43C7C for ; Thu, 24 Nov 2022 13:21:16 +0100 (CET) From: Dominik Csapak To: pmg-devel@lists.proxmox.com Date: Thu, 24 Nov 2022 13:21:03 +0100 Message-Id: <20221124122112.666868-4-d.csapak@proxmox.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221124122112.666868-1-d.csapak@proxmox.com> References: <20221124122112.666868-1-d.csapak@proxmox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.064 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [group.pm, ietf.org, disclaimer.pm, block.pm, modfield.pm, bcc.pm, matchfilename.pm, notify.pm, matchfield.pm, quarantine.pm, remove.pm, postgresql.org, ruledb.pm, whoregex.pm, accept.pm, rule.pm] Subject: [pmg-devel] [PATCH pmg-api v4 03/12] fix #2541 ruledb: encode relevant values as utf-8 in database X-BeenThere: pmg-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Mail Gateway development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Nov 2022 12:22:01 -0000 From: Stoiko Ivanov This patch adds support for storing rule names, comments(info), and most relevant values (e.g. the header content to match) in utf-8 in the database. backwards-compatibility should not be an issue: * currently the database should not contain any utf-8 multibyte characters, as our tooling prevented this due to sending wide-characters, which causes an exception in DBI. * any character > 127 and < 256 will be correctly interpreted when stored in a perl-string (this happens if the decode fails in try_decode_utf8), and will be correctly encoded when storing into the database. the database is created with SQL_ASCII encoding - which behaves by interpreting bytes <= 127 as ascii and those > 127 are not interpreted (see [0], which just means that we have to explicitly en-/decode upon storing/reading from there) This patch currently omits most Who objects: * for email/domain we'd still need to consider how to store them (puny-code for the domain part, or everything as UTF-8) and it would need changes to the API-types. * the LDAP objects currently would not work too well, since our LDAPCache is not UTF-8 safe - and fixing warants its own patch-series * WhoRegex should work and be able to handle many use-cases The ContentType values should also contain only ascii characters per RFC6838 [1] and RFC2045 [2]. [0] https://www.postgresql.org/docs/13/multibyte.html [1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2 [2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1 Signed-off-by: Stoiko Ivanov Signed-off-by: Dominik Csapak --- src/PMG/RuleDB.pm | 24 ++++++++++++++++-------- src/PMG/RuleDB/Accept.pm | 2 +- src/PMG/RuleDB/BCC.pm | 2 +- src/PMG/RuleDB/Block.pm | 2 +- src/PMG/RuleDB/Disclaimer.pm | 2 +- src/PMG/RuleDB/Group.pm | 4 ++-- src/PMG/RuleDB/MatchField.pm | 8 ++++++-- src/PMG/RuleDB/MatchFilename.pm | 5 ++++- src/PMG/RuleDB/ModField.pm | 6 ++++-- src/PMG/RuleDB/Notify.pm | 2 +- src/PMG/RuleDB/Quarantine.pm | 3 ++- src/PMG/RuleDB/Remove.pm | 12 +++++++----- src/PMG/RuleDB/Rule.pm | 2 +- src/PMG/RuleDB/WhoRegex.pm | 5 ++++- 14 files changed, 51 insertions(+), 28 deletions(-) diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm index 895acc6..a6b0b79 100644 --- a/src/PMG/RuleDB.pm +++ b/src/PMG/RuleDB.pm @@ -5,6 +5,7 @@ use warnings; use DBI; use HTML::Entities; use Data::Dumper; +use Encode qw(encode); use PVE::SafeSyslog; @@ -70,8 +71,8 @@ sub create_group_with_obj { defined($obj) || die "proxmox: undefined object"; - $name //= ''; - $info //= ''; + $name = encode('UTF-8', $name // ''); + $info = encode('UTF-8', $info // ''); eval { @@ -174,7 +175,9 @@ sub save_group { $self->{dbh}->do("UPDATE Objectgroup " . "SET Name = ?, Info = ? " . "WHERE ID = ?", undef, - $og->{name}, $og->{info}, $og->{id}); + encode('UTF-8', $og->{name}), + encode('UTF-8', $og->{info}), + $og->{id}); return $og->{id}; @@ -183,7 +186,7 @@ sub save_group { "INSERT INTO Objectgroup (Name, Info, Class) " . "VALUES (?, ?, ?);"); - $sth->execute($og->name, $og->info, $og->class); + $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class); return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq'); } @@ -212,7 +215,9 @@ sub delete_group { $sth->execute($groupid); if (my $ref = $sth->fetchrow_hashref()) { - die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n"; + my $groupname = PMG::Utils::try_decode_utf8($ref->{groupname}); + my $rulename = PMG::Utils::try_decode_utf8($ref->{rulename}); + die "Group '$groupname' is used by rule '$rulename' - unable to delete\n"; } $sth->finish(); @@ -474,6 +479,7 @@ sub load_object_full { sub load_group_by_name { my ($self, $name) = @_; + $name = encode('UTF-8', $name); my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " . "WHERE name = ?"); @@ -598,13 +604,14 @@ sub save_rule { defined($rule->{direction}) || die "undefined rule attribute - direction: ERROR"; + my $rulename = encode('UTF-8', $rule->{name}); if (defined($rule->{id})) { $self->{dbh}->do( "UPDATE Rule " . "SET Name = ?, Priority = ?, Active = ?, Direction = ? " . "WHERE ID = ?", undef, - $rule->{name}, $rule->{priority}, $rule->{active}, + $rulename, $rule->{priority}, $rule->{active}, $rule->{direction}, $rule->{id}); return $rule->{id}; @@ -614,7 +621,7 @@ sub save_rule { "INSERT INTO Rule (Name, Priority, Active, Direction) " . "VALUES (?, ?, ?, ?);"); - $sth->execute($rule->name, $rule->priority, $rule->active, + $sth->execute($rulename, $rule->priority, $rule->active, $rule->direction); return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq'); @@ -779,7 +786,8 @@ sub load_rules { $sth->execute(); while (my $ref = $sth->fetchrow_hashref()) { - my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority}, + my $rulename = PMG::Utils::try_decode_utf8($ref->{name}); + my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority}, $ref->{active}, $ref->{direction}); $rule->{id} = $ref->{id}; push @$rules, $rule; diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm index cd67ea2..4ebd6da 100644 --- a/src/PMG/RuleDB/Accept.pm +++ b/src/PMG/RuleDB/Accept.pm @@ -93,7 +93,7 @@ sub execute { my $dkim = $msginfo->{dkim} // {}; my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign}); - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); foreach my $ta (@$subgroups) { my ($tg, $entity) = (@$ta[0], @$ta[1]); diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm index 4867d83..6244dd9 100644 --- a/src/PMG/RuleDB/BCC.pm +++ b/src/PMG/RuleDB/BCC.pm @@ -115,7 +115,7 @@ sub execute { my $subgroups = $mod_group->subgroups($targets, 1); - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); my $bcc_to = PMG::Utils::subst_values_for_header($self->{target}, $vars); diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm index c758787..25bb74e 100644 --- a/src/PMG/RuleDB/Block.pm +++ b/src/PMG/RuleDB/Block.pm @@ -89,7 +89,7 @@ sub execute { my ($self, $queue, $ruledb, $mod_group, $targets, $msginfo, $vars, $marks) = @_; - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); if ($msginfo->{testmode}) { my $fh = $msginfo->{test_fh}; diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm index d3003b2..c6afe54 100644 --- a/src/PMG/RuleDB/Disclaimer.pm +++ b/src/PMG/RuleDB/Disclaimer.pm @@ -193,7 +193,7 @@ sub execute { my ($self, $queue, $ruledb, $mod_group, $targets, $msginfo, $vars, $marks) = @_; - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); my $subgroups = $mod_group->subgroups($targets); diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm index 2508305..baa68ce 100644 --- a/src/PMG/RuleDB/Group.pm +++ b/src/PMG/RuleDB/Group.pm @@ -12,8 +12,8 @@ sub new { my ($type, $name, $info, $class) = @_; my $self = { - name => $name, - info => $info, + name => PMG::Utils::try_decode_utf8($name), + info => PMG::Utils::try_decode_utf8($info), class => $class, }; diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm index 2671ea4..2b56058 100644 --- a/src/PMG/RuleDB/MatchField.pm +++ b/src/PMG/RuleDB/MatchField.pm @@ -4,6 +4,7 @@ use strict; use warnings; use DBI; use Digest::SHA; +use Encode qw(encode); use MIME::Words; use PVE::SafeSyslog; @@ -50,9 +51,10 @@ sub load_attr { defined($field) || die "undefined object attribute: ERROR"; defined($field_value) || die "undefined object attribute: ERROR"; + my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value); # use known constructor, bless afterwards (because sub class can have constructor # with other parameter signature). - my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup); + my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup); bless $obj, $class; $obj->{id} = $id; @@ -69,6 +71,7 @@ sub save { my $new_value = "$self->{field}:$self->{field_value}"; $new_value =~ s/\\/\\\\/g; + $new_value = encode('UTF-8', $new_value); if (defined ($self->{id})) { # update @@ -105,7 +108,8 @@ sub parse_entity { for my $value ($entity->head->get_all($self->{field})) { chomp $value; - my $decvalue = MIME::Words::decode_mimewords($value); + my $decvalue = PMG::Utils::decode_rfc1522($value); + $decvalue = PMG::Utils::try_decode_utf8($decvalue); if ($decvalue =~ m|$self->{field_value}|i) { push @$res, $id; diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm index 7e5b486..c9cdbe0 100644 --- a/src/PMG/RuleDB/MatchFilename.pm +++ b/src/PMG/RuleDB/MatchFilename.pm @@ -4,6 +4,7 @@ use strict; use warnings; use DBI; use Digest::SHA; +use Encode qw(encode); use MIME::Words; use PMG::Utils; @@ -41,8 +42,9 @@ sub load_attr { my $class = ref($type) || $type; defined($value) || die "undefined value: ERROR";; + my $decvalue = PMG::Utils::try_decode_utf8($value); - my $obj = $class->new($value, $ogroup); + my $obj = $class->new($decvalue, $ogroup); $obj->{id} = $id; $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup); @@ -57,6 +59,7 @@ sub save { my $new_value = $self->{fname}; $new_value =~ s/\\/\\\\/g; + $new_value = encode('UTF-8', $new_value); if (defined($self->{id})) { # update diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm index 34108d1..6232322 100644 --- a/src/PMG/RuleDB/ModField.pm +++ b/src/PMG/RuleDB/ModField.pm @@ -56,7 +56,9 @@ sub load_attr { (defined($field) && defined($field_value)) || return undef; - my $obj = $class->new($field, $field_value, $ogroup); + my $dec_field_value = PMG::Utils::try_decode_utf8($field_value); + + my $obj = $class->new($field, $dec_field_value, $ogroup); $obj->{id} = $id; $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup); @@ -69,7 +71,7 @@ sub save { defined($self->{ogroup}) || return undef; - my $new_value = "$self->{field}:$self->{field_value}"; + my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}"); if (defined ($self->{id})) { # update diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm index 7b38e0d..8a9945b 100644 --- a/src/PMG/RuleDB/Notify.pm +++ b/src/PMG/RuleDB/Notify.pm @@ -208,7 +208,7 @@ sub execute { my $from = 'postmaster'; - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); my $body = PMG::Utils::subst_values($self->{body}, $vars); my $subject = PMG::Utils::subst_values_for_header($self->{subject}, $vars); diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm index 1426393..9d802fe 100644 --- a/src/PMG/RuleDB/Quarantine.pm +++ b/src/PMG/RuleDB/Quarantine.pm @@ -4,6 +4,7 @@ use strict; use warnings; use DBI; use Digest::SHA; +use Encode qw(encode); use PVE::SafeSyslog; @@ -89,7 +90,7 @@ sub execute { my $subgroups = $mod_group->subgroups($targets, 1); - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); foreach my $ta (@$subgroups) { my ($tg, $entity) = (@$ta[0], @$ta[1]); diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm index 6b27b91..da6c25f 100644 --- a/src/PMG/RuleDB/Remove.pm +++ b/src/PMG/RuleDB/Remove.pm @@ -63,12 +63,14 @@ sub load_attr { defined ($value) || die "undefined value: ERROR"; - my $obj; + my ($obj, $text); if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) { - $obj = $class->new($1, $4, $ogroup, $2); + $text = PMG::Utils::try_decode_utf8($4); + $obj = $class->new($1, $text, $ogroup, $2); } elsif ($value =~ m/^([01])(\:(.*))?$/s) { - $obj = $class->new($1, $3, $ogroup); + $text = PMG::Utils::try_decode_utf8($3); + $obj = $class->new($1, $text, $ogroup); } else { $obj = $class->new(0, undef, $ogroup); } @@ -89,7 +91,7 @@ sub save { $value .= ','. ($self->{quarantine} ? '1' : '0'); if ($self->{text}) { - $value .= ":$self->{text}"; + $value .= encode('UTF-8', ":$self->{text}"); } if (defined ($self->{id})) { @@ -194,7 +196,7 @@ sub execute { my ($self, $queue, $ruledb, $mod_group, $targets, $msginfo, $vars, $marks, $ldap) = @_; - my $rulename = $vars->{RULE} // 'unknown'; + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown'); if (!$self->{all} && ($#$marks == -1)) { # no marks diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm index c49ad21..e7c9146 100644 --- a/src/PMG/RuleDB/Rule.pm +++ b/src/PMG/RuleDB/Rule.pm @@ -12,7 +12,7 @@ sub new { my ($type, $name, $priority, $active, $direction) = @_; my $self = { - name => $name // '', + name => PMG::Utils::try_decode_utf8($name) // '', priority => $priority // 0, active => $active // 0, }; diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm index 37ec3aa..5c13604 100644 --- a/src/PMG/RuleDB/WhoRegex.pm +++ b/src/PMG/RuleDB/WhoRegex.pm @@ -4,6 +4,7 @@ use strict; use warnings; use DBI; use Digest::SHA; +use Encode qw(encode); use PMG::Utils; use PMG::RuleDB::Object; @@ -43,7 +44,8 @@ sub load_attr { defined($value) || die "undefined value: ERROR"; - my $obj = $class->new ($value, $ogroup); + my $decoded_value = PMG::Utils::try_decode_utf8($value); + my $obj = $class->new ($decoded_value, $ogroup); $obj->{id} = $id; $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup); @@ -59,6 +61,7 @@ sub save { my $adr = $self->{address}; $adr =~ s/\\/\\\\/g; + $adr = encode('UTF-8', $adr); if (defined ($self->{id})) { # update -- 2.30.2