From: Dominik Csapak <d.csapak@proxmox.com>
To: Stoiko Ivanov <s.ivanov@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Mon, 14 Nov 2022 15:36:13 +0100 [thread overview]
Message-ID: <d9b5e2ba-6e1c-5cb2-4564-22e685b9d8f2@proxmox.com> (raw)
In-Reply-To: <20221109182728.629576-4-s.ivanov@proxmox.com>
misin
comments inline:
On 11/9/22 19:27, Stoiko Ivanov wrote:
> This patch adds support for storing rule names, comments(info), and
> most relevant values (e.g. the header content to match) in utf-8 in
> the database.
>
> backwards-compatibility should not be an issue:
> * following the argumentation from commit
> 43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
> * we only need to consider that the valid multibyte utf-8 characters
> do not really yield sensible combinations of single-byte characters
> (starting with a byte > 127 - e.g. "£")
>
> the database is created with SQL_ASCII encoding - which behaves by
> interpreting bytes <= 127 as ascii and those > 127 are not interpreted
> (see [0], which just means that we have to explicitly en-/decode upon
> storing/reading from there)
>
> This patch currently omits most Who objects:
> * for email/domain we'd still need to consider how to store them
> (puny-code for the domain part, or everything as UTF-8) and it would
> need changes to the API-types.
> * the LDAP objects currently would not work too well, since our LDAPCache
> is not UTF-8 safe - and fixing warants its own patch-series
> * WhoRegex should work and be able to handle many use-cases
>
> The ContentType values should also contain only ascii characters per
> RFC6838 [1] and RFC2045 [2].
>
> [0] https://www.postgresql.org/docs/13/multibyte.html
> [1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
> [2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
>
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
> src/PMG/RuleDB.pm | 23 ++++++++++++++++-------
> src/PMG/RuleDB/Accept.pm | 2 +-
> src/PMG/RuleDB/BCC.pm | 2 +-
> src/PMG/RuleDB/Block.pm | 2 +-
> src/PMG/RuleDB/Disclaimer.pm | 2 +-
> src/PMG/RuleDB/Group.pm | 4 ++--
> src/PMG/RuleDB/MatchField.pm | 6 +++++-
> src/PMG/RuleDB/MatchFilename.pm | 5 ++++-
> src/PMG/RuleDB/ModField.pm | 6 ++++--
> src/PMG/RuleDB/Notify.pm | 2 +-
> src/PMG/RuleDB/Quarantine.pm | 3 ++-
> src/PMG/RuleDB/Remove.pm | 12 +++++++-----
> src/PMG/RuleDB/Rule.pm | 2 +-
> src/PMG/RuleDB/WhoRegex.pm | 5 ++++-
> src/PMG/Utils.pm | 5 +++++
> 15 files changed, 55 insertions(+), 26 deletions(-)
>
> diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
> index 895acc6..9d6d99d 100644
> --- a/src/PMG/RuleDB.pm
> +++ b/src/PMG/RuleDB.pm
> @@ -5,6 +5,7 @@ use warnings;
> use DBI;
> use HTML::Entities;
> use Data::Dumper;
> +use Encode qw(encode decode);
>
> use PVE::SafeSyslog;
>
> @@ -72,7 +73,8 @@ sub create_group_with_obj {
>
> $name //= '';
> $info //= '';
> -
> + $name = encode('UTF-8', $name);
> + $info = encode('UTF-8',$info);
nit: spacing and we could combine with the lines above:
$name = encode('UTF-8', $name // '');
...
> eval {
>
> $self->{dbh}->begin_work;
> @@ -174,7 +176,9 @@ sub save_group {
> $self->{dbh}->do("UPDATE Objectgroup " .
> "SET Name = ?, Info = ? " .
> "WHERE ID = ?", undef,
> - $og->{name}, $og->{info}, $og->{id});
> + encode('UTF-8', $og->{name}),
> + encode('UTF-8', $og->{info}),
> + $og->{id});
>
> return $og->{id};
>
> @@ -183,7 +187,7 @@ sub save_group {
> "INSERT INTO Objectgroup (Name, Info, Class) " .
> "VALUES (?, ?, ?);");
>
> - $sth->execute($og->name, $og->info, $og->class);
> + $sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
>
> return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
> }
> @@ -212,7 +216,9 @@ sub delete_group {
> $sth->execute($groupid);
>
> if (my $ref = $sth->fetchrow_hashref()) {
> - die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
> + my $groupname = PMG::Utils::try_deocode_utf8($ref->{groupname});
> + my $rulename = PMG::Utils::try_deocode_utf8($ref->{rulename});
typo: deocode vs decode
> + die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
> }
>
> $sth->finish();
> @@ -474,6 +480,7 @@ sub load_object_full {
> sub load_group_by_name {
> my ($self, $name) = @_;
>
> + $name = PMG::Utils::try_decode_utf8($name);
isn't this wrong?
we encode the name before writing into the database, so we nave to
encode it again when getting it out of the db, no?
otoh, after a short grep, we never call 'load_group_by_name' anyway,
so we could just delete
> my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
> "WHERE name = ?");
>
> @@ -598,13 +605,14 @@ sub save_rule {
> defined($rule->{direction}) ||
> die "undefined rule attribute - direction: ERROR";
>
> + my $rulename = encode('UTF-8', $rule->{name});
> if (defined($rule->{id})) {
>
> $self->{dbh}->do(
> "UPDATE Rule " .
> "SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
> "WHERE ID = ?", undef,
> - $rule->{name}, $rule->{priority}, $rule->{active},
> + $rulename, $rule->{priority}, $rule->{active},
> $rule->{direction}, $rule->{id});
>
> return $rule->{id};
> @@ -614,7 +622,7 @@ sub save_rule {
> "INSERT INTO Rule (Name, Priority, Active, Direction) " .
> "VALUES (?, ?, ?, ?);");
>
> - $sth->execute($rule->name, $rule->priority, $rule->active,
> + $sth->execute($rulename, $rule->priority, $rule->active,
> $rule->direction);
>
> return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
> @@ -779,7 +787,8 @@ sub load_rules {
> $sth->execute();
>
> while (my $ref = $sth->fetchrow_hashref()) {
> - my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
> + my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
> + my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
> $ref->{active}, $ref->{direction});
> $rule->{id} = $ref->{id};
> push @$rules, $rule;
> diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
> index cd67ea2..4ebd6da 100644
> --- a/src/PMG/RuleDB/Accept.pm
> +++ b/src/PMG/RuleDB/Accept.pm
> @@ -93,7 +93,7 @@ sub execute {
> my $dkim = $msginfo->{dkim} // {};
> my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> foreach my $ta (@$subgroups) {
> my ($tg, $entity) = (@$ta[0], @$ta[1]);
> diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
> index d364690..c1225f3 100644
> --- a/src/PMG/RuleDB/BCC.pm
> +++ b/src/PMG/RuleDB/BCC.pm
> @@ -115,7 +115,7 @@ sub execute {
>
> my $subgroups = $mod_group->subgroups($targets, 1);
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
>
> diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
> index c758787..25bb74e 100644
> --- a/src/PMG/RuleDB/Block.pm
> +++ b/src/PMG/RuleDB/Block.pm
> @@ -89,7 +89,7 @@ sub execute {
> my ($self, $queue, $ruledb, $mod_group, $targets,
> $msginfo, $vars, $marks) = @_;
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> if ($msginfo->{testmode}) {
> my $fh = $msginfo->{test_fh};
> diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
> index d3003b2..c6afe54 100644
> --- a/src/PMG/RuleDB/Disclaimer.pm
> +++ b/src/PMG/RuleDB/Disclaimer.pm
> @@ -193,7 +193,7 @@ sub execute {
> my ($self, $queue, $ruledb, $mod_group, $targets,
> $msginfo, $vars, $marks) = @_;
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> my $subgroups = $mod_group->subgroups($targets);
>
> diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
> index 2508305..baa68ce 100644
> --- a/src/PMG/RuleDB/Group.pm
> +++ b/src/PMG/RuleDB/Group.pm
> @@ -12,8 +12,8 @@ sub new {
> my ($type, $name, $info, $class) = @_;
>
> my $self = {
> - name => $name,
> - info => $info,
> + name => PMG::Utils::try_decode_utf8($name),
> + info => PMG::Utils::try_decode_utf8($info),
> class => $class,
> };
>
> diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
> index 2671ea4..8246e6e 100644
> --- a/src/PMG/RuleDB/MatchField.pm
> +++ b/src/PMG/RuleDB/MatchField.pm
> @@ -4,6 +4,7 @@ use strict;
> use warnings;
> use DBI;
> use Digest::SHA;
> +use Encode qw(decode encode);
again here
> use MIME::Words;
>
> use PVE::SafeSyslog;
> @@ -50,9 +51,10 @@ sub load_attr {
> defined($field) || die "undefined object attribute: ERROR";
> defined($field_value) || die "undefined object attribute: ERROR";
>
> + my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
> # use known constructor, bless afterwards (because sub class can have constructor
> # with other parameter signature).
> - my $obj = PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
> + my $obj = PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
> bless $obj, $class;
>
> $obj->{id} = $id;
> @@ -69,6 +71,7 @@ sub save {
>
> my $new_value = "$self->{field}:$self->{field_value}";
> $new_value =~ s/\\/\\\\/g;
> + $new_value = encode('UTF-8', $new_value);
>
> if (defined ($self->{id})) {
> # update
> @@ -106,6 +109,7 @@ sub parse_entity {
> chomp $value;
>
> my $decvalue = MIME::Words::decode_mimewords($value);
> + $decvalue = PMG::Utils::try_decode_utf8($decvalue);
doesn't that do utf8 decoding already? we have the mail here not the database
object...
>
> if ($decvalue =~ m|$self->{field_value}|i) {
> push @$res, $id;
> diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
> index 7e5b486..06bf931 100644
> --- a/src/PMG/RuleDB/MatchFilename.pm
> +++ b/src/PMG/RuleDB/MatchFilename.pm
> @@ -4,6 +4,7 @@ use strict;
> use warnings;
> use DBI;
> use Digest::SHA;
> +use Encode qw(encode decode);
> use MIME::Words;
>
> use PMG::Utils;
> @@ -41,8 +42,9 @@ sub load_attr {
> my $class = ref($type) || $type;
>
> defined($value) || die "undefined value: ERROR";;
> + my $decvalue = PMG::Utils::try_decode_utf8($value);
>
> - my $obj = $class->new($value, $ogroup);
> + my $obj = $class->new($decvalue, $ogroup);
> $obj->{id} = $id;
>
> $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
> @@ -57,6 +59,7 @@ sub save {
>
> my $new_value = $self->{fname};
> $new_value =~ s/\\/\\\\/g;
> + $new_value = encode('UTF-8', $new_value);
>
> if (defined($self->{id})) {
> # update
> diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
> index fb15076..1e1727f 100644
> --- a/src/PMG/RuleDB/ModField.pm
> +++ b/src/PMG/RuleDB/ModField.pm
> @@ -57,7 +57,9 @@ sub load_attr {
>
> (defined($field) && defined($field_value)) || return undef;
>
> - my $obj = $class->new($field, $field_value, $ogroup);
> + my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
> +
> + my $obj = $class->new($field, $dec_field_value, $ogroup);
> $obj->{id} = $id;
>
> $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
> @@ -70,7 +72,7 @@ sub save {
>
> defined($self->{ogroup}) || return undef;
>
> - my $new_value = "$self->{field}:$self->{field_value}";
> + my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
>
> if (defined ($self->{id})) {
> # update
> diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
> index af853a3..bca5ebf 100644
> --- a/src/PMG/RuleDB/Notify.pm
> +++ b/src/PMG/RuleDB/Notify.pm
> @@ -208,7 +208,7 @@ sub execute {
> :bd
> my $from = 'postmaster';
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> my $body = PMG::Utils::subst_values($self->{body}, $vars);
> my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
> diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
> index 1426393..30bc5ec 100644
> --- a/src/PMG/RuleDB/Quarantine.pm
> +++ b/src/PMG/RuleDB/Quarantine.pm
> @@ -4,6 +4,7 @@ use strict;
> use warnings;
> use DBI;
> use Digest::SHA;
> +use Encode qw(decode encode);
why decode too?
>
> use PVE::SafeSyslog;
>
> @@ -89,7 +90,7 @@ sub execute {
>
> my $subgroups = $mod_group->subgroups($targets, 1);
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> foreach my $ta (@$subgroups) {
> my ($tg, $entity) = (@$ta[0], @$ta[1]);
> diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
> index 6b27b91..da6c25f 100644
> --- a/src/PMG/RuleDB/Remove.pm
> +++ b/src/PMG/RuleDB/Remove.pm
> @@ -63,12 +63,14 @@ sub load_attr {
>
> defined ($value) || die "undefined value: ERROR";
>
> - my $obj;
> + my ($obj, $text);
>
> if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
> - $obj = $class->new($1, $4, $ogroup, $2);
> + $text = PMG::Utils::try_decode_utf8($4);
> + $obj = $class->new($1, $text, $ogroup, $2);
> } elsif ($value =~ m/^([01])(\:(.*))?$/s) {
> - $obj = $class->new($1, $3, $ogroup);
> + $text = PMG::Utils::try_decode_utf8($3);
> + $obj = $class->new($1, $text, $ogroup);
> } else {
> $obj = $class->new(0, undef, $ogroup);
> }
> @@ -89,7 +91,7 @@ sub save {
> $value .= ','. ($self->{quarantine} ? '1' : '0');
>
> if ($self->{text}) {
> - $value .= ":$self->{text}";
> + $value .= encode('UTF-8', ":$self->{text}");
> }
>
> if (defined ($self->{id})) {
> @@ -194,7 +196,7 @@ sub execute {
> my ($self, $queue, $ruledb, $mod_group, $targets,
> $msginfo, $vars, $marks, $ldap) = @_;
>
> - my $rulename = $vars->{RULE} // 'unknown';
> + my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>
> if (!$self->{all} && ($#$marks == -1)) {
> # no marks
> diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
> index c49ad21..e7c9146 100644
> --- a/src/PMG/RuleDB/Rule.pm
> +++ b/src/PMG/RuleDB/Rule.pm
> @@ -12,7 +12,7 @@ sub new {
> my ($type, $name, $priority, $active, $direction) = @_;
>
> my $self = {
> - name => $name // '',
> + name => PMG::Utils::try_decode_utf8($name) // '',
> priority => $priority // 0,
> active => $active // 0,
> };
> diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
> index 37ec3aa..ccc94a0 100644
> --- a/src/PMG/RuleDB/WhoRegex.pm
> +++ b/src/PMG/RuleDB/WhoRegex.pm
> @@ -4,6 +4,7 @@ use strict;
> use warnings;
> use DBI;
> use Digest::SHA;
> +use Encode qw(decode encode);
again here
>
> use PMG::Utils;
> use PMG::RuleDB::Object;
> @@ -43,7 +44,8 @@ sub load_attr {
>
> defined($value) || die "undefined value: ERROR";
>
> - my $obj = $class->new ($value, $ogroup);
> + my $decoded_value = PMG::Utils::try_decode_utf8($value);
> + my $obj = $class->new ($decoded_value, $ogroup);
> $obj->{id} = $id;
>
> $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
> @@ -59,6 +61,7 @@ sub save {
>
> my $adr = $self->{address};
> $adr =~ s/\\/\\\\/g;
> + $adr = encode('UTF-8', $adr);
>
> if (defined ($self->{id})) {
> # update
> diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
> index cef232b..23f60eb 100644
> --- a/src/PMG/Utils.pm
> +++ b/src/PMG/Utils.pm
> @@ -1542,4 +1542,9 @@ sub get_existing_object_id {
> return;
> }
>
> +sub try_decode_utf8 {
> + my ($data) = @_;
> + return eval { decode('UTF-8', $data, 1) } // $data;
> +}
> +
> 1;
next prev parent reply other threads:[~2022-11-14 14:36 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-09 18:27 [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 1/5] ruledb: modfield: properly encode field after variable substitution Stoiko Ivanov
2022-11-11 13:56 ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 2/5] ruledb: add deprecation warnings for unused actions Stoiko Ivanov
2022-11-14 16:02 ` Dominik Csapak
2022-11-15 14:32 ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database Stoiko Ivanov
2022-11-14 14:36 ` Dominik Csapak [this message]
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 4/5] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-14 14:49 ` Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 5/5] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-14 16:03 ` Dominik Csapak
2022-11-14 16:02 ` [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d9b5e2ba-6e1c-5cb2-4564-22e685b9d8f2@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pmg-devel@lists.proxmox.com \
--cc=s.ivanov@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox