all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Dominik Csapak <d.csapak@proxmox.com>
To: Stoiko Ivanov <s.ivanov@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database
Date: Mon, 14 Nov 2022 15:36:13 +0100	[thread overview]
Message-ID: <d9b5e2ba-6e1c-5cb2-4564-22e685b9d8f2@proxmox.com> (raw)
In-Reply-To: <20221109182728.629576-4-s.ivanov@proxmox.com>

misin

comments inline:

On 11/9/22 19:27, Stoiko Ivanov wrote:
> This patch adds support for storing rule names, comments(info), and
> most relevant values (e.g. the header content to match) in utf-8 in
> the database.
> 
> backwards-compatibility should not be an issue:
> * following the argumentation from commit
>    43f8112f0bb424f99057106d57d32276d7d422a6 in pve-storage
> * we only need to consider that the valid multibyte utf-8 characters
>    do not really yield sensible combinations of single-byte characters
>    (starting with a byte > 127 - e.g. "£")
> 
> the database is created with SQL_ASCII encoding - which behaves by
> interpreting bytes <= 127 as ascii and those > 127 are not interpreted
> (see [0], which just means that we have to explicitly en-/decode upon
> storing/reading from there)
> 
> This patch currently omits most Who objects:
> * for email/domain we'd still need to consider how to store them
>    (puny-code for the domain part, or everything as UTF-8) and it would
>    need changes to the API-types.
> * the LDAP objects currently would not work too well, since our LDAPCache
>    is not UTF-8 safe - and fixing warants its own patch-series
> * WhoRegex should work and be able to handle many use-cases
> 
> The ContentType values should also contain only ascii characters per
> RFC6838 [1] and RFC2045 [2].
> 
> [0] https://www.postgresql.org/docs/13/multibyte.html
> [1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2
> [2] https://datatracker.ietf.org/doc/html/rfc2045#section-5.1
> 
> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
> ---
>   src/PMG/RuleDB.pm               | 23 ++++++++++++++++-------
>   src/PMG/RuleDB/Accept.pm        |  2 +-
>   src/PMG/RuleDB/BCC.pm           |  2 +-
>   src/PMG/RuleDB/Block.pm         |  2 +-
>   src/PMG/RuleDB/Disclaimer.pm    |  2 +-
>   src/PMG/RuleDB/Group.pm         |  4 ++--
>   src/PMG/RuleDB/MatchField.pm    |  6 +++++-
>   src/PMG/RuleDB/MatchFilename.pm |  5 ++++-
>   src/PMG/RuleDB/ModField.pm      |  6 ++++--
>   src/PMG/RuleDB/Notify.pm        |  2 +-
>   src/PMG/RuleDB/Quarantine.pm    |  3 ++-
>   src/PMG/RuleDB/Remove.pm        | 12 +++++++-----
>   src/PMG/RuleDB/Rule.pm          |  2 +-
>   src/PMG/RuleDB/WhoRegex.pm      |  5 ++++-
>   src/PMG/Utils.pm                |  5 +++++
>   15 files changed, 55 insertions(+), 26 deletions(-)
> 
> diff --git a/src/PMG/RuleDB.pm b/src/PMG/RuleDB.pm
> index 895acc6..9d6d99d 100644
> --- a/src/PMG/RuleDB.pm
> +++ b/src/PMG/RuleDB.pm
> @@ -5,6 +5,7 @@ use warnings;
>   use DBI;
>   use HTML::Entities;
>   use Data::Dumper;
> +use Encode qw(encode decode);
>   
>   use PVE::SafeSyslog;
>   
> @@ -72,7 +73,8 @@ sub create_group_with_obj {
>   
>       $name //= '';
>       $info //= '';
> -
> +    $name = encode('UTF-8', $name);
> +    $info = encode('UTF-8',$info);

nit: spacing and we could combine with the lines above:

$name = encode('UTF-8', $name // '');
...


>       eval {
>   
>   	$self->{dbh}->begin_work;
> @@ -174,7 +176,9 @@ sub save_group {
>   	$self->{dbh}->do("UPDATE Objectgroup " .
>   			 "SET Name = ?, Info = ? " .
>   			 "WHERE ID = ?", undef,
> -			 $og->{name}, $og->{info}, $og->{id});
> +			 encode('UTF-8', $og->{name}),
> +			 encode('UTF-8', $og->{info}),
> +			 $og->{id});
>   
>   	return $og->{id};
>   
> @@ -183,7 +187,7 @@ sub save_group {
>   	    "INSERT INTO Objectgroup (Name, Info, Class) " .
>   	    "VALUES (?, ?, ?);");
>   
> -	$sth->execute($og->name, $og->info, $og->class);
> +	$sth->execute(encode('UTF-8', $og->name), encode('UTF-8', $og->info), $og->class);
>   
>   	return $og->{id} = PMG::Utils::lastid($self->{dbh}, 'objectgroup_id_seq');
>       }
> @@ -212,7 +216,9 @@ sub delete_group {
>   	$sth->execute($groupid);
>   
>   	if (my $ref = $sth->fetchrow_hashref()) {
> -	    die "Group '$ref->{groupname}' is used by rule '$ref->{rulename}' - unable to delete\n";
> +	    my $groupname = PMG::Utils::try_deocode_utf8($ref->{groupname});
> +	    my $rulename = PMG::Utils::try_deocode_utf8($ref->{rulename});

typo: deocode vs decode

> +	    die "Group '$groupname' is used by rule '$rulename' - unable to delete\n";
>   	}
>   
>           $sth->finish();
> @@ -474,6 +480,7 @@ sub load_object_full {
>   sub load_group_by_name {
>       my ($self, $name) = @_;
>   
> +    $name = PMG::Utils::try_decode_utf8($name);

isn't this wrong?
we encode the name before writing into the database, so we nave to
encode it again when getting it out of the db, no?

otoh, after a short grep, we never call 'load_group_by_name' anyway,
so we could just delete

>       my $sth = $self->{dbh}->prepare("SELECT * FROM Objectgroup " .
>   				    "WHERE name = ?");
>   
> @@ -598,13 +605,14 @@ sub save_rule {
>       defined($rule->{direction}) ||
>   	die "undefined rule attribute - direction: ERROR";
>   
> +    my $rulename = encode('UTF-8', $rule->{name});
>       if (defined($rule->{id})) {
>   
>   	$self->{dbh}->do(
>   	    "UPDATE Rule " .
>   	    "SET Name = ?, Priority = ?, Active = ?, Direction = ? " .
>   	    "WHERE ID = ?", undef,
> -	    $rule->{name}, $rule->{priority}, $rule->{active},
> +	    $rulename, $rule->{priority}, $rule->{active},
>   	    $rule->{direction}, $rule->{id});
>   
>   	return $rule->{id};
> @@ -614,7 +622,7 @@ sub save_rule {
>   	    "INSERT INTO Rule (Name, Priority, Active, Direction) " .
>   	    "VALUES (?, ?, ?, ?);");
>   
> -	$sth->execute($rule->name, $rule->priority, $rule->active,
> +	$sth->execute($rulename, $rule->priority, $rule->active,
>   		      $rule->direction);
>   
>   	return $rule->{id} = PMG::Utils::lastid($self->{dbh}, 'rule_id_seq');
> @@ -779,7 +787,8 @@ sub load_rules {
>       $sth->execute();
>   
>       while (my $ref = $sth->fetchrow_hashref()) {
> -	my $rule = PMG::RuleDB::Rule->new($ref->{name}, $ref->{priority},
> +	my $rulename = PMG::Utils::try_decode_utf8($ref->{name});
> +	my $rule = PMG::RuleDB::Rule->new($rulename, $ref->{priority},
>   					  $ref->{active}, $ref->{direction});
>   	$rule->{id} = $ref->{id};
>   	push @$rules, $rule;
> diff --git a/src/PMG/RuleDB/Accept.pm b/src/PMG/RuleDB/Accept.pm
> index cd67ea2..4ebd6da 100644
> --- a/src/PMG/RuleDB/Accept.pm
> +++ b/src/PMG/RuleDB/Accept.pm
> @@ -93,7 +93,7 @@ sub execute {
>       my $dkim = $msginfo->{dkim} // {};
>       my $subgroups = $mod_group->subgroups($targets, !$dkim->{sign});
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       foreach my $ta (@$subgroups) {
>   	my ($tg, $entity) = (@$ta[0], @$ta[1]);
> diff --git a/src/PMG/RuleDB/BCC.pm b/src/PMG/RuleDB/BCC.pm
> index d364690..c1225f3 100644
> --- a/src/PMG/RuleDB/BCC.pm
> +++ b/src/PMG/RuleDB/BCC.pm
> @@ -115,7 +115,7 @@ sub execute {
>   
>       my $subgroups = $mod_group->subgroups($targets, 1);
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       my $bcc_to = PMG::Utils::subst_values($self->{target}, $vars);
>   
> diff --git a/src/PMG/RuleDB/Block.pm b/src/PMG/RuleDB/Block.pm
> index c758787..25bb74e 100644
> --- a/src/PMG/RuleDB/Block.pm
> +++ b/src/PMG/RuleDB/Block.pm
> @@ -89,7 +89,7 @@ sub execute {
>       my ($self, $queue, $ruledb, $mod_group, $targets,
>   	$msginfo, $vars, $marks) = @_;
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       if ($msginfo->{testmode}) {
>   	my $fh = $msginfo->{test_fh};
> diff --git a/src/PMG/RuleDB/Disclaimer.pm b/src/PMG/RuleDB/Disclaimer.pm
> index d3003b2..c6afe54 100644
> --- a/src/PMG/RuleDB/Disclaimer.pm
> +++ b/src/PMG/RuleDB/Disclaimer.pm
> @@ -193,7 +193,7 @@ sub execute {
>       my ($self, $queue, $ruledb, $mod_group, $targets,
>   	$msginfo, $vars, $marks) = @_;
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       my $subgroups = $mod_group->subgroups($targets);
>   
> diff --git a/src/PMG/RuleDB/Group.pm b/src/PMG/RuleDB/Group.pm
> index 2508305..baa68ce 100644
> --- a/src/PMG/RuleDB/Group.pm
> +++ b/src/PMG/RuleDB/Group.pm
> @@ -12,8 +12,8 @@ sub new {
>       my ($type, $name, $info, $class) = @_;
>   
>       my $self = {
> -	name => $name,
> -	info => $info,
> +	name => PMG::Utils::try_decode_utf8($name),
> +	info => PMG::Utils::try_decode_utf8($info),
>   	class => $class,
>       };
>   
> diff --git a/src/PMG/RuleDB/MatchField.pm b/src/PMG/RuleDB/MatchField.pm
> index 2671ea4..8246e6e 100644
> --- a/src/PMG/RuleDB/MatchField.pm
> +++ b/src/PMG/RuleDB/MatchField.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(decode encode);

again here

>   use MIME::Words;
>   
>   use PVE::SafeSyslog;
> @@ -50,9 +51,10 @@ sub load_attr {
>       defined($field) || die "undefined object attribute: ERROR";
>       defined($field_value) || die "undefined object attribute: ERROR";
>   
> +    my $decoded_field_value = PMG::Utils::try_decode_utf8($field_value);
>       # use known constructor, bless afterwards (because sub class can have constructor
>       # with other parameter signature).
> -    my $obj =  PMG::RuleDB::MatchField->new($field, $field_value, $ogroup);
> +    my $obj =  PMG::RuleDB::MatchField->new($field, $decoded_field_value, $ogroup);
>       bless $obj, $class;
>   
>       $obj->{id} = $id;
> @@ -69,6 +71,7 @@ sub save {
>   
>       my $new_value = "$self->{field}:$self->{field_value}";
>       $new_value =~ s/\\/\\\\/g;
> +    $new_value = encode('UTF-8', $new_value);
>   
>       if (defined ($self->{id})) {
>   	# update
> @@ -106,6 +109,7 @@ sub parse_entity {
>   	    chomp $value;
>   
>   	    my $decvalue = MIME::Words::decode_mimewords($value);
> +	    $decvalue = PMG::Utils::try_decode_utf8($decvalue);

doesn't that do utf8 decoding already? we have the mail here not the database
object...

>   
>   	    if ($decvalue =~ m|$self->{field_value}|i) {
>   		push @$res, $id;
> diff --git a/src/PMG/RuleDB/MatchFilename.pm b/src/PMG/RuleDB/MatchFilename.pm
> index 7e5b486..06bf931 100644
> --- a/src/PMG/RuleDB/MatchFilename.pm
> +++ b/src/PMG/RuleDB/MatchFilename.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(encode decode);
>   use MIME::Words;
>   
>   use PMG::Utils;
> @@ -41,8 +42,9 @@ sub load_attr {
>       my $class = ref($type) || $type;
>   
>       defined($value) || die "undefined value: ERROR";;
> +    my $decvalue = PMG::Utils::try_decode_utf8($value);
>   
> -    my $obj = $class->new($value, $ogroup);
> +    my $obj = $class->new($decvalue, $ogroup);
>       $obj->{id} = $id;
>   
>       $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
> @@ -57,6 +59,7 @@ sub save {
>   
>       my $new_value = $self->{fname};
>       $new_value =~ s/\\/\\\\/g;
> +    $new_value = encode('UTF-8', $new_value);
>   
>       if (defined($self->{id})) {
>   	# update
> diff --git a/src/PMG/RuleDB/ModField.pm b/src/PMG/RuleDB/ModField.pm
> index fb15076..1e1727f 100644
> --- a/src/PMG/RuleDB/ModField.pm
> +++ b/src/PMG/RuleDB/ModField.pm
> @@ -57,7 +57,9 @@ sub load_attr {
>   
>       (defined($field) && defined($field_value)) || return undef;
>   
> -    my $obj = $class->new($field, $field_value, $ogroup);
> +    my $dec_field_value = PMG::Utils::try_decode_utf8($field_value);
> +
> +    my $obj = $class->new($field, $dec_field_value, $ogroup);
>       $obj->{id} = $id;
>   
>       $obj->{digest} = Digest::SHA::sha1_hex($id, $field, $field_value, $ogroup);
> @@ -70,7 +72,7 @@ sub save {
>   
>       defined($self->{ogroup}) || return undef;
>   
> -    my $new_value = "$self->{field}:$self->{field_value}";
> +    my $new_value = encode('UTF-8', "$self->{field}:$self->{field_value}");
>   
>       if (defined ($self->{id})) {
>   	# update
> diff --git a/src/PMG/RuleDB/Notify.pm b/src/PMG/RuleDB/Notify.pm
> index af853a3..bca5ebf 100644
> --- a/src/PMG/RuleDB/Notify.pm
> +++ b/src/PMG/RuleDB/Notify.pm
> @@ -208,7 +208,7 @@ sub execute {
>   :bd
>       my $from = 'postmaster';
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       my $body = PMG::Utils::subst_values($self->{body}, $vars);
>       my $subject = PMG::Utils::subst_values($self->{subject}, $vars);
> diff --git a/src/PMG/RuleDB/Quarantine.pm b/src/PMG/RuleDB/Quarantine.pm
> index 1426393..30bc5ec 100644
> --- a/src/PMG/RuleDB/Quarantine.pm
> +++ b/src/PMG/RuleDB/Quarantine.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(decode encode);

why decode too?

>   
>   use PVE::SafeSyslog;
>   
> @@ -89,7 +90,7 @@ sub execute {
>       
>       my $subgroups = $mod_group->subgroups($targets, 1);
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       foreach my $ta (@$subgroups) {
>   	my ($tg, $entity) = (@$ta[0], @$ta[1]);
> diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
> index 6b27b91..da6c25f 100644
> --- a/src/PMG/RuleDB/Remove.pm
> +++ b/src/PMG/RuleDB/Remove.pm
> @@ -63,12 +63,14 @@ sub load_attr {
>   
>       defined ($value) || die "undefined value: ERROR";
>   
> -    my $obj;
> +    my ($obj, $text);
>   
>       if ($value =~ m/^([01])\,([01])(\:(.*))?$/s) {
> -	$obj = $class->new($1, $4, $ogroup, $2);
> +	$text = PMG::Utils::try_decode_utf8($4);
> +	$obj = $class->new($1, $text, $ogroup, $2);
>       } elsif ($value =~ m/^([01])(\:(.*))?$/s) {
> -	$obj = $class->new($1, $3, $ogroup);
> +	$text = PMG::Utils::try_decode_utf8($3);
> +	$obj = $class->new($1, $text, $ogroup);
>       } else {
>   	$obj = $class->new(0, undef, $ogroup);
>       }
> @@ -89,7 +91,7 @@ sub save {
>       $value .= ','. ($self->{quarantine} ? '1' : '0');
>   
>       if ($self->{text}) {
> -	$value .= ":$self->{text}";
> +	$value .= encode('UTF-8', ":$self->{text}");
>       }
>   
>       if (defined ($self->{id})) {
> @@ -194,7 +196,7 @@ sub execute {
>       my ($self, $queue, $ruledb, $mod_group, $targets,
>   	$msginfo, $vars, $marks, $ldap) = @_;
>   
> -    my $rulename = $vars->{RULE} // 'unknown';
> +    my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>   
>       if (!$self->{all} && ($#$marks == -1)) {
>   	# no marks
> diff --git a/src/PMG/RuleDB/Rule.pm b/src/PMG/RuleDB/Rule.pm
> index c49ad21..e7c9146 100644
> --- a/src/PMG/RuleDB/Rule.pm
> +++ b/src/PMG/RuleDB/Rule.pm
> @@ -12,7 +12,7 @@ sub new {
>       my ($type, $name, $priority, $active, $direction) = @_;
>   
>       my $self = {
> -	name => $name // '',
> +	name => PMG::Utils::try_decode_utf8($name) // '',
>   	priority => $priority // 0,
>   	active => $active // 0,
>       };
> diff --git a/src/PMG/RuleDB/WhoRegex.pm b/src/PMG/RuleDB/WhoRegex.pm
> index 37ec3aa..ccc94a0 100644
> --- a/src/PMG/RuleDB/WhoRegex.pm
> +++ b/src/PMG/RuleDB/WhoRegex.pm
> @@ -4,6 +4,7 @@ use strict;
>   use warnings;
>   use DBI;
>   use Digest::SHA;
> +use Encode qw(decode encode);

again here

>   
>   use PMG::Utils;
>   use PMG::RuleDB::Object;
> @@ -43,7 +44,8 @@ sub load_attr {
>   
>       defined($value) || die "undefined value: ERROR";
>   
> -    my $obj = $class->new ($value, $ogroup);
> +    my $decoded_value = PMG::Utils::try_decode_utf8($value);
> +    my $obj = $class->new ($decoded_value, $ogroup);
>       $obj->{id} = $id;
>   
>       $obj->{digest} = Digest::SHA::sha1_hex($id, $value, $ogroup);
> @@ -59,6 +61,7 @@ sub save {
>   
>       my $adr = $self->{address};
>       $adr =~ s/\\/\\\\/g;
> +    $adr = encode('UTF-8', $adr);
>   
>       if (defined ($self->{id})) {
>   	# update
> diff --git a/src/PMG/Utils.pm b/src/PMG/Utils.pm
> index cef232b..23f60eb 100644
> --- a/src/PMG/Utils.pm
> +++ b/src/PMG/Utils.pm
> @@ -1542,4 +1542,9 @@ sub get_existing_object_id {
>       return;
>   }
>   
> +sub try_decode_utf8 {
> +    my ($data) = @_;
> +    return eval { decode('UTF-8', $data, 1) } // $data;
> +}
> +
>   1;





  reply	other threads:[~2022-11-14 14:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-09 18:27 [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Stoiko Ivanov
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 1/5] ruledb: modfield: properly encode field after variable substitution Stoiko Ivanov
2022-11-11 13:56   ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 2/5] ruledb: add deprecation warnings for unused actions Stoiko Ivanov
2022-11-14 16:02   ` Dominik Csapak
2022-11-15 14:32   ` [pmg-devel] applied: " Thomas Lamprecht
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 3/5] fix #2541 ruledb: encode relevant values as utf-8 in database Stoiko Ivanov
2022-11-14 14:36   ` Dominik Csapak [this message]
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 4/5] ruledb: encode e-mail addresses for syslog Stoiko Ivanov
2022-11-14 14:49   ` Dominik Csapak
2022-11-09 18:27 ` [pmg-devel] [PATCH pmg-api 5/5] partially fix #2465: handle smtputf8 addresses in the rule-system Stoiko Ivanov
2022-11-14 16:03   ` Dominik Csapak
2022-11-14 16:02 ` [pmg-devel] [PATCH pmg-api 0/5] ruledb - improve experience for non-ascii tests and mails Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9b5e2ba-6e1c-5cb2-4564-22e685b9d8f2@proxmox.com \
    --to=d.csapak@proxmox.com \
    --cc=pmg-devel@lists.proxmox.com \
    --cc=s.ivanov@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal