From: "Daniel Kral" <d.kral@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>,
"Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH ha-manager] rules: fix utf-8 encoding and decoding for comments field
Date: Thu, 30 Oct 2025 10:56:01 +0100
Message-ID: <DDVKEY2127XY.E5RTWKB5DOW6@proxmox.com>
In-Reply-To: <09e16950-42ca-4570-ba8c-8d0dcf1ac7e1@proxmox.com>
On Wed Oct 22, 2025 at 2:58 PM CEST, Fiona Ebner wrote:
> Am 05.09.25 um 12:17 PM schrieb Daniel Kral:
>> As reported by a user in the community forum [0].
>>
>> [0] https://forum.proxmox.com/threads/169258/page-14#post-792521
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> Tested a few strings from an unicode example page [1], including at
>> least a mix of ASCII, Latin-1, Cyrillic and Bengali characters.
>>
>> Also checked this applies both on master and on another ha-manager
>> series [2] without any fuzz.
>>
>> [1] https://www.cogsci.ed.ac.uk/~richard/unicode-sample.html
>> [2] https://lore.proxmox.com/pve-devel/20250821143705.256562-1-d.kral@proxmox.com/
>>
>> src/PVE/HA/Rules.pm | 28 ++++++++++++++++++++++++----
>> 1 file changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
>> index 323ad038..8c60b5ce 100644
>> --- a/src/PVE/HA/Rules.pm
>> +++ b/src/PVE/HA/Rules.pm
>> @@ -163,8 +163,6 @@ sub decode_value {
>> }
>>
>> return $res;
>> - } elsif ($key eq 'comment') {
>> - return PVE::Tools::decode_text($value);
>> }
>>
>> my $plugin = $class->lookup($type);
>> @@ -198,8 +196,6 @@ sub encode_value {
>> PVE::HA::Tools::pve_verify_ha_resource_id($_) for keys %$value;
>>
>> return join(',', sort keys %$value);
>> - } elsif ($key eq 'comment') {
>> - return PVE::Tools::encode_text($value);
>> }
>>
>> my $plugin = $class->lookup($type);
>
> Why did the original implementation with {de,en}code_value() not work?
>
Sorry for the late reply, I had to look into it a bit. It seems that
with the {de,en}code_text($value) calls in {de,en}code_value(...) from
above, the comment text ends up being decoded twice before it is encoded
for storage.

I haven't found where exactly this happens, so I'm not sure about it,
but it seems that the text is already unescaped and UTF-8-decoded
somewhere else for the API arguments (maybe AnyEvent::Http or some other
handler?) before that. Maybe someone more knowledgeable about
pve-http-server can answer this. The following script tries to capture
what happens with the value:
```
#!/usr/bin/env perl

use v5.36;

use Encode;
use URI::Escape;
use Data::Dumper;

sub encode_text {
    my ($text) = @_;

    # all control and hi-bit characters, ':' and '%'
    my $unsafe = "^\x20-\x24\x26-\x39\x3b-\x7e";
    return uri_escape(Encode::encode("utf8", $text), $unsafe);
}

sub decode_text {
    my ($data) = @_;

    return Encode::decode("utf8", uri_unescape($data));
}

my $input = 'à á â ã ä å'; # raw UTF-8 bytes (no "use utf8")

print "Original: $input\n";
print "Decode: " . ($input = decode_text($input)) . "\n";
print "2*Decode: " . ($input = decode_text($input)) . "\n";
print "2*Decode+Encode: " . ($input = encode_text($input)) . "\n";
```
With the following output:
```
# ./decode-encode.pl
Original: à á â ã ä å
Decode:
Wide character in print at ./decode-encode.pl line 27.
2*Decode: � � � � � �
2*Decode+Encode: %EF%BF%BD %EF%BF%BD %EF%BF%BD %EF%BF%BD %EF%BF%BD %EF%BF%BD
```
which is exactly what is stored in the rules.cfg afterwards.
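
For comparison, here is a minimal sketch of a single encode on write and
a single decode on read. It assumes (this is my guess from above, not
something I verified in pve-http-server) that the value arrives as an
already decoded character string. The helpers are the same two as in the
script above; the variable names are only for illustration:

```perl
#!/usr/bin/env perl

use v5.36;

use Encode;
use URI::Escape;

# Same helpers as in the script above.
sub encode_text {
    my ($text) = @_;

    # all control and hi-bit characters, ':' and '%'
    my $unsafe = "^\x20-\x24\x26-\x39\x3b-\x7e";
    return uri_escape(Encode::encode("utf8", $text), $unsafe);
}

sub decode_text {
    my ($data) = @_;

    return Encode::decode("utf8", uri_unescape($data));
}

# Assumption: the API layer already handed us decoded characters,
# i.e. code points U+00E0, U+00E1, U+00E2 and not UTF-8 bytes.
my $comment = "\x{e0} \x{e1} \x{e2}";

my $stored = encode_text($comment); # encode exactly once when writing
my $loaded = decode_text($stored);  # decode exactly once when reading

binmode(STDOUT, ':encoding(UTF-8)');
say "stored: $stored";
say "roundtrip ok: " . ($loaded eq $comment ? "yes" : "no");
```

With exactly one encode and one decode, the stored value stays plain
ASCII and the text read back compares equal to the original. The extra
decode_text() on an already decoded string is what turns every
character into U+FFFD.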