public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH ha-manager] rules: fix utf-8 encoding and decoding for comments field
From: Daniel Kral @ 2025-09-05 10:13 UTC
  To: pve-devel

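Non-ASCII characters in the comment of an HA rule get mangled when the
rule is saved, as reported by a user in the community forum [0]. Fix
this by doing the text decoding and encoding of the comment in
parse_config() and write_config() instead of in decode_value() and
encode_value().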

[0] https://forum.proxmox.com/threads/169258/page-14#post-792521

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
Tested a few strings from a Unicode example page [1], including at
least a mix of ASCII, Latin-1, Cyrillic and Bengali characters.

Also checked that this applies without any fuzz both on master and on
top of another ha-manager series [2].

[1] https://www.cogsci.ed.ac.uk/~richard/unicode-sample.html
[2] https://lore.proxmox.com/pve-devel/20250821143705.256562-1-d.kral@proxmox.com/

 src/PVE/HA/Rules.pm | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
index 323ad038..8c60b5ce 100644
--- a/src/PVE/HA/Rules.pm
+++ b/src/PVE/HA/Rules.pm
@@ -163,8 +163,6 @@ sub decode_value {
         }
 
         return $res;
-    } elsif ($key eq 'comment') {
-        return PVE::Tools::decode_text($value);
     }
 
     my $plugin = $class->lookup($type);
@@ -198,8 +196,6 @@ sub encode_value {
         PVE::HA::Tools::pve_verify_ha_resource_id($_) for keys %$value;
 
         return join(',', sort keys %$value);
-    } elsif ($key eq 'comment') {
-        return PVE::Tools::encode_text($value);
     }
 
     my $plugin = $class->lookup($type);
@@ -220,6 +216,30 @@ sub parse_section_header {
     return undef;
 }
 
+sub parse_config {
+    my ($class, $filename, $raw, $allow_unknown) = @_;
+
+    my $cfg = $class->SUPER::parse_config($filename, $raw, $allow_unknown);
+
+    for my $rule (values $cfg->{ids}->%*) {
+        $rule->{comment} = PVE::Tools::decode_text($rule->{comment})
+            if defined($rule->{comment});
+    }
+
+    return $cfg;
+}
+
+sub write_config {
+    my ($class, $filename, $cfg, $allow_unknown) = @_;
+
+    for my $rule (values $cfg->{ids}->%*) {
+        $rule->{comment} = PVE::Tools::encode_text($rule->{comment})
+            if defined($rule->{comment});
+    }
+
+    return $class->SUPER::write_config($filename, $cfg, $allow_unknown);
+}
+
 # General rule helpers
 
 =head3 $class->set_rule_defaults($rule)
-- 
2.47.2

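For illustration, the round trip the patch sets up can be sketched in
plain Perl. This is a minimal sketch, not the actual ha-manager code:
encode_text() and decode_text() are core-Perl stand-ins for
PVE::Tools::encode_text() and PVE::Tools::decode_text(), assuming they
behave like the copies in the test script later in this thread (UTF-8
encode, then percent-escape all control and high-bit bytes, ':' and
'%'). encode_comments() and decode_comments() are hypothetical helpers
that only mirror the loops added to write_config() and parse_config();
the actual SectionConfig parsing and writing is left out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # the comment literal below is written in UTF-8
use Encode ();

# Stand-in for PVE::Tools::encode_text() (assumption): UTF-8 encode the text,
# then percent-escape all control and high-bit bytes, ':' and '%'.
sub encode_text {
    my ($text) = @_;
    my $bytes = Encode::encode('utf8', $text);
    $bytes =~ s/([^\x20-\x24\x26-\x39\x3b-\x7e])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Stand-in for PVE::Tools::decode_text() (assumption): undo the
# percent-escaping, then decode the resulting bytes as UTF-8.
sub decode_text {
    my ($data) = @_;
    (my $bytes = $data) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return Encode::decode('utf8', $bytes);
}

# Hypothetical helper mirroring the loop the patch adds to write_config().
sub encode_comments {
    my ($cfg) = @_;
    for my $rule (values $cfg->{ids}->%*) {
        $rule->{comment} = encode_text($rule->{comment})
            if defined($rule->{comment});
    }
}

# Hypothetical helper mirroring the loop the patch adds to parse_config().
sub decode_comments {
    my ($cfg) = @_;
    for my $rule (values $cfg->{ids}->%*) {
        $rule->{comment} = decode_text($rule->{comment})
            if defined($rule->{comment});
    }
}

# A rule as it arrives via the API: the comment is already a character string.
my $original = 'à á â ã ä å';
my $cfg = { ids => { 'ha-rule-example' => { comment => $original } } };

encode_comments($cfg);    # what would end up in rules.cfg
my $stored = $cfg->{ids}->{'ha-rule-example'}->{comment};
print "stored: $stored\n";    # stored: %C3%A0 %C3%A1 %C3%A2 %C3%A3 %C3%A4 %C3%A5

decode_comments($cfg);    # what parse_config() would hand back
my $restored = $cfg->{ids}->{'ha-rule-example'}->{comment};
print 'round trip ', ($restored eq $original ? 'ok' : 'BROKEN'), "\n";
```

Like the loop in the patch's write_config(), encode_comments() rewrites
the comments in place, so a caller that keeps using $cfg after writing
it out would see the escaped form rather than the original text.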


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



* Re: [pve-devel] [PATCH ha-manager] rules: fix utf-8 encoding and decoding for comments field
From: Fiona Ebner @ 2025-10-22 12:58 UTC
  To: Proxmox VE development discussion, Daniel Kral

On 05.09.25 at 12:17 PM, Daniel Kral wrote:
> diff --git a/src/PVE/HA/Rules.pm b/src/PVE/HA/Rules.pm
> index 323ad038..8c60b5ce 100644
> --- a/src/PVE/HA/Rules.pm
> +++ b/src/PVE/HA/Rules.pm
> @@ -163,8 +163,6 @@ sub decode_value {
>          }
>  
>          return $res;
> -    } elsif ($key eq 'comment') {
> -        return PVE::Tools::decode_text($value);
>      }
>  
>      my $plugin = $class->lookup($type);
> @@ -198,8 +196,6 @@ sub encode_value {
>          PVE::HA::Tools::pve_verify_ha_resource_id($_) for keys %$value;
>  
>          return join(',', sort keys %$value);
> -    } elsif ($key eq 'comment') {
> -        return PVE::Tools::encode_text($value);
>      }
>  
>      my $plugin = $class->lookup($type);

Why did the original implementation with {de,en}code_value() not work?





* Re: [pve-devel] [PATCH ha-manager] rules: fix utf-8 encoding and decoding for comments field
From: Daniel Kral @ 2025-10-30  9:56 UTC
  To: Fiona Ebner, Proxmox VE development discussion

On Wed Oct 22, 2025 at 2:58 PM CEST, Fiona Ebner wrote:
> Why did the original implementation with {de,en}code_value() not work?
>

Sorry for the late reply, I had to look into it a bit: it seems that
with the {de,en}code_text($value) calls in {de,en}code_value(...) above,
we decode the comment text twice and then encode it for storage.

I haven't found where exactly this happens, so I'm not sure about it,
but it seems that the API arguments are already URI-unescaped and
UTF-8-decoded somewhere earlier (maybe in AnyEvent::Http or some other
handler?). Maybe someone more knowledgeable about pve-http-server could
answer this. The following script tries to capture what happens with
the value:

```perl
#!/usr/bin/env perl

use v5.36;

use Encode;
use URI::Escape;
use Data::Dumper;

sub encode_text {
    my ($text) = @_;

    # all control and hi-bit characters, ':' and '%'
    my $unsafe = "^\x20-\x24\x26-\x39\x3b-\x7e";
    return uri_escape(Encode::encode("utf8", $text), $unsafe);
}

sub decode_text {
    my ($data) = @_;

    return Encode::decode("utf8", uri_unescape($data));
}

my $input = 'à á â ã ä å';

print "Original: $input\n";
print "Decode: " . ($input = decode_text($input)) . "\n";
print "2*Decode: " . ($input = decode_text($input)) . "\n";
print "2*Decode+Encode: " . ($input = encode_text($input)) . "\n";
```

With the following output:

```
# ./decode-encode.pl
Original: à á â ã ä å
Decode:
Wide character in print at ./decode-encode.pl line 27.
2*Decode: � � � � � �
2*Decode+Encode: %EF%BF%BD %EF%BF%BD %EF%BF%BD %EF%BF%BD %EF%BF%BD %EF%BF%BD
```

which is exactly what is stored in the rules.cfg afterwards.



* Re: [pve-devel] [PATCH ha-manager] rules: fix utf-8 encoding and decoding for comments field
From: Fiona Ebner @ 2025-11-11 17:10 UTC
  To: Daniel Kral, Proxmox VE development discussion

On 30.10.25 at 10:55 AM, Daniel Kral wrote:
> On Wed Oct 22, 2025 at 2:58 PM CEST, Fiona Ebner wrote:
>> Why did the original implementation with {de,en}code_value() not work?
>>
> 
> Sorry for the late reply, I had to look into it a bit: It seems with the
> {de,en}code_text($value) in {de,en}code_value(...) from above, we decode
> the comment text twice and then encode it to store it.
> 
> I haven't found where it is done exactly/not sure about it, but it seems
> that we already unescape+utf8-decode text somewhere else for the API
> arguments (maybe AnyEvent::Http or some other handler?) before that, but
> maybe someone more knowledgeable about pve-http-server could answer
> here. The following script tries to capture what happens with the value:

Sorry for the late reply too! It's not actually decoded twice in that
sense. It's just that the value we get from the API was never encoded
in the first place ;) The relevant call is:

    PVE::SectionConfig::check_config("PVE::HA::Rules::ResourceAffinity",
        "ha-rule-fa5e518c-1f55", HASH(0x57637c23c9e8), 0, 1)
        called at /usr/share/perl5/PVE/API2/HA/Rules.pm line 339

The documentation for decode_value() mentions:

    Called during C<L<< check_config()|/$base->check_config(...) >>> in order
    to convert values that have been read from a C<L<PVE::SectionConfig>>
    file which have been I<encoded> beforehand by
    C<L<< encode_value()|/$base->encode_value(...) >>>.
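A minimal sketch of that contract for the comment key, using core-Perl
stand-ins for PVE::Tools::encode_text() and PVE::Tools::decode_text()
(assumed to behave like the copies in Daniel's test script above, not
the real module): decode_text(encode_text($v)) corresponds to the
documented case of a value read back from the file, while
decode_text($v) on its own corresponds to check_config() being handed a
value straight from the API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # the sample literals below are written in UTF-8
use Encode ();

binmode(STDOUT, ':encoding(UTF-8)');

# Core-Perl stand-ins for PVE::Tools::encode_text()/decode_text() (assumption).
sub encode_text {
    my ($text) = @_;
    my $bytes = Encode::encode('utf8', $text);
    $bytes =~ s/([^\x20-\x24\x26-\x39\x3b-\x7e])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

sub decode_text {
    my ($data) = @_;
    (my $bytes = $data) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return Encode::decode('utf8', $bytes);
}

my $ascii = 'plain ascii comment';
my $latin = 'à á â ã ä å';

my %result;
for my $value ($ascii, $latin) {
    # Documented case: the value was encoded (written to the file) before decoding.
    my $contract = decode_text(encode_text($value)) eq $value;
    # API case: check_config() decodes a value that was never encoded.
    my $api_path = decode_text($value) eq $value;
    $result{$value} = { contract => $contract, api_path => $api_path };
    printf "%s: contract %s, API path %s\n", $value,
        ($contract ? 'ok' : 'BROKEN'), ($api_path ? 'ok' : 'BROKEN');
}
# plain ascii comment: contract ok, API path ok
# à á â ã ä å: contract ok, API path BROKEN
```

Plain ASCII comments without '%' pass through decode_text() unchanged,
which is why they did not expose the problem.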

Taking that literally, it would mean we cannot use check_config() on
values passed in via the API: they have not been encoded before, so we
don't fulfill the contract. Existing users of decode_value() do not seem
to run into this design issue, except for your recent one here. But of
course, existing users already rely on check_config() being run on
values passed in via the API. So we would need to fix the design and the
contract here. Telling check_config() whether it's dealing with values
that were not previously encoded, and then skipping the decoding, sounds
like an obvious approach. However, other callers do rely on
decode_value() also being called on values passed in via the API that
were not previously encoded, e.g. for constructing a nodes hash:

    } elsif ($key eq 'nodes') {
        my $res = {};

        foreach my $node (PVE::Tools::split_list($value)) {
            if (PVE::JSONSchema::pve_verify_node_name($node)) {
                $res->{$node} = 1;
            }
        }

        return $res;

It's just that, for these keys, encoding the decoded value results in
the same value as the one passed in via the API ;) And that is actually
part of the contract of decode_value() as it's currently used.
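A minimal sketch of why this works for the nodes key, with simplified,
hypothetical stand-ins: decode_nodes() follows the decode_value()
excerpt above, but uses a plain split on commas, semicolons and
whitespace and a rough node-name regex instead of
PVE::Tools::split_list() and PVE::JSONSchema::pve_verify_node_name();
encode_nodes() assumes the encoded form is the sorted node names joined
by ',', like the resources branch of encode_value() in the patch. Since
the API string already has the shape of an encoded value, decoding it
directly yields the same hash as decoding it after a write/read cycle.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Simplified stand-in for the 'nodes' branch of decode_value() (assumption):
# split the list and keep entries that look like valid node names.
sub decode_nodes {
    my ($value) = @_;
    my $res = {};
    for my $node (split(/[,;\s]+/, $value)) {
        $res->{$node} = 1 if $node =~ /^[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?$/;
    }
    return $res;
}

# Hypothetical encode counterpart: sorted node names joined by ','.
sub encode_nodes {
    my ($value) = @_;
    return join(',', sort keys %$value);
}

my $api_value = 'node2,node1';    # as passed in via the API, never encoded

my $direct   = decode_nodes($api_value);                # check_config() on the API value
my $via_file = decode_nodes(encode_nodes($direct));    # after writing and re-reading rules.cfg

my $same = encode_nodes($direct) eq encode_nodes($via_file);
print 'nodes: ', ($same ? 'same' : 'different'), ' (', encode_nodes($direct), ")\n";
# nodes: same (node1,node2)
```

The comment key does not have this property: for non-ASCII text, the
API value is not something encode_text() would ever produce, so decoding
it is not a no-op.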

Going with your approach fixes the issue at hand, but it leaves the
design issue for the next person to run into. If we are happy enough
with the current design, we should clarify the documentation, so that
people don't try to use decode_value() for a use case like yours. It
does seem like it should be a supported use case, though. Maybe somebody
even has ideas for a fix? Too late for me today to come up with
something :P


