From: Dominik Csapak <d.csapak@proxmox.com>
To: Fiona Ebner <f.ebner@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH common v3 2/5] section config: prepare for supporting UTF-8 encoded configurations
Date: Tue, 10 Mar 2026 10:23:09 +0100
Message-ID: <06e54db0-cc7a-4f11-8893-2124dfc8f786@proxmox.com>
In-Reply-To: <c4bab96c-09c7-44f4-be7f-2244a9e53838@proxmox.com>
On 3/10/26 10:18 AM, Fiona Ebner wrote:
> On 3/10/26 9:39 AM, Dominik Csapak wrote:
>> On 3/9/26 4:43 PM, Fiona Ebner wrote:
>>> Configurations registered as UTF-8 will be decoded after reading to
>>> Perl's internal string format and then contain wide characters. The
>>> Digest::SHA::sha1_hex() function croaks on wide characters, so encode
>>> again before calling the function if there are wide characters.
>>
>>
>> Just to clarify: the decoded string only contains wide characters if it
>> has code points above 0xFF. The symbol 'Ä', for example, is code point
>> U+00C4, so even after decoding it stays below 0xFF
>> (as UTF-8 bytes it is 0xC3 0x84).
>>
>> It does not matter here, since we only want to be consistent within
>> the parser + API, but in some cases it can make a difference,
>> e.g. when we calculate the digest on a value that is always UTF-8 encoded.
>>
>> I don't think this distinction warrants a new version, but
>> if there is one, better wording could avoid confusion
>> for a future reader of that commit message.
>
> A very good point. I wonder if we should rather extend parse_config()
> with $options too and pass along whether the file is a UTF-8 config
> file when calling the parser. What do you think?
IMO that would be a better way to define/detect UTF-8 files than checking
for code points > 0xFF. I'm not sure how much work it is to wire that up,
though. In practice it won't make much difference, as long as we use the
same logic/bytes everywhere we check the digest.
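A rough sketch of what the explicit variant could look like, only to
illustrate the idea. The helper name config_digest() and the 'utf8' key in
$options are assumptions made up for this example, not existing
PVE::SectionConfig API:

```perl
use strict;
use warnings;

use Digest::SHA qw(sha1_hex);
use Encode qw(encode);

# Hypothetical helper: the caller states whether $raw was decoded from
# UTF-8, instead of the function guessing from the code points it contains.
sub config_digest {
    my ($raw, $options) = @_;
    $options //= {};
    $raw = '' if !defined($raw);

    # For UTF-8 configs always hash the encoded bytes, even if every code
    # point happens to be <= 0xFF.
    my $bytes = $options->{utf8} ? encode('UTF-8', $raw) : $raw;
    return sha1_hex($bytes);
}

# "\x{C4}" is the decoded 'Ä'; its on-disk UTF-8 form is 0xC3 0x84.
my $matches = config_digest("\x{C4}", { utf8 => 1 }) eq sha1_hex("\xC3\x84");
print $matches ? "digest matches on-disk bytes\n" : "digest differs\n";
```

With the flag set, the digest is always computed over the UTF-8 bytes, so
it matches what would be hashed directly from the file.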
>
>>>
>>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>>> ---
>>>
>>> Changes in v3:
>>> * use strict 'UTF-8' encoding.
>>>
>>> src/PVE/SectionConfig.pm | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/PVE/SectionConfig.pm b/src/PVE/SectionConfig.pm
>>> index 84ff81a..ed5a632 100644
>>> --- a/src/PVE/SectionConfig.pm
>>> +++ b/src/PVE/SectionConfig.pm
>>> @@ -103,6 +103,7 @@ use warnings;
>>> use Carp;
>>> use Digest::SHA;
>>> +use Encode qw(encode);
>>> use PVE::Exception qw(raise_param_exc);
>>> use PVE::JSONSchema qw(get_standard_option);
>>> @@ -1214,7 +1215,11 @@ sub parse_config {
>>> $raw = '' if !defined($raw);
>>> - my $digest = Digest::SHA::sha1_hex($raw);
>>> + my $bytes = $raw;
>>> + # Digest::SHA croaks on wide characters
>>> + $bytes = encode('UTF-8', $raw) if $raw =~ /[^\x00-\xFF]/;
>>> +
>>> + my $digest = Digest::SHA::sha1_hex($bytes);
>>> my $pri = 1;
>>>
>>
>
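To illustrate the U+00C4 point against the check in the hunk above, here
is a small standalone script. The helper name digest_if_wide() is made up
for this example; it only mirrors the condition from the patch and is not
the actual parse_config() code:

```perl
use strict;
use warnings;

use Digest::SHA qw(sha1_hex);
use Encode qw(decode encode);

# Mirrors the check from the patch: encode only if a code point > 0xFF
# is present. The helper name is hypothetical.
sub digest_if_wide {
    my ($raw) = @_;
    my $bytes = $raw;
    $bytes = encode('UTF-8', $raw) if $raw =~ /[^\x00-\xFF]/;
    return sha1_hex($bytes);
}

# UTF-8 bytes of U+00C4 ('Ä') and U+20AC ('€') as they would be on disk.
my @samples = ("\xC3\x84", "\xE2\x82\xAC");

for my $on_disk (@samples) {
    my $decoded = decode('UTF-8', $on_disk);
    my $same = digest_if_wide($decoded) eq sha1_hex($on_disk);
    printf "U+%04X: digest %s the digest of the on-disk bytes\n",
        ord($decoded), $same ? 'matches' : 'differs from';
}
```

For U+00C4 the decoded string has no wide character, so it is hashed as
the single byte 0xC4 and the digest differs from the one over the on-disk
bytes. For U+20AC the string is re-encoded first and the digests match.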
Thread overview: 10+ messages
2026-03-09 15:42 [RFC common/cluster v3 0/5] cluster files: support registering UTF-8 configuration file Fiona Ebner
2026-03-09 15:42 ` [PATCH common v3 1/5] file: set contents: use strict UTF-8 encoding with $force_utf8 Fiona Ebner
2026-03-09 15:42 ` [PATCH common v3 2/5] section config: prepare for supporting UTF-8 encoded configurations Fiona Ebner
2026-03-10 8:40 ` Dominik Csapak
2026-03-10 9:18 ` Fiona Ebner
2026-03-10 9:23 ` Dominik Csapak [this message]
2026-03-09 15:42 ` [PATCH cluster v3 3/5] cfs register file: avoid implicit return Fiona Ebner
2026-03-09 15:42 ` [PATCH cluster v3 4/5] d/control: add versioned breaks for libpve-access-control Fiona Ebner
2026-03-09 15:42 ` [PATCH cluster v3 5/5] cluster files: support registering UTF-8 configuration file Fiona Ebner
2026-03-10 10:18 ` superseded: [RFC common/cluster v3 0/5] " Fiona Ebner