* [RFC common/cluster v3 0/5] cluster files: support registering UTF-8 configuration file
From: Fiona Ebner @ 2026-03-09 15:42 UTC (permalink / raw)
To: pve-devel
Changes in v3:
* rebase on current master.
* add patch to use strict UTF-8 encoding when writing.
* use strict UTF-8 for {de,en}coding in the rest of the series.
* let cfs_register_file() take an $options parameter rather than
an $utf8 parameter, so future call sites are more readable.
Changes in v2:
* rebase on current master.
* use qw(encode) when importing Encode module.
A configuration file registered as UTF-8 will be automatically decoded
from UTF-8 to Perl's internal string format after reading and encoded
in the other direction before writing.
Patch 1/5 also makes sense without the rest of the RFC.
Note that patch 4/5 (required by 3/5) adds a versioned Breaks for
libpve-access-control, which still needs to be bumped! This could be avoided
by keeping the implicit return of cfs_register_file(), but that would be bad
for encapsulation.
common:
Fiona Ebner (2):
file: set contents: use strict UTF-8 encoding with $force_utf8
section config: prepare for supporting UTF-8 encoded configurations
src/PVE/File.pm | 2 +-
src/PVE/SectionConfig.pm | 7 ++++++-
2 files changed, 7 insertions(+), 2 deletions(-)
cluster:
Fiona Ebner (3):
cfs register file: avoid implicit return
d/control: add versioned breaks for libpve-access-control
cluster files: support registering UTF-8 configuration file
debian/control | 2 +-
src/PVE/Cluster.pm | 12 +++++++++---
2 files changed, 10 insertions(+), 4 deletions(-)
Summary over all repositories:
4 files changed, 17 insertions(+), 6 deletions(-)
--
Generated by git-murpp 0.5.0
* [PATCH common v3 1/5] file: set contents: use strict UTF-8 encoding with $force_utf8
From: Fiona Ebner @ 2026-03-09 15:42 UTC (permalink / raw)
To: pve-devel
'perldoc Encode' states:
> "UTF-8" means UTF-8 in its current sense, which is conservative and
> strict and security-conscious, whereas "utf8" means UTF-8 in its
> former sense, which was liberal and loose and lax.
Currently, the following callers use the $force_utf8 flag:
1. Notification config
2. The SDN fabrics config
3. The PBS password file
The first two pass the contents to Rust, and the last already uses
the strict 'UTF-8' encoding when decoding. So in all cases, after this
patch, any potential issues are just caught earlier, before writing.
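
To illustrate the distinction quoted from perldoc, here is a minimal
sketch that is not part of the patch. It assumes an explicit CHECK
argument (Encode::FB_CROAK), which file_set_contents() does not pass,
purely to make the difference visible. The sample string with a lone
surrogate is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: a lone surrogate, which strict UTF-8 does not allow.
my $sample = "abc" . chr(0xD800);

# Lax "utf8": the ill-formed code point is encoded without complaint.
my $lax = encode("utf8", $sample, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Strict "UTF-8": with FB_CROAK the same input is rejected.
my $strict_ok = eval {
    encode("UTF-8", $sample, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};

print "lax encoding produced ", length($lax), " bytes\n";
print "strict encoding ", ($strict_ok ? "succeeded" : "croaked"), "\n";
```

Without a CHECK argument, Encode documents that malformed characters
are replaced by a substitution character instead of raising an error.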
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v3.
src/PVE/File.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/PVE/File.pm b/src/PVE/File.pm
index 7d0f7ab..89884a8 100644
--- a/src/PVE/File.pm
+++ b/src/PVE/File.pm
@@ -60,7 +60,7 @@ sub file_set_contents($filename, $data, $perm = 0644, $force_utf8 = 0) {
die "unable to open file '$tmpname' - $!\n" if !$fh;
if ($force_utf8) {
- $data = encode("utf8", $data);
+ $data = encode("UTF-8", $data);
} else {
# Encode wide characters with print before passing them to syswrite
my $unencoded_data = $data;
--
2.47.3
* [PATCH common v3 2/5] section config: prepare for supporting UTF-8 encoded configurations
From: Fiona Ebner @ 2026-03-09 15:42 UTC (permalink / raw)
To: pve-devel
Configurations registered as UTF-8 will be decoded after reading to
Perl's internal string format and then contain wide characters. The
Digest::SHA::sha1_hex() function croaks on wide characters, so encode
again before calling the function if there are wide characters.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* use strict 'UTF-8' encoding.
src/PVE/SectionConfig.pm | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/PVE/SectionConfig.pm b/src/PVE/SectionConfig.pm
index 84ff81a..ed5a632 100644
--- a/src/PVE/SectionConfig.pm
+++ b/src/PVE/SectionConfig.pm
@@ -103,6 +103,7 @@ use warnings;
 
 use Carp;
 use Digest::SHA;
+use Encode qw(encode);
 
 use PVE::Exception qw(raise_param_exc);
 use PVE::JSONSchema qw(get_standard_option);
@@ -1214,7 +1215,11 @@ sub parse_config {
 
     $raw = '' if !defined($raw);
 
-    my $digest = Digest::SHA::sha1_hex($raw);
+    my $bytes = $raw;
+    # Digest::SHA croaks on wide characters
+    $bytes = encode('UTF-8', $raw) if $raw =~ /[^\x00-\xFF]/;
+
+    my $digest = Digest::SHA::sha1_hex($bytes);
 
     my $pri = 1;
 
--
2.47.3
* [PATCH cluster v3 3/5] cfs register file: avoid implicit return
From: Fiona Ebner @ 2026-03-09 15:42 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Requires the versioned breaks done in the next patch.
New in v3.
src/PVE/Cluster.pm | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index bdb465f..cd5d6b5 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -529,6 +529,8 @@ sub cfs_register_file {
parser => $parser,
writer => $writer,
};
+
+ return;
}
my $ccache_read = sub {
--
2.47.3
* [PATCH cluster v3 4/5] d/control: add versioned breaks for libpve-access-control
From: Fiona Ebner @ 2026-03-09 15:42 UTC (permalink / raw)
To: pve-devel
Before commit a484970 ("token config: have module return a true
value"), libpve-access-control relied on the implicit return value of
cfs_register_file() as the return value of the module.
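
A minimal sketch of the underlying Perl behaviour, with made-up
function names (register_implicit and register_explicit are not part
of any PVE module): a sub without an explicit return yields the value
of its last evaluated statement, so a module file ending in such a
call can evaluate to a true value by accident.

```perl
use strict;
use warnings;

my $registry = {};

# No explicit return: the assignment is the last statement evaluated,
# so the caller receives the freshly stored hash reference.
sub register_implicit {
    my ($name) = @_;
    $registry->{$name} = { name => $name };
}

# Explicit bare return: the caller gets undef in scalar context.
sub register_explicit {
    my ($name) = @_;
    $registry->{$name} = { name => $name };
    return;
}

my $implicit = register_implicit('a');
my $explicit = register_explicit('b');

print "implicit: ", ($implicit ? "true" : "false"), "\n";
print "explicit: ", ($explicit ? "true" : "false"), "\n";
```

A file loaded with require has to end in a true value, which is why a
module whose last statement was such a registration call only loaded
as long as the callee leaked its internal value.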
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Note that libpve-access-control needs to be bumped first!
New in v3.
debian/control | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/debian/control b/debian/control
index 594d29c..4265c93 100644
--- a/debian/control
+++ b/debian/control
@@ -40,7 +40,7 @@ Depends: corosync (>= 2.3.4-1),
${misc:Depends},
${perl:Depends},
${shlibs:Depends},
-Breaks: libpve-access-control (<= 6.0-3),
+Breaks: libpve-access-control (<= 9.0.5),
libpve-guest-common-perl (<= 3.0-2),
libpve-storage-perl (<= 6.0-9),
pve-container (<= 3.0-10),
--
2.47.3
* [PATCH cluster v3 5/5] cluster files: support registering UTF-8 configuration file
From: Fiona Ebner @ 2026-03-09 15:42 UTC (permalink / raw)
To: pve-devel
A configuration file registered as UTF-8 will be automatically decoded
from UTF-8 to Perl's internal string format after reading and encoded
in the other direction before writing.
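
The round trip described above can be sketched in isolation as
follows. This is an in-memory stand-in, not the PVE::Cluster code:
register_file, read_file, write_file and the file name example.cfg are
hypothetical, and the parser and writer are identity functions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical in-memory stand-in for the file registry.
my $file_info = {};

sub register_file {
    my ($filename, $parser, $writer, $options) = @_;
    $file_info->{$filename} = { parser => $parser, writer => $writer };
    $file_info->{$filename}->{utf8} = 1 if $options && $options->{utf8};
    return;
}

# Reading: bytes are decoded before the parser sees them.
sub read_file {
    my ($filename, $bytes) = @_;
    my $info = $file_info->{$filename} // die "unknown file '$filename'\n";
    my $data = $bytes;
    $data = decode('UTF-8', $data) if $info->{utf8};
    return $info->{parser}->($filename, $data);
}

# Writing: the writer's output is encoded back to bytes.
sub write_file {
    my ($filename, $cfg) = @_;
    my $info = $file_info->{$filename} // die "unknown file '$filename'\n";
    my $raw = $info->{writer}->($filename, $cfg);
    $raw = encode('UTF-8', $raw) if $info->{utf8};
    return $raw;
}

register_file('example.cfg', sub { $_[1] }, sub { $_[1] }, { utf8 => 1 });

my $on_disk = "name: \xE2\x82\xAC\n";    # "name: <euro sign>\n" as UTF-8 bytes
my $parsed  = read_file('example.cfg', $on_disk);
my $written = write_file('example.cfg', $parsed);

printf "decoded: %d characters, written: %d bytes\n",
    length($parsed), length($written);
```

Passing a hash reference as the last argument keeps call sites
readable, since { utf8 => 1 } names the option instead of relying on a
positional flag.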
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes in v3:
* use strict 'UTF-8' encoding.
* let cfs_register_file() take an $options parameter rather than
an $utf8 parameter, so future call sites are more readable.
src/PVE/Cluster.pm | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index cd5d6b5..8cde9e5 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -519,7 +519,7 @@ sub verify_token {
my $file_info = {};
sub cfs_register_file {
- my ($filename, $parser, $writer) = @_;
+ my ($filename, $parser, $writer, $options) = @_;
$observed->{$filename} || die "unknown file '$filename'";
@@ -529,12 +529,13 @@ sub cfs_register_file {
parser => $parser,
writer => $writer,
};
+ $file_info->{$filename}->{utf8} = 1 if $options && $options->{utf8};
return;
}
my $ccache_read = sub {
- my ($filename, $parser, $version) = @_;
+ my ($filename, $parser, $version, $utf8) = @_;
$ccache->{$filename} = {} if !$ccache->{$filename};
@@ -544,6 +545,7 @@ my $ccache_read = sub {
# we always call the parser, even when the file does not exist
# (in that case $data is undef)
my $data = get_config($filename);
+ $data = decode('UTF-8', $data) if $utf8;
$ci->{data} = &$parser("/etc/pve/$filename", $data);
$ci->{version} = $version;
}
@@ -581,7 +583,7 @@ sub cfs_read_file {
my ($version, $info) = cfs_file_version($filename);
my $parser = $info->{parser};
- return &$ccache_read($filename, $parser, $version);
+ return &$ccache_read($filename, $parser, $version, $info->{utf8});
}
sub cfs_write_file {
@@ -599,6 +601,8 @@ sub cfs_write_file {
$ci->{version} = undef;
}
+ $force_utf8 = 1 if $info->{utf8};
+
PVE::Tools::file_set_contents($fsname, $raw, undef, $force_utf8);
}
--
2.47.3
* Re: [PATCH common v3 2/5] section config: prepare for supporting UTF-8 encoded configurations
From: Dominik Csapak @ 2026-03-10 8:40 UTC (permalink / raw)
To: Fiona Ebner, pve-devel
On 3/9/26 4:43 PM, Fiona Ebner wrote:
> Configurations registered as UTF-8 will be decoded after reading to
> Perl's internal string format and then contain wide characters. The
> Digest::SHA::sha1_hex() function croaks on wide characters, so encode
> again before calling the function if there are wide characters.
Just to clarify: the string will only contain wide characters if it has
code points above 0xFF. The symbol 'Ä', for example, is code point
U+00C4, so even after decoding it is below 0xFF
(in UTF-8 bytes it would be 0xC3 0x84).

This does not matter here, since we only want to be consistent within
the parser + API, but in some cases it can make a difference,
e.g. when we calculate the digest on a value that is always UTF-8 encoded.

I don't think this distinction warrants a new version, but
if there is one, better wording could avoid confusion
for a future reader of that commit message.
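
A minimal sketch of the distinction, assuming a hypothetical helper
digest_like_patch() that mirrors the conditional re-encoding in the
hunk below. The sample values are made up.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use Encode qw(decode encode);

# Hypothetical helper: re-encode only if a code point above 0xFF is present.
sub digest_like_patch {
    my ($raw) = @_;
    my $bytes = $raw;
    $bytes = encode('UTF-8', $raw) if $raw =~ /[^\x00-\xFF]/;
    return sha1_hex($bytes);
}

my $umlaut_bytes = "\xC3\x84";        # UTF-8 encoding of U+00C4
my $euro_bytes   = "\xE2\x82\xAC";    # UTF-8 encoding of U+20AC

my $umlaut = decode('UTF-8', $umlaut_bytes);    # one character, not wide
my $euro   = decode('UTF-8', $euro_bytes);      # one character, wide

# U+00C4 fits in one byte, so the digest is taken over 0xC4,
# not over the two UTF-8 bytes stored in the file.
my $umlaut_matches_file = digest_like_patch($umlaut) eq sha1_hex($umlaut_bytes);

# U+20AC is wide, so it is re-encoded and the digest matches the file bytes.
my $euro_matches_file = digest_like_patch($euro) eq sha1_hex($euro_bytes);

# Without re-encoding, Digest::SHA refuses the wide character.
my $wide_croaks = !eval { sha1_hex($euro); 1 };

print "umlaut digest matches file bytes: ", ($umlaut_matches_file ? "yes" : "no"), "\n";
print "euro digest matches file bytes: ", ($euro_matches_file ? "yes" : "no"), "\n";
print "sha1_hex on wide character croaks: ", ($wide_croaks ? "yes" : "no"), "\n";
```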
* Re: [PATCH common v3 2/5] section config: prepare for supporting UTF-8 encoded configurations
From: Fiona Ebner @ 2026-03-10 9:18 UTC (permalink / raw)
To: Dominik Csapak, pve-devel
On 10.03.26 at 9:39 AM, Dominik Csapak wrote:
> On 3/9/26 4:43 PM, Fiona Ebner wrote:
>> Configurations registered as UTF-8 will be decoded after reading to
>> Perl's internal string format and then contain wide characters. The
>> Digest::SHA::sha1_hex() function croaks on wide characters, so encode
>> again before calling the function if there are wide characters.
>
>
> Just to clarify: the string will only contain wide characters if it has
> code points above 0xFF. The symbol 'Ä', for example, is code point
> U+00C4, so even after decoding it is below 0xFF
> (in UTF-8 bytes it would be 0xC3 0x84).
>
> This does not matter here, since we only want to be consistent within
> the parser + API, but in some cases it can make a difference,
> e.g. when we calculate the digest on a value that is always UTF-8 encoded.
>
> I don't think this distinction warrants a new version, but
> if there is one, better wording could avoid confusion
> for a future reader of that commit message.
A very good point. I wonder if we should rather extend parse_config()
with $options too and pass along whether the file is a UTF-8 config
file when calling the parser. What do you think?
* Re: [PATCH common v3 2/5] section config: prepare for supporting UTF-8 encoded configurations
From: Dominik Csapak @ 2026-03-10 9:23 UTC (permalink / raw)
To: Fiona Ebner, pve-devel
On 3/10/26 10:18 AM, Fiona Ebner wrote:
> On 10.03.26 at 9:39 AM, Dominik Csapak wrote:
>> On 3/9/26 4:43 PM, Fiona Ebner wrote:
>>> Configurations registered as UTF-8 will be decoded after reading to
>>> Perl's internal string format and then contain wide characters. The
>>> Digest::SHA::sha1_hex() function croaks on wide characters, so encode
>>> again before calling the function if there are wide characters.
>>
>>
>> Just to clarify: the string will only contain wide characters if it has
>> code points above 0xFF. The symbol 'Ä', for example, is code point
>> U+00C4, so even after decoding it is below 0xFF
>> (in UTF-8 bytes it would be 0xC3 0x84).
>>
>> This does not matter here, since we only want to be consistent within
>> the parser + API, but in some cases it can make a difference,
>> e.g. when we calculate the digest on a value that is always UTF-8 encoded.
>>
>> I don't think this distinction warrants a new version, but
>> if there is one, better wording could avoid confusion
>> for a future reader of that commit message.
>
> A very good point. I wonder if we should rather extend parse_config()
> with $options too and pass along whether the file is a UTF-8 config
> file when calling the parser. What do you think?
It would be a better way to define/detect utf8 files than checking
for codepoints > 0xFF IMO. Not sure how much work it is though
to wire that up? In practice it won't make much difference, as long
as we use the same logic/bytes everywhere we check the digest.
* superseded: [RFC common/cluster v3 0/5] cluster files: support registering UTF-8 configuration file
From: Fiona Ebner @ 2026-03-10 10:18 UTC (permalink / raw)
To: pve-devel
superseded-by:
https://lore.proxmox.com/pve-devel/20260310101631.32870-1-f.ebner@proxmox.com/T/