public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files
@ 2025-02-14 15:40 Laurențiu Leahu-Vlăducu
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC (permalink / raw)
  To: pve-devel


This patch series fixes bug #3256:

1. It ensures that general config files (e.g. storage.cfg) are decoded
   from UTF-8 when deserialized. Previously, no decoding happened,
   meaning that Perl interpreted each byte as a separate character
   instead of decoding it into Unicode code points. Note: while I would
   have preferred to decode the text right after reading it from the
   file, some Perl functions, like Digest::SHA::sha1_hex, expect byte
   strings rather than decoded character strings.

2. It ensures that general config files are explicitly encoded as
   UTF-8 when serialized, preventing the same issue in the opposite
   direction.

3. It adds a unit test to prevent similar issues from happening in
   the future.

4. It fixes how the PBS storage plugin serializes and deserializes the
   password, analogous to points 1 and 2, for the case where the
   password itself contains non-ASCII Unicode characters.
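To make the read/write order from points 1 and 2 concrete, here is a
minimal, self-contained sketch. parse_lines and format_lines are
hypothetical helpers, not the actual SectionConfig code; they only
mirror the order used in the patch: hash the raw bytes, then decode
and split, and encode again on the way out.

```perl
use strict;
use warnings;

use Digest::SHA ();
use Encode ();

# Hypothetical read path: $raw is the byte string exactly as read from disk.
sub parse_lines {
    my ($raw) = @_;
    $raw = '' if !defined($raw);

    # Hash the undecoded bytes first; Digest::SHA operates on bytes.
    my $digest = Digest::SHA::sha1_hex($raw);

    # Then turn the UTF-8 bytes into a Perl character string and split it.
    my $text = Encode::decode('UTF-8', $raw);

    return ($digest, [ split(/\n/, $text) ]);
}

# Hypothetical write path: join character strings, return UTF-8 bytes.
sub format_lines {
    my ($lines) = @_;
    my $out = join('', map { "$_\n" } @$lines);
    return Encode::encode('UTF-8', $out);
}

my $raw = "one: t1\n\tcommon \xC3\xB6\n";    # "ö" stored as two UTF-8 bytes
my ($digest, $lines) = parse_lines($raw);

# $lines->[1] is "\tcommon \x{F6}": the two bytes became one character.
my $written = format_lines($lines);           # same bytes as $raw again
```

Keeping the digest on the raw bytes means it still matches what is
stored on disk, independent of how the content is decoded afterwards.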

For more information on this topic, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?

I'm sending this patch series to begin a discussion on how to handle
encodings in our config files, and eventually in other relevant
files as well. In my opinion, we should consistently handle them as
UTF-8, across both our Perl and Rust code.

Since Linux, browsers* and other software have used UTF-8 by default
for a long time, I doubt that we have to worry too much about other
encodings like Latin-1 (ISO-8859-1). However, according to the Perl
documentation, Perl assumes that undecoded byte strings are Latin-1
encoded (this is its default when not decoding explicitly), so such
files could be read correctly in the past. With the fixes included in
this patch series, they no longer can be.
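A small sketch of what changes for a Latin-1 file, assuming the
default CHECK behaviour of Encode::decode (no croak, substitute
U+FFFD for malformed input): the undecoded byte is read as "ö", while
decoding it as UTF-8 silently turns it into the replacement character.

```perl
use strict;
use warnings;

use Encode ();

# "ö" as a single ISO-8859-1 byte, e.g. from an older config file.
my $latin1_raw = "/root/\xF6";

# Without decoding, each byte is one character (code point 0-255),
# which matches Latin-1, so "ö" comes out as U+00F6.
my $undecoded = $latin1_raw;

# Decoding as UTF-8 with the default CHECK replaces the invalid byte
# with U+FFFD (REPLACEMENT CHARACTER) instead of dying.
my $decoded = Encode::decode('UTF-8', $latin1_raw);

printf "undecoded: U+%04X\n", ord(substr($undecoded, -1));   # U+00F6
printf "decoded:   U+%04X\n", ord(substr($decoded, -1));     # U+FFFD
```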

We have to ask ourselves:

a. Do we want to define, in general, that configuration files should
   always be serialized and deserialized as UTF-8? If yes, should we
   consider this a breaking change?

b. Do we want to introduce any backward compatibility for existing
   config files? In other words, should we assume that older files
   might use other encodings? To be honest, I haven't tested Latin-1
   encoded files yet, so I'm not sure how (or whether) our current
   code would handle them.
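One possible approach for b, sketched under the assumption that
"strict UTF-8 first, Latin-1 as fallback" is acceptable.
decode_config_text is a hypothetical helper, not existing code. Since
every byte sequence is valid Latin-1, the fallback cannot fail; the
trade-off is that a Latin-1 file whose bytes happen to form valid
UTF-8 would be misdetected, which should be rare for real-world text.

```perl
use strict;
use warnings;

use Encode ();

# Hypothetical helper: try strict UTF-8, fall back to Latin-1 for
# legacy files. Returns the decoded text and the encoding that was used.
sub decode_config_text {
    my ($raw) = @_;

    # FB_CROAK makes invalid UTF-8 die instead of yielding U+FFFD;
    # LEAVE_SRC keeps decode() from modifying $raw in place.
    my $text = eval {
	Encode::decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($text, 'UTF-8') if defined($text);

    # Every byte maps to a code point in ISO-8859-1, so this never fails.
    return (Encode::decode('ISO-8859-1', $raw), 'ISO-8859-1');
}

my ($text, $encoding) = decode_config_text("mp1: /root/\xF6,mp=/o");
print "$encoding\n";
```

Returning the detected encoding would also let the caller decide
whether to keep writing a legacy file as Latin-1 or convert it.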

There are further parsers and plugins that I still need to modify,
but I first wanted to get your feedback on this subject.


* By "browsers", I mean the encoding used in HTML, not JavaScript's
internal UTF-16 string representation.


pve-common:

Laurențiu Leahu-Vlăducu (2):
  fix #3256: SectionConfig: ensure UTF-8 encoding for general configs
  SectionConfig: add unit test for UTF-8 configs

 src/PVE/SectionConfig.pm    | 10 +++++++---
 test/section_config_test.pl | 25 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)


pve-storage:

Laurențiu Leahu-Vlăducu (1):
  fix #3256: Storage: PBS: ensure passwords are saved and loaded as
    UTF-8

 src/PVE/Storage/PBSPlugin.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


* [pve-devel] [RFC PATCH pve-common 1/2] fix #3256: SectionConfig: ensure UTF-8 encoding for general configs
@ 2025-02-14 15:40 Laurențiu Leahu-Vlăducu
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC (permalink / raw)
  To: pve-devel

Previously, no decoding happened when reading the config, meaning that
Perl interpreted each byte as a separate character instead of decoding
it into Unicode code points. Note: while I would have preferred to
decode the text right after reading it from the file, some Perl
functions, like Digest::SHA::sha1_hex, expect byte strings rather than
decoded character strings.
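The following sketch illustrates the sha1_hex point. As documented in
Digest::SHA, its functions croak on wide characters (code points above
255); for code points below 256, such as "ö", they would instead
silently hash a single Latin-1 byte, so the digest would no longer
match the file contents. Hashing before decoding avoids both.

```perl
use strict;
use warnings;

use Digest::SHA ();
use Encode ();

my $raw  = "common \xF0\x9F\x8D\x95\n";    # UTF-8 bytes of U+1F355 (pizza)
my $text = Encode::decode('UTF-8', $raw);   # "common \x{1F355}\n"

# Hashing the raw bytes works and yields a 40-character hex digest.
my $digest = Digest::SHA::sha1_hex($raw);

# Hashing the decoded string croaks: U+1F355 does not fit into one byte.
my $ok  = eval { Digest::SHA::sha1_hex($text); 1 };
my $err = $@;
print $ok ? "hashed\n" : "error: $err";
```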

Also, config files are now explicitly encoded as UTF-8 when writing
the config, preventing the same issue in the opposite direction.
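Since write_config now returns a UTF-8 byte string rather than
characters, its callers should write the result as-is. A minimal
sketch of the difference, including what happens if the result is
encoded a second time (for example through an :encoding(UTF-8)
output layer); it may be worth checking that no caller does so.

```perl
use strict;
use warnings;

use Encode ();

# Character string as assembled in memory before writing.
my $out = "one: t1\n\tcommon \x{1F355}\n";

# Encoding once yields the bytes that belong on disk.
my $bytes = Encode::encode('UTF-8', $out);

# Encoding again treats each byte >= 0x80 as a Latin-1 character and
# re-encodes it, so the emoji's 4 bytes grow to 8 (mojibake).
my $twice = Encode::encode('UTF-8', $bytes);

# prints "18 characters, 21 bytes, 25 bytes if encoded twice"
printf "%d characters, %d bytes, %d bytes if encoded twice\n",
    length($out), length($bytes), length($twice);
```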

For more information, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?

Signed-off-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
---
 src/PVE/SectionConfig.pm | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/PVE/SectionConfig.pm b/src/PVE/SectionConfig.pm
index 6a297d3..4e98c1c 100644
--- a/src/PVE/SectionConfig.pm
+++ b/src/PVE/SectionConfig.pm
@@ -98,6 +98,7 @@ use strict;
 use warnings;
 
 use Carp;
+use Encode qw(decode);
 use Digest::SHA;
 
 use PVE::Exception qw(raise_param_exc);
@@ -1091,7 +1092,7 @@ Only used for error messages and warnings, so it may also be something else.
 
 =item C<$raw>
 
-The raw content of C<$filename>.
+The raw content of C<$filename>. It is assumed to be encoded as UTF-8.
 
 =item C<$allow_unknown> (optional)
 
@@ -1185,11 +1186,12 @@ sub parse_config {
     $raw = '' if !defined($raw);
 
     my $digest = Digest::SHA::sha1_hex($raw);
+    my $utf8_text = Encode::decode('UTF-8', $raw);
 
     my $pri = 1;
 
     my $lineno = 0;
-    my @lines = split(/\n/, $raw);
+    my @lines = split(/\n/, $utf8_text);
     my $nextline = sub {
 	while (defined(my $line = shift @lines)) {
 	    $lineno++;
@@ -1430,6 +1432,8 @@ my sub format_config_line {
     $output = $class->write_config($filename, $cfg, $allow_unknown)
 
 Generates the output that should be written to the C<L<PVE::SectionConfig>> file.
+The output is a byte string encoded as UTF-8 that can be written
+directly to the config file.
 
 =over
 
@@ -1560,7 +1564,7 @@ sub write_config {
 	$out .= "$data\n";
     }
 
-    return $out;
+    return Encode::encode('UTF-8', $out);
 }
 
 sub assert_if_modified {
-- 
2.39.5




* [pve-devel] [RFC PATCH pve-storage 1/1] fix #3256: Storage: PBS: ensure passwords are saved and loaded as UTF-8
@ 2025-02-14 15:40 Laurențiu Leahu-Vlăducu
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC (permalink / raw)
  To: pve-devel

Previously, no decoding happened when reading the password, meaning
that Perl interpreted each byte as a separate character instead of
decoding it into Unicode code points.

Also, passwords are now explicitly encoded as UTF-8 when written,
preventing the same issue in the opposite direction.
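A minimal sketch of the intended password round trip, using plain
open() with the :raw layer and explicit Encode calls. write_password
and read_password are hypothetical stand-ins for
PVE::Tools::file_set_contents and file_read_firstline; they are not
their actual implementations.

```perl
use strict;
use warnings;

use Encode ();
use File::Temp qw(tempfile);

# Hypothetical stand-in for writing the password file: encode, write bytes.
sub write_password {
    my ($path, $password) = @_;
    open(my $fh, '>:raw', $path) or die "unable to open '$path' - $!\n";
    print {$fh} Encode::encode('UTF-8', "$password\n");
    close($fh) or die "unable to close '$path' - $!\n";
}

# Hypothetical stand-in for reading it back: first line, then decode.
sub read_password {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "unable to open '$path' - $!\n";
    my $line = <$fh>;
    close($fh);
    return undef if !defined($line);
    chomp($line);
    return Encode::decode('UTF-8', $line);
}

my (undef, $path) = tempfile(UNLINK => 1);
my $password = "p\x{E4}ss\x{1F511}";    # "päss" followed by U+1F511 (key)

write_password($path, $password);
my $read_back = read_password($path);   # 5 characters, stored as 10 bytes
```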

For more information, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?

Signed-off-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
---
 src/PVE/Storage/PBSPlugin.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/PVE/Storage/PBSPlugin.pm b/src/PVE/Storage/PBSPlugin.pm
index 0808bcc..6d89440 100644
--- a/src/PVE/Storage/PBSPlugin.pm
+++ b/src/PVE/Storage/PBSPlugin.pm
@@ -86,7 +86,7 @@ sub pbs_set_password {
     my $pwfile = pbs_password_file_name($scfg, $storeid);
     mkdir "/etc/pve/priv/storage";
 
-    PVE::Tools::file_set_contents($pwfile, "$password\n");
+    PVE::Tools::file_set_contents($pwfile, "$password\n", undef, 1);
 }
 
 sub pbs_delete_password {
@@ -102,7 +102,7 @@ sub pbs_get_password {
 
     my $pwfile = pbs_password_file_name($scfg, $storeid);
 
-    return PVE::Tools::file_read_firstline($pwfile);
+    return Encode::decode('UTF-8', PVE::Tools::file_read_firstline($pwfile));
 }
 
 sub pbs_encryption_key_file_name {
-- 
2.39.5




* [pve-devel] [RFC PATCH pve-common 2/2] SectionConfig: add unit test for UTF-8 configs
@ 2025-02-14 15:40 Laurențiu Leahu-Vlăducu
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC (permalink / raw)
  To: pve-devel

The unit test should prevent the issues described in bug #3256 from
recurring.
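Assuming section_config_test.pl does not enable "use utf8" (the file
header is not part of the hunk), the emoji literals in the test are
UTF-8 byte strings. That would explain why the expected data wraps
them in Encode::decode, while the heredoc with the expected file
contents stays as raw bytes, matching what write_config now returns.
The sketch below also shows that one visible emoji may be several
code points.

```perl
use strict;
use warnings;

use Encode ();

# Without "use utf8", this literal is a byte string holding the UTF-8
# encoding of U+1F355, exactly as it appears in the source file.
my $pizza_bytes = '🍕';
my $pizza       = Encode::decode('UTF-8', $pizza_bytes);

# A flag emoji is two code points (regional indicators C and H).
my $flag = Encode::decode('UTF-8', '🇨🇭');

# prints "4 1 2"
print length($pizza_bytes), ' ', length($pizza), ' ', length($flag), "\n";
```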

Signed-off-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
---
 test/section_config_test.pl | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/test/section_config_test.pl b/test/section_config_test.pl
index 343e4c8..5629651 100755
--- a/test/section_config_test.pl
+++ b/test/section_config_test.pl
@@ -194,6 +194,31 @@ two: t3
 	another even more text
 EOF
 
+my $unicode_data = {
+    ids => {
+	t1 => {
+	    type => 'one',
+	    common => Encode::decode('UTF-8', '🍕'),
+	    field1 => 3,
+	    another => Encode::decode('UTF-8', '🟥🟧🟨🟩🟦🟪🟫⬛️⬜️🧮🌈🇨🇭'),
+	},
+    },
+    order => { t1 => 1 },
+};
+
+my $unicode_text = <<"EOF";
+one: t1
+	common 🍕
+	field1 3
+	another 🟥🟧🟨🟩🟦🟪🟫⬛️⬜️🧮🌈🇨🇭
+EOF
+
+Conf->expect_success(
+    'test_unicode',
+    $unicode_data,
+    $unicode_text);
+
+
 my $with_unknown_data = {
     ids => {
 	t1 => {
-- 
2.39.5




* Re: [pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files
@ 2025-02-17 10:15 Fiona Ebner
From: Fiona Ebner @ 2025-02-17 10:15 UTC (permalink / raw)
  To: Proxmox VE development discussion, Laurențiu Leahu-Vlăducu

Am 14.02.25 um 16:40 schrieb Laurențiu Leahu-Vlăducu:
> 
> This patch series fixes bug #3256:
> 
> 1. It ensures that general config files (e.g. storage.cfg) are decoded
>    from UTF-8 when deserialized. Previously, no decoding happened,
>    meaning that Perl interpreted the string as single bytes instead of
>    Unicode code points. Note: while I would have preferred to decode
>    the text right after reading from the file, there are some Perl
>    functions like Digest::SHA::sha1_hex that expect bytes
>    instead of UTF-8.


What about pre-existing configs that are not UTF-8? Not breaking those
is very important here.

> 
> 2. It ensures that general config files are explicitly encoded
>    as UTF-8 before serialization to prevent similar issues the other
>    way around.
> 
> 3. It adds a unit test to prevent similar issues from happening in
>    the future.
> 
> 4. It fixes the PBS storage plugin for serializing/deserializing the
>    password, similar to points 1 and 2, but for the case where the
>    password itself contains Unicode characters.
> 
> For more information on this topic, please read:
> https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
> 
> I'm sending this patch series to begin a discussion on how to handle
> encodings in our config files, and eventually also other relevant
> files. In my opinion, we should handle them consistently as UTF-8,
> also over both Perl and Rust code.

Yes, that is the long-term plan AFAIK, but right now existing config
files might be encoded differently.

> 
> Due to the fact that Linux uses UTF-8 encoding by default since
> a long time, as well as browsers* and other software, I doubt that
> we have to worry too much about other encodings
> like Latin-1 (ISO-8859-1). However, according to the
> Perl documentation, Perl could have deserialized such a string
> in the past (since it's the default in Perl when not decoding
> explicitly), and it is no longer able to after the fixes included
> in this patch series.

Unfortunately, we do. For example:

> [I] root@pve8a1 ~# pct set 112 --mp1 /root/ö,mp=/o
> [I] root@pve8a1 ~# file /etc/pve/lxc/112.conf
> /etc/pve/lxc/112.conf: ISO-8859 text

> 
> We have to ask ourselves:
> 
> a. Do we want to define, in general, that configuration files should
>    always be serialized and deserialized as UTF-8? If yes, should we
>    consider this a breaking change?

Yes, see above.

> 
> b. Do we want to introduce any backward-compatibility for existing
>    config files? In other words, assume that older files might have
>    used other encodings in the past. To be honest, I didn't test
>    Latin-1 encoded files yet, so I'm not sure how (or if) our
>    current code would handle it.

Yes, we certainly need to.

> 
> There are further parsers and plugins that I still need to modify,
> but I first wanted to get your feedback on this subject.
> 
> 
> * With browsers I mean the encoding in HTML and not the JavaScript
> internals with its UTF-16 encoding.
> 
> 

