* [pve-devel] [RFC PATCH pve-common 1/2] fix #3256: SectionConfig: ensure UTF-8 encoding for general configs
2025-02-14 15:40 [pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files Laurențiu Leahu-Vlăducu
@ 2025-02-14 15:40 ` Laurențiu Leahu-Vlăducu
2025-02-14 15:40 ` [pve-devel] [RFC PATCH pve-storage 1/1] fix #3256: Storage: PBS: ensure passwords are saved and loaded as UTF-8 Laurențiu Leahu-Vlăducu
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC
To: pve-devel
Previously, no decoding happened when reading the config, so Perl
treated each byte of the file as a separate character instead of
decoding multi-byte UTF-8 sequences into Unicode code points.
Note: while I would have preferred to decode the text right after
reading it from the file, some Perl functions, such as
Digest::SHA::sha1_hex, expect bytes rather than decoded character
strings, so the digest is computed first.
Config files are now also explicitly encoded as UTF-8 when written,
preventing the same issue in the other direction.
For more information, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
Signed-off-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
---
src/PVE/SectionConfig.pm | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/PVE/SectionConfig.pm b/src/PVE/SectionConfig.pm
index 6a297d3..4e98c1c 100644
--- a/src/PVE/SectionConfig.pm
+++ b/src/PVE/SectionConfig.pm
@@ -98,6 +98,7 @@ use strict;
use warnings;
use Carp;
+use Encode qw(decode);
use Digest::SHA;
use PVE::Exception qw(raise_param_exc);
@@ -1091,7 +1092,7 @@ Only used for error messages and warnings, so it may also be something else.
=item C<$raw>
-The raw content of C<$filename>.
+The raw content of C<$filename>. It is assumed to be encoded as UTF-8.
=item C<$allow_unknown> (optional)
@@ -1185,11 +1186,12 @@ sub parse_config {
$raw = '' if !defined($raw);
my $digest = Digest::SHA::sha1_hex($raw);
+ my $utf8_text = Encode::decode('UTF-8', $raw);
my $pri = 1;
my $lineno = 0;
- my @lines = split(/\n/, $raw);
+ my @lines = split(/\n/, $utf8_text);
my $nextline = sub {
while (defined(my $line = shift @lines)) {
$lineno++;
@@ -1430,6 +1432,8 @@ my sub format_config_line {
$output = $class->write_config($filename, $cfg, $allow_unknown)
Generates the output that should be written to the C<L<PVE::SectionConfig>> file.
+The output is a byte string (encoded as UTF-8) that can be written directly
+to the config file.
=over
@@ -1560,7 +1564,7 @@ sub write_config {
$out .= "$data\n";
}
- return $out;
+ return Encode::encode('UTF-8', $out);
}
sub assert_if_modified {
--
2.39.5
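
The ordering used in parse_config above (digest on the raw bytes first, then decode for parsing, then encode again on output) can be illustrated with a minimal, self-contained sketch. The sample config text and variable names below are assumptions chosen for illustration and are not taken from the patch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Digest::SHA qw(sha1_hex);

# Bytes as they would come from a UTF-8 encoded config file
# ("\xc3\xb6" is the UTF-8 encoding of U+00F6). Sample content is made up.
my $raw = "dir: local\n\tpath /srv/\xc3\xb6\n";

# Hash the undecoded bytes; sha1_hex croaks on characters above U+00FF.
my $digest = sha1_hex($raw);

# Decode once for parsing: from here on, one element per code point.
my $text = decode('UTF-8', $raw);

# Encode again before the data leaves the program.
my $out = encode('UTF-8', $text);

printf "bytes: %d, characters: %d, round trip ok: %s\n",
    length($raw), length($text), ($out eq $raw ? 'yes' : 'no');
```

The decoded string is one element shorter than the byte string because the two-byte sequence collapses into a single code point, while the re-encoded output is byte-identical to the input.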
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* [pve-devel] [RFC PATCH pve-storage 1/1] fix #3256: Storage: PBS: ensure passwords are saved and loaded as UTF-8
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC
To: pve-devel
Previously, no decoding happened when reading the password, so Perl
treated each byte as a separate character instead of decoding the
UTF-8 sequences into Unicode code points.
Passwords are now also explicitly encoded as UTF-8 when written,
preventing the same issue in the other direction.
For more information, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
Signed-off-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
---
src/PVE/Storage/PBSPlugin.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/Storage/PBSPlugin.pm b/src/PVE/Storage/PBSPlugin.pm
index 0808bcc..6d89440 100644
--- a/src/PVE/Storage/PBSPlugin.pm
+++ b/src/PVE/Storage/PBSPlugin.pm
@@ -86,7 +86,7 @@ sub pbs_set_password {
my $pwfile = pbs_password_file_name($scfg, $storeid);
mkdir "/etc/pve/priv/storage";
- PVE::Tools::file_set_contents($pwfile, "$password\n");
+ PVE::Tools::file_set_contents($pwfile, "$password\n", undef, 1);
}
sub pbs_delete_password {
@@ -102,7 +102,7 @@ sub pbs_get_password {
my $pwfile = pbs_password_file_name($scfg, $storeid);
- return PVE::Tools::file_read_firstline($pwfile);
+ return Encode::decode('UTF-8', PVE::Tools::file_read_firstline($pwfile));
}
sub pbs_encryption_key_file_name {
--
2.39.5
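
To make the failure mode concrete, here is a small sketch of what happens when a password is written as UTF-8 and read back without decoding. The password value is hypothetical, and the sketch uses only the Encode module rather than the PVE::Tools helpers touched by the patch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Character string as it might arrive from the API (hypothetical value):
# contains U+00E4 and U+1F355.
my $password = "p\x{e4}ss\x{1f355}";

# What ends up on disk when the string is written as UTF-8.
my $on_disk = encode('UTF-8', $password);

# Reading the bytes back without decoding yields a different string.
my $undecoded = $on_disk;
my $decoded   = decode('UTF-8', $on_disk);

printf "undecoded matches: %s, decoded matches: %s\n",
    ($undecoded eq $password ? 'yes' : 'no'),
    ($decoded eq $password ? 'yes' : 'no');
```

Only the decoded value compares equal to the original character string, which is the symmetry the two changed lines in PBSPlugin.pm aim for.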
* [pve-devel] [RFC PATCH pve-common 2/2] SectionConfig: add unit test for UTF-8 configs
From: Laurențiu Leahu-Vlăducu @ 2025-02-14 15:40 UTC
To: pve-devel
The unit test guards against regressions of the issues described in
bug #3256.
Signed-off-by: Laurențiu Leahu-Vlăducu <l.leahu-vladucu@proxmox.com>
---
test/section_config_test.pl | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/test/section_config_test.pl b/test/section_config_test.pl
index 343e4c8..5629651 100755
--- a/test/section_config_test.pl
+++ b/test/section_config_test.pl
@@ -194,6 +194,31 @@ two: t3
another even more text
EOF
+my $unicode_data = {
+ ids => {
+ t1 => {
+ type => 'one',
+ common => Encode::decode('UTF-8', '🍕'),
+ field1 => 3,
+ another => Encode::decode('UTF-8', '🟥🟧🟨🟩🟦🟪🟫⬛️⬜️🧮🌈🇨🇭'),
+ },
+ },
+ order => { t1 => 1 },
+};
+
+my $unicode_text = <<"EOF";
+one: t1
+ common 🍕
+ field1 3
+ another 🟥🟧🟨🟩🟦🟪🟫⬛️⬜️🧮🌈🇨🇭
+EOF
+
+Conf->expect_success(
+ 'test_unicode',
+ $unicode_data,
+ $unicode_text);
+
+
my $with_unknown_data = {
ids => {
t1 => {
--
2.39.5
* Re: [pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files
From: Fiona Ebner @ 2025-02-17 10:15 UTC
To: Proxmox VE development discussion, Laurențiu Leahu-Vlăducu
On 14.02.25 at 16:40, Laurențiu Leahu-Vlăducu wrote:
>
> This patch series fixes bug #3256:
>
> 1. It ensures that general config files (e.g. storage.cfg) are decoded
> from UTF-8 when deserialized. Previously, no decoding happened,
> meaning that Perl interpreted the string as single bytes instead of
> Unicode code points. Note: while I would have preferred to decode
> the text right after reading from the file, some Perl functions,
> such as Digest::SHA::sha1_hex, expect bytes rather than decoded
> character strings.
What about pre-existing configs that are not UTF-8? Not breaking those
is very important here.
>
> 2. It ensures that general config files are explicitly encoded
> as UTF-8 before serialization to prevent similar issues the other
> way around.
>
> 3. It adds a unit test to prevent similar issues from happening in
> the future.
>
> 4. It fixes the PBS storage plugin for serializing/deserializing the
> password, similar to points 1 and 2, but for the case where the
> password itself contains Unicode characters.
>
> For more information on this topic, please read:
> https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
>
> I'm sending this patch series to begin a discussion on how to handle
> encodings in our config files, and eventually in other relevant
> files as well. In my opinion, we should handle them consistently as
> UTF-8, across both Perl and Rust code.
Yes, that is the long-term plan AFAIK, but right now existing config
files might be encoded differently.
>
> Since Linux has used UTF-8 by default for a long time, as have
> browsers* and other software, I doubt that we have to worry too
> much about other encodings like Latin-1 (ISO-8859-1). However,
> according to the Perl documentation, Perl could read such a string
> in the past (Latin-1 is what Perl assumes when a string is not
> decoded explicitly), and it can no longer do so after the fixes
> included in this patch series.
Unfortunately, we do. E.g.
> [I] root@pve8a1 ~# pct set 112 --mp1 /root/ö,mp=/o
> [I] root@pve8a1 ~# file /etc/pve/lxc/112.conf
> /etc/pve/lxc/112.conf: ISO-8859 text
>
> We have to ask ourselves:
>
> a. Do we want to define, in general, that configuration files should
> always be serialized and deserialized as UTF-8? If yes, should we
> consider this a breaking change?
Yes, see above.
>
> b. Do we want to introduce any backward-compatibility for existing
> config files? In other words, assume that older files might have
> used other encodings in the past. To be honest, I didn't test
> Latin-1 encoded files yet, so I'm not sure how (or if) our
> current code would handle it.
Yes, we certainly need to.
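
One possible shape for such backward compatibility, as a rough sketch only and not something proposed in this series: decode strictly as UTF-8 and fall back to Latin-1 when the bytes are not valid UTF-8. The helper name decode_config_bytes and the sample lines are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of the series: decode strictly as UTF-8
# and fall back to Latin-1 if the bytes are not valid UTF-8.
sub decode_config_bytes {
    my ($raw) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying $raw in place while checking.
    my $text = eval {
        decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined($text);

    # Every byte sequence is valid Latin-1, so this cannot fail.
    return decode('ISO-8859-1', $raw);
}

# The same config line, once UTF-8 encoded and once Latin-1 encoded.
my $utf8_bytes   = "mp1: /root/\xc3\xb6,mp=/o\n";
my $latin1_bytes = "mp1: /root/\xf6,mp=/o\n";

my $from_utf8   = decode_config_bytes($utf8_bytes);
my $from_latin1 = decode_config_bytes($latin1_bytes);

# Both inputs decode to the same character string containing U+00F6.
print $from_utf8 eq $from_latin1 ? "same text\n" : "different text\n";
```

A caveat with this approach is that a Latin-1 file whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so the heuristic is not fully reliable and would need to be weighed against an explicit migration.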
>
> There are further parsers and plugins that I still need to modify,
> but I first wanted to get your feedback on this subject.
>
>
> * By browsers I mean the encoding of HTML documents, not the
> UTF-16 strings used internally by JavaScript.
>
>