From: Dominik Csapak
To: pve-devel@lists.proxmox.com
Date: Tue, 8 Mar 2022 15:41:45 +0100
Message-Id: <20220308144145.536734-1-d.csapak@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [pve-devel] [RFC PATCH storage] Plugins: en/decode notes as UTF-8
List-Id: Proxmox VE development discussion

When writing into the file, explicitly utf8 encode the notes, and then
try to utf8 decode them on read.

If the notes are not valid utf8, we assume it was an iso-8859 comment and
return it as it was.

Technically this is a breaking change, since there are iso-8859 comments
that would successfully decode as utf8, for example:
the byte sequence "C2 A9" would be "Â©" in iso-8859, but would decode to "©".

From what I can tell though, this is rather unlikely to happen for
"real world" notes, because the first byte would be in the range of C0-F7
(which are mostly language dependent characters like "Â") and the
following bytes would have to be in the range of 80-BF, which are only
special characters like "£" (or undefined)

Signed-off-by: Dominik Csapak
---
we may want to have this 'try_decode_utf8' in PVE::Tools i guess?
i just put it here for the RFC, so it's easier to review

 PVE/Storage.pm           | 17 +++++++++++++++++
 PVE/Storage/DirPlugin.pm |  9 +++++++--
 PVE/Storage/Plugin.pm    |  2 +-
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/PVE/Storage.pm b/PVE/Storage.pm
index b1d31bb..4335ee9 100755
--- a/PVE/Storage.pm
+++ b/PVE/Storage.pm
@@ -14,6 +14,7 @@ use File::Path;
 use Cwd 'abs_path';
 use Socket;
 use Time::Local qw(timelocal);
+use Encode qw(decode);
 
 use PVE::Tools qw(run_command file_read_firstline dir_glob_foreach $IPV6RE);
 use PVE::Cluster qw(cfs_read_file cfs_write_file cfs_lock_file);
@@ -2077,4 +2078,20 @@ sub normalize_content_filename {
     return $filename;
 }
 
+sub try_decode_utf8 {
+    my ($data) = @_;
+
+    my $decoded = eval {
+        decode('UTF-8', $data, 1);
+    };
+
+    if (!defined($decoded)) {
+        # we could not decode, it's probably iso-8859,
+        # so return original value
+        return $data;
+    }
+
+    return $decoded;
+}
+
 1;
diff --git a/PVE/Storage/DirPlugin.pm b/PVE/Storage/DirPlugin.pm
index c60818b..bc559e6 100644
--- a/PVE/Storage/DirPlugin.pm
+++ b/PVE/Storage/DirPlugin.pm
@@ -7,6 +7,7 @@ use Cwd;
 use File::Path;
 use IO::File;
 use POSIX;
+use Encode qw(encode);
 
 use PVE::Storage::Plugin;
 use PVE::JSONSchema qw(get_standard_option);
@@ -103,7 +104,10 @@ sub get_volume_notes {
     my $path = $class->filesystem_path($scfg, $volname);
     $path .= $class->SUPER::NOTES_EXT;
 
-    return PVE::Tools::file_get_contents($path) if -f $path;
+    if (-f $path) {
+        my $data = PVE::Tools::file_get_contents($path);
+        return PVE::Storage::try_decode_utf8($data);
+    }
 
     return '';
 }
@@ -120,7 +124,8 @@ sub update_volume_notes {
     $path .= $class->SUPER::NOTES_EXT;
 
     if (defined($notes) && $notes ne '') {
-        PVE::Tools::file_set_contents($path, $notes);
+        my $encoded = encode('UTF-8', $notes);
+        PVE::Tools::file_set_contents($path, $encoded);
     } else {
         unlink $path or $! == ENOENT or die "could not delete notes - $!\n";
     }
diff --git a/PVE/Storage/Plugin.pm b/PVE/Storage/Plugin.pm
index a6b0bdd..edec516 100644
--- a/PVE/Storage/Plugin.pm
+++ b/PVE/Storage/Plugin.pm
@@ -1172,7 +1172,7 @@ my $get_subdir_files = sub {
             my $notes_fn = $original.NOTES_EXT;
             if (-f $notes_fn) {
                 my $notes = PVE::Tools::file_read_firstline($notes_fn);
-                $info->{notes} = $notes if defined($notes);
+                $info->{notes} = PVE::Storage::try_decode_utf8($notes) if defined($notes);
             }
 
             $info->{protected} = 1 if -e PVE::Storage::protection_file_path($original);
-- 
2.30.2
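
To try the fallback behaviour outside of a PVE checkout, the following is a minimal standalone sketch of the same idea. It is not part of the patch: the sub is a local copy of the logic from the diff, the LEAVE_SRC flag is an addition so that the input buffer is never modified, the 'codepoints' helper is hypothetical, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Local copy of the fallback logic: strict UTF-8 decode, and if the bytes
# are not valid UTF-8, hand back the input unchanged (assumed iso-8859).
sub try_decode_utf8 {
    my ($data) = @_;

    my $decoded = eval {
        # FB_CROAK makes decode() die on malformed input; LEAVE_SRC (an
        # addition in this sketch) keeps $data untouched on success.
        decode('UTF-8', $data, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };

    return defined($decoded) ? $decoded : $data;
}

# Hypothetical helper: show a string as code points, so the output does not
# depend on the terminal encoding.
sub codepoints {
    my ($str) = @_;
    return join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $str));
}

# 1) Notes written by the patched code: encode on write, decode on read.
my $notes = "caf\x{e9} \x{20ac}";
my $on_disk = encode('UTF-8', $notes);
print "round trip ok\n" if try_decode_utf8($on_disk) eq $notes;

# 2) Legacy iso-8859 bytes that are not valid UTF-8 are returned as-is.
my $legacy = "caf\xE9";
print "legacy: ", codepoints(try_decode_utf8($legacy)), "\n";

# 3) The ambiguous case from the commit message: the two bytes C2 A9 are
#    valid UTF-8, so they come back as a single character.
my $ambiguous = "\xC2\xA9";
print "ambiguous: ", codepoints(try_decode_utf8($ambiguous)), "\n";
```

Case 3 is the breaking change described in the commit message: a two-character iso-8859 note is silently read back as one character, while cases 1 and 2 behave the same as before for the user.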