From: Thomas Lamprecht
Date: Tue, 8 Mar 2022 19:10:16 +0100
To: Proxmox VE development discussion, Dominik Csapak
In-Reply-To: <20220308144145.536734-1-d.csapak@proxmox.com>
Subject: Re: [pve-devel] [RFC PATCH storage] Plugins: en/decode notes as UTF-8

On 08.03.22 15:41, Dominik Csapak wrote:
> When writing into the file, explicitly utf8 encode it, and then try to
> utf8 decode it on read.
>
> If the notes are not valid utf8, we assume it was an iso-8859 comment
> and return it as it was.
>
> Technically this is a breaking change, since there are iso-8859 comments
> that would sucessfully decode as utf8, for example:

s/sucessfully/successfully/

> the byte sequence "C2 A9" would be "Â©" in iso, but would decode to "©".
>
> From what i can tell though, this is rather unlikely to happen for
> "real world" notes, because the first byte would be in the range of
> C0-F7 (which are mostly language dependent characters like "Â")
> and the following bytes would have to be in the range of
> 80-BF, which are only special characters like "£" (or undefined)

IMO it's a bit strange: trying to reason about free-form content that end
users can edit is hardly going to be right. But oh well, you made it sound
like it really is more of an edge case, and I'd like to avoid versioning
the comment notes, so fine with me.

>
> Signed-off-by: Dominik Csapak
> ---
> we may want to have this 'try_decode_utf8' in PVE::Tools i guess?
> i just put it here for the RFC, so its more easy to review

meh, it's hardly complicated logic, just calling into Encode and falling
back, but yeah, the version below makes it seem a bit bloated: you made a
one-liner expand into 14 lines ^^

>
>  PVE/Storage.pm           | 17 +++++++++++++++++
>  PVE/Storage/DirPlugin.pm |  9 +++++++--
>  PVE/Storage/Plugin.pm    |  2 +-
>  3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index b1d31bb..4335ee9 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -14,6 +14,7 @@ use File::Path;
>  use Cwd 'abs_path';
>  use Socket;
>  use Time::Local qw(timelocal);
> +use Encode qw(decode);
>
>  use PVE::Tools qw(run_command file_read_firstline dir_glob_foreach $IPV6RE);
>  use PVE::Cluster qw(cfs_read_file cfs_write_file cfs_lock_file);
> @@ -2077,4 +2078,20 @@ sub normalize_content_filename {
>      return $filename;
>  }
>
> +sub try_decode_utf8 {
> +    my ($data) = @_;
> +
> +    my $decoded = eval {
> +        decode('UTF-8', $data, 1);
> +    };

assignment evals should be on a single line if the text width allows it

> +
> +    if (!defined($decoded)) {
> +        # we could not decode, it's probably iso-8859,
> +        # so return original value

please stop always breaking up comments that early

> +        return $data;
> +    }
> +
> +    return $decoded;
> +}
> +

In general, why not just inline it? The following would be just as good as
the whole 14-line method here...

    my $foo = eval { decode('UTF-8', $data, 1) } // $data;

And if we want it centrally, then we want a set/get_notes helper somewhere
that does the note-exists check plus the encode/decode handling. But as
it's all fairly central for now and churn is not /that/ likely, I'd
slightly favor just inlining it.
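A minimal sketch of the inlined fallback idiom, assuming plain Perl 5.10+ (for `//`) and the core Encode module; `decode_notes` is a hypothetical name that only wraps the one-liner so it can be exercised. It also shows the edge case from the commit message: a lone ISO-8859-1 byte such as 0xA3 ("£") is invalid UTF-8 and comes back unchanged, while the pair 0xC2 0xA9 ("Â©" in ISO-8859-1) is valid UTF-8 and decodes to "©".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical wrapper around the suggested one-liner: attempt a strict UTF-8
# decode (CHECK = 1, i.e. Encode::FB_CROAK, makes decode() die on malformed
# input) and fall back to the raw bytes, treated as legacy ISO-8859-1 text.
sub decode_notes {
    my ($data) = @_;
    return eval { decode('UTF-8', $data, 1) } // $data;
}

# Notes written by the patched code are UTF-8 encoded before hitting the disk.
my $text    = "caf\x{e9} \x{20ac}";    # "café €"
my $on_disk = encode('UTF-8', $text);
my $utf8_ok = decode_notes($on_disk) eq $text;

# Legacy ISO-8859-1 note: byte 0xA3 is "£", but on its own it is not valid UTF-8.
my $legacy    = "price: \xA3 5";
my $legacy_ok = decode_notes($legacy) eq $legacy;

# The ambiguous case: 0xC2 0xA9 is "Â©" in ISO-8859-1, yet it is also
# valid UTF-8 and decodes to the single character "©".
my $ambiguous = decode_notes("\xC2\xA9");

printf "utf8 round-trip: %s\n", $utf8_ok ? 'ok' : 'FAIL';      # utf8 round-trip: ok
printf "latin1 fallback: %s\n", $legacy_ok ? 'ok' : 'FAIL';    # latin1 fallback: ok
printf "ambiguous: %d char(s), U+%04X\n", length($ambiguous), ord($ambiguous);
# ambiguous: 1 char(s), U+00A9
```

Passing 1 as CHECK is what makes the fallback work at all: with the default CHECK of 0, decode() substitutes U+FFFD for malformed bytes instead of dying, so the `// $data` branch would never be taken.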
>  1;
> diff --git a/PVE/Storage/DirPlugin.pm b/PVE/Storage/DirPlugin.pm
> index c60818b..bc559e6 100644
> --- a/PVE/Storage/DirPlugin.pm
> +++ b/PVE/Storage/DirPlugin.pm
> @@ -7,6 +7,7 @@ use Cwd;
>  use File::Path;
>  use IO::File;
>  use POSIX;
> +use Encode qw(encode);
>
>  use PVE::Storage::Plugin;
>  use PVE::JSONSchema qw(get_standard_option);
> @@ -103,7 +104,10 @@ sub get_volume_notes {
>      my $path = $class->filesystem_path($scfg, $volname);
>      $path .= $class->SUPER::NOTES_EXT;
>
> -    return PVE::Tools::file_get_contents($path) if -f $path;
> +    if (-f $path) {
> +        my $data = PVE::Tools::file_get_contents($path);
> +        return PVE::Storage::try_decode_utf8($data);

        return eval { decode('UTF-8', $data, 1) } // $data;

> +    }
>
>      return '';
>  }
> @@ -120,7 +124,8 @@ sub update_volume_notes {
>      $path .= $class->SUPER::NOTES_EXT;
>
>      if (defined($notes) && $notes ne '') {
> -        PVE::Tools::file_set_contents($path, $notes);
> +        my $encoded = encode('UTF-8', $notes);
> +        PVE::Tools::file_set_contents($path, $encoded);
>      } else {
>          unlink $path or $! == ENOENT or die "could not delete notes - $!\n";
>      }
> diff --git a/PVE/Storage/Plugin.pm b/PVE/Storage/Plugin.pm
> index a6b0bdd..edec516 100644
> --- a/PVE/Storage/Plugin.pm
> +++ b/PVE/Storage/Plugin.pm
> @@ -1172,7 +1172,7 @@ my $get_subdir_files = sub {
>              my $notes_fn = $original.NOTES_EXT;
>              if (-f $notes_fn) {
>                  my $notes = PVE::Tools::file_read_firstline($notes_fn);
> -                $info->{notes} = $notes if defined($notes);
> +                $info->{notes} = PVE::Storage::try_decode_utf8($notes) if defined($notes);

                $info->{notes} = eval { decode('UTF-8', $notes, 1) } // $notes if defined($notes);

>              }
>
>              $info->{protected} = 1 if -e PVE::Storage::protection_file_path($original);
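The set/get_notes helper mentioned in the review could look roughly like the sketch below. Everything in it is hypothetical: the `set_notes`/`get_notes` names, and plain `open`/`print` standing in for PVE::Tools::file_set_contents/file_get_contents so it runs without PVE modules; deleting the file for empty notes mirrors update_volume_notes. It imports both `encode` and `decode`, because an inlined `decode` call only resolves where it is imported (the DirPlugin hunk imports only `encode`), and an undefined-sub error inside `eval` would be swallowed, silently returning the raw bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Errno qw(ENOENT);
use File::Temp qw(tempdir);

# Hypothetical central helpers: encode on write, decode with ISO-8859-1
# fallback on read, and treat a missing file as "no notes".
sub set_notes {
    my ($path, $notes) = @_;
    if (defined($notes) && $notes ne '') {
        open(my $fh, '>:raw', $path) or die "could not write notes - $!\n";
        print $fh encode('UTF-8', $notes);
        close($fh) or die "could not write notes - $!\n";
    } else {
        unlink $path or $! == ENOENT or die "could not delete notes - $!\n";
    }
}

sub get_notes {
    my ($path) = @_;
    return '' if !-f $path;    # the note-exists check
    open(my $fh, '<:raw', $path) or die "could not read notes - $!\n";
    my $data = do { local $/; <$fh> };    # slurp the raw bytes
    close($fh);
    return eval { decode('UTF-8', $data, 1) } // $data;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.notes";    # hypothetical notes file name

set_notes($path, "Gr\x{fc}\x{df}e \x{2713}");    # "Grüße ✓"
my $roundtrip_ok = get_notes($path) eq "Gr\x{fc}\x{df}e \x{2713}";

# Simulate a note written before the patch, as raw ISO-8859-1 bytes.
open(my $fh, '>:raw', $path) or die "$!\n";
print $fh "Gr\xFC\xDFe";
close($fh);
my $legacy_ok = get_notes($path) eq "Gr\x{fc}\x{df}e";

set_notes($path, '');    # empty notes delete the file
printf "round-trip: %s, legacy: %s, deleted: %s\n",
    $roundtrip_ok ? 'ok' : 'FAIL',
    $legacy_ok    ? 'ok' : 'FAIL',
    (-e $path)    ? 'no' : 'yes';
# round-trip: ok, legacy: ok, deleted: yes
```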