From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 9C5EA1FF146 for ; Tue, 28 Apr 2026 17:20:45 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 76DB518767; Tue, 28 Apr 2026 17:20:45 +0200 (CEST) Date: Tue, 28 Apr 2026 17:20:37 +0200 From: Wolfgang Bumiller To: "Max R. Carrara" Subject: Re: [PATCH pve-storage v1 22/54] tree-wide: introduce parsing module and replace usages of ISO_EXT_RE_0 Message-ID: References: <20260422111322.257380-1-m.carrara@proxmox.com> <20260422111322.257380-23-m.carrara@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20260422111322.257380-23-m.carrara@proxmox.com> X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1777389541436 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.087 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [storage.pm,status.pm,parse.pm,common.pm,plugin.pm] Message-ID-Hash: CUPUFKY2WTQ7PHFGTNNXTTPN5LVT5ENT X-Message-ID-Hash: CUPUFKY2WTQ7PHFGTNNXTTPN5LVT5ENT X-MailFrom: w.bumiller@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: pve-devel@lists.proxmox.com X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Wed, Apr 22, 2026 at 01:12:48PM +0200, Max R. Carrara wrote: > Introduce the `PVE::Storage::Common::Parse` module with the following > subroutines: > > * `parse_path_as_volname_parts($path, $vtype)` > > Parses a given file path into several smaller parts that constitute > a volume name for the given volume type. > > * `parse_path_as_volname($path, $vtype)` > > Parses a given file path directly into a volume name. > > * `parse_volname_as_parts($volname)` > > Parses an existing volume name into its constituent parts. > > * `parse_path_as_volid_parts($storeid, $scfg, $path, $vtype)` > > Parses a given file path into several smaller parts that constitute > a volume ID for the given storage ID, its configuration and volume > type. > > * `parse_path_as_volid($storeid, $scfg, $path, $vtype)` > > Like `parse_path_as_volid_parts()`, but parses the given path > directly into a volume ID. > > * `parse_volid_as_parts($volid)` > > Parses an existing volume ID into its storage ID and volume name > parts. > > As of this commit, these parsing functions only support the 'iso' > volume type, with the exception of `parse_volid_as_parts()`, which > does not depend on knowing the volume type. > > Using the newly introduced parsing helpers, replace all occurrences of > the `PVE::Storage::ISO_EXT_RE_0` regex across the repository. > > Keep the `ISO_EXT_RE_0` regex around for now, since its visibility is > public, which also means that it is part of the storage API (whether > that was intended or not). On an APIVER + APIAGE bump, this regex > should be marked for removal. > > Additional Notes Regarding the new Parsers > ========================================== > > Support for other volume types will be added individually in future > commits. > > The new *private* regex used for 'iso' vtype parsing is still matching > file extensions case-insensitively, but also matches the entire path > and file name portions of 'iso' file paths and volume names. > > These parts are extracted using named regex groups, as that is much > easier to handle and keep track of mentally, even with smaller > regexes. These named groups are the "constituent parts" that are > returned by some of the new parser subroutines. > > However, one important difference here is that named regex groups do > not support dashes `-` in their names, only underscores `_`. To keep > things consistent with our style (using dashes instead of underscores > in hash keys and the API), these named groups are formatted before > being returned—underscores are simply substituted with dashes. > > Finally, all parser subroutines check whether a parent directory > reference (`..` or "double dots") is contained in the passed or > extracted file path, and return early if there is. This is an > additional safety measure that is intentionally introduced in this > commit to guard against any mishaps in the future as the parsers gain > more functionality, such as supporting nested directories inside > different volume types' subdirectories. > > Signed-off-by: Max R. Carrara > --- > src/PVE/API2/Storage/Status.pm | 9 +- > src/PVE/Storage.pm | 7 +- > src/PVE/Storage/Common.pm | 2 + > src/PVE/Storage/Common/Makefile | 1 + > src/PVE/Storage/Common/Parse.pm | 330 ++++++++++++++++++++++++++++++++ > src/PVE/Storage/Plugin.pm | 19 +- > 6 files changed, 358 insertions(+), 10 deletions(-) > create mode 100644 src/PVE/Storage/Common/Parse.pm > > diff --git a/src/PVE/API2/Storage/Status.pm b/src/PVE/API2/Storage/Status.pm > index 2b0b121a..cf60c148 100644 > --- a/src/PVE/API2/Storage/Status.pm > +++ b/src/PVE/API2/Storage/Status.pm > @@ -23,6 +23,9 @@ use PVE::Storage; > use PVE::Storage::Common qw( > plugin_get_vtype_subdir > ); > +use PVE::Storage::Common::Parse qw( > + parse_path_as_volname_parts > +); > > use base qw(PVE::RESTHandler); > > @@ -71,11 +74,13 @@ my sub parse_transferred_file_path_extension : prototype($$) { > my ($path, $vtype) = @_; > > if ($vtype eq 'iso') { > - if ($path !~ m![^/]+$PVE::Storage::ISO_EXT_RE_0$!) { > + my $parts = parse_path_as_volname_parts($path, $vtype); > + > + if (!defined($parts)) { > raise_param_exc({ filename => "wrong file extension" }); > } > > - my $ext = $1; > + my $ext = $parts->{ext}; > return $ext; > } > > diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm > index 1b0569dd..d7d683ad 100755 > --- a/src/PVE/Storage.pm > +++ b/src/PVE/Storage.pm > @@ -26,6 +26,9 @@ use PVE::Storage::Common qw( > plugin_get_default_vtype_subdirs > plugin_get_vtype_subdir > ); > +use PVE::Storage::Common::Parse qw( > + parse_path_as_volid > +); > use PVE::RESTEnvironment qw(log_warn); > > use PVE::Storage::Plugin; > @@ -754,9 +757,7 @@ sub path_to_volume_id { > } > > if ($vtype eq 'iso') { > - return if $filename !~ m!/([^/]+$ISO_EXT_RE_0)$!; > - my $name = $1; > - return "$sid:iso/$name"; > + return parse_path_as_volid($sid, $scfg, $path, $vtype); > } > > if ($vtype eq 'vztmpl') { > diff --git a/src/PVE/Storage/Common.pm b/src/PVE/Storage/Common.pm > index abccf52b..49defb98 100644 > --- a/src/PVE/Storage/Common.pm > +++ b/src/PVE/Storage/Common.pm > @@ -55,6 +55,8 @@ be grouped in a submodule can also be found here. > > =over > > +=item * C> > + > =back > > =head1 STANDARD OPTIONS FOR JSON SCHEMA > diff --git a/src/PVE/Storage/Common/Makefile b/src/PVE/Storage/Common/Makefile > index 0c4bba5b..0d9b1be1 100644 > --- a/src/PVE/Storage/Common/Makefile > +++ b/src/PVE/Storage/Common/Makefile > @@ -1,4 +1,5 @@ > SOURCES = \ > + Parse.pm \ > > > .PHONY: install > diff --git a/src/PVE/Storage/Common/Parse.pm b/src/PVE/Storage/Common/Parse.pm > new file mode 100644 > index 00000000..bcc6f9fc > --- /dev/null > +++ b/src/PVE/Storage/Common/Parse.pm > @@ -0,0 +1,330 @@ > +package PVE::Storage::Common::Parse; > + > +use v5.36; > + > +use PVE::Storage::Common qw( > + plugin_get_vtype_subdir > +); > + > +use Exporter qw(import); > + > +our @EXPORT_OK = qw( > + parse_path_as_volname_parts > + parse_path_as_volname > + parse_volname_as_parts > + > + parse_path_as_volid_parts > + parse_path_as_volid > + parse_volid_as_parts > +); > + > +=head1 NAME > + > +C - Storage-related Parsing Functions > + > +=head1 DESCRIPTION > + > +This module contains various parsing functions for use within C>, > +its submodules (including storage plugins) and other related modules. > + > +Parsing functions are categorized by their main purpose / area of application > +and may be further subdivided depending on what kind of data type they are > +primarily related to. > + > +=cut > + > +my $RE_PARENT_DIR = quotemeta('..'); > +my $RE_CONTAINS_PARENT_DIR = qr! > + ( ^$RE_PARENT_DIR/ ) # ../ --> Beginning of path > + | > + ( /$RE_PARENT_DIR/ ) # /../ --> Between two path components > + | > + ( /$RE_PARENT_DIR$ ) # /.. --> End of path > +!xn; > + > +my $RE_ISO_FILE_PATH = qr! > + (? > + (? [^/]+ \. (? (?i: iso|img) ) ) > + ) > +!xn; > + > +my $RE_FILE_PATH_FOR_VTYPE = { > + iso => qr/^$RE_ISO_FILE_PATH$/, > +}; > + > +my $RE_VOLNAME_FOR_VTYPE = { > + iso => qr/^$RE_ISO_FILE_PATH$/, > +}; > + > +my sub contains_parent_dir($path) { > + return $path =~ $RE_CONTAINS_PARENT_DIR; > +} > + > +my sub strip_leading_path_separators($path) { > + return $path =~ s!^/+!!r; > +} > + > +my sub strip_trailing_path_separators($path) { > + return $path =~ s!/+$!!r; > +} > + > +my sub format_named_groups(%groups) { > + my $result = {}; > + > + for my $old_key (keys %groups) { > + my $new_key = $old_key =~ s/_/-/gr; > + $result->{$new_key} = $groups{$old_key}; > + } > + > + my @disk_path_components = (); > + > + if (defined($result->{file})) { > + $result->{file} = strip_leading_path_separators($result->{file}); > + push(@disk_path_components, $result->{file}); > + } > + > + if (scalar(@disk_path_components)) { > + $result->{'disk-path'} = join('/', @disk_path_components); > + } > + > + return $result; > +} > + > +my sub split_leading_dir_from_path($path, $directory) { > + $directory = strip_trailing_path_separators($directory); > + > + if ($path =~ m!^(? \Q$directory\E ) / (? .* ) $ !xn) { > + return { > + 'leading-dir' => strip_trailing_path_separators($+{leading_dir}), > + remainder => strip_leading_path_separators($+{remainder}), Do we really want a hash here? The 'leading-dir' one is literally `$directory` 99% of the time - with $directory getting additional sanitization here, so maybe just `return ($directory, $rest);` That said... for consistency within the file I won't block it based on that, but I'm not happy about it... Code wise, we could use `substr` with a `''` replacement. Since we match an exact string we already have in a var, it's fine to throw it away. Then again I also wouldn't be surprised if perl's RE engine performed better than the following: if ("$directory/" eq substr($path, 0, length($directory) + 1, '')) { return ($directory, $path); } > + }; > + } > + > + return; > +} > + > +=head1 PARSERS RELATED TO VOLUMES > + > +The parsing functions in this section primarily deal with parsing data related > +to storage volumes, primarily C>s and C>s. > + > +=head2 VOLUME NAMES > + > +=cut > + > +=head3 parse_path_as_volname_parts > + > +Parses the given relative file path C<$path> according to the given C<$vtype> > +and returns a hashref containing the individually extracted parts that make up > +a C on success. > + > +On failure or when the provided C<$vtype> is not supported or does not exist, > +returns C instead. > + > +B This function assumes that C<$path> is already relative to the > +directory that corresponds to the given C<$vtype>. I wonder if we should then distinguish the sub as `parse_relative_path_as_volname_parts`? > + > +For example, the path C points to an > +ISO file on the default C directory storage. In this case, the storage > +path is C, the subdirectory for the C<$vtype = "iso"> is > +C