From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 22CF71FF164 for ; Fri, 31 Jan 2025 13:24:32 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id C646D2BD94; Fri, 31 Jan 2025 13:24:27 +0100 (CET) Message-ID: Date: Fri, 31 Jan 2025 13:23:52 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird To: Proxmox VE development discussion , Max Carrara References: <20250109144818.430185-1-m.carrara@proxmox.com> <20250109144818.430185-2-m.carrara@proxmox.com> Content-Language: en-US From: Fiona Ebner In-Reply-To: <20250109144818.430185-2-m.carrara@proxmox.com> X-SPAM-LEVEL: Spam detection results: 0 AWL -2.550 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment KAM_SOMETLD_ARE_BAD_TLD 5 .bar, .beauty, .buzz, .cam, .casa, .cfd, .club, .date, .guru, .link, .live, .monster, .online, .press, .pw, .quest, .rest, .sbs, .shop, .stream, .top, .trade, .wiki, .work, .xyz TLD abuse RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" Am 09.01.25 um 15:48 schrieb Max Carrara: > +=head3 path_join(@paths) > + > +Joins multiple paths together. All kinds of paths are supported. > + > +Does not perform any C>>. > + > + my $joined = path_join("foo", "bar/baz", "qux.txt"); > + # foo/bar/baz/qux.txt > + > + my $joined = path_join("/", "etc/pve/", "storage.cfg"); > + # /etc/pve/storage.cfg > + > +Similar to C>>, should any of the > +C<@paths> be an absolute path, it I all preceding paths. Seems like the docs are somehow broken here, looking at it with perldoc, it just states 'Similar to "path_push()"' and the rest of the sentence is missing. Why this kind of behavior with absolute paths? Seems surprising to me. Wouldn't failing the call be better? > + > + my $joined = path_join("foo/bar", "/etc", "resolv.conf"); > + # /etc/resolv.conf > + > + my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts"); > + # /etc/hosts > + > +Throws an exception if any of the passed C<@paths> is C. > + > +=cut > + > +sub path_join : prototype(@) { > + my (@paths) = @_; > + > + if (!scalar(@paths)) { > + return ''; > + } > + > + croak "one of the provided paths is undef" if any { !defined($_) } @paths; > + I think the rest could be written more efficiently like (untested): my $resulting_path = shift @paths; for my $path (@paths) { if ($path =~ m#^/#) { $resulting_path = $path; } else { $resulting_path = path_push($resulting_path, $path); } } > + # Find the last occurrence of a root directory and start conjoining the > + # components from there onwards > + my $index = scalar(@paths) - 1; > + while ($index > 0) { > + last if $paths[$index] =~ m#^/#; > + $index--; > + } > + > + @paths = @paths[$index .. (scalar(@paths) - 1)]; > + > + my $resulting_path = shift @paths; > + > + for my $path (@paths) { > + $resulting_path = path_push($resulting_path, $path); > + } > + > + return $resulting_path; > +} > + > +=head3 path_normalize($path) > + > +Wrapper for L>. Performs a logical cleanup of the given > +C<$path>. > + > +This removes unnecessary components of a path that can be safely > +removed, such as C<.>, trailing C or repeated occurrences of C. > + > +For example, C and C will both become > +C. > + > +B This will I remove components referencing the parent directory, > +i.e. C<..>. For example, C and C will therefore > +remain as they are. However, the parent directory of C is C, so > +C will be normalized to C. > + > +Throws an exception if C<$path> is C or the call to C failed. > + > +=cut > + > +sub path_normalize : prototype($) { > + my ($path) = @_; > + > + croak "\$path is undef" if !defined($path); > + > + my $cleaned_path = eval { > + File::Spec->canonpath($path); > + }; > + Style nit: blank lines between eval and using $@ are better avoided IMHO. I'd also have the eval expression be a single line, but no strong feelings. > + croak "failed to clean up path: $@" if $@; > + > + return $cleaned_path; > +} > + > +=head3 path_parent($path) > + > +Returns the given C<$path> without its final component, if there is one. > + > +Trailing and repeated occurrences of C and C<.> are normalized on the fly > +when needed. This means that e.g. C becomes C, but > +C becomes C. > + > +This function's behaviour is almost identical to Rust's > +L<< Path::parent|https://doc.rust-lang.org/std/path/struct.Path.html#method.parent >>, > +with a few adaptations made wherever Perl treats things differently: > + > +=over > + > +=item * C becomes C, C becomes C. > + > +=item * C becomes C. > + > +=item * C becomes C. Note that C is not > +necessarily the real parent in the filesystem in the case of e.g. symlinks. > + > +=item * C becomes C. > + > +=item * C is interpreted as C<./foo> and becomes C<.>. This is because Perl's > +C> interprets C<./foo> and C as the same thing. I fail to see the "because" here. Why is this special handling important in practice, i.e. why exactly care about the behavior of canonpath here? It seems really surprising to get "." when asking for the parent of "foo". With the normalization explained above, I'd argue it'd be much more natural to return the empty string for both "./foo" and "foo". Or if you want to follow Rust's parent() more closely (which I guess is the reason for "/" and "." having different results), have "./foo" return "." and "foo" return the empty string. > +=head3 path_push($path, $other) > + > +Extends C<$path> with C<$other>, returning a new path. > + > +If C<$other> is absolute, it will be returned instead. > + Again, seems rather surprising to me to auto-magically do this. > +Throws an exception if any of the arguments is C. > + > +=cut > + > +sub path_push : prototype($$) { > + my ($path, $other) = @_; > + > + croak "\$path is undef" if !defined($path); > + croak "\$other is undef" if !defined($other); > + > + return $path if $other eq ''; > + return $other if path_is_absolute($other); > + > + my $need_sep = $path ne '' && $path !~ m#/$#; > + > + $path .= "/" if $need_sep; > + $path .= $other; > + > + return $path; > +} > + > +=head3 path_pop($path) > + > +Alias for C>>. > + Why have this alias? The name suggests it would behave like an inverse to path_push(), but it does not, because of the normalization of path_parent(). So I'd rather not have it or have it be a non-normalizing version (but maybe not worth it) to avoid surprises. > +=cut > + > +sub path_pop : prototype($) { > + my ($path) = @_; > + return path_parent($path); > +} > + > +=head3 path_file_name($path) > + > +Returns the last component of the given C<$path>, if it is a legal file name, > +or C otherwise. > + > +If C<$path> is an empty string, C, C<.> or ends with a C<..> component, > +there is no valid file name. > + > +B This does not check whether the given C<$path> actually points to a > +file or a directory etc. > + > +Throws an exception if C<$path> is C. > + > +=cut > + > +sub path_file_name : prototype($) { > + my ($path) = @_; > + > + croak "\$path is undef" if !defined($path); > + > + my @components = path_components($path); > + > + if (!scalar(@components)) { > + return; > + } > + > + if ( > + scalar(@components) == 1 > + && ($components[0] eq '/' || $components[0] eq '.') > + ) { Style nit: this conditional fits well into 100 characters, so can be one line instead of four > + return; > + } > + > + if ($components[-1] eq '..') { > + return; > + } > + > + return $components[-1]; > +} > + > +=head3 path_with_file_name($path, $file_name) > + > +Returns C<$path> with C<$file_name> as the new last component. > + > +This is essentially like calling C>> > +and using C>> to append the new > +file name, but handles a few extra cases: > + > +=over > + > +=item * If C<$path> is C, appends C<$file_name>. > + > +=item * If C<$path> is an empty string, appends C<$file_name>. > + > +=item * If C<$path> ends with a parent directory reference (C<..>), replaces it > +with C<$file_name>. Hmm, why is that an "extra case" compared to calling path_parent() and path_push()? I.e. your implementation doesn't handle ".." specifically. > + > +=back > + > +Throws an exception if any of the arguments is C or if C<$file_name> > +contains a path separator (C). > + > +=cut > + > +sub path_with_file_name : prototype($$) { > + my ($path, $file_name) = @_; > + > + croak "\$path is undef" if !defined($path); > + croak "\$file_name is undef" if !defined($file_name); > + croak "\$file_name contains a path separator: $file_name" if $file_name =~ m|/|; > + > + my $parent = path_parent($path); > + > + # undef means that $path was either "" or "/", so we can just append to it > + return ($path . $file_name) if !defined($parent); > + > + # Because the parent of "foo" is ".", return $file_name to stay consistent. > + # Otherwise, we'd end up with a current path ref prepended ("./$file_name") > + if ($parent eq '.' && $path !~ m|/|) { > + return $file_name; > + } See, you already need special handling because of that surprising behavior ;) > + > + return path_push($parent, $file_name); > +} > + > +my sub _path_file_prefix_suffix_str : prototype($) { > + my ($file_name) = @_; > + > + confess "\$file_name is undef" if !defined($file_name); > + > + confess "\$prefix not matched" if $file_name !~ m|^(\.*[^\.]*)(.*)|; > + my ($prefix, $suffix_str) = ($1, $2); > + > + return ($prefix, $suffix_str); > +} > + > +my sub _path_file_suffixes_from_str : prototype($) { > + my ($suffix_str) = @_; > + > + confess "\$suffix_str is undef" if !defined($suffix_str); > + > + # Suffixes are extracted "manually" because join()ing the result of split() > + # results in a different file name than the original. Let's say you have a > + # file named "foo.bar.". The correct suffixes would be ("bar", ""). > + # With split, you get the following: > + # split(/\./, ".bar.") --> ("", "bar") --> join()ed to "foo..bar" > + # split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to "foo..bar." Couldn't you just shift away the first entry from the result? > + my @suffixes = (); > + while ($suffix_str =~ s|^(\.[^\.]*)||) { > + my $suffix = $1; > + $suffix =~ s|^\.||; Nit: Could also use substr() rather then regex replace. > + push(@suffixes, $suffix); > + } > + > + return @suffixes; > +} > + > +=head3 path_file_prefix($path) > + > +Returns the prefix of the file name of the given C<$path>. If the C<$path> does > +not have a valid file name and thus no prefix, C is returned instead. > + > +The prefix of a file name is the part before any extensions (suffixes). > + > + my $prefix = path_file_prefix("/etc/resolv.conf"); > + # resolv > + > + my $prefix = path_file_prefix("/tmp/archive.tar.zst"); > + # archive > + > +Throws an exception if C<$path> is C. > + Documentation should mention how dot files (e.g. .foo.txt) are treated. > +=cut > + > +sub path_file_prefix : prototype($) { > + my ($path) = @_; > + > + croak "\$path is undef" if !defined($path); > + > + my $file_name = path_file_name($path); > + return undef if !defined($file_name); > + > + my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name); > + return $prefix; > +} > + > +=head3 path_with_file_prefix($path, $prefix) > + Hmm, looking at this and path_with_file_suffix{,es}(), would it maybe be nicer to have a single path_with_file_name_from_parts($path, $prefix, @suffixes) function instead of these? Would seem more natural/straightforward to me. The implementations are rather involved IMHO compared to how useful the functions are. It's very easy to get the behavior for the (I suspect rather uncommon) case of where you want to replace only prefix or suffix(es) without already knowing the other. Just need to call path_file_parts() first. Or did you take inspiration from somewhere else for these? ---snip 8<--- > +=head3 path_file_parts($path) > + > +Returns the parts that constitute the file name (prefix and suffixes) of a > +C<$path> as a list. If the C<$path> does not have a valid file name, an empty > +list is returned instead. > + > +These parts are split in such a way that allows them to be Ced together, > +resulting in the original file name of the given C<$path> again. > + > + my @file_parts = path_file_parts("/etc/pve/firewall/cluster.fw"); > + # ("cluster", "fw") > + my $file_name = join(".", @file_parts); > + > + my @file_parts = path_file_parts("/tmp/archive.tar.gz"); > + # ("archive", "tar", "gz") > + my $file_name = join(".", @file_parts); > + > +Throws an exception if C<$path> is C. > + Should mention how dot files are treated. > +=cut > + > +sub path_file_parts : prototype($) { > + my ($path) = @_; > + > + croak "\$path is undef" if !defined($path); > + > + my $file_name = path_file_name($path); > + if (!defined($file_name)) { > + return (); > + } > + > + my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name); > + my @suffixes = _path_file_suffixes_from_str($suffix_str); > + > + return ($prefix, @suffixes); > +} > + > +=head3 path_starts_with($path, $other_path) > + > +Checks whether a C<$path> starts with the components of C<$other_path>. > + > + my $starts_with = path_starts_with("/etc/pve/firewall/cluster.fw", "/etc/pve"); > + # 1 > + > +Throws an exception if any of the arguments is C. > + Should document the behavior for empty paths as well as the normalization. > +=cut > + > +sub path_starts_with : prototype($$) { > + my ($path, $other_path) = @_; > + > + croak "\$path is undef" if !defined($path); > + croak "\$other_path if undef" if !defined($other_path); > + > + # Nothing starts with nothing > + return 1 if ($path eq '' && $other_path eq ''); > + > + # Nothing cannot start with something > + # Something cannot start with nothing > + return if ($path eq '' || $other_path eq ''); > + > + my @components = path_components($path); > + my @other_components = path_components($other_path); > + > + my @pairs = zip_shortest(\@components, \@other_components); Why not just shift off from both in a loop? > + > + # for my ($comp, $other_comp) (@pairs) is experimental > + for my $pair (@pairs) { > + my ($comp, $other_comp) = $pair->@*; > + > + if ($comp ne $other_comp) { > + return; > + } > + } > + > + return 1; > +} > + > +=head3 path_ends_with($path, $other_path) > + > +Checks whether a C<$path> ends with the components of C<$other_path>. > + > + my $ends_with = path_ends_with("/etc/pve/firewall/cluster.fw", "firewall/cluster.fw"); > + # 1 > + > +Throws an exception if any of the arguments is C. > + Should document the behavior for empty paths as well as the normalization. > +=cut > + > +sub path_ends_with : prototype($$) { > + my ($path, $other_path) = @_; > + > + croak "\$path is undef" if !defined($path); > + croak "\$other_path if undef" if !defined($other_path); > + > + # Nothing ends with nothing > + return 1 if ($path eq '' && $other_path eq ''); > + > + # Nothing cannot end with something > + # Something cannot end with nothing > + return if ($path eq '' || $other_path eq ''); > + > + my @components_rev = reverse(path_components($path)); > + my @other_components_rev = reverse(path_components($other_path)); > + > + my @pairs_rev = zip_shortest(\@components_rev, \@other_components_rev); Why not just shift off from both in a loop? > + > + # for my ($comp, $other_comp) (@pairs_rev) is experimental > + for my $pair (@pairs_rev) { > + my ($comp, $other_comp) = $pair->@*; > + > + if ($comp ne $other_comp) { > + return; > + } > + } > + > + return 1; > +} > + > +=head3 path_equals($path, $other_path) > + > +Checks whether C<$path> equals C<$other_path>. The paths are compared > +by their components, meaning that it's not necessary to > +L<< normalize|/"path_normalize($path)" >> them beforehand. > + > +=cut > + > +sub path_equals : prototype($$) { > + my ($path, $other_path) = @_; > + > + croak "\$path is undef" if !defined($path); > + croak "\$other_path if undef" if !defined($other_path); > + > + # Nothing is nothing > + return 1 if ($path eq '' && $other_path eq ''); > + > + # Nothing is not something > + # Something is not nothing > + return if ($path eq '' || $other_path eq ''); > + > + my @components = path_components($path); > + my @other_components = path_components($other_path); > + > + return if scalar(@components) != scalar(@other_components); > + > + my @pairs = zip_longest(\@components, \@other_components); Why not just shift off from both in a loop? > + > + # for my ($comp, $other_comp) (@pairs_rev) is experimental > + for my $pair (@pairs) { > + my ($comp, $other_comp) = $pair->@*; > + > + return if !defined($comp) || !defined($other_comp); > + > + if ($comp ne $other_comp) { > + return; > + } > + } > + > + return 1; > +} > + > +1; _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel