public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Max Carrara <m.carrara@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path
Date: Thu,  9 Jan 2025 15:48:07 +0100	[thread overview]
Message-ID: <20250109144818.430185-2-m.carrara@proxmox.com> (raw)
In-Reply-To: <20250109144818.430185-1-m.carrara@proxmox.com>

The PVE::Path module concerns itself with file / directory path
operations, like getting the parent directory of a path, extracting
the file name of a path, splitting a path into its individual
components, joining path components together, comparing paths, and so
on.

This module is added here in order to address the shortcomings of the
Perl core modules (such as lacking a lot the aforementioned
functionalities) without relying on some kind of object-based
abstraction.

PVE::Path strictly only contains functions that manipulate file paths
without meddling with the filesystem. The reasoning here is to be able
to just import any function when its necessary without having to
change the surrounding code in order to adapt to some kind of
abstraction.

The other motivation for this module is that path manipulation
operations have been getting more and more common recently, especially
in regards to storage.

Signed-off-by: Max Carrara <m.carrara@proxmox.com>
---
Changes v2 --> v3:
  * Don't return a reference to a list anymore when path_components,
    path_file_suffixes, path_file_parts are called in scalar context
  * Mention '/' being added at the start of the components being
    returned by path_components in its docstring
  * Mention special case of how absolute paths are handled and refer to
    path_push in path_join's docstring
  * Check whether path is absolute after checking whether it's empty
    instead of the other way around in path_push
  * Rework private helper functions and make them a little more
    efficient

Changes v1 --> v2:
  * Improve some wording in the docstring of path_components
  * Simplify some logic in path_parent and remove an unnecessary sanity
    check
  * Actually treat "foo" as "./foo" in path_parent as mentioned in the
    docstring -- This means that path_parent("foo") now returns "."
    instead of "".
  * Adapt the path_with_file_* functions to the above accordingly, so
    that e.g. path_with_file_name("foo", "bar") returns "bar" instead of
    "./bar".
  * Improve the "boolean" behaviour of path_is_absolute and
    path_is_absolute and return 1 when true, but use an empty return
    when false.
    - An empty return means "undef" in scalar context and an empty list
      in list context, so those functions will always return something
      that's correctly truthy or falsy for Perl, regardless of context

 src/Makefile    |   1 +
 src/PVE/Path.pm | 991 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 992 insertions(+)
 create mode 100644 src/PVE/Path.pm

diff --git a/src/Makefile b/src/Makefile
index 2d8bdc4..25bc490 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ LIB_SOURCES = \
 	LDAP.pm \
 	Network.pm \
 	OTP.pm \
+	Path.pm \
 	PBSClient.pm \
 	PTY.pm \
 	ProcFSTools.pm \
diff --git a/src/PVE/Path.pm b/src/PVE/Path.pm
new file mode 100644
index 0000000..ef47d66
--- /dev/null
+++ b/src/PVE/Path.pm
@@ -0,0 +1,991 @@
+=head1 NAME
+
+C<PVE::Path> - Utilities related to handling file and directory paths
+
+=head1 DESCRIPTION
+
+This module provides functions concerned with file and directory path
+manipulation.
+
+None of the functions provided alter the filesystem in any way.
+
+The reason for this module's existence is to address a couple shortcomings:
+
+=over
+
+=item 1. The Perl core modules lack most of what is required for manipulating
+paths, for example getting the parent directory of a path, extracting the
+prefix of a file name (the "stem"), extracting the suffixes of a file name (the
+"endings" or "extensions"), checking whether two paths are the same, and so on.
+
+=item 2. If the Perl core modules provide something in that regard, it's usually
+provided in a not very ergonomic manner (L<C<File::Basename>>).
+
+=item 3. Additionally, the path utilities of the core modules are scattered
+across multiple modules, making them hard to discover.
+
+=item 4. Third-party libraries on CPAN mostly provide objects representing
+paths. Using any of these would require fundamental changes on how file paths
+are handled throughout our code, for almost no benefit.
+
+=back
+
+C<L<PVE::Path>> instead does without objects and strictly provides functions
+for path manipulation only. Any operation that is needed can simply be
+performed ad hoc by importing the corresponding function and doesn't require
+the surrounding code to conform to an abstraction like a path object.
+
+Additionally, some of the core modules' functionality is re-exported or
+re-implemented for ergonomic or logical purposes. The goal is to provide
+functions that don't come with any surprises and just behave like one assumes
+they would.
+
+This module takes inspiration from Rust's C<std::path> and Python's C<pathlib>,
+which are more modern path manipulation libraries.
+
+=head1 LIMITATIONS
+
+This module is limited to manipulating Unix-like / Linux file paths.
+
+=cut
+
+package PVE::Path;
+
+use strict;
+use warnings;
+
+use Carp qw(carp croak confess);
+use File::Spec ();
+use List::Util qw(any zip_shortest zip_longest);
+
+use Exporter qw(import);
+
+our @EXPORT_OK = qw(
+    path_is_absolute
+    path_is_relative
+
+    path_components
+    path_join
+
+    path_normalize
+
+    path_parent
+    path_push
+    path_pop
+
+    path_file_name
+    path_with_file_name
+
+    path_file_prefix
+    path_with_file_prefix
+
+    path_file_suffixes
+    path_with_file_suffixes
+
+    path_file_suffix
+    path_with_file_suffix
+
+    path_file_parts
+
+    path_starts_with
+    path_ends_with
+    path_equals
+);
+
+=head2 FUNCTIONS
+
+=cut
+
+=head3 path_is_absolute($path)
+
+Returns C<1> if C<$path> is absolute (starts with a C</>).
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_is_absolute : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if ($path =~ m#^/#) {
+	return 1;
+    }
+
+    return;
+}
+
+=head3 path_is_relative($path)
+
+Returns C<1> if C<$path> is relative (doesn't start with a C</>).
+
+The opposite of C<L<< path_is_absolute()|/"path_is_absolute($path)" >>>.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_is_relative : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if ($path !~ m#^/#) {
+	return 1;
+    }
+
+    return;
+}
+
+=head3 path_components($path)
+
+Returns a list of the given C<$path>'s individual components.
+
+The C<$path> is normalized a little during the parse:
+
+=over
+
+=item Repeated occurrences of C</> are removed, so C<foo/bar> and C<foo//bar>
+both have C<foo> and C<bar> as components.
+
+=item Trailing slashes C</> are removed.
+
+=item Occurrences of C<.> are normalized away, except the first C<.> at
+beginning of a path. This means that C<foo/bar>, C<foo/./bar>, C<foo/bar/.>,
+C<foo/././bar/./.>, etc. all have C<foo> and C<bar> as components, while
+C<./foo/bar>, C<./././foo/bar>, C<./foo/./bar/.> have C<.>, C<foo> and C<bar>
+as components.
+
+=items Absolute paths will retain a C</> at the beginning. This means that
+C</foo/bar> has C</>, C<foo> and C<bar> as components.
+
+=back
+
+No other normalization is performed to account for the possibility of symlinks
+existing. This means that C<foo/baz> and C<foo/bar/../baz> are distinct (because
+C<bar> could be a symlink and thus C<foo> isn't its parent).
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_components : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $is_abs = path_is_absolute($path);
+    my $has_cur_dir = $path =~ m#^\.$|^\./#;
+
+    my @components = split('/', $path);
+    my @normalized_components = ();
+
+    for my $component (@components) {
+	next if $component eq '' || $component eq '.';
+
+	push(@normalized_components, $component);
+    }
+
+    unshift(@normalized_components, '/') if $is_abs;
+    unshift(@normalized_components, '.') if $has_cur_dir;
+
+    return @normalized_components;
+}
+
+
+=head3 path_join(@paths)
+
+Joins multiple paths together. All kinds of paths are supported.
+
+Does not perform any C<L<< normalization|/"path_normalize($path)" >>>.
+
+    my $joined = path_join("foo", "bar/baz", "qux.txt");
+    # foo/bar/baz/qux.txt
+
+    my $joined = path_join("/", "etc/pve/", "storage.cfg");
+    # /etc/pve/storage.cfg
+
+Similar to C<L<< path_push()|/"path_push($path, $other)">>>, should any of the
+C<@paths> be an absolute path, it I<replaces> all preceding paths.
+
+    my $joined = path_join("foo/bar", "/etc", "resolv.conf");
+    # /etc/resolv.conf
+
+    my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts");
+    # /etc/hosts
+
+Throws an exception if any of the passed C<@paths> is C<undef>.
+
+=cut
+
+sub path_join : prototype(@) {
+    my (@paths) = @_;
+
+    if (!scalar(@paths)) {
+	return '';
+    }
+
+    croak "one of the provided paths is undef" if any { !defined($_) } @paths;
+
+    # Find the last occurrence of a root directory and start conjoining the
+    # components from there onwards
+    my $index = scalar(@paths) - 1;
+    while ($index > 0) {
+	last if $paths[$index] =~ m#^/#;
+	$index--;
+    }
+
+    @paths = @paths[$index .. (scalar(@paths) - 1)];
+
+    my $resulting_path = shift @paths;
+
+    for my $path (@paths) {
+	$resulting_path = path_push($resulting_path, $path);
+    }
+
+    return $resulting_path;
+}
+
+=head3 path_normalize($path)
+
+Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the given
+C<$path>.
+
+This removes unnecessary components of a path that can be safely
+removed, such as C<.>, trailing C</> or repeated occurrences of C</>.
+
+For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
+C<foo/bar/baz>.
+
+B<Note:> This will I<not> remove components referencing the parent directory,
+i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will therefore
+remain as they are. However, the parent directory of C</> is C</>, so
+C</../../foo> will be normalized to C</foo>.
+
+Throws an exception if C<$path> is C<undef> or the call to C<canonpath> failed.
+
+=cut
+
+sub path_normalize : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $cleaned_path = eval {
+	File::Spec->canonpath($path);
+    };
+
+    croak "failed to clean up path: $@" if $@;
+
+    return $cleaned_path;
+}
+
+=head3 path_parent($path)
+
+Returns the given C<$path> without its final component, if there is one.
+
+Trailing and repeated occurrences of C</> and C<.> are normalized on the fly
+when needed. This means that e.g. C<foo////bar///.//> becomes C<foo>, but
+C<foo/.//bar//./baz> becomes C<foo/.//bar>.
+
+This function's behaviour is almost identical to Rust's
+L<< Path::parent|https://doc.rust-lang.org/std/path/struct.Path.html#method.parent >>,
+with a few adaptations made wherever Perl treats things differently:
+
+=over
+
+=item * C</foo/bar> becomes C</foo>, C<foo/bar> becomes C<foo>.
+
+=item * C</foo> becomes C</>.
+
+=item * C<foo/bar/..> becomes C<foo/bar>. Note that C<foo/bar> is not
+necessarily the real parent in the filesystem in the case of e.g. symlinks.
+
+=item * C<foo/../bar> becomes C<foo/..>.
+
+=item * C<foo> is interpreted as C<./foo> and becomes C<.>. This is because Perl's
+C<L<File::Spec/canonpath>> interprets C<./foo> and C<foo> as the same thing.
+
+=item * C</> and an I<empty string> result in C<undef> being returned.
+
+=item * C<.> results in an empty string.
+
+=back
+
+Will raise an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_parent : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if ($path eq '') {
+	return;
+    }
+
+    # A limit of -1 retains empty components at the end
+    my @components = split('/', $path, -1);
+
+    # Trim off needless extra components until actual final component is encountered, e.g.
+    # foo////bar////baz//// -> foo////bar////baz
+    # /// -> /
+    # ././//.///./ -> .
+    while (scalar(@components) > 1 && ($components[-1] eq '' || $components[-1] eq '.')) {
+	pop(@components);
+    }
+
+    my $final_component = pop(@components);
+
+    # We had a root dir with needless extra components, e.g. "//" or "////" or "//.///./" etc.
+    return if $final_component eq '';
+
+    # We had a current dir reference with needless extra components, e.g.
+    # "././" or ".///////" or "./././//./././//" etc.
+    return '' if $final_component eq '.';
+
+    # We had some other kind of single component like "foo", "bar" or "..",
+    # and because File::Spec->canonpath treats "foo" and "./foo" the same,
+    # return a single current dir reference
+    return '.' if !scalar(@components);
+
+    # Trim off needless extra components until actual parent component is encountered, like above
+    while (scalar(@components) > 1 && ($components[-1] eq '' || $components[-1] eq '.')) {
+	pop(@components);
+    }
+
+    # Handle lone root dir (@components with only one empty string)
+    if (scalar(@components) == 1 && $components[0] eq '') {
+	return '/';
+    }
+
+    return join('/', @components);
+}
+
+=head3 path_push($path, $other)
+
+Extends C<$path> with C<$other>, returning a new path.
+
+If C<$other> is absolute, it will be returned instead.
+
+Throws an exception if any of the arguments is C<undef>.
+
+=cut
+
+sub path_push : prototype($$) {
+    my ($path, $other) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other is undef" if !defined($other);
+
+    return $path if $other eq '';
+    return $other if path_is_absolute($other);
+
+    my $need_sep = $path ne '' && $path !~ m#/$#;
+
+    $path .= "/" if $need_sep;
+    $path .= $other;
+
+    return $path;
+}
+
+=head3 path_pop($path)
+
+Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
+
+=cut
+
+sub path_pop : prototype($) {
+    my ($path) = @_;
+    return path_parent($path);
+}
+
+=head3 path_file_name($path)
+
+Returns the last component of the given C<$path>, if it is a legal file name,
+or C<undef> otherwise.
+
+If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
+there is no valid file name.
+
+B<Note:> This does not check whether the given C<$path> actually points to a
+file or a directory etc.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_name : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my @components = path_components($path);
+
+    if (!scalar(@components)) {
+	return;
+    }
+
+    if (
+	scalar(@components) == 1
+	&& ($components[0] eq '/' || $components[0] eq '.')
+    ) {
+	return;
+    }
+
+    if ($components[-1] eq '..') {
+	return;
+    }
+
+    return $components[-1];
+}
+
+=head3 path_with_file_name($path, $file_name)
+
+Returns C<$path> with C<$file_name> as the new last component.
+
+This is essentially like calling C<L<< path_parent()|/"path_parent($path)" >>>
+and using C<L<< path_push()|/"path_push($path, $other)" >>> to append the new
+file name, but handles a few extra cases:
+
+=over
+
+=item * If C<$path> is C</>, appends C<$file_name>.
+
+=item * If C<$path> is an empty string, appends C<$file_name>.
+
+=item * If C<$path> ends with a parent directory reference (C<..>), replaces it
+with C<$file_name>.
+
+=back
+
+Throws an exception if any of the arguments is C<undef> or if C<$file_name>
+contains a path separator (C</>).
+
+=cut
+
+sub path_with_file_name : prototype($$) {
+    my ($path, $file_name) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$file_name is undef" if !defined($file_name);
+    croak "\$file_name contains a path separator: $file_name" if $file_name =~ m|/|;
+
+    my $parent = path_parent($path);
+
+    # undef means that $path was either "" or "/", so we can just append to it
+    return ($path . $file_name) if !defined($parent);
+
+    # Because the parent of "foo" is ".", return $file_name to stay consistent.
+    # Otherwise, we'd end up with a current path ref prepended ("./$file_name")
+    if ($parent eq '.' && $path !~ m|/|) {
+	return $file_name;
+    }
+
+    return path_push($parent, $file_name);
+}
+
+my sub _path_file_prefix_suffix_str : prototype($) {
+    my ($file_name) = @_;
+
+    confess "\$file_name is undef" if !defined($file_name);
+
+    confess "\$prefix not matched" if $file_name !~ m|^(\.*[^\.]*)(.*)|;
+    my ($prefix, $suffix_str) = ($1, $2);
+
+    return ($prefix, $suffix_str);
+}
+
+my sub _path_file_suffixes_from_str : prototype($) {
+    my ($suffix_str) = @_;
+
+    confess "\$suffix_str is undef" if !defined($suffix_str);
+
+    # Suffixes are extracted "manually" because join()ing the result of split()
+    # results in a different file name than the original. Let's say you have a
+    # file named "foo.bar.". The correct suffixes would be ("bar", "").
+    # With split, you get the following:
+    #     split(/\./, ".bar.")     --> ("", "bar")     --> join()ed to "foo..bar"
+    #     split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to "foo..bar."
+    my @suffixes = ();
+    while ($suffix_str =~ s|^(\.[^\.]*)||) {
+	my $suffix = $1;
+	$suffix =~ s|^\.||;
+	push(@suffixes, $suffix);
+    }
+
+    return @suffixes;
+}
+
+=head3 path_file_prefix($path)
+
+Returns the prefix of the file name of the given C<$path>. If the C<$path> does
+not have a valid file name and thus no prefix, C<undef> is returned instead.
+
+The prefix of a file name is the part before any extensions (suffixes).
+
+    my $prefix = path_file_prefix("/etc/resolv.conf");
+    # resolv
+
+    my $prefix = path_file_prefix("/tmp/archive.tar.zst");
+    # archive
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_prefix : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    return $prefix;
+}
+
+=head3 path_with_file_prefix($path, $prefix)
+
+Returns C<$path> with a new C<$prefix>. This is similar to
+C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
+except that the file prefix is replaced or appended.
+
+If C<$path> does not have a file name or if C<$prefix> is an empty string,
+C<undef> is returned.
+
+    my $new_path = path_with_file_prefix("/tmp/archive.tar.zst", "backup");
+    # /tmp/backup.tar.zst
+
+    my $new_path = path_with_file_prefix("/etc/pve", "ceph");
+    # /etc/ceph
+
+Throws an exception if any of the arguments is C<undef>, or if C<$prefix>
+contains a path separator (C</>), ends with C<.>, or is an empty string.
+
+=cut
+
+sub path_with_file_prefix : prototype($$) {
+    my ($path, $prefix) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$prefix is undef" if !defined($prefix);
+    croak "\$prefix contains a path separator" if $prefix =~ m|/|;
+    croak "\$prefix ends with a dot" if $prefix =~ m|\.$|;
+
+    return undef if $prefix eq '';
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $parent = path_parent($path);
+
+    # sanity check -- should not happen because we checked for file name,
+    # and the existence of a file name implies there's a parent
+    confess "parent of \$path is undef" if !defined($parent);
+
+    my (undef, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+
+    my $new_file_name = $prefix . $suffix_str;
+
+    # Because the parent of "foo" is ".", return $new_file_name to stay consistent.
+    # Otherwise, we'd end up with a current path ref prepended ("./$new_file_name")
+    # (Done also in path_with_new_file_name)
+    if ($parent eq '.' && $path !~ m|/|) {
+	return $new_file_name;
+    }
+
+    return path_push($parent, $new_file_name);
+}
+
+=head3 path_file_suffixes($path)
+
+Returns the suffixes of the C<$path>'s file name as a list. If the C<$path> does
+not have a valid file name, an empty list is returned instead.
+
+The suffixes of a path are essentially the file name's extensions, the parts
+that come after the L<< prefix|/"path_file_prefix($path)" >>.
+
+    my @suffixes = path_file_suffixes("/etc/resolv.conf");
+    # ("conf")
+
+    my @suffixes = path_file_prefix("/tmp/archive.tar.zst");
+    # ("tar", "zst")
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_suffixes : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    if (!defined($file_name)) {
+	return ();
+    }
+
+    my $suffix_str;
+    (undef, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+
+    return _path_file_suffixes_from_str($suffix_str);
+}
+
+=head3 path_with_file_suffixes($path, @suffixes)
+
+Returns C<$path> with new C<@suffixes>. This is similar to
+C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
+except that the suffixes of the file name are replaced.
+
+If the C<$path> does not have a file name, C<undef> is returned.
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "pxar", "gz");
+    # /tmp/archive.pxar.gz
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "gz");
+    # /tmp/archive.gz
+
+If the file name has no suffixes, the C<@suffixes> are appended instead:
+
+    my $new_path = path_with_file_suffixes("/etc/resolv", "conf");
+    # /etc/resolv.conf
+
+    my $new_path = path_with_file_suffixes("/etc/resolv", "conf", "zst");
+    # /etc/resolv.conf.zst
+
+If there are no C<@suffixes> provided, the file name's suffixes will
+be removed (if there are any):
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst");
+    # /tmp/archive
+
+Note that an empty string is still a valid suffix (an "empty" file ending):
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "", "", "", "zst");
+    # /tmp/archive....zst
+
+Throws an exception if C<$path> or any of the C<@suffixes> is C<undef>, or
+if any suffix contains a path separator (C</>) or a C<.>.
+
+=cut
+
+sub path_with_file_suffixes : prototype($@) {
+    my ($path, @suffixes) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "one of the provided suffixes is undef"
+	if any { !defined($_) } @suffixes;
+    croak "one of the provided suffixes contains a path separator"
+	if any { $_ =~ m|/| } @suffixes;
+    croak "one of the provided suffixes contains a dot"
+	if any { $_ =~ m|\.| } @suffixes;
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $parent = path_parent($path);
+
+    # sanity check -- should not happen because we checked for file name,
+    # and the existence of a file name implies there's a parent
+    confess "parent of \$path is undef" if !defined($parent);
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+
+    # Don't modify $path if there are no suffixes to be removed
+    return $path if !scalar(@suffixes) && $suffix_str eq '';
+
+    # sanity check
+    confess "\$prefix is undef" if !defined($prefix);
+
+    my $new_file_name = join(".", $prefix, @suffixes);
+
+    # Because the parent of "foo" is ".", return $new_file_name to stay consistent.
+    # Otherwise, we'd end up with a current path ref prepended ("./$new_file_name")
+    # (Done also in path_with_new_file_name)
+    if ($parent eq '.' && $path !~ m|/|) {
+	return $new_file_name;
+    }
+
+    return path_push($parent, $new_file_name);
+}
+
+=head3 path_file_suffix($path)
+
+Returns the suffix of the C<$path>'s file name. If the C<$path> does not have a
+valid file name or if the file name has no suffix, C<undef> is returned
+instead.
+
+The suffix of a file name is essentially its extension, e.g.
+C</etc/resolv.conf> has the suffix C<conf>. If there are multiple suffixes,
+only the last will be returned; e.g. C</tmp/archive.tar.gz> has the suffix C<gz>.
+
+B<Note:> Files like e.g. C</tmp/foo.> have an empty string as suffix.
+
+For getting all suffixes of a path, see C<L<< path_file_suffixes()|/"path_file_suffixes($path)" >>>.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_suffix : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $suffix_str;
+    (undef, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    my @suffixes = _path_file_suffixes_from_str($suffix_str);
+
+    return pop(@suffixes);
+}
+
+=head3 path_with_file_suffix($path, $suffix)
+
+Returns C<$path> with a new C<$suffix>. This is similar to
+C<L<< path_with_file_suffixes()|/"path_with_file_suffixes($path, @suffixes)" >>>,
+except that only the last suffix of the file name is replaced.
+
+If the C<$path> does not have a file name, C<undef> is returned.
+
+    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "gz");
+    # /tmp/archive.tar.gz
+
+If the file name has no suffixes, the C<$suffix> is appended instead:
+
+    my $new_path = path_with_file_suffix("/etc/resolv", "conf");
+    # /etc/resolv.conf
+
+If C<$suffix> is C<undef>, the file name's (last) suffix will be removed (if
+there is one):
+
+    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", undef);
+    # /tmp/archive.tar
+
+    my $new_path = path_with_file_suffix("/etc/resolv", undef);
+    # /etc/resolv
+
+Note that an empty string is still a valid suffix (an "empty" file ending):
+
+    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "");
+    # /tmp/archive.tar.
+
+    my $new_path = path_with_file_suffix("/etc/resolv", "");
+    # /etc/resolv.
+
+Throws an exception if any of the arguments is C<undef>, or if C<$suffix>
+contains a path separator (C</>) or a C<.>.
+
+=cut
+
+sub path_with_file_suffix : prototype($$) {
+    my ($path, $suffix) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if (defined($suffix)) {
+	croak "\$suffix contains a path separator" if $suffix =~ m|/|;
+	croak "\$suffix contains a dot" if $suffix =~ m|\.|;
+    }
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $parent = path_parent($path);
+
+    # sanity check -- should not happen because we checked for file name,
+    # and the existence of a file name implies there's a parent
+    confess "parent of \$path is undef" if !defined($parent);
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    my @suffixes = _path_file_suffixes_from_str($suffix_str);
+
+    # Don't modify $path if there is no suffix to be removed
+    return $path if !scalar(@suffixes) && !defined($suffix);
+
+    pop(@suffixes);
+    push(@suffixes, $suffix) if defined($suffix);
+
+    # sanity check
+    confess "\$prefix is undef" if !defined($prefix);
+
+    my $new_file_name = join(".", $prefix, @suffixes);
+
+    # Because the parent of "foo" is ".", return $new_file_name to stay consistent.
+    # Otherwise, we'd end up with a current path ref prepended ("./$new_file_name")
+    # (Done also in path_with_new_file_name)
+    if ($parent eq '.' && $path !~ m|/|) {
+	return $new_file_name;
+    }
+
+    return path_push($parent, $new_file_name);
+}
+
+=head3 path_file_parts($path)
+
+Returns the parts that constitute the file name (prefix and suffixes) of a
+C<$path> as a list. If the C<$path> does not have a valid file name, an empty
+list is returned instead.
+
+These parts are split in such a way that allows them to be C<join>ed together,
+resulting in the original file name of the given C<$path> again.
+
+    my @file_parts = path_file_parts("/etc/pve/firewall/cluster.fw");
+    # ("cluster", "fw")
+    my $file_name = join(".", @file_parts);
+
+    my @file_parts = path_file_parts("/tmp/archive.tar.gz");
+    # ("archive", "tar", "gz")
+    my $file_name = join(".", @file_parts);
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_parts : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    if (!defined($file_name)) {
+	return ();
+    }
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    my @suffixes = _path_file_suffixes_from_str($suffix_str);
+
+    return ($prefix, @suffixes);
+}
+
+=head3 path_starts_with($path, $other_path)
+
+Checks whether a C<$path> starts with the components of C<$other_path>.
+
+    my $starts_with = path_starts_with("/etc/pve/firewall/cluster.fw", "/etc/pve");
+    # 1
+
+Throws an exception if any of the arguments is C<undef>.
+
+=cut
+
+sub path_starts_with : prototype($$) {
+    my ($path, $other_path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other_path if undef" if !defined($other_path);
+
+    # Nothing starts with nothing
+    return 1 if ($path eq '' && $other_path eq '');
+
+    # Nothing cannot start with something
+    # Something cannot start with nothing
+    return if ($path eq '' || $other_path eq '');
+
+    my @components = path_components($path);
+    my @other_components = path_components($other_path);
+
+    my @pairs = zip_shortest(\@components, \@other_components);
+
+    # for my ($comp, $other_comp) (@pairs) is experimental
+    for my $pair (@pairs) {
+	my ($comp, $other_comp) = $pair->@*;
+
+	if ($comp ne $other_comp) {
+	    return;
+	}
+    }
+
+    return 1;
+}
+
+=head3 path_ends_with($path, $other_path)
+
+Checks whether a C<$path> ends with the components of C<$other_path>.
+
+    my $ends_with = path_ends_with("/etc/pve/firewall/cluster.fw", "firewall/cluster.fw");
+    # 1
+
+Throws an exception if any of the arguments is C<undef>.
+
+=cut
+
+sub path_ends_with : prototype($$) {
+    my ($path, $other_path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other_path if undef" if !defined($other_path);
+
+    # Nothing ends with nothing
+    return 1 if ($path eq '' && $other_path eq '');
+
+    # Nothing cannot end with something
+    # Something cannot end with nothing
+    return if ($path eq '' || $other_path eq '');
+
+    my @components_rev = reverse(path_components($path));
+    my @other_components_rev = reverse(path_components($other_path));
+
+    my @pairs_rev = zip_shortest(\@components_rev, \@other_components_rev);
+
+    # for my ($comp, $other_comp) (@pairs_rev) is experimental
+    for my $pair (@pairs_rev) {
+	my ($comp, $other_comp) = $pair->@*;
+
+	if ($comp ne $other_comp) {
+	    return;
+	}
+    }
+
+    return 1;
+}
+
+=head3 path_equals($path, $other_path)
+
+Checks whether C<$path> equals C<$other_path>. The paths are compared
+by their components, meaning that it's not necessary to
+L<< normalize|/"path_normalize($path)" >> them beforehand.
+
+=cut
+
+sub path_equals : prototype($$) {
+    my ($path, $other_path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other_path if undef" if !defined($other_path);
+
+    # Nothing is nothing
+    return 1 if ($path eq '' && $other_path eq '');
+
+    # Nothing is not something
+    # Something is not nothing
+    return if ($path eq '' || $other_path eq '');
+
+    my @components = path_components($path);
+    my @other_components = path_components($other_path);
+
+    return if scalar(@components) != scalar(@other_components);
+
+    my @pairs = zip_longest(\@components, \@other_components);
+
+    # for my ($comp, $other_comp) (@pairs_rev) is experimental
+    for my $pair (@pairs) {
+	my ($comp, $other_comp) = $pair->@*;
+
+	return if !defined($comp) || !defined($other_comp);
+
+	if ($comp ne $other_comp) {
+	    return;
+	}
+    }
+
+    return 1;
+}
+
+1;
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  reply	other threads:[~2025-01-09 14:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-09 14:48 [pve-devel] [PATCH v3 pve-common 00/12] Introduce and Package PVE::Path & PVE::Filesystem Max Carrara
2025-01-09 14:48 ` Max Carrara [this message]
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 02/12] test: add directory for tests of PVE::Path module Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 03/12] test: add tests for path_is_absolute and path_is_relative of PVE::Path Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 04/12] test: add tests for path_components " Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 05/12] test: add tests for path_join " Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 06/12] test: add tests for path_push " Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 07/12] test: add tests for path_parent " Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 08/12] test: add tests for path_starts_with, path_ends_with, path_equals Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 09/12] test: add tests for file path operation functions of PVE::Path Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 10/12] test: add tests for path_normalize " Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 11/12] introduce PVE::Filesystem Max Carrara
2025-01-09 14:48 ` [pve-devel] [PATCH v3 pve-common 12/12] debian: introduce package libproxmox-fs-path-utils-perl Max Carrara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250109144818.430185-2-m.carrara@proxmox.com \
    --to=m.carrara@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal