public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Max Carrara" <m.carrara@proxmox.com>
To: "Wolfgang Bumiller" <w.bumiller@proxmox.com>
Cc: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH v2 pve-common 01/12] introduce PVE::Path
Date: Thu, 09 Jan 2025 13:56:15 +0100	[thread overview]
Message-ID: <D6XK4S035SMX.29YOUT77ZCCA0@proxmox.com> (raw)
In-Reply-To: <ipkh37upodgful7xzaschqfyupvy4yb454vljll7v2ab6ury62@u2nksoadgcc2>

On Thu Jan 9, 2025 at 12:06 PM CET, Wolfgang Bumiller wrote:
> On Thu, Jan 09, 2025 at 10:56:16AM +0100, Max Carrara wrote:
> > On Wed Jan 8, 2025 at 3:05 PM CET, Wolfgang Bumiller wrote:
> > > On Fri, Dec 20, 2024 at 07:51:56PM +0100, Max Carrara wrote:
> > > > +my sub _path_file_suffixes : prototype($) {
> > > > +    my ($file_name_no_prefix) = @_;
> > > > +
> > > > +    confess "\$file_name_no_prefix is undef" if !defined($file_name_no_prefix);
> > > > +
> > > > +    # Suffixes are extracted "manually" because join()ing the result of split()
> > > > +    # results in a different file name than the original. Let's say you have a
> > > > +    # file named "foo.bar.". The correct suffixes would be ("bar", "").
> > > > +    # With split, you get the following:
> > > > +    #     split(/\./, ".bar.")     --> ("", "bar")     --> join()ed to "foo..bar"
> > > > +    #     split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to "foo..bar."
> > > > +    my @suffixes = ();
> > > > +    while ($file_name_no_prefix =~ s|^(\.[^\.]*)||) {
> > > > +	my $suffix = $1;
> > > > +	$suffix =~ s|^\.||;
> > > > +	push(@suffixes, $suffix);
> > > > +    }
> > > > +
> > > > +    return @suffixes;
> > > > +}
> > > > +
> > > > +=head3 path_file_suffixes($path)
> > > > +
> > > > +Returns the suffixes of the C<$path>'s file name as a list. If the C<$path> does
> > > > +not have a valid file name, an empty list is returned instead.
> > > > +
> > > > +In scalar context, returns a reference to a list.
> > >
> > > (^ Isn't this a bit awkward?)
> > 
> > Well, do you think it is? I've been using that kind of "return style"
> > here and there and have been liking it, but if it's weird to others, I
> > can adapt it. :P
>
> If you remove the distinction and always return the list, the scalar
> context would give you the final element, which would be "the extension"
> ('gz', not 'tar.gz' :P), which would also be a nice shortcut for what
> I'd anticipate to be one of the most common(?) uses.

Sure, that's fine by me!

Though, I still wanna keep path_file_suffix() around, as that returns
only the last suffix (the extension), as that's a bit more explicit.

There are a few other functions that return a listref in scalar context
though; I'd prefer to adapt those too then, for consistency's sake and
all that. (Since this is a fresh library, I want it to have no surprises
about its usage ;P )

I'll have to adapt the tests a little for this, but otherwise this
shouldn't be much of a hassle to change.

>
> > 
> > >
> > > > +
> > > > +The suffixes of a path are essentially the file name's extensions, the parts
> > > > +that come after the L<< prefix|/"path_file_prefix($path)" >>.
> > > > +
> > > > +    my @suffixes = path_file_suffixes("/etc/resolv.conf");
> > > > +    # ("conf")
> > > > +
> > > > +    my $suffixes = path_file_prefix("/tmp/archive.tar.zst");
> > > > +    # ["tar", "zst"]
> > > > +
> > > > +Throws an exception if C<$path> is C<undef>.
> > > > +
> > > > +=cut
> > > > +
> > > > +sub path_file_suffixes : prototype($) {
> > > > +    my ($path) = @_;
> > > > +
> > > > +    croak "\$path is undef" if !defined($path);
> > > > +
> > > > +    my $file_name = path_file_name($path);
> > > > +    if (!defined($file_name)) {
> > > > +	return wantarray ? () : [];
> > > > +    }
> > > > +
> > > > +    (undef, $file_name) = _path_file_prefix($file_name);
> > > > +
> > > > +    my @suffixes = _path_file_suffixes($file_name);
> > > > +
> > > > +    return wantarray ? @suffixes : \@suffixes;
> > > > +}
> > > > +
> > > > +=head3 path_with_file_suffixes($path, @suffixes)
> > > > +
> > > > +Returns C<$path> with new C<@suffixes>. This is similar to
> > > > +C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
> > > > +except that the suffixes of the file name are replaced.
> > > > +
> > > > +If the C<$path> does not have a file name, C<undef> is returned.
> > > > +
> > > > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "pxar", "gz");
> > > > +    # /tmp/archive.pxar.gz
> > > > +
> > > > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "gz");
> > > > +    # /tmp/archive.gz
> > > > +
> > > > +If the file name has no suffixes, the C<@suffixes> are appended instead:
> > > > +
> > > > +    my $new_path = path_with_file_suffixes("/etc/resolv", "conf");
> > > > +    # /etc/resolv.conf
> > > > +
> > > > +    my $new_path = path_with_file_suffixes("/etc/resolv", "conf", "zst");
> > > > +    # /etc/resolv.conf.zst
> > > > +
> > > > +If there are no C<@suffixes> provided, the file name's suffixes will
> > > > +be removed (if there are any):
> > > > +
> > > > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst");
> > > > +    # /tmp/archive
> > > > +
> > > > +Note that an empty string is still a valid suffix (an "empty" file ending):
> > > > +
> > > > +    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "", "", "", "zst");
> > > > +    # /tmp/archive....zst
> > > > +
> > > > +Throws an exception if C<$path> or any of the C<@suffixes> is C<undef>, or
> > > > +if any suffix contains a path separator (C</>) or a C<.>.
> > > > +
> > > > +=cut
> > > > +
> > > > +sub path_with_file_suffixes : prototype($@) {
> > > > +    my ($path, @suffixes) = @_;
> > >
> > > I am questioning a bit the sanity of having "suffixes" throughout this
> > > module instead of simply an "extension" that covers them all and can be
> > > split if needed...
> > >
> > > Do we have/anticipate particular use cases where this is more
> > > convenient?
> > 
> > To be really honest, it was 50/50 on the "extension" vs "suffixes"
> > decision. I then opted for suffixes instead, because they seemed to be
> > less ambiguous than an "extension". Let me elaborate.
> > 
> > Let's say we have a file called "foo.tar.gz" -- what would its extension
> > be? Some might say that ".tar.gz" is its extension, while some others
> > might say ".gz"; both have different reasonings but aren't necessarily
> > more or less correct than the other.
> > 
> > In our case, both you and I would say that ".tar.gz" is the extension,
> > but...
> > 
> > Rust's std::path::Path::extension [1] would return "gz" for the above,
> > while C++'s std::filesystem::path::extension [2] returns ".gz" (but they
> > don't have such a case in their docs!). Java has java.nio.file.Path, but
> > conveniently doesn't care about extensions. Kotlin fixest his by adding
> > an "extension" property to that class, which returns "gz" [3].
> > 
> > Python's pathlib is the only one I've seen go the "suffix route" and
> > provides the pathlib.PurePath.suffix and .suffixes methods [4]. This
> > type of approach also came up on the (Rust) tracking issue for
> > Path::file_prefix [5].
> > 
> > When I was looking up all that, I decided to go for the prefix +
> > suffix(es) route, as that just seemed to be the most unambiguous.
> > Convenience wasn't really a factor here, because there's the
> > "path_with_file_* family" of functions that should handle most of the
> > replacement cases.
> > 
> > (Besides, I found it quite nice that I could join() on the prefix +
> > suffixes (e.g. join(".", "foo", "tar", "gz")) that I first extracted
> > with path_file_prefix() and path_file_suffixes(); I like that there's an
> > "inverse" operation.)
> > 
> > *But,* I wouldn't be opposed to adding a function that just returns
> > "tar.gz" for the above case. Perhaps with a different name though :P
> > 
> > [1] https://doc.rust-lang.org/std/path/struct.Path.html#method.extension
> > [2] https://en.cppreference.com/w/cpp/filesystem/path/extension
> > [3] https://github.com/JetBrains/kotlin/blob/rrr/2.1.0/core-docs/libraries/stdlib/jdk7/src/kotlin/io/path/PathUtils.kt#L46
> > [4] https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffix
> > [5] https://github.com/rust-lang/rust/issues/86319
>
> TBH I wouldn't care if it returned "tar.gz", ".tar.gz", "gz" or ".gz"
> as long as it's documented ;-).
>
> But yeah, I guess suffixes makes sense - "forces"(🤪) you to not ignore
> it in your code...

ACK -- will keep it that way then.



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

  reply	other threads:[~2025-01-09 12:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-20 18:51 [pve-devel] [PATCH v2 pve-common 00/12] Introduce and Package PVE::Path & PVE::Filesystem Max Carrara
2024-12-20 18:51 ` [pve-devel] [PATCH v2 pve-common 01/12] introduce PVE::Path Max Carrara
2025-01-08 14:05   ` Wolfgang Bumiller
2025-01-09  9:56     ` Max Carrara
2025-01-09 11:06       ` Wolfgang Bumiller
2025-01-09 12:56         ` Max Carrara [this message]
2024-12-20 18:51 ` [pve-devel] [PATCH v2 pve-common 02/12] test: add directory for tests of PVE::Path module Max Carrara
2024-12-20 18:51 ` [pve-devel] [PATCH v2 pve-common 03/12] test: add tests for path_is_absolute and path_is_relative of PVE::Path Max Carrara
2024-12-20 18:51 ` [pve-devel] [PATCH v2 pve-common 04/12] test: add tests for path_components " Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 05/12] test: add tests for path_join " Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 06/12] test: add tests for path_push " Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 07/12] test: add tests for path_parent " Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 08/12] test: add tests for path_starts_with, path_ends_with, path_equals Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 09/12] test: add tests for file path ops functions of PVE::Path Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 10/12] test: add tests for path_normalize " Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 11/12] introduce PVE::Filesystem Max Carrara
2024-12-20 18:52 ` [pve-devel] [PATCH v2 pve-common 12/12] debian: introduce package libproxmox-fs-path-utils-perl Max Carrara
2025-01-02 13:46 ` [pve-devel] [PATCH v2 pve-common 00/12] Introduce and Package PVE::Path & PVE::Filesystem Fiona Ebner
2025-01-02 13:53   ` Fiona Ebner
2025-01-02 15:54     ` Max Carrara
2025-01-03  9:49       ` Fiona Ebner
2025-01-03 10:41         ` Thomas Lamprecht
2025-01-03 12:37           ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D6XK4S035SMX.29YOUT77ZCCA0@proxmox.com \
    --to=m.carrara@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    --cc=w.bumiller@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal