From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 1ED171FF16E
	for <inbox@lore.proxmox.com>; Mon,  3 Feb 2025 17:45:40 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 2B3D958D1;
	Mon,  3 Feb 2025 17:45:38 +0100 (CET)
Mime-Version: 1.0
Date: Mon, 03 Feb 2025 17:45:32 +0100
Message-Id: <D7IYNXZZS8NC.1QM8JMZ0GPFAN@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>, "Proxmox VE development discussion"
 <pve-devel@lists.proxmox.com>
From: "Max Carrara" <m.carrara@proxmox.com>
X-Mailer: aerc 0.18.2-0-ge037c095a049
References: <20250109144818.430185-1-m.carrara@proxmox.com>
 <20250109144818.430185-2-m.carrara@proxmox.com>
 <b1d89b2a-4fdc-4e2b-84e4-1c91abb6dbb5@proxmox.com>
 <D7IVLKXJY6X0.3GP8C2WBUTDFO@proxmox.com>
 <bac41f45-92cb-4b4a-ab49-4cedfdb2968a@proxmox.com>
In-Reply-To: <bac41f45-92cb-4b4a-ab49-4cedfdb2968a@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.066 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

On Mon Feb 3, 2025 at 4:42 PM CET, Fiona Ebner wrote:
> Am 03.02.25 um 15:21 schrieb Max Carrara:
> > On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote:
> >> Am 09.01.25 um 15:48 schrieb Max Carrara:
> >>
> >> Why this kind of behavior with absolute paths? Seems surprising to me.
> >> Wouldn't failing the call be better?
> > 
> > I decided to stick with Rust's behaviour [pathbuf] here and in most
> > other places, simply in order to keep the behaviour consistent.
> > 
> > I'm not *really* a fan of it either, but I think it's better than
> > failing the call -- IMO having to wrap path_join / path_push into an
> > eval-block would be quite awkward for what's a rather "small" operation
> > :s
> > 
> > Perhaps I should provide some additional context here as well so as to
> > make my "design philosophy" more clear: The reason why I modeled
> > PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because:
> > 
> >   1. Most (newer) developers will be much more likely to be familiar
> >      with Rust instead of Perl, so if they should see this library,
> >      they'll have a much easier time using it (or at least that's what
> >      I *hope* would happen). Sort of as in: "Oh, this is just like the
> >      path stuff in Rust, except for a few Perl-esque exceptions!"
> > 
> >   2. I wanted the library to be essentially exception-free, except where
> >      obvious invariants aren't upheld by the caller (e.g. passing undef).
> >      Unless you match the error message with a regex, there's no
> >      structurally typed way to differ between errors in Perl, so I'd
> >      like the "types" of errors to be mostly the same wherever possible.
> > 
> >   3. I didn't want to reinvent the wheel and instead stick to what other
> >      popular libraries do as much as possible, while also avoiding any
> >      additional abstractions (e.g. introducing an object for paths).
> > 
> > With all that being said, since that behaviour is surprising to you, I
> > have no quarrels changing it! :P After all I'd like to have something
> > that doesn't cause any friction when using it.
> > 
> > [pathbuf]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push
> > 
>
> You don't need to change it if there was good rationale behind it, which
> you provided :) A short comment in the function documentation regarding
> the inspiration would still be nice. Puts the responsibility on the
> caller to not accidentally use an absolute path though. Maybe we can at
> least warn instead? I suspect almost all calls with an absolute path in
> the middle will be accidental.

Sure, that sounds fine by me! :)

>
> >>
> >>> +
> >>> +    my $joined = path_join("foo/bar", "/etc", "resolv.conf");
> >>> +    # /etc/resolv.conf
> >>> +
> >>> +    my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts");
> >>> +    # /etc/hosts
> >>> +
> >>> +Throws an exception if any of the passed C<@paths> is C<undef>.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_join : prototype(@) {
> >>> +    my (@paths) = @_;
> >>> +
> >>> +    if (!scalar(@paths)) {
> >>> +	return '';
> >>> +    }
> >>> +
> >>> +    croak "one of the provided paths is undef" if any { !defined($_) } @paths;
> >>> +
> >>
> >> I think the rest could be written more efficiently like (untested):
> >>
> >> my $resulting_path = shift @paths;
> >> for my $path (@paths) {
> >>   if ($path =~ m#^/#) {
> >>     $resulting_path = $path;
> >>   } else {
> >>     $resulting_path = path_push($resulting_path, $path);
> >>   }
> >> }
> > 
> > Note that I'm doing a reverse iteration below; I think doing it either
> > way should be fine, because I doubt we'll ever need to join that many
> > path components where it's actually going to make a difference ;P
> > 
>
> I also meant "efficiently" in terms of lines/readability ;) Shorter,
> less loops and avoids index arithmetic.

Oooh, that makes sense then! Sure, I'll give it a try :)

>
> >>
> >>> +    # Find the last occurrence of a root directory and start conjoining the
> >>> +    # components from there onwards
> >>> +    my $index = scalar(@paths) - 1;
> >>> +    while ($index > 0) {
> >>> +	last if $paths[$index] =~ m#^/#;
> >>> +	$index--;
> >>> +    }
> >>> +
> >>> +    @paths = @paths[$index .. (scalar(@paths) - 1)];
> >>> +
> >>> +    my $resulting_path = shift @paths;
> >>> +
> >>> +    for my $path (@paths) {
> >>> +	$resulting_path = path_push($resulting_path, $path);
> >>> +    }
> >>> +
> >>> +    return $resulting_path;
> >>> +}
> >>> +
> >>> +=head3 path_normalize($path)
> >>> +
> >>> +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the given
> >>> +C<$path>.
> >>> +
> >>> +This removes unnecessary components of a path that can be safely
> >>> +removed, such as C<.>, trailing C</> or repeated occurrences of C</>.
> >>> +
> >>> +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
> >>> +C<foo/bar/baz>.
> >>> +
> >>> +B<Note:> This will I<not> remove components referencing the parent directory,
> >>> +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will therefore
> >>> +remain as they are. However, the parent directory of C</> is C</>, so
> >>> +C</../../foo> will be normalized to C</foo>.
> >>> +
> >>> +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> failed.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_normalize : prototype($) {
> >>> +    my ($path) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +
> >>> +    my $cleaned_path = eval {
> >>> +	File::Spec->canonpath($path);
> >>> +    };
> >>> +
> >>
> >> Style nit: blank lines between eval and using $@ are better avoided
> >> IMHO. I'd also have the eval expression be a single line, but no strong
> >> feelings.
> > 
> > Could you perhaps elaborate why, please?
> > 
>
> Because new code gets added more easily in between. I've seen that
> happen before.

Ah, okay! Sure, then I'll change it. At first I feared that there was
some (--> yet another) obscure Perl edge case I didn't know about. :P

>
> ---snip 8<---
>
> >>> +Throws an exception if any of the arguments is C<undef>.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_push : prototype($$) {
> >>> +    my ($path, $other) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +    croak "\$other is undef" if !defined($other);
> >>> +
> >>> +    return $path if $other eq '';
> >>> +    return $other if path_is_absolute($other);
> >>> +
> >>> +    my $need_sep = $path ne '' && $path !~ m#/$#;
> >>> +
> >>> +    $path .= "/" if $need_sep;
> >>> +    $path .= $other;
> >>> +
> >>> +    return $path;
> >>> +}
> >>> +
> >>> +=head3 path_pop($path)
> >>> +
> >>> +Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
> >>> +
> >>
> >> Why have this alias? The name suggests it would behave like an inverse
> >> to path_push(), but it does not, because of the normalization of
> >> path_parent(). So I'd rather not have it or have it be a non-normalizing
> >> version (but maybe not worth it) to avoid surprises.
> > 
> > Could you please elaborate what you mean regarding the normalization of
> > path_parent()?
>
> I just mean push "." followed by pop is not the same as you start with.
>
> > 
> > path_parent() (or path_pop()) will only normalize as much as it needs to
> > in order to remove the last path component and get to the one that
> > precedes that:
> > 
> >   my $p = "foo/././bar/././baz/./.";
> > 
> >   $p = PVE::Path::path_parent($p):  # or path_pop()
> >   # $p is now "foo/././bar"
> > 
> > path_push() does actually behave like an inverse if you include the
> > redundant stuff:
> > 
> >   $p = PVE::Path::path_push($p, "././baz/./.")
> >   # $p is now "foo/././bar/././baz/./." again
> > 
> > It's not a complete inverse, that's true (as in, it's not bijective ;P),
> > because you can't replicate the example above with a path like
> > "foo///bar///baz" since you can't push "///baz" onto "foo///bar"
> > ("///baz" is treated as an absolute path).
> > 
> > I think a non-normalizing version wouldn't really be that nice to use;
> > one would have to constantly remember to call path_normalize()
> > beforehand to actually get the desired result (i.e. getting the parent
> > directory).
>
> Agreed.
>
> > 
> > I can just remove the alias if you prefer; it's not equivalent to Rust's
> > PathBuf::pop() [pop] and if it's a source of confusion, we can probably
> > do without.
> > 
> > [pop]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.pop
> > 
>
> I see where the inspiration comes from then :) I get why Rust has both,
> because one is for Path and the other for PathBuf. And since they are
> essentially the same operation, fine by me to keep this alias.
>
> >>
> >>> +=cut
> >>> +
> >>> +sub path_pop : prototype($) {
> >>> +    my ($path) = @_;
> >>> +    return path_parent($path);
> >>> +}
> >>> +
> >>> +=head3 path_file_name($path)
> >>> +
> >>> +Returns the last component of the given C<$path>, if it is a legal file name,
> >>> +or C<undef> otherwise.
> >>> +
> >>> +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
> >>> +there is no valid file name.
> >>> +
> >>> +B<Note:> This does not check whether the given C<$path> actually points to a
> >>> +file or a directory etc.
> >>> +
> >>> +Throws an exception if C<$path> is C<undef>.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_file_name : prototype($) {
> >>> +    my ($path) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +
> >>> +    my @components = path_components($path);
> >>> +
> >>> +    if (!scalar(@components)) {
> >>> +	return;
> >>> +    }
> >>> +
> >>> +    if (
> >>> +	scalar(@components) == 1
> >>> +	&& ($components[0] eq '/' || $components[0] eq '.')
> >>> +    ) {
> >>
> >> Style nit: this conditional fits well into 100 characters, so can be one
> >> line instead of four
> > 
> > Fair point, but I prefer to have it a little more readable here, and
> > it's still in line with our style guide. :P
>
> Personally, I find this version "more noisy" to read, because of the
> split, but no hard feelings :)
>
> ---snip 8<---
>
> >>> +=cut
> >>> +
> >>> +sub path_file_prefix : prototype($) {
> >>> +    my ($path) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +
> >>> +    my $file_name = path_file_name($path);
> >>> +    return undef if !defined($file_name);
> >>> +
> >>> +    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
> >>> +    return $prefix;
> >>> +}
> >>> +
> >>> +=head3 path_with_file_prefix($path, $prefix)
> >>> +
> >>
> >> Hmm, looking at this and path_with_file_suffix{,es}(), would it maybe be
> >> nicer to have a single
> >> path_with_file_name_from_parts($path, $prefix, @suffixes)
> >> function instead of these? Would seem more natural/straightforward to
> >> me. The implementations are rather involved IMHO compared to how useful
> >> the functions are. It's very easy to get the behavior for the (I suspect
> >> rather uncommon) case of where you want to replace only prefix or
> >> suffix(es) without already knowing the other. Just need to call
> >> path_file_parts() first. Or did you take inspiration from somewhere else
> >> for these?
> > 
> > I'll have to disagree here precisely because having to call
> > path_file_parts() first is IMO tedious from the caller's perspective. In
> > many cases I rarely care about all of the individual parts; instead I
> > just e.g. want to rename a file while keeping the suffix(es), or I want to
> > trim off the last suffix of something like "archive.tar.gz" because I'm
> > decompressing it to "archive.tar" or similar.
> > 
> > In other words, in a lot of cases I just want to perform one individual
> > operation, and the path_file_* and path_with_file_* functions represent
> > all those individual operations as concisely as possible. The burden of
> > "figuring out what to do" shouldn't be given to the caller in such
> > cases.
> > 
> > Regarding inspiration: These functions were inspired after having read a
> > related Rust GitHub issue [issue] where a lot of this was discussed as
> > well.
> > 
> > That issue also suggests having some kind of iterator for file parts,
> > that other methods of std::path::Path can be based on; this is actually
> > what led me to adding path_file_parts() below. You have access to the
> > individual parts there and can just join() them to the file name. That
> > would be very similar to the path_with_file_name_from_parts() function
> > you suggested. :P
> > 
> > [issue]: https://github.com/rust-lang/rust/issues/86319
> > 
>
> Okay, thank you for elaborating! Maybe such calls are more common than I
> think.

You're welcome!

Also thanks for the thorough review, it's much appreciated :) I'll see
if I can send out a v4 soon enough.



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel