public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory
@ 2023-09-05  9:37 Gabriel Goller
  2023-09-07 15:05 ` Thomas Lamprecht
  0 siblings, 1 reply; 5+ messages in thread
From: Gabriel Goller @ 2023-09-05  9:37 UTC (permalink / raw)
  To: pbs-devel

When walking through a datastore on a GC run, it can
happen that the snapshot is deleted, and then walked over.
For example:
- read dir entry for group
- walk entries (snapshots)
- snapshot X is removed/pruned
- walking reaches snapshot X, but ENOENT
Previously we bailed here, now we just ignore it.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index fe75d9b5..d135ad90 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -869,18 +869,24 @@ impl DataStore {
         let handle_entry_err = |err: walkdir::Error| {
             if let Some(inner) = err.io_error() {
                 if let Some(path) = err.path() {
-                    if inner.kind() == io::ErrorKind::PermissionDenied {
-                        // only allow to skip ext4 fsck directory, avoid GC if, for example,
-                        // a user got file permissions wrong on datastore rsync to new server
-                        if err.depth() > 1 || !path.ends_with("lost+found") {
-                            bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
+                    match inner.kind() {
+                        io::ErrorKind::PermissionDenied => {
+                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
+                            // a user got file permissions wrong on datastore rsync to new server
+                            if err.depth() > 1 || !path.ends_with("lost+found") {
+                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
+                            }
+                        }
+                        io::ErrorKind::NotFound => {
+                            // ignore vanished file
+                        }
+                        _ => {
+                            bail!(
+                                "unexpected error on datastore traversal: {} - {:?}",
+                                inner,
+                                path
+                            )
                         }
-                    } else {
-                        bail!(
-                            "unexpected error on datastore traversal: {} - {:?}",
-                            inner,
-                            path
-                        )
                     }
                 } else {
                     bail!("unexpected error on datastore traversal: {}", inner)
-- 
2.39.2





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory
  2023-09-05  9:37 [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory Gabriel Goller
@ 2023-09-07 15:05 ` Thomas Lamprecht
  2023-09-08  7:41   ` Gabriel Goller
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Lamprecht @ 2023-09-07 15:05 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Gabriel Goller

On 05/09/2023 11:37, Gabriel Goller wrote:
> When walking through a datastore on a GC run, it can
> happen that the snapshot is deleted, and then walked over.
> For example:
> - read dir entry for group
> - walk entries (snapshots)
> - snapshot X is removed/pruned
> - walking reaches snapshot X, but ENOENT
> Previously we bailed here, now we just ignore it.
> 

looks mostly fine, some style nits and a actual comment inline.

> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 28 +++++++++++++++++-----------
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index fe75d9b5..d135ad90 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -869,18 +869,24 @@ impl DataStore {
>          let handle_entry_err = |err: walkdir::Error| {
>              if let Some(inner) = err.io_error() {
>                  if let Some(path) = err.path() {
> -                    if inner.kind() == io::ErrorKind::PermissionDenied {
> -                        // only allow to skip ext4 fsck directory, avoid GC if, for example,
> -                        // a user got file permissions wrong on datastore rsync to new server
> -                        if err.depth() > 1 || !path.ends_with("lost+found") {
> -                            bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
> +                    match inner.kind() {
> +                        io::ErrorKind::PermissionDenied => {

that's some deep indentation level.. not a must, but maybe you find
some good/simple way to refactor some of this to make it a bit less
crowded here (if, then in a separate patch please) 

> +                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
> +                            // a user got file permissions wrong on datastore rsync to new server
> +                            if err.depth() > 1 || !path.ends_with("lost+found") {
> +                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
> +                            }
> +                        }
> +                        io::ErrorKind::NotFound => {
> +                            // ignore vanished file

would be still good to log that here, at least at debug level
if it can be noisy; but as there wasn't many that run into this
in the four years of PBS existing I'd guess a always visible
level is fine as long as the log message doesn't sounds scary. 

> +                        }
> +                        _ => {
> +                            bail!(
> +                                "unexpected error on datastore traversal: {} - {:?}",
> +                                inner,
> +                                path

as already mentioned once, please use captured variables directly
in the format strings for new additions or lines that you touch anyway.
E.g.:

bail!("unexpected error on datastore traversal: {inner} - {path:?}")

> +                            )
>                          }
> -                    } else {
> -                        bail!(
> -                            "unexpected error on datastore traversal: {} - {:?}",
> -                            inner,
> -                            path
> -                        )

same here

>                      }
>                  } else {
>                      bail!("unexpected error on datastore traversal: {}", inner)





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory
  2023-09-07 15:05 ` Thomas Lamprecht
@ 2023-09-08  7:41   ` Gabriel Goller
  2023-09-08  9:36     ` Thomas Lamprecht
  0 siblings, 1 reply; 5+ messages in thread
From: Gabriel Goller @ 2023-09-08  7:41 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion


On 9/7/23 17:05, Thomas Lamprecht wrote:
> On 05/09/2023 11:37, Gabriel Goller wrote:
>> When walking through a datastore on a GC run, it can
>> happen that the snapshot is deleted, and then walked over.
>> For example:
>> - read dir entry for group
>> - walk entries (snapshots)
>> - snapshot X is removed/pruned
>> - walking reaches snapshot X, but ENOENT
>> Previously we bailed here, now we just ignore it.
>>
> looks mostly fine, some style nits and a actual comment inline.
>
>> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
>> ---
>>   pbs-datastore/src/datastore.rs | 28 +++++++++++++++++-----------
>>   1 file changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index fe75d9b5..d135ad90 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -869,18 +869,24 @@ impl DataStore {
>>           let handle_entry_err = |err: walkdir::Error| {
>>               if let Some(inner) = err.io_error() {
>>                   if let Some(path) = err.path() {
>> -                    if inner.kind() == io::ErrorKind::PermissionDenied {
>> -                        // only allow to skip ext4 fsck directory, avoid GC if, for example,
>> -                        // a user got file permissions wrong on datastore rsync to new server
>> -                        if err.depth() > 1 || !path.ends_with("lost+found") {
>> -                            bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
>> +                    match inner.kind() {
>> +                        io::ErrorKind::PermissionDenied => {
> that's some deep indentation level.. not a must, but maybe you find
> some good/simple way to refactor some of this to make it a bit less
> crowded here (if, then in a separate patch please)
I could check if `err` is an `io:Error`, thus returning early. Then 
calling `.unwrap()`
to get the actual `io::Error` later on.
>> +                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
>> +                            // a user got file permissions wrong on datastore rsync to new server
>> +                            if err.depth() > 1 || !path.ends_with("lost+found") {
>> +                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
>> +                            }
>> +                        }
>> +                        io::ErrorKind::NotFound => {
>> +                            // ignore vanished file
> would be still good to log that here, at least at debug level
> if it can be noisy; but as there wasn't many that run into this
> in the four years of PBS existing I'd guess a always visible
> level is fine as long as the log message doesn't sounds scary.
How about a "ignoring/skipping vanished file: {path}" on the info log 
level?
>> +                        }
>> +                        _ => {
>> +                            bail!(
>> +                                "unexpected error on datastore traversal: {} - {:?}",
>> +                                inner,
>> +                                path
> as already mentioned once, please use captured variables directly
> in the format strings for new additions or lines that you touch anyway.
> E.g.:
>
> bail!("unexpected error on datastore traversal: {inner} - {path:?}")
>
>> +                            )
>>                           }
>> -                    } else {
>> -                        bail!(
>> -                            "unexpected error on datastore traversal: {} - {:?}",
>> -                            inner,
>> -                            path
>> -                        )
> same here
>
>>                       }
>>                   } else {
>>                       bail!("unexpected error on datastore traversal: {}", inner)




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory
  2023-09-08  7:41   ` Gabriel Goller
@ 2023-09-08  9:36     ` Thomas Lamprecht
  2023-09-08 12:56       ` Gabriel Goller
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Lamprecht @ 2023-09-08  9:36 UTC (permalink / raw)
  To: Gabriel Goller, Proxmox Backup Server development discussion

On 08/09/2023 09:41, Gabriel Goller wrote:
>> that's some deep indentation level.. not a must, but maybe you find
>> some good/simple way to refactor some of this to make it a bit less
>> crowded here (if, then in a separate patch please)
> I could check if `err` is an `io:Error`, thus returning early. Then calling `.unwrap()`
> to get the actual `io::Error` later on.

I re-checked the whole code and pushed a clean-up that reduces the
indentation level and also code lines. Can you please re-check on that
if to ensure I did not missed something by mistake?

Oh, and while checking this out I noticed something odd, i.e., that we
silence any error that isn't a io::Error one,  as of Walkdir::Error
rustdocs [0] that can only happen if there's a (symlink) loop, but
still. So I checked the history and it seems that I caused this way
back in 2020 by the original addition of the "ignore lost+found" EPERM
commit [1].
But just making that an error again now might be a breaking change,
depending on what effect symlink loops currently actually have on GC.
Can you test this please? If it does makes GC endlessly loop then we
can just introduce the error again, as it's now broken already anyway.
But if walkdir breaks the loop, or the like, then we might just make
that an log with warning level. As we do not enable WalkDir's opt-in
follow_symlinks it might not matter after all as the Loop error
shouldn't be triggered if symlinks aren't followed, but still good
to test and be certain.

[0]: https://rust.velas.com/walkdir/struct.Error.html#method.io_error
[1]: c3b090ac ("backup: list images: handle walkdir error, catch "lost+found"")

>>> +                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
>>> +                            // a user got file permissions wrong on datastore rsync to new server
>>> +                            if err.depth() > 1 || !path.ends_with("lost+found") {
>>> +                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
>>> +                            }
>>> +                        }
>>> +                        io::ErrorKind::NotFound => {
>>> +                            // ignore vanished file
>> would be still good to log that here, at least at debug level
>> if it can be noisy; but as there wasn't many that run into this
>> in the four years of PBS existing I'd guess a always visible
>> level is fine as long as the log message doesn't sounds scary.
> How about a "ignoring/skipping vanished file: {path}" on the info log level?

Yeah, something like that, but I'd use just one of "ignoring" or
"skipping"

Oh, and I would also mention in the commit message that backups that
are just being created (where a atomic rename from e.g. a tmpdir to
the actual one might also cause a ENOENT) are not problematic here,
as GC is handling those explicitly anyway.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory
  2023-09-08  9:36     ` Thomas Lamprecht
@ 2023-09-08 12:56       ` Gabriel Goller
  0 siblings, 0 replies; 5+ messages in thread
From: Gabriel Goller @ 2023-09-08 12:56 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion


On 9/8/23 11:36, Thomas Lamprecht wrote:
> On 08/09/2023 09:41, Gabriel Goller wrote:
>>> that's some deep indentation level.. not a must, but maybe you find
>>> some good/simple way to refactor some of this to make it a bit less
>>> crowded here (if, then in a separate patch please)
>> I could check if `err` is an `io:Error`, thus returning early. Then calling `.unwrap()`
>> to get the actual `io::Error` later on.
> I re-checked the whole code and pushed a clean-up that reduces the
> indentation level and also code lines. Can you please re-check on that
> if to ensure I did not missed something by mistake?
>
> Oh, and while checking this out I noticed something odd, i.e., that we
> silence any error that isn't a io::Error one,  as of Walkdir::Error
> rustdocs [0] that can only happen if there's a (symlink) loop, but
> still. So I checked the history and it seems that I caused this way
> back in 2020 by the original addition of the "ignore lost+found" EPERM
> commit [1].
> But just making that an error again now might be a breaking change,
> depending on what effect symlink loops currently actually have on GC.
> Can you test this please? If it does makes GC endlessly loop then we
> can just introduce the error again, as it's now broken already anyway.
> But if walkdir breaks the loop, or the like, then we might just make
> that an log with warning level. As we do not enable WalkDir's opt-in
> follow_symlinks it might not matter after all as the Loop error
> shouldn't be triggered if symlinks aren't followed, but still good
> to test and be certain.
>
> [0]: https://rust.velas.com/walkdir/struct.Error.html#method.io_error
> [1]: c3b090ac ("backup: list images: handle walkdir error, catch "lost+found"")

We don't follow symbolic links when traversing files.
`walkdir` has a builder function `follow_links()` with which you can
select if you want to follow symbolic links or not, the default is no [0].

[0]: 
https://docs.rs/walkdir/latest/walkdir/struct.WalkDir.html#method.follow_links 

>>>> +                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
>>>> +                            // a user got file permissions wrong on datastore rsync to new server
>>>> +                            if err.depth() > 1 || !path.ends_with("lost+found") {
>>>> +                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
>>>> +                            }
>>>> +                        }
>>>> +                        io::ErrorKind::NotFound => {
>>>> +                            // ignore vanished file
>>> would be still good to log that here, at least at debug level
>>> if it can be noisy; but as there wasn't many that run into this
>>> in the four years of PBS existing I'd guess a always visible
>>> level is fine as long as the log message doesn't sounds scary.
>> How about a "ignoring/skipping vanished file: {path}" on the info log level?
> Yeah, something like that, but I'd use just one of "ignoring" or
> "skipping"
>
> Oh, and I would also mention in the commit message that backups that
> are just being created (where a atomic rename from e.g. a tmpdir to
> the actual one might also cause a ENOENT) are not problematic here,
> as GC is handling those explicitly anyway.
Submitted a new patch.




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-09-08 12:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-05  9:37 [pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory Gabriel Goller
2023-09-07 15:05 ` Thomas Lamprecht
2023-09-08  7:41   ` Gabriel Goller
2023-09-08  9:36     ` Thomas Lamprecht
2023-09-08 12:56       ` Gabriel Goller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal