public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Max Carrara" <m.carrara@proxmox.com>
To: "Christian Ebner" <c.ebner@proxmox.com>,
	"Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH v8 proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
Date: Wed, 09 Apr 2025 12:37:13 +0200	[thread overview]
Message-ID: <D921JCX67RXY.W8KKT9VP73W9@proxmox.com> (raw)
In-Reply-To: <7c458efa-ce3b-4b6a-946c-f87098a08f2b@proxmox.com>

On Sat Apr 5, 2025 at 11:16 AM CEST, Christian Ebner wrote:
> On 4/4/25 17:41, Max Carrara wrote:
> > Regarding my comment in the cover letter: The call to
> > `self.cond_touch_path` above will fail on a directory-based datastore
> > backed by ext4 mounted with -o rw,relatime:
> > 
> >      2025-04-04T16:25:44+02:00: TASK ERROR: atime safety check failed: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
> > 
> >  From `man 2 utimensat` (formatted for your reading pleasure):
> > 
> > EPERM
> >      The caller attempted to change one or both timestamps to a value
> >      other than the current time, or to change one of the timestamps to
> >      the current time while leaving the other timestamp unchanged, (i.e.,
> >      times  is  not NULL, neither tv_nsec field is UTIME_NOW, and neither
> >      tv_nsec field is UTIME_OMIT) and either:
> > 
> >      •  the caller's effective user ID does not match the owner of file,
> >         and the caller is not privileged (Linux: does not have the
> >         CAP_FOWNER capability); or,
> > 
> >      •  the file is marked append-only or immutable (see chattr(1)).
> > 
> > Sooo, I investigated a little more, since the EPERM here initially
> > didn't make any sense to me, because the same atime check actually
> > passes when creating the datastore.
> > 
> > Furthermore, the file (chunk) used for this check isn't append-only or
> > immutable (otherwise you'd see an `a` or `i` down there):
> > 
> >      $ lsattr .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> >      --------------e------- .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> > 
> > 
> > And finally, just for completeness' sake, we're using both UTIME_NOW and
> > UTIME_OMIT in `cond_touch_path`:
> > 
> >      let times: [libc::timespec; 2] = [
> >          // access time -> update to now
> >          libc::timespec {
> >              tv_sec: 0,
> >              tv_nsec: libc::UTIME_NOW,
> >          },
> >          // modification time -> keep as is
> >          libc::timespec {
> >              tv_sec: 0,
> >              tv_nsec: libc::UTIME_OMIT,
> >          },
> >      ];
> > 
> > 
> > According to the docs of `man 2 utimensat`, this would suggest that
> > we're not running under the expected effective UID and don't have
> > the CAP_FOWNER capability.
> > 
> > This is the actual crux of the issue, since the chunk is owned by
> > `root`:
> > 
> >      $ ls -alh .chunks/bb9f
> >      total 1.1M
> >      drwxr-x--- 2 backup backup 4.0K Apr  4 17:11 .
> >      drwxr-x--- 1 backup backup 1.1M Apr  4 17:11 ..
> >      -rw-r--r-- 1 root   root    159 Apr  4 17:11 bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> > 
> > 
> > Just to add some additional findings -- when the *check is turned off*,
> > GC on that single chunk (no backups made yet!) runs just fine:
> > 
> >      2025-04-04T17:00:52+02:00: starting garbage collection on store test-noatime
> >      2025-04-04T17:00:52+02:00: Access time update check disabled by datastore tuning options.
> >      2025-04-04T17:00:52+02:00: Using access time cutoff 1d 5m, minimum access time is 2025-04-03T14:55:52Z
> >      2025-04-04T17:00:52+02:00: Start GC phase1 (mark used chunks)
> >      2025-04-04T17:00:52+02:00: Start GC phase2 (sweep unused chunks)
> >      2025-04-04T17:00:52+02:00: processed 73% (0 chunks)
> >      2025-04-04T17:00:52+02:00: Removed garbage: 0 B
> >      2025-04-04T17:00:52+02:00: Removed chunks: 0
> >      2025-04-04T17:00:52+02:00: Pending removals: 159 B (in 1 chunks)
> >      2025-04-04T17:00:52+02:00: Original data usage: 0 B
> >      2025-04-04T17:00:52+02:00: On-Disk chunks: 0
> >      2025-04-04T17:00:52+02:00: Deduplication factor: 1.00
> >      2025-04-04T17:00:52+02:00: queued notification (id=4bab0755-3f24-4b1a-99b0-de8e9613db9b)
> >      2025-04-04T17:00:52+02:00: TASK OK
> > 
> > *BUT*, when creating a backup via PVE, the check actually runs, despite
> > being turned off -- this appears to be something separate that you might
> > want to check as well:
>
> That is expected if the chunk has the wrong ownership, but not because 
> the fs_atime_check is executed here. Rather this happens because your VM 
> backup includes an all zero chunk, does not have it as known chunk 
> however (since the is no previous index file). This means the chunk will 
> be uploaded to the PBS, but the chunk store detects it as already 
> present. In that case the chunks atime will be updated too, but that 
> then fails because of the permission issue. See the corresponding code 
> paths here:
>   https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-datastore/src/chunk_store.rs;h=dc267d752980a988d22e6d410855d2ceb3c8db46;hb=HEAD#l460

Ah yes, of course! I don't know why I didn't think of that. Thanks for
the clarification!

>
> > 
> >      INFO: starting new backup job: vzdump 100 --mode snapshot --notes-template '{{guestname}}' --remove 0 --node roxy --notification-mode auto --storage pbs-dev-test-atime
> >      INFO: Starting Backup of VM 100 (qemu)
> >      INFO: Backup started at 2025-04-04 17:02:49
> >      INFO: status = running
> >      INFO: VM Name: dns
> >      INFO: include disk 'scsi0' 'local-zfs-main:vm-100-disk-0' 16G
> >      INFO: backup mode: snapshot
> >      INFO: ionice priority: 7
> >      INFO: snapshots found (not included into backup)
> >      INFO: creating Proxmox Backup Server archive 'vm/100/2025-04-04T15:02:49Z'
> >      INFO: issuing guest-agent 'fs-freeze' command
> >      INFO: issuing guest-agent 'fs-thaw' command
> >      ERROR: VM 100 qmp command 'backup' failed - backup register image failed: command error: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
> >      INFO: aborting backup job
> >      INFO: resuming VM again
> >      ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - backup register image failed: command error: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
> >      INFO: Failed at 2025-04-04 17:02:49
> >      INFO: Backup job finished with errors
> >      TASK ERROR: job errors
> > 
> > Just to be sure:
> > 
> >      $ cat /etc/proxmox-backup/datastore.cfg
> >      # [...]
> >      datastore: test-noatime
> >              comment
> >              gc-schedule daily
> >              notification-mode notification-system
> >              path /mnt/datastore/test-noatime
> >              tuning gc-atime-safety-check=0
> > 
> > 
> > 
> > So, fixing the permissions with:
> > 
> >      # chown backup: .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> > 
> > .. allows the check to pass:
> > 
> >      2025-04-04T17:14:34+02:00: starting garbage collection on store test-noatime
> >      2025-04-04T17:14:35+02:00: Access time update check successful, proceeding with GC.
> >      2025-04-04T17:14:35+02:00: Using access time cutoff 1d 5m, minimum access time is 2025-04-03T15:09:34Z
> >      2025-04-04T17:14:35+02:00: Start GC phase1 (mark used chunks)
> >      2025-04-04T17:14:35+02:00: Start GC phase2 (sweep unused chunks)
> >      2025-04-04T17:14:35+02:00: processed 73% (0 chunks)
> >      2025-04-04T17:14:35+02:00: Removed garbage: 0 B
> >      2025-04-04T17:14:35+02:00: Removed chunks: 0
> >      2025-04-04T17:14:35+02:00: Original data usage: 0 B
> >      2025-04-04T17:14:35+02:00: On-Disk chunks: 1
> >      2025-04-04T17:14:35+02:00: Deduplication factor: 0.00
> >      2025-04-04T17:14:35+02:00: Average chunk size: 159 B
> >      2025-04-04T17:14:35+02:00: queued notification (id=06d76349-26af-43da-9ddb-5b0ee2fabfa8)
> >      2025-04-04T17:14:35+02:00: TASK OK
> > 
> > So, guess you just have to change the UID and GID when creating the
> > test chunk ;P
>
> Yes, in v9 of the series, the chunk files ownership will be set if an 
> `root`s UID is detected as effective UID. This fixes the issue here and 
> makes the chunk insertion more robust in general IMHO.
>
> Thanks again for catching this early! If you manage to give it another 
> spin would be great.

Yeah I saw that--and it's already applied, too. So, guess another spin's
not needed ;)



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

  reply	other threads:[~2025-04-09 10:37 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-21  8:02 [pbs-devel] [PATCH v8 proxmox proxmox-backup 0/9] fix #5982: check atime update is honored Christian Ebner
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox 1/9] pbs api types: add garbage collection atime safety check flag Christian Ebner
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox 2/9] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 3/9] garbage collection: format error including anyhow error context Christian Ebner
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored Christian Ebner
2025-04-04 15:41   ` Max Carrara
2025-04-05  9:16     ` Christian Ebner
2025-04-09 10:37       ` Max Carrara [this message]
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 5/9] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
2025-04-04 15:41   ` Max Carrara
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 6/9] docs: mention GC atime update check for " Christian Ebner
2025-04-04 15:41   ` Max Carrara
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 7/9] datastore: use custom GC atime cutoff if set Christian Ebner
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 8/9] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
2025-03-21  8:02 ` [pbs-devel] [PATCH v8 proxmox-backup 9/9] docs: mention gc-atime-cutoff as " Christian Ebner
2025-04-04 15:41 ` [pbs-devel] [PATCH v8 proxmox proxmox-backup 0/9] fix #5982: check atime update is honored Max Carrara
2025-04-04 15:56   ` Max Carrara
2025-04-04 15:56   ` Christian Ebner
2025-04-04 16:10     ` Max Carrara
2025-04-05  9:06 ` [pbs-devel] superseded: " Christian Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D921JCX67RXY.W8KKT9VP73W9@proxmox.com \
    --to=m.carrara@proxmox.com \
    --cc=c.ebner@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal