From: "Max Carrara" <m.carrara@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH v8 proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
Date: Fri, 04 Apr 2025 17:41:40 +0200	[thread overview]
Message-ID: <D8XYVQETUYDL.NA06AOPJ6WDK@proxmox.com> (raw)
In-Reply-To: <20250321080254.68931-5-c.ebner@proxmox.com>

On Fri Mar 21, 2025 at 9:02 AM CET, Christian Ebner wrote:
> Check if the filesystem backing the chunk store actually updates the
> atime to avoid potential data loss in phase 2 of garbage collection,
> in case the atime update is not honored.
>
> Perform the check before phase 1 of garbage collection, as well as
> on datastore creation. The latter detects the issue early and
> disallows datastore creation on filesystem configurations which would
> otherwise most likely lead to data loss. To also perform the check
> when reusing an existing datastore, the chunk store is now opened on
> reuse as well.
>
> Enable the atime update check by default, but allow opting out via a
> datastore tuning parameter flag for backwards compatibility. This is
> honored by both garbage collection and datastore creation.
>
> The check uses a 4 MiB fixed-size, unencrypted and compressed chunk
> as test marker, inserted if not present. This all-zero chunk is very
> likely present anyway for unencrypted backup contents with large
> all-zero regions using fixed-size chunking (e.g. VMs).
>
> To avoid cases where the timestamp will not be updated because of the
> Linux kernel's timestamp granularity, sleep for 1 second between the
> chunk insert (including an atime update if pre-existing) and the
> subsequent stat + utimensat.
>
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 7:
> - rebased on current master
> - removed unused import
>
>  pbs-datastore/src/chunk_store.rs | 72 ++++++++++++++++++++++++++++++--
>  pbs-datastore/src/datastore.rs   | 15 ++++++-
>  src/api2/config/datastore.rs     | 28 ++++++++++---
>  3 files changed, 105 insertions(+), 10 deletions(-)
>
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index dc267d752..fd5f215e2 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,9 +1,11 @@
> +use std::os::unix::fs::MetadataExt;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
> +use std::time::Duration;
>
> -use anyhow::{bail, format_err, Error};
> -use tracing::info;
> +use anyhow::{bail, format_err, Context, Error};
> +use tracing::{info, warn};
>
>  use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>  use proxmox_io::ReadExt;
> @@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
>  };
>  use proxmox_worker_task::WorkerTaskContext;
>
> +use crate::data_blob::DataChunkBuilder;
>  use crate::file_formats::{
>      COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>  };
> @@ -177,7 +180,7 @@ impl ChunkStore {
>      /// Note that this must be used with care, as it's dangerous to create two instances on the
>      /// same base path, as closing the underlying ProcessLocker drops all locks from this process
>      /// on the lockfile (even if separate FDs)
> -    pub(crate) fn open<P: Into<PathBuf>>(
> +    pub fn open<P: Into<PathBuf>>(
>          name: &str,
>          base: P,
>          sync_level: DatastoreFSyncLevel,
> @@ -442,6 +445,69 @@ impl ChunkStore {
>          Ok(())
>      }
>
> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
> +    ///
> +    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
> +    /// kernel timestamp granularity.
> +    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
> +    /// if a file change is detected while testing, by differences in birth time or inode number.
> +    /// Uses a 4 MiB fixed-size, compressed but unencrypted chunk to test. The chunk is inserted in
> +    /// the chunk store if not yet present.
> +    /// Returns an error if the check could not be performed.
> +    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
> +        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;
> +        let (path, _digest) = self.chunk_path(&digest);
> +
> +        // Take into account timestamp update granularity in the kernel
> +        // Blocking the thread is fine here since this runs in a worker.
> +        std::thread::sleep(Duration::from_secs(1));
> +
> +        let metadata_before = std::fs::metadata(&path).context(format!(
> +            "failed to get metadata for {path:?} before atime update"
> +        ))?;
> +
> +        // Second atime update if chunk pre-existed, insert_chunk already updates pre-existing ones
> +        self.cond_touch_path(&path, true)?;

Regarding my comment in the cover letter: The call to
`self.cond_touch_path` above will fail on a directory-based datastore
backed by ext4 mounted with -o rw,relatime:

    2025-04-04T16:25:44+02:00: TASK ERROR: atime safety check failed: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted

From `man 2 utimensat` (formatted for your reading pleasure):

EPERM
    The caller attempted to change one or both timestamps to a value
    other than the current time, or to change one of the timestamps to
    the current time while leaving the other timestamp unchanged, (i.e.,
    times  is  not NULL, neither tv_nsec field is UTIME_NOW, and neither
    tv_nsec field is UTIME_OMIT) and either:

    •  the caller's effective user ID does not match the owner of file,
       and the caller is not privileged (Linux: does not have the
       CAP_FOWNER capability); or,

    •  the file is marked append-only or immutable (see chattr(1)).

Sooo, I investigated a little more, since the EPERM here initially
didn't make any sense to me, because the same atime check actually
passes when creating the datastore.

Furthermore, the file (chunk) used for this check isn't append-only or
immutable (otherwise you'd see an `a` or `i` down there):

    $ lsattr .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
    --------------e------- .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8


And finally, just for completeness' sake, we're using both UTIME_NOW and
UTIME_OMIT in `cond_touch_path`:

    let times: [libc::timespec; 2] = [
        // access time -> update to now
        libc::timespec {
            tv_sec: 0,
            tv_nsec: libc::UTIME_NOW,
        },
        // modification time -> keep as is
        libc::timespec {
            tv_sec: 0,
            tv_nsec: libc::UTIME_OMIT,
        },
    ];


According to `man 2 utimensat`, this suggests that we're not running
under the expected effective UID and don't have the CAP_FOWNER
capability.

This is the actual crux of the issue, since the chunk is owned by
`root`:

    $ ls -alh .chunks/bb9f
    total 1.1M
    drwxr-x--- 2 backup backup 4.0K Apr  4 17:11 .
    drwxr-x--- 1 backup backup 1.1M Apr  4 17:11 ..
    -rw-r--r-- 1 root   root    159 Apr  4 17:11 bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
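
Putting those two together (a rough illustration of the precondition
only, not code from the patch -- and assuming the GC worker runs
unprivileged as `backup`, which is what the directory ownership above
suggests):

    // utimensat(2) with explicit timestamps (UTIME_NOW for atime plus
    // UTIME_OMIT for mtime counts as "explicit") is only permitted for
    // the file owner or a CAP_FOWNER-privileged caller; the capability
    // check is left out here for brevity.
    use std::os::unix::fs::MetadataExt;

    fn may_set_explicit_times(path: &std::path::Path) -> std::io::Result<bool> {
        let owner = std::fs::metadata(path)?.uid();
        let euid = unsafe { libc::geteuid() };
        Ok(owner == euid)
    }

    // root-owned chunk checked with the euid of `backup` -> false -> EPERM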


Just to add some additional findings -- when the *check is turned off*,
GC on that single chunk (no backups made yet!) runs just fine:

    2025-04-04T17:00:52+02:00: starting garbage collection on store test-noatime
    2025-04-04T17:00:52+02:00: Access time update check disabled by datastore tuning options.
    2025-04-04T17:00:52+02:00: Using access time cutoff 1d 5m, minimum access time is 2025-04-03T14:55:52Z
    2025-04-04T17:00:52+02:00: Start GC phase1 (mark used chunks)
    2025-04-04T17:00:52+02:00: Start GC phase2 (sweep unused chunks)
    2025-04-04T17:00:52+02:00: processed 73% (0 chunks)
    2025-04-04T17:00:52+02:00: Removed garbage: 0 B
    2025-04-04T17:00:52+02:00: Removed chunks: 0
    2025-04-04T17:00:52+02:00: Pending removals: 159 B (in 1 chunks)
    2025-04-04T17:00:52+02:00: Original data usage: 0 B
    2025-04-04T17:00:52+02:00: On-Disk chunks: 0
    2025-04-04T17:00:52+02:00: Deduplication factor: 1.00
    2025-04-04T17:00:52+02:00: queued notification (id=4bab0755-3f24-4b1a-99b0-de8e9613db9b)
    2025-04-04T17:00:52+02:00: TASK OK

*BUT*, when creating a backup via PVE, the check actually runs, despite
being turned off -- this appears to be something separate that you might
want to check as well:

    INFO: starting new backup job: vzdump 100 --mode snapshot --notes-template '{{guestname}}' --remove 0 --node roxy --notification-mode auto --storage pbs-dev-test-atime
    INFO: Starting Backup of VM 100 (qemu)
    INFO: Backup started at 2025-04-04 17:02:49
    INFO: status = running
    INFO: VM Name: dns
    INFO: include disk 'scsi0' 'local-zfs-main:vm-100-disk-0' 16G
    INFO: backup mode: snapshot
    INFO: ionice priority: 7
    INFO: snapshots found (not included into backup)
    INFO: creating Proxmox Backup Server archive 'vm/100/2025-04-04T15:02:49Z'
    INFO: issuing guest-agent 'fs-freeze' command
    INFO: issuing guest-agent 'fs-thaw' command
    ERROR: VM 100 qmp command 'backup' failed - backup register image failed: command error: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
    INFO: aborting backup job
    INFO: resuming VM again
    ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - backup register image failed: command error: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
    INFO: Failed at 2025-04-04 17:02:49
    INFO: Backup job finished with errors
    TASK ERROR: job errors

Just to be sure:

    $ cat /etc/proxmox-backup/datastore.cfg
    # [...]
    datastore: test-noatime
            comment
            gc-schedule daily
            notification-mode notification-system
            path /mnt/datastore/test-noatime
            tuning gc-atime-safety-check=0



So, fixing the permissions with:

    # chown backup: .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8

.. allows the check to pass:

    2025-04-04T17:14:34+02:00: starting garbage collection on store test-noatime
    2025-04-04T17:14:35+02:00: Access time update check successful, proceeding with GC.
    2025-04-04T17:14:35+02:00: Using access time cutoff 1d 5m, minimum access time is 2025-04-03T15:09:34Z
    2025-04-04T17:14:35+02:00: Start GC phase1 (mark used chunks)
    2025-04-04T17:14:35+02:00: Start GC phase2 (sweep unused chunks)
    2025-04-04T17:14:35+02:00: processed 73% (0 chunks)
    2025-04-04T17:14:35+02:00: Removed garbage: 0 B
    2025-04-04T17:14:35+02:00: Removed chunks: 0
    2025-04-04T17:14:35+02:00: Original data usage: 0 B
    2025-04-04T17:14:35+02:00: On-Disk chunks: 1
    2025-04-04T17:14:35+02:00: Deduplication factor: 0.00
    2025-04-04T17:14:35+02:00: Average chunk size: 159 B
    2025-04-04T17:14:35+02:00: queued notification (id=06d76349-26af-43da-9ddb-5b0ee2fabfa8)
    2025-04-04T17:14:35+02:00: TASK OK

So, I guess you just have to change the UID and GID when creating the
test chunk ;P
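
Something along these lines might already be enough -- completely
untested sketch, just to illustrate what I mean; it assumes
`nix::unistd::chown` and `pbs_config::backup_user()` are usable from
within `check_fs_atime_updates()`:

    // right after insert_chunk() in check_fs_atime_updates():
    // if we just created the test chunk ourselves (possibly as root,
    // e.g. via the privileged API during datastore creation), hand it
    // over to backup:backup so that the later utimensat() in
    // cond_touch_path() doesn't hit EPERM once GC runs unprivileged
    if !pre_existing {
        let backup_user = pbs_config::backup_user()?;
        nix::unistd::chown(&path, Some(backup_user.uid), Some(backup_user.gid))
            .context(format!("failed to chown test chunk {path:?}"))?;
    }

Alternatively, doing the chown only in `do_create_datastore()` (where
`backup_user` is already looked up for `ChunkStore::create`) would
probably work too, but keeping it next to the insert seems more robust.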

> +
> +        let metadata_now = std::fs::metadata(&path).context(format!(
> +            "failed to get metadata for {path:?} after atime update"
> +        ))?;
> +
> +        // Check for the unlikely case that the file changed in-between the
> +        // two metadata calls, try to check once again on changed file
> +        if metadata_before.ino() != metadata_now.ino() {
> +            if retry_on_file_changed {
> +                return self.check_fs_atime_updates(false);
> +            }
> +            bail!("chunk {path:?} changed twice during access time safety check, cannot proceed.");
> +        }
> +
> +        if metadata_before.accessed()? >= metadata_now.accessed()? {
> +            let chunk_info_str = if pre_existing {
> +                "pre-existing"
> +            } else {
> +                "newly inserted"
> +            };
> +            warn!("Chunk metadata was not correctly updated during access time safety check:");
> +            info!(
> +                "Timestamps before update: accessed {:?}, modified {:?}, created {:?}",
> +                metadata_before.accessed().ok(),
> +                metadata_before.modified().ok(),
> +                metadata_before.created().ok(),
> +            );
> +            info!(
> +                "Timestamps after update: accessed {:?}, modified {:?}, created {:?}",
> +                metadata_now.accessed().ok(),
> +                metadata_now.modified().ok(),
> +                metadata_now.created().ok(),
> +            );
> +            bail!("access time safety check using {chunk_info_str} chunk failed, aborting GC!");
> +        }
> +
> +        Ok(())
> +    }
> +
>      pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index a6a91ca79..9bdb96b18 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, LazyLock, Mutex};
>
> -use anyhow::{bail, format_err, Error};
> +use anyhow::{bail, format_err, Context, Error};
>  use nix::unistd::{unlinkat, UnlinkatFlags};
>  use tracing::{info, warn};
>
> @@ -1170,6 +1170,19 @@ impl DataStore {
>                  upid: Some(upid.to_string()),
>                  ..Default::default()
>              };
> +            let tuning: DatastoreTuning = serde_json::from_value(
> +                DatastoreTuning::API_SCHEMA
> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
> +            )?;
> +            if tuning.gc_atime_safety_check.unwrap_or(true) {
> +                self.inner
> +                    .chunk_store
> +                    .check_fs_atime_updates(true)
> +                    .context("atime safety check failed")?;
> +                info!("Access time update check successful, proceeding with GC.");
> +            } else {
> +                info!("Access time update check disabled by datastore tuning options.");
> +            }
>
>              info!("Start GC phase1 (mark used chunks)");
>
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index 58acaa861..2df8301df 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -1,10 +1,10 @@
>  use std::path::{Path, PathBuf};
>
>  use ::serde::{Deserialize, Serialize};
> -use anyhow::{bail, Error};
> +use anyhow::{bail, Context, Error};
>  use hex::FromHex;
>  use serde_json::Value;
> -use tracing::warn;
> +use tracing::{info, warn};
>
>  use proxmox_router::{http_bail, Permission, Router, RpcEnvironment, RpcEnvironmentType};
>  use proxmox_schema::{api, param_bail, ApiType};
> @@ -122,8 +122,16 @@ pub(crate) fn do_create_datastore(
>          UnmountGuard::new(None)
>      };
>
> -    if reuse_datastore {
> -        ChunkStore::verify_chunkstore(&path)?;
> +    let chunk_store = if reuse_datastore {
> +        ChunkStore::verify_chunkstore(&path).and_then(|_| {
> +            // Must be the only instance accessing and locking the chunk store,
> +            // dropping will close all other locks from this process on the lockfile as well.
> +            ChunkStore::open(
> +                &datastore.name,
> +                &path,
> +                tuning.sync_level.unwrap_or_default(),
> +            )
> +        })?
>      } else {
>          if let Ok(dir) = std::fs::read_dir(&path) {
>              for file in dir {
> @@ -141,10 +149,18 @@ pub(crate) fn do_create_datastore(
>              backup_user.uid,
>              backup_user.gid,
>              tuning.sync_level.unwrap_or_default(),
> -        )
> -        .map(|_| ())?;
> +        )?
>      };
>
> +    if tuning.gc_atime_safety_check.unwrap_or(true) {
> +        chunk_store
> +            .check_fs_atime_updates(true)
> +            .context("access time safety check failed")?;
> +        info!("Access time update check successful.");
> +    } else {
> +        info!("Access time update check skipped.");
> +    }
> +
>      config.set_data(&datastore.name, "datastore", &datastore)?;
>
>      pbs_config::datastore::save_config(&config)?;



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
