public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored
@ 2025-03-06 14:52 Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

These patches add a check to phase 1 of garbage collection and datastore
creation in order to detect when the filesystem backing the chunk store
does not honor atime updates. This avoids possible data loss for situations
where garbage collection could otherwise delete chunks still referenced by
a backup snaphost's index file.

The check is performed on a fixed size 4 MiB unencrypted and compressed
chunk of all-zeros, inserted if not present yet.
The Linux kernel timestamp granularity is taken into account by sleeping
for 1 second to avoid discarded atime update attempts by utimensat calls.
The test is enabled by default, but an opt-out option can be set via the
datastore tuning parameters for backwards compatibility.

Further, add a datastore tuning parameter to reduce the wait period for
chunk removal in phase 2 of garbage collection.

Most notable changes sice version 4 (thanks Fabian for feedback):
- Decouple gc-atime-safety-check from gc-atime-cutoff
- Improve logging and error handling
- fixed typo in docs (thanks Maximilliano for noticing).

Most notable changes sice version 3 (thanks Fabian for feedback):
- Drop check for relatime like behaviour, as this is not supported and
  does not show up in any of the tests performed on btrfs, cephfs, ext4,
  NFS3, NFS4, ntfs, SMB3_11, xfs or ZFS.
- Additionally check chunk inode to detect possible but very unlikely
  file changes, perform check once again in that case.
- Move atime cutoff selection and min_atime calculation to the same
  location, as they are logically related.

Most notable changes sice version 2 (thanks Fabian and Thomas for
comments and suggestions):
- Take into account Linux timestamp granularity, do not set timestamp
  to the past, as that introduces other error paths such as lack of
  permissions or fs limitations.
- Check relatime behavior, if atime behaviour is not honored. Fallback
  to original cutoff in that case.
- Adapt tuning parameter names.

Most notable changes sice version 1 (thanks Fabian and Thomas for
comments and suggestions):
- Optimize check by using the all zero chunk
- Enable the check by default and fail GC job if not honored, but allow
  to opt-out
- Add GC wait period tuning option

Link to the issue in the bugtracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=5982

proxmox:

Christian Ebner (2):
  pbs api types: add garbage collection atime safety check flag
  pbs api types: add option to set GC chunk cleanup atime cutoff

 pbs-api-types/src/datastore.rs | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

proxmox-backup:

Christian Ebner (6):
  fix #5982: garbage collection: check atime updates are honored
  ui: expose GC atime safety check flag in datastore tuning options
  docs: mention GC atime update check for tuning options
  datastore: use custom GC atime cutoff if set
  ui: expose GC atime cutoff in datastore tuning option
  docs: mention gc-atime-cutoff as datastore tuning option

 docs/storage.rst                 |  16 ++++-
 pbs-datastore/src/chunk_store.rs | 111 ++++++++++++++++++++++++++-----
 pbs-datastore/src/datastore.rs   |  39 ++++++++++-
 src/api2/config/datastore.rs     |   1 +
 www/Utils.js                     |   9 +++
 www/datastore/OptionView.js      |  17 +++++
 6 files changed, 174 insertions(+), 19 deletions(-)

-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox 1/8] pbs api types: add garbage collection atime safety check flag
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Add the `gc-atime-safety-check` flag to the datastore tuning
parameters. This flag allows to enable/disable a check to detect if
the atime update is honored by the filesystem backing the chunk store
during phase 1 of garbage collection and during datastore creation.

The default is to perform the check.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-api-types/src/datastore.rs | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index ddd8d3c6..1a4d9e2d 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -229,6 +229,14 @@ pub enum DatastoreFSyncLevel {
             type: ChunkOrder,
             optional: true,
         },
+        "gc-atime-safety-check": {
+            description:
+                "Check filesystem atime updates are honored during store creation and garbage \
+                collection",
+            optional: true,
+            default: true,
+            type: bool,
+        },
     },
 )]
 #[derive(Serialize, Deserialize, Default)]
@@ -240,6 +248,8 @@ pub struct DatastoreTuning {
     pub chunk_order: Option<ChunkOrder>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub sync_level: Option<DatastoreFSyncLevel>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub gc_atime_safety_check: Option<bool>,
 }
 
 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-19  8:10   ` Thomas Lamprecht
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Add the `gc-atime-cutoff` option to the datastore tuning parameters.
This allows to specify the time after which the chunks are not
considered in use anymore if their atime has not been updated since
then.

The default is to keep chunks within the 24h 5m timespan (given no
active writers).

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- drop mentioning this being conditional on gc-atime-safety-check,
  it isn't anymore

 pbs-api-types/src/datastore.rs | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 1a4d9e2d..467417e8 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -223,6 +223,15 @@ pub enum DatastoreFSyncLevel {
     Filesystem,
 }
 
+pub const GC_ATIME_CUTOFF_SCHEMA: Schema = IntegerSchema::new(
+    "Cutoff (in minutes) for chunk cleanup atime check in garbage collection phase 2 \
+        (default 24h 5m)",
+)
+.minimum(1) // safety margin for kernel timestamp granularity, but stay within minute range
+.maximum(2880) // 2 days
+.default(1445)
+.schema();
+
 #[api(
     properties: {
         "chunk-order": {
@@ -237,6 +246,10 @@ pub enum DatastoreFSyncLevel {
             default: true,
             type: bool,
         },
+        "gc-atime-cutoff": {
+            schema: GC_ATIME_CUTOFF_SCHEMA,
+            optional: true,
+        },
     },
 )]
 #[derive(Serialize, Deserialize, Default)]
@@ -250,6 +263,8 @@ pub struct DatastoreTuning {
     pub sync_level: Option<DatastoreFSyncLevel>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub gc_atime_safety_check: Option<bool>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub gc_atime_cutoff: Option<usize>,
 }
 
 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-19  8:45   ` Thomas Lamprecht
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Check if the filesystem backing the chunk store actually updates the
atime to avoid potential data loss in phase 2 of garbage collection,
in case the atime update is not honored.

Perform the check before phase 1 of garbage collection, as well as
on datastore creation. The latter to early detect and disallow
datastore creation on filesystem configurations which otherwise most
likely would lead to data losses.

Enable the atime update check by default, but allow to opt-out by
setting a datastore tuning parameter flag for backwards compatibility.
This is honored by both, garbage collection and datastore creation.

The check uses a 4 MiB fixed sized, unencypted and compressed chunk
as test marker, inserted if not present. This all zero-chunk is very
likely anyways for unencrypted backup contents with large all-zero
regions using fixed size chunking (e.g. VMs).

To avoid cases were the timestamp will not be updated because of the
Linux kernels timestamp granularity, sleep in-between stating and
utimensat for 1 second.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- Improve logging of values if atime update fails.
- fix incorrect comment

 pbs-datastore/src/chunk_store.rs | 101 ++++++++++++++++++++++++++++---
 pbs-datastore/src/datastore.rs   |  13 ++++
 src/api2/config/datastore.rs     |   1 +
 3 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 5e02909a1..a8c826353 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -1,9 +1,11 @@
+use std::os::unix::fs::MetadataExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, Mutex};
+use std::time::{Duration, UNIX_EPOCH};
 
-use anyhow::{bail, format_err, Error};
-use tracing::info;
+use anyhow::{bail, format_err, Context, Error};
+use tracing::{info, warn};
 
 use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
 use proxmox_io::ReadExt;
@@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
 };
 use proxmox_worker_task::WorkerTaskContext;
 
+use crate::data_blob::DataChunkBuilder;
 use crate::file_formats::{
     COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
 };
@@ -93,6 +96,7 @@ impl ChunkStore {
         uid: nix::unistd::Uid,
         gid: nix::unistd::Gid,
         sync_level: DatastoreFSyncLevel,
+        atime_safety_check: bool,
     ) -> Result<Self, Error>
     where
         P: Into<PathBuf>,
@@ -147,7 +151,17 @@ impl ChunkStore {
             }
         }
 
-        Self::open(name, base, sync_level)
+        let chunk_store = Self::open(name, base, sync_level)?;
+        if atime_safety_check {
+            chunk_store
+                .check_fs_atime_updates(true)
+                .map_err(|err| format_err!("access time safety check failed - {err:#}"))?;
+            info!("Access time safety check successful.");
+        } else {
+            info!("Access time safety check skipped.");
+        }
+
+        Ok(chunk_store)
     }
 
     fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
@@ -442,6 +456,66 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Check if atime updates are honored by the filesystem backing the chunk store.
+    ///
+    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
+    /// kernel timestamp granularity.
+    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
+    /// if a file change while testing is detected by differences in bith time or inode number.
+    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
+    /// the chunk store if not yet present.
+    /// Returns with error if the check could not be performed.
+    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
+        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
+        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;
+        let (path, _digest) = self.chunk_path(&digest);
+
+        // Take into account timestamp update granularity in the kernel
+        std::thread::sleep(Duration::from_secs(1));
+
+        let metadata_before =
+            std::fs::metadata(&path).context(format!("failed to get metadata for {path:?}"))?;
+
+        // Second atime update if chunk pre-existed, insert_chunk already updates pre-existing ones
+        self.cond_touch_path(&path, true)?;
+
+        let metadata_now =
+            std::fs::metadata(&path).context(format!("failed to get metadata for {path:?}"))?;
+
+        // Check for the unlikely case that the file changed in-between the
+        // two metadata calls, try to check once again on changed file
+        if metadata_before.ino() != metadata_now.ino() {
+            if retry_on_file_changed {
+                return self.check_fs_atime_updates(false);
+            }
+            bail!("chunk {path:?} changed twice during access time safety check, cannot proceed.");
+        }
+
+        if metadata_before.accessed()? >= metadata_now.accessed()? {
+            let chunk_info_str = if pre_existing {
+                "pre-existing"
+            } else {
+                "newly inserted"
+            };
+            warn!("Chunk metadata was not correctly updated during access time safety check:");
+            info!(
+                "Timestamps before update: accessed {:?}, modified {:?}, created {:?}",
+                metadata_before.accessed().unwrap_or(UNIX_EPOCH),
+                metadata_before.modified().unwrap_or(UNIX_EPOCH),
+                metadata_before.created().unwrap_or(UNIX_EPOCH),
+            );
+            info!(
+                "Timestamps after update: accessed {:?}, modified {:?}, created {:?}",
+                metadata_now.accessed().unwrap_or(UNIX_EPOCH),
+                metadata_now.modified().unwrap_or(UNIX_EPOCH),
+                metadata_now.created().unwrap_or(UNIX_EPOCH),
+            );
+            bail!("access time safety check using {chunk_info_str} chunk failed, aborting GC!");
+        }
+
+        Ok(())
+    }
+
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -628,8 +702,15 @@ fn test_chunk_store1() {
     let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
         .unwrap()
         .unwrap();
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        true,
+    )
+    .unwrap();
 
     let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
         .build()
@@ -641,8 +722,14 @@ fn test_chunk_store1() {
     let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
     assert!(exists);
 
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        true,
+    );
     assert!(chunk_store.is_err());
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 75c0c16ab..5558bb1ac 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1170,6 +1170,19 @@ impl DataStore {
                 upid: Some(upid.to_string()),
                 ..Default::default()
             };
+            let tuning: DatastoreTuning = serde_json::from_value(
+                DatastoreTuning::API_SCHEMA
+                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
+            )?;
+            if tuning.gc_atime_safety_check.unwrap_or(true) {
+                self.inner
+                    .chunk_store
+                    .check_fs_atime_updates(true)
+                    .map_err(|err| format_err!("atime safety check failed - {err:#}"))?;
+                info!("Access time safety check successful, proceeding with GC.");
+            } else {
+                info!("Filesystem atime safety check disabled by datastore tuning options.");
+            }
 
             info!("Start GC phase1 (mark used chunks)");
 
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index fe3260f6d..35847fc45 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
                 backup_user.uid,
                 backup_user.gid,
                 tuning.sync_level.unwrap_or_default(),
+                tuning.gc_atime_safety_check.unwrap_or(true),
             )
             .map(|_| ())
         } else {
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (2 preceding siblings ...)
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-19  8:45   ` Thomas Lamprecht
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 5/8] docs: mention GC atime update check for " Christian Ebner
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Allow to edit the atime safety check flag via the datastore tuning
options edit window.

Do not expose the flag for datastore creation as it is strongly
discouraged to create datastores on filesystems not correctly handling
atime updates as the garbage collection expects. It is nevertheless
still possible to create a datastore via the cli and pass in the
`--tuning gc-atime-safety-check=false` option.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 www/Utils.js                | 4 ++++
 www/datastore/OptionView.js | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/www/Utils.js b/www/Utils.js
index 2746ef0b5..9bd7e1615 100644
--- a/www/Utils.js
+++ b/www/Utils.js
@@ -846,6 +846,10 @@ Ext.define('PBS.Utils', {
 	sync = PBS.Utils.tuningOptions['sync-level'][sync ?? '__default__'];
 	options.push(`${gettext('Sync Level')}: ${sync}`);
 
+	let gc_atime_safety_check = tuning['gc-atime-safety-check'];
+	delete tuning['gc-atime-safety-check'];
+	options.push(`${gettext('GC Access Time Safety Check')}: ${gc_atime_safety_check ?? true}`);
+
 	for (const [k, v] of Object.entries(tuning)) {
 	    options.push(`${k}: ${v}`);
 	}
diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
index e1f38af6f..9ce3ce388 100644
--- a/www/datastore/OptionView.js
+++ b/www/datastore/OptionView.js
@@ -271,6 +271,14 @@ Ext.define('PBS.Datastore.Options', {
 			    deleteEmpty: true,
 			    value: '__default__',
 			},
+			{
+			    xtype: 'proxmoxcheckbox',
+			    name: 'gc-atime-safety-check',
+			    fieldLabel: gettext('GC Access Time Safety Check'),
+			    defaultValue: 1,
+			    uncheckedValue: 0,
+			    deleteDefaultValue: true,
+			},
 		    ],
 		},
 	    },
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 5/8] docs: mention GC atime update check for tuning options
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (3 preceding siblings ...)
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 6/8] datastore: use custom GC atime cutoff if set Christian Ebner
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Document the gc-atime-check flag and describe the behavior it
controls, by adding it as additional bullet point to the documented
datastore tuning options.

This also fixes the intendation for the cli example how to set the
sync level, to make it clear that still belongs to the previous
bullet point.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- s/behaviour/behavior/

 docs/storage.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/storage.rst b/docs/storage.rst
index 490302955..db61ca58f 100644
--- a/docs/storage.rst
+++ b/docs/storage.rst
@@ -435,9 +435,16 @@ There are some tuning related options for the datastore that are more advanced:
 
   This can be set with:
 
-.. code-block:: console
+  .. code-block:: console
+
+    # proxmox-backup-manager datastore update <storename> --tuning 'sync-level=filesystem'
 
-  # proxmox-backup-manager datastore update <storename> --tuning 'sync-level=filesystem'
+* ``gc-atime-safety-check``: Datastore GC atime update safety check:
+  You can explicitly `enable` or `disable` the atime update safety check
+  performed on datastore creation and garbage collection. This checks if atime
+  updates are handled as expected by garbage collection and therefore avoids the
+  risk of data loss by unexpected filesystem behavior. It is recommended to set
+  this to enabled, which is also the default value.
 
 If you want to set multiple tuning options simultaneously, you can separate them
 with a comma, like this:
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 6/8] datastore: use custom GC atime cutoff if set
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (4 preceding siblings ...)
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 5/8] docs: mention GC atime update check for " Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Use the user configured atime cutoff over the default 24h 5m
margin if explicitly set, otherwise fallback to the default.

Move the minimum atime calculation based on the atime cutoff to the
sweep_unused_chunks() callside and pass in the calculated values, as
to have the logic in the same place.

Add log outputs shownig which cutoff and minimum access time is used
by the garbage collection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- make gc-atime-cutoff independent from gc-atime-safety-check.
- Refactor cutoff calculation and log it together with the minimum
  atime.
- Conditionally also log in case of oldest writer is outside of range.

 pbs-datastore/src/chunk_store.rs | 10 +---------
 pbs-datastore/src/datastore.rs   | 28 ++++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index a8c826353..c0e7ded3f 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -367,7 +367,7 @@ impl ChunkStore {
     pub fn sweep_unused_chunks(
         &self,
         oldest_writer: i64,
-        phase1_start_time: i64,
+        min_atime: i64,
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
     ) -> Result<(), Error> {
@@ -377,14 +377,6 @@ impl ChunkStore {
         use nix::sys::stat::fstatat;
         use nix::unistd::{unlinkat, UnlinkatFlags};
 
-        let mut min_atime = phase1_start_time - 3600 * 24; // at least 24h (see mount option relatime)
-
-        if oldest_writer < min_atime {
-            min_atime = oldest_writer;
-        }
-
-        min_atime -= 300; // add 5 mins gap for safety
-
         let mut last_percentage = 0;
         let mut chunk_count = 0;
 
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 5558bb1ac..4af6432ce 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -4,6 +4,7 @@ use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, LazyLock, Mutex};
+use std::time::Duration;
 
 use anyhow::{bail, format_err, Error};
 use nix::unistd::{unlinkat, UnlinkatFlags};
@@ -17,6 +18,7 @@ use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
 use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
 use proxmox_sys::linux::procfs::MountInfo;
 use proxmox_sys::process_locker::ProcessLockSharedGuard;
+use proxmox_time::TimeSpan;
 use proxmox_worker_task::WorkerTaskContext;
 
 use pbs_api_types::{
@@ -1174,6 +1176,7 @@ impl DataStore {
                 DatastoreTuning::API_SCHEMA
                     .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
             )?;
+
             if tuning.gc_atime_safety_check.unwrap_or(true) {
                 self.inner
                     .chunk_store
@@ -1181,7 +1184,28 @@ impl DataStore {
                     .map_err(|err| format_err!("atime safety check failed - {err:#}"))?;
                 info!("Access time safety check successful, proceeding with GC.");
             } else {
-                info!("Filesystem atime safety check disabled by datastore tuning options.");
+                info!("Access time safety check disabled by datastore tuning options.");
+            };
+
+            // Fallback to default 24h 5m if not set
+            let cutoff = tuning
+                .gc_atime_cutoff
+                .map(|cutoff| cutoff * 60)
+                .unwrap_or(3600 * 24 + 300);
+
+            let mut min_atime = phase1_start_time - cutoff as i64;
+            info!(
+                "Using access time cutoff {}, minimum access time is {}",
+                TimeSpan::from(Duration::from_secs(cutoff as u64)),
+                proxmox_time::epoch_to_rfc3339_utc(min_atime)?,
+            );
+            if oldest_writer < min_atime {
+                min_atime = oldest_writer - 300; // account for 5 min safety gap
+                info!(
+                    "Oldest backup writer started at {}, extending minimum access time to {}",
+                    TimeSpan::from(Duration::from_secs(oldest_writer as u64)),
+                    proxmox_time::epoch_to_rfc3339_utc(min_atime)?,
+                );
             }
 
             info!("Start GC phase1 (mark used chunks)");
@@ -1191,7 +1215,7 @@ impl DataStore {
             info!("Start GC phase2 (sweep unused chunks)");
             self.inner.chunk_store.sweep_unused_chunks(
                 oldest_writer,
-                phase1_start_time,
+                min_atime,
                 &mut gc_status,
                 worker,
             )?;
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (5 preceding siblings ...)
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 6/8] datastore: use custom GC atime cutoff if set Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-19  8:49   ` Thomas Lamprecht
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 8/8] docs: mention gc-atime-cutoff as " Christian Ebner
  2025-03-19 17:26 ` [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  8 siblings, 1 reply; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Allows to set the atime cutoff for phase 2 of garbage collection in
the datastores tuning parameters. This value changes the time after
which a chunk is not considered in use anymore if it falls outside of
the cutoff window.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 www/Utils.js                | 5 +++++
 www/datastore/OptionView.js | 9 +++++++++
 2 files changed, 14 insertions(+)

diff --git a/www/Utils.js b/www/Utils.js
index 9bd7e1615..13b5eceda 100644
--- a/www/Utils.js
+++ b/www/Utils.js
@@ -850,6 +850,11 @@ Ext.define('PBS.Utils', {
 	delete tuning['gc-atime-safety-check'];
 	options.push(`${gettext('GC Access Time Safety Check')}: ${gc_atime_safety_check ?? true}`);
 
+	let gc_atime_cutoff = tuning['gc-atime-cutoff'];
+	delete tuning['gc-atime-cutoff'];
+	gc_atime_cutoff = gc_atime_cutoff ?? '1445';
+	options.push(`${gettext('GC Access Time Cutoff')}: ${gc_atime_cutoff}m`);
+
 	for (const [k, v] of Object.entries(tuning)) {
 	    options.push(`${k}: ${v}`);
 	}
diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
index 9ce3ce388..9e8dcc4fb 100644
--- a/www/datastore/OptionView.js
+++ b/www/datastore/OptionView.js
@@ -279,6 +279,15 @@ Ext.define('PBS.Datastore.Options', {
 			    uncheckedValue: 0,
 			    deleteDefaultValue: true,
 			},
+			{
+			    xtype: 'proxmoxintegerfield',
+			    emptyText: Proxmox.Utils.defaultText,
+			    name: 'gc-atime-cutoff',
+			    minValue: 1,
+			    maxValue: 2880,
+			    fieldLabel: gettext('GC Access Time Cutoff (min)'),
+			    deleteEmpty: true,
+			},
 		    ],
 		},
 	    },
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 8/8] docs: mention gc-atime-cutoff as datastore tuning option
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (6 preceding siblings ...)
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
@ 2025-03-06 14:52 ` Christian Ebner
  2025-03-19 17:26 ` [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  8 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-06 14:52 UTC (permalink / raw)
  To: pbs-devel

Document the gc-atime-cutoff option and describe the behavior it
controls, by adding it as additional bullet point to the
documented datastore tuning options.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 docs/storage.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/storage.rst b/docs/storage.rst
index db61ca58f..da748c682 100644
--- a/docs/storage.rst
+++ b/docs/storage.rst
@@ -446,6 +446,11 @@ There are some tuning related options for the datastore that are more advanced:
   risk of data loss by unexpected filesystem behavior. It is recommended to set
   this to enabled, which is also the default value.
 
+* ``gc-atime-cutoff``: Datastore GC atime cutoff for chunk cleanup:
+  This allows to set the cutoff for which a chunk is still considered in-use
+  during phase 2 of garbage collection (given no older writers). If the
+  ``atime`` of the chunk is outside the range, it will be removed.
+
 If you want to set multiple tuning options simultaneously, you can separate them
 with a comma, like this:
 
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
@ 2025-03-19  8:10   ` Thomas Lamprecht
  2025-03-19  8:48     ` Christian Ebner
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Lamprecht @ 2025-03-19  8:10 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

Am 06.03.25 um 15:52 schrieb Christian Ebner:
> Add the `gc-atime-cutoff` option to the datastore tuning parameters.
> This allows to specify the time after which the chunks are not
> considered in use anymore if their atime has not been updated since
> then.
> 
> The default is to keep chunks within the 24h 5m timespan (given no
> active writers).
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 4:
> - drop mentioning this being conditional on gc-atime-safety-check,
>   it isn't anymore
> 
>  pbs-api-types/src/datastore.rs | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index 1a4d9e2d..467417e8 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -223,6 +223,15 @@ pub enum DatastoreFSyncLevel {
>      Filesystem,
>  }
>  
> +pub const GC_ATIME_CUTOFF_SCHEMA: Schema = IntegerSchema::new(
> +    "Cutoff (in minutes) for chunk cleanup atime check in garbage collection phase 2 \
> +        (default 24h 5m)",
> +)
> +.minimum(1) // safety margin for kernel timestamp granularity, but stay within minute range
> +.maximum(2880) // 2 days
> +.default(1445)

tiny nit that could definitively get fixed up on applying or if there
is the need for a v6 due to other stuff that actually matters, otherwise
we can keep it as is:
I'd prefer writing time constants that are less common – like these here
– such, that one more easily can tell what the outcome is even without a
comment (that might get outdated), e.g. for this here:

.maximum(2 * 24 * 60)
.default(24 * 60 + 5)

Very common values like 60 m or even 600 s for 1 h are IMO normally fine
though.

> +.schema();
> +
>  #[api(
>      properties: {
>          "chunk-order": {
> @@ -237,6 +246,10 @@ pub enum DatastoreFSyncLevel {
>              default: true,
>              type: bool,
>          },
> +        "gc-atime-cutoff": {
> +            schema: GC_ATIME_CUTOFF_SCHEMA,
> +            optional: true,
> +        },
>      },
>  )]
>  #[derive(Serialize, Deserialize, Default)]
> @@ -250,6 +263,8 @@ pub struct DatastoreTuning {
>      pub sync_level: Option<DatastoreFSyncLevel>,
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub gc_atime_safety_check: Option<bool>,
> +    #[serde(skip_serializing_if = "Option::is_none")]
> +    pub gc_atime_cutoff: Option<usize>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
@ 2025-03-19  8:45   ` Thomas Lamprecht
  2025-03-19  9:22     ` Christian Ebner
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Lamprecht @ 2025-03-19  8:45 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

Am 06.03.25 um 15:52 schrieb Christian Ebner:
> Check if the filesystem backing the chunk store actually updates the
> atime to avoid potential data loss in phase 2 of garbage collection,
> in case the atime update is not honored.
> 
> Perform the check before phase 1 of garbage collection, as well as
> on datastore creation. The latter to early detect and disallow
> datastore creation on filesystem configurations which otherwise most
> likely would lead to data losses.
> 
> Enable the atime update check by default, but allow to opt-out by
> setting a datastore tuning parameter flag for backwards compatibility.
> This is honored by both, garbage collection and datastore creation.
> 
> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
> as test marker, inserted if not present. This all zero-chunk is very
> likely anyways for unencrypted backup contents with large all-zero
> regions using fixed size chunking (e.g. VMs).
> 
> To avoid cases were the timestamp will not be updated because of the
> Linux kernels timestamp granularity, sleep in-between stating and
> utimensat for 1 second.
> 
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 4:
> - Improve logging of values if atime update fails.
> - fix incorrect comment
> 
>  pbs-datastore/src/chunk_store.rs | 101 ++++++++++++++++++++++++++++---
>  pbs-datastore/src/datastore.rs   |  13 ++++
>  src/api2/config/datastore.rs     |   1 +
>  3 files changed, 108 insertions(+), 7 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 5e02909a1..a8c826353 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,9 +1,11 @@
> +use std::os::unix::fs::MetadataExt;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
> +use std::time::{Duration, UNIX_EPOCH};
>  
> -use anyhow::{bail, format_err, Error};
> -use tracing::info;
> +use anyhow::{bail, format_err, Context, Error};
> +use tracing::{info, warn};
>  
>  use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>  use proxmox_io::ReadExt;
> @@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
>  };
>  use proxmox_worker_task::WorkerTaskContext;
>  
> +use crate::data_blob::DataChunkBuilder;
>  use crate::file_formats::{
>      COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>  };
> @@ -93,6 +96,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          sync_level: DatastoreFSyncLevel,
> +        atime_safety_check: bool,

Not a huge fan of another flat parameter here, albeit it's a bit borderline.
I do not want to add scope creep, but if we really want this here, then moving
those options into a struct might be nicer, could be done afterward too though.

But stepping a bit back, why is this a parameter for the fn creating a new
chunkstore in the first place? Why cannot the call sites that are interested
in doing this call check_fs_atime_updates themselves on the chunkstore instance
after getting it instead of having to loop through a parameter?

We do not have many places that create datastores, so I do not think there's a
high likelihood that we forget to call it.

>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -147,7 +151,17 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base, sync_level)
> +        let chunk_store = Self::open(name, base, sync_level)?;
> +        if atime_safety_check {
> +            chunk_store
> +                .check_fs_atime_updates(true)
> +                .map_err(|err| format_err!("access time safety check failed - {err:#}"))?;
> +            info!("Access time safety check successful.");
> +        } else {
> +            info!("Access time safety check skipped.");
> +        }
> +
> +        Ok(chunk_store)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -442,6 +456,66 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
> +    ///
> +    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
> +    /// kernel timestamp granularity.
> +    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
> +    /// if a file change while testing is detected by differences in bith time or inode number.
> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
> +    /// the chunk store if not yet present.
> +    /// Returns with error if the check could not be performed.
> +    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
> +        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;
> +        let (path, _digest) = self.chunk_path(&digest);
> +
> +        // Take into account timestamp update granularity in the kernel
> +        std::thread::sleep(Duration::from_secs(1));

If there's the need for a v6 I'd maybe add to the comment that this normally
runs in a worker, so sleep blocking the whole thread is fine here.

Also, you write "sleep in-between stating and utimensat for 1 second", but
this is sleeping before the stat + touch + stat calls, or am I overlooking
something?

> +        let metadata_before =
> +            std::fs::metadata(&path).context(format!("failed to get metadata for {path:?}"))?;
> +
> +        // Second atime update if chunk pre-existed, insert_chunk already updates pre-existing ones
> +        self.cond_touch_path(&path, true)?;
> +
> +        let metadata_now =
> +            std::fs::metadata(&path).context(format!("failed to get metadata for {path:?}"))?;

Another tiny nit: maybe encode now/after in the context to be easier able
to distinguish which stat syscall failed – while it's very unlikely that
one succeeds and the other doesn't, it would not hurt to be able to tell
that case apart.

> +
> +        // Check for the unlikely case that the file changed in-between the
> +        // two metadata calls, try to check once again on changed file
> +        if metadata_before.ino() != metadata_now.ino() {
> +            if retry_on_file_changed {
> +                return self.check_fs_atime_updates(false);
> +            }
> +            bail!("chunk {path:?} changed twice during access time safety check, cannot proceed.");
> +        }
> +
> +        if metadata_before.accessed()? >= metadata_now.accessed()? {
> +            let chunk_info_str = if pre_existing {
> +                "pre-existing"
> +            } else {
> +                "newly inserted"
> +            };
> +            warn!("Chunk metadata was not correctly updated during access time safety check:");
> +            info!(
> +                "Timestamps before update: accessed {:?}, modified {:?}, created {:?}",
> +                metadata_before.accessed().unwrap_or(UNIX_EPOCH),
> +                metadata_before.modified().unwrap_or(UNIX_EPOCH),
> +                metadata_before.created().unwrap_or(UNIX_EPOCH),
> +            );
> +            info!(
> +                "Timestamps after update: accessed {:?}, modified {:?}, created {:?}",
> +                metadata_now.accessed().unwrap_or(UNIX_EPOCH),
> +                metadata_now.modified().unwrap_or(UNIX_EPOCH),
> +                metadata_now.created().unwrap_or(UNIX_EPOCH),
> +            );
> +            bail!("access time safety check using {chunk_info_str} chunk failed, aborting GC!");
> +        }
> +
> +        Ok(())
> +    }
> +
>      pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());


> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 75c0c16ab..5558bb1ac 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1170,6 +1170,19 @@ impl DataStore {
>                  upid: Some(upid.to_string()),
>                  ..Default::default()
>              };
> +            let tuning: DatastoreTuning = serde_json::from_value(
> +                DatastoreTuning::API_SCHEMA
> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
> +            )?;
> +            if tuning.gc_atime_safety_check.unwrap_or(true) {
> +                self.inner
> +                    .chunk_store
> +                    .check_fs_atime_updates(true)
> +                    .map_err(|err| format_err!("atime safety check failed - {err:#}"))?;
> +                info!("Access time safety check successful, proceeding with GC.");
> +            } else {
> +                info!("Filesystem atime safety check disabled by datastore tuning options.");

Sounds IMO scarier than what this is, i.e. like the whole GC would not
be safe anymore, or would not use atime safety checks anymore.

I'd rather either drop that or reword it to something else, maybe just doing
s/safety/update/ would be enough.

> +            }
>  
>              info!("Start GC phase1 (mark used chunks)");
>  
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index fe3260f6d..35847fc45 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
>                  backup_user.uid,
>                  backup_user.gid,
>                  tuning.sync_level.unwrap_or_default(),
> +                tuning.gc_atime_safety_check.unwrap_or(true),
>              )
>              .map(|_| ())
>          } else {


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
@ 2025-03-19  8:45   ` Thomas Lamprecht
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Lamprecht @ 2025-03-19  8:45 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

Am 06.03.25 um 15:52 schrieb Christian Ebner:
> Allow to edit the atime safety check flag via the datastore tuning
> options edit window.
> 
> Do not expose the flag for datastore creation as it is strongly
> discouraged to create datastores on filesystems not correctly handling
> atime updates as the garbage collection expects. It is nevertheless
> still possible to create a datastore via the cli and pass in the
> `--tuning gc-atime-safety-check=false` option.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 4:
> - no changes
> 
>  www/Utils.js                | 4 ++++
>  www/datastore/OptionView.js | 8 ++++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/www/Utils.js b/www/Utils.js
> index 2746ef0b5..9bd7e1615 100644
> --- a/www/Utils.js
> +++ b/www/Utils.js
> @@ -846,6 +846,10 @@ Ext.define('PBS.Utils', {
>  	sync = PBS.Utils.tuningOptions['sync-level'][sync ?? '__default__'];
>  	options.push(`${gettext('Sync Level')}: ${sync}`);
>  
> +	let gc_atime_safety_check = tuning['gc-atime-safety-check'];
> +	delete tuning['gc-atime-safety-check'];
> +	options.push(`${gettext('GC Access Time Safety Check')}: ${gc_atime_safety_check ?? true}`);
> +
>  	for (const [k, v] of Object.entries(tuning)) {
>  	    options.push(`${k}: ${v}`);
>  	}
> diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
> index e1f38af6f..9ce3ce388 100644
> --- a/www/datastore/OptionView.js
> +++ b/www/datastore/OptionView.js
> @@ -271,6 +271,14 @@ Ext.define('PBS.Datastore.Options', {
>  			    deleteEmpty: true,
>  			    value: '__default__',
>  			},
> +			{
> +			    xtype: 'proxmoxcheckbox',
> +			    name: 'gc-atime-safety-check',
> +			    fieldLabel: gettext('GC Access Time Safety Check'),

Such long field labels are normally not nice for the layout, maybe move
some of the info in the boxLabel, which gets rendered to the right of the
box? E.g., something like:

fieldLabel: gettect('GC atime Check'),
boxLabel: gettect('Ensure underlying storage honors access time updates'),



> +			    defaultValue: 1,
> +			    uncheckedValue: 0,
> +			    deleteDefaultValue: true,
> +			},
>  		    ],
>  		},
>  	    },



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-19  8:10   ` Thomas Lamprecht
@ 2025-03-19  8:48     ` Christian Ebner
  2025-03-19  9:01       ` Thomas Lamprecht
  0 siblings, 1 reply; 20+ messages in thread
From: Christian Ebner @ 2025-03-19  8:48 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 3/19/25 09:10, Thomas Lamprecht wrote:
> Am 06.03.25 um 15:52 schrieb Christian Ebner:
>> Add the `gc-atime-cutoff` option to the datastore tuning parameters.
>> This allows to specify the time after which the chunks are not
>> considered in use anymore if their atime has not been updated since
>> then.
>>
>> The default is to keep chunks within the 24h 5m timespan (given no
>> active writers).
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 4:
>> - drop mentioning this being conditional on gc-atime-safety-check,
>>    it isn't anymore
>>
>>   pbs-api-types/src/datastore.rs | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
>> index 1a4d9e2d..467417e8 100644
>> --- a/pbs-api-types/src/datastore.rs
>> +++ b/pbs-api-types/src/datastore.rs
>> @@ -223,6 +223,15 @@ pub enum DatastoreFSyncLevel {
>>       Filesystem,
>>   }
>>   
>> +pub const GC_ATIME_CUTOFF_SCHEMA: Schema = IntegerSchema::new(
>> +    "Cutoff (in minutes) for chunk cleanup atime check in garbage collection phase 2 \
>> +        (default 24h 5m)",
>> +)
>> +.minimum(1) // safety margin for kernel timestamp granularity, but stay within minute range
>> +.maximum(2880) // 2 days
>> +.default(1445)
> 
> tiny nit that could definitively get fixed up on applying or if there
> is the need for a v6 due to other stuff that actually matters, otherwise
> we can keep it as is:
> I'd prefer writing time constants that are less common – like these here
> – such, that one more easily can tell what the outcome is even without a
> comment (that might get outdated), e.g. for this here:
> 
> .maximum(2 * 24 * 60)
> .default(24 * 60 + 5)
> 
> Very common values like 60 m or even 600 s for 1 h are IMO normally fine
> though.

Agreed, makes these constant definitely more intuitive!

Given this I did check also the output in the docs, which however shows 
the very unintuitive integer rage and default value.

So given that, maybe it would be better to switch from an integer schema 
to a string schema and parse the values as HumanTime?
Similar to what is done for the GC schedule.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
@ 2025-03-19  8:49   ` Thomas Lamprecht
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Lamprecht @ 2025-03-19  8:49 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

Am 06.03.25 um 15:52 schrieb Christian Ebner:
> Allows to set the atime cutoff for phase 2 of garbage collection in
> the datastores tuning parameters. This value changes the time after
> which a chunk is not considered in use anymore if it falls outside of
> the cutoff window.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 4:
> - no changes
> 
>  www/Utils.js                | 5 +++++
>  www/datastore/OptionView.js | 9 +++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/www/Utils.js b/www/Utils.js
> index 9bd7e1615..13b5eceda 100644
> --- a/www/Utils.js
> +++ b/www/Utils.js
> @@ -850,6 +850,11 @@ Ext.define('PBS.Utils', {
>  	delete tuning['gc-atime-safety-check'];
>  	options.push(`${gettext('GC Access Time Safety Check')}: ${gc_atime_safety_check ?? true}`);
>  
> +	let gc_atime_cutoff = tuning['gc-atime-cutoff'];
> +	delete tuning['gc-atime-cutoff'];
> +	gc_atime_cutoff = gc_atime_cutoff ?? '1445';
> +	options.push(`${gettext('GC Access Time Cutoff')}: ${gc_atime_cutoff}m`);
> +
>  	for (const [k, v] of Object.entries(tuning)) {
>  	    options.push(`${k}: ${v}`);
>  	}
> diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
> index 9ce3ce388..9e8dcc4fb 100644
> --- a/www/datastore/OptionView.js
> +++ b/www/datastore/OptionView.js
> @@ -279,6 +279,15 @@ Ext.define('PBS.Datastore.Options', {
>  			    uncheckedValue: 0,
>  			    deleteDefaultValue: true,
>  			},
> +			{
> +			    xtype: 'proxmoxintegerfield',
> +			    emptyText: Proxmox.Utils.defaultText,

This won't change that soon I think, so we could convey the actual default
value, like e.g.:

gettext('1445 (24 hours 5 minutes)'),

> +			    name: 'gc-atime-cutoff',

tiny nit, I prefer ordering of properties such that xtype and name come first,
as normally those are the most important options w.r.t. what this is and how
it will be submitted. It's nice to get the label then afterward, but way less
hard feelings there.

> +			    minValue: 1,
> +			    maxValue: 2880,
> +			    fieldLabel: gettext('GC Access Time Cutoff (min)'),
> +			    deleteEmpty: true,
> +			},
>  		    ],
>  		},
>  	    },



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-19  8:48     ` Christian Ebner
@ 2025-03-19  9:01       ` Thomas Lamprecht
  2025-03-19  9:08         ` Christian Ebner
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Lamprecht @ 2025-03-19  9:01 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

Am 19.03.25 um 09:48 schrieb Christian Ebner:
> Given this I did check also the output in the docs, which however shows 
> the very unintuitive integer rage and default value.
> 
> So given that, maybe it would be better to switch from an integer schema 
> to a string schema and parse the values as HumanTime?
> Similar to what is done for the GC schedule.

That might indeed have some benefits UX wise, albeit I'm not 100% sure how
good this is in  the web UI, or is there a pre-existing use case for
HumanTime in the UI and thus ideally have already a nice validator there to
provide quicker feedback to users if they enter something bogus.

As the GC schedule is a calendar event, i.e. time instants not really time
durations like HumanTime is, e.g. `daily` or `hourly`, which calendar event
support would not make that much sense here.

That said, while I have some slight reservations those probably could be
resolved with some frontend validation, and it probably would not get
worse than plain minutes, albeit I'd expect that most people either will
reduce this to the minimum of 1 to ensure GC collects chunks without
references much faster or leave it as is, so not sure how much hassle
this is worth it, at least for this specific duration setting.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-19  9:01       ` Thomas Lamprecht
@ 2025-03-19  9:08         ` Christian Ebner
  2025-03-19  9:13           ` Thomas Lamprecht
  0 siblings, 1 reply; 20+ messages in thread
From: Christian Ebner @ 2025-03-19  9:08 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 3/19/25 10:01, Thomas Lamprecht wrote:
> Am 19.03.25 um 09:48 schrieb Christian Ebner:
>> Given this I did check also the output in the docs, which however shows
>> the very unintuitive integer rage and default value.
>>
>> So given that, maybe it would be better to switch from an integer schema
>> to a string schema and parse the values as HumanTime?
>> Similar to what is done for the GC schedule.
> 
> That might indeed have some benefits UX wise, albeit I'm not 100% sure how
> good this is in  the web UI, or is there a pre-existing use case for
> HumanTime in the UI and thus ideally have already a nice validator there to
> provide quicker feedback to users if they enter something bogus.

I think this could be done in the schema validation function, but have 
to take a closer look.

> As the GC schedule is a calendar event, i.e. time instants not really time
> durations like HumanTime is, e.g. `daily` or `hourly`, which calendar event
> support would not make that much sense here.

Yes, these would definitely need to be excluded.

> That said, while I have some slight reservations those probably could be
> resolved with some frontend validation, and it probably would not get
> worse than plain minutes, albeit I'd expect that most people either will
> reduce this to the minimum of 1 to ensure GC collects chunks without
> references much faster or leave it as is, so not sure how much hassle
> this is worth it, at least for this specific duration setting.

Okay, will take a closer look, if it requires to much adaption we can 
keep it as integer schema (+ your improvement suggestions).


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-19  9:08         ` Christian Ebner
@ 2025-03-19  9:13           ` Thomas Lamprecht
  2025-03-19  9:25             ` Christian Ebner
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Lamprecht @ 2025-03-19  9:13 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

Am 19.03.25 um 10:08 schrieb Christian Ebner:
> On 3/19/25 10:01, Thomas Lamprecht wrote:
>> Am 19.03.25 um 09:48 schrieb Christian Ebner:
>>> Given this I did check also the output in the docs, which however shows
>>> the very unintuitive integer rage and default value.
>>>
>>> So given that, maybe it would be better to switch from an integer schema
>>> to a string schema and parse the values as HumanTime?
>>> Similar to what is done for the GC schedule.
>>
>> That might indeed have some benefits UX wise, albeit I'm not 100% sure how
>> good this is in  the web UI, or is there a pre-existing use case for
>> HumanTime in the UI and thus ideally have already a nice validator there to
>> provide quicker feedback to users if they enter something bogus.
> 
> I think this could be done in the schema validation function, but have 
> to take a closer look.

Sure, but that would mean a roundtrip to the backend is required, having
a frontend validation provides less latency w.r.t. feedback if a value
is fine.

Albeit we do not have a good one for calendar events, so this might be
a bit of a high bar to ask for now. FWIW, this will get "solved" by a
rust based UI where we can reuse the exact same types and validators also
for the frontend without any (developer) overhead (we could also compile
just the validators we want now as WASM and use that in the ExtJS UI, but
that not sure if that's worth the hassle).

> 
>> As the GC schedule is a calendar event, i.e. time instants not really time
>> durations like HumanTime is, e.g. `daily` or `hourly`, which calendar event
>> support would not make that much sense here.
> 
> Yes, these would definitely need to be excluded.

I think we mean the same, but just to be sure: It should not need to be
excluded as those are just two different types, as mentioned one is for
instants on a calendar and the other is for durations.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-19  8:45   ` Thomas Lamprecht
@ 2025-03-19  9:22     ` Christian Ebner
  0 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-19  9:22 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 3/19/25 09:45, Thomas Lamprecht wrote:
> Am 06.03.25 um 15:52 schrieb Christian Ebner:
>> Check if the filesystem backing the chunk store actually updates the
>> atime to avoid potential data loss in phase 2 of garbage collection,
>> in case the atime update is not honored.
>>
>> Perform the check before phase 1 of garbage collection, as well as
>> on datastore creation. The latter to early detect and disallow
>> datastore creation on filesystem configurations which otherwise most
>> likely would lead to data losses.
>>
>> Enable the atime update check by default, but allow to opt-out by
>> setting a datastore tuning parameter flag for backwards compatibility.
>> This is honored by both, garbage collection and datastore creation.
>>
>> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
>> as test marker, inserted if not present. This all zero-chunk is very
>> likely anyways for unencrypted backup contents with large all-zero
>> regions using fixed size chunking (e.g. VMs).
>>
>> To avoid cases were the timestamp will not be updated because of the
>> Linux kernels timestamp granularity, sleep in-between stating and
>> utimensat for 1 second.
>>
>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 4:
>> - Improve logging of values if atime update fails.
>> - fix incorrect comment
>>
>>   pbs-datastore/src/chunk_store.rs | 101 ++++++++++++++++++++++++++++---
>>   pbs-datastore/src/datastore.rs   |  13 ++++
>>   src/api2/config/datastore.rs     |   1 +
>>   3 files changed, 108 insertions(+), 7 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 5e02909a1..a8c826353 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -1,9 +1,11 @@
>> +use std::os::unix::fs::MetadataExt;
>>   use std::os::unix::io::AsRawFd;
>>   use std::path::{Path, PathBuf};
>>   use std::sync::{Arc, Mutex};
>> +use std::time::{Duration, UNIX_EPOCH};
>>   
>> -use anyhow::{bail, format_err, Error};
>> -use tracing::info;
>> +use anyhow::{bail, format_err, Context, Error};
>> +use tracing::{info, warn};
>>   
>>   use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>>   use proxmox_io::ReadExt;
>> @@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
>>   };
>>   use proxmox_worker_task::WorkerTaskContext;
>>   
>> +use crate::data_blob::DataChunkBuilder;
>>   use crate::file_formats::{
>>       COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>>   };
>> @@ -93,6 +96,7 @@ impl ChunkStore {
>>           uid: nix::unistd::Uid,
>>           gid: nix::unistd::Gid,
>>           sync_level: DatastoreFSyncLevel,
>> +        atime_safety_check: bool,
> 
> Not a huge fan of another flat parameter here, albeit it's a bit borderline.
> I do not want to add scope creep, but if we really want this here, then moving
> those options into a struct might be nicer, could be done afterward too though.
> 
> But stepping a bit back, why is this a parameter for the fn creating a new
> chunkstore in the first place? Why cannot the call sites that are interested
> in doing this call check_fs_atime_updates themselves on the chunkstore instance
> after getting it instead of having to loop through a parameter?
> 
> We do not have many places that create datastores, so I do not think there's a
> high likelihood that we forget to call it.

Yes, can move this to the call side instead. The intention of having 
this here is so that it will always be considered when creating a chunk 
store. But doing this in the datastore creation only might even be 
better. As currently this is only checked on chunk store creation, not 
when reusing a pre-existing datastore. It might however make sense to 
run the check there too.

> 
>>       ) -> Result<Self, Error>
>>       where
>>           P: Into<PathBuf>,
>> @@ -147,7 +151,17 @@ impl ChunkStore {
>>               }
>>           }
>>   
>> -        Self::open(name, base, sync_level)
>> +        let chunk_store = Self::open(name, base, sync_level)?;
>> +        if atime_safety_check {
>> +            chunk_store
>> +                .check_fs_atime_updates(true)
>> +                .map_err(|err| format_err!("access time safety check failed - {err:#}"))?;
>> +            info!("Access time safety check successful.");
>> +        } else {
>> +            info!("Access time safety check skipped.");
>> +        }
>> +
>> +        Ok(chunk_store)
>>       }
>>   
>>       fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
>> @@ -442,6 +456,66 @@ impl ChunkStore {
>>           Ok(())
>>       }
>>   
>> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
>> +    ///
>> +    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
>> +    /// kernel timestamp granularity.
>> +    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
>> +    /// if a file change while testing is detected by differences in bith time or inode number.
>> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
>> +    /// the chunk store if not yet present.
>> +    /// Returns with error if the check could not be performed.
>> +    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
>> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
>> +        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;
>> +        let (path, _digest) = self.chunk_path(&digest);
>> +
>> +        // Take into account timestamp update granularity in the kernel
>> +        std::thread::sleep(Duration::from_secs(1));
> 
> If there's the need for a v6 I'd maybe add to the comment that this normally
> runs in a worker, so sleep blocking the whole thread is fine here.

Acked, will be included.

> Also, you write "sleep in-between stating and utimensat for 1 second", but
> this is sleeping before the stat + touch + stat calls, or am I overlooking
> something?

Yes, the wording is not correct anymore after iterating the patch 
series, as the order of execution changed a bit.
The first atime update happens in the insert chunk, then we read the 
metadata after sleeping for 1 second. This is to reduce the chance of 
another atime update happening from somewhere else in the mean time 
(suggested by Fabian in his previous review of the series).

> 
>> +        let metadata_before =
>> +            std::fs::metadata(&path).context(format!("failed to get metadata for {path:?}"))?;
>> +
>> +        // Second atime update if chunk pre-existed, insert_chunk already updates pre-existing ones
>> +        self.cond_touch_path(&path, true)?;
>> +
>> +        let metadata_now =
>> +            std::fs::metadata(&path).context(format!("failed to get metadata for {path:?}"))?;
> 
> Another tiny nit: maybe encode now/after in the context to be easier able
> to distinguish which stat syscall failed – while it's very unlikely that
> one succeeds and the other doesn't, it would not hurt to be able to tell
> that case apart.

Acked, will incorporate that.

> 
>> +
>> +        // Check for the unlikely case that the file changed in-between the
>> +        // two metadata calls, try to check once again on changed file
>> +        if metadata_before.ino() != metadata_now.ino() {
>> +            if retry_on_file_changed {
>> +                return self.check_fs_atime_updates(false);
>> +            }
>> +            bail!("chunk {path:?} changed twice during access time safety check, cannot proceed.");
>> +        }
>> +
>> +        if metadata_before.accessed()? >= metadata_now.accessed()? {
>> +            let chunk_info_str = if pre_existing {
>> +                "pre-existing"
>> +            } else {
>> +                "newly inserted"
>> +            };
>> +            warn!("Chunk metadata was not correctly updated during access time safety check:");
>> +            info!(
>> +                "Timestamps before update: accessed {:?}, modified {:?}, created {:?}",
>> +                metadata_before.accessed().unwrap_or(UNIX_EPOCH),
>> +                metadata_before.modified().unwrap_or(UNIX_EPOCH),
>> +                metadata_before.created().unwrap_or(UNIX_EPOCH),
>> +            );
>> +            info!(
>> +                "Timestamps after update: accessed {:?}, modified {:?}, created {:?}",
>> +                metadata_now.accessed().unwrap_or(UNIX_EPOCH),
>> +                metadata_now.modified().unwrap_or(UNIX_EPOCH),
>> +                metadata_now.created().unwrap_or(UNIX_EPOCH),
>> +            );
>> +            bail!("access time safety check using {chunk_info_str} chunk failed, aborting GC!");
>> +        }
>> +
>> +        Ok(())
>> +    }
>> +
>>       pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
> 
> 
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 75c0c16ab..5558bb1ac 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1170,6 +1170,19 @@ impl DataStore {
>>                   upid: Some(upid.to_string()),
>>                   ..Default::default()
>>               };
>> +            let tuning: DatastoreTuning = serde_json::from_value(
>> +                DatastoreTuning::API_SCHEMA
>> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
>> +            )?;
>> +            if tuning.gc_atime_safety_check.unwrap_or(true) {
>> +                self.inner
>> +                    .chunk_store
>> +                    .check_fs_atime_updates(true)
>> +                    .map_err(|err| format_err!("atime safety check failed - {err:#}"))?;
>> +                info!("Access time safety check successful, proceeding with GC.");
>> +            } else {
>> +                info!("Filesystem atime safety check disabled by datastore tuning options.");
> 
> Sounds IMO scarier than what this is, i.e. like the whole GC would not
> be safe anymore, or would not use atime safety checks anymore.
> 
> I'd rather either drop that or reword it to something else, maybe just doing
> s/safety/update/ would be enough.

Agreed, will replace as suggested.

>> +            }
>>   
>>               info!("Start GC phase1 (mark used chunks)");
>>   
>> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
>> index fe3260f6d..35847fc45 100644
>> --- a/src/api2/config/datastore.rs
>> +++ b/src/api2/config/datastore.rs
>> @@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
>>                   backup_user.uid,
>>                   backup_user.gid,
>>                   tuning.sync_level.unwrap_or_default(),
>> +                tuning.gc_atime_safety_check.unwrap_or(true),
>>               )
>>               .map(|_| ())
>>           } else {



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-19  9:13           ` Thomas Lamprecht
@ 2025-03-19  9:25             ` Christian Ebner
  0 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-19  9:25 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 3/19/25 10:13, Thomas Lamprecht wrote:
> I think we mean the same, but just to be sure: It should not need to be
> excluded as those are just two different types, as mentioned one is for
> instants on a calendar and the other is for durations.

Yeah, we are on the same page here. What I meant is excluded as in 
excluded as valid user input.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored
  2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (7 preceding siblings ...)
  2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 8/8] docs: mention gc-atime-cutoff as " Christian Ebner
@ 2025-03-19 17:26 ` Christian Ebner
  8 siblings, 0 replies; 20+ messages in thread
From: Christian Ebner @ 2025-03-19 17:26 UTC (permalink / raw)
  To: pbs-devel

superseded-by version 6:
https://lore.proxmox.com/pbs-devel/20250319172432.563885-1-c.ebner@proxmox.com/T/


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-03-19 17:26 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-06 14:52 [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
2025-03-19  8:10   ` Thomas Lamprecht
2025-03-19  8:48     ` Christian Ebner
2025-03-19  9:01       ` Thomas Lamprecht
2025-03-19  9:08         ` Christian Ebner
2025-03-19  9:13           ` Thomas Lamprecht
2025-03-19  9:25             ` Christian Ebner
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
2025-03-19  8:45   ` Thomas Lamprecht
2025-03-19  9:22     ` Christian Ebner
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
2025-03-19  8:45   ` Thomas Lamprecht
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 5/8] docs: mention GC atime update check for " Christian Ebner
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 6/8] datastore: use custom GC atime cutoff if set Christian Ebner
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
2025-03-19  8:49   ` Thomas Lamprecht
2025-03-06 14:52 ` [pbs-devel] [PATCH v5 proxmox-backup 8/8] docs: mention gc-atime-cutoff as " Christian Ebner
2025-03-19 17:26 ` [pbs-devel] [PATCH v5 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal