public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored
@ 2025-03-05 15:14 Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

These patches add a check to phase 1 of garbage collection and datastore
creation in order to detect when the filesystem backing the chunk store
does not honor atime updates. This avoids possible data loss for situations
where garbage collection could otherwise delete chunks still referenced by
a backup snaphost's index file.

The check is performed on a fixed size 4 MiB unencrypted and compressed
chunk of all-zeros, inserted if not present yet.
The Linux kernel timestamp granularity is taken into account by sleeping
for 1 second to avoid discarded atime update attempts by utimensat calls.
The test is enabled by default, but an opt-out option can be set via the
datastore tuning parameters for backwards compatibility.

Further, add a datastore tuning parameter to reduce the wait period for
chunk removal in phase 2 of garbage collection. Make this conditional on
the atime update check being enabled and successful, to avoid possible
data loss.

Most notable changes sice version 3 (thanks Fabian for feedback):
- Drop check for relatime like behaviour, as this is not supported and
  does not show up in any of the tests performed on btrfs, cephfs, ext4,
  NFS3, NFS4, ntfs, SMB3_11, xfs or ZFS.
- Additionally check chunk inode to detect possible but very unlikely
  file changes, perform check once again in that case.
- Move atime cutoff selection and min_atime calculation to the same
  location, as they are logically related.

Most notable changes sice version 2 (thanks Fabian and Thomas for
comments and suggestions):
- Take into account Linux timestamp granularity, do not set timestamp
  to the past, as that introduces other error paths such as lack of
  permissions or fs limitations.
- Check relatime behavior, if atime behaviour is not honored. Fallback
  to original cutoff in that case.
- Adapt tuning parameter names.

Most notable changes sice version 1 (thanks Fabian and Thomas for
comments and suggestions):
- Optimize check by using the all zero chunk
- Enable the check by default and fail GC job if not honored, but allow
  to opt-out
- Add GC wait period tuning option

Link to the issue in the bugtracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=5982

proxmox:

Christian Ebner (2):
  pbs api types: add garbage collection atime safety check flag
  pbs api types: add option to set GC chunk cleanup atime cutoff

 pbs-api-types/src/datastore.rs | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

proxmox-backup:

Christian Ebner (6):
  fix #5982: garbage collection: check atime updates are honored
  ui: expose GC atime safety check flag in datastore tuning options
  docs: mention GC atime update check for tuning options
  datastore: conditionally use custom GC atime cutoff if set
  ui: expose GC atime cutoff in datastore tuning option
  docs: mention gc-atime-cutoff as datastore tuning option

 docs/storage.rst                 | 19 ++++++-
 pbs-datastore/src/chunk_store.rs | 97 +++++++++++++++++++++++++++-----
 pbs-datastore/src/datastore.rs   | 37 +++++++++++-
 src/api2/config/datastore.rs     |  1 +
 www/Utils.js                     |  9 +++
 www/datastore/OptionView.js      | 17 ++++++
 6 files changed, 162 insertions(+), 18 deletions(-)

-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 1/8] pbs api types: add garbage collection atime safety check flag
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Add the `gc-atime-safety-check` flag to the datastore tuning
parameters. This flag allows to enable/disable a check to detect if
the atime update is honored by the filesystem backing the chunk store
during phase 1 of garbage collection and during datastore creation.

The default is to perform the check.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- no changes

 pbs-api-types/src/datastore.rs | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index ddd8d3c6..1a4d9e2d 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -229,6 +229,14 @@ pub enum DatastoreFSyncLevel {
             type: ChunkOrder,
             optional: true,
         },
+        "gc-atime-safety-check": {
+            description:
+                "Check filesystem atime updates are honored during store creation and garbage \
+                collection",
+            optional: true,
+            default: true,
+            type: bool,
+        },
     },
 )]
 #[derive(Serialize, Deserialize, Default)]
@@ -240,6 +248,8 @@ pub struct DatastoreTuning {
     pub chunk_order: Option<ChunkOrder>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub sync_level: Option<DatastoreFSyncLevel>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub gc_atime_safety_check: Option<bool>,
 }
 
 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Add the `gc-atime-cutoff` option to the datastore tuning parameters.
This allows to specify the time after which the chunks are not
considered in use anymore if their atime has not been updated since
then. This option is only considered, if the `gc-atime-safety-check`
is enabled, to avoid potential data loss.

The default is to keep chunks within the 24h 5m timespan (given no
active writers).

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- no changes

 pbs-api-types/src/datastore.rs | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 1a4d9e2d..467417e8 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -223,6 +223,15 @@ pub enum DatastoreFSyncLevel {
     Filesystem,
 }
 
+pub const GC_ATIME_CUTOFF_SCHEMA: Schema = IntegerSchema::new(
+    "Cutoff (in minutes) for chunk cleanup atime check in garbage collection phase 2 \
+        (default 24h 5m)",
+)
+.minimum(1) // safety margin for kernel timestamp granularity, but stay within minute range
+.maximum(2880) // 2 days
+.default(1445)
+.schema();
+
 #[api(
     properties: {
         "chunk-order": {
@@ -237,6 +246,10 @@ pub enum DatastoreFSyncLevel {
             default: true,
             type: bool,
         },
+        "gc-atime-cutoff": {
+            schema: GC_ATIME_CUTOFF_SCHEMA,
+            optional: true,
+        },
     },
 )]
 #[derive(Serialize, Deserialize, Default)]
@@ -250,6 +263,8 @@ pub struct DatastoreTuning {
     pub sync_level: Option<DatastoreFSyncLevel>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub gc_atime_safety_check: Option<bool>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub gc_atime_cutoff: Option<usize>,
 }
 
 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-06 11:02   ` Fabian Grünbichler
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Check if the filesystem backing the chunk store actually updates the
atime to avoid potential data loss in phase 2 of garbage collection,
in case the atime update is not honored.

Perform the check before phase 1 of garbage collection, as well as
on datastore creation. The latter to early detect and disallow
datastore creation on filesystem configurations which otherwise most
likely would lead to data losses.

Enable the atime update check by default, but allow to opt-out by
setting a datastore tuning parameter flag for backwards compatibility.
This is honored by both, garbage collection and datastore creation.

The check uses a 4 MiB fixed sized, unencypted and compressed chunk
as test marker, inserted if not present. This all zero-chunk is very
likely anyways for unencrypted backup contents with large all-zero
regions using fixed size chunking (e.g. VMs).

To avoid cases were the timestamp will not be updated because of the
Linux kernels timestamp granularity, sleep in-between stating and
utimensat for 1 second.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- Drop check for relatime like behaviour as all tested filesystem (see
  coverletter) honor the direct atime update.
- Reduce risk of races by moving the sleep to be before the first
  metadata call.
- Log also if the test chunk was newly inserted or pre-existed.

 pbs-datastore/src/chunk_store.rs | 87 ++++++++++++++++++++++++++++++--
 pbs-datastore/src/datastore.rs   |  9 ++++
 src/api2/config/datastore.rs     |  1 +
 3 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 5e02909a1..e482330b3 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -1,6 +1,8 @@
+use std::os::unix::fs::MetadataExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, Mutex};
+use std::time::Duration;
 
 use anyhow::{bail, format_err, Error};
 use tracing::info;
@@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
 };
 use proxmox_worker_task::WorkerTaskContext;
 
+use crate::data_blob::DataChunkBuilder;
 use crate::file_formats::{
     COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
 };
@@ -93,6 +96,7 @@ impl ChunkStore {
         uid: nix::unistd::Uid,
         gid: nix::unistd::Gid,
         sync_level: DatastoreFSyncLevel,
+        atime_safety_check: bool,
     ) -> Result<Self, Error>
     where
         P: Into<PathBuf>,
@@ -147,7 +151,14 @@ impl ChunkStore {
             }
         }
 
-        Self::open(name, base, sync_level)
+        let chunk_store = Self::open(name, base, sync_level)?;
+        if atime_safety_check {
+            chunk_store.check_fs_atime_updates(true)?;
+        } else {
+            info!("atime safety check skipped.");
+        }
+
+        Ok(chunk_store)
     }
 
     fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
@@ -442,6 +453,59 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Check if atime updates are honored by the filesystem backing the chunk store.
+    ///
+    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
+    /// kernel timestamp granularity.
+    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
+    /// if a file change while testing is detected by differences in bith time or inode number.
+    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
+    /// the chunk store if not yet present.
+    /// Returns with error if the check could not be performed.
+    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
+        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
+        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;
+        let (path, _digest) = self.chunk_path(&digest);
+
+        // Take into account timestamp update granularity in the kernel
+        std::thread::sleep(Duration::from_secs(1));
+
+        let metadata_before = std::fs::metadata(&path).map_err(Error::from)?;
+
+        // Second atime update if chunk was newly inserted above
+        self.cond_touch_path(&path, true)?;
+
+        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
+
+        // Check for the unlikely case that the file changed in-between the
+        // two metadata calls, try to check once again on changed file
+        if metadata_before.ino() != metadata_now.ino() {
+            if retry_on_file_changed {
+                return self.check_fs_atime_updates(false);
+            }
+            bail!("chunk file changed twice during atime testing, cannot proceed.");
+        }
+
+        match (
+            metadata_before.accessed()? < metadata_now.accessed()?,
+            pre_existing,
+        ) {
+            (true, false) => info!("Access time safety check successful (inserted chunk for test)."),
+            (true, true) => {
+                info!("Access time safety check successful (used pre-existing chunk for test).")
+            }
+            (false, false) => bail!(
+                "Access time safety check failed (inserted chunk for test), is atime support \
+                enabled on datastore backing filesystem?"
+            ),
+            (false, true) => bail!(
+                "Access time safety check failed (used pre-existing chunk for test), is atime \
+                support enabled on datastore backing filesystem?"
+            ),
+        }
+        Ok(())
+    }
+
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -628,8 +692,15 @@ fn test_chunk_store1() {
     let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
         .unwrap()
         .unwrap();
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        true,
+    )
+    .unwrap();
 
     let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
         .build()
@@ -641,8 +712,14 @@ fn test_chunk_store1() {
     let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
     assert!(exists);
 
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        true,
+    );
     assert!(chunk_store.is_err());
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 75c0c16ab..b62ddc172 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1170,6 +1170,15 @@ impl DataStore {
                 upid: Some(upid.to_string()),
                 ..Default::default()
             };
+            let tuning: DatastoreTuning = serde_json::from_value(
+                DatastoreTuning::API_SCHEMA
+                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
+            )?;
+            if tuning.gc_atime_safety_check.unwrap_or(true) {
+                self.inner.chunk_store.check_fs_atime_updates(true)?;
+            } else {
+                info!("Filesystem atime safety check disabled by datastore tuning options.");
+            }
 
             info!("Start GC phase1 (mark used chunks)");
 
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index fe3260f6d..35847fc45 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
                 backup_user.uid,
                 backup_user.gid,
                 tuning.sync_level.unwrap_or_default(),
+                tuning.gc_atime_safety_check.unwrap_or(true),
             )
             .map(|_| ())
         } else {
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (2 preceding siblings ...)
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 5/8] docs: mention GC atime update check for " Christian Ebner
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Allow to edit the atime safety check flag via the datastore tuning
options edit window.

Do not expose the flag for datastore creation as it is strongly
discouraged to create datastores on filesystems not correctly handling
atime updates as the garbage collection expects. It is nevertheless
still possible to create a datastore via the cli and pass in the
`--tuning gc-atime-safety-check=false` option.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- no changes

 www/Utils.js                | 4 ++++
 www/datastore/OptionView.js | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/www/Utils.js b/www/Utils.js
index 2746ef0b5..9bd7e1615 100644
--- a/www/Utils.js
+++ b/www/Utils.js
@@ -846,6 +846,10 @@ Ext.define('PBS.Utils', {
 	sync = PBS.Utils.tuningOptions['sync-level'][sync ?? '__default__'];
 	options.push(`${gettext('Sync Level')}: ${sync}`);
 
+	let gc_atime_safety_check = tuning['gc-atime-safety-check'];
+	delete tuning['gc-atime-safety-check'];
+	options.push(`${gettext('GC Access Time Safety Check')}: ${gc_atime_safety_check ?? true}`);
+
 	for (const [k, v] of Object.entries(tuning)) {
 	    options.push(`${k}: ${v}`);
 	}
diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
index e1f38af6f..9ce3ce388 100644
--- a/www/datastore/OptionView.js
+++ b/www/datastore/OptionView.js
@@ -271,6 +271,14 @@ Ext.define('PBS.Datastore.Options', {
 			    deleteEmpty: true,
 			    value: '__default__',
 			},
+			{
+			    xtype: 'proxmoxcheckbox',
+			    name: 'gc-atime-safety-check',
+			    fieldLabel: gettext('GC Access Time Safety Check'),
+			    defaultValue: 1,
+			    uncheckedValue: 0,
+			    deleteDefaultValue: true,
+			},
 		    ],
 		},
 	    },
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 5/8] docs: mention GC atime update check for tuning options
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (3 preceding siblings ...)
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set Christian Ebner
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Document the gc-atime-check flag and describe the behavior it
controls, by adding it as additional bullet point to the documented
datastore tuning options.

This also fixes the intendation for the cli example how to set the
sync level, to make it clear that still belongs to the previous
bullet point.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- no changes

 docs/storage.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/storage.rst b/docs/storage.rst
index 490302955..46faed840 100644
--- a/docs/storage.rst
+++ b/docs/storage.rst
@@ -435,9 +435,16 @@ There are some tuning related options for the datastore that are more advanced:
 
   This can be set with:
 
-.. code-block:: console
+  .. code-block:: console
+
+    # proxmox-backup-manager datastore update <storename> --tuning 'sync-level=filesystem'
 
-  # proxmox-backup-manager datastore update <storename> --tuning 'sync-level=filesystem'
+* ``gc-atime-safety-check``: Datastore GC atime update safety check:
+  You can explicitly `enable` or `disable` the atime update safety check
+  performed on datastore creation and garbage collection. This checks if atime
+  updates are handled as expected by garbage collection and therefore avoids the
+  risk of data loss by unexpected filesystem behaviour. It is recommended to set
+  this to enabled, which is also the default value.
 
 If you want to set multiple tuning options simultaneously, you can separate them
 with a comma, like this:
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (4 preceding siblings ...)
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 5/8] docs: mention GC atime update check for " Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-06 11:00   ` Fabian Grünbichler
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Use the user configured atime cutoff over the default 24h 5m
margin if explicitly set, but only if the atime safety check is
enabled and succeeded. If the atime safety check is not enabled,
fallback to use the current default.

Move the minimum atime calculation based on the atime cutoff to the
sweep_unused_chunks() callside and pass in the calculated values, as
to have the logic in the same place.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- move min_atime calculation to the gc atime cutoff logic, pass the
  min_atime to sweep_unused_chunks()

 pbs-datastore/src/chunk_store.rs | 10 +---------
 pbs-datastore/src/datastore.rs   | 30 ++++++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index e482330b3..0b211a44c 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -364,7 +364,7 @@ impl ChunkStore {
     pub fn sweep_unused_chunks(
         &self,
         oldest_writer: i64,
-        phase1_start_time: i64,
+        min_atime: i64,
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
     ) -> Result<(), Error> {
@@ -374,14 +374,6 @@ impl ChunkStore {
         use nix::sys::stat::fstatat;
         use nix::unistd::{unlinkat, UnlinkatFlags};
 
-        let mut min_atime = phase1_start_time - 3600 * 24; // at least 24h (see mount option relatime)
-
-        if oldest_writer < min_atime {
-            min_atime = oldest_writer;
-        }
-
-        min_atime -= 300; // add 5 mins gap for safety
-
         let mut last_percentage = 0;
         let mut chunk_count = 0;
 
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index b62ddc172..16767832e 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1174,12 +1174,38 @@ impl DataStore {
                 DatastoreTuning::API_SCHEMA
                     .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
             )?;
-            if tuning.gc_atime_safety_check.unwrap_or(true) {
+            let gc_atime_safety_check = tuning.gc_atime_safety_check.unwrap_or(true);
+            if gc_atime_safety_check {
                 self.inner.chunk_store.check_fs_atime_updates(true)?;
             } else {
                 info!("Filesystem atime safety check disabled by datastore tuning options.");
+            };
+
+            let mut min_atime = tuning
+                .gc_atime_cutoff
+                .and_then(|cutoff| {
+                    if gc_atime_safety_check {
+                        info!("Using GC atime cutoff {cutoff}m.");
+                        // Account for the 5 min default offset subtracted below
+                        Some(phase1_start_time + 300 - cutoff as i64 * 60)
+                    } else {
+                        warn!(
+                            "Ignoring GC atime cutoff of {cutoff}m since atime check is disabled."
+                        );
+                        None
+                    }
+                })
+                .unwrap_or_else(|| {
+                    info!("Using default GC atime cutoff of 24h 5m");
+                    phase1_start_time - 3600 * 24
+                });
+
+            if oldest_writer < min_atime {
+                min_atime = oldest_writer;
             }
 
+            min_atime -= 300; // add 5 mins gap for safety
+
             info!("Start GC phase1 (mark used chunks)");
 
             self.mark_used_chunks(&mut gc_status, worker)?;
@@ -1187,7 +1213,7 @@ impl DataStore {
             info!("Start GC phase2 (sweep unused chunks)");
             self.inner.chunk_store.sweep_unused_chunks(
                 oldest_writer,
-                phase1_start_time,
+                min_atime,
                 &mut gc_status,
                 worker,
             )?;
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (5 preceding siblings ...)
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 8/8] docs: mention gc-atime-cutoff as " Christian Ebner
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Allows to set the atime cutoff for phase 2 of garbage collection in
the datastores tuning parameters. This value changes the time after
which a chunk is not considered in use anymore if it falls outside of
the cutoff window.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- no changes

 www/Utils.js                | 5 +++++
 www/datastore/OptionView.js | 9 +++++++++
 2 files changed, 14 insertions(+)

diff --git a/www/Utils.js b/www/Utils.js
index 9bd7e1615..13b5eceda 100644
--- a/www/Utils.js
+++ b/www/Utils.js
@@ -850,6 +850,11 @@ Ext.define('PBS.Utils', {
 	delete tuning['gc-atime-safety-check'];
 	options.push(`${gettext('GC Access Time Safety Check')}: ${gc_atime_safety_check ?? true}`);
 
+	let gc_atime_cutoff = tuning['gc-atime-cutoff'];
+	delete tuning['gc-atime-cutoff'];
+	gc_atime_cutoff = gc_atime_cutoff ?? '1445';
+	options.push(`${gettext('GC Access Time Cutoff')}: ${gc_atime_cutoff}m`);
+
 	for (const [k, v] of Object.entries(tuning)) {
 	    options.push(`${k}: ${v}`);
 	}
diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
index 9ce3ce388..9e8dcc4fb 100644
--- a/www/datastore/OptionView.js
+++ b/www/datastore/OptionView.js
@@ -279,6 +279,15 @@ Ext.define('PBS.Datastore.Options', {
 			    uncheckedValue: 0,
 			    deleteDefaultValue: true,
 			},
+			{
+			    xtype: 'proxmoxintegerfield',
+			    emptyText: Proxmox.Utils.defaultText,
+			    name: 'gc-atime-cutoff',
+			    minValue: 1,
+			    maxValue: 2880,
+			    fieldLabel: gettext('GC Access Time Cutoff (min)'),
+			    deleteEmpty: true,
+			},
 		    ],
 		},
 	    },
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [pbs-devel] [PATCH v4 proxmox-backup 8/8] docs: mention gc-atime-cutoff as datastore tuning option
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (6 preceding siblings ...)
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
@ 2025-03-05 15:14 ` Christian Ebner
  2025-03-05 15:23 ` [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
  2025-03-06 14:54 ` Christian Ebner
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:14 UTC (permalink / raw)
  To: pbs-devel

Document the gc-atime-cutoff option and describe the behavior it
controls, by adding it as additional bullet point to the
documented datastore tuning options.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 3:
- no changes

 docs/storage.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/storage.rst b/docs/storage.rst
index 46faed840..047cb3e59 100644
--- a/docs/storage.rst
+++ b/docs/storage.rst
@@ -446,6 +446,12 @@ There are some tuning related options for the datastore that are more advanced:
   risk of data loss by unexpected filesystem behaviour. It is recommended to set
   this to enabled, which is also the default value.
 
+* ``gc-atime-cutoff``: Datastore GC atime cutoff for chunk cleanup:
+  This allows to set the cutoff for which a chunk is still considered in-use
+  during phase 2 of garbage collection (given no older writers). If the
+  ``atime`` of the chunk is outside the range, it will be removed. This option
+  is only used in combination with the ``gc-atime-safety-check`` being enabled.
+
 If you want to set multiple tuning options simultaneously, you can separate them
 with a comma, like this:
 
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (7 preceding siblings ...)
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 8/8] docs: mention gc-atime-cutoff as " Christian Ebner
@ 2025-03-05 15:23 ` Christian Ebner
  2025-03-06 14:54 ` Christian Ebner
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-05 15:23 UTC (permalink / raw)
  To: pbs-devel

Please note, patch 1 and 2 belong to proxmox, not proxmox-backup of course.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set Christian Ebner
@ 2025-03-06 11:00   ` Fabian Grünbichler
  2025-03-06 11:19     ` Christian Ebner
  0 siblings, 1 reply; 16+ messages in thread
From: Fabian Grünbichler @ 2025-03-06 11:00 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On March 5, 2025 4:14 pm, Christian Ebner wrote:
> Use the user configured atime cutoff over the default 24h 5m
> margin if explicitly set, but only if the atime safety check is
> enabled and succeeded. If the atime safety check is not enabled,
> fallback to use the current default.
> 
> Move the minimum atime calculation based on the atime cutoff to the
> sweep_unused_chunks() callside and pass in the calculated values, as
> to have the logic in the same place.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 3:
> - move min_atime calculation to the gc atime cutoff logic, pass the
>   min_atime to sweep_unused_chunks()
> 
>  pbs-datastore/src/chunk_store.rs | 10 +---------
>  pbs-datastore/src/datastore.rs   | 30 ++++++++++++++++++++++++++++--
>  2 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index e482330b3..0b211a44c 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -364,7 +364,7 @@ impl ChunkStore {
>      pub fn sweep_unused_chunks(
>          &self,
>          oldest_writer: i64,
> -        phase1_start_time: i64,
> +        min_atime: i64,
>          status: &mut GarbageCollectionStatus,
>          worker: &dyn WorkerTaskContext,
>      ) -> Result<(), Error> {
> @@ -374,14 +374,6 @@ impl ChunkStore {
>          use nix::sys::stat::fstatat;
>          use nix::unistd::{unlinkat, UnlinkatFlags};
>  
> -        let mut min_atime = phase1_start_time - 3600 * 24; // at least 24h (see mount option relatime)
> -
> -        if oldest_writer < min_atime {
> -            min_atime = oldest_writer;
> -        }
> -
> -        min_atime -= 300; // add 5 mins gap for safety
> -
>          let mut last_percentage = 0;
>          let mut chunk_count = 0;
>  
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index b62ddc172..16767832e 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1174,12 +1174,38 @@ impl DataStore {
>                  DatastoreTuning::API_SCHEMA
>                      .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
>              )?;
> -            if tuning.gc_atime_safety_check.unwrap_or(true) {
> +            let gc_atime_safety_check = tuning.gc_atime_safety_check.unwrap_or(true);
> +            if gc_atime_safety_check {
>                  self.inner.chunk_store.check_fs_atime_updates(true)?;
>              } else {
>                  info!("Filesystem atime safety check disabled by datastore tuning options.");
> +            };
> +
> +            let mut min_atime = tuning
> +                .gc_atime_cutoff
> +                .and_then(|cutoff| {
> +                    if gc_atime_safety_check {
> +                        info!("Using GC atime cutoff {cutoff}m.");
> +                        // Account for the 5 min default offset subtracted below
> +                        Some(phase1_start_time + 300 - cutoff as i64 * 60)
> +                    } else {
> +                        warn!(
> +                            "Ignoring GC atime cutoff of {cutoff}m since atime check is disabled."

should these be tied together like this? the other way round I can see
at some point in the future.. (lowering the implicit default if the
check is enabled), but why should a non-default cutoff be ignored
because the check is also explicitly disabled?

> +                        );
> +                        None
> +                    }
> +                })
> +                .unwrap_or_else(|| {
> +                    info!("Using default GC atime cutoff of 24h 5m");
> +                    phase1_start_time - 3600 * 24
> +                });

the logging here is a bit confusing.. does it really matter whether the
cutoff is default or explicit?

> +
> +            if oldest_writer < min_atime {
> +                min_atime = oldest_writer;

and if we log the above things, shouldn't we also log this?

Maybe something like:

GC atime cutoff XX minutes / <human readable min_atime timestamp>
[Older backup writer started at <human readable timestamp> detected, extending cutoff to <timestamp>]

>              }
>  
> +            min_atime -= 300; // add 5 mins gap for safety
> +
>              info!("Start GC phase1 (mark used chunks)");
>  
>              self.mark_used_chunks(&mut gc_status, worker)?;
> @@ -1187,7 +1213,7 @@ impl DataStore {
>              info!("Start GC phase2 (sweep unused chunks)");
>              self.inner.chunk_store.sweep_unused_chunks(
>                  oldest_writer,
> -                phase1_start_time,
> +                min_atime,
>                  &mut gc_status,
>                  worker,
>              )?;
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
@ 2025-03-06 11:02   ` Fabian Grünbichler
  2025-03-06 11:30     ` Christian Ebner
  0 siblings, 1 reply; 16+ messages in thread
From: Fabian Grünbichler @ 2025-03-06 11:02 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On March 5, 2025 4:14 pm, Christian Ebner wrote:
> Check if the filesystem backing the chunk store actually updates the
> atime to avoid potential data loss in phase 2 of garbage collection,
> in case the atime update is not honored.
> 
> Perform the check before phase 1 of garbage collection, as well as
> on datastore creation. The latter to early detect and disallow
> datastore creation on filesystem configurations which otherwise most
> likely would lead to data losses.
> 
> Enable the atime update check by default, but allow to opt-out by
> setting a datastore tuning parameter flag for backwards compatibility.
> This is honored by both, garbage collection and datastore creation.
> 
> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
> as test marker, inserted if not present. This all zero-chunk is very
> likely anyways for unencrypted backup contents with large all-zero
> regions using fixed size chunking (e.g. VMs).
> 
> To avoid cases were the timestamp will not be updated because of the
> Linux kernels timestamp granularity, sleep in-between stating and
> utimensat for 1 second.
> 
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 3:
> - Drop check for relatime like behaviour as all tested filesystem (see
>   coverletter) honor the direct atime update.
> - Reduce risk of races by moving the sleep to be before the first
>   metadata call.
> - Log also if the test chunk was newly inserted or pre-existed.
> 
>  pbs-datastore/src/chunk_store.rs | 87 ++++++++++++++++++++++++++++++--
>  pbs-datastore/src/datastore.rs   |  9 ++++
>  src/api2/config/datastore.rs     |  1 +
>  3 files changed, 92 insertions(+), 5 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 5e02909a1..e482330b3 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,6 +1,8 @@
> +use std::os::unix::fs::MetadataExt;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
> +use std::time::Duration;
>  
>  use anyhow::{bail, format_err, Error};
>  use tracing::info;
> @@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
>  };
>  use proxmox_worker_task::WorkerTaskContext;
>  
> +use crate::data_blob::DataChunkBuilder;
>  use crate::file_formats::{
>      COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>  };
> @@ -93,6 +96,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          sync_level: DatastoreFSyncLevel,
> +        atime_safety_check: bool,
>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -147,7 +151,14 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base, sync_level)
> +        let chunk_store = Self::open(name, base, sync_level)?;
> +        if atime_safety_check {
> +            chunk_store.check_fs_atime_updates(true)?;
> +        } else {
> +            info!("atime safety check skipped.");
> +        }
> +
> +        Ok(chunk_store)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -442,6 +453,59 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
> +    ///
> +    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
> +    /// kernel timestamp granularity.
> +    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
> +    /// if a file change while testing is detected by differences in bith time or inode number.
> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
> +    /// the chunk store if not yet present.
> +    /// Returns with error if the check could not be performed.
> +    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
> +        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;

this one logs the path if something goes wrong, which is nice in case
the problem is related to permissions/..

> +        let (path, _digest) = self.chunk_path(&digest);
> +
> +        // Take into account timestamp update granularity in the kernel
> +        std::thread::sleep(Duration::from_secs(1));
> +
> +        let metadata_before = std::fs::metadata(&path).map_err(Error::from)?;

but this one probably wouldn't, and would benefit from it?

> +
> +        // Second atime update if chunk was newly inserted above

nit: comment is inverted - it's the second atime *update* if the chunk
already existed. it's the first *update* if we inserted it ;)

> +        self.cond_touch_path(&path, true)?;

this one includes the path in errors :)

> +
> +        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;

this one might benefit from some context as well?

> +
> +        // Check for the unlikely case that the file changed in-between the
> +        // two metadata calls, try to check once again on changed file
> +        if metadata_before.ino() != metadata_now.ino() {
> +            if retry_on_file_changed {
> +                return self.check_fs_atime_updates(false);
> +            }
> +            bail!("chunk file changed twice during atime testing, cannot proceed.");

adding the path or digest here might help as well

> +        }
> +
> +        match (
> +            metadata_before.accessed()? < metadata_now.accessed()?,

error context as well ;)

> +            pre_existing,
> +        ) {
> +            (true, false) => info!("Access time safety check successful (inserted chunk for test)."),
> +            (true, true) => {
> +                info!("Access time safety check successful (used pre-existing chunk for test).")
> +            }
> +            (false, false) => bail!(
> +                "Access time safety check failed (inserted chunk for test), is atime support \
> +                enabled on datastore backing filesystem?"
> +            ),
> +            (false, true) => bail!(
> +                "Access time safety check failed (used pre-existing chunk for test), is atime \
> +                support enabled on datastore backing filesystem?"
> +            ),
> +        }

this is a bit excessive while at the same time not including the
interesting bits..

how about the following:

if metadata_before.accessed()? >= metadata_now.accessed()? {
    let chunk_info_str = if pre_existing { "pre-existing" } else { "newly inserted" };
    log::warn!("Chunk metadata was not correctly updated during atime test:");
    info!("Metadata before update: {metadata_before:?}");
    info!("Metadata after update: {metadata_now:?}");
    bail!("atime safety check using {chunk_info_str} chunk failed, aborting GC!");
}

info!("atime safety check successful, proceeding with GC.");


would look like this in the task log:

Chunk metadata was not correctly updated during atime test:
Metadata before update: Metadata { file_type: FileType { is_file: true, is_dir: false, is_symlink: false, .. }, permissions: Permissions(FilePermissions { mode: 0o100500 (-r-x------) }), len: 158, modified: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, accessed: SystemTime { tv_sec: 1741256689, tv_nsec: 18294604 }, created: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, .. }
Metadata after update: Metadata { file_type: FileType { is_file: true, is_dir: false, is_symlink: false, .. }, permissions: Permissions(FilePermissions { mode: 0o100500 (-r-x------) }), len: 158, modified: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, accessed: SystemTime { tv_sec: 1741256689, tv_nsec: 18294604 }, created: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, .. }
TASK ERROR: atime safety check using pre-existing chunk failed, aborting GC!

alternatively we could of course do our own pretty printing, or add `#`
to the format string to get:

Metadata before update:
Metadata {
    file_type: FileType {
        is_file: true,
        is_dir: false,
        is_symlink: false,
        ..
    },
    permissions: Permissions(
        FilePermissions {
            mode: 0o100500 (-r-x------),
        },
    ),
    len: 158,
    modified: SystemTime {
        tv_sec: 1673516596,
        tv_nsec: 65483144,
    },
    accessed: SystemTime {
        tv_sec: 1741256877,
        tv_nsec: 225020600,
    },
    created: SystemTime {
        tv_sec: 1673516596,
        tv_nsec: 65483144,
    },
    ..
}

I think it is important to get the logging right here because if
somebody runs into a failing check, they should report the relevant data
to us without the need to jump through hoops.

> +        Ok(())
> +    }
> +
>      pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
> @@ -628,8 +692,15 @@ fn test_chunk_store1() {
>      let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>          .unwrap()
>          .unwrap();
> -    let chunk_store =
> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        DatastoreFSyncLevel::None,
> +        true,
> +    )
> +    .unwrap();
>  
>      let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>          .build()
> @@ -641,8 +712,14 @@ fn test_chunk_store1() {
>      let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>      assert!(exists);
>  
> -    let chunk_store =
> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        DatastoreFSyncLevel::None,
> +        true,
> +    );
>      assert!(chunk_store.is_err());
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 75c0c16ab..b62ddc172 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1170,6 +1170,15 @@ impl DataStore {
>                  upid: Some(upid.to_string()),
>                  ..Default::default()
>              };
> +            let tuning: DatastoreTuning = serde_json::from_value(
> +                DatastoreTuning::API_SCHEMA
> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
> +            )?;
> +            if tuning.gc_atime_safety_check.unwrap_or(true) {
> +                self.inner.chunk_store.check_fs_atime_updates(true)?;
> +            } else {
> +                info!("Filesystem atime safety check disabled by datastore tuning options.");
> +            }
>  
>              info!("Start GC phase1 (mark used chunks)");
>  
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index fe3260f6d..35847fc45 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
>                  backup_user.uid,
>                  backup_user.gid,
>                  tuning.sync_level.unwrap_or_default(),
> +                tuning.gc_atime_safety_check.unwrap_or(true),
>              )
>              .map(|_| ())
>          } else {
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set
  2025-03-06 11:00   ` Fabian Grünbichler
@ 2025-03-06 11:19     ` Christian Ebner
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-06 11:19 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/6/25 12:00, Fabian Grünbichler wrote:
> On March 5, 2025 4:14 pm, Christian Ebner wrote:
>> Use the user configured atime cutoff over the default 24h 5m
>> margin if explicitly set, but only if the atime safety check is
>> enabled and succeeded. If the atime safety check is not enabled,
>> fallback to use the current default.
>>
>> Move the minimum atime calculation based on the atime cutoff to the
>> sweep_unused_chunks() callside and pass in the calculated values, as
>> to have the logic in the same place.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 3:
>> - move min_atime calculation to the gc atime cutoff logic, pass the
>>    min_atime to sweep_unused_chunks()
>>
>>   pbs-datastore/src/chunk_store.rs | 10 +---------
>>   pbs-datastore/src/datastore.rs   | 30 ++++++++++++++++++++++++++++--
>>   2 files changed, 29 insertions(+), 11 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index e482330b3..0b211a44c 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -364,7 +364,7 @@ impl ChunkStore {
>>       pub fn sweep_unused_chunks(
>>           &self,
>>           oldest_writer: i64,
>> -        phase1_start_time: i64,
>> +        min_atime: i64,
>>           status: &mut GarbageCollectionStatus,
>>           worker: &dyn WorkerTaskContext,
>>       ) -> Result<(), Error> {
>> @@ -374,14 +374,6 @@ impl ChunkStore {
>>           use nix::sys::stat::fstatat;
>>           use nix::unistd::{unlinkat, UnlinkatFlags};
>>   
>> -        let mut min_atime = phase1_start_time - 3600 * 24; // at least 24h (see mount option relatime)
>> -
>> -        if oldest_writer < min_atime {
>> -            min_atime = oldest_writer;
>> -        }
>> -
>> -        min_atime -= 300; // add 5 mins gap for safety
>> -
>>           let mut last_percentage = 0;
>>           let mut chunk_count = 0;
>>   
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index b62ddc172..16767832e 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1174,12 +1174,38 @@ impl DataStore {
>>                   DatastoreTuning::API_SCHEMA
>>                       .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
>>               )?;
>> -            if tuning.gc_atime_safety_check.unwrap_or(true) {
>> +            let gc_atime_safety_check = tuning.gc_atime_safety_check.unwrap_or(true);
>> +            if gc_atime_safety_check {
>>                   self.inner.chunk_store.check_fs_atime_updates(true)?;
>>               } else {
>>                   info!("Filesystem atime safety check disabled by datastore tuning options.");
>> +            };
>> +
>> +            let mut min_atime = tuning
>> +                .gc_atime_cutoff
>> +                .and_then(|cutoff| {
>> +                    if gc_atime_safety_check {
>> +                        info!("Using GC atime cutoff {cutoff}m.");
>> +                        // Account for the 5 min default offset subtracted below
>> +                        Some(phase1_start_time + 300 - cutoff as i64 * 60)
>> +                    } else {
>> +                        warn!(
>> +                            "Ignoring GC atime cutoff of {cutoff}m since atime check is disabled."
> 
> should these be tied together like this? the other way round I can see
> at some point in the future.. (lowering the implicit default if the
> check is enabled), but why should a non-default cutoff be ignored
> because the check is also explicitly disabled?

Yes, as stated in the commit message my intention here is to fallback to 
the current default if the check is explicitly disabled. So some form of 
legacy mode if you will.
But now that you mention this, the other way around sounds more like 
what is needed, agreed. So let's not make them depend on each other.

> 
>> +                        );
>> +                        None
>> +                    }
>> +                })
>> +                .unwrap_or_else(|| {
>> +                    info!("Using default GC atime cutoff of 24h 5m");
>> +                    phase1_start_time - 3600 * 24
>> +                });
> 
> the logging here is a bit confusing.. does it really matter whether the
> cutoff is default or explicit?

Does not really mattern, no. It covered both cases above, default and 
reset because the check flag not set. But since they will be no longer 
intedependent, simply logging the used cutoff as such should be fine.

> 
>> +
>> +            if oldest_writer < min_atime {
>> +                min_atime = oldest_writer;
> 
> and if we log the above things, shouldn't we also log this?
> 
> Maybe something like:
> 
> GC atime cutoff XX minutes / <human readable min_atime timestamp>
> [Older backup writer started at <human readable timestamp> detected, extending cutoff to <timestamp>]

Acked, will add that as well.

> 
>>               }
>>   
>> +            min_atime -= 300; // add 5 mins gap for safety
>> +
>>               info!("Start GC phase1 (mark used chunks)");
>>   
>>               self.mark_used_chunks(&mut gc_status, worker)?;
>> @@ -1187,7 +1213,7 @@ impl DataStore {
>>               info!("Start GC phase2 (sweep unused chunks)");
>>               self.inner.chunk_store.sweep_unused_chunks(
>>                   oldest_writer,
>> -                phase1_start_time,
>> +                min_atime,
>>                   &mut gc_status,
>>                   worker,
>>               )?;
>> -- 
>> 2.39.5
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-06 11:02   ` Fabian Grünbichler
@ 2025-03-06 11:30     ` Christian Ebner
  2025-03-06 12:02       ` Fabian Grünbichler
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Ebner @ 2025-03-06 11:30 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/6/25 12:02, Fabian Grünbichler wrote:
> On March 5, 2025 4:14 pm, Christian Ebner wrote:
>> Check if the filesystem backing the chunk store actually updates the
>> atime to avoid potential data loss in phase 2 of garbage collection,
>> in case the atime update is not honored.
>>
>> Perform the check before phase 1 of garbage collection, as well as
>> on datastore creation. The latter to early detect and disallow
>> datastore creation on filesystem configurations which otherwise most
>> likely would lead to data losses.
>>
>> Enable the atime update check by default, but allow to opt-out by
>> setting a datastore tuning parameter flag for backwards compatibility.
>> This is honored by both, garbage collection and datastore creation.
>>
>> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
>> as test marker, inserted if not present. This all zero-chunk is very
>> likely anyways for unencrypted backup contents with large all-zero
>> regions using fixed size chunking (e.g. VMs).
>>
>> To avoid cases were the timestamp will not be updated because of the
>> Linux kernels timestamp granularity, sleep in-between stating and
>> utimensat for 1 second.
>>
>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 3:
>> - Drop check for relatime like behaviour as all tested filesystem (see
>>    coverletter) honor the direct atime update.
>> - Reduce risk of races by moving the sleep to be before the first
>>    metadata call.
>> - Log also if the test chunk was newly inserted or pre-existed.
>>
>>   pbs-datastore/src/chunk_store.rs | 87 ++++++++++++++++++++++++++++++--
>>   pbs-datastore/src/datastore.rs   |  9 ++++
>>   src/api2/config/datastore.rs     |  1 +
>>   3 files changed, 92 insertions(+), 5 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 5e02909a1..e482330b3 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -1,6 +1,8 @@
>> +use std::os::unix::fs::MetadataExt;
>>   use std::os::unix::io::AsRawFd;
>>   use std::path::{Path, PathBuf};
>>   use std::sync::{Arc, Mutex};
>> +use std::time::Duration;
>>   
>>   use anyhow::{bail, format_err, Error};
>>   use tracing::info;
>> @@ -13,6 +15,7 @@ use proxmox_sys::process_locker::{
>>   };
>>   use proxmox_worker_task::WorkerTaskContext;
>>   
>> +use crate::data_blob::DataChunkBuilder;
>>   use crate::file_formats::{
>>       COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>>   };
>> @@ -93,6 +96,7 @@ impl ChunkStore {
>>           uid: nix::unistd::Uid,
>>           gid: nix::unistd::Gid,
>>           sync_level: DatastoreFSyncLevel,
>> +        atime_safety_check: bool,
>>       ) -> Result<Self, Error>
>>       where
>>           P: Into<PathBuf>,
>> @@ -147,7 +151,14 @@ impl ChunkStore {
>>               }
>>           }
>>   
>> -        Self::open(name, base, sync_level)
>> +        let chunk_store = Self::open(name, base, sync_level)?;
>> +        if atime_safety_check {
>> +            chunk_store.check_fs_atime_updates(true)?;
>> +        } else {
>> +            info!("atime safety check skipped.");
>> +        }
>> +
>> +        Ok(chunk_store)
>>       }
>>   
>>       fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
>> @@ -442,6 +453,59 @@ impl ChunkStore {
>>           Ok(())
>>       }
>>   
>> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
>> +    ///
>> +    /// Checks if the atime is always updated by utimensat taking into consideration the Linux
>> +    /// kernel timestamp granularity.
>> +    /// If `retry_on_file_changed` is set to true, the check is performed again on the changed file
>> +    /// if a file change while testing is detected by differences in bith time or inode number.
>> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
>> +    /// the chunk store if not yet present.
>> +    /// Returns with error if the check could not be performed.
>> +    pub fn check_fs_atime_updates(&self, retry_on_file_changed: bool) -> Result<(), Error> {
>> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
>> +        let (pre_existing, _) = self.insert_chunk(&zero_chunk, &digest)?;
> 
> this one logs the path if something goes wrong, which is nice in case
> the problem is related to permissions/..
> 
>> +        let (path, _digest) = self.chunk_path(&digest);
>> +
>> +        // Take into account timestamp update granularity in the kernel
>> +        std::thread::sleep(Duration::from_secs(1));
>> +
>> +        let metadata_before = std::fs::metadata(&path).map_err(Error::from)?;
> 
> but this one probably wouldn't, and would benefit from it?
> 
>> +
>> +        // Second atime update if chunk was newly inserted above
> 
> nit: comment is inverted - it's the second atime *update* if the chunk
> already existed. it's the first *update* if we inserted it ;)
> 
>> +        self.cond_touch_path(&path, true)?;
> 
> this one includes the path in errors :)
> 
>> +
>> +        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
> 
> this one might benefit from some context as well?
> 
>> +
>> +        // Check for the unlikely case that the file changed in-between the
>> +        // two metadata calls, try to check once again on changed file
>> +        if metadata_before.ino() != metadata_now.ino() {
>> +            if retry_on_file_changed {
>> +                return self.check_fs_atime_updates(false);
>> +            }
>> +            bail!("chunk file changed twice during atime testing, cannot proceed.");
> 
> adding the path or digest here might help as well
> 
>> +        }
>> +
>> +        match (
>> +            metadata_before.accessed()? < metadata_now.accessed()?,
> 
> error context as well ;)
> 
>> +            pre_existing,
>> +        ) {
>> +            (true, false) => info!("Access time safety check successful (inserted chunk for test)."),
>> +            (true, true) => {
>> +                info!("Access time safety check successful (used pre-existing chunk for test).")
>> +            }
>> +            (false, false) => bail!(
>> +                "Access time safety check failed (inserted chunk for test), is atime support \
>> +                enabled on datastore backing filesystem?"
>> +            ),
>> +            (false, true) => bail!(
>> +                "Access time safety check failed (used pre-existing chunk for test), is atime \
>> +                support enabled on datastore backing filesystem?"
>> +            ),
>> +        }
> 
> this is a bit excessive while at the same time not including the
> interesting bits..
> 
> how about the following:
> 
> if metadata_before.accessed()? >= metadata_now.accessed()? {
>      let chunk_info_str = if pre_existing { "pre-existing" } else { "newly inserted" };
>      log::warn!("Chunk metadata was not correctly updated during atime test:");
>      info!("Metadata before update: {metadata_before:?}");
>      info!("Metadata after update: {metadata_now:?}");
>      bail!("atime safety check using {chunk_info_str} chunk failed, aborting GC!");
> }
> 
> info!("atime safety check successful, proceeding with GC.");
> 
> 
> would look like this in the task log:
> 
> Chunk metadata was not correctly updated during atime test:
> Metadata before update: Metadata { file_type: FileType { is_file: true, is_dir: false, is_symlink: false, .. }, permissions: Permissions(FilePermissions { mode: 0o100500 (-r-x------) }), len: 158, modified: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, accessed: SystemTime { tv_sec: 1741256689, tv_nsec: 18294604 }, created: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, .. }
> Metadata after update: Metadata { file_type: FileType { is_file: true, is_dir: false, is_symlink: false, .. }, permissions: Permissions(FilePermissions { mode: 0o100500 (-r-x------) }), len: 158, modified: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, accessed: SystemTime { tv_sec: 1741256689, tv_nsec: 18294604 }, created: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, .. }
> TASK ERROR: atime safety check using pre-existing chunk failed, aborting GC!
> 
> alternatively we could of course do our own pretty printing, or add `#`
> to the format string to get:
> 
> Metadata before update:
> Metadata {
>      file_type: FileType {
>          is_file: true,
>          is_dir: false,
>          is_symlink: false,
>          ..
>      },
>      permissions: Permissions(
>          FilePermissions {
>              mode: 0o100500 (-r-x------),
>          },
>      ),
>      len: 158,
>      modified: SystemTime {
>          tv_sec: 1673516596,
>          tv_nsec: 65483144,
>      },
>      accessed: SystemTime {
>          tv_sec: 1741256877,
>          tv_nsec: 225020600,
>      },
>      created: SystemTime {
>          tv_sec: 1673516596,
>          tv_nsec: 65483144,
>      },
>      ..
> }
> 
> I think it is important to get the logging right here because if
> somebody runs into a failing check, they should report the relevant data
> to us without the need to jump through hoops.

Agreed on this and above comments regarding error context, would however 
opt for a custom output, e.g. something along the line of what is used 
to output pxar entries via `format_single_line_entry`...

Do we really need information like file type, size and permissions here? 
That should already be covered by the chunk insert above?

> 
>> +        Ok(())
>> +    }
>> +
>>       pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> @@ -628,8 +692,15 @@ fn test_chunk_store1() {
>>       let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>>           .unwrap()
>>           .unwrap();
>> -    let chunk_store =
>> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
>> +    let chunk_store = ChunkStore::create(
>> +        "test",
>> +        &path,
>> +        user.uid,
>> +        user.gid,
>> +        DatastoreFSyncLevel::None,
>> +        true,
>> +    )
>> +    .unwrap();
>>   
>>       let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>>           .build()
>> @@ -641,8 +712,14 @@ fn test_chunk_store1() {
>>       let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>>       assert!(exists);
>>   
>> -    let chunk_store =
>> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
>> +    let chunk_store = ChunkStore::create(
>> +        "test",
>> +        &path,
>> +        user.uid,
>> +        user.gid,
>> +        DatastoreFSyncLevel::None,
>> +        true,
>> +    );
>>       assert!(chunk_store.is_err());
>>   
>>       if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 75c0c16ab..b62ddc172 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1170,6 +1170,15 @@ impl DataStore {
>>                   upid: Some(upid.to_string()),
>>                   ..Default::default()
>>               };
>> +            let tuning: DatastoreTuning = serde_json::from_value(
>> +                DatastoreTuning::API_SCHEMA
>> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
>> +            )?;
>> +            if tuning.gc_atime_safety_check.unwrap_or(true) {
>> +                self.inner.chunk_store.check_fs_atime_updates(true)?;
>> +            } else {
>> +                info!("Filesystem atime safety check disabled by datastore tuning options.");
>> +            }
>>   
>>               info!("Start GC phase1 (mark used chunks)");
>>   
>> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
>> index fe3260f6d..35847fc45 100644
>> --- a/src/api2/config/datastore.rs
>> +++ b/src/api2/config/datastore.rs
>> @@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
>>                   backup_user.uid,
>>                   backup_user.gid,
>>                   tuning.sync_level.unwrap_or_default(),
>> +                tuning.gc_atime_safety_check.unwrap_or(true),
>>               )
>>               .map(|_| ())
>>           } else {
>> -- 
>> 2.39.5
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored
  2025-03-06 11:30     ` Christian Ebner
@ 2025-03-06 12:02       ` Fabian Grünbichler
  0 siblings, 0 replies; 16+ messages in thread
From: Fabian Grünbichler @ 2025-03-06 12:02 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On March 6, 2025 12:30 pm, Christian Ebner wrote:
> On 3/6/25 12:02, Fabian Grünbichler wrote:
>> On March 5, 2025 4:14 pm, Christian Ebner wrote:
>>> +            pre_existing,
>>> +        ) {
>>> +            (true, false) => info!("Access time safety check successful (inserted chunk for test)."),
>>> +            (true, true) => {
>>> +                info!("Access time safety check successful (used pre-existing chunk for test).")
>>> +            }
>>> +            (false, false) => bail!(
>>> +                "Access time safety check failed (inserted chunk for test), is atime support \
>>> +                enabled on datastore backing filesystem?"
>>> +            ),
>>> +            (false, true) => bail!(
>>> +                "Access time safety check failed (used pre-existing chunk for test), is atime \
>>> +                support enabled on datastore backing filesystem?"
>>> +            ),
>>> +        }
>> 
>> this is a bit excessive while at the same time not including the
>> interesting bits..
>> 
>> how about the following:
>> 
>> if metadata_before.accessed()? >= metadata_now.accessed()? {
>>      let chunk_info_str = if pre_existing { "pre-existing" } else { "newly inserted" };
>>      log::warn!("Chunk metadata was not correctly updated during atime test:");
>>      info!("Metadata before update: {metadata_before:?}");
>>      info!("Metadata after update: {metadata_now:?}");
>>      bail!("atime safety check using {chunk_info_str} chunk failed, aborting GC!");
>> }
>> 
>> info!("atime safety check successful, proceeding with GC.");
>> 
>> 
>> would look like this in the task log:
>> 
>> Chunk metadata was not correctly updated during atime test:
>> Metadata before update: Metadata { file_type: FileType { is_file: true, is_dir: false, is_symlink: false, .. }, permissions: Permissions(FilePermissions { mode: 0o100500 (-r-x------) }), len: 158, modified: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, accessed: SystemTime { tv_sec: 1741256689, tv_nsec: 18294604 }, created: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, .. }
>> Metadata after update: Metadata { file_type: FileType { is_file: true, is_dir: false, is_symlink: false, .. }, permissions: Permissions(FilePermissions { mode: 0o100500 (-r-x------) }), len: 158, modified: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, accessed: SystemTime { tv_sec: 1741256689, tv_nsec: 18294604 }, created: SystemTime { tv_sec: 1673516596, tv_nsec: 65483144 }, .. }
>> TASK ERROR: atime safety check using pre-existing chunk failed, aborting GC!
>> 
>> alternatively we could of course do our own pretty printing, or add `#`
>> to the format string to get:
>> 
>> Metadata before update:
>> Metadata {
>>      file_type: FileType {
>>          is_file: true,
>>          is_dir: false,
>>          is_symlink: false,
>>          ..
>>      },
>>      permissions: Permissions(
>>          FilePermissions {
>>              mode: 0o100500 (-r-x------),
>>          },
>>      ),
>>      len: 158,
>>      modified: SystemTime {
>>          tv_sec: 1673516596,
>>          tv_nsec: 65483144,
>>      },
>>      accessed: SystemTime {
>>          tv_sec: 1741256877,
>>          tv_nsec: 225020600,
>>      },
>>      created: SystemTime {
>>          tv_sec: 1673516596,
>>          tv_nsec: 65483144,
>>      },
>>      ..
>> }
>> 
>> I think it is important to get the logging right here because if
>> somebody runs into a failing check, they should report the relevant data
>> to us without the need to jump through hoops.
> 
> Agreed on this and above comments regarding error context, would however 
> opt for a custom output, e.g. something along the line of what is used 
> to output pxar entries via `format_single_line_entry`...
> 
> Do we really need information like file type, size and permissions here? 
> That should already be covered by the chunk insert above?

that's a bit hard to tell a priori ;)

file type is probably not *that* relevant since we assume it is a
regular file, the permissions might very well be relevant, but probably
the actual attributes (like append-only or immutable) would be even more
interesting, in case a filesystem is broken and ignores the update
instead of failing in such cases.

so yeah, maybe just pretty printing the timestamps already is enough
here, it might need to be followed-up with lsattr and questions about
their storage setup in many cases anyway.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored
  2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
                   ` (8 preceding siblings ...)
  2025-03-05 15:23 ` [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
@ 2025-03-06 14:54 ` Christian Ebner
  9 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2025-03-06 14:54 UTC (permalink / raw)
  To: pbs-devel

Superseded-by version 5:
https://lore.proxmox.com/pbs-devel/20250306145252.565270-1-c.ebner@proxmox.com/T/


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-03-06 14:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-05 15:14 [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 1/8] pbs api types: add garbage collection atime safety check flag Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 2/8] pbs api types: add option to set GC chunk cleanup atime cutoff Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 3/8] fix #5982: garbage collection: check atime updates are honored Christian Ebner
2025-03-06 11:02   ` Fabian Grünbichler
2025-03-06 11:30     ` Christian Ebner
2025-03-06 12:02       ` Fabian Grünbichler
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 4/8] ui: expose GC atime safety check flag in datastore tuning options Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 5/8] docs: mention GC atime update check for " Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 6/8] datastore: conditionally use custom GC atime cutoff if set Christian Ebner
2025-03-06 11:00   ` Fabian Grünbichler
2025-03-06 11:19     ` Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 7/8] ui: expose GC atime cutoff in datastore tuning option Christian Ebner
2025-03-05 15:14 ` [pbs-devel] [PATCH v4 proxmox-backup 8/8] docs: mention gc-atime-cutoff as " Christian Ebner
2025-03-05 15:23 ` [pbs-devel] [PATCH v4 proxmox-backup 0/8] fix #5982: check atime update is honored Christian Ebner
2025-03-06 14:54 ` Christian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal