[pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored

public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed

* [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored
@ 2025-02-19 16:48 Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag Christian Ebner
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

These patches add a check to phase 1 of garbage collection in order
to detect when the filesystem backing the chunk store does not honor
atime updates. This avoids possible data loss for situations where
garbage collection could otherwise delete chunks still referenced by
a backup snaphost's index file.

The check is performed by inserting a fixed size 4 MiB unencrypted and
compressed chunk of all-zeros. The check is performed by setting the
atime to 1 second in the past followed by setting it to now in order to
test if atime updates are honored immediately. The test is performed
on datastore creation for early detection and before garbage collection
phase 1. The test is enabled by default, but an opt-out option can be
set via the datastore tuning parameters for backwards compatibility.

Further, add a datastore tuning parameter to reduce the wait period for
chunk removal in phase 2 of garbage collection. Make this conditional on
the atime update check being enabled and successful, to avoid possible
data loss.

Most notable changes sice version 1 (thanks Fabian and Thomas for
comments and suggestions):
- Optimize check by using the all zero chunk
- Enable the check by default and fail GC job if not honored, but allow
  to opt-out
- Add GC wait period tuning option

Link to the issue in the bugtracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=5982

proxmox:

Christian Ebner (2):
  pbs api types: add check garbage collection atime updates flag
  pbs api types: add option to set GC chunk cleanup wait period

 pbs-api-types/src/datastore.rs | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

proxmox-backup:

Christian Ebner (7):
  datastore: move utimensat() related constants to module scope
  fix #5982: garbage collection: check atime updates are honored
  ui: expose atime update check flag in datastore tuning options
  docs: mention GC atime update check for tuning options
  datastore: conditionally use custom GC wait period if set
  ui: expose GC wait period in datastore tuning option
  docs: mention gc-wait-period option as datastore tuning parameter

 docs/storage.rst                 | 19 ++++++-
 pbs-datastore/src/chunk_store.rs | 96 +++++++++++++++++++++++++++++---
 pbs-datastore/src/datastore.rs   | 23 ++++++++
 src/api2/config/datastore.rs     |  1 +
 www/Utils.js                     |  9 +++
 www/datastore/OptionView.js      | 17 ++++++
 6 files changed, 153 insertions(+), 12 deletions(-)

-- 
2.39.5

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-03-03 12:58   ` Fabian Grünbichler
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period Christian Ebner
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Add the `gc-atime-check` flag to the datastore tuning parameters.
This flag allows to enable/disable a check to detect if the atime
update is honored by the filesystem backing the chunk store during
phase 1 of garbage collection and during datastore creation.

The default is to perform the check.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- Rename from check-gc-atime to gc-atime-check, so that the option is
  prefixed with gc for better association

 pbs-api-types/src/datastore.rs | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index ddd8d3c6..a7a9db7d 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -229,6 +229,12 @@ pub enum DatastoreFSyncLevel {
             type: ChunkOrder,
             optional: true,
         },
+        "gc-atime-check": {
+            description: "Check filesystem atime update during store creation and garbage collection",
+            optional: true,
+            default: true,
+            type: bool,
+        },
     },
 )]
 #[derive(Serialize, Deserialize, Default)]
@@ -240,6 +246,8 @@ pub struct DatastoreTuning {
     pub chunk_order: Option<ChunkOrder>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub sync_level: Option<DatastoreFSyncLevel>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub gc_atime_check: Option<bool>,
 }
 
 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag Christian Ebner
@ 2025-03-03 12:58   ` Fabian Grünbichler
  2025-03-03 13:20     ` Christian Ebner
  0 siblings, 1 reply; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-03 12:58 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On February 19, 2025 5:48 pm, Christian Ebner wrote:
> Add the `gc-atime-check` flag to the datastore tuning parameters.
> This flag allows to enable/disable a check to detect if the atime
> update is honored by the filesystem backing the chunk store during
> phase 1 of garbage collection and during datastore creation.
> 
> The default is to perform the check.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - Rename from check-gc-atime to gc-atime-check, so that the option is
>   prefixed with gc for better association
> 
>  pbs-api-types/src/datastore.rs | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index ddd8d3c6..a7a9db7d 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -229,6 +229,12 @@ pub enum DatastoreFSyncLevel {
>              type: ChunkOrder,
>              optional: true,
>          },
> +        "gc-atime-check": {
> +            description: "Check filesystem atime update during store creation and garbage collection",

this should probably spell out that the check is for atime updates
*working as expected*, else this could also be interpreted to mean
"check atime during GC", which we always do ;)

maybe also encode this in the setting name?

> +            optional: true,
> +            default: true,
> +            type: bool,
> +        },
>      },
>  )]
>  #[derive(Serialize, Deserialize, Default)]
> @@ -240,6 +246,8 @@ pub struct DatastoreTuning {
>      pub chunk_order: Option<ChunkOrder>,
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub sync_level: Option<DatastoreFSyncLevel>,
> +    #[serde(skip_serializing_if = "Option::is_none")]
> +    pub gc_atime_check: Option<bool>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag
  2025-03-03 12:58   ` Fabian Grünbichler
@ 2025-03-03 13:20     ` Christian Ebner
  0 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-03-03 13:20 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/3/25 13:58, Fabian Grünbichler wrote:
> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>> Add the `gc-atime-check` flag to the datastore tuning parameters.
>> This flag allows to enable/disable a check to detect if the atime
>> update is honored by the filesystem backing the chunk store during
>> phase 1 of garbage collection and during datastore creation.
>>
>> The default is to perform the check.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - Rename from check-gc-atime to gc-atime-check, so that the option is
>>    prefixed with gc for better association
>>
>>   pbs-api-types/src/datastore.rs | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
>> index ddd8d3c6..a7a9db7d 100644
>> --- a/pbs-api-types/src/datastore.rs
>> +++ b/pbs-api-types/src/datastore.rs
>> @@ -229,6 +229,12 @@ pub enum DatastoreFSyncLevel {
>>               type: ChunkOrder,
>>               optional: true,
>>           },
>> +        "gc-atime-check": {
>> +            description: "Check filesystem atime update during store creation and garbage collection",
> 
> this should probably spell out that the check is for atime updates
> *working as expected*, else this could also be interpreted to mean
> "check atime during GC", which we always do ;)
> 
> maybe also encode this in the setting name?

Agreed, will extend the description accordingly and come up with a 
better option name.

> 
>> +            optional: true,
>> +            default: true,
>> +            type: bool,
>> +        },
>>       },
>>   )]
>>   #[derive(Serialize, Deserialize, Default)]
>> @@ -240,6 +246,8 @@ pub struct DatastoreTuning {
>>       pub chunk_order: Option<ChunkOrder>,
>>       #[serde(skip_serializing_if = "Option::is_none")]
>>       pub sync_level: Option<DatastoreFSyncLevel>,
>> +    #[serde(skip_serializing_if = "Option::is_none")]
>> +    pub gc_atime_check: Option<bool>,
>>   }
>>   
>>   pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
>> -- 
>> 2.39.5
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-03-03 12:58   ` Fabian Grünbichler
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope Christian Ebner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Add the `gc-wait-period` option to the datastore tuning parameters.
This allows to specify the time after which the chunks are not
considered in use anymore if their atime has not been updated since
then. This option is only considered, if the `gc-atime-check` is
enabled, to avoid potential data loss.

The default is to keep chunks within the 24h 5m timespan (given no
active writers). The 5m minimum was chosen to be in line with the
already used safety offset for garbage collection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 pbs-api-types/src/datastore.rs | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index a7a9db7d..7542e969 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -223,6 +223,13 @@ pub enum DatastoreFSyncLevel {
     Filesystem,
 }
 
+pub const GC_WAIT_PERIOD_SCHEMA: Schema =
+    IntegerSchema::new("Wait period (in minutes) for garbage collection phase 2 chunk cleanup (default 24h 5m)")
+        .minimum(5)
+        .maximum(1445)
+        .default(1445)
+        .schema();
+
 #[api(
     properties: {
         "chunk-order": {
@@ -235,6 +242,10 @@ pub enum DatastoreFSyncLevel {
             default: true,
             type: bool,
         },
+        "gc-wait-period": {
+            schema: GC_WAIT_PERIOD_SCHEMA,
+            optional: true,
+        },
     },
 )]
 #[derive(Serialize, Deserialize, Default)]
@@ -248,6 +259,8 @@ pub struct DatastoreTuning {
     pub sync_level: Option<DatastoreFSyncLevel>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub gc_atime_check: Option<bool>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub gc_wait_period: Option<usize>,
 }
 
 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period Christian Ebner
@ 2025-03-03 12:58   ` Fabian Grünbichler
  2025-03-03 13:30     ` Christian Ebner
  2025-03-04 16:01     ` Thomas Lamprecht
  0 siblings, 2 replies; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-03 12:58 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On February 19, 2025 5:48 pm, Christian Ebner wrote:
> Add the `gc-wait-period` option to the datastore tuning parameters.
> This allows to specify the time after which the chunks are not
> considered in use anymore if their atime has not been updated since
> then. This option is only considered, if the `gc-atime-check` is
> enabled, to avoid potential data loss.
> 
> The default is to keep chunks within the 24h 5m timespan (given no
> active writers). The 5m minimum was chosen to be in line with the
> already used safety offset for garbage collection.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - not present in previous version
> 
>  pbs-api-types/src/datastore.rs | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index a7a9db7d..7542e969 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -223,6 +223,13 @@ pub enum DatastoreFSyncLevel {
>      Filesystem,
>  }
>  
> +pub const GC_WAIT_PERIOD_SCHEMA: Schema =
> +    IntegerSchema::new("Wait period (in minutes) for garbage collection phase 2 chunk cleanup (default 24h 5m)")
> +        .minimum(5)
> +        .maximum(1445)

these seem a bit conservative - if we introduce the option, we could
also allow to reduce the grace period to 0 or to require longer grace
periods?

wait period is also a bit of a misnomer IMHO, this is not something
where we "wait" per se, but rather something to account for the
timestamp information potentially not being accurate.. not sure what a
better term would be, "grace period" doesn't really fit either..

it's a kind of "safety margin", maybe something related to that? naming
can be quite hard..

maybe gc_atime_safety_margin ? that also implies that setting a
non-default value has potential safety implications/risks your data,
IMHO more than "reducing a wait period" does..

> +        .default(1445)
> +        .schema();
> +
>  #[api(
>      properties: {
>          "chunk-order": {
> @@ -235,6 +242,10 @@ pub enum DatastoreFSyncLevel {
>              default: true,
>              type: bool,
>          },
> +        "gc-wait-period": {
> +            schema: GC_WAIT_PERIOD_SCHEMA,
> +            optional: true,
> +        },
>      },
>  )]
>  #[derive(Serialize, Deserialize, Default)]
> @@ -248,6 +259,8 @@ pub struct DatastoreTuning {
>      pub sync_level: Option<DatastoreFSyncLevel>,
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub gc_atime_check: Option<bool>,
> +    #[serde(skip_serializing_if = "Option::is_none")]
> +    pub gc_wait_period: Option<usize>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
> -- 
> 2.39.5


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-03 12:58   ` Fabian Grünbichler
@ 2025-03-03 13:30     ` Christian Ebner
  2025-03-03 15:30       ` Fabian Grünbichler
  2025-03-04 16:01     ` Thomas Lamprecht
  1 sibling, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-03-03 13:30 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/3/25 13:58, Fabian Grünbichler wrote:
> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>> Add the `gc-wait-period` option to the datastore tuning parameters.
>> This allows to specify the time after which the chunks are not
>> considered in use anymore if their atime has not been updated since
>> then. This option is only considered, if the `gc-atime-check` is
>> enabled, to avoid potential data loss.
>>
>> The default is to keep chunks within the 24h 5m timespan (given no
>> active writers). The 5m minimum was chosen to be in line with the
>> already used safety offset for garbage collection.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - not present in previous version
>>
>>   pbs-api-types/src/datastore.rs | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
>> index a7a9db7d..7542e969 100644
>> --- a/pbs-api-types/src/datastore.rs
>> +++ b/pbs-api-types/src/datastore.rs
>> @@ -223,6 +223,13 @@ pub enum DatastoreFSyncLevel {
>>       Filesystem,
>>   }
>>   
>> +pub const GC_WAIT_PERIOD_SCHEMA: Schema =
>> +    IntegerSchema::new("Wait period (in minutes) for garbage collection phase 2 chunk cleanup (default 24h 5m)")
>> +        .minimum(5)
>> +        .maximum(1445)
> 
> these seem a bit conservative - if we introduce the option, we could
> also allow to reduce the grace period to 0 or to require longer grace
> periods?

I did stick to the additional 5 minutes safety gap, as this was also 
introduced with the oldest writer change in commit 11861a48 
("src/backup/chunk_store.rs: fix GC"). Since that commit does not 
explicitly mention the reason (I assume to prevent a race between writer 
instantiation and garbage collection), I sticked with this additional 
lower limit.

Increasing the upper limit might be possible without such concerns, I do 
not really see however the use case for that.

> 
> wait period is also a bit of a misnomer IMHO, this is not something
> where we "wait" per se, but rather something to account for the
> timestamp information potentially not being accurate.. not sure what a
> better term would be, "grace period" doesn't really fit either..

I sticked with Thomas' naming suggestion here, but maybe something like 
`gc-clenup-safety-margin` is more fitting? Although it does not imply 
that this is due to the atime... Other suggestions?

> 
> it's a kind of "safety margin", maybe something related to that? naming
> can be quite hard..
> 
> maybe gc_atime_safety_margin ? that also implies that setting a
> non-default value has potential safety implications/risks your data,
> IMHO more than "reducing a wait period" does..
> 
>> +        .default(1445)
>> +        .schema();
>> +
>>   #[api(
>>       properties: {
>>           "chunk-order": {
>> @@ -235,6 +242,10 @@ pub enum DatastoreFSyncLevel {
>>               default: true,
>>               type: bool,
>>           },
>> +        "gc-wait-period": {
>> +            schema: GC_WAIT_PERIOD_SCHEMA,
>> +            optional: true,
>> +        },
>>       },
>>   )]
>>   #[derive(Serialize, Deserialize, Default)]
>> @@ -248,6 +259,8 @@ pub struct DatastoreTuning {
>>       pub sync_level: Option<DatastoreFSyncLevel>,
>>       #[serde(skip_serializing_if = "Option::is_none")]
>>       pub gc_atime_check: Option<bool>,
>> +    #[serde(skip_serializing_if = "Option::is_none")]
>> +    pub gc_wait_period: Option<usize>,
>>   }
>>   
>>   pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
>> -- 
>> 2.39.5
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-03 13:30     ` Christian Ebner
@ 2025-03-03 15:30       ` Fabian Grünbichler
  0 siblings, 0 replies; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-03 15:30 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On March 3, 2025 2:30 pm, Christian Ebner wrote:
> On 3/3/25 13:58, Fabian Grünbichler wrote:
>> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>>> Add the `gc-wait-period` option to the datastore tuning parameters.
>>> This allows to specify the time after which the chunks are not
>>> considered in use anymore if their atime has not been updated since
>>> then. This option is only considered, if the `gc-atime-check` is
>>> enabled, to avoid potential data loss.
>>>
>>> The default is to keep chunks within the 24h 5m timespan (given no
>>> active writers). The 5m minimum was chosen to be in line with the
>>> already used safety offset for garbage collection.
>>>
>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>> ---
>>> changes since version 1:
>>> - not present in previous version
>>>
>>>   pbs-api-types/src/datastore.rs | 13 +++++++++++++
>>>   1 file changed, 13 insertions(+)
>>>
>>> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
>>> index a7a9db7d..7542e969 100644
>>> --- a/pbs-api-types/src/datastore.rs
>>> +++ b/pbs-api-types/src/datastore.rs
>>> @@ -223,6 +223,13 @@ pub enum DatastoreFSyncLevel {
>>>       Filesystem,
>>>   }
>>>   
>>> +pub const GC_WAIT_PERIOD_SCHEMA: Schema =
>>> +    IntegerSchema::new("Wait period (in minutes) for garbage collection phase 2 chunk cleanup (default 24h 5m)")
>>> +        .minimum(5)
>>> +        .maximum(1445)
>> 
>> these seem a bit conservative - if we introduce the option, we could
>> also allow to reduce the grace period to 0 or to require longer grace
>> periods?
> 
> I did stick to the additional 5 minutes safety gap, as this was also 
> introduced with the oldest writer change in commit 11861a48 
> ("src/backup/chunk_store.rs: fix GC"). Since that commit does not 
> explicitly mention the reason (I assume to prevent a race between writer 
> instantiation and garbage collection), I sticked with this additional 
> lower limit.

if there is a race, we should close it instead of papering over it with
a 5m threshold ;)

a writer must not touch chunks before it has registered (that is a
given, else none of this makes any sense whatsoever, just mentioning it
for completeness sake).

GC must not touch chunks before it has determined the cutoff (also a
given).

the writer registration happens using a shared lock (it's part of
obtaining the shared lock). the same shared lock is used by GC to
retrieve the oldest registered worker (so no race here).

so either:
- at the time GC checks for the older writer, no writer is registered
  yet, and the timestamp obtained right before that check minus the
  safety margin is a valid cutoff
- a worker has registered then GC *must* see it or it must already have
  completed


            let phase1_start_time = proxmox_time::epoch_i64();
            let oldest_writer = self
                .inner
                .chunk_store
                .oldest_writer()
                .unwrap_or(phase1_start_time);

the writer timestamp value is generated during writer creation while the
mutex is held. the GC timestamp value is generated before the mutex is
locked. I don't see a way for a writer timestamp value smaller than the
GC timestamp to be ever stored without GC seeing it?

we basically have a sequence of
- W obtains mutex
- W generates timestamp and persists it
- W drops mutex (and only after this can it start touching/inserting
  chunks)

and another of
- GC generates timestmap
- GC obtains mutex
- GC looks at list of writers
- GC drops mutex

that might interleave. since the latter doesn't write to the protected
data, we can condense it to a single step of "lock and read", and it's
mostly the ordering of GC generating its own timestamp compared to when
the olders writer does the same while holding the mutex that is
interesting, AFAICT.

we either have GC generating its timestamp before the writer, but
reading afterwards:

T0: GC generates timestamp value
sometime between T0 and T1: W obtains mutex
T1: W generates timestamp and persists it in shared guard list
T2: W drops mutex
T3: GC obtains mutex and sees W registered with T1

which is okay, the cutoff can be T0 or T1 in this case it doesn't matter

or the writer generating it before the GC:

sometime before T0: W obtains mutex
T0: W generates timestamp and persists it in shared guard list
T1: W drops mutex
T2: GC generates timestamp value
T3: GC obtains mutex and sees W registered with T0

the cutoff must be T0 in this case

or GC generating it while a writer is registering:

sometime before T0: W obtains mutex
T0: W generates timestamp and persists it in shared guard list
T1: GC generates timestamp value
T2: W drops mutex
T3: GC obtains mutex and sees W registered with T0

T0 will still be seen and is the right cutoff

or another ordering for GC generating while writer registration is
taking place:

sometime before T0: W obtains mutex
T0: GC generates timestamp value
T1: W generates timestamp and persists it in shared guard list
T2: W drops mutex
T3: GC obtains mutex and sees W registered with T1

since T0 < T1, T0 is the right value to use here as well

since we only care about the olders worker, we don't have to worry about
writers racing with eachother - there's only a single mutex, so one of
them must be the oldest and only one we care about. theoretically lots
of writers being spawned could keep GC waiting for the mutext for quite
a while, even allowing writers to complete their run, drop the guard and
thus deregister. but since GC hasn't started its work yet, the index
files generated by those writers must be visible already, and GC doesn't
have to know about their already vanished timestamps.

the only thing I could think of would be time jumps? but then, 5 minutes
are completely arbitrary and might not save anything anyway.. maybe
Dietmar (CCed) remembers where the extra 5 minutes come from?

> Increasing the upper limit might be possible without such concerns, I do 
> not really see however the use case for that.
> 
>> 
>> wait period is also a bit of a misnomer IMHO, this is not something
>> where we "wait" per se, but rather something to account for the
>> timestamp information potentially not being accurate.. not sure what a
>> better term would be, "grace period" doesn't really fit either..
> 
> I sticked with Thomas' naming suggestion here, but maybe something like 
> `gc-clenup-safety-margin` is more fitting? Although it does not imply 
> that this is due to the atime... Other suggestions?

to me, wait period evokes that we *want* to wait. we ideally don't want
to, except if we have to. but like I said, naming *is* hard and I also
don't want to spend too much time on bike shedding ;)

>> it's a kind of "safety margin", maybe something related to that? naming
>> can be quite hard..
>> 
>> maybe gc_atime_safety_margin ? that also implies that setting a
>> non-default value has potential safety implications/risks your data,
>> IMHO more than "reducing a wait period" does..


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-03 12:58   ` Fabian Grünbichler
  2025-03-03 13:30     ` Christian Ebner
@ 2025-03-04 16:01     ` Thomas Lamprecht
  2025-03-04 16:37       ` Christian Ebner
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Lamprecht @ 2025-03-04 16:01 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

Am 03.03.25 um 13:58 schrieb Fabian Grünbichler:
> On February 19, 2025 5:48 pm, Christian Ebner wrote: 
>> +pub const GC_WAIT_PERIOD_SCHEMA: Schema =
>> +    IntegerSchema::new("Wait period (in minutes) for garbage collection phase 2 chunk cleanup (default 24h 5m)")
>> +        .minimum(5)
>> +        .maximum(1445)
> 
> these seem a bit conservative - if we introduce the option, we could
> also allow to reduce the grace period to 0 or to require longer grace
> periods?

Would be fine by me to reducing the minimum to zero. And the extra 5
minutes are "just to be sure" safety-margin, not a requirement for
anything IIRC.


> wait period is also a bit of a misnomer IMHO, this is not something
> where we "wait" per se, but rather something to account for the
> timestamp information potentially not being accurate.. not sure what a
> better term would be, "grace period" doesn't really fit either..
> 
> it's a kind of "safety margin", maybe something related to that? naming
> can be quite hard..
> 
> maybe gc_atime_safety_margin ? that also implies that setting a
> non-default value has potential safety implications/risks your data,
> IMHO more than "reducing a wait period" does..

Maybe something with "cutoff", like just gc_cutoff or gc_atime_cutoff,
as a cut-off of which chunks we even consider for removal is basically
what this is.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-04 16:01     ` Thomas Lamprecht
@ 2025-03-04 16:37       ` Christian Ebner
  2025-03-04 16:49         ` Thomas Lamprecht
  0 siblings, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-03-04 16:37 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Thomas Lamprecht,
	Fabian Grünbichler

On 3/4/25 17:01, Thomas Lamprecht wrote:
> Am 03.03.25 um 13:58 schrieb Fabian Grünbichler:
>> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>>> +pub const GC_WAIT_PERIOD_SCHEMA: Schema =
>>> +    IntegerSchema::new("Wait period (in minutes) for garbage collection phase 2 chunk cleanup (default 24h 5m)")
>>> +        .minimum(5)
>>> +        .maximum(1445)
>>
>> these seem a bit conservative - if we introduce the option, we could
>> also allow to reduce the grace period to 0 or to require longer grace
>> periods?
> 
> Would be fine by me to reducing the minimum to zero. And the extra 5
> minutes are "just to be sure" safety-margin, not a requirement for
> anything IIRC.

Discussed this with Fabian rather extensively today. Only reason to keep 
a small safety margin here is for small time drift in case of remote 
storages (if they use their local time for timestamps).

But this can be much lower, would opt for 1 minute to stay within the 
minute range.

Also, atime always uses the coarse resolution for timestamp updates, 
that will also not change with the multi-grained timestamp resolutions 
in https://origin.kernel.org/doc/html/v6.13/filesystems/multigrain-ts.html
So this has to be taken into account for the atime update check, and 
since setting the atime into the past will introduce other error modes 
(permissions, fs impl, ...), a short wait of a 1 second in-between must 
be used.

Also, there is no distinction to be made between filesystems mounted 
with atime and relatime, if the explicit atime update fails, the GC 
should fail.

> 
>> wait period is also a bit of a misnomer IMHO, this is not something
>> where we "wait" per se, but rather something to account for the
>> timestamp information potentially not being accurate.. not sure what a
>> better term would be, "grace period" doesn't really fit either..
>>
>> it's a kind of "safety margin", maybe something related to that? naming
>> can be quite hard..
>>
>> maybe gc_atime_safety_margin ? that also implies that setting a
>> non-default value has potential safety implications/risks your data,
>> IMHO more than "reducing a wait period" does..
> 
> Maybe something with "cutoff", like just gc_cutoff or gc_atime_cutoff,
> as a cut-off of which chunks we even consider for removal is basically
> what this is.

I would opt for gc-atime-safety-check and gc-atime-safety-margin, to 
show that they are related and their implicit function

Also, there will be no upper limit for the gc-atime-safety-margin, as 
Fabian pointed out correctly, setting this to large values might be 
desired to avoid data loss if something is off, and one notices from 
e.g. the pending removals.

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-04 16:37       ` Christian Ebner
@ 2025-03-04 16:49         ` Thomas Lamprecht
  2025-03-04 17:03           ` Christian Ebner
                             ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Thomas Lamprecht @ 2025-03-04 16:49 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion,
	Fabian Grünbichler

Am 04.03.25 um 17:37 schrieb Christian Ebner:
> On 3/4/25 17:01, Thomas Lamprecht wrote:
>> Would be fine by me to reducing the minimum to zero. And the extra 5
>> minutes are "just to be sure" safety-margin, not a requirement for
>> anything IIRC.
> 
> Discussed this with Fabian rather extensively today. Only reason to keep 
> a small safety margin here is for small time drift in case of remote 
> storages (if they use their local time for timestamps).

Ah, you mean network attached remote storage, but while your reasons
below are fine, doing this for time drifts is IMO not really strong
argumentation, as if one allows for no time synchronisation then there
won't be a limit to the drift amount, but ...


> But this can be much lower, would opt for 1 minute to stay within the 
> minute range.
> 
> Also, atime always uses the coarse resolution for timestamp updates, 
> that will also not change with the multi-grained timestamp resolutions 
> in https://origin.kernel.org/doc/html/v6.13/filesystems/multigrain-ts.html
> So this has to be taken into account for the atime update check, and 
> since setting the atime into the past will introduce other error modes 
> (permissions, fs impl, ...), a short wait of a 1 second in-between must 
> be used.
> 
> Also, there is no distinction to be made between filesystems mounted 
> with atime and relatime, if the explicit atime update fails, the GC 

... this actually is a strong argument, so I'm fine with a Minute as
minimum.
>> Maybe something with "cutoff", like just gc_cutoff or gc_atime_cutoff,
>> as a cut-off of which chunks we even consider for removal is basically
>> what this is.
> 
> I would opt for gc-atime-safety-check and gc-atime-safety-margin, to 
> show that they are related and their implicit function

meh, do not find safety-check/margin very telling and safety is a bit too
generic and also slightly overused term IMO. What's wrong with cutoff?

> Also, there will be no upper limit for the gc-atime-safety-margin, as 
> Fabian pointed out correctly, setting this to large values might be 
> desired to avoid data loss if something is off, and one notices from 
> e.g. the pending removals.

I'm rather a fan of _some_ limits even if only to be proven wrong by user
demand to go over that, but then we actually know the use case. As until
now we got no request at all for this to be higher, IIRC, I'd go for two
days.

Data loss is prevented by additional copies on other servers and mediums
(keyword: tape or soon s3) not some rather internal GC cut-off that
provides no nice mechanism to recover any data from it, so I also do
not really buy that it will be used as that in practice, we certainly
should not advertise it as candidate for that.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-04 16:49         ` Thomas Lamprecht
@ 2025-03-04 17:03           ` Christian Ebner
  2025-03-05  7:57           ` Fabian Grünbichler
  2025-03-05  8:12           ` Fabian Grünbichler
  2 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-03-04 17:03 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion,
	Fabian Grünbichler

On 3/4/25 17:49, Thomas Lamprecht wrote:
> Am 04.03.25 um 17:37 schrieb Christian Ebner:
>> On 3/4/25 17:01, Thomas Lamprecht wrote:
>>> Would be fine by me to reducing the minimum to zero. And the extra 5
>>> minutes are "just to be sure" safety-margin, not a requirement for
>>> anything IIRC.
>>
>> Discussed this with Fabian rather extensively today. Only reason to keep
>> a small safety margin here is for small time drift in case of remote
>> storages (if they use their local time for timestamps).
> 
> Ah, you mean network attached remote storage, but while your reasons
> below are fine, doing this for time drifts is IMO not really strong
> argumentation, as if one allows for no time synchronisation then there
> won't be a limit to the drift amount, but ...
> 
> 
>> But this can be much lower, would opt for 1 minute to stay within the
>> minute range.
>>
>> Also, atime always uses the coarse resolution for timestamp updates,
>> that will also not change with the multi-grained timestamp resolutions
>> in https://origin.kernel.org/doc/html/v6.13/filesystems/multigrain-ts.html
>> So this has to be taken into account for the atime update check, and
>> since setting the atime into the past will introduce other error modes
>> (permissions, fs impl, ...), a short wait of a 1 second in-between must
>> be used.
>>
>> Also, there is no distinction to be made between filesystems mounted
>> with atime and relatime, if the explicit atime update fails, the GC
> 
> ... this actually is a strong argument, so I'm fine with a Minute as
> minimum.
>>> Maybe something with "cutoff", like just gc_cutoff or gc_atime_cutoff,
>>> as a cut-off of which chunks we even consider for removal is basically
>>> what this is.
>>
>> I would opt for gc-atime-safety-check and gc-atime-safety-margin, to
>> show that they are related and their implicit function
> 
> meh, do not find safety-check/margin very telling and safety is a bit too
> generic and also slightly overused term IMO. What's wrong with cutoff?

Nothing wrong with it, but Fabian pointed out that naming the opt-out 
flag for the check `gc-atime-check` is a bit to generic, as atimes are 
always checked by the GC, the better naming was `gc-atime-safety-check`, 
and therefore the `gc-atime-safety-margin` might make the correlation 
more clear, as it will only be allowed to set the latter, if the former 
is enabled. But I can of course go with the `gc-atime-cutoff`.

>> Also, there will be no upper limit for the gc-atime-safety-margin, as
>> Fabian pointed out correctly, setting this to large values might be
>> desired to avoid data loss if something is off, and one notices from
>> e.g. the pending removals.
> 
> I'm rather a fan of _some_ limits even if only to be proven wrong by user
> demand to go over that, but then we actually know the use case. As until
> now we got no request at all for this to be higher, IIRC, I'd go for two
> days.
> 
> Data loss is prevented by additional copies on other servers and mediums
> (keyword: tape or soon s3)

Yes, agreed! Although still a lot of work ahead for the latter one...


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-04 16:49         ` Thomas Lamprecht
  2025-03-04 17:03           ` Christian Ebner
@ 2025-03-05  7:57           ` Fabian Grünbichler
  2025-03-05  8:12           ` Fabian Grünbichler
  2 siblings, 0 replies; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-05  7:57 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion,
	Thomas Lamprecht

On March 4, 2025 5:49 pm, Thomas Lamprecht wrote:
> Am 04.03.25 um 17:37 schrieb Christian Ebner:
>> Also, there will be no upper limit for the gc-atime-safety-margin, as 
>> Fabian pointed out correctly, setting this to large values might be 
>> desired to avoid data loss if something is off, and one notices from 
>> e.g. the pending removals.
> 
> I'm rather a fan of _some_ limits even if only to be proven wrong by user
> demand to go over that, but then we actually know the use case. As until
> now we got no request at all for this to be higher, IIRC, I'd go for two
> days.
> 
> Data loss is prevented by additional copies on other servers and mediums
> (keyword: tape or soon s3) not some rather internal GC cut-off that
> provides no nice mechanism to recover any data from it, so I also do
> not really buy that it will be used as that in practice, we certainly
> should not advertise it as candidate for that.

that's true, keeping the chunks for a longer period just makes
recovering from an accidental "over-pruning" much faster (by syncing
just the metadata back from whichever offsite copy), it doesn't
allow recovering on its own without an extra copy.

so if your datastore has the headroom, bumping the limit to a week for
example and monitoring your GC stats to notice pending removal outliers
can save you days of copying chunks back..

it's probably not a very common use case though, and temporarily or
permanently reducing it down to a very low bound will probably be the
common (and maybe future default) setting ;)


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period
  2025-03-04 16:49         ` Thomas Lamprecht
  2025-03-04 17:03           ` Christian Ebner
  2025-03-05  7:57           ` Fabian Grünbichler
@ 2025-03-05  8:12           ` Fabian Grünbichler
  2 siblings, 0 replies; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-05  8:12 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion,
	Thomas Lamprecht

On March 4, 2025 5:49 pm, Thomas Lamprecht wrote:
> Am 04.03.25 um 17:37 schrieb Christian Ebner:
>> On 3/4/25 17:01, Thomas Lamprecht wrote:
>>> Would be fine by me to reducing the minimum to zero. And the extra 5
>>> minutes are "just to be sure" safety-margin, not a requirement for
>>> anything IIRC.
>> 
>> Discussed this with Fabian rather extensively today. Only reason to keep 
>> a small safety margin here is for small time drift in case of remote 
>> storages (if they use their local time for timestamps).
> 
> Ah, you mean network attached remote storage, but while your reasons
> below are fine, doing this for time drifts is IMO not really strong
> argumentation, as if one allows for no time synchronisation then there
> won't be a limit to the drift amount, but ...

actually, the translation from UTIME_NOW to timestamp happens in the
local kernel, so time drift shouldn't matter much here..

>> But this can be much lower, would opt for 1 minute to stay within the 
>> minute range.
>> 
>> Also, atime always uses the coarse resolution for timestamp updates, 
>> that will also not change with the multi-grained timestamp resolutions 
>> in https://origin.kernel.org/doc/html/v6.13/filesystems/multigrain-ts.html
>> So this has to be taken into account for the atime update check, and 
>> since setting the atime into the past will introduce other error modes 
>> (permissions, fs impl, ...), a short wait of a 1 second in-between must 
>> be used.
>> 
>> Also, there is no distinction to be made between filesystems mounted 
>> with atime and relatime, if the explicit atime update fails, the GC 
> 
> ... this actually is a strong argument, so I'm fine with a Minute as
> minimum.

but this is still a valid reason to leave a small buffer :)


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-03-03 13:03   ` Fabian Grünbichler
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored Christian Ebner
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Move the UTIME_NOW and UTIME_OMIT constants from the current function
scope to the module scope in order to allow to reuse them.

This is in preparation for adding a check to test if the filesystem
backing the chunk store honors immediate atime updates by calling
utimensat().

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 pbs-datastore/src/chunk_store.rs | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 29d5874a1..7bdcb0297 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -18,6 +18,9 @@ use crate::file_formats::{
 };
 use crate::DataBlob;
 
+const UTIME_NOW: i64 = (1 << 30) - 1;
+const UTIME_OMIT: i64 = (1 << 30) - 2;
+
 /// File system based chunk store
 pub struct ChunkStore {
     name: String, // used for error reporting
@@ -220,9 +223,6 @@ impl ChunkStore {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
 
-        const UTIME_NOW: i64 = (1 << 30) - 1;
-        const UTIME_OMIT: i64 = (1 << 30) - 2;
-
         let times: [libc::timespec; 2] = [
             // access time -> update to now
             libc::timespec {
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope Christian Ebner
@ 2025-03-03 13:03   ` Fabian Grünbichler
  2025-03-03 13:32     ` Christian Ebner
  0 siblings, 1 reply; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-03 13:03 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On February 19, 2025 5:48 pm, Christian Ebner wrote:
> Move the UTIME_NOW and UTIME_OMIT constants from the current function
> scope to the module scope in order to allow to reuse them.
> 
> This is in preparation for adding a check to test if the filesystem
> backing the chunk store honors immediate atime updates by calling
> utimensat().
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - not present in previous version
> 
>  pbs-datastore/src/chunk_store.rs | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 29d5874a1..7bdcb0297 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -18,6 +18,9 @@ use crate::file_formats::{
>  };
>  use crate::DataBlob;
>  
> +const UTIME_NOW: i64 = (1 << 30) - 1;
> +const UTIME_OMIT: i64 = (1 << 30) - 2;

these are exported by libc, couldn't we just use the definition from
there?

> +
>  /// File system based chunk store
>  pub struct ChunkStore {
>      name: String, // used for error reporting
> @@ -220,9 +223,6 @@ impl ChunkStore {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
>  
> -        const UTIME_NOW: i64 = (1 << 30) - 1;
> -        const UTIME_OMIT: i64 = (1 << 30) - 2;
> -
>          let times: [libc::timespec; 2] = [
>              // access time -> update to now
>              libc::timespec {
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope
  2025-03-03 13:03   ` Fabian Grünbichler
@ 2025-03-03 13:32     ` Christian Ebner
  0 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-03-03 13:32 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/3/25 14:03, Fabian Grünbichler wrote:
> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>> Move the UTIME_NOW and UTIME_OMIT constants from the current function
>> scope to the module scope in order to allow to reuse them.
>>
>> This is in preparation for adding a check to test if the filesystem
>> backing the chunk store honors immediate atime updates by calling
>> utimensat().
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - not present in previous version
>>
>>   pbs-datastore/src/chunk_store.rs | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 29d5874a1..7bdcb0297 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -18,6 +18,9 @@ use crate::file_formats::{
>>   };
>>   use crate::DataBlob;
>>   
>> +const UTIME_NOW: i64 = (1 << 30) - 1;
>> +const UTIME_OMIT: i64 = (1 << 30) - 2;
> 
> these are exported by libc, couldn't we just use the definition from
> there?

Yes, even better so. Will switch over to the constants exposed by libc 
crate for the next version of the patches, thx!

> 
>> +
>>   /// File system based chunk store
>>   pub struct ChunkStore {
>>       name: String, // used for error reporting
>> @@ -220,9 +223,6 @@ impl ChunkStore {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>>   
>> -        const UTIME_NOW: i64 = (1 << 30) - 1;
>> -        const UTIME_OMIT: i64 = (1 << 30) - 2;
>> -
>>           let times: [libc::timespec; 2] = [
>>               // access time -> update to now
>>               libc::timespec {
>> -- 
>> 2.39.5
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (2 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-03-03 13:51   ` Fabian Grünbichler
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 5/9] ui: expose atime update check flag in datastore tuning options Christian Ebner
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Check if the filesystem backing the chunk store actually updates the
atime to avoid potential data loss in phase 2 of garbage collection
in case the atime update is not honored.

Perform the check before phase 1 of garbage collection, as well as
on datastore creation. The latter to early detect and disallow
datastore creation on filesystem configurations which otherwise most
likely would lead to data losses.

Enable the atime update check by default, but allow to opt-out by
setting a datastore tuning parameter flag for backwards compatibility.
This is honored by both, garbage collection and datastore creation.

The check uses a 4 MiB fixed sized, unencypted and compressed chunk
as test marker, inserted if not present. This all zero-chunk is very
likely anyways for unencrypted backup contents with large all-zero
regions using fixed size chunking (e.g. VMs).

The check sets the chunk's atime twice, once to set it to 1 second
into the past, and once to set it to now. By this one can detect if
the atime is set at all and if it also set if the filesystem is e.g.
mounted via relatime, which is intended to deferr the atime update.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- Use all-zero fixed size chunk for atime update test
- Test on datastore creation and before garbage collection phase 1
- fail datastore creation and garbage collection jobs if check fails,
  but keep opt-out

 pbs-datastore/src/chunk_store.rs | 80 ++++++++++++++++++++++++++++++--
 pbs-datastore/src/datastore.rs   | 10 ++++
 src/api2/config/datastore.rs     |  1 +
 3 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 7bdcb0297..be57e3c87 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -1,6 +1,7 @@
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, Mutex};
+use std::time::{Duration, SystemTime, UNIX_EPOCH};
 
 use anyhow::{bail, format_err, Error};
 use tracing::info;
@@ -13,6 +14,7 @@ use proxmox_sys::process_locker::{
 };
 use proxmox_worker_task::WorkerTaskContext;
 
+use crate::data_blob::DataChunkBuilder;
 use crate::file_formats::{
     COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
 };
@@ -96,6 +98,7 @@ impl ChunkStore {
         uid: nix::unistd::Uid,
         gid: nix::unistd::Gid,
         sync_level: DatastoreFSyncLevel,
+        atime_update_check: bool,
     ) -> Result<Self, Error>
     where
         P: Into<PathBuf>,
@@ -150,7 +153,16 @@ impl ChunkStore {
             }
         }
 
-        Self::open(name, base, sync_level)
+        let chunk_store = Self::open(name, base, sync_level)?;
+        if atime_update_check {
+            chunk_store
+                .check_fs_atime_update()
+                .map(|()| info!("atime update check successful."))?;
+        } else {
+            info!("atime update check skipped.");
+        }
+
+        Ok(chunk_store)
     }
 
     fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
@@ -445,6 +457,51 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Check if atime updates are honored by the filesystem backing the chunk store.
+    ///
+    /// Sets the atime twice, once into the past by one second, checking that this was honored and
+    /// then to now again, as otherwise it is not guaranteed to also cover time ranges by `relatime`.
+    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
+    /// the chunk store if not yet present.
+    pub fn check_fs_atime_update(&self) -> Result<(), Error> {
+        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
+        self.insert_chunk(&zero_chunk, &digest)?;
+
+        let past = SystemTime::now().duration_since(UNIX_EPOCH)? - Duration::from_secs(1);
+        let times: [libc::timespec; 2] = [
+            libc::timespec {
+                tv_sec: past.as_secs().try_into()?,
+                tv_nsec: past.subsec_nanos().into(),
+            },
+            libc::timespec {
+                tv_sec: 0,
+                tv_nsec: UTIME_OMIT,
+            },
+        ];
+
+        use nix::NixPath;
+        let (path, _digest) = self.chunk_path(&digest);
+        path.with_nix_path(|cstr| unsafe {
+            let tmp = libc::utimensat(-1, cstr.as_ptr(), &times[0], libc::AT_SYMLINK_NOFOLLOW);
+            nix::errno::Errno::result(tmp)
+        })??;
+
+        let metadata_past = std::fs::metadata(&path).map_err(Error::from)?;
+        let atime_past = metadata_past.accessed()?;
+        if past != atime_past.duration_since(UNIX_EPOCH)? {
+            bail!("setting atime to the past failed, is atime support enabled on datastore backing filesystem?");
+        }
+
+        self.cond_touch_chunk(&digest, true)?;
+
+        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
+        let atime_now = metadata_now.accessed()?;
+        if atime_past >= atime_now {
+            bail!("atime update check failed, is atime support enabled on datastore backing filesystem?");
+        }
+        Ok(())
+    }
+
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -631,8 +688,15 @@ fn test_chunk_store1() {
     let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
         .unwrap()
         .unwrap();
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        true,
+    )
+    .unwrap();
 
     let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
         .build()
@@ -644,8 +708,14 @@ fn test_chunk_store1() {
     let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
     assert!(exists);
 
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        true,
+    );
     assert!(chunk_store.is_err());
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 75c0c16ab..9b01790cc 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1170,6 +1170,16 @@ impl DataStore {
                 upid: Some(upid.to_string()),
                 ..Default::default()
             };
+            let tuning: DatastoreTuning = serde_json::from_value(
+                DatastoreTuning::API_SCHEMA
+                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
+            )?;
+            if tuning.gc_atime_check.unwrap_or(true) {
+                self.inner.chunk_store.check_fs_atime_update()?;
+                info!("Filesystem atime update check successful.");
+            } else {
+                info!("Filesystem atime update check disabled by datastore tuning options.");
+            }
 
             info!("Start GC phase1 (mark used chunks)");
 
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index fe3260f6d..1cd38f2db 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
                 backup_user.uid,
                 backup_user.gid,
                 tuning.sync_level.unwrap_or_default(),
+                tuning.gc_atime_check.unwrap_or(true),
             )
             .map(|_| ())
         } else {
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored Christian Ebner
@ 2025-03-03 13:51   ` Fabian Grünbichler
  2025-03-03 15:23     ` Christian Ebner
  0 siblings, 1 reply; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-03 13:51 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On February 19, 2025 5:48 pm, Christian Ebner wrote:
> Check if the filesystem backing the chunk store actually updates the
> atime to avoid potential data loss in phase 2 of garbage collection
> in case the atime update is not honored.
> 
> Perform the check before phase 1 of garbage collection, as well as
> on datastore creation. The latter to early detect and disallow
> datastore creation on filesystem configurations which otherwise most
> likely would lead to data losses.
> 
> Enable the atime update check by default, but allow to opt-out by
> setting a datastore tuning parameter flag for backwards compatibility.
> This is honored by both, garbage collection and datastore creation.
> 
> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
> as test marker, inserted if not present. This all zero-chunk is very
> likely anyways for unencrypted backup contents with large all-zero
> regions using fixed size chunking (e.g. VMs).
> 
> The check sets the chunk's atime twice, once to set it to 1 second
> into the past, and once to set it to now. By this one can detect if
> the atime is set at all and if it also set if the filesystem is e.g.
> mounted via relatime, which is intended to deferr the atime update.

that's not how relatime works? see below..

> 
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - Use all-zero fixed size chunk for atime update test
> - Test on datastore creation and before garbage collection phase 1
> - fail datastore creation and garbage collection jobs if check fails,
>   but keep opt-out
> 
>  pbs-datastore/src/chunk_store.rs | 80 ++++++++++++++++++++++++++++++--
>  pbs-datastore/src/datastore.rs   | 10 ++++
>  src/api2/config/datastore.rs     |  1 +
>  3 files changed, 86 insertions(+), 5 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 7bdcb0297..be57e3c87 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,6 +1,7 @@
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
> +use std::time::{Duration, SystemTime, UNIX_EPOCH};
>  
>  use anyhow::{bail, format_err, Error};
>  use tracing::info;
> @@ -13,6 +14,7 @@ use proxmox_sys::process_locker::{
>  };
>  use proxmox_worker_task::WorkerTaskContext;
>  
> +use crate::data_blob::DataChunkBuilder;
>  use crate::file_formats::{
>      COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>  };
> @@ -96,6 +98,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          sync_level: DatastoreFSyncLevel,
> +        atime_update_check: bool,
>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -150,7 +153,16 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base, sync_level)
> +        let chunk_store = Self::open(name, base, sync_level)?;
> +        if atime_update_check {
> +            chunk_store
> +                .check_fs_atime_update()
> +                .map(|()| info!("atime update check successful."))?;
> +        } else {
> +            info!("atime update check skipped.");
> +        }
> +
> +        Ok(chunk_store)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -445,6 +457,51 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
> +    ///
> +    /// Sets the atime twice, once into the past by one second, checking that this was honored and
> +    /// then to now again, as otherwise it is not guaranteed to also cover time ranges by `relatime`.
> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
> +    /// the chunk store if not yet present.
> +    pub fn check_fs_atime_update(&self) -> Result<(), Error> {
> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
> +        self.insert_chunk(&zero_chunk, &digest)?;
> +
> +        let past = SystemTime::now().duration_since(UNIX_EPOCH)? - Duration::from_secs(1);
> +        let times: [libc::timespec; 2] = [
> +            libc::timespec {
> +                tv_sec: past.as_secs().try_into()?,
> +                tv_nsec: past.subsec_nanos().into(),
> +            },
> +            libc::timespec {
> +                tv_sec: 0,
> +                tv_nsec: UTIME_OMIT,
> +            },
> +        ];

this sets *atime* one second in the past, while not touching *mtime*.
this is not something we ever need? if this fails, something is weird,
but it wouldn't mean that atime handling was broken for our use case..

> +
> +        use nix::NixPath;
> +        let (path, _digest) = self.chunk_path(&digest);
> +        path.with_nix_path(|cstr| unsafe {
> +            let tmp = libc::utimensat(-1, cstr.as_ptr(), &times[0], libc::AT_SYMLINK_NOFOLLOW);
> +            nix::errno::Errno::result(tmp)
> +        })??;
> +
> +        let metadata_past = std::fs::metadata(&path).map_err(Error::from)?;
> +        let atime_past = metadata_past.accessed()?;
> +        if past != atime_past.duration_since(UNIX_EPOCH)? {
> +            bail!("setting atime to the past failed, is atime support enabled on datastore backing filesystem?");
> +        }
> +
> +        self.cond_touch_chunk(&digest, true)?;
> +
> +        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
> +        let atime_now = metadata_now.accessed()?;
> +        if atime_past >= atime_now {
> +            bail!("atime update check failed, is atime support enabled on datastore backing filesystem?");
> +        }

if something implements explicit timestamp updates to work like relatime
does for read-triggered updates, this would still potentially fail -
relatime covers checking *mtime* (which we haven't updated above!) first
to see if *atime* should be updated when doing reads. it's basically a
performance optimization (skip most atime updates for read-heavy work
loads) without breaking a very specific use case (that of mutt abusing
atime to know when it last read a local mailbox[0], which would break if
atime were never updated on reads).

I think what we actually want to do at GC time is:

(potentially) insert the chunk
stat the chunk
touch the chunk (regularly)
stat the chunk again
compare the timestamps

we now have three cases:

A) the old atime is before the new atime => the storage doesn't discard
*all* atime updates and we can proceed
B) the old atime and the new atime are identical and atime is *before*
mtime and ctime => atime should have been updated even under
"relatime-like" semantics, very likely this storage is discarding all
atime updates, we should abort with an error
C) the old atime and the new atime are identical, but atime is *after*
ctime/mtime => we need to check whether we just ran into relatime-like
behaviour

C means we either abort as well (if we don't want to support such
setups), or we try to detect if setting both mtime and atime works, and
force the GC into doing that

we might want to add a "wait a second" between the first stat and the
touching to account for filesystems with more granular timestamps
(inserting touches, and other tasks might touch at "the same time" as
well by accident), since the requested timestamps might be modified to
become "the greatest value supported by the filesystem that is not
greater than the specified time" (utimensat(2)), and one second doesn't
really hurt is in the scope of a GC..

note that the 24h grace period is totally independent of this - it comes
from lazytime and relates to how often changes of timestamps are
*persisted on disk*. that shouldn't affect us for local storages (since
we talk to the kernel and always get the updated view anyway, no matter
if it is in-memory or on-disk. if the PBS system crashes GC is gone as
well and needs to start over, no risk of inconsistency).

we might have a problem if the storage uses lazytime internally and
lives somewhere else
- GC starts phase1
- updates a few atimes
- storage crashes and loses some of the updates while I/O to it blocks
- storage comes back up
- GC continues not knowing that some of the atime updates have been lost
  in the ether
- GC enters phase2
- chunk timestamp makes it eligible for deletion, even though it was
  touched in phase1
- data loss

I don't think we can really prevent this other than by ditching the
whole atime mechanism and instead keep a list of written chunks in each
backup writer, and generate such a list in phase1, merging them all and
using that for phase2 (which is really expensive as it scales with the
number of touched chunks and would require syncing up writers and GC).
thankfully I'd expect most such failure scenarios to actually abort the
GC anyway for I/O error reasons..


at datastore creation we could do more in-depth checks (including
"faking" past timestamps and seeing if updates are honored for those if
they are not honored when doing the simple tests) - if we think we gain
something from that?

0: http://www.mutt.org/doc/manual/#new-mail-formats

> +        Ok(())
> +    }
> +
>      pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
> @@ -631,8 +688,15 @@ fn test_chunk_store1() {
>      let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>          .unwrap()
>          .unwrap();
> -    let chunk_store =
> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        DatastoreFSyncLevel::None,
> +        true,
> +    )
> +    .unwrap();
>  
>      let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>          .build()
> @@ -644,8 +708,14 @@ fn test_chunk_store1() {
>      let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>      assert!(exists);
>  
> -    let chunk_store =
> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        DatastoreFSyncLevel::None,
> +        true,
> +    );
>      assert!(chunk_store.is_err());
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 75c0c16ab..9b01790cc 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1170,6 +1170,16 @@ impl DataStore {
>                  upid: Some(upid.to_string()),
>                  ..Default::default()
>              };
> +            let tuning: DatastoreTuning = serde_json::from_value(
> +                DatastoreTuning::API_SCHEMA
> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
> +            )?;
> +            if tuning.gc_atime_check.unwrap_or(true) {
> +                self.inner.chunk_store.check_fs_atime_update()?;
> +                info!("Filesystem atime update check successful.");
> +            } else {
> +                info!("Filesystem atime update check disabled by datastore tuning options.");
> +            }
>  
>              info!("Start GC phase1 (mark used chunks)");
>  
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index fe3260f6d..1cd38f2db 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
>                  backup_user.uid,
>                  backup_user.gid,
>                  tuning.sync_level.unwrap_or_default(),
> +                tuning.gc_atime_check.unwrap_or(true),
>              )
>              .map(|_| ())
>          } else {
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
  2025-03-03 13:51   ` Fabian Grünbichler
@ 2025-03-03 15:23     ` Christian Ebner
  2025-03-03 15:50       ` Fabian Grünbichler
  0 siblings, 1 reply; 28+ messages in thread
From: Christian Ebner @ 2025-03-03 15:23 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/3/25 14:51, Fabian Grünbichler wrote:
> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>> Check if the filesystem backing the chunk store actually updates the
>> atime to avoid potential data loss in phase 2 of garbage collection
>> in case the atime update is not honored.
>>
>> Perform the check before phase 1 of garbage collection, as well as
>> on datastore creation. The latter to early detect and disallow
>> datastore creation on filesystem configurations which otherwise most
>> likely would lead to data losses.
>>
>> Enable the atime update check by default, but allow to opt-out by
>> setting a datastore tuning parameter flag for backwards compatibility.
>> This is honored by both, garbage collection and datastore creation.
>>
>> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
>> as test marker, inserted if not present. This all zero-chunk is very
>> likely anyways for unencrypted backup contents with large all-zero
>> regions using fixed size chunking (e.g. VMs).
>>
>> The check sets the chunk's atime twice, once to set it to 1 second
>> into the past, and once to set it to now. By this one can detect if
>> the atime is set at all and if it also set if the filesystem is e.g.
>> mounted via relatime, which is intended to deferr the atime update.
> 
> that's not how relatime works? see below..
> 
>>
>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - Use all-zero fixed size chunk for atime update test
>> - Test on datastore creation and before garbage collection phase 1
>> - fail datastore creation and garbage collection jobs if check fails,
>>    but keep opt-out
>>
>>   pbs-datastore/src/chunk_store.rs | 80 ++++++++++++++++++++++++++++++--
>>   pbs-datastore/src/datastore.rs   | 10 ++++
>>   src/api2/config/datastore.rs     |  1 +
>>   3 files changed, 86 insertions(+), 5 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 7bdcb0297..be57e3c87 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -1,6 +1,7 @@
>>   use std::os::unix::io::AsRawFd;
>>   use std::path::{Path, PathBuf};
>>   use std::sync::{Arc, Mutex};
>> +use std::time::{Duration, SystemTime, UNIX_EPOCH};
>>   
>>   use anyhow::{bail, format_err, Error};
>>   use tracing::info;
>> @@ -13,6 +14,7 @@ use proxmox_sys::process_locker::{
>>   };
>>   use proxmox_worker_task::WorkerTaskContext;
>>   
>> +use crate::data_blob::DataChunkBuilder;
>>   use crate::file_formats::{
>>       COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>>   };
>> @@ -96,6 +98,7 @@ impl ChunkStore {
>>           uid: nix::unistd::Uid,
>>           gid: nix::unistd::Gid,
>>           sync_level: DatastoreFSyncLevel,
>> +        atime_update_check: bool,
>>       ) -> Result<Self, Error>
>>       where
>>           P: Into<PathBuf>,
>> @@ -150,7 +153,16 @@ impl ChunkStore {
>>               }
>>           }
>>   
>> -        Self::open(name, base, sync_level)
>> +        let chunk_store = Self::open(name, base, sync_level)?;
>> +        if atime_update_check {
>> +            chunk_store
>> +                .check_fs_atime_update()
>> +                .map(|()| info!("atime update check successful."))?;
>> +        } else {
>> +            info!("atime update check skipped.");
>> +        }
>> +
>> +        Ok(chunk_store)
>>       }
>>   
>>       fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
>> @@ -445,6 +457,51 @@ impl ChunkStore {
>>           Ok(())
>>       }
>>   
>> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
>> +    ///
>> +    /// Sets the atime twice, once into the past by one second, checking that this was honored and
>> +    /// then to now again, as otherwise it is not guaranteed to also cover time ranges by `relatime`.
>> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
>> +    /// the chunk store if not yet present.
>> +    pub fn check_fs_atime_update(&self) -> Result<(), Error> {
>> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
>> +        self.insert_chunk(&zero_chunk, &digest)?;
>> +
>> +        let past = SystemTime::now().duration_since(UNIX_EPOCH)? - Duration::from_secs(1);
>> +        let times: [libc::timespec; 2] = [
>> +            libc::timespec {
>> +                tv_sec: past.as_secs().try_into()?,
>> +                tv_nsec: past.subsec_nanos().into(),
>> +            },
>> +            libc::timespec {
>> +                tv_sec: 0,
>> +                tv_nsec: UTIME_OMIT,
>> +            },
>> +        ];
> 
> this sets *atime* one second in the past, while not touching *mtime*.
> this is not something we ever need? if this fails, something is weird,
> but it wouldn't mean that atime handling was broken for our use case..
> 
>> +
>> +        use nix::NixPath;
>> +        let (path, _digest) = self.chunk_path(&digest);
>> +        path.with_nix_path(|cstr| unsafe {
>> +            let tmp = libc::utimensat(-1, cstr.as_ptr(), &times[0], libc::AT_SYMLINK_NOFOLLOW);
>> +            nix::errno::Errno::result(tmp)
>> +        })??;
>> +
>> +        let metadata_past = std::fs::metadata(&path).map_err(Error::from)?;
>> +        let atime_past = metadata_past.accessed()?;
>> +        if past != atime_past.duration_since(UNIX_EPOCH)? {
>> +            bail!("setting atime to the past failed, is atime support enabled on datastore backing filesystem?");
>> +        }
>> +
>> +        self.cond_touch_chunk(&digest, true)?;
>> +
>> +        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
>> +        let atime_now = metadata_now.accessed()?;
>> +        if atime_past >= atime_now {
>> +            bail!("atime update check failed, is atime support enabled on datastore backing filesystem?");
>> +        }
> 
> if something implements explicit timestamp updates to work like relatime
> does for read-triggered updates, this would still potentially fail -
> relatime covers checking *mtime* (which we haven't updated above!) first
> to see if *atime* should be updated when doing reads. it's basically a
> performance optimization (skip most atime updates for read-heavy work
> loads) without breaking a very specific use case (that of mutt abusing
> atime to know when it last read a local mailbox[0], which would break if
> atime were never updated on reads).

Thanks a lot for the pointer! Need to dig a bit deeper into how relatime 
works, as my understanding of it is clearly flawed.

> I think what we actually want to do at GC time is:
> 
> (potentially) insert the chunk
> stat the chunk
> touch the chunk (regularly)
> stat the chunk again
> compare the timestamps

That was my initial approach, which failed because of the timestamp 
granularity and since I didn't want to wait in-between the stat and 
touch the approach with setting the atime into the past first came to 
mind, therefore my implementation. But I see that this does not work 
based on your feedback.

> we now have three cases:
> 
> A) the old atime is before the new atime => the storage doesn't discard
> *all* atime updates and we can proceed
> B) the old atime and the new atime are identical and atime is *before*
> mtime and ctime => atime should have been updated even under
> "relatime-like" semantics, very likely this storage is discarding all
> atime updates, we should abort with an error
> C) the old atime and the new atime are identical, but atime is *after*
> ctime/mtime => we need to check whether we just ran into relatime-like
> behaviour
> 
> C means we either abort as well (if we don't want to support such
> setups), or we try to detect if setting both mtime and atime works, and
> force the GC into doing that
> 
> we might want to add a "wait a second" between the first stat and the
> touching to account for filesystems with more granular timestamps
> (inserting touches, and other tasks might touch at "the same time" as
> well by accident), since the requested timestamps might be modified to
> become "the greatest value supported by the filesystem that is not
> greater than the specified time" (utimensat(2)), and one second doesn't
> really hurt is in the scope of a GC..
> 
> note that the 24h grace period is totally independent of this - it comes
> from lazytime and relates to how often changes of timestamps are
> *persisted on disk*. that shouldn't affect us for local storages (since
> we talk to the kernel and always get the updated view anyway, no matter
> if it is in-memory or on-disk. if the PBS system crashes GC is gone as
> well and needs to start over, no risk of inconsistency).
> 
> we might have a problem if the storage uses lazytime internally and
> lives somewhere else
> - GC starts phase1
> - updates a few atimes
> - storage crashes and loses some of the updates while I/O to it blocks
> - storage comes back up
> - GC continues not knowing that some of the atime updates have been lost
>    in the ether
> - GC enters phase2
> - chunk timestamp makes it eligible for deletion, even though it was
>    touched in phase1
> - data loss
> 
> I don't think we can really prevent this other than by ditching the
> whole atime mechanism and instead keep a list of written chunks in each
> backup writer, and generate such a list in phase1, merging them all and
> using that for phase2 (which is really expensive as it scales with the
> number of touched chunks and would require syncing up writers and GC).
> thankfully I'd expect most such failure scenarios to actually abort the
> GC anyway for I/O error reasons..

Maybe one could use `statmount` to determine if the `SB_LAZYTIME` flag 
in `smbuf.sb_flags` is set? But I assume that you meant the issue being 
this not exposed by the filesystem...

> at datastore creation we could do more in-depth checks (including
> "faking" past timestamps and seeing if updates are honored for those if
> they are not honored when doing the simple tests) - if we think we gain
> something from that?
> 
> 0: http://www.mutt.org/doc/manual/#new-mail-formats
> 
>> +        Ok(())
>> +    }
>> +
>>       pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> @@ -631,8 +688,15 @@ fn test_chunk_store1() {
>>       let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>>           .unwrap()
>>           .unwrap();
>> -    let chunk_store =
>> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
>> +    let chunk_store = ChunkStore::create(
>> +        "test",
>> +        &path,
>> +        user.uid,
>> +        user.gid,
>> +        DatastoreFSyncLevel::None,
>> +        true,
>> +    )
>> +    .unwrap();
>>   
>>       let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>>           .build()
>> @@ -644,8 +708,14 @@ fn test_chunk_store1() {
>>       let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>>       assert!(exists);
>>   
>> -    let chunk_store =
>> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
>> +    let chunk_store = ChunkStore::create(
>> +        "test",
>> +        &path,
>> +        user.uid,
>> +        user.gid,
>> +        DatastoreFSyncLevel::None,
>> +        true,
>> +    );
>>       assert!(chunk_store.is_err());
>>   
>>       if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 75c0c16ab..9b01790cc 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1170,6 +1170,16 @@ impl DataStore {
>>                   upid: Some(upid.to_string()),
>>                   ..Default::default()
>>               };
>> +            let tuning: DatastoreTuning = serde_json::from_value(
>> +                DatastoreTuning::API_SCHEMA
>> +                    .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
>> +            )?;
>> +            if tuning.gc_atime_check.unwrap_or(true) {
>> +                self.inner.chunk_store.check_fs_atime_update()?;
>> +                info!("Filesystem atime update check successful.");
>> +            } else {
>> +                info!("Filesystem atime update check disabled by datastore tuning options.");
>> +            }
>>   
>>               info!("Start GC phase1 (mark used chunks)");
>>   
>> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
>> index fe3260f6d..1cd38f2db 100644
>> --- a/src/api2/config/datastore.rs
>> +++ b/src/api2/config/datastore.rs
>> @@ -119,6 +119,7 @@ pub(crate) fn do_create_datastore(
>>                   backup_user.uid,
>>                   backup_user.gid,
>>                   tuning.sync_level.unwrap_or_default(),
>> +                tuning.gc_atime_check.unwrap_or(true),
>>               )
>>               .map(|_| ())
>>           } else {
>> -- 
>> 2.39.5
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
  2025-03-03 15:23     ` Christian Ebner
@ 2025-03-03 15:50       ` Fabian Grünbichler
  2025-03-03 16:09         ` Christian Ebner
  0 siblings, 1 reply; 28+ messages in thread
From: Fabian Grünbichler @ 2025-03-03 15:50 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On March 3, 2025 4:23 pm, Christian Ebner wrote:
> On 3/3/25 14:51, Fabian Grünbichler wrote:
>> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>>> Check if the filesystem backing the chunk store actually updates the
>>> atime to avoid potential data loss in phase 2 of garbage collection
>>> in case the atime update is not honored.
>>>
>>> Perform the check before phase 1 of garbage collection, as well as
>>> on datastore creation. The latter to early detect and disallow
>>> datastore creation on filesystem configurations which otherwise most
>>> likely would lead to data losses.
>>>
>>> Enable the atime update check by default, but allow to opt-out by
>>> setting a datastore tuning parameter flag for backwards compatibility.
>>> This is honored by both, garbage collection and datastore creation.
>>>
>>> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
>>> as test marker, inserted if not present. This all zero-chunk is very
>>> likely anyways for unencrypted backup contents with large all-zero
>>> regions using fixed size chunking (e.g. VMs).
>>>
>>> The check sets the chunk's atime twice, once to set it to 1 second
>>> into the past, and once to set it to now. By this one can detect if
>>> the atime is set at all and if it also set if the filesystem is e.g.
>>> mounted via relatime, which is intended to deferr the atime update.
>> 
>> that's not how relatime works? see below..
>> 
>>>
>>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>> ---
>>> changes since version 1:
>>> - Use all-zero fixed size chunk for atime update test
>>> - Test on datastore creation and before garbage collection phase 1
>>> - fail datastore creation and garbage collection jobs if check fails,
>>>    but keep opt-out
>>>
>>>   pbs-datastore/src/chunk_store.rs | 80 ++++++++++++++++++++++++++++++--
>>>   pbs-datastore/src/datastore.rs   | 10 ++++
>>>   src/api2/config/datastore.rs     |  1 +
>>>   3 files changed, 86 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>>> index 7bdcb0297..be57e3c87 100644
>>> --- a/pbs-datastore/src/chunk_store.rs
>>> +++ b/pbs-datastore/src/chunk_store.rs
>>> @@ -1,6 +1,7 @@
>>>   use std::os::unix::io::AsRawFd;
>>>   use std::path::{Path, PathBuf};
>>>   use std::sync::{Arc, Mutex};
>>> +use std::time::{Duration, SystemTime, UNIX_EPOCH};
>>>   
>>>   use anyhow::{bail, format_err, Error};
>>>   use tracing::info;
>>> @@ -13,6 +14,7 @@ use proxmox_sys::process_locker::{
>>>   };
>>>   use proxmox_worker_task::WorkerTaskContext;
>>>   
>>> +use crate::data_blob::DataChunkBuilder;
>>>   use crate::file_formats::{
>>>       COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>>>   };
>>> @@ -96,6 +98,7 @@ impl ChunkStore {
>>>           uid: nix::unistd::Uid,
>>>           gid: nix::unistd::Gid,
>>>           sync_level: DatastoreFSyncLevel,
>>> +        atime_update_check: bool,
>>>       ) -> Result<Self, Error>
>>>       where
>>>           P: Into<PathBuf>,
>>> @@ -150,7 +153,16 @@ impl ChunkStore {
>>>               }
>>>           }
>>>   
>>> -        Self::open(name, base, sync_level)
>>> +        let chunk_store = Self::open(name, base, sync_level)?;
>>> +        if atime_update_check {
>>> +            chunk_store
>>> +                .check_fs_atime_update()
>>> +                .map(|()| info!("atime update check successful."))?;
>>> +        } else {
>>> +            info!("atime update check skipped.");
>>> +        }
>>> +
>>> +        Ok(chunk_store)
>>>       }
>>>   
>>>       fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
>>> @@ -445,6 +457,51 @@ impl ChunkStore {
>>>           Ok(())
>>>       }
>>>   
>>> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
>>> +    ///
>>> +    /// Sets the atime twice, once into the past by one second, checking that this was honored and
>>> +    /// then to now again, as otherwise it is not guaranteed to also cover time ranges by `relatime`.
>>> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
>>> +    /// the chunk store if not yet present.
>>> +    pub fn check_fs_atime_update(&self) -> Result<(), Error> {
>>> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
>>> +        self.insert_chunk(&zero_chunk, &digest)?;
>>> +
>>> +        let past = SystemTime::now().duration_since(UNIX_EPOCH)? - Duration::from_secs(1);
>>> +        let times: [libc::timespec; 2] = [
>>> +            libc::timespec {
>>> +                tv_sec: past.as_secs().try_into()?,
>>> +                tv_nsec: past.subsec_nanos().into(),
>>> +            },
>>> +            libc::timespec {
>>> +                tv_sec: 0,
>>> +                tv_nsec: UTIME_OMIT,
>>> +            },
>>> +        ];
>> 
>> this sets *atime* one second in the past, while not touching *mtime*.
>> this is not something we ever need? if this fails, something is weird,
>> but it wouldn't mean that atime handling was broken for our use case..
>> 
>>> +
>>> +        use nix::NixPath;
>>> +        let (path, _digest) = self.chunk_path(&digest);
>>> +        path.with_nix_path(|cstr| unsafe {
>>> +            let tmp = libc::utimensat(-1, cstr.as_ptr(), &times[0], libc::AT_SYMLINK_NOFOLLOW);
>>> +            nix::errno::Errno::result(tmp)
>>> +        })??;
>>> +
>>> +        let metadata_past = std::fs::metadata(&path).map_err(Error::from)?;
>>> +        let atime_past = metadata_past.accessed()?;
>>> +        if past != atime_past.duration_since(UNIX_EPOCH)? {
>>> +            bail!("setting atime to the past failed, is atime support enabled on datastore backing filesystem?");
>>> +        }
>>> +
>>> +        self.cond_touch_chunk(&digest, true)?;
>>> +
>>> +        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
>>> +        let atime_now = metadata_now.accessed()?;
>>> +        if atime_past >= atime_now {
>>> +            bail!("atime update check failed, is atime support enabled on datastore backing filesystem?");
>>> +        }
>> 
>> if something implements explicit timestamp updates to work like relatime
>> does for read-triggered updates, this would still potentially fail -
>> relatime covers checking *mtime* (which we haven't updated above!) first
>> to see if *atime* should be updated when doing reads. it's basically a
>> performance optimization (skip most atime updates for read-heavy work
>> loads) without breaking a very specific use case (that of mutt abusing
>> atime to know when it last read a local mailbox[0], which would break if
>> atime were never updated on reads).
> 
> Thanks a lot for the pointer! Need to dig a bit deeper into how relatime 
> works, as my understanding of it is clearly flawed.
> 
>> I think what we actually want to do at GC time is:
>> 
>> (potentially) insert the chunk
>> stat the chunk
>> touch the chunk (regularly)
>> stat the chunk again
>> compare the timestamps
> 
> That was my initial approach, which failed because of the timestamp 
> granularity and since I didn't want to wait in-between the stat and 
> touch the approach with setting the atime into the past first came to 
> mind, therefore my implementation. But I see that this does not work 
> based on your feedback.

did you actually hit the granularity issue in practice? with which
storage/file system?

because even with relatime or noatime on ext4 a sequence of

stat foobar; touch -a foobar; stat foobar; touch -a foobar; stat foobar

will give me different atimes for each stat. the same is true for ZFS no
matter which atime settings I give the dataset.

>> we now have three cases:
>> 
>> A) the old atime is before the new atime => the storage doesn't discard
>> *all* atime updates and we can proceed
>> B) the old atime and the new atime are identical and atime is *before*
>> mtime and ctime => atime should have been updated even under
>> "relatime-like" semantics, very likely this storage is discarding all
>> atime updates, we should abort with an error
>> C) the old atime and the new atime are identical, but atime is *after*
>> ctime/mtime => we need to check whether we just ran into relatime-like
>> behaviour
>> 
>> C means we either abort as well (if we don't want to support such
>> setups), or we try to detect if setting both mtime and atime works, and
>> force the GC into doing that
>> 
>> we might want to add a "wait a second" between the first stat and the
>> touching to account for filesystems with more granular timestamps
>> (inserting touches, and other tasks might touch at "the same time" as
>> well by accident), since the requested timestamps might be modified to
>> become "the greatest value supported by the filesystem that is not
>> greater than the specified time" (utimensat(2)), and one second doesn't
>> really hurt is in the scope of a GC..
>> 
>> note that the 24h grace period is totally independent of this - it comes
>> from lazytime and relates to how often changes of timestamps are
>> *persisted on disk*. that shouldn't affect us for local storages (since
>> we talk to the kernel and always get the updated view anyway, no matter
>> if it is in-memory or on-disk. if the PBS system crashes GC is gone as
>> well and needs to start over, no risk of inconsistency).
>> 
>> we might have a problem if the storage uses lazytime internally and
>> lives somewhere else
>> - GC starts phase1
>> - updates a few atimes
>> - storage crashes and loses some of the updates while I/O to it blocks
>> - storage comes back up
>> - GC continues not knowing that some of the atime updates have been lost
>>    in the ether
>> - GC enters phase2
>> - chunk timestamp makes it eligible for deletion, even though it was
>>    touched in phase1
>> - data loss
>> 
>> I don't think we can really prevent this other than by ditching the
>> whole atime mechanism and instead keep a list of written chunks in each
>> backup writer, and generate such a list in phase1, merging them all and
>> using that for phase2 (which is really expensive as it scales with the
>> number of touched chunks and would require syncing up writers and GC).
>> thankfully I'd expect most such failure scenarios to actually abort the
>> GC anyway for I/O error reasons..
> 
> Maybe one could use `statmount` to determine if the `SB_LAZYTIME` flag 
> in `smbuf.sb_flags` is set? But I assume that you meant the issue being 
> this not exposed by the filesystem...

the local mount having lazytime doesn't matter (if it does, that means
the local kernel handles the consistency and if the kernel goes
away/crashes, so does the GC). the issue would be some storage appliance
using it (or something with similar semantics) internally that would
cause atime updates to "work" but then disappear later without
interrupting the GC. I don't see how we can protect against that without
switching to a different, more expensive mechanism.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored
  2025-03-03 15:50       ` Fabian Grünbichler
@ 2025-03-03 16:09         ` Christian Ebner
  0 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-03-03 16:09 UTC (permalink / raw)
  To: Fabian Grünbichler, Proxmox Backup Server development discussion

On 3/3/25 16:50, Fabian Grünbichler wrote:
> On March 3, 2025 4:23 pm, Christian Ebner wrote:
>> On 3/3/25 14:51, Fabian Grünbichler wrote:
>>> On February 19, 2025 5:48 pm, Christian Ebner wrote:
>>>> Check if the filesystem backing the chunk store actually updates the
>>>> atime to avoid potential data loss in phase 2 of garbage collection
>>>> in case the atime update is not honored.
>>>>
>>>> Perform the check before phase 1 of garbage collection, as well as
>>>> on datastore creation. The latter to early detect and disallow
>>>> datastore creation on filesystem configurations which otherwise most
>>>> likely would lead to data losses.
>>>>
>>>> Enable the atime update check by default, but allow to opt-out by
>>>> setting a datastore tuning parameter flag for backwards compatibility.
>>>> This is honored by both, garbage collection and datastore creation.
>>>>
>>>> The check uses a 4 MiB fixed sized, unencypted and compressed chunk
>>>> as test marker, inserted if not present. This all zero-chunk is very
>>>> likely anyways for unencrypted backup contents with large all-zero
>>>> regions using fixed size chunking (e.g. VMs).
>>>>
>>>> The check sets the chunk's atime twice, once to set it to 1 second
>>>> into the past, and once to set it to now. By this one can detect if
>>>> the atime is set at all and if it also set if the filesystem is e.g.
>>>> mounted via relatime, which is intended to deferr the atime update.
>>>
>>> that's not how relatime works? see below..
>>>
>>>>
>>>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5982
>>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>>> ---
>>>> changes since version 1:
>>>> - Use all-zero fixed size chunk for atime update test
>>>> - Test on datastore creation and before garbage collection phase 1
>>>> - fail datastore creation and garbage collection jobs if check fails,
>>>>     but keep opt-out
>>>>
>>>>    pbs-datastore/src/chunk_store.rs | 80 ++++++++++++++++++++++++++++++--
>>>>    pbs-datastore/src/datastore.rs   | 10 ++++
>>>>    src/api2/config/datastore.rs     |  1 +
>>>>    3 files changed, 86 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>>>> index 7bdcb0297..be57e3c87 100644
>>>> --- a/pbs-datastore/src/chunk_store.rs
>>>> +++ b/pbs-datastore/src/chunk_store.rs
>>>> @@ -1,6 +1,7 @@
>>>>    use std::os::unix::io::AsRawFd;
>>>>    use std::path::{Path, PathBuf};
>>>>    use std::sync::{Arc, Mutex};
>>>> +use std::time::{Duration, SystemTime, UNIX_EPOCH};
>>>>    
>>>>    use anyhow::{bail, format_err, Error};
>>>>    use tracing::info;
>>>> @@ -13,6 +14,7 @@ use proxmox_sys::process_locker::{
>>>>    };
>>>>    use proxmox_worker_task::WorkerTaskContext;
>>>>    
>>>> +use crate::data_blob::DataChunkBuilder;
>>>>    use crate::file_formats::{
>>>>        COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>>>>    };
>>>> @@ -96,6 +98,7 @@ impl ChunkStore {
>>>>            uid: nix::unistd::Uid,
>>>>            gid: nix::unistd::Gid,
>>>>            sync_level: DatastoreFSyncLevel,
>>>> +        atime_update_check: bool,
>>>>        ) -> Result<Self, Error>
>>>>        where
>>>>            P: Into<PathBuf>,
>>>> @@ -150,7 +153,16 @@ impl ChunkStore {
>>>>                }
>>>>            }
>>>>    
>>>> -        Self::open(name, base, sync_level)
>>>> +        let chunk_store = Self::open(name, base, sync_level)?;
>>>> +        if atime_update_check {
>>>> +            chunk_store
>>>> +                .check_fs_atime_update()
>>>> +                .map(|()| info!("atime update check successful."))?;
>>>> +        } else {
>>>> +            info!("atime update check skipped.");
>>>> +        }
>>>> +
>>>> +        Ok(chunk_store)
>>>>        }
>>>>    
>>>>        fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
>>>> @@ -445,6 +457,51 @@ impl ChunkStore {
>>>>            Ok(())
>>>>        }
>>>>    
>>>> +    /// Check if atime updates are honored by the filesystem backing the chunk store.
>>>> +    ///
>>>> +    /// Sets the atime twice, once into the past by one second, checking that this was honored and
>>>> +    /// then to now again, as otherwise it is not guaranteed to also cover time ranges by `relatime`.
>>>> +    /// Uses a 4 MiB fixed size, compressed but unencrypted chunk to test. The chunk is inserted in
>>>> +    /// the chunk store if not yet present.
>>>> +    pub fn check_fs_atime_update(&self) -> Result<(), Error> {
>>>> +        let (zero_chunk, digest) = DataChunkBuilder::build_zero_chunk(None, 4096 * 1024, true)?;
>>>> +        self.insert_chunk(&zero_chunk, &digest)?;
>>>> +
>>>> +        let past = SystemTime::now().duration_since(UNIX_EPOCH)? - Duration::from_secs(1);
>>>> +        let times: [libc::timespec; 2] = [
>>>> +            libc::timespec {
>>>> +                tv_sec: past.as_secs().try_into()?,
>>>> +                tv_nsec: past.subsec_nanos().into(),
>>>> +            },
>>>> +            libc::timespec {
>>>> +                tv_sec: 0,
>>>> +                tv_nsec: UTIME_OMIT,
>>>> +            },
>>>> +        ];
>>>
>>> this sets *atime* one second in the past, while not touching *mtime*.
>>> this is not something we ever need? if this fails, something is weird,
>>> but it wouldn't mean that atime handling was broken for our use case..
>>>
>>>> +
>>>> +        use nix::NixPath;
>>>> +        let (path, _digest) = self.chunk_path(&digest);
>>>> +        path.with_nix_path(|cstr| unsafe {
>>>> +            let tmp = libc::utimensat(-1, cstr.as_ptr(), &times[0], libc::AT_SYMLINK_NOFOLLOW);
>>>> +            nix::errno::Errno::result(tmp)
>>>> +        })??;
>>>> +
>>>> +        let metadata_past = std::fs::metadata(&path).map_err(Error::from)?;
>>>> +        let atime_past = metadata_past.accessed()?;
>>>> +        if past != atime_past.duration_since(UNIX_EPOCH)? {
>>>> +            bail!("setting atime to the past failed, is atime support enabled on datastore backing filesystem?");
>>>> +        }
>>>> +
>>>> +        self.cond_touch_chunk(&digest, true)?;
>>>> +
>>>> +        let metadata_now = std::fs::metadata(&path).map_err(Error::from)?;
>>>> +        let atime_now = metadata_now.accessed()?;
>>>> +        if atime_past >= atime_now {
>>>> +            bail!("atime update check failed, is atime support enabled on datastore backing filesystem?");
>>>> +        }
>>>
>>> if something implements explicit timestamp updates to work like relatime
>>> does for read-triggered updates, this would still potentially fail -
>>> relatime covers checking *mtime* (which we haven't updated above!) first
>>> to see if *atime* should be updated when doing reads. it's basically a
>>> performance optimization (skip most atime updates for read-heavy work
>>> loads) without breaking a very specific use case (that of mutt abusing
>>> atime to know when it last read a local mailbox[0], which would break if
>>> atime were never updated on reads).
>>
>> Thanks a lot for the pointer! Need to dig a bit deeper into how relatime
>> works, as my understanding of it is clearly flawed.
>>
>>> I think what we actually want to do at GC time is:
>>>
>>> (potentially) insert the chunk
>>> stat the chunk
>>> touch the chunk (regularly)
>>> stat the chunk again
>>> compare the timestamps
>>
>> That was my initial approach, which failed because of the timestamp
>> granularity and since I didn't want to wait in-between the stat and
>> touch the approach with setting the atime into the past first came to
>> mind, therefore my implementation. But I see that this does not work
>> based on your feedback.
> 
> did you actually hit the granularity issue in practice? with which
> storage/file system?

Hmm, had a datastore on ZFS with atime on, which triggered the check for 
me on garbage collection, with an additional delay of 1 second however 
it worked... Might have been an implementation error tough...

> 
> because even with relatime or noatime on ext4 a sequence of
> 
> stat foobar; touch -a foobar; stat foobar; touch -a foobar; stat foobar
> 
> will give me different atimes for each stat. the same is true for ZFS no
> matter which atime settings I give the dataset.
> 
>>> we now have three cases:
>>>
>>> A) the old atime is before the new atime => the storage doesn't discard
>>> *all* atime updates and we can proceed
>>> B) the old atime and the new atime are identical and atime is *before*
>>> mtime and ctime => atime should have been updated even under
>>> "relatime-like" semantics, very likely this storage is discarding all
>>> atime updates, we should abort with an error
>>> C) the old atime and the new atime are identical, but atime is *after*
>>> ctime/mtime => we need to check whether we just ran into relatime-like
>>> behaviour
>>>
>>> C means we either abort as well (if we don't want to support such
>>> setups), or we try to detect if setting both mtime and atime works, and
>>> force the GC into doing that
>>>
>>> we might want to add a "wait a second" between the first stat and the
>>> touching to account for filesystems with more granular timestamps
>>> (inserting touches, and other tasks might touch at "the same time" as
>>> well by accident), since the requested timestamps might be modified to
>>> become "the greatest value supported by the filesystem that is not
>>> greater than the specified time" (utimensat(2)), and one second doesn't
>>> really hurt is in the scope of a GC..
>>>
>>> note that the 24h grace period is totally independent of this - it comes
>>> from lazytime and relates to how often changes of timestamps are
>>> *persisted on disk*. that shouldn't affect us for local storages (since
>>> we talk to the kernel and always get the updated view anyway, no matter
>>> if it is in-memory or on-disk. if the PBS system crashes GC is gone as
>>> well and needs to start over, no risk of inconsistency).
>>>
>>> we might have a problem if the storage uses lazytime internally and
>>> lives somewhere else
>>> - GC starts phase1
>>> - updates a few atimes
>>> - storage crashes and loses some of the updates while I/O to it blocks
>>> - storage comes back up
>>> - GC continues not knowing that some of the atime updates have been lost
>>>     in the ether
>>> - GC enters phase2
>>> - chunk timestamp makes it eligible for deletion, even though it was
>>>     touched in phase1
>>> - data loss
>>>
>>> I don't think we can really prevent this other than by ditching the
>>> whole atime mechanism and instead keep a list of written chunks in each
>>> backup writer, and generate such a list in phase1, merging them all and
>>> using that for phase2 (which is really expensive as it scales with the
>>> number of touched chunks and would require syncing up writers and GC).
>>> thankfully I'd expect most such failure scenarios to actually abort the
>>> GC anyway for I/O error reasons..
>>
>> Maybe one could use `statmount` to determine if the `SB_LAZYTIME` flag
>> in `smbuf.sb_flags` is set? But I assume that you meant the issue being
>> this not exposed by the filesystem...
> 
> the local mount having lazytime doesn't matter (if it does, that means
> the local kernel handles the consistency and if the kernel goes
> away/crashes, so does the GC). the issue would be some storage appliance
> using it (or something with similar semantics) internally that would
> cause atime updates to "work" but then disappear later without
> interrupting the GC. I don't see how we can protect against that without
> switching to a different, more expensive mechanism.

Hmm, okay. But at least one should be able to distinguish local 
filesystems from NAS by checking /proc/mounts output, so not to add this 
penalty to users running datastores on local filesystems.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 5/9] ui: expose atime update check flag in datastore tuning options
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (3 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 6/9] docs: mention GC atime update check for " Christian Ebner
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Allow to edit the atime update check flag via the datastore tuning
options edit window.

Do not expose the flag for datastore creation as it is strongly
discouraged to create datastores on filesystems not correctly handling
atime updates as the garbage collection expects. It is nevertheless
still possible to create a datastore via the cli and pass in the
`--tuning gc-atime-check=false` option.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 www/Utils.js                | 4 ++++
 www/datastore/OptionView.js | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/www/Utils.js b/www/Utils.js
index 2746ef0b5..f782de43c 100644
--- a/www/Utils.js
+++ b/www/Utils.js
@@ -846,6 +846,10 @@ Ext.define('PBS.Utils', {
 	sync = PBS.Utils.tuningOptions['sync-level'][sync ?? '__default__'];
 	options.push(`${gettext('Sync Level')}: ${sync}`);
 
+	let gc_atime_check = tuning['gc-atime-check'];
+	delete tuning['gc-atime-check'];
+	options.push(`${gettext('GC atime Check')}: ${gc_atime_check ?? true}`);
+
 	for (const [k, v] of Object.entries(tuning)) {
 	    options.push(`${k}: ${v}`);
 	}
diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
index e1f38af6f..a3e57b0e7 100644
--- a/www/datastore/OptionView.js
+++ b/www/datastore/OptionView.js
@@ -271,6 +271,14 @@ Ext.define('PBS.Datastore.Options', {
 			    deleteEmpty: true,
 			    value: '__default__',
 			},
+			{
+			    xtype: 'proxmoxcheckbox',
+			    name: 'gc-atime-check',
+			    fieldLabel: gettext('GC atime Update Check'),
+			    defaultValue: 1,
+			    uncheckedValue: 0,
+			    deleteDefaultValue: true,
+			},
 		    ],
 		},
 	    },
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 6/9] docs: mention GC atime update check for tuning options
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (4 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 5/9] ui: expose atime update check flag in datastore tuning options Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 7/9] datastore: conditionally use custom GC wait period if set Christian Ebner
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Document the gc-atime-check flag and describe the behavior it
controls, by adding it as additional bullet point to the documented
datastore tuning options.

This also fixes the intendation for the cli example how to set the
sync level, to make it clear that still belongs to the previous
bullet point.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 docs/storage.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/storage.rst b/docs/storage.rst
index 490302955..9e2ff4a5e 100644
--- a/docs/storage.rst
+++ b/docs/storage.rst
@@ -435,9 +435,16 @@ There are some tuning related options for the datastore that are more advanced:
 
   This can be set with:
 
-.. code-block:: console
+  .. code-block:: console
+
+    # proxmox-backup-manager datastore update <storename> --tuning 'sync-level=filesystem'
 
-  # proxmox-backup-manager datastore update <storename> --tuning 'sync-level=filesystem'
+* ``gc-atime-check``: Datastore GC atime update check:
+  You can explicitly `enable` or `disable` the atime update check performed on
+  datastore creation and garbage collection. This checks if atime updates are
+  handled as expected by garbage collection and therefore avoids the risk of
+  data loss by unexpected filesystem behaviour. It is recommended to set this to
+  enabled, which is also the default value.
 
 If you want to set multiple tuning options simultaneously, you can separate them
 with a comma, like this:
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 7/9] datastore: conditionally use custom GC wait period if set
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (5 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 6/9] docs: mention GC atime update check for " Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 8/9] ui: expose GC wait period in datastore tuning option Christian Ebner
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Use the user configured wait period over the default 24h 5m wait
period if explicitly set, but only if the atime check was enabled
and succeeded. If the atime update check was not enabled, fallback
to use the default, as otherwise there is potential for data loss
during phase 2 of garbage collection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 pbs-datastore/src/chunk_store.rs | 10 +++++++++-
 pbs-datastore/src/datastore.rs   | 15 ++++++++++++++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index be57e3c87..893d243aa 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -369,6 +369,7 @@ impl ChunkStore {
         &self,
         oldest_writer: i64,
         phase1_start_time: i64,
+        wait_period: Option<usize>,
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
     ) -> Result<(), Error> {
@@ -378,7 +379,14 @@ impl ChunkStore {
         use nix::sys::stat::fstatat;
         use nix::unistd::{unlinkat, UnlinkatFlags};
 
-        let mut min_atime = phase1_start_time - 3600 * 24; // at least 24h (see mount option relatime)
+        let mut min_atime = if let Some(wait_period) = wait_period {
+            // account for the 5 min safety gap subtracted below,
+            // wait period is limited to a minimum of 5 min by api schema
+            phase1_start_time + 300 - wait_period as i64 * 60
+        } else {
+            // at least 24h (see mount option relatime)
+            phase1_start_time - 3600 * 24
+        };
 
         if oldest_writer < min_atime {
             min_atime = oldest_writer;
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 9b01790cc..ee38e10bb 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1174,12 +1174,24 @@ impl DataStore {
                 DatastoreTuning::API_SCHEMA
                     .parse_property_string(gc_store_config.tuning.as_deref().unwrap_or(""))?,
             )?;
-            if tuning.gc_atime_check.unwrap_or(true) {
+            let gc_atime_check = tuning.gc_atime_check.unwrap_or(true);
+            if gc_atime_check {
                 self.inner.chunk_store.check_fs_atime_update()?;
                 info!("Filesystem atime update check successful.");
             } else {
                 info!("Filesystem atime update check disabled by datastore tuning options.");
             }
+            let mut wait_period = None;
+            if let Some(period) = tuning.gc_wait_period {
+                if gc_atime_check {
+                    info!("Using GC wait period of {period}m.");
+                    wait_period = Some(period);
+                } else {
+                    warn!("Ignoring GC wait period of {period}m, because atime check is disabled.");
+                }
+            } else {
+                info!("Using default GC wait period of 24h 5m");
+            }
 
             info!("Start GC phase1 (mark used chunks)");
 
@@ -1189,6 +1201,7 @@ impl DataStore {
             self.inner.chunk_store.sweep_unused_chunks(
                 oldest_writer,
                 phase1_start_time,
+                wait_period,
                 &mut gc_status,
                 worker,
             )?;
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 8/9] ui: expose GC wait period in datastore tuning option
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (6 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 7/9] datastore: conditionally use custom GC wait period if set Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 9/9] docs: mention gc-wait-period option as datastore tuning parameter Christian Ebner
  2025-03-04 18:38 ` [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
  9 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Allows to set the wait period for phase 2 of garbage collection in
the datastores tuning parameters. This value changes the time after
which a chunk is not considered in use anymore if it falls outside
of the cutoff window.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 www/Utils.js                | 5 +++++
 www/datastore/OptionView.js | 9 +++++++++
 2 files changed, 14 insertions(+)

diff --git a/www/Utils.js b/www/Utils.js
index f782de43c..d85b45e85 100644
--- a/www/Utils.js
+++ b/www/Utils.js
@@ -850,6 +850,11 @@ Ext.define('PBS.Utils', {
 	delete tuning['gc-atime-check'];
 	options.push(`${gettext('GC atime Check')}: ${gc_atime_check ?? true}`);
 
+	let gc_wait_period = tuning['gc-wait-period'];
+	delete tuning['gc-wait-period'];
+	gc_wait_period = gc_wait_period ?? '1445';
+	options.push(`${gettext('GC Wait Period')}: ${gc_wait_period}m`);
+
 	for (const [k, v] of Object.entries(tuning)) {
 	    options.push(`${k}: ${v}`);
 	}
diff --git a/www/datastore/OptionView.js b/www/datastore/OptionView.js
index a3e57b0e7..7dea0b37f 100644
--- a/www/datastore/OptionView.js
+++ b/www/datastore/OptionView.js
@@ -279,6 +279,15 @@ Ext.define('PBS.Datastore.Options', {
 			    uncheckedValue: 0,
 			    deleteDefaultValue: true,
 			},
+			{
+			    xtype: 'proxmoxintegerfield',
+			    emptyText: Proxmox.Utils.defaultText,
+			    name: 'gc-wait-period',
+			    minValue: 5,
+			    maxValue: 1445,
+			    fieldLabel: gettext('GC Wait Period (min)'),
+			    deleteEmpty: true,
+			},
 		    ],
 		},
 	    },
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 9/9] docs: mention gc-wait-period option as datastore tuning parameter
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (7 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 8/9] ui: expose GC wait period in datastore tuning option Christian Ebner
@ 2025-02-19 16:48 ` Christian Ebner
  2025-03-04 18:38 ` [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
  9 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-02-19 16:48 UTC (permalink / raw)
  To: pbs-devel

Document the gc-wait-period option and describe the behavior it
controls, by adding it as additional bullet point to the documented
datastore tuning options.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 docs/storage.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/storage.rst b/docs/storage.rst
index 9e2ff4a5e..9f7039d3f 100644
--- a/docs/storage.rst
+++ b/docs/storage.rst
@@ -446,6 +446,12 @@ There are some tuning related options for the datastore that are more advanced:
   data loss by unexpected filesystem behaviour. It is recommended to set this to
   enabled, which is also the default value.
 
+* ``gc-wait-period``: Datastore GC wait period:
+  This allows to set the wait period for which a chunk is still considered
+  in-use during phase 2 of garbage collection (given no older writers). If the
+  ``atime`` of the chunk is outside the range, it will be removed. This option
+  is only used in combination with the ``gc-atime-check`` being enabled.
+
 If you want to set multiple tuning options simultaneously, you can separate them
 with a comma, like this:
 
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored
  2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
                   ` (8 preceding siblings ...)
  2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 9/9] docs: mention gc-wait-period option as datastore tuning parameter Christian Ebner
@ 2025-03-04 18:38 ` Christian Ebner
  9 siblings, 0 replies; 28+ messages in thread
From: Christian Ebner @ 2025-03-04 18:38 UTC (permalink / raw)
  To: pbs-devel

superseded-by version 3:
https://lore.proxmox.com/pbs-devel/20250304183546.461918-1-c.ebner@proxmox.com/T/


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-03-05  8:12 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-19 16:48 [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 1/2] pbs api types: add check garbage collection atime updates flag Christian Ebner
2025-03-03 12:58   ` Fabian Grünbichler
2025-03-03 13:20     ` Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox 2/2] pbs api types: add option to set GC chunk cleanup wait period Christian Ebner
2025-03-03 12:58   ` Fabian Grünbichler
2025-03-03 13:30     ` Christian Ebner
2025-03-03 15:30       ` Fabian Grünbichler
2025-03-04 16:01     ` Thomas Lamprecht
2025-03-04 16:37       ` Christian Ebner
2025-03-04 16:49         ` Thomas Lamprecht
2025-03-04 17:03           ` Christian Ebner
2025-03-05  7:57           ` Fabian Grünbichler
2025-03-05  8:12           ` Fabian Grünbichler
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 3/9] datastore: move utimensat() related constants to module scope Christian Ebner
2025-03-03 13:03   ` Fabian Grünbichler
2025-03-03 13:32     ` Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored Christian Ebner
2025-03-03 13:51   ` Fabian Grünbichler
2025-03-03 15:23     ` Christian Ebner
2025-03-03 15:50       ` Fabian Grünbichler
2025-03-03 16:09         ` Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 5/9] ui: expose atime update check flag in datastore tuning options Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 6/9] docs: mention GC atime update check for " Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 7/9] datastore: conditionally use custom GC wait period if set Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 8/9] ui: expose GC wait period in datastore tuning option Christian Ebner
2025-02-19 16:48 ` [pbs-devel] [PATCH proxmox-backup 9/9] docs: mention gc-wait-period option as datastore tuning parameter Christian Ebner
2025-03-04 18:38 ` [pbs-devel] [PATCH proxmox proxmox-backup 0/9] fix #5892: check atime update is honored Christian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox

Service provided by Proxmox Server Solutions GmbH | Privacy | Legal