public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match
@ 2022-05-09 10:40 Dominik Csapak
  2022-05-09 11:34 ` Fabian Grünbichler
  2022-05-09 11:34 ` Thomas Lamprecht
  0 siblings, 2 replies; 4+ messages in thread
From: Dominik Csapak @ 2022-05-09 10:40 UTC (permalink / raw)
  To: pbs-devel

if the on-disk size of a chunk is not correct, write it again when
inserting and log a warning.

This is currently possible if PBS crashes, but the rename of the chunk
was flushed to disk, when the actual data was not.

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 8d7df513..93f56e8b 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -458,17 +458,29 @@ impl ChunkStore {
 
         let lock = self.mutex.lock();
 
+        let raw_data = chunk.raw_data();
+        let encoded_size = raw_data.len() as u64;
+
         if let Ok(metadata) = std::fs::metadata(&chunk_path) {
-            if metadata.is_file() {
-                self.touch_chunk(digest)?;
-                return Ok((true, metadata.len()));
-            } else {
+            if !metadata.is_file() {
                 bail!(
                     "Got unexpected file type on store '{}' for chunk {}",
                     self.name,
                     digest_str
                 );
             }
+            let new_len = metadata.len();
+            if encoded_size == new_len {
+                self.touch_chunk(digest)?;
+                return Ok((true, new_len));
+            } else {
+                log::warn!(
+                    "chunk size mismatch on insert for {}: old {} - new {}",
+                    digest_str,
+                    encoded_size,
+                    new_len
+                );
+            }
         }
 
         let mut tmp_path = chunk_path.clone();
@@ -483,9 +495,6 @@ impl ChunkStore {
             )
         })?;
 
-        let raw_data = chunk.raw_data();
-        let encoded_size = raw_data.len() as u64;
-
         file.write_all(raw_data).map_err(|err| {
             format_err!(
                 "writing temporary chunk on store '{}' failed for {} - {}",
-- 
2.30.2





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match
  2022-05-09 10:40 [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match Dominik Csapak
@ 2022-05-09 11:34 ` Fabian Grünbichler
  2022-05-09 11:34 ` Thomas Lamprecht
  1 sibling, 0 replies; 4+ messages in thread
From: Fabian Grünbichler @ 2022-05-09 11:34 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On May 9, 2022 12:40 pm, Dominik Csapak wrote:
> if the on-disk size of a chunk is not correct, write it again when
> inserting and log a warning.
> 
> This is currently possible if PBS crashes, but the rename of the chunk
> was flushed to disk, when the actual data was not.
> 
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 8d7df513..93f56e8b 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -458,17 +458,29 @@ impl ChunkStore {
>  
>          let lock = self.mutex.lock();
>  
> +        let raw_data = chunk.raw_data();
> +        let encoded_size = raw_data.len() as u64;
> +
>          if let Ok(metadata) = std::fs::metadata(&chunk_path) {
> -            if metadata.is_file() {
> -                self.touch_chunk(digest)?;
> -                return Ok((true, metadata.len()));
> -            } else {
> +            if !metadata.is_file() {
>                  bail!(
>                      "Got unexpected file type on store '{}' for chunk {}",
>                      self.name,
>                      digest_str
>                  );
>              }
> +            let new_len = metadata.len();
> +            if encoded_size == new_len {
> +                self.touch_chunk(digest)?;
> +                return Ok((true, new_len));
> +            } else {
> +                log::warn!(
> +                    "chunk size mismatch on insert for {}: old {} - new {}",
> +                    digest_str,
> +                    encoded_size,
> +                    new_len
> +                );
> +            }

I think I'd even go one step further here:

A chunk file exists but is empty => overwrite with new content
B chunk file exists but length doesn't match => bail out
C chunk file exists and length matches => treat as duplicate

A is to handle the 'crash/.., rename went through but writing chunk data 
didn't' case - we know an empty chunk file is never valid, so we can 
treat it as garbage and overwrite.

C is like before - we assume a file that is there and has the correct 
length can be treated as duplicate.

but B is IMHO worthwhile to differentiate from A - we don't have a way 
where this should be possible unless
- the file was truncated somehow, which seems unlikely unless done 
  manually (or maybe there is a way this could happen?)
- the uploaded chunk got logically truncated/.. somehow (bug 
  client-side, on-wire truncation would obviously break in other 
  fashions earlier)
- someone is actively trying to upload a broken (encrypted) chunk to 
  overwrite an existing valid one
- someone tries to fix the previous two instances (current on-disk chunk 
  was maliciously/buggily placed there)

in any case we can't tell for sure whether the incoming or the on-disk 
copy is "more correct" unless we do a full verify (only possible for 
plain-chunks, encrypted are out of luck since the CRC is meaningless to 
verify actual contents in cases 3/4).

so it seems to me it's safer to say "let's abort the backup and tell the 
user the chunk store is corrupted", because overwriting might break 
backups if the current client is bad and the on-disk chunk is correct, 
and not overwriting has the same problem in the inverse scenario. 
alternatively we could of course say existing on-disk chunk is corrupt, 
mark accordingly, and treat the incoming copy as correct since it's more 
recent (allows the backup to proceed according to the client's / user's 
wishes, even though it might be technically wrong).

>          }
>  
>          let mut tmp_path = chunk_path.clone();
> @@ -483,9 +495,6 @@ impl ChunkStore {
>              )
>          })?;
>  
> -        let raw_data = chunk.raw_data();
> -        let encoded_size = raw_data.len() as u64;
> -
>          file.write_all(raw_data).map_err(|err| {
>              format_err!(
>                  "writing temporary chunk on store '{}' failed for {} - {}",
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match
  2022-05-09 10:40 [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match Dominik Csapak
  2022-05-09 11:34 ` Fabian Grünbichler
@ 2022-05-09 11:34 ` Thomas Lamprecht
  2022-05-09 11:51   ` Dominik Csapak
  1 sibling, 1 reply; 4+ messages in thread
From: Thomas Lamprecht @ 2022-05-09 11:34 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 09/05/2022 12:40, Dominik Csapak wrote:
> if the on-disk size of a chunk is not correct, write it again when
> inserting and log a warning.
> 
> This is currently possible if PBS crashes, but the rename of the chunk
> was flushed to disk, when the actual data was not.

could be also interesting to note here that we basically got all data
required to do that already anyway. 

And I'd think that a verify would catch this too and rename it to .bad, albeit
that can naturally be to late if one is unlucky, so still good to do. 

small nit inline (can be probably just fixed up on apply)

> 
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 8d7df513..93f56e8b 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -458,17 +458,29 @@ impl ChunkStore {
>  
>          let lock = self.mutex.lock();
>  
> +        let raw_data = chunk.raw_data();
> +        let encoded_size = raw_data.len() as u64
> +
>          if let Ok(metadata) = std::fs::metadata(&chunk_path) {
> -            if metadata.is_file() {
> -                self.touch_chunk(digest)?;
> -                return Ok((true, metadata.len()));
> -            } else {
> +            if !metadata.is_file() {
>                  bail!(
>                      "Got unexpected file type on store '{}' for chunk {}",
>                      self.name,
>                      digest_str
>                  );
>              }
> +            let new_len = metadata.len();
> +            if encoded_size == new_len {
> +                self.touch_chunk(digest)?;
> +                return Ok((true, new_len));
> +            } else {
> +                log::warn!(
> +                    "chunk size mismatch on insert for {}: old {} - new {}",

fyi: you can now use variable names directly in format strings:

"chunk size mismatch on insert for {digest_str}: old {encoded_size} - new {new_len}",

nit: why is one named a "size" and one a "len", if they're the same thing it'd be
nice to have it consistent.

> +                    digest_str,
> +                    encoded_size,
> +                    new_len
> +                );
> +            }
>          }
>  
>          let mut tmp_path = chunk_path.clone();
> @@ -483,9 +495,6 @@ impl ChunkStore {
>              )
>          })?;
>  
> -        let raw_data = chunk.raw_data();
> -        let encoded_size = raw_data.len() as u64;
> -
>          file.write_all(raw_data).map_err(|err| {
>              format_err!(
>                  "writing temporary chunk on store '{}' failed for {} - {}",





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match
  2022-05-09 11:34 ` Thomas Lamprecht
@ 2022-05-09 11:51   ` Dominik Csapak
  0 siblings, 0 replies; 4+ messages in thread
From: Dominik Csapak @ 2022-05-09 11:51 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 5/9/22 13:34, Thomas Lamprecht wrote:
> On 09/05/2022 12:40, Dominik Csapak wrote:
>> if the on-disk size of a chunk is not correct, write it again when
>> inserting and log a warning.
>>
>> This is currently possible if PBS crashes, but the rename of the chunk
>> was flushed to disk, when the actual data was not.
> 
> could be also interesting to note here that we basically got all data
> required to do that already anyway.

what exactly do you mean here? just adding

'since we already have the complete chunk data, we are able to overwrite it'
(or similar?)

> 
> And I'd think that a verify would catch this too and rename it to .bad, albeit
> that can naturally be to late if one is unlucky, so still good to do.

yes a verify will trigger that, but as you said, that can be too late ;)

also any input on @fabians suggestion to bail out when the old_size != 0 but
!= new_size?

> 
> small nit inline (can be probably just fixed up on apply)
> 
>>
>> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>>   pbs-datastore/src/chunk_store.rs | 23 ++++++++++++++++-------
>>   1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 8d7df513..93f56e8b 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -458,17 +458,29 @@ impl ChunkStore {
>>   
>>           let lock = self.mutex.lock();
>>   
>> +        let raw_data = chunk.raw_data();
>> +        let encoded_size = raw_data.len() as u64
>> +
>>           if let Ok(metadata) = std::fs::metadata(&chunk_path) {
>> -            if metadata.is_file() {
>> -                self.touch_chunk(digest)?;
>> -                return Ok((true, metadata.len()));
>> -            } else {
>> +            if !metadata.is_file() {
>>                   bail!(
>>                       "Got unexpected file type on store '{}' for chunk {}",
>>                       self.name,
>>                       digest_str
>>                   );
>>               }
>> +            let new_len = metadata.len();
>> +            if encoded_size == new_len {
>> +                self.touch_chunk(digest)?;
>> +                return Ok((true, new_len));
>> +            } else {
>> +                log::warn!(
>> +                    "chunk size mismatch on insert for {}: old {} - new {}",
> 
> fyi: you can now use variable names directly in format strings:
> 
> "chunk size mismatch on insert for {digest_str}: old {encoded_size} - new {new_len}",
> 
> nit: why is one named a "size" and one a "len", if they're the same thing it'd be
> nice to have it consistent.
> 

argh... 'new_len' is actually the old one. so it'll be
'encoded_size' and 'old_size'..

thanks for making me look again ^^

>> +                    digest_str,
>> +                    encoded_size,
>> +                    new_len
>> +                );
>> +            }
>>           }
>>   
>>           let mut tmp_path = chunk_path.clone();
>> @@ -483,9 +495,6 @@ impl ChunkStore {
>>               )
>>           })?;
>>   
>> -        let raw_data = chunk.raw_data();
>> -        let encoded_size = raw_data.len() as u64;
>> -
>>           file.write_all(raw_data).map_err(|err| {
>>               format_err!(
>>                   "writing temporary chunk on store '{}' failed for {} - {}",
> 





^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-09 11:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09 10:40 [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match Dominik Csapak
2022-05-09 11:34 ` Fabian Grünbichler
2022-05-09 11:34 ` Thomas Lamprecht
2022-05-09 11:51   ` Dominik Csapak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal