public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup] chunk_store: insert_chunk: write chunk again if sizes don't match
Date: Mon, 09 May 2022 13:34:11 +0200	[thread overview]
Message-ID: <1652095455.qd8d3f2zyo.astroid@nora.none> (raw)
In-Reply-To: <20220509104030.1943794-1-d.csapak@proxmox.com>

On May 9, 2022 12:40 pm, Dominik Csapak wrote:
> if the on-disk size of a chunk is not correct, write it again when
> inserting and log a warning.
> 
> This is currently possible if PBS crashes, but the rename of the chunk
> was flushed to disk, when the actual data was not.
> 
> Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 8d7df513..93f56e8b 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -458,17 +458,29 @@ impl ChunkStore {
>  
>          let lock = self.mutex.lock();
>  
> +        let raw_data = chunk.raw_data();
> +        let encoded_size = raw_data.len() as u64;
> +
>          if let Ok(metadata) = std::fs::metadata(&chunk_path) {
> -            if metadata.is_file() {
> -                self.touch_chunk(digest)?;
> -                return Ok((true, metadata.len()));
> -            } else {
> +            if !metadata.is_file() {
>                  bail!(
>                      "Got unexpected file type on store '{}' for chunk {}",
>                      self.name,
>                      digest_str
>                  );
>              }
> +            let new_len = metadata.len();
> +            if encoded_size == new_len {
> +                self.touch_chunk(digest)?;
> +                return Ok((true, new_len));
> +            } else {
> +                log::warn!(
> +                    "chunk size mismatch on insert for {}: old {} - new {}",
> +                    digest_str,
> +                    encoded_size,
> +                    new_len
> +                );
> +            }

I think I'd even go one step further here:

A chunk file exists but is empty => overwrite with new content
B chunk file exists but length doesn't match => bail out
C chunk file exists and length matches => treat as duplicate

A is to handle the 'crash/.., rename went through but writing chunk data 
didn't' case - we know an empty chunk file is never valid, so we can 
treat it as garbage and overwrite.

C is like before - we assume a file that is there and has the correct 
length can be treated as duplicate.

but B is IMHO worthwhile to differentiate from A - we don't have a way 
where this should be possible unless
- the file was truncated somehow, which seems unlikely unless done 
  manually (or maybe there is a way this could happen?)
- the uploaded chunk got logically truncated/.. somehow (bug 
  client-side, on-wire truncation would obviously break in other 
  fashions earlier)
- someone is actively trying to upload a broken (encrypted) chunk to 
  overwrite an existing valid one
- someone tries to fix the previous two instances (current on-disk chunk 
  was maliciously/buggily placed there)

in any case we can't tell for sure whether the incoming or the on-disk 
copy is "more correct" unless we do a full verify (only possible for 
plain-chunks, encrypted are out of luck since the CRC is meaningless to 
verify actual contents in cases 3/4).

so it seems to me it's safer to say "let's abort the backup and tell the 
user the chunk store is corrupted", because overwriting might break 
backups if the current client is bad and the on-disk chunk is correct, 
and not overwriting has the same problem in the inverse scenario. 
alternatively we could of course say existing on-disk chunk is corrupt, 
mark accordingly, and treat the incoming copy as correct since it's more 
recent (allows the backup to proceed according to the client's / user's 
wishes, even though it might be technically wrong).

>          }
>  
>          let mut tmp_path = chunk_path.clone();
> @@ -483,9 +495,6 @@ impl ChunkStore {
>              )
>          })?;
>  
> -        let raw_data = chunk.raw_data();
> -        let encoded_size = raw_data.len() as u64;
> -
>          file.write_all(raw_data).map_err(|err| {
>              format_err!(
>                  "writing temporary chunk on store '{}' failed for {} - {}",
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 




  reply	other threads:[~2022-05-09 11:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-09 10:40 Dominik Csapak
2022-05-09 11:34 ` Fabian Grünbichler [this message]
2022-05-09 11:34 ` Thomas Lamprecht
2022-05-09 11:51   ` Dominik Csapak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1652095455.qd8d3f2zyo.astroid@nora.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal