public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup] chunk store: handle insertion edge cases
Date: Fri, 31 Mar 2023 10:43:45 +0200	[thread overview]
Message-ID: <20230331084345.3971666-1-f.gruenbichler@proxmox.com> (raw)

these were previously called out in a comment, but should now be handled (as
much as they can be).

the performance impact shouldn't be too bad, since we only look at the magic 8
bytes at the start of the existing chunk (we already did a stat on it, so that
might even be prefetched already by storage), and only if there is a size
mismatch and encryption is enabled.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    we could verify the CRC when deciding between existing and incoming encrypted
    chunk, but that would require reading the full chunk, and that would be quite
    the load if we ever upgrade ZSTD or change its parameters and the new version
    is substantially better or worse at compressing.. the CRC is verified on
    every regular load (such as verify, sync, restore) anyway.
    
    we cannot decide which of the two potentially invalid encrypted chunks to keep
    based on the size and compression status: uncompressed chunks should always
    be bigger than compressed ones, but both the size and the compression bit is
    100% under a potential attacker's control anyhow, so we cannot prevent them
    from sending us rather small chunks that we still need to discard out of
    caution, even if they are smaller than the existing one.

 pbs-datastore/src/chunk_store.rs | 36 ++++++++++++++++++++++++++------
 pbs-datastore/src/data_blob.rs   |  6 ++++++
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 1944ae00..ff1229bc 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -5,6 +5,7 @@ use std::sync::{Arc, Mutex};
 use anyhow::{bail, format_err, Error};
 
 use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
+use proxmox_io::ReadExt;
 use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
 use proxmox_sys::process_locker::{
     ProcessLockExclusiveGuard, ProcessLockSharedGuard, ProcessLocker,
@@ -12,6 +13,9 @@ use proxmox_sys::process_locker::{
 use proxmox_sys::task_log;
 use proxmox_sys::WorkerTaskContext;
 
+use crate::file_formats::{
+    COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
+};
 use crate::DataBlob;
 
 /// File system based chunk store
@@ -460,13 +464,33 @@ impl ChunkStore {
                 return Ok((true, old_size));
             } else if old_size == 0 {
                 log::warn!("found empty chunk '{digest_str}' in store {name}, overwriting");
+            } else if chunk.is_encrypted() {
+                // incoming chunk is encrypted, possible attack or hash collision!
+                let mut existing_file = std::fs::File::open(&chunk_path)?;
+                let magic = existing_file.read_exact_allocated(8)?;
+
+                // going from unencrypted to encrypted can never be right, since the digest
+                // includes data derived from the encryption key
+                if magic == UNCOMPRESSED_BLOB_MAGIC_1_0 || magic == COMPRESSED_BLOB_MAGIC_1_0 {
+                    bail!("Overwriting unencrypted chunk '{digest_str}' on store '{name}' with encrypted chunk with same digest not allowed!");
+                }
+
+                // if both chunks are uncompressed and encrypted and have the same digest, but
+                // their sizes are different, one of them *must* be invalid
+                if magic == ENCRYPTED_BLOB_MAGIC_1_0 && !chunk.is_compressed() {
+                    bail!("Overwriting existing (encrypted) chunk '{digest_str}' on store '{name}' is not allowed!")
+                }
+
+                // we can't tell if either chunk is invalid if either or both of them are
+                // compressed, the size mismatch could be caused by different zstd versions
+                // so let's keep the one that was uploaded first, bit-rot is hopefully detected by
+                // verification at some point..
+                return Ok((true, old_size));
+            } else if old_size < encoded_size {
+                log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is smaller, discarding uploaded one.");
+                return Ok((true, old_size));
             } else {
-                // other sizes can happen in legitimate and illegitimate ways:
-                //  - illegitimate: encryped chunks and bad actor client
-                //  - legitimate: same chunk but newer zstd version (or compression level) can
-                //    compress it better (or worse) so the
-                //  Ideally we could take the actual smaller chunk so that improved zstd tech gets
-                //  leveraged, but we could only allow to do that for un-encrypted ones.
+                log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is bigger, replacing with uploaded one.");
             }
         }
 
diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index 9c47bd45..f37c7a34 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -295,6 +295,12 @@ impl DataBlob {
         magic == &ENCR_COMPR_BLOB_MAGIC_1_0 || magic == &ENCRYPTED_BLOB_MAGIC_1_0
     }
 
+    /// Returns if chunk is compressed
+    pub fn is_compressed(&self) -> bool {
+        let magic = self.magic();
+        magic == &ENCR_COMPR_BLOB_MAGIC_1_0 || magic == &COMPRESSED_BLOB_MAGIC_1_0
+    }
+
     /// Verify digest and data length for unencrypted chunks.
     ///
     /// To do that, we need to decompress data first. Please note that
-- 
2.30.2





             reply	other threads:[~2023-03-31  8:43 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-31  8:43 Fabian Grünbichler [this message]
2023-04-06  7:38 ` [pbs-devel] applied: " Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230331084345.3971666-1-f.gruenbichler@proxmox.com \
    --to=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal