From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 3/5] datastore: implement sync-level tuning for datastores
Date: Mon, 23 May 2022 09:13:19 +0200	[thread overview]
Message-ID: <1653289622.043mjuz08t.astroid@nora.none> (raw)
In-Reply-To: <20220520124228.3368960-4-d.csapak@proxmox.com>

On May 20, 2022 2:42 pm, Dominik Csapak wrote:
> currently, we don't (f)sync on chunk insertion (or at any point after
> that), which can lead to broken chunks in case of e.g. an unexpected
> powerloss. To fix that, offer a tuning option for datastores that
> controls the level of syncs it does:
> 
> * None (default): same as current state, no (f)syncs done at any point
> * Filesystem: at the end of a backup, the datastore issues
>   a syncfs(2) to the filesystem of the datastore
> * File: issues an fsync on each chunk as they get inserted
>   (using our 'replace_file' helper) and a fsync on the directory handle
> 
> a small benchmark showed the following (times in mm:ss):
> setup: virtual pbs, 4 cores, 8GiB memory, ext4 on spinner
> 
> size                none    filesystem  file
> 2GiB (fits in ram)   00:13   0:41        01:00
> 33GiB                05:21   05:31       13:45
> 
> so if the backup fits in memory, there is a large difference between all
> of the modes (expected), but as soon as it exceeds the memory size,
> the difference between not syncing and syncing the fs at the end becomes
> much smaller.
> 
> i also tested on an nvme, but there the syncs basically made no difference
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> i retested with the 'fsync on dirhandle' part, but the result
> was not significantly different than before, neither on nvme nor on
> a spinning disk...
> 
>  pbs-api-types/src/datastore.rs   | 32 +++++++++++++++++
>  pbs-datastore/src/chunk_store.rs | 60 ++++++++++++++++++++++++++------
>  pbs-datastore/src/datastore.rs   | 37 +++++++++++++++++---
>  src/api2/backup/environment.rs   |  2 ++
>  src/api2/config/datastore.rs     |  9 +++--
>  5 files changed, 124 insertions(+), 16 deletions(-)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index e2bf70aa..6563aa73 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -214,6 +214,37 @@ pub enum ChunkOrder {
>      Inode,
>  }
>  
> +#[api]
> +#[derive(PartialEq, Serialize, Deserialize)]
> +#[serde(rename_all = "lowercase")]
> +/// The level of syncing that is done when writing into a datastore.
> +pub enum DatastoreFSyncLevel {
> +    /// No special fsync or syncfs calls are triggered. The system default dirty write back
> +    /// mechanism ensures that data gets is flushed eventually via the `dirty_writeback_centisecs`
> +    /// and `dirty_expire_centisecs` kernel sysctls, defaulting to ~ 30s.
> +    ///
> +    /// This mode provides generally the best performance, as all write back can happen async,
> +    /// which reduces IO pressure.
> +    /// But it may cause losing data on powerloss or system crash without any uninterruptible power supply.
> +    None,
> +    /// Triggers a fsync after writing any chunk on the datastore. While this can slow down
> +    /// backups significantly, depending on the underlying file system and storage used, it
> +    /// will ensure fine-grained consistency. But in practice there are no benefits over the
> +    /// file system level sync, so you should prefer that one, as on most systems the file level
> +    /// one is slower and causes more IO pressure compared to the file system level one.

I am not sure whether that "in practice" statement really holds:
- we only tested the exact failure case on a few filesystems; there
  might be ones in use out there where a power loss can also lead to a
  truncated chunk, not only an empty/missing one. Granted, both will be
  detected on the next verification, but only the latter will be
  automatically cleaned up by a subsequent backup task that re-uploads
  this chunk (see the sketch below this list).
- the FS underlying the datastore might be used by many datastores, or
  even for other, busy, non-datastore workloads. Not an ideal setup,
  but there might be $reasons. In this case, syncfs might have a much
  bigger negative effect (because it syncs out other, unrelated I/O)
  than fsync.
- I'm not sure what effect syncfs has if a datastore is really busy (as
  in, gets tons of basically no-op backups over a short period of time).
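
To spell out the first point, here is a rough sketch (a made-up helper,
not the actual insert_chunk() logic) of why the two failure modes
behave differently on the next backup run:

    // hypothetical check, only to illustrate the point above
    fn chunk_gets_reuploaded(chunk_path: &std::path::Path) -> bool {
        match std::fs::metadata(chunk_path) {
            // present and non-empty: a truncated chunk still looks like a
            // valid one, so a later backup skips the upload and only a
            // verification will flag it
            Ok(metadata) if metadata.is_file() && metadata.len() > 0 => false,
            // empty or missing: the next backup writing this chunk will
            // upload (and thus repair) it
            _ => true,
        }
    }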

I'd rather mark 'Filesystem' as a good compromise, and the 'File' one
as the most consistent.
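
For reference, a minimal sketch of what the two levels boil down to
(same syscalls as in the patch, function names made up here):

    use std::os::unix::io::AsRawFd;

    // 'File' level: scoped to the one chunk just written (plus the
    // directory entry of the rename)
    fn sync_single_chunk(chunk: &std::fs::File, chunk_dir: &std::fs::File) -> nix::Result<()> {
        nix::unistd::fsync(chunk.as_raw_fd())?;
        nix::unistd::fsync(chunk_dir.as_raw_fd())
    }

    // 'Filesystem' level: one syncfs(2) at the end of the backup task,
    // which writes back *all* dirty data of that filesystem, including
    // other datastores or unrelated writers sharing it
    fn sync_whole_filesystem(base_dir: &std::fs::File) -> std::io::Result<()> {
        if unsafe { libc::syncfs(base_dir.as_raw_fd()) } < 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }

so in the shared-FS case above, 'File' stays local to the datastore's
own writes, while 'Filesystem' pays for everything else that happens to
be dirty at that moment.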

> +    File,
> +    /// Trigger a filesystem wide sync after all backup data got written but before finishing the
> +    /// task. This allows that every finished backup is fully written back to storage
> +    /// while reducing the impact most file systems in contrast to the file level sync.

missing 'on'? (reducing the impact *on* most file systems)?

> +    Filesystem,
> +}
> +
> +impl Default for DatastoreFSyncLevel {
> +    fn default() -> Self {
> +        DatastoreFSyncLevel::None
> +    }
> +}
> +
>  #[api(
>      properties: {
>          "chunk-order": {
> @@ -228,6 +259,7 @@ pub enum ChunkOrder {
>  pub struct DatastoreTuning {
>      /// Iterate chunks in this order
>      pub chunk_order: Option<ChunkOrder>,
> +    pub sync_level: Option<DatastoreFSyncLevel>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 1f169dee..86dd1c17 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,11 +1,10 @@
> -use std::io::Write;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
>  
>  use anyhow::{bail, format_err, Error};
>  
> -use pbs_api_types::GarbageCollectionStatus;
> +use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>  use proxmox_sys::fs::{create_dir, create_path, CreateOptions};
>  use proxmox_sys::process_locker::{
>      ProcessLockExclusiveGuard, ProcessLockSharedGuard, ProcessLocker,
> @@ -22,6 +21,7 @@ pub struct ChunkStore {
>      chunk_dir: PathBuf,
>      mutex: Mutex<()>,
>      locker: Option<Arc<Mutex<ProcessLocker>>>,
> +    sync_level: DatastoreFSyncLevel,
>  }
>  
>  // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
> @@ -68,6 +68,7 @@ impl ChunkStore {
>              chunk_dir: PathBuf::new(),
>              mutex: Mutex::new(()),
>              locker: None,
> +            sync_level: Default::default(),
>          }
>      }
>  
> @@ -88,6 +89,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          worker: Option<&dyn WorkerTaskContext>,
> +        sync_level: DatastoreFSyncLevel,
>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -144,7 +146,7 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base)
> +        Self::open(name, base, sync_level)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -153,7 +155,11 @@ impl ChunkStore {
>          lockfile_path
>      }
>  
> -    pub fn open<P: Into<PathBuf>>(name: &str, base: P) -> Result<Self, Error> {
> +    pub fn open<P: Into<PathBuf>>(
> +        name: &str,
> +        base: P,
> +        sync_level: DatastoreFSyncLevel,
> +    ) -> Result<Self, Error> {
>          let base: PathBuf = base.into();
>  
>          if !base.is_absolute() {
> @@ -176,6 +182,7 @@ impl ChunkStore {
>              chunk_dir,
>              locker: Some(locker),
>              mutex: Mutex::new(()),
> +            sync_level,
>          })
>      }
>  
> @@ -461,9 +468,27 @@ impl ChunkStore {
>              }
>          }
>  
> -        proxmox_sys::fs::replace_file(chunk_path, raw_data, CreateOptions::new(), false).map_err(
> -            |err| format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}"),
> -        )?;
> +        let chunk_dir_path = chunk_path
> +            .parent()
> +            .ok_or_else(|| format_err!("unable to get chunk dir"))?
> +            .to_owned();
> +
> +        proxmox_sys::fs::replace_file(
> +            chunk_path,
> +            raw_data,
> +            CreateOptions::new(),
> +            self.sync_level == DatastoreFSyncLevel::File,
> +        )
> +        .map_err(|err| {
> +            format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
> +        })?;
> +
> +        if self.sync_level == DatastoreFSyncLevel::File {
> +            // fsync dir handle to persist the tmp rename
> +            let dir = std::fs::File::open(chunk_dir_path)?;
> +            nix::unistd::fsync(dir.as_raw_fd())
> +                .map_err(|err| format_err!("fsync failed: {err}"))?;
> +        }
>  
>          drop(lock);
>  
> @@ -520,13 +545,21 @@ fn test_chunk_store1() {
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>  
> -    let chunk_store = ChunkStore::open("test", &path);
> +    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
>      assert!(chunk_store.is_err());
>  
>      let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>          .unwrap()
>          .unwrap();
> -    let chunk_store = ChunkStore::create("test", &path, user.uid, user.gid, None).unwrap();
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        None,
> +        DatastoreFSyncLevel::None,
> +    )
> +    .unwrap();
>  
>      let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>          .build()
> @@ -538,7 +571,14 @@ fn test_chunk_store1() {
>      let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>      assert!(exists);
>  
> -    let chunk_store = ChunkStore::create("test", &path, user.uid, user.gid, None);
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        None,
> +        DatastoreFSyncLevel::None,
> +    );
>      assert!(chunk_store.is_err());
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 5af8a295..613f3196 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -18,8 +18,8 @@ use proxmox_sys::WorkerTaskContext;
>  use proxmox_sys::{task_log, task_warn};
>  
>  use pbs_api_types::{
> -    Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreTuning,
> -    GarbageCollectionStatus, HumanByte, Operation, UPID,
> +    Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreFSyncLevel,
> +    DatastoreTuning, GarbageCollectionStatus, HumanByte, Operation, UPID,
>  };
>  use pbs_config::ConfigVersionCache;
>  
> @@ -61,6 +61,7 @@ pub struct DataStoreImpl {
>      chunk_order: ChunkOrder,
>      last_generation: usize,
>      last_update: i64,
> +    sync_level: DatastoreFSyncLevel,
>  }
>  
>  impl DataStoreImpl {
> @@ -75,6 +76,7 @@ impl DataStoreImpl {
>              chunk_order: ChunkOrder::None,
>              last_generation: 0,
>              last_update: 0,
> +            sync_level: Default::default(),
>          })
>      }
>  }
> @@ -154,7 +156,13 @@ impl DataStore {
>              }
>          }
>  
> -        let chunk_store = ChunkStore::open(name, &config.path)?;
> +        let tuning: DatastoreTuning = serde_json::from_value(
> +            DatastoreTuning::API_SCHEMA
> +                .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +        )?;
> +
> +        let chunk_store =
> +            ChunkStore::open(name, &config.path, tuning.sync_level.unwrap_or_default())?;
>          let datastore = DataStore::with_store_and_config(chunk_store, config, generation, now)?;
>  
>          let datastore = Arc::new(datastore);
> @@ -197,7 +205,12 @@ impl DataStore {
>      ) -> Result<Arc<Self>, Error> {
>          let name = config.name.clone();
>  
> -        let chunk_store = ChunkStore::open(&name, &config.path)?;
> +        let tuning: DatastoreTuning = serde_json::from_value(
> +            DatastoreTuning::API_SCHEMA
> +                .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +        )?;
> +        let chunk_store =
> +            ChunkStore::open(&name, &config.path, tuning.sync_level.unwrap_or_default())?;
>          let inner = Arc::new(Self::with_store_and_config(chunk_store, config, 0, 0)?);
>  
>          if let Some(operation) = operation {
> @@ -242,6 +255,7 @@ impl DataStore {
>              chunk_order,
>              last_generation,
>              last_update,
> +            sync_level: tuning.sync_level.unwrap_or_default(),
>          })
>      }
>  
> @@ -1257,4 +1271,19 @@ impl DataStore {
>          todo!("split out the namespace");
>      }
>      */
> +
> +    /// Syncs the filesystem of the datastore if 'sync_level' is set to
> +    /// [`DatastoreFSyncLevel::Filesystem`]. Uses syncfs(2).
> +    pub fn try_ensure_sync_level(&self) -> Result<(), Error> {
> +        if self.inner.sync_level != DatastoreFSyncLevel::Filesystem {
> +            return Ok(());
> +        }
> +        let file = std::fs::File::open(self.base_path())?;
> +        let fd = file.as_raw_fd();
> +        log::info!("syncinc filesystem");
> +        if unsafe { libc::syncfs(fd) } < 0 {
> +            bail!("error during syncfs: {}", std::io::Error::last_os_error());
> +        }
> +        Ok(())
> +    }
>  }
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 8c1c42db..98a13c6e 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -623,6 +623,8 @@ impl BackupEnvironment {
>              }
>          }
>  
> +        self.datastore.try_ensure_sync_level()?;
> +
>          // marks the backup as successful
>          state.finished = true;
>  
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index 28342c2c..9fdd73dc 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -11,8 +11,8 @@ use proxmox_section_config::SectionConfigData;
>  use proxmox_sys::WorkerTaskContext;
>  
>  use pbs_api_types::{
> -    Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DATASTORE_SCHEMA,
> -    PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY,
> +    Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DatastoreTuning,
> +    DATASTORE_SCHEMA, PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY,
>      PROXMOX_CONFIG_DIGEST_SCHEMA,
>  };
>  use pbs_config::BackupLockGuard;
> @@ -70,6 +70,10 @@ pub(crate) fn do_create_datastore(
>  ) -> Result<(), Error> {
>      let path: PathBuf = datastore.path.clone().into();
>  
> +    let tuning: DatastoreTuning = serde_json::from_value(
> +        DatastoreTuning::API_SCHEMA
> +            .parse_property_string(datastore.tuning.as_deref().unwrap_or(""))?,
> +    )?;
>      let backup_user = pbs_config::backup_user()?;
>      let _store = ChunkStore::create(
>          &datastore.name,
> @@ -77,6 +81,7 @@ pub(crate) fn do_create_datastore(
>          backup_user.uid,
>          backup_user.gid,
>          worker,
> +        tuning.sync_level.unwrap_or_default(),
>      )?;
>  
>      config.set_data(&datastore.name, "datastore", &datastore)?;
> -- 
> 2.30.2

Thread overview: 8+ messages
2022-05-20 12:42 [pbs-devel] [PATCH proxmox-backup 0/5] add 'sync-level' to datatore tuning options Dominik Csapak
2022-05-20 12:42 ` [pbs-devel] [PATCH proxmox-backup 1/5] docs: add information about chunk order option for datastores Dominik Csapak
2022-05-20 12:42 ` [pbs-devel] [PATCH proxmox-backup 2/5] pbs-datastore: chunk_store: use replace_file in insert_chunk Dominik Csapak
2022-05-20 12:42 ` [pbs-devel] [PATCH proxmox-backup 3/5] datastore: implement sync-level tuning for datastores Dominik Csapak
2022-05-23  7:13   ` Fabian Grünbichler [this message]
2022-05-24  8:14     ` Thomas Lamprecht
2022-05-20 12:42 ` [pbs-devel] [PATCH proxmox-backup 4/5] docs: add documentation about the 'sync-level' tuning Dominik Csapak
2022-05-20 12:42 ` [pbs-devel] [PATCH proxmox-backup 5/5] datastore: make 'filesystem' the default sync-level Dominik Csapak
