Date: Mon, 23 May 2022 09:13:19 +0200
From: Fabian Grünbichler
To: Proxmox Backup Server development discussion
Subject: Re: [pbs-devel] [PATCH proxmox-backup 3/5] datastore: implement sync-level tuning for datastores
Message-Id: <1653289622.043mjuz08t.astroid@nora.none>
In-Reply-To: <20220520124228.3368960-4-d.csapak@proxmox.com>
References: <20220520124228.3368960-1-d.csapak@proxmox.com>
 <20220520124228.3368960-4-d.csapak@proxmox.com>

On May 20, 2022 2:42 pm, Dominik Csapak wrote:
> currently, we don't (f)sync on chunk insertion (or at any point after
> that), which can lead to broken chunks in case of e.g. an unexpected
> powerloss. To fix that, offer a tuning option for datastores that
> controls the level of syncs it does:
>
> * None (default): same as current state, no (f)syncs done at any point
> * Filesystem: at the end of a backup, the datastore issues
>   a syncfs(2) to the filesystem of the datastore
> * File: issues an fsync on each chunk as they get inserted
>   (using our 'replace_file' helper) and a fsync on the directory handle
>
> a small benchmark showed the following (times in mm:ss):
> setup: virtual pbs, 4 cores, 8GiB memory, ext4 on spinner
>
> size                 none    filesystem  file
> 2GiB (fits in ram)   00:13   0:41        01:00
> 33GiB                05:21   05:31       13:45
>
> so if the backup fits in memory, there is a large difference between all
> of the modes (expected), but as soon as it exceeds the memory size,
> the difference between not syncing and syncing the fs at the end becomes
> much smaller.
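For illustration, the 'file' level described above boils down to roughly this
sequence: write the chunk to a temporary file, fsync it, rename it into place,
and then fsync the directory handle so the rename itself is persisted. The
following is only a minimal sketch with a hypothetical helper name, not the
actual proxmox_sys::fs::replace_file implementation:

use std::fs::{File, OpenOptions};
use std::io::Write;
use std::os::unix::io::AsRawFd;
use std::path::Path;

use anyhow::{format_err, Error};

// Minimal sketch of the 'file' sync level: write the chunk to a temp file,
// fsync it, rename it into place, then fsync the containing directory so
// the rename itself survives a power loss. Helper name is hypothetical.
fn insert_chunk_durable(chunk_path: &Path, raw_data: &[u8]) -> Result<(), Error> {
    let dir_path = chunk_path
        .parent()
        .ok_or_else(|| format_err!("unable to get chunk dir"))?;
    let tmp_path = chunk_path.with_extension("tmp");

    // write the chunk data to a temporary file and flush it to stable storage
    let mut tmp = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(&tmp_path)?;
    tmp.write_all(raw_data)?;
    tmp.sync_all()?; // fsync(2) on the chunk file itself

    // atomically move the finished chunk into its final place
    std::fs::rename(&tmp_path, chunk_path)?;

    // fsync the directory handle so the rename is persisted as well
    let dir = File::open(dir_path)?;
    nix::unistd::fsync(dir.as_raw_fd())
        .map_err(|err| format_err!("fsync on chunk dir failed: {err}"))?;

    Ok(())
}

With this pattern, anything up to the last completed fsync pair is preserved
across a power loss; a chunk whose directory entry was not yet synced may be
missing, which a later verification or backup run would surface.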
>
> i also tested on an nvme, but there the syncs basically made no difference
>
> Signed-off-by: Dominik Csapak
> ---
> i retested with the 'fsync on dirhandle' part, but the result
> was not significantly different than before, neither on nvme nor on
> a spinning disk...
>
>  pbs-api-types/src/datastore.rs   | 32 +++++++++++++++++
>  pbs-datastore/src/chunk_store.rs | 60 ++++++++++++++++++++++++++------
>  pbs-datastore/src/datastore.rs   | 37 +++++++++++++++++---
>  src/api2/backup/environment.rs   |  2 ++
>  src/api2/config/datastore.rs     |  9 +++--
>  5 files changed, 124 insertions(+), 16 deletions(-)
>
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index e2bf70aa..6563aa73 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -214,6 +214,37 @@ pub enum ChunkOrder {
>      Inode,
>  }
>  
> +#[api]
> +#[derive(PartialEq, Serialize, Deserialize)]
> +#[serde(rename_all = "lowercase")]
> +/// The level of syncing that is done when writing into a datastore.
> +pub enum DatastoreFSyncLevel {
> +    /// No special fsync or syncfs calls are triggered. The system default dirty write back
> +    /// mechanism ensures that data gets is flushed eventually via the `dirty_writeback_centisecs`
> +    /// and `dirty_expire_centisecs` kernel sysctls, defaulting to ~ 30s.
> +    ///
> +    /// This mode provides generally the best performance, as all write back can happen async,
> +    /// which reduces IO pressure.
> +    /// But it may cause losing data on powerloss or system crash without any uninterruptible power supply.
> +    None,
> +    /// Triggers a fsync after writing any chunk on the datastore. While this can slow down
> +    /// backups significantly, depending on the underlying file system and storage used, it
> +    /// will ensure fine-grained consistency. But in practice there are no benefits over the
> +    /// file system level sync, so you should prefer that one, as on most systems the file level
> +    /// one is slower and causes more IO pressure compared to the file system level one.

I am not sure whether that "in practice" statement really holds
- we only tested the exact failure case on a few filesystems, there
  might be ones in use out there where a powerloss can also lead to a
  truncated chunk, not only an empty/missing one. granted, both will be
  detected on the next verification, but only the latter will be
  automatically cleaned up by a subsequent backup task that uploads this
  chunk..
- the FS underlying the datastore might be used for many datastores, or
  even other, busy, non-datastore usage. not an ideal setup, but there
  might be $reasons. in this case, syncfs might have a much bigger
  negative effect (because of syncing out other, unrelated I/O) than
  fsync.
- not sure what effect syncfs has if a datastore is really busy (as in,
  has tons of basically no-op backups over a short period of time)

I'd rather mark 'Filesystem' as a good compromise, and the 'File' one as
the most consistent.

> +    File,
> +    /// Trigger a filesystem wide sync after all backup data got written but before finishing the
> +    /// task. This allows that every finished backup is fully written back to storage
> +    /// while reducing the impact most file systems in contrast to the file level sync.

missing 'on'? (reducing the impact *on* most file systems)?
> +    Filesystem,
> +}
> +
> +impl Default for DatastoreFSyncLevel {
> +    fn default() -> Self {
> +        DatastoreFSyncLevel::None
> +    }
> +}
> +
>  #[api(
>      properties: {
>          "chunk-order": {
> @@ -228,6 +259,7 @@ pub enum ChunkOrder {
>  pub struct DatastoreTuning {
>      /// Iterate chunks in this order
>      pub chunk_order: Option<ChunkOrder>,
> +    pub sync_level: Option<DatastoreFSyncLevel>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 1f169dee..86dd1c17 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,11 +1,10 @@
> -use std::io::Write;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
>  
>  use anyhow::{bail, format_err, Error};
>  
> -use pbs_api_types::GarbageCollectionStatus;
> +use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>  use proxmox_sys::fs::{create_dir, create_path, CreateOptions};
>  use proxmox_sys::process_locker::{
>      ProcessLockExclusiveGuard, ProcessLockSharedGuard, ProcessLocker,
> @@ -22,6 +21,7 @@ pub struct ChunkStore {
>      chunk_dir: PathBuf,
>      mutex: Mutex<()>,
>      locker: Option<Arc<Mutex<ProcessLocker>>>,
> +    sync_level: DatastoreFSyncLevel,
>  }
>  
>  // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
> @@ -68,6 +68,7 @@ impl ChunkStore {
>              chunk_dir: PathBuf::new(),
>              mutex: Mutex::new(()),
>              locker: None,
> +            sync_level: Default::default(),
>          }
>      }
>  
> @@ -88,6 +89,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          worker: Option<&dyn WorkerTaskContext>,
> +        sync_level: DatastoreFSyncLevel,
>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -144,7 +146,7 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base)
> +        Self::open(name, base, sync_level)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -153,7 +155,11 @@ impl ChunkStore {
>          lockfile_path
>      }
>  
> -    pub fn open<P: Into<PathBuf>>(name: &str, base: P) -> Result<Self, Error> {
> +    pub fn open<P: Into<PathBuf>>(
> +        name: &str,
> +        base: P,
> +        sync_level: DatastoreFSyncLevel,
> +    ) -> Result<Self, Error> {
>          let base: PathBuf = base.into();
>  
>          if !base.is_absolute() {
> @@ -176,6 +182,7 @@ impl ChunkStore {
>              chunk_dir,
>              locker: Some(locker),
>              mutex: Mutex::new(()),
> +            sync_level,
>          })
>      }
>  
> @@ -461,9 +468,27 @@ impl ChunkStore {
>              }
>          }
>  
> -        proxmox_sys::fs::replace_file(chunk_path, raw_data, CreateOptions::new(), false).map_err(
> -            |err| format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}"),
> -        )?;
> +        let chunk_dir_path = chunk_path
> +            .parent()
> +            .ok_or_else(|| format_err!("unable to get chunk dir"))?
> +            .to_owned();
> +
> +        proxmox_sys::fs::replace_file(
> +            chunk_path,
> +            raw_data,
> +            CreateOptions::new(),
> +            self.sync_level == DatastoreFSyncLevel::File,
> +        )
> +        .map_err(|err| {
> +            format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
> +        })?;
> +
> +        if self.sync_level == DatastoreFSyncLevel::File {
> +            // fsync dir handle to persist the tmp rename
> +            let dir = std::fs::File::open(chunk_dir_path)?;
> +            nix::unistd::fsync(dir.as_raw_fd())
> +                .map_err(|err| format_err!("fsync failed: {err}"))?;
> +        }
>  
>          drop(lock);
>  
> @@ -520,13 +545,21 @@ fn test_chunk_store1() {
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>  
> -    let chunk_store = ChunkStore::open("test", &path);
> +    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
>      assert!(chunk_store.is_err());
>  
>      let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>          .unwrap()
>          .unwrap();
> -    let chunk_store = ChunkStore::create("test", &path, user.uid, user.gid, None).unwrap();
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        None,
> +        DatastoreFSyncLevel::None,
> +    )
> +    .unwrap();
>  
>      let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>          .build()
> @@ -538,7 +571,14 @@ fn test_chunk_store1() {
>      let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>      assert!(exists);
>  
> -    let chunk_store = ChunkStore::create("test", &path, user.uid, user.gid, None);
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        None,
> +        DatastoreFSyncLevel::None,
> +    );
>      assert!(chunk_store.is_err());
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 5af8a295..613f3196 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -18,8 +18,8 @@ use proxmox_sys::WorkerTaskContext;
>  use proxmox_sys::{task_log, task_warn};
>  
>  use pbs_api_types::{
> -    Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreTuning,
> -    GarbageCollectionStatus, HumanByte, Operation, UPID,
> +    Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreFSyncLevel,
> +    DatastoreTuning, GarbageCollectionStatus, HumanByte, Operation, UPID,
>  };
>  use pbs_config::ConfigVersionCache;
>  
> @@ -61,6 +61,7 @@ pub struct DataStoreImpl {
>      chunk_order: ChunkOrder,
>      last_generation: usize,
>      last_update: i64,
> +    sync_level: DatastoreFSyncLevel,
>  }
>  
>  impl DataStoreImpl {
> @@ -75,6 +76,7 @@ impl DataStoreImpl {
>              chunk_order: ChunkOrder::None,
>              last_generation: 0,
>              last_update: 0,
> +            sync_level: Default::default(),
>          })
>      }
>  }
> @@ -154,7 +156,13 @@ impl DataStore {
>          }
>      }
>  
> -        let chunk_store = ChunkStore::open(name, &config.path)?;
> +        let tuning: DatastoreTuning = serde_json::from_value(
> +            DatastoreTuning::API_SCHEMA
> +                .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +        )?;
> +
> +        let chunk_store =
> +            ChunkStore::open(name, &config.path, tuning.sync_level.unwrap_or_default())?;
>          let datastore = DataStore::with_store_and_config(chunk_store, config, generation, now)?;
>  
>          let datastore = Arc::new(datastore);
> @@ -197,7 +205,12 @@ impl DataStore {
>      ) -> Result<Arc<Self>, Error> {
>          let name = config.name.clone();
>  
> -        let chunk_store = ChunkStore::open(&name, &config.path)?;
> +        let tuning: DatastoreTuning = serde_json::from_value(
> +            DatastoreTuning::API_SCHEMA
> +                .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +        )?;
> +        let chunk_store =
> +            ChunkStore::open(&name, &config.path, tuning.sync_level.unwrap_or_default())?;
>          let inner = Arc::new(Self::with_store_and_config(chunk_store, config, 0, 0)?);
>  
>          if let Some(operation) = operation {
> @@ -242,6 +255,7 @@ impl DataStore {
>              chunk_order,
>              last_generation,
>              last_update,
> +            sync_level: tuning.sync_level.unwrap_or_default(),
>          })
>      }
>  
> @@ -1257,4 +1271,19 @@ impl DataStore {
>          todo!("split out the namespace");
>      }
>      */
> +
> +    /// Syncs the filesystem of the datastore if 'sync_level' is set to
> +    /// [`DatastoreFSyncLevel::Filesystem`]. Uses syncfs(2).
> +    pub fn try_ensure_sync_level(&self) -> Result<(), Error> {
> +        if self.inner.sync_level != DatastoreFSyncLevel::Filesystem {
> +            return Ok(());
> +        }
> +        let file = std::fs::File::open(self.base_path())?;
> +        let fd = file.as_raw_fd();
> +        log::info!("syncinc filesystem");
> +        if unsafe { libc::syncfs(fd) } < 0 {
> +            bail!("error during syncfs: {}", std::io::Error::last_os_error());
> +        }
> +        Ok(())
> +    }
>  }
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 8c1c42db..98a13c6e 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -623,6 +623,8 @@ impl BackupEnvironment {
>          }
>      }
>  
> +        self.datastore.try_ensure_sync_level()?;
> +
>          // marks the backup as successful
>          state.finished = true;
>  
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index 28342c2c..9fdd73dc 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -11,8 +11,8 @@ use proxmox_section_config::SectionConfigData;
>  use proxmox_sys::WorkerTaskContext;
>  
>  use pbs_api_types::{
> -    Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DATASTORE_SCHEMA,
> -    PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY,
> +    Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DatastoreTuning,
> +    DATASTORE_SCHEMA, PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY,
>      PROXMOX_CONFIG_DIGEST_SCHEMA,
>  };
>  use pbs_config::BackupLockGuard;
> @@ -70,6 +70,10 @@ pub(crate) fn do_create_datastore(
>  ) -> Result<(), Error> {
>      let path: PathBuf = datastore.path.clone().into();
>  
> +    let tuning: DatastoreTuning = serde_json::from_value(
> +        DatastoreTuning::API_SCHEMA
> +            .parse_property_string(datastore.tuning.as_deref().unwrap_or(""))?,
> +    )?;
>      let backup_user = pbs_config::backup_user()?;
>      let _store = ChunkStore::create(
>          &datastore.name,
> @@ -77,6 +81,7 @@ pub(crate) fn do_create_datastore(
>          backup_user.uid,
>          backup_user.gid,
>          worker,
> +        tuning.sync_level.unwrap_or_default(),
>      )?;
>  
>      config.set_data(&datastore.name, "datastore", &datastore)?;
> -- 
> 2.30.2