public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore
From: Dominik Csapak @ 2021-06-18  9:29 UTC
  To: pbs-devel

so that we can reuse that information

dropping the insertion into the corrupted list here is fine:
'get_chunks_in_order' returns chunks that could not be stat'ed at the
end of the list, and 'verify_index_chunks' already marks a chunk as
corrupt when loading it fails later, so such chunks still end up
marked corrupt (assuming that the load fails whenever the stat does)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
alternatively, we could return two lists, or check for the u64::MAX
sentinel after the loop, but imho it's ok this way

 src/backup/datastore.rs | 38 ++++++++++++++++++++++++++++++++++++++
 src/backup/verify.rs    | 38 +++++++-------------------------------
 2 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index a0cf50b2..116d2441 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -825,4 +825,42 @@ impl DataStore {
     pub fn verify_new(&self) -> bool {
         self.verify_new
     }
+
+    /// returns a list of chunks sorted by their inode number on disk
+    /// chunks that could not be stat'ed are at the end of the list
+    pub fn get_chunks_in_order<F, A>(
+        &self,
+        index: &Box<dyn IndexFile + Send>,
+        skip_chunk: F,
+        check_abort: A,
+    ) -> Result<Vec<(usize, u64)>, Error>
+    where
+        F: Fn(&[u8; 32]) -> bool,
+        A: Fn(usize) -> Result<(), Error>,
+    {
+        let index_count = index.index_count();
+        let mut chunk_list = Vec::with_capacity(index_count);
+        use std::os::unix::fs::MetadataExt;
+        for pos in 0..index_count {
+            check_abort(pos)?;
+
+            let info = index.chunk_info(pos).unwrap();
+
+            if skip_chunk(&info.digest) {
+                continue;
+            }
+
+            let ino = match self.stat_chunk(&info.digest) {
+                Err(_) => u64::MAX, // could not stat, move to end of list
+                Ok(metadata) => metadata.ino(),
+            };
+
+            chunk_list.push((pos, ino));
+        }
+
+        // sorting by inode improves data locality, which makes it lots faster on spinners
+        chunk_list.sort_unstable_by(|(_, ino_a), (_, ino_b)| ino_a.cmp(&ino_b));
+
+        Ok(chunk_list)
+    }
 }
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index a1b1e6dd..74b0cfe7 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -179,42 +179,18 @@ fn verify_index_chunks(
         }
     };
 
-    let index_count = index.index_count();
-    let mut chunk_list = Vec::with_capacity(index_count);
-
-    use std::os::unix::fs::MetadataExt;
-
-    for pos in 0..index_count {
+    let check_abort = |pos: usize| -> Result<(), Error> {
         if pos & 1023 == 0 {
             verify_worker.worker.check_abort()?;
             crate::tools::fail_on_shutdown()?;
         }
+        Ok(())
+    };
 
-        let info = index.chunk_info(pos).unwrap();
-
-        if skip_chunk(&info.digest) {
-            continue; // already verified or marked corrupt
-        }
-
-        match verify_worker.datastore.stat_chunk(&info.digest) {
-            Err(err) => {
-                verify_worker.corrupt_chunks.lock().unwrap().insert(info.digest);
-                task_log!(verify_worker.worker, "can't verify chunk, stat failed - {}", err);
-                errors.fetch_add(1, Ordering::SeqCst);
-                rename_corrupted_chunk(
-                    verify_worker.datastore.clone(),
-                    &info.digest,
-                    &verify_worker.worker,
-                );
-            }
-            Ok(metadata) => {
-                chunk_list.push((pos, metadata.ino()));
-            }
-        }
-    }
-
-    // sorting by inode improves data locality, which makes it lots faster on spinners
-    chunk_list.sort_unstable_by(|(_, ino_a), (_, ino_b)| ino_a.cmp(&ino_b));
+    let chunk_list =
+        verify_worker
+            .datastore
+            .get_chunks_in_order(&index, skip_chunk, check_abort)?;
 
     for (pos, _) in chunk_list {
         verify_worker.worker.check_abort()?;
-- 
2.20.1
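
As a side note, the sentinel behaviour is easy to see in isolation: a
minimal standalone sketch of the same sort-by-inode pattern (toy file
paths, not the actual PBS datastore API):

    // toy sketch: sort positions by on-disk inode, pushing entries
    // whose stat failed to the end of the list via a u64::MAX sentinel
    use std::fs;
    use std::os::unix::fs::MetadataExt;

    fn sorted_by_inode(paths: &[&str]) -> Vec<(usize, u64)> {
        let mut list: Vec<(usize, u64)> = paths
            .iter()
            .enumerate()
            .map(|(pos, path)| {
                let ino = match fs::metadata(path) {
                    Ok(meta) => meta.ino(), // inode number on disk
                    Err(_) => u64::MAX,     // could not stat -> sorts last
                };
                (pos, ino)
            })
            .collect();
        // ascending inode order approximates the on-disk layout, so a
        // spinning disk can read the files mostly sequentially
        list.sort_unstable_by_key(|&(_, ino)| ino);
        list
    }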






* [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index)
From: Dominik Csapak @ 2021-06-18  9:29 UTC
  To: pbs-devel

sort the chunks we want to back up to tape by inode, to gain some
speed on spinning disks. this is done per index, not globally.

costs a bit of memory, but not too much: about 16 bytes per chunk,
which would mean ~4MiB for a 1TiB index with 4MiB chunks.
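
for reference, the arithmetic behind that estimate (assuming one
(usize, u64) entry per chunk, i.e. 16 bytes on a 64-bit target):

    1 TiB / 4 MiB = 262144 chunks per index
    262144 * 16 bytes = 4 MiB of ordering data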

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
this resulted in a speedup of between 20 and 30% in my setup
(single spinner with random snapshots, from 17-26MiB/s to 30-40MiB/s).
we already do this for verification, but got no real feedback on it,
so either it does not make that much of a difference in the real world,
or it is not that visible during verification (since that varies a lot
anyway)

 src/tape/helpers/snapshot_reader.rs | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/tape/helpers/snapshot_reader.rs b/src/tape/helpers/snapshot_reader.rs
index 7b272e37..416c88c1 100644
--- a/src/tape/helpers/snapshot_reader.rs
+++ b/src/tape/helpers/snapshot_reader.rs
@@ -107,7 +107,7 @@ impl SnapshotReader {
 pub struct SnapshotChunkIterator<'a> {
     snapshot_reader: &'a SnapshotReader,
     todo_list: Vec<String>,
-    current_index: Option<(Arc<Box<dyn IndexFile>>, usize)>,
+    current_index: Option<(Arc<Box<dyn IndexFile + Send>>, usize, Vec<(usize, u64)>)>,
 }
 
 impl <'a> Iterator for SnapshotChunkIterator<'a> {
@@ -119,20 +119,26 @@ impl <'a> Iterator for SnapshotChunkIterator<'a> {
                 if self.current_index.is_none() {
                     if let Some(filename) = self.todo_list.pop() {
                         let file = self.snapshot_reader.open_file(&filename)?;
-                        let index: Box<dyn IndexFile> = match archive_type(&filename)? {
+                        let index: Box<dyn IndexFile + Send> = match archive_type(&filename)? {
                             ArchiveType::FixedIndex => Box::new(FixedIndexReader::new(file)?),
                             ArchiveType::DynamicIndex => Box::new(DynamicIndexReader::new(file)?),
                             _ => bail!("SnapshotChunkIterator: got unknown file type - internal error"),
                         };
-                        self.current_index = Some((Arc::new(index), 0));
+
+                        let datastore =
+                            DataStore::lookup_datastore(self.snapshot_reader.datastore_name())?;
+                        let order = datastore.get_chunks_in_order(&index, |_| false, |_| Ok(()))?;
+
+                        self.current_index = Some((Arc::new(index), 0, order));
                     } else {
                         return Ok(None);
                     }
                 }
-                let (index, pos) = self.current_index.take().unwrap();
-                if pos < index.index_count() {
-                    let digest = *index.index_digest(pos).unwrap();
-                    self.current_index = Some((index, pos + 1));
+                let (index, pos, list) = self.current_index.take().unwrap();
+                if pos < list.len() {
+                    let (real_pos, _) = list[pos];
+                    let digest = *index.index_digest(real_pos).unwrap();
+                    self.current_index = Some((index, pos + 1, list));
                     return Ok(Some(digest));
                 } else {
                     // pop next index
-- 
2.20.1
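
The iterator change above boils down to carrying the precomputed order
alongside a cursor and mapping back to the real index position; a
reduced sketch of that pattern (hypothetical types, not the real
SnapshotChunkIterator):

    // reduced sketch: yield digests following a precomputed
    // (position, inode) order instead of plain index order
    struct OrderedDigests<'a> {
        digests: &'a [[u8; 32]],  // stands in for the index file
        order: Vec<(usize, u64)>, // sorted by inode, as from get_chunks_in_order
        cursor: usize,
    }

    impl<'a> Iterator for OrderedDigests<'a> {
        type Item = [u8; 32];

        fn next(&mut self) -> Option<Self::Item> {
            if self.cursor < self.order.len() {
                let (real_pos, _ino) = self.order[self.cursor];
                self.cursor += 1;
                Some(self.digests[real_pos]) // map back to the index position
            } else {
                None // exhausted; the real iterator pops the next index file
            }
        }
    }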






* Re: [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index)
From: Thomas Lamprecht @ 2021-06-18 15:02 UTC
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 18.06.21 11:29, Dominik Csapak wrote:
> sort the chunks we want to back up to tape by inode, to gain some
> speed on spinning disks. this is done per index, not globally.
> 
> costs a bit of memory, but not too much: about 16 bytes per chunk,
> which would mean ~4MiB for a 1TiB index with 4MiB chunks.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> this resulted in a speedup of between 20 and 30% in my setup
> (single spinner with random snapshots, from 17-26MiB/s to 30-40MiB/s).
> we already do this for verification, but got no real feedback on it,
> so either it does not make that much of a difference in the real world,
> or it is not that visible during verification (since that varies a lot
> anyway)

or, IMO more likely, people just don't give feedback if things are working out
OK-ish ;-)







* [pbs-devel] applied-series: [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore
From: Thomas Lamprecht @ 2021-07-12  4:23 UTC
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 18.06.21 11:29, Dominik Csapak wrote:
> so that we can reuse that information
> 
> dropping the insertion into the corrupted list here is fine:
> 'get_chunks_in_order' returns chunks that could not be stat'ed at the
> end of the list, and 'verify_index_chunks' already marks a chunk as
> corrupt when loading it fails later, so such chunks still end up
> marked corrupt (assuming that the load fails whenever the stat does)
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> alternatively, we could return two lists, or check for the u64::MAX
> sentinel after the loop, but imho it's ok this way
> 
>  src/backup/datastore.rs | 38 ++++++++++++++++++++++++++++++++++++++
>  src/backup/verify.rs    | 38 +++++++-------------------------------
>  2 files changed, 45 insertions(+), 31 deletions(-)
> 
>

Just for the record: Dietmar applied this on "Mon Jun 28 12:16:14 2021 +0200", but
I saw no applied mail, and as you also still thought this was not in, I figured it'd
save me some time in the future to record it here: applied, thanks!




