* [pbs-devel] [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore
@ 2021-06-18 9:29 Dominik Csapak
2021-06-18 9:29 ` [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index) Dominik Csapak
2021-07-12 4:23 ` [pbs-devel] applied-series: [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore Thomas Lamprecht
0 siblings, 2 replies; 4+ messages in thread
From: Dominik Csapak @ 2021-06-18 9:29 UTC (permalink / raw)
To: pbs-devel
so that we can reuse that information

removing the addition to the corrupted list is ok, since
'get_chunks_in_order' returns such chunks at the end of the list,
and we do the same if the loading fails later in 'verify_index_chunks',
so we still mark them corrupt
(assuming that the load will fail if the stat does)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
alternatively, we could return 2 lists, or check after the loop
for the u64::MAX value, but imho it's ok this way
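The technique the refactor extracts can be sketched in isolation, without the datastore types: stat each chunk file, record a (position, inode) pair, push stat failures to the end via u64::MAX, and sort by inode. This is a minimal standalone sketch (a plain slice of hypothetical paths stands in for the IndexFile/chunk-store plumbing):

```rust
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Return (position, inode) pairs sorted by inode number on disk.
/// Entries whose stat failed get u64::MAX, so they sort to the end.
fn chunks_in_inode_order(chunk_paths: &[PathBuf]) -> Vec<(usize, u64)> {
    let mut list: Vec<(usize, u64)> = chunk_paths
        .iter()
        .enumerate()
        .map(|(pos, path)| {
            let ino = match std::fs::metadata(path) {
                Ok(md) => md.ino(),
                Err(_) => u64::MAX, // could not stat, move to end of list
            };
            (pos, ino)
        })
        .collect();
    // sorting by inode improves data locality, which helps on spinners
    list.sort_unstable_by_key(|&(_, ino)| ino);
    list
}

fn main() {
    // a path that does not exist stats to u64::MAX and sorts last
    let paths = vec![PathBuf::from("/nonexistent/chunk")];
    let order = chunks_in_inode_order(&paths);
    assert_eq!(order, vec![(0, u64::MAX)]);
    println!("ok");
}
```

Callers that must react to unreadable chunks (like verify) can still detect them afterwards, since they are exactly the trailing u64::MAX entries.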
src/backup/datastore.rs | 38 ++++++++++++++++++++++++++++++++++++++
src/backup/verify.rs | 38 +++++++-------------------------------
2 files changed, 45 insertions(+), 31 deletions(-)
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index a0cf50b2..116d2441 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -825,4 +825,42 @@ impl DataStore {
pub fn verify_new(&self) -> bool {
self.verify_new
}
+
+ /// returns a list of chunks sorted by their inode number on disk
+ /// chunks that could not be stat'ed are at the end of the list
+ pub fn get_chunks_in_order<F, A>(
+ &self,
+ index: &Box<dyn IndexFile + Send>,
+ skip_chunk: F,
+ check_abort: A,
+ ) -> Result<Vec<(usize, u64)>, Error>
+ where
+ F: Fn(&[u8; 32]) -> bool,
+ A: Fn(usize) -> Result<(), Error>,
+ {
+ let index_count = index.index_count();
+ let mut chunk_list = Vec::with_capacity(index_count);
+ use std::os::unix::fs::MetadataExt;
+ for pos in 0..index_count {
+ check_abort(pos)?;
+
+ let info = index.chunk_info(pos).unwrap();
+
+ if skip_chunk(&info.digest) {
+ continue;
+ }
+
+ let ino = match self.stat_chunk(&info.digest) {
+ Err(_) => u64::MAX, // could not stat, move to end of list
+ Ok(metadata) => metadata.ino(),
+ };
+
+ chunk_list.push((pos, ino));
+ }
+
+ // sorting by inode improves data locality, which makes it lots faster on spinners
+ chunk_list.sort_unstable_by(|(_, ino_a), (_, ino_b)| ino_a.cmp(&ino_b));
+
+ Ok(chunk_list)
+ }
}
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index a1b1e6dd..74b0cfe7 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -179,42 +179,18 @@ fn verify_index_chunks(
}
};
- let index_count = index.index_count();
- let mut chunk_list = Vec::with_capacity(index_count);
-
- use std::os::unix::fs::MetadataExt;
-
- for pos in 0..index_count {
+ let check_abort = |pos: usize| -> Result<(), Error> {
if pos & 1023 == 0 {
verify_worker.worker.check_abort()?;
crate::tools::fail_on_shutdown()?;
}
+ Ok(())
+ };
- let info = index.chunk_info(pos).unwrap();
-
- if skip_chunk(&info.digest) {
- continue; // already verified or marked corrupt
- }
-
- match verify_worker.datastore.stat_chunk(&info.digest) {
- Err(err) => {
- verify_worker.corrupt_chunks.lock().unwrap().insert(info.digest);
- task_log!(verify_worker.worker, "can't verify chunk, stat failed - {}", err);
- errors.fetch_add(1, Ordering::SeqCst);
- rename_corrupted_chunk(
- verify_worker.datastore.clone(),
- &info.digest,
- &verify_worker.worker,
- );
- }
- Ok(metadata) => {
- chunk_list.push((pos, metadata.ino()));
- }
- }
- }
-
- // sorting by inode improves data locality, which makes it lots faster on spinners
- chunk_list.sort_unstable_by(|(_, ino_a), (_, ino_b)| ino_a.cmp(&ino_b));
+ let chunk_list =
+ verify_worker
+ .datastore
+ .get_chunks_in_order(&index, skip_chunk, check_abort)?;
for (pos, _) in chunk_list {
verify_worker.worker.check_abort()?;
--
2.20.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index)
2021-06-18 9:29 [pbs-devel] [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore Dominik Csapak
@ 2021-06-18 9:29 ` Dominik Csapak
2021-06-18 15:02 ` Thomas Lamprecht
2021-07-12 4:23 ` [pbs-devel] applied-series: [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore Thomas Lamprecht
1 sibling, 1 reply; 4+ messages in thread
From: Dominik Csapak @ 2021-06-18 9:29 UTC (permalink / raw)
To: pbs-devel
sort the chunks we want to back up to tape by inode, to gain some
speed on spinning disks. This is done per index, not globally.

This costs a bit of memory, but not too much: about 16 bytes per chunk,
which would mean ~4MiB for a 1TiB index with 4MiB chunks.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
this resulted in a speedup of between 20 and 30% in my setup
(single spinner with random snapshots, from 17-26MiB/s to 30-40MiB/s).

We already do this for verification, but got no real feedback on it,
so either it does not make that much of a difference in the real world,
or it is not that visible on verification (since that varies a lot
anyway).
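The memory estimate from the commit message can be checked with a few lines, assuming each entry is a (usize, u64) tuple as in the get_chunks_in_order return type:

```rust
fn main() {
    let index_bytes: u64 = 1 << 40; // 1 TiB of indexed data
    let chunk_bytes: u64 = 4 << 20; // 4 MiB chunk size
    let chunks = index_bytes / chunk_bytes; // 262144 chunks
    // (usize, u64) is 16 bytes on a 64-bit target
    let entry = std::mem::size_of::<(usize, u64)>() as u64;
    let total = chunks * entry;
    println!("{} chunks * {} B = {} MiB", chunks, entry, total >> 20);
}
```

262144 entries times 16 bytes comes out to exactly 4 MiB, matching the ~4MiB figure above.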
src/tape/helpers/snapshot_reader.rs | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/src/tape/helpers/snapshot_reader.rs b/src/tape/helpers/snapshot_reader.rs
index 7b272e37..416c88c1 100644
--- a/src/tape/helpers/snapshot_reader.rs
+++ b/src/tape/helpers/snapshot_reader.rs
@@ -107,7 +107,7 @@ impl SnapshotReader {
pub struct SnapshotChunkIterator<'a> {
snapshot_reader: &'a SnapshotReader,
todo_list: Vec<String>,
- current_index: Option<(Arc<Box<dyn IndexFile>>, usize)>,
+ current_index: Option<(Arc<Box<dyn IndexFile + Send>>, usize, Vec<(usize, u64)>)>,
}
impl <'a> Iterator for SnapshotChunkIterator<'a> {
@@ -119,20 +119,26 @@ impl <'a> Iterator for SnapshotChunkIterator<'a> {
if self.current_index.is_none() {
if let Some(filename) = self.todo_list.pop() {
let file = self.snapshot_reader.open_file(&filename)?;
- let index: Box<dyn IndexFile> = match archive_type(&filename)? {
+ let index: Box<dyn IndexFile + Send> = match archive_type(&filename)? {
ArchiveType::FixedIndex => Box::new(FixedIndexReader::new(file)?),
ArchiveType::DynamicIndex => Box::new(DynamicIndexReader::new(file)?),
_ => bail!("SnapshotChunkIterator: got unknown file type - internal error"),
};
- self.current_index = Some((Arc::new(index), 0));
+
+ let datastore =
+ DataStore::lookup_datastore(self.snapshot_reader.datastore_name())?;
+ let order = datastore.get_chunks_in_order(&index, |_| false, |_| Ok(()))?;
+
+ self.current_index = Some((Arc::new(index), 0, order));
} else {
return Ok(None);
}
}
- let (index, pos) = self.current_index.take().unwrap();
- if pos < index.index_count() {
- let digest = *index.index_digest(pos).unwrap();
- self.current_index = Some((index, pos + 1));
+ let (index, pos, list) = self.current_index.take().unwrap();
+ if pos < list.len() {
+ let (real_pos, _) = list[pos];
+ let digest = *index.index_digest(real_pos).unwrap();
+ self.current_index = Some((index, pos + 1, list));
return Ok(Some(digest));
} else {
// pop next index
--
2.20.1
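The iterator change above boils down to walking the index through the precomputed (real_pos, inode) list instead of 0..index_count. A minimal sketch of that permutation step, with a slice of digests standing in for the IndexFile:

```rust
/// Yield digests in the order given by a precomputed
/// (real_pos, inode) list, e.g. from get_chunks_in_order.
fn digests_in_order(
    digests: &[[u8; 32]],
    order: &[(usize, u64)],
) -> Vec<[u8; 32]> {
    order
        .iter()
        .map(|&(real_pos, _)| digests[real_pos]) // inode only drove the sort
        .collect()
}

fn main() {
    let digests = [[1u8; 32], [2u8; 32], [3u8; 32]];
    // inode order says: position 2 first, then 0, then 1
    let order = [(2usize, 10u64), (0, 20), (1, 30)];
    let sorted = digests_in_order(&digests, &order);
    assert_eq!(sorted, vec![[3u8; 32], [1u8; 32], [2u8; 32]]);
    println!("ok");
}
```

The real iterator does the same lookup lazily, one digest per next() call, by keeping the list and a cursor in current_index.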
* Re: [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index)
2021-06-18 9:29 ` [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index) Dominik Csapak
@ 2021-06-18 15:02 ` Thomas Lamprecht
0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-06-18 15:02 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Dominik Csapak
On 18.06.21 11:29, Dominik Csapak wrote:
> sort the chunks we want to back up to tape by inode, to gain some
> speed on spinning disks. This is done per index, not globally.
>
> This costs a bit of memory, but not too much: about 16 bytes per chunk,
> which would mean ~4MiB for a 1TiB index with 4MiB chunks.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> this resulted in a speedup of between 20 and 30% in my setup
> (single spinner with random snapshots, from 17-26MiB/s to 30-40MiB/s).
>
> We already do this for verification, but got no real feedback on it,
> so either it does not make that much of a difference in the real world,
> or it is not that visible on verification (since that varies a lot
> anyway).
or, IMO more likely, people just don't give feedback if things are working out
OK-ish ;-)
>
> src/tape/helpers/snapshot_reader.rs | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/src/tape/helpers/snapshot_reader.rs b/src/tape/helpers/snapshot_reader.rs
> index 7b272e37..416c88c1 100644
> --- a/src/tape/helpers/snapshot_reader.rs
> +++ b/src/tape/helpers/snapshot_reader.rs
> @@ -107,7 +107,7 @@ impl SnapshotReader {
> pub struct SnapshotChunkIterator<'a> {
> snapshot_reader: &'a SnapshotReader,
> todo_list: Vec<String>,
> - current_index: Option<(Arc<Box<dyn IndexFile>>, usize)>,
> + current_index: Option<(Arc<Box<dyn IndexFile + Send>>, usize, Vec<(usize, u64)>)>,
> }
>
> impl <'a> Iterator for SnapshotChunkIterator<'a> {
> @@ -119,20 +119,26 @@ impl <'a> Iterator for SnapshotChunkIterator<'a> {
> if self.current_index.is_none() {
> if let Some(filename) = self.todo_list.pop() {
> let file = self.snapshot_reader.open_file(&filename)?;
> - let index: Box<dyn IndexFile> = match archive_type(&filename)? {
> + let index: Box<dyn IndexFile + Send> = match archive_type(&filename)? {
> ArchiveType::FixedIndex => Box::new(FixedIndexReader::new(file)?),
> ArchiveType::DynamicIndex => Box::new(DynamicIndexReader::new(file)?),
> _ => bail!("SnapshotChunkIterator: got unknown file type - internal error"),
> };
> - self.current_index = Some((Arc::new(index), 0));
> +
> + let datastore =
> + DataStore::lookup_datastore(self.snapshot_reader.datastore_name())?;
> + let order = datastore.get_chunks_in_order(&index, |_| false, |_| Ok(()))?;
> +
> + self.current_index = Some((Arc::new(index), 0, order));
> } else {
> return Ok(None);
> }
> }
> - let (index, pos) = self.current_index.take().unwrap();
> - if pos < index.index_count() {
> - let digest = *index.index_digest(pos).unwrap();
> - self.current_index = Some((index, pos + 1));
> + let (index, pos, list) = self.current_index.take().unwrap();
> + if pos < list.len() {
> + let (real_pos, _) = list[pos];
> + let digest = *index.index_digest(real_pos).unwrap();
> + self.current_index = Some((index, pos + 1, list));
> return Ok(Some(digest));
> } else {
> // pop next index
>
* [pbs-devel] applied-series: [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore
2021-06-18 9:29 [pbs-devel] [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore Dominik Csapak
2021-06-18 9:29 ` [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index) Dominik Csapak
@ 2021-07-12 4:23 ` Thomas Lamprecht
1 sibling, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-07-12 4:23 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Dominik Csapak
On 18.06.21 11:29, Dominik Csapak wrote:
> so that we can reuse that information
>
> removing the addition to the corrupted list is ok, since
> 'get_chunks_in_order' returns such chunks at the end of the list,
> and we do the same if the loading fails later in 'verify_index_chunks',
> so we still mark them corrupt
> (assuming that the load will fail if the stat does)
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> alternatively, we could return 2 lists, or check after the loop
> for the u64::MAX value, but imho it's ok this way
>
> src/backup/datastore.rs | 38 ++++++++++++++++++++++++++++++++++++++
> src/backup/verify.rs | 38 +++++++-------------------------------
> 2 files changed, 45 insertions(+), 31 deletions(-)
>
>
Just for the record: Dietmar applied this on "Mon Jun 28 12:16:14 2021 +0200", but
I saw no applied mail, and since you also still thought this was not in, I figured
it would save me some time in the future to record it here: applied, thanks!
end of thread, other threads:[~2021-07-12 4:23 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18 9:29 [pbs-devel] [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore Dominik Csapak
2021-06-18 9:29 ` [pbs-devel] [PATCH proxmox-backup 2/2] tape/helpers/snapshot_reader: sort chunks by inode (per index) Dominik Csapak
2021-06-18 15:02 ` Thomas Lamprecht
2021-07-12 4:23 ` [pbs-devel] applied-series: [PATCH proxmox-backup 1/2] backup/datastore: refactor chunk inode sorting to the datastore Thomas Lamprecht