public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup 1/3] tape: media_catalog: add fast_catalog beside normal catalog
@ 2021-07-19 14:55 Dominik Csapak
  2021-07-19 14:55 ` [pbs-devel] [RFC PATCH proxmox-backup 2/3] tape: media_catalog: add local type aliases to make code more clear Dominik Csapak
  2021-07-19 14:55 ` [pbs-devel] [PATCH proxmox-backup 3/3] api2: tape: media: use MediaCatalog::snapshot_list for content listing Dominik Csapak
  0 siblings, 2 replies; 5+ messages in thread
From: Dominik Csapak @ 2021-07-19 14:55 UTC (permalink / raw)
  To: pbs-devel

For some parts of the UI we only need the snapshot list from the catalog,
so reading the whole catalog (which can be several hundred MiB) is not
really necessary.

Instead, on every commit of the catalog, write the complete content list
into a separate .index file that can be read to get only the snapshot
list.
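
A minimal sketch of the line format of the .index file described above,
not the code from the patch itself. The function names encode_index and
decode_index are hypothetical. It assumes that snapshot names may contain
':' themselves (for example inside an RFC 3339 timestamp) and that store
names never do, which is why the parser splits at the first ':' only.

```rust
// Sketch of a one-entry-per-line "<store>:<snapshot>" text index.
// encode_index/decode_index are illustrative names, not part of the patch.

/// Serialize (store, snapshot) pairs, one pair per line.
fn encode_index(entries: &[(String, String)]) -> String {
    let mut data = String::new();
    for (store, snapshot) in entries {
        data.push_str(store);
        data.push(':');
        data.push_str(snapshot);
        data.push('\n');
    }
    data
}

/// Parse the text back into pairs. Only the FIRST ':' separates store and
/// snapshot, so colons inside the snapshot name survive the round trip.
fn decode_index(data: &str) -> Result<Vec<(String, String)>, String> {
    data.lines()
        .map(|line| {
            line.split_once(':')
                .map(|(store, snapshot)| (store.to_string(), snapshot.to_string()))
                .ok_or_else(|| format!("invalid line format (no store found): {:?}", line))
        })
        .collect()
}

fn main() {
    // Example data is made up for illustration.
    let entries = vec![
        ("store1".to_string(), "vm/100/2021-07-19T14:55:00Z".to_string()),
        ("store2".to_string(), "ct/101/2021-07-18T02:00:00Z".to_string()),
    ];

    let text = encode_index(&entries);
    let parsed = decode_index(&text).expect("well-formed index");
    assert_eq!(parsed, entries);
    print!("{}", text);
}
```

Because the whole list is rewritten on every call, the file never needs to
be appended to or patched in place; a reader either sees the old list or the
new one, provided the write replaces the file atomically.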

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 src/tape/media_catalog.rs | 108 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/src/tape/media_catalog.rs b/src/tape/media_catalog.rs
index 65b52a42..6a9c8ce1 100644
--- a/src/tape/media_catalog.rs
+++ b/src/tape/media_catalog.rs
@@ -1,8 +1,8 @@
 use std::convert::TryFrom;
 use std::fs::File;
-use std::io::{Write, Read, BufReader, Seek, SeekFrom};
+use std::io::{Write, Read, BufRead, BufReader, Seek, SeekFrom};
 use std::os::unix::io::AsRawFd;
-use std::path::Path;
+use std::path::{Path, PathBuf};
 use std::collections::{HashSet, HashMap};
 
 use anyhow::{bail, format_err, Error};
@@ -53,6 +53,7 @@ impl DatastoreContent {
 ///
 /// We use a simple binary format to store data on disk.
 pub struct MediaCatalog  {
+    base_path: PathBuf,
 
     uuid: Uuid, // BackupMedia uuid
 
@@ -108,7 +109,11 @@ impl MediaCatalog {
 
         let mut path = base_path.to_owned();
         path.push(uuid.to_string());
+        let mut fast_catalog = path.clone();
         path.set_extension("log");
+        fast_catalog.set_extension("index");
+
+        let _ = std::fs::remove_file(fast_catalog); // ignore errors
 
         match std::fs::remove_file(path) {
             Ok(()) => Ok(()),
@@ -217,6 +222,7 @@ impl MediaCatalog {
                 .map_err(|err| format_err!("fchown failed - {}", err))?;
 
             let mut me = Self {
+                base_path: base_path.to_path_buf(),
                 uuid: uuid.clone(),
                 file: None,
                 log_to_stdout: false,
@@ -294,6 +300,7 @@ impl MediaCatalog {
             let file = Self::create_temporary_database_file(base_path, uuid)?;
 
             let mut me = Self {
+                base_path: base_path.to_path_buf(),
                 uuid: uuid.clone(),
                 file: Some(file),
                 log_to_stdout: false,
@@ -360,6 +367,99 @@ impl MediaCatalog {
         &self.content
     }
 
+    fn load_fast_catalog(
+        file: &mut File,
+    ) -> Result<Vec<(String, String)>, Error> {
+        let mut list = Vec::new();
+        let file = BufReader::new(file);
+        for line in file.lines() {
+            let mut line = line?;
+
+            let idx = line
+                .find(':')
+                .ok_or_else(|| format_err!("invalid line format (no store found)"))?;
+
+            let snapshot = line.split_off(idx + 1);
+            line.truncate(idx);
+            list.push((line, snapshot));
+        }
+
+        Ok(list)
+    }
+
+    /// Returns a list of (store, snapshot) for a given MediaId
+    pub fn snapshot_list(
+        base_path: &Path,
+        media_id: &MediaId,
+    ) -> Result<Vec<(String, String)>, Error> {
+        let uuid = &media_id.label.uuid;
+
+        let mut path = base_path.to_owned();
+        path.push(uuid.to_string());
+        path.set_extension("index");
+
+
+        let list = proxmox::try_block!({
+
+            Self::create_basedir(base_path)?;
+
+            let mut file = match std::fs::OpenOptions::new().read(true).open(&path) {
+                Ok(file) => file,
+                Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+                    // open normal catalog and write fast index
+                    let catalog = Self::open(base_path, media_id, false, false)?;
+                    catalog.write_snapshot_list()?;
+                    let mut list = Vec::new();
+                    for (store, content) in catalog.content() {
+                        for snapshot in content.snapshot_index.keys() {
+                            list.push((store.to_string(), snapshot.to_string()));
+                        }
+                    }
+                    return Ok(list);
+                },
+                Err(err) => bail!(err),
+            };
+
+            Self::load_fast_catalog(&mut file)
+        }).map_err(|err: Error| {
+            format_err!("unable to open fast media catalog {:?} - {}", path, err)
+        })?;
+
+        Ok(list)
+    }
+
+    // writes the full snapshot list into <uuid>.index
+    fn write_snapshot_list(&self) -> Result<(), Error> {
+        let mut data = String::new();
+
+        for (store, content) in self.content() {
+            for snapshot in content.snapshot_index.keys() {
+                data.push_str(store);
+                data.push_str(":");
+                data.push_str(snapshot);
+                data.push_str("\n");
+            }
+        }
+
+        let mut path = self.base_path.clone();
+        path.push(self.uuid.to_string());
+        path.set_extension("index");
+
+        let backup_user = crate::backup::backup_user()?;
+        let options = if cfg!(test) {
+            // cannot chown the file in the test environment
+            CreateOptions::new()
+        } else {
+            CreateOptions::new().owner(backup_user.uid).group(backup_user.gid)
+        };
+
+        proxmox::tools::fs::replace_file(
+            path,
+            data.as_bytes(),
+            options,
+        )
+    }
+
     /// Commit pending changes
     ///
     /// This is necessary to store changes persistently.
@@ -380,6 +480,10 @@ impl MediaCatalog {
             None => bail!("media catalog not writable (opened read only)"),
         }
 
+        self.write_snapshot_list().map_err(|err| {
+            format_err!("could not write fast catalog: {}", err)
+        })?;
+
         self.pending = Vec::new();
 
         Ok(())
-- 
2.30.2






* [pbs-devel] [RFC PATCH proxmox-backup 2/3] tape: media_catalog: add local type aliases to make code more clear
  2021-07-19 14:55 [pbs-devel] [PATCH proxmox-backup 1/3] tape: media_catalog: add fast_catalog beside normal catalog Dominik Csapak
@ 2021-07-19 14:55 ` Dominik Csapak
  2021-07-19 14:55 ` [pbs-devel] [PATCH proxmox-backup 3/3] api2: tape: media: use MediaCatalog::snapshot_list for content listing Dominik Csapak
  1 sibling, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2021-07-19 14:55 UTC (permalink / raw)
  To: pbs-devel

by adding some type aliases like 'type Store = String',
the more complex types/return values are easier to read.

For example
HashMap<String, u64>

turns into:
HashMap<Snapshot, FileNr>

since those types are not public, the generated cargo docs do not contain them
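
As a small illustration of what such aliases do and do not provide (a
sketch, not code from this patch; the lookup function is hypothetical): a
Rust type alias is only another name for the same type, so it documents
intent in signatures but adds no type checking. A plain HashMap<String, u64>
is still accepted wherever HashMap<Snapshot, FileNr> is expected.

```rust
use std::collections::HashMap;

// Aliases as in the commit message; they are synonyms, not new types.
type Snapshot = String;
type FileNr = u64;

/// Hypothetical helper: look up the file number recorded for a snapshot.
fn lookup(index: &HashMap<Snapshot, FileNr>, snapshot: &str) -> Option<FileNr> {
    index.get(snapshot).copied()
}

fn main() {
    // Declared with the plain types, yet it type-checks against the aliases.
    let mut index: HashMap<String, u64> = HashMap::new();
    index.insert("vm/100/2021-07-19T14:55:00Z".to_string(), 3);

    let nr: u64 = lookup(&index, "vm/100/2021-07-19T14:55:00Z").unwrap();
    assert_eq!(nr, 3);
    assert_eq!(lookup(&index, "vm/999/2021-01-01T00:00:00Z"), None);
    println!("file number: {}", nr);
}
```

If mixing up, say, a store name and a snapshot name should become a compile
error, a newtype (a tuple struct wrapping String) would be needed instead of
an alias.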

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
does not have to be applied. it only makes (IMHO) the code more readable
 src/tape/media_catalog.rs | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/src/tape/media_catalog.rs b/src/tape/media_catalog.rs
index 6a9c8ce1..c88a4306 100644
--- a/src/tape/media_catalog.rs
+++ b/src/tape/media_catalog.rs
@@ -31,9 +31,14 @@ use crate::{
     },
 };
 
+type Store = String;
+type Snapshot = String;
+type FileNr = u64;
+type Chunk = [u8; 32];
+
 pub struct DatastoreContent {
-    pub snapshot_index: HashMap<String, u64>, // snapshot => file_nr
-    pub chunk_index: HashMap<[u8;32], u64>, // chunk => file_nr
+    pub snapshot_index: HashMap<Snapshot, FileNr>,
+    pub chunk_index: HashMap<Chunk, FileNr>,
 }
 
 impl DatastoreContent {
@@ -61,11 +66,11 @@ pub struct MediaCatalog  {
 
     log_to_stdout: bool,
 
-    current_archive: Option<(Uuid, u64, String)>, // (uuid, file_nr, store)
+    current_archive: Option<(Uuid, FileNr, Store)>,
 
-    last_entry: Option<(Uuid, u64)>,
+    last_entry: Option<(Uuid, FileNr)>,
 
-    content: HashMap<String, DatastoreContent>,
+    content: HashMap<Store, DatastoreContent>,
 
     pending: Vec<u8>,
 }
@@ -363,13 +368,13 @@ impl MediaCatalog {
     }
 
     /// Accessor to content list
-    pub fn content(&self) -> &HashMap<String, DatastoreContent> {
+    pub fn content(&self) -> &HashMap<Store, DatastoreContent> {
         &self.content
     }
 
     fn load_fast_catalog(
         file: &mut File,
-    ) -> Result<Vec<(String, String)>, Error> {
+    ) -> Result<Vec<(Store, Snapshot)>, Error> {
         let mut list = Vec::new();
         let file = BufReader::new(file);
         for line in file.lines() {
@@ -391,7 +396,7 @@ impl MediaCatalog {
     pub fn snapshot_list(
         base_path: &Path,
         media_id: &MediaId,
-    ) -> Result<Vec<(String, String)>, Error> {
+    ) -> Result<Vec<(Store, Snapshot)>, Error> {
         let uuid = &media_id.label.uuid;
 
         let mut path = base_path.to_owned();
@@ -525,7 +530,7 @@ impl MediaCatalog {
     }
 
     /// Returns the snapshot archive file number
-    pub fn lookup_snapshot(&self, store: &str, snapshot: &str) -> Option<u64> {
+    pub fn lookup_snapshot(&self, store: &str, snapshot: &str) -> Option<FileNr> {
         match self.content.get(store) {
             None => None,
             Some(content) => content.snapshot_index.get(snapshot).copied(),
@@ -533,7 +538,7 @@ impl MediaCatalog {
     }
 
     /// Test if the catalog already contain a chunk
-    pub fn contains_chunk(&self, store: &str, digest: &[u8;32]) -> bool {
+    pub fn contains_chunk(&self, store: &str, digest: &Chunk) -> bool {
         match self.content.get(store) {
             None => false,
             Some(content) => content.chunk_index.contains_key(digest),
@@ -541,7 +546,7 @@ impl MediaCatalog {
     }
 
     /// Returns the chunk archive file number
-    pub fn lookup_chunk(&self, store: &str, digest: &[u8;32]) -> Option<u64> {
+    pub fn lookup_chunk(&self, store: &str, digest: &Chunk) -> Option<FileNr> {
         match self.content.get(store) {
             None => None,
             Some(content) => content.chunk_index.get(digest).copied(),
@@ -612,7 +617,7 @@ impl MediaCatalog {
     /// Only valid after start_chunk_archive.
     pub fn register_chunk(
         &mut self,
-        digest: &[u8;32],
+        digest: &Chunk,
     ) -> Result<(), Error> {
 
         let (file_number, store) = match self.current_archive {
@@ -1030,7 +1035,7 @@ impl MediaSetCatalog {
     }
 
     /// Returns the media uuid and snapshot archive file number
-    pub fn lookup_snapshot(&self, store: &str, snapshot: &str) -> Option<(&Uuid, u64)> {
+    pub fn lookup_snapshot(&self, store: &str, snapshot: &str) -> Option<(&Uuid, FileNr)> {
         for (uuid, catalog) in self.catalog_list.iter() {
             if let Some(nr) = catalog.lookup_snapshot(store, snapshot) {
                 return Some((uuid, nr));
@@ -1040,7 +1045,7 @@ impl MediaSetCatalog {
     }
 
     /// Test if the catalog already contain a chunk
-    pub fn contains_chunk(&self, store: &str, digest: &[u8;32]) -> bool {
+    pub fn contains_chunk(&self, store: &str, digest: &Chunk) -> bool {
         for catalog in self.catalog_list.values() {
             if catalog.contains_chunk(store, digest) {
                 return true;
@@ -1050,7 +1055,7 @@ impl MediaSetCatalog {
     }
 
     /// Returns the media uuid and chunk archive file number
-    pub fn lookup_chunk(&self, store: &str, digest: &[u8;32]) -> Option<(&Uuid, u64)> {
+    pub fn lookup_chunk(&self, store: &str, digest: &Chunk) -> Option<(&Uuid, FileNr)> {
         for (uuid, catalog) in self.catalog_list.iter() {
             if let Some(nr) = catalog.lookup_chunk(store, digest) {
                 return Some((uuid, nr));
-- 
2.30.2






* [pbs-devel] [PATCH proxmox-backup 3/3] api2: tape: media: use MediaCatalog::snapshot_list for content listing
  2021-07-19 14:55 [pbs-devel] [PATCH proxmox-backup 1/3] tape: media_catalog: add fast_catalog beside normal catalog Dominik Csapak
  2021-07-19 14:55 ` [pbs-devel] [RFC PATCH proxmox-backup 2/3] tape: media_catalog: add local type aliases to make code more clear Dominik Csapak
@ 2021-07-19 14:55 ` Dominik Csapak
  1 sibling, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2021-07-19 14:55 UTC (permalink / raw)
  To: pbs-devel

this should make the api call much faster, since it is not reading
the whole catalog anymore

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 src/api2/tape/media.rs | 44 +++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/src/api2/tape/media.rs b/src/api2/tape/media.rs
index 8351b2be..cc2d896d 100644
--- a/src/api2/tape/media.rs
+++ b/src/api2/tape/media.rs
@@ -502,32 +502,28 @@ pub fn list_content(
             .generate_media_set_name(&set.uuid, template)
             .unwrap_or_else(|_| set.uuid.to_string());
 
-        let catalog = MediaCatalog::open(status_path, &media_id, false, false)?;
+        for (store, snapshot) in MediaCatalog::snapshot_list(status_path, &media_id)? {
+            let backup_dir: BackupDir = snapshot.parse()?;
 
-        for (store, content) in catalog.content() {
-            for snapshot in content.snapshot_index.keys() {
-                let backup_dir: BackupDir = snapshot.parse()?;
-
-                if let Some(ref backup_type) = filter.backup_type {
-                    if backup_dir.group().backup_type() != backup_type { continue; }
-                }
-                if let Some(ref backup_id) = filter.backup_id {
-                    if backup_dir.group().backup_id() != backup_id { continue; }
-                }
-
-                list.push(MediaContentEntry {
-                    uuid: media_id.label.uuid.clone(),
-                    label_text: media_id.label.label_text.to_string(),
-                    pool: set.pool.clone(),
-                    media_set_name: media_set_name.clone(),
-                    media_set_uuid: set.uuid.clone(),
-                    media_set_ctime: set.ctime,
-                    seq_nr: set.seq_nr,
-                    snapshot: snapshot.to_owned(),
-                    store: store.to_owned(),
-                    backup_time: backup_dir.backup_time(),
-                });
+            if let Some(ref backup_type) = filter.backup_type {
+                if backup_dir.group().backup_type() != backup_type { continue; }
+            }
+            if let Some(ref backup_id) = filter.backup_id {
+                if backup_dir.group().backup_id() != backup_id { continue; }
             }
+
+            list.push(MediaContentEntry {
+                uuid: media_id.label.uuid.clone(),
+                label_text: media_id.label.label_text.to_string(),
+                pool: set.pool.clone(),
+                media_set_name: media_set_name.clone(),
+                media_set_uuid: set.uuid.clone(),
+                media_set_ctime: set.ctime,
+                seq_nr: set.seq_nr,
+                snapshot: snapshot.to_owned(),
+                store: store.to_owned(),
+                backup_time: backup_dir.backup_time(),
+            });
         }
     }
 
-- 
2.30.2






* Re: [pbs-devel] [PATCH proxmox-backup 1/3] tape: media_catalog: add fast_catalog beside normal catalog
  2021-07-20  6:15 [pbs-devel] [PATCH proxmox-backup 1/3] tape: media_catalog: add fast_catalog beside normal catalog Dietmar Maurer
@ 2021-07-20  7:01 ` Dominik Csapak
  0 siblings, 0 replies; 5+ messages in thread
From: Dominik Csapak @ 2021-07-20  7:01 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 7/20/21 08:15, Dietmar Maurer wrote:
> 
>> On 07/19/2021 4:55 PM Dominik Csapak <d.csapak@proxmox.com> wrote:
>>
>>   
>> for some parts of the ui, we only need the snapshot list from the catalog,
>> and reading the whole catalog (can be multiple hundred MiB) is not
>> really necessary.
>>
>> Instead, on every commit of the catalog, write the complete content list
>> into a separate .index file, that can be read to get only the snapshot
>> list.
> 
> Commits can be quite frequent. Can we write on "close" only?
> 

AFAICS from the code, during a backup to tape we only commit at the end of
the tape (the "close") or after every 128 GiB written to tape, so not that
often (about every 7 minutes on LTO-8 at 300 MB/s)
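
The estimate above can be checked with a few lines of arithmetic. This is
only a sketch: it assumes "GiB" means 2^30 bytes and "MB/s" means 10^6 bytes
per second, and the function name commit_interval_secs is hypothetical.

```rust
/// Seconds between two commits if one commit happens per `bytes_per_commit`
/// written at a constant rate of `bytes_per_sec`.
fn commit_interval_secs(bytes_per_commit: u64, bytes_per_sec: u64) -> f64 {
    bytes_per_commit as f64 / bytes_per_sec as f64
}

fn main() {
    let bytes_per_commit: u64 = 128 * 1024 * 1024 * 1024; // 128 GiB
    let bytes_per_sec: u64 = 300 * 1000 * 1000; // 300 MB/s

    let secs = commit_interval_secs(bytes_per_commit, bytes_per_sec);
    // prints "458 s (~7.6 min)"
    println!("{:.0} s (~{:.1} min)", secs, secs / 60.0);
}
```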

on tape restore though, we create a 'temporary database' which gets
committed on every archive restore

i'd suggest to either

* add an option to commit for writing the snapshot list, and only set it
   on the last commit

* add some kind of 'finish' or 'close' function to the catalog, that
   must be called

any favorites (or alternatives)?
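
To make the two suggestions concrete, here is a toy sketch, not the real
MediaCatalog: ToyCatalog and all of its methods are hypothetical, and it
only counts how often the snapshot list would be rewritten. Option 1 is a
flag on commit, option 2 is a finish function that consumes the catalog so
it cannot be used afterwards.

```rust
/// Toy stand-in for the catalog. It keeps no real data and only counts
/// commits and index rewrites.
#[derive(Default)]
struct ToyCatalog {
    pending: Vec<u8>,
    commits: usize,
    index_writes: usize,
}

impl ToyCatalog {
    /// Queue an entry for the next commit.
    fn register(&mut self, entry: &[u8]) {
        self.pending.extend_from_slice(entry);
    }

    /// Option 1: the caller decides per commit whether the index is rewritten.
    fn commit(&mut self, write_index: bool) {
        self.pending.clear();
        self.commits += 1;
        if write_index {
            self.write_snapshot_list();
        }
    }

    /// Option 2: an explicit final step. Taking `self` by value means the
    /// catalog cannot be committed again after it was finished.
    /// Returns (commits, index_writes) for inspection.
    fn finish(mut self) -> (usize, usize) {
        self.commit(true);
        (self.commits, self.index_writes)
    }

    fn write_snapshot_list(&mut self) {
        self.index_writes += 1;
    }
}

fn main() {
    // Simulate a restore of three archives with one commit per archive.
    let mut catalog = ToyCatalog::default();
    for archive in 0u8..3 {
        catalog.register(&[archive]);
        catalog.commit(false); // frequent commits skip the index
    }

    let (commits, index_writes) = catalog.finish();
    assert_eq!((commits, index_writes), (4, 1));
    println!("{} commits, {} index write(s)", commits, index_writes);
}
```

In this sketch the two options are not exclusive: finish is simply the last
commit with the flag set. The open question from the mail remains which of
the two the callers should be required to use.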






* Re: [pbs-devel] [PATCH proxmox-backup 1/3] tape: media_catalog: add fast_catalog beside normal catalog
@ 2021-07-20  6:15 Dietmar Maurer
  2021-07-20  7:01 ` Dominik Csapak
  0 siblings, 1 reply; 5+ messages in thread
From: Dietmar Maurer @ 2021-07-20  6:15 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak


> On 07/19/2021 4:55 PM Dominik Csapak <d.csapak@proxmox.com> wrote:
> 
>  
> for some parts of the ui, we only need the snapshot list from the catalog,
> and reading the whole catalog (can be multiple hundred MiB) is not
> really necessary.
> 
> Instead, on every commit of the catalog, write the complete content list
> into a separate .index file, that can be read to get only the snapshot
> list.

Commits can be quite frequent. Can we write on "close" only?




